1 Introduction

Glioblastoma, the most common malignant primary brain tumor, carries an invariably poor prognosis (3, 5, 6). Targeting underlying biological foundations of the disease will be crucial to developing more effective treatment strategies (3, 5, 6). Transcriptional profiling through microarray analysis and protein expression profiling through immunohistochemistry (IHC)-based microarrays represent vital resources for researchers seeking to accomplish these goals (3, 5, 6).

Here, we first describe protocols for gathering gene expression data with transcript-based microarrays. Next, we review the various methods of data analysis and clustering along with merits and demerits of each approach. Finally, we highlight important considerations to keep in mind while selecting the optimal approach to test your particular hypothesis.

2 Materials

2.1 Sample Types and Associated Culture Media and Equipment

  1. 1.

    Cells.

    1. a.

      Glioblastoma cells.

      1. i.

        Types of cells.

        1. 1.

          Glioblastoma-derived tumor-initiating stem cells (7).

        2. 2.

          Glioma cell lines such as A172, CCF-STTG1, T98G, U373MG, U178MG, TP365MG, U118MG, U251MG, GL15, U105MG, U343MG, and SF767 (8, 9).

        3. 3.

          Primary glioblastoma cultures.

      2. ii.

        Normal control cells for comparison.

        1. 1.

          Neural stem cells such as CB541 and CB660 (10).

        2. 2.

          Peripheral blood collected in blue-top monoject tubes (3.5 % sodium citrate anticoagulant, Terumo Corp., Japan) (10).

    2. b.

      Medium.

      1. i.

        Stem-cell medium (8, 9) made of DMEM/F-12 containing 20 % bovine serum albumin, insulin, and transferrin (BIT) serum-free supplement, and basic fibroblast and epidermal growth factors (Provitro, 20 ng/mL each) (8).

      2. ii.

        DMEM, containing 10 % fetal bovine serum (FBS).

      3. iii.

        RPMI-1640 Medium (Sigma-Aldrich Sweden AB, Stockholm, Sweden) (9).

  2. 2.

    Tissue.

    1. a.

      Patient glioblastoma specimens.

      1. i.

        Formalin-fixed paraffin-embedded (FFPE) tumor sections (11). The poorer-quality RNA from paraffin sections requires special preparatory protocols and stringent purity criteria.

      2. ii.

        Frozen tumor pieces.

    2. b.

      Frozen pieces from subcutaneous or intracranial xenografts treated with vehicle versus drug of interest.

2.2 Materials for Transcript-Based Microarrays

RNA Isolation

  1. 1.

    RecoverAll Total Nucleic Acid Isolation Kit (Ambion, Inc.) (1).

  2. 2.

    RNeasy kit (Qiagen).

  3. 3.

    TotalPrep RNA Amplification kit (Illumina).

  4. 4.

    Blood and Cell Culture Kit (Qiagen).

  5. 5.

    DNaseI (Invitrogen).

  6. 6.

    Cesium chloride column.

  7. 7.

    Ultracentrifuge.

Assessment of RNA Quality

  1. 8.

    ND-1000 spectrophotometer (NanoDrop Technologies, Wilmington, DE) (9).

  2. 9.

    Agilent 2100 bioanalyzer (Agilent).

Hybridization-Ready Sample Preparation

  1. 10.

    SuperscriptII (Invitrogen).

  2. 11.

    Reference total RNA obtained from nonneoplastic human brain tissue samples of five individuals (Bio-Chain) (8).

  3. 12.

    HybBag mixing system with 1× OneArray Hybridization Buffer (Phalanx Biotech) (11).

  4. 13.

    Salmon sperm DNA (Promega) (11).

  5. 14.

    Molecular Dynamics™ Axon 4100A scanner (11).

  6. 15.

    ABI PRISM 7900 (Applied Biosystems) (8) for RT-PCR for validating the transcript-based microarray data.

  7. 16.

    Absolute SYBR Green ROX Mix (ABgene).

  8. 17.

    Biotin-16-UTP.

  9. 18.

    Cy5 NHS ester (GE Healthcare Life Sciences) (11).

Microarrays and Signal Detection

  1. 19.

    Illumina Human whole-genome Sentrix-6V2 BeadChip array.

  2. 20.

    Affymetrix GeneChip expression arrays (Human Genome U133 Plus 2.0 Array) (9).

  3. 21.

    Whole-Genome DASL Assay with HumanRef-8 BeadChips (Illumina, Inc.; San Diego, CA) (1).

  4. 22.

    Whole Human Genome Oligo Microarray 4x44K (Agilent) (1).

  5. 23.

    Human HT-12 v4 Expression BeadChip Kits (Illumina; San Diego, CA) (1).

  6. 24.

    Human Whole Genome OneArray v2 (Phalanx Biotech) (11).

  7. 25.

    GeneChip Expression Analysis Technical Manual (Rev. 5, Affymetrix Inc., Santa Clara, CA) (9).

  8. 26.

    Fluidics Station 450 (Affymetrix Inc.) for washing and staining microarrays.

  9. 27.

    45 °C incubator, capable of rotation up to 60 rpm.

  10.  28.

    Bead station array scanner.

  11.  29.

    GeneChip® Scanner 3000 7G (Affymetrix Inc.) (9).

2.3 Ways of Classifying Transcript-Based Microarrays

  1. 1.

    Length of probe—arrays can be classified into “complementary DNA (cDNA) arrays,” which use long probes of hundreds or thousands of base pairs (bps), and “oligonucleotide arrays,” which use short probes (usually 50 bps or less). Manufacturing methods include “deposition” of previously synthesized sequences and “in situ synthesis.”

  2. 2.

    Manufacturing technique—Usually, cDNA arrays are manufactured using deposition, while oligonucleotide arrays are manufactured using in situ technologies. In situ technologies include “photolithography” (e.g., Affymetrix, Santa Clara, CA), “ink-jet printing” (e.g., Agilent, Palo Alto, CA), and “electrochemical synthesis” (e.g., Combimatrix, Mukilteo, WA) (12).

  3. 3.

    Number of samples— “Single-channel arrays” analyze a single sample at a time, whereas “multiple-channel arrays” can analyze two or more samples simultaneously. An example of an oligonucleotide, single-channel array is the Affymetrix GeneChip (12).

2.4 Types of Protein Microarrays

  1. 1.

    Analytical/capture microarrays, in which a library of antibodies, aptamers, or affibodies arrayed on the support surface acts as capture molecules, each binding specifically to a particular protein. Samples such as cell lysates can then be applied to the array, and a variety of detection methods can be used to determine the relative levels of the captured proteins in the sample solution (13).

  2. 2.

    Functional protein microarrays/target protein microarrays, which are constructed by immobilizing large numbers of purified proteins and are used to identify protein–protein, protein–DNA, protein–RNA, protein–phospholipid, and protein–small-molecule interactions, to assay enzymatic activity and to detect antibodies and demonstrate their specificity. They differ from analytical arrays in that they contain full-length functional proteins or protein domains and can in some cases be used to study the biochemical activities of the entire proteome in a single experiment (13).

  3. 3.

    Reverse-phase protein arrays (RPPA), so called because in this case, the sample, which can be cell lysate or complex tissue lysate, is applied to the microarray, and then probed with antibodies against the target proteins of interest. Methods of detection are usually chemiluminescence, fluorescence, or colorimetry. Reference peptides are printed on the arrays to allow for protein quantification of the sample lysates (13).

  4. 4.

    Tissue microarrays (TMA) probed using IHC protocols, in which laser-capture-microdissected tissue is spotted in an array format and then assayed with a variety of antibodies against expressed proteins. The added benefit of IHC-based arrays is that expression and tissue localization of proteins can be studied simultaneously. A significant drawback is the lack of molecular weight verification of the identified proteins, which means that the detection antibodies must be thoroughly validated by western blotting before use in the IHC technique.

2.5 Typical Workflow of Microarray-Based Experiments

Illustrated in Fig. 1 is the typical workflow of microarray-based experiments. Note that although protein-based microarrays are outside the scope of this chapter, the analysis methodologies described here can be applied irrespective of whether expression levels are measured based on transcript or protein.

Fig. 1

A representative microarray-based experimental workflow. Shown are the typical steps taken in microarray analysis from sample processing to data analysis

2.6 Examples of Software and Databases Used for Microarray Data Clustering and Analysis

2.6.1 Software

  1. 1.

    Bioconductor packages.

    1. a.

      DESeq (normalizing tag counts for transcriptome tag sequencing) (7).

    2. b.

      Signaling pathway impact analysis (SPIA) (7).

    3. c.

      Affy.

    4. d.

      org.Hs.eg.db (tests for enrichment of gene ontology terms) (7).

    5. e.

      DNAcopy (7).

    6. f.

      CGHcall.

    7. g.

      CGHnormaliter (correction for intensity dependence).

    8. h.

      beadarray R package (svn release 1.7.0) (8).

    9. i.

      lumi R package (release 1.1.0) for variance-stabilizing transformation and spline normalization (8).

  2. 2.

    Recount program (to correct for potential sequencing errors during transcriptome tag sequencing) (7).

  3. 3.

    TagDust (7).

  4. 4.

    Bowtie short read aligner (to remove tags coming from mitochondrial RNA or rRNA) (7).

  5. 5.

    limma (comparing between microarrays) (7).

  6. 6.

    Ingenuity Pathways Knowledge Base and Analysis Software (www.ingenuity.com) (8).

  7. 7.

    BLAT (Kent 2002) (14).

  8. 8.

    AltAnalyze (15, 16) for quantile normalization to look at differential gene expression (14).

  9. 9.

    Partek Genomics Suite (http://www.partek.com/) for analysis of the microarray data (14).

  10. 10.

    Significance Analysis of Microarrays (SAM) 3.0 (Stanford University) for statistical analyses (17).

  11. 11.

    Imagene 6.0 data extraction software (BioDiscovery Inc.) (17).

  12. 12.

    AROMA (18).

2.6.2 Databases

  1. 13.

    Gene Ontology.

  2. 14.

    Kyoto Encyclopedia of Genes and Genomes (KEGG) pathway database.

  3. 15.

    The Cancer Genome Atlas (TCGA) dataset consisting of 397 glioblastoma cases.

  4. 16.

    NCI-60 expression data from CellMiner (9).

  5. 17.

    Database for Annotation, Visualization and Integrated Discovery (DAVID) Bioinformatics Resources 6.7 (http://david.abcc.ncifcrf.gov/home.jsp) for Gene Set Enrichment Analysis (GSEA) (9).

  6. 18.

    Biocarta.

  7. 19.

    PANTHER.

  8. 20.

    SPSS 16.0 (SPSS Inc., Chicago) (17).

  9. 21.

    ArrayExpress database (accession no. E-MEXP-3296) (1).

  10. 22.

    GenePixPro™ Software (11).

  11. 23.

    C5.BP.V3.0 (gene ontology: biological processes) and C2.CP.V3.0 (canonical pathways) MSigDB gene sets for GSEA (11).

  12. 24.

    Chinese glioma genome atlas (2).

3 Methods for Transcript-Based Microarrays

  1. 1.

    Based on your initial sample, use the appropriate reagents for isolating total RNA (items 1–4, Section 2.2). In most cases, the specific instructions are given by the kit manufacturers.

  2. 2.

    Determine RNA quality and integrity using an Agilent 2100 Bioanalyzer (Agilent Technologies) and the A260/A280 absorbance ratio. Only high-quality RNA, with an RNA integrity number (RIN) of >7.0 and an A260/A280 ratio of >1.8, should be used for further experimentation. This step is particularly important for RNA derived from paraffin-embedded tissue, whose purity may be limited and which often requires re-purification (11).
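
The acceptance criteria in this step can be applied to a batch of samples with a few lines of code. The following is a minimal sketch in Python, assuming the RIN and A260/A280 values have already been read off the instruments; the `RnaSample` record, the function names, and the sample labels are hypothetical and serve only as illustration.

```python
from typing import NamedTuple

# Thresholds quoted in the protocol step above.
MIN_RIN = 7.0
MIN_A260_A280 = 1.8


class RnaSample(NamedTuple):
    name: str          # hypothetical sample label
    rin: float         # RNA integrity number from the Bioanalyzer
    a260_a280: float   # absorbance ratio from the spectrophotometer


def passes_qc(sample):
    """Return True only when both values strictly exceed their thresholds."""
    return sample.rin > MIN_RIN and sample.a260_a280 > MIN_A260_A280


def split_by_qc(samples):
    """Partition samples into (usable, rejected) lists, preserving order."""
    usable = [s for s in samples if passes_qc(s)]
    rejected = [s for s in samples if not passes_qc(s)]
    return usable, rejected


# Example batch with made-up values.
batch = [
    RnaSample("cell_line_A", rin=9.2, a260_a280=2.05),
    RnaSample("ffpe_block_B", rin=5.1, a260_a280=1.95),
    RnaSample("frozen_C", rin=8.0, a260_a280=1.70),
]
usable, rejected = split_by_qc(batch)
print("usable:", [s.name for s in usable])
print("re-purify or exclude:", [s.name for s in rejected])
```

Rejected samples are candidates for re-purification rather than automatic exclusion, in line with the remark on paraffin-derived RNA.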

  3. 3.

    Most microarrays require 2 μg of high-quality total RNA from each sample as starting material, which is converted to biotinylated, fragmented complementary RNA (cRNA). cRNA is necessary because the oligonucleotide probes are in the sense orientation, so the target must be antisense RNA. Amplification is necessary because most microarrays require about 25–100 μg of RNA for hybridization (14). Biotinylation of the test sample is necessary when the microarray relies on streptavidin to capture the RNA.

    Microarrays are generally classified into two broad categories based on their method of synthesis: spotted microarrays and oligonucleotide microarrays. In spotted microarrays, the probes can be oligonucleotides, cDNA, or small fragments of PCR products corresponding to mRNAs; they are synthesized before deposition and are then “spotted” onto glass. For such spotted arrays one can use mRNA, cDNA, or cRNA because both strands are present as probes on the microarray (13). In oligonucleotide microarrays, probes are produced either by piezoelectric deposition of full-length oligonucleotides or by in situ synthesis. While spotted microarrays are more amenable to in-house printing of custom-made arrays, oligonucleotide microarrays offer higher probe density and higher array-to-array reproducibility.

  4. 4.

    On any given microarray, once the capture probe is immobilized to the substrate, it is important to perform two additional steps prior to using the microarray. If a covalent chemistry was used for immobilization, any residual reactive groups on the surface should be removed. This is commonly called quenching the surface. Under certain conditions, this is also referred to as capping. For example, residual epoxide (EP) groups can be reacted with an amine compound such as ethanolamine, whereas aldehyde groups can be reduced to alcohols using sodium borohydride. The second process is commonly called blocking. Once residual reactive groups are destroyed, the issue of nonspecific adsorption will need to be addressed. What you choose to block with depends on several factors such as the treated surface, hybridization cocktail, and sample matrix. Common blocking agents include detergents such as Tween 20, salmon sperm DNA, tRNA, or proteins such as bovine serum albumin (BSA) (19).

  5. 5.

    The key physicochemical process involved in microarrays is hybridization. Samples are typically hybridized overnight (12–16 h) at a temperature between 42 and 45 °C.

  6. 6.

    The arrays hybridized with sample must then be washed, stained, and scanned with methods appropriate for the microarray of choice. For example, for Affymetrix microarrays, you can use the Fluidics Station 450 (Affymetrix Inc.) and scan with the GeneChip Scanner 3000 7G (Affymetrix Inc.) (9).

  7. 7.

    The first step in any analysis is to apply a background correction which accounts for the percent of intensity coming from nonspecific binding to the microarray. Background correction can be applied using the intensity levels in the vicinity of spots in the case of spotted arrays. In the case of high-density arrays, mismatch probes can be used to estimate the amount of signal coming from nonspecific binding.
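
The subtraction described in this step can be sketched as follows. This is a minimal illustration in Python, not the algorithm of any particular software package: it subtracts a per-spot estimate of nonspecific signal (local background for spotted arrays, or mismatch-probe intensity for high-density arrays) from the foreground signal. The `floor` parameter is an assumption added here so that corrected values stay positive for the later log transformation.

```python
def background_correct(foreground, background, floor=1.0):
    """Subtract a per-spot background estimate from the foreground signal.

    foreground: measured spot (or perfect-match probe) intensities
    background: local background (or mismatch probe) intensities
    floor: smallest value allowed after subtraction (an assumption of this
           sketch, chosen so that a later log transform stays defined)
    """
    if len(foreground) != len(background):
        raise ValueError("foreground and background must have equal length")
    return [max(fg - bg, floor) for fg, bg in zip(foreground, background)]


# Made-up intensities for three spots; the third is dimmer than its background.
spots = [500.0, 120.0, 80.0]
local_bg = [100.0, 100.0, 100.0]
print(background_correct(spots, local_bg))
```

Simple subtraction with a floor is only one option; model-based corrections handle low-intensity spots more gracefully and may be preferable in practice.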

  8. 8.

    After background correction, the data is generally log-transformed. The log transformation improves the characteristics of the data distribution and allows the use of classical parametric statistics for analysis. With two-channel arrays, the intensity values of the two competing samples are expressed as ratios and then log-transformed. In contrast, with single-channel technology (e.g., Affymetrix), the “absolute” expression level of the genes is log-transformed. Logarithmic transformation also converts multiplicative error into additive error (12).
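
The two transformations in this step, and the conversion of multiplicative error into additive error, can be illustrated with a short Python sketch. The function names are hypothetical, and base 2 is chosen here because it makes a twofold change equal to one unit; the text above does not prescribe a base.

```python
import math


def log2_ratio(test_intensity, reference_intensity):
    """Two-channel arrays: log2 of the test/reference intensity ratio."""
    return math.log2(test_intensity / reference_intensity)


def log2_intensity(intensity):
    """Single-channel arrays: log2 of the 'absolute' expression level."""
    return math.log2(intensity)


# A twofold increase and a twofold decrease become symmetric around zero.
print(log2_ratio(800.0, 400.0))
print(log2_ratio(200.0, 400.0))

# A multiplicative error factor becomes an additive offset after the transform.
true_signal, error_factor = 1000.0, 1.1
observed = log2_intensity(true_signal * error_factor)
additive = log2_intensity(true_signal) + math.log2(error_factor)
print(math.isclose(observed, additive))
```

On the raw scale a twofold decrease (0.5) and a twofold increase (2.0) are asymmetric around 1; after the transform they sit at equal distances from zero, which is part of why parametric statistics behave better on log-transformed data.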

  9. 9.

    Normalization of the raw data is a necessary subsequent step so that the final data can be compared across platforms. The aim of normalization is to remove systematic biases that cause artifactual intensity variance between samples on account of inherent differences in dye characteristics, array manufacturing, and spatial location of the sample on a given array. Examples of freely available normalization methods, implemented in Bioconductor packages, are MAS 5.0, Robust Multi-array Average (RMA), and GC-RMA for single-channel arrays, and LOESS normalization for two-channel arrays (12).
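
To make the idea of removing between-array bias concrete, the sketch below implements quantile normalization, the between-array step used within RMA. It is a minimal pure-Python illustration, not a substitute for the Bioconductor implementations named above: it assumes that every array measures the same probes in the same order, and it breaks ties by position rather than averaging them.

```python
def quantile_normalize(arrays):
    """Force every array to share the same intensity distribution.

    arrays: list of equal-length lists, one list of intensities per array.
    Returns new lists in which each value is replaced by the mean, across
    arrays, of the values holding the same rank.
    """
    if not arrays:
        raise ValueError("at least one array is required")
    n_probes = len(arrays[0])
    if any(len(a) != n_probes for a in arrays):
        raise ValueError("all arrays must contain the same number of probes")

    # Mean of the k-th smallest intensity across all arrays.
    sorted_arrays = [sorted(a) for a in arrays]
    rank_means = [
        sum(col[k] for col in sorted_arrays) / len(arrays)
        for k in range(n_probes)
    ]

    normalized = []
    for a in arrays:
        # Probe indices ordered from lowest to highest intensity.
        order = sorted(range(n_probes), key=lambda i: a[i])
        out = [0.0] * n_probes
        for rank, probe_index in enumerate(order):
            out[probe_index] = rank_means[rank]
        normalized.append(out)
    return normalized


# Two made-up arrays measuring the same three probes.
raw = [[5.0, 2.0, 3.0], [4.0, 1.0, 6.0]]
for row in quantile_normalize(raw):
    print(row)
```

After normalization each array contains the same set of values, so remaining differences between arrays reflect the rank of each probe rather than array-wide intensity shifts.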

  10. 10.

    Once the data is normalized, it can be used for hypothesis testing. Analysis methods described from this point on can also be used for meta-analysis of existing expression data in databases such as KEGG, TCGA, DAVID, ArrayExpress database, and others.

  11. 11.

    Currently, there are three major types of applications of transcript-based microarrays in medicine. The first involves finding differences in expression levels between predefined groups of samples. This is called a “class comparison” experiment. A second application, “class prediction,” involves identifying the class membership of a sample based on its gene expression profile. This requires the construction of a classifier (a mathematical model) able to analyze the gene expression profile of a sample and predict its class membership. The classifier is constructed based on a representative set of samples with known class membership. This classifier will then be used to assess the likelihood of developing glioblastoma in patients not included in construction of the classifier. The third type of application involves analyzing a given set of gene expression profiles with the goal of discovering subgroups that share common features. This application is known as “class discovery.” For example, the expression profiles of a large number of patients with glioblastoma will be measured with the goal of identifying subgroups of patients who have a similar gene expression profile. This effort is conducted to generate a molecular taxonomy of disease. In other words, how many molecular types of glioblastoma are in a sample of patients affected by the disease? (12).
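
Class prediction, as described in this step, can be illustrated with a nearest-centroid classifier. This is one simple classifier chosen here for illustration; the text does not prescribe a specific model, and the class labels and expression values below are made up.

```python
import math


def centroid(profiles):
    """Mean expression profile of a group of samples."""
    n_samples = len(profiles)
    n_genes = len(profiles[0])
    return [sum(p[g] for p in profiles) / n_samples for g in range(n_genes)]


def train_nearest_centroid(training):
    """Build the classifier from samples with known class membership.

    training: dict mapping class label -> list of expression profiles.
    Returns a dict mapping class label -> centroid profile.
    """
    return {label: centroid(profiles) for label, profiles in training.items()}


def predict(centroids, profile):
    """Assign a new profile to the class with the closest centroid."""
    return min(centroids, key=lambda label: math.dist(centroids[label], profile))


# Hypothetical two-gene profiles for two predefined classes.
training_set = {
    "class_A": [[8.0, 2.0], [9.0, 1.0]],
    "class_B": [[2.0, 7.0], [3.0, 8.0]],
}
model = train_nearest_centroid(training_set)
print(predict(model, [7.5, 2.5]))
print(predict(model, [3.0, 6.0]))
```

As the text notes, a classifier must be evaluated on samples that were not used to construct it; predicting the training samples themselves overstates its accuracy.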

  12. 12.

    An unsupervised clustering analysis can be carried out in order to search for obvious patterns. Clusters identified in such a manner can then be further validated (1). These clusters can be graphically illustrated in the form of “heatmaps” showing upregulated and downregulated gene sets from one sample to the next (Fig. 2).

    Fig. 2

    A representative heatmap of gene expression obtained by microarray analysis. Shown is an unpublished heatmap showing differentially expressed genes in a glioblastoma cell engineered to express shRNA targeting autophagy gene ATG7
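
Unsupervised clustering of samples can be sketched as follows. The example uses average-linkage agglomerative clustering with a Pearson correlation distance, a common choice for expression data that is assumed here for illustration; it is not necessarily the method used to produce Fig. 2, and the sample names and values are made up.

```python
import math


def pearson_distance(x, y):
    """Distance between two expression profiles: 1 - Pearson correlation."""
    mean_x = sum(x) / len(x)
    mean_y = sum(y) / len(y)
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    norm = math.sqrt(
        sum((a - mean_x) ** 2 for a in x) * sum((b - mean_y) ** 2 for b in y)
    )
    if norm == 0:
        raise ValueError("profiles must not have zero variance")
    return 1.0 - cov / norm


def average_linkage_clusters(profiles, n_clusters):
    """Merge the two closest clusters until n_clusters remain.

    profiles: dict mapping sample name -> expression profile.
    Returns a list of sets of sample names.
    """
    if n_clusters < 1:
        raise ValueError("n_clusters must be at least 1")
    clusters = [{name} for name in profiles]

    def linkage(c1, c2):
        # Average pairwise distance between members of the two clusters.
        dists = [pearson_distance(profiles[a], profiles[b]) for a in c1 for b in c2]
        return sum(dists) / len(dists)

    while len(clusters) > n_clusters:
        pairs = (
            (i, j)
            for i in range(len(clusters))
            for j in range(i + 1, len(clusters))
        )
        i, j = min(pairs, key=lambda p: linkage(clusters[p[0]], clusters[p[1]]))
        merged = clusters[i] | clusters[j]
        clusters = [c for k, c in enumerate(clusters) if k not in (i, j)] + [merged]
    return clusters


samples = {
    "s1": [1.0, 2.0, 3.0, 4.0],
    "s2": [2.0, 4.0, 6.0, 9.0],
    "s3": [4.0, 3.0, 2.0, 1.0],
    "s4": [9.0, 6.0, 5.0, 1.0],
}
print(average_linkage_clusters(samples, n_clusters=2))
```

The order in which clusters merge defines the dendrogram that is usually drawn alongside a heatmap; this sketch only reports the final grouping.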

  13. 13.

    If no clustering is detected, an unbiased gene selection approach may be used, in which the samples to be compared are clustered on high-variance probes (98th percentile and above) (1) and then examined for correlations with any classes established through other means such as histopathology or imaging.
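
Selecting the high-variance probes for this step can be sketched as below. The code ranks probes by their sample variance across all samples and keeps the top 2 %, which is how the 98th-percentile cutoff is interpreted here; the exact percentile definition used in (1) is an assumption, and the probe identifiers are hypothetical.

```python
import statistics


def top_variance_probes(expression, percentile=98):
    """Return the probes whose variance lies in the top (100 - percentile) %.

    expression: dict mapping probe id -> list of values across samples
                (at least two samples per probe).
    percentile: integer cutoff; 98 keeps the top 2 % of probes.
    Probes are returned from highest to lowest variance.
    """
    variances = {
        probe: statistics.variance(values) for probe, values in expression.items()
    }
    # Integer ceiling of n * (100 - percentile) / 100, keeping at least one probe.
    n_keep = max(1, (len(variances) * (100 - percentile) + 99) // 100)
    ranked = sorted(variances, key=variances.get, reverse=True)
    return ranked[:n_keep]


# 100 made-up probes measured in two samples; variance grows with the index.
data = {f"probe_{i}": [0.0, float(i)] for i in range(100)}
print(top_variance_probes(data))
```

Because the selection uses no class labels, clustering on the retained probes remains unsupervised and can then be compared with classes defined by histopathology or imaging.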

  14. 14.

    In class comparison and class discovery studies, the expression characterization of the groups (e.g., health vs. disease) is often followed by “functional profiling.” The purpose of this task is to gain insight into the biological processes that are altered in disease. GSEA is currently the most widely used method of functional profiling. GSEA is a computational method that determines whether an a priori-defined set of genes shows statistically significant, concordant differences between two biological states (e.g., phenotypes) (19). When comparing two distinct biological phenotypes, there are some major limitations to the simple approach of identifying the genes that show the largest expression differences across the phenotypes in question. The limitations are as follows:

    1. i.

      No individual gene may meet the threshold for statistical significance, due to a small signal-to-noise ratio.

    2. ii.

      In case of a long list of statistically significant genes without known biological connections between them, it becomes difficult to interpret the data meaningfully.

    3. iii.

      Since cellular processes typically involve a large number of genes acting in concert, seemingly minor expression changes in a set of related genes may be more interesting to follow up on than a small set of unrelated genes that show large, statistically significant differences in expression between the groups compared.

    4. iv.

      When different groups study the same biological system, the list of statistically significant genes from the two studies may show very little overlap while there may be identical genetic pathways being affected that remain undetected because of a limitation in the analysis methodology.

GSEA is a computational protocol that seeks to get around the limitations listed above (20).
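
The core of GSEA is a running-sum statistic computed over a list of genes ranked by their association with the phenotype. The sketch below computes this enrichment score in Python for a single gene set. It is a simplified illustration: the permutation-based significance testing and normalization performed by the GSEA software are omitted, and the gene names and scores are made up.

```python
def enrichment_score(ranked_genes, scores, gene_set, weight=1.0):
    """Weighted running-sum enrichment score for one gene set.

    ranked_genes: genes ordered from most up- to most down-regulated
    scores: ranking metric for each gene, in the same order
    gene_set: a priori defined set of genes to test
    weight: exponent applied to |score| for genes in the set
            (0 gives the unweighted, Kolmogorov-Smirnov-like statistic)
    Returns the maximum deviation of the running sum from zero.
    """
    in_set = [gene in gene_set for gene in ranked_genes]
    n_hits = sum(in_set)
    n_miss = len(ranked_genes) - n_hits
    if n_hits == 0 or n_miss == 0:
        raise ValueError("gene set must overlap the ranked list only partially")

    hit_weights = [abs(s) ** weight if hit else 0.0 for s, hit in zip(scores, in_set)]
    total_hit_weight = sum(hit_weights)
    if total_hit_weight == 0:
        raise ValueError("genes in the set have zero total weight")

    running = 0.0
    best = 0.0
    for hit, w in zip(in_set, hit_weights):
        # Step up at genes in the set, step down at all other genes.
        running += w / total_hit_weight if hit else -1.0 / n_miss
        if abs(running) > abs(best):
            best = running
    return best


genes = ["g1", "g2", "g3", "g4", "g5", "g6"]
metric = [3.0, 2.0, 1.0, -1.0, -2.0, -3.0]
print(enrichment_score(genes, metric, {"g1", "g2"}))
print(enrichment_score(genes, metric, {"g5", "g6"}))
```

A gene set concentrated at the top of the ranking yields a positive score and one concentrated at the bottom a negative score, even when no single gene in the set would pass a significance threshold on its own.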

4 Notes

  1. 1.

    The sample preparation technique greatly limits the range of microarrays that can be used for a given study. Most translational studies begin with FFPE samples, while studies aimed at deciphering underlying molecular pathways might use cell lines as beginning material. Cell line-derived samples are invariably of higher quality than frozen tissue-derived samples, which in turn are of significantly higher quality than FFPE samples. While cell line-derived samples or frozen tissue-derived samples can be used directly as starting material for most commercially available arrays, FFPE material, on the other hand, suffers from having degraded and low-quality RNA. As a result, specialized microarray assays such as the cDNA-mediated annealing, selection, extension, and ligation (DASL) assay by Illumina must be used when working with FFPE samples. The DASL assay uses random priming in the cDNA synthesis, and therefore does not depend on an intact poly(A) tail for T7-oligo-d(T) priming. In addition, the assay requires a relatively short target sequence of about 50 nucleotides for query oligonucleotide annealing, allowing the assay to perform well with significantly degraded RNAs (21).

  2. 2.

    The algorithms and software packages used for analysis are usually linked to the particular microarray of choice. However, the underlying analysis strategies are common across software packages and need to be chosen based on the type of statistical analysis required to answer the questions posed by the researchers. Here we have presented in detail a prototypical microarray experimental workflow. Depending on sample type, microarray choice, and software used, readers must draw parallels or make choices based on their own research goals.