Introduction

Research over the last two decades has begun to reveal the incredible diversity of microorganisms in natural environments. Torsvik et al. [91, 92, 94], for example, have estimated that as many as 10,000 bacterial species may be present in a gram of surface soil or marine sediment. A more recent study has estimated the number of distinct microbial genomes in soil to be over a million [29], two orders of magnitude greater than earlier estimates. However, the factors that control these populations, and their often extreme heterogeneity in the environment, are largely unknown [111]. The ultimate goal of microbial ecology research is to elucidate these factors by examining the interactions of microorganisms with each other and with the biotic and abiotic characteristics of the environment. The development of technologies such as polymerase chain reaction (PCR) fingerprinting, reverse transcriptase PCR, real-time PCR, reporter genes, and fluorescence in situ hybridization (FISH) has made it possible to study the dynamics of simple communities or small groups of dominant organisms in natural settings. However, to fully understand the ecology of complex environments such as surface soils, it is necessary to analyze the dynamics and/or activity of hundreds to thousands of different microbial populations simultaneously.

Microarrays have the unprecedented potential to achieve this objective as specific, sensitive, quantitative, and high-throughput tools for microbial detection, identification, and characterization in natural environments. Due to rapid advances in printing technology, microarrays can now be produced that contain thousands to hundreds of thousands of probes. Microarrays have been primarily developed and used for gene expression profiling of pure cultures of individual organisms, but major advances have recently been made in their application to environmental samples. However, the analysis of environmental samples presents several challenges not encountered during the analysis of pure cultures. Like most other techniques, microarrays currently detect only the dominant populations in many environmental samples [20, 72]. In addition, some environments contain low levels of biomass, making it difficult to obtain enough material for use in microarray analysis without first amplifying the nucleic acids. Such techniques, even if applied with the utmost care, may introduce biases into the analyses [71], but perhaps the greatest challenge to the analysis of environmental samples using microarrays is the vast number of unknown DNA sequences in these samples. The importance of an organism, which may be dominant and critical to the ecosystem under study, can be completely overlooked if the organism does not have a corresponding probe on the array. Probes designed to be specific to known sequences can also cross-hybridize to similar, unknown sequences from related or unrelated genes, resulting in either an underestimated signal due to weaker binding of a slightly divergent sequence or a completely misleading signal due to binding of a different gene. Furthermore, it is often a challenge to analyze microarray results from environmental samples due to the massive amounts of data generated and a lack of standardized controls and data analysis procedures.

Despite these challenges, several types of microarrays have been successfully applied to microbial ecology research. These arrays can be divided into at least five categories based on the genes targeted by the array: (1) Phylogenetic oligonucleotide arrays (POAs) are designed based on a conserved marker such as the 16S ribosomal RNA (rRNA) gene, which is used to compare the relatedness of communities in different environments. (2) Functional gene arrays (FGAs) are designed for key functional genes that code for proteins catalyzing various biogeochemical processes, such as the carbon, nitrogen, and sulfur cycles, and may also provide information on the microbial populations controlling these processes. (3) Community genome arrays (CGAs) contain the whole genomic DNA of cultured organisms and can describe a community based on its relationship to these cultivated organisms. (4) Metagenomic arrays (MGAs) are a potentially powerful technique because, unlike the other arrays, they contain probes produced directly from environmental DNA itself and can be applied with no prior sequence knowledge of the community. (5) Whole-genome open reading frame (ORF) arrays (WGAs) contain probes for all of the ORFs in one or multiple genomes. These arrays have traditionally been used for functional genomic analyses of individual organisms, but they can also be used for comparative genomic analyses or to investigate the interactions of multiple organisms at the transcriptional level (Table 1).

Table 1 Characteristics of microarrays for microbial ecology research

In this review, we primarily discuss specific applications of these five major types of arrays to microbial ecology research along with the challenges of applying this technology to environmental samples and some of the latest research addressing these issues. Earlier reviews that may provide additional information of interest have also been published [6, 17, 79, 80, 112, 113].

Phylogenetic Oligonucleotide Arrays

Small-Subunit Ribosomal RNA as a Phylogenetic Marker

Although a number of biomolecules have potential as phylogenetic markers, it was the studies of Carl Woese and colleagues, beginning in the 1970s, that identified rRNA as uniquely suited for molecular phylogenetic analyses [98]. Several factors make rRNA in general, and 16S-like, or small-subunit rRNA (ssu rRNA), in particular, ideal for the study of evolutionary relationships [59]: (1) The rRNAs are found in all organisms, enabling a universal phylogeny. (2) Lateral transfer of rRNAs between organisms is extremely rare, ensuring that the evolutionary history of the rRNA reflects the evolutionary history of the organism. (3) The longer rRNAs (16S- and 23S-like) contain regions of highly conserved, moderately variable, and highly variable sequence. The highly conserved regions act as alignment guides to ensure that only homologous nucleotides are compared among organisms. In addition, conserved regions serve as convenient sites to which “universal” primers can be annealed for sequencing or for amplification by the PCR. Conversely, the variable regions serve as targets for group- and organism-specific hybridization probes and provide a phylogenetic signal for determining the relationships among organisms.
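The distinction between conserved and variable positions can be made concrete with a per-column conservation score over an alignment. The following is only a minimal sketch in Python: the four short sequences are invented toy data rather than real rRNA, and the function name `column_conservation` and the cutoff of 1.0 for "conserved" are assumptions made for illustration.

```python
from collections import Counter

def column_conservation(aligned_seqs):
    """Return, for each alignment column, the fraction of sequences
    that carry the most common character in that column."""
    if not aligned_seqs:
        return []
    length = len(aligned_seqs[0])
    if any(len(s) != length for s in aligned_seqs):
        raise ValueError("all aligned sequences must have the same length")
    scores = []
    for col in zip(*aligned_seqs):
        most_common_count = Counter(col).most_common(1)[0][1]
        scores.append(most_common_count / len(col))
    return scores

# Toy alignment (invented): the first four columns are identical in all
# sequences, the last four differ between sequences.
toy_alignment = [
    "GGAUACGU",
    "GGAUCCGA",
    "GGAUGAUU",
    "GGAUUGCA",
]

scores = column_conservation(toy_alignment)
# Assumed cutoff: a column counts as conserved only if every sequence agrees.
conserved = [i for i, s in enumerate(scores) if s == 1.0]
variable = [i for i, s in enumerate(scores) if s < 1.0]
```

In this toy case the fully conserved columns would be candidate sites for "universal" primers, whereas the variable columns are where group- or organism-specific probes would have to discriminate.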

Most POAs to date have contained short oligonucleotide probes complementary to specific regions of the ssu rRNA gene. The use of rRNAs for the assessment of microbial diversity in naturally occurring microbial communities was first conceptualized, developed, and applied by Pace and co-workers in the early 1980s [64] and has since revolutionized our view of microbial diversity on earth, providing a framework for comparing and relating microorganisms to one another based on the evolutionary information contained in this conserved and universally distributed molecule. Inherent in this approach is the abrogation of the need to culture microorganisms in the laboratory before they can be characterized. This “characterization without cultivation” has freed microbiologists from the onerous, often seemingly impossible burden of cultivating microorganisms in the laboratory.

Lest there be any doubt about the impact of rRNA-based methodology on the field of microbial ecology, today, more than 20 years after the publication of the first article that used rRNA to characterize uncultivated naturally occurring microbial communities [84], the latest issues of two of the leading environmental microbiology journals, Environmental Microbiology and Microbial Ecology, devoted a full 60% and 67%, respectively, to studies that employed rRNA-based approaches, whereas the microbial ecology section of Applied and Environmental Microbiology committed 67% of its research articles to the subject.

However, the power and utility of POAs lie not only in the unique properties of the rRNA and the high-throughput capacity of microarray technology, but also in the vast amount of rRNA sequence data available via the Ribosomal Database Project (RDP) [16, 43, 65]. Now in its ninth release, the RDP II contains 136,355 aligned bacterial rRNA sequences: 57,325 from cultured microorganisms and 79,030 from environmental clones [16]. The sequences span the breadth of known phylogenetic diversity within the bacterial domain as well as archaeal and eukaryal domains.

In addition, a major difference between POAs and FGAs is that the former are well suited for the design of broad-range group and universal probes, whereas, due to the degeneracy of the genetic code, FGA probes are more typically designed to be only organism specific.

Secondary Structure and Probe Design

One of the major challenges for rRNA-based analysis is the inherent secondary structural properties of these molecules. There are numerous software programs that can identify self-complementarity in oligonucleotide probes, but it is more difficult to predict the effects of target secondary structure on hybridization efficiency. In a study not restricted to rRNAs, computer models suggested that many random secondary structures may be formed by genome-wide RNA transcripts and single-stranded DNA targets due to intramolecule base pairing [69]. This could result in decreased binding to the microarray-bound oligonucleotide probes and corresponding signal reduction or elimination. These models suggested that greater than 50% of Mfold-predicted RNA conformers could assume stable secondary structures [116]. Likewise, close to 30% of the nucleotides in single-stranded DNA molecules were potentially involved in stable secondary structure conformations at 67°C [69]. The effect was even more pronounced in RNAs, where greater than 60% of the nucleotides were involved in stable secondary structures at that temperature [69].
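As a rough illustration of what a probe self-complementarity check involves, the sketch below reports the longest stretch of a DNA probe whose reverse complement also occurs within the same probe. It is a simplified, purely string-based screen written for this example and is not the algorithm of any particular software package: it ignores thermodynamics, loop length, and non-Watson-Crick pairs, and the function names and example sequences are hypothetical.

```python
COMPLEMENT = str.maketrans("ACGT", "TGCA")

def reverse_complement(seq):
    """Reverse complement of a DNA sequence written 5' to 3'."""
    return seq.upper().translate(COMPLEMENT)[::-1]

def longest_self_complementary_stretch(probe):
    """Length of the longest substring of the probe whose reverse
    complement also occurs somewhere in the probe (0 if none)."""
    probe = probe.upper()
    n = len(probe)
    # Try the longest candidate stretches first and stop at the first hit.
    for k in range(n, 0, -1):
        for i in range(n - k + 1):
            if reverse_complement(probe[i:i + k]) in probe:
                return k
    return 0

# Hypothetical probe whose two ends can pair to form a 6-bp hairpin stem.
hairpin_like = "ACGGTCAAAAGACCGT"
stem_length = longest_self_complementary_stretch(hairpin_like)

# Hypothetical probe with no complementary bases at all.
no_pairing = longest_self_complementary_stretch("AAAAAAAA")
```

A probe designer might reject candidates whose longest stretch exceeds some cutoff; any such cutoff would be an additional assumption and would need to be tuned against hybridization data.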

However, unlike the situation in mRNA and single-stranded DNA where medium to long stretches of secondary structure occur primarily as a consequence of random base pairing, the secondary structure of rRNA has evolved to be specific and ordered, allowing the problem of target accessibility to be overcome by judicious use of the extensive rRNA sequence and secondary structure databases as aids in probe design. The secondary structures of the ssu [104] and large subunit (lsu) [60] rRNAs were originally established more than two decades ago by a combination of comparative sequence analysis, T1 oligonucleotide catalogs, and chemical and enzymatic modification experiments [61, 105]. The explosion in the number of rRNA sequences from both cultured and uncultured microorganisms in that time has enabled additional comparative sequence analysis, resulting in the refinement of the original secondary structure model to the point that the single- and double-stranded regions of the molecule can be reliably predicted [32]. As such, there are at least three ways to overcome the problem of target secondary structure: designing probes to target regions of rRNA not involved in secondary structure, shearing or fragmenting the target before hybridization, and disrupting intrastrand pairing in the target rRNA. The first solution is obvious given the ready availability of reliable secondary structure models [10], but this severely restricts the regions of rRNA that can serve as a target. Thus, we will limit our discussion to the latter two approaches.

Several investigators have studied the effects of rRNA target secondary structure on oligonucleotide microarray-based detection [12, 13, 66, 83]. Generally, these studies suggest that false positives (nonspecific binding of oligonucleotide probes to rRNA targets) can be eliminated by judicious choice of hybridization conditions: high hybridization temperature and/or the inclusion of formamide in the hybridization buffer [66]. Elimination of false negatives (instances where the probe is unable to bind as a consequence of secondary structure in the target molecule) is another concern, however. Under hybridization conditions where false positives were virtually eliminated, Peplies et al. [66] found that 17 of 41 array-bound, rRNA-specific oligonucleotide probes were unable to bind to their specific rRNA targets. Although significant, this problem may be overcome by fragmenting or shearing the target before hybridization [12, 13, 69] or by the addition of oligonucleotide “helper probes” to the hybridization mixture [66]; the helper probe is designed to specifically bind to the target adjacent to the probe binding site and disrupt the local secondary structure. It should be pointed out, however, that stabilization by base stacking may result in nonspecific binding of the rRNA target if the helper probe is designed to bind too closely to the 5′ or 3′ end of the capture probe binding site [12]. In addition, disruption of secondary structure in one region may result in the formation of suboptimal, yet stable, secondary structures in other regions of the target, affecting the binding sites of other probes [66]. Since the helper probes also need to have a level of specificity similar to the capture probes, it will be much more difficult to design suitable helper probes and use this approach with higher density arrays.

It is possible that the limitations imposed on probe design by the secondary structure of the rRNA target may be overcome altogether. In two recent studies, Yilmaz et al. [109, 110] have shown that all regions of the ssu rRNA from Escherichia coli are accessible by oligonucleotide probes as long as the thermodynamic affinity of probe for target is sufficient (average ΔG°overall = −13.5 kcal/mol) and the incubation period of the hybridization is extended to optimize the kinetics of target unfolding and probe binding. If the results of these experiments, which were performed on whole cells in solution, are extensible to ssu rRNA hybridizations where the probes are bound to a solid support, and the target percent GC is significantly higher than that of the ssu rRNA from E. coli, then this should considerably relax the current theoretical restrictions in probe design.

Due to the conserved nature of rRNA genes, it is often necessary to use short oligonucleotides (∼20-mer) for POAs in order for the probes to be specific to individual organisms. One of the more common formats consists of arraying multiple probes that perfectly match a given target along with corresponding probes containing a single mismatch, usually at the central position [26, 66, 98, 103]. Greater signal intensity for the perfectly matched probes compared to the mismatched probes indicates detection of the target sequence. This approach enables very specific detection of target sequences but does have some potential disadvantages. These drawbacks are discussed in a later section on specificity along with potential approaches to improve specificity including the use of dissociation profiles for probe–target duplexes.
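The perfect match/mismatch format described above can be sketched in a few lines of Python. This is an illustrative outline only: placing the mismatch at index `len(probe) // 2`, substituting the complementary base, and the `min_ratio` parameter are all assumptions made for the example, and the probe sequence and signal values are invented.

```python
SWAP = {"A": "T", "T": "A", "C": "G", "G": "C"}

def central_mismatch_probe(perfect_match):
    """Return a copy of the probe with a single substitution at the
    central position (assumed here to be index len(probe) // 2)."""
    pm = perfect_match.upper()
    mid = len(pm) // 2
    return pm[:mid] + SWAP[pm[mid]] + pm[mid + 1:]

def target_detected(pm_signal, mm_signal, min_ratio=1.0):
    """Call a target detected when the perfect-match signal exceeds the
    mismatch signal. min_ratio is a hypothetical stringency factor;
    1.0 reduces the rule to a plain PM > MM comparison."""
    return pm_signal > min_ratio * mm_signal

# Invented 20-mer and invented fluorescence intensities.
pm_probe = "ACGTACGTACGTACGTACGT"
mm_probe = central_mismatch_probe(pm_probe)
n_differences = sum(a != b for a, b in zip(pm_probe, mm_probe))

strong_call = target_detected(pm_signal=1200.0, mm_signal=300.0)
no_call = target_detected(pm_signal=300.0, mm_signal=300.0)
```

In practice several such probe pairs are arrayed per target, so a real analysis would combine the calls across pairs rather than rely on a single comparison.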

Applications

The POAs are among the most commonly used microarrays to date due to the widespread application of rRNA as a tool for characterizing naturally occurring microbial communities. There are several noteworthy studies that have employed POAs in environmental investigation of both total microbial populations using the rRNA gene as the target [11, 53, 55, 76] and active microbial populations using rRNA as the target (Table 2) [1, 26, 42, 83]. These studies have examined a diverse set of environments including lake water and sediments [11, 76], estuary sediments [26] and enrichments [42], soil extracts [55, 83], activated sludge [1], and hypersaline cyanobacterial mats [53]. However, the targeted organisms have been more restricted in scope, focusing primarily on ammonia oxidizers [1], cyanobacteria [11, 76], and metal- and sulfate-reducing prokaryotes [26, 42, 53, 55, 83].

Table 2 Highlights from selected studies that applied microarray analyses to microbial ecology research

In one of the most illustrative ecological applications of POAs to date, Loy et al. [55] used a POA containing 132 probes (18-mer) to characterize sulfate-reducing bacteria at four depths (ranging from 0 to 30 cm) in two acidic, low-sulfate fens (wetland soils) in Germany. The POA consisted of probes specific to the rRNAs of individual and groups of organisms, spanning and inclusive of all known lineages and individuals of sulfate-reducing bacteria. The fens differed in iron content, vegetation, acidity, and to some degree, seasonal water saturation. The POA results indicated that stable sulfate-reducing populations varied little with depth within each of the two sites but were different between the sites. Members of the Syntrophobacteraceae were detected in the upper 30 cm of both sites, but Desulfomonile spp. were only found in one soil, which also contained a more diverse sulfate-reducing community. These results were confirmed by direct PCR amplification with the appropriate group-specific rRNA primers and by the detection of the corresponding dsrAB genes from the samples. Development of this particular “PhyloChip” was made possible by the availability of a large sulfate-reducing bacteria-specific rRNA probe database, dubbed “ProbeBase” [54]. This is particularly noteworthy because it illustrates the importance and power of a comprehensive probe database in the POA-based analysis of naturally occurring microbial communities. More recently, the same research group developed a POA targeting all of the cultured and uncultured members of the Rhodocyclales [56]. The array detected Rhodocyclales populations representing less than 1% of the total community, following Rhodocyclales-selective PCR amplification. The POA indicated the presence of several uncultured Zoogloea-, Ferribacterium/Dechloromonas-, and Sterolibacterium-like organisms in activated sludge from an industrial wastewater treatment plant, which was corroborated by the results from a 16S rRNA gene clone library. The results also demonstrated that the Rhodocyclales community in the reactor, thought to represent the major denitrifiers in the system, had dramatically changed, possibly as a result of alterations in treatment plant operations.

The most comprehensive POA developed so far contained 31,179 perfectly matched hierarchical 20-mer probes (with a corresponding number of single mismatch probes as negative, mismatch controls) targeting 1945 prokaryotic and 431 eukaryotic sequences from the RDP [103]. The array was developed with the Affymetrix system in which oligonucleotides were directly synthesized on the array, thus enabling higher printing densities than possible using other methods. The POA correctly identified 15 of 17 tested pure bacterial cultures. The array was then used to investigate microorganisms collected from a 1.4-million-liter air sample. The POA results generally agreed with those from an rRNA gene clone library, but could only resolve differences to the third level of phylogenetic rank, as defined by RDP, and could not identify individual species. Eight of 10 phylogenetic clusters detected by the array were represented in the rRNA gene clone libraries, and the organisms not detected had relatively low signals on the array. Approximately 7% of the clones were not detected by the POA, but these were from novel organisms not represented in the RDP or on the array. In contrast, there was not a good correlation between the relative numbers of clones in each group and the signal intensity of that group detected by the array, indicating a potential limitation with respect to microbial quantitation with this system.

Despite the excitement generated by the combination of the rRNA-based phylogenetic analysis with the high-throughput potential of microarrays, there have been relatively few studies that have used POAs for comprehensive environmental studies, and there are still technological limitations to POA analysis as the previously discussed study illustrates [103]. As these technological issues continue to be resolved, POAs will undoubtedly find wide application in microbial ecology research.

Functional Gene Arrays

Selection of Probe Targets

Unlike POAs, which are designed primarily for the detection of specific microorganisms and phylogenetic differentiation between samples, FGAs measure genes involved in some process of interest, and thus not only provide a degree of phylogenetic classification but also give information on genetic capacity for, or activity of, a given process in the environment under study. Genes encoding key enzymes in metabolic processes are often good targets, and several categories of these genes have been used for FGAs including those involved in biogeochemical cycles [5, 15, 38, 72, 85, 86, 88, 90, 106] and contaminant remediation [15, 20, 21, 72]. A key point to consider when selecting functional genes for inclusion on an FGA is the vast differences in available sequence data for various genes, even within a given pathway. For example, for the microbial nitrogen fixation genes nifD, nifH, and nifK, there are 1784 nifH genes in public databases, but only 89 nifK and 180 nifD gene sequences are available [79]. An ideal candidate gene for an FGA (1) encodes a critical enzyme or protein in the process of interest, (2) is evolutionarily conserved but has enough sequence divergence in different microorganisms to allow probe design for individual species, and (3) has substantial sequence data from isolates and environmental samples available in public databases. If only limited sequence data are available, it may be beneficial to first construct clone libraries for the gene and environment of interest in order to obtain the necessary sequence information for FGA probe design.

With the rapid advances in printing technologies, the development of comprehensive FGAs is limited only by the availability of requisite cultures or sequence data and the capital necessary for array construction. The largest FGA published to date contained 1662 probes for genes involved in the carbon, nitrogen, and sulfur cycles, organic contaminant degradation, and metals resistance and reduction [72], but this FGA has recently been expanded to over 24,000 probes [Schadt et al., unpublished].

Either PCR products from amplification of various functional genes [15, 106] or shorter, synthesized oligonucleotide probes designed from these genes can be used for FGAs [20, 72]. A major advantage of PCR-derived probes is that they can be amplified from various isolates without prior sequence knowledge by using primers designed from conserved regions of the gene in other organisms. However, it can be virtually impossible to acquire all of the necessary isolates and environmental clones from their various sources to produce a comprehensive PCR fragment-based FGA. The major advantage of synthesized, oligonucleotide probes is that they can be designed directly from available sequence data. Use of oligonucleotide probes also allows the researcher more control and flexibility, such as the avoidance of highly conserved regions, in the probe design. The choice of PCR fragment-based or oligonucleotide probes also has implications for specificity and sensitivity as discussed later in the review.

Applications

Several recent studies have used FGAs to investigate microbial involvement in environmental processes including nitrogen fixation, nitrification, denitrification, and sulfate reduction in freshwater and marine systems [38, 85, 88, 90, 106]; degradation of organic contaminants including polychlorinated biphenyls (PCBs) [20] and polycyclic aromatic hydrocarbons (PAHs) [72] in soils and sediments; and methane-oxidizing capacity and diversity in landfill-simulating soil [5, 86] (Table 2). However, many of these applications were conducted primarily as proofs of concept and did not analyze enough samples or treatments to enable biologically meaningful conclusions to be formed.

For example, Tiquia et al. [90] used an FGA containing 50-mer oligonucleotide probes for 763 genes involved in nitrogen cycling and sulfate reduction to characterize microbial populations in a marine sediment from the Gulf of Mexico. Tests with pure cultures indicated that the array could achieve species-level resolution of microorganisms. The array detected several genes encoding nitrogenases (nifH), ammonia monooxygenase (amoA), nitrite reductase (nirS/K), methane monooxygenase (pmoA), and dissimilatory sulfite reductase (dsrAB) indicating its potential for comprehensive analysis of environmental samples.

In the largest-scale FGA application to date, Stralis-Pavese et al. [86] used an array containing 68 different 17- to 27-mer probes, primarily targeting the particulate methane monooxygenase (pmoA) genes of several methanotrophs, to investigate the impact of five different plant covers on methanotrophic activity in lysimeters under landfill-simulating conditions. The lysimeters contained sewage sludge compost, and half received a constant feed of artificial biogas (CH4/CO2, 3:2) to simulate landfill emissions. The authors linked the methanotrophic community structure in the vegetated lysimeters, in which type II methanotrophs had a competitive advantage over type Ia methanotrophs, with increased methane oxidation relative to the nonvegetated lysimeters. Not surprisingly, the relative abundances of methanotrophs were lower in the lysimeters that did not receive biogas. These results have obvious implications for the management of methane emissions at landfill sites and illustrate how FGA technology can be used to address real-world issues. The authors did note that their approach was useful only for comparing very similar samples due to the potential biases introduced when the relative abundances of genes were PCR-amplified before FGA analysis.

A major potential benefit of FGAs is that they can be used not only to determine the presence of important genes in an environment by measuring DNA, but also to determine the expression of these genes by measuring mRNA. However, only a handful of studies have used FGAs for mRNA analysis [2, 21, 72]. Dennis et al. [21] constructed a PCR product FGA (271- to 1300-bp fragments) containing probes for 64 genes including several from the 2,4-dichlorophenoxyacetic acid (2,4-D)-degradation pathway of Ralstonia eutropha JMP134 and related organisms. Mixed cultures were created consisting of four isolates from a batch reactor treating pulp mill effluent and varying concentrations of R. eutropha JMP134. The R. eutropha JMP134 concentrations were 3.7, 0.37, 0.037, and 0.0037% of a total population of 10⁸ cells/mL. The cultures were amended with 2 mM of 2,4-D and incubated 6 h before mRNA extraction. Significant induction of 2,4-D degradation genes was detected from populations as low as 0.0037% (3.7 × 10³ cells in a total community of 10⁸) to 3.7%, depending on the specific genes detected and sequence similarity of the probes that were used. The authors also detected significant increases in resin acid degradation genes in a pulp mill effluent-treating bioreactor after it was spiked with a resin compound.
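The relationship between the percentages and the absolute cell numbers in the dilution series above is simple arithmetic, which the short Python sketch below reproduces. The total population and the four percentages are taken from the text; the variable and function names are chosen here for illustration only.

```python
# Figures taken from the text: total population and the four dilution levels.
TOTAL_CELLS_PER_ML = 1e8
PERCENT_OF_TOTAL = [3.7, 0.37, 0.037, 0.0037]

def cells_per_ml(percent, total=TOTAL_CELLS_PER_ML):
    """Convert a percentage of the total population into cells per mL."""
    return percent / 100.0 * total

abundances = {p: cells_per_ml(p) for p in PERCENT_OF_TOTAL}

for percent, cells in abundances.items():
    print(f"{percent:>7}% of the community is about {cells:.1e} cells/mL")
```

The lowest level, 0.0037% of 10⁸ cells/mL, works out to roughly 3.7 × 10³ cells/mL, which matches the detection limit quoted for the most sensitive probes.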

Rhee et al. [72] used a 50-mer oligonucleotide FGA (containing 1662 probes including organic contaminant degradation genes) to determine both the presence and the expression of naphthalene-degradation genes in soil enrichments. Soil from a site contaminated with PAHs was enriched with naphthalene or pyruvate (as a control). At midgrowth phase, DNA was extracted from aliquots, and the remaining naphthalene enrichment was split into two separate flasks and amended with either pyruvate (control) or additional naphthalene. After 3 h, mRNA was harvested and analyzed. Four different naphthalene-degradation genes, three of which were from Rhodococcus spp., were detected at higher levels in the naphthalene-amended enrichment based on DNA analysis. Likewise, the mRNA results indicated that three different Rhodococcus sp. genes involved in naphthalene degradation were up-regulated (40- to 100-fold) in the naphthalene-amended enrichment, including two of the genes detected by DNA analysis. This corroborated the DNA hybridization results and indicated that this strain was actively degrading naphthalene in the enrichment. The results also revealed that other potential naphthalene-degrading organisms, whose genes were detected in the enrichments by the DNA hybridizations, were not responsible for naphthalene degradation under the tested conditions.

All of these studies that analyzed mRNA used relatively simple systems, mixed cultures, or enrichments. There are still several limitations for application of FGAs to mRNA analysis in complex environmental samples, e.g., surface soils and sediments, including difficulties in extracting sufficient quantities of high-quality mRNA from these samples within a reasonable time frame and the lack of sequence knowledge for environmental samples [78]. Advances in RNA extraction techniques [9, 37, 82] and newly available commercial kits (e.g., TruRNA from Atom Sciences, Inc. and FastRNA from Qbiogene) are helping with this process, but it still requires considerable effort to remove impurities and DNA from many samples, often substantially decreasing mRNA recovery. This further exacerbates the difficulty of investigating many low-biomass environments that already do not contain sufficient mRNA for FGA analysis. Unfortunately, unlike eukaryotic mRNAs, which can be amplified via their poly(A) tail with commercially available kits, it is difficult to amplify prokaryotic mRNA. However, new methods, such as that of Botero et al. [7] where a poly(A) tail is added to prokaryotic RNA for subsequent amplification, may make this feasible in the near future.

Researchers should ultimately exercise caution when attempting to link FGA results from environmental samples with the capacity for a specific biogeochemical function. The presence, or even activity, of a given pathway gene in a sample does not indicate the presence of all the genes necessary to carry out the complete transformation. For instance, although dsrAB is a key gene in sulfate reduction, it is also found in some non-sulfate-reducing bacteria [117]. In addition, a process may proceed via related genes and/or pathways not represented on the FGA.

Community Genome Arrays

Wu et al. [107] developed a novel prototype array that contained the entire genomic DNA of 67 different bacteria including α-, β-, and γ-Proteobacteria and Gram-positive bacteria with most of the organisms being Azoarcus, Pseudomonas, or Shewanella spp. The array was termed a community genome array (CGA) because it contained whole genomic DNA (one species' genome per spot) and was initially designed as a tool to detect specific microorganisms within a natural microbial community. The CGA could achieve species- to strain-level differentiation depending on the hybridization temperature. The CGA was used to compare the microbial populations in four marine sediments, three river sediments, and three soils. Principal components analysis of the CGA results grouped the three types of samples into three distinct groups, indicating that the microbial populations from a given type of sample (e.g., soil) were more similar to one another than those in the other types of samples (e.g., marine and river sediments). The CGA results also correlated well with the differences in biogeochemical and physical properties between the sites. These results demonstrate the potential of CGAs as a comparative tool for determining the relatedness of microbial communities in different samples. The CGAs can also be used to determine the genomic relatedness of isolated bacteria to each other and to the organisms represented on the array.

The CGA is conceptually analogous to membrane-based reverse sample genome probing (RSGP) [30], but uses a nonporous hybridization surface and fluorescence-based detection that enable high throughput analyses but decrease sensitivity [107]. Like RSGP, the main potential disadvantage of CGAs is that only the cultured components of a community are included on the array. However, with recent advances in the generation of large insert-sized metagenomic libraries, it is also possible to use DNA from uncultured organisms for microarray generation as discussed in the next section.

Metagenomic Arrays

Due to the growing evidence that most environmental microorganisms cannot be isolated using current techniques, the field of metagenomics, or the direct extraction and cloning of nucleic acids from environmental samples, has developed [33]. These techniques have even been used to sequence entire communities in an acid mine drainage site [97] and a portion of the community in a sample from the Sargasso Sea [100], but it is not yet possible to assemble even the dominant genomes from more diverse sites such as surface soils and sediments [93, 95, 100]. However, the combination of microarray and metagenomic technologies has the potential to reveal detailed information on these yet to be cultured organisms.

Sebat et al. [81] generated a metagenomic array (MGA) using a cosmid library from a groundwater enrichment. Approximately 1-kb inserts were amplified from 672 cosmids and placed on the array along with several rRNA gene control probes. The MGA was used as a high-throughput library screening technique. Groundwater isolates, reference strains, and community DNA were hybridized to the array. Ten bacteria isolated from the enrichment hybridized specifically to 10 individual probes on the array. Other probes hybridized to multiple related bacteria, indicating that these probes likely contained conserved genes. Some probes hybridized to community genomic DNA from the enrichment but did not hybridize to any of the isolates, indicating that the organisms bearing these DNA fragments had not been cultured. The cosmid inserts corresponding to these probes were sequenced and were related to genes involved in several ecologically important processes including denitrification, hydrogen oxidation, and transposition.

The above MGA contained DNA fragments that were only ∼1 kb, but larger fragments (>50 kb) from fosmid or bacterial artificial chromosome libraries could be used to provide higher genomic throughput [4, 107]. If sufficient mRNA could be obtained, it may also be possible to generate MGAs from a cDNA library that could then be used to create a site-specific FGA for measuring microbial activity. The MGA technology is still in the early stages of development, but it has tremendous potential for environmental applications since the enormous amount of unknown sequences in these environments is one of the major limitations for microarray analysis.

Whole-Genome Open Reading Frame Arrays

Organisms that are closely related, based on 16S rRNA genes, can exhibit strikingly different phenotypic characteristics and may actually have substantially dissimilar genomes due to processes such as lateral gene exchange [62, 63]. Whole-genome ORF arrays (WGAs), which contain probes for all of the ORFs in a genome can be very useful for comparative genomics of different organisms with specific application to the processes of lateral gene transfer and microevolution [3, 22, 25, 58, 62, 63, 77]. The WGAs can also be used to study genome-wide transcription in response to different environmental stimuli. This is commonly done in functional genomics for pure cultures [52, 89] but can also be applied to small groups of organisms to study their interactions [2].

Dong et al. [22] used a WGA containing 96% of the annotated ORFs in E. coli K-12 to comparatively interrogate the genome of the closely related (97% based on 16S rRNA gene) Klebsiella pneumoniae 342, which is a maize endophyte. Only 70% of E. coli K-12 ORFs were found in K. pneumoniae 342 (≥55% similarity cutoff), whereas 24% were not present in K. pneumoniae 342. The signal was too low to make a determination for a small portion of the genes (n = 68). Highly conserved genes including those for energy, amino acid, and fatty acid metabolism along with cofactor synthesis, cell division, DNA replication, transcription, translation, transport, and regulatory proteins were among those found to be shared. The E. coli K-12 ORFs not found in K. pneumoniae included many hypothetical and putative regulatory proteins, chaperones, and enzymes in addition to genes thought to have been acquired from phage, plasmids, or transposons via lateral transfer. The WGA results agreed with the phenotypic characteristics of the bacteria. Murray et al. [58] also used a WGA to discover evidence of lateral gene transfer in several Shewanella spp.

Barnett et al. [2] gave an excellent example of the results a WGA can produce when used to investigate the interactions of multiple organisms. The authors created a symbiosis chip using the Affymetrix system, as discussed previously for POAs, containing probes for all of the ORFs in the genome of Sinorhizobium meliloti and ∼10,000 expressed genes in its host organism, the legume Medicago truncatula [57]. Over 200 M. truncatula genes had increased expression in nodules versus uninfected root tissue. Most of these genes had been previously demonstrated to be up-regulated in nodules, thus confirming the reliability of the approach. Furthermore, the expressed M. truncatula genes were very similar in nodules inhabited by wild-type S. meliloti and those containing a non-nitrogen-fixing mutant S. meliloti, indicating that the plant responses were due to the presence of the bacteria rather than their nitrogen fixation. In contrast, there were large differences in the genes expressed by wild-type and mutant S. meliloti in the nodules, indicating that most of the increased bacterial gene expression in nodules was due to nitrogen fixation. Although this relatively simple system was composed of only two organisms, it illustrates that investigating the interactions of organisms at the transcriptional level in complex systems is becoming more feasible as additional sequence data become available and microarray technologies continue to improve. The WGA approach can also be combined with reporter gene technology for noninvasive, real-time analysis [99].

Other Types of Arrays

Researchers have also developed other types of microarrays that have potential applications in microbial ecology research. Arrays containing probes generated from random genomic fragments have been used in situations where the genome sequences of the target organisms were unknown [14, 40]. Kim et al. [40] digested genomic DNA from Gordonia amarae, Zoogloea ramigera, and Mycobacterium peregrinum with restriction enzymes and used some of the resulting ∼200- to 1500-bp fragments (∼50 for each organism) for the array. The array was then tested with pure cultures, mixed cultures, and environmental samples. Most of the tested combinations had <5% cross-hybridization to nontarget probes in mixed culture, although M. peregrinum genomic DNA had 29 and 36% cross-hybridization to G. amarae and Z. ramigera probes, respectively, when applied alone. However, expansion of the array to include more probes may reduce the probability that all probes would bind to a given nontarget organism and would thus likely improve the specificity. The array detected G. amarae, which can cause foaming and bulking in wastewater treatment plants at high populations [19], in activated sludge at populations as low as 10³/mL. Cho and Tiedje [14] used a similar approach to differentiate bacteria using an array composed of 60–96 ∼1-kb genomic fragments from four fluorescent Pseudomonas spp. Hybridization profiles of 12 well-characterized Pseudomonas spp. indicated that the array could achieve species- to strain-level resolution.

Randomly selected oligonucleotide probes have also been used to fingerprint bacteria. Kingsley et al. [41] developed a prototype array consisting of 47 nonamer probes randomly generated based on the E. coli K-12 genome. The array was tested using 14 closely related Xanthomonas pathovars. Ten of the 47 probes had diagnostic value, based on statistical tests, and were used to generate fingerprints that revealed differences in the bacteria, including two strains that could not be distinguished using traditional gel electrophoresis of REP-PCR products. Since this method is based on random nonamers, it could potentially be used to fingerprint any microorganism.

Use of Microarrays with Complementary Analyses

Numerous other techniques can be combined with microarray analysis not only to validate results but also to produce powerful synergistic tools for investigating microbial interactions and processes. For example, Loy et al. [55] used clone libraries of sulfite reductase genes (dsrAB) along with a POA to study sulfate-reducing prokaryote communities in acidic, low sulfate fens in Germany. The clone libraries corroborated POA results that indicated the sulfate-reducing communities at the two tested sites were different. The clone libraries also identified additional sulfate reducers that were not detected by the POA.

The integration of isotope and microarray technologies produces one of the potentially most powerful combined approaches for microbial ecology research. Microarray analysis of DNA or RNA labeled with isotopes can differentiate between active and inactive organisms in a sample and/or identify those organisms that metabolize a labeled substrate. These isotopes can be either radioisotopes such as 14C or stable isotopes such as 13C [1, 67, 68]. Adamczyk et al. [1] used 14C-labeled bicarbonate and a POA to study ammonia-oxidizing bacterial communities in two samples of nitrifying activated sludge. Scanning for radioactivity in the rRNA hybridized to the POA enabled detection of populations that consumed the [14C]bicarbonate. The approach detected populations that composed less than 5–10% of the community. This technique could potentially be applied to 13C-labeled materials also, but it may be more difficult to obtain enough 13C-labeled DNA for microarray analyses since the labeled DNA would have to be separated from nonlabeled DNA before microarray analysis unless the array could be directly scanned for 13C.

Challenges for Microarray Applications

Specificity

The highly conserved nature of many genes and the vast amount of unknown sequence data in environmental samples make it difficult to design and validate microarray probes that are specific to a given target sequence. As mentioned earlier, a major advantage of oligonucleotide probes is the ability to avoid conserved regions of genes or areas containing stable secondary structure. Furthermore, shorter oligonucleotide probes (∼20-mer) can differentiate a single mismatch in a probe–target hybridization, making them ideal for use in POAs [98, 103, 114]. This level of specificity can also be achieved by using similarly designed WGAs. A common format for these arrays includes sets of probes that perfectly match a target sequence and corresponding sets of probes containing one mismatched nucleotide, usually at a central position, with greater signal intensity for the perfectly matched probes indicating detection of the target sequence. Zhou et al. [114] systematically tested a prototype POA consisting of 19-mer probes for different bacterial 16S rRNA gene sequences with one to five mismatches in the mismatched probes. A single mismatch at the central position of the probes decreased the signal to only 15–25% of the intensity of the perfectly matched probes. Three to five mismatches reduced the signal to undetectable levels. Although these results demonstrate the potential specificity of this approach, it is still difficult to achieve complete discrimination of rRNA genes using only a single mismatch. Chandler and Jarrell [13] summarized this well when they stated “... because only a small portion of the natural microbial diversity has been identified and because microarray hybridization specificity is not perfect, it is practically and theoretically difficult to know if and when hybridization signals in a new environment result from a perfectly matched or a mismatched probe–target combination.”

One common approach to address this problem is to design multiple perfectly matched and mismatched probe combinations for each organism of interest and then to compare the probe pairs statistically. Unpredictable probes or those providing abnormal results (higher signal intensity for the mismatched probe) are removed from the array or discarded during data analysis. Other researchers have improved the discrimination of matched and mismatched probes by determining the thermal dissociation curve for each probe–target duplex on an array [26, 50, 98]. This has more commonly been done using three-dimensional array platforms but has been demonstrated to work with planar arrays [45]. Li et al. [45] found that this approach could discriminate hybridization to short oligonucleotide probes (18- to 20-mers) with one or two internal (but not terminal) mismatches from hybridization to perfectly matched probes on a planar, rRNA-based array.
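The probe-pair screening described at the start of the preceding paragraph can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the procedure used in any of the cited studies: the function names, the 0.75 retained-fraction cutoff, and the 0.5 mismatch-to-match ratio cutoff are all hypothetical, and a real analysis would use a formal statistical test across replicates.

```python
from statistics import mean

def screen_probe_pairs(pairs):
    """Split (perfect-match, mismatch) signal pairs into kept and dropped.

    A pair is dropped when the mismatched probe is at least as bright as
    the perfectly matched probe, i.e., the "abnormal" case in the text.
    """
    kept = [(pm, mm) for pm, mm in pairs if pm > mm]
    dropped = [(pm, mm) for pm, mm in pairs if pm <= mm]
    return kept, dropped

def target_detected(pairs, min_kept_fraction=0.75, max_mm_ratio=0.5):
    """Call a target present from its probe pairs.

    Both thresholds are hypothetical placeholders chosen for illustration.
    """
    if not pairs:
        return False
    kept, _ = screen_probe_pairs(pairs)
    if len(kept) / len(pairs) < min_kept_fraction:
        return False
    # Mean mismatch/perfect-match signal ratio over the well-behaved pairs.
    return mean(mm / pm for pm, mm in kept) <= max_mm_ratio

# Hypothetical signal intensities for four probe pairs of one organism.
example_pairs = [(1000, 200), (800, 150), (900, 1000), (1200, 240)]
kept_pairs, dropped_pairs = screen_probe_pairs(example_pairs)
detected = target_detected(example_pairs)
```

In this made-up example, one of the four pairs behaves abnormally and is set aside, and the remaining three still support a positive call under the assumed cutoffs.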

The longer oligonucleotide probes (∼40–70-mers) typically used for FGAs are less specific than those used for POAs. However, since most functional genes are more variable than rRNA genes, longer oligonucleotide probes can be used to increase detection sensitivity while still achieving species-level specificity. Rhee et al. [72] reported that 50-mer probes could discriminate sequences less than 88–94% similar to the probes with hybridization at 50°C and 50% formamide. Taroncher-Oldenburg et al. [88] reported a similar value of 87% for a 70-mer probe FGA. This is slightly higher than the 80–85% sequence identity discrimination power of a 400- to 800-bp PCR product-based FGA [106]. The MGAs based on shorter inserts would likely behave similarly to the PCR-based FGAs. In addition to percent similarity, long stretches of a probe that are complementary to a nontarget sequence can lead to substantial nonspecific hybridization and should be considered during probe design [36, 39]. The position of mismatches [44] and the free energy of probe–target duplexes [35, 46, 88] can also affect specificity; mismatches distributed across a probe, rather than localized to one region, produce more specific binding. A recent study [48] found that by simultaneously considering multiple probe–target characteristics during the design process, specific probes could be produced using more relaxed design criteria than was possible when each factor was examined separately. The results indicated that specific hybridization could be achieved using 50-mer probes with a free energy release of ≤−35 kcal/mol and ≤90% similarity and ≤20 bp continuous stretches to nontarget sequences. The ability to even slightly relax design criteria should increase the percentage of genes, in a given data set, for which probes can be designed. This could be extremely valuable when designing probes from very similar sequence data such as that generated from environmental clone libraries.
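Two of the design criteria quoted above (≤90% similarity and ≤20 bp continuous matching stretches to nontarget sequences) are simple enough to check directly. The following Python sketch assumes the probe and the nontarget region are already aligned, ungapped, and of equal length; the function names are hypothetical, and the free energy criterion is deliberately left out because it requires a thermodynamic model that is not specified here.

```python
def percent_identity(probe, nontarget):
    """Percent of aligned positions that match (ungapped, equal length)."""
    if len(probe) != len(nontarget):
        raise ValueError("sequences must be pre-aligned to equal length")
    matches = sum(a == b for a, b in zip(probe, nontarget))
    return 100.0 * matches / len(probe)

def longest_match_stretch(probe, nontarget):
    """Length of the longest run of consecutive matching positions."""
    best = run = 0
    for a, b in zip(probe, nontarget):
        run = run + 1 if a == b else 0
        best = max(best, run)
    return best

def passes_design_criteria(probe, nontarget, max_identity=90.0, max_stretch=20):
    """True if the probe is unlikely to cross-hybridize under these two rules.

    Defaults follow the 50-mer values quoted in the text; the free energy
    criterion is not evaluated in this sketch.
    """
    return (percent_identity(probe, nontarget) <= max_identity
            and longest_match_stretch(probe, nontarget) <= max_stretch)

def substitute(seq, positions):
    """Introduce a mismatch at each listed position (illustration helper)."""
    bases = list(seq)
    for i in positions:
        bases[i] = "C" if bases[i] == "A" else "A"
    return "".join(bases)

# Hypothetical 50-mer probe and two nontargets that are both 90% identical.
probe = ("ACGT" * 13)[:50]
spread = substitute(probe, [9, 19, 29, 39, 49])   # mismatches spread out
clustered = substitute(probe, [0, 1, 2, 3, 4])    # mismatches at one end

spread_ok = passes_design_criteria(probe, spread)
clustered_ok = passes_design_criteria(probe, clustered)
```

The two nontarget sequences have the same percent identity to the probe, but only the one with distributed mismatches satisfies the stretch criterion, which mirrors the point that mismatch position matters in addition to overall similarity.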

Depending on the probe design and objectives of the research, specificity can also be increased or decreased, to a point, by adjusting the stringency of the hybridization conditions (temperature, formamide concentration, salt concentration, etc.) [34, 44, 106]. For example, Wu et al. [107] developed a CGA that could distinguish bacteria at the species level when hybridized at 55°C or at the strain level at 65 or 75°C. This helps to illustrate the need for caution in conducting and interpreting microarray analyses. Use of an array under more or less stringent conditions than those for which it was designed can lead to inaccurate conclusions based on overestimated or underestimated results. Including control DNA with varying similarity to corresponding control probes on the array in the hybridization solution can help to verify that the correct hybridization stringency was achieved.

Several software programs are currently available for the design of oligonucleotide probes for microarrays. These include ArrayOligoSelector [8], OligoArray [74], OligoArray 2.0 [75], Oligopicker [102], OligoWiz [59], PRIMEGENS [108], PROBEmer [27], ProbeSelect [46], and ROSO [70]. Most of these programs work well for designing probes from whole-genome sequences. However, research by Li et al. [47] found that a considerable portion of probes designed by some of these software programs from groups of orthologous functional gene sequences (such as those produced by clone libraries) were not specific to the target sequence (based on experimentally determined values [72]). With the growing database of environmental sequences, especially the highly similar sequences which are often obtained from clone libraries, it can be difficult to design probes that will not cross-hybridize to related sequences.

To address these issues, Li et al. [47] designed a new software tool called CommOligo. The program uses a new global alignment algorithm to design single or multiple unique probes for each gene using multiple, simultaneously implemented, user-specified criteria, such as the maximal sequence similarity, the maximal length of continuous perfectly matched nucleotides, free energy, self-binding, melting temperature, and GC content. A major advantage of CommOligo is that it can also design single or multiple group-specific probes for related groups of genes that are too similar for the design of unique probes—analogous to the design of PCR primers from conserved regions of grouped genes. CommOligo is currently undergoing additional prerelease testing. Until CommOligo or other improved probe design software is available, researchers should be cautious when using software originally developed for use on whole-genome data for designing oligonucleotide probes from environmental sequence data.

Sensitivity

Although oligonucleotide probes have many advantages for probe design, they are typically ∼10- to 100-fold less sensitive than longer PCR-based or CGA probes [20, 72, 106, 107]. The sensitivities of all of the different array formats have not been directly compared, but examples from the literature have reported limits of 0.2 ng of target genomic DNA for a CGA [107], 1 ng for a PCR-based FGA [106], and 5–8 ng for 50-mer oligonucleotide FGAs [72, 90] in the absence of background DNA. These sensitivities were about 10-fold lower in the presence of background DNA simulating environmental samples [72, 90, 107]. For the 50-mer FGA, this detection limit corresponded to ∼10⁷ cells or 5% of the total community, which agreed with other published studies [15]. He et al. [34] recently compared the detection sensitivity of PCR amplicon and oligonucleotide probes. The PCR amplicon probes had a detection limit of 5 ng of genomic DNA, and the 70-, 60-, and 50-mer oligonucleotide probes had detection limits of 25, 100, and 100 ng of genomic DNA, respectively. These limits equaled approximately 1.9 × 10⁶, 9.2 × 10⁶, 3.7 × 10⁷, and 3.7 × 10⁷ gene copies for the PCR amplicon, 70-mer, 60-mer, and 50-mer probes, respectively. The probes were also used to evaluate gene expression of Shewanella oneidensis MR-1 under different conditions with the 70-mer probe results again being most comparable to those from the PCR amplicons. The detection sensitivities of MGAs and WGAs are likely to resemble PCR fragment-based FGAs and POAs, respectively, depending on the probe design.

Different nucleic acid labeling methods may increase sensitivity [20, 85, 115]. Denef et al. [20] used tyramide signal amplification labeling to increase the signal intensity of a 70-mer FGA ∼10-fold over the commonly used Cy dye-labeling techniques. This approach reduced the detection limit to 1% of cells in the total community. Although the above methods are sufficient for detecting the dominant members of relatively high biomass communities, new approaches are needed for investigating less abundant, but ecologically important, populations.

One currently available option to detect less dominant microorganisms within a community is to PCR-amplify these specific populations, although this has its own set of well-documented limitations [18, 28, 71, 87, 101]. Bodrossy et al. [5] used this approach with an FGA containing 15- to 26-mer probes for particulate methane monooxygenase (pmoA). Although very short oligonucleotide probes were used, populations were detected that comprised as little as 5% of the total community. The use of magnetic beads or other capture techniques may also be useful for enriching certain populations [96]. An option for low-biomass environments that do not produce sufficient quantities of DNA for FGA analysis is the nonspecific amplification of whole community DNA before FGA analysis. Wu et al. [unpublished] have developed a whole community genome amplification (WCGA) procedure that can amplify nanogram quantities of DNA to microgram quantities with a linear relationship between starting template and final concentration (r² = 0.96–0.98). An FGA analysis of DNA amplified by WCGA from low-biomass groundwater samples, which were contaminated with high levels of nitrate and uranium, revealed a correlation between microbial diversity and groundwater geochemistry and contaminant levels.

Many microarrays are currently printed on planar glass slide platforms because this enables high-density printing and high-throughput analyses. The trade-off for this capacity is reduced detection sensitivity. Membrane-based hybridizations are several orders of magnitude more sensitive than microarray hybridizations on nonporous surfaces most likely due to the limited amount of probe material that can be attached to the nonporous surfaces [15]. Researchers are developing new slide chemistries, including ultrathin three-dimensional platforms, which have increased binding capacities but maintain the high-throughput characteristics that make microarray analyses advantageous [31, 98, 114].

Quantitation and Data Analysis

There has been some concern regarding the quantitative ability of microarrays given the potential variability in steps including DNA extraction, labeling, hybridization, and analysis. However, recent research indicates that FGAs and CGAs can be quantitative within a range of concentrations. Wu et al. [106] found a strong linear relationship (r² = 0.96) between the amount of pure culture DNA hybridized to a PCR-product FGA and the signal intensity within a range of 1–100 ng. The authors also found a good linear relationship (r² = 0.94) for a mixture of 11 different genes varying in concentration from 1 pg to 1 ng. Likewise, Rhee et al. [72] found a strong linear relationship for both DNA and mRNA using a 50-mer oligonucleotide FGA. Various amounts of Thauera aromatica K172 genomic DNA were mixed with 1 μg of S. oneidensis MR-1 DNA as a background. The FGA signal intensity was linear (r² = 0.95–0.99) for all detected genes over a range of 75–1000 ng genomic DNA. For mRNA analysis, 3.0 × 10⁶ to 1.6 × 10⁹ Pseudomonas putida PpG7 cells that had been incubated with naphthalene were mixed with 1.9 × 10⁹ S. oneidensis MR-1 cells as background RNA. The FGA signal intensity was again linear for all detected genes (r² = 0.96–0.99) over a range of at least 5.0 × 10⁷ to 1.6 × 10⁹ cells. Wu et al. [107] indicated that CGA detection was also linear (r² = 0.98) from 25 to 1,000 ng of DNA. The MGAs have not been tested for quantitation but likely resemble FGAs and CGAs. It is unknown if POAs and WGAs based on perfect match–mismatch probes are quantitative.
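The r² values reported in these studies come from fitting signal intensity against the amount of target. A minimal Python sketch of such a fit is given here for readers who want to run the same check on their own dilution series. The data points are invented for illustration and do not come from any cited study, and the fit is an ordinary least-squares line on untransformed values, which is an assumption (the cited studies may have fitted log-transformed data).

```python
def linear_fit(x, y):
    """Ordinary least-squares line through (x, y).

    Returns (slope, intercept, r_squared).
    """
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need two equal-length series with at least 2 points")
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    syy = sum((yi - mean_y) ** 2 for yi in y)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    if sxx == 0 or syy == 0:
        raise ValueError("x and y must each vary")
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    r_squared = sxy ** 2 / (sxx * syy)
    return slope, intercept, r_squared

# Hypothetical dilution series: ng of target DNA vs. background-corrected signal.
dna_ng = [75, 150, 300, 600, 1000]
signal = [410, 790, 1650, 3100, 5300]

slope, intercept, r_squared = linear_fit(dna_ng, signal)
```

A high r² over the tested range supports proportionality only within that range; it says nothing about behavior below the detection limit or above probe saturation.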

It can be difficult to compare data between, and even within, microarray experiments due to the use of different analysis methods and variability in printing, labeling, and hybridization. The two-color dye-swap techniques that are commonly used in gene expression experiments work well to determine relative levels of gene expression in pure cultures, but they do not facilitate comparison between experiments and laboratories unless the same control DNA is used. Researchers have developed modified approaches where known amounts of labeled oligonucleotides or DNA fragments are spiked into the hybridization solution as a control [5, 15, 24]. Microarray results are then normalized based on the signal intensity resulting from hybridization of this control DNA with corresponding control probes on the array. The oligonucleotide approach is especially promising for standardizing microarray results since the probes could be synthesized directly from sequence data and thus would be readily available to any researcher; however, additional research is needed to optimize this method for different array formats.
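The spiked-control normalization described above amounts to dividing each probe signal by the signal of the control probes on the same array. The Python sketch below illustrates the idea under simple assumptions: every array receives the same amount of spiked control, the control probes respond linearly, and the probe names and intensities are hypothetical.

```python
from statistics import mean

def normalize_to_spike(signals, control_probes):
    """Express each non-control probe signal relative to the mean control signal.

    signals: dict mapping probe name to raw signal intensity for one array.
    control_probes: names of the probes that target the spiked control DNA.
    """
    control_mean = mean(signals[name] for name in control_probes)
    if control_mean <= 0:
        raise ValueError("control probes gave no usable signal")
    return {
        name: value / control_mean
        for name, value in signals.items()
        if name not in control_probes
    }

# Two hypothetical hybridizations of the same sample; array B was scanned
# at roughly twice the overall intensity of array A.
controls = {"ctrl1", "ctrl2"}
array_a = {"ctrl1": 1000, "ctrl2": 1200, "nirS": 550, "dsrA": 2200}
array_b = {"ctrl1": 2000, "ctrl2": 2400, "nirS": 1100, "dsrA": 4400}

norm_a = normalize_to_spike(array_a, controls)
norm_b = normalize_to_spike(array_b, controls)
```

After normalization the two arrays report the same relative values for each gene, which is the property that makes results comparable between experiments when the same control is used.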

Even if microarray experiments are meticulously designed and conducted, it could be difficult to quantitatively correlate differences in hybridization signals with changes in specific populations due to the large amount of unknown nucleic acid sequences in environmental samples. It is typically assumed that hybridization signal intensity is directly proportional to the abundance of the target organism, but nonspecific hybridization due to uncharacterized microorganisms in environmental samples may occur and confound interpretation. Analysis of key genes with other methods such as real-time PCR may help to validate the quantitative accuracy of major results and strengthen the conclusions drawn from microarray data [72].

Future Perspectives

An ultimate goal of microarray analysis, with respect to microbial ecology research, is to simultaneously measure the activity of multiple microbial populations in relation to different environmental factors. This has traditionally been accomplished by measuring mRNA expression. However, it is still difficult to recover sufficient high-quality mRNA for microarray analysis from many environmental samples. Further advances in mRNA extraction and amplification methods are needed to make microarray analysis of mRNA possible for a broader range of samples. Since mRNA levels are only an indirect measurement of activity, with the translated proteins actually being responsible for most biological processes, it would be best if the actual protein levels could ultimately be quantified. With advances in proteomics, researchers are beginning to develop protein arrays for identifying proteins and studying protein expression and protein–ligand interactions [23, 73]. Protein arrays have not yet been applied to the study of complex environmental samples, but if the technology is further developed and can be successfully adapted, it could be very useful for investigating enzymatic expression in environmental samples.

One of the greatest needs for microarray analysis of microbial communities is the development of standardized methods for data analysis and interpretation. It is continually becoming more difficult to analyze microarray results as more comprehensive arrays, which are necessary to understand many complex communities, are developed. Statistical methods developed for functional genomics may not be appropriate for analyzing the complex data sets often produced from microarray analysis of environmental samples. New statistical methods need to be devised and/or existing methods adapted to meet the specific challenges posed by these types of arrays. The development of improved universal standards would also enhance data analysis and enable comparison of array data between experiments and laboratories.

In order for microarray technology to reach its full high-throughput potential and provide real-time information on microbial populations in environmental samples, it will be necessary for the technology to eventually be automated and field deployable [13]. With advances in microfabrication and microfluidic technologies, it is now becoming possible to assemble all of the chambers, pumps, valves, mixers, heaters, and detectors that are required for microarray analysis on a single chip [49, 51]. These “laboratories-on-a-chip” still face the same analytical challenges as encountered with manual microarrays and are just in early stages of development, but they have the potential to revolutionize microarray analysis of environmental microbial populations. Ultimately, for whichever array format is used, more comprehensive, broad-scale applications are necessary to further validate and demonstrate the analytical power of microarrays for investigating various biological questions.

Conclusions

Researchers have invested considerable effort over the last few years to adapt microarray technology for the analysis of microbial communities. It is now becoming possible to produce microarrays capable of simultaneously characterizing the dynamics and activities of most, if not all, of the microbial populations even in complex samples such as soils and sediments. Several recent studies have successfully used microarrays to investigate aspects of major ecological issues. However, most of these studies were of limited scope with very few yet utilizing the full high-throughput potential of microarray analysis. Further technological advances are needed to improve methods for data analysis in order for microarrays to be applicable to a broader range of samples and for results to be comparable across experiments and between laboratories. Continued development may ultimately allow microarray technology to achieve its promise for comprehensive high-throughput, near-real-time monitoring of microbial populations within ecological communities.