Initial considerations

We define “extreme acidophiles” as organisms with a pH optimum for growth below pH 3. At present, genomic information is available only for bacteria and archaea, although eukaryotic microorganisms are abundant in some acidic environments (Johnson 2008; Baker et al. 2009; Cid et al. 2010).

Bioleaching (or biomining) refers to the use of microorganisms to solubilize metals, principally copper, from ores (Rawlings and Johnson 2007). Bioleaching takes place in heaps of crushed ore. A related process, termed biooxidation, is carried out in stirred-tank reactors and is used principally for the recovery of gold (Rawlings and Johnson 2007). Bioleaching and biooxidation employ many of the same microorganisms, and both can potentially give rise to acid mine drainage (AMD). Acid rock drainage (ARD) is similar to AMD but results from natural processes (e.g., thermal acid springs). This review considers the genomics of bacteria and archaea from both man-made and natural acidic environments, although its principal focus is on bioleaching environments. Principles that emerge from this focus might apply, at least in part, to the biology of biooxidation, AMD and ARD (and vice versa).

This mini-review focuses on work that uses genomics, bioinformatics, and “omics” derivatives (transcriptomics, proteomics, metagenomics, etc.) as its principal sources of information. Studies that combine omics approaches with experimental research are also included, but papers that report almost exclusively experimental results have not been evaluated. Consequently, many important publications are not cited here, and the reader is referred to other reviews where the missing information can be found (Valenzuela et al. 2006; Holmes and Bonnefoy 2007; Quatrini et al. 2007c; Jerez 2008; Siezen and Wilson 2009; Bonnefoy 2010).

Lots of genomes, but are they sufficient?

The first genome of an extreme acidophile to be sequenced was that of the bioleaching γ-proteobacterium Acidithiobacillus ferrooxidans ATCC 23270, published in draft form over a decade ago (Selkov et al. 2000). At least 56 genomes of extreme acidophiles have now been completed or are in progress, representing bacteria (30 genomes, Tables 1 and 3) and archaea (26 genomes, Tables 2 and 3). There are also eight metagenome projects on extremely acidic environments, four of which are associated with the AMD of Iron Mountain. These four Iron Mountain projects provide enough metagenomic sequence coverage to describe draft genomes of four bacterial and nine archaeal species (Table 3). In addition, complete sequences have been determined for 38 plasmids (Table 4) and 29 viruses (Table 5) from acidic environments.

Table 1 Draft in progress and complete acidophilic bacterial genomes
Table 2 Draft in progress and complete acidophilic archaeal genomes
Table 3 Metagenomic projects of acidophilic environments
Table 4 Completely sequenced plasmids from acidophilic environments
Table 5 Completely sequenced viruses from acidophilic environments

Representatives of psychrotolerant, mesophilic, moderately thermophilic and thermophilic microorganisms, including Gram-positive and Gram-negative bacteria and archaea, have been or are being sequenced, providing a first glimpse of the genomics of acidophilic life over a range of environmental conditions. An important question is whether this genome information is sufficient to describe, reasonably completely, the genomic complexity and, by inference, the full metabolic potential present in bioleaching operations. We suggest that the answer is “no”, for two major reasons.

First, the currently available genomic information has clearly been underexploited, and much more can be extracted from the existing data. For example, Table 1 shows the presence or absence of nine metabolic features or characteristics (Fe(II) oxidation, sulfur oxidation, Fe(III) reduction, N2 fixation, nitrate reduction, presence of a flagellum, type of tricarboxylic acid (TCA) cycle, trophic mode and type of CO2 fixation) for 26 bacterial genomes, as predicted from analysis of the respective genomes. This gives a total of 234 descriptive features (26 genomes × nine properties). However, 91 (39%) of these features remain to be evaluated (shown by “?” in Table 1), and these gaps need to be filled. Many additional metabolic properties not presented in the Tables could also be predicted from the existing genome data but have not yet been determined.
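
The completeness figures above are simple arithmetic over a genome-by-trait matrix. As a minimal sketch, assuming each cell is encoded as “+”, “-” or “?” (the “?” follows the convention of Table 1; the two genomes and their values in the toy table are hypothetical placeholders, not entries from Table 1), the fraction of unevaluated cells can be computed as follows:

```python
from typing import Dict

UNKNOWN = "?"  # marker for a feature not yet evaluated, as in Table 1

def unevaluated_fraction(table: Dict[str, Dict[str, str]]) -> float:
    """Return the fraction of genome-by-trait cells still marked as unknown."""
    cells = [value for traits in table.values() for value in traits.values()]
    if not cells:
        raise ValueError("the trait table is empty")
    return sum(value == UNKNOWN for value in cells) / len(cells)

# Counts reported in the text for Table 1: 26 genomes x 9 traits, 91 cells marked "?".
total_cells = 26 * 9
percent_missing = round(100 * 91 / total_cells)
print(total_cells, percent_missing)  # 234 39

# Hypothetical two-genome table (placeholder names and values, not real data).
toy_table = {
    "genome_A": {"Fe(II) oxidation": "+", "N2 fixation": "?", "flagellum": "-"},
    "genome_B": {"Fe(II) oxidation": "?", "N2 fixation": "+", "flagellum": "?"},
}
print(unevaluated_fraction(toy_table))  # 0.5
```

Keeping the “?” cells explicit, rather than dropping them, makes it easy to rerun the tally as gaps in the Tables are filled.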

Second, considerably more microbial diversity is now recognized than was apparent in initial surveys of bioleaching heaps and other acidic environments (Demergasso et al. 2010; González-Toril et al. 2010). For example, a recent study of 16S rRNA gene sequence variation in Acidimicrobium spp. revealed strain variation so extensive that it might include new species or even new genera (Schippers et al. 2010). In addition, classical techniques of microbial identification can significantly underestimate the true genetic diversity, even within a species. For example, two strains of A. ferrooxidans that are 100% identical at the 16S rRNA gene sequence level differ by 16% in gene content (Valdés et al. 2010). Metagenomic and/or metatranscriptomic approaches would clearly help to assess the genetic variability present during bioleaching. Although several such cultivation-independent projects have been carried out on AMD, none has yet evaluated the composition and/or dynamics of bioleaching consortia.

It is also unlikely that the full range of bioleaching habitats has been sampled for microbial diversity, especially considering the spatial and temporal variation known to occur in bioleaching heaps and the wide variety of mineral substrates being bioleached in different parts of the world, all of which diversify the available habitats. Many novel microorganisms that contribute to bioleaching are likely to be discovered in the future.

Another concern is that many genome sequencing projects were carried out on strains that had been maintained in the laboratory, some for decades, allowing the potential accumulation of genetic changes such as genome rearrangements, mutations and gene loss. For example, A. ferrooxidans ATCC 19859 is known to possess transposable elements whose mobilization under laboratory growth conditions affects genotype and phenotype (Cabrejos et al. 1999). However, several of the new genome projects, for example those currently underway at BHPB-FCV-UCN, Biosigma and others listed as “personal communication” (see Tables 1 and 2), describe microorganisms that were isolated directly from bioleaching heaps with minimal culturing and have therefore had less time to accumulate genetic changes after isolation.

In spite of these caveats, a deeper understanding of the microbial assemblages, their gene pools, and metabolic potential in bioleaching operations is beginning to emerge from genomics. Future work in this area is likely to be accelerated as the cost of DNA sequencing continues to decline and new high throughput technologies are developed (Eid et al. 2009; Ozsolak et al. 2009).

Metabolic models

Metabolic models have been derived from bioinformatic interpretation of the genome sequences for Fe(II) oxidation, sulfur oxidation, Fe(III) reduction, N2 fixation, nitrate reduction, flagellum formation and the presence/absence of a complete TCA cycle and type of CO2 fixation pathway (Tables 1, 2 and 3). Information regarding these models can be found in the principal publication describing the respective genome (Tables 1, 2 and 3) and references therein. Some additional unpublished information is also provided in Tables 1, 2 and 3. Some of the models listed have received experimental support, but clearly, additional experimentation is needed in many cases to validate the bioinformatic predictions.

Unlocking the secrets of acidophilic proteins

Mechanisms of pH homeostasis used by acidophiles to maintain their intracellular pH near neutral have recently been reviewed (Dopson 2010). However, little is known about the mechanisms that allow proteins to fold correctly, make protein–protein contacts and maintain function at very low extracellular pH. This question applies to proteins located in the periplasm or outer membrane, to proteins secreted from the cell, and to proteins embedded in the cytoplasmic membrane with domains that extend into the periplasm. These issues are beginning to be addressed by using genome sequence data to evaluate the proteomes of acidophiles (Bouchal et al. 2006), to predict subcellular locations (Chi et al. 2007) and to assess protein folding under acidic conditions (Kanao et al. 2010). In addition, molecular modeling and simulation can predict physicochemical and folding differences between individual acidophilic and neutrophilic orthologs, and can suggest how membrane transporters of acidophiles function when confronted by a ΔpH of more than 5.5 units across the cytoplasmic membrane (pH 6.5 inside to < pH 1 outside), that is, a proton concentration gradient of about six orders of magnitude or more (Duarte et al. 2009).
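
The size of this gradient follows from the logarithmic definition of pH: each pH unit is a tenfold change in proton concentration. The Python sketch below converts the quoted internal and external values (taking pH 1, the upper bound given for the outside, as the external value) into ΔpH, the fold difference in [H+], and the magnitude of the pH term, (2.303RT/F)·ΔpH, of the proton motive force. It deliberately ignores the membrane potential and is only an arithmetic illustration, not a model of acidophile bioenergetics; the two temperatures are illustrative choices.

```python
import math
from typing import Tuple

GAS_CONSTANT = 8.314462618   # J mol^-1 K^-1
FARADAY = 96485.33212        # C mol^-1

def proton_gradient(ph_inside: float, ph_outside: float) -> Tuple[float, float]:
    """Return (delta_pH, fold excess of external over internal [H+]).

    delta_pH is taken as pH_inside - pH_outside, so it is positive when
    the cytoplasm is more alkaline than the surrounding medium.
    """
    delta_ph = ph_inside - ph_outside
    return delta_ph, 10 ** delta_ph

def ph_term_millivolts(delta_ph: float, temp_celsius: float) -> float:
    """Magnitude, in mV, of the pH term (2.303 RT/F) * delta_pH of the proton motive force."""
    temp_kelvin = temp_celsius + 273.15
    volts_per_ph_unit = math.log(10) * GAS_CONSTANT * temp_kelvin / FARADAY
    return 1000 * volts_per_ph_unit * delta_ph

delta_ph, fold = proton_gradient(6.5, 1.0)
print(delta_ph, round(fold))                       # 5.5 316228
print(round(ph_term_millivolts(delta_ph, 25.0)))   # 325
print(round(ph_term_millivolts(delta_ph, 70.0)))   # 374
```

The pH term grows with temperature (about 59 mV per pH unit at 25 °C and about 68 mV at 70 °C), which is one reason the same ΔpH is energetically larger for thermoacidophiles.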

Many acidophiles are also thermophiles, and their predicted proteomes could be a source of proteins that are stable at both high temperature and low pH. However, only a few enzymes (extremozymes) (Dopson 2010) and one electron transfer protein (Yamada et al. 2004) from extreme acidophiles have been used, or proposed for use, in biotechnological applications.

The predicted proteomes of extreme acidophiles provide a rich, but largely unexploited, hunting ground for proteins that might have useful functions in biotechnological applications.

Comparative genomics can generate models of the ecophysiology of acidic environments

Over a decade ago, investigations began to reveal the complex interactions between microbes inhabiting natural and man-made acidic environments (Johnson 1998; Baker and Banfield 2003). Recent genome-based analyses of acidophilic microbes have provided further insight into the metabolic capabilities and potential interactions that shape these microbial communities, in particular those related to bioleaching (Barreto et al. 2003; Osorio et al. 2008a; Valdés et al. 2008a, b, 2010). In addition, a number of studies on the composition, structure and function of extremely acidic aerial (Gonzalez-Toril et al. 2003; Garrido et al. 2008) and subaerial streams (Tyson et al. 2004; Allen and Banfield 2005; Ram et al. 2005; Whitaker and Banfield 2006; Allen et al. 2007; Lo et al. 2007; Andersson and Banfield 2008; Simmons et al. 2008; Goltsman et al. 2009; VerBerkmoes et al. 2009; Denef et al. 2010a) have contributed to our understanding of the ecophysiology of biofilm-based AMD communities. Applying genomics-enabled methods to communities with low species richness, such as those found in the Iron Mountain AMD, has led to a better understanding of the metabolic networks and evolutionary processes that operate within them. In such defined model systems, the molecular and evolutionary bases of ecological patterns have begun to emerge, not only facilitating the construction of predictive ecosystem models but also uncovering principles that may explain behavior in more complex systems (Denef et al. 2010b; Mueller et al. 2010).

During bioleaching, the composition of the microbial consortia changes over time as the bioleaching heap undergoes, among other changes, a temperature increase from ambient temperature to about 70–80 °C due to exothermic oxidation reactions. Initially, mesophilic (20–40 °C) consortia rich in bacteria dominate, but as bioleaching proceeds, these microbial communities are replaced first by moderately thermophilic (40–55 °C) consortia, and finally by extremely thermophilic (55–80 °C) consortia dominated by Archaea (Rawlings and Johnson 2007). It is important to know the composition and activity of these evolving consortia in order to develop a better understanding of the biology of bioleaching. Predictions of metabolic potential from genomic data allow preliminary models of the ecophysiology of such consortia to be built that begin to address questions such as who is capable of doing what, to whom, where, when and under what circumstances. For example, genome sequence information provides a catalog of the diverse pathways used by bioleaching autotrophs to obtain fixed carbon and suggests which autotrophs are providing fixed carbon to the heterotrophs at the different stages of bioleaching (Valdés et al. 2010).
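
A first, very coarse step toward such models is simply to map heap temperature onto the successive consortium types described above. The Python sketch below does only that, using the temperature ranges given in this paragraph; the heap temperature log is invented for illustration, and real succession also depends on pH, oxygen, mineral substrate and other factors that are not modeled here.

```python
from typing import List, Optional, Tuple

def expected_consortium(temp_celsius: float) -> str:
    """Map a heap temperature to the consortium type described in the text.

    Ranges follow the text: mesophilic 20-40 C, moderately thermophilic
    40-55 C, extremely thermophilic 55-80 C. Assigning the shared boundary
    values (40 and 55 C) to the warmer class is an arbitrary choice.
    """
    if 20 <= temp_celsius < 40:
        return "mesophilic, bacteria-rich"
    if 40 <= temp_celsius < 55:
        return "moderately thermophilic"
    if 55 <= temp_celsius <= 80:
        return "extremely thermophilic, archaea-dominated"
    return "outside the ranges described"

def stage_transitions(readings: List[Tuple[int, float]]) -> List[Tuple[int, str]]:
    """Return (day, consortium type) each time the expected type changes."""
    transitions: List[Tuple[int, str]] = []
    previous: Optional[str] = None
    for day, temp in readings:
        stage = expected_consortium(temp)
        if stage != previous:
            transitions.append((day, stage))
            previous = stage
    return transitions

# Hypothetical temperature log for one heap: (day, temperature in C).
heap_log = [(0, 22.0), (30, 35.0), (60, 45.0), (90, 58.0), (120, 72.0)]
for day, stage in stage_transitions(heap_log):
    print(day, stage)
```

Genome-derived predictions (for example, which members of each consortium fix CO2 or N2) would then be attached to each stage, which is where the comparative genomics discussed in this section comes in.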

Inspection of Tables 1, 2 and 3 allows similar predictions to be made about which organisms are the primary fixers of atmospheric N2 in bioleaching consortia, as has been done for the Iron Mountain AMD community (Tyson et al. 2004). A more detailed understanding of this ecophysiology could indicate whether relationships between microorganisms, for example between autotrophs and heterotrophs, are beneficial or detrimental to the bioleaching process.

The ability to develop predictive models of interactions in bioleaching communities, albeit in its infancy, is arguably the most important contribution that can result from an analysis of the genetic and metabolic potential of multiple genomes.

Genomics predicts multiple pathways for CO2 fixation in bioleaching microorganisms

The availability of so many genome sequences has changed our view of the complexity of the pathways that bioleaching microorganisms use to fix CO2. Although the Calvin cycle still appears to be the principal CO2 fixation pathway at ambient temperatures, it is now clear that other pathways, such as the reverse TCA cycle and the modified 3-hydroxypropionate pathway, come into play (Tables 1 and 2) and eventually dominate as bioleaching proceeds and heap temperatures rise (Valdés et al. 2010). It is important to deepen our knowledge of these additional routes and to evaluate the role they play in allowing thermophilic microbial consortia to fix CO2 in bioleaching dumps. A fourth pathway, the dicarboxylate/4-hydroxybutyrate cycle, has recently been described in members of the anaerobic archaeal orders Desulfurococcales and Thermoproteales (Berg et al. 2010b). Genomes of acidophiles can now be searched for genes potentially encoding this pathway.
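
Such a search can be framed as a marker-gene lookup. The Python sketch below is a minimal illustration under stated assumptions: each pathway is represented by a few characteristic enzymes (RuBisCO and phosphoribulokinase for the Calvin cycle, ATP-citrate lyase and 2-oxoglutarate:ferredoxin oxidoreductase for the reverse TCA cycle, and so on), the pathway reported by Berg et al. (2010b) is represented as the dicarboxylate/4-hydroxybutyrate cycle, and annotations are matched as exact strings. Real annotations rarely allow exact matching; a practical search would rely on curated orthology assignments and sequence similarity.

```python
from typing import Dict, FrozenSet, List, Set

# Illustrative marker enzymes per CO2 fixation pathway. These sets are a
# simplifying assumption, not a validated annotation scheme.
PATHWAY_MARKERS: Dict[str, FrozenSet[str]] = {
    "Calvin cycle": frozenset({
        "ribulose-1,5-bisphosphate carboxylase/oxygenase",
        "phosphoribulokinase",
    }),
    "reverse TCA cycle": frozenset({
        "ATP-citrate lyase",
        "2-oxoglutarate:ferredoxin oxidoreductase",
    }),
    "3-hydroxypropionate/4-hydroxybutyrate cycle": frozenset({
        "acetyl-CoA/propionyl-CoA carboxylase",
        "malonyl-CoA reductase",
        "4-hydroxybutyryl-CoA dehydratase",
    }),
    "dicarboxylate/4-hydroxybutyrate cycle": frozenset({
        "pyruvate:ferredoxin oxidoreductase",
        "phosphoenolpyruvate carboxylase",
        "4-hydroxybutyryl-CoA dehydratase",
    }),
}

def pathway_completeness(annotations: Set[str]) -> Dict[str, float]:
    """Fraction of each pathway's marker enzymes found among a genome's annotations."""
    return {
        pathway: len(markers & annotations) / len(markers)
        for pathway, markers in PATHWAY_MARKERS.items()
    }

def candidate_pathways(annotations: Set[str], threshold: float = 1.0) -> List[str]:
    """Pathways whose marker completeness reaches the threshold, sorted by name."""
    scores = pathway_completeness(annotations)
    return sorted(pathway for pathway, score in scores.items() if score >= threshold)

# Hypothetical annotation set for an unnamed genome.
example_genome = {
    "ribulose-1,5-bisphosphate carboxylase/oxygenase",
    "phosphoribulokinase",
    "4-hydroxybutyryl-CoA dehydratase",
}
print(candidate_pathways(example_genome))   # ['Calvin cycle']
```

Because 4-hydroxybutyryl-CoA dehydratase is shared by both 4-hydroxybutyrate cycles, a hit on that enzyme alone cannot distinguish them; returning partial scores from pathway_completeness keeps such ambiguities visible instead of forcing a call.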

Increasing knowledge of CO2 fixation pathways is helping to build models of how carbon fixation might have evolved in early life (Berg et al. 2010a). The study of chemolithoautotrophs has played a particularly important role in developing such models. For example, the iron–sulfur theory of the origin of life is based on the structural and catalytic similarity between the mineral substrates of chemolithoautotrophs (e.g., pyrite, FeS2) and the catalytic Fe–S centers of many of their enzymes and cofactors (Wächtershäuser 1988, 2007).

The incomplete TCA cycle is a hallmark of obligate autotrophy in acidophiles

It has been suggested that the absence of genes encoding the irreversible, oxidative α-ketoglutarate dehydrogenase complex of the TCA cycle is a hallmark of obligate autotrophy (Wood et al. 2004). Without this complex the TCA cycle is incomplete, forming a so-called TCA “horseshoe” in which pyruvate can serve both as a substrate for reoxidizing NADH (reductive branch) and as a source of the biosynthetic precursors citrate and α-ketoglutarate (oxidative branch). Published information derived from genome projects, supplemented with unpublished data, indicates that the horseshoe TCA cycle is found in obligate autotrophic acidophilic bacteria that use the Calvin cycle to fix CO2 (labeled “H” in the TCA cycle column of Table 1). Obligate autotrophic bacteria that use the reverse TCA cycle to fix CO2 also lack the α-ketoglutarate dehydrogenase complex. However, they are predicted to contain genes encoding a reversible 2-oxoacid:ferredoxin oxidoreductase complex that could substitute for the missing α-ketoglutarate dehydrogenase complex (labeled “R” in the TCA cycle column of Tables 1, 2 and 3). All sequenced acidophilic Archaea also use a reversible 2-oxoacid:ferredoxin oxidoreductase complex, but other steps of their TCA cycle are also thought to be absent (labeled “I” in the TCA cycle column of Tables 1, 2 and 3). The presence or absence of genes for specific steps of the TCA cycle/horseshoe could therefore help predict obligate autotrophy in novel microbial genomes.
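
The labeling rules in this paragraph can be restated as a small decision procedure. The Python sketch below is hypothetical: it reduces each genome to a set of predicted enzyme names (illustrative strings), adds a “complete” label that does not appear in the Tables, and ignores complications such as paralogs, misannotation and alternative enzymes for individual steps. It restates the rules described above; it is not the procedure used to compile Tables 1, 2 and 3.

```python
from typing import Set

# Hypothetical enzyme labels; real annotations would need curated identifiers.
KGDH = "2-oxoglutarate dehydrogenase complex"
OFOR = "2-oxoacid:ferredoxin oxidoreductase"
OTHER_STEPS = (
    "citrate synthase",
    "aconitase",
    "isocitrate dehydrogenase",
    "succinyl-CoA synthetase",
    "succinate dehydrogenase/fumarate reductase",
    "fumarase",
    "malate dehydrogenase",
)

def classify_tca(enzymes: Set[str]) -> str:
    """Assign the TCA labels used in the text (H, R, I) from predicted enzymes.

    "complete" and "unclassified" are extra labels for this sketch only.
    """
    other_steps_present = all(step in enzymes for step in OTHER_STEPS)
    if KGDH in enzymes:
        return "complete" if other_steps_present else "unclassified"
    if OFOR in enzymes:
        # OFOR may stand in for the missing complex; "I" if other steps are absent too.
        return "R" if other_steps_present else "I"
    # No route from 2-oxoglutarate to succinyl-CoA: the horseshoe.
    return "H"

def suggests_obligate_autotrophy(enzymes: Set[str]) -> bool:
    """Hallmark proposed by Wood et al. (2004): no 2-oxoglutarate dehydrogenase complex."""
    return KGDH not in enzymes

print(classify_tca(set(OTHER_STEPS)))            # H
print(classify_tca(set(OTHER_STEPS) | {OFOR}))   # R
print(classify_tca({OFOR, "citrate synthase"}))  # I
```

Encoding the rules explicitly makes it straightforward to apply them uniformly to new genomes and to see which missing annotations drive each call.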

Genomics proposes models for anaerobic respiration

Industrial bioleaching operations pump air into the heap, supplying oxygen and CO2 to support microbial growth. However, anaerobic conditions are known to occur in zones of the heap that air has not penetrated, or where intense microbial activity has produced microaerobic conditions. Whereas considerable information is available on the oxidation reactions that support microbial growth in bioleaching heaps, less is known about the enzymes and electron transport pathways involved in anaerobic or microaerobic growth. Metagenomic and genomic data are beginning to be used to predict candidate genes, together with the enzymes and electron transport pathways they may encode, that might be used in anaerobiosis. For example, potential anaerobic pathways have been identified in microorganisms that have been shown experimentally to grow anaerobically using Fe(III) or nitrate as terminal electron acceptors (Tables 1, 2, and 3). Genomics also permits anaerobic growth to be predicted from new genomes and metagenomes, for example for Leptospirillum ferrodiazotrophum, L. ferrooxidans, L. rubarum and Leptospirillum sp. “5way CG” (using Fe(III)) (Goltsman et al. 2009) and for Thiomonas intermedia K-12 (unpublished) and Thiomonas sp. 3A (using nitrate) (Arsène-Ploetze et al. 2010) (see Tables 1, 2, and 3).

Genomic predictions for motility, chemotaxis and biofilm formation

Knowledge of the fundamental physical and biological interactions between a microorganism and a mineral surface is central to understanding interfacial phenomena such as bacterial recognition of, and attachment to, specific mineral surfaces and biofilm formation. These phenomena are crucial to the bioleaching process. Whereas experimental approaches have advanced our understanding of motility, chemotaxis and biofilm formation in bioleaching microorganisms, little has been done to mine the genome sequences for new information. Bioinformatic models with supporting experimental evidence have been developed for biofilm formation (Barreto et al. 2005a, b) and quorum sensing (Farah et al. 2005; Rivas et al. 2005, 2007; Soulère et al. 2008; Castro et al. 2009) in a few bioleaching microorganisms, and the presence of flagellar genes has been predicted (Tables 1, 2 and 3). However, current genome sequence information is clearly underexploited as a resource for advancing our understanding in this important area.

Metallomics

The study of metal resistance in biomining bacteria using genome data and bioinformatics is another relatively underexploited area. Acidophiles are known to be far more resistant to a number of metals and metalloids than their neutrophilic counterparts, and mechanisms that may account for this resistance have recently been reviewed (Dopson 2010). However, no large-scale genomic comparison of metal resistance has yet been undertaken in acidophiles. Such a study might reveal general mechanisms employed by acidophiles and supplement our knowledge of the genes and pathways involved in resistance to high levels of mercury, arsenic, copper, iron, etc.

Nearly a decade ago, genome information was used to predict metal resistance genes in A. ferrooxidans (Holmes et al. 2001). A bioinformatic and experimental analysis of iron homeostasis and its potential regulation has been conducted for A. ferrooxidans (Quatrini et al. 2004, 2005a, b, 2007b), including the prediction and experimental validation of binding sites for the master iron regulator Fur and the prediction of the gene clusters it might regulate (Quatrini et al. 2007a). These data allow integrated regulatory models to be proposed and suggest how these specific functions could be connected within a broader regulatory network. The bioinformatic analysis of iron uptake and homeostasis has recently been extended to other bioleaching microorganisms (Osorio et al. 2008a, b). Work combining genomic and experimental approaches has also begun to elucidate mechanisms of copper resistance in A. ferrooxidans (Navarro et al. 2009) and Ferroplasma acidarmanus Fer1 (Baker-Austin et al. 2005).
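
Predicting regulator binding sites is, at its simplest, a motif search over upstream regions. The Python sketch below illustrates the idea with a consensus scan that tolerates a few mismatches on both strands. The motif is the 19-bp Fur-box consensus commonly cited for Escherichia coli, used here purely as a stand-in; the mismatch threshold and the test fragment are hypothetical, and this is not the method or motif of the A. ferrooxidans studies cited above.

```python
from typing import List, Tuple

# Assumption: the 19-bp Fur-box consensus commonly cited for Escherichia coli
# serves as a stand-in motif; it is not the A. ferrooxidans model from the text.
FUR_BOX = "GATAATGATAATCATTATC"
_COMPLEMENT = str.maketrans("ACGT", "TGCA")

def reverse_complement(seq: str) -> str:
    """Reverse complement of an uppercase A/C/G/T sequence."""
    return seq.translate(_COMPLEMENT)[::-1]

def mismatches(a: str, b: str) -> int:
    """Hamming distance between two equal-length strings."""
    return sum(x != y for x, y in zip(a, b))

def scan_for_motif(sequence: str, motif: str = FUR_BOX,
                   max_mismatches: int = 3) -> List[Tuple[int, str, int]]:
    """Return (0-based start, strand, mismatch count) for windows within the limit."""
    sequence = sequence.upper()
    width = len(motif)
    motif_rc = reverse_complement(motif)
    hits: List[Tuple[int, str, int]] = []
    for start in range(len(sequence) - width + 1):
        window = sequence[start:start + width]
        forward = mismatches(window, motif)
        if forward <= max_mismatches:
            hits.append((start, "+", forward))
        reverse = mismatches(window, motif_rc)
        if reverse <= max_mismatches:
            hits.append((start, "-", reverse))
    return hits

# Hypothetical promoter fragment with one embedded consensus site.
fragment = "CCCC" + FUR_BOX + "GGGG"
print(scan_for_motif(fragment))   # [(4, '+', 0), (4, '-', 1)]
```

Because this consensus is nearly palindromic, a single site is reported on both strands. In practice, a position weight matrix built from experimentally validated sites would usually replace the fixed mismatch threshold, which is an arbitrary choice here.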

Metabolic regulation

The study of gene regulation, including the characterization of transcription factors and the elucidation of promoter structure, has been greatly advanced by the availability of whole-genome DNA sequences and high-throughput methods for measuring gene expression. However, most discoveries have been made in model organisms such as Bacillus subtilis, Pseudomonas aeruginosa and Escherichia coli K-12, for which large amounts of experimental data have been generated. The situation is very different for many newly sequenced microorganisms, for which little experimental data are available or which, like many extreme acidophiles, are difficult to manipulate in the laboratory.

Bioinformatic analysis of the genome data of A. ferrooxidans has been used to predict the regulation of nitrogen metabolism (Barreto et al. 2003; Levican et al. 2008), sulfur assimilation and its regulatory interplay with nitrogen fixation, hydrogen oxidation and energy metabolism (Valdés et al. 2003), iron homeostasis (references in preceding paragraph), CO2 fixation (Esparza et al. 2009, 2010) and other aspects of central carbon metabolism (Appia-Ayme et al. 2006). Some of these models have been supported with experimental evidence.

Genome data from bioleaching microorganisms are beginning to be mined to identify small regulatory RNAs (srRNAs) and predict their role in gene regulation (Shmaryahu and Holmes 2007). Preliminary investigations are also beginning to reveal mechanisms involved in regulating Fe(II) and sulfur oxidation in A. ferrooxidans (Amouric et al. 2009), including the possible involvement of an srRNA (Shmaryahu et al. 2009).

These regulatory models offer initial insights into the regulatory mechanisms and dynamics operating in bioleaching and provide rudimentary explanations for some of the specific adaptations that promote and sustain life in extremely acidic environments.

Metabolic engineering

Metabolic engineering, that is, manipulating the genetic and regulatory processes of a cell to increase the production of a substance or to improve the organism's performance in a process, has not yet been applied to any extreme acidophile. Metabolic engineering requires at least a rough understanding of the metabolic fluxes within the cell in order to identify bottlenecks that could be relieved by genetic engineering. Only one such flux analysis has been published for an extreme acidophile (Hold et al. 2009). Unfortunately, this analysis incorporates a complete TCA cycle into the proposed flux model, whereas genomic evidence indicates that A. ferrooxidans is more likely to have an incomplete TCA cycle (Valdés et al. 2008b). The effect of this possible error on the overall interpretation of the flux analysis has not been determined.
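
Why the structure of the TCA cycle matters can be illustrated with the steady-state condition at the heart of flux analysis: the stoichiometric matrix S multiplied by the flux vector v must be zero for every internal metabolite. The deliberately tiny Python network below is a hypothetical toy, not the model of Hold et al. (2009) and not a metabolic reconstruction of A. ferrooxidans. It only shows that a cyclic flux through the 2-oxoglutarate (α-ketoglutarate) dehydrogenase step cannot be balanced in a genome that lacks the complex, whereas a horseshoe flux pattern can.

```python
from typing import Dict

# Toy TCA network: reaction -> {metabolite: stoichiometric coefficient}.
# Assumptions: only carbon-skeleton intermediates are balanced; acetyl-CoA,
# pyruvate, CO2 and cofactors are treated as external. A negative flux runs a
# reaction in reverse (e.g. succinate dehydrogenase acting as fumarate reductase).
STOICHIOMETRY: Dict[str, Dict[str, int]] = {
    "citrate_synthase": {"oxaloacetate": -1, "citrate": 1},
    "aconitase": {"citrate": -1, "isocitrate": 1},
    "isocitrate_dehydrogenase": {"isocitrate": -1, "2-oxoglutarate": 1},
    "2-oxoglutarate_dehydrogenase": {"2-oxoglutarate": -1, "succinyl-CoA": 1},
    "succinyl-CoA_synthetase": {"succinyl-CoA": -1, "succinate": 1},
    "succinate_dehydrogenase": {"succinate": -1, "fumarate": 1},
    "fumarase": {"fumarate": -1, "malate": 1},
    "malate_dehydrogenase": {"malate": -1, "oxaloacetate": 1},
    "pyruvate_carboxylase": {"oxaloacetate": 1},
    "2-oxoglutarate_to_biosynthesis": {"2-oxoglutarate": -1},
    "succinyl-CoA_to_biosynthesis": {"succinyl-CoA": -1},
}

def net_rates(fluxes: Dict[str, float]) -> Dict[str, float]:
    """Net production rate of each intermediate (the product S . v)."""
    rates: Dict[str, float] = {}
    for reaction, flux in fluxes.items():
        for metabolite, coeff in STOICHIOMETRY[reaction].items():
            rates[metabolite] = rates.get(metabolite, 0.0) + coeff * flux
    return rates

def is_steady_state(fluxes: Dict[str, float], tol: float = 1e-9) -> bool:
    """True if no intermediate accumulates or is depleted."""
    return all(abs(rate) <= tol for rate in net_rates(fluxes).values())

CYCLE = [
    "citrate_synthase", "aconitase", "isocitrate_dehydrogenase",
    "2-oxoglutarate_dehydrogenase", "succinyl-CoA_synthetase",
    "succinate_dehydrogenase", "fumarase", "malate_dehydrogenase",
]
complete_cycle = {reaction: 1.0 for reaction in CYCLE}

# Horseshoe: no 2-oxoglutarate dehydrogenase flux. The oxidative branch sends
# 2-oxoglutarate to biosynthesis; the reductive branch (negative fluxes) makes succinyl-CoA.
horseshoe = {
    "pyruvate_carboxylase": 2.0,
    "citrate_synthase": 1.0, "aconitase": 1.0, "isocitrate_dehydrogenase": 1.0,
    "2-oxoglutarate_to_biosynthesis": 1.0,
    "malate_dehydrogenase": -1.0, "fumarase": -1.0,
    "succinate_dehydrogenase": -1.0, "succinyl-CoA_synthetase": -1.0,
    "succinyl-CoA_to_biosynthesis": 1.0,
}

# A cyclic flux imposed on a genome that lacks the complex cannot balance.
cycle_without_kgdh = {**complete_cycle, "2-oxoglutarate_dehydrogenase": 0.0}

print(is_steady_state(complete_cycle), is_steady_state(horseshoe),
      is_steady_state(cycle_without_kgdh))   # True True False
```

Real flux analyses solve for v under constraints (for example by linear programming over genome-scale networks, or from isotope-labeling data); this sketch only checks a proposed v, which is enough to show why the presence or absence of a single reaction changes which flux distributions are feasible.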

As genomic data and their bioinformatic interpretation accumulate, metabolic flux analysis and other metabolomic tools are expected to play increasingly important roles in understanding bioleaching.

Genome diversity

Understanding genome diversity across species and genera is critical for a more precise interpretation of the metabolic potential of a natural microbial community. For extreme acidophiles specifically, only a few species have been used to explore genome diversity and the evolutionary processes potentially responsible for this variation.

Several high-throughput approaches can be used to study population genomics and evolution in natural environments. These range from whole-genome sequencing of isolated representatives, followed by the design of dedicated comparative genomic hybridization (CGH) microarrays, to metagenomic approaches that provide a picture of the microbial diversity and predicted functional properties of an environmental sample.

Genome sequencing followed by analysis with specifically designed CGH microarrays has been used to evaluate genome variation among eight Thiomonas strains from a single phylogenetic branch (Arsène-Ploetze et al. 2010). The results suggest that the Thiomonas genome has evolved through the gain or loss of genomic islands and that this evolution is influenced by the specific environmental conditions in which the strains live.
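
In a common two-color CGH design, labeled DNA from a test strain and from the sequenced reference strain are hybridized to an array designed from the reference genome, and genes with a low test-to-reference signal ratio are interpreted as absent or highly divergent in the test strain. The Python sketch below shows that logic under simple assumptions (already normalized intensities, one probe per gene, a fixed log2 cutoff); it is not the analysis pipeline of the Thiomonas study, and the gene names and intensities are invented.

```python
import math
from typing import Dict, Optional

def call_gene_presence(test_signal: Dict[str, float],
                       reference_signal: Dict[str, float],
                       log2_cutoff: float = -1.0) -> Dict[str, str]:
    """Classify each reference gene from two-channel CGH intensities.

    Genes whose log2(test / reference) ratio falls at or below the cutoff are
    called absent or divergent in the test strain. The default cutoff (a
    twofold drop) is an arbitrary assumption, not a published threshold.
    """
    calls: Dict[str, str] = {}
    for gene, ref in reference_signal.items():
        test: Optional[float] = test_signal.get(gene)
        if test is None or test <= 0 or ref <= 0:
            calls[gene] = "no data"
            continue
        log_ratio = math.log2(test / ref)
        calls[gene] = "present" if log_ratio > log2_cutoff else "absent/divergent"
    return calls

# Hypothetical normalized intensities (arbitrary units).
reference = {"geneA": 1000.0, "geneB": 800.0, "geneC": 900.0}
test_strain = {"geneA": 950.0, "geneB": 100.0}
print(call_gene_presence(test_strain, reference))
# {'geneA': 'present', 'geneB': 'absent/divergent', 'geneC': 'no data'}
```

Runs of consecutive “absent/divergent” calls along the reference chromosome are what point to genomic islands gained or lost between strains.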

Sequencing and analyses of three representatives of the Acidithiobacillus genus (A. ferrooxidans, A. thiooxidans, and A. caldus) have provided a snapshot of the main functional differences that help shape the ecophysiology of the extreme acidic and biomining niches. Major differences in gene content between the three species demonstrate that different branches of the Acidithiobacillus genus have evolved different strategies for the oxidation of reduced inorganic sulfur compounds (RISCs) and that some have specialized to carry out critical metabolic processes such as iron oxidation and nitrogen fixation (Valdés et al. 2009; Valdés and Holmes 2009). In addition, a phylogenomic approach based on gene family comparisons of the three acidithiobacilli has identified a conserved genome core inherited from their common ancestor and sets of dispensable and exclusive genes, constituting the pangenome of the Acidithiobacillus genus (unpublished data).
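
Once orthologous gene families have been identified (the difficult step, not shown), partitioning them into the categories named above is simple set arithmetic. The Python sketch below assumes each genome has already been reduced to a set of gene-family identifiers; the identifiers and the three-genome example are hypothetical, and this is not the method of the unpublished acidithiobacilli analysis.

```python
from typing import Dict, Set

def partition_pangenome(families: Dict[str, Set[str]]) -> Dict[str, Set[str]]:
    """Split gene families into core, dispensable and genome-exclusive sets.

    core: present in every genome; exclusive: present in exactly one genome;
    dispensable: present in more than one genome but not in all of them.
    """
    if len(families) < 2:
        raise ValueError("need at least two genomes")
    pangenome = set().union(*families.values())
    core = set.intersection(*families.values())
    result: Dict[str, Set[str]] = {"pangenome": pangenome, "core": core}
    all_exclusive: Set[str] = set()
    for genome, genes in families.items():
        others = set().union(*(g for name, g in families.items() if name != genome))
        exclusive = genes - others
        result[f"exclusive:{genome}"] = exclusive
        all_exclusive |= exclusive
    result["dispensable"] = pangenome - core - all_exclusive
    return result

# Hypothetical gene-family sets for three genomes.
families = {
    "genome_1": {"f1", "f2", "f3", "f4"},
    "genome_2": {"f1", "f2", "f5"},
    "genome_3": {"f1", "f3", "f6"},
}
parts = partition_pangenome(families)
print(sorted(parts["core"]), sorted(parts["dispensable"]))  # ['f1'] ['f2', 'f3']
print(sorted(parts["exclusive:genome_1"]))                 # ['f4']
```

Some authors count genome-exclusive families as part of the dispensable genome; they are kept separate here to match the “dispensable and exclusive” distinction used in the text.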

Metagenomic studies of AMD have also provided an assessment of genome variation in extremely acidic environments (Simmons et al. 2008). In this study, strain-level genomic variation in a Leptospirillum group II population was analyzed by deep metagenomic sequencing (about 20× coverage). The population is dominated by one sequence type, but relatively abundant variants (>99.5% sequence identity to the dominant type) occur at multiple loci, and a few rare variants are also present. Blocks from other Leptospirillum group II types (approximately 94% sequence identity) have recombined into one or more variants. Heterogeneity in genetic potential within the population arises from localized variation in gene content, typically concentrated in integrated plasmid/phage-like regions. Some laterally transferred gene blocks encode physiologically important genes, including quorum-sensing genes of the LuxIR system. This study demonstrates that significant intrapopulation sequence variation arises from recombination and mutation and from the acquisition or loss of unique gene features through phage, plasmid or transposon insertion/deletion.
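
These identity thresholds are easier to interpret as differences per aligned base: 99.5% identity corresponds to one difference every 200 bp, and about 94% to roughly one every 17 bp. The Python sketch below computes percent identity for two sequences that have already been aligned. Skipping gap columns is one of several conventions and is an assumption here; the sequences are synthetic.

```python
def percent_identity(seq_a: str, seq_b: str) -> float:
    """Percent identity over aligned columns in which neither sequence has a gap.

    Gap-containing columns are skipped; other conventions (e.g. counting gaps
    as mismatches) give different values for the same alignment.
    """
    if len(seq_a) != len(seq_b):
        raise ValueError("sequences must be aligned to the same length")
    compared = matches = 0
    for a, b in zip(seq_a.upper(), seq_b.upper()):
        if a == "-" or b == "-":
            continue
        compared += 1
        matches += a == b
    if compared == 0:
        raise ValueError("no gap-free columns to compare")
    return 100.0 * matches / compared

# One substitution in 200 aligned bases gives exactly 99.5% identity,
# so a variant must differ at fewer than 1 site per 200 bp to exceed 99.5%.
dominant = "ACGT" * 50
variant = dominant[:10] + "A" + dominant[11:]   # position 10 changes G -> A
print(percent_identity(dominant, variant))      # 99.5
print(percent_identity("AC-GT", "ACTGA"))       # 75.0
```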

Extensive intraspecies genome variation has been detected in A. ferrooxidans. A comparison of the genome sequences of A. ferrooxidans ATCC 23270 and ATCC 53993 showed that, although they are 100% identical at the ribosomal DNA level, they differ by 16% in gene content. This difference is mainly accounted for by two large genomic islands (indels) of close to 300 kb in ATCC 23270 and 200 kb in ATCC 53993, respectively, and by several smaller indels not shared between the two genomes (Valdés et al. 2010). These indels show extensive differences in gene content, including tRNA genes, EPS synthesis genes, phage immunity systems of the CRISPR/cas type and metal resistance/tolerance genes. The expression and correct charging of some of these tRNAs have been verified experimentally (Levicán et al. 2009). The indels contain genetic elements such as terminal repeated sequences and genes potentially encoding enzymes involved in DNA recombination, suggesting that they were incorporated into the genomes via lateral gene transfer.

An analysis of the biogeography and spatial–temporal distribution of the variable gene content of seven Sulfolobus islandicus genome sequences also revealed extensive genome variation (Reno et al. 2009). Although S. islandicus is not present in biomining operations, the major conclusions of this study may extend to the related Sulfolobus species found in bioleaching heaps. The evolutionary independence of each population allowed genome dynamics to be explored over very recent evolutionary time, beginning approximately 910,000 years ago. On this time scale, genome variation consists largely of recent strain-specific integration of mobile elements. Localized sectors of parallel gene loss were also identified; however, the balance between the gain and loss of genetic material suggests that S. islandicus genomes slowly acquire material over time, primarily from closely related Sulfolobus species. Examining genome dynamics through population genomics in S. islandicus exposes the process of allopatric speciation in thermophilic Archaea and brings us closer to a general framework for understanding microbial genome evolution in a spatial context.

These investigations demonstrate that extensive genome variation can occur within species. They provoke questions about the rate and degree of genome evolutionary processes that can occur in fairly restricted environmental conditions. They also raise the possibility that current molecular techniques based on ribosomal DNA typing for detecting and characterizing microorganisms in environments such as bioleaching heaps might significantly underestimate the true genetic and metabolic diversity present.

Mobile elements are agents of genome flux

Comparative genomics provides an unprecedented opportunity to evaluate the extent of horizontal gene transfer and how genetic material is dynamically added to (or lost from) prokaryotic genomes through promiscuous genetic exchange. Diverse mobile genetic elements (MGEs), including plasmids, viruses and transposons, facilitate the flow of genes between prokaryotes. In extreme acidophiles, the best-studied MGEs are plasmids from bacteria of the genus Acidithiobacillus (Rawlings and Kusano 1994; Rawlings 2005; Lipps 2006; van Zyl et al. 2008) and from archaea of the order Sulfolobales (Prangishvili et al. 1998). In total, 38 plasmids from acidophiles, including several of these, have been completely sequenced (Table 4).

The availability of genome sequences from several closely related extreme acidophiles has provided the basis for analyses of the frequency, location and phylogeny of insertion sequence (IS) elements and non-autonomous miniature inverted-repeat transposable element (MITE)-like elements from Sulfolobus spp., Thermoplasma spp., Ferroplasma spp. and Picrophilus torridus (Brügger et al. 2004; Filee et al. 2007). The number and diversity of IS and MITE-like elements differ greatly between species (Brügger et al. 2004) and between strains (Allen et al. 2007). This abundance and diversity are highly relevant because significant levels of transposon-mediated genome rearrangement have been shown to occur in archaea (Redder and Garrett 2006; Allen et al. 2007). Information on IS elements and transposons in other acidophiles is more scattered, although some details are known for the Acidithiobacilli (Yates and Holmes 1987; Zhao and Holmes 1993; Clennel et al. 1995; Oppon et al. 1998; Holmes et al. 2001; Kondrat’eva et al. 2005; Tuffin et al. 2005; Kotze et al. 2006; Kondrat’eva et al. 2008) and the Leptospirilli (Tuffin et al. 2006; Goltsman et al. 2009).

There are 29 completely sequenced viral genomes from acidic environments (Table 5). In the Sulfolobales, a broad spectrum of viruses belonging to previously uncharacterized viral families has been sequenced and described (Prangishvili et al. 2001, 2006). Metagenomic studies have also reported the simultaneous sampling of microorganisms and co-occurring viruses (Andersson and Banfield 2008), providing first glimpses into the dynamics of virus–host interactions and into the effect such interactions may have on fine-scale genetic heterogeneity within communities. As noted above, the population genomic study of seven S. islandicus genomes (Reno et al. 2009) found that, over very recent evolutionary time, genome variation consists largely of strain-specific integration of MGEs and localized sectors of gene loss and gain, with gained material coming primarily from other Sulfolobus strains within the community. Although S. islandicus is not present in biomining operations, this evidence may be relevant to understanding the structure and evolution of bioleaching communities that contain related Archaea.

Concluding remarks

  • Bioinformatic interpretation of genome sequences has greatly enhanced our understanding of microbial metabolic potential in natural and anthropogenic acidic environments, including industrial bioleaching heaps. Most importantly, it has allowed preliminary models of metabolic and genetic interactions (ecophysiology) within these microbial communities to be constructed, and it has provided microbiologists with a rich intellectual resource that could open new and productive research avenues. Genomic approaches have been especially valuable given the dearth of information from classical genetic manipulation and other areas of experimental research.

  • Despite these promising beginnings, a major conclusion is that the genome projects have highlighted the tremendous effort still required to understand the biological principles that support life in extremely acidic environments, especially those principles that could guide engineers in improving the efficiency and rate of bioleaching and in protecting the environment.

  • Although deeper interpretation of existing genome data and analysis of more genomes will help, major new insights into the metabolic potential of bioleaching communities, and into how these communities change in space and time during the lifetime of a bioleaching operation, will probably come from metagenomics combined with high-throughput metatranscriptomic and metaproteomic studies and multidimensional data analysis. The ever-decreasing cost of DNA sequencing will allow more researchers in the bioleaching field to use high-throughput genomic techniques. This information, when linked to physicochemical studies of the bioleaching environment, might suggest operational parameters that could be manipulated to enhance bioleaching. Genomic and transcriptomic tools (e.g., microarray analysis) are likely to mature into routine tools for monitoring microbial presence and function during bioleaching.