Introduction

Over the course of evolution, microorganisms have developed a broad variety of genetic and physiological traits in response to their biotic and abiotic surroundings, which allow them to survive and proliferate in their habitats. In natural habitats, microorganisms usually do not exist as pure clonal populations but rather in communities of varying complexity, consisting of several to thousands of taxonomically different organisms. The number of prokaryotic taxa on earth has been estimated at 10⁶–10⁸ distinct genospecies (Simon and Daniel 2011), which exceeds the number of described isolated species (roughly 10⁴) by several orders of magnitude. This vast microbial diversity represents a huge natural resource of new genes, enzymes, and metabolic pathways that await investigation and, eventually, exploitation in biotechnological applications. However, molecular marker techniques have demonstrated that, depending on the habitat investigated, only a very small fraction of the organisms (for soil samples, approximately 0.1–1 %) can be cultivated using standard techniques (Amann et al. 1995). In recent years, new high-throughput “omics” methods have become available that allow genes, gene transcripts, proteins, and metabolites from environmental samples to be characterized without prior cultivation of the respective organisms. The collection of methods focused on the isolation, cloning, and sequencing of environmental DNA has been termed metagenomics (Handelsman et al. 1998). Extraction of bacterial DNA from environmental samples was reported more than three decades ago (Torsvik and Goksøyr 1978); it was subsequently used to study microorganisms (Olsen et al. 1986; Pace et al. 1986), to construct and screen environmental gene libraries, and to functionally express metagenomic genes (e.g., Healy et al. 1995; Handelsman et al. 1998).

Today, metagenomic methods are often used to characterize the composition and dynamics of microbial communities, e.g., by amplification, cloning, and sequence analysis of conserved marker genes or gene fragments, such as 16S rRNA gene sequences, from environmental DNA samples. However, metagenomic approaches are not restricted to phylogenetic biodiversity analysis; they also allow functional information to be retrieved directly from environmental samples, for example, the identification of novel enzymes for biotechnological applications, including biocatalytic reactions implemented within chemical synthesis routes (see Lorenz and Eck 2005; Steele et al. 2009). Along these lines, various metagenomic studies using soil, arctic sediment, hot spring, termite intestine, or cow rumen samples have been reported (Wang et al. 2012; Fu et al. 2013; Graham et al. 2011; Nimchua et al. 2012; Ferrer et al. 2012).

Genes encoding new and biotechnologically relevant enzymes can be retrieved from the environment by two different approaches. (1) Sequence-based strategies rely on methods such as PCR-based screening, hybridization of clone DNA with degenerate probes, or massive random sequencing of environmental DNA. Because a priori sequence information is needed, this approach can deliver only new variants of already known enzyme families; it fails to identify genes encoding truly novel enzymes unrelated to known ones (Liebl 2011; Steele et al. 2009). (2) Sequence-independent functional screening has the potential to uncover genes for enzymes and enzyme classes with little or even no homology to known enzyme families (e.g., Delavat et al. 2012). However, the success of function-based screening depends on the functional heterologous expression of metagenomic genes in a given host organism and on the availability of appropriate screening assays (Leis et al. 2013; Steele et al. 2009). In light of the recent decline in sequencing costs, combining sequence- and function-based searches is becoming increasingly attractive.

Previous reviews have dealt with various aspects of metagenomic studies such as methods of sampling, library construction, high-throughput sequencing, in silico sequence analysis and interpretation of metagenomic data, functional screenings, and the use of metagenomics for discovery of industrial biocatalysts and pharmaceutically relevant compounds (Daniel 2005; Delmont et al. 2011a, b; Ekkers et al. 2012; Leis et al. 2013; Lorenz et al. 2002; Lorenz and Eck 2005; Shokralla et al. 2012; Simon and Daniel 2011; Streit and Schmitz 2004; Taupp et al. 2011; Thomas et al. 2012; Wooley et al. 2010). Here, we look at metagenomics from a different angle and describe novel expression tools and alternative host organisms that we consider useful for metagenomic library construction and screening.

Microbial cloning and expression hosts play an important role in metagenomics for gene library construction, gene amplification, and, in the case of functional metagenomics, gene expression. Metagenomic libraries often comprise several Gbp of environmental DNA harboring millions of genes. Strikingly, however, most of this enormous biodiversity remains unused for the discovery and utilization of new proteins and metabolites because of drawbacks inherent to the most widely used expression host, Escherichia coli. To date, only very few reports describe attempts to use shuttle vectors to broaden the host range for functional screening (Aakvik et al. 2009; Angelov et al. 2009; Courtois et al. 2003; Craig et al. 2010; Kakirde et al. 2011; Martinez et al. 2004; Troeschel et al. 2010). Recent advances in “recombineering”, i.e., targeted homologous recombination mediated by phage-derived recombinases, now offer promising prospects for establishing further multihost systems for comparative functional screening in different hosts (Leis et al. 2013).

In the following, we discuss drawbacks of traditional microbial expression systems and report on recently developed strategies to improve existing systems or to establish new ones for functional metagenome analysis. In doing so, we aim to demonstrate the need for alternative screening hosts and the potential such hosts hold for exploiting metagenomic libraries.

Drawbacks of established expression systems

The vast majority of genes in any metagenomic library constructed from a complex microbial community is routinely cloned and screened in standard cloning and expression hosts, particularly in E. coli because of its ease of handling, efficient genetic manipulation, and the availability of sophisticated genetic tools. However, it should be noted that the functional expression of genes and operons is a complex molecular biological process, which involves transcription, translation, protein folding, and sometimes export or secretion. Thus, it is highly unlikely that one single expression host like E. coli will be able to functionally express all the heterologous genes and operons contained in a complex metagenomic library (Fig. 1). It can be assumed that close phylogenetic relatedness of the DNA donor strain and the expression host increases the probability of functional expression (Leis et al. 2013). However, in complex metagenomic libraries which can contain DNA from many different deep-branching phyla, the majority of genes originate from organisms unrelated to the screening host. Hence, functional metagenomic screenings for new enzymes can presently access only a small fraction of the tremendous genetic biodiversity. The main reasons include (1) a lack of efficient methods to construct and stably maintain large (meta)genomic gene libraries in prokaryotic organisms other than E. coli (Leis et al. 2013) and (2) the limited expression capacity of commonly used screening hosts due to missing promoter recognition, different codon usage, failure to correctly fold and assemble enzyme proteins, missing capacity for cofactor synthesis, etc.
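
Two of the expression barriers named above, G + C content and codon usage, can be quantified directly from an insert sequence. The Python sketch below computes the G + C fraction of a sequence and the fraction of codons assumed to be rare in E. coli. The codon set ASSUMED_RARE_CODONS is an illustrative assumption (roughly the codons for which commercial “rare tRNA” E. coli strains supply extra tRNAs), not a validated list, and both numbers are crude indicators at best: they say nothing about promoter recognition, protein folding, or cofactor synthesis.

```python
# Illustrative (assumed) set of codons that are infrequent in E. coli genes,
# written in the RNA alphabet. Not a validated or complete list.
ASSUMED_RARE_CODONS = frozenset({"AGA", "AGG", "AUA", "CUA", "CCC", "CGG", "GGA"})


def gc_content(seq: str) -> float:
    """Return the G + C fraction of a DNA or RNA sequence."""
    seq = seq.upper()
    if not seq:
        raise ValueError("empty sequence")
    return sum(base in "GC" for base in seq) / len(seq)


def rare_codon_fraction(cds: str, rare_codons=ASSUMED_RARE_CODONS) -> float:
    """Return the fraction of complete codons in an in-frame CDS that are 'rare'."""
    rna = cds.upper().replace("T", "U")  # compare codons in the RNA alphabet
    usable = len(rna) - len(rna) % 3     # ignore a trailing partial codon
    codons = [rna[i:i + 3] for i in range(0, usable, 3)]
    if not codons:
        raise ValueError("sequence is shorter than one codon")
    return sum(codon in rare_codons for codon in codons) / len(codons)


if __name__ == "__main__":
    toy_cds = "ATGAGGAGACTATAA"  # hypothetical toy ORF, not a real gene
    print(f"G+C content: {gc_content(toy_cds):.2f}")
    print(f"rare-codon fraction: {rare_codon_fraction(toy_cds):.2f}")
```

In the toy ORF, three of the five codons (AGG, AGA, CUA) fall into the assumed rare set.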

Fig. 1

Strategy to maximize the yield of genes of interest from functional screening of metagenomic libraries. Not all genes encoding an enzymatic function can be identified by sequence analysis, because in silico detection relies on similarity to known sequences. The same holds true for other sequence-based methods such as PCR or hybridization strategies using degenerate primers (not depicted in the figure). Therefore, completely novel genes for a given function will be missed by purely sequence-based searches. Function-based screening, on the other hand, also allows the detection of genes that lack similarity to previously known sequences encoding the function of interest. Owing to (1) the diverse nature of the cloned environmental DNA with respect to promoter sequences, G + C content, etc., and (2) differences in the gene expression machineries of the hosts, both the detection frequency and the nature of the identified genes can be expected to differ between the host organisms used in functional screening. Therefore, since no single host organism is able to express all genes present in a complex metagenome, combining several hosts can improve the yield.

In functional screening as practiced today, large libraries, often containing millions of cloned metagenomic genes, must be expressed in a high-throughput manner to identify only one or a few enzymes of interest, although hit rates differ depending on the type of enzyme sought (e.g., see Lorenz and Eck 2005; Lorenz et al. 2002). Because this process is highly labor-intensive and costly, the question arises how current metagenomic approaches can be improved to increase the yield of new genes for a desired function. Strategies that help achieve this goal include (1) the (parallel) use of different host organisms for metagenomic library screening, (2) improved expression of heterologous genes within a given host, and (3) the development of rapid, more sensitive screening assays for enzymatic functions of interest. In particular, alternative cloning and expression hosts need to be established, and sophisticated genetic tools and methods for their efficient genetic modification must be developed.
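
Strategy (1) can be made concrete with simple set arithmetic. The Python sketch below assumes that the same library has been screened in several hosts and that each positive clone is identified by a fosmid or well ID; the host names and clone IDs in the example are hypothetical. It reports the combined set of hits, the hits found only in each host, and how many hits the combination adds over the best single host.

```python
from __future__ import annotations


def compare_host_screens(hits_by_host: dict[str, set[str]]) -> dict[str, object]:
    """Summarize positive clones from parallel screens of one library in several hosts."""
    combined = set().union(*hits_by_host.values())
    unique_per_host = {}
    for host, hits in hits_by_host.items():
        # Hits detected by any other host in the comparison.
        found_elsewhere = set().union(
            *(other_hits for other, other_hits in hits_by_host.items() if other != host)
        )
        unique_per_host[host] = hits - found_elsewhere
    best_single = max((len(hits) for hits in hits_by_host.values()), default=0)
    return {
        "combined": combined,
        "unique_per_host": unique_per_host,
        "gain_over_best_single_host": len(combined) - best_single,
    }


if __name__ == "__main__":
    # Hypothetical fosmid IDs of positive clones; not real screening data.
    screens = {
        "E. coli": {"F01", "F07", "F12"},
        "T. thermophilus": {"F07", "F19", "F23", "F31"},
    }
    summary = compare_host_screens(screens)
    print(sorted(summary["combined"]))
    print(summary["gain_over_best_single_host"])
```

In the hypothetical example, the two hosts share a single positive clone, so the combined screen recovers two more hits than the better single host alone.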

In the following, a selection of specific examples is presented demonstrating the value of using novel microbial expression systems for the functional expression of metagenome-derived genes and operons.

Engineered E. coli strains equipped with foreign sigma factors

A recent transcriptome analysis of E. coli Epi 300 carrying different metagenomic fosmids suggested that transcription of metagenome-derived genes is a limiting step. E. coli Epi 300 (Epicentre, Madison, USA) is the most commonly used host for metagenomic fosmid libraries: it is part of a commercial kit that allows rather efficient fosmid cloning and has therefore been used by many labs. Using RNAseq in this strain background, we observed that many genes derived from bacteria phylogenetically distant from E. coli were transcribed less frequently than genes originating from closely related species (W.R.S., unpublished data). In bacteria, promoter recognition is carried out by the initiation factor σ, which recruits the RNA polymerase core enzyme to the promoter for transcription initiation. Bacteria encode a single housekeeping σ-factor and a variable number of accessory σ-factors that turn on transcription of specific sets of genes in response to environmental stimuli (Wösten 1998). Of the seven σ-factors known in E. coli, the housekeeping σ-factor is encoded by rpoD. The rpoD product is responsible for most transcription of essential genes during exponential growth and recognizes the typical −10 and −35 promoter motifs (Gruber and Gross 2003). Adding heterologous σ-factors to the E. coli genome may therefore broaden the range of promoters the host can transcribe. As a first step in this direction, we constructed E. coli strains harboring an additional rpoD gene from the phylogenetically distant bacterium Clostridium cellulolyticum. In initial tests, one of these strains, designated E. coli UHH01, showed a 20–30 % increase in detection frequency in functional metagenome screens for hydrolytic enzymes (i.e., lipases and amylases) on agar plates and in liquid media (W.R.S., unpublished data).
In these screens, the parent and the engineered strain detected different fosmid clones, suggesting that the additional rpoD gene elevated the transcription of functional genes other than those expressed in the parent strain. Although we cannot exclude that the foreign rpoD gene imposes additional stress on the engineered strain, the concept of adding rpoD genes appears promising: it has already allowed the detection of functional genes that would have been missed with the unmodified screening host. Beyond functional metagenomic screening, modifying transcription factors and introducing exogenous sigma factors into expression hosts, also called “transcriptional engineering”, can improve the productivity of valuable compounds in recombinant bacteria and even in industrial production hosts (Wang et al. 2014; Yu et al. 2008).
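
The dependence of the housekeeping σ-factor on −35 and −10 motifs can be illustrated with a consensus search. The Python sketch below scans a sequence for σ70-like promoters using the textbook E. coli consensus hexamers TTGACA (−35) and TATAAT (−10), separated by a spacer of 15 to 19 bp; the mismatch limit and spacer range are assumptions chosen for illustration. Real promoters often deviate strongly from the consensus, and features such as UP elements or extended −10 motifs are ignored, so this is not a promoter predictor. It only shows why promoters whose motifs differ from those recognized by E. coli σ70 can go unrecognized.

```python
from __future__ import annotations

MINUS35_CONSENSUS = "TTGACA"
MINUS10_CONSENSUS = "TATAAT"


def hamming(a: str, b: str) -> int:
    """Number of mismatching positions between two equal-length strings."""
    return sum(x != y for x, y in zip(a, b))


def find_sigma70_like_promoters(
    seq: str, max_mismatches: int = 2, spacers: range = range(15, 20)
) -> list[tuple[int, int, int, int]]:
    """Return (minus35_start, spacer_bp, minus10_start, total_mismatches) hits.

    max_mismatches is the total allowed over both hexamers (an assumption).
    """
    seq = seq.upper()
    box = len(MINUS35_CONSENSUS)  # both boxes are hexamers
    hits = []
    for i in range(len(seq) - box + 1):
        mm35 = hamming(seq[i:i + box], MINUS35_CONSENSUS)
        if mm35 > max_mismatches:
            continue
        for spacer in spacers:
            j = i + box + spacer  # start of the putative -10 box
            if j + box > len(seq):
                break  # larger spacers would run past the sequence end
            total = mm35 + hamming(seq[j:j + box], MINUS10_CONSENSUS)
            if total <= max_mismatches:
                hits.append((i, spacer, j, total))
    return hits


if __name__ == "__main__":
    # Hypothetical test sequence with a perfect consensus promoter (17 bp spacer).
    test_seq = "GC" * 5 + "TTGACA" + "A" * 17 + "TATAAT" + "GC" * 5
    for hit in find_sigma70_like_promoters(test_seq):
        print(hit)
```

Lowering max_mismatches makes the scan stricter; a GC-rich sequence without consensus-like hexamers yields no hits at all.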

Heterologous expression of large gene clusters

Bacteria produce numerous metabolites with high-value activities such as antibiosis, cytotoxicity, and immunosuppression (Newman and Cragg 2012). Their biosynthesis is encoded by clustered genes that are difficult to target by metagenome screening, because producing such metabolites in a heterologous host is hampered by several limitations: (1) co-expression of all relevant genes has to be achieved, (2) the host needs to produce functional enzymes, which may need to be assembled into higher-order enzyme complexes, and (3) an appropriate screening method must exist to identify the metabolite produced.

(1) The expression of clustered genes is challenging

The concerted functional expression of the many genes in large gene clusters is often limited because the original promoters are not necessarily recognized by the host RNA polymerase. Furthermore, flanking host-specific promoters only rarely drive complete transcription of metagenomic genes, because premature termination frequently occurs owing to long DNA templates or transcription termination signals. Moreover, gene clusters are often composed of several transcriptional units arranged in different orientations (Fischbach and Voigt 2010), which inevitably leaves some genes inaccessible to a single flanking host-specific promoter. As an alternative, T7 RNA polymerase (T7RP) has been suggested for the expression of clustered genes, as it is reported to be highly processive and to ignore bacterial transcription termination sites (Zhang et al. 2011; Ongley et al. 2013). In nonmetagenomic studies, the T7 system has already proven useful for directed heterologous expression of polyketide and other gene clusters in E. coli and Rhodobacter capsulatus (Zhang et al. 2011; Stevens et al. 2013; Ongley et al. 2013; Arvani et al. 2012). A recently established expression tool named TREX (Loeschcke et al. 2013) uses convergent T7RP-dependent transcription of a given DNA fragment, thereby enabling full transcription of all cluster genes irrespective of their orientation and operon structure. As a proof of concept, bidirectional transcription and metabolite production were shown for a carotenoid (6.9 kb) and a prodigiosin gene cluster (21.8 kb). Comparative expression studies demonstrated that TREX is applicable in different host organisms such as E. coli, Pseudomonas putida, and R. capsulatus.
Thus, to overcome these limitations at the transcriptional level, genetic tools that tune host RNA polymerases to recognize metagenomic promoters (see above), or alternative viral promoter/polymerase systems for the concerted expression of metagenomic genes, can help adapt standard and alternative bacterial expression hosts for functional metagenome analysis.
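
The effect of promoter placement described for TREX can be captured in a toy model. In the Python sketch below, a gene counts as transcribed if a promoter lies upstream of it in the same orientation and no terminator sits between the promoter and the gene end; passing no terminators models a polymerase that reads through them, as reported for T7RP. The gene names and coordinates are hypothetical, and the model deliberately ignores termination inside genes, transcript stability, and all other regulatory effects.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass(frozen=True)
class Gene:
    name: str
    start: int   # left coordinate (bp)
    end: int     # right coordinate (bp)
    strand: str  # "+" = read left-to-right, "-" = read right-to-left


def transcribed_genes(genes, promoters, terminators=()) -> set[str]:
    """Return names of genes reached in sense orientation by at least one promoter.

    promoters: iterable of (position, direction) with direction "+" or "-".
    terminators: positions that stop transcription; pass () to model a
    polymerase that reads through them.
    """
    covered = set()
    for pos, direction in promoters:
        for gene in genes:
            if gene.strand != direction:
                continue  # this promoter would only produce antisense RNA
            if direction == "+":
                if gene.start < pos:
                    continue  # gene lies upstream of this promoter
                blocked = any(pos < t < gene.end for t in terminators)
            else:
                if gene.end > pos:
                    continue
                blocked = any(gene.start < t < pos for t in terminators)
            if not blocked:
                covered.add(gene.name)
    return covered


if __name__ == "__main__":
    # Hypothetical three-gene cluster with mixed orientations.
    cluster = [
        Gene("geneA", 100, 900, "+"),
        Gene("geneB", 1000, 1800, "-"),
        Gene("geneC", 2000, 2900, "+"),
    ]
    # One rightward host promoter that stops at a terminator at 1950 bp.
    print(sorted(transcribed_genes(cluster, [(0, "+")], terminators=(1950,))))
    # Two convergent read-through promoters flanking the cluster.
    print(sorted(transcribed_genes(cluster, [(0, "+"), (3000, "-")])))
```

With the single host promoter, only geneA is reached; the two convergent read-through promoters reach all three genes, which is the qualitative point of the convergent design.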

(2) Suitability of the host metabolic background is largely unpredictable

The host organism provides the critical background for successful metabolite production from metagenome-derived metabolic pathways. Functional enzymes must be synthesized, which requires appropriate codon usage and a folding machinery; suitable precursor molecules must be supplied; and intermediates and end products must persist without being toxic to the host. These highly complex processes can produce very different outcomes for different pathway/host combinations. Accordingly, directed heterologous expression of known pathway-encoding gene clusters, such as those for enterocin AS-48 from Enterococcus faecalis, isomigrastatin from Streptomyces platensis, or violacein from Duganella sp., has shown that results differ depending on the host organism (Fernández et al. 2007; Feng et al. 2009; Yang et al. 2011; Jiang et al. 2010). In addition, comparative screenings of metagenomic libraries in different hosts yielded significantly more positive hits than screening in a single host. For example, Craig et al. screened for phenotypes such as pigmentation and antibiosis using Agrobacterium tumefaciens, Burkholderia graminis, Caulobacter vibrioides, E. coli, P. putida, and Ralstonia metallidurans, each of which gave different results (Craig et al. 2010). In another study, a metagenomic library was functionally screened in Streptomyces lividans for phenotypes such as hemolytic activity and pigment production; none of the positive clones produced the screened phenotype when tested in E. coli (McMahon et al. 2012). In this context, it is again worth mentioning that the highly variable results of function-based screening are usually caused by host-specific differences at both the expression and the metabolic level.

(3) Detection of novel metagenomic activities

Naturally, the assay determines the outcome of any screening. Hence, function-based metagenome screenings for novel metabolites have focused on defined and easily detectable phenotypes, such as antibiotic resistance, antibiotic activity, morphological changes, or pigmentation. To expand the range of targeted compounds, screening hosts can be genetically modified to enable simple high-throughput screening methods. For example, an elegant E. coli-based colorimetric screening method for terpene synthases was recently developed that can be used to identify terpenoid pathways (Furubayashi et al. 2014). Another strategy, combining sequence- and function-based screening of metagenomic libraries, led to the identification of tryptophan dimer biosynthesis clusters in E. coli (Chang and Brady 2011; Chang and Brady 2014). Nevertheless, new screening approaches are needed to uncover more of the microbial chemical world.

Metagenomic strategies can be employed successfully to identify novel enzymes with biocatalytic potential; hydrolases and oxidoreductases are of special interest in this respect. Appropriate screening assays are needed for their detection (Franken et al. 2010; Reymond 2006). Among them, fluorimetric assays have the advantage of higher sensitivity compared with chromogenic ones, which is particularly important given the usually moderate expression levels observed in metagenomic libraries. The most common fluorogenic probe is umbelliferone, which has been used in various substrate designs for screening (Reymond 2006, 2009). The fluorogenic moiety of the respective substrate molecules (ROUmb) is located either close to the site of the enzymatic reaction (entries 1–5 in Scheme 1) or remote from it (entries 6–10); substrates of the latter type are significantly more stable, which drastically reduces the frequency of false-positive hits.

Scheme 1

Umbelliferyl substrates (ROUmb) can be used for various screenings while maintaining high sensitivity. The umbelliferonate (UmbO) can be released directly by hydrolysis (entries 1 + 2) or by spontaneous (entries 3–7) or triggered (entries 8–10) secondary reactions. The primary enzymatic reaction is indicated by the arrow.

These fluorogenic screens are highly parallelizable using microtiter plates and robotic liquid-handling systems. The step from high- to ultrahigh-throughput screening was recently demonstrated (Ruff et al. 2012) with an umbelliferone-based monooxygenase screening system using fluorescence-activated cell sorting (FACS). This method allows 10,000 single cells to be tested per minute, thus enabling the screening of large metagenomic libraries, and further highlights the excellent signal-to-noise ratio of fluorescence probes.
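
The throughput figure quoted above translates into screening time with simple arithmetic. The Python sketch below uses the rate of 10,000 cells per minute from the text; the library size, the oversampling factor (how many cells are analyzed per clone), and the 384-well plate format used for comparison are assumptions for illustration.

```python
import math


def facs_hours(library_size: int, oversampling: float,
               cells_per_minute: float = 10_000) -> float:
    """Hours needed to analyze library_size * oversampling cells at a fixed rate."""
    return library_size * oversampling / cells_per_minute / 60


def plates_needed(library_size: int, wells_per_plate: int = 384) -> int:
    """Microtiter plates needed for one clone per well (control wells not counted)."""
    return math.ceil(library_size / wells_per_plate)


if __name__ == "__main__":
    library = 1_000_000  # assumed library size (clones)
    print(f"FACS: {facs_hours(library, oversampling=10):.1f} h")
    print(f"384-well plates: {plates_needed(library)}")
```

For an assumed library of 10⁶ clones with 10-fold oversampling, sorting takes about 17 h at the quoted rate, whereas screening one clone per well would require about 2,600 384-well plates.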

High-throughput screening can also be carried out conveniently on agar plates containing chromogenic substrates; such assays have been described for various hydrolases (see, e.g., Topakas and Christakopoulos 2014; Jaeger and Kovacic 2014) and also for laccases, which can degrade, e.g., lignin and xenobiotics in waste water. Golyshin and co-workers demonstrated the use of colorimetric assays to identify a novel laccase from a mammalian ruminal metagenome (Beloqui et al. 2006).

The above-mentioned screening systems allow for the functional identification of novel biocatalytic activities within metagenomic libraries. However, it should be noted that the detailed biochemical characterization of an enzyme still requires time and often high-end specialized equipment, especially when addressing the issue of enantioselectivity (Franken et al. 2010).

Besides modifying E. coli strains and developing new tools and detection systems, expanding the range of expression systems beyond this traditional host, by establishing phylogenetically diverse new expression hosts and by using more than one host to screen metagenomic libraries, can help overcome the drawbacks mentioned above. If sophisticated genetic tools and efficient methods for genetic modification are available, such hosts can be used in high-throughput functional screening strategies to increase the detection frequency of genes of interest (Fig. 1). Examples of novel host bacteria used in the authors’ groups (Table 1) are briefly described in the following paragraphs.

Table 1 Characteristics of alternative host bacteria described in this work

Thermus thermophilus, a thermophilic host bacterium

In recent years, the extremely thermophilic bacterium T. thermophilus has been developed as a host for large-insert library construction and functional screening of genomic and metagenomic libraries at elevated temperatures. T. thermophilus is a heterotrophic, aerobic, Gram-negative representative of the Deinococcus-Thermus phylum that grows at temperatures up to 85 °C. Genome sequences of T. thermophilus strains have been reported, and some genetic tools for cloning, selection and counterselection, genome modification, and inducible gene expression are available (see Cava et al. 2009; Angelov et al. 2009; Angelov et al. 2013; Liebl 2004). Importantly, T. thermophilus cells are highly and constitutively competent for natural transformation and do not discriminate with respect to the source of externally added DNA, which enables efficient introduction of heterologous DNA and genetic modification.

An important issue in functional metagenomic screening approaches is the decision about the insert sizes used for library construction. Small fragments (<15 kb) are cloned into plasmid vectors while high-molecular weight DNA can be used for cloning into fosmids (up to 40 kb) or BACs (up to 200 kb). In small-insert libraries, each clone carries only a few genes, but by using high-copy number replication origins and sometimes strong vector promoters the detection of even weakly expressed genes and weakly active enzymes can be enhanced. In contrast, large-insert libraries carry many genes on each insert but heterologous expression must be driven mainly by native promoters located on the insert. For E. coli, various cosmid, fosmid, and BAC vectors with single-copy origin or alternatively inducible multicopy origins are available (see Leis et al. 2013).

Large DNA fragments naturally carry more metagenomic information than small inserts; therefore, in theory, fewer clones from a library must be screened with functional assays, and complete gene clusters can be expressed. However, the tradeoff for fewer assays is the risk that not all promoters on the metagenomic inserts will be active in the host’s transcriptional background (Liebl 2011). In addition, other factors such as G + C content, Shine-Dalgarno sequences, and codon usage can bias the heterologous expression of metagenomic DNA to varying degrees; unfortunately, such effects of the host’s expression apparatus have not been studied systematically.
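
The statement that fewer large-insert clones need to be screened can be quantified with the Clarke-Carbon formula, N = ln(1 − P) / ln(1 − f), where N is the number of clones required to include any given locus with probability P, and f is the insert size divided by the genome size. The Python sketch below applies it to a single target genome of an assumed size of 5 Mb, with illustrative insert sizes for plasmid, fosmid, and BAC libraries. The formula assumes random, unbiased cloning of one genome; in a metagenome, each community member is represented roughly in proportion to its abundance, so the required clone numbers for rare members are much larger.

```python
import math


def clones_for_coverage(insert_kb: float, genome_kb: float,
                        probability: float = 0.99) -> int:
    """Clarke-Carbon estimate of clones needed to contain any given locus."""
    fraction = insert_kb / genome_kb
    if not 0 < fraction < 1:
        raise ValueError("insert must be smaller than the genome")
    if not 0 < probability < 1:
        raise ValueError("probability must lie strictly between 0 and 1")
    return math.ceil(math.log(1 - probability) / math.log(1 - fraction))


if __name__ == "__main__":
    GENOME_KB = 5_000  # assumed ~5 Mb target genome (illustrative)
    for vector, insert_kb in [("plasmid", 5), ("fosmid", 40), ("BAC", 100)]:
        print(vector, clones_for_coverage(insert_kb, GENOME_KB))
```

Under these assumptions, about 4,600 plasmid clones (5 kb inserts) but only about 570 fosmid clones (40 kb) or about 230 BAC clones (100 kb) are needed for 99 % coverage of the target genome.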

The possibility of conveniently constructing large-insert libraries from high-molecular-weight metagenomic DNA, e.g., using commercial fosmid vectors with cos sites for packaging the ligated DNA into λ phage particles prior to infection of E. coli host cells, is a major advantage of the E. coli cloning system that is not available for other host bacteria. For the thermophilic host T. thermophilus, tools are now available that allow recombinant fosmid inserts to be transferred from E. coli to T. thermophilus. To this end, a fosmid library is first constructed in E. coli using the two-host fosmid vector pCT3FK (Angelov et al. 2009), which carries an antibiotic resistance marker selectable in the thermophilic host as well as DNA fragments flanking the chromosomal pyrE gene of T. thermophilus HB27. Recombinant fosmids isolated from E. coli library clones are introduced into T. thermophilus by natural transformation, where site-specific integration into the chromosome occurs via homologous recombination at the pyrE locus. In proof-of-principle studies, this two-host fosmid vector system was used for the comparative functional screening, in T. thermophilus as well as E. coli, of fosmid libraries constructed from chromosomal DNA of two thermophilic species, Spirochaeta thermophila and Thermus brockianus. Corresponding clones of both hosts (in E. coli each insert is carried on the recombinant fosmid, whereas in T. thermophilus the identical insert is integrated into the host’s chromosome) were screened for hydrolase activities using plate assays. In both cases, more active clones were found with T. thermophilus than with E. coli (Angelov et al. 2009; Leis et al. 2013).

Pseudomonas antarctica, a psychrophilic host bacterium

Heterologous expression of proteins in the mesophilic host E. coli is often hampered by protein instability or toxicity. As many enzymes have relatively low activity in the psychrophilic temperature range, expression at low temperatures can be advantageous when enzymes have a harmful effect on the host’s metabolism, cell wall, or membrane. Therefore, a psychrophilic expression host was developed in the laboratory of one of the authors (W.R.S.). P. antarctica strain Shivaji CMS 35 is a nonpathogenic, free-living Gram-negative bacterium phylogenetically related to P. fluorescens and other pseudomonads. The psychrophilic bacterium grows in common Luria-Bertani (LB) and CASO broth between 4 and 30 °C, with an optimum growth temperature of 22 °C (Reddy et al. 2004). It possesses only weak endogenous lipase activity and can utilize adonitol, meso-erythritol, D-galactose, D-glucose, glycerol, meso-inositol, and D-mannitol as carbon sources, but not, e.g., D-cellobiose, lactose, D-maltose, or sucrose (Reddy et al. 2004). Fortunately, it is sensitive to most antibiotics commonly used for cloning, and it accepts and replicates commonly used broad-host-range vectors such as pBBR1MCS-5 (Kovach et al. 1995). Protocols were established that allow easy transformation of P. antarctica with plasmids by heat shock and electroporation; vector uptake was confirmed by molecular methods, in particular plasmid isolation and PCR. To investigate the expression of functional enzymes in the psychrophilic bacterium, six different metagenome-derived lipase and esterase genes, cloned in pBBR1MCS-5 under control of the lac promoter, were transformed into the strain, which was then grown at 22 °C without further induction.
Activity assays on agar plates containing tributyrin and olive oil/rhodamine B as substrates revealed lipolytic activity in the crude cell extracts that was considerably higher than the weak endogenous lipase activity of the wild type (unpublished work).

The genome of P. antarctica has been sequenced: it has an overall size of ~6.3 Mb and a G + C content of 59.6 %, and it encodes several secretion systems, including a complete set of genes for the assembly of a type II secretion machinery (Chow 2012, and own unpublished data). With its ability to grow at low temperatures, its easy transformability, and its physiological properties, P. antarctica has the potential to become a promising expression and screening host for a variety of proteins that cannot easily be expressed in E. coli.

R. capsulatus, a facultative phototrophic host bacterium

The heterologous expression of membrane proteins is a major concern, in particular for biomedical research since nearly 70 % of the available drugs are either directly or indirectly targeting human membrane proteins (Lundstrom 2007). Furthermore, many enzymes of microbes, plants, and mammals that are involved in the synthesis and functionalization of hydrophobic natural compounds, such as fatty acids and terpenes, are either peripheral or intrinsic membrane proteins. However, the intricate nature of membrane proteins often hampers their structural and functional studies because commonly used expression hosts like E. coli are in general optimized for the production of soluble proteins (Schlegel et al. 2010). Consequently, the activity of the membrane protein folding and translocation machinery as well as the intrinsic storage capacity of the host’s membrane is commonly not appropriate or sufficient for foreign membrane proteins produced at high amounts (e.g., Wagner et al. 2006; Nannenga and Baneyx 2011). As a result of these limitations, heterologous expression of membrane proteins often leads to the formation of inclusion bodies consisting of misfolded membrane proteins or it is toxic to the host cell. Therefore, the development of alternative expression hosts is key for the function-based identification and production of novel membrane-bound proteins and enzymes.

R. capsulatus is a photosynthetic Gram-negative α-proteobacterium that has been used as a model organism for decades to study the regulation and function of anoxygenic photosynthesis as well as CO2 and N2 fixation (e.g., Wu and Bauer 2008; Gregor and Klug 2002; Tichi and Tabita 2002; Masepohl and Hallenbeck 2010). Besides the photoautotrophic growth mode, in which R. capsulatus uses carbon dioxide and dinitrogen as sole C and N sources, its metabolic versatility enables this bacterium to grow under a broad range of conditions in the light and in the dark. Because of its facultative phototrophic nature, R. capsulatus is a promising alternative expression host that is particularly suited for the functional expression of heterologous membrane-bound enzymes and, in turn, for the catalytic conversion and storage of hydrophobic substrates and products: phototrophic growth conditions induce intracellular differentiation of the inner membrane, leading to the formation of membrane vesicles that house the photosynthetic apparatus. These vesicles also provide an intrinsically high capacity for folding and incorporating recombinant membrane proteins and can further serve to accumulate catalytically converted hydrophobic compounds. These properties are a prerequisite for the rapid identification and functional overexpression of novel membrane proteins of biotechnological interest.

For the heterologous expression of single and multiple target genes, a set of broad-host-range tools has been developed that allows comparative expression studies in R. capsulatus under different growth conditions as well as in other Gram-negative bacteria, including E. coli and P. putida (Katzke et al. 2010; Katzke et al. 2012; Arvani et al. 2012; Loeschcke et al. 2013). The toolbox comprises replicative broad-host-range expression vectors (termed pRho plasmids) and cassettes for chromosomal integration (ΩSp-PT7 and TREX, see above). These carry either the bacterial aphII promoter for constitutive, moderate expression or the bacteriophage T7 promoter for inducible, T7 RNA polymerase (T7RP)-mediated high-level expression of target genes. Because its activity is independent of both inducer and T7RP, the aphII promoter is primarily useful for parallelized high-throughput screening approaches in various Gram-negative expression hosts. In contrast, T7RP-dependent promoters require appropriate host strains but, as outlined above, allow the concerted expression of multiple target genes, such as those located on a metagenomic DNA fragment or within a cluster of functionally coupled genes.

The R. capsulatus T7 expression system has already been used to express soluble recombinant proteins under heterotrophic and phototrophic conditions. These include the yellow and the flavin-binding fluorescent proteins, produced at yields of up to 80 mg l−1 of culture (Drepper et al. 2007; Katzke et al. 2010), and the light-dependent protochlorophyllide oxidoreductase from the marine phototrophic bacterium Dinoroseobacter shibae (Kaschner et al. 2014). Furthermore, the functional expression of microbial and human membrane proteins in R. capsulatus, including membrane-bound enzymes (e.g., P450 monooxygenases) and receptors (e.g., rhodopsins), was successfully demonstrated (Malach, Özgür, Heck, Jaeger & Drepper, unpublished data). Finally, the T7 expression toolbox was employed for the concerted expression of naturally clustered genes, including the [NiFe] hydrogenase gene cluster from R. capsulatus (Arvani et al. 2012) and the crt gene cluster from Pantoea ananatis (Loeschcke et al. 2013).

Gluconobacter oxydans, a specialized host for the expression of membrane dehydrogenases

The prerequisites for in vivo expression and screening of membrane-bound enzymes from metagenomic DNA have recently been established in the acetic acid bacterium G. oxydans. Acetic acid bacteria are acid-tolerant aerobic bacteria known for their distinctive metabolic lifestyle. They use membrane-bound dehydrogenases that depend on pyrroloquinoline quinone (PQQ) or flavin adenine dinucleotide (FAD) for the incomplete oxidation of alcohols, aldehydes, polyols, sugars, and sugar derivatives. These membrane dehydrogenases oxidize their substrates on the outer surface of the cytoplasmic membrane in a stereo- and regio-specific manner and feed the electrons directly into the respiratory electron transport chain. Acetic acid bacteria are currently used in various efficient whole-cell biocatalytic processes for the production of bulk and speciality chemicals such as organic acids, erythrulose, dihydroxyacetone, and pharmaceuticals. Following the establishment of efficient genetic tools for chromosomal gene insertion and replacement (Peters et al. 2013a; Kostner et al. 2013), G. oxydans strains have been constructed by step-by-step markerless deletion of all major membrane-bound dehydrogenases (Peters et al. 2013b). These G. oxydans multideletion strains have been successfully used to express heterologous membrane dehydrogenase genes isolated from metagenomes of mother-of-vinegar microbial communities containing acetic acid bacteria (Peters, Liebl and Ehrenreich, unpublished work). Functional expression of such metagenomic membrane dehydrogenases in the multideletion strain allows rapid and detailed in vivo characterization of their substrate specificity using a sensitive whole-cell activity assay (Peters et al. 2013b).

Conclusion

Today, metagenomic techniques are applied to characterize the composition of microbial communities from environmental samples and to investigate the abundance of marker genes indicative of certain physiological traits. From a biotechnological perspective, metagenomics is a key methodology for identifying novel genes that encode single biocatalysts or entire biochemical pathways, enabling the production of novel enzymes and valuable metabolites. Accessing this vast natural biodiversity will require advanced, high-throughput-compatible gene expression tools, including alternative and broadly applicable microbial expression systems. Used in combination, such systems can increase the yield of genes of interest from functional screening of (meta)genomic libraries.