2.1 Introduction

Since its introduction in the late 1990s (Handelsman et al. 1998), metagenomics has proven to be a powerful tool for describing microbial communities and their metabolic potential irrespective of cultivability. Over the years, both sequence- and function-based screening approaches have led to the discovery of numerous new enzymes and metabolites that meet various academic and industrial needs (Ferrer et al. 2015; Fernandez-Arrojo et al. 2010; Novakova and Farkasovsky 2013). The Functional Metagenomics pipeline covers several steps (Fig. 2.1): sampling, isolation of high-quality environmental DNA (eDNA), cloning of the eDNA (including vector design), metagenomic library construction (including host transformation and library transfer), heterologous gene expression, and production of functional molecules in amounts sufficient for detection in high throughput screening. The function-based screening route of metagenome-based bioprospecting thereby complements the sequence-based route. In the sequence-based route, eDNA is sequenced using next-generation sequencing methods, and the resulting sequence datasets are mined bioinformatically for genes of interest (Lewin et al. 2013).

Fig. 2.1

Graphical representation of the Functional Metagenomics biodiscovery pipeline with its key challenges and potential solutions

Irrespective of the chosen screening route, successful bioprospecting of a metagenomic library starts with the isolation of eDNA. Its quality and quantity largely determine the achievable number of clones in the constructed library. They therefore also determine how well the biodiversity of an environmental sample is represented (Zhou et al. 1996). To capture as much of this biodiversity as possible, the DNA isolation procedure must efficiently recover DNA from the diverse microorganisms inhabiting the selected environment (Kakirde et al. 2010). In addition, the isolated DNA must be of high purity and free of contaminating substances. One example is the humic acids that are often present in soil and hamper efficient library construction (Tebbe and Vahjen 1993). Several studies document eDNA isolation procedures that yield contamination-free high molecular weight (HMW) DNA (Zhou et al. 1996; Brady 2007; Liles et al. 2008; Pel et al. 2009; Cheng et al. 2014). Contaminating compounds co-isolated with the eDNA can also be removed by gel electrophoretic methods, including conventional (Craig et al. 2010), pulsed-field (Cheng et al. 2014), or nonlinear electrophoresis (Pel et al. 2009). The randomly fragmented DNA is then size-selected prior to cloning.

New and improved enzyme discovery is currently the largest field of application for Functional Metagenomics tools. Aside from the catalytic function itself, industrial applications often require beneficial properties such as robustness under harsh conditions or high activity at low temperatures. Consequently, different environments may serve as eDNA sources, depending on the aims of a bioprospecting approach (Taupp et al. 2011). The microbial habitat to be sampled usually reflects the desired properties. For example, a metagenomic library from a thermal vent or a hot deep subsurface oil reservoir is more likely to yield hits in a thermostable enzyme screen than a library from a glacier or permafrost soil. Such directed sampling strategies, which aim to increase the probability of finding the desired properties, have proven successful in many cases (Vester et al. 2015; Taupp et al. 2011). Selected examples, among many others, are a cold-adapted esterase from Antarctic desert soil (Hu et al. 2012), hydrolytic enzymes from the cow rumen metagenome (Ferrer et al. 2007), and thermostable lipolytic enzymes from water, sediment, and biofilm samples from the Azores, Portugal (Leis et al. 2015a). However, microbial density is often low in some environments, particularly extreme ones, so sufficient DNA yields may not be readily obtainable (Kennedy et al. 2008; Vester et al. 2015; Kotlar et al. 2011). In such cases, isolated metagenomic DNA can be subjected to isothermal amplification, such as Phi29-based whole genome amplification (WGA), to increase DNA yields prior to cloning (Rodrigue et al. 2009; Zhang et al. 2006). However, this technology is prone to amplification artifacts such as chimeras, duplications, and inversions, and these need to be considered. WGA is therefore well suited for small-insert libraries aimed at enzyme discovery. It is less suitable for large-insert library cloning, where intact biosynthetic gene clusters are targeted.

Following sampling and successful isolation, the eDNA is usually either sequenced directly or cloned into suitable vectors for functional screening (Sect. 2.2). The choice of vector usually depends on the envisioned eDNA insert sizes as well as on the screening targets and methodology. However, several additional factors determine whether the genetic information in metagenomic DNA libraries is successfully expressed (Fig. 2.1). Suitable vector systems need to carry host-compatible selection markers and replicate stably and autonomously, ideally with the possibility to control the copy number. They may contain functional gene regulatory elements, such as inducible promoters for high-level expression, and should preferably enable vector transfer to other host organisms. Suitable host organisms, in turn, need to support the vector elements involved in product formation and allow efficient transcription and translation (Sect. 2.3). Several further requirements apply:

- proper protein folding;
- cofactor supply where required;
- sufficient precursor availability for metabolite formation;
- means for nontoxic product localization, such as secretion mechanisms.

Functional expression also places different demands on the system, such as codon usage, assay temperature, and precursor requirements (Lam and Charles 2015; Uchiyama and Miyazaki 2009). Different approaches can be applied to meet these demands and maximize the probability of successful expression (Fig. 2.1). E. coli systems designed and optimized for this purpose have so far been the most widely used and are covered extensively elsewhere (Guazzaroni et al. 2015). This chapter therefore summarizes developments in host organisms and heterologous expression systems for functional metagenome screening beyond those available for E. coli.

2.2 Cloning and Expression Vectors for Environmental DNA

Four factors largely guide the selection of a vector system for random metagenomic library construction (Sect. 2.2.1):

(1) the expected size of the DNA encoding the targeted compound of interest;
(2) the envisioned screening approach, involving one or more expression hosts;
(3) the desired library design;
(4) on some occasions, the quantity of DNA available.

For the identification of new enzymes, small-insert libraries with eDNA fragments of 5–10 kb will in most cases be enough to obtain a large number of complete gene sequences. Isolating DNA for small-insert libraries is normally straightforward, since DNA shearing is not a major concern. However, a library with an average insert size of 10 kb will require 3–20 times more clones than a library with inserts of 30–40 kb to cover the same amount of genetic potential (Sabree et al. 2009). Comparably larger amounts of DNA are therefore needed. Small-insert libraries are normally sufficient to identify functions encoded by single genes or small gene loci (Kakirde et al. 2010; Sabree et al. 2009). Examples include enzyme functions and the genetic determinants of antibiotic resistance (Riesenfeld et al. 2004). However, when a desired function depends on multiple gene products, libraries harboring larger inserts are needed. These are normally constructed as cosmid or fosmid libraries (30–40 kb inserts) or as bacterial artificial chromosome (BAC) libraries (100 kb or more). Constructing comprehensive large-insert libraries can be very laborious, both in isolating HMW DNA and in successfully cloning and transforming the host. In addition, the lower stability of large inserts in the generated library needs to be considered. Furthermore, DNA with a low guanine + cytosine (G + C) content tends to be more degraded, and some DNA modifications can impair cloning of HMW DNA. Both effects can bias the composition of large-insert libraries (Danhorn et al. 2012).
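The classical Clarke–Carbon estimate gives a rough sense of the clone numbers involved. It states N = ln(1 − P) / ln(1 − f), where P is the desired probability that a given locus is present in the library and f is the fraction of the target DNA carried by one insert. The sketch below is only an illustration. It assumes a hypothetical target of 100 Mb of genetic material and inserts drawn randomly and evenly from it. Real metagenomes have uneven species abundances and will need considerably more clones. The function name `clones_needed` is also hypothetical.

```python
import math


def clones_needed(insert_kb: float, target_mb: float, probability: float = 0.99) -> int:
    """Clarke-Carbon estimate of the number of clones needed so that a given
    locus is present in the library with the requested probability.

    Assumes inserts are drawn randomly and evenly from a target of fixed size.
    """
    if not 0.0 < probability < 1.0:
        raise ValueError("probability must lie strictly between 0 and 1")
    # Fraction of the target DNA carried by a single insert.
    fraction = (insert_kb * 1e3) / (target_mb * 1e6)
    if not 0.0 < fraction < 1.0:
        raise ValueError("insert must be smaller than the target")
    # log1p(-x) computes ln(1 - x) accurately for the small fractions involved.
    return math.ceil(math.log1p(-probability) / math.log1p(-fraction))


if __name__ == "__main__":
    TARGET_MB = 100  # hypothetical amount of genetic material to cover
    for insert_kb in (10, 35, 100):
        n = clones_needed(insert_kb, TARGET_MB)
        print(f"{insert_kb:>4} kb inserts: ~{n:,} clones for 99% probability")
```

Under these idealized assumptions, the required clone number scales almost inversely with average insert size. A 10 kb library therefore needs roughly four times as many clones as a 40 kb library.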

The choice of vector system usually also depends on which expression host organisms are available for subsequent screening (Sect. 2.3). Moreover, for some targets, screening in multiple hosts can increase hit rates (Mullany 2014). Library transfer and broad host-range capabilities of an expression vector (Sect. 2.2.2) can therefore be desirable characteristics (Craig et al. 2010; Aakvik et al. 2009; Kakirde et al. 2010).

2.2.1 Small- and Large-Insert Random Cloning Vectors

Cloning vectors used for small-insert metagenomic library construction usually contain a defined promoter for transcription of the inserted DNA sequence. Some carry two promoters (dual-promoter vectors), one on each side of the cloning site, so that genes are expressed regardless of insert orientation (Lammle et al. 2007). The two promoters can be controlled by different, independent induction mechanisms, so that expression occurs in only one direction at a time. This prevents the formation of mRNA duplexes, which may reduce protein production (Lale et al. unpublished). Standard cloning vectors are frequently used to construct small-insert metagenomic libraries in Escherichia coli as the primary host organism (Mullany 2014; Sabree et al. 2009). These include pUC derivatives, pBluescript SK(+), and pTOPO, or their derivatives.

Large-insert libraries are required if metagenomic library clones are to cover entire biosynthetic pathways, such as secondary metabolite gene clusters. Such libraries can be generated as cosmids or fosmids, by phage packaging of eDNA ligated to the respective vector fragment. For very large inserts (up to 100 kb or more), BACs are used instead (Kakirde et al. 2010; Danhorn et al. 2012). Fosmid and cosmid cloning vectors carry inserts of 30–40 kb, and both approaches use phage-based transfer of the cloned DNA into the host, usually E. coli. Consequently, the resulting library clones carry inserts within a narrow size range, determined by the packaging capacity of the phage particle. They generally rely on gene expression from promoters included in the cloned insert. Cosmids are plasmids carrying the cos sequences of phage λ, whereas fosmids additionally rely on the F-factor replicon of E. coli. Compared with cosmids, fosmids are more tightly regulated with respect to copy number and are hence more stable (Kim et al. 1992; Kakirde et al. 2010). Both cosmid and fosmid vectors carry antibiotic resistance markers, and variants with broad host-range capabilities have been developed (Craig et al. 2010; Cheng et al. 2014; Aakvik et al. 2009; Wexler et al. 2005). Because both systems are frequently used for metagenomic library construction, several variants, including commercial ones, are available (Lam et al. 2015; Mullany 2014; Kim et al. 1992; Parks and Graham 1997; Li et al. 2011; Terron-Gonzalez et al. 2013).
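The narrow insert size range of cosmid and fosmid clones follows from a physical limit: phage λ heads package DNA only within a limited length window between the cos sites. The sketch below uses commonly cited approximate limits of about 78–105% of the 48.5 kb wild-type λ genome. It also assumes a hypothetical 8 kb vector backbone. Both values are illustrative assumptions, and the function names are hypothetical. With these values, the packageable insert window comes out at roughly 30–43 kb.

```python
from typing import Tuple

LAMBDA_GENOME_KB = 48.5                  # approximate length of wild-type phage lambda DNA
MIN_FRACTION, MAX_FRACTION = 0.78, 1.05  # assumed packaging limits relative to lambda


def insert_window_kb(vector_kb: float) -> Tuple[float, float]:
    """Approximate insert size range (kb) that can be packaged with a given vector backbone."""
    low = MIN_FRACTION * LAMBDA_GENOME_KB - vector_kb
    high = MAX_FRACTION * LAMBDA_GENOME_KB - vector_kb
    if high <= 0:
        raise ValueError("vector backbone alone exceeds the packaging limit")
    return max(low, 0.0), high


def is_packageable(insert_kb: float, vector_kb: float) -> bool:
    """True if vector plus insert fall within the assumed packaging window."""
    low, high = insert_window_kb(vector_kb)
    return low <= insert_kb <= high


if __name__ == "__main__":
    VECTOR_KB = 8.0  # hypothetical fosmid/cosmid backbone size
    low, high = insert_window_kb(VECTOR_KB)
    print(f"packageable inserts: {low:.1f}-{high:.1f} kb")
    for size in (10, 35, 60):
        print(size, "kb:", is_packageable(size, VECTOR_KB))
```

This simple window explains why too-small fragments are not packaged and too-large ones are excluded. Packaging therefore works as a built-in size selection step.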

For random cloning of very large inserts, from 40–100 kb and above, BACs based on the F-factor replicon are normally used (Danhorn et al. 2012; Shizuya et al. 1992). BAC vectors have been used in several metagenomic studies (Brady 2007), e.g., of soil samples (Rondon et al. 2000) and murine bowel microbiota (Yoon et al. 2013). As with fosmids and cosmids, different BAC systems are available. Some allow inducible high copy numbers (Mullany 2014; Warburton et al. 2009; Wild et al. 2002), and some have broad host-range capability (Mullany 2014; Aakvik et al. 2009; Kakirde et al. 2010). The US-based company Lucigen Corp. (Madison, WI; www.lucigen.com) has developed dedicated broad host-range vector systems for use in Functional Metagenomics. The pBAC-SBO and pSMART-BAC-S vectors both support efficient library construction in E. coli and are transferable to both Gram-positive and Gram-negative hosts. They allow selection in several host organisms and gene expression from both insert-flanking regions, and their copy number is inducible (see Chap. 1). The pSMART-BAC-S vector can only be maintained by integration into the host genome. The pBAC-SBO vector allows both chromosomal integration and extrachromosomal propagation in the recipient.

Cloning DNA under superhelical stress into circular plasmids can be challenging. Such stress can arise, e.g., from regions dense in tandem and/or inverted repeats. For such cases, linear plasmids such as the pJAZZ vector series (Lucigen) have been designed (Godiska et al. 2010). They can carry large DNA inserts, and transcriptional terminators flank the cloning site to prevent transcriptional interference between vector and insert.

2.2.2 Broad Host-Range Expression Vectors

Depending on the desired activity, functional screening in different (or several) hosts can be of high value. As mentioned, E. coli is the most commonly used host for both library construction and functional screening. However, other hosts may be beneficial for certain targets because of their inherent features (Kakirde et al. 2010; Martinez et al. 2004) (see Sects. 2.3.1 and 2.3.2). Thermus thermophilus suits thermostable enzymes (Angelov et al. 2009), while Streptomyces and other Actinobacteria suit bioactive secondary metabolites. Metagenomic libraries can be constructed directly in the host in which they will be screened. However, such hosts often yield far fewer transformants than the number of clones that can be obtained in E. coli. The common approach is therefore to construct the library in E. coli using shuttle and/or broad host-range vectors, and then transfer it to the host organism of choice for screening. Various such vectors are available for both small and large inserts. E. coli–Bacillus subtilis shuttle systems (plasmid and BAC) have been used to screen soil metagenomes for antimicrobial activities (Biver et al. 2013). The pMDB14 vector (McMahon et al. 2012) can be shuttled between E. coli, Pseudomonas putida, and Streptomyces lividans, allowing gene expression in different hosts, similar to other reported systems (Sosio et al. 2000; Martinez et al. 2004). For psychrophilic expression, E. coli shuttle vectors have been constructed, such as a pGEM derivative and a pJRD215 derivative (Cavicchioli et al. 2011; Miyake et al. 2007; Tutino et al. 2001). They allow libraries to be transferred from E. coli to hosts such as Psychrobacter sp. and Shewanella livingstonensis. E. coli–T. thermophilus shuttle systems have also been designed (Angelov et al. 2009; Leis et al. 2015b).
Apart from these, several other broad host-range systems have been developed:

- The pUvBBAC system supports replication in both Gram-positive and Gram-negative bacteria and allows functional screening in Listeria hosts (Hain et al. 2008).
- pGNS-BAC-1 allows copy-number induction in E. coli and supports replication and functional screening in a broad spectrum of Gram-negative species (Kakirde et al. 2010).
- The pRS44 plasmid system (Aakvik et al. 2009) exists as both a fosmid and a BAC system. It allows inducible control of vector copy number in E. coli and conjugative transfer into other hosts.

Besides these transferable BAC systems, several broad host-range cosmid vectors have been reported (Craig et al. 2010; Cheng et al. 2014; Wexler et al. 2005).

Efficient library transfer between host strains is essential for screening metagenomic libraries in several hosts with complementary features (Martinez et al. 2004; Leis et al. 2015a, b). Library vectors can be isolated by simple plasmid DNA extraction and re-transformed into the alternative host. In most cases, however, conjugation is the transfer method of choice. This holds largely regardless of whether the library was originally constructed as a cosmid (Wexler et al. 2005; Craig et al. 2010; Cheng et al. 2014), fosmid (Aakvik et al. 2009), or BAC (Kakirde et al. 2010) library. Conjugative transfer of the abovementioned vectors requires the full set of tra genes to be present in the donor (F-positive) strain. Libraries to be transferred are often large, so transfer is preferably done in a high throughput fashion, similar to the high throughput conjugation procedure described by Martinez and co-workers (Martinez et al. 2004).

2.3 Expression Host Organisms

As previously mentioned, E. coli is presently the most commonly used expression host in functional metagenomic screening (Ekkers et al. 2012; Kennedy et al. 2008; Aakvik et al. 2009; Rondon et al. 2000; Parachin and Gorwa-Grauslund 2011). Several dedicated tools for metagenomic library screening have also been developed for E. coli. One example is engineered strains that allow stable replication and copy-number control of large vectors (Taupp et al. 2011). These vectors are kept at a single copy prior to screening to minimize potential toxic effects of insert-encoded proteins or other metabolites produced. E. coli strains have also been modified for optimized heterologous expression. Heterologous sigma factors have been expressed to allow recognition of a wider range of promoter structures than wild-type E. coli can recognize (Gaida et al. 2015). Strains have also been adapted to express polyketide synthase (PKS)-encoding secondary metabolite gene clusters and produce derivable natural products (Zhang et al. 2015). However, even engineered E. coli strains are not always the best-suited hosts for expressing metagenome-encoded functions. This applies particularly to libraries harboring eDNA from extreme environments, when they are screened under conditions such as very high or low temperatures. Such conditions are not compatible with E. coli's natural lifestyle as a mesophilic commensal of the human gut. In addition, genes from species that are phylogenetically distant from E. coli can be difficult to express functionally (Warren et al. 2008). Possible reasons include:

- differences in codon usage;
- improper promoter recognition;
- lack of transcription and/or translation factors;
- impaired protein folding;
- absence of cofactors;
- gene product toxicity;
- absence of precursor metabolites.

It has been estimated that only approximately 40% of all genes can be heterologously expressed in E. coli (Gabor et al. 2004).
The use of multiple, complementary screening hosts has therefore been proposed to express more of the diversity within a metagenomic library (Liebl et al. 2014). Table 2.1 summarizes the most commonly used hosts as well as high-potential future host systems for functional metagenome screening.
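A deliberately simple model illustrates why complementary hosts can help. If each host independently expresses a fraction p_i of the genes, the fraction expressed in at least one host is 1 − ∏(1 − p_i). Only the ~40% figure for E. coli comes from the text (Gabor et al. 2004); the second value below is hypothetical. Independence is also an idealization, because the expression repertoires of real hosts overlap, which would reduce the gain. The sketch is therefore an illustration rather than a prediction, and the function name is hypothetical.

```python
from math import prod
from typing import Iterable


def combined_fraction(host_fractions: Iterable[float]) -> float:
    """Fraction of genes expressed in at least one host, assuming that
    expression in the different hosts is statistically independent."""
    fractions = list(host_fractions)
    if any(not 0.0 <= p <= 1.0 for p in fractions):
        raise ValueError("fractions must lie between 0 and 1")
    # Probability that a gene is missed by every host (independence assumed).
    missed_everywhere = prod(1.0 - p for p in fractions)
    return 1.0 - missed_everywhere


if __name__ == "__main__":
    e_coli = 0.40       # estimate cited in the text (Gabor et al. 2004)
    second_host = 0.30  # hypothetical value for a complementary host
    print(f"E. coli alone:         {combined_fraction([e_coli]):.2f}")
    print(f"E. coli + second host: {combined_fraction([e_coli, second_host]):.2f}")
```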

Table 2.1 Key features of frequently used as well as high potential future hosts for functional expression and screening of metagenomic libraries

2.3.1 Extremophiles as Expression Hosts for Metagenome Screening

eDNA is preferably sampled from an environment that matches the desired enzyme properties. By the same logic, it appears reasonable to screen metagenomic libraries in a heterologous expression host that functions optimally under the respective conditions. One example is in vivo screening for thermostable enzymes at elevated temperature in a thermophilic host. Among thermophilic hosts for heterologous gene expression, the best-studied species is T. thermophilus (Tabata et al. 1993; Cava et al. 2009). This extremely thermophilic, Gram-negative bacterium of the phylum Deinococcus-Thermus can grow at temperatures as high as 85 °C. The genomes of several T. thermophilus strains have been sequenced. Their natural competence (Hidaka et al. 1994; Koyama et al. 1986) makes them very efficient at taking up external DNA without discriminating by source (Schwarzenlander and Averhoff 2006). T. thermophilus can also acquire DNA by conjugation, although less efficiently than through natural competence (Ramirez-Arcos et al. 1998; Cava et al. 2007).

A large number of genetic tools have been developed to modify T. thermophilus (Tamakoshi et al. 1997; de Grado et al. 1999). In the 1980s, a thermostable variant of a kanamycin resistance gene was obtained by mutagenesis (Matsumura and Aiba 1985; Liao et al. 1986). This allowed antibiotic-based selection of transformed T. thermophilus cells. Other selectable markers that are stable at high temperature have been used since then. One is the bleomycin-binding protein, which confers bleomycin resistance (Brouns et al. 2005). Another is a hygromycin B phosphotransferase evolved for thermostability (Nakamura et al. 2005). Several plasmids and vectors are available, such as the cryptic plasmid pTT8 (Koyama et al. 1990), which has been used to transfer genes into T. thermophilus. pTT8 has also been supplemented with the thermostable kanamycin resistance gene described above, resulting in the selectable cloning vector pMKM001 (Mather and Fee 1992) and variants thereof. Cryptic Thermus plasmids have also been fused with commonly used E. coli plasmids of the pUC series to create E. coli–Thermus shuttle vectors. This produced several variants, e.g., pMY1-3 and pLU1-4 (Lasa et al. 1992; Wayne and Xu 1997). Other plasmids available for Thermus include pTA103 (Chu et al. 2006) and the pS4C and pL4C plasmids, which harbor both an integrase and a transposase (Ruan and Xu 2007). The widely used pMK18 vector carries the multiple cloning site of pUC18 (de Grado et al. 1999).

Owing to the available genetic tools, T. thermophilus strains have been used as thermophilic cell factories that complement protein production in, e.g., E. coli. T. thermophilus produced its own Tth DNA polymerase more efficiently than E. coli did (Moreno et al. 2005). It also produced an active thermostable Mn-dependent catalase that could not be expressed in E. coli (Hidalgo et al. 2004). T. thermophilus has also been used successfully in metagenomic approaches, e.g., by Angelov and co-workers (2009). In their work, large-insert fosmid libraries were constructed in E. coli and transferred to a T. thermophilus host. Screening in the two species gave different hit spectra, illustrating the benefits of high-temperature screening for thermostable enzymes. The same authors also constructed a pCC1fos derivative, denoted pCT3FK (Angelov et al. 2009). It carries T. thermophilus HB27 chromosomal DNA sequences that allow integration into the host chromosome by homologous recombination. This vector was used to screen a metagenomic library for thermostable esterases in both E. coli and T. thermophilus (Leis et al. 2015a, b). The T. thermophilus screen yielded more thermostable enzyme candidates than the E. coli screen.

At the opposite end of the temperature range, cold environments harbor a large and understudied biodiversity. Psychrophilic enzymes from such environments are particularly sought after because they are highly active at low and moderate temperatures. As a result, lower enzyme concentrations are needed to match the performance of homologues active at higher temperatures. Psychrophilic enzymes are considered less stable than their mesophilic homologues (Feller 2013). The structural flexibility that enables them to function at low temperatures also decreases their thermal stability. However, biodiscovery of relevant gene functions from these environments is limited by the need to express them and keep them functional in mesophilic hosts. For instance, when E. coli is used to express psychrophilic enzymes, the growth temperature cannot be lowered much below about 15 °C. This presents a significant barrier to their exploitation in biotechnology (Struvay and Feller 2012).

There are several examples where E. coli has been used successfully to produce cold-adapted enzymes (Cavicchioli et al. 2011; Wang et al. 2010; Zhang and Zeng 2008). However, the total number of such reports is comparatively low, reflecting significant challenges. Two strategies have been pursued to overcome them:

(1) adapting existing mesophilic expression systems to low temperatures;
(2) developing new psychrophilic expression hosts.

The first approach engineers the mesophilic host to grow sufficiently well at low temperatures, where correct folding of recombinant proteins is favored. One example is the co-expression of Cpn60 and Cpn10 from Oleispira antarctica, cold-adapted homologues of the E. coli GroEL/GroES chaperonins (Ferrer et al. 2003). This gave E. coli an operational folding system at 4–12 °C. It improved growth at low temperatures and enhanced the solubility of the recombinant proteins produced. Another example is the use of cold-shock promoter systems together with solubility partners for psychrophilic genes in E. coli (Bjerga and Williamson 2015). Bjerga and Williamson expressed target genes from the cspA promoter as fusions to maltose-binding protein (MBP), thioredoxin (TRX), small ubiquitin-like modifier (SUMO), and trigger factor (TF). These fusions enabled high-level production of soluble protein.

Dedicated host-expression systems have been developed for producing cold-adapted products. One example is the pTAUp and pTADw vectors for Psychrobacter, which were found to replicate by a rolling-circle mechanism (Tutino et al. 2000). The cryptic replicon plasmid pMtBL from Pseudoalteromonas sp. has also been used as a psychrophilic expression vector (Tutino et al. 2001). After fusion with a pGEM derivative, it showed a broad host-range profile that included mesophilic species as well as psychrophiles. Other broad host-range vectors for cold-adapted expression include a variant of pJRD215 carrying a regulatory promoter from Shewanella and a β-lactamase reporter from Desulfotalea (Miyake et al. 2007). Another is a shuttle vector built from pUC18 and the p54 plasmid of a psychrophilic Arthrobacter sp. isolated from a Greenland glacier (Miteva et al. 2008). The resulting low-temperature expression system can be transferred not only to E. coli but also to some high G + C Gram-positive bacteria.

2.3.2 Actinobacteria as Hosts for Heterologous Natural Product Formation

The phylum Actinobacteria comprises a large and diverse group of Gram-positive bacteria, mostly with a mycelial lifestyle. They are potent producers of a plethora of natural products with a wide spectrum of medical applications, including antibacterial, antifungal, anthelmintic, and immunosuppressant compounds (Barka et al. 2016). Among them, members of the genus Streptomyces are particularly prolific, accounting for the majority of antibiotics in medical use today (Hopwood 2007). Actinomycete genomes contain a multitude of secondary metabolite gene clusters (Bentley et al. 2002; Ohnishi et al. 2008; Oliynyk et al. 2007; Udwary et al. 2007). Only a subset of these clusters is expressed, and their compounds produced, under laboratory conditions. Most gene clusters therefore remain silent, or cryptic, with their functions yet to be discovered. Among Actinobacteria, too, cultivable strains represent only a minute fraction of the entire diversity (Maldonado et al. 2005). This leaves a vast resource of potential new drug candidates untapped, unless new methods become available for cultivating these organisms (Zengler et al. 2002) or for efficiently realizing their genetic potential in heterologous hosts. In that respect, well-described members of the Actinobacteria themselves have been proposed as hosts for the heterologous expression of natural product gene clusters (Gomez-Escribano and Bibb 2011, 2012). The model species Streptomyces coelicolor is one example. Several features make such strains excellent choices for metagenome screening for new bioactive compounds:

- their versatility in expressing complex biosynthetic gene clusters;
- their high G + C codon usage;
- their supply of important precursors for natural products of different compound classes, such as polyketides, non-ribosomal peptides, and lantibiotics.

In addition, these Streptomyces strains might prove useful for accessing the potential of cryptic gene clusters of cultivable strains through heterologous expression. In-depth understanding of gene regulation and precursor supply will be instrumental in optimizing model Actinobacteria as functional metagenome screening hosts.
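A crude first check is whether an insert-encoded gene better matches a high G + C host such as Streptomyces or a host like E. coli. One way is to compare the gene's overall G + C content and its G + C content at third codon positions (GC3) with those of the host genome. The sketch below computes both values. It assumes an in-frame coding sequence without introns, and the helper names are hypothetical. A full codon usage comparison against a host reference gene set would be more informative but is beyond this illustration.

```python
def gc_content(seq: str) -> float:
    """Fraction of G and C bases in a DNA sequence (other symbols count as non-GC)."""
    seq = seq.upper()
    if not seq:
        raise ValueError("empty sequence")
    return sum(base in "GC" for base in seq) / len(seq)


def gc3(cds: str) -> float:
    """G + C fraction at third codon positions of an in-frame coding sequence."""
    cds = cds.upper()
    if not cds or len(cds) % 3:
        raise ValueError("coding sequence length must be a non-zero multiple of 3")
    third_positions = cds[2::3]  # every third base, starting at the first codon's third position
    return sum(base in "GC" for base in third_positions) / len(third_positions)


if __name__ == "__main__":
    example_cds = "ATGGCCAAGCTGACCGGC"  # hypothetical six-codon fragment
    print(f"GC:  {gc_content(example_cds):.2f}")
    print(f"GC3: {gc3(example_cds):.2f}")
```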

S. coelicolor has been studied extensively with respect to the regulation of secondary metabolite production. The tools needed for its genetic manipulation are well developed (Gust et al. 2004; Kieser et al. 2000; Jones et al. 2013). These include plasmids, inducible promoters, and large-insert library tools for chromosomal integration. New tools for fast and efficient genome editing, such as the CRISPR/Cas system (Garneau et al. 2010), have also been optimized for Actinobacteria. They have been used to make deletions and directed genomic mutations (Tong et al. 2015; Huang et al. 2015; Cobb et al. 2015). This technology is currently of limited use for introducing larger gene clusters into the Streptomyces genome. It can nevertheless be expected to develop into a powerful tool for reprogramming Actinobacteria to produce new bioactive compounds. Wild-type S. coelicolor produces several antibiotics of different classes:

- the polyketides actinorhodin (Act, Rudd and Hopwood 1979) and coelimycin (Cpk, Gomez-Escribano et al. 2012);
- the prodiginine undecylprodigiosin (Red, Feitelson et al. 1985);
- the lipopeptide calcium-dependent antibiotic (CDA, Hopwood and Wright 1983);
- the plasmid-encoded cyclopentanoid methylenomycin (Mmy, Wright and Hopwood 1976).

However, its genome sequence revealed a much larger potential for bioactive compounds, in the form of more than 20 different, mostly non-expressed secondary metabolite gene clusters (Bentley et al. 2002). Extensive research has aimed to detect and study cryptic gene clusters (Medema et al. 2011; Nett et al. 2009; Zerikly and Challis 2009; Baltz 2008). The ultimate goal is to activate them for product formation (Ochi et al. 2014; Rutledge and Challis 2015; Yoon and Nodwell 2014; Zhu et al. 2014). However, the regulation of antibiotic production in S. coelicolor is complex. It needs to be understood in depth if this species is to serve as a generic cell factory for heterologous natural product formation.

Several factors trigger antibiotic production in Streptomyces in step with the species' life cycle (Bibb 2005; van Wezel and McDowall 2011). Nutrient depletion and cessation of growth induce morphological differentiation and antibiotic production. This occurs via the stringent response and guanosine tetra- and pentaphosphate, (p)ppGpp (Potrykus and Cashel 2008). Programmed cell death releases N-acetylglucosamine (GlcNAc), which triggers the onset of development and antibiotic production via the global regulator DasR (Rigali et al. 2006, 2008). Overexpression of the cell division activator protein SsgA also affects antibiotic production in S. coelicolor by inducing mycelial fragmentation (van Wezel et al. 2009). The global regulatory network passes this information on to pathway-specific activators encoded within the biosynthetic gene clusters (Wietzorrek and Bibb 1997). These activators are usually expressed in a growth phase-dependent manner. Once present in sufficient amounts, they control further expression of their biosynthetic gene cluster. Three interventions have improved production yields:

- removal of pathway-specific regulators improved platencin production (Smanski et al. 2012);
- exchange of native promoters improved gougerotin production (Du et al. 2013);
- overexpression of export proteins improved bottromycin production (Huo et al. 2012).

Taking all these layers of regulation into account will be key to developing Streptomyces into potent heterologous production platforms for natural product discovery, both for activating silent gene clusters from cultivable microorganisms and for realizing the biosynthetic potential of environmental metagenomes. S. coelicolor has been used extensively as a heterologous expression platform for antibiotic gene clusters, as recently reviewed by Gomez-Escribano and Bibb (2014). By successively deleting the biosynthetic gene clusters for Act, Red, CDA, and Cpk in the plasmid-free (and thus Mmy-negative) wild-type strain M145, a strain (M1146) with a largely reduced background of bioactive compounds secreted into the medium was obtained (Gomez-Escribano and Bibb 2011). In the same work, the additional introduction of point mutations in rpoB and rpsL, encoding the RNA polymerase β-subunit and the ribosomal protein S12, respectively, yielded strain M1154, which showed a pleiotropic increase in the level of secondary metabolite production. Each of these mutations had previously been shown to enhance antibiotic production in Streptomyces without negative effects on growth (Shima et al. 1996; Okamoto-Hosoya et al. 2000; Hu et al. 2002), and such mutations have been proposed as a new strategy to activate silent gene clusters for drug discovery (Ochi and Hosaka 2013). M1146 and M1154 have been successfully applied for the heterologous production of numerous antibiotics of diverse classes (Gomez-Escribano and Bibb 2014).

Further optimization of the existing S. coelicolor heterologous host strains into a Superhost for new antibiotic discovery from environmental metagenomes may be guided by the comprehensive knowledge of the physiology and gene regulation of antibiotic production in this species, as well as by a systems biology understanding of it. A dedicated fermentation strategy for systems-scale studies of metabolic switching in S. coelicolor has been established (Wentzel et al. 2012a), allowing reproducible cultivations and high-resolution time-series sampling for comprehensive 'omics analyses (Battke et al. 2011). The dynamic architecture of the metabolic switch in S. coelicolor has been studied at the transcriptome (Nieselt et al. 2010), proteome (Thomas et al. 2012), and metabolome (Wentzel et al. 2012b) levels. By studying the effects of different mutations, the complex regulatory interplay of nitrogen and phosphate metabolism was elucidated (Martin et al. 2012; Waldvogel et al. 2011). A genome-scale model for S. coelicolor is available (Alam et al. 2010), and detailed insight into the structure of its transcription factor-mediated regulatory network has been gained (Iqbal et al. 2012).

In addition to S. coelicolor, other Actinobacteria have been considered as heterologous expression hosts. S. avermitilis, for example, has been engineered as an expression host for heterologous gene clusters (Komatsu et al. 2013), and S. lividans, S. albus, and Saccharopolyspora (Baltz 2010) have also been used for this purpose. S. lividans was used as the host organism in a successful screen for anti-mycobacterial compounds (Wang et al. 2000), and both S. lividans and S. albus have been shown to produce products from an introduced type II PKS pathway (King et al. 2009). Nonomuraea sp. ATCC 39727 produced microbisporicin and planosporicin heterologously (Marcone et al. 2010) more efficiently than Streptomyces hosts did (Foulston and Bibb 2010; Sherwood and Bibb 2013), indicating potential benefits of using several actinobacterial expression hosts for bioactive compound screening of metagenome libraries. Streptomyces spp. have proven useful for heterologous gene cluster expression and functional screening for the associated bioactivity (Kakirde et al. 2010; Martinez et al. 2005). Screening of a BAC library of soil DNA constructed in E. coli and transferred to Pseudomonas putida (lower G + C) and Streptomyces lividans (high G + C) resulted in different expression patterns (Martinez et al. 2004), indicating the usefulness of high G + C Streptomyces hosts as a complement to other metagenome screening platforms for bioactivity, such as the polyketide production-optimized E. coli strain BTRA (Zhang et al. 2015).

Recently, the "Tectomicrobia" candidate phylum, including the "Entotheonella" candidate genus, was discovered by a combined single-cell and metagenomics approach describing microbial consortia that produce bioactive polyketides and peptides in association with the marine sponge Theonella swinhoei (Wilson et al. 2014). This study exemplifies the huge potential of marine environments for the identification of new compounds produced by non-cultivable microorganisms. The genetic optimization of different actinobacterial model strains for natural product formation will help establish a platform of complementary optimized host strains that, in combination, may increase the success rate of functional screening for new natural products from such biodiversity.

2.3.3 Other Expression Hosts for Metagenome Screening

Apart from E. coli and the species discussed above (Sects. 2.3.1 and 2.3.2), several other species have been considered as hosts for metagenome expression and screening, each with its respective benefits and drawbacks. These species can contribute to building a flexible platform for multi-host expression and screening of microbial metagenomes, as suggested previously (Liebl et al. 2014).

Mesophilic hosts applied in metagenomic screening, apart from E. coli and the Actinobacteria covered in detail above (Sect. 2.3.2), include Agrobacterium tumefaciens (alphaproteobacteria), Burkholderia graminis (betaproteobacteria), Caulobacter vibrioides (alphaproteobacteria), Pseudomonas putida (gammaproteobacteria), and Ralstonia metallidurans (betaproteobacteria), which have been used to screen a soil metagenome (Craig et al. 2010). The alphaproteobacterium Rhizobium leguminosarum has also been used in metagenome screening for alcohol/aldehyde dehydrogenases (Wexler et al. 2005). Other mesophilic host bacteria used in metagenomic screening include Rhodobacter capsulatus and Gluconobacter oxydans (Liebl et al. 2014); R. capsulatus has been shown to be suitable for the expression of membrane proteins, and G. oxydans to tolerate screening under acidic conditions. The low G + C, Gram-positive bacterium Bacillus subtilis, widely used for recombinant enzyme production owing to its capacity to secrete proteins into the medium, has also been used in metagenome screening (Biver et al. 2013). Similarly, species of Burkholderia, Sphingomonas, and Pseudomonas have been used (Ekkers et al. 2012; Martinez et al. 2004), and when the bacterial symbiont Sinorhizobium meliloti was used as expression host, a greater diversity of clones was found than when screening in E. coli (Lam et al. 2015). In addition, the gammaproteobacteria Pseudomonas fluorescens and Xanthomonas campestris (Aakvik et al. 2009), as well as integrase-mediated recombination of libraries into the hosts S. meliloti and A. tumefaciens (Heil et al. 2012), have been shown to be applicable for functional metagenome screening.

Although prokaryotic hosts have been applied successfully in screening metagenomic DNA libraries that include eukaryotic DNA (Geng et al. 2012), eukaryotic host systems may be an important area for further development of metagenomic tools and expression hosts. Far more prokaryotic vector-host systems have been developed and used to date, but genetic tools are also available for yeasts such as Saccharomyces cerevisiae (e.g., Drew and Kim 2012) and Pichia pastoris (e.g., Daly and Hearn 2005), as well as for filamentous fungi such as Aspergillus (Nevalainen et al. 2005). A mutant strain of S. cerevisiae defective in di-/tripeptide uptake has been used in a functional screen of a soil metagenome library to identify novel oligopeptide transporters (Damon et al. 2011), demonstrating the potential of eukaryotic hosts in functional screening of environmental metagenomes.

2.4 In Vitro Expression Systems for Functional Metagenomics

Cell-free protein synthesis (CFPS) covers the in vitro transcription of coding DNA into mRNA and its subsequent translation into polypeptides and functional proteins using cell extracts. CFPS is a rapidly developing field with the potential to make a large impact on both protein production and screening for new enzyme functions. The first E. coli CFPS system was introduced as early as 1961, with the main purpose of studying the process of translation (Matthaei and Nirenberg 1961). Since then, a multitude of advanced CFPS systems have been developed using extracts of organisms from all three domains of life, including Bacteria, Archaea, fungi, plants, insects, and mammals (Zemella et al. 2015). Owing to their open nature, CFPS systems bypass a number of limitations of cellular, in vivo expression systems, as they are highly flexible with respect to the physicochemical environment, the reaction conditions, and the reaction format in which gene expression takes place. In addition, they allow the incorporation of nonnatural amino acids and cofactors, avoid biological background, and are not constrained by cell viability when toxic proteins are produced. Since no membranes need to be crossed, an almost unlimited range of substrates can be used for screening gene libraries, and library sizes are not restricted by the transformation efficiency of expression host cells. This renders CFPS an increasingly recognized alternative to cell-based expression systems for both protein screening and production (Catherine et al. 2013).

Several key challenges associated with CFPS have recently been successfully addressed, such as low productivity, quality and quantity constraints of DNA templates, posttranslational modifications, and clonal separation for genotype-phenotype coupling. Low productivity has been a major issue owing to the rapid depletion of the chemical energy carrier ATP and the stoichiometric accumulation of phosphate, which binds the magnesium ions essential for the reaction. The development of ATP regeneration methods, in particular the activation of glycolysis and oxidative phosphorylation in the extract to regenerate ATP from glucose (Jewett and Swartz 2004; Calhoun and Swartz 2007; Kim and Kim 2009), represented a major breakthrough in achieving higher protein yields. Moreover, in situ supply of glucose by hydrolysis of polymeric carbohydrates such as maltodextrin or starch has been used to control the rate of ATP delivery (Wang and Zhang 2009). Other metabolic functions of crude cell extracts have also been exploited beneficially, for example, to provide cofactors for the produced target enzymes (Kwon et al. 2013).

Several studies have suggested solutions to the challenges posed by the large amounts of template required and by the extensive exonucleolytic degradation of linear DNA templates in crude cell extracts. In addition to PCR-based preparation of sufficient template (Sawasaki et al. 2002; Endo and Sawasaki 2004), isothermal DNA amplification coupled to CFPS (Kumar and Chernaya 2009) was shown to enable high throughput protein synthesis from very small amounts of template DNA. Stabilization of mRNA by inclusion of terminal stem-loop structures and depletion of RNase E from the extracts greatly improved protein production (Ahn et al. 2005). Of greater relevance for expression library screening, linear DNA templates were protected, and protein production was improved, by inhibiting the RecBCD nuclease in E. coli extracts through addition of the bacteriophage lambda Gam protein (Sitaraman et al. 2004). The same effect was achieved with extracts of an E. coli strain in which the endonuclease I gene endA was deleted and the recBCD operon was replaced by the lambda recombination system (Michel-Reydellet et al. 2005). Tethering of linear DNA ends to microbeads in an agarose matrix also improved DNA template stability (Lee et al. 2012).

To enable posttranslational modifications during in vitro synthesis of eukaryotic proteins, for example, for pharmaceutical applications, several eukaryotic CFPS systems have been developed, as recently reviewed by Zemella et al. (2015) and references therein. These include systems based on S. cerevisiae, the fall armyworm Spodoptera frugiperda, rabbit reticulocytes, CHO cells, and different human cell lines. The set of well-documented eukaryotic CFPS systems also includes plant systems based on tobacco BY-2 cells and the widely used wheat germ embryo system, a high-yield system that supports correct folding of many protein types, including disulfide-rich proteins (Takai et al. 2010).

In vitro compartmentalization (IVC) represents one possible solution to the demand for clonal separation and genotype–phenotype coupling in cell-free screening systems. This demand was addressed early on by the SIMPLEX approach (Rungpragayphan et al. 2003), which used diluted single-molecule templates for PCR and subsequent CFPS in a microtiter format, whereas emulsion-based approaches make substantially larger library sizes possible. Small aqueous droplets are prepared in a continuous oil phase to isolate templates in individual micro-reactors for isothermal or PCR-based amplification (Courtois et al. 2008) and CFPS. This represents a promising platform for screening enzyme activity against a wide array of substrates using either FACS- or microfluidics-based screening and sorting methods (Kintses et al. 2010).
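Clonal separation by dilution, whether in microtiter wells or emulsion droplets, is commonly reasoned about with Poisson statistics: if templates are distributed randomly and independently, the number per compartment follows a Poisson distribution with mean λ. The sketch below assumes exactly that idealized loading (uniform droplet volumes, no template loss) and uses hypothetical function names and illustrative λ values that are not taken from the cited studies. It shows the trade-off between the share of occupied droplets that are clonal and the number of droplets needed to cover a library.

```python
import math


def poisson_pmf(k: int, lam: float) -> float:
    """Probability that a droplet receives exactly k templates, given a mean of lam per droplet."""
    return lam ** k * math.exp(-lam) / math.factorial(k)


def droplet_occupancy(lam: float) -> dict:
    """Summarize droplet occupancy under an idealized Poisson loading assumption."""
    if lam <= 0:
        raise ValueError("mean templates per droplet must be positive")
    p_empty = poisson_pmf(0, lam)
    p_single = poisson_pmf(1, lam)
    p_multiple = 1.0 - p_empty - p_single
    # Share of template-containing droplets that hold exactly one template,
    # i.e. droplets in which a phenotype maps back to a single genotype.
    clonal_among_occupied = p_single / (1.0 - p_empty)
    return {
        "empty": p_empty,
        "single": p_single,
        "multiple": p_multiple,
        "clonal_among_occupied": clonal_among_occupied,
    }


def droplets_needed(library_size: int, lam: float) -> float:
    """Expected number of droplets required to compartmentalize library_size templates at mean lam."""
    return library_size / lam


if __name__ == "__main__":
    # Hypothetical library of one million clones; loading densities are illustrative only.
    for lam in (0.1, 0.3, 1.0):
        occ = droplet_occupancy(lam)
        print(
            f"lambda={lam}: empty={occ['empty']:.3f}, "
            f"clonal among occupied={occ['clonal_among_occupied']:.3f}, "
            f"droplets for 1e6 templates={droplets_needed(1_000_000, lam):.2e}"
        )
```

Lowering λ raises the fraction of occupied droplets that are clonal but increases the number of droplets, most of them empty, that must be generated and analyzed. This is one reason why the high droplet throughput of emulsion and microfluidic formats matters for large metagenomic libraries.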

The insights into biodiversity and into the huge metabolic potential of nature provided by recent advances in next-generation sequencing have renewed interest in CFPS. Consequently, key improvements have been made, greatly expanding the applicability of cell-free systems to HT gene expression and even to large-scale protein production (Zemella et al. 2015). CFPS combined with suitable screening systems may form an ideal platform for the functional screening of enzymes from genomic and metagenomic DNA, independent of the limitations of cell-based systems. In a recent example, a cow rumen metagenomic library was screened for glycoside hydrolases using cell-free expression, exploiting the energy-providing effect of glucose in CFPS extracts (Kim et al. 2011). In this case, energy generation started from the polysaccharides cellulose, xylan, and amylose, together with a small amount of glucose. Enzymatic degradation of the substrate then increased the amount of available glucose in a feedback loop, ultimately causing a pH drop, detectable with an indicator, due to acidic by-products (Kim et al. 2011).

This example shows that optimized CFPS systems combined with smart assay design represent a powerful and highly versatile option for expression screening of microbial enzymes, in particular when combined with platforms for ultra-high throughput analysis and sorting. Further developments in this field will likely include the expansion of CFPS systems to additional microbial species, including species from extreme environments, since eDNA from such environments may not be transcribed or translated by E. coli extracts (Angelov et al. 2009). Hence, "unconventional" microbial systems for functional expression are needed (Liebl et al. 2014). Pure component systems and extracts have already been described for extremophiles from both Bacteria and Archaea (Endoh et al. 2007; Zhou et al. 2012), including Thermus, Pyrococcus, Sulfolobus, and Thermococcus (Hethke et al. 1996; Tachibana et al. 1996; Ruggero et al. 2006), and these may be a valuable resource for future systems.

2.5 Outlook

Metagenomics has proven to be a powerful tool for describing environmental microbial biodiversity and exploiting it for metabolic functions relevant to commercial applications. With the ever-increasing throughput of next-generation sequencing technologies, (meta)genomic DNA sequence databases are filling rapidly, and our insight into the huge and diverse metabolic potential existing in nature has never been deeper. However, the identification of useful functions still ultimately depends on experimental proof. Although in silico predictions are constantly improving, the field of Functional Metagenomics will continue to develop because it directly and efficiently links a desired function to its determining source code, the eDNA.

E. coli and the genetic tools developed for this species have been the first choice in Functional Metagenomics research with respect to library construction, recombinant expression, and functional screening alike. However, it has become clear that E. coli has shortcomings, especially in light of the growing spectrum of ecological niches and microbial diversity being accessed and the broader range of metabolic functions and properties to be exploited. Therefore, alongside E. coli, which itself is still being improved as a screening host for specific target classes, other microbial model systems that are potentially more suitable for screening for particular functions of interest have emerged in recent years. These include, for example, thermophilic and psychrophilic systems for the discovery of corresponding enzymes and actinobacterial systems for secondary metabolite gene cluster expression and bioactive compound formation.

New and better tools are needed, and are continuously being developed, to increase efficiency at the different steps of the Functional Metagenomics biodiscovery pipeline (Fig. 2.1). In addition to dedicated sampling, efficient DNA extraction from diverse natural environments, and developments in metagenomic (small- and large-insert) library cloning technology, several other aspects are in focus. Vector development for heterologous expression in, and transfer between, multiple host species (broad host range) and the optimization of different host species for the heterologous expression of genes for bioactive functions will likely continue to converge. In particular, more efficient large-insert cloning of eDNA and shuffling of libraries between expression hosts, allowing screening in different organisms with complementary features and capabilities, has been shown to generate complementary hits (Liebl et al. 2014). It therefore remains highly desirable to improve the functional metagenomic pipeline for metagenome-based bioactive compound discovery by means of new expression and screening platforms. Several different host organisms may be included, and shuffling of metagenomic libraries between them, combined with multiple-host screening, is potentially of high value. Newly developed expression systems can be expected to be tailored to screening for specific applications (specific enzyme functions or bioactive compounds) or product properties. The concept of specifically accessing environments that provide desired properties (e.g., of an enzyme of choice) and subsequently using screening hosts that perform optimally under similar conditions can be expected to produce further valuable output in the future. Within this concept, the metabolic optimization of host species from different phyla or even domains (including Archaea and Eukaryotes) may also be pursued. The integration of new host species from phyla other than the Actinobacteria may expand the options for accessing biodiversity for medical compound discovery and thus help mitigate the threat of antibiotic resistance, as well as help fight deadly diseases, including cancer.

Systems biology understanding, the application of new genome editing tools, and synthetic biology principles will guide new approaches to optimizing host strains for the heterologous expression of metagenomic genes and the formation of new natural products. Optimized Superhosts for bioactivity screening based on different model Actinobacteria will enable heterologous expression of biosynthetic gene clusters and compound formation from uncultured bacteria. Well-established thermophilic and psychrophilic host species will be good candidates for further optimization with respect to high- and low-temperature screening. Optimal hosts should provide, among other things, stable cloning vector maintenance, sensitivity to relevant antibiotics for selection purposes, and suitable transcription and translation machinery. In addition, they should ensure correct folding, cofactor provision and insertion, and relevant precursor supply, as well as counteract toxic effects of product formation (e.g., by product export mechanisms).

In vivo systems for Functional Metagenomics come with their own inherent challenges, such as limits on achievable library sizes and on the spectrum of substrates usable for screening. Consequently, cell-free (in vitro) expression systems have recently emerged as a potential alternative for functional metagenome screening for enzymatic functions (Sect. 2.4). In vitro expression systems still have their own limitations, in particular regarding large-scale production, which is, however, of little relevance for screening and biodiscovery, where only small amounts of product are required. Key challenges are constantly being addressed by new research, and solutions to key bottlenecks have already been found. An expanded spectrum of CFPS species, similar to the diversification of in vivo expression systems, as well as hybrid systems combining beneficial components of different species, can be expected to become available soon. Combined with ongoing developments in the compartmentalization and miniaturization of screening technology, for example, using advanced microfluidic devices, in vitro systems may thus become a future alternative to in vivo systems in Functional Metagenomics.