5.1 The Archaea

5.1.1 Archaeal Diversity

Pioneering work in the 1970s first recognized the non-monolithic nature of prokaryotes and the distinctiveness of the “archaebacteria” (Balch et al. 1977; Fox et al. 1977; Woese and Fox 1977). Ribosomal RNA (rRNA)-based analyses further supported these findings, and this uniqueness was eventually formalized with the establishment of the three domains of Life: Eukarya, Bacteria and Archaea (Woese et al. 1990).

The first visual representations of the archaeal branch of the Tree of Life were sparsely populated and subdivided into Euryarchaeota and Crenarchaeota (Woese et al. 1990). They included only cultured representatives, all of which originated from extreme environments. Advances in cultivation-independent techniques, particularly metagenomics and single amplified genomes (SAGs), revealed several novel phylogenetic groups, which are quickly reshaping our view of the Archaea (e.g. Eme and Doolittle 2015). Indeed, we are currently witnessing a flurry of additions of new archaeal phyla, which culminated in the creation of two superphyla, commonly referred to as TACK (Guy and Ettema 2011) and DPANN (Rinke et al. 2013).

The TACK superphylum encompasses the Thaumarchaeota (Brochier-Armanet et al. 2008), Aigarchaeota (Nunoura et al. 2010), Crenarchaeota (Woese et al. 1990), Korarchaeota (Barns et al. 1996), and the recently suggested Bathyarchaeota (Meng et al. 2014). The DPANN superphylum includes Diapherotrites, Parvarchaeota, Aenigmarchaeota, Nanohaloarchaeota, and Nanoarchaeota (Rinke et al. 2013). However, there has been debate on the taxonomic ranking of the Nanoarchaeota since their discovery (Brochier et al. 2005; Huber et al. 2002; Waters et al. 2003). The rapidly evolving nature of the archaeal branch is further highlighted by (a) the even more recent description of the Pacearchaeota and the Woesearchaeota, two massive novel groups within the DPANN superphylum (Castelle et al. 2015), (b) the description of the putative new order “Altiarchaeales”, placed within the Euryarchaeota (Probst and Moissl-Eichinger 2015), and (c) the discovery of the “Lokiarchaeota” (Spang et al. 2015). The “Lokiarchaeota” represent a candidate novel phylum within the TACK superphylum and seem to bridge the phylogenetic gap between Archaea and Eukarya (Spang et al. 2015). Some authors believe that this discovery might result in the future fusion of both branches of the Tree of Life into a single domain (Fig. 5.1).

Fig. 5.1

The Tree of Life, as originally envisioned (a), and expanded view of the latest additions to the Archaea branch (b). Recent findings suggest an alternative placement for the Eukarya branch, and possible fusion with Archaea (c). Reprinted from Current Biology, Vol 25 No 6, Eme, L. & Doolittle, F., Microbial Diversity: A Bonanza of Phyla, R228, Copyright (2015), with permission from Elsevier

5.1.2 Archaeal Ecology

From an ecological perspective, members of the domain Archaea were traditionally split into thermophilic, methanogenic, and halophilic, all of them isolated from extreme environments (e.g. Ferrera et al. 2008). The widespread use of molecular-based methodologies brought drastic changes, and revealed members of the Archaea to be much more diverse and ubiquitous than previously expected (see Sect. 5.1.1). Archaea populate and thrive in a variety of cold and moderate environments, including landfills, soils, fresh-water sediments, and deep-sea locations, and are involved in symbiotic relationships, e.g. with sponges (e.g. Antunes et al. 2011a; Ferrera et al. 2008; Karner et al. 2001; Leininger et al. 2006; Preston et al. 1996). They contribute to global energy and element cycles, most noticeably via ammonia oxidation in pelagic environments and soils (Ferrera et al. 2008). Archaea might also play important roles in the human body with methanogenic archaea present in the human gut and oral cavities (de Macario and Macario 2009), and abundant Thaumarchaeota detected on human skin (Probst et al. 2013).

5.2 Archaea in Saline Environments

5.2.1 High Salinity Biotopes

An environment is considered hypersaline when its salt concentration surpasses that of the average seawater (i.e. 3.5 % total dissolved salts). Many of the known hypersaline water bodies derive from simple evaporation of seawater, therefore closely mirroring its ionic composition and proportions. These are known as thalassohaline, as opposed to athalassohaline environments, which are usually derived from inland water bodies, hence the dissolved ions are of non-marine proportions (DasSarma and Arora 2001; Rodríguez-Valera 1988).

Hypersaline biotopes occur in high abundance in arid, coastal and deep-sea locations across the globe. Seawater often penetrates through seepage or narrow inlets near coastal areas, creating several small evaporation ponds. Well-known examples are (a) the Solar Lake and Gavish Sabkha near the Red Sea coast, (b) Guerrero Negro on the Baja California coast, (c) Lake Sivash near the Black Sea, (d) Shark Bay in Western Australia, and (e) several locations in Antarctica (e.g. Deep Lake, Organic Lake and Lake Suribati). The number of hypersaline water bodies in coastal areas is further augmented by the numerous artificial solar salterns constructed throughout the ages for the production of sea salt. Natural inland hypersaline lakes have higher salinities than coastal ones and include the Dead Sea (Middle East) and the Great Salt Lake (USA), which are the two largest and best-studied examples. The combination of high salinity and alkaline conditions produces unusual alkaline hypersaline soda brines. Some of the better-known examples include the Wadi Natrun lakes, Egypt; Lake Magadi, Kenya; the Great Basin lakes, western United States (Mono Lake, Owens Lake, Searles Lake, and Big Soda Lake); and several others in China, India and throughout the world.

Saline soils are another, often-overlooked type of hypersaline biotope. These include desolate areas in, for example, Death Valley (California, USA), Alicante (Spain), dispersed locations across Iraq, and the Dry Valleys (Antarctica), among several others (Ventosa et al. 1998). Additional examples of less conspicuous highly saline environments include pickled food, fermented products of oriental cuisine (soy sauce, fish paste), surfaces of salt-excreting desert shrubs, human or animal skin, and other places exposed to periodic drying (Galinski and Trüper 1994; Lee 2013).

The least explored hypersaline environments include subterranean brines, evaporite deposits, and brine-filled deep-sea basins. Exploration of such environments has been hampered by the remoteness of such locations, and technical and sampling impediments. The underexplored potential of such locations has attracted considerable attention in the last decades, and resulted in several interesting studies (e.g. Antunes et al. 2008, 2011a, b, 2015; Bougouffa et al. 2013; Daffonchio et al. 2006; Fish et al. 2002; Guan et al. 2015; Joye et al. 2009; Mapelli et al. 2012; McGenity et al. 2000; Siam et al. 2012; van der Wielen et al. 2005; Vreeland et al. 2000; Wang et al. 2011), and some preliminary insights into potential applications of their microbial inhabitants (e.g. Antunes et al. 2011b; Mohamed et al. 2013; Sagar et al. 2013; Sayed et al. 2014).

5.2.2 The Halobacteria: Extremely Halophilic Archaea

Microbes living in hypersaline environments are called halophiles. Based on their preferred salinity, they can be categorized as slight (0.3–0.8 M or 1.7–4.7 % NaCl), moderate (0.8–3.4 M or 4.7–20 % NaCl), or extreme halophiles (above 3.4 M or 20 % NaCl) (Ollivier et al. 1994). Extreme halophiles are traditionally associated with the members of the euryarchaeal class Halobacteria. This class contains the single order Halobacteriales and its single family Halobacteriaceae, although a recent proposal argues for splitting Halobacteria into Haloferacales, Natrialbales, and an emended order Halobacteriales (Gupta et al. 2015). The class Halobacteria currently includes 177 species with validly published names, placed in 48 genera (LPSN, List of Prokaryotic Names with Standing in Nomenclature 2015; Table 5.1).
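The correspondence between molar and percentage salinities in this classification can be checked with a short calculation. The following is a minimal sketch, assuming that percentages are weight per volume (g NaCl per 100 mL) and using a molar mass of 58.44 g/mol for NaCl; the function names, the label for salinities below the slight range, and the treatment of the shared 0.8 M boundary are illustrative choices, not part of the cited scheme.

```python
NACL_MOLAR_MASS = 58.44  # g/mol (assumed value for NaCl)


def molar_to_percent_wv(molarity):
    """Convert an NaCl concentration in mol/L to % w/v (g per 100 mL)."""
    return molarity * NACL_MOLAR_MASS / 10.0


def classify_halophile(preferred_molarity):
    """Assign a category from the preferred NaCl molarity.

    Thresholds follow the ranges quoted in the text (0.3, 0.8, 3.4 M).
    The text lists 0.8 M in both the slight and moderate ranges; here it
    is assigned to the moderate range (an assumption).
    """
    if preferred_molarity < 0.3:
        return "below slight range"
    if preferred_molarity < 0.8:
        return "slight halophile"
    if preferred_molarity <= 3.4:
        return "moderate halophile"
    return "extreme halophile"


if __name__ == "__main__":
    for molarity in (0.3, 0.8, 3.4, 4.3):
        print(molarity, "M =", round(molar_to_percent_wv(molarity), 1), "% w/v:",
              classify_halophile(molarity))
```

Under these assumptions, 3.4 M corresponds to roughly 20 % w/v, in line with the threshold given for extreme halophiles.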

Table 5.1 Extremely halophilic archaea: List of genera within the Halobacteria and associated patents

Extremely halophilic behaviour is not, however, an exclusive characteristic of the Halobacteriales, as it is also observed in Methanohalobium and Methanohalophilus, of the family Methanosarcinaceae, within the Euryarchaeota (Oren 2000). No halophiles have, thus far, been described within the kingdom Crenarchaeota (Oren 2002).

5.3 Applications of Halophilic Archaea

Mankind has been using halophiles for at least 5000 years. For example, the characteristic red coloration seen in salterns across the globe is imparted mostly by halophilic archaea, and aids in the process of salt crystallization. Other ancient applications include the production of fish sauce, soy sauce and other traditional fermented foods (e.g. Lee 2013).

However, an exponential increase in the applications for halophiles has been observed in the last few decades, particularly after the discovery of extremophiles and extreme-condition adapted enzymes (extremozymes). The industrial application of extremozymes is clearly the most prominent direct application for halophiles and most other extremophiles. Moreover, further exploration is leading to an increase in the number of isolated strains, general knowledge, and number of applications for halophiles and halophilic archaea (Table 5.1; Fig. 5.2).

Fig. 5.2

Overview of the evolution in number of publications and patents associated with Archaea and Halophilic Archaea (Publications: data collected using PUBMED; Patents: data collected using Google Scholar)

5.3.1 Archaeal Pigments

Environments with high densities of halophilic archaea frequently have a characteristic red coloration. This is mostly due to the production of C-50 carotenoid pigments (α-bacterioruberin and its derivatives mono-anhydrobacterioruberin (MABR) and bis-anhydrobacterioruberin (BABR), along with small fractions of C-40 carotenoids such as lycopene and β-carotene) (Yatsunami et al. 2014), which are found in the membranes of several halophiles that thrive in such environments. The reddening of the brines contributes to the absorption of light energy, thereby increasing water evaporation and speeding up the process of salt crystallization (Oren 2002). Within these pigments, β-carotene is the most widely used, mainly as a natural food colorant and as an antioxidant, but also as an important additive in cosmetics, multivitamin preparations, and health food products (nutraceuticals) (Margesin and Schinner 2001; Oren 2002, 2010).

5.3.2 Bacteriorhodopsin

Some of the most interesting uses of halophilic archaea arise from the different proposed applications of bacteriorhodopsin. This molecule, discovered in the early 1970s, is the key protein of the halobacterial photosynthetic system (Hampp 2000; Oren 2002). It is present in Halobacterium salinarum and a few other representatives of the Halobacteriaceae where it forms a two-dimensional crystal integrated into the cellular membrane in patches (usually referred to as “purple membrane”). Bacteriorhodopsin is involved in the light-driven ejection of protons from the cell, establishing a protonic gradient across the membrane. Cells couple the dissipation of this gradient to the production of energy (i.e. ATP) by a membrane-bound ATPase.

The naturally occurring two-dimensional crystalline structure of bacteriorhodopsin is responsible for its (a) astonishing stability toward chemical and thermal degradation, and (b) photosensitivity and cyclicity to illumination. This favourable combination of properties clearly distinguishes the halophilic protein from synthetic materials and makes it attractive for numerous applications (Hampp 2000). These include holography, spatial light modulators, artificial retina, artificial neural networks, optical computing, and new types of optical memories (Margesin and Schinner 2001). Between 2005 and 2010, over 50 patents associated with different uses of bacteriorhodopsin were granted (Trivedi et al. 2011).

5.3.3 Bioplastics

Polyhydroxyalkanoates (PHAs) are a heterogeneous family of polyesters, usually used as intracellular carbon storage compounds (most frequently in the form of poly-β-hydroxybutyrate, PHB). The properties of some PHAs are comparable to those of polyethylene and polypropylene, with further advantages such as biodegradability, complete water impermeability, and biocompatibility, making them a viable alternative to oil-derived thermoplastics (Divya et al. 2013; Margesin and Schinner 2001; Ventosa and Nieto 1995).

Some halophilic archaea such as Haloarcula marismortui and Haloferax mediterranei were successfully used to produce high amounts of PHA (Han et al. 2007). H. mediterranei can accumulate up to 6 g (60 % of the total biomass dry weight) of PHB per liter of culture using inexpensive starch (DasSarma et al. 2010) or rice bran as carbon source (Huang et al. 2006). The vulnerability of the haloarchaeal cells to pure water (no salt) facilitates isolation of PHA granules by hypoosmotic shock treatment (Quillaguamán et al. 2010). This cheap, straightforward and high-yielding harvest procedure reduces downstream processing costs, which can account for up to 40 % of the total production costs for bacterial PHA production (Choi and Lee 1999).

5.3.4 Enzymes

The inability of “normal” enzymes to operate under the harsh conditions imposed by many industrial processes has limited their widespread use. The discovery of extremophiles and their extreme-adapted extremozymes is revolutionizing this field with an apparently unceasing range of novel industrial applications. Furthermore, as extremozyme discovery is coupled with enzyme tailoring by rational engineering or directed evolution, the development of economical bioprocesses will accelerate and be enabled on larger scales (Demirjian et al. 2001; DasSarma et al. 2010; Liszka et al. 2012).

The special characteristics of halophilic enzymes, which allow them to function properly under high salinities (Reed et al. 2013), are also responsible for their frequently very poor solubility and denaturation at lower salinities, which could limit their applicability (Madern et al. 2000; van den Burg 2003). These same specific properties seem, however, to make them particularly advantageous in aqueous/organic and non-aqueous media (DasSarma and Arora 2001; Karan et al. 2012; van den Burg 2003). Furthermore, the combination of reverse micelles with halophilic enzymes is further extending the range of applications for these enzymes (van den Burg 2003; Marhuenda-Egea and Bonete 2002).

Relevant enzymes from halophilic archaea include glycosyl hydrolases, proteases, and lipases (Table 5.2). Such enzymes have great potential for biocatalysis in high-salt environments (used in, e.g. the food and detergent industries; Delgado-García et al. 2012; Liszka et al. 2012).

Table 5.2 Selected list of biocatalytically relevant enzymes produced by extremely halophilic archaea (adapted from Demirjian et al. 2001; Ventosa et al. 2005)

5.3.5 Food Industry

In general, halotolerant and halophilic microorganisms (bacteria and archaea) play an essential role in the production of several traditional fermented foods, giving them their characteristic taste, flavor, and aroma. Their salinities range from low to intermediate, as present in sauerkraut, pickles or olives, to the concentrated brines used for fermentation of several traditional food products found in the Pacific Rim area. Within the halophilic archaea, the importance of Halobacterium salinarum and Halococcus strains in the production of nam pla, a Thai fish sauce, is well recognized (Ventosa and Nieto 1995). Also, Natrinema gari and Halococcus thailandensis, which were originally isolated from fish sauce, are implicated as important players in the fermentation process (Tapingkae et al. 2008; Namwong et al. 2007), while a protease-secreting Halobacterium strain was reported to enhance the overall sauce fermentation process (Akolkar et al. 2010). More modern applications include the use of halophilic archaea for the production of food additives (e.g. polyunsaturated fatty acids; Ventosa and Nieto 1995) and pigments (see Sect. 5.3.1).

5.3.6 Halocins

Halocins are archaeal bacteriocin-like antimicrobial peptides, produced by many members of the Halobacteriales, which inhibit the growth of closely related microbes (Riley and Wertz 2002). According to Kis-Papo and Oren (2000), they could have a role in interspecies competition, particularly on solid substrates.

To name but a few examples, species within Haloferax, Haloarcula and Halobacterium are reported to secrete specific halocins such as S8, H1, H4, C8, H6/H7, and R1 (Salgaonkar et al. 2012; O’Connor and Shand 2002). Despite the almost universal production of these compounds by haloarchaea (Torreblanca et al. 1994), they have been generally overlooked in the ongoing search for new antibiotics (Litchfield 2011). Possible reasons are that many of these purified halocins are not active against the classic group of tested bacteria, and also that many are only active after proteolytic cleavage (Li et al. 2003; Litchfield 2011).

5.3.7 Metal Bioremediation and Nanoparticles

Natural and anthropogenic activities such as erosion and mining have resulted in deposition of toxic heavy metals and their derivatives in soils, rivers and oceans (Paula et al. 2013). The use of microbial-based bioremediation attracts considerable interest, and research on the use of halophiles for metal bioremediation is flourishing (Bini 2010). Several taxa of halophilic archaea are of interest because their metal(loid) resistance capabilities could potentially be harnessed. Al-Mailem et al. (2011) reported the capability of Halococcus, Halobacterium and Haloferax to resist and volatilize mercury (Hg). Williams et al. (2013) discussed the tolerance of Natronobacterium gregoryi and Halobacterium saccharovorum to 0.001 and 0.01 mM of cadmium (Cd) and zinc (Zn), respectively. Das et al. (2014) investigated the tolerance and intracellular accumulation of Cd by Haloferax, whereas Salgaonkar et al. (2015) reported the resistance of halophilic archaea to zinc oxide nanoparticles (ZnO NPs) for the first time.

Metal(loid) resistance in halophilic archaea also makes them possible candidates for the environmentally sound synthesis of metal nanoparticles (NPs), which can be employed in various fields. For example, the selenium nanoparticles (SeNPs) synthesized by Halococcus salifodinae BK18 could be used as a chemotherapeutic agent against cancer, as they stopped the proliferation of cancerous HeLa cell lines when studied in vitro (Srivastava et al. 2014). Also, silver nanoparticles (AgNPs) synthesized by Halococcus salifodinae BK3 are reported to have anti-bacterial activity against both Gram-positive (Staphylococcus aureus and Micrococcus luteus) and Gram-negative (Escherichia coli and Pseudomonas aeruginosa) bacteria (Srivastava et al. 2013). Since metal uptake and synthesis of NPs are intracellular, haloarchaea have an added advantage, as they can be used for both metal(loid) bioremediation and NP synthesis.

5.3.8 Other Applications

Exopolysaccharide production by halophilic archaea also has great potential, as these polymers are currently used as stabilisers, thickeners, gelling agents and emulsifiers in the pharmaceutical, paint, paper and textile industries (Litchfield 2011; Ventosa and Nieto 1995). Further examples of the wide range of applications for halophiles include such diverse areas as microbially enhanced oil recovery (MEOR) processes, use of gas vesicles for bioengineering, liposomes with increased resistance for the cosmetic industry, and saline soil recovery for agriculture, among several others (Litchfield 2011; Oren 2002; Ventosa and Nieto 1995).

5.4 Screening Methodologies

5.4.1 Archaeal Pigments

When grown on agar medium containing high NaCl concentrations, most extremely halophilic archaea display bright red-orange pigmentation, imparted by carotenoids, and can therefore be easily segregated from their non-archaeal counterparts.

5.4.1.1 Haloarchaeal Pigments Extraction and Characterization

Haloarchaeal pigments can be extracted from cells by using solvents, individually or combined (Salgaonkar et al. 2015). In particular, the ultraviolet (UV)-visible spectra of the haloarchaeal C-50 bacterioruberin pigment show characteristic absorption maxima and peaks, while high-performance liquid chromatography (HPLC) analysis presents multiple elution peaks (Yatsunami et al. 2014; Bodaker et al. 2009).

5.4.2 Polyhydroxyalkanoates

Various methods are employed to screen for intracellularly accumulated PHA. The primary method relies on staining, either of harvested cells or by incorporating the staining agent into the medium during growth; the stain binds to PHA granules, which fluoresce when exposed to UV light (Legat et al. 2010; Ostle and Holt 1982; Spiekermann et al. 1999). PHA production is estimated quantitatively by acidic hydrolysis and measurement of characteristic absorption peaks (Slepecky 1961). The presence of intracellular PHA granules can also be detected with transmission electron microscopy, Fourier transform infrared spectroscopy (FTIR), or screening of target strains for the genes encoding PHA synthase (further details in e.g. Han et al. 2010; Salgaonkar and Bragança 2015).

5.4.2.1 Extraction of PHA

PHAs can be recovered by lysing cells, followed by polymer solubilization and purification (Tan et al. 2014). As halophilic archaea thrive under very high salinities, their use is associated with very low risks of contamination. Furthermore, their cells lyse in water or in low osmolarity solutions, greatly facilitating the extraction of intracellular PHA granules, and reducing production costs (Quillaguamán et al. 2010).

5.4.2.2 PHA Characterization

Characterization of PHAs is very important for their application, as more than 150 monomeric units are available, which impart different properties to the polymer (Tan et al. 2014). Monomer composition is determined by techniques such as gas chromatography (GC), nuclear magnetic resonance (NMR) and spectroscopy after depolymerization (Tan et al. 2014). Furthermore, gel permeation chromatography (GPC) is used to determine the polymer’s (a) weight-average molecular mass (Mw), (b) number-average molecular mass (Mn), and (c) polydispersity index (PDI; Mw/Mn), which describes the breadth of the molecular mass distribution (Ashby et al. 2002).
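The relationship between Mn, Mw and PDI can be illustrated with a short calculation. The sketch below uses the standard polymer-science definitions, Mn = Σ(N·M)/ΣN and Mw = Σ(N·M²)/Σ(N·M), where N is the number of chains of molar mass M; the function name and the input format are hypothetical, and real GPC software derives these values from calibrated elution profiles rather than from discrete chain counts.

```python
def molecular_mass_averages(fractions):
    """Return (Mn, Mw, PDI) for a polymer sample.

    fractions: list of (number_of_chains, molar_mass) pairs, an
    idealised stand-in for a GPC-derived mass distribution.
    """
    total_n = sum(n for n, m in fractions)
    total_nm = sum(n * m for n, m in fractions)
    total_nm2 = sum(n * m * m for n, m in fractions)
    mn = total_nm / total_n      # number-average molecular mass
    mw = total_nm2 / total_nm    # weight-average molecular mass
    return mn, mw, mw / mn       # PDI = Mw / Mn


if __name__ == "__main__":
    # Hypothetical sample: equal numbers of 100 kDa and 300 kDa chains.
    mn, mw, pdi = molecular_mass_averages([(1, 100_000), (1, 300_000)])
    print(mn, mw, pdi)
```

A PDI of 1 would indicate chains of identical length; broader distributions give larger values.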

PHA thermal properties determine the temperature conditions at which the polymer can be processed and utilized (Tan et al. 2014; Chen 2010). Thermal properties include glass transition temperature, melting temperature, and thermodegradation temperature, which are obtained using differential scanning calorimetry, differential thermal analysis, and thermogravimetric analysis. The absolute crystallinity of produced PHA polymers can be measured by X-ray diffraction (XRD) analysis (see Chanprateep 2010 and Sánchez et al. 2003 for more detailed information).

Note that PHA polymers can be either a soft elastomeric material or a hard rigid material, displaying a wide range of elongation-at-break values, between 2 % and 1000 % (Chen 2010). PHA mechanical properties that are commonly evaluated include: (a) Young’s modulus, which provides a measure of the polymer’s stiffness and ranges from the very ductile medium-chain-length PHA (mcl-PHA) to the stiffer short-chain-length PHA (scl-PHA) (Rai et al. 2011); (b) elongation at break, which measures the extent that a material will stretch before it breaks and is expressed as a percentage of the material’s original length; and (c) tensile strength, which measures the amount of force required to pull a material until it breaks (Rai et al. 2011). These assays can be performed with a tensile testing instrument following standardized test methods such as those recommended by the American Society for Testing and Materials (ASTM) (Wu and Liao 2014).
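The three mechanical properties can be derived from a few readings of a tensile test. The following is a minimal sketch under common engineering conventions that are assumptions here, not taken from the cited references: tensile strength is reported as the maximum force divided by the specimen's initial cross-sectional area, and Young's modulus as the stress-to-strain ratio within the linear elastic region. All function and parameter names are hypothetical.

```python
def elongation_at_break(initial_length_mm, length_at_break_mm):
    """Elongation at break as a percentage of the original length."""
    return (length_at_break_mm - initial_length_mm) / initial_length_mm * 100.0


def tensile_strength(max_force_n, cross_section_mm2):
    """Maximum force per initial cross-sectional area (N/mm^2 = MPa)."""
    return max_force_n / cross_section_mm2


def youngs_modulus(stress_mpa, strain):
    """Stress/strain ratio for one point in the linear elastic region.

    strain is dimensionless (change in length / original length).
    """
    return stress_mpa / strain


if __name__ == "__main__":
    # Hypothetical film specimen: 50 mm gauge length, 10 mm^2 cross-section.
    print(elongation_at_break(50.0, 75.0))   # percent
    print(tensile_strength(200.0, 10.0))     # MPa
    print(youngs_modulus(10.0, 0.005))       # MPa
```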

5.4.3 Enzymes

Quantitative analysis of hydrolytic enzyme production in halophilic archaea traditionally relies on screening by plate assays wherein the substrate of the enzyme in question is provided as the sole carbon source (Kharroub et al. 2014; Kakhki et al. 2011). Any minimal halophilic medium supplemented with 20–25 % salt and having a proper nitrogen source can be used for enzymatic screening. Examples of preparation and screening methodologies are abundant and include different hydrolytic activities such as (a) amylase (Amoozegar et al. 2003), (b) cellulase and xylanase (Wejse et al. 2003), (c) pectinase (Soares et al. 1999), (d) extracellular protease (Amoozegar et al. 2008), (e) DNase (Onishi et al. 1983), and (f) chitinase (Park et al. 2000). Examples of purification procedures for enzymes obtained from halophilic archaea can be found in multiple references (e.g. Delgado-García et al. 2012; Moshfegh et al. 2013; Pérez-Pomares et al. 2003; Vidyasagar et al. 2006).

A faster alternative to plate screening is the in silico approach, in which genomic data are checked for putative enzyme genes. However, the need for the whole genome sequence of the organism clearly limits the use of this method.
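In its simplest form, such an in silico check amounts to searching the annotated gene products of a sequenced genome for enzyme classes of interest. The sketch below illustrates the idea with a plain keyword match; the locus tags, product descriptions and keyword table are hypothetical, and real screens would more likely rely on sequence similarity or protein-domain searches than on annotation text alone.

```python
# Hypothetical keyword table: enzyme class -> terms to look for in annotations.
ENZYME_KEYWORDS = {
    "amylase": ("amylase",),
    "protease": ("protease", "peptidase"),
    "chitinase": ("chitinase",),
    "xylanase": ("xylanase",),
}


def find_putative_enzymes(annotations, keywords=ENZYME_KEYWORDS):
    """Group locus tags by enzyme class based on annotation text.

    annotations: iterable of (locus_tag, product_description) pairs.
    Returns a dict mapping enzyme class to a list of matching locus tags.
    """
    hits = {}
    for locus_tag, product in annotations:
        product_lower = product.lower()
        for enzyme_class, terms in keywords.items():
            if any(term in product_lower for term in terms):
                hits.setdefault(enzyme_class, []).append(locus_tag)
    return hits


if __name__ == "__main__":
    example_annotations = [  # invented entries for illustration only
        ("HYP_0001", "alpha-amylase"),
        ("HYP_0002", "hypothetical protein"),
        ("HYP_0003", "serine protease"),
        ("HYP_0004", "Chitinase"),
    ]
    print(find_putative_enzymes(example_annotations))
```

Candidates identified this way remain putative until enzymatic activity is confirmed experimentally, for example with the plate assays described above.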

5.4.4 Halocins

Halocins are commonly found in the cell-free supernatants (CFS) of halophilic archaea. Standard methodologies employ the agar well diffusion assay, in which the indicator organism is surface-spread or seeded into agar and the CFS of the producer strain is placed in wells within the same plate and allowed to diffuse. The minimum inhibitory concentration (MIC) of the halocin is assayed by serial dilution of the CFS and the activity is presented in Arbitrary Units (AU) (Atanasova et al. 2013; Salgaonkar et al. 2012).
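How a dilution series translates into Arbitrary Units can be sketched as follows. The example assumes one common convention, namely that the activity in AU equals the reciprocal of the highest dilution of the CFS that still produces a zone of inhibition; the cited studies may define AU differently (for instance per millilitre of supernatant), and the function name and input format are hypothetical.

```python
def halocin_activity_au(inhibition_by_dilution):
    """Return activity in AU from a serial dilution series.

    inhibition_by_dilution: dict mapping the dilution factor
    (1 = undiluted, 2 = 1:2, 4 = 1:4, ...) to True if a zone of
    inhibition was observed at that dilution.

    AU is taken as the highest dilution factor that still inhibits,
    scanning from the least diluted sample and stopping at the first
    dilution without inhibition (assumed convention).
    """
    activity = 0
    for dilution in sorted(inhibition_by_dilution):
        if not inhibition_by_dilution[dilution]:
            break
        activity = dilution
    return activity


if __name__ == "__main__":
    # Hypothetical twofold dilution series of a cell-free supernatant.
    series = {1: True, 2: True, 4: True, 8: False, 16: False}
    print(halocin_activity_au(series))
```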

5.4.4.1 Characterization and Purification of Halocins

After achieving significant MIC results, additional steps of characterization and purification are employed. Initial characterization plots halocin activity profiles versus growth phase. This provides insights on the phase of growth during which the halocin is produced. To further characterize halocin activity several parameters are tested: pH, temperature, NaCl concentration, and different solvents. It is worth noting that almost all reported halocins are hydrophobic, and reverse-phase HPLC is commonly employed for their complete purification (Meknaci et al. 2014; Price and Shand 2000).

5.4.5 Bioremediation of Metal(loid)s/Metal Nanoparticles

Resistance of haloarchaeal strains to metal(loid)s can be checked by growing strains in media with increasing concentrations of the respective metals. This will also determine the MIC, which is the minimum concentration of metal(loid)s that inhibits archaeal growth. It is worth mentioning that growth of halophilic archaea in the presence of certain metals such as silver/tellurium and selenium changes their pigmentation from red-orange to black and brick-red, respectively.
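Reading the MIC from such a growth series can be expressed in a few lines. This is a minimal sketch, assuming growth is scored as final optical density and that a culture counts as inhibited below an arbitrary threshold; the threshold value, the units and the function name are assumptions for illustration.

```python
def minimum_inhibitory_concentration(growth_by_concentration, growth_threshold=0.1):
    """Return the lowest tested concentration that inhibits growth.

    growth_by_concentration: dict mapping metal(loid) concentration
    (e.g. in mM) to the final optical density of the culture.
    growth_threshold: optical density below which the culture is scored
    as inhibited (assumed cut-off, to be set per experiment).
    Returns None if growth occurred at every tested concentration.
    """
    inhibitory = [
        concentration
        for concentration, density in growth_by_concentration.items()
        if density < growth_threshold
    ]
    return min(inhibitory) if inhibitory else None


if __name__ == "__main__":
    # Hypothetical readings for one strain and one metal.
    readings = {0.0: 1.20, 0.5: 1.00, 1.0: 0.60, 2.0: 0.05, 4.0: 0.02}
    print(minimum_inhibitory_concentration(readings))
```

The resolution of the result is limited by the spacing of the tested concentrations: the true MIC lies between the highest concentration that still allowed growth and the reported value.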

5.4.5.1 Detection of Metal(loid)s Uptake

Cells grown in the presence of metal(loid)s are hydrolysed using a solution of concentrated nitric acid: sulphuric acid (v/v), followed by complete digestion at 100 °C and analysis by atomic absorption spectrophotometry (AAS) (Das et al. 2014).

5.4.5.2 Characterization of the Nanoparticles

The cells grown in the presence of metal(loid)s are harvested, dialyzed, dried and ground to a fine powder (nm range) using a mortar and pestle. This powder is analyzed using techniques such as scanning electron microscopy-energy dispersive X-ray spectroscopy (SEM-EDX), XRD and TEM. The UV-visible spectra of silver and selenium nanoparticles show absorption maxima at 440 and 270 nm, respectively.

5.5 Current and Future Trends in Mining for Applications

Intensive research efforts currently aim to unleash the full biotechnological potential of halophilic archaea. The recent introduction of genetically optimized, efficient expression systems for genes from halophilic sources has removed a major limitation for large-scale applications. The most promising systems are based on fast-growing aerobic extreme halophiles, such as Haloferax volcanii (Allers et al. 2010) or Halobacterium sp. NRC-1 (Karan et al. 2013), which can even be used for high-yielding protein expression in bioreactors (Strillinger et al. submitted). Additionally, different strategies were reported to optimize E. coli for archaeal protein expression (e.g. Connaris et al. 1998; Cao et al. 2008). With the appropriate molecular biotechnology tools in place, the development of more efficient and reliable bioprospecting tools is underway to eliminate remaining bottlenecks. Comprehension of the full capacity of halophilic archaea will arise from understanding their biodiversity and from detailed insight into their molecular functions. However, since less than 1 % of the viable organisms within a particular niche are cultivable (Amann et al. 1995), accessing and harvesting the genomic material of these microorganisms represents the main challenge. To some extent, introducing specialized laboratory equipment to mimic the extreme conditions of the native habitats will facilitate more efficient laboratory cultivation of halophiles from samples. However, major contributions are expected to come from metagenomic approaches as well as SAG libraries.

5.5.1 Next Generation Sequencing Methods

The advent of cheaper and faster second or next-generation sequencing (NGS) platforms enabled a shift towards novel culture-independent genome and transcriptome analysis methods. These methods are based on direct DNA and/or RNA isolation from environmental samples and fall into the following classes: (i) metagenomics (DNA based), (ii) metatranscriptomics (RNA based) and (iii) single cell genomics (DNA based).

Metagenomic identification of microbial communities commonly relies on sequencing of the 16S rRNA gene; however, the same concept can be applied directly to the sequencing of metagenomic DNA samples (Von Mering et al. 2007). The introduction of metagenomics led to the identification of thousands of novel protein families from diverse environments (Yooseph et al. 2007). Metatranscriptomics based on mRNA (Sorek and Cossart 2010) complements the DNA-based metagenomic approach and provides an understanding of the actively transcribed genes of a microbial population at a given time point in a specific environment. This method requires the isolation of mRNA, which is reverse-transcribed into cDNA before sequencing. The resulting short sequences (reads), typically a few hundred base pairs long for NGS, are subsequently assembled and annotated.

5.5.2 DNA Assembly, the First Milestone for Successful Data Mining

Assembling the comparatively short reads is a challenging task, since the reads are derived from the myriad of organisms that form the sampled community. Hence, the bioinformatic assembly algorithms applied need to accurately resolve the correct position and the specific biological entity (e.g. DNA from microbial genomes, viruses or plasmids) for each read (Mick and Sorek 2014). Every single DNA fragment is therefore compared to all others to identify overlapping sequences as merging points. Bioinformatics challenges include (a) defining the exact length of naturally occurring and quite common repeats (identical sequence repetitions), (b) differentiating between random overlaps and defined overlaps, (c) defining the correct orientation of the DNA sequence, (d) distinguishing sequencing errors from the real sequence, (e) correctly identifying the organism (from the pool of diverse genetic material in the environmental sample) from which the sequence originates, and (f) accounting for different sequencing depths (amount of sequencing). One should note that before sequencing, the DNA is amplified using random primers, which show statistical variations in binding DNA, resulting in regions that are amplified more or less often per amplification cycle. Assembler programs therefore require about 8 copies of each piece of genome (Baker 2012).
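The all-against-all comparison of reads and the notion of sequencing depth can be made concrete with a toy example. The sketch below finds exact suffix-prefix overlaps between reads and computes the mean coverage as (number of reads × read length) / genome length. It is a deliberately naive illustration: it ignores reverse-complement orientation, sequencing errors and repeats (challenges a to d above), scales quadratically with the number of reads, and is not how production assemblers work. All names and the minimum overlap length are hypothetical.

```python
def suffix_prefix_overlap(read_a, read_b, min_length=3):
    """Length of the longest suffix of read_a equal to a prefix of read_b.

    Returns 0 if no exact overlap of at least min_length exists.
    """
    longest_possible = min(len(read_a), len(read_b))
    for length in range(longest_possible, min_length - 1, -1):
        if read_a[-length:] == read_b[:length]:
            return length
    return 0


def all_overlaps(reads, min_length=3):
    """Compare every read with every other read (naive, O(n^2) pairs).

    Returns a dict mapping (index_a, index_b) to the overlap length for
    each ordered pair where the end of read a overlaps the start of read b.
    """
    overlaps = {}
    for i, read_a in enumerate(reads):
        for j, read_b in enumerate(reads):
            if i != j:
                length = suffix_prefix_overlap(read_a, read_b, min_length)
                if length:
                    overlaps[(i, j)] = length
    return overlaps


def mean_coverage(num_reads, read_length, genome_length):
    """Average number of times each genome position is sequenced."""
    return num_reads * read_length / genome_length


if __name__ == "__main__":
    toy_reads = ["ATGCGT", "CGTTAC", "TACGGA"]  # invented sequences
    print(all_overlaps(toy_reads))
    # e.g. 80,000 reads of 100 bp from a hypothetical 1 Mb genome
    print(mean_coverage(80_000, 100, 1_000_000))
```

With a short minimum overlap, chance matches between unrelated reads become likely, which is one reason why distinguishing random from genuine overlaps is listed among the challenges.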

Until recently, the assembly of genomes relied on the genetic material from a single organism or on reference genomes. These approaches led to problems when trying to separate complex metagenomic data into specific biological entities. For samples from archaeal and/or extremophilic communities, which include genomic material that commonly extends far beyond what is covered by reference databases, other assembly strategies are required to interpret metagenomic data without relying on reference sequences. Nielsen et al. (2014) established a new method and demonstrated its power on the analysis of the complex human gut microbiome. The protocol facilitates the extraction of single genomes from complex microbial samples and uses the relative abundance of an organism in the community, which fluctuates over time between different samplings of the same environment. By tracking the changes in abundance of genes between different sampling times, genes showing highly correlated abundance are clustered together. It was shown that such a correlation corresponds to a high probability of belonging to the same genome (Mick and Sorek 2014). Strain-level resolution in metagenomics can be used to identify variations in highly flexible genomic parts, which coexist with the relatively stable core components (Kashtan et al. 2014), and thus provides insight into genes essential for adaptation to dramatic changes in the environment. Those genes may illuminate the microbial mechanisms involved in environmental adaptation. Limitations of this approach include the need for access to a fairly large number of independent samplings of one niche, or related niches, which is required for statistical analysis. However, due to the amount of sequences per sample, the sequencing depth can be reduced (Mick and Sorek 2014).
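The idea of grouping genes by correlated abundance can be sketched as follows. This is a deliberately simplified illustration, not the published protocol of Nielsen et al. (2014): the greedy seed-based grouping, the Pearson threshold of 0.9, the gene names and the abundance values are all assumptions made for the example.

```python
from math import sqrt


def pearson(x: list[float], y: list[float]) -> float:
    """Pearson correlation of two equally long abundance profiles."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    if var_x == 0 or var_y == 0:
        return 0.0  # a constant profile carries no co-variation signal
    return cov / sqrt(var_x * var_y)


def cluster_by_coabundance(
    profiles: dict[str, list[float]], threshold: float = 0.9
) -> list[list[str]]:
    """Greedy grouping: a gene joins the first cluster whose seed gene
    (the first member) it correlates with at or above `threshold`;
    otherwise it starts a new cluster."""
    clusters: list[list[str]] = []
    for gene, profile in profiles.items():
        for cluster in clusters:
            seed_profile = profiles[cluster[0]]
            if pearson(profile, seed_profile) >= threshold:
                cluster.append(gene)
                break
        else:
            clusters.append([gene])
    return clusters


if __name__ == "__main__":
    # Hypothetical gene abundances across four samplings of one environment.
    abundance = {
        "geneA": [10, 20, 5, 40],
        "geneB": [11, 19, 6, 42],
        "geneC": [30, 5, 25, 2],
        "geneD": [28, 6, 27, 1],
    }
    print(cluster_by_coabundance(abundance))
```

With only four samplings, high correlations can arise by chance, which illustrates why the approach depends on a fairly large number of independent samplings.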

The advent of single-cell genomics (Lasken 2007) allowed the identification of different species in an environment, while eliminating the challenge of assigning DNA reads (fragments) to different genomes. Single-cell genomics is based on multiplication of the DNA of a single cell through multiple displacement amplification. As a result, a few femtograms of DNA are enough to provide the microgram amounts of DNA necessary for library construction and sequencing (Lasken 2007). Equal and complete amplification of the minimal amount of source DNA must be achieved to obtain unbiased results, which represents the major challenge of this method.

5.5.3 Current Representation of Archaeal Genomes in Largest Databases

Compared to other domains of life, genomic analysis of archaea is still in its infancy, but interest is growing. Correspondingly, current genomic information of archaeal origin represents only 0.4 to 3 % of the data available from major databases listing genomic and oligonucleotide sequences (Table 5.3). The availability of reference sequences is crucial for genome annotation (see below) and therefore continuous publication of fully assembled and annotated archaeal genomes is required to facilitate genomic assembly, improve reliability and accelerate bioinformatic processing of archaeal data.

Table 5.3 Representation of archaeal genomes in selected large-scale online databases

5.5.4 Genome Annotation, the Second Milestone in Successful Information Mining

Genome annotation connects DNA sequences to biological information. The value of a genome is determined by its annotation (Stein 2001). Inaccurate annotations lead to incorrect in silico identification of enzymes of interest and are particularly problematic when systems biology approaches are used to understand the functions of a cell at the molecular level based on a model of pathways or specific enzymes. Starting from an assembled standard genome, the annotation can be divided into three steps: first, the parts of the genome that do not code for proteins (e.g. non-coding RNA) are excluded; second, the protein-coding genes (open reading frames) in the genome are predicted; and third, a biological function is assigned to the proteins. Depending on the goal of the genome annotation, a further focus might be the identification of regulatory elements or non-coding RNA (e.g. tRNA and rRNA).
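The second step, prediction of open reading frames, can be illustrated with a naive six-frame scan. This is a minimal sketch under stated assumptions: only ATG is treated as a start codon, the three standard stop codons are used, and the minimum length and the example sequence are hypothetical. Production gene finders additionally consider alternative start codons and statistical models of coding sequence.

```python
START_CODON = "ATG"
STOP_CODONS = {"TAA", "TAG", "TGA"}


def reverse_complement(seq: str) -> str:
    """Return the reverse complement of a DNA sequence (A, C, G, T only)."""
    complement = {"A": "T", "C": "G", "G": "C", "T": "A"}
    return "".join(complement[base] for base in reversed(seq))


def find_orfs(seq: str, min_codons: int = 3) -> list[tuple[str, int, int, int, str]]:
    """Scan all six reading frames for ATG...stop stretches.

    Returns tuples of (strand, frame, start, end, sequence). Coordinates
    are 0-based and refer to the strand that was scanned; `end` is
    exclusive and includes the stop codon. `min_codons` counts the
    codons before the stop codon.
    """
    orfs = []
    for strand, dna in (("+", seq), ("-", reverse_complement(seq))):
        for frame in range(3):
            start = None
            for pos in range(frame, len(dna) - 2, 3):
                codon = dna[pos:pos + 3]
                if start is None and codon == START_CODON:
                    start = pos  # open a candidate reading frame
                elif start is not None and codon in STOP_CODONS:
                    if (pos - start) // 3 >= min_codons:
                        orfs.append((strand, frame, start, pos + 3, dna[start:pos + 3]))
                    start = None  # close the frame and keep scanning
    return orfs


if __name__ == "__main__":
    toy_sequence = "CCATGAAACCCGGGTAGTT"  # hypothetical example, not real data
    for orf in find_orfs(toy_sequence):
        print(orf)
```

The predicted reading frames are only candidates; assigning a function to each of them is the third step and the main source of the uncertainty discussed in this section.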

The standard gene annotation approach relies on homology to genes that are already annotated and available from the common genomic databases. Unfortunately, annotation reliability decreases as the primary structures of the two proteins being compared diverge. Since novel genomes from uncommon habitats are expected to show lower homology to any gene described so far, the reliability of their annotation is, in general, decreased. The situation is complicated by error propagation. Moreover, experimental validation of the encoded protein’s function exists only for a small and continuously diminishing fraction of the gene sequences available from databases. Originally, the functions of novel genes were annotated based on gene sequences with experimentally verified function. Based on these newly annotated genes, further genes were annotated, and so on. While two consecutive proteins in such a chain are always highly similar, the similarity between the last annotated gene and the experimentally verified source may be low, depending on how many genes without experimental verification lie in between. From an experimental and protein engineering point of view, faulty annotations are a fundamental problem.
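The chain of transferred annotations described above can be made explicit if each annotation records the gene it was copied from. The sketch below assumes such provenance records exist, which is often not the case in practice; the record layout, gene names and identity values are hypothetical.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Annotation:
    gene: str
    function: str
    source: Optional[str]      # gene the function was transferred from; None = experimentally verified
    identity: Optional[float]  # sequence identity to `source` (0 to 1); None for verified genes


def transfer_chain(gene: str, annotations: dict[str, Annotation]) -> list[tuple[str, str, float]]:
    """Follow an annotation back to its experimentally verified origin.

    Returns one (gene, source, identity) tuple per transfer step, so the
    length of the list is the number of unverified links in the chain.
    """
    chain = []
    visited = set()
    current = gene
    while annotations[current].source is not None:
        if current in visited:
            raise ValueError(f"circular annotation provenance at {current}")
        visited.add(current)
        record = annotations[current]
        chain.append((record.gene, record.source, record.identity))
        current = record.source
    return chain


if __name__ == "__main__":
    # Hypothetical provenance: only g0 has an experimentally verified function.
    records = {
        "g0": Annotation("g0", "amylase", None, None),
        "g1": Annotation("g1", "amylase", "g0", 0.90),
        "g2": Annotation("g2", "amylase", "g1", 0.88),
        "g3": Annotation("g3", "amylase", "g2", 0.91),
    }
    steps = transfer_chain("g3", records)
    print(len(steps), "transfer steps:", steps)
```

Each step in the example links two highly similar sequences, yet nothing in the records guarantees that g3 is still similar to g0. A long chain is therefore a reason to compare the gene directly with the verified source before trusting the annotation.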

Analysis of state-of-the-art annotation pipelines reveals a surprisingly high level of uncertainty in gene annotation. Annotations of the same E. coli strain by the leading annotation pipelines yielded about 5.5 % false positives, and a significantly higher rate of false positives may be expected for novel genomes (Alam et al. 2013; Grötzinger et al. 2014). Hence, several bioinformatics groups work on strategies to increase annotation reliability, typically by including additional data. For example, Alam et al. (2013) combined several strategies, including comparison of predicted 16S rRNA genes with the NCBI prokaryotic 16S rRNA gene database to retrieve taxonomic information and rank the obtained BLAST hits (Altschul et al. 1990). BLAST against several databases resulted in coverage of most known genes. Additionally, the analysis of gene distribution in different pathways helped to evaluate expected and annotated gene presence. Resources such as InterProScan and the InterPro database (Jones et al. 2014; Mitchell et al. 2015) introduced predictions of protein functions based on the number of domains or active sites. Other approaches focus on highly reliable annotation of a selected set of single proteins instead of whole-genome annotation, e.g. when mining genomic data for enzymes of interest to biotechnology (Grötzinger et al. 2014). The analysis of annotation metadata is particularly useful for this approach. These metadata contain information on the presence of conserved domains such as active centers or binding pockets, which can be identified during the annotation process. The presence of domains that are relevant for protein activity should increase annotation reliability. Despite the progress made in the annotation of proteins with described function, the correct assignment of function and pathway location of proteins that are not described remains a major hurdle.

5.5.5 Potential and Challenges of Upcoming Generations of DNA Sequencing

The advent of third-generation DNA sequencing (single-molecule sequencing) not only brings a further reduction in sequencing costs, but also increases read lengths to several thousand base pairs. This not only reduces the complexity of the genome assembly process and of the assignment of reads to specific genomes from a metagenomic DNA pool, but also increases the overall quality of genomes and may therefore even eliminate the concept of draft genomes completely (Land et al. 2015). At the moment, about 10 % of all draft genomes are of too poor quality to be used (Land et al. 2014). Third-generation sequencing can theoretically produce a finished genome in a few hours and simultaneously identify specific methylation sites (Land et al. 2015).

Although DNA assembly might be simplified in the future, the challenge of proper genomic annotation remains, and new challenges will arise from the management of the constantly increasing stream of data. Experimentalists are in need of tools to help them make sense of their massive amounts of data, while bioinformatics research is currently struggling to analyze, compare, interpret and visualize data at the pace at which sequencing throughput increases (Land et al. 2015). Bioinformatics progress relies heavily on the use of supercomputers because the amount and the complexity of genomic data are growing significantly faster than the computing and storage capabilities of current systems. The development of new algorithms will require dividing the entire data processing into more manageable tasks, so that it can be addressed on smaller computer clusters, by cloud computing, or by outsourcing and access via the web. This will assist end-users, as they will not require direct access to a supercomputer.

The need to minimize the amount of metadata attached to every sequenced dataset (Kottmann et al. 2008) illustrates the problems that arise from handling the increasing data volume and complexity. Such metadata include, e.g., the geographic location and habitat from which the sample was taken and details of the sequencing method used, which are necessary for efficient assembly and for assigning specific features such as tolerance to specific extreme environments. However, as described above, the insufficient reliability of annotations for genomic material from uncommon environments mandates that annotation pipelines include a significantly enriched body of metadata. Future annotation of halophilic archaea could particularly benefit from precise metadata-based prediction of domain architecture (e.g. whether functionally associated domains, such as the catalytically active center and the cofactor binding pocket, are in close proximity). In detail, pathway analysis can be used to evaluate how many of the other enzymes required to provide the cofactor or substrate, or to use the product, are represented in the organism. It may therefore provide a reliable measure for the probability of correct annotation.
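The pathway-based plausibility check described above can be expressed as a simple fraction. The sketch below is one possible formulation, not an established scoring scheme: the function name, the enzyme identifiers and the decision to weight all pathway members equally are assumptions made for illustration.

```python
def pathway_support(candidate: str, pathway: set[str], annotated: set[str]) -> float:
    """Fraction of the other enzymes in a pathway that are also annotated
    in the same genome.

    `pathway` holds every enzyme of the pathway, including the candidate;
    `annotated` holds all enzymes annotated in the genome of interest.
    Returns 0.0 when the pathway has no enzymes besides the candidate.
    """
    partners = pathway - {candidate}
    if not partners:
        return 0.0
    return len(partners & annotated) / len(partners)


if __name__ == "__main__":
    # Hypothetical identifiers: a five-enzyme pathway and one genome's annotations.
    pathway = {"E1", "E2", "E3", "E4", "E5"}
    genome_annotations = {"E1", "E2", "E3", "X9"}
    score = pathway_support("E3", pathway, genome_annotations)
    print(f"Support for E3: {score:.2f}")
```

A low value does not prove that an annotation is wrong, since the partner enzymes may themselves be missed or misannotated; it merely flags the candidate for closer inspection.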

Static tables or images such as charts or plots cannot accurately illustrate the highly complex information available within genomic datasets. Therefore, apart from novel algorithms, new approaches to analyse and visualize data are also necessary. Linking integrated databases/warehouses (e.g. INDIGO; Alam et al. 2013) to visualization tools such as Krona (Ondov et al. 2014) can be used to illustrate clusters or correlations of genomic information. The integrated databases provide annotations and quick, direct access to metadata, which can be visualized on multi-level pie charts using standard web browsers. The unprecedented rate of development of genomic sequencing methods has effectively shifted the major costs of biomining from sequencing to genome assembly, functional annotation, and data analysis and management procedures (Land et al. 2015).

The combination of novel culture-independent sequencing techniques with new bioinformatics annotation and data analysis tools now permits the analysis of natural microbial communities in situ. Future results will therefore provide insights into microbial distribution patterns and into their individual (SAGs and assigned genomes) or collective (metagenomics/metatranscriptomics) molecular functions. Hence, a powerful set of techniques is at hand to mine archaeal sources and to harvest an increasing share of the biotechnological potential of halophilic archaea. The appropriate utilization of these tools in combination with laboratory-based analysis will not only increase our understanding of symbiotic and other interactions in microbial communities, but will also provide access to whole sets of enzymes from the same environments. This information can be used to establish multi-enzyme reactions in industry and consequently provide more sustainable solutions for the pharmaceutical and biotechnological industries.

5.6 Research Initiatives of Interest for Bioprospecting Archaea

It is a difficult task to list research initiatives on bioprospecting Archaea. First, there is no agreed definition of the term “bioprospecting”, and although there is a general understanding that it involves research for commercial purposes (outlined in Fig. 5.3), it is usually difficult to distinguish, in practice, between basic and applied research (Arico and Salpin 2005). Additionally, large-scale research initiatives usually have a wide scope, and unsurprisingly no such program has specifically targeted Archaea. However, given their importance in extreme environments and their newly found relevance in marine ecosystems, one can reasonably assume that research initiatives focusing on such locations include Archaea as major targets.

Fig. 5.3 General outline of the different possible steps involved in bioprospecting activities

Research on extremophiles and their applications has boomed recently, as evidenced by an increasing number of patents and of publications in high-impact journals. The importance of this field is further attested by concerted funding initiatives in the USA (NSF and NASA’s programs Life in Extreme Environments, Exobiology and Astrobiology), the EU (Biotechnology of Extremophiles, Extremophiles as Cell Factories, ILEE: Investigating Life in Extreme Environments, and CAREX: Coordination Action for Research Activities on Life in Extreme Environments), and Japan (JAMSTEC Frontier Research System for Extremophiles program) (Jamieson 2015; Rothschild and Mancinelli 2001).

Over the same period, environmental and marine research initiatives and programs have seen an impressive increase in scope, reach, complexity, and dimension. Many of these projects have a global scale and include a very wide variety of measured parameters. A non-exhaustive list of the more visible initiatives would include the Census of Marine Life ( http://www.coml.org ), Global Ocean Sampling (GOS; http://www.jcvi.org/cms/research/projects/gos/overview ), MaCuMBA ( http://www.macumbaproject.eu ), Malaspina ( http://www.expedicionmalaspina.es ), MAMBA ( http://mamba.bangor.ac.uk ), TARA Oceans ( http://www.embl.de/tara-oceans ), and Micro B3 ( http://www.microb3.eu ).

Several other initiatives target general genomic and metagenomic data generation, frequently with the aim of filling current gaps in our understanding of specific environments or phylogenetic groups. Noteworthy examples include the Earth Microbiome Project (EMP; www.earthmicrobiome.org ), the Genomic Encyclopedia of Bacteria and Archaea (GEBA; www.jgi.doe.gov/programs/GEBA ), and the Marine Microbial Genome Sequencing Project ( http://camera.calit2.net/microgenome ).

Microbial Biological Resource Centers (mBRCs) and culture collections also play an important role, fueling the bio-economy as sources of microbiological resources, data, and expertise. It is worth noting the current programs that are moving towards regional integration of mBRCs and the promotion of a more active interaction with industry (e.g. the EU-funded Microbial Research Infrastructure; www.mirri.org ). Closer interactions between industrial and research institutions are further highlighted by the recent wave of clusters formed within the Bioindustrie 2021 initiative, which is funded by the Bundesministerium für Bildung und Forschung in Germany ( http://www.bioindustrie2021.eu ) and aims to foster innovation in bioproducts (e.g. biofuels, biopolymers, and biocatalysts).

5.7 Overview and Conclusions

Archaea were originally perceived as evolutionary oddities with restricted importance. However, a significant shift in our understanding of their diversity, ecology, and impact is currently under way. Increased exploration efforts in multiple environments, and the continued development, and application of new methodologies for cultivation, molecular-based studies, and in silico approaches will further promote this shift, and are expected to lead the way towards a wave of new discoveries. Furthermore, correct annotation of genomes still remains one of the major challenges in genomic data mining. Different strategies are evolving and improved algorithms together with experimental data established in the laboratory are poised to handle these challenges.

Halophilic archaea are a prime example of the increasing reach and range of applications and are perceived as rising stars for industrial biotechnology (e.g. biocatalysis, bioengineering, biofuel, pharmaceuticals). Further bioprospecting initiatives will foster new innovations in bioproducts, and help to fuel the bio-economy.