1.1 Introduction

Archaea are prokaryotes and as such share many properties with bacteria including circular genomes, densely packed with genes organised into operons. However, their transcription machinery is closely related to that of RNA polymerase II, the enzyme responsible for mRNA transcription in eukaryotes (Fig. 1.1). This similarity extends from the RNA polymerase (RNAP) subunit composition, via general transcription factors required for initiation, to their cognate promoter elements (Fig. 1.1a, b) (Werner and Grohmann 2011). In essence, archaeal transcription involves a eukaryotic-like machinery acting upon a bacterial-like template, making it an interesting and important subject to study. Archaeal transcription can be considered a simpler, stripped-down version of the RNAPII system, generally consisting of fewer and smaller components that facilitate the basic mechanisms of transcription. These are often obscured by the baroque complexity in eukaryotes—making archaea invaluable tools to dissect them. In vitro studies of archaeal transcription have focused on hyperthermophilic archaea due to their high biochemical tractability including the in vitro assembly of RNAPs from Methanocaldococcus jannaschii and Pyrococcus furiosus from individual recombinant subunits under defined conditions in the test tube (Naji et al. 2007; Smollett et al. 2015; Werner and Weinzierl 2002). This approach has not been successful with any eukaryotic RNAP thus far, and archaea have therefore provided invaluable model systems to elucidate the molecular mechanisms of RNAPII transcription (Fouqueau et al. 2013; Grohmann et al. 2011; Hirtreiter et al. 2010a, b; Kostrewa et al. 2009; Tan et al. 2008; Werner and Weinzierl 2005). 
Whilst such recombinant systems are required to carry out a definitive functional dissection of transcription, less attention has been paid to the systems-level properties of the basic transcription machinery in archaea, including whole-genome occupancy, transcription start site mapping and transcriptome mapping. As high-throughput sequencing technologies have become more accessible, new avenues of research have become possible. In this chapter, we outline how systems biology can complement classical biochemistry/structural biology, and how this enhances our understanding of the different stages of the transcription cycle and the structure and function of chromatin (Fig. 1.2).

Fig. 1.1

Evolution of the basal transcription machinery in the three domains of life. (a) Table of RNAP subunits and general transcription factors in the three domains of life. The columns represent the single RNAP transcription systems in bacteria, eury- and crenarchaea, and the three orthodox RNAPI, II and III systems in eukaryotes. The rows depict homologous factors, while functionally analogous but evolutionarily unrelated factors are shown as separated fields with dashed borders. Factors are colour-coded according to their function in transcription initiation (green), elongation (blue) and transcript cleavage (red). Subunits and factors that are not conserved in all domains are indicated with asterisks. (b) Schematic representation of key events in the evolution of the basal transcription machinery. General transcription factors are coloured as in panel a, with chromatin proteins added in purple

Fig. 1.2

The archaeal transcription cycle. Transcription initiation is a recruitment cascade: the TATA and BRE promoter motifs sequester TBP and TFB, respectively, which in turn recruit RNAP to form the PIC. TFE stimulates DNA strand separation in the IMR of the promoter, which stabilises the PIC. The later stages of initiation involve the synthesis of abortive transcripts and promoter escape, which is likely facilitated by the swapping of TFE for Spt4/5, forming a processive transcription elongation complex. During elongation additional factors including transcript cleavage factors and Spt4/5 ensure highly processive transcription. At the 3′ end of the gene transcription is terminated by short poly-U signatures, and likely by hitherto uncharacterised termination factors

1.2 The Basal Transcription Machinery and the Archaeal Transcription Cycle

1.2.1 Promoter Recognition and Recruitment of the RNAP

In all domains of life transcription is initiated by the recruitment of basal, or general, transcription initiation factors to the promoter. Most archaeal promoters rely on three elements: the TATA box, B-recognition element (BRE) and the Initiator (Inr). TATA box and BRE are DNA sequence recognition motifs of the two general transcription factors TBP and TFB, respectively (Bell et al. 1999; Qureshi et al. 1995; Rowlands et al. 1994); both TBP and TFB are necessary and sufficient to facilitate promoter-directed transcription in vitro (Werner and Weinzierl 2002). TBP and TFB are homologous to eukaryotic TBP and TFIIB (Fig. 1.1), respectively, and have identical functions, albeit with faster DNA-binding dynamics (Gietl et al. 2014) that may reflect different mechanisms of regulation (Blombach and Grohmann 2017). Global mapping of transcription start sites (TSSs) and subsequent promoter sequence analysis confirm in vitro observations inasmuch as TATA and BRE motifs are dominant elements in most archaeal promoters, with a few notable exceptions including the M. jannaschii ribosomal RNA promoter (Figs. 1.3 and 1.4) (Babski et al. 2016; Cho et al. 2017; Jäger et al. 2009, 2014; Li et al. 2015; Smollett et al. 2017; Wurtzel et al. 2010). This is in contrast to eukaryotes where strong TATA motifs (i.e., close to consensus sequence) are absent from the majority of promoters (Yang et al. 2007). Recently we have used chromatin immunoprecipitation followed by high-throughput sequencing (ChIP-seq) to characterise how promoter elements direct the recruitment of TBP, TFB and RNAP in vivo in the euryarchaeon M. jannaschii (Smollett et al. 2017). While BRE and TATA elements are the main contributors to promoter strength in vitro, there is only a weak correlation between the BRE/TATA consensus score and either TBP/TFB ChIP signals or RNA steady-state levels in vivo. 
However, TBP/TFB binding does correlate with RNAP occupancy, which in turn correlates moderately well with RNA levels (Smollett et al. 2017). This shows that TBP and TFB direct pre-initiation complex (PIC) formation, and RNAP recruitment and loading into the transcription unit (TU) (Fig. 1.4). Yeast promoters show likewise little correlation between TATA box motif and TBP binding, with RNA levels being proportional to TBP occupancy (Kim and Iyer 2004). There could be several reasons for the discord between promoter motif strength and the binding of initiation factors in archaea and eukaryotes. In particular, the availability of the DNA template to the TBP, TFB and RNAP can be regulated by alternative chromatin structures, and gene-specific regulators may either enhance or inhibit PIC assembly (see Sect. 1.3). Several archaea encode multiple variants of TBP and TFB, in particular halophilic species such as Halobacterium NRC-1 contain 6 TBP and 7 TFB variants (Baliga et al. 2000); it has been proposed that the combination of TBP and TFB variants can direct a degree of promoter-specific regulation of transcription akin to bacterial sigma factors (Facciotti et al. 2007). A combination of different TBP/TFB deletion strains and ChIP analyses has revealed that only some TBP and TFB variants are essential and that different combinations of TBP/TFB bind to distinct promoters in vivo. Many promoters were associated with multiple TFB variants demonstrating a significant degree of redundancy (Facciotti et al. 2007), while subtle sequence biases in the BREs account for preferential binding of the different TFB variants (Seitzer et al. 2012).

Fig. 1.3

Comparison of promoter consensus motifs in different archaea. Alignment of the DNA sequences upstream of TSSs identified on a genome-wide scale identifies individual promoter elements including BRE, TATA box, IMR and Inr elements surrounding the TSS. Alignment of primary TSSs identified by whole genome sequencing of M. jannaschii (Smollett et al. 2017), Methanosarcina mazei (Jäger et al. 2009), Methanolobus psychrophilus (Li et al. 2015), Thermococcus kodakarensis (Jäger et al. 2014), T. onnurineus (Cho et al. 2017), Haloferax volcanii (Babski et al. 2016) and Sulfolobus solfataricus (Wurtzel et al. 2010). Alignment visualised using WebLogo 3 adjusting to the background GC content for each organism (31.3% M. jannaschii, 41.5% M. mazei, 44.6% M. psychrophilus, 52% T. kodakarensis, 51.3% T. onnurineus, 65.5% H. volcanii, 35.8% S. solfataricus, http://weblogo.threeplusone.com/). Inset shows TATA box motif determined from same DNA sequences using MEME (http://meme-suite.org/tools/meme-chip). Adapted from Smollett et al. (2017)

Fig. 1.4

Archaeal promoter elements govern transcription initiation. The interactions between promoter motifs (BRE/TATA), and sequence specific DNA-binding initiation factors (TBP/TFB) recruit RNAP to the promoter. During open complex formation the DNA strands of the promoter are separated within the IMR, an AT-rich region spanning from −12 to +2 relative to the TSS. TFE aids this process and stabilises the open PIC. The Inr surrounding the TSS plays an important role for the precise selection of the transcription start site

1.2.2 Stabilisation of the PIC by Open Complex Formation

The transcription initiation factor TFE (homologous to eukaryotic TFIIE) enhances the stability of the PIC by aiding DNA strand separation and loading of the template strand into the active site of RNAP (Blombach et al. 2015, 2016; Grohmann et al. 2011). This process is referred to as ‘open complex’ formation. The regulation of open complex formation is a crucial step in defining transcription output across all domains of life (reviewed in Blombach et al. 2016). The region of DNA to be separated, the initially melted region (IMR), extends from position −12 to +2 relative to the TSS (Bell et al. 1998; Blombach et al. 2015; Nagy et al. 2015). Global sequence analysis reveals that the IMR does not contain a specific sequence motif, but throughout the archaea it has a significantly higher A and T content than the genome average, particularly at the upstream edge (Fig. 1.3) (Smollett et al. 2017). As A-T basepairs require less energy for DNA strand separation than G-C basepairs, the AT-bias may have been selected to facilitate open complex formation (Fig. 1.4), although there is no correlation between AT content and promoter strength. This is similar to the bacterial −10 element, which is also AT-rich and forms the upstream edge of the transcription bubble (Sasse-Dwight and Gralla 1989; Zuo and Steitz 2015). Short AT-rich DNA motifs (IMR, −10 element) and factors (TFE, sigma, CarD, TFIIH) that contribute to open complex formation and stability have coevolved in all domains of life (Fig. 1.1). The eukaryotic counterpart of TFE, TFIIE, is a dimeric factor consisting of subunits TFIIEα and TFIIEβ. Many archaea employ monomeric TFE variants (homologous to TFIIEα), whereas crenarchaeal TFE variants are α/β heterodimers (Blombach et al. 2009, 2015, 2016).
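As a rough illustration of the AT-bias comparison described above, the following Python sketch computes the A+T fraction of the −12 to +2 window around a TSS and compares it with that of a whole sequence. The function names, the toy sequence and the restriction to plus-strand TSSs are assumptions made for illustration only; they are not taken from the cited analyses.

```python
def at_fraction(seq: str) -> float:
    """Return the fraction of A/T bases in a DNA sequence (case-insensitive)."""
    seq = seq.upper()
    if not seq:
        raise ValueError("empty sequence")
    return sum(base in "AT" for base in seq) / len(seq)


def imr_window(genome: str, tss_index: int, upstream: int = 12, downstream: int = 2) -> str:
    """Return the plus-strand window from -12 to +2 around a TSS.

    tss_index is the 0-based index of the +1 nucleotide. Promoter numbering
    has no position 0, so -12..-1 are the 12 bases before the TSS and
    +1..+2 are the TSS base and the base after it (14 bases in total).
    """
    start = tss_index - upstream
    end = tss_index + downstream
    if start < 0 or end > len(genome):
        raise ValueError("window extends beyond the sequence")
    return genome[start:end]


# Toy sequence (hypothetical): GC-rich flanks around an AT-rich IMR.
toy_genome = "GCGCGC" + "ATATATATATAT" + "AG" + "GCGC"
toy_tss = 18  # 0-based index of the +1 base

imr = imr_window(toy_genome, toy_tss)
print(f"IMR AT fraction:    {at_fraction(imr):.2f}")
print(f"Genome AT fraction: {at_fraction(toy_genome):.2f}")
```

For a genome-wide analysis one would average the IMR value over all mapped TSSs and handle minus-strand TSSs by reverse-complementing the window first.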

1.2.3 Selection of the Transcription Start Site

Genome-wide studies show that mammalian genes can be transcribed from multiple promoters using multiple TSSs (Sandelin et al. 2007). In bacteria, the discriminator promoter element is important for genome-wide start site selection (Winkelman et al. 2016), and the bacterial core recognition element has been shown to influence TSS selection by interactions between a G nucleotide at register +2 in the non-template strand and the core RNAP (Vvedenskaya et al. 2016). The archaeal Inr consists of a dinucleotide motif ‘−1T+1[A/G]’ which directs precise start site selection in vivo (Figs. 1.3, 1.4 and 1.5) (Smollett et al. 2017). The Inr is a common feature in archaeal promoters, although the prevalence can vary between closely related species, e.g., it is present in the promoters of Thermococcus onnurineus, but not in T. kodakarensis (Fig. 1.3) (Cho et al. 2017; Jäger et al. 2014; Smollett et al. 2017). This suggests that TSS precision is not selected for in some organisms, or may be compensated for by other factors. The archaeal Inr, essentially a preference for purine at the +1 and pyrimidine at the −1 position, is conserved in bacterial and eukaryotic promoters (Kadonaga 2012; Shultzaberger et al. 2007). Structural analyses suggest that base stacking interactions between the −1 nucleotide of the template strand and the initiating NTP play a role in template DNA strand stabilisation within the PIC (Basu et al. 2014).
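The dinucleotide definition of the archaeal Inr given above lends itself to a simple check. The sketch below tests whether a mapped TSS carries a T at −1 and a purine at +1 on the non-template strand, and reports the fraction of promoters that do. The function names and example sequences are hypothetical, and the sketch assumes sequences are already oriented 5′→3′ in the direction of transcription.

```python
from typing import Iterable, Tuple


def has_inr(seq: str, tss_index: int) -> bool:
    """Return True if the sequence has T at -1 and A or G at +1.

    seq is the non-template strand, 5'->3'; tss_index is the 0-based
    index of the +1 nucleotide.
    """
    if tss_index < 1 or tss_index >= len(seq):
        raise ValueError("TSS index must leave room for the -1 position")
    minus_one = seq[tss_index - 1].upper()
    plus_one = seq[tss_index].upper()
    return minus_one == "T" and plus_one in ("A", "G")


def inr_frequency(promoters: Iterable[Tuple[str, int]]) -> float:
    """Fraction of (sequence, tss_index) pairs that match the Inr dinucleotide."""
    hits = [has_inr(seq, idx) for seq, idx in promoters]
    if not hits:
        raise ValueError("no promoters supplied")
    return sum(hits) / len(hits)


# Hypothetical promoters: two match the Inr, one does not.
examples = [("GGTACC", 3), ("GGCACC", 3), ("ggtgcc", 3)]
print(f"Inr frequency: {inr_frequency(examples):.2f}")
```

Comparing this frequency between species, as in the T. onnurineus and T. kodakarensis example, would additionally require correcting for background base composition.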

Fig. 1.5

The Inr motif influences TSS selection. Many archaeal promoters include an Inr motif (−1T/+1[A/G]) and utilise single or multiple TSSs (+1). (a) A strong Inr motif will direct one specific TSS resulting in transcripts with identical 5′-termini. (b) Promoters with a weaker Inr motif will direct transcription from several TSSs leading to RNA species with heterogeneous 5′-termini (Smollett et al. 2017)

1.2.4 Promoter Escape Facilitated by Factor Swapping

All RNAPs face a similar mechanical engineering challenge: while a network of high affinity interactions between promoter-bound initiation factors and RNAP is essential to facilitate efficient recruitment and PIC formation, the escape of RNAP from the promoter (i.e., productive transcription) requires that this network is dismantled (Werner 2012). Spt4/5 is homologous to DSIF in humans and NusG in bacteria, and it is the only RNAP-associated transcription factor that is universally conserved in all domains of life (Fig. 1.1) (Werner 2012). Spt4/5 is not essential for transcription in vitro, but ChIP-seq profiles demonstrate that it associates with elongating RNAPs throughout the genome, on coding as well as noncoding TUs. As Spt4/5 and the initiation factor TFE bind to the RNAP clamp in a mutually exclusive manner in vitro, we have proposed that this exchange, or swap, between TFE and Spt4/5 occurs every time the RNAP progresses through the transcription cycle, and that the swap could enhance promoter escape (Grohmann et al. 2011; Werner 2012). Spt4/5 is recruited proximal to the promoter in vivo, in agreement with facilitating the transition from initiation to elongation (Figs. 1.6 and 1.7a) (Smollett et al. 2017). This is different from bacterial NusG, which is recruited to the TEC in a stochastic fashion, and it is similar to the early recruitment of Spt4/5 in yeast (Mayer et al. 2010). In addition, a similar exchange between TFIIE and Spt4/5 has been shown at RNAPII promoters (Diamant et al. 2016; Larochelle et al. 2012).

Fig. 1.6

An integrated view of transcription in archaea. ChIP occupancy profiles of the basal transcription machinery reflect the binding of PICs to promoters, and the distribution of RNAP-Spt4/5 TECs within the coding region as shown for the M. jannaschii hsp60 TU. Both plus- and minus-strand RNA steady-state levels serve as proxy for transcription output of RNAP. Interestingly, RNAPs do not strictly require Spt4/5 for transcription elongation in vitro, yet Spt4/5 closely follows RNAP in a genome-wide fashion, behaving as an ‘honorary’ RNAP subunit (Blombach et al. 2016; Smollett et al. 2017)

Fig. 1.7

Two modes of Spt4/5 recruitment to RNAP. Global occupancy profiling of M. jannaschii RNAP and Spt4/5 reveals two patterns of recruitment. (a) Spt4/5 is recruited to RNAP proximal to the promoter at the majority of transcription units. This recruitment profile supports the theory that swapping between initiation factor TFE and elongation factor Spt4/5 aids promoter escape. (b) At a small subset of genes including the rRNA and CRISPR loci Spt4/5 is recruited hundreds of base pairs downstream of the transcription start site, during the early elongation phase of the transcription cycle (Smollett et al. 2017)

1.2.5 An Alternative Mode of Spt4/5 Recruitment

Genome-wide occupancy analysis allows us to not only define the ‘norm’ but also identify notable exceptions to the promoter-proximal Spt4/5 recruitment model (Fig. 1.7) (Mooney et al. 2009; Smollett et al. 2017). These exceptions include the ribosomal RNA operons and the abundant CRISPR loci where Spt4/5 is recruited during transcription elongation hundreds of base pairs downstream of the TSS. The underlying mechanism behind this ‘delayed’ recruitment is currently not known, but likely involves novel gene-specific transcription factors, strong RNA secondary structure or co-transcriptional processing, all of which are relevant for rRNA and CRISPR transcripts. The Sulfolobus solfataricus and Pyrococcus furiosus rRNA promoters have well defined BRE/TATA motifs and are very strong in vitro (Blombach et al. 2015; Micorescu et al. 2008; Qureshi et al. 1997). However, the M. jannaschii rRNA promoter shows surprisingly poor promoter motifs, and performs weakly in vitro, in apparent contrast with the high RNA levels and RNAP occupancy on the rRNA operons in vivo (Smollett et al. 2017). The lack of strong promoter motifs is akin to bacterial rRNA promoters, which tend to form unstable PICs, making them more amenable to regulation (Jensen and Pedersen 1990). It is possible that unknown transcription factors mask the Spt4/5 binding site on RNAP (clamp coiled coil) and activate M. jannaschii rRNA promoters. Alternatively, efficient promoter escape may occur at the weak rRNA promoter without Spt4/5. This is not the case for the CRISPR loci, which have multiple promoters with strong matches to the consensus sequence (Smollett et al. 2017).

1.2.6 Termination of Transcription

Transcription termination remains one of the least understood mechanisms of gene expression in archaea. Specific DNA sequences and auxiliary factors can slow down the TEC and trigger dissociation of the TEC into RNAP, transcript and template, but the precise mechanisms and order of events are unclear. While the fundamental process appears conserved in all multisubunit RNAPs, the requirements for DNA sequence motifs and exogenous termination factors differ substantially (Epshtein et al. 2007, 2010; Porrua et al. 2016; Proudfoot 2016). Bacterial intrinsic terminators consist of a short RNA hairpin structure and a poly-U stretch; these terminators induce pausing and enable RNAP to undergo conformational changes such as an opening of the RNAP clamp (Hein et al. 2014), a process likely facilitated by the RNA hairpin that invades the DNA binding channel of RNAP (Epshtein et al. 2007). These allosteric changes lead to the dissociation of the TEC with the last residue of the poly-U stretch forming the RNA 3′ terminus (Ray-Soni et al. 2016). The limited number of archaeal terminators that have been studied in vitro share the requirement for a poly-U stretch (5–8 U-residues), but are not dependent on any RNA secondary structure elements, reminiscent of the eukaryotic RNAPIII system (Hirtreiter et al. 2010a; Santangelo et al. 2009; Santangelo and Reeve 2006; Spitalny and Thomm 2008) (Fig. 1.8a). This suggests that the termination mechanism is conserved across all domains of life, but that the archaeal and RNAPIII TECs dissociate more readily than the bacterial TEC, without the intervention of exogenous factors or RNA hairpins. It is noteworthy that one of the key differences between bacterial and archaeal RNAPs is the Rpo4/7 stalk domain, which enhances transcription termination and has been likened to an ‘inbuilt’ NusA elongation factor (Belogurov and Artsimovitch 2015; Hirtreiter et al. 2010a).

Fig. 1.8

Transcription termination in archaea. (a) Short poly-U stretches are implicated in triggering transcription termination in vitro and in vivo, and global RNA 3′ mapping demonstrates that transcript 3′ termini consist of U-residues for 30–40% of TUs genome-wide. (b) Leaky termination can lead to alternative and extended 3′-UTRs, which can result in antisense transcript overlap between two genes organised in a convergent orientation, or read-through into downstream TUs. (c) Archaeal genomes encode putative termination factors including 5′→3′ RNases and RNA helicases. In eukaryotes and bacteria factors with these activities facilitate transcription termination by ‘torpedo’ mechanisms

The genome-wide RNA 3′ termini of a euryarchaeon (Methanosarcina mazei) and a crenarchaeon (Sulfolobus acidocaldarius) have been mapped at base pair resolution using a systems biology approach (Term-seq) (Dar et al. 2016a, b). In agreement with the mechanisms characterised in vitro, the Term-seq dataset revealed that termination occurred in vivo immediately downstream of a poly-U motif without the need for RNA secondary structure elements (Dar et al. 2016a). In approximately half of convergent (i.e., head-to-head oriented) genes in S. acidocaldarius the terminator signal of a given TU was located in the coding region of the other TU, resulting in a potential antisense transcript overlap. This could be due to the high coding density of archaeal genomes and the resulting short intergenic regions, or have regulatory significance. Many TUs were associated with multiple RNA 3′ termini, likely due to inefficient termination. Such ‘leaky’ termination could direct the synthesis of RNA isoforms that differ in the 3′-untranslated region (3′-UTR) targeted by small regulatory RNAs (Fig. 1.8b) (Dar et al. 2016a). However, Term-seq results have two principal caveats. Firstly, Term-seq cannot discriminate between ‘native’ RNA 3′ ends generated by transcription termination and ‘processed’ RNA 3′ ends resulting from nucleolytic digestion, either RNA processing or degradation. Secondly, termination motifs and RNA 3′ ends could only be identified in 30–39% of TUs, which suggests that alternative or additional termination mechanisms are at work, including template topology (positive supercoiling in hyperthermophiles) and hitherto unidentified termination factors. Strong terminator (poly-U) signals are present in intragenic regions but only 25% of these led to transcription termination. 
This could be due to transcription antitermination, a well-described phenomenon in bacteria that relies on factors that are conserved between bacteria and archaea including NusG (Spt4/5), NusA, NusE, and co-translating ribosomes (Santangelo and Artsimovitch 2011; Santangelo et al. 2008). Little is known about archaeal termination factors, but we can speculate about their properties. Both bacterial and eukaryotic termination factors use a ‘torpedo’ mechanism, i.e. they engage with the nascent transcript, translocate along the RNA in the 5′→3′ direction, and ultimately dissociate the TEC upon impact. 5′→3′ RNases (Xrn2 in mammals, Rat1 in yeast) and RNA-helicases (Rho factor in bacteria, Sen1 in eukaryotes) facilitate transcription termination in this fashion (Fig. 1.8c) (Han et al. 2016; Kim et al. 2004; El Hage et al. 2008; West et al. 2004; Epshtein et al. 2010; Porrua and Libri 2013). Archaeal genomes encode several candidates for torpedo-factors but none have been experimentally tested yet (Phung et al. 2013).
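To make the poly-U criterion concrete, the sketch below scans the last few nucleotides of a mapped transcript 3′ end for a run of U residues. The ten-nucleotide window and the function names are illustrative assumptions; the default threshold of five follows the lower bound of the 5–8 U-residues quoted above. As noted in the text, such a sequence test cannot by itself distinguish native termination sites from processed 3′ ends.

```python
import re


def longest_u_run(rna: str, window: int = 10) -> int:
    """Length of the longest run of U within the last `window` nucleotides.

    DNA-style input is accepted: T is treated as U.
    """
    tail = rna.upper().replace("T", "U")[-window:]
    runs = re.findall(r"U+", tail)
    return max((len(run) for run in runs), default=0)


def has_poly_u_terminus(rna: str, min_run: int = 5, window: int = 10) -> bool:
    """Return True if the 3' end carries a U run of at least `min_run` residues."""
    return longest_u_run(rna, window) >= min_run


# Hypothetical 3' ends: the first ends just downstream of a U6 tract.
for end in ("ACGGCAUUUUUUG", "ACGGCAUCGAUCG"):
    print(end, has_poly_u_terminus(end))
```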

1.3 Additional Factors Affecting Transcriptional Output

1.3.1 Gene-Specific Transcription Regulators

The lack of a strong correlation between BRE/TATA promoter motifs and RNA levels genome wide (Kim and Iyer 2004; Smollett et al. 2017) suggests that additional forces are at work, including gene-specific regulators. The molecular mechanisms of several archaeal metabolic and stress response regulators have been elucidated in vitro, and their regulons characterised by ChIP-chip and ChIP-seq methods (Liu et al. 2016; Nguyen-Duc et al. 2013; Reichelt et al. 2016; Rudrappa et al. 2015; Tonner et al. 2015; Wilbanks et al. 2012). Archaeal regulators operate by a range of different mechanisms including repression by promoter occlusion and activation by enhancing the recruitment of the PIC; the mode of action of the same factor can depend on the location of the binding site relative to the promoter (Aravind and Koonin 1999; Charoensawan et al. 2010; Dahlke and Thomm 2002; Geiduschek and Ouhammouch 2005; Kanai et al. 2007; Lee et al. 2008; Lipscomb et al. 2009; Ochs et al. 2012; Peeters et al. 2013, 2015; Perez-Rueda and Janga 2010). Transcription regulators are described in greater detail in another chapter of this tome; we will only briefly mention one example below.

The M. jannaschii Lrp-type regulator Ptr2 is an excellent example of how in vitro and in vivo approaches can complement each other. Ptr2 activates transcription from the rb2 promoter by recruiting TBP to the TATA box, a mechanism that was elucidated by elegant in vitro transcription experiments in the Geiduschek laboratory (Ouhammouch and Geiduschek 2001; Ouhammouch et al. 2003, 2005). Whole genome occupancy studies of M. jannaschii TBP validated this mechanism in vivo. By analysis of promoter sequences genome-wide, each TATA motif could be assigned a score that quantified its similarity to the global TATA consensus, i.e. the ideal TBP binding site. Subsequently, TBP binding to specific promoters could be predicted using a linear regression model, and compared to the actual occupancy of TBP experimentally determined by ChIP-seq. In the case of the rb2 promoter the actual TBP occupancy far exceeded the predicted one (predicted 0.1 vs. observed 1 Log2[IP/input]), which is congruent with the notion that TBP recruitment in vivo is strongly enhanced by Ptr2 (Smollett et al. 2017).
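The logic of comparing predicted and observed occupancy can be sketched in a few lines of Python. The example fits an ordinary least-squares line of ChIP occupancy against motif score and then reports how far a given promoter lies above the prediction. All numbers, names and the choice of a single-predictor fit are assumptions for illustration; they are not the data or the exact model used in the cited study.

```python
from typing import Sequence, Tuple


def fit_line(xs: Sequence[float], ys: Sequence[float]) -> Tuple[float, float]:
    """Ordinary least-squares fit of y = slope * x + intercept."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need at least two paired observations")
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    if sxx == 0:
        raise ValueError("all x values are identical")
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


def excess_occupancy(score: float, observed: float, slope: float, intercept: float) -> float:
    """Observed minus predicted occupancy; large positive values hint at activation."""
    return observed - (slope * score + intercept)


# Made-up training data: TATA consensus scores vs. TBP Log2[IP/input].
scores = [0.0, 1.0, 2.0, 3.0]
occupancy = [0.0, 0.5, 1.0, 1.5]
slope, intercept = fit_line(scores, occupancy)

# A hypothetical activated promoter with a weak motif but high occupancy.
print(excess_occupancy(score=0.2, observed=1.0, slope=slope, intercept=intercept))
```

Promoters with a large positive excess are candidates for activator-assisted TBP recruitment, whereas a large negative excess could indicate repression or an inaccessible template.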

1.3.2 The Impact of Chromatin Structure on Transcription

All cellular genomes are organised and compacted by DNA-binding proteins that protect the DNA while still allowing the access of molecular machines that facilitate DNA replication, repair and recombination and last but not least transcription (Ammar et al. 2012; Cubonovaa et al. 2012; Peeters et al. 2015; Visone et al. 2014; Xie and Reeve 2004). In eukaryotes, histone-based chromatin has evolved into a major regulatory mechanism, with hundreds of post-translational modifications and remodelling complexes facilitating the precise execution of the genetic programme. Many archaea encode histone homologues, but it remains to be proven to which extent histone-based chromatin regulates gene expression in archaea. In addition to regulatory functions, histones are likely to protect the genomes of hyperthermophiles from thermal denaturation (Visone et al. 2014). Archaea vary in their repertoire of histones and other chromatin proteins (Peeters et al. 2015). Small archaeal chromatin proteins with the ability to bind and condense DNA were first described in Thermoplasma acidophilum (DeLange et al. 1981a, b; Searcy 1975; Searcy and Delange 1980), but archaeal histones were first characterised in the hyperthermophile Methanothermus fervidus (Sandman et al. 1990). In vitro experiments using a limited number of factors (TBP, TFB and RNAP) have shown that histones inhibit transcription under these conditions, but it remains unknown how additional general factors such as TFE, Spt4/5 and TFS assist RNAP transcribing through chromatin (Wilkinson et al. 2010; Xie and Reeve 2004).

While histones are not essential for cell viability in some archaea, deletion of histones changes the transcriptome by both up- and downregulating genes (Cubonovaa et al. 2012; Heinicke et al. 2004; Nalabothula et al. 2013). In eukaryotes this regulation chiefly occurs via post-translational modifications of the histone tails (Bannister and Kouzarides 2011). Archaeal histones generally encompass only the histone fold and lack the tails of their eukaryotic counterparts. In M. jannaschii, no histone modifications could be identified in a top-down mass spectrometry approach (Forbes et al. 2004). However, most archaea with histones encode multiple paralogues, enabling different combinations of histone homo- and heterodimers to form alternative chromatin structures, either at specific regulatory sequences, at different genomic loci or TUs, or under different growth conditions. For example, in M. fervidus the expression levels of histone HMfA are higher than HMfB during exponential growth but decrease in stationary phase, a change which may result in more compact chromatin (Sandman et al. 1994).

High-throughput sequencing approaches including nucleosome sequencing have mapped the genome-wide histone occupancy, and identified the optimal archaeal histone DNA binding site, which is near-identical to that of eukaryotes and reflects a basepair sequence that enables DNA curvature/bending (Ammar et al. 2012; Maruyama et al. 2013; Nalabothula et al. 2013). Generally, archaeal histones dimerise in solution and interact with 30 bp of DNA. Limited MNase digestion of chromatin isolated from Haloferax volcanii resulted in a nucleosome ladder with 60 bp steps corresponding to histone tetramers (Ammar et al. 2012), while chromatin from Thermococcus kodakarensis and Methanothermobacter thermautotrophicus resulted in a pattern with 30 bp steps, indicative of histone dimers (Maruyama et al. 2013; Nalabothula et al. 2013). Both observations are congruent with a chromatin model where histones polymerise upon DNA binding (Fig. 1.9a). As is seen in eukaryotes, promoter regions and specific genomic loci including the highly transcribed rRNA operons are apparently devoid of histone binding (nucleosome-free regions or NFR). Moreover, MNase digestion of in vitro reconstituted chromatin reproduces this pattern (Maruyama et al. 2013; Nalabothula et al. 2013). This not only suggests that the DNA sequence alone is sufficient to organise chromatin structure, but also implies that on-going transcription has little influence on the deposition of histones across the genome, and altogether emphasises a possible role of histones in transcription regulation (potential mechanisms are shown in Fig. 1.9b, c).

Fig. 1.9

Interference of chromatin and transcription in archaea. (a) In euryarchaea histones compact and organise the genome into dynamic chromatin fibres that grow and shrink by association or dissociation of histone dimers at each end. These chromatin structures can interfere with transcription in a number of ways: (b) by denying access of initiation factors to promoter or (c) by providing a barrier to TEC

1.4 The Output of the Transcription System

1.4.1 The Archaeal Transcriptomes

The introduction of high-throughput sequencing approaches to determine global RNA levels provides a significant improvement over hybridisation-based approaches such as microarrays in terms of the dynamic range and the detection of low-abundance transcripts (Zhao et al. 2014). RNA-seq data for several euryarchaeal species (Babski et al. 2016; Cho et al. 2017; Jäger et al. 2009, 2014; Li et al. 2015; Smollett et al. 2017) and the crenarchaeon S. solfataricus (Wurtzel et al. 2010) have provided new insights into archaeal transcriptomes, while other archaeal phyla remain unexplored. Because the RNA-seq approach is independent of prior knowledge about the coding regions and predicted TUs, these data sets can provide us with a wealth of novel non-coding transcripts including small regulatory RNAs (discussed in Chap. 10), antisense RNAs, and newly discovered mRNAs (Fig. 1.10) (Babski et al. 2016; Cho et al. 2017; Jäger et al. 2009, 2014; Li et al. 2015; Smollett et al. 2017; Straub et al. 2009; Tang et al. 2005; Toffano-Nioche et al. 2013; Wurtzel et al. 2010; Dar et al. 2016a). Archaeal ncRNA species with uncharacterised functions include processed fragments of mRNA UTRs. Methanogens (M. jannaschii, M. mazei and Methanolobus psychrophilus) and Thermococcales (T. kodakarensis, T. onnurineus and P. furiosus) all contain long 5′-UTRs, including ribosome binding sites and potential sites of regulation by riboregulators and riboswitches (Cho et al. 2017; Jäger et al. 2009, 2014; Li et al. 2015; Smollett et al. 2017; Toffano-Nioche et al. 2013). In contrast, Sulfolobus and halophilic archaea are characterised by leaderless mRNAs where translation is initiated directly from the mRNA 5′-end (Babski et al. 2016; Brenneis et al. 2007; Koide et al. 2009; Torarinsson et al. 2005; Wurtzel et al. 2010). Term-seq has revealed the abundance of 3′-UTRs in archaea, which, similar to the 5′-UTRs, are longer in methanogens than in Sulfolobus (Dar et al. 2016a). 
Genes encoding ribosomal proteins tend to have long 5′UTRs in all archaea, even in species predominantly using leaderless mRNAs such as Sulfolobus (Li et al. 2015; Toffano-Nioche et al. 2013; Wurtzel et al. 2010), which suggests a common regulatory mechanism for these genes.

Fig. 1.10 Features of the archaeal transcriptome. Global TSS mapping and RNA-seq highlight the diversity of archaeal transcripts as shown in M. jannaschii. (a) Transcription initiation using alternative promoters leads to the synthesis of distinct mRNA species with different 5′-UTRs, which provide opportunities for riboregulation by e.g., riboswitches. (b) and (c) These methods also lead to the discovery of novel transcripts including antisense RNAs (b) and small ORFs (c) missed in genome sequence-based annotations (Smollett et al. 2017).

1.4.2 Evidence for Pervasive Transcription in Archaea

Pervasive transcription describes the synthesis of non-coding, often antisense transcripts that are not restricted by gene boundaries; it has been implicated in transcription regulation, transcription-coupled repair and genome evolution. RNA-seq demonstrates that pervasive transcription occurs in all domains of life (Clark et al. 2011; Smollett et al. 2017; Wade and Grainger 2014). In E. coli, the comparison of transcriptome data obtained under different growth conditions and with different library preparation techniques has yielded a more reliable and comprehensive map of TSSs. Furthermore, the detection of novel transcripts in E. coli was aided by the deletion of nucleases involved in RNA turnover, including RNase E and RNase III (Thomason et al. 2015; Wade 2015). The same approaches will likely enable a more accurate estimate of the extent of pervasive transcription in archaea.

1.4.3 Deconvoluting RNA Synthesis and RNA Steady-State Levels

There are several limitations one needs to be aware of when analysing archaeal transcriptomes, in particular when attempting to correlate genome occupancy profiles of basal transcription factors and RNAP with RNA levels. Owing to its high abundance, rRNA is often depleted as part of standard RNA isolation procedures and is thus excluded from subsequent transcriptomic analyses. In addition, RNA isolation methods and library preparation techniques tend to include size selection steps that introduce a bias against small RNAs. Most importantly, RNA-seq data represent steady-state RNA levels, which reflect the balance of RNA synthesis and degradation rather than nascent RNA synthesis alone. Attempts to determine mRNA half-lives have been made for S. solfataricus and S. acidocaldarius. These studies revealed important differences in RNA stability depending on the functional category of the genes and on RNA expression levels (Andersson et al. 2006). Several techniques have been developed to map the nascent transcriptome and thereby obtain a more accurate global picture of ongoing RNA synthesis. In the NET-seq (native elongating transcript sequencing) approach, TECs are purified from biomass and the RNA associated with the RNAPs is isolated and sequenced, which provides a snapshot of active transcription at single-nucleotide resolution (Churchman and Weissman 2012). In transient transcriptome sequencing (TT-seq) approaches, nascent RNA is metabolically labelled with uridine analogues that allow its specific purification prior to sequencing (Schwalb et al. 2016). A caveat from an archaeal perspective is that the required uridine kinase activity has a narrow phylogenetic distribution, so that adapting such methods for archaea would require the introduction of this enzyme by genetic manipulation.
A slightly different approach has recently been adapted for the in vivo labelling of RNA in archaea for the first time. It uses 4-thiouracil rather than uridine analogues and therefore relies on a different biochemical pathway, in which 4-thiouracil is converted to the corresponding UMP analogue by uracil phosphoribosyltransferase (Knüppel et al. 2017). Finally, approaches such as Gro-seq (global run-on sequencing), which isolate TECs and label the nascent RNA by transcription elongation in vitro, can be adapted to archaeal transcriptomics in a reasonably straightforward fashion (Core et al. 2008).
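
The difference between steady-state RNA levels and RNA synthesis rates can be made concrete with a minimal sketch. It assumes simple first-order decay, dR/dt = k_syn - k_deg * R, which is an illustrative assumption and not a model taken from the studies cited above. The function names and all numerical values are hypothetical.

```python
import math


def decay_rate(half_life_min):
    """First-order decay constant (per minute) for a given half-life in minutes."""
    return math.log(2) / half_life_min


def steady_state_level(synthesis_rate, half_life_min):
    """Steady-state abundance R* = k_syn / k_deg, assuming dR/dt = k_syn - k_deg * R."""
    return synthesis_rate / decay_rate(half_life_min)


def synthesis_rate_from_level(level, half_life_min):
    """Invert the steady-state relation: k_syn = R* * k_deg."""
    return level * decay_rate(half_life_min)


# Two hypothetical transcripts with the same synthesis rate (molecules per minute)
# but different half-lives (minutes); the values are made up for illustration.
stable = steady_state_level(synthesis_rate=10.0, half_life_min=20.0)
unstable = steady_state_level(synthesis_rate=10.0, half_life_min=5.0)

# Under this model the ratio of steady-state levels equals the ratio of half-lives.
ratio = stable / unstable
print(f"stable/unstable steady-state ratio: {ratio:.1f}")
```

Under these assumptions, two transcripts synthesised at the same rate differ in steady-state abundance in proportion to their half-lives. Comparing RNA-seq signal with RNAP occupancy therefore benefits from independent half-life measurements or from nascent RNA data.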

1.5 Future Directions

High-throughput sequencing approaches have greatly improved our understanding of the mechanisms of transcription in archaea. We can now begin to unravel connections between perturbations at the molecular level and changes of the entire system, and between in vitro and in vivo data, with the aim of understanding transcription in a multiscalar fashion. The current experimental portfolio needs to be expanded by mapping the transcriptome-wide occupancy of RNA-binding transcription factors with techniques such as iCLIP (Konig et al. 2011), by mapping TECs with NET-seq and Gro-seq (see above), and by genome-wide mapping of evolutionarily pervasive DNA and RNA modifications (Huber et al. 2015). Once these aspects of archaeal transcription are described on a systems level, it will be possible to characterise the transcription apparatus in flux, as it changes in response to external stimuli and environmental insults. Future research would benefit from including poorly characterised phyla such as Nano-, Thaum- and Lokiarchaeota, which would provide further insights into the evolution of transcription regulation in archaea. Last but not least, the ability to examine features of macromolecular metabolism genome-wide will allow us to correlate transcription with DNA replication, recombination and repair, and with translation.