Introduction

A major mechanism by which eukaryotic cells regulate gene transcription is through the packaging of their genomes into chromatin, a complex of DNA and histone/non-histone proteins that serves as the “gatekeeper” of genetic information. The fundamental unit of chromatin is the nucleosome, which contains ~147 base pairs of DNA wrapped around eight histone proteins: two each of H2A, H2B, H3, and H4 [1–4]. In addition to the actions of histone variants and specialized protein machines that assemble, disassemble, and move nucleosomes along DNA, histone post-translational modifications (PTMs) play critical roles in the accessing and regulation of our genome [5–7]. In particular, histone lysine methylation is known to direct the organization of basic chromatin states such as euchromatin and heterochromatin, as well as the precise organization of chromatin across genes before, during, and after the process of transcription [8]. In this review, we highlight recent findings pertaining to the function of the histone H3 lysine 36 (H3K36) methyltransferase Set2/SETD2, and outline future questions that need to be addressed to fully understand the multi-faceted roles of this enzyme in cell biology.

Histone lysine methylation in transcription

Lysine residues can be mono-, di-, or tri-methylated, retaining the positive charge of the residue regardless of the number of methyl groups added. Lysine methylation is associated with both active and repressed states of transcription, primarily depending on which lysine residue is methylated and in which methylation state (mono-, di-, or tri-) that residue exists [8]. In general, lysine methylation functions to recruit effector proteins bearing specialized domains that uniquely recognize these modified sites. Nevertheless, the presence of some methylated lysine residues in the nucleosome core close to where DNA wraps or near the dyad axis close to where DNA enters and exits the nucleosome (e.g., H3K56 methylation) suggests that some methylation events function, in part, to fine-tune or tweak nucleosome-DNA interactions [9].

In the budding yeast Saccharomyces cerevisiae, there are three well-studied methylated lysine residues: histone H3 lysine 4 methylation (H3K4me), H3K36me, and histone H3 lysine 79 methylation (H3K79me). All three residues and their methylation events are conserved from yeast to humans, making budding yeast an attractive model system to study the fundamental biology of these marks. Although all three marks function in transcriptional regulation in yeast and mammalian cells [10], their precise molecular functions in transcription and other biological processes remain poorly understood. Studies have generally found that H3K4me2/3 is located primarily at promoter regions (note that enhancers in metazoans are marked with H3K4me1) whereas H3K36me2/3 is localized towards the 3′ regions of gene bodies [11, 12]. Interestingly, H3K79me2/3 is generally uniform across actively transcribed regions of the genome in yeast but is largely found in promoter regions in metazoans.

Set2 and H3K36 methylation

In S. cerevisiae, Set2 is the sole histone methyltransferase that acts on H3K36 [13]. Conversely, in higher eukaryotes, there are multiple H3K36 methyltransferases that are responsible for this mark. For example, in Drosophila melanogaster, dMes-4 is responsible for H3K36me1 and H3K36me2, whereas dSet2 is primarily responsible for H3K36me3 [14]. In humans, NSD1, NSD2, NSD3, SETMAR, SMYD2 and ASH1L are responsible for mediating H3K36me1 and H3K36me2 [15, 16], and SETD2, the human Set2 homolog, mediates H3K36me3 [17]. Because Set2 is responsible for all H3K36me in yeast, and strains in which SET2 has been deleted are viable, yeast has become a premier model system for studying the function of H3K36me.

The domain structure of Set2

The SET domain

Set2 has several conserved domains (Fig. 1), the first being the SET domain itself, which consists of the AWS (associated with SET), SET, and PS (post-SET) motifs. The SET domain is the catalytic domain of Set2 that performs the H3K36me transfer. Interestingly, although full-length Set2 prefers to methylate nucleosomal substrates, the isolated SET domain can also methylate free histones [18]. Thus, the C-terminus of Set2 is critical for directing substrate specificity (discussed below). In budding yeast, the SET domain is located at the extreme N-terminus of the protein, which differs from its location in higher eukaryotic Set2 homologs. SETD2, the human homolog of Set2, has an extended and low-complexity N-terminus with no conserved domains and no known function [19]. This extended N-terminus is common in non-yeast eukaryotes and likely plays an important, albeit poorly understood, regulatory role in methylation of histone and non-histone targets.

Fig. 1

Domain architecture of Set2/SETD2 homologs. Set2/SETD2-related enzymes share a highly conserved domain organization that encompasses a well-conserved catalytic region containing an Associated with SET (AWS), SET and post-SET (PS) domain, a WW and coiled-coil (CC) domain, followed by the Set2–Rpb1 interaction (SRI) domain. In S. cerevisiae, studies into the function of Set2 have revealed additional domains including an auto-inhibitory (AID) domain that regulates the activity of the SET domain and an H4/H2A interaction domain. Whether these additional domains are conserved outside of budding yeast is not yet known.

The H4/H2A interaction domain

A histone H4/H2A interaction domain precedes the AWS–SET–PS domain in yeast Set2 [20, 21]. This domain, consisting of a short stretch of acidic residues, binds a basic region or patch located on the nucleosomal surface that encompasses residues located in H4 and H2A. Notably, disrupting the acidic residues in Set2, or eliminating the basic residues in H2A or H4 that create the Set2 binding surface, significantly reduces H3K36 methylation in vivo, thus defining the importance of this interaction for Set2 methylation in chromatin. Although further work is needed to clarify precisely how this nucleosomal surface interaction regulates H3K36 methylation, it is likely that this basic region in H4/H2A stabilizes Set2 on the nucleosome or positions the enzymatic domain over the H3K36 residue. Given the high conservation of Set2, it is likely that Set2 homologs in more complex organisms also retain the H4/H2A-interaction domain, but this conjecture remains to be tested.

The WW and coiled-coil domains

The SET domain is followed by two protein–protein interaction domains, a WW and a coiled-coil (CC) domain. The WW domain, named for two conserved tryptophan residues, binds to proline-rich proteins [22]. In the nucleus, proteins with this domain regulate transcription by binding to the C-terminal domain (CTD) of RNA polymerase II (RNAPII) [23]. Although the WW domain is conserved in Set2 homologs, its function, if any, and its binding partners are currently unknown. Notably, a precise deletion of the WW domain in yeast does not affect the association of Set2 with RNAPII or alter H3K36 methylation [24]; we therefore speculate that the WW domain participates in the association of Set2 with proteins that contribute to methylation of non-histone substrates (see below). In humans, the WW domain of SETD2 mediates an interaction of SETD2 with the polyglutamine expansion region of the huntingtin (Htt) protein [25, 26]; in the expanded state, Htt leads to Huntington’s disease. Interestingly, SETD2 was first identified through yeast two-hybrid screens (first named two-hybrid protein B or HYPB) [26]. However, aside from Htt interaction, we do not know the function of the WW domain in SETD2.

The CC domain in Set2, which follows the WW domain, is also a conserved protein–protein interaction motif that, in certain proteins, promotes homo-dimerization [27]. This domain is also highly conserved within the family of Set2 enzymes, but future work is needed to understand precisely how it functions. We speculate, given its close proximity, that the CC domain contributes to a recently described auto-inhibitory domain [18] that controls Set2 methylation output (see below).

The auto-inhibitory domain (AID)

As mentioned above, recent studies show that Set2 contains an auto-inhibitory domain (AID) in the region between the SET domain and the WW domain [18]. Deletion of the AID results in a hyperactive Set2, increasing the amount of H3K36me on nucleosomes and at genes. Further, the AID influences Set2 interactions with RNAPII, as AID mutant versions of Set2 are able to bind to the unphosphorylated CTD of RNAPII, whereas wild-type Set2 can bind to only the hyperphosphorylated form of the CTD. Further work is needed to ascertain whether the AID is conserved in higher eukaryotes and how it regulates Set2 methylation and CTD interaction.

The Set2–Rpb1 interacting (SRI) domain

Finally, Set2 contains a Set2–Rpb1 interacting (SRI) domain. This domain, which is highly conserved from yeast to humans (Fig. 1) [28], binds to serine 2 (Ser2) and serine 5 (Ser5) doubly phosphorylated CTD repeats of RNAPII [29–33]. Specifically, the SRI domain recognizes these phosphorylated residues across two heptapeptide repeats of the CTD [31, 32]. Intriguingly, the Caenorhabditis elegans Set2 homolog, MET-1, does not appear to contain an SRI domain based on sequence analysis in Pfam (http://pfam.xfam.org). Whether a non-canonical SRI domain exists in this protein, and if it functions in conjunction with elongating RNAPII, awaits future testing.

Importantly, deletion of the SRI domain does not abolish chromatin localization of Set2, suggesting that additional mechanisms are primarily responsible for recruiting Set2 to chromatin [34]. Furthermore, ectopically expressed Set2 truncations lacking the SRI domain still catalyze H3K36me1 and limited amounts of H3K36me2, but they are unable to catalyze H3K36me3 [29]. One explanation for this result may be the ability of the SRI domain to regulate the AID of Set2 that itself antagonizes the SET domain [18]. In Schizosaccharomyces pombe, however, the SRI domain is completely dispensable for H3K36me2, but is required for H3K36me3 and plays a role in post-transcriptional gene silencing at subtelomeric regions of the genome [35]. However, mutants of Spt6 and Ctk1 that prevent CTD phosphorylation at Ser2 render chromatin incapable of being trimethylated by full-length recombinant Set2 [34]. Thus, it appears that Set2–CTD interaction is essential in some way for Set2 to catalyze nucleosomal H3K36me3, but, at the same time, the SRI domain also regulates the auto-inhibitory functions of Set2 [18]. Future experiments are needed to better define how the SRI–CTD interaction, and functions of the SRI domain that lie outside of CTD interaction, regulate the methylation states of Set2.

Regulation of Set2 and H3K36 methylation

Ser2 and Ser5 CTD phosphorylation of RNAPII

CTD phosphorylation plays a critical role in the regulation of H3K36 methylation. Early studies in yeast found that deletion of the major kinase complex that mediates Ser2 CTD phosphorylation, CTDK-1 (CDK12 in metazoans), results in an ablation of H3K36 methylation [24]. Consistent with this finding, deletions of the CTD or mutations of Ser2 residues (but not Ser5) in the CTD also caused a loss of H3K36 methylation [24, 33]. Intriguingly, examination of the interaction of the SRI domain with the CTD revealed that, although Ser2 phosphorylation is critical for Set2 binding, Set2 actually prefers to bind to doubly modified (Ser2 and Ser5 phosphorylated) CTD repeats—a phosphorylation pattern that is, in fact, mediated by the Ctk1 kinase of the CTDK-1 complex [29, 36]. However, in yeast, doubly modified Ser2 and Ser5 phosphorylated CTD repeat patterns appear to exist at very low levels [37], suggesting that Ser2 CTD phosphorylation may be the primary mark available for Set2 interaction.

In addition to Ctk1, the Bur1 kinase complex (Cdk9 of p-TEFb complex in metazoans) mediates Ser2 CTD phosphorylation in vitro and in vivo [38, 39]. Consistent with a role of Ser2 CTD phosphorylation in regulating H3K36me, bur1 temperature-sensitive alleles, or a deletion of BUR2 (the Bur1 cyclin), reduce H3K36me3 levels in yeast [40, 41]. Unlike the loss of Ctk1, loss of Bur1 activity does not significantly alter Ser2 and Ser5 phosphorylation levels in vivo [39, 42], suggesting that the impact of Bur1 on H3K36 methylation is largely independent of the CTD. The ability of Bur1 to affect H3K36 methylation is likely an indirect effect of Bur1’s ability to regulate Spt4/5 (DSIF) and recruit the Paf1 transcription elongation complex (Paf1C) to genes, which then regulates Ser2 CTD phosphorylation [38, 39, 43–45].

An unexpected finding regarding the regulation of Set2 methylation by the CTD was the realization that Set2–RNAPII interaction is critical for Set2 stability. Loss of Ctk1, or mutation of elongation machinery that impacts Ser2 CTD phosphorylation (e.g., loss of PAF or Bur1 activity), results in a rapid turnover of Set2 [46]. These results indicate that a key function of CTD binding is to protect Set2 from degradation. Interestingly, removal of the SRI domain stabilizes Set2 in addition to uncoupling Set2 from the CTD. Thus, the SRI domain, which protects Set2 from degradation when bound to the CTD, must also contain a degradation signal that becomes exposed when Set2 is not CTD-bound [46]. Although the significance of Set2 turnover is not understood, this regulation is clearly important because a truncated form of Set2 is lethal when expressed in cells deficient in H2A.Z deposition [46]. The rapid turnover of Set2 appears to be conserved in humans, as a recent study identified SPOP, a key subunit of the CUL3 ligase complex, as a binding partner for SETD2 that mediates its turnover by the proteasome [47].

The Spt6 histone chaperone

In addition to regulation by the CTD, a number of transcription elongation factors and histone chaperones also regulate Set2-mediated H3K36me. One such factor is Spt6, a highly conserved H3/H4 histone chaperone that also binds to the phosphorylated CTD of RNAPII to facilitate transcription elongation and nucleosome re-assembly in the wake of RNAPII elongation [48–53]. Mutations in multiple regions of Spt6 result in a reduction in H3K36me2 and/or H3K36me3 levels [34, 45, 54, 55]. Similar to Ctk1, mutations in Spt6 also reduce Set2 protein stability [34, 45]. Interestingly, in budding yeast, Ctk1 and Spt6 mutually regulate each other’s protein stability [45], forming a feed-forward loop that maintains CTD Ser2 phosphorylation, and consequently, Set2 levels. The ability of Spt6 to maintain Ser2 CTD phosphorylation levels is likely behind the ability of Spt6 to control H3K36me. In human cells, however, the ability of Spt6 to regulate Set2 may be more direct. Iws1 (Spn1 in yeast), which is a binding partner of Spt6 that recruits this chaperone to genes [53, 56], associates with human SETD2 and recruits it to genes [53].

The Paf1 complex

The Paf1 complex, which associates with RNAPII and facilitates transcription elongation, coordinates multiple histone PTMs during transcription. Specifically, the Paf1 complex is required for H3K36me3 [40, 41] in addition to H2BK123 mono-ubiquitylation (H2BK123ub1) [57, 58]—a chromatin mark that is necessary for H3K4 and H3K79 methylation. In the case of H2BK123ub1, the histone modification domain of Rtf1 (a member of the Paf1 complex that links Paf1 to RNAPII; [59]) directly associates with Rad6/Bre1, thus providing a molecular basis for how Paf1 regulates H2Bub1 [60]. However, in the case of Set2 methylation, the role of Paf1 is likely indirect. Evidence shows that a loss of Paf1 complex members leads to loss of Ser2 CTD phosphorylation [45, 61]. In addition, Spt6 is likely stabilized via direct interaction with Paf1 [45], which co-stabilizes Ctk1. Because a loss of Paf1 complex function leads to a reduction in Spt6 and Ctk1 levels, the levels of Set2, which is dependent on Ser2 CTD phosphorylation, are also affected [45, 46]. Together, these data suggest a model wherein the Paf1 complex helps to stabilize and/or recruit Spt6 and Ctk1 to chromatin. Once recruited to chromatin, Ctk1 can then phosphorylate the RNAPII CTD for binding by Set2 (Fig. 2).

Fig. 2

The regulation of Set2 and H3K36 methylation. Many factors come together to regulate Set2 and the production of H3K36 methylation in budding yeast. Kin28 of TFIIH promotes transcription initiation through the phosphorylation of the RNA polymerase II (RNAPII) C-terminal domain (CTD) at serine 5 (Ser5). This phosphorylation is important for the recruitment of Spt4/5, the Bur1 kinase, and the Paf1 complex (Paf1C) that promote transcription elongation and downstream histone modifications such as H2BK123 mono-ubiquitylation and H3K4 and K79 methylation. In addition to Kin28, Bur1 also functions to recruit the Paf1 complex, most likely through the combined actions of Bur1 activity on the CTD and the C-terminal repeat region (CTR) in Spt5. Significantly, Paf1C associates with Spt6 and stabilizes Spt6 protein levels, which is required for stabilization and function of the major Ser2 CTD kinase, Ctk1. This creates a feed-forward circuit, as Ser2 CTD phosphorylation is also required for binding and stability of Spt6. These actions drive robust CTD Ser2 phosphorylation that is required for SRI–CTD interaction, which stabilizes Set2 for nucleosomal H3K36 methylation (see text for details).

Histone H3K36 demethylation

While there is a great deal of regulation surrounding the ability of Set2 to actively methylate histones, H3K36me is also dynamic and regulated by the actions of histone demethylases. In budding yeast, there are two predominant H3K36me demethylase enzymes, Jhd1, which acts on H3K36me1/2, and Rph1, which acts on H3K36me2/3 [62–65]. While these proteins both show demethylase activity in vivo, the role of active demethylation is still poorly understood. In humans, KDM2A has been shown to remove H3K36me2/3. Intriguingly, this protein contains a ZF-CXXC motif that serves to target it to nonmethylated CpG dinucleotides in linker regions and CpG islands, thereby removing H3K36me2/3 in promoter regions [66, 67]. In addition, KDM2A also has been shown to interact with HP1 through its chromoshadow domain [68], thus allowing KDM2A to be targeted to H3K9me3-marked nucleosomes and mediate chromatin silencing. Thus, it will be important in the future to define how these two activities are coordinated and work together in chromatin. Another human protein involved in H3K36me2/3 removal is KDM4A [69]. This enzyme, which also removes H3K9me3, is highly amplified in human tumors [70] and is localized in the nucleolus where it regulates rDNA transcription [71]. Future studies will be needed to determine which functions of KDM4A are specifically mediated through its action on H3K36.

HOTAIR and miR-106b-5p

SETD2 is also regulated at the transcriptional and post-transcriptional levels. The long noncoding RNA HOTAIR can bind to the promoter region of SETD2 and inhibit the recruitment of pro-transcription factors such as CREB, P300, and RNAPII [72]. At the post-transcriptional level, another non-coding RNA, miR-106b-5p, binds SETD2 mRNA and inhibits its translation in clear cell renal cell carcinoma (ccRCC) [73]. As discussed below, SETD2 is frequently mutated and/or inactivated in ccRCCs and other cancer types. Thus, miR-106b-5p and HOTAIR, which themselves play roles in cancer development [74, 75], may do so also, in part, through SETD2 inactivation and loss of H3K36me3. Further work is needed to identify other RNA factors that regulate SETD2 levels, and whether these RNA-regulating pathways function in normal aspects of cellular biology to regulate SETD2 and H3K36me3.

Effector proteins that interact with unmodified or methylated H3K36

Histone chaperones and Asf1

During transcription, nucleosomes are disassembled but then replaced to maintain proper chromatin structure. A number of chaperones contribute to nucleosome disassembly and reassembly during transcription, including Spt6, Spt16 of the FACT complex, and Asf1 [48, 76–78]. Asf1, in particular, is a histone chaperone for H3 and H4 that deposits newly synthesized histones modified with H3K56 acetylation (H3K56ac) during transcription and replication [76, 79, 80]. Accordingly, H3K56ac is a marker for histone deposition during replication and histone exchange during transcription elongation. Interestingly, cells deleted for ASF1 also show a depletion of H3K36me in some strains [81], indicating a connection between histone eviction/deposition and Set2. Further, cells deleted for SET2 (set2∆) show increased H3K56ac at the 3′-ends of genes in G1-arrested cells [82], indicating increased levels of histone exchange. Conversely, cells lacking Asf1 exhibited slightly lower levels of histone acetylation over gene bodies, indicating that Asf1 and Set2 have opposing functions, and a combined loss of Asf1 and Set2 results in a significant decrease in histone acetylation [82]. Critically, Asf1, and to a lesser extent Spt6 and Spt16, show decreased affinity for peptides that are marked with H3K36me [82], suggesting a model wherein Set2-mediated H3K36me may actively inhibit histone exchange after transcription by preventing Asf1 (and other histone chaperones) from binding to and removing H3K36me histones (Fig. 3a). Thus, H3K36 methylation may serve to physically aid in preventing histone exchange. Further work is needed to understand precisely how H3K36me-marked nucleosomes are refractory to histone chaperones and whether active H3K36 demethylation is needed to allow for nucleosome disruption.

Fig. 3

H3K36me-interacting proteins in budding yeast and humans. a In S. cerevisiae, Set2 carries out mono-, di-, and tri-methylation of H3K36. Several functions have been ascribed to H3K36me in yeast, which include prevention of inappropriate or “cryptic” transcription, DNA damage repair and mRNA splicing. How H3K36me contributes to these processes is not fully understood, but a variety of effector proteins and complexes that bind to, or are repelled from, H3K36me have been identified. These include the Rpd3S complex that binds H3K36me2/3 via the chromodomain of Eaf3 to deacetylate histones, the Isw1b chromatin-remodeling complex that binds H3K36me3 via the PWWP domain of Ioc4 to promote proper nucleosome spacing across the gene body, the NuA3b complex that binds H3K36me3 via the PWWP domain of Pdp3 to promote acetylation and transcription of NuA3-specific target genes, and Asf1 (and other chaperones not shown for simplicity), which is a histone chaperone repelled by H3K36me3, thus preventing histone exchange and ensuring that nucleosomes are stable once incorporated into chromatin. Studies suggest that Rpd3S, Isw1b, and histone chaperones such as Asf1 function together to regulate nucleosome deposition and nucleosome positioning, and to prevent histone exchange in the wake of RNA polymerase II transcription. In the absence of Set2, nucleosomes are not positioned correctly, histone exchange and instability are increased, and normally obscured promoters become active. b In human cells, SETD2 is the sole enzyme for H3K36me3. In addition to the recruitment of a homologous Rpd3S complex containing MRG15, H3K36me3 recruits a variety of additional effector proteins and complexes that mediate transcription elongation, splicing, DNA methylation and histone deacetylation. These effectors include LEDGF, ZMYND11, DNMT3b, NPAC/LSD2, MSH6, and PHF1/19 (see Figure and main text for details on their functions). Recently, SETD2 was shown to methylate α-tubulin, the first non-histone substrate identified for any H3K36 methylase. Trimethylation of α-tubulin was shown to be important for proper mitosis and genome integrity, providing a potential explanation for how SETD2 functions as a tumor suppressor.

Ioc4 and the Isw1b chromatin-remodeling complex

Yeast Ioc4 is a member of the Isw1b ATP-dependent chromatin-remodeling complex [83]. Isw1 is the catalytic subunit of the complex, and it forms two distinct complexes in vivo, one with Ioc4 (Isw1b) and one without (Isw1a). Ioc4 has a PWWP domain, a domain named for its conserved proline and tryptophan residues that, in human proteins, preferentially binds to H3K36me3 [84]. The PWWP domain of Ioc4 also binds H3K36me3 [85, 86]. This interaction facilitates the recruitment of Isw1b to the 3′ regions of gene bodies, where Isw1 can then slide nucleosomes along the DNA [86]. In the absence of Set2 or H3K36me, Isw1b can no longer associate properly with chromatin [85, 86], creating regions of improperly placed nucleosomes that are poor substrates for the Rpd3 histone deacetylase complex (discussed below) [87]. Although the mechanism is not fully understood, Isw1b also functions, in concert with Set2, to prevent histone exchange and limit H4 acetylation [86], again stressing the importance of Set2 and H3K36me in limiting both of these actions once RNAPII has successfully transcribed a gene.

Pdp3 and the NuA3b complex

In addition to IOC4, PDP3 is a second yeast gene encoding a protein with a PWWP domain. Pdp3 is a member of the NuA3b histone acetyltransferase (HAT) complex [88]. Like Ioc4, Pdp3 binds preferentially to H3K36me3 and is dependent upon this mark to associate with chromatin. In the absence of Pdp3, genes that are targeted by NuA3 [89] experience marked reductions in the levels of their transcription, indicating that NuA3b plays an important role in transcription. It is not known how this complex is regulated by H3K36me3 binding or whether it acetylates histones (perhaps H3K14 as does NuA3a) or a non-histone substrate. In addition, how NuA3b functions with Rpd3S or Isw1b is also unknown, and resolution of the opposing role H3K36me plays in both histone acetylation and deacetylation will be intriguing.

Eaf3 and the Rpd3S complex

Rpd3S, a histone deacetylase complex (HDAC), contains two proteins with reader domains that interact with chromatin: Eaf3 and Rco1. Eaf3 contains a chromodomain that recognizes H3K36me2/3 [90–94], and Rco1 contains a dual plant homeodomain (PHD) finger motif that is necessary for chromatin engagement [95] and a homo-dimerization domain [96]. Both PHD fingers are essential for chromatin binding and Rpd3S function, and both domains prefer to bind the unmodified N-terminus of H3 [97]. Interestingly, Rco1 exists as a dimer within the Rpd3S complex, adding two additional chromatin contacts to the complex [96]. Like the Isw1b complex, Eaf3’s recognition of H3K36me helps to localize the Rpd3S complex to the 3′-ends of gene bodies [90, 94, 95] where it can deacetylate histone tails. In the absence of Set2, Rco1, or Eaf3, H4 acetylation increases at the 3′-ends of gene bodies [90, 91, 94, 95], leading to inappropriate transcription initiation at cryptic promoters within the gene bodies.

Although Eaf3 and Rco1 play an important role in localizing Rpd3S across the genome, Rpd3S can be recruited to gene bodies independently of H3K36me binding. Like Set2, Rpd3S can be recruited to chromatin via the phosphorylated CTD of RNAPII, specifically the Ser2/Ser5 dually phosphorylated form [98, 99]. However, without Eaf3 binding to H3K36me2/3, or without the PHD fingers of Rco1, the Rpd3S complex is catalytically inactive [95, 97]. Recent work has suggested that H3K36me stimulates a conformational change in Rpd3S that increases its affinity for chromatin, perhaps stimulating its enzymatic activity [100].

Interestingly, Rpd3S prefers an H3K36me di-nucleosome substrate [101]. Rpd3S deacetylase activity is further promoted when these nucleosomes are properly spaced from one another [87]. The correct spacing is likely ensured by the nucleosome sliding activity of Isw1b, which could explain the increased histone acetylation levels in isw1∆ and ioc4∆ cells [86]. Without proper spacing, and even with proper H3K36me, the affinity of Rpd3S for chromatin is decreased to a degree where the naturally low binding affinity of Eaf3 for H3K36me2/3 is not able to sufficiently engage nucleosomes and stimulate Rpd3’s deacetylase activity. Further, Rpd3S also depends on the PHD fingers in Rco1 to maintain its localization on chromatin [95, 97]. If any of these conditions are not met, the net result is increased histone acetylation in gene bodies. Collectively, these findings indicate that at least one function of H3K36me is to maintain nucleosome stability by repelling histone chaperones like Asf1 and ensure low levels of histone acetylation by recruiting the Isw1b chromatin-remodeling complex and the Rpd3S HDAC. These activities prevent RNAPII from engaging cryptic promoters within gene bodies (Fig. 3a).

Human H3K36me3-associated factors

In humans, H3K36 is methylated by a variety of enzymes [102–104] including SETD2, which, like yeast Set2, associates with elongating RNAPII to tri-methylate H3K36 [17, 31, 105, 106]. Similar to yeast, H3K36me3 recruits an Rpd3S-related HDAC complex that associates via the Eaf3 homolog MRG15 [107, 108]; MRG15 also maintains reduced histone acetylation levels and functions in alternative mRNA splicing [107, 109]. In addition, SETD2/H3K36me3 recruits a number of other chromatin-associated proteins that share a PWWP domain and mediate a wide variety of cellular processes including transcription elongation, heterochromatin formation, mRNA splicing, and DNA repair. These other proteins include ZMYND11 that regulates pre-mRNA splicing and transcription elongation [110, 111], the DNA methyltransferase DNMT3b that regulates gene body DNA methylation [112, 113], PHF1/PHF19 of the polycomb repressive complex 2 (PRC2) that recruits H3K27 methylation to silence developmental genes marked for repression [114–116], NPAC, which associates with an H3K4 demethylase [84, 117], LEDGF, a DNA repair factor that functions in homologous recombination [118], and MSH6, a factor required for mismatch repair [119] (Fig. 3b). The functions of these human and yeast proteins are discussed further below in the context of the biological processes they operate in.

Cryptic transcription

One of the best-understood functions of Set2 and H3K36me in yeast is prevention of cryptic transcription across the genome. A cryptic transcript is traditionally defined as a transcript that originates from inside a gene body as opposed to originating from the canonical 5′ promoter region [120]. Cryptic transcripts can arise from both the sense and antisense direction [121, 122], and they seem to occur preferentially from longer genes that are weakly transcribed [123]. Cryptic transcripts can be further classified based on their stabilities and how they are degraded. Generally, cryptic transcripts are quickly degraded by the exosome [124] and are referred to as cryptic unstable transcripts (CUTs) [121, 122]. These highly unstable transcripts can be observed and studied only when the exosome is compromised [124]. One such class of CUTs is the Xrn1-sensitive unstable transcripts that are reliant on the Xrn1 5′–3′ exonuclease for degradation [125]. Additionally, a new class of cryptic transcripts, Set2-repressed antisense transcripts, arises selectively in set2∆ cells [126]. Finally, the stable unannotated transcripts (SUTs) comprise a separate class of transcripts that are much more stable [122]. Genome-wide studies in yeast have shown that cryptic transcripts can arise from within the 3′ region of gene bodies, but result at an even higher frequency from bi-directional transcription events at promoters [121, 122, 127]. Excitingly, cryptic transcription has recently been described in mammalian cells, and was shown to be linked to the recruitment and function of DNMT3b via H3K36me3 [128]. In both organisms, it remains to be determined what function, if any, these transcripts may serve.

There are two main mechanisms by which cryptic transcripts are thought to arise: increased histone acetylation and decreased numbers or mis-localization of histones across the gene body. As discussed above and further below, Set2 and H3K36me play an important role in preventing both of these aberrant phenomena.

Isw1b and Rpd3S play pivotal roles in ensuring that gene bodies remain hypo-acetylated after passage of the elongating RNAPII. A deletion of ISW1 or IOC4 results in increased histone acetylation and increased cryptic transcription [86]. This occurrence is likely due to the fact that nucleosomes are no longer positioned correctly for Rpd3S to engage a di-nucleosome pair [87]. Biochemical experiments have demonstrated that Rpd3S has a preferred linker length of ~50 base pairs of DNA [87]. The addition of Isw1 leads to increased Rpd3S deacetylase activity [87]. Isw1b and other chromatin remodelers are likely necessary to accurately prime the chromatin template for deacetylation by Rpd3S. In total, H3K36me serves to both activate Rpd3S activity through the binding of Eaf3’s chromodomain and, through Isw1b recruitment via the PWWP domain of Ioc4, ensure that nucleosome spacing is ideal for Rpd3S binding. These functions maintain a repressive chromatin environment behind RNAPII, and they reinforce transcription in the sense direction [127].

The other mechanism by which cryptic transcription is repressed is by ensuring that nucleosomes are properly restored after the elongating RNAPII complex has moved through the gene body. Spt6, Asf1, and FACT are all important in this process. These chaperones prevent cryptic transcription by reassembling nucleosomes behind the elongating RNAPII complex [120]. If the functions of these chaperones are compromised, there are fewer intact nucleosomes on chromatin, allowing RNAPII to access cryptic promoters [54, 55]. Further, nucleosomes marked with Set2 methylation are refractory to Asf1, Spt6, and FACT binding, thus ensuring that, once nucleosomes are assembled into chromatin behind RNAPII, they remain assembled [82].

Increased histone acetylation and defects in nucleosome reassembly are likely not the only two mechanisms that give rise to cryptic transcripts. Deletions of many chromatin modifying, remodeling, and maintenance factors promote cryptic transcription [55, 129]. Interestingly, the types of cryptic transcripts produced from these different deletion mutants are different [86], offering further support for different underlying mechanisms regulating cryptic transcript production. Curiously, widespread production of cryptic transcripts does not necessarily have a deleterious effect on cell growth. set2∆ cells are viable, as are many other deletions that lead to the production of cryptic transcripts. Further, under normal laboratory conditions, a deletion of SET2 or other chromatin factors results in relatively few changes to the transcriptome, i.e., only ~80 genes are up- or down-regulated in a SET2 deletion [130]; thus, production of cryptic transcripts does not drive large-scale transcriptional change. However, there is limited evidence that severe nutrient deprivation can produce cryptic transcripts in wild-type cells [55]. This finding suggests that cryptic transcripts could act as a defense mechanism or have relevant functions during cellular stress.

Although, historically, cryptic transcripts have not been ascribed functions, there are currently two thoughts as to their possible cellular effects or roles. First, cryptic transcripts could inhibit transcription. Genes containing antisense transcripts that overlapped with their promoters exhibited statistically significant drops in full-length transcript amounts [131], which in some cases altered protein levels. Consistent with a transcriptional inhibition function, recent studies comparing the activation of genes upon carbon source shift showed a potential for antisense transcripts to cross sense promoters and deposit Set2-mediated H3K36me that, in turn, reduced the transcription of the sense genes [132]. Additional studies, under a variety of growth conditions, are needed to ascertain whether this phenomenon is a normal consequence of cryptic transcription or whether it is limited to nutrient shift. It is also necessary to determine whether the production of antisense transcripts during nutrient shift is required for the proper cellular response to stress.

The other function of cryptic transcripts could be the production of cryptic proteins. Using an Spt6 mutation that produces a very strong cryptic transcription phenotype, the Winston lab observed the translation of a select few cryptic transcripts [55]. Although no functions have been assigned to these proteins, they raise the intriguing possibility that the increasing complexity that we are just beginning to uncover in the transcriptome could extend to the proteome. One particularly compelling hypothesis is that cryptic proteins could behave as dominant negative variants of their wild-type counterparts. This conjecture is primarily relevant to in-frame, sense, cryptic transcripts, as they would theoretically encode fully functional C-terminal domains of proteins, yet lack the N-terminal domains. Quantitative mass spectrometry data sets examining the proteomes of wild-type and cryptic transcript-producing cells will be needed to understand how widespread the production of cryptic proteins is, and to ascertain what cellular role(s) they may play.

Cryptic transcription and aging

In addition to the repression of cryptic transcription, Set2 and H3K36me have other critical functions. One important function is regulation of aging, likely through the maintenance of transcriptional fidelity across the genome [133]. Specifically, the Berger group showed that a loss of Set2 or a mutation of H3K36 decreased the lifespan of yeast cells and, critically, removal of Rph1, an H3K36me3 demethylase, significantly extended the lifespan of wild-type cells. Interestingly, as yeast cells aged, they lost H3K36me genome-wide, and a significant increase in cryptic transcription was observed. The loss in H3K36me as yeast cells age could be due to a global loss in histones and/or an increase in H4K16 acetylation. Older yeast cells lose as much as 50% of their histones as they age, and they gain H4K16ac, further increasing the likelihood of transcriptional de-repression genome-wide [134]. Critically, these aforementioned functions are likely conserved in higher eukaryotes, as the C. elegans Set2 homolog MET-1 also extends lifespan [133, 135, 136]. In summary, in yeast and worms, H3K36me regulates the transcriptome by preventing cryptic transcription, thus decreasing transcriptional noise. This regulation ensures that genes are properly expressed throughout the organism’s lifespan, thereby promoting longevity.

Other functions of Set2/H3K36me and their evolutionary conservation

RNA splicing

A role of H3K36me in regulating splicing was first observed in human cells [109]. H3K36me3 regulates exon choice in the FGFR2 gene. In epithelial cells, exon IIIb is included in the mRNA transcript, whereas in mesenchymal cells, exon IIIc is included. Overexpression or siRNA-mediated knockdown of SETD2 demonstrated a shift in exon inclusion in both cell types. The mechanism of exon choice is recruitment of the splicing factor PTB via an interaction with MRG15, the human EAF3 homolog [109]. MRG15 is recruited to FGFR2 in an H3K36me3-dependent manner, and subsequently recruits PTB, which guides exon choice.

A genome-wide role of SETD2 in splicing has been demonstrated by next generation sequencing techniques [111, 137]. Without SETD2, a plethora of RNA splicing defects was observed, including intron retention and aberrantly spliced genes. Interestingly, nucleosome positioning was altered at many sites of RNA processing defects, suggesting that H3K36me’s chromatin remodeling function and repression of histone exchange may also be conserved in human cells.

ZMYND11 is another factor that binds H3K36me3 and regulates splicing and transcription elongation [110, 111]. This protein contains three tandemly arranged chromatin reader domains: a PHD, a bromodomain, and a PWWP domain. The PWWP domain selectively binds to H3.3K36me3, and, in concert with the adjacent bromodomain, recognizes residue S31 in the transcription-associated H3.3 variant, suggesting a specific role in transcription. In fact, ZMYND11 associates with RNA splicing regulators, and RNA-seq data from a ZMYND11 knockdown showed increased intron retention, further arguing for a role in mRNA splicing. Interestingly, ZMYND11 is downregulated in many cancers and, when overexpressed, ZMYND11 suppressed cancer cell growth [110].

Finally, similar to mammalian cells, Set2 also regulates splicing in yeast. Mutants lacking H3K36me exhibited reduced splicing efficiency across the genome [138]. Interestingly, correct splicing was dependent on H3K36me2 and association of Set2 with the CTD of RNAPII. The necessity of the SRI domain for proper splicing suggests that Set2 may act co-transcriptionally to recruit splicing factors, and the requirement of H3K36me2 suggests that Rpd3S may also play an important role in splicing.

DNA damage response

As DNA is packaged into chromatin, histones play a key role in the DNA damage response (DDR) pathways. Intriguingly, Set2 and H3K36me are necessary for DDR, although the mechanisms differ between organisms.

In fission and budding yeast, cells lacking Set2 and H3K36me display strong sensitivity to DNA damaging agents [139–141]. In both systems, Set2 regulates chromatin compaction and limits resection, i.e., the creation of single-stranded DNA at sites of DNA damage that allows binding of DDR proteins such as RPA. In the absence of H3K36me, the H3K36 residue is instead available for acetylation by Gcn5, resulting in chromatin decompaction. Further, limiting resection promotes non-homologous end joining (NHEJ) [139, 140]. In budding yeast, there were also defects in DDR signaling pathways in set2∆ cells [139], suggesting that the chromatin state surrounding the damage site is critical for recruiting DDR proteins and permitting their proper signaling activities. In fission yeast, H3K36me is cell cycle regulated, further delineating the choice between homologous recombination (HR) and NHEJ [140].

In human cells, HR is defective in SETD2 mutant cells [142–144]. This defect appears to be the result of a lack of RPA and RAD51 binding to damage sites [143, 144]. Further, upon loss of H3K36me3, DNA damage persists in cells, likely due to an inability to activate downstream signaling proteins, such as p53, that are critical for efficient repair [142].

In human cells, H3K36me also regulates DNA mismatch repair (MMR) by recruiting MSH6 [119, 143]. MSH6, like many H3K36me3 effector proteins, contains a PWWP domain that is essential for H3K36me3 binding. Critically, cells deficient for SETD2 display microsatellite instability and increased mutation frequency, phenotypes typical of defects in DNA MMR. Thus, Set2/SETD2 and H3K36me play important roles in maintaining genome integrity and stability.

Polycomb silencing

H3K36me regulates Polycomb-mediated gene silencing during mammalian development by recruiting Polycomb repressive complex 2 (PRC2) components PHF19 and PHF1 [114–116, 145]. PHF19 and PHF1 both have Tudor domains that recognize H3K36me3 and bring PRC2 to actively transcribed and developmentally regulated regions of the genome, which then allows for H3K27me3 to spread across the genes to induce transcriptional repression. Additionally, PHF19 associates with NO66, an H3K36me3 demethylase that further reinforces the silencing of these loci [116].

DNA methylation

H3K36me is necessary for establishing DNA methylation in gene bodies. DNA methylation is an essential regulator of gene expression; hyper-methylation of gene promoters results in repression and is a hallmark of cancer. Both of the de novo DNA methyltransferases, DNMT3a and DNMT3b, contain PWWP domains that bind to H3K36me3 [146, 147] and, further, are necessary for their DNA methyltransferase activities [112, 146]. In the absence of SETD2, DNA methylation is not properly targeted to transcribed regions in the genome, which has been attributed primarily to the loss of DNMT3b targeting by H3K36me3 [112, 113, 148]. In the context of transcribed regions, the function of DNA methylation is poorly understood, but studies show that gene body DNA methylation is likely important for transcription elongation and splicing [149]. Thus, the ability of SETD2 and H3K36me3 to promote transcription elongation and splicing likely depends, in part, on maintaining DNA methylation. In addition, and as mentioned above, DNA methylation also prevents spurious transcription initiation within the gene body in mammalian cells [128]. Future work is needed to further understand the function of DNA methylation in transcribed regions.

Cell division via tubulin methylation by SETD2 at K40

In addition to its chromatin-based functions, SETD2 maintains genomic stability by ensuring the proper segregation of chromosomes during mitosis [150]. During mitosis, SETD2 methylates tubulin at K40, and ablation of SETD2 leads to a plethora of mitotic defects. Most likely, some of the oncogenic phenotypes observed in SETD2-mutated cells are due to the genomic instability caused by tubulin defects during mitosis. Also, additional non-histone targets for SETD2 are likely to be discovered because many other histone methyltransferases (e.g., G9a) have multiple substrates [151].

Animal development

Consistent with SETD2 and H3K36me having many cellular functions, multiple animal models have shown that SETD2 and the H3K36 histone residue are necessary for animal development. The requirement for SETD2 and H3K36me was first shown for Drosophila; RNAi knockdown of dSet2 caused lethality during the larval stage of development [14, 152]. Additionally, when all of the canonical H3K36 residues in the genome were mutated to arginine, flies were unable to complete development [153]. Thus, H3K36me is specifically critical for fly development.

SETD2 and H3K36me are also crucial for mammalian development. SETD2−/− knock-out mice die as embryos at E10.5–E11.5 due to defects in angiogenesis [154]. These embryos display vascular defects in the embryo, the yolk sac, and the placenta. RNAi knock-down experiments in murine embryonic stem cells also reveal defects in differentiation due to misregulation of the Fgfr3/Erk pathway [155].

These results demonstrate that the enzyme that mediates H3K36 methylation and the residue that accepts this methylation are critical for animal development; however, the precise mechanisms underlying their importance remain to be elucidated. Further, it is likely that the failure to methylate non-histone targets could contribute to the lethality observed in these animals. These exciting questions await experimental answers.

The role of SETD2 and H3K36 methylation in cancer

SETD2 is mutated in up to 15% of patients with clear cell renal cell carcinoma (ccRCC) [156–158]. SETD2 is located on chromosome 3p, a region commonly deleted in ccRCC tumors [158]. ccRCC tumors are very heterogeneous, but mutations in SETD2 arise frequently and independently in a single tumor [157]. SETD2 mutations have also been observed in acute leukemias, in bladder cancer, and in glioblastoma [159–161], indicating a broad need for maintaining SETD2 to prevent tumorigenesis.

Although it is not known, biochemically, exactly how SETD2 functions as a tumor suppressor, the abilities of SETD2 and H3K36me3 to regulate splicing, DNA methylation, chromosome segregation, and DNA damage repair are likely candidates that underlie SETD2’s role in tumor suppression. SETD2 plays a critical role in maintaining transcriptional fidelity. In the absence of SETD2, mRNA processing defects occur at as many as 25% of expressed genes across the genome [137]. Further, in ccRCC, there are many transcription termination defects in SETD2-mutant cancer cell lines [162]. These termination defects lead to the creation of chimeric transcripts, some involving oncogenes, providing yet another potential mechanism that could lead to cancer development. Together, the defects in mRNA processing, transcription termination, DNA methylation, and impaired DNA damage signaling likely provide an ideal environment for tumorigenesis.

Finally, it has recently been shown through exome sequencing that recurrent mutations at or near H3K36 can also lead to cancer [163–165]. In particular, H3K36 in the context of the H3.3 variant has been found mutated to methionine in chondroblastomas [165] and in other cancer types including head and neck squamous cell carcinoma and colorectal cancer [166]. Mutations at H3.3K36 result in genome-wide loss of H3K36 tri-methylation and can directly stimulate tumor formation [167, 168]. Similar to the mechanism by which K→M mutations at H3K9 and H3K27 “trap” or “poison” their respective methyltransferase enzymes [169–172], the H3.3K36M mutation rearranges the active center of SETD2 to provide a high affinity binding site for the H3.3K36M histone tail, thus explaining how a single histone mutation can impart a tumor phenotype [173]. As H3K36me is lost from the genome, a redistribution of H3K27me3 by PRC2 occurs across the genome [168]. Significantly, H3K27me3 redistribution dilutes the polycomb repressive complex 1 (PRC1) that associates with H3K27me3 across the genome and results in increased gene expression [168]. This redistribution disrupts the normal differentiation process by locking cells in an undifferentiated state.

In addition to mutations at H3.3K36, mutations at adjacent residues (i.e., G34-to-R/V and G34-to-W/L) have also been reported in a variety of cancers including pediatric non-brain stem gliomas and tumors of the bone [163–165]. Unlike the H3.3K36M mutation, however, G34W/L and G34R/V mutations do not abolish global levels of H3K36me3 [170, 174]. Rather, they impact the nucleosomes that contain this mutation and result in a redistribution of the H3K36me3 mark across the genome that reprograms the transcriptome in ways that promote tumorigenesis [170, 174]. Collectively, these findings show that mutating either SETD2 or H3K36 is sufficient to promote cancer progression.

Conclusions

H3K36me is a histone PTM that is conserved from yeast to humans, suggesting that it is extremely important for proper cellular function. Although great strides have been made recently to elucidate the participation of H3K36me in various cellular processes, we are just beginning to understand its functions in these newly discovered areas. The recent insights into Set2/SETD2 biology have raised many questions that will be explored over the coming years. Significantly, we do not fully understand how all of the conserved domains in Set2/SETD2 (e.g., the WW and CC domains) contribute to Set2/SETD2 targeting to chromatin and to its enzymatic activity toward histone and non-histone substrates. We also do not fully understand the distinct roles of each H3K36 methylation state. Each methyl state likely recruits distinct effector proteins, but the majority of effector proteins discovered to date (i.e., those with PWWP domains) prefer to bind to H3K36me3. Thus, it is likely that we have not yet identified the full gamut of H3K36me effector proteins and their functions.

Another significant question is why cells invest considerable energy in repressing cryptic transcription. Set2 and a plethora of other chromatin factors function to prevent these transcripts from arising. So we ask: Are cryptic transcripts deleterious to genome stability or integrity? Are these transcripts translated? If yes, how do cryptic proteins impact cellular biology? The answers to these exciting questions may also shed light on the role of H3K36me in suppressing tumorigenesis. Along these lines, although SETD2 and H3.3K36 are mutated in multiple types of cancer, we do not fully understand H3.3K36’s role as a tumor suppressor. In addition, the recent discovery of a non-histone target for SETD2 will undoubtedly spur the search for other non-histone SETD2 targets, which, along with H3K36me and tubulin methylation, may function to maintain genome stability. Further work will elucidate how these methylation events contribute to chromatin and cellular function.