1 Introduction

In eukaryotes, about 147 bp of DNA wraps around a histone octamer (assembled from an H3-H4 tetramer and two H2A-H2B dimers) to form the nucleosome core particle. This structure is stabilized by histone H1 and can be further folded into higher-order chromatin structures [1]. Histone molecules are subject to diverse modifications, including methylation, acetylation, phosphorylation, ubiquitination and many others, which constitute a unique “code” for the regulation of chromatin function and dynamics [2, 3]. Histone methylation was discovered in 1964 [4, 5], but its regulation and functional significance were revealed only in the past 15 years. Through reactions catalyzed by different families of enzymes, histone lysine residues can be mono-, di- or tri-methylated [6], whereas arginines can be mono- or di-methylated (symmetrically or asymmetrically) [7] (Fig. 1). While most methylation occurs on the flexible histone N-terminal tails, several methylations have also been detected within the globular domain. It is well documented that methylation at different residues, as well as different methylation states of the same residue, can confer a variety of biological functions. Table 1 summarizes validated histone methylation sites together with their catalyzing enzymes and functional outcomes. In this chapter, we discuss several major histone methyltransferases (HMTases) and the molecular mechanisms underlying methylation reactions. In most cases, the addition of methyl groups to histones does not directly affect chromatin structure. Instead, the diverse functions of different histone methylations are believed to be mediated by binding proteins harboring specific motifs. Therefore, we also discuss different methyl-histone binding domains and their recognition mechanisms.

Fig. 1
Lysine and arginine methylation. (a) Catalyzed by different histone KMTs, lysine residues can be mono-, di-, and tri-methylated. (b) Arginine residues in histones can be mono-methylated by type I, II and III PRMTs. Type I and II PRMTs can further introduce asymmetric and symmetric di-methylation, respectively
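The methylation states described above follow a compact naming convention (for example H3K9me3 or H3R17me2a) that is used throughout this chapter. As an illustration only, the following Python sketch parses such labels and enforces the two limits stated in the text: lysine accepts up to three methyl groups, while arginine accepts up to two, with an optional a/s suffix for asymmetric or symmetric dimethylation. The names `MethylMark` and `parse_mark` are hypothetical helpers introduced for this example and do not belong to any published tool.

```python
import re
from dataclasses import dataclass

# Maximum number of methyl groups per residue type, as stated in the text:
# lysine (K) can be mono-, di- or tri-methylated; arginine (R) mono- or di-.
MAX_METHYL = {"K": 3, "R": 2}

# Matches labels such as H3K9me3, H4K20me1, H3R17me2a and H4R3me2s.
MARK_RE = re.compile(r"^(H1|H2A|H2B|H3|H4)([KR])(\d+)me([1-3])([as]?)$")


@dataclass(frozen=True)
class MethylMark:
    """Hypothetical container for one parsed methylation mark."""
    histone: str
    residue: str
    position: int
    degree: int
    symmetry: str  # "a", "s", or "" when unspecified / not applicable


def parse_mark(label: str) -> MethylMark:
    """Parse a mark label and reject states the text rules out."""
    match = MARK_RE.match(label)
    if match is None:
        raise ValueError(f"unrecognized mark label: {label!r}")
    histone, residue, position, degree, symmetry = match.groups()
    degree = int(degree)
    if degree > MAX_METHYL[residue]:
        raise ValueError(f"{residue} cannot carry {degree} methyl groups")
    # The a/s suffix only makes sense for dimethylated arginine.
    if symmetry and not (residue == "R" and degree == 2):
        raise ValueError("a/s suffix applies only to dimethylarginine")
    return MethylMark(histone, residue, int(position), degree, symmetry)
```

For instance, `parse_mark("H3R17me2a")` yields a mark on arginine 17 of H3 with degree 2 and symmetry "a", whereas `parse_mark("H3R2me3")` raises a `ValueError` because trimethylarginine is not among the states described.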

Table 1 Methylated sites on histone

2 Histone Methyltransferase

Methylation is one of the most common protein modifications and occurs on multiple amino acids. It is catalyzed by protein methyltransferases (MTases) using S-adenosyl methionine (SAM) as the cofactor and methyl donor [65]. Histone methylation mainly occurs on lysine and arginine residues, although glutamine (H2AQ104) methylation, catalyzed by a nucleolus-specific rRNA 2′-O-methyltransferase, has recently been reported [66]. Histone lysine and arginine methylations are catalyzed by two major types of MTases, lysine methyltransferases (KMTs) and protein arginine methyltransferases (PRMTs). While most KMTs transfer methyl groups mainly onto histones, PRMTs methylate histones and a wide range of non-histone substrates. These two types of enzymes share little similarity in primary structure, tertiary structure or reaction mechanism, and are therefore discussed separately.

2.1 Lysine Methyltransferases (KMTs)

SUV39H1 is the first characterized de novo histone KMT. It contains a conserved catalytic motif (~120 aa) that was initially identified in three Drosophila proteins, Suppressor of variegation [Su(var)3–9], Enhancer of zeste [E(z)] and Trithorax (Trx) [67], and was thereafter named the SET domain [25]. Sequence alignment identified ~50 SET domain-containing proteins in the human genome [68], and many of them have been shown to possess histone KMT activity. Among the characterized KMTs, human DOT1-like (DOT1L) does not harbor a SET domain but possesses robust catalytic activity [49]. The SET domain is often located at the C-terminus of histone KMTs, and this bifurcated motif can be divided into the conserved SET-N and SET-C regions and a highly variable insertion (SET-I) in the middle. These enzymes also harbor different sets of other domains and can be broadly classified into seven families (SUV39, SET1, SET2, EZH, SMYD, PRDM and SUV420) plus a group of other enzymes (Fig. 2).

Fig. 2
Human histone lysine methyltransferases. 35 active human histone lysine methyltransferases (KMTs) are grouped into eight families according to their domain organization

The SET domain of SUV39 family KMTs is flanked by two conserved cysteine-rich domains, pre-SET and post-SET. These KMTs mainly catalyze H3K9 methylation, with the exception of SETMAR, which methylates H3K4 and H3K36 [16]. The SET1 and SET2 family KMTs comprise different groups of large enzymes that specifically methylate H3K4 and H3K36, respectively. The SET1 family KMTs lack the pre-SET domain, and their enzymatic activities require the formation of complexes with other proteins, including RBBP5 and ASH2L [69]. In the SET2 family KMTs, by contrast, the pre-SET domain is replaced by an AWS (Associated with SET) domain. This unique AWS-SET-postSET configuration is believed to confer specific methyl transfer onto H3K36 [42]. In the EZH family KMTs, both the pre-SET and post-SET domains are absent. However, these enzymes harbor a conserved CXC domain, which is located upstream of the SET domain and is critical for their activity. EZH1/2 also have no detectable activity unless they form the Polycomb Repressive Complex 2 (PRC2) with SUZ12 and EED [70]. Unlike other KMT families, the SUV420H family KMTs contain only the conserved SET and post-SET domains. SUV420H1/H2 are able to convert H4K20me1 to H4K20me2/3 [55].

The SET domain in the SMYD and PRDM family KMTs has unique configurations. The SMYD family KMTs contain a long interposed sequence composed of a MYND domain and a SET-I motif. The MYND domain of SMYD1/2 has been shown to interact with the proline-rich motif in their binding partners, while the MYND domain of SMYD3 is believed to bind DNA [71]. The SMYD family KMTs also contain the post-SET domain and a conserved C-terminal domain (CTD) of unknown function. Unlike other KMT families, SMYDs display diverse substrate specificity [40, 58]. The PRDM family KMTs harbor a catalytic PR/SET domain, which shares only 20% sequence similarity with the SET domain but displays a similar tertiary structure. Most PRDMs contain multiple zinc fingers, and some also have a zinc knuckle motif. The human PRDM family has 17 members, of which six have histone KMT activity [30, 31, 33, 46].

Other human KMTs that do not fit into the above categories are SETD7, SETD8 and DOT1L. SETD7 and SETD8 contain only the catalytic SET domain and specifically mediate mono-methylation of H3K4 [72] and H4K20 [73], respectively. DOT1L is a non-SET domain-containing KMT that specifically catalyzes H3K79 methylation. The catalytic domain of DOT1L is similar to that of PRMTs, which use a Rossmann fold for SAM binding [74, 75]. Furthermore, in vitro methylation catalyzed by DOT1L is a non-processive reaction, whereas the reaction mediated by most SET domain-containing KMTs is processive [76]. Although several other proteins have also been reported to possess KMT activity [77, 78], the molecular details underlying these reactions remain elusive.
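The family-level domain organization described in this section can be captured as a small lookup table. The following Python sketch is illustrative only: it records just the domains, main target sites and obligate partners that are named in the text (EZH is assigned H3K27 on the basis of the PRC2 discussion later in the chapter), so a domain missing from a tuple means "not mentioned here", not "absent from the protein". The names `KMTFamily`, `FAMILIES`, `families_with` and `families_targeting` are hypothetical.

```python
from dataclasses import dataclass
from typing import Dict, List, Tuple


@dataclass(frozen=True)
class KMTFamily:
    """Hypothetical record of what the text states about one KMT family."""
    domains: Tuple[str, ...]            # domains named in the text
    main_sites: Tuple[str, ...]         # main target sites; empty if diverse/unstated
    obligate_partners: Tuple[str, ...]  # partners required for activity, if stated


FAMILIES: Dict[str, KMTFamily] = {
    # SETMAR is the stated exception within SUV39 (it methylates H3K4 and H3K36).
    "SUV39": KMTFamily(("pre-SET", "SET", "post-SET"), ("H3K9",), ()),
    "SET1": KMTFamily(("SET",), ("H3K4",), ("RBBP5", "ASH2L")),
    "SET2": KMTFamily(("AWS", "SET", "post-SET"), ("H3K36",), ()),
    "EZH": KMTFamily(("CXC", "SET"), ("H3K27",), ("SUZ12", "EED")),
    "SUV420": KMTFamily(("SET", "post-SET"), ("H4K20",), ()),
    "SMYD": KMTFamily(("MYND", "SET", "post-SET", "CTD"), (), ()),
    "PRDM": KMTFamily(("PR/SET",), (), ()),
}


def families_with(domain: str) -> List[str]:
    """Return the families whose listed domains include `domain`."""
    return sorted(name for name, fam in FAMILIES.items() if domain in fam.domains)


def families_targeting(site: str) -> List[str]:
    """Return the families whose listed main sites include `site`."""
    return sorted(name for name, fam in FAMILIES.items() if site in fam.main_sites)
```

Queries such as `families_with("post-SET")` or `families_targeting("H3K36")` then reproduce the groupings given in the prose. SETD7, SETD8 and DOT1L are left out because the text treats them as falling outside these families.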

2.2 Molecular Basis of Lysine Methylation

Representative SET domains from all KMT families have been crystallized, and their tertiary structures reveal several common features that are critical for an efficient methyl transfer reaction [79]. Overall, the SET domain is a highly intertwined globular structure with extensive intra-domain interactions, suggesting that each motif is critical for structural integrity. The conserved SET-N and SET-C motifs form three β-sheets and two β-sheets with a pseudo-knot structure, respectively. The variable SET-I motif also displays a similar structural fold, containing two anti-parallel β-strands and a short α-helix. Together with SET-C, the SET-I motif is directly involved in substrate and SAM binding and thus contributes to substrate specificity [79]. Furthermore, zinc chelation has been observed in several structures. For example, the pre-SET domain of the SUV39H family chelates three zinc ions [80, 81] and the AWS domain of the SET2 family chelates two [36, 44], whereas the zinc knuckle domain of PRDM9 chelates one zinc ion [82] and the CXC domain of EZH2 binds six [70]. These domains do not contact the substrate but pack against the SET domain to support its structural stability and enzymatic activity [79]. Zinc chelation is absent from the SUV420H1, SETD7 and SETD8 structures; instead, an α-helix bundle, a β-sheet and a long α-helix were observed, respectively [72, 73, 83], suggesting that these elements may exert a similar function without zinc binding. Additionally, three cysteines in the post-SET domain of many KMTs chelate one zinc ion together with a conserved cysteine in the SET-C motif, and this structure is also critical for enzymatic activity [84]. Without zinc chelation, the C-terminal sequences of the SET domain in SETD7 [72] and SETD8 [73] also form a similar structural fold for substrate and SAM binding.

Furthermore, the SET domain interacts with the histone substrate and SAM through opposite surfaces. While SAM fits into an open concave pocket, the histone peptide adopts an extended conformation and interacts extensively with the binding groove. In this way, the target lysine is precisely positioned and its side chain can pass through a narrow channel to meet SAM [79, 85]. Different SET domains form different interactions with the backbone of their substrates to specifically define the methylation site. For instance, DIM-5, a Neurospora SUV39 family KMT, methylates H3K9 (QARK9ST) but not H3K27 (AARK27SA) because of its specific interaction with the side chain of T11 [81]. Moreover, the alkyl component of the lysine side chain occupies a hydrophobic environment, while the ε-nitrogen is stabilized by hydrogen bonds with surrounding carbonyl and hydroxyl groups [85]. In the SETD7 structure, the ε-nitrogen on the side chain of H3K4 forms one hydrogen bond with a conserved Y245 and another with a tightly bound water molecule, which prevents rotation of the εC-N bond for additional methylation. Accordingly, the Y245A mutation enables SETD7 to catalyze H3K4me2/3 [72]. Mutation of another target lysine-binding tyrosine (Y305F) also results in H3K4me2 [84]. While the same Y-to-F mutation in SETD8 [73], MLL [86] and G9A [70, 71] leads to similar effects on methylation, the F281Y mutation in DIM-5 abolishes H3K9me3 formation [84]. Intriguingly, the ε-N on the target lysine side chain forms a critical hydrogen bond with S161 in the SET domain of mouse Suv420h2, which makes the enzyme inefficient for trimethylation [83, 87]. Disabling this hydrogen bond with the S161A mutation greatly increases H4K20me3 [83]. Together, the substrate specificity of KMTs is determined by extensive substrate-SET domain interactions, whereas the methylation state depends on how the ammonium group of the target lysine side chain is accommodated in the structure.

SMYD proteins share a conserved bilobal structure in which the catalytic core is located in the middle of the N-terminal lobe, surrounded by the MYND domain and CTD. While the MYND domain is dispensable for methylation, the SET and post-SET domains form a surface pocket for cofactor binding, and a deep pocket at the interface between the SET domain and CTD binds the substrate [71]. Although the overall structures are similar, SMYD1-3 display substantial differences in the size and surrounding structure of the substrate binding pocket, which could be responsible for their divergent specificities for substrates and methylation states [88]. In the PRDM family KMTs, the SET domain signature motifs are poorly conserved. However, the overall structure of the PR/SET domain corresponds to that of SET domains, in which the central SET domain fold is flanked by pre-SET and post-SET regions. The binding of cofactor and substrate peptide to the PR/SET domain of Prdm9 is also similar to that seen in SET domains [82]. These findings suggest that structural similarity to the SET domain confers the lysine methylation activity of both SMYD and PR/SET domains. In contrast, the catalytic domain of DOT1L forms an open α/β structure comprising a seven-stranded β-sheet. This structure is distinct from the SET, SMYD and PR/SET domains but similar to several class I SAM-dependent MTases [89]. The active core of DOT1L has an elongated structure, containing a SAM binding pocket and a lysine binding channel. While the SAM binding pocket is critical for methylation, the lysine binding channel can accommodate all three methylation states. The positively charged C-terminal region of the catalytic domain is also critical for enzymatic activity, likely through binding to negatively charged nucleosomal DNA [75]. Despite the structural similarity between DOT1L and PRMTs, arginine methylation by DOT1L has not been detected.

2.3 Protein Arginine Methyltransferase

The human genome harbors eleven protein arginine methyltransferases (PRMTs), eight of which are able to methylate histones. Based on their methylation products, PRMTs are classified into four types [7]. Type I enzymes, including PRMT1, 3, 4, 6 and 8, introduce monomethylation on arginine and further proceed to asymmetric dimethylation (aDMA). PRMT5, 7 and 9 are type II PRMTs, which generate symmetric dimethylated products (sDMA) after the initial monomethylation. Type III enzymes catalyze only monomethylation and do not proceed further. In contrast to the methylation of the terminal guanidino nitrogen atoms catalyzed by the above PRMTs, type IV PRMTs introduce monomethylation of the δ-nitrogen. Most PRMTs catalyze aDMA on arginine, likely because sDMA formation is energetically more challenging [90]. The characterized PRMTs and their functions are also summarized in Table 1.
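The four-type classification above can be written down as a small lookup, shown here as a minimal Python sketch. It encodes only the human assignments given in this section; no type III or type IV members are named here, so their member tuples are left empty. The abbreviation "MMA" for monomethylarginine and the label "delta-MMA" for δ-nitrogen monomethylation are shorthand introduced for this example, as are the function names.

```python
from typing import Dict, Optional, Tuple

# PRMT type -> (human members named in the text, ordered methylation products).
# "MMA" (monomethylarginine) and "delta-MMA" are labels assumed for this sketch.
PRMT_TYPES: Dict[str, Tuple[Tuple[int, ...], Tuple[str, ...]]] = {
    "I": ((1, 3, 4, 6, 8), ("MMA", "aDMA")),
    "II": ((5, 7, 9), ("MMA", "sDMA")),
    "III": ((), ("MMA",)),        # monomethylation only; members not listed here
    "IV": ((), ("delta-MMA",)),   # monomethylation of the delta-nitrogen
}


def prmt_type(number: int) -> Optional[str]:
    """Return the type assigned in the text to human PRMT<number>, if any."""
    for type_name, (members, _products) in PRMT_TYPES.items():
        if number in members:
            return type_name
    return None


def final_product(type_name: str) -> str:
    """Return the terminal methylation product for a PRMT type."""
    return PRMT_TYPES[type_name][1][-1]
```

With this table, `prmt_type(5)` returns "II" and `final_product("I")` returns "aDMA", while a PRMT whose type is not assigned in this section, such as PRMT2, returns `None`.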

2.4 Molecular Basis of Arginine Methylation

Unlike the SET domain, the conserved 310-aa catalytic core of PRMTs adopts a structure with a Rossmann-fold domain and a C-terminal β-barrel domain and functions as a homodimer [91]. Methylation occurs at the interface of the catalytic core, where SAM is accommodated in the Rossmann fold, whereas the substrate peptide binds to an acidic groove with the side chain of the target arginine inserted into a narrow tunnel to meet SAM [92]. Two notable structures are found at the interface, a double-E loop from the Rossmann fold and a THW loop from the β-barrel, both of which are important for methylation. The E181D mutation in the double-E loop of Trypanosoma brucei PRMT7 has been shown to convert this type III enzyme into a type I enzyme that catalyzes asymmetric dimethylarginine (aDMA) formation [93]. Another motif critical for SAM and target arginine binding lies in the N-terminal helix of the Rossmann fold. The F379M mutation in this motif of C. elegans PRMT5 partially shifts the reaction towards aDMA. This mutation likely opens up the active site to accommodate the bulkier asymmetric dimethyl groups [94]. Corroborating the importance of this residue, the M48F mutation of rat PRMT1 enables it to catalyze symmetric dimethylation [90], suggesting that a Phe or Met residue in the N-terminal helix of the Rossmann fold could define the type of methylation. However, type II PRMT9 contains a Met at this exact position. Therefore, the proposed F/M switch may be only part of the underlying mechanism [91]. In general, the size of the target arginine binding pocket significantly affects the product specificity of different PRMTs.

3 Regulation of Histone Methylation

The enzymatic activity of HMTases is often evaluated in an in vitro assay in which the enzyme is incubated with SAM and substrate to catalyze methylation. In this reaction, several HMTases, including EZH2 and MLL, failed to show robust activity unless their core protein complexes were used [95, 96]. Different substrates have also been used in this assay, including histone peptides, recombinant histones, histone octamers and nucleosomes [56]. While several KMTs, such as SETD8 [56], SUV420H1/2 [55] and DOT1L [49], prefer nucleosomes, many enzymes, such as SETD7 [15], G9A and GLP [97], predominantly methylate recombinant histones or octamers. These observations suggest that an intact catalytic domain is necessary but not sufficient for histone methylation. Therefore, we discuss several regulatory mechanisms at different molecular layers.

3.1 Regulation by Interacting Proteins

The structure of the EZH2 SET domain reveals an unfavorable positioning of the SET-I and post-SET domains, which prevents their interaction with substrate peptide and SAM [98, 99]. More recently, crystal structures of PRC2 revealed that extensive protein interactions with EED and SUZ12 optimize the structure of EZH2’s SET-I to form an active catalytic moiety for H3K27 methylation [70]. In the SET domain of MLL, the SET-I motif is highly dynamic. After MLL forms a complex with the RBBP5-ASH2L heterodimer, extensive protein interactions significantly reduce this inherent flexibility and lock the SET domain in an active conformation that enables efficient methyl transfer [69]. Intriguingly, β-sheet interactions resembling the intermolecular ones between MLL SET-I and RBBP5 (330–344 aa) were also observed in the structures of other KMTs. In SUV39 and SET2 family KMTs, similar interactions are formed intramolecularly between SET-I and a short fragment upstream of the pre-SET domain. This fragment also exists in EZH2 and functions as the SET activation loop (SAL) [70], suggesting that it is a conserved structural feature of a functional SET domain. Together, these structural advances demonstrate that the inherent imperfection of certain SET domains can be corrected through interactions with their binding partners.

Furthermore, interacting proteins can regulate the activity of HMTases through different mechanisms. In in vitro assays, PRMT5 catalyzes symmetric methylation of H3R8 and H4R3 equally. However, it preferentially methylates H4R3 after binding to COPR5 [100], suggesting that this regulatory protein fine-tunes PRMT5’s substrate specificity. HSP90α, a binding partner of SMYD2, stimulates methylation of H3K4 but not H3K36 [101]. Similarly, the substrate specificity of EZH2 for either H1K26 or H3K27 is modulated by the PRC2 core component EED [63]. The Polycomb-like protein PHF1 also facilitates PRC2-mediated H3K27me3 without affecting H3K27me1/2 [102, 103]. Additionally, SETDB1’s binding partner ATF7IP/AM, an ATFα-associated factor, not only augments SETDB1’s enzymatic activity but also facilitates the conversion of H3K9me2 to H3K9me3 in vitro and in vivo [104]. However, similar effects were not observed when peptide substrates were used in the in vitro assays [105]. The molecular mechanisms underlying these regulatory effects are largely unknown.

3.2 Regulation by Post-Translational Modifications

Several HMTases are subject to post-translational modifications that can also regulate their catalytic activity. For example, PKB/AKT phosphorylates EZH2 at Ser21, and this phosphorylation inhibits EZH2 binding to H3 and thus reduces H3K27me3 in vitro and in vivo [106]. In glioblastoma stem-like cells, EZH2 interacts with and methylates STAT3, and Ser21 phosphorylation facilitates PRC2-catalyzed STAT3-K180 methylation to activate STAT3 signaling [107]. In response to DNA damage, SETD7 has been shown to interact with and methylate SUV39H1 at K105 and K123, leading to dramatically reduced enzymatic activity of SUV39H1. Since these lysines are close to the chromodomain, which is critical for chromatin binding, these methylations could weaken the SUV39H1-substrate interaction [108]. While SUV39H1-K266 acetylation within the SET domain reduces its catalytic activity, SIRT1-mediated deacetylation can restore it by facilitating the interaction between the SET and post-SET domains [109].

Moreover, bacterially purified SETDB1 failed to methylate histones in in vitro assays, whereas enzymes purified from 293T or Sf9 cells displayed robust activity [110], indicating that post-translational modifications could regulate SETDB1’s activity. We recently demonstrated that SETDB1 is monoubiquitinated at K867 (K867ub1) within its unique SET-I motif via an E3-independent mechanism. The conjugated ubiquitin is protected from active deubiquitination, likely through multiple intramolecular interactions. Importantly, the resulting constitutive monoubiquitination is required for SETDB1’s enzymatic activity and function. While most post-translational modifications are dynamically regulated, our findings highlight the constitutive role of K867ub1 in regulating the enzymatic activity of KMTs [97].

3.3 Regulation by Histone Modification

It is well documented that histone methylation is also modulated by other histone modifications. On the same molecule, H3S10 phosphorylation blocks access to the adjacent H3K9 for methylation [111]. Similar regulation also exists between different histones; one good example is that H2A and H2B ubiquitination affects H3K4 and H3K79 methylation. Site-specific installation of H2BK120ub1 causes allosteric regulation of nucleosomes that facilitates DOT1L binding and thus increases intranucleosomal H3K79me1/2 [112]. H2BK120ub1 also interacts with the N-terminal winged helix motif of ASH2L and promotes H3K4 methylation catalyzed by the ASH2L-MLL-RBBP5 complex [113]. In contrast, H2AK119ub1 inhibits PRC2-catalyzed H3K27 methylation, suggesting that ubiquitination at different sites trans-regulates different histone methylations [114]. Internucleosomal regulation has also been reported. While the SET domains of GLP and G9A preferentially methylate histone octamers, the full-length proteins efficiently catalyze oligonucleosomal methylation because G9A/GLP bind to methyl-H3K9 on adjacent nucleosomes through their Ankyrin repeat domains [115]. Similarly, PRC2-catalyzed H3K27 methylation is stimulated by EED, which binds to methyl-H3K27 on adjacent nucleosomes [116, 117]. In the PRC2 structure, H3K27me3 binding also stabilizes the conformation of the EZH2 N-terminal SRM motif and affects the SET-I conformation to facilitate H3K27 methylation [70]. Similar to G9A/GLP, the chromodomain of SUV39H1 recognizes H3K9me3, and this methyl binding anchors the enzyme to chromatin allosterically to allow further spreading of H3K9me3 [118]. Therefore, pre-existing modifications on histones can dramatically modulate methylation through different mechanisms.

4 Recognition of Histone Methylation

Although histone methylation does not neutralize the positive charge of the modified residue, several methylations could affect nucleosome structure to facilitate transcription [119]. For example, H3R42 is located at the DNA entry-exit region of the nucleosome, and asymmetric dimethylation of this residue by CARM1 and PRMT6 could reduce nucleosome stability [48]. A structural study revealed that H3K79me2 leads to a subtle reorientation of the surrounding region in the nucleosome [120]. Moreover, H3R17me2a and H3K4me1 have been shown to reduce chromatin association of the NuRD complex, suggesting that these methylation marks could indirectly regulate chromatin accessibility [121, 122]. In most cases, however, histone methylation serves as a docking site for specific binding proteins, which in turn recruit additional chromatin modifiers for diverse functional outcomes [119]. So far, at least nine domains have been characterized as methyl-histone binding motifs; they are briefly summarized in Table 2 together with their recognition sites.

Table 2 Histone methylation binding protein

4.1 The Royal Family of Domains

Several methyl-histone binding motifs, including the chromodomain, Tudor, MBT and PWWP domains, belong to the Royal family of domains, which are descended from a common ancestor [173]. These domains share a similar structure, a barrel-like three-stranded β-sheet with a short helix, and bind to different methyl-lysines [174]. Present in ~31 human proteins [175], the chromodomain recognizes methyl-lysines using an aromatic cage formed by three highly conserved residues [176]. Binding site specificity is determined by specific interactions between the chromodomain and amino acids around the methyl-lysine. For example, the chromodomains of HP1 [176, 177] and MPP8 [125, 178, 179] preferentially recognize H3K9me3, while the same domain in PC binds to H3K27me3 [180, 181]. Intriguingly, the CHD family proteins contain tandem chromodomains that bind to a single H3K4me3 mark in a coordinated manner [182, 183]. In most cases, the aromatic cage of the chromodomain accommodates trimethylation and binds mono- or di-methylation with lower affinity because of less optimal van der Waals and cation-π interactions [149, 184].

The Tudor domain forms four or five β-strands and binds to methyl-lysine with a similar aromatic cage. The tandem Tudor domains of JMJD2A specifically recognize H3K4me3 [156]. In the Tudor domains of 53BP1, the carboxylate group of Asp1521 forms a hydrogen bond with the nitrogen of the dimethyl-amine group, which confers stable binding to H4K20me2 but causes steric hindrance with the trimethyl-amine [149]. Therefore, binding specificity for different methylation states is precisely tuned in different Tudor domains. Although tandem Tudor domains are often observed, only one of the tandem domains interacts with methyl-lysine, leaving the other unbound.

The MBT domain is a larger motif (~100 aa) containing 2–4 repeats and exists in nine human proteins. All human MBT domains harbor the conserved aromatic residues, indicating that they can bind to methyl-lysines. However, the MBT domain preferentially recognizes mono- and di-methylated lysines [185] with poor site specificity, likely because it lacks interactions with amino acids around the methyl-lysine [186, 187]. The PWWP domain (100–150 aa) folds into a five-stranded β-sheet packed against a helical bundle, with significant variations in β2 and β3, and many PWWP domains have the conserved aromatic cage [188]. The PWWP domains of Pdp1 and DNMT3A/3B have been shown to specifically recognize H4K20me [189] and H3K36me3 [128, 190], respectively.

4.2 Other Methyl-Lysine Binding Domains

In addition to the Royal family of domains, several other motifs can recognize methyl-lysines, including the PHD finger, CW, Ankyrin repeat, WD repeat and BAH domains. The PHD finger domain forms two anti-parallel β-strands and one C-terminal α-helix, which are stabilized by two zinc ions chelated by a consensus C4HC3 sequence [191]. This motif exists in multiple chromatin-associated proteins, many of which have been shown to recognize methylated histones [130, 192]. Owing to favorable accommodation of H3R2, most PHD finger domains bind to methyl-H3K4 [150, 151]. For example, the PHD domains of BPTF and ING2 recognize H3K4me3 through an aromatic cage similar to that of the Royal family domains [150, 151]. Similarly, the CW domain is centered on two anti-parallel β-strands and chelates one zinc ion with a consensus C4 sequence [164]. The CW domain of ZCWPW1 preferentially recognizes H3K4me3 [163] through an aromatic cage. Among the seven human CW domain-containing proteins, four contain at least two conserved aromatic residues and can bind to H3K4me3 peptide in vitro [164].

The Ankyrin repeat domain is a widely distributed motif for protein-protein interactions. Intriguingly, the Ankyrin repeat domains of G9A and GLP can bind to H3K9me2 [160]. As in the chromodomain, three aromatic residues and a Glu are involved in the binding. However, the aromatic cage is too small to accommodate trimethyl groups. By contrast, WD repeats form a seven-bladed β-propeller in which three scattered aromatic residues are responsible for methyl-lysine binding. The WD40 repeat domain of the PRC2 component EED has been shown to recognize multiple methyl-lysines, including H1K26me3, H3K9me3, H3K27me3 and H4K20me3, with similar affinity [116, 117]. The MLL complex subunit WDR5 contains seven WD40 repeats, which have been reported to bind to methyl-H3K4 [193]. Present in many chromatin-associated proteins, the BAH domain folds into a β-rich structure. The BAH domain of the DNA replication protein ORC1 has been demonstrated to specifically recognize H4K20me2 through a cage of four aromatic residues [168]. Owing to a hydrogen bond between the methyl-ammonium and the side-chain carboxylate of a nearby Glu, this binding favors H4K20me2 over H4K20me3 [168]. The BAH domain in BAHD1 has also been reported to recognize H3K27me3 [169].

4.3 Recognition of Arginine Methylation

Among methyl-histone binding domains, multiple Tudor domain-containing proteins also bind to methyl-arginine in various proteins, suggesting that they can recognize methyl-arginine in histones [194]. The extended Tudor domain of SND1 has been shown to interact with peptides harboring H4R3 methylation [195]. As with methyl-lysine recognition, methyl-H4R3 binding involves the aromatic cage of the SND1 Tudor domain. However, two aromatic residues pack in parallel against the guanidinium plane, and their distance to the ammonium group is shorter than in methyl-lysine binding [195]. While some Tudor domains recognize aDMA and sDMA on histones with comparable affinity, others display a clear preference. In a peptide pull-down assay, the Tudor domain of TDRD3 preferentially recognizes aDMA on the histone H3 tail [166]. A structural study revealed that this selectivity arises from a unique hydrogen bond between the unmodified amino group and the hydroxyl group of a Tyr in the aromatic cage [196]. Unexpectedly, the Glu-rich region in PELP1 has also been shown to bind to H3K4me2, H3K9me2 [197] and methylated arginine in vitro [198], suggesting that there are more methyl-histone binding motifs yet to be identified.

4.4 Modulation of Methyl-Histone Binding

Methyl-histone binding by different domains is also subject to multiple layers of regulation. Because of the extensive interactions between the binding domain and amino acids adjacent to the methylated residue, post-translational modifications of these amino acids can have drastic effects. For example, H3S10 phosphorylation, one of the most prominent modifications on mitotic chromosomes, inhibits HP1 binding to H3K9me3 [199, 200]. Similarly, H3K4me3 binding by different domains is blocked when H3R2 is asymmetrically dimethylated [10]. Intriguingly, modifications adjacent to the methyl-lysine have also been shown to facilitate recognition by binding domains. For example, the structure of the RAG2 PHD finger domain in complex with an H3K4me3 peptide uncovers an additional binding pocket. As a result, H3K4me3 binding is increased when H3R2 is symmetrically methylated on the same molecule [201]. Furthermore, methyl-histone binding is regulated by other interacting partners. The structure of the ternary complex of the Pygo PHD finger, the BCL9/Legless HD1 domain and H3K4me peptides demonstrates that efficient H3K4me2 recognition requires the PHD-HD1 complex instead of the PHD domain alone [155]. In addition to interacting proteins, the ncRNA TUG1 can switch the PC2 chromodomain from H3K9me3 binding to H4R3me2s and H3K27me2 binding. Through unknown mechanisms, another ncRNA, NEAT2, can convert PC2’s H3K9me3 binding to H2AK5ac and H2AK13ac binding [202].

5 Concluding Remarks

As a key component of the proposed “histone code”, histone methylation is precisely regulated in cells and plays pivotal roles in the regulation of all chromatin-based processes. The histone methylation “code” is introduced by different groups of HMTases and recognized by various methyl-histone binding proteins. These proteins coordinate with various transcription factors, chromatin modifying proteins, signaling cascades and non-coding RNAs to constitute a large, sophisticated network for diverse functional outcomes. Many histone methylating and binding proteins are known to be altered in human diseases, including various cancers. Accordingly, numerous small-molecule modulators have been developed and characterized for pharmaceutical intervention in these diseases [203]. Multiple inhibitors targeting different HMTases have also entered clinical trials. Therefore, a thorough understanding of the molecular basis of histone methylation and recognition will not only shed light on their physiological functions but also facilitate the development of therapeutic strategies for human diseases.