Introduction

Epigenetics is the term used to describe heritable changes in gene function that are not caused by changes in the DNA sequence (Wu and Morris 2001; Dupont et al. 2009). Histones (H3, H4, H2A, H2B) form the core of the nucleosome, which is the fundamental unit of chromatin. More than 60 different histone residues have been reported to be modified, by at least eight distinct types of modification (Kouzarides 2007; Shahbazian and Grunstein 2007; Lennartsson and Ekwall 2009; Banerjee and Chakravarti 2011; Bannister and Kouzarides 2011). Histone modification, one of the main epigenetic regulatory mechanisms, plays critical roles in various biological processes, including regulation of chromatin structure dynamics and gene expression at the whole-genome level (van Leeuwen and Gottschling 2002; Kouzarides 2007; Gelato and Fischle 2008).

Histone lysine methylation status is controlled dynamically by histone lysine methyltransferases (KMTs) and histone lysine demethylases (KDMs). H3K4 or H3K36 methylation is generally associated with active transcription, whereas H3K9 or H3K27 methylation is associated with gene silencing (Pontvianne et al. 2010; Binda 2013). KMTs catalyze mono-, di-, or trimethylation of lysine residues, and these modifications are involved in most plant development processes (Pontvianne et al. 2010; Thorstensen et al. 2011; Ay et al. 2014).

Histone lysine methylation is a reversible chromatin mark (Mosammaparast and Shi 2010), and distinct histone demethylases are required to balance the methylation level of histone lysines (Chen et al. 2011). Bacteria and human lysine-specific demethylase 1 (LSD1) was the first KDM reported to specifically target histone H3 lysine 4 (H3K4) and histone H3 lysine 9 (H3K9) (Shi et al. 2004; Metzger et al. 2005). Arabidopsis LDL1 and LDL2 are homologs of LSD1 and have been found to be involved in flowering time control (Jiang et al. 2007). The JmjC (Jumonji C) family of KDMs is involved in histone lysine or arginine demethylation through an oxidative reaction (Tsukada et al. 2006; Rotili and Mai 2011). A JmjC domain protein was first reported by Takeuchi and colleagues (1995); now there are more than 25,000 JmjC domain protein sequences in UniProt (http://www.ebi.ac.uk/uniprot) and SMART (http://smart.embl-heidelberg.de/). For their activity, JmjC domain proteins require iron Fe(II) and α-ketoglutarate (αKG) as cofactors. The JmjC domain has eight β-sheets, three conserved amino-acid residues that are important for Fe(II) binding, and two residues in the binding site of αKG (Klose et al. 2006; Lan et al. 2008; Lu et al. 2008). Thr185/Phe185 and Lys206 are required for αKG binding (Chen et al. 2006; Klose et al. 2006; Lu et al. 2008). JmjC domain proteins often harbor a JmjN (Jumonji N) domain, which is an integral core component of all KDM catalytic domains (Pilka et al. 2015). The JmjC domain histone demethylases (JHDMs) can remove mono-, di-, and tri-methylation of histone lysine (Klose et al. 2006), and different KDMs exhibit different substrate specificities: KDM2A/JHDM1A is specific for H3K36me1/2, KDM5/JARID1 for H3K4me1/2/3, KDM4/JHDM3 for H3K9me2/3 and H3K36me2/3, KDM6 for H3K27me2/3, and KDM3/JHDM2 for H3K9me1/2 (Klose et al. 2006; Allis et al. 2007; Lu et al. 2008; Hou and Yu 2010). JmjC domain proteins play important roles in epigenetic processes, gene expression, and plant development through their involvement in the interplay between histone modifications and DNA methylation (Lu et al. 2008; Saze et al. 2008; Chen et al. 2011; Luo et al. 2014).

Previously, JmjC domain proteins have been classified into eight groups in animals and five groups in plants (Lu et al. 2008). Recently, Luo et al. (2014) showed that Arabidopsis JmjC domain proteins could be divided into five groups: KDM4/JHDM3 group (AtJMJ11-13), KDM5/JARID1 group (AtJMJ14-19), JMJD6 group (AtJMJ21/22), KDM3/JHDM2 group (AtJMJ24-29), and JmjC domain-only group (AtJMJ20 and AtJMJ30-32). The functions and substrate specificities of each group also have been suggested (Luo et al. 2014). AtJmjC11/AtJMJ11/ELF6 (EARLY FLOWERING 6) was reported to act as a repressor in the photoperiod pathway by erasing H3K4 methylation deposited in FT (Flowering Locus T) chromatin, thereby preventing precocious flowering (Noh et al. 2004; Jeong et al. 2009). AtJmjC12/AtJMJ12/REF6 (RELATIVE OF EARLY FLOWERING 6) represses FLC (Flowering Locus C) by demethylating H3K27me2/3 (Noh et al. 2004; Lu et al. 2011a). Additionally, mutations in AtJMJ11 or AtJMJ12 were found to lead to brassinosteroid-related phenotypes and increased H3K9me3 levels (Yu et al. 2008). AtJMJ13 could act redundantly with REF6 (Holec and Berger 2012). AtJMJ14 was shown to act downstream of the Argonaute effector complex to demethylate histone H3K4 at the target of RNA silencing and repress flowering (Searle et al. 2010; Yang et al. 2010). In contrast, AtJMJ15 and AtJMJ18 were reported to accelerate flowering by demethylating H3K4me3/H3K4me2 at FLC (Yang et al. 2012a, b). AtJMJ30 may be a novel clock component involved in controlling the circadian period (Lu et al. 2011b). In rice, 20 JmjC domain proteins have been identified (Lu et al. 2011a). OsJMJ703 was shown to regulate stem elongation and plant growth by demethylating H3K4me1/me2/me3 (Chen et al. 2013), OsJMJ705 was found to function as an H3K27me2/3 demethylase involved in response to pathogen infection (Li et al. 2013), and OsJMJ706 was reported to encode a heterochromatin-associated H3K9 demethylase involved in the regulation of flower development (Sun and Zhou 2008). In addition, BcJMJ30 (Brassica campestris Jumonji 30) was implicated in pollen development and fertilization (Li et al. 2012). Therefore, it is well established that JmjC proteins play crucial roles in various plant species and developmental processes.

Only a few plant JmjC domain proteins have been analyzed at the whole-genome level (Lu et al. 2008). Zhou and Ma (2008) analyzed the distinct evolutionary patterns of histone demethylases in some lineages, but several representative taxa, such as Selaginella, Ostreococcus, and gymnosperms, were not included in their study; moreover, the enzymatic activity of each subfamily remained unclear. Here we identified and analyzed JmjC domain proteins from representative green lineages. The results provide insights into the evolutionary conservation and diversification of JmjC domain proteins and their putative catalytic targets, which may contribute to the future functional characterization of these important epigenetic regulators in plants.

Materials and methods

Identification of JmjC domain proteins

JmjC domain protein sequences from Arabidopsis thaliana, Populus trichocarpa, Brachypodium distachyon, Oryza sativa, Selaginella moellendorffii, Physcomitrella patens, Homo sapiens, and Drosophila melanogaster were retrieved from the ChromDB database (http://www.chromdb.org/). Arabidopsis AtJMJ11-30 were used as blastp queries against the official genome websites of Brassica rapa (http://brassicadb.org/brad/, where the syntenic genes between Arabidopsis and cabbage were also detected), Picea abies (http://congenie.org/), and Ostreococcus tauri and Volvox carteri (http://genome.jgi-psf.org).

Domain organization analysis

The protein sequences were analyzed for domain organization using NCBI-CD searches (http://ncbi.nlm.nih.gov/Structure/cdd/wrpsb.cgi). The low-complexity filter was turned off, and the Expect Value was set at 10 to detect short domains or regions of less conservation in this analysis. Domains were also verified and named according to the SMART database (http://smart.embl-heidelberg.de/). Proteins with a JmjC domain were used to construct phylogenetic trees, and the sequences of the JmjC domains were used for alignment and logos analysis.

Phylogenetic and sequence alignment analysis

Multiple sequence alignments were performed using the ClustalW program (Thompson et al. 1994). The resulting file was subjected to phylogenetic analysis using the MEGA 6.0 program (Tamura et al. 2013). The tree construction settings depended on whether full-length protein sequences or JmjC domain sequences were used. JmjC domains of each group were aligned online (https://npsa-prabi.ibcp.fr/cgi-bin/npsa_automat.pl?page=/NPSA/npsa_clustalw.html), and logos analysis was carried out (http://weblogo.berkeley.edu/logo.cgi).
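The figure legends name p-distance and the Poisson model as substitution models and pairwise deletion as one gap treatment. As an illustration only (this is not the MEGA implementation, and the function names and the toy sequences are hypothetical), the two distances can be sketched for a pair of aligned sequences as follows. The Poisson-corrected distance is assumed here to be d = -ln(1 - p), the standard form of that correction.

```python
import math

# Characters treated as missing data (an assumption for this sketch).
MISSING = {"-", "?"}


def p_distance(seq_a, seq_b):
    """Proportion of differing sites between two aligned sequences.

    Columns where either sequence has a gap are skipped for this pair
    only, which corresponds to a pairwise-deletion treatment.
    """
    if len(seq_a) != len(seq_b):
        raise ValueError("sequences must come from the same alignment")
    compared = 0
    differing = 0
    for a, b in zip(seq_a.upper(), seq_b.upper()):
        if a in MISSING or b in MISSING:
            continue
        compared += 1
        if a != b:
            differing += 1
    if compared == 0:
        raise ValueError("no comparable sites")
    return differing / compared


def poisson_distance(seq_a, seq_b):
    """Poisson-corrected distance, d = -ln(1 - p)."""
    p = p_distance(seq_a, seq_b)
    if p >= 1.0:
        raise ValueError("distance is undefined when all sites differ")
    return -math.log(1.0 - p)


# Toy aligned fragments (hypothetical, not taken from the data set).
print(p_distance("HFEK-", "HYEKA"))
print(poisson_distance("HFEK-", "HYEKA"))
```

A matrix of such pairwise distances is the input that a neighbor-joining procedure clusters into a tree.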

Results

Identification and phylogenetic analysis of JmjC domain proteins

We screened 29 JmjC domain proteins in B. rapa, 25 in P. trichocarpa, 21 in A. thaliana, 20 in O. sativa, 19 in B. distachyon, 16 in P. abies, 14 in S. moellendorffii/P. patens, six in O. tauri, and five in V. carteri, together with 25 in H. sapiens and 11 in D. melanogaster. We have used the nomenclature of lysine demethylases proposed by the Chromdb database (JMJ, http://www.chromdb.org/index.html) (Online Resource 1).

Our phylogenetic analysis based on 169 JmjC domain proteins in 10 plants and 36 in fruit fly and human showed that the plant JmjC domain proteins could be divided into seven groups rather than the five groups reported in a recent study (Luo et al. 2014). Group-III KDM5B proteins were separated from Group-KDM5/JARID1 proteins based on their domain organization. Group-IV JmjC domain-only A was separated from Group-JmjC domain-only and was closely related to Group-JMJD6 whose homologs shared a similar domain organization with JmjC domain-only (Fig. 1). The origin and substrate targets for each group were inferred based on the phylogenetic tree and the functions were inferred from the annotated Arabidopsis and human KDMs (Table 1).
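The totals quoted above can be checked against the per-species counts reported in the previous paragraph. The short tally below is only a consistency check; it assumes that "14 in S. moellendorffii/P. patens" means 14 proteins in each species, and the dictionary names are illustrative.

```python
# Per-species counts of JmjC domain proteins as reported in the text.
plant_counts = {
    "B. rapa": 29,
    "P. trichocarpa": 25,
    "A. thaliana": 21,
    "O. sativa": 20,
    "B. distachyon": 19,
    "P. abies": 16,
    "S. moellendorffii": 14,  # assumed: 14 in each of the two species
    "P. patens": 14,
    "O. tauri": 6,
    "V. carteri": 5,
}
animal_counts = {"H. sapiens": 25, "D. melanogaster": 11}

plant_total = sum(plant_counts.values())
animal_total = sum(animal_counts.values())
print(len(plant_counts), plant_total, animal_total)
```

Under that assumption the tally gives 169 proteins across 10 plant species and 36 across the two animals, in agreement with the numbers used for the tree.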

Fig. 1 Phylogenetic tree of JmjC domain proteins based on the sequences of the JmjC domain. This tree includes 169 JmjC domain proteins from the P. trichocarpa, A. thaliana, B. rapa, O. sativa, B. distachyon, S. moellendorffii, P. patens, V. carteri, and O. tauri genomes, and 36 from animals. The JmjC domain proteins can be grouped into seven groups based on the phylogenetic tree and domain organization. Squares show JmjC domain proteins from the model plant Arabidopsis, triangles show homologs of plant JmjC domain proteins from human and fruit fly, and diamonds show animal JmjC domain proteins from human and fruit fly. Different colors show different groups. The proteins are named according to ChromDB (http://www.chromdb.org/index.html). JmjC domain protein sequences were aligned using ClustalW, and the phylogenetic tree analysis was performed using MEGA6. The trees were constructed with the following settings: tree inference as neighbor-joining; include sites as pairwise deletion option for total sequences analysis; substitution model: p-distance; and bootstrap test of 1000 replicates for internal branch reliability

Table 1 Putative origin and activity of JmjC domain proteins in plants

Group-I KDM4/JHDM3

Group-I included three Arabidopsis (AtJMJ11-13), six P. trichocarpa, five O. sativa, four B. distachyon/P. abies/S. moellendorffii/human, three B. rapa, three V. carteri, two P. patens/fruit fly, and one O. tauri JmjC domain proteins (Fig. 2a; Online Resource 1). It is remarkable that some genes were doubled in poplar and rice but not in cabbage even though whole genome duplication has occurred after cabbage and Arabidopsis divergence (Wang et al. 2011). The Group-I KDM4/JHDM3 members have been predicted to act as H3K9me2/3, H3K27me2/3, and H3K36me2/3 demethylases and some of them play roles in flowering time control (Noh et al. 2004; Jeong et al. 2009; Yang et al. 2010; Searle et al. 2010; Lu et al. 2011a; Luo et al. 2014). KDM4/JMJD2 was shown to epigenetically control herpes virus infection and reactivation from latency, but whether it is involved in the plant defense system remains unknown (Liang et al. 2013). JmjC, JmjN, and C2H2/C5H2 are the domains specific to this group. The JmjN domain and its interaction with the JmjC catalytic domain were found to be important for Jhd2 (also known as KDM5), a H3K4-specific demethylase in budding yeast (Huang et al. 2010; Quan et al. 2011). C2H2/C5H2 are small DNA-binding peptide motifs, which suggests that Group-I KDM4/JHDM3 might bind directly to a specific DNA sequence (Klug 1999; Lu et al. 2008).

Fig. 2 Phylogenetic and domain analysis of Group-I KDM4/JHDM3 based on full-length sequences. Group-I KDM4/JHDM3 includes AtJMJ11, AtJMJ12, AtJMJ13 and their homologs in all organisms and can be divided into three subgroups. Subgroups I and II are characterized by JmjC, JmjN, and C2H2 domains, whereas Subgroup-III is characterized by JmjC, JmjN, and C5H2 domains. Domains were taken from the SMART v6.0 database, except for the C5H2 zinc finger, which was taken from the CDD v3.11 database. a Phylogeny and domain organization. b Logos analysis of the JmjC domain. Residues in the Fe(II) binding site are marked with red triangles and those in the αKG binding site are marked with black triangles. The trees were constructed with the following settings: tree inference as neighbor-joining; include sites as complete deletion option for total sequences analysis; substitution model: Poisson model; and bootstrap test of 1000 replicates for internal branch reliability

The Group-I JmjC proteins were divided into three subgroups based on their phylogenetic relationship and domain organization. Subgroups-I and II contained AtJMJ11 and AtJMJ12 plant homologs, respectively. Subgroup-I proteins were found only in angiosperms, while Subgroup-II proteins were found in higher plants. The domain organization of the Subgroup-I/II proteins was highly conserved, except that the B. distachyon protein BdJMJ11301 had an additional Cdt1_m domain. Subgroup-III contained AtJMJ13 homologs that were found in all the plants and animals studied. The C5H2 zinc finger was specific to this group (Fig. 2a). The C2H2/C5H2 domains were absent in SmJMJ1605, VcJMJ6402/640, and OtJMJ3606, indicating that these proteins may bind indirectly to a specific chromatin region by interacting with other factors.

Sequence alignment and logos analysis of the JmjC domain showed that the αKG and Fe(II) binding sites were highly conserved in the group; however, some proteins from pine, Lycophyta, and V. carteri were the exceptions. For example, in the αKG binding site of SmJMJ1601, Phe was replaced by Trp, and in the Fe(II) binding site of SmJMJ1608, His was replaced by Arg, whereas in VcJMJ6401, Phe and Lys in the αKG binding site were replaced by Thr and Arg, respectively, and His was replaced by Leu in the Fe(II) binding site (Fig. 2b; Online Resource 2). The conserved binding sites suggest that most members of Group-I may be active histone demethylases that can target tri- and di-methylated H3K9 and H3K36 (Lu et al. 2011b). AtJMJ11, which has conserved binding sites for αKG and Fe(II), was shown to have H3K27 demethylase activity (Lu et al. 2011a).
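Substitutions of this kind can be listed by comparing the residues at the binding-site columns of an alignment with the expected residues. The sketch below is a minimal illustration: the column indices and the toy sequences are hypothetical and chosen only to make the example run, while the expected residues (two His and one Glu for Fe(II), Phe and Lys for αKG) follow the conserved pattern described for this group.

```python
# Hypothetical 0-based alignment columns for the five binding-site residues.
# Real columns would have to be read from the actual JmjC domain alignment.
EXPECTED_SITES = {
    "Fe(II)-1": (2, "H"),
    "Fe(II)-2": (4, "E"),
    "Fe(II)-3": (9, "H"),
    "aKG-1": (0, "F"),
    "aKG-2": (7, "K"),
}


def site_substitutions(aligned_seq, expected=EXPECTED_SITES):
    """Return {site: (expected, observed)} for every site that deviates."""
    substitutions = {}
    for site, (column, residue) in expected.items():
        observed = aligned_seq[column]
        if observed != residue:
            substitutions[site] = (residue, observed)
    return substitutions


# Toy aligned fragments (hypothetical, not real JmjC sequences).
conserved = "FAHAEAAKAH"
variant = "WAHAEAAKAR"  # Phe -> Trp at aKG-1, His -> Arg at Fe(II)-3

print(site_substitutions(conserved))
print(site_substitutions(variant))
```

A protein with an empty result keeps all five expected residues; any entry in the result marks a site that would need to be inspected before inferring activity.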

Group-II KDM5A

The phylogenetic analysis showed that KDM5/JARID1 could be divided into two groups, Group-II KDM5A and Group-III KDM5B (Fig. 1), even though they have similar substrates. Group-II KDM5A included five A. thaliana (AtJMJ14-16/18/19), nine B. rapa, five P. trichocarpa, two O. sativa/B. distachyon/P. abies/S. moellendorffii, and one P. patens JmjC domain proteins. No Group-II KDM5A proteins were found in green algae or animals (Fig. 3a; Online Resource 1). The KDM5/JARID1 proteins in animals belonged to Group-III KDM5B. KDM5/JARID1 proteins are histone demethylases of H3K4me1/2/3. Animal JARID1 was reported to be involved in cell fate determinacy (Benevolenskaya 2007; Secombe and Eisenman 2007). In plants, the KDM5/JARID1 proteins were found to regulate flowering time: AtJMJ14 was shown to promote flowering by activating FT, SOC1 (SUPPRESSOR OF OVEREXPRESSION OF CONSTANS 1), the floral homeotic protein AP1, and LEAFY (LFY), while AtJMJ15 promoted flowering by reducing the H3K4me3 of FLC (Lu et al. 2010; Searle et al. 2010; Yang et al. 2010, 2012a). Moreover, overexpression of the H3K4 demethylase gene AtJMJ15 was found to enhance salt tolerance in Arabidopsis (Shen et al. 2014). AtJMJ18 displayed demethylase activity toward H3K4me3 and H3K4me2 of FLC and accelerated flowering (Yang et al. 2012b).

Fig. 3 Phylogenetic and domain analysis of Group-II KDM5A based on full-length sequences. Group-II includes AtJMJ14-16, AtJMJ18/19 and their homologs in vascular plants and can be divided into three subgroups. Subgroups I and II are characterized by JmjC, JmjN, and FYRN-FYRC domains, whereas Subgroup-III is characterized by only JmjC and JmjN domains. a Phylogeny and domain organization. b Logos analysis of the JmjC domain. Residues in the Fe(II) binding site are marked with red triangles and those in the αKG binding site are marked with black triangles. The trees were constructed with the following settings: tree inference as neighbor-joining; include sites as complete deletion option for total sequences analysis; substitution model: p-distance; and bootstrap test of 1000 replicates for internal branch reliability

The FYRN (FY-rich domain N-terminal) and FYRC (FY-rich domain C-terminal) domains, which are found in a variety of chromatin-associated proteins and are particularly common in histone H3K4 methyltransferases (García-Alai et al. 2010), were specific to proteins in Group-II KDM5A; AtJMJ19 homologs were an exception (Fig. 3a). Group-II KDM5A could be divided into three subgroups (Fig. 3). Subgroup-I contained the AtJMJ14/15/18 homologs in higher plants and had highly conserved domain organization. The exceptions were AtJMJ15, which had additional FDH-GDH (formate dehydrogenase-glycerate dehydrogenase) and PMEI (plant invertase/pectin methylesterase inhibitor) domains, AtJMJ18, which had a Knot1 domain (knottins), BrJMJ10829 (Bra039869), which had a PMEI domain, and BrJMJ10815 (Bra021937), which had a PU domain (furin-like repeats). PMEI inhibits pectin methylesterase and invertase activity by forming a non-covalent complex and functions in the regulation of wounding, osmotic stress, senescence, seed development, and plant defense mechanisms (Hong et al. 2010). In this subgroup, cabbage had the most KDM5A copies. We found that BrJMJ10829 (Bra039869) and BrJMJ10816 (Bra023136) were syntenic genes of AtJMJ18, indicating that a single duplication event may have occurred. Additionally, BrJMJ10815 (Bra021937), but not BrJMJ10811 (Bra017281), was found to be a syntenic gene of AtJMJ15, implying that BrJMJ10811 may be a newly originated gene. Subgroup-II contained AtJMJ16 homologs in flowering plants, and all the genes were duplicated except in the monocots. Two copies of AtJMJ16 were found in cabbage, and BrJMJ10820 (Bra030735) with conserved length and domain organization was syntenic to AtJMJ14, whereas BrJMJ10814 (Bra018612) was not. The BrJMJ10814 sequence was much longer than the other sequences in this subgroup and had additional HELICc (helicase superfamily C-terminal) and DEXDc (DEAD-like helicases superfamily) domains, suggesting that it may exhibit additional helicase activity. Subgroup-III was only found in dicots. The protein sequences in this subgroup were much shorter than the other sequences in Group-II and contained only JmjC and JmjN domains. Cabbage had two syntenic genes of AtJMJ19, BrJMJ10801 (Bra000108) and BrJMJ10804 (Bra005068).

Sequence alignment and logos analysis showed that the JmjC domains were highly conserved in Subgroups-I and II, while in Subgroup-III the binding sites for αKG and Fe(II) were variable. In the Subgroup-III proteins, Phe and Lys were replaced by Gln and Arg in the αKG binding site, and two His residues were replaced by Lys and Tyr in the Fe(II) binding site. PtJMJ910 in Subgroup-III was an exception and was similar to proteins in Subgroups-I and II; that is, it had Phe and Lys in the αKG binding site and two His residues in the Fe(II) binding site (Fig. 3b; Online Resource 3).

Group-III KDM5B

The phylogenetic tree and domain organization showed that KDM5A and KDM5B belonged to the same branch but had different domain organizations (Figs. 1, 3, 4). Group-III KDM5B contained one member from all the studied plants except for pine and V. carteri (Fig. 4a; Online Resource 1). It was surprising that Group-II KDM5A included the pine KDMs, while the alga and animal KDMs were absent. Group-III KDM5B contained all the plant and animal KDMs except those of pine and V. carteri. AtJMJ17 was shown to have H3K4me1/2/3 and H3K9me1/2/3 demethylase activity and was reported to be involved in the plant’s defense system against virulent pathogens (Lu et al. 2008; Erickson 2012).

Fig. 4 Phylogenetic and domain analysis of Group-III KDM5B based on full-length sequences. Group-III includes AtJMJ17 and its homologs in all organisms except pine and V. carteri. JmjC, JmjN, PHD, and BRIGHT are the characteristic domains. a Phylogeny and domain organization. b Logos analysis of the JmjC domain. Residues in the Fe(II) binding site are marked with red triangles and those in the αKG binding site are marked with black triangles. The trees were constructed with the following settings: tree inference as neighbor-joining; include sites as complete deletion option for total sequences analysis; substitution model: no. of differences; and bootstrap test of 1000 replicates for internal branch reliability

Unlike the Group-II KDM5A proteins, the Group-III KDM5B proteins contain PHD (plant homeodomain) and BRIGHT domains that are specific to them (Fig. 4a). BRIGHT, also called ARID (AT-rich interaction domain), is a helix-turn-helix motif-based DNA-binding domain that is conserved in a wide variety of species and plays important roles in development, tissue-specific gene expression, and proliferation control (Kim et al. 2004; Patsialou et al. 2005). The PHD finger domain is a C4HC3 zinc-finger-like motif that is found in nuclear proteins where it forms an integral part of the enzymatic core of the HAT (histone acetyltransferases) domain of CBP (CREB binding protein) and is involved in epigenetics and chromatin-mediated transcriptional regulation (Kalkhoven et al. 2002). SmJMJ1603 has an additional RRM (RNA recognition motif) domain that was found to be important for binding a large variety of RNA sequences and proteins (Maris et al. 2005). PpJMJ1504 and DmJMJ401 have an additional HMG17 (high mobility group HMG14 and HMG17) domain that was reported to facilitate transcription by repressing histone activity (Bustin et al. 1995). It should be noted that all these extra domains are involved in binding to RNAs or proteins. HsJMJ503 and HsJMJ504 have a CHAD domain, but VcJMJ6402, which is much shorter, contained only the JmjC domain (Fig. 4a).

The sequences of the JmjC domain in Group-III proteins were highly conserved (Fig. 4b; Online Resource 4). Sequence alignment and logos analysis of the JmjC domain showed that the Group-III proteins had two conserved His residues and Glu in the Fe(II) binding site, and Phe and Lys in the αKG binding site, suggesting that all the proteins in this group may have similar function.
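Conservation of this kind is what a sequence logo displays as column height. As a rough illustration of the quantity behind a logo (not the WebLogo implementation, which also applies a small-sample correction that is omitted here), the information content of an alignment column can be sketched as log2(20) minus the Shannon entropy of its residues. The function names and the toy alignment are hypothetical.

```python
import math
from collections import Counter


def column_information(column, alphabet_size=20):
    """Information content (bits) of one alignment column.

    Gap characters are ignored. No small-sample correction is applied,
    so values for small alignments are overestimates.
    """
    residues = [r for r in column if r != "-"]
    if not residues:
        return 0.0
    total = len(residues)
    counts = Counter(residues)
    entropy = -sum((c / total) * math.log2(c / total) for c in counts.values())
    return math.log2(alphabet_size) - entropy


def alignment_information(sequences):
    """Information content for every column of equal-length aligned sequences."""
    return [column_information(col) for col in zip(*sequences)]


# Toy alignment (hypothetical): column 0 is invariant, column 1 is split.
toy = ["HF", "HF", "HY", "HY"]
print(alignment_information(toy))
```

A fully conserved column reaches the maximum of log2(20) bits, whereas a column split evenly between two residues loses exactly one bit.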

Group-IV JmjC domain-only A

In the phylogenetic tree, Groups IV, V, and VI belonged to different branches but clustered together in a large clade (Fig. 1). The JmjC domain-only group included AtJMJ20 and AtJMJ30-32 and their homologs. Although most of them contained only the JmjC domain (Klose et al. 2006; Lu et al. 2008), the AtJMJ20 group belonged to a distant branch that was close to Group-V JMJD6 (Fig. 1). The same results were obtained even when different methods were used to construct the trees, suggesting that the origin of Group-IV JmjC domain-only A was different from that of Group-VI JmjC domain-only B. Group-IV JmjC domain-only A contained one member from all the plants except pine, moss, and green algae. All the proteins in this group have only a JmjC domain, except HsJMJ522, which had an additional Cupin_2 domain with unknown function (Fig. 5a). The HsJMJ522/JMJD4 protein catalyzes carbon 4 (C4) lysyl hydroxylation of the eukaryotic release factor eRF1 (Feng et al. 2014). AtJMJ20 may affect seed germination by removing repressive histone arginine methylation at GA3ox1/GA3ox2 (Cho et al. 2012).

Fig. 5 Phylogenetic and domain analysis of Group-IV JmjC domain-only A based on full-length sequences. Group-IV includes AtJMJ20 and its homologs in vascular plants except pine. a Phylogeny and domain organization. b Logos analysis of the JmjC domain. Residues in the Fe(II) binding site are marked with red triangles and those in the αKG binding site are marked with black triangles. The trees were constructed with the following settings: tree inference as neighbor-joining; include sites as pairwise deletion option for total sequences analysis; substitution model: Poisson model; and bootstrap test of 1000 replicates for internal branch reliability

In the JmjC domain, the two His residues in the Fe(II) binding site and Lys in the αKG binding site were conserved, but Glu was replaced by Asp in the Fe(II) binding site, and Phe was replaced by Thr in the αKG binding site (Fig. 5b, Online Resource 5). In addition, it was suggested that the JmjC domain-only proteins may be active demethylases, but their substrates were not determined (Lu et al. 2008; Chen et al. 2011).

Group-V JMJD6

Based on the phylogenetic analysis, Group-V JMJD6 was close to Group-IV and contained AtJMJ21/22 homologs, two members in all the plants, except for three in P. patens and one in P. trichocarpa/O. tauri/fruit fly/human, indicating that no obvious gene duplication event had occurred (Figs. 1, 6a). Mouse JMJD6 was reported to show histone arginine demethylase activity of histone H3R2 and H4R3 (Chang et al. 2007). In addition, JMJD6 was found to be unable to remove the methyl group from histone arginine residues but it could hydroxylate the histone H4 tail at lysine residues in a 2-oxoglutarate (2-OG)- and Fe(II)-dependent manner (Han et al. 2012; Wang et al. 2014). It has been suggested that JMJD6 might associate preferentially with RNA/RNA complexes and to a lesser extent with chromatin (Hahn et al. 2010). AtJMJ22 was found to be involved in seed germination dependent on HYB activation (Cho et al. 2012).

Fig. 6 Phylogenetic and domain analysis of Group-V JMJD6 based on full-length sequences. Group-V includes AtJMJ21/22 and their homologs in all organisms and can be divided into two subgroups, which are characterized by JmjC, F-box, and Cupin_2 domains. a Phylogeny and domain organization. b Logos analysis of the JmjC domain. Residues in the Fe(II) binding site are marked with red triangles and those in the αKG binding site are marked with black triangles. The trees were constructed with the following settings: tree inference as neighbor-joining; include sites as pairwise deletion option for total sequences analysis; substitution model: p-distance; and bootstrap test of 1000 replicates for internal branch reliability

Most Group-V JMJD6 members have additional F-box and cupin_2 domains (Fig. 6a). The F-box domain, named based on its first identification at the N-terminal region of cyclin F, contains about 40–50 amino acids and has been shown to be required for protein–protein interactions involved in many processes (Craig and Tyers 1999; Wang et al. 2004; Jain et al. 2007). Group-V JMJD6 proteins were divided into two subgroups. Among Subgroup-I, OsJMJ711 had an additional Sec7 domain, which is a conserved elongated, all-helical sequence, and is required for proper protein transport through the Golgi (Mossessova et al. 1998; Cox et al. 2004). PpJMJ1506 had a hydrolase domain that is found in bacteria and eukaryotes and is approximately 110 amino acids long. In subgroup-II, monocots and green alga lacked the F-box and cupin_2 domains. SmJMJ1607 had an additional G2F domain (G2 fragment) that contained binding sites for collagen IV and perlecan, and PpJMJ1507 had an AXH (ataxin-1 and HBP1 module) domain, which is a protein–protein and RNA binding motif in ATX1 (ataxin-1) (de Chiara et al. 2003; Chen et al. 2004).

Similar to Group-IV, the JmjC domains of the Group-V proteins had the two conserved His residues and Asp in the Fe(II) binding sites, and Lys in the αKG binding sites, but the Phe/Thr sites varied (Fig. 6b; Online Resource 6).

Group-VI JmjC domain-only B

Group-VI JmjC domain-only B contained AtJMJ30-32 and their homologs, four in O. sativa/B. distachyon/S. moellendorffii, three in B. rapa/P. patens/O. tauri/human, two in P. trichocarpa/fruit fly, and one in P. abies, but none in V. carteri (Figs. 1 and 7a). The functions of this group of proteins are largely unclear. HsJMJ525/JMJD5 was proposed to be involved in embryonic cell proliferation, acting as an H3K36me2 histone demethylase (Ishimura et al. 2012; Zhu et al. 2014). AtJMJ30/JMJD5 was found to show hydroxylase activity but no histone demethylation ability (Del Rizzo et al. 2012), and AtJMJ30 was ubiquitously expressed in different tissues and involved in the circadian clock (Lu et al. 2008; Jones et al. 2010; Jones and Harmer 2011; Lu et al. 2011a).

Fig. 7 Phylogenetic and domain analysis of Group-VI JmjC domain-only B based on full-length sequences. Group-VI includes AtJMJ30-32 and their homologs in all plants and can be divided into three subgroups. Subgroup-I is characterized by JmjC and cupin_2 domains, and Subgroups II and III are characterized by JmjC domains. a Phylogeny and domain organization. b Logos analysis of the JmjC domain. The JmjC domains of AtJMJ31, OsJMJ713, BdJMJ11314, and SmJMJ1602/10/14 were detected by SMART (http://smart.embl-heidelberg.de/) because they could not be detected by NCBI. Residues in the Fe(II) binding site are marked with red triangles and those in the αKG binding site are marked with black triangles. The trees were constructed with the following settings: tree inference as neighbor-joining; include sites as pairwise deletion option for total sequences analysis; substitution model: no. of differences; and bootstrap test of 1000 replicates for internal branch reliability

Group-VI was divided into three subgroups (Fig. 7a). Subgroup-I contained AtJMJ30 homologs, one homolog in dicots, Lycophyta, and moss, and two homologs in the monocots and O. tauri. A few JmjC proteins in Subgroup-I had JmjC and cupin_2 domains. AtJMJ30 had an additional BRO1_Alix_like (protein-interacting Bro1-like domain of mammalian Alix and related domains) domain, and BrJMJ10824 (Bra035742) had a TAP_C (TAP C-terminal) domain that was found to be important for binding to FG repeat-containing nuclear pore proteins (Bachi et al. 2000). Subgroup-II contained AtJMJ31 homologs, one homolog in each species, except for three homologs in Lycophyta and none in pine and green algae, and had a highly conserved domain organization. Subgroup-III proteins were found in the higher plants except Lycophyta, and BrJMJ10827 (Bra038271) had an additional PTX (pentraxins) domain that is found in CRP (C-reactive protein, also called PTX1) and SAP (serum amyloid P component), which have been regarded as innate antibodies (Lu et al. 2013a).

The amino acids in the Fe(II) binding sites and Lys in the αKG binding sites in the JmjC domain of this group were similar to those in the JmjC domain of Group-V. Glu was replaced by Asp, and Phe/Thr was replaced by Ser in the αKG binding sites in Subgroup-II HsJMJ525 and DmJMJ408 (Fig. 7b; Online Resource 7).

Group-VII KDM3/JHDM2

Group-VII KDM3/JHDM2 was the largest group and contained JmjC domain proteins that demethylate H3K9me2 and H3K9me1, but not H3K9me3, comprising six members in Arabidopsis (AtJMJ24-29) and their homologs, 10 in B. rapa, nine in P. trichocarpa, seven in P. abies, five in O. sativa/B. distachyon, four in P. patens and human, and one in fruit fly, but none in Lycophyta and green alga (Klose et al. 2006; Lu et al. 2008; Luo et al. 2014). Most of the known functions of KDM3 are attributed to the human homologs. For example, HsJMJ516/JMJD1A may regulate cell proliferation and renewal (Loh et al. 2007; Park et al. 2013), cell migration and invasion (Tee et al. 2014), and sex determination (Kuroki et al. 2013). HsJMJ518/JMJD1C was shown to regulate DNA repair through demethylating MDC1 (mediator of DNA damage checkpoint 1) (Lu and Matunis 2013b; Watanabe et al. 2013). AtJMJ24 was reported to interact with other JmjC and DCL proteins and increase root length, and cotyledon and floral organ size (Audonnet 2014). AtJMJ25/IBM1 was shown to be involved in leaf breadth, floral organ and embryo morphogenesis, pollen grain fertility, and seed reproduction, and was regulated by DDM1 (decrease in DNA methylation1) and KYP (kryptonite); it also activated RDR2 (RNA-dependent RNA polymerase2) and DCL3 (Dicer-like3) through H3K9me2 and DNA methylation (Saze et al. 2008; Fan et al. 2012).

Besides the JmjC domain, the RING (really interesting new gene) domain was the second main domain in this group, and the ZZ domain was also found in some members of Subgroups I–III. The RING domain consists of two zinc fingers with the sequence C-X2-C-X[9–39]-C-X[1–3]-H-X[2–3]-C-X2-C-X[4–48]-C-X2-C and it often functions in ubiquitin ligase activity (Lorick et al. 1999; Joazeiro and Weissman 2000; Smit et al. 2012). The ZZ domain binds two zinc ions and has the Cys-X2-Cys motif found in other zinc finger domains (Ponting et al. 1996; Legge et al. 2004). Group-VII was divided into four subgroups. Subgroup-I contained AtJMJ25/26/29 and their homologs in higher plants. Monocots had four copies of AtJMJ25, but no copies of AtJMJ26/29. The AT-hook has been identified in HMG (high mobility group) DNA-binding proteins from plants (Klosterman and Hadwiger 2002). PtJMJ916/918 had a B-box. Surprisingly, Subgroup-II contained AtJMJ27 and its three homologs in cabbage and seven in pine, but no copies in monocots and Lycophyta. Subgroup-III contained AtJMJ24 and its homologs in flowering plants, and Subgroup-IV contained only AtJMJ28 and its homologs in dicots. In many members of Subgroup-III/IV the JmjC domain was not detected by SMART (Fig. 8a). The animal JmjC domain proteins that contained only a JmjC domain belonged to the same branch on the phylogenetic tree (Fig. 8a). In cabbage, the AtJMJ28/29 homologs were duplicated and the AtJMJ27 homologs were triplicated (Fig. 8). BrJMJ10808 (Bra013461)/BrJMJ10828 (Bra038775) were syntenic genes of AtJMJ28/29, and BrJMJ10802 (Bra000935) and BrJMJ10826 (Bra037394) were syntenic genes of AtJMJ27, but BrJMJ10803 (Bra000936) was not.
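As an illustration, the RING consensus quoted above can be written as a regular expression. The following minimal Python sketch assumes that each X stands for any single residue and that ranges such as X[9–39] are inclusive. The function name find_ring_motifs and the toy sequence are hypothetical and are not part of the original analysis, in which domains were detected with SMART and NCBI searches.

```python
import re

# RING consensus from the text:
# C-X2-C-X[9-39]-C-X[1-3]-H-X[2-3]-C-X2-C-X[4-48]-C-X2-C
# Assumption: X is any residue and the ranges are inclusive.
RING_PATTERN = re.compile(
    r"C.{2}C.{9,39}C.{1,3}H.{2,3}C.{2}C.{4,48}C.{2}C"
)


def find_ring_motifs(sequence):
    """Return (start, end) spans of non-overlapping RING-like matches."""
    return [m.span() for m in RING_PATTERN.finditer(sequence.upper())]


# Synthetic sequence built only to satisfy the pattern (not a real protein).
toy_sequence = (
    "MA"                  # flanking residues
    + "C" + "AA" + "C"    # C-X2-C
    + "A" * 10            # X[9-39]
    + "C" + "A" + "H"     # C-X[1-3]-H
    + "AA"                # X[2-3]
    + "C" + "AA" + "C"    # C-X2-C
    + "A" * 5             # X[4-48]
    + "C" + "AA" + "C"    # C-X2-C
    + "GG"                # flanking residues
)

if __name__ == "__main__":
    print(find_ring_motifs(toy_sequence))
```

A pattern match of this kind only flags candidate RING-like spacing of Cys and His residues; profile-based tools such as SMART remain the appropriate way to call the domain.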

Fig. 8
Phylogenetic and domain analysis of Group-VII KDM3/JHDM2 based on full-length sequences. Group-VII includes AtJMJ24-29 and their homologs in seed plants and moss and can be divided into four subgroups. JmjC and RING are the characteristic domains. a Phylogeny and domain organization. b Logos analysis of the JmjC domain. The JmjC domains of PtJMJ913/914, AtJMJ28, BrJMJ10809/21/28, OsJMJ716, and BdJMJ11320 were detected with SMART (http://smart.embl-heidelberg.de/) because they could not be detected by NCBI. Residues in the Fe(II) binding site are shown with red triangles and those in the αKG binding site are shown with black triangles. The trees were constructed with the following settings: tree inference, neighbor-joining; include sites, partial deletion option for total sequence analysis; substitution model, Poisson model; and bootstrap test of 1000 replicates for internal branch reliability

Sequence alignments and logos analysis of the JmjC domain showed that Thr and Lys in the αKG binding sites, and the two His residues and Asp in the Fe(II) binding sites were highly conserved in Subgroups-I/II. Lys in the αKG binding sites was conserved in Subgroups-III/IV, but the Phe and Thr varied and one of the His residues in the Fe(II) binding site was replaced by either Tyr or Phe (Fig. 8b; Online Resource 8).

Discussion

JmjC domain proteins constitute the largest histone demethylase family that contributes to histone methylation levels in vivo. Here, we screened 169 JmjC domain proteins from the whole genomes of green plants and 36 from animals (Fig. 1; Online Resource 1). The phylogenetic analysis showed that JmjC domain proteins were an ancient and conserved family that could be divided into seven groups, which displayed different histone demethylase activities in the green lineage. In all seven groups, the Fe(II) and αKG binding sites within the JmjC domain were highly conserved and, when replacements occurred, most were by amino acid residues with similar properties.

Histone methylation mainly occurs at the Lys residues of histone H3 (K4, K9, K27, K36, K79) and H4 (K20), and is carried out by KMTs, except for H3K79, which is methylated by the DOT1L (DOT1-Like) methyltransferase (Tschiersch et al. 1994; Jones et al. 2008; Nguyen and Zhang 2011). A previous study reported 37 KMTs in Arabidopsis, 49 in cabbage, and 36 in rice, and these proteins were involved in the methylation of H3K4, H3K9, H3K27, and H3K36 (Huang et al. 2011). Here, we found only 21 JmjC histone demethylases in Arabidopsis, 20 in rice, and 29 in cabbage, and these proteins targeted H3K4, H3K9, H3K27, and H3K36 (Fig. 1; Online Resource 1). The JmjC domain proteins have been divided previously into eight groups: five found in both plants and animals (KDM5/JARID1, KDM4/JHDM3, KDM3/JHDM2, JMJD6, and JmjC domain-only); and three found only in animals (KDM6/JMJD3, KDM2/JHDM1, and PHF) (Lu et al. 2008). We also did not find homologous proteins of KDM6/JMJD3 (HsJMJ14/15, DmJMJ405), KDM2/JHDM1 (HsJMJ12/13, DmJMJ407), and PHF (HsJMJ10/11/24, DmJMJ408/409) in plants. Moreover, the animal JARID2 proteins (HsJMJ505, DmJMJ402) were distant from the homologous plant proteins (Fig. 1). KDM6/JMJD3 was reported to be involved in H3K27me3 demethylation (Pasini et al. 2010; Chen et al. 2012), and PHF8 was found to demethylate H4K20me1 in zebrafish brain and craniofacial development (Qi et al. 2010). KDM2/JHDM1 was found to be involved in H3K36me2 demethylation, and JARID2 was reported to affect the H3K27me3 levels of target genes by acting as an essential component of the PcG (Polycomb group) complex (Tsukada et al. 2006; Pasini et al. 2010). We also noted that animal KDM2/JHDM1 and PHF were much closer to Group-VI JmjC domain-only B (Fig. 1). Generally, the total number of JmjC domain proteins increased with plant evolution; lower plants such as green alga had fewer JmjC domain proteins than the higher plants in our study (Fig. 1; Online Resource 1). Only six KDMs (possibly involved in H3K4, H3K9, and H3K36) were found in O. tauri. Overall, the numbers of JmjC domain proteins were much lower than the numbers of KMTs found in plants, suggesting that JmjC domain proteins might share a variety of demethylation roles.

Although the numbers of JmjC domain proteins were relatively small, some genes seemed to have been duplicated. Arabidopsis and cabbage both belong to the family Brassicaceae, and whole genome duplication is known to have occurred after the divergence of cabbage and Arabidopsis, making these two plants ideal material for gene duplication studies (Wang et al. 2011). Many KMT genes were found to have duplicated, especially Group-KMT1 genes with H3K9me activity and Group-KMT2/KMT7 genes with H3K4me activity (Huang et al. 2011). Here, the cabbage KDMs were found to have duplicated in Group-II, which was involved in the demethylation of H3K4me1/2/3, and in Group-VII, which was involved in the demethylation of H3K9me1/2/3, suggesting a distinct functional division. Most of the duplicated genes, except BrJMJ10803 and BrJMJ10811, shared synteny with their Arabidopsis homologs, suggesting that they were derived from chromosome/genome segment duplications. Some of the duplicated proteins displayed similar domain organization, whereas others had gained or lost domain(s).

Based on the protein phylogenetic tree, domain organization, and relationships to proteins of known function from the model plant Arabidopsis and from human and fruit fly, the substrate specificities of the plant JmjC domain proteins were inferred (Table 1): Group-I KDM4/JHDM3, demethylation of H3K9me2/3, H3K27me3 and H3K36me1/2/3; Group-II KDM5A, demethylation of H3K4me1/2/3; Group-III KDM5B, demethylation of H3K4me1/2/3 and H3K9me1/2/3; Group-IV JmjC domain-only A and Group-V JmjC domain-only B, hydroxylation and demethylation; Group-VI JMJD6, demethylation of H3R2 and H4R3 and hydroxylation of H4; and Group-VII KDM3/JHDM2, demethylation of H3K9me1/2/3. Based on the Arabidopsis and human annotations, the Group-I KDM4/JHDM3 and Group-II KDM5A proteins might function in flowering time control, Group-V JMJD6 might play a role in seed germination, and Group-VII KDM3/JHDM2 might be involved in leaf, flower, and embryo development, and pollen grain fertility and seed reproduction (Noh et al. 2004; Saze et al. 2008; Jeong et al. 2009; Searle et al. 2010; Yang et al. 2010, 2012a, b; Lu et al. 2011a; Fan et al. 2012; Luo et al. 2014). It is surprising that no JmjC domain protein associated with H3K4me1/2/3 demethylation was found in V. carteri.

The JmjC domain is the core of JmjC domain proteins, and iron Fe(II) and αKG act as cofactors (Tsukada et al. 2006; Rotili and Mai 2011). His188, Glu190 and His276 are important for Fe(II) binding and Thr185/Phe185 and Lys206 are required for αKG binding (Chen et al. 2006; Klose et al. 2006; Lu et al. 2008). The sequence alignments of the JmjC domains showed that the two His residues in the Fe(II) binding sites and Lys in the αKG binding site were highly conserved in all the JmjC domain proteins, whereas Glu was often replaced by Asp in the Fe(II) binding sites (Online Resource 5–8), and Thr/Phe was often replaced by Ser/Ala/Lys (Online Resource 6–8). Thus, the replacements often occurred among the same types of amino acid residues; Glu and Asp have similar acidic groups, and Thr and Ser each have a hydroxyl group. However, different replacements did occur; for example, one of the His residues in the Fe(II) binding site of AtJMJ19 and its homologs in Group-II was replaced by Tyr, and Thr/Phe in the αKG binding site was replaced by Gln (Online Resource 3). The AtJMJ24/28 homologs in Group-VII also displayed non-conservative replacements; in particular, Glu and His were replaced by Lys and Phe in the Fe(II) binding sites. These non-conservative replacements occurred mainly in flowering plant JmjC domain proteins with as yet unidentified functions (Online Resource 8).
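The distinction drawn above between conservative and non-conservative replacements can be sketched in a few lines of Python. The residue classes below are a simple illustrative grouping chosen for this example (an assumption, not a scheme used in the original analysis), and the names RESIDUE_CLASSES, residue_class, and is_conservative are hypothetical.

```python
# Illustrative physicochemical classes (one-letter codes).
# Assumption: this grouping is for demonstration only.
RESIDUE_CLASSES = {
    "acidic": set("DE"),
    "basic": set("KRH"),
    "hydroxyl": set("ST"),
    "amide": set("NQ"),
    "aromatic": set("FWY"),
    "aliphatic": set("AVLIM"),
}


def residue_class(residue):
    """Return the class name of a residue, or None if it is unclassified."""
    for name, members in RESIDUE_CLASSES.items():
        if residue in members:
            return name
    return None


def is_conservative(original, replacement):
    """True when both residues fall into the same illustrative class."""
    cls = residue_class(original)
    return cls is not None and cls == residue_class(replacement)


if __name__ == "__main__":
    # Replacements mentioned in the text, as (original, replacement) pairs.
    for pair in [("E", "D"), ("T", "S"), ("H", "Y"), ("E", "K"), ("H", "F")]:
        print(pair, is_conservative(*pair))
```

Under this grouping, Glu to Asp and Thr to Ser count as conservative, whereas His to Tyr, His to Phe, Glu to Lys, and Thr to Gln do not, which matches the qualitative description in the text. Other classification schemes would draw some boundaries differently.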

JmjC domain proteins are involved in most plant developmental events and function by regulating the activity of target genes through balancing the methylation status of the associated histones. Our results show that JmjC domain proteins are an ancient and conserved family in which the domain organization and the Fe(II) and αKG binding sites in the JmjC domains have been modified in some species. We have shown that the JmjC domain proteins can be divided into distinct groups that target specific substrates. The results will be useful for further examination of the functional conservation and divergence of JmjC domain proteins.