1 Introduction

The history of epigenetics demonstrates the fast progress and the dramatic increase in knowledge gained over the last 70 years of this young discipline. This is illustrated by the definition of epigenetics, which evolved from a science describing phenomena that could not be explained to a more precise field: the study of mitotically and/or meiotically heritable changes in gene regulation that are not due to, or cannot be explained by, changes in DNA sequence. These changes are represented by biochemical modifications of DNA, post-translational modifications (PTMs) of histones, the proteins around which DNA is wrapped, nucleosome positioning, chromatin remodeling (Fig. 1a) and non-coding RNA-associated gene silencing (Waddington 1942).

Fig. 1

Main targets of epigenetic regulation and model diatoms used for epigenetic studies. (a) A simplified scheme of epigenetic factors associated with activation (e.g., open chromatin, active histone marks) or repression (e.g., closed chromatin, DNA methylation) of gene transcription. The scheme was drawn using the BioRender tool. (b) Photos of diatom species used as models for epigenetic studies: P. tricornutum (left), a scanning electron microscopy image of F. cylindrus (top middle, courtesy of N. Joli), a chain of T. pseudonana (bottom middle), Haslea ostrearia (top right) and Cyclotella cryptica (bottom right, courtesy of NCMA)

The first reference to epigenetics dates back to 1942, when the embryologist Conrad Waddington referred to epigenetics as the whole complex of developmental processes that take place between genotype and phenotype (Waddington 1957). His research led to the famous model of the epigenetic landscape, which illustrates the different fates or developmental pathways a cell might take during differentiation, with branches in the landscape structured by underlying genes. This concept quickly evolved in modern science, which extended epigenetic studies to several model organisms including bacteria, mammals, plants, insects, fungi and microalgae. Nowadays, the concept of epigenetics also includes changes in gene regulation that are not necessarily inherited and that do not modify the underlying DNA sequence.

Epigenetics enabled a range of discoveries as model organisms diversified. The yeast species Saccharomyces cerevisiae and Schizosaccharomyces pombe were used to elucidate chromatin structure and telomere silencing (Huang 2002). Long before, the genetic fly model Drosophila melanogaster was used to study the effect of the position of genes on the phenotype (position effect variegation), which led to the discovery of heterochromatin, chromatin remodeling and histone modifying proteins (Elgin and Reuter 2013). In plants, Arabidopsis thaliana emerged naturally as a model for epigenetic studies, revealing widely conserved mechanisms as well as some specificities compared to animals. As an example, the seasonal regulation of the flowering locus C (FLC) involved in the vernalization process is associated with dynamics in deposition and removal of histone PTMs (Hepworth and Dean 2015). In mammals, the mouse is so far the most widely used model to learn about epigenetic regulation mechanisms in humans, in particular in stem cell research and environmental studies. It was demonstrated that histone PTMs are implicated in the transcriptional regulation of the homeobox-containing 'Hox' genes during the establishment of the antero-posterior axis (Deschamps and van Nes 2005). More recently, three unicellular species from the red, green and brown lineages of microalgae contributed to advancing our fundamental knowledge in epigenetic research: Cyanidioschyzon merolae, Chlamydomonas reinhardtii and the diatom Phaeodactylum tricornutum, respectively. Using these species is an interesting opportunity to address questions about epigenetic regulation mechanisms in an evolutionary context. Sequencing of the P. tricornutum genome (Bowler et al. 2008) revealed a conserved epigenetic machinery including writers, erasers and readers of its different components (Rastogi et al. 2015; Tirichine et al. 2017), which were investigated in recent studies (Veluchamy et al. 2013a, 2015; Zhao et al. 2019). A few other diatoms were used to address the role of epigenetics in genome regulation (Fig. 1b) and the evolution of epigenetic factors, such as DNA methylation, which was investigated in Thalassiosira pseudonana, Fragilariopsis cylindrus (Tirichine et al. 2014; Huff and Zilberman 2014) (Joli et al., unpublished), Cyclotella cryptica (Traller et al. 2016) and Haslea ostrearia (Jean Luc Mouget, personal communication). In this chapter, we summarize our current knowledge about DNA methylation, PTMs of histones and non-coding RNAs in diatoms, with a particular focus on in silico studies, drawing a snapshot of the progress made at a time when epigenetics is coming to the forefront of diatom biology.

2 DNA Methylation

DNA methylation is a major epigenetic mark in eukaryotes and prokaryotes. In plants and animals, methylation at the fifth carbon of cytosines, 5-methylcytosine (5mC) (Fig. 2a), is an epigenetic mark involved in the repression of transposable elements in many species and the establishment and maintenance of genomic imprinting (Ideraabdullah et al. 2008; Kohler et al. 2012; Galagan and Selker 2004). 5mC patterns are very diverse within the eukaryotic tree of life which reflects a fine-tuning of lineage-specific regulatory networks (de Mendoza et al. 2019; Schmitz et al. 2019). Hence, 5mC can be found at cytosines in different contexts. In vertebrates and invertebrates, DNA methylation mainly occurs at cytosines found at CG dinucleotides, while in fungi and plants, methylation at non-CG sites is more widely observed (Zemach et al. 2010; Feng et al. 2010; Stroud et al. 2014).
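The sequence contexts mentioned above can be made concrete with a short sketch. It counts cytosines on both strands of a DNA string in the CG context and in the two non-CG contexts commonly distinguished in methylome analyses, CHG and CHH (H = A, C or T). The function names and the toy sequence are illustrative assumptions, not part of any published pipeline.

```python
from collections import Counter

# Translation table used to build the reverse complement of a DNA string.
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq):
    """Return the reverse complement of an upper-case DNA sequence."""
    return seq.translate(COMPLEMENT)[::-1]


def classify_cytosine(strand, i):
    """Classify the cytosine at index i as 'CG', 'CHG' or 'CHH'.

    Returns None when the downstream bases are missing or ambiguous (N).
    """
    downstream = strand[i + 1:i + 3]
    if downstream[:1] == "G":
        return "CG"
    if len(downstream) < 2 or "N" in downstream:
        return None
    return "CHG" if downstream[1] == "G" else "CHH"


def cytosine_contexts(seq):
    """Count cytosines per sequence context on both strands."""
    counts = Counter()
    forward = seq.upper()
    for strand in (forward, reverse_complement(forward)):
        for i, base in enumerate(strand):
            if base == "C":
                context = classify_cytosine(strand, i)
                if context is not None:
                    counts[context] += 1
    return counts


if __name__ == "__main__":
    # Toy sequence (hypothetical), not taken from any diatom genome.
    print(cytosine_contexts("ACGTCAGCTT"))
```

Because CG and CHG sites are palindromic, each such site contributes one cytosine per strand, which is why both strands are scanned.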

Fig. 2

DNA methylation mediated silencing. (a) Schematic representation of the DNA methyltransferase activity of 5mC DNA methyltransferases. (b) Structure of representative enzymes putatively involved in DNA methylation and de-methylation as well as DNA repair proteins

3 5mC Patterns and Functions in Diatoms

To date, 5mC has been reported in four diatoms, namely P. tricornutum, T. pseudonana and F. cylindrus (Veluchamy et al. 2013a; Huff and Zilberman 2014) as well as Cyclotella cryptica (Traller et al. 2016). In diatoms, 5mC is mainly found in a CG context over repeats and transposable elements (TEs), usually (but not exclusively) concentrated in telomeric regions (Veluchamy et al. 2013a; Huff and Zilberman 2014). Although scarce, non-CG methylation is also detected. This is opposite to what is observed in the closely related multicellular brown alga Saccharina japonica, in which genes are mainly marked by non-CG methylation and TEs are devoid of DNA methylation (Fan et al. 2020). In addition, in P. tricornutum, T. pseudonana and F. cylindrus, total levels of DNA methylation are low and range from 8% to as low as 1% of cytosines in the CG context, compared to other species such as rice where 18% of total Cs are methylated (Huff and Zilberman 2014). Methylation of protein-coding genes is also only sparsely observed. Among all diatoms with a known methylome, C. cryptica shows the highest level of DNA methylation, which correlates with a higher amount of TEs in its genome. In addition, its methylated regions can span up to 30 kb, a pattern not found in the other diatoms. Furthermore, no CG-rich regions (CpG islands) and no promoter methylation patterns have been clearly described so far (Veluchamy et al. 2013a; De Riso et al. 2009). Overall, the diatom methylation pattern strongly contrasts with the patterns observed in animals, in which nearly all CG dinucleotides are heavily methylated, including within exons (Lister et al. 2009). It also contrasts with the pattern observed in the dinoflagellates Symbiodinium minutum and S. kawagutii, which have 'hypermethylated' genomes exceeding 70% of CG methylation over genes and TEs (de Mendoza et al. 2018).

In diatoms, methylated TEs often have low expression (Veluchamy et al. 2013a; Huff and Zilberman 2014; Traller et al. 2016). This is consistent with the repressive role of DNA methylation in other eukaryotes and further traces 5mC-mediated control of TE expression back to the last eukaryotic common ancestor. In P. tricornutum, while DNA methylation over TEs correlates with low expression (Veluchamy et al. 2013a; Rastogi et al. 2018), its repressive role on genes depends on its pattern. Extensive methylation of genes correlates with low expression, while partial methylation correlates with moderate to high levels of expression (Veluchamy et al. 2013a). Nitrogen depletion triggers the concomitant loss of DNA methylation and over-expression of the transposable element 'Blackbeard' in P. tricornutum, suggesting that 5mC negatively regulates TE expression under specific environmental triggers (Veluchamy et al. 2013a; Maumus et al. 2009). However, such responses are not consistently reported. In C. cryptica, 5mC patterns in any context are stable in response to silica depletion, including at transposable elements (Traller et al. 2016). Hence, within diatoms, DNA methylation, while likely involved in TE repression, might have evolved lineage-specific dynamics and functions.

4 Diatoms Have a Peculiar Set of DNA Methyltransferases

Eukaryotes possess diverse mechanisms to set, propagate and remove methylated cytosines. The deposition of 5mC is performed by DNA methyltransferases (DNMTs). Based on similarity with prokaryotic enzymes involved in the restriction-methylation system (Bestor 1990), six main eukaryotic DNMT families have been described: DNMT1, DNMT2, DNMT3, DNMT4, DNMT5 and DNMT6 (Huff and Zilberman 2014; Ponger and Li 2005). All DNMTs contain a conserved protein domain with S-adenosyl-L-methionine binding and methyltransferase activity (PF00145 domain), referred to as the DNMT domain. The DNMTs found in the three model diatoms and their associated protein domains are summarized in Table 1 and Fig. 2.

Table 1 Putative writers and erasers of C5 DNA methylation in three model diatoms

Diatoms encode a unique set of DNMT2, DNMT3, DNMT4, DNMT5 and DNMT6 enzymes. The DNMT2 enzymes are responsible for RNA methylation at diverse cytosine positions during the maturation of tRNAs (Jeltsch et al. 2017). DNMT2 enzymes are highly conserved DNA methyltransferases that evolved RNA modifying functions and are found in animals, plants and microalgae (Huff and Zilberman 2014). The DNMT3 proteins are widespread de novo DNA methylases in eukaryotes (Huff and Zilberman 2014). Typical DNMT3 proteins in metazoans include chromatin domains that connect the DNA methylation pathways and the histone post-translational code deposited during development and meiosis (Laisne et al. 2018). Diatom DNMT3 proteins, however, are short and lack all known chromatin-associated domains (Fig. 2).

The DNMT4 family is a weakly supported DNMT1-related family, typified by the fungal RIP deficient/DNA methyltransferase activity (RID/DMTA) and MASC1 (a member of the fungal-specific DMT-like family) proteins, respectively involved in the Repeat Induced Point mutation (RIP) and methylation induced premeiotically (MIP) processes, which are two DNA methylation-dependent genomic defense mechanisms against TEs (Galagan and Selker 2004; Amselem et al. 2015; Gladyshev 2017). Earlier reports showed that DNMT4 is only conserved in fungi and diatoms, which probably highlights a convergent evolutionary history of this gene family (Huff and Zilberman 2014; Ponger and Li 2005). In diatoms, DNMT4 is composed of a single DNMT domain, as no chromatin-associated protein features are found with this domain (Fig. 2). The role of DNMT4 in diatoms is unclear, as it is unknown whether any RIP- or MIP-related process occurs in these species. Diatoms notably lack the other DNMT1-related proteins that are the major 5mC maintenance enzymes in metazoans and plants. However, diatoms possess DNMT5, which was previously shown to maintain DNA methylation in a CG context in the parasitic yeast Cryptococcus neoformans (Huff and Zilberman 2014). This enzyme is likely responsible for CG methylation in other fungi, green algae, haptophytes and in the stramenopile Aureococcus anophagefferens (Huff and Zilberman 2014; Bewick et al. 2019). DNMT5 enzymes possess a long C-terminal region containing asp-glu-ala-asp box (DEADx) and Helicase domains (Fig. 2). This region is related to Sucrose Non-Fermentable (SNF) chromatin remodelers, and its ATPase activity is required for the DNA methylation function of the enzyme (Dumesic et al. 2020). Within diatoms, DNMT5 proteins are divergent. The DNMT domain of the DNMT5 enzyme of Thalassiosira pseudonana is shorter than that of the DNMT5 protein found in P. tricornutum. In addition, its SNF-related domain lacks a RING finger domain. Whether these divergences lead to differences in the establishment or maintenance of the epigenetic landscape in the two diatoms is unknown.

The DNMT6 family was first described in the parasitic euglenozoa Trypanosoma brucei and Leishmania major, in the green alga Micromonas pusilla and in dinoflagellates (Huff and Zilberman 2014; de Mendoza et al. 2018; Ponger and Li 2005). In Leishmania major, DNMT6 does not seem to be required for either de novo deposition or maintenance of 5mC (Cuypers et al. 2020). In diatoms, DNMT6, whose function is unknown, is composed only of a highly conserved methyltransferase domain with no chromatin domains, as observed for the diatom DNMT2, DNMT3 and DNMT4 enzymes.

Our understanding of the proteins involved in the regulation of DNA methylation in diatoms is still developing. Our investigation of the diversity of DNMTs found in unicellular eukaryotes of the Marine Microbial Eukaryote Transcriptome Project (MMETSP) database (Keeling et al. 2014), using complementary in silico approaches and functional studies (Hoguin et al., unpublished), indicates that DNMT5 is an underappreciated, diversified gene family in marine micro-eukaryotes. It is the only DNMT with chromatin-associated domains in diatoms. In addition, in P. tricornutum, DNMT5 knock-out is associated with a loss of CG methylation and a transcriptional activation of otherwise silenced TEs, revealing for the first time the mechanisms controlling TE expression in diatoms (Hoguin et al., unpublished). More questions nonetheless remain regarding the maintenance and establishment of 5mC in diatoms. Since no de novo DNA methylation activity has been found yet in diatoms, we may ask whether diatom DNA methylation patterns are rather the result of strong maintenance activity, as suggested in some fungal lineages. As mentioned, previous and current studies suggest that the P. tricornutum methylome is responsive to environmental cues. It is therefore probable that DNA methylation evolved a condition-specific regulatory role in diatoms and hence might translate environmental changes into stable epigenetic inheritance of gene and/or transposon regulation.

5 What Are the De-methylation Pathways in Diatoms?

The DNA de-methylation machinery is not highly conserved in diatoms. Diatoms have no Ten-eleven translocation methylcytosine dioxygenase (TET) enzymes, which are known DNA demethylases in animals (Choi et al. 2002; Agius et al. 2006; Gehring et al. 2006; Wu and Zhang 2017). Two putative DNA demethylases with an InterPro-predicted Endonuclease IIIc (ENDO3c) domain, Phatr3_J46865 and Phatr3_J12645, were found in P. tricornutum, although with low similarity. In F. cylindrus and T. pseudonana, orthologues of both REPRESSOR OF SILENCING 1 (ROS1) and DEMETER (DME) proteins are detected by reciprocal BLAST analysis. Both enzymes are ENDO3c domain-containing proteins but do not have additional domains (Fig. 2). It is worth noting that ENDO3c domains are also associated with a wide range of evolutionarily diverse DNA repair proteins (Kanchan et al. 2015), and the presence of this domain alone does not confer 5mC demethylation activity.
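The reciprocal BLAST criterion used above to call orthologues can be sketched as follows. This is a minimal illustration assuming the hits have already been parsed into (query, subject, bit score) tuples; the sequence identifiers and scores are hypothetical and do not correspond to real gene models, and the function names are not taken from any published pipeline.

```python
def best_hits(hits):
    """Map each query to its highest-scoring subject.

    hits: iterable of (query_id, subject_id, bit_score) tuples.
    """
    best = {}
    for query, subject, score in hits:
        if query not in best or score > best[query][1]:
            best[query] = (subject, score)
    return {query: subject for query, (subject, _) in best.items()}


def reciprocal_best_hits(a_vs_b, b_vs_a):
    """Return sorted (a, b) pairs that are each other's best hit."""
    forward = best_hits(a_vs_b)
    reverse = best_hits(b_vs_a)
    return sorted((a, b) for a, b in forward.items() if reverse.get(b) == a)


if __name__ == "__main__":
    # Hypothetical identifiers and scores, for illustration only.
    reference_vs_diatom = [
        ("refA", "diatom1", 210.0),
        ("refA", "diatom2", 95.0),
        ("refB", "diatom2", 180.0),
    ]
    diatom_vs_reference = [
        ("diatom1", "refA", 205.0),
        ("diatom2", "refA", 120.0),
        ("diatom2", "refB", 110.0),
    ]
    print(reciprocal_best_hits(reference_vs_diatom, diatom_vs_reference))
```

In this toy example only the refA/diatom1 pair passes, because diatom2 scores best against refA rather than refB in the reverse search. As the text notes for ENDO3c proteins, passing such a test supports orthology but does not by itself demonstrate demethylase activity.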

The alpha-ketoglutarate-dependent hydroxylase (ALKBH) enzymes are diverse proteins known to regulate adenine methylation in vivo in mouse and Caenorhabditis elegans (Greer et al. 2015; Wu et al. 2016) and to produce oxidized 5-methylcytosine derivatives in vitro (Bian et al. 2019), which can eventually lead to active DNA demethylation. ALKBH enzymes are also known to be involved in the repair of methylated DNA templates, and they can modulate RNA methylation (Fu et al. 2010; Zdzalik et al. 2014; Iyer et al. 2016). BLAST analysis in diatoms revealed several putative ALKBH orthologues (Table 1). They all contain an alpha-ketoglutarate-dependent dioxygenase domain but lack RNA/DNA binding domains. The ALKBH8 orthologue in P. tricornutum possesses an S-adenosyl-methionine (SAM) binding domain, highlighting a potential RNA modifying activity. It is important to note that the current phylogenetic assignment of the putative diatom ALKBH enzymes must be further investigated. Nonetheless, these enzymes are potential new actors in epigenetic regulation in diatoms.

6 Post-Translational Modifications of Histones and Their Enzymes in Diatoms

Histones are subject to a variety of post-translational modifications (PTMs) that alter gene expression and chromatin structure. P. tricornutum genome sequencing revealed a long list of histone modifying enzymes, which were described previously (Rastogi et al. 2015; Veluchamy et al. 2015; Tirichine et al. 2014). Here we update the list of genes with a predicted function in histone modification in a few diatom species (Table 2). Following the identification of PTMs using mass spectrometry in P. tricornutum, an epigenomic map of several histone marks known to be active or repressive was established using ChIP-seq. Combined with previously published genome-wide DNA methylation data (Veluchamy et al. 2013b), comprehensive and combinatorial analyses revealed some conserved and specific epigenetic features in P. tricornutum, extending the existence of the epigenetic code to Stramenopiles. One of the important findings is the co-occurrence of repressive histone marks and DNA methylation over genes and transposable elements (Veluchamy et al. 2015). These co-occurrence patterns define combinations of epigenetic marks unique to diatoms, suggesting cooperation in repression and/or interdependent recruitment mechanisms. This section provides a general overview of predicted histone modifiers based on four fully sequenced diatom genomes. It is important to keep in mind that histone modifications are usually deposited by protein complexes, which need to be taken into consideration in functional studies of these enzymes and of the histone code in diatoms.

Table 2 Histone acetyltransferases and deacetylases in diatoms

7 Histone Acetyltransferases and Deacetylases

Acetylation of the ε-amino group of lysines leads to activation of transcription. This process is carried out by a group of proteins known as histone lysine acetyltransferases (HATs or KATs), which can be divided into 5 families: (1) the Gcn5-related acetyltransferase (GNAT) family; (2) the MYST family, which includes MOZ-, Ybf2-, Sas2- and Tip60-related proteins; (3) the p300/CBP family; (4) general transcription factor HATs including TFIIIC90 and Taf1; and (5) the steroid receptor co-activators such as SRC1, ACTR and CLOCK (Carrozza et al. 2003). Table 2 summarizes the acetyltransferase families in diatoms except for the steroid receptor co-activator family, for which no homologs have been found. In the GNAT family, three subgroups of KATs were found and are listed in Table 2: KAT1, KAT2A/B and KAT9. Among those, KAT1 is the simplest and only contains a histone acetyltransferase HAT1-type domain. Interestingly, besides the listed KAT1 homologs, there are more GNAT domain-containing acetyltransferases in diatom genomes. Taking P. tricornutum as an example, there are around 48 genes with a GNAT domain, but their function is unknown. Among them are a few unusual genes with an additional domain, revealing combinations of protein domains that have never been reported before. Examples include Phatr3_J47498, which combines a histidine phosphatase domain with a GNAT domain, and Phatr3_J46516, which possesses two possible tRNA binding domains similar to the bacterial acetyltransferase TmcA. This suggests that these acetyltransferases might target non-histone proteins, like ATAT1, the α-tubulin K40 acetyltransferase (Akella et al. 2010; Shida et al. 2010). The expanded KAT1 subgroup of the GNAT family in diatoms requires further investigation in future studies.

MYST domain-containing proteins form another large family of KATs. In the yeast model species S. cerevisiae, only three genes were reported: Esa1, Sas2 and Sas3 (Osada et al. 2001). The common feature of MYST family acetyltransferases is a MYST-type histone acetyltransferase domain with a chromodomain, shared in yeast (Esa1), humans (Tip60) and Drosophila (MOF). Similar domain features were also found in diatom MYSTs, where an RNA binding activity-knot of a chromodomain (PF11717) can be found at the N-terminus. There are only three gene homologs that belong to the MYST family in each of the model diatom species investigated (Table 2). In P. tricornutum, Phatr3_J44463 has a MORF-like acetyltransferase domain and is considered a homolog of MORF, which is responsible for H3K14 and H4K5/8/12/15 acetylation in vitro and for H3K9 (Mishima et al. 2011; Kitabayashi et al. 2001) and H3K23 acetylation in vivo (Klein et al. 2019).

The CBP/p300 family was initially identified in mammals, and its members form a unique group of acetyltransferases without any sequence similarity to GNATs (Goodman and Smolik 2000). In yeast there are no orthologs of human CBP and p300. However, a protein called Rtt109, whose 3-D structure is related to that of CBP/p300, was identified (Wang et al. 2008). It has been regarded as a fungal-specific gene with no orthologs in other uni- and multicellular organisms. Here we found fungal-like Rtt109 acetyltransferases in four diatom species (Table 2), with three paralogs of similar domain structure in each species, suggesting an ancient origin of the CBP/p300 family.

Histone lysine acetylation is a reversible process. Acetyl groups can be removed from lysines by histone deacetylases (HDACs). Deacetylation of histones induces chromatin compaction, leading to transcriptional repression. A constant balance between the antagonistic actions of histone acetylases and deacetylases contributes to the transcriptional regulation of genes. Both types of enzymes were shown to have an important role in development and disease (Haberland et al. 2009). Based on protein 3-D structure and domain features, HDACs can be grouped into four classes (Seto and Yoshida 2014). In T. pseudonana, 7 genes were found in Classes I, II and IV. In F. cylindrus and Pseudo-nitzschia multiseries, 11 and 8 homologs, respectively, were identified as histone deacetylase genes (Table 2). In P. tricornutum, 13 homologs were identified as histone deacetylases similar to the human HDAC1-11 protein sequences and the yeast Hos1-3, Rpd3 and Hda1 proteins. Of note, there are several deacetylase domain-containing genes in P. tricornutum that do not fall into any of the classes, suggesting complex and diversified deacetylation mechanisms in diatoms.

8 Histone Methyltransferases and Demethylases

Methylation and demethylation of histones activate or repress genes depending on the amino acid that is methylated and how many methyl groups are attached to the residue. Activation acts by loosening the attraction between histone tails and DNA, allowing the transcriptional machinery and other regulatory proteins to access DNA, whereas repression acts by compacting chromatin, restricting access to DNA. Histone methylation is considered more stable than other modifications such as phosphorylation and acetylation, is involved in long-term maintenance of the expression status of regions of the genome and has been shown to play a role in virtually all biological processes (Greer et al. 2015).

Histone methyltransferases (HMTs) are among the most well-studied histone modifiers. Unlike the broad-range targeting strategy of HATs, HMTs are responsible for the methylation of specific residues. Almost all HMTs contain a Su(var)3-9, Enhancer-of-zeste and Trithorax (SET) domain, except for the DOT1 family. The DOT1 family is not structurally related to SET-domain proteins, but its members can methylate K79 of histone H3 (Feng et al. 2002; Ng et al. 2002). Interestingly, Dot1 homologs were detected in P. tricornutum, F. cylindrus and P. multiseries, three pennate diatom species, but not in T. pseudonana (Table 3), a centric diatom. However, methylation of H3K79 was reported in T. pseudonana (Rastogi et al. 2015), so it would be appealing to identify the putative methyltransferase of H3K79 in this diatom. The SET-domain containing superfamily is a large group of HMTs that can be divided into seven families (Dillon et al. 2005). However, three families are missing among diatom SET-containing HMTs, namely the SUV39, RIZ and SUV4-20 families (Table 3).

Table 3 Histone methyltransferases and demethylases in diatoms (KMTs)

SUV39 family proteins are the most well-characterized HMTs. SUV39H1 was the first identified lysine methyltransferase which methylates lysine 9 of histone H3 (Tamaru and Selker 2001). In yeast model species, SUV39 is not found, but a homolog named Clr4 was identified in S. pombe (Ivanova et al. 1998). Although H3K9me2 and me3 modifications were both found in P. tricornutum, no homolog of SUV39H1 has been identified in any of the four diatom species discussed here, suggesting the existence of a diatom-specific H3K9 methyltransferase. Another possibility is that H3K9 methylation is deposited by other SET-domain HMTs such as enhancer of zeste, which was recently shown to methylate lysine 9 of histone H3 in Paramecium (Frapporti et al. 2019). The SUV4-20 family is also missing in diatom species and yeast, in contrast to its diversity in humans, suggesting that these two families might have evolved with the emergence of multicellular species.

The SET1 and SET2 families are two similar groups of proteins which both possess SET and Post-SET domains, with sometimes another domain, a Pre-SET, found in the SET2 family (Dillon et al. 2005). Although most SET-containing HMTs are similar to yeast Set1 or Set2, there are some unique diatom HMTs which show more similarity to human MLL1. For instance, P. tricornutum Phatr3_J6915 and P. multiseries 0078730 are homologs of human MLL1 with extra Bromo and PHD-finger domains which yeast Set1 does not have. P. tricornutum has homologs not only of human SET2 family genes such as NSD1-3 (Phatr3_J15937) but also of yeast Set2 (Phatr3_6903). The redundancy of the SET family in humans is believed to be related to the multiple targets and functions of MLL/COMPASS complexes, but diatoms are unicellular species and the reason for the redundancy of their SET-containing HMTs is not clear. More genes with a SET domain are found in diatoms but are not listed in Table 3, because it is unclear how to classify proteins with a single SET domain. Another SET domain-containing enzyme is enhancer of zeste, E(z), which is known to methylate lysine 27 of histone H3. E(z) is the only methyltransferase characterized so far in diatoms, where it was shown to deposit H3K27me2/me3 in P. tricornutum (Zhao et al., doi: https://doi.org/10.1101/2019.12.26.888800).

Histone demethylases (HDMs) are enzymes responsible for the removal of methyl groups, predominantly from lysine and arginine residues. HDMs can be categorized into two families with six classes: HDM1, HDM2, HDM4D, HDM5A/B, HDM6A and 6B. Except for HDM1, which has an amine oxidase domain, the classes possess a Jumonji C (JmjC) domain and are iron- and α-ketoglutarate (2OG)-dependent oxygenases (Klose et al. 2006). Each HDM is a site-specific histone demethylase; for example, HDM2A demethylates H3K36me1/2 (Tsukada et al. 2006) and HDM1 is involved in the demethylation of H3K4me1/2 (Rudolph et al. 2007). In S. cerevisiae there are only JmjC family histone demethylases, Jhd1, Jhd2 and Rhd2. In diatoms, however, additional domains were found, including amine oxidase, multiple TRP repeat or SET domains, suggesting that some of the diatom HDMs might have dual functions and/or specific recognition and demethylation mechanisms.

9 Non-coding RNAs and the RNAi Machinery Components

A considerable portion of eukaryotic genomes can be transcribed to RNAs with no coding potential. According to their length, they can be classified into small and long non-coding RNAs (lncRNAs). They were shown to be involved in silencing, house-keeping functions, cell differentiation, development and stress response.

9.1 Small Non-coding RNAs

Different epigenetic mechanisms have evolved in eukaryotes to silence the expression of genes and the mobility of transposable elements (TEs). They all require the cleavage of input double-stranded RNA into small RNAs (microRNAs (miRNAs) and small interfering RNAs (siRNAs), 19 to 31 nt in length) by an enzyme called Dicer. The small RNAs are then bound by Argonaute proteins, which are part of the RNA Induced Silencing Complex (RISC), together with RNA-dependent RNA polymerases (RDRs) (Castel and Martienssen 2013). RISC uses the small RNAs as guides for sequence-specific silencing of genes and TEs via translational repression, mRNA degradation and heterochromatin formation by recruitment of histone and/or DNA methyltransferases to regulatory sequences of the target genes (Holoch and Moazed 2015). RNA-mediated recruitment of DNA methyltransferases for silencing is known as RNA directed DNA methylation (RdDM) and has been widely studied in Arabidopsis thaliana. In plants, canonical RdDM comprises 2 steps: (1) biogenesis of 24 nt siRNAs mediated by Pol IV, the RNA-dependent RNA polymerase 2 (RDR2) and the Dicer-like 3 endonuclease (DCL3), and (2) de novo methylation involving Pol V, AGO4 and de novo DNMTs (Law and Jacobsen 2010). The systematic presence of siRNAs over DNA-methylated TEs in diatoms (Tirichine et al. 2017; Rogato et al. 2014) (Fig. 3), associated with their silencing, suggests that RdDM is not restricted to plants and a few other species but seems to have a deep evolutionary origin. RNA-mediated silencing was reported to occur in diatoms, including both model species P. tricornutum (De Riso et al. 2009; Sakaguchi et al. 2011) and T. pseudonana (Shrestha and Hildebrand 2015). Small RNAs were characterized in both species as well as in the polar diatom F. cylindrus (Rogato et al. 2014; Lopez-Gomollon et al. 2014; Norden-Krichmar et al. 2011; Huang et al. 2011).

Fig. 3

Snapshot of the epigenome browser illustrating co-occurrence of repressive marks including DNA methylation and the presence of small RNAs over TEs suggesting RNA directed DNA methylation leading to silencing

Despite two reports of in silico prediction of miRNAs (Norden-Krichmar et al. 2011; Huang et al. 2011), canonical miRNAs were not detected in any of the diatoms, suggesting a diversified small RNA biogenesis pathway. Scanning of the MMETSP database reveals that diatoms encode all the components of the RNAi machinery (data not shown). Argonaute proteins, which are highly conserved among species, have multiple members in plants (10 in Arabidopsis, 19 in rice) (Kapoor et al. 2008; Baumberger and Baulcombe 2005), humans (8), Drosophila melanogaster (5), C. elegans (27) and Neurospora crassa (2), but only one copy in the investigated diatoms, except Fragilariopsis cylindrus and Fistulifera solaris, which encode two copies. All the diatom Agos contain the Piwi-Argonaute-Zwille (PAZ) domain, which is shared with Dicer, and the PIWI domain; both are important for binding to small RNAs and cleavage.

Unlike Ago, Dicer shows poor conservation in diatoms. A typical Dicer protein, such as in humans, shows an N-terminal DEAD-like helicase domain (DEXDc/Helicase), DUF283 (a domain of unknown function), a PAZ domain, two RNase III domains (RNaseIIIa and RNaseIIIb) and a dsRNA binding domain. Diatom Dicers have the two RNase III domains but lack the typical DEXDc/Helicase domain and, for some of them, the PAZ domain. We will therefore refer to diatom Dicer as Dicer-like (DCL). Giardia intestinalis, a unicellular parasite of the Excavates, is the only species in which the crystal structure of Dicer was determined. Structural analysis has shown the importance for Dicer function of a residue conserved among all Dicers (proline at position 266) (Macrae et al. 2006). This residue lies in the platform domain between the RNase and PAZ domains and was found to be conserved in the investigated diatoms of MMETSP except Minutocellus polymorphus. G. intestinalis, whose Dicer protein has only tandem RNase III and PAZ domains and thus resembles that of some diatoms, was shown to be capable of dicing dsRNA in vitro and of supporting RNAi in vivo (Macrae et al. 2006), suggesting similar functionalities in diatoms. Most diatoms, including P. tricornutum and T. pseudonana, lack the PAZ domain and have only the RNase III domains, suggesting the importance of these domains in RNA-mediated silencing and the existence of diversified silencing pathways. Interestingly, the F. cylindrus DCL has unique features in that it is the only diatom DCL with a DEXDc/Helicase domain and an N-terminal C5 DNA methylase domain similar to the DNMT4 C5 methylase domain. This unique combination suggests an intimate interaction between DNA methylation and DCL domains to mediate silencing in an RNA-directed DNA methylation fashion.
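The domain comparisons described above amount to set differences against the canonical Dicer architecture. The sketch below encodes only the domains named in the text; the dictionary keys, the `compare_to_canonical` helper and the short domain labels are illustrative assumptions. For F. cylindrus, only the domains explicitly mentioned are listed, so the absence of a domain in that entry should not be read as a confirmed absence in the protein.

```python
# Canonical (human-type) Dicer domains as listed in the text.
CANONICAL_DICER = frozenset(
    {"DEXDc/Helicase", "DUF283", "PAZ", "RNaseIIIa", "RNaseIIIb", "dsRBD"}
)

# Domain sets transcribed from the text; labels are illustrative.
ARCHITECTURES = {
    "G. intestinalis Dicer": {"PAZ", "RNaseIIIa", "RNaseIIIb"},
    "P. tricornutum DCL": {"RNaseIIIa", "RNaseIIIb"},
    "T. pseudonana DCL": {"RNaseIIIa", "RNaseIIIb"},
    # Only the domains named in the text; other domains are not stated.
    "F. cylindrus DCL": {
        "C5 DNA methylase", "DEXDc/Helicase", "RNaseIIIa", "RNaseIIIb",
    },
}


def compare_to_canonical(domains, reference=CANONICAL_DICER):
    """Return the domains missing from, and extra to, the reference set."""
    domains = set(domains)
    return {
        "missing": sorted(reference - domains),
        "extra": sorted(domains - reference),
    }


if __name__ == "__main__":
    for name, domains in ARCHITECTURES.items():
        print(name, compare_to_canonical(domains))
```

The same comparison could be applied to domain predictions for any other protein family discussed in this chapter, provided the domain labels are normalized first.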

9.2 Long Non-coding RNAs

LncRNAs are a class of transcripts longer than 200 nt with no coding potential. They can be intronic, intergenic or antisense transcripts. Although they do not code for proteins, they play an important role in gene regulation in combination with chromatin remodeling complexes and histone modifications (Fatica and Bozzoni 2014). A famous example is COLD ASSISTED INTRONIC NONCODING RNA (COLDAIR), a plant lncRNA transcribed from an intron of Flowering Locus C (FLC), the gene encoding a flowering inhibitor that is repressed during vernalization. Knockdown of COLDAIR compromises the vernalization-mediated repression of FLC, which causes late flowering after vernalization (Heo and Sung 2011). LncRNAs are poorly investigated in diatoms, where the non-coding fraction of the genome is estimated at around 40% in both P. tricornutum and T. pseudonana (Rastogi et al. 2015). A few interesting studies reported the presence of lncRNAs in P. tricornutum under stress conditions or in natural variants of the species (Huang et al. 2018; Cruz de Carvalho et al. 2016; Rastogi 2016). These studies revealed the synthesis of intergenic lncRNAs under phosphate depletion and high CO2, with some lncRNAs shared between stress-related studies, suggesting an important and central regulatory role of lncRNAs in the response to stresses. Validation of these lncRNAs, using Phatr3 gene models (Rastogi et al. 2018) for those identified under phosphate depletion, together with quantitative RT-PCR and functional studies, is necessary to demonstrate their relevance to phosphate and high CO2 metabolism.
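The working definition given above (longer than 200 nt, no coding potential) can be illustrated with a minimal filter. This is a sketch under stated assumptions, not the method used in the cited studies: coding potential is approximated by the longest forward-strand open reading frame, and the cutoff of 100 codons is an assumed, arbitrary threshold. The function names are hypothetical.

```python
STOP_CODONS = {"TAA", "TAG", "TGA"}


def longest_orf_codons(seq):
    """Longest ATG-to-stop ORF (in codons, stop excluded) over the three
    forward reading frames. Reverse strand is ignored in this sketch."""
    seq = seq.upper().replace("U", "T")
    best = 0
    for frame in range(3):
        start = None
        for i in range(frame, len(seq) - 2, 3):
            codon = seq[i:i + 3]
            if start is None and codon == "ATG":
                start = i
            elif start is not None and codon in STOP_CODONS:
                best = max(best, (i - start) // 3)
                start = None
    return best


def is_lncrna_candidate(seq, min_length=200, max_orf_codons=100):
    """True if the transcript is longer than min_length nt and its longest
    ORF is shorter than max_orf_codons (assumed proxy for 'non-coding')."""
    return len(seq) > min_length and longest_orf_codons(seq) < max_orf_codons
```

A transcript that passes such a filter is only a candidate; as noted above, comparison with gene models, expression validation and functional studies remain necessary.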

10 Conclusions and Future Directions

In recent years, some progress has been made in the characterization of epigenetic factors in a few diatoms, although many questions remain to be addressed. Epigenetics in diatoms emerged only recently and, like earlier disciplines at the same stage, it is gaining knowledge rapidly, with important progress expected in the future. Diatoms, and microalgae in general, are suitable species for addressing fundamental questions about the epigenetic mechanisms involved in genome regulation. Their genome size, short life cycle, conservation of epigenetic components and lack of redundancy are all favourable factors for gaining insight into epigenetic regulation in these microscopic organisms.

Veluchamy and co-authors have shown the importance of histone PTMs and DNA methylation in the response of P. tricornutum to nitrate starvation, which induced a dramatic genome-wide redistribution of H3K9me3, H3K9/14Ac and DNA methylation, with a decrease or an increase in the expression of targeted regions upon loss or gain of one or more of these marks (Veluchamy et al. 2015). A recent study has established a link between PRC2, its associated repressive mark H3K27me3 and cell differentiation in P. tricornutum. Knockout of enhancer of zeste, the catalytic subunit of PRC2, in three morphotypes (fusiform, cruciform and triradiate) led to a change in morphology and caused a genome-wide depletion of H3K27me3, suggesting a role of the polycomb mark in cell differentiation (Zhao et al. 2021). As widely shown in many model species, epigenetic factors are likely to regulate many biological processes in diatoms, including but not limited to stress responses, cell cycle, differentiation, life cycle and reproduction.

The feasibility of gene editing in diatoms has made these species attractive and boosted our knowledge of their gene function. The diversity of epigenetic factors and their peculiar domain combinations need to be addressed using the genetic tools that are now available in P. tricornutum and in some other diatoms that are emerging as additional and attractive models (Tirichine et al. 2017). The recent development of customizable epigenome engineering tools in mammals is a great inspiration for diatom biologists. Typical examples include a strategy that uses fusions of engineered transcription activator-like effector (TALE) repeat arrays and the TET1 hydroxylase catalytic domain for efficient targeted demethylation of specific CpGs in human cells (Maeder et al. 2013). TALE repeat arrays have also been fused to the histone demethylase LSD1 to remove enhancer-associated chromatin modifications from target loci (Mendenhall et al. 2013). Epigenetic factors and their writers, erasers and readers do not act in isolation, and multiple lines of evidence point to complex interactions orchestrating the epigenetic regulation of genomes. Typical examples include the interactions of lncRNAs with histone modifiers, which shape the outcomes of gene transcription and nuclear architecture. Likewise, DNA methylation and histone modifications maintain a close cross-talk that deserves further investigation.

Another epigenetic process that deserves attention is RNA editing, defined as specific modifications of nucleotides within an RNA molecule after its synthesis by RNA polymerase. Examples include pseudouridylation, which is the isomerization of uridine residues, and deamination, such as the removal of an amine group from cytidine to give rise to uridine, commonly known as the C-to-U change (other types of RNA editing exist). RNA editing can also be insertional or deletional, in which nucleotides are added to, and in some cases also removed from, a transcript (Lin et al. 2008). In humans, adenosine deaminase acting on RNA (ADAR) is the enzyme that converts adenosine (A) to inosine (I) by deamination. RNA editing takes place in the nucleus and the cytoplasm as well as in the mitochondria and the plastids. It is known to occur in animals, plants, trypanosomes, dinoflagellates (only later-diverging dinoflagellates) and even viruses. RNA editing was not detected in ciliates, apicomplexans or basal lineages of dinoflagellates (Lin et al. 2008) and has not yet been documented in diatoms. However, in silico searches detect homologues of the ADAR enzyme in several diatom species including P. tricornutum (data not shown), which is likely to offer a great opportunity to investigate the role of RNA editing in generating protein diversity in microalgae.
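The two deamination events described above can be illustrated with a toy sketch operating on RNA strings. The function names and the example sequences are hypothetical and chosen only for illustration; the second function relies on the fact that inosine is read as guanosine during translation.

```python
# Substrate and product of each deamination type described in the text.
DEAMINATIONS = {
    "C-to-U": ("C", "U"),  # cytidine deamination
    "A-to-I": ("A", "I"),  # adenosine deamination (ADAR-type)
}


def deaminate(rna, position, kind):
    """Apply one deamination at a 0-based position of an RNA string."""
    substrate, product = DEAMINATIONS[kind]
    if rna[position] != substrate:
        raise ValueError(
            f"{kind} editing needs {substrate} at position {position}, "
            f"found {rna[position]}"
        )
    return rna[:position] + product + rna[position + 1:]


def as_decoded(rna):
    """Return the sequence as decoded in translation: inosine pairs like G."""
    return rna.replace("I", "G")


if __name__ == "__main__":
    # Hypothetical transcripts, not taken from any diatom gene.
    print(deaminate("AUGCAA", 3, "C-to-U"))              # CAA codon becomes UAA
    print(as_decoded(deaminate("AUGAAA", 3, "A-to-I")))  # AAA codon read as GAA
```

In the first toy case a single C-to-U change turns a glutamine codon into a stop codon, and in the second an A-to-I change recodes a lysine codon as glutamate, which shows how a single edited base can alter the protein product without any change in the DNA sequence.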

Functional studies in diatoms have provided important findings about adaptation to their environments. Because epigenetic variation is intimately connected to adaptation, it is important to investigate this connection over several generations and to ask what role DNA methylation, histone modifications and non-coding RNAs play in the evolution of adaptive traits in response to specific changes in environmental factors. The inheritance of such modifications can be investigated in clonally propagating species such as P. tricornutum, but also in sexually reproducing species such as H. ostrearia, where current studies in Mouget’s lab are addressing such topics. Epigenetic studies in diatoms are undoubtedly going to provide exciting and important insights for many years to come.