Introduction

Gene expression during spermatogenesis undergoes a roller-coaster ride as this highly complex process involves cell division (mitotic), chromosome reduction (meiotic), and differentiation (post-meiotic) phases that require diverse sets of genes at work [1]. Post-meiosis, two processes must progress simultaneously: DNA compaction and cell differentiation. The process of DNA compaction would render gene expression impossible, and hence all transcripts required for the synthesis of proteins participating in the differentiation process must be synthesized at this stage. Eventually, a spurt in gene expression is seen in the round spermatids before they become transcriptionally inactive [2]. A variety of regulatory and coding RNAs generated during spermiogenesis are stored in the round spermatids, which also make their way to mature spermatozoa probably in a selective manner [1]. mRNAs generated during this phase are translationally repressed, stored, and later used for translation for the completion of spermiogenesis [2]. However, the mechanisms of gene activation in the round spermatids remain largely unknown, and the mechanisms of gene silencing remain almost entirely unknown. It is well known that histones get replaced by protamines during spermatogenesis; however, the presence of histones is still evident in mature sperm, suggesting their critical role in development and spermatogenesis [3]. It is noteworthy that epigenetic changes during this stage would define the chromatin structure, which is essential for extraordinary packaging of DNA in spermatozoa [4].

Histone proteins are subjected to a number of post-translational modifications (PTMs), including methylation, acetylation, ubiquitination, and phosphorylation [5]. It is a combination of a number of these modifications, which defines the chromatin’s open or closed states. The maturation of immature germ cells into mature spermatozoa is mediated by a complex interplay of histone PTMs [5]. A large number of these modifications play a crucial role in protamine transition and chromatin reorganization in the process of spermatogenesis [5, 6]. Among a number of such PTMs, modifications at Lys4, Lys9, and Lys27 have been particularly characterized for their role in chromatin condensation [6]. Histone H3 trimethylation modification on lysine 4 residue (H3K4me3) is an important histone modification that affects gene expression [7]. H3K4me3 marks near the transcriptional start sites (TSSs) evince active transcription or transcriptional readiness [8]. H3K4me3 was initially regarded as the regulator of Hox genes. H3K4me3 marks facilitate the open state by recruiting the chromatin remodelling factors CHD1 [9] and BPTF [10] while blocking the binding of the chromatin repressive proteins like NuRD [11] and INHAT complexes [12]. Histone H3 methylation modification on lysine 9 residue (H3K9) is the mark of condensed heterochromatin and a state of transcriptional inactiveness as di/trimethylation modification on these residues (H3K9me2/3) marks are abundantly present in the silenced genes [13]. H3K9me3 binds to the heterochromatin protein 1 (HP1) to form constitutive heterochromatin and is responsible for transcriptional repression and the maintenance of heterochromatin [14]. HP1 also recruits DNA methyltransferase 3b, providing one of the best examples of the interplay between histone methylation and DNA methylation [15].

Spermatids represent the last stage of transcriptionally active germ cells, which see significant changes in gene expression as they enter differentiation. Since transcription must be shut down after this stage, these cells are ideal for studying the chromatin changes that affect gene expression and how the cells prepare for differentiation. In most studies on sperm and embryonic stem cells, H3K4me3 and histone H3 trimethylation on lysine 27 (H3K27me3) have been studied as the activating and repressive marks, respectively [16]. In the present study, we employed chromatin immunoprecipitation sequencing (ChIP-Seq) to examine the correlation of H3K4me3 marks with transcriptional activity and of H3K9me3 marks with transcriptional silencing in round spermatids purified from rat testes. We also undertook transcriptome sequencing in the round spermatids to determine whether H3K4me3- and H3K9me3-enriched regions correlate with the gene expression profile in these cells. Transcripts corresponding to a large proportion of H3K4me3-enriched genes (~ 64%) were present in the round spermatids, whereas transcripts corresponding to only about 22% of H3K9me3-enriched genes were present, and those at very low levels. We also undertook sperm transcriptome sequencing and found that almost all the transcripts corresponding to open chromatin regions (suggested by H3K4me3) present in the round spermatids were also detectable in spermatozoa.

Materials and Methods

Animals

Male Sprague–Dawley rats, 60–70 days of age, were obtained from the National Laboratory Animal Centre, CSIR-CDRI, Lucknow. All rats were housed in a vivarium under a 12L:12D cycle with ad libitum access to water and rat chow. Four sexually mature adult animals were sacrificed by anesthetic overdose under a protocol approved by the Institutional Animal Ethics Committee of the CSIR-Central Drug Research Institute, Lucknow (IAEC/2014/49/Renew03(135/16)).

Isolation and Purification of Round Spermatids from Rat Testes

Round spermatids were isolated and purified by trypsin digestion followed by centrifugal elutriation and density gradient centrifugation, as described in our earlier study [17]. In brief, the testes were decapsulated and minced with scissors in Basal Medium Eagle (BME). The minced testis suspension, in BME supplemented with 0.1% trypsin (w/v), 0.1% glucose, and 17 µg/ml DNase (Sigma-Aldrich, USA), was incubated in a shaking water bath for 15 min at 34 °C. After incubation, the enzymatic digestion was stopped by adding soybean trypsin inhibitor (0.04% w/v) to the suspension, which was then filtered through a nylon mesh (36 µm) and passed through a column of glass wool to remove sperm. The collected cell suspension was centrifuged at 400 g for 5 min at 4 °C to obtain the cell pellet, which was washed twice with BME. The mixed germ cell pellet was suspended in BME supplemented with DNase (2 µg/ml) and FBS (8% v/v) and kept on ice. The cell suspension was then elutriated with a Beckman Elutriator Rotor (JE-5) fitted with a standard chamber and mounted on a Beckman High-Speed Centrifuge (Avanti J-26S–XP, Beckman Coulter Inc, USA). A fraction containing round spermatids at around 75–80% purity was collected at 2000 rpm at flow rates of 23.0 and 40.0 ml/min. To increase the purity, the fraction was layered over a linear gradient of 23–33% Percoll® (Sigma-Aldrich, USA) and centrifuged at 4025 g for 60 min in a swinging bucket rotor fitted to a Sigma 3-30 K refrigerated centrifuge (SIGMA Laborzentrifugen GmbH, Germany). The major band was carefully recovered through a puncture in the side of the tube, followed by washing and dilution with BME. The purity of the cell preparation was checked visually under a microscope using acridine orange staining (Fig. S1).

Chromatin Immunoprecipitation Assay

Chromatin immunoprecipitation (ChIP) assays were performed using the SimpleChIP Enzymatic Chromatin IP kit (9003, CST, USA) according to the manufacturer’s protocol. Round spermatids were cross-linked by adding 37% formaldehyde to a final concentration of 1% and incubating at room temperature for 10 min. Cross-linking was quenched by the addition of 125 mM glycine, followed by nuclear shearing and chromatin digestion by sonication. For the input sample, 2% of the enzyme-digested chromatin was collected and stored at − 20 °C. The fragmented chromatin was analyzed on an agarose gel and found to be around 150–300 bp in size. The sonicated chromatin was pre-cleared by centrifugation and subjected to immunoprecipitation with mouse anti-histone H3 (tri methyl K4) antibody (2 μg/IP) (ab12209, Abcam, UK) and rabbit anti-histone H3 (tri methyl K9) antibody (2 μg/IP) (ab8898, Abcam, UK), or with rabbit anti-IgG as the negative control, at 4 °C overnight with rotation. The cross-link was reversed, and desired immune complexes were recovered by incubating with magnetic protein G beads at 65 °C for 30 min. The precipitated DNA was column-purified and eluted in a buffer for downstream processing.

Library Preparation and High Throughput Sequencing

Immunoprecipitated DNA fragments were subjected to library preparation using the TruSeq DNA Kit (FC-121–2001, Illumina, USA). NEB Next index set 3 was used for barcoding and adapter ligation (E7710S, NEB, USA). Briefly, 5–10 ng of ChIP DNA was subjected to end repair, which converts the overhangs resulting from fragmentation into blunt ends using an end repair mix. The 3′ to 5′ exonuclease activity of this mix removes the 3′ overhangs, and its polymerase activity fills in the 5′ overhangs. After this step, a single ‘A’ nucleotide was added to the 3′ ends of the blunt fragments (A-tailing), which prevents the end-repaired fragments from ligating to each other. Next, adapters carrying indexes were ligated to the ends of the DNA fragments, enabling their hybridization onto a flow cell. The adapter-ligated fragments were purified, and the library was enriched by amplifying DNA fragments carrying adapter sequences. DNA sequencing was performed on the Illumina NextSeq 500 platform (Illumina Inc, USA) with a read length of 75 bp and paired-end chemistry.

ChIP-seq Data Analysis and Visualization

The ChIP-seq reads obtained from the sequencer were demultiplexed using the bcl2fastq tool. Quality and adapter trimming was done using FastQC software. The processed reads were then aligned to the rat genome assembly rn6 using Bowtie 1.2 (parameters -m 1 --best). Peak calling was done using MACS1.4 with a p-value threshold of 0.001 and an FDR cutoff of 10%. The BED files of peaks were subjected to motif discovery with the rGADEM package, and the motifs were visualized using the MotIV package [18]. The resulting BED files were analyzed and visualized using the R/Bioconductor-based tool ChIPseeker [19]. Functional enrichment analysis of the annotated peaks was carried out using the R/Bioconductor-based packages ReactomePA [20] and clusterProfiler [21].

RNA Isolation and Transcriptome Sequencing

RNA was extracted using the Qiagen RNeasy Micro Kit (74004, Qiagen, Germany) according to the manufacturer’s recommended protocol. The integrity and quality of the RNA were checked on an Agilent 2100 Bioanalyzer (G2939BA, Agilent, USA), and the qualified RNA samples were used for sequencing. Poly(A) RNA was enriched using the Dynabeads™ mRNA DIRECT™ purification kit (61012, Thermo Fisher Sci, USA) and subjected to cDNA amplification with the SMARTer® Ultra® Low Input RNA Kit for Sequencing-v3 (634848, Takara Bio Inc., USA). The TruSeq RNA Library Prep Kit v2 (RS-122–2001, Illumina Inc, USA) was used for mRNA-seq library preparation, and sequencing was performed on the Illumina HiSeq 2500 next-generation sequencing platform (Illumina Inc, USA).

Transcriptome Data Processing

The raw RNA-seq reads were trimmed for adapter sequences and quality and then mapped to the rat genome assembly rn6 using TopHat (v2.0.8b, http://tophat.cbcb.umd.edu/). HTSeq (http://www.huber.embl.de/users/anders/HTSeq/doc/overview.html) was used to obtain read counts. Differentially expressed genes were analyzed using the DESeq R software package, with Benjamini–Hochberg correction for multiple testing.
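The Benjamini–Hochberg correction mentioned above can be illustrated with a minimal Python sketch. This is a generic implementation of the procedure, not the code path used by DESeq; the function name and the example p-values are hypothetical.

```python
def benjamini_hochberg(p_values):
    """Return Benjamini-Hochberg adjusted p-values in the original order."""
    m = len(p_values)
    # Indices of the p-values sorted in ascending order
    order = sorted(range(m), key=lambda i: p_values[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest to the smallest p-value so that the
    # adjusted values stay monotone (a step-up procedure)
    for rank in range(m, 0, -1):
        idx = order[rank - 1]
        candidate = p_values[idx] * m / rank
        running_min = min(running_min, candidate)
        adjusted[idx] = running_min
    return adjusted


# Hypothetical raw p-values for four genes
example_p = [0.01, 0.04, 0.03, 0.005]
example_adjusted = benjamini_hochberg(example_p)
```

Genes whose adjusted value falls below the chosen false discovery rate would then be reported as differentially expressed.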

Results

Deep Sequencing Read Quality

Deep sequencing of the ChIP DNA fragments generated an average of 36 million raw reads per ChIP reaction, with 91% of bases having a Q score greater than 30 and 94% having a Q score greater than 20. Quality filtering and adapter removal yielded an average of 34.7 million good-quality reads, with 94% and 95% of bases having quality scores above 30 and 20, respectively. The maximum and minimum read lengths considered for alignment and peak calling were 76 bp and 50 bp, respectively, with an average read length of 75 bp. The GC content of the processed reads remained below 45% in all reads (Table S1).
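For readers less familiar with Q scores, the following Python sketch shows how a Phred quality score relates to the base-call error probability (Q = -10 log10 p) and how the fraction of bases at or above a threshold can be tallied. It assumes Phred+33 FASTQ encoding and a "greater than or equal to" threshold convention; the function names and the toy quality string are illustrative and are not part of the pipeline used in this study.

```python
def decode_phred33(quality_string):
    """Convert a FASTQ quality string (Phred+33 assumed) to integer Q scores."""
    return [ord(ch) - 33 for ch in quality_string]


def error_probability(q):
    """Base-call error probability implied by a Phred score: p = 10^(-Q/10)."""
    return 10 ** (-q / 10)


def fraction_at_or_above(q_scores, threshold):
    """Fraction of bases whose Q score is at least the threshold."""
    return sum(1 for q in q_scores if q >= threshold) / len(q_scores)


# Hypothetical quality string: 'I' is Q40, '?' is Q30, '5' is Q20, '+' is Q10
toy_scores = decode_phred33("I?5+")
q30_fraction = fraction_at_or_above(toy_scores, 30)
```

Under this relationship, Q30 corresponds to one expected error in 1000 base calls and Q20 to one in 100.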

Distribution of H3K4me3 and H3K9me3 Marks in the Regulatory Regions

ChIPseeker package was used for annotation and profiling of peak calling data. After peak calling, peak locations and coverage over chromosomes were identified and visualized using the ‘Covplot’ function, which calculates the coverage of peak regions over chromosomes. Well-dispersed peaks were observed in the case of H3K4me3 and H3K9me3 marks, which were mutually exclusive in many instances (Fig. 1, Fig. S2a and S2b). We also compared the distribution of peaks between H3K4me3 and H3K9me3 marks. The active regions showed significant peak enrichment in H3K4me3 marks and repressive regions showed enrichment in H3K9me3 marks. Some regions also showed bivalent peak enrichment, suggesting a poised condition of certain regions/genes (Fig. 1). To map the location of histone marks with respect to genes, heat-map and average density profiling of ChIP peaks with respect to the transcription start site (TSS) regions were generated. The read density of H3K4me3 showed enrichment in the 1.5 kb region from the TSS (Fig. 2a). On the other hand, read densities of H3K9me3 were distributed widely from the TSS (Fig. 2b). This dispersed pattern is present in the regions where no enrichment was observed with H3K4me3, suggesting discrete regions of activity and inactivity. The distribution of histone marks with respect to the TSS showed that more than 35% of total H3K4me3 peaks lie within 3 kb upstream and downstream of the TSS whereas only about 10% of H3K9me3 were seen in this region (Fig. 2c and d).
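The idea behind profiling peaks relative to the TSS can be sketched in a few lines of Python. This is a simplified illustration, assuming the peak centre is used as the reference point; ChIPseeker may define the distance differently, and the function names and coordinates below are hypothetical.

```python
def distance_to_tss(peak_start, peak_end, tss, strand):
    """Signed distance (bp) from the peak centre to the TSS.

    Negative values are upstream of the TSS and positive values downstream,
    taking the strand of the gene into account.
    """
    centre = (peak_start + peak_end) // 2
    offset = centre - tss
    return offset if strand == "+" else -offset


def fraction_within(distances, window_bp=3000):
    """Fraction of peaks whose centre lies within +/- window_bp of a TSS."""
    inside = sum(1 for d in distances if abs(d) <= window_bp)
    return inside / len(distances)


# Hypothetical peaks as (start, end, nearest TSS, strand)
toy_peaks = [
    (1000, 1200, 1500, "+"),    # 400 bp upstream
    (9000, 9200, 6600, "+"),    # 2500 bp downstream
    (20000, 20400, 15200, "-"), # 5000 bp upstream on the minus strand
    (100, 300, 10200, "+"),     # 10 kb upstream
]
toy_distances = [distance_to_tss(*peak) for peak in toy_peaks]
toy_fraction = fraction_within(toy_distances)
```

Applied to real peak sets, a calculation of this kind yields the proportions of peaks within 3 kb of the TSS reported for each mark.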

Fig. 1

Representative peak enrichment profiles of H3K4me3 and H3K9me3 in some chromosomes showing active (green box), repressive (red box), and bivalent states (blue box)

Fig. 2

Heat map and average profile of peaks depicting their distribution within 3 kb upstream and downstream of the TSS for H3K4me3 (a) and H3K9me3 (b) (upper panel). Bar plots exhibiting the percentage distribution of peaks with respect to distance from the TSS (up to 100 kb) for H3K4me3 (c) and H3K9me3 (d) (lower panel)

In order to get information about the genomic location of enriched peaks, we used “annotatePeak” function of the ChIPseeker package, which assigns peaks to specific genomic annotations such as TSS, exon, 5’ UTR, 3’ UTR, intronic or intergenic. Approximately 27% of the H3K4me3 peaks were in the promoter regions of various genes (Fig. 3a). Enrichment of H3K4me3 peaks in the promoter regions suggests overall high transcriptional activity in spermatids. About 86% of H3K9me3 marks were seen in the intergenic regions, with a very low frequency in the promoter regions (~ 4%) (Fig. 3b). The low frequency of H3K9me3 peaks in the promoter regions also suggested overall high transcriptional activity in spermatids. This is also shown by a high number of H3K4me3 peaks in the promoter region and H3K9me3 peaks in other regions in the upset plot (Fig. 3c and d).

Fig. 3

Upset plot and Venn pie plot (inset) showing the annotation overlap between different genomic regions for H3K4me3 (a), H3K9me3 (b)

Functional Enrichment Analysis of H3K4me3 Peaks

H3K4me3 ChIP-seq identified 3958 peaks corresponding to 2807 genes. In order to identify the predominant biological pathways represented by the H3K4me3 (active) peaks, we performed Kyoto Encyclopedia of Genes and Genomes (KEGG) pathway analysis (Fig. 4a). This analysis fetched around 40 genes involved in protein processing in the endoplasmic reticulum, 30 genes for RNA transport and 25 genes for spliceosome. Apart from these, approximately 20 genes of the cell cycle pathway were also enriched. Next, these genes were subjected to gene ontology (GO) analysis. The GO analysis highlighted the processes such as cilium development, spermatid differentiation, spermatid development, and germ cell development, which suggests gene expression relevant to germ cell differentiation (Fig. 4b). Similarly, in the cellular component domain of GO, we observed significant enrichment of sperm and flagellum-related components (Fig. 4c). In the molecular function domain, ubiquitin-related activities showed enrichment (Fig. 4d).
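Over-representation analyses of this kind are commonly scored with a hypergeometric test, which asks how likely it is to see at least the observed number of pathway genes in a gene list by chance. The Python sketch below illustrates that calculation under this assumption; it is not the code of clusterProfiler or ReactomePA, and the function name and the small numbers used are hypothetical.

```python
from math import comb


def hypergeom_upper_tail(k, population, successes, draws):
    """P(X >= k) for a hypergeometric variable X.

    population: number of background genes
    successes:  background genes annotated to the pathway
    draws:      genes in the query list (e.g. genes with a given mark)
    k:          query genes annotated to the pathway
    """
    upper = min(successes, draws)
    total = comb(population, draws)
    favourable = sum(
        comb(successes, i) * comb(population - successes, draws - i)
        for i in range(k, upper + 1)
    )
    return favourable / total


# Hypothetical toy case: 10 background genes, 4 in the pathway,
# 3 genes in the query list, 2 of which are in the pathway
toy_p = hypergeom_upper_tail(2, 10, 4, 3)
```

In practice the resulting p-values are corrected for multiple testing across all pathways or GO terms examined.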

Fig. 4

KEGG histogram for genes enriched in H3K4me3 marks showing molecular pathways (a), biological processes (b), cellular components (c), and molecular functions (d)

Gene concept networks analysis also depicted crucial spermatid and sperm-related biological linkage to genes enriched in H3K4me3 (Fig. 5). In order to identify mutually overlapping genes and gene sets in enriched GO terms, an enrichment map was created (Fig. S3). Enrichment map analysis of biological processes domain fetched a total of six clusters, among which the most notable were processes such as spermatid development, spermatid differentiation, germ cell development, sperm motility, flagellated sperm motility, and cellular processes involved in spermatogenesis. Similar analysis of cellular components produced seven clusters and the largest of them consisted of terms such as sperm part, sperm flagellum, and sperm principal piece. A thorough literature search for all these genes showed that 134 genes had an already established role in spermatogenesis (Table 1) and the remaining genes have a high plausibility of participating in spermatid differentiation (Data S1).

Fig. 5

Gene-concept networks for genes enriched in H3K4me3 marks showing biological domains (a), cellular components (b), and molecular functions (c)

Table 1 H3K4me3-enriched genes (active) related to spermatid development, differentiation, spermatogenesis, and sperm structural and functional components. All references listed in the table are given in supplementary information

Functional Enrichment Analysis of H3K9me3 Peaks

H3K9me3 ChIP-seq identified 6352 peaks corresponding to 1836 genes. KEGG pathway analysis of H3K9me3 peaks demonstrated enrichment with regard to processes other than spermiogenesis, such as retinol metabolism (31 genes), steroid hormone biosynthesis (27 genes), inflammatory mediator regulation of transient receptor potential (TRP) channels (32 genes), and chemical carcinogenesis (25 genes) (Fig. 6a). The biological process of GO analysis showed the highest number of hits in pattern specification processes (~ 110 genes) followed by synapse organization (114 genes), regionalization (98 genes), and embryonic organ morphogenesis (88 genes) (Fig. 6b). In the cellular component domain, synaptic membrane and post-synaptic membrane exhibited the highest enrichment with 109 and 83 gene hits, respectively (Fig. 6c). Interesting enrichment of terms was seen in the molecular function domain in which DNA-binding transcription activator activity exhibited more than 100 gene hits followed by transcription factor activity with more than 75 gene hits (Fig. 6d). Gene concept networks of H3K9me3 peak-enriched genes also revealed many networks involved in processes other than spermatogenesis (Fig. S4). Enrichment map analysis of biological processes showed only two clusters, one of which was related to the embryonic development (Fig. S5). In the cellular component, the clustering of pre- and post-synaptic terms was observed (Fig. S5). Similarly, only one cluster related to the ion channel was observed in the molecular function domain (Fig. S5). Out of 1836 genes enriched in H3K9me3, only 20 genes appeared to have some role in fertility or reproduction (Table 2). All other genes (1816) were unrelated to spermatogenesis/fertility (Data S1).

Fig. 6

KEGG histogram for genes enriched in H3K9me3 marks showing molecular pathways (a), biological processes (b), cellular components (c), and molecular functions (d)

Table 2 H3K9me3-enriched genes (repressed) related to functions in spermatid development and spermatogenesis. All references listed in the table are given in supplementary information

Bivalent (H3K4me3 and H3K9me3) Enriched Regions

The distribution of H3K4me3 and H3K9me3 marks is generally mutually exclusive. The active regions exhibit H3K4me3 peak enrichment and the repressive regions show H3K9me3 enrichment. However, there are certain instances where both co-exist (Fig. 1), suggesting a poised state of genes. We found 53 genes that showed overlapping peak enrichment in H3K4me3 and H3K9me3 marks (Table S2). Upon functional mining of these genes, we found them to be unrelated to spermatogenesis, except for a few that had roles in early and late spermatogenesis. A large number of these genes were found to be important for early embryonic development (Data S1).
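Finding regions that carry both marks amounts to intersecting two sets of genomic intervals. The following Python sketch shows one simple way to do this, assuming BED-style half-open coordinates; the exact overlap criterion used in this study is not stated, and the function name and toy peaks are hypothetical.

```python
def overlapping_pairs(peaks_a, peaks_b):
    """Return pairs of peaks (one from each set) that overlap.

    Peaks are (chromosome, start, end) tuples with half-open coordinates,
    so two peaks that merely touch end-to-start do not count as overlapping.
    """
    by_chrom = {}
    for chrom, start, end in peaks_b:
        by_chrom.setdefault(chrom, []).append((start, end))

    pairs = []
    for chrom, start, end in peaks_a:
        for b_start, b_end in by_chrom.get(chrom, []):
            if start < b_end and b_start < end:
                pairs.append(((chrom, start, end), (chrom, b_start, b_end)))
    return pairs


# Hypothetical peak sets for the two marks
toy_k4 = [("chr1", 100, 200), ("chr1", 500, 600), ("chr2", 100, 200)]
toy_k9 = [("chr1", 150, 300), ("chr2", 300, 400), ("chr1", 600, 700)]
toy_bivalent = overlapping_pairs(toy_k4, toy_k9)
```

The nested loop is quadratic per chromosome, which is adequate for a few thousand peaks; an interval tree or a sorted sweep would be preferable for larger sets.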

Spermatid Transcriptome Represents Most of the H3K4me3-Enriched Genes

ChIP-seq of H3K4me3 detected 3958 peaks corresponding to 2807 genes or genomic regions. We also undertook transcriptome sequencing in round spermatids, which identified about 11,000 transcripts. Upon comparing the spermatid transcriptome with the ChIP-seq data, we found 1800 genes to be common; that is, transcripts corresponding to 1800 out of 2807 (64.13%) genes suggested to be active by H3K4me3 marks were actually present in spermatids. This is a very high number, given that the expression of a gene is affected by a number of histone and DNA modifications. We extracted the fragments per kilobase of transcript per million mapped reads (FPKM) values of these genes to assess their expression level in the round spermatids. A large number of these genes were transcribed at high levels: the top FPKM value was 29,279, 330 transcripts had FPKM above 100, and 902 transcripts had FPKM above 10 (Fig. 7). This gene list consisted of top spermiogenesis candidates, such as Acrbp, Fabp9, Catsper2, Catsper3, Catsper4, Ccdc42, Ccdc63, Ccdc181, Cftr, Dnah1, Dnah8, Dync1h1, Dynll1, Kdm3a, Gapdhs, H1fnt, Odf1, Odf3, Odf4, Oaz3, Tssk1b, Tssk2, Tssk4, Adam2, Adam26a/4, Adam3a, Tekt3, Tekt4, and Tekt5, most of which encode structural and functional components of spermatozoa.
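As a reminder of what the FPKM values represent, the Python sketch below computes FPKM from a fragment count and then tallies genes above the two expression thresholds used here. It follows the standard FPKM definition; the function names and all counts in the example are hypothetical and are not taken from the data of this study.

```python
def fpkm(fragments, gene_length_bp, total_mapped_fragments):
    """Fragments per kilobase of transcript per million mapped fragments."""
    return fragments * 1e9 / (gene_length_bp * total_mapped_fragments)


def count_above(values, threshold):
    """Number of values strictly greater than the threshold."""
    return sum(1 for v in values if v > threshold)


# Hypothetical gene: 500 fragments, 2 kb transcript, 25 million mapped fragments
toy_fpkm = fpkm(500, 2000, 25_000_000)

# Hypothetical FPKM values for five genes
toy_values = [2500.0, 150.0, 42.0, 10.0, 0.8]
toy_above_100 = count_above(toy_values, 100)
toy_above_10 = count_above(toy_values, 10)
```

Because FPKM normalizes for both transcript length and sequencing depth, thresholds such as 10 and 100 can be compared across genes within a sample.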

Fig. 7

Bar diagram showing the percentage of H3K4me3, H3K9me3, and bivalent marks and the percentage of various transcripts in each category classified on the basis of expression (FPKM) values

H3K9me3-Enriched Genes Have a Scanty Presentation in Spermatid Transcriptome

ChIP-seq of H3K9me3-enriched regions detected 6352 peaks corresponding to 1836 genes or genomic regions. Upon comparing these genes with the spermatid transcriptome data (about 11,000 transcripts detected in the round spermatids), we found transcripts corresponding to only 413 (22.49%) of them. We also looked into the distribution of genomic features of these peaks and found that a very high proportion of peaks (75%) represented intronic regions of genes. We extracted the FPKM values of these genes and found that the top FPKM value was comparatively low (3517 versus 29,279 for the top H3K4me3 transcript), with only 14 transcripts showing FPKM above 100 and 53 transcripts showing FPKM above 10 (Fig. 7). Therefore, H3K9me3 is associated with transcriptional silencing to a significant extent.

Genes with bivalent modifications were also examined in the spermatid transcriptome. We found that transcripts of 15 of the 53 (28.3%) genes with bivalent modifications were present in spermatids. Most of these genes were expressed at very low levels: the top FPKM was 3320, dropping sharply to 483 for the second candidate, and only two genes had FPKM above 100 and only 5 genes had FPKM above 10 (Fig. 7).

RNA Payloads of Sperm Are Set by Chromatin Activity in Spermatids

Since spermatozoa also carry a plethora of RNAs, we also undertook transcriptome sequencing in sperm prepared from rat epididymis. Since sperm are transcriptionally inactive, a large number of these RNAs may originate in the round spermatids or predecessor cells [1]. We compared the list of the above genes (shared between H3K4me3-enriched regions and the spermatid transcriptome) with the sperm transcriptome. Interestingly, we found that sperm carried almost 100% (1788 out of 1800) of the transcripts corresponding to the active chromatin regions in the round spermatids, as denoted by H3K4me3 marks. The top transcripts in spermatozoa consisted of the same set of spermiogenesis-related genes as seen in spermatids, but with much lower FPKM values (reduced 100- to 500-fold). This suggests that most of the active chromatin regions generated their transcripts in the round spermatids, and that these transcripts also made their way to mature sperm.
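The comparison between gene lists reduces to a set intersection followed by a percentage. A minimal Python sketch is given below; the function name and the toy gene lists are hypothetical (the gene symbols are borrowed from the text purely as labels), and only the final line uses counts reported in this study.

```python
def percent_shared(query_genes, reference_genes):
    """Percentage of query genes that are also present in the reference set."""
    query = set(query_genes)
    shared = query & set(reference_genes)
    return 100 * len(shared) / len(query)


# Hypothetical toy lists: genes active in spermatids versus genes seen in sperm
toy_spermatid_active = ["Odf1", "Oaz3", "Tssk2", "Tekt3"]
toy_sperm_detected = ["Odf1", "Oaz3", "Tssk2", "Gapdhs"]
toy_percent = percent_shared(toy_spermatid_active, toy_sperm_detected)

# Counts reported in the text: 1788 of 1800 transcripts detected in sperm
reported_percent = round(100 * 1788 / 1800, 2)
```

With the reported counts this gives roughly 99.3%, consistent with the description of almost complete carry-over.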

Discussion

This study focused on scoring chromatin activity in the round spermatids, particularly with respect to the role of H3K4me3 and H3K9me3 marks, and found that genes participating in the formation of structural and functional components of sperm were enriched in H3K4me3 marks, whereas those unrelated to spermatid differentiation were enriched in H3K9me3 marks. H3K4me3-enriched genes aligned to processes such as cilium development, spermatid differentiation, spermatid development, and germ cell development, which strongly supports the idea that the genes participating in sperm formation have an active chromatin state in the round spermatids. Functional annotation of these genes showed their roles in sperm development, sperm flagellum, 9 + 2 motile cilium, and ciliary parts. H3K4me3 peaks also showed enrichment of functions involving ubiquitin enzymes and mRNA binding. Interestingly, a study has identified more than 30 ubiquitinating enzymes as crucial regulators of spermatogenesis, and mRNA-binding proteins are required for their repression and active translation [22]. Ube2b knockout mice have been shown to present with male infertility, characterized by abnormalities in head shape and abnormal distribution of periaxonemal structures [23], and UBE2B mutations have been linked with human male infertility [24, 25]. The ubiquitin system has been shown to be critical for the morphogenesis and function of sperm organelles [26]. March7 ubiquitin ligase is highly expressed in the developing spermatids and is involved in tail formation.

A thorough literature search found that 134 genes enriched in H3K4me3 have established roles in sperm differentiation, development, and sperm maturation (Table 1). In particular, Acrbp and Fabp9 play roles in acrosome formation; Catsper2, Catsper3, and Catsper4 participate in hyperactivated sperm motility; Ccdc42, Ccdc63, and Ccdc181 participate in sperm flagella formation; Cftr participates in capacitation; Dnah1 and Dnah8 participate in cilia formation; Dync1h1, Dynll1, and Kdm3a take part in chromatin condensation and manchette formation; GAPDHS is a sperm-specific glycolytic protein; H1fnt encodes a testis-specific histone; Odf1, Odf3, Odf4, and Oaz3 encode sperm tail proteins; Tssk1b, Tssk2, and Tssk4 encode kinases regulating sperm motility and fertility; Adam2, Adam26a/4, and Adam3a encode proteins for sperm binding to the zona pellucida; and Tekt3, Tekt4, and Tekt5 encode flagella formation proteins (Table 1). More than one thousand other genes that appeared in the H3K4me3 ChIP profile but do not have defined roles in spermatogenesis are excellent candidates for investigating their roles in spermiogenesis.

Since H3K9me3 represents a repressive state of chromatin, we observed enrichment of GO terms largely unrelated to spermatogenesis or fertility, but related to distant processes such as neuron and synapse development. An appropriate level of H3K9me3-mediated inactivation of genes in pathways that do not participate in sperm differentiation is required for spermatogenesis to proceed normally. Accordingly, disturbances in this process have been found to affect spermatogenesis negatively. For example, a high level of H3K9me3 in germ cells has been linked to apoptosis [27], and pachytene spermatocytes lacking H3K9me3 show abnormal aggregation of chromosomes and improper synapsis [27, 28]. Various findings have shown that the removal of H3K9me3 enhances the efficiency of reprogramming by increasing the production rate and the number of pluripotent stem cells [29]. We found that 80% of the genes in the H3K9me3 list were not expressed in spermatids or showed very low expression. The H3K9me3 marks associated with the non-coding portions of the genome barricade them from transcriptional activation. Taken together, these observations indicate that the presence of repressive signals during spermatogenesis is as important as that of the activation marks. The literature search identified four [30,31,32,33] and two mouse studies [34, 35] on H3K4me3 and H3K9me3, respectively, and the comparison of data identified H3K4me3 marks on 178 genes (having roles in cell differentiation) to be common across these studies, while only 51 genes had common H3K9me3 marks (Fig. S6).

In comparison with the much higher percentage of genes with H3K4me3 or H3K9me3 marks, only a few genes (1%) had bivalent marks. Bivalent modifications are known to keep RNA polymerase II paused at the proximal promoter to maintain gene expression at low levels, but in the poised state. Epigenetic modifications are important in regulating gene expression not only during sperm differentiation but also during the post-fertilization development of the embryos [36]. Accordingly, we found that most of the genes with bivalent modifications have critical roles in post-fertilization development. The bivalent modifications may be a chromatin state to facilitate quick post-fertilization development by facilitating the process of confrontation and consolidation, which is followed by massive gene expression to support post-fertilization development. Earlier studies in sperm have shown that the co-occurrence of activating (H3K4me3) and repressive (H3K27me3) marks represents the bivalent genes, which include many transcription factor families such as Hox, Gata, Tbx, Sox, and Pax that have crucial roles in early embryogenesis [37]. Histone modification extends beyond methylation, and the real picture of the chromatin state is a puzzling chimera of acetylation, ubiquitination, phosphorylation, and other modifications such as butyrylation, crotonylation, malonylation, propionylation, and succinylation, in which histones also affect each other in defining the chromatin state that condenses as spermiogenesis progresses [4].

The correlation of H3K4me3 marks with gene expression in spermatids is interesting. We found that about 64% of genes enriched in H3K4me3 marks had transcripts in spermatids. Out of 134 spermatogenically important genes identified by H3K4me3 ChIP-seq, transcripts of 124 were present in the spermatid transcriptome. This indicates the importance of H3K4me3-based gene regulation in the round spermatids. A large number of these genes showed very high FPKM (expression) values in spermatids, with Oaz3, Odf1, Smcp, Gsg1, Fam229b, Odf2, Irgc, Spata6, Spata18, Hmgb4, Lqcf3, Ubb, Spata20, and Gapx4 being some of the top candidates. Only 413 (22.49%) of the H3K9me3-enriched genes had transcripts in spermatids, and most of them were detected at very low levels. The top FPKM value for any H3K9me3-enriched gene was comparatively low (3517 versus 29,279 for the top H3K4me3 transcript), with only 14 transcripts above 100 FPKM and only 53 transcripts above 10 FPKM. Therefore, H3K9me3 is associated with transcriptional silencing in the round spermatids. Genes with bivalent modifications were also examined in the spermatid transcriptome. We found that transcripts of 15 of the 53 (28.3%) genes were present in spermatids. Most of these genes were expressed at very low levels: the top FPKM was 3320, dropping sharply to 483 for the second candidate, and only two genes had FPKM above 100 and only 5 genes had FPKM above 10.

Of the 1800 genes identified as active in spermatids, transcripts of almost 100% were detected in sperm, suggesting that most of the transcripts found in the round spermatids are used for encoding proteins during spermiogenesis and that these transcripts also make their way to sperm. In an overall comparison of the spermatid and sperm transcriptomes, about 99% of transcripts were shared between the two. The importance of the genes with active and bivalent chromatin states is also evident from clinical reports on the genes identified; for example, DPY19L2 [38] and MFSD14A [39] are associated with round-headed sperm, i.e., globozoospermia. Mutations in the GALNTL5 gene reduce sperm motility [40], PLCZ1 mutations hamper oocyte activation and the initiation of embryonic development [41], PMFBP1 mutations are associated with acephalic spermatozoa syndrome [42], and non-stop mutations in the MAGEB4 gene cause X-linked azoospermia and oligospermia [43]. Over-expression of the TXNDC8 gene results in abnormally shaped human spermatozoa [44], and decreased expression of HSPA2 is seen in infertile men with [45] or without [46] varicocele. Nonetheless, the importance of the presence of many transcripts in sperm remains to be investigated, and their roles in post-fertilization development remain a strong proposition.

This study has identified a strong correlation between H3K4me3 and gene expression and between H3K9me3 and gene silencing. Sperm also retain a large number of histones, which do not get replaced by the protamines [16]. It would be interesting to determine which post-translational modifications the retained histones carry. These histone marks and a heterogeneous set of transcripts from the round spermatids guide further differentiation in the process of spermiogenesis. Most studies to date have examined H3K4me3 and H3K27me3 in combination and found that genes carrying both marks represent a poised state in sperm, which may play roles in early and late embryonic development [16]. This is the first study showing that H3K4me3 and H3K9me3 bivalent modifications in germ cells/round spermatids not only correlate with gene expression, but also facilitate a poised state, as these genes may have roles in post-fertilization development. The lack of replication experiments is a limitation of this study. Transcription is ultimately regulated by a number of mechanisms including histone methylation, but H3K methylation appears to be the major regulator of transcription in the round spermatids. Very likely, a number of other epigenetic modifications co-exist, giving rise to a multivalent state that regulates the overall state of gene expression to guide spermatogenesis and post-fertilization development.

Conclusions

H3K4me3 enrichment in the round spermatids correlates significantly with gene expression, and H3K9me3 correlates with gene silencing. This ensures the expression of all essential genes and the repression of all others. The bivalent state with these two histone marks keeps certain genes in a ready state, as they may soon be required in the later stages of spermatogenesis or in post-fertilization events. Generally, spermatids appear to be transcriptionally very active and show profuse expression of genes participating in sperm formation. The need for the chromatin to start condensing may explain the profuse gene expression activity seen in the round spermatids. Nevertheless, the level of transcriptional activity in later spermatid stages remains to be investigated. The transcript content of the round spermatids is poised to facilitate chromatin condensation and sperm head and tail formation, and to supply the functional proteins required for egg penetration as well as other proteins participating in flagellar formation, beating for motility, and generation of the energy required for active motility. Almost all the transcripts seen in this stage of spermatids are also seen in mature spermatozoa; understanding their functional significance may reveal other, as yet unexplained, aspects of the complex process of spermatogenesis. Since methylation at the K4 and K9 positions is not the only regulatory mechanism for gene expression, this ChIP-seq experiment may not have picked up all active regions of the chromatin, which is also reflected in the number of matches between ChIP-seq peaks and the transcripts in RNA sequencing. In summary, the histone marks in the round spermatids define the battery of genes that are expressed and accumulate their transcripts to facilitate spermiogenesis, silence other genes that are not required during this stage, and prepare yet another set of genes (poised) for post-fertilization functions.