Marchantia TCP transcription factor activity correlates with three-dimensional chromatin structure

From Nature Plants

Information in the genome is not only encoded within sequence or epigenetic modifications, but is also found in how it folds in three-dimensional space. The formation of self-interacting genomic regions, named topologically associated domains (TADs), is known as a key feature of genome organization beyond the nucleosomal level. However, our understanding of the formation and function of TADs in plants is extremely limited. Here we show that the genome of Marchantia polymorpha, a member of a basal land plant lineage, exhibits TADs with epigenetic features similar to those of higher plants. By analysing various epigenetic marks across Marchantia TADs, we find that these regions generally represent interstitial heterochromatin and their borders are enriched with Marchantia transcription factor TCP1. We also identify a type of TAD that we name ‘TCP1-rich TAD’, in which genomic regions are highly accessible and are densely bound by TCP1 proteins. Transcription of TCP1 target genes differs on the basis of gene location, and those in TCP1-rich TADs clearly show a lower expression level. In tcp1 mutant lines, neither TCP1-bound TAD borders nor TCP1-rich TADs display drastically altered chromatin organization patterns, suggesting that, in Marchantia, TCP1 is dispensable for TAD formation. However, we find that in tcp1 mutants, genes residing in TCP1-rich TADs show a greater extent of expression fold change than genes that do not belong to these TADs. Our results suggest that, besides standing as spatial chromatin-packing modules, plant TADs function as nuclear microcompartments associated with transcription factor activities.

Fig. 1: Topologically associated chromatin domains at Marchantia autosomes.
Fig. 2: Marchantia TCP1 is dispensable for TAD patterns.
Fig. 3: Some Marchantia TADs have intensive interactions with TCP1.
Fig. 4: TCP1-rich TADs are part of TCP1 protein speckles in the nucleus.
Fig. 5: Characteristics of TCP1-rich TADs concerning TCP1–chromatin interactions.
Fig. 6: Impact of loss of TCP1 on gene expression.

Data availability

Short read data of in situ Hi-C, ChIP–seq, ATAC–seq and RNA-seq are publicly available at NCBI Sequence Read Archive under accession number PRJNA597314.

Large datasets, including Hi-C matrices (2-kb bin size for individual chromosomes), integrated epigenetic marks, and ATAC–seq and ChIP–seq track files at 200-bp bin size, are available from the figshare repository. All figures presented in this manuscript are associated with these data.

The following public datasets were downloaded for coexpression analysis (with their accession numbers from the NCBI Sequence Read Archive): 11-day thalli (DRR050343, DRR050344, DRR050345), Archegoniophore (DRR050351, DRR050352, DRR050353), Antheridiophore (DRR050346, DRR050347, DRR050348), Antheridia (DRR050349, DRR050350), apical cell (SRR1553294, SRR1553295, SRR1553296), 13d-Sporophyte (SRR1553297, SRR1553298, SRR1553299), Sporelings 0 hr (SRR4450262, SRR4450261, SRR4450260), 24hr-Sporeling (SRR4450266, SRR4450265, SRR4450259), 48hr-Sporeling (SRR4450268, SRR4450264, SRR4450263), 72hr-Sporeling (SRR4450267, SRR4450258, SRR4450257), 96hr-Sporeling (SRR4450256, SRR4450255, SRR4450254), mock-inoculated plants, 1dpi (SRR7977545, SRR7977546, SRR7977548), mock-inoculated plants, 2dpi (SRR7977547, SRR7977550, SRR7977549), mock-inoculated plants, 3dpi (SRR7977552, SRR7977551, SRR7977554), mock-inoculated plants, 4dpi (SRR7977553, SRR7977556, SRR7977555), 1-month thallus (SRR6685782, SRR6685783, SRR6685784), Tak-1-1_Mp (SRR7772758, SRR7772757, SRR7772756, SRR7772755), Tak-1-2_Mp (SRR7772759, SRR7772761, SRR7772760, SRR7772762), Tak-1-3_Mp (SRR7772763, SRR7772764, SRR7772765, SRR7772766), and Mp-mock (SRR5905098, SRR5905099, SRR5905100). Source data are provided with this paper.

Code availability

All scripts used for pattern analysis are available upon request.


Acknowledgements

We thank S. Czemmel from the Center for Quantitative Biology (University of Tübingen) for their assistance with sequencing. We are grateful for inspiring discussions with members of the COST Action CA1612 INDEPTH network. We acknowledge computing support by the High Performance and Cloud Computing Group at the Zentrum für Datenverarbeitung of the University of Tübingen, the state of Baden-Württemberg through bwHPC and the German Research Foundation (DFG) through grant no. INST 37/935-1 FUGG. This work was supported by the Deutsche Forschungsgemeinschaft (LI 2862/4) and the European Research Council (ERC) under the European Union’s Horizon 2020 research and innovation programme (grant agreement no. 757600).

Author information

Authors and Affiliations



Contributions

C.L. conceived and designed the experiments. E.S.K., N.W., N.F. and H.B. established and characterized transgenic lines. E.S.K. performed ChIP–seq, ATAC–seq and RNA-seq experiments. N.W. performed FISH and immunostaining experiments. Y.L. performed coexpression analysis. S.A.M. and F.B. performed epigenomic profiling. K.W.B. performed nuclei sorting. C.L., E.S.K. and S.L. performed Hi-C experiments. E.S.K. and C.L. wrote the manuscript with contributions from other authors. All authors read and accepted the final version of the manuscript.

Corresponding author

Correspondence to Chang Liu.

Ethics declarations

Competing interests

The authors declare no competing interests.

Additional information

Peer review information Nature Plants thanks Stefan de Folter, Stefan Grob and the other, anonymous, reviewer(s) for their contribution to the peer review of this work.

Publisher’s note Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Extended data

Extended Data Fig. 1 The Marchantia genome has different types of TADs.

a, Epigenetic marks across Marchantia TADs. ‘mCG-poor’ and ‘mCG-rich’ TADs are shown in green and brown curves, respectively. b, Comparison of gene expression according to gene location. Thalli transcriptome data were taken from a published dataset (doi: 10.1093/pcp/pcw020). Boxplots from left to right: n = 16707, 2264, 972, 215 and 575. c, Chromatin accessibility across ‘mCG-poor’ (green) and ‘mCG-rich’ (brown) TADs. d, Clustering analysis of TADs according to histone marks. The epigenetic profiling of various histone marks was from our previous study (doi: 10.1016/j.cub.2019.12.015).

Extended Data Fig. 2 Co-expression analysis.

a, Clustering of transcriptome datasets used for calculating gene co-expression. The dendrogram shows hierarchical clustering based on Euclidean distance. b, Distribution of expression correlation coefficients as a function of distance. The distance of a given gene pair was determined according to their annotated TSSs. Boxplots from left to right: n = 6455, 9068, 9238, 9165, 8992, 8931, 8832, 9053, 8736, and 8793. c, d, TADs contain more co-expressed genes than expected. c, For all the gene pairs located in the same TAD, the fraction of co-expressed gene pairs (q < 0.05) was computed and indicated by a black dot at the top of the panel (highlighted with an arrow). The boxplot with twenty blue data points denotes results in which TADs were randomly assigned to the genome. The p-value is an empirical p-value calculated based on twenty simulations. d, Numbers of co-expressed and not co-expressed gene pairs in each round of shuffled TADs (simulation) are shown. As TADs are gene-poor (Fig. 1c), randomly shuffling TADs leads more genes (hence gene pairs) to overlap with TADs. The boxplots in b and c indicate the median (line within the box), the lower and upper quartiles (box), margined by the largest and smallest data points which are still within the interval of 1.5 times the interquartile range from the box (whiskers); outliers are not shown.
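
The empirical p-value mentioned in panel c can be illustrated with a minimal sketch. It assumes a one-sided test (simulated fraction at least as large as the observed one) and the common add-one correction; neither detail is stated in the legend, and the function name and numbers are hypothetical.

```python
def empirical_p_value(observed, simulated):
    """One-sided empirical p-value for an observed statistic.

    Counts simulations whose statistic is at least as large as the
    observed one. The +1 in numerator and denominator is an assumed
    convention that keeps the estimate above zero; the legend does not
    say which convention the authors used.
    """
    n_extreme = sum(1 for s in simulated if s >= observed)
    return (n_extreme + 1) / (len(simulated) + 1)


# Hypothetical fractions of co-expressed gene pairs: one observed value
# and twenty values from randomly placed TADs.
observed_fraction = 0.30
shuffled_fractions = [0.10 + 0.005 * i for i in range(20)]

p = empirical_p_value(observed_fraction, shuffled_fractions)
print(f"empirical p = {p:.4f}")
```

Under this convention, twenty simulations cannot yield a p-value below 1/21, which is why the number of simulations bounds the resolution of such a test.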

Extended Data Fig. 3 Motif analysis links TCP1 to TAD borders.

a, A phylogenetic tree of DNA binding domains of TCP proteins in Arabidopsis thaliana (green), Oryza sativa (yellow), and Marchantia polymorpha (blue). b, Alignment of the TCP DNA-binding domain sequences from two Marchantia TCP proteins and founder members of the TCP family. c–e, Motif analysis of Marchantia TAD borders. The scatter plot in c shows enrichment, at TAD boundary regions, of motifs recognized by various plant transcription factors (position weight matrices are from the Arabidopsis DAP-seq dataset). The fold enrichment of a motif was calculated as the relative density of this motif in a 2 kb region overlapping with TAD borders compared to that in the 20 kb region flanking TAD borders. The presence of motifs in query DNA sequences was determined by the ‘matchPWM’ function in the ‘Biostrings’ package in R, with the search stringency set to 85%. Red and black dots depict motifs recognized by class I and II TCP members, respectively. d, e, Analysis of TCP class I (d) and class II (e) consensus binding sites based on text search. Only sequences exactly matching the query motif were counted. Note that these consensus sequences, determined by Kosugi and Ohashi, partially overlap.
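
The fold-enrichment calculation described for panel c can be sketched as follows. This is a simplified Python stand-in, not the authors' R code: it scans the forward strand only, treats the 85% stringency as a fraction of the maximum attainable matrix score, and uses a toy three-column matrix. All names and numbers are hypothetical.

```python
def pwm_hits(seq, pwm, min_frac=0.85):
    """Count windows scoring at least min_frac of the best possible score.

    pwm is a list of dicts, one per motif position, mapping base -> weight.
    Only the forward strand is scanned (an assumption of this sketch).
    """
    width = len(pwm)
    max_score = sum(max(col.values()) for col in pwm)
    threshold = min_frac * max_score
    hits = 0
    for i in range(len(seq) - width + 1):
        window = seq[i:i + width]
        score = sum(col.get(base, 0.0) for col, base in zip(pwm, window))
        if score >= threshold:
            hits += 1
    return hits


def fold_enrichment(border_hits, border_bp, flank_hits, flank_bp):
    """Motif density at borders divided by motif density in flanks."""
    if flank_hits == 0:
        raise ValueError("no motif hits in flanking region")
    # (border_hits / border_bp) / (flank_hits / flank_bp), rearranged
    # so that only one floating-point division is performed.
    return (border_hits * flank_bp) / (border_bp * flank_hits)


# Toy matrix whose consensus is "ACG".
toy_pwm = [
    {"A": 1.0, "C": 0.0, "G": 0.0, "T": 0.0},
    {"A": 0.0, "C": 1.0, "G": 0.0, "T": 0.0},
    {"A": 0.0, "C": 0.0, "G": 1.0, "T": 0.0},
]
print(pwm_hits("ACGTTACGA", toy_pwm))
# Hypothetical counts: 4 hits in a 2 kb border window, 10 in 20 kb of flank.
print(fold_enrichment(4, 2000, 10, 20000))
```

Normalizing by region length is what makes the 2 kb and 20 kb windows comparable; raw hit counts alone would favour the larger flanking region.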

Extended Data Fig. 4 Genome-wide identification of TCP1 target regions.

a, Verification of anti-TCP1 antibody for immunoprecipitation. TCP1:GFP fusion proteins under the control of the 35S promoter were expressed transiently in Nicotiana benthamiana leaves. The presence of TCP1:GFP in each sample was examined using an anti-GFP antibody. Similar results were observed in two independent experiments. b, Snapshot showing the distribution of ChIP-seq reads in different samples. tcp1 represents a TCP1 knock-out line (in Tak-1 background). See Methods for details of knock-out line generation. c, Venn diagram of genomic regions enriched in each biological replicate. Below the Venn diagram, J(rep1, rep2) indicates the Jaccard index. d, Epigenetic marks across TCP1-bound chromatin regions (grey block).
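
The Jaccard index in panel c can be illustrated with a short sketch. It assumes the index is computed on base pairs covered (intersection over union) for half-open intervals on a single chromosome; the legend does not specify the exact procedure, and the peak coordinates below are hypothetical.

```python
def merge(intervals):
    """Merge overlapping or touching half-open (start, end) intervals."""
    merged = []
    for start, end in sorted(intervals):
        if merged and start <= merged[-1][1]:
            merged[-1][1] = max(merged[-1][1], end)
        else:
            merged.append([start, end])
    return merged


def covered_bp(intervals):
    """Number of base pairs covered by a set of intervals."""
    return sum(end - start for start, end in merge(intervals))


def jaccard(rep1, rep2):
    """Base-pair Jaccard index: |A ∩ B| / |A ∪ B|."""
    union = covered_bp(list(rep1) + list(rep2))
    # Inclusion-exclusion: |A ∩ B| = |A| + |B| - |A ∪ B|
    intersection = covered_bp(rep1) + covered_bp(rep2) - union
    return intersection / union if union else 0.0


# Hypothetical peak sets from two replicates on one chromosome.
rep1_peaks = [(0, 100), (200, 300)]
rep2_peaks = [(50, 150), (200, 300)]
print(jaccard(rep1_peaks, rep2_peaks))
```

A value of 1 means the two replicates cover identical regions, and 0 means they share no covered base pairs.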

Extended Data Fig. 5 Comparison of insulation scores of chromatin regions around wild-type and tcp1 mutant TAD borders.

a, Comparison of insulation scores in wild-type TAD borders with those in tcp1 no. 18-1 (blue plots) or in tcp1 no. 24-8 (green plots). See Fig. 2e legend for the definition of ‘TCP1-bound’ TAD borders. The titles above these plots, which indicate bin positions, match those under the boxplots in Fig. 2e. If TCP1 played a structural role at TCP1-bound TAD borders, we would expect its removal to cause specific changes in the insulation scores of these regions compared to regions not bound by TCP1. The violin plots in this panel show the distribution of changes in insulation scores in the mutant Hi-C maps. For each comparison (that is, changes in insulation scores of TCP1-bound TAD borders vs. TCP1-free TAD borders), the p-value from a two-sided Mann-Whitney U test is given. To assess effect size, Cohen’s d (c’d) is also given below each p-value. In general, the difference between two populations is considered ‘trivial’ or ‘negligible’ when the absolute value of Cohen’s d is less than 0.2. b, Metagene plots showing chromatin contacts around TAD border regions. Pixels in the plots stand for 2-kb bins in the Hi-C matrices. For each plot, TAD borders are aligned and indicated with a dotted triangle. After careful inspection, we conclude that the differences in chromatin organization between TCP1-bound TAD borders and TCP1-free TAD borders are comparable in Tak-1 and tcp1, and that loss of TCP1 does not lead to drastic structural changes in TCP1-bound TAD boundaries.
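
The effect-size criterion in panel a can be made concrete with a minimal sketch of Cohen’s d. It assumes the pooled-standard-deviation form of the statistic, which the legend does not spell out, and the insulation-score changes below are hypothetical.

```python
from math import sqrt
from statistics import mean, variance


def cohens_d(group_a, group_b):
    """Cohen's d using the pooled sample standard deviation (assumed form)."""
    n_a, n_b = len(group_a), len(group_b)
    pooled_var = (
        (n_a - 1) * variance(group_a) + (n_b - 1) * variance(group_b)
    ) / (n_a + n_b - 2)
    return (mean(group_a) - mean(group_b)) / sqrt(pooled_var)


def is_negligible(d, cutoff=0.2):
    """Apply the |d| < 0.2 rule of thumb quoted in the legend."""
    return abs(d) < cutoff


# Hypothetical changes in insulation score at two classes of TAD borders.
tcp1_bound = [0.01, -0.02, 0.03, 0.00, -0.01]
tcp1_free = [0.02, -0.01, 0.02, 0.01, -0.02]

d = cohens_d(tcp1_bound, tcp1_free)
print(f"Cohen's d = {d:.3f}, negligible: {is_negligible(d)}")
```

Reporting d alongside the p-value matters here because, with many borders, a Mann-Whitney U test can return a small p-value for a difference too small to be biologically meaningful.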

Extended Data Fig. 6 Epigenetic and transcriptional profiling of TCP1-rich TADs.

Comparison of various epigenetic marks (a) and chromatin accessibility (b) between TCP1-rich TADs (blue curves) and the remaining TADs (gray curves) belonging to the ‘mCG-poor’ category. c, d, Comparison of genes (c) and repeats (d) in different TADs. As in panel b, the blue and gray curves denote TCP1-rich TADs and the remaining TADs in the ‘mCG-poor’ category, respectively. The brown curves denote ‘mCG-rich’ TADs. Labels are the same as in Fig. 1b.

Extended Data Fig. 7 Changes in gene expression in relation to TCP1 binding, and the distribution of differentially expressed genes in tcp1 in relation to their location.

a, Changes in expression of genes bound by TCP1. Only genes with their gene bodies (defined as their transcribed region plus 0.5 kb flanking regions) overlapping with TCP1 ChIP-seq peaks are included in this plot. b, Distribution of gene expression changes in tcp1 mutants. All the genes from the genome are divided into four groups according to the extent to which they overlap with TCP1 ChIP-seq peaks. c, d, Differentially expressed genes bound (c) and not bound (d) by TCP1 are divided into different groups. The p-value is from a two-sided Fisher’s exact test. The term ‘regular TADs’ in these two panels refers to TADs that are not annotated as TCP1-rich.
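
The two-sided Fisher’s exact test used in panels c and d can be sketched for a 2×2 table as follows. The sketch assumes the usual definition of the two-sided p-value (summing the probabilities of all tables no more likely than the observed one); the counts are hypothetical and are not taken from the paper.

```python
from math import comb


def fisher_exact_two_sided(a, b, c, d):
    """Two-sided Fisher's exact test for the table [[a, b], [c, d]].

    Sums hypergeometric probabilities of all tables with the same
    margins that are no more probable than the observed table.
    """
    row1, row2, col1 = a + b, c + d, a + c
    total = row1 + row2

    def prob(x):
        # Probability that the top-left cell equals x given the margins.
        return comb(row1, x) * comb(row2, col1 - x) / comb(total, col1)

    p_observed = prob(a)
    lowest, highest = max(0, col1 - row2), min(row1, col1)
    # Small relative tolerance guards against floating-point ties.
    return sum(
        prob(x)
        for x in range(lowest, highest + 1)
        if prob(x) <= p_observed * (1 + 1e-7)
    )


# Hypothetical counts: rows are genes in TCP1-rich TADs vs regular TADs,
# columns are differentially expressed vs not differentially expressed.
p_value = fisher_exact_two_sided(12, 8, 30, 150)
print(f"p = {p_value:.3g}")
```

An exact test is a natural choice for such tables because some gene groups can be small, where a chi-squared approximation would be unreliable.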

Extended Data Fig. 8 Motif analysis of Marchantia TAD borders and TADs.

a, Motif analysis of Marchantia TAD borders. This plot is the same as Extended Data Fig. 3c, but highlighting motifs of a few transcription factor families. b, Motif analysis of Marchantia mCG-poor and TCP1-rich TAD bodies. Fold enrichment was calculated as the ratio of motif density in TADs over that in 30 kb flanking genomic regions. Other than that, the motif search was performed as for Extended Data Fig. 3c.

Supplementary information

Supplementary Information

Supplementary Figs. 1–6.

Reporting Summary

Supplementary Table 1

Number of Hi-C reads.

Supplementary Table 2

TAD annotation.

Supplementary Table 3

Motifs at TAD borders and within mCG-rich TADs.

Supplementary Table 4

TCP1 ChIP–seq peaks.

Supplementary Table 5

Motifs at TCP1-bound chromatin.

Supplementary Table 6

RNA-seq count table.

Supplementary Table 7

Motif position weight matrices.

Supplementary Table 8

ATAC–seq peaks.

Supplementary Table 9

FISH oligonucleotides.

Source data

Source Data Fig. 2

Statistical source data, related to Fig. 2.

Source Data Fig. 3

Statistical source data, related to Fig. 3.

Source Data Fig. 5

Statistical source data, related to Fig. 5.

Source Data Fig. 6

Statistical source data, related to Fig. 6.

Source Data Extended Data Fig. 2

Statistical source data, related to Extended Data Fig. 2.

Source Data Extended Data Fig. 4

Unprocessed western blots, related to Extended Data Fig. 4.

Source Data Extended Data Fig. 5

Statistical source data, related to Extended Data Fig. 5.

