Biological context

Matrin-3, which was first identified as a protein component of the nuclear matrix (Nakayasu and Berezney 1991; Belgrader et al. 1991), is a multifunctional protein. As a DNA-binding protein, it regulates transcription (Belgrader et al. 1991; Hibino et al. 1993a, b, 1998, 2000; Niimori-Kita et al. 2018). Furthermore, as an RNA-binding protein, it plays important roles in mRNA metabolism including splicing, transport, stabilization, and degradation (Salton et al. 2011; Kula et al. 2013; Uemura et al. 2017; Coelho et al. 2015; Boehringer et al. 2017; Banerjee et al. 2017; Ahmed and Barmada 2021).

Matrin-3 is well conserved among vertebrates. Human Matrin-3 is composed of 847 amino-acid residues and possesses two CCHH-type zinc finger (ZF) domains (spanning residues H291-Y326 for ZF1 and P797-T847 for ZF2), which were reportedly responsible for the DNA-binding activity of Matrin-3 (Hibino et al. 2000). In the peptide region between the two ZF domains, there are two tandemly linked RNA recognition motif (RRM) domains spanning residues R398-I477 and R496-V576 (hereafter referred to as RRM1 and RRM2, respectively) (Supplementary Fig. 1a) (Hibino et al. 2006). It has been reported that RRM2 can bind to the RNA sequence 5′-AUCUU-3′ (Ray et al. 2013), but RRM1 does not show any substantial RNA-binding activity (Ayala et al. 2005; Kuo et al. 2009; Buratti and Baralle 2001). On the other hand, enhanced cross-linking immunoprecipitation (eCLIP) experiments identified an additional Matrin-3-bound RNA sequence distinct from the pyrimidine-rich sequence described above (Ramesh et al. 2020a, b; Van Nostrand et al. 2016). The remaining parts of Matrin-3, namely the N-terminal region and the region between RRM2 and ZF2, are intrinsically disordered regions, termed N- and C-IDR, respectively.

Recently, Matrin-3 has attracted considerable attention, since functional abnormalities of Matrin-3 cause hard-to-treat human neuromuscular diseases, namely amyotrophic lateral sclerosis and frontotemporal dementia (ALS/FTD) (Brown and Al-Chalabi 2017; Ito et al. 2017; Xue et al. 2020; Ahmed and Barmada 2021; Malik and Barmada 2021). For familial ALS, a growing number of mutations have been identified in several genes. In particular, pathogenic mutants of TAR DNA-binding protein of 43 kDa (TDP-43) and fused in sarcoma (FUS) form insoluble granules in the cytoplasm, leading to neuronal cell death (Kamelgarn et al. 2016; Picchiarelli and Dupuis 2020). Matrin-3 has also been associated with familial ALS. The S85C, F115C, P154S, and T622A mutants within the N- and C-IDR of Matrin-3 have been identified as pathogenic (Lin et al. 2015; Leblond et al. 2016; Xu et al. 2016; Marangi et al. 2017). In these cases, inclusion bodies containing wild-type TDP-43 together with the mutated Matrin-3 are frequently formed in the cytoplasm. Moreover, even in sporadic ALS cases, the insoluble granules of wild-type TDP-43 formed in the neuronal cytoplasm also frequently contain wild-type Matrin-3 (Tada et al. 2018).

In normal neurons, Matrin-3 adopts a granular nuclear localization. However, in spinal cord samples obtained from patients with ALS, Matrin-3 diffuses into the cytoplasm and is involved in the formation of insoluble granules in spinal motor neurons. Recent studies have suggested that the formation of cytoplasmic aggregates of TDP-43 depends on liquid–liquid phase separation (LLPS) and that dysfunctional Matrin-3 affects the LLPS of TDP-43. In this case, it has been deduced that the solubility of Matrin-3 is increased upon interaction with RNA, which suppresses the formation of an abnormal LLPS (Maharana et al. 2018; Gallego-Iradi et al. 2019; Česnik et al. 2020).

On the other hand, it has also been reported that the RNA-binding activity of Matrin-3 could accelerate ALS pathogenesis. The G4C2 repeat expansion in the first intron of the C9orf72 gene is the most common genetic cause of ALS. When its abnormally transcribed RNA molecules are translocated into the cytoplasm, they are speculated to recruit several RNA-binding proteins, including Matrin-3, into insoluble granules through protein-RNA interactions (DeJesus-Hernandez et al. 2011; Renton et al. 2011; Ramesh et al. 2020a).

These findings suggest that the RNA-binding activity mediated by the RRMs of Matrin-3 could play an important role in the formation of insoluble granules in the cytoplasm. However, it is not yet clear how this RNA-binding activity contributes to ALS pathogenesis, which has been hypothesized to occur through protein–protein and/or protein-RNA interactions (Kamelgarn et al. 2016; Iradi et al. 2018; Malik et al. 2018; Ramesh et al. 2020b).

Here we report the 1H, 13C, and 15N chemical shift assignments and the solution structures of the two RRM domains of mouse Matrin-3, whose RRM amino-acid sequences are identical to those of the corresponding human domains. These assignments and the structural information obtained in this work will provide insight into the neurodegenerative diseases associated with Matrin-3.

Methods and experiments

Sample preparation

A clone of mouse Matrin-3 was utilized, as the primary sequences of the regions corresponding to RRM1 and RRM2 are identical between humans and mice. In our study, a cDNA clone with a natural variation (an S397R mutation), which appeared at a position just preceding the RRM1 region, was used for plasmid construction. Mouse Matrin-3 is composed of 846 amino-acid residues. The protein samples used for the NMR experiments were RRM1 and RRM2 of mouse Matrin-3, corresponding to residues Q390-K478 (RRM1) and K478-V576 (RRM2), respectively (Supplementary Fig. 1b). The folding states of the proteins were checked by 2D 1H–15N HSQC experiments with 15N-labeled samples (Kigawa et al. 2004), and we could produce the two RRM domains in soluble forms.

15N/13C-labeled Matrin-3 RRM1 and RRM2 were synthesized using an Escherichia coli cell-free protein synthesis system (Kigawa et al. 2004; Matsuda et al. 2007) and treated and purified as described previously (Li et al. 2008). The samples were expressed as N-terminal His-tagged fusion proteins. The fusion proteins were purified using a Ni–NTA affinity column. The His-tag was released by TEV protease cleavage and the two RRM domains were further purified using Superdex-75 gel filtration chromatography (GE Healthcare). For structure determination, uniformly 15N/13C-labeled RRM samples were concentrated to nearly 1.0 mM in 20 mM Tris–HCl (Tris-d6) buffer (pH 7.0), containing 100 mM NaCl, 1 mM dithiothreitol, and 0.02% NaN3 with the addition of 2H2O to 10% v/v.

NMR spectroscopy and structure calculations

All NMR data were acquired at 298 K on Bruker 600 MHz and Bruker 800 MHz spectrometers and processed with NMRPipe software (Delaglio et al. 1995). Two-dimensional 1H–13C and 1H–15N HSQC spectra, three-dimensional HNCO, HN(CA)CO, HNCA, HN(CO)CA, HNCACB, CBCA(CO)NH, HBHA(CO)NH, H(CCCO)NH, (H)CC(CO)NH, HCCH-TOCSY, HCCH-COSY, CCH-TOCSY and NOESY spectra (Clore and Gronenborn 1998; Cavanagh et al. 2007) were used to assign all carbon, nitrogen, and hydrogen atoms of the proteins.

NOE peaks from the 15N and 13C-edited 3D NOESY spectra with 80 ms mixing time were converted to distance restraints for the structure calculations of Matrin-3 RRM1 and RRM2. The three-dimensional structures of the proteins were determined by combined automated NOESY cross peak assignment and structure calculation with torsion angle dynamics (Herrmann et al. 2002) implemented in the program CYANA 2.1 (Güntert et al. 1997). The dihedral angle restraints for ϕ and ψ were obtained from the main-chain and the 13Cβ chemical shift values using the program TALOS (Cornilescu et al. 1999) and by analyzing the NOESY spectra. Stereospecific assignments for isopropyl methyl and methylene groups were determined based on the patterns of the inter- and intra-residual NOE intensities (Powers et al. 1993). For each RRM, the structure calculations started from 200 randomized conformers using the standard CYANA simulated annealing schedule, with 40,000 torsion angle dynamics steps per conformer (Güntert and Buchner 2015). Among them, the 20 structures with the lowest CYANA target function values were deposited in the Protein Data Bank (accession codes: 1X4D for RRM1 and 1X4F for RRM2).

Further refinements by restrained molecular dynamics followed by restrained energy minimization were performed for the 40 conformers with the lowest final CYANA target function values, using the Amber12 program with the Amber 2012 force field and a generalized Born model (Case et al. 2005), as described previously (Tsuda et al. 2011). Finally, the 20 conformers with the lowest Amber energy values were selected. They were deposited in the Protein Data Bank (accession codes: 7FBR for RRM1 and 7FBV for RRM2). PROCHECK-NMR (Laskowski et al. 1996) and MOLMOL (Koradi et al. 1996) were used to validate and to visualize the final structures, respectively.
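The two-stage selection described above (rank by CYANA target function, refine the best 40, then keep the 20 with the lowest Amber energy) can be sketched as follows. This is only an illustration of the bookkeeping: the conformer records, the score values, and the `refine` placeholder are hypothetical and do not reproduce CYANA or Amber.

```python
def lowest(conformers, score_key, n):
    """Return the n conformers with the smallest value of score_key."""
    return sorted(conformers, key=lambda c: c[score_key])[:n]


def refine(conformer):
    """Placeholder for restrained MD plus energy minimization (no real physics)."""
    refined = dict(conformer)
    # Hypothetical energy: an arbitrary deterministic function of the conformer id.
    refined["amber_energy"] = -3000.0 + (conformer["id"] * 7) % 13
    return refined


# 200 hypothetical conformers with made-up, unique target function values.
conformers = [
    {"id": i, "target_function": ((i * 37) % 200) / 100.0} for i in range(200)
]

# Stage 1: the 40 conformers with the lowest target function go to refinement.
for_refinement = lowest(conformers, "target_function", 40)

# Stage 2: after refinement, the 20 with the lowest energy form the final bundle.
final_bundle = lowest([refine(c) for c in for_refinement], "amber_energy", 20)
```

In a real workflow the two scores would be read from the CYANA and Amber output files rather than generated in the script.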

Extent of resonance assignments

The assigned 1H-15N HSQC spectra of RRM1 and RRM2 are depicted in Fig. 1a and b. In the case of RRM1, the backbone resonance assignments were almost complete, except for the amide protons and nitrogen atoms of Val399, Lys409, Asn410, Lys433, Gln472, Lys473, and Arg476. In total, 98.8%, 100%, and 94.3% of the Cα, Cβ, and C′ chemical shifts were determined, respectively. Furthermore, the chemical shifts of the side-chain resonances, except for the Cδ and Cε protons of Tyr454 and the side-chain NH2 resonances of Gln390, were also assigned. The backbone and side-chain resonance assignments for RRM2 are complete except for the amide protons and nitrogen atoms of Gln486, Lys487, Asp489, Glu493, His505, Gly507, Met531, and Lys573, the side-chain resonances of Lys479, Lys483, Gln486, Arg530, and Lys573, and the Cζ protons of Phe536. As described below, the N-terminal segment spanning residues 478–496 adopted a disordered structure, which caused the missing backbone resonances. In total, 98.9%, 98.9%, and 91.9% of the Cα, Cβ, and C′ chemical shifts were determined, respectively. For both RRM domains, all X-Pro peptide bonds were confirmed to be in the trans conformation.

Fig. 1

1H-15N HSQC spectra of the two RRM domains of Matrin-3 a RRM1 and b RRM2. Signals are labeled with their assignments. Both data sets were acquired on Bruker 600 MHz spectrometers by the States-TPPI method with the water-flip back pulse sequence. The red-colored peak of V564 is aliased

The quality of the NOESY spectra of RRM1 and RRM2 is appropriate for straightforward structure calculation. In the 15N- and 13C-edited 3D NOESY spectra, 2063 non-redundant distance restraints including 758 long-range distance restraints for RRM1, and 1905 non-redundant distance restraints including 707 long-range distance restraints for RRM2, were identified. The backbone torsion angle restraints calculated by the TALOS program (Cornilescu et al. 1999) were also used for structure calculations with the program CYANA 2.1 (Herrmann et al. 2002; Güntert et al. 1997; Güntert 2004) and Amber12 (Case et al. 2005). The main chains of the calculated structures of RRM1 and RRM2 were fitted for residues 398–471 and 496–569, respectively (these regions correspond to the canonical secondary structural elements of the RRM domain). Bundles of 20 conformers representing the solution structures of RRM1 and RRM2 are shown in Fig. 2a and b, respectively. Their precision is characterized by RMSD values to the mean coordinates of 0.25 Å for the backbone atoms and 0.93 Å for all heavy atoms of RRM1, and 0.31 Å for the backbone atoms and 1.18 Å for all heavy atoms of RRM2. For these regions, the structural quality of both RRMs is also reflected in the fact that 100.0% of the (ϕ,ψ) backbone torsion angle pairs are in the most favored and additionally allowed regions of the Ramachandran plot, according to the program PROCHECK-NMR (Laskowski et al. 1996). Statistics regarding the quality and precision of the final 20 best conformers that represent the solution structures of RRM1 and RRM2 are given in Supplementary Table I.
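For readers unfamiliar with the precision measure quoted above, the following minimal sketch shows one common way to compute an "RMSD to the mean coordinates" for a bundle of conformers. It assumes that the conformers have already been superposed on the fitted residue range and that each conformer is a list of (x, y, z) tuples for the same atoms in the same order; the averaging convention (mean of the per-conformer RMSDs) is an assumption here and may differ in detail from what CYANA or MOLMOL report.

```python
import math


def mean_coordinates(ensemble):
    """Average each atom position over all conformers."""
    n_conf = len(ensemble)
    n_atoms = len(ensemble[0])
    return [
        tuple(sum(conf[i][k] for conf in ensemble) / n_conf for k in range(3))
        for i in range(n_atoms)
    ]


def rmsd(a, b):
    """Root-mean-square deviation between two equally ordered atom lists."""
    squared = sum(
        (pa[k] - pb[k]) ** 2 for pa, pb in zip(a, b) for k in range(3)
    )
    return math.sqrt(squared / len(a))


def rmsd_to_mean(ensemble):
    """Mean over conformers of the RMSD between each conformer and the mean structure."""
    mean = mean_coordinates(ensemble)
    values = [rmsd(conf, mean) for conf in ensemble]
    return sum(values) / len(values)


# Toy ensemble (hypothetical coordinates): two conformers of a two-atom fragment.
toy = [
    [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0)],
    [(0.0, 0.0, 2.0), (1.0, 0.0, 2.0)],
]
print(rmsd_to_mean(toy))
```

Restricting the atom lists to backbone atoms or to all heavy atoms of the fitted residues would give the two kinds of values reported in the text.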

Fig. 2

Solution structures of the two RRM domains of Matrin-3. Best-fit superposition of the backbone atoms from the 20 structures of Matrin-3 a RRM1 and b RRM2 with the lowest energy, as calculated by CYANA2.1 and Amber12. Ribbon presentation of the lowest energy structure of Matrin-3 c RRM1 and d RRM2. The helices, β-strands, β′–β″ hairpin and loop regions are shown in red, cyan, green and gray, respectively. In addition, the regions corresponding to the C-terminal extensions specific for the PTBP-1 subgroup (Gln472–Ile477 for RRM1 and Glu570-Leu575 for RRM2) are colored brown. Electrostatic surface presentation of Matrin-3 e RRM1 and f RRM2 in the same view as (c) and (d). The back surfaces of the structures are shown for g RRM1 and h RRM2 (surface models (e) and (f) are rotated 180 degrees around the vertical axis). Blue and red represent positive and negative electrostatic surface potentials, respectively

Solution structures of the two RRM domains of Matrin-3

Matrin-3 RRM1 and RRM2 each adopt a β1–α1–β2–β3–α2–β4 topology (β1:V399–M403, α1:N410–V419, β2:I425–L431, β3:E436–E440, α2:T444–T456, and β4:R467–L470 for RRM1, and β1:V497–S501, α1:D510–A517, β2:I523–M529, β3:Q534–E538, α2:R542–K554, and β4:K565–L568 for RRM2) (Supplementary Fig. 1 and Fig. 2a–d). As in a canonical RRM fold, the four β-strands form an antiparallel β-sheet with the order of β4–β1–β3–β2. The α1 and α2 helices are packed against the β-sheet structure. The α2 helix and the β′–β″ hairpin structures (L460–P465 for RRM1, and W558–C563 for RRM2) associate with the loop between the β1 strand and the α1 helix (Fig. 2c, d).
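The residue ranges listed above can be collected into a small lookup table, which makes it easy to check which element a residue mentioned later in the text belongs to. The sketch below is illustrative only: the dictionary and function names are hypothetical, and residues outside the listed strands and helices (including the β′–β″ hairpin) are simply reported as "other".

```python
# Secondary structure boundaries (inclusive residue numbers) copied from the text.
RRM1_ELEMENTS = {
    "β1": (399, 403), "α1": (410, 419), "β2": (425, 431),
    "β3": (436, 440), "α2": (444, 456), "β4": (467, 470),
}
RRM2_ELEMENTS = {
    "β1": (497, 501), "α1": (510, 517), "β2": (523, 529),
    "β3": (534, 538), "α2": (542, 554), "β4": (565, 568),
}


def element_of(residue, elements):
    """Return the strand or helix containing the residue, or "other"."""
    for name, (start, end) in elements.items():
        if start <= residue <= end:
            return name
    return "other"


def topology(elements):
    """Element names ordered from N- to C-terminus."""
    return [name for name, _ in sorted(elements.items(), key=lambda kv: kv[1][0])]


print(topology(RRM1_ELEMENTS))         # order of elements along the sequence
print(element_of(438, RRM1_ELEMENTS))  # F438 of RRM1
print(element_of(536, RRM2_ELEMENTS))  # F536 of RRM2
```

With these ranges, F438 and F536 both fall within the β3 strand, and D404 of RRM1 lies immediately after β1, outside any listed element.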

For both RRMs, some amino-acid residues in the C-terminal regions just following the β4-strand do not show signals in the 1H-15N HSQC spectra. Thus, the C-terminal regions were determined less well than the core of the RRM-fold in the structural calculations and did not show distinct secondary structure elements. However, the C-terminal regions were located near the β-sheet surface by NOEs between hydrophobic amino-acid residues (Y474 and I477 of RRM1, and Y572 and L575 of RRM2, Supplementary Figs. 1 and 2) and hydrophobic amino-acid residues in the β-sheets (V399, L429, and F438 of RRM1, and V497, I527, and F536 of RRM2) respectively. The C-terminal extension of PTBP1 RRM1 also exhibited the same structural feature, as reported previously (Oberstrass et al. 2005) (Supplementary Fig. 2).

Analysis using the DALI protein structural comparison server (http://ekhidna2.biocenter.helsinki.fi/dali/) showed that the overall structures of Matrin-3 RRM1 and RRM2 are very similar to the first RRM domain of PTBP1 (Z-score: 10.9, RMSD: 1.76 Å for the Cα atoms of matched residues in the best 3D superposition, PDB ID: 2N3O) (Supplementary Fig. 2a) and the first RRM domain of hnRNP L (Z-score: 10.3, RMSD: 1.76 Å for the Cα atoms of matched residues in the best 3D superposition, PDB ID: 2MQL). Structure superpositions of Matrin-3 RRM1, RRM2, and PTBP1 RRM1 (Oberstrass et al. 2005) revealed the characteristic structural features of these RRMs, as described below (for comparison, the structure of Musashi-1 RRM1 is shown in Supplementary Fig. 2b as an example of a “standard” RRM domain). As previously pointed out (Blatter et al. 2015), these RRMs constitute a subgroup (hereafter referred to as the PTBP-1 subgroup).

First, the α2 helices of the PTBP-1 subgroup members are longer than in average RRM-folds. In most RRMs, the α2 helices are composed of ten or eleven residues, while the α2 helices of the Matrin-3 RRMs are composed of fourteen residues. Thus, they are one turn longer at the C-terminus than those of other canonical RRMs. Second, the members of the PTBP-1 subgroup have a C-terminal extension that covers the β-sheet surface (Supplementary Fig. 2). Aliphatic amino-acid residues are located in the C-terminal fragment just following the β4 strands (I477 of Matrin-3 RRM1 and L575 of Matrin-3 RRM2) and interact with hydrophobic amino-acid residues on the β-sheet surface, as discussed by Blatter et al. based on our deposited solution structures of the Matrin-3 RRMs (PDBID: 1X4D and 1X4F) (Blatter et al. 2015). However, the lengths between the end of the β4-strand and the key aliphatic amino-acid residues (I477 of RRM1 and L575 of RRM2) are different from that of PTBP1 RRM1 (L136 of PTBP1 RRM1). Based on a comparison of the tertiary structures, the positions corresponding to the side-chains of Lys473 of RRM1 and Lys571 of RRM2 in Matrin-3 were occupied by His133 in PTBP1 RRM1. In addition, no aromatic amino-acid residue corresponding to Y474 (RRM1) and Y572 (RRM2) was identified in PTBP1 RRM1. In the calculated structures of Matrin-3 RRM1 and RRM2, the side-chains of the preceding Lys residues (K473 of RRM1 and K571 of RRM2) seem to stack with the aromatic rings of the tyrosine residues (Y474 of RRM1 and Y572 of RRM2) and the side-chains of the aromatic amino-acid residues located on the β3-strand (F438 of RRM1 and F536 of RRM2), respectively, through cation-π interactions (Supplementary Fig. 2c). These interactions were specific for Matrin-3 RRMs among the PTBP-1 subgroup members. In the case of the Matrin-3 RRMs, the aromatic side chains of Y474 of RRM1 and Y572 of RRM2 seem to occupy the space utilized for recognition of the uracil base in PTBP-1 RRM1. Instead, free space is found on the opposite side of the aromatic rings of Y474 of RRM1 and Y572 of RRM2. In many RRMs, stacking interactions between exposed aromatic rings and RNA bases are utilized for RNA recognition (Burd and Dreyfuss 1994; Maris et al. 2005). Thus, it is probable that these spaces are utilized for the accommodation of the RNA bases.

The canonical RRM sequence has two well-conserved consensus sequences, RNP1 [(R/K)-G-(F/Y)-(G/A)-(F/Y)-V-X-(F/Y)] and RNP2 [(L/I)-(F/Y)-(V/I)-X-(N/G)-L], which correspond to the β3 strand and β1 strand, respectively (Bandziulis et al. 1989; Burd and Dreyfuss 1994; Mulder et al. 2007). In the canonical RRMs, aromatic amino-acid residues are located at the third and fifth positions of RNP1 and at the second position of RNP2. They are exposed to solvent and play important roles in the RNA-binding activity of RRM domains. However, in the PTBP-1 subgroup including Matrin-3 RRM1 and RRM2, hydrophilic amino-acid residues (Glu436 for Matrin-3 RRM1 and Gln534 for Matrin-3 RRM2) are located at the third position of RNP1. In addition, His residues are located at the second position of RNP2. These features are rare in the canonical RRMs (Muto and Yokoyama 2012). Furthermore, in the case of Matrin-3 RRM1, an acidic amino-acid residue (Asp404) is also located at the position immediately following the β1 strand. Consequently, together with Glu436, it forms a negatively charged patch at the edge of the β-sheet surface of Matrin-3 RRM1, in contrast to Matrin-3 RRM2 (Supplementary Fig. 1b and Fig. 2e, f). This could be the reason that Matrin-3 RRM1 reportedly could not bind to RNA molecules. On the other hand, when the surface models of Fig. 2e, f are viewed from behind, acidic amino-acid residues are clustered and form a wide negatively charged patch on the upper surface formed by the C-terminal region of the α1 helix and the N-terminal region of the α2 helix in Matrin-3 RRM1 and RRM2 (Fig. 2g, h), which is not obvious in PTBP1 RRM1. Therefore, this negatively charged surface may mediate specific protein–protein interactions of Matrin-3. We expect that the assignments and the structural information obtained in this study will provide insight into the pathogenesis of ALS and/or FTD involving Matrin-3.
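As an illustration of how the RNP1 and RNP2 consensus sequences quoted above can be applied to a sequence, the following minimal sketch converts the bracketed notation into regular expressions. The helper names and the toy sequences are hypothetical and are not taken from Matrin-3; "X" is read as any residue, and only exact, non-overlapping matches are reported.

```python
import re

# Consensus strings copied from the text; X stands for any residue.
RNP1 = "(R/K)-G-(F/Y)-(G/A)-(F/Y)-V-X-(F/Y)"
RNP2 = "(L/I)-(F/Y)-(V/I)-X-(N/G)-L"


def consensus_to_regex(consensus):
    """Turn a dash-separated consensus string into a compiled regex."""
    parts = []
    for position in consensus.split("-"):
        if position == "X":
            parts.append(".")
        elif position.startswith("("):
            # "(R/K)" becomes the character class "[RK]".
            parts.append("[" + position.strip("()").replace("/", "") + "]")
        else:
            parts.append(position)
    return re.compile("".join(parts))


def find_motif(consensus, sequence):
    """Return (0-based start, matched text) for each non-overlapping exact match."""
    pattern = consensus_to_regex(consensus)
    return [(m.start(), m.group()) for m in pattern.finditer(sequence)]


# Toy sequences (hypothetical): a canonical RNP2 and one with His at position 2.
print(find_motif(RNP2, "AALFVGNLAA"))
print(find_motif(RNP2, "AALHVGNLAA"))
```

A strict pattern of this kind does not match a motif carrying His at the second RNP2 position, which mirrors why the PTBP-1 subgroup is described as deviating from the canonical consensus.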