Main

To gain structural insights into early spliceosome assembly, we prepared the yeast prespliceosome A-complex on the UBC4 pre-mRNA that carries a mutation in the pre-mRNA branch point sequence, which was previously used to stall the A-complex9 (UACUAAC to UACAAAC, in which the final A is the branch point adenosine and the fourth nucleotide, U to A, is the mutated position) (Extended Data Fig. 1a, b). The purified A-complex contained stoichiometric amounts of the U1 and U2 snRNP proteins (Extended Data Fig. 1b), and was used to determine cryo-electron microscopy (cryo-EM) densities of the A-complex at 4.0 Å (U1 snRNP, map A2) and 4.9–10.4 Å (U2 snRNP, maps A1 and A3) resolution, respectively (Extended Data Figs. 1c–e, 2). From these densities we built a near-complete atomic model of the A-complex (Fig. 1, Supplementary Videos 1, 2, Supplementary Data, Extended Data Fig. 1f), comprising 34 proteins, U1 and U2 snRNAs, and 34 nucleotides of pre-mRNA. The final model lacks the mobile cap-binding complex, Prp5 and the U1 subunit Prp40 (Extended Data Fig. 1b, d, e; Extended Data Table 1). The elongated U1 and U2 snRNPs bind the pre-mRNA 5′ splice site (5′SS) and branch-point sequences, respectively, and associate in a parallel manner to form the A-complex (Fig. 2a). The U1 snRNP structure contains all the essential regions of the U1 snRNA and 16 proteins (Fig. 1). The U1 snRNP ‘core’ is highly similar to its human counterpart10 (Extended Data Figs. 3, 4), comprising the seven-membered Sm ring and orthologues of the human U1 snRNP proteins (Snp1, human U1-70k; Mud1, human U1A; Yhc1, human U1C), and is bound to the peripheral yeast U1 proteins Luc7, Nam8, Prp39, Prp42, Snu56 and Snu71 (ref. 11) (Extended Data Figs. 3, 4). The U2 snRNP has a bipartite structure as observed in B-complex8, comprising the SF3b subcomplex (‘5′ region’) and the U2 3′ domain and SF3a subcomplex (‘3′ region’) that are organized around the 5′ and 3′ regions of the U2 snRNA, respectively (Figs. 1, 2a, Extended Data Fig. 5).
At the current resolution, the conformation of the U2 5′ region appears unchanged from the B-complex8, in which the pre-mRNA branch-point sequence is base-paired with the U2 snRNA and the branch point adenosine is bulged out and accommodated in a pocket formed by the U2 SF3b subunits Hsh155 and Rds3. After we completed the A-complex structure, the cryo-EM structure of the free yeast U1 snRNP was reported12. This model is in good agreement with the U1 snRNP in our A-complex structure, but there are important differences12.

Fig. 1: Prespliceosome A-complex structure.

Two orthogonal views of the yeast A-complex structure. Subunits are coloured according to snRNP identity (U1, shades of purple, U2, shades of green), and the pre-mRNA intron (black) and its 5′ exon (orange) are highlighted. The orthologous human protein name is shown after the solidus. The location of the cap-binding complex (CBC) is indicated by a brown oval (see Extended Data Fig. 1e).

Fig. 2: 5′SS recognition and implications for alternative splicing.

a, The A-complex U1–U2 snRNP interfaces (A and B) and the RNA network are shown as cartoons, and are superimposed on the transparent surfaces of the prespliceosome proteins. The U2 subunit Hsh155 surface (grey oval), which interacts with the tri-snRNP in the B-complex, is freely accessible in the A-complex. The U1 snRNP proteins Nam8 (orange, human TIA-1), Luc7 (purple, human LUC7L), Prp39 (magenta, human PRPF39) and Yhc1 (dark magenta, human U1C) and the U2 snRNP proteins Lea1 (light green, human U2-A′), Rse1 (dark green, human SF3B3), and Prp9 (teal, human SF3A3) are shown as ribbons. BP, branch point. b, The pre-mRNA 5′SS is recognized by the U1 snRNA 5′ end, and is stabilized by Luc7 and Yhc1. Notably, the Yhc1 ZnF and Luc7 ZnF2 domains are arranged with pseudo-C2 symmetry around the U1–5′SS helix. c, Nam8 binds the U1 snRNP through its linker (yellow), RNA recognition motif 3 (RRM3, light orange) and C-terminal regions (orange), whereas its RRM1 and RRM2 domains are mobile and project towards the intron to bind uridine-rich sequences downstream of the pre-mRNA 5′SS (dashed black line), as with its human counterpart TIA-1 (ref. 18). Nam8 contacts the Yhc1 (human U1C) C terminus, and human TIA-1 biochemically also interacts with human U1C18. Snu56 (blue), Prp39 (magenta), Prp42 (violet), and Hsh49 (light green) are shown as transparent ribbon models and other protein and U1 snRNA elements were removed for clarity.

The first ten nucleotides of U1 snRNA are disordered in the free U1 snRNP12, but become ordered in our A-complex structure by pairing with the pre-mRNA 5′SS (Fig. 2a, b). Additional density appeared adjacent to the U1–5′SS helix, into which we could build a newly ordered Yhc1 peptide (human U1C) that contacts the 5′SS phosphate backbone (+5 and +6 positions, the ‘Yhc1–5′SS loop’) and a near-complete model of Luc7 (in the previous study Luc7 was attributed to what is now assigned as Snu71 (ref. 12)) (Extended Data Figs. 3a, c, 4a). Although Luc7 is disordered in the free U1 snRNP, it associates stably with the U1–5′SS helix in the A-complex (Extended Data Fig. 4a), suggesting a mechanism for the selection of weak 5′SS sequences13. In our structure Luc7 is anchored by its N-terminal α-helix 1 to the Sm ring subunit SmE, and its C3H-type zinc finger 1 (ZnF1) domain binds where the 5′ exon emerges from the U1–5′SS helix, in excellent agreement with RNA–protein crosslinks13 (Fig. 2b). The adjacent Luc7 C2H2-type ZnF2 contacts the U1–5′SS helix minor groove and the U1 snRNA phosphate backbone (nucleotides U5–C8). This interaction mirrors that between the Yhc1 ZnF domain and the 5′SS nucleotides +1 to +4 downstream of the 5′SS junction10 (Fig. 2b). Thus, Yhc1 and Luc7 make no base-specific interactions with the U1–5′SS helix, and instead cradle the U1–5′SS helix phosphate backbone to stabilize 5′SS binding. Consistent with the structure, weakening of any of these interactions can impair splicing and bypass the requirement for Prp28 helicase activity13,14,15,16.

The A-complex structure provides structural insights into the functions of the human alternative splicing factors LUC7-like (LUC7L, yeast Luc7) and TIA-1 (yeast Nam8) (Extended Data Fig. 4c, d). Luc7 and its human homologues LUC7L1–3 are highly conserved, suggesting that the LUC7L N-terminal α-helix also anchors it to the SmE protein and that the invariant ZnF2 helix α8 similarly stabilizes the U1–5′SS helix to promote the inclusion of weak alternative splice sites13 (Fig. 2b, Extended Data Figs. 3c, 6a). The yeast U1 snRNP subunit Nam8 and its human homologue TIA-1 contain three RNA recognition motif (RRM) domains and a C-terminal Gln-rich extension (Extended Data Fig. 6b). Human TIA-1 binds to uridine-rich sequences downstream of the 5′SS predominantly through the RRM2 domain17,18 to allow the use of weak 5′SSs. The Nam8 RRM2 shows high sequence similarity to the TIA-1 RRM2, including the nearly identical RNP1 and RNP2 motifs, indicating that Nam8 also binds uridine-rich sequences through its RRM2 (Extended Data Fig. 6b). In the A-complex structure the Nam8 RRM3 and its C-terminal region bind in a cavity of the Prp39–Prp42 heterodimer and contact the Yhc1 C-terminal region near the U1–5′SS helix (Fig. 2c). From this location, Nam8 could project its mobile RRM2 domain to bind uridine-rich intron sequences downstream of the 5′SS, consistent with crosslinking experiments17, and thereby promote meiotic pre-mRNA splicing19 (Fig. 2a, c).

In the A-complex, the U1 snRNP binds to the U2 snRNP through two interfaces, A and B (Fig. 2a). In interface A, the N-terminal helices α1–2 of the U1 protein Prp39 stably bind the U2 3′ domain subunit Lea1 (human U2A′) (Fig. 2a, Extended Data Fig. 5). The Prp39–Prp42 heterodimer binds Yhc1 to anchor the U2 snRNP 3′ domain to the U1 snRNP. Similar interactions were observed biochemically between the human alternative-splicing factor PRPF39 homodimer and U1C12 (yeast Yhc1), suggesting that PRPF39 may contact the human U2 3′ domain in a similar manner, although it is not an obligate component of the human A-complex20 (Fig. 2a). Different, non-overlapping Lea1 surfaces are used to interact with the NTC protein Syf1 in the yeast C- and C*/P-complex conformations of the spliceosome21 (Extended Data Fig. 5c), suggesting that Lea1 aids in the repositioning of the U2 3′ domain in multiple stages of splicing. Interface B is transient and found only in a subset of cryo-EM images (Extended Data Figs. 2a, 5a, b). It involves weak interactions between the yeast-specific U1 snRNA stem loop 3–3 and the U2 SF3b Rse1 subunit β-propellers B and C (BPB and BPC) and the C terminus of U2 SF3a Prp9. The pre-mRNA 5′SS and branch point, the two reactants of the branching reaction, are positioned approximately 150 Å apart in the A-complex, with 40 nucleotides of the UBC4 intron looped out in between (Fig. 2a, Extended Data Fig. 1e, f). The small interfaces between the U1 and U2 snRNPs orient the snRNPs relative to each other, and this may facilitate 5′SS transfer in the assembled spliceosome and the subsequent dissociation of the U1 snRNP, consistent with the structural and biochemical data7,8. Although the precise U1–U2 snRNP interfaces may differ in the human A-complex, a key function of U1–U2 (alternative) splicing factors could be to ensure that U1 and the U1–5′SS helix are oriented correctly relative to the U2 snRNP.

Before A-complex formation, the yeast Msl5–Mud2 heterodimer recognizes the branch point sequence through Msl5 and binds the U1 snRNP subunit Prp40 (human PRPF40) in the E complex, looping out the intron between the 5′SS and branch point sequences22 (Extended Data Fig. 4e). Although Prp40 was not identified in the free U1 snRNP12 or in our A-complex structure, Prp40 crosslinks to Luc7 and Snu71 (ref. 12) and unassigned cryo-EM density in the A-complex may indicate its peripheral location near Luc7 (Extended Data Figs. 1e, 4a, e). Msl5–Mud2 may then be destabilized by the Sub2 helicase, allowing the Prp5 helicase to remodel U2 snRNA for the stable association of the U2 snRNP with the branch point sequence in the A-complex9. Prp5 was shown to physically interact with the U2 SF3b subunit Hsh155 HEAT repeats 1–6 and 9–12 (ref. 23) and with U2 snRNA at and surrounding the branch point-interacting stem loop9. Thus, after Prp5 activity, Prp5 needs to dissociate to fully expose the Hsh155 HEAT repeats 11–13 together with the U2 snRNA 5′ end in the A-complex, to allow for the subsequent U4/U6.U5 tri-snRNP association to assemble the spliceosome7,8,9 (Fig. 2a).

The A-complex structure also provides new insights into formation of the fully assembled pre-B-complex spliceosome, which requires integration of the tri-snRNP with the A-complex. The subsequent Prp28 helicase-mediated transfer of the 5′SS from U1 to U6 snRNA and destabilization of the U1 snRNP produces the B-complex spliceosome24. We first modelled a fully assembled yeast spliceosome, by superimposing the U2 snRNP SF3b-containing domains of the yeast A-complex (from this study) and the yeast B-complex structure8. As in the B-complex structure8, the U2 snRNP would associate with tri-snRNP via the U2/U6 helix II and Prp3 (Extended Data Fig. 7). The modelling shows that the U1 snRNP would clash with large parts of the Brr2-containing ‘helicase’ domain (‘U1-B-complex’; Extended Data Figs. 7b, 8b), which may be relieved owing to their known flexibilities8 (Extended Data Fig. 5a). However, the known binding site for Prp28 at the U5 Prp8 N-terminal domain (Prp8N) observed in human tri-snRNP25 would be sterically occluded by the pre-bound B-complex proteins7,8,26. We therefore considered an alternative model for the assembled yeast ‘pre-B-complex’ spliceosome, by combining the available data from yeast and human systems8,25,27,28 (Fig. 3a, Extended Data Figs. 7a, 8a). First, the isolated human25 and yeast tri-snRNP26,29 structures differ in their protein composition and conformation, indicating that different complexes accumulate at steady-state. In the human tri-snRNP structure25 the BRR2 helicase is held near SNU114 by the SAD1 protein and PRP28 is bound to the PRP8 N-terminal domain. In the yeast tri-snRNP26,29 and the yeast and human B-complex structures7,8 Brr2 is repositioned and loaded onto its U4 snRNA substrate and the B-complex proteins replace Prp28 at the Prp8N domain, ready for spliceosome activation.
Second, in humans, an ATPase-deficient PRP28 helicase stalls spliceosome assembly at the pre-B-complex stage, before disruption of the U1–5′SS interaction28, and this complex comprises the U1 and U2 snRNPs, a loosely associated tri-snRNP, and SAD128. Third, in yeast, Sad1 is essential for splicing and is very transiently associated with the tri-snRNP27. Given the high conservation of the major spliceosome components in yeast and humans, the yeast spliceosome may likewise assemble with a human-like tri-snRNP that contains Prp28, Sad1 and a repositioned Brr2 helicase25,28. On the basis of these assumptions, we modelled a yeast pre-B-complex spliceosome that comprises all five snRNPs with a combined molecular mass of approximately 3.1 megadaltons and with only minor clashes (Fig. 3a, Extended Data Fig. 7a, b). Notably, this model indicates that the U2 snRNP positions the U1 snRNP to deliver the U1–5′SS helix to the exposed U6 ACAGAGA stem in tri-snRNP, only approximately 20 Å away from where Prp28 is likely to mediate 5′SS transfer, consistent with protein–RNA crosslinks30 (Fig. 3b). This suggests that the subsequent repositioning of the Brr2 helicase onto the U4 snRNA, observed in the B-complex structure7,8, would coincide with the release of the U1 snRNP owing to a steric clash, rendering Brr2 competent for spliceosome activation only after successful 5′SS transfer (Extended Data Figs. 7b, 8a). The model thus indicates a new molecular checkpoint to couple 5′SS transfer with U1 snRNP release and formation of the B-complex (Extended Data Figs. 7b, 8a).

Fig. 3: Spliceosome assembly and 5′SS transfer.

a, One of the two alternative pre-B-complex models, suggesting that the U2 snRNP orients the U1 snRNP to deliver the pre-mRNA 5′SS to the U6 ACAGAGA stem. The model was obtained by superposing the yeast A- (from this study) and B-complex structures (RCSB Protein Data Bank code (PDB ID) 5NRL) and by modifying the locations of Brr2, U4 Sm ring, Sad1, and Prp28 to resemble a human-like pre-B-complex conformation on the basis of the biochemical data and the human U4/U6.U5 tri-snRNP structure (PDB ID 3JCR) (see ‘Structural modelling’ in Methods). Colouring as in Fig. 1 and a previously published work8. b, The pre-B-complex RNA network and the Prp28 helicase are shown as cartoons and are superimposed on transparent surfaces of the spliceosome proteins. Prp28 is positioned at the Prp8 N-terminal domain as in human tri-snRNP25 and may clamp onto the pre-mRNA near the U1–5′SS helix to destabilize it and transfer the 5′SS from U1 snRNA to the U6 snRNA ACAGAGA stem (red arrow), which are separated by approximately 20 Å in the pre-B model. The positions of proteins marked with asterisks are based on the human tri-snRNP structure (PDB ID 3JCR).

In summary, the prespliceosome structure reveals how the U1 and U2 snRNPs recognize the two reactants of the branching reaction and associate together with the tri-snRNP into the fully assembled spliceosome. The results further suggest how the human alternative-splicing factors LUC7L and TIA-1 may influence splice-site selection.

Methods

Prespliceosome preparation and purification

To obtain the prespliceosome A-complex for structural studies, we prepared yeast S. cerevisiae containing a genomic TAPS affinity tag on the U2 snRNP subunit Hsh155, essentially as described31. Yeast cells were grown in a 120-l fermenter, and splicing extract was prepared using the liquid-nitrogen method, essentially as described32. Capped UBC4 pre-mRNA containing a point mutation (U > A) two nucleotides upstream of the branch point adenosine and three MS2 stem loops at the 3′ end was produced by in vitro transcription9,33. The RNA product was labelled with Cy5 at its 3′ end to monitor complex purification34. The pre-mRNA substrate was bound to the MS2–MBP fusion protein and added to an in vitro splicing reaction carried out for 90 min at 23 °C, essentially as described33. The reaction mixture was then centrifuged through a 40% glycerol cushion in buffer A (20 mM HEPES (pH 7.9), 50 mM KCl, 0.2 mM EDTA, 1 mM dithiothreitol (DTT), 0.04% NP-40). The cushion was diluted with buffer A containing 1% glycerol, and applied to amylose resin (NEB) pre-washed with buffer B (20 mM HEPES (pH 7.9), 75 mM KCl, 5% glycerol, 0.2 mM EDTA, 1 mM DTT, 0.03% NP-40). After 12 h incubation at 4 °C, the resin was washed with buffer B and eluted in buffer B containing 50 mM KCl and 12 mM maltose. Fractions containing A-complex were pooled and applied to Strep-Tactin resin (GE Healthcare), pre-washed with buffer B, and incubated for 4 h at 4 °C. The resin was washed with buffer B containing 2 mM MgCl2, and eluted with buffer B containing 50 mM KCl, 2.5 mM desthiobiotin, and 2 mM MgCl2. The A-complex fractions were pooled and crosslinked using 1.1 mM BS3 (Sigma) on ice for 1 h, and subsequently quenched with 50 mM ammonium bicarbonate. The sample was concentrated to ~0.4 mg ml⁻¹ and immediately used for EM sample preparation. Mass spectrometry (data not shown) indicated that homogeneous A-complex was purified, containing sub-stoichiometric amounts of Prp5 (Extended Data Fig. 1b).
The splicing assay in Extended Data Fig. 1a was carried out as for A-complex purification, but in a volume of 25 μl and in the absence of MS2–MBP fusion protein, and was visualized after 30 min of splicing at 23 °C on a denaturing 14% polyacrylamide TBE gel with a Typhoon scanner (GE Healthcare).
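
As an illustrative aside, the branch point substitution used to stall the A-complex can be reproduced with a short script. This is a minimal sketch only: the helper name `mutate_upstream_of_branch_point` and the zero-based index convention are assumptions introduced here and are not part of the original protocol. The sequences are those given in the main text, and the branch point adenosine is taken to be the 3′-most adenosine of UACUAAC.

```python
def mutate_upstream_of_branch_point(seq, bp_index, offset=2, new_base="A"):
    """Replace the nucleotide `offset` positions upstream (5') of the branch point.

    `bp_index` is the zero-based position of the branch point adenosine.
    Hypothetical helper written for illustration only.
    """
    if seq[bp_index] != "A":
        raise ValueError("branch point position is not an adenosine")
    pos = bp_index - offset
    if pos < 0:
        raise ValueError("offset extends past the 5' end of the sequence")
    return seq[:pos] + new_base + seq[pos + 1:]


wild_type = "UACUAAC"  # yeast branch point sequence from the main text
bp_index = 5           # assumed: the 3'-most adenosine, zero-based
mutant = mutate_upstream_of_branch_point(wild_type, bp_index)
print(wild_type, "->", mutant)
```

The check on `seq[bp_index]` guards against an off-by-one error in the assumed branch point position, which is the most likely mistake when switching between one-based and zero-based numbering.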

Electron microscopy

For cryo-EM analysis the A-complex sample was applied to R2/2 holey carbon grids (Quantifoil), precoated with a 5–7-nm homemade carbon film. Grids were glow-discharged for 20 s before deposition of 2.5 μl sample (~0.4 mg ml⁻¹), and subsequently blotted for 2–3.5 s and vitrified by plunging into liquid ethane with a Vitrobot Mark III (FEI) operated at 4 °C and 100% humidity. Cryo-EM data were acquired on three separate FEI Titan Krios microscopes (datasets one to three) operated in EFTEM mode at 300 keV, each equipped with a K2 Summit direct detector (Gatan) and a GIF Quantum energy filter (slit width of 20 eV, Gatan). Datasets one and two were recorded using ‘Krios 1’ and ‘Krios 2’ at the MRC-LMB, respectively, and dataset three using ‘Krios 2’ at the Astbury Biostructure Laboratory (University of Leeds). For dataset one, 5,935 movies were acquired using EPU (FEI) with a defocus range of –0.4 μm to –4.4 μm at a nominal magnification of 105,000× (1.13 Å pixel⁻¹). The camera was operated in ‘counting’ mode with a total exposure time of 13 s fractionated into 20 frames, a dose rate of 4.25 e pixel⁻¹ s⁻¹, and a total dose of 43 e Å⁻² per movie. Dataset two was collected in the same manner, except that 727 movies were recorded using SerialEM35, at a nominal magnification of 105,000× (1.14 Å pixel⁻¹), a total exposure time of 8 s fractionated into 20 frames, a dose rate of 4.33 e pixel⁻¹ s⁻¹ and a total dose of 27 e Å⁻² per movie. Dataset three was collected with EPU (FEI) similar to dataset one, except that 2,745 movies were collected at a nominal magnification of 130,000× (1.07 Å pixel⁻¹), a total exposure time of 8 s fractionated into 20 frames, a dose rate of 7.94 e pixel⁻¹ s⁻¹ and a total dose of 56 e Å⁻² per movie.
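
The total doses quoted above follow from the dose rate, exposure time and pixel size. The sketch below recomputes them as a consistency check, assuming the standard relation total dose = dose rate × exposure time / (pixel size)². The function and variable names are introduced here for illustration; the input values are those stated in the text, and the results should agree with the reported totals only to within rounding of those inputs.

```python
def total_dose_e_per_A2(dose_rate_e_px_s, exposure_s, pixel_size_A):
    """Convert a per-pixel dose rate into an accumulated dose per square angstrom."""
    return dose_rate_e_px_s * exposure_s / pixel_size_A ** 2


# (dose rate in e/pixel/s, exposure in s, pixel size in A, reported total dose in e/A^2)
datasets = {
    "one": (4.25, 13.0, 1.13, 43.0),
    "two": (4.33, 8.0, 1.14, 27.0),
    "three": (7.94, 8.0, 1.07, 56.0),
}
N_FRAMES = 20  # each movie was fractionated into 20 frames

computed = {}
for name, (rate, exposure, pixel, reported) in datasets.items():
    dose = total_dose_e_per_A2(rate, exposure, pixel)
    computed[name] = dose
    print(f"dataset {name}: {dose:.1f} e/A^2 total "
          f"({dose / N_FRAMES:.2f} e/A^2 per frame), reported {reported:.0f}")
```

Dividing by the number of frames gives the per-frame dose, which is the quantity a dose-weighting scheme would typically operate on.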

Image processing

Movies were aligned using MOTIONCOR2 (ref. 36) with 5 × 5 patches and applying a theoretical dose-weighting model to individual frames. Contrast transfer function (CTF) parameters were estimated using Gctf37. Resolution is reported on the basis of the gold-standard Fourier shell correlation (FSC) (0.143 criterion) as described38 and B-factors were determined and applied automatically in RELION 2.1 (refs. 39, 40). Particles from dataset one were automatically picked using Gautomatch (K. Zhang) and screened manually, and were then extracted in RELION with a 560 × 560 pixel box size and pre-processed. Particles from datasets two and three were picked and pre-processed in the same way, and were then rescaled to the pixel size of dataset one (1.13 Å pixel⁻¹) in RELION 2.1 by Fourier cropping during particle extraction with a 560 × 560 pixel box. For rescaling, we first calculated 3D refinements in RELION 2.1 for each dataset (one to three) and performed real space correlation fits in UCSF Chimera to identify scaling factors for datasets two and three relative to dataset one. Because the absolute magnification values differed slightly for the different microscopes, we re-determined the CTF values for datasets two and three using the new pixel sizes with Gctf37, and then re-extracted and rescaled the particles to the 560 × 560 pixel box. Combining datasets one to three yielded a total dataset of 406,272 particles that were used for subsequent processing.
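
To make the 0.143 criterion concrete, the sketch below reads a resolution off an FSC curve by linear interpolation at the first crossing of the threshold. It is not the RELION implementation, and the FSC values are invented toy numbers rather than data from this study; the function name `resolution_at_threshold` is likewise hypothetical.

```python
def resolution_at_threshold(spatial_freqs, fsc_values, threshold=0.143):
    """Return the resolution (in A) at which the FSC first drops below `threshold`.

    `spatial_freqs` are in 1/A and must be increasing. The crossing is located
    by linear interpolation between the two shells that bracket the threshold.
    Returns None if the curve never drops below the threshold.
    """
    for i in range(1, len(fsc_values)):
        above, below = fsc_values[i - 1], fsc_values[i]
        if above >= threshold > below:
            fraction = (above - threshold) / (above - below)
            freq = spatial_freqs[i - 1] + fraction * (spatial_freqs[i] - spatial_freqs[i - 1])
            return 1.0 / freq
    return None


# Toy FSC curve (hypothetical values, for illustration only)
toy_freqs = [0.05, 0.10, 0.15, 0.20, 0.25, 0.30]  # 1/A
toy_fsc = [0.99, 0.95, 0.80, 0.40, 0.10, 0.02]

toy_resolution = resolution_at_threshold(toy_freqs, toy_fsc)
print(f"toy resolution at FSC = 0.143: {toy_resolution:.2f} A")
```

Interpolating between shells avoids snapping the reported value to the shell spacing, which matters most when the curve falls steeply near the threshold.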

The first 22,319 particles from dataset one were used to generate an ab initio 3D reference for the A-complex using default parameters and three classes in cryoSPARC41 (Extended Data Fig. 2a). The complete dataset (one to three) was subjected to a ‘heterogeneous’ (multi-reference) refinement in cryoSPARC using default parameters and four classes: the ab initio A-complex reference and three ‘junk’ references (Extended Data Fig. 2a; round 1). Class one contained 153,570 particles (37.8%, percentage of particles from the full dataset) and was used for a 3D refinement in RELION 2.1 with a soft mask in the shape of the A-complex. This yielded a density (map A1) with an overall resolution of 4.9 Å and a B-factor of −188 Å², comprising U1 snRNP and the U2 snRNP 3′ region (Extended Data Figs. 1e, d, 2, 9). To improve the U1 snRNP density, we prepared a soft mask enveloping the U1 snRNP with the volume eraser in UCSF Chimera42 and RELION 2.1 (refs. 39, 40). This enabled the focused refinement of the U1 snRNP (map A2) from the same 153,570 particles to an overall resolution of 4.0 Å and a B-factor of −146 Å² (Extended Data Figs. 1e, d, 2, 9). In the A-complex the U2 snRNP 5′ region is flexible relative to the U1 and the U2 3′ region (Extended Data Fig. 2). To position the U2 snRNP 5′ region in the A-complex, we used a soft mask surrounding the U2 5′ region and carried out 3D classification without image alignment with six classes (Extended Data Fig. 2a; round 2). This revealed a class with defined U2 5′ region from 19,937 particles (4.9%) that could be refined to an overall resolution of 10.4 Å (Extended Data Figs. 2, 9). Local resolution was estimated using ResMap43 (Extended Data Fig. 2d, e).
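
The percentages quoted for each particle subset are expressed relative to the combined dataset of 406,272 particles. A minimal sketch of that bookkeeping is shown below; the dictionary keys are descriptive labels chosen here, and the particle counts are those reported in the text.

```python
TOTAL_PARTICLES = 406_272  # combined datasets one to three

subsets = {
    "round 1, class one (maps A1 and A2)": 153_570,
    "round 2, defined U2 5' region (map A3)": 19_937,
}

percentages = {
    label: 100.0 * count / TOTAL_PARTICLES for label, count in subsets.items()
}

for label, pct in percentages.items():
    print(f"{label}: {subsets[label]:,} particles = {pct:.1f}% of the full dataset")
```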

Structural modelling

We prepared a composite model of the A-complex by combining the A1–A3 densities (Extended Data Fig. 1e, f). Model building was carried out in COOT44. The U1 snRNP coordinates were refined into the sharpened A2 density in PHENIX45 using the phenix.real_space_refine routine, and applying secondary structure, rotamer, nucleic acid and metal ion restraints. Homology models for yeast Yhc1, Snp1, and Mud1 were generated using MODELLER46 from the human U1 snRNP crystal structures10 (PDB ID 4PJO, 4PKD) and were fitted and manually adjusted in the A2 map. The yeast B-complex U5 Sm ring model was used as the initial model for the U1 Sm ring, and was manually adjusted in the A2 density. Initial models for Prp39 and Prp42 were generated by I-TASSER47 and were subsequently adjusted and extended manually. The Prp39 N-terminal residues 47–339 were modelled as poly-alanine owing to a lower local resolution of ~5–6 Å (Extended Data Figs. 2d, e, 3c). Snu56, the Yhc1 C terminus and the Snu71 N terminus were modelled de novo; Yhc1 residues 48–82 and 135–142 were modelled as poly-alanine. To build the Luc7 model a C3H-type ZnF (from PDB ID 1RGO) for ZnF1 and a C2H2-type ZnF (from Yhc1) for ZnF2 were used to guide modelling in the A2 density, with a local resolution of 4–5 Å (Extended Data Fig. 3c). The helices connecting Luc7 ZnF1 and ZnF2 (α5–7) were modelled as poly-alanine, and were assigned on the basis of density connectivity. The U1 snRNP protein model is in excellent agreement with biochemical and protein crosslinking results12. The U1 snRNA model was generated on the basis of similarity to the U1 snRNA in the human U1 snRNP crystal structures (PDB ID 3CW1, 4PJO, 4PKD) and according to the yeast U1 snRNA secondary structure prediction48. All base-pairing U1 snRNA regions (helix H; SL1; SL2-1 and -2; SL3-1, -2, -3, -4, -5 and -6), except for the SL3-7 and the tip of SL3-3, were modelled (Extended Data Fig. 3f, g).
The human SL1 loop (PDB ID 4PKD) was rigid-body-fitted together with the homology model of the yeast Snp1 (described above), and the human U1 snRNA sequence was replaced with that of yeast. The loops connecting SL2-1 to SL2-2 as well as SL3-3 to SL3-4 and SL3-4 to SL3-5 and the tips of SL2-2, SL3-3, -4 and -5 were not built, owing to a lower local resolution (~4.5 Å). The location of a region of U1 snRNA SL3-7 was modelled as a phosphate backbone only and may correspond to the sequence surrounding residues 378–391 and 428–440. The U1–5′SS was modelled de novo, and the UBC4 pre-mRNA contained 12 nucleotides, ten from the intron (+1 to +10) and two from the 5′ exon (−1 to −2).

The U2 snRNP 3′ region (U2 3′ domain and SF3a subcomplexes) from the yeast B-complex structure (PDB ID 5NRL) was fitted into the A1 density using UCSF Chimera42, and the positions of Lea1, Msl1 and U2 snRNA residues 139–1169 were adjusted as a rigid body in COOT44. The U2 snRNP 5′ region from the yeast B-complex structure (PDB ID 5NRL) was fitted into the A3 density in UCSF Chimera. This provided an excellent fit, suggesting that the U2 5′ region structure is not changed substantially from that observed in the yeast B-complex8. To generate the complete A-complex model, the refined U1 snRNP model and the U2 snRNP 3′ region were fitted into the A3 density in UCSF Chimera, together with the fitted U2 snRNP 5′ region. The final model comprises 34 proteins, U1 and U2 snRNAs, and the pre-mRNA substrate.

To generate the alternative pre-B-complex model shown in Fig. 3, we modified and combined structural models using COOT44, on the basis of structural and biochemical data from yeast and human systems8,25,28. We first superimposed our A-complex structure on the yeast B-complex structure8 using the U2 SF3b-containing domain. The free human tri-snRNP structure (PDB ID 3JCR), which may resemble the pre-B conformation7,25, was used to model the yeast tri-snRNP in the pre-B-complex conformation. We first removed the B-complex proteins from the yeast B-complex structure, because these are absent in the purified human pre-B-complex28. Human pre-B instead contained the PRP28 helicase and SAD1, and we therefore placed crystal structures of the yeast Prp28 helicase49 (PDB ID 4W7S) and yeast Sad1 (ref. 50) (PDB ID 4MSX) in their human tri-snRNP locations25. We then positioned the U4 Sm ring and Brr2 as in the human tri-snRNP structure, in which the Brr2 PWI domain makes a conserved contact with Sad1 (ref. 51). We removed a Snu66 peptide bound to Brr2 from the model, because its binding at this site is uncertain in the pre-B-complex conformation. Several minor differences remain between the free human tri-snRNP structure25 and the pre-B-complex model, and these were not modelled. The final pre-B model contained only minor clashes, and one observed clash between the highly flexible Prp28 RecA-2 lobe25 and the flexible U6 snRNA 5′ stem loop8,26 could be resolved by a minor repositioning of either domain. The final pre-B model comprises 66 proteins, five snRNAs, the pre-mRNA substrate, and has a combined molecular mass of ~3.1 MDa.

Figures were generated with PyMol (https://www.pymol.org) and UCSF Chimera.

Reporting summary

Further information on experimental design is available in the Nature Research Reporting Summary linked to this paper.

Data availability

Three-dimensional cryo-EM density maps A1, A2 and A3 have been deposited in the Electron Microscopy Data Bank under the accession numbers EMD-4363, EMD-4364 and EMD-4365, respectively. The coordinate file of the A-complex has been deposited in the Protein Data Bank under the accession number 6G90.