1 Most Membrane Proteins are Made of Transmembrane Helices

Approximately one third of our genes encode membrane proteins. While lipid bilayers separate aqueous compartments (the inside and outside of cells as well as intracellular compartments), membrane proteins connect them by mediating the specific transport of molecules across that barrier. Other membrane proteins localize enzyme activities to the lipid surface, connect membranes to the cytoskeleton, or are involved in membrane traffic, among many other functions.

Structurally, the vast majority of integral membrane proteins are composed of transmembrane (TM) α-helices embedded in the lipid bilayer, ranging from a single membrane-spanning helix to several or even many helices bundled together. TM helices consist largely of hydrophobic amino acids whose side chains face the hydrophobic core of the membrane, while the polar peptide backbone is shielded inside the helix. The ~3 nm thickness of the fatty acyl chain region of the bilayer sets the required length of a TM helix at ~20 residues (1.5 Å rise per residue). With 3.6 residues per turn, such a helix comprises five to six turns. TM helices are frequently tilted rather than perpendicular to the membrane and are then correspondingly somewhat longer.
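The length estimate above is simple geometry. A minimal Python sketch, assuming an idealized α-helix (1.5 Å rise and 3.6 residues per turn, as stated) and treating tilt purely as a 1/cos θ lengthening relative to the membrane normal; the tilt angles in the example are arbitrary illustrative values, not measurements:

```python
import math

RISE_PER_RESIDUE_A = 1.5  # axial rise per residue of an alpha-helix, in angstrom
RESIDUES_PER_TURN = 3.6   # residues per helical turn


def helix_residues(thickness_nm: float, tilt_deg: float = 0.0) -> float:
    """Residues an idealized helix needs to span a hydrophobic slab.

    A helix tilted by tilt_deg away from the membrane normal must be
    longer by a factor 1 / cos(tilt) to cross the same thickness.
    """
    thickness_a = thickness_nm * 10.0  # nm -> angstrom
    path_a = thickness_a / math.cos(math.radians(tilt_deg))
    return path_a / RISE_PER_RESIDUE_A


def helix_turns(residues: float) -> float:
    """Number of helical turns for a given number of residues."""
    return residues / RESIDUES_PER_TURN


if __name__ == "__main__":
    # Illustrative tilt angles only; real TM helices vary.
    for tilt in (0, 20, 30):
        n = helix_residues(3.0, tilt)
        print(f"tilt {tilt:2d} deg: {n:4.1f} residues, {helix_turns(n):3.1f} turns")
```

For an untilted helix this reproduces the 20 residues and roughly 5.6 turns given in the text; any tilt increases both numbers.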

As an example of a helix-bundle membrane protein, Fig. 1a shows the Sec61/SecY translocon complex, the very machinery that mediates the integration of helical membrane proteins. It consists of three subunits conserved in all kingdoms of life: SecYEG in bacteria, SecYEβ in archaea (shown here), and Sec61α, γ, and β in eukaryotes. The main α subunit (SecY/Sec61α) is a ten-helix protein, flanked on one side by the single-helix β subunit (two TM helices in bacterial SecG) and clasped on two other sides by the γ subunit (SecE/Sec61γ), which has one TM helix and an amphipathic helix lying on the membrane surface. The surface embedded in the lipid is mainly apolar, in contrast to the parts exposed above and below the membrane (Fig. 1a, surface colored green for hydrophilic residues). The α subunit forms a central pore for polypeptide translocation that is closed in the idle state by a constriction ring of six mainly hydrophobic residues and by a short lumenal plug helix. TM segments 2 and 7 form a lateral gate that gives a translocating chain access to the membrane interior.

Fig. 1

Two basic types of membrane proteins: α-helix bundles and β-barrels. a As an example of a helix bundle membrane protein, the structure of the archaeal SecYEβ translocon from Methanococcus jannaschii (1RHZ [67]) is shown as backbone ribbon (left) and as a surface representation colored green for hydrophilic residues. b The corresponding hydrophobicity plot is shown according to Kyte and Doolittle [1] (11-residue window) (Color figure online)

To identify TM domains, hydrophobicity plots, in which the average hydrophobicity of a window of 11–15 residues is plotted along a protein sequence [1], have proven very useful (Fig. 1b). Hydrophobic stretches within globular, soluble proteins are rarely long enough to be mistaken for TM segments. Since the hydrophobic effect is also the main contribution to the internal packing of membrane proteins, even TM helices buried inside a bundle are mostly apolar on all sides.
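A sliding-window hydropathy profile of this kind takes only a few lines. The sketch below uses the Kyte–Doolittle scale [1]; the 19-residue window and 1.6 cutoff used for flagging candidate TM segments are the values commonly quoted from Kyte and Doolittle, but here they, the 11-residue plotting window, and the toy sequence should all be read as adjustable assumptions rather than a validated predictor.

```python
# Kyte-Doolittle hydropathy index per residue.
KYTE_DOOLITTLE = {
    "A": 1.8, "R": -4.5, "N": -3.5, "D": -3.5, "C": 2.5,
    "Q": -3.5, "E": -3.5, "G": -0.4, "H": -3.2, "I": 4.5,
    "L": 3.8, "K": -3.9, "M": 1.9, "F": 2.8, "P": -1.6,
    "S": -0.8, "T": -0.7, "W": -0.9, "Y": -1.3, "V": 4.2,
}


def hydropathy_profile(seq: str, window: int = 11) -> list[float]:
    """Mean hydropathy of every full window; entry i covers residues i..i+window-1.

    Raises KeyError for non-standard residue letters.
    """
    values = [KYTE_DOOLITTLE[aa] for aa in seq.upper()]
    if not 1 <= window <= len(values):
        return []
    total = sum(values[:window])
    profile = [total / window]
    for i in range(window, len(values)):
        total += values[i] - values[i - window]  # slide the window by one residue
        profile.append(total / window)
    return profile


def hydrophobic_regions(seq: str, window: int = 19, threshold: float = 1.6) -> list[tuple[int, int]]:
    """Residue ranges (0-based, end exclusive) covered by consecutive windows above threshold."""
    profile = hydropathy_profile(seq, window)
    regions, start = [], None
    for i, h in enumerate(profile):
        if h > threshold and start is None:
            start = i
        elif h <= threshold and start is not None:
            regions.append((start, i - 1 + window))
            start = None
    if start is not None:
        regions.append((start, len(profile) - 1 + window))
    return regions


if __name__ == "__main__":
    # Hypothetical toy sequence: polar flanks around a leucine-rich core.
    flank = "DEKRSNQT" * 2
    toy = flank + "LIVALLFAVLLAIVLGLLA" + flank
    print(hydrophobic_regions(toy))
```

The running sum keeps the profile linear in sequence length; a real analysis would also inspect the profile itself rather than rely on a single cutoff.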

The alternative structural principle for membrane proteins is the β-barrel, an antiparallel closed β-sheet of 8–26 β-strands of ~11 residues each (6–22, depending on the tilt) [2, 3], forming a pore of varying selectivity (porins). β-Barrel membrane proteins are restricted to the outer membranes of bacteria, mitochondria, and chloroplasts. To be embedded in the bilayer, a β-strand needs hydrophobic side chains only at every second position, namely those pointing to the outside of the barrel. As a result, the sequences of β-barrel proteins are not hydrophobic enough to integrate as TM helices. Indeed, bacterial porins are translocated by the SecYEG translocon as secretory proteins into the periplasm, from where they are integrated into the outer membrane by insertases that are themselves β-barrels. It has been proposed that successive β-strands of nascent β-barrel proteins intercalate between the first and last strands of the insertase until they bud off as independent β-barrels [4, 5]. By analogy, TM helices of nascent helical membrane proteins intercalate into the lateral gate of the SecY/Sec61 translocon and are successively released to form an independent helix bundle. Indeed, recent structures of translocons containing signal or TM domains illustrate exactly this intercalation between the helices of the lateral gate [6,7,8].
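The alternating pattern can be made concrete by scoring each face of a strand separately. In this sketch, the set of residues counted as hydrophobic and the 11-residue strand are hypothetical choices for illustration only; the point is that a strand can present a fully apolar lipid-facing side while being only about half hydrophobic overall, too little to pass as a TM helix.

```python
# Residues treated as hydrophobic here: an illustrative assumption, not a standard scale.
HYDROPHOBIC = set("AVLIMFWYC")


def strand_faces(strand: str) -> dict[str, float]:
    """Fraction of hydrophobic residues overall and on the two alternating faces.

    In a beta-strand consecutive side chains point to opposite sides, so
    even positions form one face and odd positions the other.
    """
    flags = [aa in HYDROPHOBIC for aa in strand.upper()]
    if not flags:
        raise ValueError("empty strand")
    even, odd = flags[0::2], flags[1::2]
    return {
        "overall": sum(flags) / len(flags),
        "even_face": sum(even) / len(even),
        "odd_face": sum(odd) / len(odd) if odd else 0.0,
    }


if __name__ == "__main__":
    # Hypothetical strand: hydrophobic even positions (lipid side), polar odd positions (pore side).
    print(strand_faces("VTLSVKAEYNF"))
```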

2 Three Distinct Processes of TM Segment Integration

The first sequence of a membrane protein to engage with the translocon is the signal sequence. After recruitment of the signal recognition particle (SRP), targeting of the SRP–nascent chain–ribosome complex to the SRP receptor at the endoplasmic reticulum (ER) membrane, and transfer of the complex to the translocon, the signal interacts with the Sec61 translocon by inserting itself into the translocation pore, through a mechanism that is still not fully understood. Several translocation inhibitors, such as cotransin, decatransin, and apratoxin, have been shown to block this insertion step [9,10,11]. These inhibitors block access of the signal to the translocon and/or stabilize and lock Sec61 in its closed state.

Three classes of signal sequences can be distinguished, as illustrated in Fig. 2(①a–c). Classical cleavable signals (①a), initially discovered by Blobel and Dobberstein [12], insert into the translocon as a loop, placing the downstream sequence into the translocation pore. This was visualized in a cryo-electron microscopy structure of a ribosome-bound translocon containing a stalled nascent chain [6]. The C-terminal end of the signal is exposed to the ER lumen and cleaved by signal peptidase, generating a new N-terminus on the exoplasmic side. Signal-anchors insert in the same way, in an Ncyt/Cexo orientation (cytosolic N- and exoplasmic C-terminus), but are not cleaved and thus remain in the bilayer as a TM anchor (①b). They generally have a longer hydrophobic core than cleavable signals, allowing them to span the membrane comfortably, and they need not be located at the very N-terminus but may be preceded by a polypeptide of any length. Reverse signal-anchors insert in the opposite orientation and translocate their N-terminal end, anchoring the protein in the opposite Nexo/Ccyt orientation (①c).

Fig. 2

Three types of TM integration: signal, stop-transfer, and re-integration. Topogenic sequences integrating in an Ncyt/Cexo orientation are shown in red for cleaved signal and signal-anchor, and pink for re-integration. TM segments integrating in an Nexo/Ccyt orientation are shown in turquoise for reverse signal-anchor and blue for stop-transfer. Successive stop-transfer and re-integration TM segments result in multi-spanning proteins, illustrated here for 7–8 TM proteins. For example, approximately two thirds of the seven-TM G-protein coupled receptors use a cleavable signal (①a②③②③②③②) and the rest a reverse signal-anchor (①c③②③②③②) to initiate topogenesis, as shown schematically. Below, hairpin integration is shown as an example of two closely spaced TM segments that cannot integrate independently of each other (Color figure online)

As a polypeptide passes through the translocon, a hydrophobic TM segment stops further translocation as a so-called stop-transfer sequence by exiting the pore laterally into the lipid bilayer (Fig. 2(②)). The downstream sequence is then synthesized directly into the cytosol through a gap between the ribosome and the translocon. A subsequent TM segment, a re-integration sequence (③), again integrates into the membrane via the translocon in an Ncyt/Cexo orientation, inserting its immediate downstream sequence into the pore for translocation. This process resembles initial insertion by a signal-anchor, except that the translocon may already be in an open state and possibly still associated with the preceding stop-transfer sequence.

Three types of single-spanning membrane proteins can thus be produced: by a cleavable signal followed by a stop-transfer sequence (Nexo/Ccyt; type I or type Ia), by a signal-anchor (Ncyt/Cexo; type II), or by a reverse signal-anchor (Nexo/Ccyt; type III or type Ib). For multi-spanning (polytopic) membrane proteins, Blobel proposed as early as 1980 that they achieve their final topology through an additional succession of alternating stop-transfer and re-integration sequences (or, as he called them, “internal signal sequences”) [13].
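Blobel's model can be expressed as a small state machine: the first topogenic sequence fixes the side of the membrane to which the downstream chain is delivered, and each subsequent stop-transfer (②) or re-integration (③) sequence flips it. The sketch below encodes the circled-number codes of Fig. 2 as plain strings ("1a", "1b", "1c", "2", "3"); this encoding and the function name are hypothetical choices for illustration. It deliberately ignores the complications discussed later, such as hairpin insertion and topological frustration.

```python
CYT, EXO = "cyt", "exo"

# Hypothetical encoding of the topogenic elements of Fig. 2:
# code -> (side of the upstream flank, side of the downstream chain, remains as a TM?)
ELEMENTS = {
    "1a": (CYT, EXO, False),  # cleavable signal, removed by signal peptidase
    "1b": (CYT, EXO, True),   # signal-anchor
    "1c": (EXO, CYT, True),   # reverse signal-anchor
    "2": (EXO, CYT, True),    # stop-transfer
    "3": (CYT, EXO, True),    # re-integration
}


def topology(elements: list[str]) -> dict:
    """Apply strict alternation (Blobel's model) to an ordered list of topogenic elements.

    Returns the side of the mature N-terminus, the orientation of every TM
    segment, and the side of the C-terminus. Raises ValueError if an element
    would have to insert from the wrong side.
    """
    if not elements or elements[0] not in ("1a", "1b", "1c"):
        raise ValueError("topogenesis must start with a signal: 1a, 1b or 1c")
    upstream, side, is_tm = ELEMENTS[elements[0]]
    n_term = upstream if is_tm else side  # cleavage creates a new exoplasmic N-terminus
    tms = [f"N{upstream}/C{side}"] if is_tm else []
    for code in elements[1:]:
        if code not in ("2", "3"):
            raise ValueError(f"internal element must be 2 or 3, got {code!r}")
        upstream, downstream, _ = ELEMENTS[code]
        if upstream != side:
            raise ValueError(f"element {code} cannot insert: chain is currently on the {side} side")
        tms.append(f"N{upstream}/C{downstream}")
        side = downstream
    return {"N": n_term, "TMs": tms, "C": side}


if __name__ == "__main__":
    print(topology(["1a", "2"]))   # type I
    print(topology(["1b"]))        # type II
    print(topology(["1c"]))        # type III
    gpcr_signal = topology(["1a", "2", "3", "2", "3", "2", "3", "2"])
    gpcr_rsa = topology(["1c", "3", "2", "3", "2", "3", "2"])
    print(gpcr_signal == gpcr_rsa)  # both routes give the same seven-TM topology
```

The two GPCR examples follow the schemes in the Fig. 2 caption and end up with identical topologies (Nexo, seven TMs, Ccyt), while an out-of-register element such as a re-integration sequence directly after a signal-anchor is rejected.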

3 The Positive-Inside Rule and N-Terminal Folding Determine Signal Orientation

The signal sequence engages with the translocon to insert one of its flanking sequences into the translocation pore for transfer across the membrane. Cleavable signals and signal-anchors translocate their C-terminal end, whereas reverse signal-anchors translocate their N-terminal sequence. Cleavable signals typically carry positive charges in the n-region, which remains cytoplasmic [14]. More generally, arginines and lysines are statistically enriched in the cytoplasmic portions of membrane proteins; this positive-inside rule serves as a useful criterion in topology prediction [15, 16]. It also holds for the sequences flanking signal-anchors and reverse signal-anchors [17]. Mutation of flanking charges caused protein inversion in both directions (e.g. [18,19,20]), demonstrating a causal relationship between flanking charges and signal orientation. However, charge inversion was generally not sufficient to produce a uniform topology, so additional factors must also influence signal orientation at the translocon.
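As a topology-prediction heuristic, the positive-inside rule reduces to comparing the Lys/Arg content of the two sequences flanking a hydrophobic segment. A minimal sketch, assuming the TM boundaries are already known (for example from a hydropathy plot) and using a 15-residue flank window chosen here for illustration; the constructs are hypothetical. As the text stresses, charges are not the only determinant, so the function returns None when the flanks are equally charged rather than guessing.

```python
from typing import Optional


def count_positive(segment: str) -> int:
    """Number of lysine and arginine residues."""
    return sum(aa in "KR" for aa in segment.upper())


def predict_orientation(seq: str, tm_start: int, tm_end: int, flank: int = 15) -> Optional[str]:
    """Orientation of the hydrophobic segment seq[tm_start:tm_end] by the positive-inside rule.

    The more positively charged flank is predicted to stay in the cytosol.
    The free N-terminal amino group is ignored in this sketch.
    """
    n_flank = seq[max(0, tm_start - flank):tm_start]
    c_flank = seq[tm_end:tm_end + flank]
    n_pos, c_pos = count_positive(n_flank), count_positive(c_flank)
    if n_pos > c_pos:
        return "Ncyt/Cexo"  # signal-anchor-like
    if c_pos > n_pos:
        return "Nexo/Ccyt"  # reverse signal-anchor-like
    return None  # the charge rule alone is not decisive


if __name__ == "__main__":
    # Hypothetical constructs: the same 20-leucine core with swapped flanking charges.
    sa = "MSKRK" + "L" * 20 + "SDEGT" + "A" * 10
    rsa = "MSDEG" + "L" * 20 + "KRKGT" + "A" * 10
    print(predict_orientation(sa, 5, 25), predict_orientation(rsa, 5, 25))
```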

The available structures of the Sec61/SecY translocon confirm that the pore is too narrow for folded domains larger than individual helices to pass through. In cotranslational translocation, the nascent polypeptide is largely kept unfolded as it passes from the ribosome exit tunnel into the translocation pore. This is not the case for the sequence preceding the signal, which emerges from the ribosome into the cytosol before SRP-dependent membrane targeting. Rapid folding of N-terminal domains inhibits their translocation, overriding the positive-inside rule [21]. Most reverse signal-anchor proteins have rather short N-terminal domains, for example up to ~60 residues in the synaptotagmins, but there are exceptional Nexo/Ccyt single-spanning proteins with N-domains of more than 100 residues (e.g. 138 residues for the ectodysplasin A2 receptor [TNR27_HUMAN in uniprot.org] and 242 residues for pro-neuregulin-1 [NRG1_HUMAN]). Apparently, these sequences do not fold sufficiently in the cytoplasm to hinder their translocation, probably also because their disulfide bonds can form only in the ER lumen.

4 The Mechanism of Signal Orientation

The origin of the positive-inside rule in eukaryotes is not clear. There is no general membrane potential across the ER membrane that might contribute. An attractive mechanism is cytosolic retention of the more positive end of the signal by negative charges at or near the translocon, including the net negative charge of lipid headgroups. An alternative mechanism was discovered by testing artificial signal-anchors composed of increasingly hydrophobic oligo-leucine stretches: the more hydrophobic the core of the signal, the more N-translocation is favored, for N-terminal signals even overriding the charge rule [22, 23]. Testing different amino acids, either as homo-oligomers or as guest residues in an oligo-leucine host signal, confirmed that it is side chain hydrophobicity that promotes N-translocation [24].

How this might happen was suggested by the observation that the final topologies of model proteins with very hydrophobic N-terminal signals depended on the length of the protein. An N-terminal signal-anchor with a generic h-domain of 22 leucines inserted to a large extent in an Nexo/Ccyt orientation, despite a positively charged N-terminus. This fraction (i.e. N-translocation) was highest for a short protein with ~100 residues following the signal sequence and decreased with increasing length up to ~300 residues [25]. This result indicated that N-terminal signals initially insert with the N-terminus in the ER lumen and then invert orientation until protein synthesis is terminated or until the reorientation process ends, possibly due to lipid integration of the TM signal (Fig. 3a). Consistent with this interpretation, the fraction of C-translocated signal-anchors increased for each construct when translation was slowed with cycloheximide, giving more time before protein completion. Signal inversion was shown to be driven by N-terminal positive charge according to the positive-inside rule and to be inhibited by hydrophobicity (likely because hydrophobicity stabilizes the initial orientation bound to translocon and lipid). Most or all natural cleavable signals and signal-anchors are less hydrophobic and thus invert within seconds, long before translation is completed. While this model was originally derived from endpoint topology analysis of model proteins expressed in vivo in COS-1 cells [25], head-on insertion and inversion of a natural signal-anchor was corroborated more directly by in vitro translation/translocation into dog pancreas microsomes using arrested nascent chains and biochemical analysis [26].
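The length and cycloheximide dependence follow naturally from a kinetic picture. A minimal sketch, assuming that every N-terminal signal first inserts head-on (Nexo/Ccyt), inverts irreversibly with a single first-order rate constant, and stops inverting when synthesis ends; the rate constants are hypothetical, and the elongation rate of ~5 residues per second is an assumed round number. Real reorientation may be reversible and may also end earlier through lipid integration, which this sketch ignores.

```python
import math


def fraction_nexo(length_after_signal: int, elongation_rate: float, k_inv: float) -> float:
    """Fraction of chains still in the head-on Nexo/Ccyt orientation when synthesis ends.

    length_after_signal: residues synthesized after the signal (aa)
    elongation_rate:     translation speed (aa per second)
    k_inv:               first-order inversion rate constant (per second), hypothetical
    """
    time_s = length_after_signal / elongation_rate  # time available for inversion
    return math.exp(-k_inv * time_s)


if __name__ == "__main__":
    K_HYDROPHOBIC = 0.01  # hypothetical slow inversion of a very hydrophobic signal
    K_NATURAL = 1.0       # hypothetical fast inversion of a typical signal
    for length in (100, 200, 300):
        normal = fraction_nexo(length, 5.0, K_HYDROPHOBIC)
        slowed = fraction_nexo(length, 2.5, K_HYDROPHOBIC)  # e.g. translation slowed by cycloheximide
        print(f"{length} aa: Nexo {normal:.2f} (normal), {slowed:.2f} (slowed)")
    print(f"typical signal, 100 aa: Nexo {fraction_nexo(100, 5.0, K_NATURAL):.2g}")
```

In this picture, longer proteins and slower translation both allow more time for inversion, so the Nexo/Ccyt fraction falls; a fast inversion rate, standing in for a typical less hydrophobic signal, leaves essentially no Nexo/Ccyt product.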

Fig. 3

Mechanistic model of signal orientation. a N-terminal signals initially insert head-on in an Nexo/Ccyt orientation (a). Following the positive-inside rule, they either invert orientation to Ncyt/Cexo (b) to integrate as signal-anchors (c), or they retain the original orientation and integrate as reverse signal-anchors (d). Inversion is slowed down by high signal hydrophobicity. b Internal signals are prevented by their N-terminal sequence from inserting head-on and position themselves according to the positive-inside rule before pore opening and translocation. Destabilization of the translocon by prl mutations leads to integration before correct alignment and thus appears to weaken the charge rule

Not surprisingly, initial head-on insertion is possible only for N-terminal signals: with n-domains longer than 20 residues, time- and length-dependent reorientation of signals was no longer observed [27]. Internal signals therefore position themselves according to their flanking charges before they insert into the translocon and open it (Fig. 3b). This apparent difference in the integration mechanism of N-terminal and internal signals may explain the observed difference in sensitivity to the translocon inhibitor mycolactone, which blocks Ncyt/Cexo integration and inversion, but not Nexo/Ccyt integration [28].

What is the contribution of the Sec61 translocon to orienting signal sequences according to the positive-inside rule? Mutation of candidate charges in yeast Sec61p identified three residues (R67 and R74 in the plug on the lumenal side, and E382 on the cytoplasmic side) whose mutation weakens C-translocation of a diagnostic N-terminally positive signal and N-translocation of a C-terminally positive one [29]. A screen for Sec61p mutations that affect signal orientation, however, revealed a more complex situation [30]. Three classes of mutations with distinct effects on different substrates could be distinguished, one of which shared this phenotype of less efficient cytosolic retention of the positively charged flanking region. These mutations map to different positions in Sec61p and do not only involve charged residues. Almost all of them have a prl (protein localization) phenotype, first described in bacteria for suppressors of signal sequence mutations [31]. They appear to destabilize the closed state of the translocon, causing it to open more easily, even with very weak signals [32,33,34]. Rapid or “premature” pore opening, before the signal has had time to position itself properly, may thus appear to weaken the positive-inside rule [30]. However, a direct role for the positively charged residues of the plug in signal orientation cannot be excluded.

5 Topogenic Information in Multi-Spanning Proteins

Once the signal sequence has adopted its orientation, that of the downstream TM segments is defined only by their relative positions in the protein. The activity of stop-transfer and re-integration sequences appears not to be very specific. For example, at least five of the seven TM segments of rhodopsin (a reverse signal-anchor followed by three re-integration and three stop-transfer sequences) were able to function as signal sequences [35]. Conversely, chimeric proteins constructed from two to four identical copies of a signal-anchor integrated readily as two- to four-fold membrane-spanning proteins, respectively [36, 37]: the even-numbered signal-anchors inserted as stop-transfer sequences in an inverted orientation, against the charge rule.

While this is true for TM segments generously separated from each other in model proteins, it is not necessarily the case in natural multi-spanning membrane proteins, indicating that additional factors define topology. One might have expected that mutating a signal-anchor into a reverse signal-anchor would invert the entire protein. This was not observed when it was tested experimentally with the glucose transporter GLUT1. Instead, the second TM segment failed to integrate in a membrane-spanning manner, leaving the downstream topology unchanged [38]. Similarly, mutation of positive charge clusters in short cytoplasmic loops caused both neighboring TM segments to be translocated, while upstream and downstream of this disturbance the topology remained as in the wild-type protein [39]. Such topological “frustration” was also observed in artificial multi-spanning proteins with mismatched charge distribution in E. coli [40].

Of particular importance here is that TM segments in multi-spanning proteins are often very closely spaced. Competition between two conflicting topogenic elements, a cleaved signal and an internalized signal-anchor, was observed with intervening sequences of ≤ 60 residues [41]; the shorter the connecting peptide, the more extensively the topology was rearranged. In natural proteins, TM segments are frequently separated by only a few amino acids forming the turn of a helical hairpin. In such a situation, the first TM domain cannot possibly (re-)integrate into the membrane independently of the second. Rather, the two act together in a distinct hairpin insertion process (illustrated in Fig. 2, bottom). Indeed, helical transmembrane hairpins were shown to fold already in the exit vestibule of the ribosome [42]. In addition, pre-assembly of TM helices containing residues of opposite charge may reduce the energetic cost of their integration.

6 Membrane Integration of Stop-Transfer and Re-Integration Sequences

While it is self-evident that TM helices must be hydrophobic for membrane integration, a systematic analysis of the contribution of every amino acid to the process (the “molecular code for TM helix recognition by the Sec61 translocon”) was performed by von Heijne and coworkers for stop-transfer integration [43, 44]. Mildly hydrophobic, so-called H-segments were designed based on a 19-residue oligo-alanine sequence (Fig. 4a). By replacing increasing numbers of alanines with other amino acids, a large number of H-segments were generated and tested in a reporter protein for their ability to integrate into the bilayer and stop polypeptide transfer (Fig. 4b–d). The results suggested that membrane insertion is a thermodynamic equilibration between the lipid and the translocation pore. This makes it possible to calculate the apparent free energy contribution (ΔGapp; Fig. 4e) of each amino acid at any position in an H-segment to membrane insertion, and thus to predict transmembrane segments, at least for single TM domains, where the process is not complicated by specific interactions with neighboring TM helices (as analyzed in Ref. [45]).

Fig. 4

Membrane integration as thermodynamic equilibration. a H-Segments are based on a 19-alanine guest sequence in which one or more alanines are replaced by other amino acids. In a simple hydrophobicity series, 0–6 leucine replacements were made (L0 and L5 are shown). b Inserted in a model protein (here dipeptidylaminopeptidase B [50]), H-segments (in blue) will integrate into the membrane or be translocated, resulting in different glycosylation (Y). c Upon expression (here in yeast with 5-min [35S]methionine labeling), the glycosylation pattern of the products after SDS-gel electrophoresis and autoradiography reveals the extent of H-segment translocation (T, full glycosylation) and integration (I, partial glycosylation; U, unglycosylated). d Quantitation shows the hydrophobicity threshold for 50% integration to be slightly < 4 leucines. e Treating the results as apparent equilibrium constants Kapp = I/T, apparent free energies of integration ∆Gapp = − RT ln Kapp can be calculated. f, g Two model representations of the partitioning process of the H-segment. (b–d adapted from Ref. [50]) (Color figure online)
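The conversion in panel e, and the read-out of the 50% threshold in panel d, can be sketched as follows. Assumptions: I and T are band intensities or fractions (the unglycosylated U is left out, since Kapp is defined as I/T), the 30 °C default temperature is arbitrary and should match the experiment, and the threshold is estimated here by a straight-line least-squares fit of ΔGapp against leucine number, a choice made for illustration. No measured values are included.

```python
import math

R_KCAL = 1.987e-3  # gas constant in kcal/(mol*K)


def delta_g_app(integrated: float, translocated: float, temp_c: float = 30.0) -> float:
    """Apparent free energy of integration, dGapp = -RT ln(I/T), in kcal/mol.

    Negative values mean the H-segment integrates more often than it is translocated.
    """
    if integrated <= 0 or translocated <= 0:
        raise ValueError("I and T must both be positive to take the logarithm")
    k_app = integrated / translocated
    return -R_KCAL * (temp_c + 273.15) * math.log(k_app)


def threshold_from_fit(n_leucines: list[float], dg_values: list[float]) -> float:
    """Least-squares line of dGapp versus leucine number; return where it crosses 0 (50% integration)."""
    n = len(n_leucines)
    mean_x = sum(n_leucines) / n
    mean_y = sum(dg_values) / n
    sxx = sum((x - mean_x) ** 2 for x in n_leucines)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(n_leucines, dg_values))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return -intercept / slope
```

Fed with ΔGapp values from a leucine series, `threshold_from_fit` returns the (possibly fractional) number of leucines at which integration and translocation are equally likely, i.e. ΔGapp = 0.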

The result was a “biological hydrophobicity scale” of amino acids [43] that largely parallels biophysical scales, except that it appears compressed and shifted [46,47,48]. In part, this may be because equilibration occurs between lipid and translocon rather than between lipid and free solution (Fig. 4f). The translocon interior is a narrow space of low hydration, mainly because of its constriction ring of six mostly apolar residues forming a gasket. Mutating these residues to more polar ones indeed enhanced the integration of H-segments [49, 50], fully consistent with equilibration. Through the residues lining its pore, the translocon thus defines the hydrophobicity threshold for membrane integration.

In the simplest model, the nascent chain, as it moves through the pore at the speed of translation, has access to the lipid through the lateral gate. At an elongation rate of ~5 residues per second, each segment is in register with the membrane for a time on the order of ~1 s, during which equilibrium is reached. The probabilities of the two states then directly reflect those of integration and translocation (Fig. 4f). Yet an H-segment, particularly one of intermediate hydrophobicity, will find an energetic minimum in the translocon's gate, between the lipid and the partially water-filled pore and also vertically between the aqueous environments of ER lumen and cytosol (Fig. 4g). Indeed, TM segments have been found positioned in the gate in molecular dynamics simulations and cryo-electron microscopy structures (e.g. Refs. [7, 50]). Force measurements with constructs containing a SecM stalling sequence revealed two peaks in the force acting on the nascent chain, as the TM segment reached first into the pore and then into the lipid [51]. Integration versus translocation is determined by the relative rates of exit from the gate position into either the lipid or the lumen. As the polypeptide accumulating between ribosome and translocon grows longer, the exit rapidly becomes less reversible (Fig. 4g). The membrane acts as an entropic trap for the TM segment, and chaperone binding to the lumenally exposed sequence acts as a ratchet. Consistent with this view, the conformational properties of a sequence up to 100 residues downstream of the H-segment affected the hydrophobicity threshold of integration [52]. Integration was facilitated when the sequence was flexible and extended and inhibited when it was compact, reflecting the gain in entropy permitted by the downstream sequence. This result indicates that sequences do not autonomously define their integration behavior but are influenced by their sequence context.
Similarly, one might speculate that the folding and chaperone-binding properties of the upstream sequence influence the rate of exit into the ER lumen. This context dependence of membrane integration explains, at least in part, why the hydrophobicity thresholds determined with identical leucine-containing H-segments differed between model proteins and expression systems (Fig. 5).
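The two pictures in Fig. 4f and g make different but related predictions. A minimal sketch, assuming an ideal two-state partition for the equilibrium model and, for the gate model, simple competition between two irreversible first-order exits whose rate constants are hypothetical inputs; neither function is taken from the cited studies.

```python
import math

R_KCAL = 1.987e-3  # gas constant in kcal/(mol*K)


def p_integration_equilibrium(dg_app: float, temp_c: float = 30.0) -> float:
    """Two-state lipid/translocon partition (Fig. 4f): P = 1 / (1 + exp(dGapp / RT))."""
    rt = R_KCAL * (temp_c + 273.15)
    return 1.0 / (1.0 + math.exp(dg_app / rt))


def p_integration_kinetic(k_to_lipid: float, k_to_lumen: float) -> float:
    """Competing irreversible exits from the gate (Fig. 4g): P = k_lipid / (k_lipid + k_lumen)."""
    return k_to_lipid / (k_to_lipid + k_to_lumen)


if __name__ == "__main__":
    for dg in (-1.0, 0.0, 1.0):
        print(f"dGapp {dg:+.1f} kcal/mol -> P(integration) {p_integration_equilibrium(dg):.2f}")
    # Hypothetical rates: a segment leaving the gate three times faster into lipid than into the lumen
    print(f"kinetic model: {p_integration_kinetic(3.0, 1.0):.2f}")
```

The equilibrium expression is simply the inverse of ΔGapp = −RT ln(I/T), since P = I/(I + T). In the kinetic version, context effects such as those of the upstream or downstream sequence can only enter as changes in the two rate constants, which is one way to picture why the same H-segment behaves differently in different model proteins.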

Fig. 5

Dependence of ∆Gapp of H-segment integration on the system and model protein. ∆Gapp for stop-transfer integration is plotted against the number of leucines in the H-segments (as in Fig. 4a) analyzed in different expression systems and model proteins: in vitro—Lep-H [43]; BHK—Lep-H [43]; HeLa—H1-H [68]; COS—H1-H (our unpublished data); S.c.—DPAPB-H [49]; S.c.—CPY-H [52]; S.c.—SP-Lep-H [69]; E.c.—PCLep-H [70]; E.c.—LepLacY-H [71]. S.c., Saccharomyces cerevisiae; E.c., Escherichia coli. The model constructs used are shown schematically with signal and TM segments colored according to their function (as in Fig. 2): red for cleavable signals and signal-anchors, turquoise for reverse signal anchors, pink for re-integration sequences, and blue for the H-segments (potential stop-transfer sequences) with the GPGG···GGPG insulator sequences in yellow. The sequence lines are drawn in different grays to indicate their different origins. A scale is shown for polypeptide length in amino acids (aa) and total protein lengths are provided in parentheses. Black dots indicate glycosylation sites and arrowheads signal cleavage sites (Color figure online)

While the process of stop-transfer integration has been extensively studied, the principles of re-integration remain to be dissected. As reported by Lundin et al. [53], the molecular code for re-integration generally parallels that for stop-transfer integration, except that the overall hydrophobicity threshold is significantly lower. In a SecM pulling force experiment, Cymer et al. [54] found that a re-integration TM segment exerts a weaker pulling force on an arrested chain than a stop-transfer segment, confirming that the two engage with the translocon differently.

7 Accessory Factors Regulating Membrane Protein Biogenesis

The Sec61 translocon can associate with a complex of Sec62 and Sec63 (membrane proteins with two and three TM segments, respectively) and the peripherally attached Sec71 and Sec72 [55, 56]. The lumenal J-domain of Sec63 recruits the Hsp70-family chaperone BiP (binding immunoglobulin protein), which captures the translocating chain and thus provides vectoriality for post-translational translocation. Recent cryo-electron microscopy structures of the Sec61 translocon with the Sec63 complex show how the Sec61 channel is activated for post-translational protein translocation [57, 58]. However, Sec63 and BiP (Kar2p in yeast) were also found to be important for SRP-dependent, co-translational translocation [59,60,61]. Interestingly, the new structures show the Sec63 complex bound to Sec61 in a manner that blocks simultaneous binding of a ribosome. How this can be reconciled with the effects of the Sec62/63 complex in cotranslational translocation is not clear at present.

The Sec61 translocon is furthermore accompanied by several auxiliary proteins that assist the integration of subsets of membrane proteins by mostly unclear mechanisms. Translocating chain-associated membrane protein (TRAM) was initially shown to be required for the integration of signals with short n- and h-domains in a reconstituted in vitro system [62]. Recently, TRAM2 was found to ensure correct insertion of the reverse signal-anchor of TM4SF20, a four-TM protein that inhibits cleavage of the membrane-bound transcription factor CREB3L1 [63]. TRAM proteins contain a potential ceramide-binding domain. In the presence of ceramide or in the absence of TRAM2, the reverse signal-anchor of TM4SF20 inserts as an Ncyt/Cexo signal-anchor, in this case even inverting the topology of all three downstream TM domains.

Recently, the ER membrane protein complex (EMC) was found to contribute to membrane protein folding [64]. It was specifically shown to be required for insertion of the first TM domain (the reverse signal-anchor) of some G-protein coupled receptors [65]. In a systematic proteomic approach, the EMC was found to be involved in the topogenesis of multi-spanning proteins in general, acting as a membrane chaperone especially for mildly hydrophobic TM domains [66].

The contribution of different membrane lipids to protein integration has been poorly studied in eukaryotes, but evidence from bacteria demonstrates surprisingly strong effects on topology, particularly for the lactose permease LacY (see the accompanying review “The role of lipids in membrane protein biogenesis” by W. Dowhan). Lipid composition certainly differs greatly between some of the systems compared in Fig. 5 and might contribute to the observed differences. Furthermore, it seems likely that additional factors, substrate-specific or even general, will be discovered to contribute to protein integration into the membrane, a process that goes beyond the mere partitioning of hydrophobic surfaces into the bilayer.