Introduction 

A well-defined spatial structure is necessary for the functioning of the majority of proteins. Protein folding is a process that converts the disordered protein chain into a chain having a definite, unique three-dimensional (3D) structure. However, nowadays the term “protein folding problem” has two meanings: one emphasizing the process, the other the result. The former (sometimes called “the protein folding problem of the first order”) implies the answer to the question of how the protein chain can choose, in minutes, its unique structure among a giant number of others; the latter (sometimes called “the protein folding problem of the second order”) implies the answer to the question of what structure will be attained by a protein chain of a given amino acid sequence.

For a long time, these two problems were considered as one, assuming that once “how” was solved, “what” would be solved right away.

However, now it is clear that these are two different problems because they have been solved by two quite different methods.

The problem of “what” has been very recently solved by bioinformatics with the aid of neural networks (Fariselli et al. 2001) and artificial intelligence (Senior et al. 2019, 2020; Yang et al. 2020; Jumper et al. 2021); see some discussion of these works in Roney and Ovchinnikov (2022); and for a review of early works on protein structure prediction, see Finkelstein and Ptitsyn (2016), lectures 22, 23.

The topic of protein structure prediction (or recognition) from its amino acid sequence needs to be described and considered not here but separately. However, here it is appropriate to say that a pronounced success of the best of the latest protein structure prediction programs, AlphaFold2 (Jumper et al. 2021), is based on enormous databases of protein structures (Berman et al. 2003) and sequences (The UniProt Consortium 2021) collected during many decades.

The problem of how the protein chain can choose, in minutes, its unique spatial structure among a giant number of others has been solved by physics. The aim of this article is to outline the principal points of the solution.

The ability of proteins to fold spontaneously puzzled protein science for a long time (see, e.g., (Anfinsen and Scheraga 1975; Jackson 1998; Fersht 2000; Grantcharova et al. 2001; Robson and Vaithilingam 2008; Dill and MacCallum 2012; Wang et al. 2012; Wolynes 2015; Finkelstein and Ptitsyn 2016; Finkelstein 2018)).

As is known, in living cells, gene-encoded protein chains are synthesized by special molecular machines called ribosomes. Most protein chains, though not all of them (see Uversky and Finkelstein (2019) and references therein), have to acquire their unique (“native,” working) three-dimensional structures to perform their unique biological functions.

This phenomenon is called “protein folding.”

Its importance for protein functioning was recognized in the 1950s (Anfinsen 1959), followed by the finding that protein folding can occur not only in vivo but also in vitro (Anfinsen et al. 1961). Although those early in vitro studies that have shown that proteins can reversibly refold from a denatured, disordered state were focused mainly on small proteins, recent experiments using mass spectrometry-based proteomics (To et al. 2021) have demonstrated that nearly two-thirds of soluble bacterial proteins are refoldable in vitro under physiological conditions. Still, many proteins, especially those with large complicated multi-domain structures, aggregation-prone and non-soluble proteins are not refoldable in vitro under physiological conditions (To et al. 2021); see also Sorokina et al. (2022); besides, some proteins (we will not consider them here) are “intrinsically disordered” (Wright and Dyson 1999; Uversky 2002; Tompa 2005; Uversky and Finkelstein 2019) — they start their work, not yet having a well-folded structure, that they cannot acquire per se either in vivo or (under physiological conditions) in vitro, but usually become well-folded when interacting with other molecules.

Therefore, this review is limited to the folding of single-domain proteins and separate protein domains made of single chains; and here, we virtually do not consider folding of multi-domain proteins (facilitated, of course, by folding of separate domains) and complications of folding associated with interactions of a protein with other proteins (including chaperones), protein aggregation, amyloid formation, etc.

Experimental studies of protein folding

Since it is rather difficult to trace a change in the structure of a nascent protein chain against the background of many other molecules in a living cell, the investigation of protein folding started with in vitro experiments on the folding of water-soluble molecules of globular proteins, see Finkelstein and Ptitsyn (2016), lectures 19, 20.

However, it makes sense to begin this paper with a short overview of comparatively recent results on the folding that occurs in the course of protein biosynthesis on ribosomes.

The first studies were carried out on large (mostly multi-domain) proteins. They showed that these start to fold before their biosynthesis has been completed: the first synthesized (N-terminal) immunoglobulin domain folds when the whole chain has not been synthesized yet (Isenman et al. 1979); the luciferase protein starts to work immediately upon completion of the chain biosynthesis, so that it has no time to fold after the biosynthesis and should fold cotranslationally (Kolb et al. 1994); and the globin chain can bind to heme when a bit more than half of the chain has been synthesized by the ribosome (Komar et al. 1997), though it is hard to say whether structuring of this half-made chain occurred before the heme binding or resulted from it. In any case, these data suggest that protein chain folding in vivo starts already on the ribosome (“cotranslationally”) and that this cotranslational process may differ from the in vitro folding (“renaturation”) of entire protein chains discussed below.

More recent experiments on cotranslational structure acquisition by small (≈70 residues) nascent proteins (monitored by 15N, 13C NMR, and FRET) showed that “polypeptides [at a ribosome] remain unstructured during elongation but fold into a compact, native-like structure when the entire sequence is available” (Eichmann et al. 2010); “… folding [occurs] immediately after the emergence of the full domain sequence” (Han et al. 2012); “… cotranslational folding … proceeds through a compact, non-native conformation [i.e., something molten globule-like] … [and] rearranges into a native-like structure immediately after the full domain sequence has emerged from the ribosome” (Holtkamp et al. 2015). Thus, the latter case shows that a protein can fold cotranslationally outside the ribosome exit tunnel (and then it meets nearly the same problems as a protein renaturing in vitro).

Further experiments using optical tweezers, single-molecule real-time FRET, cryo-EM, and pulling force-profile analysis allowed a more detailed study of cotranslational folding. It has been shown that a small (≈30 residues) protein, a zinc-finger domain, can fold deep inside the vestibule of the ribosome exit tunnel (Nilsson et al. 2015; Wruck et al. 2021), and that α-helices, these “one-dimensional” details of the protein structure, can fold sequentially inside and at the vestibule of the ribosomal tunnel. The observed folded or partially folded structures of a nascent α-helical domain of spectrin show that it may fold there via a pathway different from that of the isolated domain (Nilsson et al. 2017), but with the same result. On the other hand, the principal features of the folding pathway of a larger (≈100 residues) β-structural Ig domain have been found to remain conserved on and off the ribosome (Tian et al. 2018), while folding of another protein, having a β-barrel shape, demonstrates a switch from the initial dynamic α-helical to the β-strand conformation during the cotranslational folding (Agirrezabala et al. 2022).

Thus, as shown, there may be no fundamental difference between the in vivo (on the ribosome) and in vitro (out of the ribosome) folding, at least for small proteins, though some details of the on-ribosome and in vitro folding pathways can differ. In both cases, native structures, at least for small proteins, emerge only when the entire sequence of a stable protein domain has been synthesized (in this connection, it should be noted that slightly truncated protein chains lose stability of their native folds, do not refold, and remain compact but disordered in vitro (Flanagan et al. 1992)).

The discovery of chaperones, the cell’s troubleshooters (Ellis and Hartl 1999), re-aroused suggestions that the protein folding processes in vivo and in vitro may be quite different because chaperones may have a foldase/unfoldase activity (see, e.g., Libich et al. (2015) and references therein). However, the analysis of data presented in Libich et al. (2015) reveals that the most studied chaperone (GroEL) does not speed up the overall folding process (Marchenko et al. 2015): GroEL accelerates transitions between the unfolded and folded GroEL-bound states of the target protein (Libich et al. 2015; Thirumalai et al. 2020), but not its overall folding. Moreover, when the concentration of the target protein is low so that it does not aggregate, a redundant concentration of GroEL slows down the folding of this protein (Marchenkov et al. 2004). This corroborates the conclusion that GroEL serves as an auxiliary transient trap that simply binds the excess of unfolded protein chains, thus preventing them from irreversible aggregation (Marchenkov et al. 2004; Marchenko et al. 2009).

One can conclude that the self-organization of structures of separate proteins (as observed in the in vitro folding of water-soluble globular proteins unassisted by other biomolecules) captures the main peculiarities of the protein folding phenomenon. This means that all the information necessary to build up the 3D structure of a protein is inscribed in its amino acid sequence (this was Anfinsen’s “thermodynamic hypothesis”).

Thus, the studies of self-organization have shown that an unfolded protein chain can spontaneously, “by itself,” fold into its unique native 3D structure (Anfinsen et al. 1961; Anfinsen 1973). In Anfinsen’s experiment, the enzyme ribonuclease A stayed unfolded in the presence of urea and a thiol reagent, and with these agents removed, it spontaneously refolded, recovering its structure (as shown by correct restoration of all four S–S bonds) and function. However, as it has been recently found by David Eisenberg (2018), “essentially the same experiment had been performed earlier by a medical student [Lisa A. Steiner, later MIT professor] at Yale, but neither [she nor] her research supervisor nor her department chair thought it particularly significant, and her work was not published.” “Why did this transformative result lay hidden in her thesis?” asked Eisenberg, and answered: “She had the answer to a hugely important question, but that question had not yet been posed” because then (in the mid-1950s) it had not yet been elucidated “how biological information passes from the genome to proteins”…

The protein folding problem

In the course of self-organization, the protein chain has to find its native (and seemingly, according to Anfinsen’s “thermodynamic hypothesis,” the most stable) fold among zillions of other alternatives (Fig. 1) within only minutes or seconds given by a cell life for its folding.

Fig. 1

Levinthal’s choice problem. The choice of the native structure can be determined either by a somehow restricted folding process (Levinthal’s “kinetic hypothesis”) or by the enhanced native fold stability (Anfinsen’s “thermodynamic hypothesis”)

The number of alternatives is vast indeed (Levinthal 1968, 1969): it is at least \(2^{100}\) but more likely \(3^{100}\) or even \(10^{100}\) (or \(100^{100}\)) for a 100-residue chain, because at least 2 (“right” and “wrong”) but more likely 3 (α, β, “coil”) or ≈10 (Privalov’s (1979) experimental estimate), or even 100 (Levinthal 1969) conformations are possible for each amino acid residue.

Since the chain cannot pass from one conformation to another faster than within a picosecond (the time of a thermal vibration), the exhaustive search would take at least ~ \(2^{100}\) ps (but more likely \(3^{100}\), or even \(10^{100}\), or \(100^{100}\) ps), that is, ~ \(10^{10}\) (or \(10^{25}\), or even \(10^{80}\), or \(10^{180}\)) years (Levinthal 1969). And it looks like the sampling should be exhaustive because the protein “feels” that it has attained the stable structure only when hitting it precisely, since even a 1 Å deviation can strongly increase the chain energy in the closely packed globule.
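The arithmetic behind this estimate can be checked with a minimal sketch. It assumes, as the text does, one conformational step per picosecond and a 100-residue chain; the function name and the constants are introduced here for illustration only. The figures quoted in the text are rounded orders of magnitude, so a direct calculation may differ from them by a few powers of ten in some cases.

```python
import math

SECONDS_PER_YEAR = 3.156e7  # approximate length of a year, in seconds
STEP_SECONDS = 1e-12        # one conformational step per picosecond (assumption from the text)


def log10_search_years(states_per_residue: float, n_residues: int = 100) -> float:
    """Return log10 of the exhaustive-search time, in years.

    Logarithms are used so that very large conformation counts stay easy to handle.
    """
    log10_conformations = n_residues * math.log10(states_per_residue)
    log10_seconds = log10_conformations + math.log10(STEP_SECONDS)
    return log10_seconds - math.log10(SECONDS_PER_YEAR)


if __name__ == "__main__":
    for q in (2, 3, 10, 100):
        exponent = log10_search_years(q)
        print(f"{q} conformations per residue: about 10^{exponent:.1f} years")
```

Whatever number of conformations per residue is assumed, the result exceeds any biologically meaningful time scale by many orders of magnitude.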

The main protein folding puzzle is why the native protein structure is found within minutes rather than within “Levinthal’s” ~ \(10^{10}\) or more years (that is, within ~ \(10^{18}\) or more minutes)! This reduction of the folding time by a factor of 1 000 000 000 000 000 000 (!) (compared to iterating over all structures) must always be kept in mind, without being distracted by dead-end considerations that promise, say, a 1000- or even 1 000 000-fold acceleration of the process.

How can the protein chain choose, in minutes, its native structure among a giant number of others, asked Levinthal (1968; 1969) who first noticed this paradox, and answered: It seems that the protein folding follows some specifically restricted fast pathway, and the native fold is simply the end of this pathway, no matter if it is the most stable chain fold or not (this was Levinthal’s “kinetic hypothesis”). In other words, Levinthal suggested that the native protein structure is determined by kinetics rather than stability and corresponds to the easily accessible local free energy minimum rather than the global one.

However, both numerous experiments that demonstrate reversibility of protein folding and unfolding in vitro and computer experiments with folding and unfolding of lattice models of protein chains strongly suggest that the chains fold to their most stable structure, i.e., that the “native protein structure” is the lowest-energy one, and the protein folding (at least for not very long chains) is under thermodynamic rather than kinetic control (Šali et al. 1994; Abkevich et al. 1994).

Nevertheless, most of the proposed and widely discussed hypotheses on protein folding were based on the “kinetic control” (rather than “thermodynamic control”) assumption.

In particular, before Levinthal, Phillips (1966) proposed that the protein folding nucleus is formed near the first-synthesized N-end of the nascent protein chain and the remaining chain wraps around it; but it has been shown later that the successful in vitro folding of many single-domain proteins and protein domains does not begin from the N-terminus (Goldenberg and Creighton 1983; Grantcharova et al. 1998; Lappalainen et al. 2008).

Wetlaufer (1973) hypothesized the formation of the folding nucleus by adjacent residues of the protein chain, but in vitro experiments have shown that this is not always so (Fulton et al. 1999; Wensley et al. 2009).

Ptitsyn (1973) proposed a model of hierarchical folding, i.e., a stepwise involvement of different interactions and the formation of different folding intermediate states. However, many not very long protein chains fold without visible folding intermediates (Fersht 1999).

More recently, various “folding funnel” models (Leopold et al. 1992; Wolynes et al. 1995; Dill and Chan 1997; Bicout and Szabo 2000; Dill et al. 2008; Wang et al. 2012) became very popular for illustrating and describing the reason for the speedy folding processes. This issue will be considered below in more detail.

The difficulty of the “kinetics vs stability” problem is that it can hardly be solved by a direct experiment. Indeed, suppose that a protein has some structure that is more stable than the native one (later we will demonstrate one of the extremely few examples of this kind; it has been found for a rather long protein chain). How can we find the most stable but kinetically unattainable structure if the protein chain does not do so itself? Shall we wait for ~ \(10^{10}\) (or even ~ \(10^{180}\)) years?

On the other hand, the question as to whether the protein structure is controlled by kinetics or stability arises again and again in solving practical problems of protein physics, engineering, and design. For example, when predicting the protein structure from its sequence, should we look for the most stable structure or the most rapidly attained one? When designing a de novo protein, should we maximize the stability of the desired fold or create a rapid pathway to this fold?

However, is there a contradiction between “the most stable” and the “rapidly folding” structure? Maybe, the stable structure automatically forms a focus for the “rapid” folding pathways, and therefore it is automatically capable of fast-folding?

The major thermodynamic peculiarities of protein folding

Before considering these questions, i.e., before considering the kinetic aspects of protein folding, let us recall some basic experimental facts concerning protein thermodynamics (as usual, we shall only consider single-domain water-soluble globular proteins formed by chains of ~ 100 residues; and we will consider only those experiments in which individual proteins interact only with the solvent). These facts will help us understand what chains and folding conditions we have to consider. The facts are as follows:

  1. Nearly all observations show that native states of single-domain water-soluble globular proteins behave as the lowest-energy folds (Tanford 1968; Privalov 1979; Fersht 1999), i.e., they stay in this fold forever and also come to the same fold after a de- and renaturation cycle induced by a change of solvent. However, it should be mentioned that there is at least one exception: a large (≈ 400 residues) protein, serpin, at first obtains the “native” (that is, “working”) structure, works for half an hour, and then acquires another, non-working but more stable structure (Tsutsui et al. 2012).

  2. The denatured state of proteins, at least that of small proteins treated with a strong denaturant, is usually an unfolded random coil (while the temperature-denatured state can be a compact molten globule) (Tanford 1968; Ptitsyn 1995).

  3. Protein unfolding is reversible (Anfinsen 1973); moreover, the denatured and native states of a protein can be in a kinetic equilibrium (Creighton 1978); there is an “all-or-none” transition between these two states (Privalov 1979). The latter means that, close to the point of the folding-unfolding equilibrium, only two states of the protein molecule, native and unfolded, are present in a visible quantity, while all others, “semi-native” or misfolded states, are virtually absent. (Notes: (i) the “all-or-none” transition makes the protein function reliable: like a light bulb, the protein either works or not; (ii) very important: the physical theory shows that such a transition requires an amino acid sequence that provides a large “energy gap” between the most stable protein structure and the bulk of misfolded and unfolded ones (Shakhnovich and Gutin 1990; Gutin and Shakhnovich 1993; Šali et al. 1994; Galzitskaya and Finkelstein 1995; Shakhnovich 2006; Finkelstein and Ptitsyn 2016)).

  4. Even under normal physiological conditions, only a few kilocalories per mole (Privalov 1979) separate the native (i.e., the lowest-energy) state of a protein from its unfolded (i.e., the high-entropy) state (and at mid-transition, these two states have equal stabilities, of course).

(For the below theoretical analysis, it is essential to note that (i) as is customary in the literature on this subject, the term “entropy” as applied to protein folding only means conformational entropy of the chain without solvent entropy; (ii) accordingly, the term “energy” actually implies “free energy of interactions” (often called the “mean force potential”), so that hydrophobic and other solvent-mediated forces, with all their solvent entropy (Tanford 1968), come within “energy”. This terminology is commonly used (and will be used in this paper) to concentrate attention on the main problem of sampling the protein chain conformations.)
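To see what a stability of “only a few kilocalories per mole” means for the populations of the two states, one can use the standard two-state Boltzmann relation. The sketch below is an illustration under that two-state assumption; the function name, the default temperature, and the sample values are hypothetical and are not taken from the cited works.

```python
import math

R_KCAL = 1.987e-3  # gas constant, kcal/(mol*K)


def native_fraction(delta_f_unfold_kcal: float, temperature_k: float = 298.0) -> float:
    """Fraction of molecules in the native state for a two-state N <-> U equilibrium.

    delta_f_unfold_kcal = F_U - F_N in kcal/mol (positive when the native state is more stable).
    """
    ratio_u_to_n = math.exp(-delta_f_unfold_kcal / (R_KCAL * temperature_k))
    return 1.0 / (1.0 + ratio_u_to_n)


if __name__ == "__main__":
    # Hypothetical stabilities, chosen only to illustrate the trend.
    for delta_f in (0.0, 1.0, 3.0, 5.0):
        print(f"F_U - F_N = {delta_f:.1f} kcal/mol -> native fraction {native_fraction(delta_f):.4f}")
```

At zero free-energy difference the two states are equally populated, which corresponds to the mid-transition point; a difference of a few kilocalories per mole is already enough to make the native state strongly dominant.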

The above-mentioned “all-or-none” transition means that native (N) and denatured (U) states are separated by a rather high free-energy barrier. It is the height of this barrier that limits the rate of this transition, and just this height is to be estimated to solve Levinthal’s paradox.

The major kinetic peculiarities of protein folding

The “kinetic control” hypothesis initiated very intensive studies of protein folding intermediates.

It was clear almost from the very beginning that the stable intermediates are not obligatory for folding, since the protein can also fold and unfold near the mid-point of equilibrium between the native and denatured states (Fig. 2) (Segava and Sugihara 1984; Fersht 1999), where the transition is of the “all-or-none” type (Privalov 1979), which excludes any stable intermediates.

Fig. 2

The rate (k) of lysozyme re- and denaturation vs temperature. The mid-transition point corresponds to ambient conditions where the rates of renaturation (U → N transition) and denaturation (N → U transition) are equal (i.e., where the blue and red lines intersect), so that the U and N states have equal free energies (\(F_{\mathrm{U}}=F_{\mathrm{N}}\)). The plot is adapted from Segava and Sugihara (1984). Note that the folding at physiological temperatures of ≈40 °C is only about fivefold faster than that at the mid-transition point (≈50 °C). The similar in value but opposite in sign slopes of the U → N and N → U lines indicate that the transition state energy \(E^{\#}\) is close to the value intermediate between those for the native and denatured states (since, according to Arrhenius, \(k_{\mathrm{B}}T^{2}\,\frac{d\ln k_{\mathrm{U}\to\mathrm{N}}}{dT}=E^{\#}-E_{\mathrm{U}}\) and \(k_{\mathrm{B}}T^{2}\,\frac{d\ln k_{\mathrm{N}\to\mathrm{U}}}{dT}=E^{\#}-E_{\mathrm{N}}\) (Pauling 1970))

The obtained basic experimental facts on folding kinetics of globular proteins are as follows:

  1. The protein “folding unit” is either a whole compact globular protein or a domain (compact sub-globule), if the protein includes several such sub-globules. This has been shown by two groups of evidence: (i) isolated domains, separated from the remaining protein body, are usually capable of folding into the correct structure (Petsko and Ringe 2004); (ii) single-domain proteins usually cannot fold when as few as 10 of their C- (or N-) terminal amino acid residues are deleted (Flanagan et al. 1992; Neira and Fersht 1999a,b).

  2. Folding of some proteins proceeds as a two-state (“all-or-none”) process without any accumulating intermediates (when only two states, the native fold and the coil, are observable (Matouschek et al. 1990; Fersht 1999)), whereas the folding of other single-domain proteins, mostly larger ones (and especially when the folding occurs far from the equilibrium mid-point), exhibits multi-state kinetics where molten and/or pre-molten globules serve as the folding intermediates (Dolgikh et al. 1984; Ptitsyn 1995; Fersht 1999).

  3. When the folding process proceeds via folding intermediates, the rate-limiting step immediately precedes the native state formation and corresponds to the transition from the molten globule (often rather dense) to the native structure (Dolgikh et al. 1984).

Understanding of the protein folding times

To begin with, it is worth considering whether “Levinthal’s paradox” is a paradox indeed. Bryngelson and Wolynes (1989) mentioned that this “paradox” is based on an absolutely flat (and therefore unrealistic) “golf course” model of the protein potential energy landscape (Fig. 3a), and somewhat later Leopold et al. (1992), following the line of Go and Abe (1981), considered more realistic (tilted and biased to the protein native structure) energy landscapes and introduced the “folding funnels” (Fig. 3b), which seemingly (but not actually, see below) eliminate “Levinthal’s paradox”.

Fig. 3

Schematic illustration of basic models of the energy landscapes of protein chains. (a) The “Levinthal’s golf course model.” (b) The “energy funnel” model; the funnel is centered in the lowest-energy (“native”) structure. (c) The potential energy landscape of a protein chain in more detail with bumps and wells, the deepest of which (“native”) is by many \(k_{\mathrm{B}}T_{\mathrm{melt}}\) (where \(k_{\mathrm{B}}\) is Boltzmann’s constant, and \(T_{\mathrm{melt}}\) is protein melting temperature) deeper than the others: the resulting energy gap between the global and other energy minima is necessary to provide the “all-or-none” type of decay of the stable protein structure (Shakhnovich and Gutin 1990). Only two coordinates (\(q_{1}\) and \(q_{2}\)) can be shown in the figures, while the protein chain conformation is determined by hundreds of coordinates

Various “folding funnel” models became popular for explaining and illustrating protein folding (Wolynes et al. 1995; Karplus 1997; Nölting 2010; Wolynes 2015). In the funnel, the lowest-energy structure (formed by a set of the most powerful interactions) is the center surrounded by higher-energy structures containing only a part of these powerful interactions. The “energy funnels” may appear not perfectly smooth due to some “frustrations” (Bryngelson and Wolynes 1987), i.e., contradictions between optimal interactions for different links of a heteropolymer forming the protein globule, but a stable protein structure is distinguished by minimal frustrations (that is, most of its elements enhance the native fold stability) (Bryngelson and Wolynes 1987, 1989; Bryngelson et al. 1995; Finkelstein et al. 1995).

In principle, the “energy funnel” can channel the protein chain movement towards the single lowest-energy structure, thereby automatically turning this most stable structure of the chain into the focus of the “rapid” folding pathways, which seems (but only seems, see below) to be able to prevent the “Levinthal’s” sampling of the vast majority of chain conformations.

However, this would be so provided there were only energy and no entropy, which (if the temperature is > 0 K) opposes the chain movement towards the single structure, even though corresponding to the global energy minimum.

But protein folding occurs in liquid water, at temperatures ≳ 273 K, where the entropy term is large; moreover, when the folding occurs close to the mid-transition conditions (Fig. 2), the entropy term nearly compensates the folding energy.
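In the terminology adopted above (where “energy” \(E\) includes the solvent-mediated interactions and “entropy” \(S\) is the conformational entropy of the chain), this compensation can be written as a short worked relation. The symbols \(E_{\mathrm{N}}, E_{\mathrm{U}}, S_{\mathrm{N}}, S_{\mathrm{U}}\) are introduced here only for illustration; the free energies of the native and unfolded states are

$$F_{\mathrm{N}}=E_{\mathrm{N}}-TS_{\mathrm{N}},\qquad F_{\mathrm{U}}=E_{\mathrm{U}}-TS_{\mathrm{U}},$$

and at the mid-transition point, where the two states are equally stable,

$$F_{\mathrm{U}}=F_{\mathrm{N}}\;\Rightarrow\;E_{\mathrm{U}}-E_{\mathrm{N}}=T\left(S_{\mathrm{U}}-S_{\mathrm{N}}\right).$$

That is, the energy gained upon folding is exactly balanced by the loss of conformational entropy multiplied by temperature, so that energy alone cannot drive the chain to the native structure.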

Mid-transition conditions are the best to analyze Levinthal’s paradox (though under the “strongly folding” conditions the folding can be, say, 10- (Segava and Sugihara 1984) or even 1000-fold faster (Kiefhaber 1995; Fersht 1999) than at the mid-transition – but these 10 or 1000 times are incomparable with the puzzling 1 000 000 000 000 000 000-fold acceleration of the folding process compared to iterating over all structures).

In conditions corresponding to the mid-transition, the protein chain has two equally stable low-free-energy thermodynamic states (phases): “denatured” and “native.” The latter includes the native structure (corresponding to the global free-energy minimum) and small fluctuations around it. The denatured state includes a multitude of the random coil-like conformations, molten-globule-like, “semi-native,” and “misfolded” structures. The physical theory (Shakhnovich and Gutin 1990; Gutin and Shakhnovich 1993; Šali et al. 1994; Finkelstein and Ptitsyn 2016) shows that the co-existence of these two phases requires an amino acid sequence that provides a large “energy gap” (Fig. 3c) between the most stable (native) fold and the misfolded structures. It is this energy gap (present in 1 of approximately \(10^{11}\) random polypeptide chains, see Finkelstein and Ptitsyn (2016), lectures 16, 18 and appendix D, and Keefe and Szostak (2001)) that makes the protein fold unique and stable and keeps all misfolded structures very unstable. This allows neglecting misfolded structures when considering protein folding in conditions corresponding to the mid-transition (Finkelstein and Ptitsyn 2016).

The denatured and native states (phases) are separated by a free-energy barrier that provides the all-or-none phase transition between them (Privalov 1979), thus making the energy landscape acquire the “volcano-like” shape (Rollins and Dill 2014), where the funnel only remains in its center (Fig. 4).

Fig. 4

This purely illustrative drawing shows how entropy converts the energy funnel (Fig. 3b) into a “volcano-shaped” free-energy folding landscape with a barrier on any pathway leading from unfolded conformations to the native fold. The smooth free-energy landscape corresponds to compact partly folded structures; the rocks (denoted by dotted lines) present high-energy structures that are non-compact or contain high-energy bumps (see Fig. 3c). A more accurate but less beautiful scheme of the free-energy landscape is shown in Fig. 2 in Galzitskaya and Finkelstein (1999)

Thus, any pathway from the unfolded state to the native one first goes uphill in free energy, and only then, in the vicinity of the native state, after passing the free-energy barrier (i.e., the crater edge), the “free-energy funnel” starts working and pulls the chain downhill to the native state. Note that if there were only a funnel and no barrier, then even a very large protein would fold not in minutes but in microseconds (since the time of conformational rearrangement of one residue is in the nanosecond time range (Zana 1975)).

However, to have a rapid transition from the coil to the native state, the free-energy barrier created by the volcano must be not too high: according to the conventional transition state theory (Eyring 1935; Pauling 1970; Emanuel and Knorre 1984), the time of overcoming the barrier is estimated as

$$\mathrm{TIME}\approx\tau\times\exp\left(+\Delta F^{\#}/k_{\mathrm{B}}T\right)\tag{1}$$

where τ is the time of a step from the barrier onwards, and ∆F# is the height of the free energy barrier on the reaction pathway (that is, the free energy of the “folding nucleus”).
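How strongly the estimate of Eq. (1) depends on the barrier height can be seen from a minimal numerical sketch. It assumes a step time τ of about 1 ns (taken here from the nanosecond time range of a single-residue rearrangement mentioned above) and expresses the barrier in units of kBT; the function name and the sample barrier heights are hypothetical.

```python
import math


def barrier_crossing_time(tau_seconds: float, barrier_in_kT: float) -> float:
    """Eq. (1): TIME ~ tau * exp(+dF#/kBT), with the barrier dF# given in units of kBT."""
    return tau_seconds * math.exp(barrier_in_kT)


if __name__ == "__main__":
    TAU = 1e-9  # seconds; assumed time of one step at the top of the barrier
    # Hypothetical barrier heights, chosen only to illustrate the exponential dependence.
    for barrier in (0, 10, 20, 30, 40):
        print(f"dF# = {barrier:2d} kBT -> TIME ~ {barrier_crossing_time(TAU, barrier):.3g} s")
```

With these assumptions, a barrier of about 20 kBT corresponds to a crossing time of a fraction of a second, while each additional 10 kBT lengthens it by more than four orders of magnitude; this is why the barrier height, rather than the pre-exponential factor, dominates the estimate.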

It should be noted that protein folding is a multistep process (see Finkelstein and Ptitsyn (2016), lecture 19 and references therein), and that the conventional transition state theory is not very accurate when applied to multistep processes, including the protein folding (which is an intramolecular “all-or-none” phase transition (Privalov and Khechinashvili 1974)) and phase transitions in general (Djikaev and Ruckenstein 2016; Ruckenstein and Berim 2016). However, the error in this case only concerns the estimate of the pre-exponential factor (τ in Eq. (1)), being mainly the error in the estimate of the number of steps at the top of the barrier (Finkelstein 2015; Ruckenstein and Berim 2016), which is not too large in the case of protein folding. Therefore, the uncertainty in the pre-exponential factor is of secondary importance compared to the main, exponential term in Eq. (1), which accounts for the transition state free energy and can be enormous for a high barrier.

The energy funnel helps fast folding but does not guarantee that the whole process will be really fast. It is the height of the barrier (which is before the funnel) that determines the protein folding (and unfolding) rate. The energy funnel per se cannot resolve Levinthal’s paradox, because not every type of energy funnel provides a low free-energy barrier at the edge of the volcano crater. A strict analysis (Bogatyreva and Finkelstein 2001) of the straightforwardly presented funnel models (Zwanzig et al. 1992; Bicout and Szabo 2000) corresponding to the uniform condensation of the chain (previously considered by Shakhnovich and Finkelstein (1989)) shows that close to the mid-transition point, such funnels cannot simultaneously explain both major features observed in protein folding: (i) the “all-or-none” type of transition, which requires the free-energy barrier; and (ii) the non-astronomical folding time. Likewise, the stepwise folding mechanism (Ptitsyn 1973) cannot (Finkelstein 2002) simultaneously explain both of these major features close to the mid-transition point, and hence also cannot resolve Levinthal’s paradox.

Resolution of Levinthal’s paradox requires funnels of a special type: those provided by a transient separation of folded and unfolded phases within the folding chain (Finkelstein and Badretdinov 1997a, b). As subsequently mentioned in a review by Wolynes (1997), this resembles the “capillarity” theory of nucleation in first-order phase transitions; the transient separation of the folded and unfolded phases in the course of protein folding was later demonstrated in computer simulations by Shaw et al. (2010).

It is not easy to find a good protein folding pathway theoretically. It is much easier to figure out what a good (low-free-energy) unfolding pathway should look like. The compactness of protein globules suggests the existence of surface tension, which results from the free-energy excess at the surface of the globule. Thus, a low-free-energy pathway of the unfolding of the globule to the coil should proceed via the least unstable partly folded structures consisting of two phases (native and unfolded) separated by a relatively small boundary: the globule’s cross section that separates the remaining dense, compact part of the globule, and the unfolded loops and tails protruding from it (Fig. 5) (Finkelstein and Badretdinov 1997a, b; Galzitskaya and Finkelstein 1999; Garbuzynskiy et al. 2013).

Fig. 5

Schematic illustration of a sequential folding/unfolding pathway of a globule through compact partly folded intermediate structures. At each step of sequential unfolding, one residue leaves the native-like part of the globule (shaded) and turns into a coil (shown by a dashed line); the sequential folding follows the same pathway in the opposite direction. The highest-free-energy intermediate structure (i.e., the folding nucleus corresponding to the transition state; marked as #) has the largest (in the pathway) interface of the globular and unfolded phases. Its globular part covers about half of the chain. Adapted from (Finkelstein and Badretdinov 1997a, b)

This good pathway of unfolding, when followed in the opposite direction, presents a good pathway of folding (Finkelstein et al. 2017). The reason is the detailed balance law, well known in physics (Landau and Lifshitz 1980): under the same ambient conditions, the direct and reverse reactions follow the same pathway, and they have equal rates when both end-states have equal stability. Otherwise, i.e., if the pathways for the direct and reverse reactions were different, the result would be a permanent circular flow (generating, at thermodynamic equilibrium, a perpetual motion machine of the second kind), which contradicts the second law of thermodynamics.

(Three notes: (i) To resolve Levinthal’s paradox, it is not necessary to prove that the above outlined pathway is the best possible pathway; it is enough to prove that this pathway resolves the paradox, because any additional pathway will only accelerate the process. Imagine two pools, one full of water and another empty, with water leaking from one to the other through cracks in the wall between them; if the cracks cannot absorb all the water (which is prohibited by the all-or-none kind of transition), each additional crack accelerates the filling of the empty pool. (ii) The same, of course, applies to additional folding pathways passing through folding intermediates, which are sometimes observed (Aviram et al. 2018) in apparently two-state transitions. (iii) Actually, the pathway itself is of no interest to us here; according to the transition state theory, only the barrier, i.e., the free-energy maximum on the pathway, is important.)

In a simplified form (for details, see Finkelstein and Badretdinov (1997a, b, 1998) and Garbuzynskiy et al. (2013)), the resulting free-energy barrier is estimated as follows.

When the free energies of the folded and unfolded phases are equal (i.e., in the mid-transition ambient conditions), the free energy of a semi-folded protein depends only on the interface between the two phases.

The largest unavoidable interface corresponds to the transition state (structure # in Fig. 5) that looks like a half of the native globule and has ≈\(L^{2/3}\) residues at the interface (assuming the most compact spherical shape of the native globule; for an oblate or oblong globule, the largest unavoidable interface can be a little less).

Thus, the transition state free energy is proportional not to the number L of the chain residues (as Levinthal’s estimate implies), but to \(L^{2/3}\) only.

The energy constituent \(\Delta E^{\#}\) of the barrier free-energy \(\Delta F^{\#}\) results from interactions lost by the interface residues; it is about \(L^{2/3}\cdot\varepsilon /4\), where \(\varepsilon\) ≈ 1.3 kcal/mol ≈ \(2k_{\mathrm B}T_{\mathrm{melt}}\) is the average latent heat of protein melting per residue (Privalov 1979) (this \(\varepsilon\) is the first empirical parameter used by the theory), and ≈1/4 is, roughly, the fraction of interactions lost by an interface residue: it loses about 1 of the 6 spatial neighbors it had inside the globule (1 “up,” 1 “down,” and 4 along the future interface), but 2 of these 6 are its neighbors in the chain and cannot be lost, so about 1 of the 4 breakable contacts is lost. Thus,

$$\Delta {E}^{\#}/{k}_{B}{T}_{\mathrm{melt}} \approx 0.5{L}^{2/3}$$
(2)

The entropy constituent \({\Delta S}^{\#}\) of the barrier free-energy \({\Delta F}^{\#}\) is caused by entropy lost by closed loops protruding from the globular into the unfolded phase (note that the partially folded state denoted as # in Fig. 5 contains two closed loops, and the other partially folded state in Fig. 5 contains no closed loops).

When the shape of the native protein fold and especially the shape of the chain in the transition state are not known, the \({\Delta S}^{\#}\) value associated with closed loops (which is ≤ 0, because it is due to restriction of loop conformations) can only be bounded from above and from below.

The upper limit of \({\Delta S}^{\#}\) is zero (when the interface contains no closed loops).

The lower limit of \({\Delta S}^{\#}\) is about

$$(\Delta S^{\#})_{\mathrm{lower}}\approx\tfrac{1}{6}L^{2/3}\cdot\left[-\tfrac{5}{2}k_{\mathrm B}\ln\left(3L^{1/3}\right)\right]$$
(3)

Here, \(\frac16{(L}^{2/3})\) is the maximal expected number of loops protruding from the maximal (containing ≈\({L}^{2/3}\) residues) unavoidable interface. Actually, \(\frac16{(L}^{2/3})\) is the average number of loops protruding from the interface containing \({L}^{2/3}\) residues. The multiplier \(\frac16\) results from the fact that the chain can have, roughly, 6 directions in each interface residue (4 along the interface, 1 inside the folded part, and only 1 looking outside, thereby initiating a loop). Among many possible cross-section interfaces dividing the globule into two halves, the lowest-free-energy interface should serve for the transition state in the folding/unfolding pathway. Therefore, this “optimal” interface should be covered by no more than \(\frac16{(L}^{2/3})\) or possibly a smaller number of closed loops.

The value \(3L^{1/3}\equiv(L/2)/(\frac16L^{2/3})\) is the average number of residues in a closed loop in the transition state (\(L/2\) being the number of unfolded residues in the transition state and \(\frac16L^{2/3}\) the maximal number of closed loops there). The value \(-\frac52k_{\mathrm B}\ln(3L^{1/3})\) is the entropy lost by a \(3L^{1/3}\)-residue closed loop at the interface (such a loop cannot cross the interface plane; this restriction changes the conventional Flory (1969) coefficient for the entropy of an unrestricted closed loop from 3/2 to 5/2 (Finkelstein and Badretdinov 1997a, b)). Taking L ~ 100 (actually, this approximation is good for the whole range of L = 10–1000), one obtains

$$(\Delta {S}^{\#}{)}_{\mathrm{lower}} \approx -{{\frac{5}{12}k}_{\mathrm{B}}L}^{2/3}\left[\mathrm{ln}\left(3\right)+\frac{\mathrm{ln}(L)}{3}\right] \approx -{{k}_{\mathrm{B}}L}^{2/3}$$
(3a)
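The final step of Eq. (3a) replaces the bracketed factor by a constant close to 1. A minimal Python sketch of that check is given below; the function name `entropy_coefficient` is introduced here for illustration only and does not come from the cited works.

```python
import math

def entropy_coefficient(L):
    """Return c(L) such that (ΔS#)_lower ≈ -c(L) * k_B * L**(2/3), following Eq. (3a)."""
    return (5.0 / 12.0) * (math.log(3.0) + math.log(L) / 3.0)

# Evaluate the coefficient over the chain-length range quoted in the text
for L in (10, 100, 1000):
    print(L, round(entropy_coefficient(L), 2))
```

The coefficient grows only from about 0.8 at L = 10 to about 1.4 at L = 1000, and is about 1.1 at L = 100, which is the sense in which it can be replaced by 1 in Eq. (3a).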

In the mid-transition ambient conditions, the corresponding transition state free energy, \(\Delta {F}_{0}^{\#}\), equals \(\Delta E^{\#} - T_{\mathrm{melt}}\Delta S^{\#}\). The \(\Delta {F}_{0}^{\#}\) value is not less than \(\Delta E^{\#}\) (when \(\Delta S^{\#} = 0\)) and not larger than \(\Delta E^{\#} - T_{\mathrm{melt}}(\Delta S^{\#})_{\mathrm{lower}}\), that is,

$$[\Delta {E}^{\#} \approx 0.5{L}^{2/3}{k}_{\mathrm{B}}{T}_{\mathrm{melt}}] \le \Delta {F}_{0}^{\#} \le [\Delta {E}^{\#} -{ T}_{\mathrm{melt}}(\Delta {S}^{\#}{)}_{\mathrm{lower}} \approx 0.5{L}^{2/3}{k}_{\mathrm{B}}{T}_{\mathrm{melt}} + {L}^{2/3}{k}_{\mathrm{B}}{T}_{\mathrm{melt}}]$$
(4)

Thus, when the free-energy difference ∆F between the native (the most stable) and the unfolded state is equal to zero, the time of both folding and unfolding of the L-residue protein chain is estimated as

$$TIME_{\Delta F=0}\approx\tau\times exp\left[+\Delta F_0^\#/k_BT_{\mathrm{melt}}\right]\sim\tau\times\exp\left[+\left(0.5\div1.5\right)L^{2/3}\right]$$
(5)

where τ ≈ 10 ns is the time of structure growth by one residue (Zana 1975) (this τ is the second and the last empirical parameter used in the theory (Finkelstein and Badretdinov 1997a, b)).

Here, one thing should be added: A search over folds with different chain knotting can, in principle, create a rate-limiting “quasi-Levinthal” factor since the knotting cannot be changed without globule decay. However, since the computer experiments show that one chain knot involves many tens of residues (Grosberg 1997), this factor for the chain of 100–200 residues can be \(2^{2}\)–\(2^{4}\) only, and the search for correct knotting can only be rate-limiting for extremely long (L ≫ 1000) chains (Finkelstein and Badretdinov 1998) that cannot fold within a reasonable time (according to Eq. (5)) in any case.

The above Eq. (5) shows that in the mid-transition conditions (where ∆F = 0), a ≈100-residue protein chain should attain its most stable fold within milliseconds or days, but not years.
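The range quoted for a ≈100-residue chain follows directly from Eq. (5). The Python sketch below evaluates both bounds; the function name and the constant `TAU_NS` are illustrative choices, with τ = 10 ns taken from the text.

```python
import math

TAU_NS = 10.0  # τ ≈ 10 ns, the time of structure growth by one residue (value from the text)

def midtransition_time_range_s(L):
    """Return the (fastest, slowest) folding time in seconds at ΔF = 0, per Eq. (5)."""
    x = L ** (2.0 / 3.0)
    tau_s = TAU_NS * 1e-9
    return tau_s * math.exp(0.5 * x), tau_s * math.exp(1.5 * x)

fast, slow = midtransition_time_range_s(100)
print(f"{fast * 1e3:.2f} ms to {slow / 86400:.1f} days")
```

For L = 100 the lower bound comes out at roughly half a millisecond and the upper bound at roughly 12 to 13 days.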

If the native fold is more stable than the unfolded state (i.e., if ∆F < 0), the folding is faster. Because the folding nucleus covers about half of the chain (more detailed calculations give ≈40% (Garbuzynskiy et al. 2013)), its free energy decreases from \(\Delta {F}_{0}^{\#}\) (that was at ∆F = 0) to approximately \(\Delta {F}_{0}^{\#}\) + 0.4 ∆F at ∆F < 0, so that

$$TIM{E}_{\Delta F < 0} \sim TIM{E}_{\Delta F = 0 }\times \mathrm{ exp}[+ 0.4 \Delta F/{k}_{\mathrm{B}}T]$$
(6)

which can be approximately presented as

$$TIME_{\Delta F<0}\sim10\mathrm{ns}\times\exp\left[+\left(0.5\div1.5\right)\times\left(L^{2/3}+0.4\Delta F/k_BT\right)\right]$$
(6a)

(Garbuzynskiy et al. 2013). Because ∆F ≈ −40 kJ/mol for a ≈100-residue protein under physiological conditions (Privalov 1979), the folding time of such a protein decreases by about 500-fold, and now ranges from a fraction of a millisecond to tens of minutes.
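The size of this acceleration can be estimated from the exponential factor of Eq. (6). The sketch below is a rough check, assuming that the native state is more stable than the unfolded one by 40 kJ/mol (ΔF = −40 kJ/mol in the sign convention of Eq. (6)) and that T = 300 K; because ΔF is given per mole, the gas constant R is used in place of the Boltzmann constant. The function name is illustrative.

```python
import math

R_KJ_PER_MOL_K = 8.314e-3  # gas constant in kJ/(mol*K)

def stability_speedup(dF_kJ_per_mol, T_kelvin=300.0, nucleus_fraction=0.4):
    """Acceleration factor exp(-0.4 * ΔF / RT) implied by Eq. (6) for ΔF < 0."""
    return math.exp(-nucleus_fraction * dF_kJ_per_mol / (R_KJ_PER_MOL_K * T_kelvin))

print(round(stability_speedup(-40.0)))
```

Under these assumptions the factor is about 600, the same order as the roughly 500-fold acceleration quoted above.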

It should be noted that all the above considerations are focused on the case of the moderate stability of the native fold, which corresponds to the available data on protein folding (occurring near the mid-transition point, see Fig. 2). For the opposite case of a very high native fold stability (−ΔF ≫ \(k_{\mathrm B}T\)), another scaling law, similar to that of Eq. (5) (ln(TIME) ∼ \(L^{1/2}\)), was obtained by Thirumalai (1995).

Conclusion: one can see that although the protein folding problem is the so-called “NP-hard” problem (Ngo and Marks 1992; Unger and Moult 1993) (which loosely speaking implies an exponentially-long time to be spent to solve it by a folding chain or by a computer), and indeed the time is, in the main term, a stretched-exponential function of the chain length L (see Eqs. (3a), (5), (6a), and the later rigorous mathematical papers (Fu and Wang 2004; Steinhofel et al. 2006)), this does not mean that this time is unreasonably long for a normal-size protein domain of ~ 100 residues.

Protein folding times: theory and experiment

The observed protein folding times (see Fig. 6) span 11 orders of magnitude (which is akin to the difference between the lifespan of a mosquito and the age of the universe).

Fig. 6

Folding rates and times. Experimental in vitro measurements have been made “in water” (under approximately “biological” conditions) and at mid-transition for 107 single-domain proteins (or separate domains) without SS bonds and covalently bound ligands (though the rates for proteins with and without SS bonds are principally the same (Galzitskaya et al. 2001)). The golden-and-white triangle: the region theoretically allowed by physics at the mid-transition. Its golden part corresponds to biologically-reasonable folding times (≤ 10 min); the bronze belt is the additional area allowed in “biological” conditions. The white zone: the larger folding times (i.e., the lower folding rates) are observed (for some proteins) only under mid-transition (i.e., “non-biological”) conditions. The yellow dashed line limits the additional area allowed for oblate (1:2) and oblong (2:1) globules at mid-transition; the bronze dashed line means the same for “biological” conditions. L is the number of amino acid residues in the protein chain. ΔF is the free energy difference between the native and unfolded states of the chain under the experimental conditions and temperature T close to 300 K. Adapted from (Garbuzynskiy et al. 2013)

Figure 6 shows the region theoretically allowed for the folding times by Eqs. (5)–(6a) (Garbuzynskiy et al. 2013); these equations were obtained with only two empirical and no adjustable parameters. This region describes the observed folding times of all single-domain globular proteins studied before 2013, whatever their size and the stability of their native state.

Figure 6 also shows that a chain of L ≲ 80–90 residues will find its most stable fold within minutes (or faster) even under “non-biological” mid-transition conditions, where folding is known (Creighton 1978; Fersht 1999) to be the slowest (see also Fig. 2). Thus, native structures of such relatively small proteins are under complete thermodynamic control: they are the most stable among all structures of these chains. In other words, any possible lowest-energy fold can be achieved within a “biologically reasonable” time for these small proteins.

Native structures of larger proteins (of ≳ 100 to ≈450 residues) are, in addition, under a kinetic “control of complexity,” in a sense that too entangled (due to, e.g., complicated β-sheets) folds of their long chains (having too many intersections with any globule’s cross section) cannot be achieved within days or weeks even if they are thermodynamically stable; indeed, globular domains with greatly entangled folds of long protein chains have never been observed (Garbuzynskiy et al. 2013): they seem to be excluded from the repertoire of existing protein structures. Besides, the native fold of at least one protein (serpin) of ≈400 residues is not the most stable but a long-living metastable fold (Tsutsui et al. 2012).

The kinetic control also explains why larger (with L ≳ 450) proteins should have far from spherical shape or consist (according to the “divide and rule” principle) of separately folding domains: otherwise, chains of more than 450 residues would fold too slowly. This is a kinetic “size restriction” for domains. In essence, this effect resembles Levinthal’s “kinetic control,” though at another level and only for very large proteins. The above estimates (≈100 and ≈400 residues) are somewhat (by 30–50%) elevated when the native fold free-energy ΔF is substantially lower than that of the unfolded chain, but essentially they remain nearly the same (Garbuzynskiy et al. 2013).

Equations (5)–(6a) outline the range of folding times depending on the protein size and the stability of its native structure under given ambient conditions. To predict the protein folding time more accurately, the shape of its folding nucleus or, for lack of such information, of its native fold should be taken into account. This was done by Plaxco et al. (1998), who introduced the “contact order” (CO, which equals the average chain separation of the residues that are in contact in the native protein fold, divided by the chain length) as a phenomenological measure of the complexity of the native fold (though CO “works” well only for small proteins that fold without folding intermediates). Later, this CO was added (Ivankov et al. 2003) to the already developed (Finkelstein and Badretdinov 1997a, b) chain length dependence, and the resulting method (Ivankov et al. 2003) showed quite good results, now for all proteins; in particular, it was shown that α-proteins (having low CO due to intra-helical H-bonds) fold faster than other proteins of the same size (Ivankov and Finkelstein 2004), though large α-proteins (with low CO) fold much more slowly than small β-proteins (with high CO). The subsequent extension of this method (Finkelstein et al. 2013; Ivankov and Finkelstein 2020) gave even more accurate results.
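The definition of contact order given above can be illustrated with a short Python sketch. It assumes that the native contacts are already available as a list of residue-index pairs; how a contact is defined (for example, by a distance cutoff) is not specified here, and the two contact lists are made-up toy examples rather than data for real proteins.

```python
def relative_contact_order(contacts, chain_length):
    """Average sequence separation of contacting residues, divided by the chain length."""
    if not contacts:
        raise ValueError("at least one contact is required")
    mean_separation = sum(abs(i - j) for i, j in contacts) / len(contacts)
    return mean_separation / chain_length

# Toy contact lists for a hypothetical 20-residue chain (illustration only)
helix_like = [(i, i + 4) for i in range(1, 17)]    # local i, i+4 contacts
hairpin_like = [(i, 21 - i) for i in range(1, 9)]  # two strands paired across a turn

print(relative_contact_order(helix_like, 20))
print(relative_contact_order(hairpin_like, 20))
```

In this toy case the helix-like pattern gives a CO of 0.2 and the hairpin-like pattern gives 0.6, in line with the statement that local intra-helical contacts lead to low CO.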

It should be added that no attention was paid in these works to specific 3D structures of folding nuclei; the attention was only paid to their overall features like size, instability and complexity. The reason: although, in some cases, there is evidence that folding nuclei are well-organized and possess specific structural features (see Fersht 1999, 2000; Garbuzynskiy and Kondratova 2008; Shaw et al. 2010), in other cases, they are poorly organized (“diffused nuclei”) (see Grantcharova et al. 2001; Finkelstein et al. 2007, 2014, and references therein). The latter, together with the observed sensitivity of positions and shapes of the folding nuclei to mutations, led to the conclusion that a “nucleus” is an ensemble of structures rather than a single structure (Galzitskaya and Finkelstein 1999; Garbuzynskiy et al. 2004) and that the folding nucleus and folding pathway are much less resistant to amino acid sequence mutations and change of ambient conditions than the native protein structure.

Also, it should be noted that all the above considerations were focused on stability (or rather, instability) of transition states (folding nuclei), and virtually no attention was paid to folding intermediates, because these — in contrast to transition states — do not determine the rate of folding of native protein structures (Fersht 1999, 2000).

Dependence of the number of compact chain folds (and of the time of iterating over them) on the protein size

The total (“Levinthal’s”) volume of the protein conformation space estimated at the level of amino acid residues is huge: ≳ \(3^{100}\) conformations for a 100-residue chain (see above).

However, should the chain sample all these conformations in search of its most stable fold? No: the vast majority of them are non-compact (that is, high-energy) and need not be examined. Moreover, the conformation space is covered by local energy minima, each surrounded by a local energy funnel (Fig. 7) providing fast downhill descent to this local minimum. Thus, the folding protein chain actually only has to sample various chain folds within these local energy funnels leading to compact protein globules.

Fig. 7

Comparison of a huge search among all, mostly disordered, conformations and a much less voluminous search only among compact and well-structured globules, thus corresponding to the deep energy minima surrounded by energy funnels. Adapted from (Finkelstein 2017)

To estimate the actual volume of this sampling, one has to estimate the number of low-free-energy local energy minima. This is similar to the idea of enumerating all possible “topomers” that a protein chain can form (Debe et al. 1999; Makarov and Plaxco 2003; Wallin and Chan 2005).

An overview of protein 3D structures shows that interactions occurring in the chains are mainly connected with secondary structures (Levitt and Chothia 1976; Chothia and Finkelstein 1990; Finkelstein and Ptitsyn 2016). Thus, a question arises as to how large the total number of energy minima is if considered at the level of formation and assembly of secondary structures into a compact globule, that is, at the level considered by Ptitsyn (1973) in his model of stepwise protein folding.

We will be interested mostly in proteins that fold under thermodynamic control, that is, those having chains of L ≈100 or less amino acid residues (see above). Such proteins have no more than 10 α- and β-structural elements (Ptitsyn and Finkelstein 1980; Rollins and Dill 2014).

The number of compact globular packings of the chain is by many orders of magnitude smaller than that of conformations of amino acid residues (Finkelstein and Garbuzynskiy 2015): the latter, according to Levinthal’s estimate, scales up as something like \(100^{L}\) or \(10^{L}\) or \(3^{L}\) with the number L of residues in the chain, while the former scales up not faster (see below) than ~\(L^{N}\) with the chain length L and the number N of the secondary structure elements. N is much less than L (N < L/10, according to Rollins and Dill (2014)), and this drastic decrease of the power N as compared to L is the main reason for the drastic decrease of the conformation space.
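To give a feeling for the size of this reduction, the Python sketch below compares the two scalings on a decimal-logarithm scale for an illustrative chain of L = 100 residues with N = 10 secondary structure elements, using the most modest Levinthal-type estimate of 3 conformations per residue. The function names are introduced here for illustration.

```python
import math

def log10_levinthal(L, states_per_residue=3):
    """log10 of the residue-level conformation count, states_per_residue ** L."""
    return L * math.log10(states_per_residue)

def log10_packings(L, N):
    """log10 of the upper estimate L ** N for the number of compact packings."""
    return N * math.log10(L)

print(round(log10_levinthal(100), 1), round(log10_packings(100, 10), 1))
```

With these numbers, the residue-level count is about 10^47.7, while the packing-level estimate is 10^20, a reduction of almost 28 orders of magnitude.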

The number of compact globular packings of the chain with given secondary structures can be presented (Finkelstein and Garbuzynskiy 2015) as a product of the following multipliers (Fig. 8).

Fig. 8

Scheme for estimating the volume of the conformational space at the level of secondary structure assembly and packing. Explanations are given in the text. Adapted from Supplement to (Finkelstein and Garbuzynskiy 2015)

\(M_{\mathrm A}\), the number of Architectures, i.e., types of dense stacks of given secondary structures. This number is small (cf. Levitt and Chothia 1976; Murzin and Finkelstein 1988; Chothia and Finkelstein 1990). It is usually (at L ≾ 100 and N ≾ 10) about 10 or fewer architectures (Fig. 8a) for a given set of secondary structures, since the architectures are packings of a few secondary structure layers (each containing several secondary structures), and therefore the combinatorics of the layers is very small, as compared to that of much more numerous secondary structure elements, which is described below.

\(M_{\mathrm P}\), the number of all possible combinations of positions of N structural elements within the given protein architecture, which cannot exceed N! ≡ N × (N − 1) × … × 2 × 1 (Fig. 8b).

\(M_{\mathrm T}\), the number of all possible topologies, i.e., all combinations of directions of these structural elements, which cannot exceed \(2^{N}\) (Fig. 8c).

The above means that the number of compact packings of N secondary structure elements (“topomers”) is about \(M_{\mathrm A} \times M_{\mathrm P} \times M_{\mathrm T}\) ≈ 10 × N! × \(2^{N}\). Using Stirling's approximation (N! ≈ \((N/e)^{N}\)), we have

$$NUMBER\;of\;topomers\approx M_{\mathrm A}\times M_{\mathrm P}\times M_{\mathrm T}\approx\left[10\times\left(\frac2e\right)^N\right]\times N^N\approx N^N$$
(7)

in the main term at N ≫ 1.
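How closely the bound 10 × N! × 2^N follows its leading term N^N for realistic numbers of structural elements can be checked directly. The Python sketch below uses the exact factorial rather than Stirling's approximation; the function name and the default of 10 architectures (the upper estimate given in the text) are illustrative.

```python
import math

def topomer_upper_bound(N, architectures=10):
    """Upper bound M_A * M_P * M_T = architectures * N! * 2**N on the number of topomers."""
    return architectures * math.factorial(N) * 2 ** N

# Compare the bound with its leading term N**N for small numbers of elements
for N in (4, 6, 8, 10):
    print(N, f"{topomer_upper_bound(N):.2e}", f"{N ** N:.2e}")
```

For N = 10 the bound is about 3.7 × 10^10 against N^N = 10^10, so the two agree to within an order of magnitude, which is all that the "main term" estimate of Eq. (7) claims.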

Each of these topomers contains \(M_{S\times T}\) ~ \((L/N)^{N}\) local energy minima connected with shifts and turns of secondary structure elements within a topomer (Fig. 8d).

\(M_{S\times T}\) is the number of possible shifts and turns of structural elements within the dense globule. Here, transverse shifts and tilts are prohibited by the dense packing, while longitudinal shifts and rotations of structural elements are coupled (this is shown in Fig. 8d using a β-sheet as the best illustrative example, but it is also true for α-helices: recall their close “knobs into holes” packings described by Crick (1953)). As a result, each of the N α- and β-elements can have about L/N (that is, about the number of chain residues per element) possible shifts/turns in the globule formed by N secondary structures in the L-residue chain.

So, the

$$NUMBER\;of\;energy\;minima\;to\;be\;sampled\approx\left(M_{\mathrm A}\times M_{\mathrm P}\times M_{\mathrm T}\right)\times M_{S\times T}\approx N^N\times(L/N)^N=L^N$$
(8)

in the main term (if L ≫ N ≫ 1) (Finkelstein and Garbuzynskiy 2015).

This number can be somewhat reduced by the symmetry of the globule, by shortness of some loops, by the impossibility to have α-helices inside β-sheets, etc., but this is not important in estimating the upper limit of the number of conformations (Finkelstein and Garbuzynskiy 2015).

As to the question of how the chain knows where and what secondary structures to form, the answer is that most of the secondary structures are determined by local amino acid sequences (Ptitsyn and Finkel'shtein 1970; Ptitsyn 1973; Lim 1974a, b; Chou and Fasman 1974; Schulz et al. 1974; Ptitsyn and Finkelstein 1983; Finkelstein et al. 1990; Jones 1999; etc.).

Because in a chain of L ≈ 20 residues one (N = 1) α-helix forms within ≈0.2 μs (Mukherjee et al. 2008), and a β-hairpin of N = 2 β-strands forms within ≈6 μs (Muñoz et al. 1997), the time necessary for iterating over ~\(L^{N}\) possible assemblies of the secondary structures can be estimated (cf. Eq. (6a)) as

$$TIME\;for\;iterating\sim10\;\mathrm{ns}\times L^N$$
(8a)
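Equation (8a) can be compared directly with the two experimental anchor points quoted above. The Python sketch below does this for a 20-residue chain; the function name is illustrative, and the 10 ns prefactor is the one used in Eq. (8a).

```python
TAU_NS = 10.0  # prefactor of Eq. (8a), in nanoseconds

def iteration_time_ns(L, N):
    """Time (in ns) for iterating over ~L**N secondary structure assemblies, per Eq. (8a)."""
    return TAU_NS * L ** N

print(iteration_time_ns(20, 1) / 1e3, "microseconds")  # one α-helix (N = 1)
print(iteration_time_ns(20, 2) / 1e3, "microseconds")  # β-hairpin of two strands (N = 2)
```

The estimate gives 0.2 μs for N = 1 and 4 μs for N = 2, close to the measured ≈0.2 μs for an α-helix and ≈6 μs for a β-hairpin.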

In a compact globule, the length of a secondary structure element should be proportional to the globule’s diameter, i.e., to ~\(L^{1/3}\). More specifically (taking into account the volumes of amino acid residues and their length along the chain and along α and β structures), the diameter of a globule of L residues is ≈5\(L^{1/3}\) Å, and thus, on average, an α-helix consists of ≈3\(L^{1/3}\) residues, while a β-strand, as well as a loop, comprises ≈1.5\(L^{1/3}\) residues. Thus, an α-helical globule (consisting of α-helices connected by loops) contains ≈\(L/[L^{1/3}(3 + 1.5)] = L^{2/3}/4.5\) helices, and a β-structural globule (consisting of β-strands connected by loops) contains ≈\(L/[L^{1/3}(1.5 + 1.5)] = L^{2/3}/3\) β-strands (Finkelstein and Garbuzynskiy 2015). This means that

$$NUMBER\;of\;structural\;elements\;N\approx\frac{L^{2/3}}{4.5}\;\mathrm{for}\;\mathrm{\alpha}\text{-}\mathrm{proteins}-\!\!\!-\frac{L^{2/3}}{3}\;\mathrm{for}\;\mathrm{\beta}\text{-}\mathrm{proteins}$$
(9)

Thus, the value \(L^{N}\) of possible secondary structure assemblies is expected to come within the range

$$L^{L^{2/3}/4.5}\equiv\exp\left(\frac{\ln\left(L\right)}{4.5}\times L^{2/3}\right)\;\mathrm{for}\;\mathrm{\alpha}\text{-}\mathrm{proteins}-\!\!\!-L^{L^{2/3}/3}\equiv\exp\left(\frac{\ln\left(L\right)}{3}\times L^{2/3}\right)\;\mathrm{for}\;\mathrm{\beta}\text{-}\mathrm{proteins}$$
(10)

Since ln(L) = 4 ÷ 5 for L = 50 ÷ 150, the outlined range of possible secondary structure assemblies, \(L^{N}\), can be estimated, for domains of L ≈ 100 residues, as

$${L}^{N} \approx \mathrm{ exp}({L}^{2/3})-\!\!\!-\;\mathrm{ exp}(1.5 {L}^{2/3})$$
(11)

So, the number of the secondary structure assemblies scales with the chain length L approximately as the upper boundary of the range of folding times outlined by Eq. (5) (Finkelstein and Badretdinov 1997a, b), and

$$TIME\;for\;iterating\sim10\;\mathrm{ns}\times L^N\approx10\;\mathrm{ns}\times\exp(L^{2/3})-\!\!\!-10\;\mathrm{ns}\times\exp(1.5L^{2/3})$$
(12)

coincides with the upper boundary of the range of folding times given by Eq. (5).
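The coefficients 1 and 1.5 in the exponents above follow from the element counts \(L^{2/3}/4.5\) (α-proteins) and \(L^{2/3}/3\) (β-proteins) derived earlier, since \(L^{N} = \exp(N\ln L)\). A minimal Python sketch of this arithmetic is given below; the function names are illustrative.

```python
import math

def element_counts(L):
    """Estimated numbers of helices (α-proteins) and strands (β-proteins): L**(2/3)/4.5 and L**(2/3)/3."""
    x = L ** (2.0 / 3.0)
    return x / 4.5, x / 3.0

def assembly_exponent_coefficients(L):
    """Coefficients c such that L**N = exp(c * L**(2/3)) for α- and β-proteins."""
    return math.log(L) / 4.5, math.log(L) / 3.0

print([round(n, 1) for n in element_counts(100)])
print([round(c, 2) for c in assembly_exponent_coefficients(100)])
```

For L = 100 this gives about 5 helices or 7 strands, and exponent coefficients of about 1.0 and 1.5, which fall at the upper end of the 0.5 to 1.5 range appearing in Eq. (5).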

It is worth mentioning that the scaling of \(L^{N}\) given by Eq. (10) looks exactly like those obtained by Fu and Wang (2004) and Steinhofel et al. (2006) from mathematical consideration of the folding problem complexity for a chain consisting of only two kinds of links (“hydrophobic” and “hydrophilic” ones) rather than from physical reasons.

Conclusion

The point of this article is not to explain how proteins fold (this needs experimental studies of many proteins of various kinds); the point is to explain why a protein chain is able to choose, in minutes, its unique most stable 3D structure among an enormous number of alternatives.

Throughout the article, we have only considered folding (and unfolding) of a single protein chain that does not interact with anything but a solvent.

Our review is mostly theoretical; it aims to clarify a physical theory behind our understanding of protein folding. The reason for this theoretical accent is that the famous Levinthal’s paradox, which concentrates the essence of the protein folding enigma, is itself, actually, a theoretical concept, and hence its “ultimate” resolution is also expected to be theoretical. Otherwise, this paradox is doomed to remain unsolved and not understood. This paradox cannot be solved by a direct experiment (which would need enormous time and an experimental investigation of folding of all possible polypeptide sequences), and even this would only give a result, but not its understanding. Last but not least: in solving Levinthal’s paradox, the presented theory generates experimentally testable predictions that turn out to be correct (see, e.g., Fig. 6 of the review, where 212 out of 214 experimental points fall into the theoretically predicted region).