Introduction (why a fractal description of the protein interior is appropriate)

Proteins are enigmatic entities. They have been described as ‘complex mesoscopic systems’ [1]. Apart from being characterized by an enormous number of degrees of freedom, proteins have multidimensional potential energy surfaces. It is known [2, 3] that while a protein needs to adhere to its specific native fold to ensure its (thermal) stability, this native fold should allow certain parts of the protein to undergo large-amplitude movements. These movements (studied under the broad umbrella of fluctuation studies) assume a critical character in ensuring the desired functioning of a protein. It is equally intriguing to notice that although the structure of a protein can be accurately studied as that of a composite macromolecule, it has many properties in common with those of macroscopic microparticles [4]. From a somewhat different perspective, crystallography studies [5] have reported on the mesoscopic nature of protein structures. The standard ‘compact object description’ of proteins (characterized by small-amplitude vibrations and by a low-frequency Debye density of states) cannot account for their ‘non-idealistic’ behavior [2, 3]. Indeed, the non-constancy of the distance between any two atoms \( (|r_{i} - r_{j} |\, \ne \,{\text{constant}}) \) in any biologically functional protein can easily be verified with the simplest of computer programs. In addition, recent [6] and previous [7] characterizations of inhomogeneous distributions of mass and hydrophobicity in the protein interior merely serve to complicate an effort to construct a straightforward and linear scheme for the description of the protein interior.

Thus, it is unrealistic to expect that constructs that are otherwise adequate to describe the complexity of simple structures (spheres, cubes, other regular structures of idealistic shape and characteristics) will facilitate attempts to describe proteins. Clearly, to achieve this goal one needs to think beyond the tool-set of simple geometry and the physics of idealistic systems. To be categorical, one needs to adopt an approach that can describe the inherently inhomogeneous and nonlinear behaviors of protein structural parameters as an innate capacity of proteins, rather than treating them as deviations from (illusory) ideality. In this context, one notes that quantification of the self-similarity prevalent in protein properties can serve as a potent tool to achieve the aforementioned challenging task. Self-similarity has a unique advantage over many other possible constructs because an objective evaluation of self-similarity will enable deciphering of the hidden symmetry that connects global patterns of macroscopic properties in proteins (such as hydrophobicity distribution, polarizability distribution, etc.) with the local (atomic) interactions that produce them.

During attempts to describe many natural phenomena, researchers found that a measure denoted as fractal dimension (FD) could reliably quantify self-similarity. Hence, resorting to FD to quantify the self-similarity prevailing in the distributions of protein biophysical and biochemical properties does not appear to be unreasonable. As a result, investigations of protein structures using FD-based measures have generated enormous interest. The aim of this review is multifold: (1) to collate the various FD-based approaches that have been employed to study the protein interior over the years into distinct clusters; (2) to present an account of cases where FD-based methodologies have successfully contributed to research on the protein interior. In this context, a thorough documentation of accurate predictions made from the spectrum of FD-based studies will be presented. However, fractals are no panacea, and they cannot suggest magic solutions to all problems. Consequently, a third aim of our review is to examine the intrinsic limitations of FD-based measures. Finally, with a balanced assessment of the entire framework, we will attempt to identify some of the outstanding questions for which the application of FD-based investigations may help in deciphering deep and unexpected facets of protein structural organization.

Definition of FD

The concept of FD is based primarily upon the quantitative expression of the symmetry of self-similarity. While the Euclidean perception of symmetry revolves (essentially) around translation, rotation and reflection, self-similarity reveals itself through the symmetry of scale-invariance. This invariance implies that when any part of a self-similar object is magnified, it is seen to resemble the whole. The resemblance need not be exact: for example, a magnified view of one part of a coastline may not be a precise reproduction of the full picture, but it will have the same qualitative appearance. Hence, it can be asserted that a coastline demonstrates statistical self-similarity. By definition, any structure possessing a self-similar motif that is invariant under a transformation of scale and which cannot be described by topological dimensions may be represented by an FD. For regular structures, self-similarity (in general) is geometric (the so-called ‘geometric self-similarity’); for random or irregular objects, however, one observes self-similarity to be statistical in nature. Statistical self-similarity implies that the statistics of the spatial correlations in the patterns associated with the description of the system are similar across a range of length scales. See Fig. 1 for a description of the differences between geometric and statistical self-similarity. Here we note that the average end-to-end length of an unbranched polymer chain demonstrates statistical self-similarity [8–10] (this point will be returned to later in this review).

Fig. 1

a Example of exact (geometrical) self-similarity: the ‘Koch-curve’. b Example of statistical self-similarity: roughness profile of the Earth’s surface. Both diagrams have been reproduced courtesy of Professor Paul Bourke [153]

The dimension of an object measures its complexity. For example, in Euclidean space, the dimension of a set measures the amount of information that is necessary and sufficient to identify any point of that set. Traditionally, points have zero dimension as there is nothing to specify, straight lines and curves have one dimension because they can be parameterized by a single real coordinate and surfaces have two dimensions since two coordinates can describe every point on a surface. However, this intuitive concept of dimension cannot adequately characterize many ‘real-world’ objects [11], which are better characterized by a fractal dimension. Many researchers tend to refer to the FD as the ‘Hausdorff dimension’ [12] because Mandelbrot’s definition provides a simplification of a more formal concept proposed by Hausdorff in 1919 [13]. For a formal introduction to fractals, the reader is referred to Falconer [14] and Meakin [15].
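
As a worked illustration of how a non-integer dimension arises, consider the standard similarity-dimension formula for an exactly self-similar object. This is a textbook special case offered as a sketch; it is not the general Hausdorff definition, and the symbols k (number of self-similar copies) and s (linear scaling ratio of each copy) are introduced here only for this example:

$$ {\text{FD}} = \frac{\ln k}{\ln (1/s)} $$

For the Koch curve of Fig. 1a, each segment is replaced by k = 4 copies scaled by s = 1/3, so that FD = ln 4/ln 3 ≈ 1.26. Applying the same formula to a straight line (k = 3, s = 1/3) or a filled square (k = 9, s = 1/3) returns the familiar integer dimensions 1 and 2, respectively.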

Schools of protein interior fractal studies

Application of FD-based methodology to investigate proteins dates back to (almost) the beginning of the development of modern fractal theory. This historical perspective helps us to understand why two of the most fundamental forms of protein information, namely, the primary structure and three-dimensional structure of proteins, respectively [16–18], were chosen for study in preliminary investigations. The school of primary structure studies owes its origin to polymer research and focuses primarily upon examining fractal properties within backbone connectivity. The other school of studies largely ignored main-chain information and concentrated on the distribution of residues and atoms within protein space. These schools provided a general framework by which to quantify interior fractals. Various sub-approaches can be observed within both these broad-based approaches that differ from each other (mainly) in terms of their implementation strategy. There are also two other basic approaches, slightly more abstract, that have been used to probe the protein interior. One of these revolves around the application of correlation dimension-based constructs to explore interior structural invariants, and the other attempts to relate renormalization group theory to the context of protein structure analysis. It is necessary to (briefly) survey these lines of thought when comparing and contrasting protein structure. Such a comparative analysis helps us to assess the possibilities of implementing the approaches in particular cases while investigating the complex (nonlinear) dependencies of the protein interior. Figure 2 presents a schematic view of the families of FD-based approaches. In this review, we will not survey the various methodologies and results associated with the multifractal description of protein properties.

Fig. 2

Genealogy of the family of fractal-based approaches to study protein interior. FD Fractal dimension

Approaches with backbone connectivity

Based on known data and theories (see the section ‘Definition of FD’), in 1982 Havlin and Ben-Avraham [8–10] studied the fractal properties of polymer chains. Denoting the average end-to-end length of an unbranched polymer by L, the number of monomer segments (each of length ε) by N and the fractal dimension by FD, they found the following scaling relationship: \( N = \left( \frac{L}{\varepsilon } \right)^{{\frac{1}{{\upsilon_{\text{F}} }}}} = \left( \frac{L}{\varepsilon } \right)^{\text{FD}} , \) in which the exponent turns out to be the inverse of the Flory constant \( \upsilon_{\text{F}} \) of polymer theory. (Theoretical considerations provide the limits 1 < FD < 2. The lower limit corresponds to a linear structure, given by L = εN; the upper limit, at which FD approaches the topological dimension magnitude 2, corresponds to a structure represented by an unrestricted random walk, for which \( L = \varepsilon N^{\frac{1}{2}} \) [19].)

Two similar approaches emerged from this basic platform. Both considered protein length to be the most important variable, but they differed on the exact definition of protein length. The first method attempted to find the scaling exponent of the contour length, which is proportional to the number of monomers with respect to the end-to-end length. The second method attempted to quantify the length of a line segment along the chain, measured with a scale of the number of monomers in the segment. Because of its algorithmic strategy, the second method is alternatively denoted as ‘the internal line segment method’.

In the first method [8, 9, 17, 18], protein length is defined as:

$$ \langle Z(n)^{2} \rangle = \frac{1}{N - n + 1}\sum\limits_{i = 1}^{N - n + 1} {Z_{i,i + n}^{2} } $$

where \( Z_{i,i + n}^{2} \) is the square of distance between terminal residues i and i + n of the chain with N amino acids and n intervals. The FD in this case is calculated by performing a linear regression in the scale-range appropriate to the logarithmic form of:

$$ \left[ {\langle Z(n)^{2} \rangle^{\frac{1}{2}} } \right]^{\text{FD}} = {\text{Const}}.n $$

(The exact scheme of implementation of this strategy is a bit more complex; the interested reader can consult the work of Havlin and Ben-Avraham [8, 9] for details of the algorithm and to observe the importance of FD when studying polymers).
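
The regression described above can be sketched in a few lines of Python. This is a minimal illustration rather than the algorithm of Havlin and Ben-Avraham: the function names, the use of NumPy and the choice to average over every available residue pair (i, i + n) are assumptions made here, and the input is taken to be an array of one representative coordinate (for example, the Cα atom) per residue.

```python
import numpy as np

def mean_square_distance(coords, n):
    """Mean squared distance between residues that are n positions apart."""
    coords = np.asarray(coords, dtype=float)
    diffs = coords[n:] - coords[:-n]          # all pairs (i, i + n) in the chain
    return float(np.mean(np.sum(diffs ** 2, axis=1)))

def chain_fd_end_to_end(coords, n_values):
    """Estimate FD from the slope of ln(rms distance) versus ln(n)."""
    rms = [np.sqrt(mean_square_distance(coords, n)) for n in n_values]
    slope, _ = np.polyfit(np.log(n_values), np.log(rms), 1)
    # [<Z(n)^2>^(1/2)]^FD = Const * n  implies  ln(rms) = (1/FD) ln(n) + const
    return 1.0 / slope

# Toy chains (hypothetical data, not protein coordinates).
# A fully extended rod with 3.8 units between consecutive points: FD = 1.
rod = np.column_stack([np.arange(100) * 3.8, np.zeros(100), np.zeros(100)])
fd_rod = chain_fd_end_to_end(rod, np.arange(1, 21))

# An unrestricted Gaussian random walk: FD is close to 2 in expectation.
rng = np.random.default_rng(0)
walk = np.cumsum(rng.normal(size=(20000, 3)), axis=0)
fd_walk = chain_fd_end_to_end(walk, np.arange(1, 21))
```

For a real protein the range of n passed to the fit would have to be restricted to the interval over which the log–log plot is actually linear, which this sketch does not check.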

The second method, although based on regression for the calculation of FD magnitude, differs from the first one in its definition of protein chain length. This approach [16, 20] defined protein length as:

$$ L(n) = L(m) + \frac{N - nm - 1}{nm}L(m) $$

where N is the number of amino acids, n is the length interval (measured in monomers), \( m = \text{int} \left( \frac{N}{n} \right) - 1 \) is the number of whole segments of n intervals and L(m) corresponds to the length of the chain measured over these first m segments. The second term estimates the length of the remaining N − nm − 1 intervals of the chain by scaling L(m) proportionally. Qualitatively, this equation implies that with this scheme the emphasis rests primarily on quantifying the scaling behavior of the segmental length of the backbone, which is defined as the sum of stepwise connections of straight lines, measured for different intervals of monomers.

FD can then be obtained by analyzing \( L(n) = {\text{Const}} \cdot n^{{1 - {\text{FD}}}} \) (only) in the scale-range where ln [L(n)] and ln (n) are linearly related. Such linearity in the log–log plot corresponds to the scale invariance associated with the observation of the phenomenon under consideration. Some 10 years after this methodology was published, Xiao [21] suggested a modified version of it by proposing two possible correction schemes. These can be achieved by implementing the end-to-end distance of the remaining residues and by performing L(n) measurements starting from different Cα atoms, instead of only from the terminal Cα. These corrections increase the quality of the regression for high values of n, probably because it is only then that the statistical nature of the process becomes more apparent. A detailed comparison of this genre of FD-based measures with the others that attempt to describe symmetry in the protein backbone chain can be found in Bytautas et al. [22] and Aszódi and Taylor [23].
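
The internal line segment calculation can be sketched as follows. This is a minimal illustration under stated assumptions, not the published implementation: the number of whole segments is taken to be int(N/n) − 1 as in the definition accompanying the L(n) equation, the function names are hypothetical, and neither of the corrections proposed by Xiao is included.

```python
import numpy as np

def segmental_length(coords, n):
    """L(n): backbone length measured with straight steps spanning n monomers."""
    coords = np.asarray(coords, dtype=float)
    N = len(coords)
    m = N // n - 1                       # whole segments used, int(N/n) - 1
    if m < 1:
        raise ValueError("interval n is too large for this chain")
    anchors = coords[0 : m * n + 1 : n]  # residues 0, n, 2n, ..., m*n
    L_m = np.sum(np.linalg.norm(np.diff(anchors, axis=0), axis=1))
    # The remaining N - n*m - 1 intervals are added in proportion to L(m).
    return L_m + (N - n * m - 1) / (n * m) * L_m

def chain_fd_segment(coords, n_values):
    """Estimate FD from L(n) = Const * n^(1 - FD) by log-log regression."""
    lengths = [segmental_length(coords, n) for n in n_values]
    slope, _ = np.polyfit(np.log(n_values), np.log(lengths), 1)
    return 1.0 - slope

# Toy chain (hypothetical data): a straight rod, whose measured length does
# not depend on the step size, so the slope is 0 and FD = 1.
rod = np.column_stack([np.arange(100) * 3.8, np.zeros(100), np.zeros(100)])
fd_rod_segment = chain_fd_segment(rod, np.arange(1, 11))
```

A more convoluted chain loses measured length as n grows, which gives a negative slope and hence FD > 1.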

The FD magnitudes obtained from the first and second methods, in the case of a self-avoiding walk (SAW) model, were 5/3 and 1.40, respectively [9, 10, 21]. It is known that, in three dimensions, FD approaches 2 for the Gaussian (ideal) walk, which corresponds to a Θ solvent, whereas it assumes the magnitude 5/3 for the SAW, which includes the excluded volume effect [24]. However, since the SAW does not take into account any attractive force, a deviation from the idealized magnitude 5/3 is also expected, especially when there are interactions between monomers that might be far apart along the main-chain, but in close proximity within the folded protein [25]. Thus, more realistic polymers are likely to have a chain FD different from the theoretically predicted SAW magnitude, which is precisely what was observed from the obtained FD values. For a more comprehensive discussion on FD-based description schemes for polymer collapse and polymer scaling in general, the reader is referred to previously published works [24, 26, 27]. However, it is important to remember that proteins are heteropolymers with a variable composition of 20 different amino acids and, additionally, that by the standards of polymer physics proteins are very small (consisting of merely 50–250 amino acids in general). This is in stark contrast with the situation in polymer studies, because scaling relationships in polymer physics are (principally) derived for large homopolymers. Hence, one needs to be careful when attempting any possible extension of an algorithm and/or argument that is pertinent in the paradigm of polymer physics. (The reader is referred to Summary box 1 for a more detailed discussion on this topic).

Approach with residue (and atom) distribution

The second approach, which disregards any consideration of the connectivity profile of main-chain atoms, concentrates on the distribution of amino acid residues and atoms within the folded protein structures. Whether these residues and atoms are connected or not is not considered in this approach. This procedure (commonly called the ‘mass fractal’ calculation) studies the mass distributions within concentric spheres using, for example, the mass-scaling relation \( M \propto R^{\text{FD}} \) (where M is the total mass of all the protein atoms enclosed within a sphere of radius R, the characteristic length scale). Thus, from the slope of the linear (scale-invariant) portion of the plot of log M versus log R, it is possible to determine the FD. Theoretically, one can accomplish this by counting the number of atoms within concentric spheres, taking any atom of the protein space as the center (see Fig. 3a for a schematic description). However, for most practical applications, choosing the center of mass of the protein as the aforementioned center makes more sense. Although the generic name for the calculation of this genre of FD is mass fractals, this methodology can be used to calculate other pertinent properties associated with protein atoms. A larger mass-FD magnitude implies that within the protein interior the property under consideration is relatively more compact in terms of its distribution. The mass-FD magnitude varies as 2.00 < mass-FD < 3.00. Proteins have an intrinsic self-similarity with respect to their compactness (atom packing). The mass-FD simply quantifies the symmetry of the scale-invariance associated with any protein property that is dependent on atom packing. It can also be extended to the level of residues, but not to the realm of organization of residues in secondary structures (α-helices are not composed of smaller helices). (The reader is referred to Summary box 2 for a more detailed discussion on this topic).
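
The concentric-sphere counting described above can be sketched as follows. This is a minimal illustration, not the implementation used in the cited studies: the function name and arguments are hypothetical, unit masses are assumed when none are supplied, and the caller is responsible for choosing radii that lie inside the scale-invariant range.

```python
import numpy as np

def mass_fd(coords, radii, masses=None, center=None):
    """Slope of log M(R) versus log R for mass enclosed in concentric spheres."""
    coords = np.asarray(coords, dtype=float)
    if masses is None:
        masses = np.ones(len(coords))            # assumption: count atoms
    else:
        masses = np.asarray(masses, dtype=float)
    if center is None:
        center = np.average(coords, axis=0, weights=masses)  # center of mass
    dist = np.linalg.norm(coords - center, axis=1)
    # Every radius must enclose at least one atom, otherwise log(0) appears.
    enclosed = np.array([masses[dist <= R].sum() for R in radii])
    slope, _ = np.polyfit(np.log(radii), np.log(enclosed), 1)
    return slope

# Toy object (hypothetical data): a homogeneously filled cubic lattice, which
# is space-filling and therefore gives a slope close to 3.
g = np.arange(-15, 16)
lattice = np.array(np.meshgrid(g, g, g)).reshape(3, -1).T
fd_lattice = mass_fd(lattice, np.linspace(5, 15, 11))
```

Passing a per-atom property other than mass through the `masses` argument gives the corresponding scaling exponent for that property, in the spirit of the generalization mentioned above.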

Fig. 3

a Schematic description of mass-fractal dimension calculation. The radial distribution of mass within the protein is obtained first, before studying the scaling of mass with respect to the radius. b Schematic description of the calculation of the correlation dimension (CD) in the context of CD amongst π-electron clouds; dependencies are identified before quantifying the extent of scaling of such dependencies with respect to radial distances

This approach to FD calculation is rather new. Although reference to this approach can be found in Dewey’s work [28], the first rigorous attempt to describe proteins with this methodology occurred as recently as 2005 [29]. Thereafter, many studies [30–35] have utilized and extended the scope of this powerful construct to successfully unearth many forms of latent symmetries in the paradigm of protein interior interactions (these are discussed later in this article).

The mass-FD calculation, even though it presents a completely different standpoint of interior fractal study, is related to the approaches discussed above. The philosophy of the calculation of mass-FD can also be traced back to principles of polymer physics. The criterion of a ‘good’ solvent requires that the polymeric units have a preferential interaction with the solvent rather than with each other. As a consequence, the polymer acquires an extended shape and becomes what is called an ‘excluded volume polymer’. In this case, the Flory constant \( \upsilon_{\text{F}} \) assumes the magnitude 3/5. In an ‘ideal’ solvent (the so-called Θ solvent), the strength of the interaction of the polymer with the solvent is identical to the strength of its interaction with itself. As a result, in an ‘ideal’ solvent, the polymer assumes a more compact shape than it does in a ‘good’ solvent, with \( \upsilon_{\text{F}} = 1/2\). Polymer collapse takes place in the presence of a ‘poor’ solvent. In such a case, polymer–polymer interactions become more favorable than polymer–solvent interactions. Hence, the polymer forms a collapsed or globule state with \( \upsilon_{\text{F}} = 1/3 \). This entire spectrum of possibilities can be elegantly described by \( R_{\text{g}} \sim N^{{\upsilon_{\text{F}} }} \), where \( R_{\text{g}} \) denotes the radius of gyration (ROG) of the polymer chain (a measure of its compactness), N stands for the number of units and the Flory constant \( \upsilon_{\text{F}} \) assumes different values depending on the solvent conditions. However, as N ∼ M and \( M \sim R^{\text{FD}} \), we end up with \( \upsilon_{\text{F}} = 1/{\text{FD}} \). Thus, the excluded volume polymer will have \( {\text{FD}} = 5/3 \). For an ideal polymer the FD magnitude will assume that of the topological dimension, namely, FD = 2; for a collapsed polymer, the FD will assume the magnitude of another topological dimension, namely FD = 3, whereby the polymer will assume a homogeneous space-filling nature. Neither \( \upsilon_{\text{F}} = 1/2 \) nor \( \upsilon_{\text{F}} = 1/3 \) is pertinent in the context of a functioning protein; in fact, they merely serve as the trivial limiting cases. Thus, while the relationship 2 < mass-FD < 3 holds for proteins, the fledgling mass-FD studies can also be readily associated with the established body of polymer physics. For a comprehensive review of polymer studies with FD measures, the reader is referred to a (classic) decade-old review [36]. (The reader is also referred to Summary box 1 for a more relevant discussion).
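
The chain of reasoning in the preceding paragraph can be written out compactly. The only assumption added here is that the characteristic length R entering the mass-scaling relation may be identified with the radius of gyration for the purpose of comparing exponents:

$$ N \sim M \sim R^{\text{FD}} \;\Rightarrow\; R \sim N^{{1/{\text{FD}}}} ,\qquad R_{\text{g}} \sim N^{{\upsilon_{\text{F}} }} \;\Rightarrow\; \upsilon_{\text{F}} = \frac{1}{\text{FD}} $$

so that the three solvent regimes correspond to

$$ \upsilon_{\text{F}} = \tfrac{3}{5} \Rightarrow {\text{FD}} = \tfrac{5}{3}\ (\text{good solvent}),\qquad \upsilon_{\text{F}} = \tfrac{1}{2} \Rightarrow {\text{FD}} = 2\ (\Theta\ \text{solvent}),\qquad \upsilon_{\text{F}} = \tfrac{1}{3} \Rightarrow {\text{FD}} = 3\ (\text{poor solvent}) $$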

A note of caution: the discussion up to this point may give the reader the impression that the concept of FD is synonymous with that of ROG, but this is not the case. The reasons for their being different are discussed in detail later in this review. This discussion on ROG was included because most polymer physics works refer to ROG as a measure of compactness (instead of mass-FD).

Approach with the correlation dimension analysis

There are situations in which the researcher has a qualitative idea about the distribution of certain properties within the (protein) space, but the existence of a possible symmetry in their dependencies is not known. To obtain an objective view of the distribution of the dependencies, it is convenient to opt for the study of a different fractal dimension, namely, the ‘correlation dimension’ (CD). Informally, to determine the CD, the pairs of points between which a possible dependency might exist are first identified. Amongst these, the ones that lie within a sphere of radius r are counted, giving C(r) (see Fig. 3b for a schematic description). By assuming a scaling \( C(r) \sim r^{\text{CD}} \), the Grassberger–Procaccia algorithm [37] is implemented to examine the possible presence of a linear stretch in a log–log plot of C(r) versus r, the slope of which gives the CD. Although a previous work [38] characterized the roughness of protein surfaces with CD, application of the same to protein interior studies occurred only very recently [39], where it was shown that the CD magnitude for ribosomal RNAs could successfully explain their structural characteristics (local helix formation and long-range tertiary interaction forming three-dimensional structures).

A categorical implementation of CD can be attempted by suitably describing each of the N monomers by some typical atom (say the Cα or centroid). Describing these atoms by points, the distance between two monomers i and j can be denoted as \( |x_{i} - x_{j} | \). The correlation sum on which the CD is based can then be formally defined as:

$$ {\text{Corr}}(r) = \frac{2}{N(N - 1)}\sum\limits_{i < j}^{N} {\theta (r - |x_{i} - x_{j} |)} $$

where θ(x) is the Heaviside step function and 2/[N(N − 1)] is a normalization term, the inverse of the number of distinct residue pairs [the presence of which implies that Corr(r) can be considered as the probability that any two residues are in contact at a cut-off distance r]. The factor 2 multiplying the sum over i < j reflects the symmetry of the contact matrix, in which every pair (i, j) appears twice.

In fact, Corr(r) is an unbiased estimator of the correlation integral:

$$ C(r) = \int {{\text{d}}\mu (x)} \int {{\text{d}}\mu (y)\theta (r - |x - y|)} $$

Both Corr(r) and C(r) decrease monotonically to zero as r → 0. If C(r) decreases with a power law, namely, \( C(r) \sim r^{D} \), then D (found from the slope of the log–log plot) is called the correlation dimension of μ. A comprehensive theoretical discussion on this can be found elsewhere [40].

Finally, the dimension is defined by: \( D = \lim_{r \to 0} \frac{{\log {\text{Corr}}(r)}}{\log (r)} \). Details of the implementation of this algorithm along with a thorough discussion of several aspects of it can be found in previous publications [38, 39]. (The reader is referred to Summary box 2 for a more detailed discussion).
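
The correlation sum and the slope estimate can be sketched as follows. This is a minimal illustration in the spirit of the Grassberger–Procaccia procedure, not the implementation of the cited works: the function names are hypothetical, all residue pairs are treated as candidate dependencies, the step function is taken as θ(0) = 1, and the slope is fitted over a user-supplied range of radii rather than in the limit r → 0, which is not accessible for a finite set of points.

```python
import numpy as np

def pair_distances(points):
    """Distances |x_i - x_j| for all pairs with i < j."""
    points = np.asarray(points, dtype=float)
    i, j = np.triu_indices(len(points), k=1)
    return np.linalg.norm(points[i] - points[j], axis=1)

def correlation_dimension(points, radii):
    """Return (slope, Corr) where Corr(r) is the fraction of pairs within r."""
    d = pair_distances(points)
    # d.size equals N(N - 1)/2, so count / d.size = 2 * count / [N(N - 1)].
    corr = np.array([np.count_nonzero(d <= r) / d.size for r in radii])
    slope, _ = np.polyfit(np.log(radii), np.log(corr), 1)
    return slope, corr

# Toy object (hypothetical data): equally spaced points on a straight line,
# for which the number of pairs within r grows roughly linearly with r.
line = np.column_stack([np.arange(500.0), np.zeros(500), np.zeros(500)])
cd_line, corr_line = correlation_dimension(line, np.linspace(5.5, 50.5, 10))
```

Because all pairwise distances are held in memory, this sketch is suitable for chains of a few thousand residues at most.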

Approaches with renormalization group theory

As has already been mentioned, proteins are large macromolecules, and protein structures undergo large-scale fluctuations. In fact to ensure proper functioning, proteins often undergo large-amplitude movements [41, 42]. Since these fluctuations occur over all length scales, determination of a single characteristic scale parameter becomes extremely difficult (if not impossible). As a result, correlation functions between various biophysical properties within proteins typically demonstrate non-analytical behavior. More often than not, these correlation functions tend to follow fractional power law scaling [28] and depend on the dimensionality of the protein (under a given context), rather than on its microscopic characteristics.

Although the renormalization group theory (RGT) was originally applied to describe critical phenomena [43], the study of fractals and RGT are naturally linked because, as in the case of fractal studies, the concept of scale invariance plays a major role in RGT. Briefly, the absence of an internal scale is one of the fundamental features of fractals. This can be rephrased by asserting that in fractals a great number of scales coexist in a self-similar manner, which in turn reminds us of a comparable situation known from thermodynamic systems at the critical point. Divergence of the correlation length at the critical point can be mathematically handled by assuming a composition of subsystems with some special coupling constant [44]. In such a case, each of the subsystems can be assumed to be built up of sub-subsystems with another coupling constant, and so on, which can reliably be called the self-similarity of protein interior organization. While FD-based studies attempt to quantify this self-similarity, the central idea of the RGT is that a change in the scale should lead to the same behavior apart from a renormalization of the coupling constant.

Two principal transformations are studied in the RGT paradigm, namely, the coarse graining and the scaling [45]. During the analysis of large-scale fluctuations of a protein, studies of its micro-scale perturbations might not always assume a great importance. Hence, in order to tackle the huge complexity associated with the fluctuations of proteins, researchers have come up with the efficient strategy of removing the length of the shortest interactions before the system is redefined by rescaling the length scale [28].

Since proteins exhibit self-similarity in their organization, this self-similarity can be captured with RGT and used to probe the various facets of (time-variant and context-dependent) interior dependencies. A categorical example of this can be found in the analysis of protein folding, which has been shown to be comparable to a phase transition [46, 47]. These problems are characterized by a multiplicity of scale lengths [one observes a large spectrum of correlations that correspond with the critical point, from short-range (among residues close by) to long-range (amongst residues situated far apart)]. This discussion of the scope of RGT implies that it can be suitable for a rigorous and objective description of such phenomena. Indeed, taking a cue from polymer physics studies [48], RGT has also been extensively applied in the realm of protein structure studies ([49–52] and references therein). While Bohr and Wolynes [53] employed it for a protein folding study with contact energies, Chan and Dill [54] described its possible effectiveness for protein stability studies, and Coveney [55] used it for ‘systematic coarse-graining’ of a system of coupled nonlinear ordinary differential equations (see also relevant references therein) while studying protein interaction profiles. However, even after all these applications, the use of RGT has yet to become common practice in the scaling and coarse-graining operations of the protein paradigm; as expressed in a recent work: ‘rigorous mathematical procedures, such as RGT, have yet to be applied to the general definition of coarse-grain models’ [56] while studying proteins.

Since FD- and RGT-based studies are closely related, there is an overlap between these approaches, and indeed many FD-based studies have explored their interface [57–59] with serious intent. It should not be forgotten, however, that these studies have been performed on a protein network paradigm and not on protein structure. In the sphere of protein interior studies, one merely finds a small number of FD studies that tangentially mention RGT [8, 17, 18] although there are two fantastic works [28, 60] in which the approaches and results of FD and RGT (in the context of polymer and proteins) are compared and contrasted in admirable detail.

Approach with the spectral dimension

Owing to finite temperature effects, the constituent residues of proteins fluctuate around their native positions. An objective characterization of these dynamics is of pivotal importance when studying protein functions and is usually carried out through normal mode analysis (NMA) [61, 62]. A normal mode of an oscillating system is a pattern of motion in which all parts of the system move sinusoidally with the same frequency. The frequencies of the normal modes of a system are known as its natural frequencies or eigenfrequencies. The spectral dimension (\( d_{\text{S}} \)) governs the density of the low-frequency normal modes of a fractal or protein. It is defined [63] according to the asymptotic behavior of the density of harmonic oscillations at low frequencies. To be precise, \( g(\omega )\sim \omega^{{d_{\text{S}} - 1}} \) as ω → 0, where g(ω) denotes the density of modes at frequency ω, so that g(ω)dω is the number of modes in the frequency range [ω, ω + dω]. Hence, \( d_{\text{S}} \) can be identified as a marker directly related to the vibrational dynamics of a fractal.
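
Given a set of normal-mode frequencies, the low-frequency scaling above suggests a simple numerical estimate of the spectral dimension. The sketch below is an assumption-laden illustration and not a procedure taken from the cited studies: instead of binning the density g(ω), it fits the cumulative mode count, which scales as \( \omega^{{d_{\text{S}} }} \) when \( g(\omega )\sim \omega^{{d_{\text{S}} - 1}} \); the function name and the `low_fraction` cut-off are hypothetical choices.

```python
import numpy as np

def spectral_dimension(frequencies, low_fraction=0.2):
    """Slope of log(cumulative mode count) versus log(frequency) at low frequency."""
    w = np.sort(np.asarray(frequencies, dtype=float))
    w = w[w > 0]                                  # drop zero-frequency modes
    k = max(int(low_fraction * len(w)), 3)        # number of lowest modes fitted
    counts = np.arange(1, k + 1)                  # modes with frequency <= w[i]
    slope, _ = np.polyfit(np.log(w[:k]), np.log(counts), 1)
    return slope

# Toy spectrum (hypothetical data): frequencies constructed so that the
# cumulative count grows exactly as the cube of the frequency, as in a
# Debye-like three-dimensional solid.
K = 3000
debye_like = (np.arange(1, K + 1) / K) ** (1.0 / 3.0)
ds_debye = spectral_dimension(debye_like)
```

For frequencies obtained from NMA of a real protein, the fitted range would need to be checked by eye on the log–log plot, since only the lowest modes are expected to follow the power law.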

Although an old study [64] on the interpretation of fractals provides a hint of this concept, its formalization can be found first in an article by Burioni and Cassi [65] and subsequently in two other published studies [66, 67]. The applicability of this construct increased significantly when, based upon the generalization [66] of the Landau–Peierls instability criterion [68], a relationship connecting the spectral dimension (\( d_{\text{S}} \)), the FD and the number of amino acids along the protein backbone, N, was established recently [30]. This relationship is given by:

$$ \frac{2}{{d_{\text{S}} }} + \frac{1}{\text{FD}} = 1 + \frac{b}{\ln (N)} $$

where the parameter b shows a weak dependence on temperature and interaction parameters. The impact of this derivation can be easily ascertained by observing the varied application of it (notably in [68, 69]) in problems across protein structure studies. For a detailed biophysical understanding of the interplay of fractal and spectral dimensions, the reader is referred to the article of Reuveni [3].
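
The relationship above can be rearranged to obtain any one of the three quantities from the other two. The following sketch solves it for the spectral dimension; the function name is hypothetical, and the numerical values used in the example are purely illustrative inputs, not measured or published values.

```python
import math

def spectral_dimension_from_fd(fd, n_residues, b):
    """Solve 2/d_S + 1/FD = 1 + b/ln(N) for d_S."""
    rhs = 1.0 + b / math.log(n_residues) - 1.0 / fd   # this equals 2/d_S
    if rhs <= 0:
        raise ValueError("no positive d_S satisfies the relation for these inputs")
    return 2.0 / rhs

# Illustrative inputs only (hypothetical, chosen to exercise the formula).
fd_example, n_example, b_example = 2.5, 200, 4.0
ds_example = spectral_dimension_from_fd(fd_example, n_example, b_example)
```

Because b/ln(N) decreases only logarithmically with chain length, the correction term remains appreciable for chains of a few hundred residues whenever b is of order one.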

Qualitatively, the spectral dimension describes the effective connectedness within protein structures, with a large magnitude corresponding to high topological connectedness (the reader is referred to Summary box 2 for a more detailed discussion). This complements the mass-FD magnitude, which typically quantifies how completely a protein property fills the protein space. The spectral dimension magnitude coincides with the Euclidean dimension in the case of lattices, but in general, \( d_{\text{S}} \) assumes non-integer values between 1 and 3.

Results obtained with fractal-based investigations

Fractal-based procedures can extract valuable information about the latent symmetry in terms of the arrangement of atoms and residues. They can also be indispensable in searching out patterns from a spectrum of nonlinear, time-dependent and context-dependent interactions amongst the interior biophysical properties. Thus, not surprisingly, these algorithms have been applied (meaningfully) in a wide range of cases. Due to the aperiodic arrangement of the amino acids, conformational states of proteins consist of many sub-states with nearly the same energy [70]. Experimental evidence indicates that each sub-state of the protein has in itself a large number of sub-states, and the potential energy function is statistically self-similar, having the same form on many different scales [51]. Here we present the principal findings from a wide variety of FD-based studies.

FD-based protein conformation studies

Fractal dimension is a reliable indicator of protein conformation because it provides a quantitative measure of the degree to which a structure (and any property associated with the structure) fills the space. Based on this recognition, as far back as 1982, Stapleton and his co-workers [71] showed that changes in protein structure under solvent conditions could conveniently be monitored by FD. In fact, their studies [17, 18, 71, 72] reported for the first time that the geometry of the carbon backbone determined from X-ray diffraction data and the vibrational dynamics of proteins as measured by Raman scattering are both fractals. The results of their work showed that the FD determined from the structure correlates (almost) perfectly with that determined from the dynamics. Subsequently, in 1984, another group took this question up and worked further on the Stapleton finding that the FD of the polymer (ω) coincides with the \( d_{\text{S}} \) that governs the phonon density of states g(ω) at small frequencies as \( g(\omega )\sim \omega^{{d_{\text{S}} - 1}} \). However, their interpretation [73] of the fracton was challenged [74] soon after (on the ground that the fracton dimension only considers scalar excitations, while phonons are vector excitations), and the exponent in the Stapleton finding was related to the spectral dimension. For an in-depth account of these concepts, interested readers are referred to earlier publications ([18, 73–75] and references therein), to appreciate their pivotal significance in the context of contemporary theoretical soft condensed matter research.

FD-based ion-channel kinetics studies

Liebovitch and co-workers [76–78] proposed a fractal model of ion channel kinetics in 1987. The fact that their model was more consistent with the conformational dynamics of proteins helped in establishing that fractal models can be accurate and reliable when conformational transitions between states that consist of a hierarchy of sub-states are taken into consideration. Liebovitch et al.’s interpretation of the results was radical because it was based on the notion that dynamic processes in proteins occur with many different correlation times. By adjusting the parameters of the model, the observed gating behavior could be described over a wide time range. The immediate impact of this study on contemporary work can be understood by comparing the (varying) approaches and interpretations suggested in a number of these studies [79–82] with those that were attempting to describe the same genre of results with other tools. Subsequent works by Liebovitch [83, 84] on the analysis of patch clamp recordings of the sequence of open and closed times of cell membrane ion channels were able to establish a new trend in ion channel kinetics studies, which asserted that ion channel proteins have many conformational states (connected by a large number of pathways) of nearly equal energy minima. These states were shown not to be independent but, rather, linked by physical mechanisms that result in the observed fractal scaling. The impact of these findings can be appreciated from the profile of succeeding studies [85–89] on fractal kinetics from 1991 onwards. For example, in a recent work [89] it was shown that patterns in spiking activity in suprachiasmatic nucleus neurons can be explained with the fractal point process model, which implies the presence of self-similarity in the profile of spike trains and thereby indicates that no characteristic time scale dominates the dynamics of the spiking process.

FD-based attempts to relate protein structure and dynamics

It has long been known that protein dynamics involves a broad range of interconnected events occurring on various time scales [90]. It is therefore hardly surprising that protein systems exhibit nonexponential behavior. The fractal approach facilitates the development of models of protein dynamical processes. There is a wealth of nonexponential rate processes in biophysics, and fractal models can be effective in describing such processes [91]. Although some dynamical models [92, 93] had been developed to describe the vast array of nonexponential processes, discriminating between them was a difficult (yet important) problem. Fractal models have a unique appeal: they allow a structural parameter, the FD, to (automatically) be related to a dynamic parameter, such as the spectral dimension \( d_{\text{S}} \). Dewey [27, 91, 94] explored this inherent direct connection between structure and dynamics in a series of studies. To study the emergence and characteristics of these nonexponential behaviors involved in the conformational change of proteins and protein structures in general, the reader is referred to one of Dewey’s articles [28]. Recent applications of this approach can be found in a number of publications [95–97].

Studies in fractal kinetics

The discussion on fractal studies of ion channel kinetics and nonexponential rate processes falls under the broad category of ‘fractal kinetics’ study. It was Kopelman [98] who suggested that the general phenomenon of logarithmically decreasing reaction rates should be collectively considered under the umbrella term of ‘fractal kinetics’. His assertions were proven experimentally under a number of conditions [99]. Spatial and/or energetic heterogeneity of the medium, or non-randomness of the reactant distribution in low dimensions, have been suggested as (possible) mechanisms underlying fractal kinetics. Kopelman [99] showed how surface diffusion-controlled reactions, which occur on clusters or islands, can be expected to exhibit anomalous and fractal-like kinetics. Fractal kinetics accommodates anomalous reaction orders and time-dependent rate coefficients (for example, for binding), which is what makes the formalism advantageous. Investigations [100] (with theoretical considerations raised by statistical methods of random walk on fractal structures) have proven that fractal kinetics is not due to the number of binding sites on each protein molecule but, rather, is a reflection of the fractal properties of enzymes (and proteins in general) themselves. Since the roughness of protein surfaces follows fractal scaling [101] and FD is a global, statistical property, insensitive to time-dependent tiny fluctuations, the results in [100] can be easily understood. The same study had (notably) established that non-integral dimensions of the Hill coefficient (used to describe the allosteric effects of proteins and enzymes) are direct consequences of fractal properties of proteins. Another consequence of this inherent fractal nature is demonstrated by the power-law dependence of a correlation function at certain coordinates and the power-law dependence of the correlation function on the surface of analyte-receptor complexes over time [95].
The original Kopelman algorithm was given strong theoretical support when it was shown that fractal kinetics owes its emergence to spatial self-organization of the reactants induced by the compact properties of diffusion [102]. Because this scheme is such a significant platform, countless applications of it have been attempted over the years, a number of them quite important [103–107]. The interested reader might also benefit from studying the work of Turner et al. [103], who carried out an in-depth comparison of various pertinent schemes.
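
As a minimal illustration of a time-dependent rate coefficient, the sketch below assumes the commonly quoted power-law form k(t) = k0 · t^(−h) with 0 ≤ h ≤ 1, where h = 0 recovers a classical, time-independent coefficient. The function names and the log–log fitting step are illustrative assumptions and are not taken from the studies cited above.

```python
import math


def fractal_rate_coefficient(t, k0, h):
    """Assumed power-law form k(t) = k0 * t**(-h), for t > 0 and 0 <= h <= 1."""
    if t <= 0:
        raise ValueError("t must be positive")
    if not 0.0 <= h <= 1.0:
        raise ValueError("h must lie in [0, 1]")
    return k0 * t ** (-h)


def estimate_h(times, rates):
    """Least-squares slope of log k versus log t; the exponent is h = -slope."""
    xs = [math.log(t) for t in times]
    ys = [math.log(k) for k in rates]
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    return -sxy / sxx


# Hypothetical usage: synthetic rate coefficients generated with h = 0.3.
times = [1.0, 2.0, 4.0, 8.0, 16.0]
rates = [fractal_rate_coefficient(t, k0=2.0, h=0.3) for t in times]
h_fit = estimate_h(times, rates)
```

On a log–log plot such data fall on a straight line, so a nonzero fitted h is one simple diagnostic for departure from classical kinetics; real measurements would of course need an error analysis that this sketch omits.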

A discussion on fractal kinetics cannot be complete without a note on fractal time studies. However, this area falls outside the purview of this review. The interested reader can find an excellent discussion on chemical kinetics and fractal time and their pertinence in the realm of protein interaction studies elsewhere ([108], pp 158–163 of [28]).

FD-based results on protein structure

The recent trend of FD-based studies indicates a strong inclination towards discovering protein structural invariants. Since mass-FD is a measure of the compactness, it can alternatively be used as an effective measure of the extent of torsion that the extended polypeptide undergoes during folding: the stronger the torsion of the system, the larger its mass-FD, and vice versa. Therefore, FD reflects, to a large extent, the structural information of proteins and is itself a reliable construct to study the extent of folding. Enright and Leitner [29] formalized this intuitive notion by systematically calculating the mass-FD of a set of 200 proteins. They found the magnitude to be 2.489 ± 0.172, which was consistent with those obtained from dispersion relations and the anomalous subdiffusion computed for the same set of proteins [109], which were (in turn) related to the theory of mass fractal dimension proposed by Alexander and Orbach [63]. A large body of incisive studies performed during the last 3–4 years has zoomed in on two aspects of Enright–Leitner’s finding, i.e., the influence of the magnitude of the mass-FD on protein dynamics and energy flow and the non-compactness of the folded primary structure. The findings from these studies were unexpected. Here we outline a number of them.

The similarity between the geometry of a globular protein and a percolation cluster was pointed out by Liang and Dill [7]. However, a (masterly) follow-up study [110] revealed that energy flow through the vibrational states of proteins is anisotropic and that this process can be reliably described by the theory of transport on a percolation cluster in terms of the dimension of the cluster and the density of vibrational states. Although the precise functional roles of the energy propagation channels could not be identified, the researchers proposed the possibility of relating these to the fast cooling of the proteins during the reaction, directed energy transport to speed up the reaction, communication signals and allostery.

Adopting the fractal point of view made it possible to analyze protein topology and protein dynamics within the same framework. This advantageous fact was utilized wonderfully in a recent work by De Leeuw et al. [2] in which a universal equation of state for protein topology was derived. Although the marginal stability criterion of proteins had been discussed in an earlier study [111], De Leeuw et al. [2] proved that most proteins ‘exploit’ the Landau–Peierls instability [112] to attain large-amplitude vibrations. The authors insisted that this was the key mechanism by which proteins perform large-amplitude conformational changes to ensure proper function while maintaining an invariant native fold. Furthermore (perhaps most importantly), they proved that the majority of proteins in the Protein Data Bank (PDB) exist in a marginally stable thermodynamic state, namely a state that is close to the edge of unfolding. These represent exciting new findings on the state of existence of proteins, and they could never have been brought to the fore without efficient and accurate implementation of FD-based concepts.

The horizon in this area has recently opened up yet further with the answer to a basic question, namely, ‘why, although proteins seem to vibrate under a complex energetic landscape thanks to their highly involved dynamics, does the same energetic landscape appear harmonic near the minima?’ This was achieved [113] by challenging the very validity of the harmonic approximation concerned, and the study was able to establish that anomalous vibrational dynamics in proteins is a manifestation of a fractal-like native state structure. To obtain an even more inclusive picture of this aspect of protein structure studies, one can complement the aforementioned [113] assertion with that from another recent study [114] in which the authors were able to identify a possible mechanism, namely, the small fractal dimension of ‘first passage networks’, that accounts for reliable folding in proteins with rugged energy landscapes. On a related note, it is useful to know that a work on protein folding that is too slow to be simulated directly asserted that the fractal Smoluchowski equation could reliably model protein folding by subdiffusion [115]. In this way, this methodology could achieve what the standard paradigm of either diffusion (the usual Smoluchowski equation) or normal-diffusion continuous time random walk of a single order parameter under thermodynamic influences would have had difficulty achieving.

Another equally unexpected set of results [31] was presented when the degrees of non-compactness of mass, hydrophobicity and polarizability within proteins were quantified across four basic structural classes. This study revealed that, barring the all-α class, all of the major structural classes of proteins have an amount of unused hydrophobicity left in them. The presence of unused hydrophobicity in the bacterial type 3A cellulose-binding domain protein ‘1nbc’ (all-β domains) is shown in Figure 4. The amount of unused hydrophobicity was observed to be greater in thermophilic proteins than in their (structurally aligned) mesophilic counterparts. The all-β proteins (thermophilic and mesophilic alike) were identified with a maximum amount of unused hydrophobicity, while all-α proteins were found to possess minimum polarizability. These results suggest that the origin of α-helices is possibly not hydrophobic but electrostatic, whereas β-sheets are predominantly hydrophobic in nature.

Fig. 4

Profiles of the mass-fractal (MFD), hydrophobicity-fractal (HFD) and polarizability-fractal (PFD) dimensions for the protein (chosen randomly) ‘bacterial type 3a cellulose-binding domain’ (PDB id.: 1NBC; no. of residues 155, radius of gyration 8.53) are plotted simultaneously. Stretches in these profiles that are linear and parallel to the abscissa (i.e. radial distance from the center of mass of the protein) denote the extent to which these profiles are scale-invariant with variations in radial distance. Ordinate magnitudes corresponding to these stretches provide the values of MFD, HFD and PFD for the protein. The positive difference (HFD − MFD) quantifies the magnitude of unused hydrophobicity left in the protein. In the present case, (HFD − MFD) = (2.32 − 2.18) space-filling units = +0.14 space-filling units.
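
How a mass-fractal dimension of this kind might be read off from coordinates can be sketched as follows. The sketch assumes the scaling M(R) ∝ R^D for the mass enclosed within a radius R of the center of mass, and it estimates D as a single log–log slope over a caller-chosen range of radii rather than by locating plateaus in a profile as in Fig. 4. The function names and the choice of radii are hypothetical, and this is not claimed to be the procedure used to produce the figure.

```python
import numpy as np


def mass_fractal_dimension(coords, masses, radii):
    """Slope of log M(R) versus log R, with M(R) the mass within R of the center of mass."""
    coords = np.asarray(coords, dtype=float)
    masses = np.asarray(masses, dtype=float)
    radii = np.asarray(radii, dtype=float)
    center = np.average(coords, axis=0, weights=masses)
    dist = np.linalg.norm(coords - center, axis=1)
    enclosed = np.array([masses[dist <= r].sum() for r in radii])
    if np.any(enclosed <= 0):
        raise ValueError("every radius must enclose some mass")
    slope, _intercept = np.polyfit(np.log(radii), np.log(enclosed), 1)
    return slope


def unused_hydrophobicity(hfd, mfd):
    """Difference HFD - MFD, as defined in the caption of Fig. 4."""
    return hfd - mfd


# Worked value from the caption of Fig. 4: 2.32 - 2.18, about 0.14 space-filling units.
delta_1nbc = unused_hydrophobicity(2.32, 2.18)
```

For a uniformly filled three-dimensional object the fitted slope approaches 3, so values below 3 (such as the 2.489 ± 0.172 quoted above) indicate less-than-compact filling of space.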

Staying with non-compactness, but approaching the questions from a different perspective, the backbone of the ribosome was examined in another study [38] by employing correlation dimension (discussed in section Approach with correlation dimension analysis). The results of this study revealed that capacity dimensions of rRNAs are less than their embedding dimension, implying the presence of some empty spaces in rRNA backbones. Going back to the principles of polymer physics in the context of proteins, Hong and Jinzhi [35] derived a unified formula for the scaling exponent of proteins under different solvent conditions. Since this formula was obtained by considering the balance between the excluded volume effect and elastic interactions among monomers, one can expect it to be the focus of a lot of attention in future studies. Hong and Jinzhi’s results showed that the scaling exponent is closely related to the FD of a protein’s structure at the equilibrium state. Hidden dependencies between protein structural class-specific FD magnitudes and kinetic–thermodynamic parameters (the folding and unfolding rate, folding–unfolding free energy) were recently studied [39]. The results of this study confirmed the dependence of FD values on the fold type and on the location and connectivity of the secondary structures. However, the authors left a cautionary note by reporting that the presence of turns tends to increase the values of FD, irrespective of the structural classes. In an effort to describe accessible surface area, the number of amino acids and the radius of gyration with an FD-centric protein view, a recent work on protein (non)compactness [116] found that compactness of a protein might well be described in terms of packing between random spheres at the percolation threshold and crumpled wires.

Although the description of a single protein molecule as a contact network of amino acid residues is not new, a recent study [117] has investigated the properties of the shortest path length between the number of nodes by employing the network topological dimension, the fractal dimension and the spectral dimension. The authors proved conclusively that residue contact networks of proteins in native structures are not small world networks but fractal networks.

Gaining new knowledge of the protein interior with FD

While various successful applications of FD-based constructs in probing protein interiors are described in the section Results obtained with fractal-based investigations, much is yet to be done in this area. Just to emphasize this very point, that is, how as-yet-unexplored questions can be rigorously investigated through the innovative use of FD constructs, we present here a glimpse of correlation-dimension-based investigations of protein structural parameters. Since the underlying symmetries can be studied in short, medium and long-range interactions between various residues with an invariant framework, correlation dimension (CD)-based studies on the protein interior can prove to be extremely useful. The scope and resolution of the obtained results underscore the point that the application of this methodology in a series of elaborate and systematic studies will provide extensive information on numerous (elusive) dependencies between biophysical and/or biochemical invariants within proteins.

Implementation of CD-based constructs

Correlation dimension (as explained in section Approach with correlation dimension analysis) is different from the other FD constructs, because it attempts to investigate the self-similarity in distributions between correlated properties (or dependencies) which arise out of particular dispositions of atoms or residues. In other words, instead of studying the self-similarity in the distribution of atoms or residues, which is what the mass-fractal analyses [explained in section Approach with residue (and atom) distribution] do, here one examines the self-similar characteristics of correlations amongst the properties that come into being due to any given distribution of atoms or residues. A correlation dimension-based analysis of key biophysical properties necessary for protein stabilization is shown in Fig. 5. While some of these are quantifiable through other possible means, others, say the possible self-similarity in the distribution of π-electron clouds of the aromatic amino acids, will be difficult to obtain through other procedures. While in the section CD-based investigation of local and global dependencies amongst peptide-unit dipoles we describe how SCOP (structural classification of proteins [118])-class centric inferences can be obtained with CD-based constructs, in the section CD-based investigation of dependency distribution amongst hydrophobic residues, charged residues and residues with π-electron clouds we describe the utility of this methodology in quantifying the extent of dependencies amongst properties of individual residues. The section CD-based investigation of local and global dependencies amongst peptide-unit dipoles therefore presents results of a general nature, whereas the section CD-based investigation of dependency distribution amongst hydrophobic residues, charged residues and residues with π-electron clouds presents results at the resolution of particular proteins.
Instead of merely concentrating on characterization of self-similarity in electrostatic interactions, the general problem of protein stability was addressed by separately investigating the self-similarity in correlations among peptide dipoles, charged amino acids, aromatic amino acids and hydrophobic amino acids. Conserved patterns derived from the results of these four independent investigations provided a unique scheme by which to quantify protein stability at a general (SCOP fold and SCOP class) level.

Fig. 5

Denoting the spatial correlation between two residues separated by distance r (alternatively, the probability of two residues being in contact at a cut-off distance r) by C(r), the limit of [log(C(r))/log(r)] as r → 0 was calculated. This magnitude quantified the correlation dimension for each of the cases. In other words, from the slope of each of the four curves, the corresponding correlation dimension was calculated. Details of the methodology and background for each of these figures can be found in the sections Approach with correlation dimension analysis and FD-based attempts to relate protein structure and dynamics. a CD of hydrophobic amino acids, b CD of charged amino acids, c CD of dipoles and d CD of aromatic amino acids for selected proteins are shown.

CD-based investigation of local and global dependencies amongst peptide-unit dipoles

Why is it advantageous to resort to CD while studying protein electrostatics?

Many quantitative analyses have attempted to describe the delicate balance of forces in the complex charge subsystem and hard-to-model dielectrics of the protein interior. Proposed simulation strategies to study this problem range from continuum dielectric methods [119] to an explicit approach for dealing with polarizability [120]. In other words, starting from the Tanford–Kirkwood dielectric cavity scheme, the analytical solution of the Poisson–Boltzmann equation, the non-linear numerical finite difference [121], boundary element algorithms [122] and, finally, the sophisticated empirical generalized Born solutions [123] have all been tried. However, the explicit treatment of polarizability, even at its modest linear response level, involves a high computational price and detailed know-how of molecular modeling practices. Thus, the immediate need of many protein scientists, i.e. an easy-to-understand and easy-to-calculate measure to describe protein electrostatics, is not always addressed by the aforementioned procedures. In this context, the use of CD-based analyses of the protein interior may be immensely beneficial. Since charge–charge and peptide dipole interactions have been identified as playing important roles in the stabilization of proteins [124] and in determining the native fold [125], we demonstrate here how CD-based investigations on these can bring valuable information to the fore using a simple, invariant framework.

Calculation of CD to quantify dipole distribution

Although the peptide group is uncharged at a normal pH, its double-bonded resonance form accounts for a dipole moment of approximately 3.6 Debye, directed from the oxygen of the C=O bond to the hydrogen of the N–H bond. In the case of certain secondary structures, such as the α-helix, these dipole moments align with each other to produce a significant resultant dipole moment, which in turn may account for non-trivial contributions to protein stability. The electrostatic field resulting from the arrangement of helix dipoles has been proposed as an important factor contributing to both helix stability and specificity [126]. Åqvist and coworkers [127] have demonstrated that this effect is short-ranged, and its influence is confined largely to individual backbone dipoles localized within the first and last turns of the helix, resulting in a formal positive charge at the helix N terminus and a formal negative charge at the C terminus. However, the global picture regarding protein interior electrostatics, as a function of backbone conformation of the native structure, does not necessarily emerge from that. Since dipoles in an α-helix are electrostatically aligned nearly parallel to the axis of the helix [128], such a global picture can be obtained straightaway by calculating the general dependency profile between dipoles originating from the numerous mutual spatial dependencies between the peptide-unit dipoles across the entire backbone. Mere vectorial summation to calculate the resultant dipole vector will fall short of quantifying this aspect of dipole distribution in protein structures.

Hence, choosing the standard threshold for considering a pair of residues to be in long-range contact [129, 130], namely a maximal distance of 8 Å between their centroids [alternatively, the distance between Cβ atoms (Cα for glycines) can be chosen; the results do not vary appreciably (unpublished data)] and resorting to the formulae provided in section Approach with correlation dimension analysis, the CD was calculated for every non-redundant protein structure (taken from the PDB [131]) within all of the SCOP folds with a statistically significant number of non-redundant protein structures. All of the non-hydrogen atoms were considered while calculating the centroid of an amino acid. The set of formulae mentioned in the section Approach with correlation dimension analysis was used to calculate the CD. The reader is referred to Theiler [132] for details of the methodology. After obtaining the CD for each of the proteins belonging to a particular SCOP fold, the average of the CDs obtained for that fold was calculated to draw biological inferences at the SCOP-class level. While Tables 1 and 2 contain the list of 15 SCOP folds with the maximum and minimum magnitudes of CD between peptide dipoles, respectively, the entire list can be found in the Electronic Supplementary Material 1.
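
The general idea of a correlation-dimension estimate can be sketched as follows. The sketch assumes a Grassberger–Procaccia-style correlation integral, C(r) taken as the fraction of point pairs closer than r, with the CD estimated as the slope of log C(r) against log r over a caller-chosen range of radii. The function names are hypothetical, the input points stand in for residue centroids, and the way the 8 Å contact threshold enters the actual calculation is not reproduced here.

```python
import numpy as np


def pairwise_distances(points):
    """Distances between all distinct pairs of points (each pair counted once)."""
    pts = np.asarray(points, dtype=float)
    diff = pts[:, None, :] - pts[None, :, :]
    dist = np.sqrt((diff ** 2).sum(axis=-1))
    i, j = np.triu_indices(len(pts), k=1)
    return dist[i, j]


def correlation_dimension(points, radii):
    """Slope of log C(r) versus log r, with C(r) the fraction of pairs closer than r."""
    radii = np.asarray(radii, dtype=float)
    d = pairwise_distances(points)
    c = np.array([(d < r).mean() for r in radii])
    if np.any(c == 0):
        raise ValueError("every radius must contain at least one pair")
    slope, _intercept = np.polyfit(np.log(radii), np.log(c), 1)
    return slope
```

Because the limit r → 0 cannot be taken with a finite number of centroids, any practical estimate depends on the scaling range chosen for the fit; the same range should therefore be used when comparing values across proteins or folds.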

Table 1 List of 15 SCOP folds with maximum correlation dimensions between peptide-unit dipoles
Table 2 List of 15 SCOP folds with minimum correlation dimensions between peptide-unit dipoles

Brief discussion on peptide-unit dipole distribution within SCOP folds

Most of the studies on peptide-unit dipoles [126, 128] have concentrated on the nature and magnitude of the (resultant) dipole moments, especially in the context of α-helices. Results obtained with CD, on the other hand, have quantified a different aspect of peptide-unit dipoles in proteins, namely, the prevalent symmetry in dependencies amongst the spatial arrangement of the dipoles across proteins from all seven SCOP classes. Somewhat unexpectedly, the results revealed that peptide-unit dipoles in the α/β class of proteins are, in general, more correlated with each other than the peptide-unit dipoles in proteins belonging to the all-α class. Of the 15 SCOP folds with a maximum CD amongst peptide-unit dipoles (Table 1), only one entry (‘α/α toroid’, with the maximum CD) belongs to the all-α class of proteins, nine belong to the α/β class of proteins, four belong to the α + β class of proteins, and one is from multidomain proteins. Hence, although peptide-unit dipoles in α-helices are electrostatically aligned nearly parallel to the axis of the helix [128] and although their resultant effect in helix stability and specificity [126] is well-known, with respect to ensuring maximum dipole–dipole interactions, the α/β structures appear to be better optimized. Then again, in the particular case of the fold ‘α/α toroid’, the closed circular arrangement of the array of α-hairpins presents perfect positioning for the peptide-unit dipoles to correlate with each other, which explains why the α/α toroid has the highest CD for peptide-unit dipoles. Notably, no structural domain belonging to the SCOP class all-β was found in the list of folds with a maximum CD between peptide-unit dipoles.

The assertion of better inherent correlations between dipoles in α/β proteins is vindicated by Table 3 and by the results shown in Table 2 too. Amongst the 15 SCOP folds with a minimum CD between peptide-unit dipoles, not a single entry was found in the α/β class of proteins. Four entries from the ‘small proteins’ class underline the fact that in this SCOP class, the correlation (and dependency) of peptide-unit dipoles on each other is extremely weak. Interestingly, four folds from the all-α class are found to have the least spatial correlations amongst peptide-unit dipoles which, supported by the data in Table 1, contradicts the popular notion that dipole–dipole interactions within all-α proteins are the best. On the other hand, the Table 3 results show unambiguously that the spatial correlation (and therefore, dependency) between peptide dipoles in the all-α and all-β classes of proteins is almost the same. This result is, however, not unexpected because CD does not take into account the directionality of dipoles, but merely attempts to quantify their spatial correlations.

Table 3 Correlation dimension between peptide-unit dipoles across four major SCOP classes

CD amongst charged amino acids across all seven SCOP classes

Considering His, Arg, Lys, Asp and Glu as the charged amino acids, we carried out an analysis on the non-redundant structures from all seven SCOP classes with the same methodology as used in the previous case. Table 4 shows 15 SCOP folds with maximum values of CD amongst charged amino acids, and Table 5 shows 15 SCOP folds with minimum magnitudes of CD amongst charged amino acids. Interestingly, eight of the top 15 CDs between charged amino acids were observed in proteins from the α/β class (Table 4). This is strikingly similar to the results obtained from the SCOP-class-wide peptide-unit dipole correlation pattern (Table 1), where nine of the top 15 CD magnitudes were observed in proteins from the α/β class. The optimized electrostatic environment of α/β proteins can alternatively be assessed from their (conspicuous) absence from Table 5. Although it was expected that the proteins from the SCOP classes ‘small proteins’ and ‘designed proteins’ might have a minimal CD among the (aforementioned) five charged residues (Table 5), it was unexpected to observe minimal CDs among those five charged residues even among the proteins from three all-α folds (‘RuvA C terminal domain like’, ‘λ-repressor like DNA binding domains’ and ‘SAM domain like’). Furthermore, since these are exactly the same folds with minimal CD amongst peptide-unit dipoles (Table 2), it may be indirectly inferred that the electrostatic environment within the proteins belonging to these domains is minimally conducive.

Table 4 List of SCOP folds with a maximum correlation dimension amongst the charged amino acids (His, Arg, Lys, Asp and Glu)
Table 5 List of SCOP folds with a minimum correlation dimension amongst the charged amino acids (His, Arg, Lys, Asp and Glu)

CD amongst aromatic amino acids across all seven SCOP classes

It is known [133] that Tyr, Trp and Phe possess the ‘π-system’. While many studies have been conducted on characterizing cation–π interactions and π–π stacking, among others, the symmetry in spatial correlation between π-electron clouds is far from being understood. Considering all of the (non-hydrogen) atoms belonging to aromatic rings of Tyr, Trp and Phe, we carried out an analysis on the non-redundant structures from all seven SCOP classes, using the same methodology as in the previous case. Table 6 shows 15 SCOP folds with maximum values of CD amongst π-electron clouds of the aromatic amino acids, whereas Table 7 shows 15 SCOP folds with minimum magnitudes of CD amongst the same aromatic amino acids. Conforming to the trends observed in the distribution of peptide-unit dipoles and charged amino acids, six of the top 15 CDs between aromatic amino acids were observed in proteins from the α/β class (Table 6). Aromatic amino acids from proteins belonging to the α + β class feature in the same list for four of the top 15 cases. The marked absence of proteins belonging to either of these SCOP classes from Table 7 implies the existence of conducive electrostatic environments within α/β and α + β proteins in general, resulting in interactions between π-electron clouds of Tyr, Trp and Phe being maximized. On an unexpected note, interactions between aromatic amino acids of proteins belonging to the SCOP class all-α were observed to be minimal amongst all seven SCOP classes, which is demonstrated by the presence of six all-α folds in Table 7.

Table 6 List of SCOP folds with a maximum correlation dimension amongst the aromatic amino acids
Table 7 List of SCOP folds with a minimum correlation dimension amongst the aromatic amino acids

CD amongst hydrophobic amino acids across all seven SCOP classes

Analysis of self-similarity in interactions amongst hydrophobic amino acids was performed by quantifying their dependencies (or correlations) on each other. This can be achieved by calculating the CDs amongst Gly, Ala, Val, Ile, Leu, Met and Cys for proteins belonging to the seven SCOP classes. The observed pattern, namely, that more than half of the SCOP folds with maximum values of CD amongst the hydrophobic amino acids belong to the α/β class (Table 8), points definitely to the superior stability profile of proteins belonging to the α/β structural class in general. Such an assertion of better stability is further vindicated by a marked absence of α/β folds in Table 9, which shows 15 SCOP folds with minimum values of CD amongst the hydrophobic amino acids. Although ‘peptides’, ‘small proteins’ and ‘designed proteins’ were expected to have negligible hydrophobic interaction profiles, the presence of four all-α folds (‘RuvA C terminal domain like’, ‘long-α hairpin’, ‘λ-repressor like DNA binding domains’ and ‘SAM domain like’) and two all-β folds (‘SH3 like barrel’ and ‘glycosyl hydrolase domain’) in Table 9 indicates that it would be simplistic to expect an enhanced nature of hydrophobic interactions amongst proteins from many folds belonging to the all-α and all-β SCOP classes.
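
The residue groupings named in the three residue-based analyses can be collected in a small lookup, from which the point set for a CD calculation is then selected. The sketch below is only an illustration: the input format (a list of residue names paired with heavy-atom coordinates), the function names and the dictionary keys are hypothetical, and it uses whole-residue centroids throughout, whereas the aromatic analysis described above considers the ring atoms themselves.

```python
import numpy as np

# Groupings as listed in the text (three-letter residue codes).
RESIDUE_CLASSES = {
    "charged": {"HIS", "ARG", "LYS", "ASP", "GLU"},
    "aromatic": {"TYR", "TRP", "PHE"},
    "hydrophobic": {"GLY", "ALA", "VAL", "ILE", "LEU", "MET", "CYS"},
}


def residue_centroid(heavy_atom_coords):
    """Unweighted mean position of a residue's non-hydrogen atoms."""
    return np.asarray(heavy_atom_coords, dtype=float).mean(axis=0)


def class_centroids(residues, class_name):
    """Centroids of all residues whose name belongs to the requested class.

    `residues` is an iterable of (three_letter_name, heavy_atom_coords) pairs.
    """
    wanted = RESIDUE_CLASSES[class_name]
    points = [
        residue_centroid(coords)
        for name, coords in residues
        if name.upper() in wanted
    ]
    return np.array(points).reshape(-1, 3)


# Hypothetical toy input: four residues with made-up coordinates.
toy_residues = [
    ("ALA", [[0.0, 0.0, 0.0], [2.0, 0.0, 0.0]]),
    ("LYS", [[1.0, 1.0, 1.0]]),
    ("TRP", [[0.0, 0.0, 0.0], [0.0, 2.0, 0.0]]),
    ("gly", [[4.0, 4.0, 4.0]]),
]
hydrophobic_points = class_centroids(toy_residues, "hydrophobic")
```

The resulting array of centroids is the kind of point set on which a correlation-dimension estimate would then be run, one residue class at a time.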

Table 8 List of SCOP folds with a maximum correlation dimension amongst the hydrophobic amino acids
Table 9 List of SCOP folds with a minimum correlation dimension amongst the hydrophobic amino acids

General inferences about protein stability from SCOP-class-wide separate analyses of CD between charged residues, the peptide dipole, aromatic amino acids and hydrophobic amino acids

The CD-based analyses conducted here involved the study of self-similarity according to three aspects of protein electrostatics as well as a separate examination of self-similarity in hydrophobic interactions. Although studies of protein stability are numerous and multifaceted, quantitative investigations of self-similarity in interactions that influence protein stability on a completely global level are difficult to find. Even from the limited analyses conducted here, certain significant information on the latent features of protein stability could be unearthed, including a detailed characterization of the electrostatic environment within protein structures belonging to various SCOP folds across all seven SCOP classes. For example, we found that in the four tables showing the names of SCOP folds with maximum CD magnitudes (Tables 1, 4, 6 and 8, each comprising 15 SCOP folds), the names of nine SCOP folds appear consistently. Proteins belonging to these nine SCOP folds, namely, α/α toroid (all-α), α/β-hydrolases (α/β), PLP-dependent transferase-like (α/β), TIM β/α-barrel (α/β), β-lactamase/transpeptidase-like (multidomain proteins), cysteine proteinases (α + β), nucleotide-diphospho-sugar transferases (α/β), protein kinase-like (α + β) and ribokinase-like (α/β), can therefore be considered the most stable in general. Such a conserved profile of symmetry in stability-enhancing factors can further be observed in proteins belonging to the folds ‘S-adenosyl-l-methionine-dependent methyl transferases’ (α/β) and ‘Class II aaRS and biotin synthetases’ (α + β), because they appear in three of the aforementioned four tables. The fact that some of these SCOP folds with maximal CD magnitudes are amongst the most evolutionarily conserved folds (say, TIM β/α-barrel, α/β-hydrolases, etc.) is therefore not surprising.

In the diametrically opposite scenario, one finds an even more conserved profile in the lack of stability by observing the consistency in the occurrence of the names of SCOP folds with minimum CD magnitudes (Tables 2, 5, 7 and 9, each presenting 15 SCOP folds). To elaborate, ten SCOP folds appear consistently in all four tables, while four other SCOP folds appear in three tables. Proteins belonging to these ten SCOP folds, namely, conotoxins (peptides), transmembrane helical fragments (peptides), knottins (small peptides), zinc finger design (designed proteins), parallel coiled coil (coiled coil proteins), β-β-α zinc fingers (small proteins), glucocorticoid receptor-like (small proteins), rubredoxin-like (small proteins), RuvA C terminal domain-like (all-α) and λ-repressor-like DNA binding domains (all-α), are therefore expected to have a very low stability profile. Amongst these, the fold conotoxins (peptides) is conspicuous in registering a minimum CD magnitude in all four aspects of the current investigation, indicating the least possible stability for proteins belonging to this fold. The four SCOP folds conserved across three of the aforementioned tables are SH3-like barrel (all-β), long-α hairpin (all-α), glycosyl hydrolase domain (all-β) and ‘stalk segment of viral fusion proteins’ (coiled coil proteins), which also indicates a lack of stability of proteins belonging to these folds. Concentrating solely on the electrostatic environment, one can identify two folds (conotoxins and transmembrane helical fragments) of class ‘peptides’ as having the least conducive electrostatic milieu, since they register the lowest CD magnitudes in correlations among the peptide dipoles and, separately, among the charged amino acids (His, Arg, Lys, Asp and Glu).

These results clearly indicate that many more latent and general patterns of the kind reported here can be unearthed by systematic studies of protein interior properties with CD-based analyses.

CD-based investigation of dependency distribution amongst hydrophobic residues, charged residues and residues with π-electron clouds

The reasons for the usefulness of CD-based analyses in studying protein electrostatics have been discussed in sections Why is it advantageous to resort to CD while studying protein electrostatics? and General inferences of protein stability from SCOP-class-wide separate analyses of CD between charged residues, peptide-unit dipole, aromatic amino acids and hydrophobic amino acids. A SCOP-class-centric (top–down) analysis demonstrating the technique has been provided in sections Calculation of CD to quantify dipole distribution and CD amongst hydrophobic amino acids across all seven SCOP classes. Here we show the utility of the same analysis within an amino-acid-centric (bottom–up) framework. Considering the centroids (instead of Cα-atoms) of amino acids as true indicators of positions, the magnitude of CD (found from the slope of the log–log plot) between predominantly hydrophobic amino acids (Fig. 5a) within the protein ‘carbon monoxide dehydrogenase’ (PDB id: 1su7) was calculated. (As in the previous calculations, the set of formulae mentioned in section Approach with correlation dimension analysis was used. The reader is referred to a previous publication [132] for details of the methodology.) Similarly, considering His, Arg, Lys, Asp and Glu as the charged amino acids, the correlation dimension between charged amino acids was calculated for the protein ‘cold-active citrate synthase’ (PDB id: 1a59) (Fig. 5b). To quantify the self-similarity prevalent in the dipole moment distribution of a complex multidomain protein, ‘archaeosine tRNA-guanine transglycosylase’ (PDB id: 1iq8), we calculated the CD (shown in Fig. 5c). Equally interesting was the study of the CD between π-electron clouds of Tyr, Trp and Phe in ‘bacterial glucoamylase’ (PDB id: 1lf6) (Fig. 5d). Although Fig. 5 shows the results obtained from individual protein structures, it is easy to see that a comprehensive scale-up of such analyses to a statistical level can unearth hidden dependencies between pertinent entities embodying biophysical or biochemical properties.
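To make the phrase ‘slope of the log–log plot’ concrete, the sketch below estimates a correlation dimension from a set of points (for instance, residue centroids) in the Grassberger–Procaccia manner: the correlation sum C(r) is the fraction of distinct point pairs closer than r, and the CD is taken as the least-squares slope of log C(r) against log r. This is a generic illustration and not the exact set of formulae of section Approach with correlation dimension analysis or of [132]; the function names and the choice of radii are assumptions.

```python
import math
from bisect import bisect_left


def sorted_pair_distances(points):
    """All distinct pairwise Euclidean distances, sorted ascending.

    `points` is a list of equal-length coordinate tuples.
    """
    return sorted(
        math.dist(p, q)
        for i, p in enumerate(points)
        for q in points[i + 1:]
    )


def correlation_sum(sorted_distances, r):
    """C(r): fraction of distinct pairs separated by less than r."""
    return bisect_left(sorted_distances, r) / len(sorted_distances)


def correlation_dimension(points, radii):
    """Least-squares slope of log C(r) versus log r over the given radii.

    The radii are assumed to lie inside a scaling range; radii at which
    no pair is found are skipped because log C(r) is undefined there.
    """
    distances = sorted_pair_distances(points)
    xs, ys = [], []
    for r in radii:
        c = correlation_sum(distances, r)
        if c > 0.0:
            xs.append(math.log(r))
            ys.append(math.log(c))
    if len(xs) < 2:
        raise ValueError("need at least two radii with non-empty neighbourhoods")
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    return sxy / sxx
```

As a sanity check, equally spaced points on a straight line should give a value close to 1 when the radii are small compared with the length of the line. For a real protein, the radii would have to be restricted to the distance range over which the log–log plot is actually linear.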

Problems with FD and necessary precautions while working with FD

There are a number of inherent problems associated with FD-based toolkits, and anyone interested in exploring the scope of FD-based measures should be aware of these. For example, as a consequence of the scale-invariance of fractals, the functions describing the latter can never be smooth. This is a significant characteristic because it implies that the normal practice of expanding a function to first order and then approaching the linearized problem is not compatible with FD-based constructs. The scale-invariance of fractals ensures that one cannot find a scale at which functions become smooth enough to linearize. Due to this innate non-analytic nature, fractals have undefined derivatives, which makes the common principles of differential calculus inapplicable to them. Thus, the very property that makes fractals useful for describing irregular natural shapes (and phenomena) is in itself problematic.

On another note, the incongruity between the (idealistic) geometrical requirements of a fractal object and the (real-life) statistical character of the obtained results might seem a bit disconcerting for beginners. While the strict definition of FD demands an ‘infinite scale reduction’, this cannot be expected in calculations with real systems (say, proteins) because the size of a protein is restricted, i.e. it cannot extend to infinity. Thus, the FD of a protein property can only be measured within a range of scales. On the other hand, for many protein properties, the magnitude of FD might show a dependence on the number of residues or atoms, which, again, contradicts the idealistic expectations. However, experience suggests that for a system with a statistically significant number of residues (or atoms in some cases), many of these problems disappear, at least when calculating the FD within a certain scaling range and not across the entire possible scaling range. If, however, the aforementioned dependencies are still found (in other words, if no scale-invariant zones are found to describe the distribution of the property concerned), one may suspect that the distribution of the property is not self-similar in the first place.
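One simple way to look for such a scale-invariant zone is to inspect the local slopes of the log–log curve and to search for the longest stretch over which they remain nearly constant. The sketch below does this for precomputed values of log r and log C(r). The tolerance, the minimum run length and the function names are assumptions chosen for illustration, not a procedure prescribed in the literature reviewed here.

```python
def local_slopes(log_r, log_c):
    """Slopes of consecutive segments of the log-log curve."""
    return [
        (log_c[i + 1] - log_c[i]) / (log_r[i + 1] - log_r[i])
        for i in range(len(log_r) - 1)
    ]


def longest_scaling_range(log_r, log_c, tolerance=0.1, min_segments=3):
    """Find the longest run of segments with nearly constant slope.

    Returns (first_index, last_index, mean_slope), where the indices refer
    to the points in `log_r` that bound the run, or None when no run of at
    least `min_segments` segments has slopes that differ by at most
    `tolerance`. A None result hints that the data may not be self-similar.
    """
    slopes = local_slopes(log_r, log_c)
    best = None
    for start in range(len(slopes)):
        lowest = highest = slopes[start]
        for end in range(start, len(slopes)):
            lowest = min(lowest, slopes[end])
            highest = max(highest, slopes[end])
            if highest - lowest > tolerance:
                break
            if best is None or end - start > best[1] - best[0]:
                best = (start, end)
    if best is None or best[1] - best[0] + 1 < min_segments:
        return None
    start, end = best
    mean_slope = sum(slopes[start:end + 1]) / (end - start + 1)
    return start, end + 1, mean_slope
```

The mean slope over the returned range can then be reported as the FD for that range only, together with the range itself, which is consistent with the caution above that the FD of a protein property is meaningful only within a limited range of scales.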

These problems can be translated into certain (common) categorical problems:

  1.

    For example, there are non-trivial differences between proteins and ‘classical collapsed polymers’ (pp. 41–44 of [28]). Hence, some of the theoretical constructs of polymer physics cannot be applied ‘directly’ in the realm of protein structure studies. Similarly, while it is known that energy surfaces of proteins are rough hypersurfaces in high-dimensional configuration spaces with an immense number of local minima [70] and that proteins share this feature with other complex systems, such as spin glasses, glass-forming liquids, macromolecular melts, among others [134], one should not forget that compared to the aforementioned complex systems, a protein is a relatively small system. Thus, a blind application of FD-based procedures pertinent for spin-glass studies might not be judicious in the realm of proteins.

  2.

    Furthermore, one should always be aware of the statistical nature of FD-based measures. This implies that they can be useful for studying a property from a ‘top–down’ perspective but not from a ‘bottom–up’ standpoint, in cases where the atoms are distinguishable. Thus, one needs to be careful while studying the ‘local’ profile of certain properties with FD measures. Unfortunately, ascertaining the limit of this ‘local’-ness in terms of the precise number of (interacting) parameters depends primarily upon experience. However, a reasonable approach can be to consider a statistically significant number of such interactions. Defining any of the fractal exponents with five to ten atoms makes no sense. On the other hand, problems with aspects of an interesting parameter averaging out due to the innate constraints of FD-based methodologies are not uncommon (see, for example, [38]). Drawing from that, one can assert that probing the protein interior with fractal exponents for some ‘nano’ or ‘microscopic’ properties (or their emergence) might be extremely risky. Similarly, one should be aware of the possibility that a system might show statistical self-similarity only when the X- and Y-axes describing it are magnified by different scales. Such systems are called ‘self-affine fractals’, and while they are not exactly common, they are also not rare in the paradigm of protein studies [135].

  3.

    FD-based analyses often provide results in a (somewhat) abstract form that is difficult to explain fully from our present state of knowledge of biophysics and biochemistry. (Examples of such results can be found in many recent and old works [117, 136, 137].) Thus, one should be patient in deciphering the abstract nature of fractal results.

Summary and outlook

Various approaches to calculating the FD of proteins have been discussed in this review. The results obtained from the studies presented complement each other. Even a cursory look at the various paper repositories will demonstrate the rapid growth in the number of published studies during the last 3 years that have attempted to explore the protein interior with fractal exponents. This body of research collectively presents the contour of a promising multifaceted technique that can honestly and reliably probe an object as complex as a protein. Indeed, a lot has been achieved using the FD; however, one can observe that many published studies end by proposing new studies to take the obtained information to completeness. For example, even after the (pioneering) work of Leitner (discussed in section FD-based results on protein structure) [110], one needs to relate the complete information of energy transport channels within individual proteins before associating them with their functional roles. Similarly, studies need to be performed with the ‘universal equation of state’ (discussed in sections Approach with the spectral dimension and FD-based results on protein structure) [2, 30] to fully understand the biophysical and biochemical implications of a protein’s existence at ‘the edge of unfolding’ before attempting to use it for practical protein engineering applications. Likewise, in another recent work [31] (discussed in section FD-based results on protein structure), non-trivial correlations between polarizability-FD and hydrophobicity-FD in all-α, α/β and (especially) α + β proteins have been reported. Do these imply a small yet significant dependence of the dielectric constant on the hydrophobicity content of a protein (and vice versa)? Similarly, one needs to explore the uniqueness of distributions of structural properties in all-β proteins to answer why neither their mass distribution nor their hydrophobicity distribution shows any dependence on their polarizability distribution. In this context, perhaps an even more challenging problem will be to ascertain exactly how many atoms are needed to observe the emergence of mass-FD, hydrophobicity-FD or polarizability-FD. Since, as a recent work indicates, the emergence of a protein’s biochemical and biophysical properties, with their statistical nature, might well be mesoscopic [138], and since FD in the context of protein structure studies essentially quantifies statistical self-similarity, evaluating the exact number of atoms needed to observe the emergence of interior FD parameters will be immensely beneficial.

None of the questions and none of the points discussed in the paragraph above are simple. However, many of the unsolved problems in the realm of FD-based protein interior studies can be identified with (surprisingly) far simpler questions. For example, as mentioned in the sections Approach with residue (and atom) distribution and Approaches with backbone connectivity, the FD of the protein backbone is independent of the secondary structural content of the protein, which implies that secondary structural classes can reliably be modeled as Hamiltonian walks [28]. (A Hamiltonian walk is a lattice walk in which every point is visited once and only once, with no intersecting paths; it can be used to model polymers that show compact local and global scaling, such as ‘crumpled globules’ [139].) Crumpled globules tend to imply an absence of knots in the protein structures [28]. However, knots do occur in proteins, and not all that rarely [140, 141]. How does one resolve this contradiction from an objective general perspective?

Referring to an even simpler (?) case, we concentrate on amino acid frequency studies. The distribution of pairs of amino acids is not a linear combination of the frequencies of occurrence of the constituent individual amino acids [142], and the distribution of a given residue along a polypeptide chain is fractal in nature [137]. However, the latter study also reports a negative correlation between the probabilities of occurrence of various residues and their FDs, for which, to the best of our knowledge, no general satisfactory explanation has been proposed. Similarly, although a recent study [117] establishes that the amino acid residue network in the protein native structure forms a fractal (and not a small-world) network, no general biophysical cause could be attributed to this. At the same time, however, the authors’ explanation of the reasons underlying the percolated nature of the amino acid residue-contact network suggests the possibility of incorporating a recent assertion [2] that the majority of proteins in the PDB exist in a marginally stable thermodynamic state, namely, a state that is close to the edge of unfolding. A comprehensive study involving these two findings from the perspective of molecular evolution might provide us with a unique perception of protein structures.

Hence, while a lot of unknown, unexpected and deep information on (nonlinear) dependencies between protein structural parameters can be unearthed with FD-based probing, one needs to be careful while using the FD tools and interpreting the results obtained through their application. It should never be forgotten that apart from their prominent importance in biological functions, proteins are qualified subjects for studying the dynamics of complex systems. The FD-based investigations of protein interior organization can serve as a unique platform for interactive studies between physicists and biologists. However, to end on a cautionary note, although FD-based investigations of protein properties are insightful and can often reveal unexpected information, FD-based measures are not a panacea. Therefore, one needs to be aware of the intrinsic limitations of these methodologies and at the same time take the necessary precautions (as outlined before) while implementing them.