The methods of fractal geometry and nonlinear dynamics are widely used nowadays for research in diverse fields, including physics [1, 2], chemistry [3, 4], biology [5, 6], ecology [7], economics [8, 9], and technology [10, 11]. There are studies that employ bioinformatic approaches to analyze the relationship between the primary and spatial structure of proteins [12–15]. Procedures based on the analysis of spatial series, including spectral power analysis, detrended fluctuation analysis (DFA), and the normalized range method (R/S), are widely used in these studies [16, 17]. The Hurst coefficient is a key characteristic of spatial and temporal series, which enables the assessment of the degree of randomness/non-randomness and detection of long-term memory in the functions analyzed [18]. It should be noted that large protein molecules were analyzed in most of the published works, while little attention was paid to peptides. The aim of this work was to study the spatial series based on glycine and alanine peptides with the DFA and R/S methods.

Model Peptides

The model peptides consisted of glycine (Gly, G) residues, alanine (Ala, A) residues, or random combinations of the two (Table 1); peptide length ranged from 5 to 50 amino acid residues. The three-dimensional structures of the model peptides were built in the HyperChem software [19] with data from the amino acid database. Two conformations of the peptides, the α-helix (L; ϕ = –58°; ψ = –47°; ω = 180°) and the single-stranded β-structure (L; ϕ = 180°; ψ = 180°; ω = 180°), were used for the analysis. Histograms of the interatomic distances calculated with a step size of 0.01 Å (Figs. 1, 2) were treated as spatial series.
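The construction of such a spatial series can be illustrated with a minimal sketch. It assumes that the atomic coordinates are available as an array of x, y, z values in Å and that every atom pair is counted once; the function name `distance_histogram` and the toy coordinates are hypothetical and do not come from HyperChem.

```python
import numpy as np

def distance_histogram(coords, step=0.01):
    """Histogram of all pairwise interatomic distances with bin width `step` (Å).

    Assumption: each atom pair (i < j) is counted once.
    """
    coords = np.asarray(coords, dtype=float)
    # Pairwise coordinate differences via broadcasting: shape (atoms, atoms, 3).
    diff = coords[:, None, :] - coords[None, :, :]
    dist = np.sqrt((diff ** 2).sum(axis=-1))
    # Keep only the upper triangle so that no pair is counted twice.
    i, j = np.triu_indices(len(coords), k=1)
    d = dist[i, j]
    # Bin edges 0, step, 2*step, ... extending past the largest distance.
    n_bins = int(np.ceil(d.max() / step)) + 1
    edges = np.arange(n_bins + 1) * step
    counts, _ = np.histogram(d, bins=edges)
    return edges[:-1], counts

# Toy example: three atoms forming a 3-4-5 right triangle (hypothetical data).
r, counts = distance_histogram([[0, 0, 0], [3, 0, 0], [0, 4, 0]])
```

The array `counts`, read in order of increasing r, plays the role of the spatial series analyzed in the rest of the paper.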

Table 1.   Primary structure of the peptides and the number of glycine (mG) and alanine (mA) residues
Fig. 1. Histogram of interatomic distances (r, Å) in the Gly10 peptide with an α-helix structure.

Fig. 2. Histogram of interatomic distances (r, Å) in the Gly10 peptide with a β-structure.

METHODS

The DFA and R/S methods were used to study the model peptides. The DFA algorithm can be represented as a series of steps [20, 21]:

(1) the discrete series X of N samples was transformed by subtracting the mean (μ) and summing: \(y_k = \sum_{i=1}^{k} (x_i - \mu)\);

(2) the resulting series was divided into non-overlapping blocks of the same length n;

(3) the local trend \(y_{k,n}\) was estimated by the least squares method in each block of size n;

(4) the fluctuation function \(F(n) = \left( \frac{1}{N} \sum_{k=1}^{N} (y_k - y_{k,n})^2 \right)^{0.5}\) was determined for each block size n; and

(5) the exponent γ was calculated by taking logarithms of both sides of the equation \(F(n) = \mathrm{const} \cdot n^{\gamma}\), so that γ is the slope of log F(n) plotted against log n.

In the present work, a linear function was used to assess the local trend. The minimum and maximum block sizes were 4 and N/4, respectively. The Hurst coefficient (H) was calculated from the relationships between different fractal measures [21].
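The DFA steps listed above can be sketched in a few lines of NumPy. This is an illustration under stated assumptions rather than the implementation used in the study: the block sizes are spaced logarithmically between 4 and N/4 (the spacing and the number of sizes are assumptions), samples that do not fill a complete block are dropped, and the function name `dfa_exponent` is hypothetical.

```python
import numpy as np

def dfa_exponent(x, n_min=4, n_max=None, n_sizes=20):
    """Estimate the DFA exponent gamma with linear detrending in each block."""
    x = np.asarray(x, dtype=float)
    N = len(x)
    if n_max is None:
        n_max = N // 4
    # Step 1: subtract the mean and integrate.
    y = np.cumsum(x - x.mean())
    # Assumption: log-spaced integer block sizes between n_min and n_max.
    sizes = np.unique(np.round(np.geomspace(n_min, n_max, n_sizes)).astype(int))
    F = []
    for n in sizes:
        m = N // n                               # number of complete blocks
        blocks = y[:m * n].reshape(m, n).T       # step 2: shape (n, m), one block per column
        t = np.arange(n)
        slope, intercept = np.polyfit(t, blocks, 1)   # step 3: least-squares line per block
        trend = np.outer(t, slope) + intercept
        # Step 4: root-mean-square deviation from the local trend.
        F.append(np.sqrt(np.mean((blocks - trend) ** 2)))
    # Step 5: slope of log F(n) versus log n.
    gamma, _ = np.polyfit(np.log(sizes), np.log(F), 1)
    return gamma

rng = np.random.default_rng(0)
noise = rng.standard_normal(4096)
gamma_noise = dfa_exponent(noise)            # expected to lie near 0.5 for white noise
gamma_walk = dfa_exponent(np.cumsum(noise))  # expected to lie near 1.5 for a random walk
```

Uncorrelated noise and its running sum serve as reference cases here because their exponents are known in advance.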

In the case of the normalized range method, the following algorithm was used [18, 22]:

(1) the original data block of the length N was transformed by subtracting the average and summing;

(2) the range R (the difference between the maximum and minimum value of the series) and the standard deviation S were calculated;

(3) the normalized range R/S was calculated;

(4) the series analyzed was divided into two identical blocks;

(5) steps 1–4 were repeated for each block as long as the block length remained greater than 8; and

(6) the coefficients of the linear equation

$$\log (R/S) = \mathrm{const} + H \log (n)$$

were calculated to determine the Hurst coefficient.
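The R/S procedure can be sketched as follows. The sketch assumes that the R/S values of all blocks of the same length are averaged before the regression and that blocks with zero standard deviation are skipped; neither detail is stated in the text, and the names `rescaled_range` and `hurst_rs` are hypothetical.

```python
import numpy as np

def rescaled_range(block):
    """R/S of one block: range of the mean-adjusted cumulative sum over the standard deviation."""
    z = np.cumsum(block - block.mean())
    return (z.max() - z.min()) / block.std()

def hurst_rs(x, min_len=8):
    """Estimate H by repeatedly halving the series while the block length exceeds min_len."""
    x = np.asarray(x, dtype=float)
    N = len(x)
    sizes, rs = [], []
    level = 0
    while N // 2 ** level > min_len:
        m = 2 ** level                 # number of blocks at this level
        n = N // m                     # block length at this level
        blocks = x[:m * n].reshape(m, n)
        # Assumption: constant blocks (S = 0) are skipped, the rest are averaged.
        vals = [rescaled_range(b) for b in blocks if b.std() > 0]
        if vals:
            sizes.append(n)
            rs.append(np.mean(vals))
        level += 1
    # Slope of log(R/S) versus log(n) is the Hurst coefficient.
    H, _ = np.polyfit(np.log(sizes), np.log(rs), 1)
    return H

rng = np.random.default_rng(1)
noise = rng.standard_normal(4096)
H_noise = hurst_rs(noise)              # expected to lie somewhat above 0.5 for short series
H_walk = hurst_rs(np.cumsum(noise))    # expected to lie close to 1 for a random walk
```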

The coefficients and statistical characteristics of the regression models were estimated using the SVD software [23]. The following statistical characteristics were used: N, the number of points; R², the squared linear correlation coefficient; SD, the standard deviation; and Δ, the standard error of the regression coefficient.

RESULTS AND DISCUSSION

The interatomic distance histograms shown in Figs. 1 and 2 are typical of the peptides studied. For instance, the conformational transition from the α-helix to the β-structure is accompanied by an approximately twofold increase in both the maximum linear size and the intensity values. Therefore, the behavior of the spatial series studied can be expected to depend strongly on the conformational state.

As is known [24], the existence of power functions of the type

$$f\left( x \right) = c{{x}^{k}},$$

where x and f(x) are variables and c and k are constant coefficients, is one of the indications that a temporal (spatial) series possesses fractal properties. Exponent k is a characteristic of scale invariance of the series under investigation in this case. In a spectral power analysis study, for example, the square of the amplitude serves as the f(x); the frequency, as x; and the spectral index (–β), as k. In the case of DFA, these parameters are represented by F(n), n, and γ, respectively.

Since there are different methods for studying data series, it is important to know how the exponents in the power-law relationships are connected to each other; a dichotomous model [21] can be used to assess this connection. The central idea of this model is that the series under investigation can be classified either as stationary fractional Gaussian noise (fGn) or as non-stationary fractional Brownian motion (fBm) according to its k value. If the DFA method is used, processes with γ < 1 are assigned to the former type and those with 1 < γ < 2, to the latter type. Given that the Hurst coefficient (H) is widely used to characterize the fractal properties of temporal (spatial) series, we will consider the following relationships:

$$H = \gamma \,({\text{fGn}});\,\,\,\,H = \gamma - 1\,({\text{fBm}}).$$
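The two relationships above amount to a simple case distinction, sketched below. The function name `hurst_from_gamma` is hypothetical, and treating the boundary values γ = 1 and γ ≥ 2 as errors is an assumption, since the text does not assign them to either class.

```python
def hurst_from_gamma(gamma):
    """Classify a series by its DFA exponent and return (signal class, H)."""
    if gamma < 1:
        return "fGn", gamma        # stationary noise: H = gamma
    if 1 < gamma < 2:
        return "fBm", gamma - 1    # non-stationary motion: H = gamma - 1
    # Assumption: values on or beyond the boundaries are left unclassified.
    raise ValueError("gamma must satisfy gamma < 1 or 1 < gamma < 2")

print(hurst_from_gamma(0.6))   # ('fGn', 0.6)
print(hurst_from_gamma(1.5))   # ('fBm', 0.5)
```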

Examples of the calculation of fractal parameters for the Ala50 molecule by the DFA and R/S methods are shown in Figs. 3 and 4. The linear models obtained have good statistical characteristics, and the slope coefficients provide information on the degree of invariance in the spatial series analyzed.

Fig. 3. Dependence of the fluctuation function (F(n)) on the block size (n) for the Ala50 peptide (α-helical structure).

Fig. 4. Dependence of the normalized range (R/S) on the block size (n) for the Ala50 peptide (α-helical structure).

The calculated fractal characteristics of the peptides composed of Gly and Ala are shown in Table 2. The values obtained by the two methods were averaged (Hα and Hβ) to make the estimates more reliable. As the data show, replacing glycine with alanine does not change the H value significantly, and the dependence of the Hurst coefficient on the number of monomers in the chain is quite weak. The Hα value ranges from 0.50 to 0.55 for (Gly)m and from 0.49 to 0.58 for (Ala)m α-helical peptides. Values in this range indicate that the correlation between the members of the series is either absent (white noise, H = 0.5) or positive (H > 0.5); in the latter case, an upward or downward trend observed in the series is likely to continue [18, 24]. The transition from the α-helix to the β-structure is accompanied by an increase in the Hurst coefficient: the Hβ values for the spatial series based on β-structures range from 0.60 to 0.70 for (Gly)m and from 0.66 to 0.78 for (Ala)m. Such values are characteristic of persistent processes; that is, the series possess memory, and the trends observed in them are likely to persist.

Table 2.   Hurst coefficients (H) and standard errors (Δ) calculated by DFA (HDFA) and R/S (HR/S) methods, their average values (Hα, Hβ) and standard deviations (±s) for two conformations (α-helix and β-structure) of peptides composed of m glycine or alanine residues

A similar result was obtained in a study of 25 spatial series derived from histograms of interatomic distances in four series of homologous organic compounds [25]. A positive correlation (H > 0.5) between the members of these series was established, and the value of the Hurst coefficient did not depend on the length of the homologous series.

The effect of the monomer type, chain length, and conformation on the Hurst coefficient in peptides composed of only Gly or only Ala was considered above. Real peptides, however, consist of different amino acids. We therefore investigated model peptides in which glycine and alanine residues were combined at random (Table 3). First, a positive correlation, in other words a tendency to preserve the trend (H > 0.5), was observed in all the spatial series analyzed. Moreover, in contrast to the results for the (Gly)m and (Ala)m peptides, H tended to increase with the number of monomers. Thus, Hα increased from 0.61 to 0.75 as the number of monomers m in an α-helical peptide increased from 10 to 40, and the Hβ values for the β-structures ranged from 0.80 to 0.95. The Hurst coefficients for the (Gly)m and (Ala)m peptides ranged from 0.49 to 0.51 under these conditions (Table 2). Second, the transition from the α-helix to the β-structure led to an increase in the Hurst coefficient, as was also observed for the (Gly)m and (Ala)m peptides.

Table 3.   Hurst coefficients (H) and standard errors (Δ) calculated by DFA (HDFA) and R/S (HR/S) methods, their average values (Hα, Hβ) and standard deviations (±s) for two conformations (α-helix and β-structure) of peptides of m monomers based on random glycine and alanine combinations.

Note that the data we obtained point to the persistent behavior of the spatial series derived from the analysis of model peptides and show good agreement with the published data. For instance, the H value for the data series based on the B factors of the main protein chain in 14 lysozyme complexes ranged from 0.637 to 1.000 [26]. The correlated and non-random character of the amino acid sequences investigated was demonstrated in a study of spatial series of 32 different proteins [17].

Analysis of the relationship between the structure and properties of a compound is one of the fundamental objectives of chemistry. When quantitative structure–property (–activity) relationships (QSPR/QSAR) are used to connect structure and function, the structure of a compound has to be described by a set of quantitative values (descriptors) that characterize the properties of the substance at different atomic and molecular levels. The values of the Hurst coefficient calculated in this work indicate the presence or absence of memory in the spatial series of chemical compounds, and these values can be regarded as quantitative characteristics of a new molecular property. To characterize the significance of this parameter and reveal its connections to other parameters of the chemical compounds, the correlation coefficients (r) between the values of H and 3224 molecular descriptors calculated in the DRAGON software [27] were determined. These descriptors characterize the compounds in sufficient detail, since they reflect various physicochemical, electronic, topological, spatial, and other properties of molecules. Non-informative descriptors, which were constant in over 95% of all cases, were excluded from further analysis, leaving 1307 descriptors. The histogram of the correlation coefficients, built in increments of 0.05, is shown in Fig. 5. The minimum, maximum, and mean values of r were 0.002, 0.785, and 0.310, respectively.
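The descriptor screening described above can be sketched as follows. Two assumptions are made that the text does not confirm: a descriptor counts as non-informative when its most frequent value occurs in more than 95% of the compounds, and r is taken as the absolute value of the Pearson correlation coefficient. The function names and the toy data are hypothetical; no DRAGON output is reproduced here.

```python
import numpy as np

def informative_columns(D, max_constant_share=0.95):
    """Boolean mask of descriptors that are not (almost) constant across compounds."""
    keep = []
    for col in np.asarray(D).T:
        _, counts = np.unique(col, return_counts=True)
        # Assumption: drop a descriptor if one value covers more than 95% of compounds.
        keep.append(counts.max() / len(col) <= max_constant_share)
    return np.array(keep)

def abs_correlations(h, D):
    """Absolute Pearson correlation between h and every column of D."""
    h = np.asarray(h, dtype=float)
    D = np.asarray(D, dtype=float)
    hc = h - h.mean()
    Dc = D - D.mean(axis=0)
    return np.abs(hc @ Dc) / (np.linalg.norm(hc) * np.linalg.norm(Dc, axis=0))

# Toy data: four compounds, three descriptors (the second one is constant).
H_values = [1.0, 2.0, 3.0, 4.0]
D = np.array([[2, 1, 4], [4, 1, 3], [6, 1, 2], [8, 1, 1]])
mask = informative_columns(D)
r = abs_correlations(H_values, D[:, mask])
```

Filtering before computing r avoids a division by zero for constant descriptors, whose standard deviation is zero.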

Fig. 5.
figure 5

Histogram of the distribution of correlation coefficients between the H values and the 1307 descriptors.

A simple scheme for descriptor analysis proposed earlier [28] is based on pairwise correlation coefficients [29]. According to this scheme, descriptors with r ≥ 0.99, 0.99 > r ≥ 0.80, 0.80 > r ≥ 0.50, and 0.50 > r are classified as fundamental, important, probable, and specific, respectively. Application of these criteria to the data in Fig. 5 showed that 976 of the 1307 descriptors had r < 0.5, and the remaining 331 descriptors had r values in the range 0.80 > r ≥ 0.50. Thus, the share of specificity in the H values is 100 × 976/1307 = 74.7%, and the remaining share of probability is 100 – 74.7 = 25.3%; in other words, the Hurst coefficient can be considered a highly specific descriptor. The highest correlation coefficients (r = 0.757–0.785) correspond to the topological indices of total information content of the 5th, 3rd, and 4th orders (TIC5, TIC3, and TIC4). The computation of these (and similar) information content indices is based on the well-known Shannon formula [30]:

$$I = -\sum_{i} p_i \log_2 p_i, \quad p_i = n_i / n,$$

where I is the information content, n_i is the number of elements in subset i, n = Σn_i is the total number of elements in the system, and p_i is the probability of event i. This formula can be used to assess the heterogeneity of any system, including molecular systems. Let us use it to analyze two types of polymers (peptides). To be specific, let us fix the total number of monomers (n) at 50. Let one of the polymers consist of monomers of one type (a single subset, n_1 = 50) and let the other be an equal mixture of monomers of two types (two subsets, n_1 = 25, n_2 = 25). A simple calculation shows that I = 0 in the first case and I = –2 × 0.5 log2 0.5 = 1 in the second case; i.e., the amount of information increases. This result can explain, to some extent, why peptides composed of random combinations of glycine and alanine residues have higher H values than the peptides composed of glycine or alanine only.
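The worked example can be checked with a short calculation. The function name `shannon_information` is hypothetical; the formula is written as Σ p_i log2(1/p_i), which is algebraically identical to the Shannon formula given above.

```python
import math

def shannon_information(counts):
    """Shannon information (bits per element) for subsets of the given sizes."""
    n = sum(counts)
    # p * log2(1/p) equals -p * log2(p); empty subsets contribute nothing.
    return sum((c / n) * math.log2(n / c) for c in counts if c > 0)

print(shannon_information([50]))       # 0.0  (one monomer type)
print(shannon_information([25, 25]))   # 1.0  (equal mixture of two types)
```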

CONCLUSIONS

The study demonstrated that spatial series based on interatomic distance histograms of glycine and alanine peptides behave similarly. The Hurst coefficient changed only slightly when the number of monomers in the model (Gly)m and (Ala)m peptides varied from 5 to 50. The coefficient values (H > 0.5) point to a positive correlation between the members of the series. An increase in the number of monomers in peptides produced by a random combination of Gly and Ala residues is generally accompanied by an increase in the Hurst coefficient. The transition from the α-helical conformation to the β-structure leads to an increase in H for both the homopeptides and the random Gly/Ala peptides. Persistent behavior, that is, the presence of long-term memory, can be postulated for the majority of the spatial series studied.