A new definition and properties of the similarity value between two protein structures

Saberi Fathi, S. M.

doi:10.1007/s10867-016-9429-0

Original Paper
Published: 13 September 2016

Volume 42, pages 621–636, (2016)
Journal of Biological Physics

Abstract

Knowledge of the 3D structure of a protein provides useful information about the protein’s functional properties. In particular, structural similarity between proteins can be used as a good predictor of functional similarity. One method that uses the 3D geometrical structure of proteins to compare them is the similarity value (SV). In this paper, we introduce a new definition of the SV measure for comparing two proteins. To this end, we take into account the masses of the proteins’ atoms and the number of atoms in the proteins being compared. This defines a new measure, called the weighted similarity value (WSV), which adds physical properties to geometrical properties. We also show that our results are in good agreement with the results obtained by TM-SCORE and DALILITE. WSV can be of use in protein classification and in drug discovery.

1 Introduction

Quantitatively comparing two protein tertiary structures to evaluate their similarity is a major challenge. A successful comparison can answer important questions in structural biology, cell biology, and biochemistry [1]. In particular, it is believed that functional similarity can be predicted from the structural similarity between proteins. The 3D structure of a protein is obtained by experimental methods such as X-ray or electron crystallography and, sometimes, NMR [2]. If no crystallographic structure of a protein is available, computational structure prediction methods based on sequence similarity can be used. One such technique, homology modeling, uses the structure of a known protein as a template to predict the structure of an unknown protein [3]. If structural information on the proteins exists, methods developed to compare the structures can be applied [4–13].

Examples of methods based on numerical techniques to predict structural information are: SCOP (Structural Classification of Proteins) [9, 10], CATH (Class, Architecture, Topology and Homologous superfamily) [13], TM-SCORE [14, 15], STRUCTAL software [16], FSSP (Families of Structurally Similar Proteins) [17] and DALILITE [18].

Recently, in Ref. [19], the similarity value (SV), a geometry-based structural property, was introduced as a new protein similarity measure. The SV is an alternative measure to the root-mean-square deviation (RMSD). The SV is defined as a normalized RMSD of the protein distances in reciprocal space, so that the protein’s atomic coordinates are mapped into the corresponding Fourier space. Theorems in mathematics allow us to perform this task by using Wigner-D functions and then crystallography concepts to arrive at the structure factor [19]. The advantage of defining the SV in reciprocal space is that it solves the known problem of the different sizes of the two proteins being compared. Thus, there is no need to use partial or local similarity tests. An example of a method that relies on the partial RMSD for computing the similarity is the STRUCTAL software [16]. The SV is also sensitive to protein topology (for a brief explanation of the differences between SV and other methods, see [19]).

In this paper, we propose an improved SV definition, called the ‘weighted similarity value’ (WSV) in order to add some important physical properties required to adequately compare any two proteins. We define WSV by adding a lower limit on the reciprocal space dimension for the two proteins that are being compared. This constraint ensures that we do not lose any information in mapping from the protein’s spatial space to the reciprocal space. We also consider the masses of the atoms as a physical property in the protein shape function. The importance of adding mass to the shape function comes from the structure factors in X-ray scattering data [20]. Thus, adding the atomic masses to the shape function provides more reliable computed structure factors as we show later in this paper.

We compare the results regarding protein similarity obtained by WSV with NRMSD, DALILITE, and TM-SCORE, and we show that our results are in good agreement with these methods. DALILITE is a multiple alignment method, which is based on the alignment of the amino acid sequences and the secondary structure states (helix, sheet, coil) of the two proteins being compared [18]. Since DALILITE is a multiple alignment method, the results given by DALILITE have multi-valued z-scores and corresponding similarity values between two proteins [18].

The template modeling score (TM-SCORE) is a global fold similarity measure between two protein structures with different tertiary structures, and it is independent of protein size. The TM-SCORE is a normalized measure with a value in the [0,1] range; a value of 1 indicates a perfect match between the two structures [21].

2 Methods

RMSD is defined as a dissimilarity parameter between two proteins as follows:

$$ \mathrm{RMSD}^2=\frac{2}{N\left(N-1\right)}\sum_{j=2}^{N}\sum_{i=1}^{j-1}\left(d_{ij}-d'_{ij}\right)^2=\frac{2}{N\left(N-1\right)}\sum_{j=2}^{N}\sum_{i=1}^{j-1}\left(d_{ij}^2+d'^{2}_{ij}-2\,d_{ij}d'_{ij}\right) $$

(1)

where $N$ is the number of atoms in each protein, and $d_{ij}$ and $d'_{ij}$ are the elements of the distance matrices between the atomic positions of the first and second protein, respectively. Here, we assume that the two proteins have the same number of atoms. If the numbers of atoms differ, a partial RMSD definition should be used. RMSD is a semi-bounded parameter (between zero and infinity). We now define the ‘normalized RMSD’ (NRMSD) as a bounded similarity parameter between two proteins. First, we introduce the following auxiliary parameter:

$$ D^2=N\left(N-1\right)\times \mathrm{RMSD}^2=2\sum_{j=2}^{N}\sum_{i=1}^{j-1}\left(d_{ij}^2+d'^{2}_{ij}-2\,d_{ij}d'_{ij}\right) $$

(2)

and define:

$$ \mathrm{NRMSD} = \frac{1}{2}\left(1 - \frac{D^2}{d^2+d'^{2}}\right) $$

(3)

where $d^2 = 2\sum_{j=2}^{N}\sum_{i=1}^{j-1} d_{ij}^2$ is the squared vector length (the sum of the squared entries of the distance matrix), and likewise for $d'^{2}$. If the two proteins are not correlated, we have $D^2 = d^2 + d'^{2}$ and NRMSD = 0. If the two proteins are maximally correlated (the two proteins are the same), i.e., $D^2 = 0$, then NRMSD = 1/2. In the next step we define WSV.
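The definitions in Eqs. 1–3 can be illustrated with a minimal Python sketch. It assumes NumPy is available, that both proteins have the same number of atoms (as Eq. 1 requires), and that the atoms are already in one-to-one correspondence. The function names and the synthetic coordinates are hypothetical and are used only for illustration; they are not taken from the source.

```python
import numpy as np


def pairwise_distances(coords):
    """Return the (N, N) matrix of Euclidean distances between atom positions."""
    diff = coords[:, None, :] - coords[None, :, :]
    return np.sqrt((diff ** 2).sum(axis=-1))


def rmsd_and_nrmsd(coords_a, coords_b):
    """Distance-matrix RMSD (Eq. 1) and NRMSD (Eq. 3) for two equal-size atom sets."""
    if coords_a.shape != coords_b.shape:
        raise ValueError("Eq. 1 assumes both proteins have the same number of atoms")
    n = coords_a.shape[0]
    iu = np.triu_indices(n, k=1)                    # index pairs with i < j
    d = pairwise_distances(coords_a)[iu]
    d_prime = pairwise_distances(coords_b)[iu]
    big_d2 = 2.0 * np.sum((d - d_prime) ** 2)       # D^2 of Eq. 2
    rmsd = np.sqrt(big_d2 / (n * (n - 1)))          # Eq. 1
    d2 = 2.0 * np.sum(d ** 2)                       # d^2 for the first protein
    d_prime2 = 2.0 * np.sum(d_prime ** 2)           # d'^2 for the second protein
    nrmsd = 0.5 * (1.0 - big_d2 / (d2 + d_prime2))  # Eq. 3
    return rmsd, nrmsd


# Synthetic coordinates (assumption: random points, not real PDB data).
rng = np.random.default_rng(0)
protein_a = rng.normal(size=(50, 3))
protein_b = protein_a + rng.normal(scale=0.1, size=(50, 3))

rmsd_same, nrmsd_same = rmsd_and_nrmsd(protein_a, protein_a)
rmsd_pert, nrmsd_pert = rmsd_and_nrmsd(protein_a, protein_b)
print(rmsd_same, nrmsd_same)
print(rmsd_pert, nrmsd_pert)
```

For identical inputs the sketch returns RMSD = 0 and NRMSD = 1/2, which matches the limiting case stated above; a slightly perturbed copy gives an NRMSD just below 1/2.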

The SV was defined by using the Wigner-D functions in conjunction with a series expansion of the protein’s shape function [19]. The Wigner-D functions [22] describe the surface of a 4-sphere and are a generalization of the spherical harmonics. The surface of a 4-sphere is a three-dimensional manifold, which can be explored by using a set of three angles, the Euler angles. On the other hand, Euler angles describe a motion in three-dimensional Euclidean space. Thus, we can project three-dimensional Euclidean space onto this three-dimensional manifold (the 4-sphere surface); that is, we project a body onto the surface of a 4-sphere. Adding the atomic masses, $M_{\mathrm{atom}}$, to the point coordinates gives gravitational attraction for a given projected point. Thus, we define the protein shape function as:

$$ f\left(\alpha_i,\beta_j,\gamma_k\right)=\begin{cases} M_{\mathrm{atom}}, & \text{if there is an atom with mass } M_{\mathrm{atom}} \\ 0, & \text{elsewhere} \end{cases} $$

(4)

where $i, j, k = 1, 2, \ldots, N$ ($N$ is the number of the protein’s atoms) and $M_{\mathrm{atom}}$ is the atomic mass in atomic mass units (in the definition of SV, $M_{\mathrm{atom}} = 1$ for all atoms). Here, $(\alpha_i, \beta_j, \gamma_k)$ are the three Euler angles corresponding to the position $(x_i, y_j, z_k)$ of this atom in the PDB (Protein Data Bank) entry. We now expand the protein shape function in terms of the Wigner-D functions, $D_{lmn}(\alpha, \beta, \gamma)$, which span a basis set, as follows:

$$ f\left(\alpha, \beta, \gamma \right)={\displaystyle \sum_{l=0}^{\infty }}{\displaystyle \sum_{m=-l}^l}{\displaystyle \sum_{n=-l}^l}{C}_{lmn}{D}_{lmn}\left(\alpha, \beta, \gamma \right) $$

(5)

where the $C_{lmn}$ are the coefficients of the series expansion, which are unique for a given function $f(\alpha, \beta, \gamma)$. Some theorems in mathematics allow us to treat the coefficients of the expansion of a function in Wigner-D functions as a three-dimensional Fourier transform of this function [23, 24]. Thus, in the above expansion, the $C_{lmn}$ correspond to elements of the three-dimensional Fourier transform of $f(\alpha, \beta, \gamma)$. From crystallography, it is readily recognized that these are the coefficients of the crystal shape function, i.e., a structure factor [25]. Thus, the $C_{lmn}$ are the protein structure factors. We can now see why adding the masses of atoms is so important: in X-ray scattering, the atomic masses play an important role in determining the corresponding structure factors [26]. The $C_{lmn}$ can be obtained from the following relation:

$$ {C}_{lmn}=\frac{\left(2l+1\right)}{8{\pi}^2}{\displaystyle \int }{\displaystyle \int }{\displaystyle \int }f\left(\alpha, \beta, \gamma \right)\ {D_{lmn}}^{*}\left(\alpha, \beta, \gamma \right)\ \sin \beta\ d\beta\ d\alpha\ d\gamma $$

(6)

where we have used the orthogonality relation between the Wigner-D functions as follows:

$$ \int\int\int D^{*}_{l'm'n'}\left(\alpha, \beta, \gamma \right)\ D_{lmn}\left(\alpha, \beta, \gamma \right)\ \sin \beta\ d\beta\ d\alpha\ d\gamma =\frac{8\pi^2}{\left(2l+1\right)}\ \delta_{ll'}\delta_{mm'}\delta_{nn'} $$

(7)
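For readers who want the intermediate step, Eq. 6 follows from Eqs. 5 and 7 by projecting the expansion onto a single basis function. The sketch below assumes that the series in Eq. 5 may be integrated term by term; the shorthand $d\Omega = \sin\beta\ d\beta\ d\alpha\ d\gamma$ is introduced here only for brevity and is not part of the original notation.

$$ \int\int\int f\, D^{*}_{l'm'n'}\ d\Omega=\sum_{l,m,n}C_{lmn}\int\int\int D^{*}_{l'm'n'}\, D_{lmn}\ d\Omega=\sum_{l,m,n}C_{lmn}\,\frac{8\pi^2}{\left(2l+1\right)}\ \delta_{ll'}\delta_{mm'}\delta_{nn'}=\frac{8\pi^2}{\left(2l'+1\right)}\ C_{l'm'n'} $$

Solving for $C_{l'm'n'}$ and renaming the indices gives the relation in Eq. 6.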

Now, in the reciprocal space, the two shapes (proteins) are described with the same dimensions [19], even though they have different numbers of atoms. This is due to the use of Wigner-D functions. The dimension of the reciprocal space, $N_R$, is given by:

$$ {N}_R={\displaystyle \sum_{l=0}^{L_{max}}}{\left(2l+1\right)}^2 = \frac{1}{3}\ \left({L}_{max}+1\right)\left(2{L}_{max}+1\right)\left(2{L}_{max}+3\right) $$

(8)

where $L_{max}$ is an arbitrary maximum value chosen in the computation of the $C_{lmn}$.
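As a quick check of Eq. 8, the explicit sum and the closed form can be compared numerically. The short Python sketch below uses hypothetical function names and only the standard library.

```python
def reciprocal_dimension(l_max: int) -> int:
    """N_R from the explicit sum over l in Eq. 8."""
    return sum((2 * l + 1) ** 2 for l in range(l_max + 1))


def reciprocal_dimension_closed(l_max: int) -> int:
    """N_R from the closed form in Eq. 8 (the product is always divisible by 3)."""
    return (l_max + 1) * (2 * l_max + 1) * (2 * l_max + 3) // 3


for l_max in (0, 1, 2, 5, 10):
    print(l_max, reciprocal_dimension(l_max), reciprocal_dimension_closed(l_max))
```

Both forms agree, for example $N_R = 10$ for $L_{max} = 1$ and $N_R = 1771$ for $L_{max} = 10$, so $N_R$ grows roughly as $\frac{4}{3}L_{max}^3$.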

The coefficients $C_{lmn}$ belong to the complex space, and we can embed them in the $(N_R \times 2)$-dimensional Euclidean space such that $S \equiv (\mathrm{Re}(C_{lmn}), \mathrm{Im}(C_{lmn}))$, where $S \equiv \{S_{ij}\}$ ($i = 1, 2, \cdots, N_R$ and $j = 1, 2$) is a matrix of structure factors. In this step, we can define an $(N_R \times N_R)$ distance matrix for $S$, and we then define the SD parameter between two proteins as follows:

$$ SD^2=2\sum_{j=2}^{N_R}\sum_{i=1}^{j-1}\left(sd_{ij}-sd'_{ij}\right)^2=2\sum_{j=2}^{N_R}\sum_{i=1}^{j-1}\left(sd_{ij}^2+sd'^{2}_{ij}-2\,sd_{ij}\,sd'_{ij}\right) $$

(9)

where $sd_{ij}$ and $sd'_{ij}$ are the elements of the distance matrix in the reciprocal space of each of the two proteins, which is defined by:

$$ sd^2=\begin{pmatrix} S_{11} & S_{12} \\ \vdots & \vdots \\ S_{N_R1} & S_{N_R2} \end{pmatrix}\begin{pmatrix} S_{11} & \cdots & S_{N_R1} \\ S_{12} & \cdots & S_{N_R2} \end{pmatrix}=\begin{pmatrix} S_{11}^2+S_{12}^2 & \cdots & S_{11}S_{N_R1}+S_{12}S_{N_R2} \\ \vdots & \ddots & \vdots \\ S_{11}S_{N_R1}+S_{12}S_{N_R2} & \cdots & S_{N_R1}^2+S_{N_R2}^2 \end{pmatrix} $$

(10)

Here, we add a constraint to the definition of WSV by always making sure that $N_R \ge \max(N_1, N_2)$, where $N_1$ and $N_2$ are the numbers of atoms of the two compared proteins. Then, we define $L_{max}=\left\lfloor N_R^{1/3}-1\right\rfloor$, where $\lfloor\cdot\rfloor$ indicates the integer part of the number in the brackets. Now, we introduce a direct measure to characterize the similarity between two proteins, which depends on their geometries and physical properties (masses and positions of their atoms). Thus we define the weighted similarity value, WSV, as:

$$ WSV = \frac{1}{2}\left(1-\frac{SD^2}{sd^2+sd'^{2}}\right) $$

(11)

where $sd^2=2\sum_{j=2}^{N_R}\sum_{i=1}^{j-1}sd_{ij}^2$ is the squared vector length (the sum of the squared entries of the matrix), and likewise for $sd'^{2}$. If the two proteins are not correlated, we have $SD^2 = sd^2 + sd'^{2}$ and then WSV = 0. If the two proteins are maximally correlated (the two proteins are the same), i.e., $SD^2 = 0$, then WSV = 1/2.
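The final step, from structure factors to WSV, can be sketched in Python as follows. This is a minimal illustration under several assumptions, not the author's implementation: the coefficients $C_{lmn}$ are taken as already computed and flattened into a vector of length $N_R$ (random complex numbers stand in for them here); the reciprocal-space matrix is formed by taking Eq. 10 at face value as the product $S S^{T}$; and the hypothetical helper `min_l_max` picks the smallest $L_{max}$ that satisfies $N_R \ge \max(N_1, N_2)$ by direct search, which is one possible reading of the constraint rather than the closed-form rule stated above.

```python
import numpy as np


def min_l_max(n_atoms_1, n_atoms_2):
    """Smallest L_max whose N_R (Eq. 8) is at least max(N_1, N_2); returns (L_max, N_R)."""
    target = max(n_atoms_1, n_atoms_2)
    l_max, n_r = 0, 1
    while n_r < target:
        l_max += 1
        n_r += (2 * l_max + 1) ** 2
    return l_max, n_r


def structure_factor_matrix(coeffs):
    """Embed complex coefficients (length N_R) as the real (N_R, 2) matrix S."""
    coeffs = np.asarray(coeffs, dtype=complex)
    return np.column_stack((coeffs.real, coeffs.imag))


def wsv(coeffs_a, coeffs_b):
    """WSV (Eq. 11) from two coefficient vectors of equal length N_R."""
    s_a = structure_factor_matrix(coeffs_a)
    s_b = structure_factor_matrix(coeffs_b)
    if s_a.shape != s_b.shape:
        raise ValueError("both proteins must use the same N_R")
    iu = np.triu_indices(s_a.shape[0], k=1)        # index pairs with i < j
    sd = (s_a @ s_a.T)[iu]                         # Eq. 10, read literally as S S^T
    sd_prime = (s_b @ s_b.T)[iu]
    big_sd2 = 2.0 * np.sum((sd - sd_prime) ** 2)   # SD^2 of Eq. 9
    norm = 2.0 * np.sum(sd ** 2) + 2.0 * np.sum(sd_prime ** 2)
    return 0.5 * (1.0 - big_sd2 / norm)            # Eq. 11


# Hypothetical atom counts and random stand-ins for the C_lmn (not real data).
l_max, n_r = min_l_max(30, 25)
rng = np.random.default_rng(1)
c_a = rng.normal(size=n_r) + 1j * rng.normal(size=n_r)
c_b = rng.normal(size=n_r) + 1j * rng.normal(size=n_r)

wsv_same = wsv(c_a, c_a)
wsv_other = wsv(c_a, c_b)
print(l_max, n_r, wsv_same, wsv_other)
```

With identical coefficient vectors the sketch returns 1/2, the maximum stated above. For any pair of inputs the value cannot exceed 1/2, because $2ab \le a^2 + b^2$ for every pair of matrix entries.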

The range of atomic masses in the proteins is as follows. The heaviest atom typically found in a protein is sulfur, with a mass of about 32.065 a.m.u., and the lightest is hydrogen, with a mass of about 1.00794 a.m.u. We have also considered the atomic masses of some metal atoms in the liganded proteins.

We also wish to compare WSV with the other measures of protein similarity, namely NRMSD, TM-SCORE, and DALILITE. We use each of these methods in turn as the target and observe that WSV predicts similarity or dissimilarity in close agreement with their predictions. To quantify this, we compute the ‘sensitivity’ (the probability of predicting similarity between two proteins), ‘specificity’ (the probability of predicting dissimilarity between two proteins), ‘accuracy’ (the probability that the WSV measure is true, i.e., that it measures what it is supposed to measure), ‘precision’ (the probability that a repeated test gives the same result), and ‘F-score’ (the probability of giving a positive (similarity) prediction, or the performance at higher sensitivity) [27], as explained below.

To compute sensitivity, specificity, etc., we first normalize TM-SCORE and DALILITE (referred to as the M-score) to 0.5. Thus, when M = 0.5, the two proteins are completely similar, and when M = 0, they are completely dissimilar. We then assume that a measure predicts similarity between two proteins for any value greater than 0.25 (see Notes) and dissimilarity for any value less than 0.25. We have a true positive (TP) when both measures predict similarity, a true negative (TN) when both predict dissimilarity, a false positive (FP) when WSV predicts similarity and the M-score predicts dissimilarity, and a false negative (FN) when WSV predicts dissimilarity and the M-score predicts similarity. The definitions of sensitivity, specificity, etc., are given in Table 4.
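The thresholding and counting described above can be sketched in Python. The formulas used for sensitivity, specificity, accuracy, precision, and F-score are the standard textbook ones and are assumed here to match those of Table 4. The scores are made-up illustrative values, not data from the paper, and a score exactly equal to 0.25 (a case the text leaves open) is treated as dissimilar.

```python
def predicts_similarity(score, threshold=0.25):
    """True if a score on the [0, 0.5] scale is above the similarity threshold."""
    return score > threshold


def confusion_counts(wsv_scores, m_scores, threshold=0.25):
    """Count TP, TN, FP, FN with the M-score as target and WSV as predictor."""
    tp = tn = fp = fn = 0
    for w, m in zip(wsv_scores, m_scores):
        w_sim = predicts_similarity(w, threshold)
        m_sim = predicts_similarity(m, threshold)
        if w_sim and m_sim:
            tp += 1
        elif not w_sim and not m_sim:
            tn += 1
        elif w_sim:
            fp += 1   # WSV says similar, M-score says dissimilar
        else:
            fn += 1   # WSV says dissimilar, M-score says similar
    return tp, tn, fp, fn


def metrics(tp, tn, fp, fn):
    """Standard classification metrics (assumed to match Table 4)."""
    sensitivity = tp / (tp + fn)
    specificity = tn / (tn + fp)
    accuracy = (tp + tn) / (tp + tn + fp + fn)
    precision = tp / (tp + fp)
    f_score = 2 * precision * sensitivity / (precision + sensitivity)
    return {"sensitivity": sensitivity, "specificity": specificity,
            "accuracy": accuracy, "precision": precision, "f_score": f_score}


# Illustrative values only: raw TM-SCOREs are halved to map [0, 1] onto [0, 0.5].
tm_raw = [0.9, 0.3, 0.7, 0.2, 0.6, 0.4]
m_scores = [t / 2 for t in tm_raw]
wsv_scores = [0.48, 0.30, 0.20, 0.05, 0.40, 0.10]

counts = confusion_counts(wsv_scores, m_scores)
result = metrics(*counts)
print(counts, result)
```

For these six hypothetical pairs the counts are TP = 2, TN = 2, FP = 1, and FN = 1, so every metric evaluates to 2/3.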

We also compare WSV with the other scores by introducing a relative difference between WSV and M-score as:

$$ dif=\frac{\left|WSV-M\right|}{\left(WSV+M\right)} $$

(12)

When dif = 0, the WSV and M-score have the same prediction values and when dif = 1, this means the WSV and M-score have totally different prediction values. In other words, one predicts that the two proteins are similar and the other predicts that they are totally dissimilar.
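The two limiting cases of Eq. 12 are easy to verify with a short Python sketch. The function name and the guard against a zero denominator are additions made here for illustration; the source does not discuss the case where both scores are zero.

```python
def dif(wsv, m):
    """Relative difference between WSV and an M-score (Eq. 12)."""
    total = wsv + m
    if total == 0:
        raise ValueError("dif is undefined when both scores are zero")
    return abs(wsv - m) / total


same_prediction = dif(0.5, 0.5)      # both measures report complete similarity
opposite_prediction = dif(0.5, 0.0)  # one reports similarity, the other none
intermediate = dif(0.4, 0.3)
print(same_prediction, opposite_prediction, intermediate)
```

Equal scores give dif = 0, a score of 0.5 against 0 gives dif = 1, and the intermediate pair (0.4, 0.3) gives 1/7, or about 0.14.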

3 Results and discussion

In this paper, we defined WSV as a development of SV by including some physical properties of proteins in its definition and a constraint on the dimension of the reciprocal space. In Tables 1 and 2, we compare WSV with the SV [19], RMSD, NRMSD, TM-SCORE [14, 15], and DALILITE [18] values for the 48 and 86 datasets, respectively. Both the liganded and unliganded proteins are listed in the supplementary material of Li et al. [1] (these sets are reported at http://dragon.bio.purdue.edu/visgrid_suppl). For DALILITE, we report only the minimum and maximum similarity values between two proteins. The TM-SCORE [14, 15] values were obtained from the Zhang Lab server (http://zhanglab.ccmb.med.umich.edu/TM-SCORE/); only 84 entries of the 86 dataset and 47 entries of the 48 dataset were used because no TM-SCORE values were available for the rest. The DALILITE [18] values were obtained from the Holm Lab server (http://ekhidna.biocenter.helsinki.fi/dali_lite/start); only 85 entries of the 86 dataset and 47 entries of the 48 dataset were used because no DALILITE values were available for the rest.

Table 1 A set of 48 protein structures with WSV, SV [19], and RMSD from Li et al. [1], NRMSD, TM-SCORE [14, 15], and DALILITE [18]

Table 2 A set of 86 protein structures with WSV, SV [19], and RMSD from Li et al. [1], NRMSD, TM-SCORE [14, 15], and DALILITE [18]

One way to see how the atomic masses and the restriction on the space dimension affect the similarity criterion between two proteins is to compute the correlation of WSV and of SV with RMSD. The correlation between WSV and RMSD is 0.45 for the 48 dataset and 0.55 for the 86 dataset, which is better than the correlation between SV and RMSD for these two datasets, i.e., 0.32 and 0.36, respectively. Ref. [19] gives a complete discussion of why we do not expect to see a high correlation between RMSD and SV (WSV). This is why we defined NRMSD for comparison with WSV. NRMSD is a bounded parameter that removes the inconvenience of the semi-bounded RMSD. Also, both WSV and NRMSD are similarity criteria.

Figures 1 and 2 show the histogram of dif between WSV and NRMSD for the 48 and 86 datasets. For the 86 dataset, 60% of the WSV and NRMSD predictions differ by less than 10%, and 70% differ by less than 20%. For the 48 dataset, 60% of the predictions likewise differ by less than 10%, and 67% differ by less than 20%. Disagreement between WSV and NRMSD with an 80% difference in prediction values occurs for 4.5% of the 48 dataset and 5.8% of the 86 dataset (a summary of these results is given in Table 3). Figures 3, 4, 5, and 6 show the histograms of dif computed between WSV and TM-SCORE and between WSV and DALILITE. These figures, together with Table 3, show good agreement between WSV and these methods.

Table 3 Differences between WSV and the other methods by using dif

Table 4 shows the sensitivity, specificity, accuracy, and precision of WSV compared with the other scores (NRMSD, DALILITE, TM-SCORE) as targets. In summary, as Table 4 shows, when WSV is compared with NRMSD, the ‘sensitivity’, or the probability that two proteins are determined to be similar by WSV, is about 85.7% (80.0%) for the 86 (48) dataset, and the ‘specificity’, or the probability that two proteins are determined to be dissimilar by WSV, is 62.5% (37.5%). The accuracy of the method (WSV) for the 86 (48) dataset is 81.4% (72.9%) and the precision is 90.9% (86.5%); both measures indicate good agreement between the WSV and NRMSD predictions. The F-score for the 86 (48) dataset shows that the performance in giving a similarity prediction is about 88.2% (83.1%), which is an expected result because the proteins in the two datasets examined here are closely similar. These results show that WSV could be a good alternative to RMSD (or NRMSD); it does not involve the protein size issue and provides a normalized similarity criterion between any two proteins. The comparisons between WSV and TM-SCORE [14, 15] and between WSV and DALILITE [18], carried out in the same manner as for WSV and NRMSD and reported in Table 4, show good agreement between WSV and these methods’ predictions, as well as good precision of WSV.
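As a consistency check on the quoted values, the F-scores can be recomputed from the precision and sensitivity given above. The check assumes the standard definition $F = 2PR/(P+R)$, where the symbols $P$ (precision) and $R$ (sensitivity) are introduced here only for this calculation. For the 86 and 48 datasets, respectively:

$$ F_{86}=\frac{2\times 0.909\times 0.857}{0.909+0.857}\approx 0.882,\qquad F_{48}=\frac{2\times 0.865\times 0.800}{0.865+0.800}\approx 0.831 $$

Both values reproduce the reported 88.2% and 83.1%, which suggests that the F-score used is the usual harmonic mean of precision and sensitivity.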

Table 4 The computation of sensitivity, specificity, accuracy, precision, and F-score for the 48 and 86 datasets

All of the above results show that WSV appears to be a reliable alternative to RMSD (or NRMSD). WSV is a geometrical criterion that also includes physical properties. Moreover, it does not suffer from the protein size problem, and it provides a similarity criterion between two proteins that performs as well as the other criteria.

For computing WSV, we used an i7 laptop with 8 GB RAM. The time required to complete this computation depends on the proteins’ sizes; on average it takes about 3 min (about 1 min for small proteins and about 6 min for large proteins). In Tables 1 and 2, we also show the $L_{max}$ used for each pair of proteins.

4 Conclusions

In this paper, we introduced WSV, which differs from SV in two major ways. First, we weighted the shape function by the atomic masses, which stresses the importance of the individual atoms in the computation. Second, we extended the dimension of the reciprocal space at least up to the size of the larger of the two compared proteins (measured by the number of atoms). This condition ensures that we do not lose any information about the proteins when we map them onto the reciprocal space. As discussed in the Results and discussion section, these two changes improve the correlation of WSV with RMSD relative to that of SV. We compared WSV with NRMSD, TM-SCORE, and DALILITE by using statistical concepts such as sensitivity, specificity, etc. The results show good accuracy and precision for WSV. We also computed a relative difference (dif) between WSV and the other methods, which likewise shows good agreement between the WSV predictions and the other scores. Our results support the reliability and usefulness of our method and show that WSV can be used as an alternative to RMSD in finding protein similarity in various areas of protein science and in drug discovery.

WSV is now defined as a geometrical structural score. To develop this work in the future, we suggest defining a score based on both WSV and domain-based structural methods. We also wish to emphasize that WSV is a geometry-based method, sensitive to the positions of the protein’s atoms and to their masses. Thus, if one of these parameters changes, WSV will also change. Apparently, for two structurally similar proteins with dissimilar sequences, WSV does not give structural homologues as a result. This hypothesis will be examined in our future research and, if it is verified, it could present an advantage of WSV relative to SV or RMSD.

Notes

When the TM-SCORE is less than 0.2 it corresponds to randomly chosen unrelated proteins whereas with a score higher than 0.5 we generally assume the same fold in SCOP/CATH [21]. Here we normalized the TM-SCORE to 0.5 (i.e., we divided it by 2).

References

1. Li, B., Turuvekere, S., Agrawal, M., La, D., Ramani, K., Kihara, D.: Characterization of local geometry of protein surfaces with the visibility criterion. Proteins 71, 670 (2008)
2. Rupp, B., Wang, J.: Predictive models for protein crystallization. Methods 34, 390 (2004)
3. Arnold, K., Bordoli, L., Kopp, J., Schwede, T.: The SWISS-MODEL workspace: a web-based environment for protein structure homology modelling. Bioinf. (Oxford, England) 22, 195–201 (2006)
4. Kolodny, R., Petrey, D., Honig, B.: Protein structure comparison: implications for the nature of “fold space”, and structure and function prediction. Curr. Opin. Struct. Biol. 16, 393–398 (2006)
5. Carugo, O.: Recent progress in measuring structural similarity between proteins. Curr. Protein Peptide Sci. 8, 219–241 (2007)
6. Zhang, Y.: I-TASSER server for protein 3D structure prediction. BMC Bioinf. 9, 101186 (2008)
7. Betancourt, M.R., Skolnick, J.: Universal similarity measure for comparing protein structures. Biopolymers 59, 305–309 (2001)
8. Kihara, D., Sael, L., Chikhi, R., Esquivel-Rodriguez, J.: Molecular surface representation using 3D Zernike descriptors for protein shape comparison and docking. Curr. Protein Pept. Sci. 12(6), 520–530 (2011)
9. Murzin, A.G., Brenner, S.E., Hubbard, T., Chothia, C.: SCOP: a structural classification of proteins database for the investigation of sequences and structures. J. Mol. Biol. 247(4), 536–540 (1995)
10. Andreeva, A., Howorth, D., Chandonia, J.-M., Brenner, S.E., Hubbard, T.J.P., Chothia, C., Murzin, A.G.: Data growth and its impact on the SCOP database: new developments. Nucleic Acids Res. 36, D419–425 (2008)
11. Maiorov, V.N., Crippen, G.M.: Significance of root-mean-square deviation in comparing three-dimensional structures of globular proteins. J. Mol. Biol. 235(2), 625–634 (1994)
12. Carugo, O., Pongor, S.: A normalized root-mean-square distance for comparing protein three-dimensional structures. Protein Sci.: Publ. Protein Soc. 10(7), 1470–1473 (2001)
13. Orengo, C.A., Michie, A.D., Jones, S., Jones, D.T., Swindells, M.B., Thornton, J.M.: CATH – a hierarchic classification of protein domain structures. Structure (London, England: 1993) 5, 1093–1108 (1997)
14. Zhang, Y., Skolnick, J.: Scoring function for automated assessment of protein structure template quality. Proteins 57, 702–710 (2004)
15. Xu, J., Zhang, Y.: How significant is a protein structure similarity with TM-SCORE = 0.5? Bioinformatics 26, 889–895 (2010)
16. Levitt, M., Gerstein, M.: STRUCTAL. A structural alignment program. Stanford University (2005). Available from: http://csb.stanford.edu/levitt/Structal.
17. Holm, L., Ouzounis, C., Sander, C., Tuparev, G., Vriend, G.: A database of protein structure families with common folding motifs. Protein Sci. 1, 1691–1698 (1992)
18. Hasegawa, H., Holm, L.: Advances and pitfalls of protein structural alignment. Curr. Opin. Struct. Biol. 19, 341–348 (2009)
19. Saberi Fathi, S.M., White, D.T., Tuszynski, J.A.: Geometrical comparison of two protein structures using Wigner-D functions. Proteins 82, 2756–2769 (2014). doi:10.1002/prot.24640
20. Daniel, K.P., et al.: Reconstruction of SAXS Profiles from Protein Structures. Comput. Struct. Biotechnol. J. 8(11), e201308006 (2013). doi:10.5936/csbj.201308006
21. Zhang, Y., Skolnick, J.: TM-align: a protein structure alignment algorithm based on the TM-SCORE. Nucleic Acids Res. 33(7), 2302–2309 (2005). doi:10.1093/nar/gki524
22. Wigner, E.P.: Gruppentheorie und ihre Anwendungen auf die Quantenmechanik der Atomspektren. Vieweg Verlag, Braunschweig (1931)
23. Potts, D., Prestin, J., Vollrath, A.: A fast algorithm for nonequispaced Fourier transforms on the rotation group. Numer. Algorithms 52, 355–384 (2009)
24. Hielscher, R., Potts, D., Prestin, J., Schaeben, H., Schmalz, M.: The Radon transform on SO(3): a Fourier slice theorem and numerical inversion. Inverse Prob. 24, 025011 (2008)
25. Lipson, H., Taylor, C.A.: Fourier Transforms and X-ray Diffraction. Bell, London (1958)
26. McKie, D., McKie, C.: Essentials of Crystallography. Blackwell Scientific Publications (1992). ISBN 0-632-01574-8
27. Sokolova, M., Japkowicz, N., Szpakowicz, S.: Beyond Accuracy, F-score and ROC: a Family of Discriminant Measures for Performance Evaluation. American Association for Artificial Intelligence (2006). https://www.aaai.org/Papers/Workshops/2006/WS-06-06/WS06-06-006.pdf.


Acknowledgments

I thank Dr. Jack A. Tuszynski (University of Alberta) for his helpful comments.

Author information

Authors and Affiliations

Department of Physics, Ferdowsi University of Mashhad, Mashhad, 9177948974, Iran
S. M. Saberi Fathi

Corresponding author

Correspondence to S. M. Saberi Fathi.

About this article

Cite this article

Saberi Fathi, S.M. A new definition and properties of the similarity value between two protein structures. J Biol Phys 42, 621–636 (2016). https://doi.org/10.1007/s10867-016-9429-0

Received: 15 December 2015
Accepted: 12 August 2016
Published: 13 September 2016
Issue Date: October 2016
DOI: https://doi.org/10.1007/s10867-016-9429-0
