Introduction

Bioinorganic or biological inorganic chemistry is the discipline dealing with the interaction between inorganic substances and molecules of biological interest [1–3]. It is a wide scientific field that addresses the role, uptake, and fate of elements essential for life, the response of living organisms to toxic inorganic substances, the function of metal-based drugs, the synthetic production of functional models, and so on. The interaction between metal ions or metal-containing cofactors and biological macromolecules can be studied in atomic detail through 3D structural studies, thus providing a connection between bioinorganic chemistry and structural biology [4].

Metal ions are bound to biological macromolecules via coordination bonds. The bonds are made by so-called donor atoms that can belong to either the polymer (protein or nucleic acid) backbone or side chains/bases. Additional donor atoms may belong to nonmacromolecular ligands, such as oligopeptides, small organic molecules, anions, and water molecules. The ensemble comprising a metal ion (or cluster of metal ions) together with its donor atoms defines the metal-binding site. Metal-binding sites are occasionally extended to include all of the atoms in the donor amino acid or nucleotide. Databases reporting on the geometric properties of metal-binding sites in proteins [5] or nucleic acids [6] are available. They are derived from the coordinate files deposited in the Protein Data Bank (PDB) [7]. Metal-binding sites have been shown to be useful for the bioinformatic analysis of metal-binding proteins (metalloproteins) and, in particular, for the prediction of metalloproteins from genome sequences [8–10]. We have described how the inclusion of the surroundings of the metal-binding site in structure-based analyses strengthens the relationship of the sites with functional properties [11, 12]. This larger ensemble can be thought of as the minimal environment determining metal function, which in previous work we dubbed the “minimal functional site” (MFS). In practice, we defined an MFS in a metal–macromolecule adduct as the ensemble of atoms containing the metal ion or cofactor, all its ligands, and any other atom belonging to a chemical species within 5 Å of a ligand [11, 13] (Fig. S1). The MFS describes the local 3D environment around the cofactor, independently of the larger context of the protein fold in which it is embedded. The usefulness of the MFS concept outlined above has its physicochemical foundation in the fact that the local environment of the metal has a determinant role in tuning its properties and thus its chemical reactivity [14, 15].
Instead, the macromolecular matrix is instrumental in determining, e.g., substrate selection [16] or partner recognition [17].
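The MFS definition given above can be illustrated with a minimal sketch. The input format, the function name `build_mfs`, and the toy coordinates are assumptions made for illustration; they are not taken from the MetalPDB implementation. The sketch also assumes that a chemical species is included as a whole as soon as one of its atoms lies within the cutoff of any ligand atom.

```python
import math

CUTOFF = 5.0  # angstroms; the distance threshold quoted in the text


def build_mfs(atoms, metal_species, donor_atom_ids, cutoff=CUTOFF):
    """Return the set of chemical species (residue keys) forming an MFS.

    atoms          -- list of (atom_id, species_key, (x, y, z)) tuples (hypothetical format)
    metal_species  -- species keys of the metal ion(s) or cofactor
    donor_atom_ids -- ids of the donor atoms coordinating the metal
    """
    # A ligand is any chemical species that owns at least one donor atom.
    ligands = {species for atom_id, species, _ in atoms if atom_id in donor_atom_ids}
    ligand_coords = [xyz for _, species, xyz in atoms if species in ligands]

    site = set(metal_species) | ligands
    for _, species, xyz in atoms:
        if species in site:
            continue
        # Assumption: one atom within the cutoff pulls in the whole species.
        if any(math.dist(xyz, lig_xyz) <= cutoff for lig_xyz in ligand_coords):
            site.add(species)
    return site


# Toy coordinates, invented for illustration only.
toy_atoms = [
    (1, "ZN 1", (0.0, 0.0, 0.0)),
    (2, "HIS 10", (2.0, 0.0, 0.0)),   # donor atom
    (3, "HIS 10", (3.0, 0.0, 0.0)),
    (4, "ASP 50", (6.5, 0.0, 0.0)),   # 3.5 A from atom 3, so included
    (5, "LEU 90", (20.0, 0.0, 0.0)),  # far from every ligand atom
]
toy_site = build_mfs(toy_atoms, {"ZN 1"}, {2})
```

In this toy case the site contains the metal, its histidine ligand, and the nearby aspartate, whereas the distant leucine is left out.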

To make MFS analyses available to the scientific community, we developed two different resources: (1) MetalPDB [18], a database of all MFSs contained in the PDB, which is automatically updated, providing access to structural and functional information, including atomic coordinates, for each MFS in any metal-binding macromolecule of known 3D structure; (2) MetalS2 (Metal Sites Superposition) [12], a tool for the metal-centered superposition of MFS pairs, applicable to structures already in the PDB or to structural files belonging to the user. Here, we present MetalS3 (Metal Sites Similarity Search), a new tool that bridges the two aforementioned resources by allowing researchers to input the coordinates of one MFS and perform a systematic search of the entire MetalPDB database to identify structurally similar sites, regardless of overall fold similarity or protein homology. MetalS3 is based on the same conceptual approach as MetalS2, with some minor modifications. However, its implementation as a database search tool enables a completely different usage scenario, with a main focus on knowledge discovery through the unbiased exploration of the structural space of metal sites.

Methods

The MetalS3 algorithm

MetalS2 superimposes two MFSs through the following steps [12]: (1) computing and overlapping the geometric centers of the metal atoms contained in each MFS; (2) systematically computing a set of initial configurations (poses), in each of which the geometric centers of the metals and two different pairs of donor atoms from the two sites are used to superimpose the MFSs (Fig. S2); (3) ranking all the poses on the basis of a specifically designed scoring function; (4) optimizing a subgroup of the poses (by default, those in the best 40 % of the entire score range) by allowing the geometric centers and the ligands to be displaced with respect to one another. The MetalS2 score consists of three terms that account, respectively, for the biochemical similarity of the amino acids put in correspondence (sequence similarity term), the ratio between the total length of the sequence alignment and the length of the smaller site (i.e., the fractional coverage of the smaller site) (fractional coverage term), and the number and length of consecutive sequence segments in the superposition (fragmentation term). Amino acid correspondences are established on the basis of Cα–Cα and Cβ–Cβ distances. In step 4 of the procedure, the root mean square deviation (RMSD) of the coordinates in the superposition is optimized and amino acid correspondences are reevaluated. Note that atoms from exogenous (i.e., nonprotein, non-nucleic acid) ligands are included in the computation of neither the RMSD nor the score. The reason for this is that, especially in the context of MetalS3, we want to identify and quantify similarities among the macromolecular components of the MFSs. Exogenous ligands contribute to the definition of each MFS geometry as well as to the calculation of the set of initial poses, which is based purely on geometrical considerations. Thereafter, and especially for the purpose of scoring the solutions, such ligands are no longer taken into account.
This makes the final ranking dependent only on the similarities between the macromolecular structures, as desired, and avoids possible biases due to common arrangements of the ligands around the metal ion, e.g., as for chelators such as hydroxamic acid derivatives in zinc enzymes, which maintain a fixed geometry in most or all structures.
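Steps 1 and 2 of the procedure can be sketched as follows. This is only an illustration under stated assumptions: the function names are hypothetical, and the way donor pairs from the two sites are matched (unordered pairs from the first site against ordered pairs from the second) is our guess at one possible enumeration, not the documented MetalS2 scheme. The rotation that actually superimposes each pose is omitted.

```python
from itertools import combinations, permutations


def centroid(points):
    """Geometric center of a list of (x, y, z) tuples."""
    n = len(points)
    return tuple(sum(p[i] for p in points) / n for i in range(3))


def center_on_metals(coords, metal_coords):
    """Translate coords so that the geometric center of the metals is the origin."""
    c = centroid(metal_coords)
    return [tuple(x - ci for x, ci in zip(p, c)) for p in coords]


def pose_seeds(donors_a, donors_b):
    """Yield ((a1, a2), (b1, b2)) donor index pairs; a1 is matched to b1, a2 to b2.

    Assumption: each unordered donor pair of site A is tried against every
    ordered donor pair of site B.
    """
    for a1, a2 in combinations(range(len(donors_a)), 2):
        for b1, b2 in permutations(range(len(donors_b)), 2):
            yield (a1, a2), (b1, b2)
```

Once both sites are centered on their metals, each seed fixes three point correspondences (the metal center plus two donors), which is enough to define a rigid superposition.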

For the present work, we implemented a new Web interface, MetalS3, that allows a user to upload a metal-containing macromolecular structure (or select it from the MetalPDB database) in PDB format, select any MFS (automatically detected) contained in it, and systematically compare it against all MFSs in MetalPDB using the MetalS2 algorithm. A list of hits is returned by MetalS3, sorted by the corresponding score. We introduced some minor modifications to the MetalS2 procedure and scoring function described in the previous paragraph. In MetalS3, the fractional coverage term always refers to the input (query) MFS rather than to the smaller site of the pair being superposed. In addition, the optimization step is iterated as long as the superposition score keeps decreasing.
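The two modifications can be sketched as follows. The function names and the generic `optimize_step` and `score` callables are hypothetical placeholders, and the sketch assumes that a lower score indicates a better superposition, as suggested by the stopping criterion described above.

```python
def query_coverage(n_aligned, n_query_residues):
    """Fractional coverage computed with respect to the query MFS only."""
    return n_aligned / n_query_residues


def iterate_optimization(pose, optimize_step, score):
    """Repeat the optimization step for as long as the score keeps decreasing.

    optimize_step -- callable returning a refined pose (placeholder)
    score         -- callable returning the superposition score (lower assumed better)
    """
    best, best_score = pose, score(pose)
    while True:
        candidate = optimize_step(best)
        candidate_score = score(candidate)
        if candidate_score >= best_score:
            # No further improvement: keep the last pose that lowered the score.
            return best, best_score
        best, best_score = candidate, candidate_score
```

The loop terminates as soon as a step fails to lower the score, so it always returns the best pose encountered.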

To reduce the computational effort, we imposed some limitations on the difference in the number of donor atoms between the query MFS and any MFS from MetalPDB, which are recapitulated by the following formula:

$$a = \begin{cases} N/4, & \text{if } N/4 > 2 \\ 2, & \text{otherwise} \end{cases} \qquad b = \begin{cases} 4N, & \text{if } 4N < 20 \\ 20, & \text{otherwise} \end{cases} \tag{1}$$

where a and b are, respectively, the smallest and largest number of donor atoms that an MFS from the database can have for it to be included in the search set and N is the number of donor atoms in the query. In practice, any MFS in MetalPDB with a number of donor atoms outside the [a, b] range is excluded from the search. For example, a query MFS with four donor atoms will be compared only with MFSs from MetalPDB having between two and 16 donors. We believe that the application of the above-mentioned restriction does not reduce the usefulness of the results, as it seems reasonable to assume that any structural similarity between MFSs with a disparity in the number of donor atoms beyond the limits imposed by Eq. 1 does not have functional relevance.
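Equation 1 translates directly into a small filter. The function names below are ours, and the sketch assumes that a fractional value of N/4 is used as is, without rounding, because the text does not specify a rounding rule.

```python
def donor_atom_range(n_query):
    """Return (a, b) from Eq. 1 for a query MFS with n_query donor atoms."""
    a = max(n_query / 4, 2)   # lower bound, never below 2
    b = min(4 * n_query, 20)  # upper bound, never above 20
    return a, b


def in_search_set(n_query, n_candidate):
    """True if a database MFS with n_candidate donor atoms is compared to the query."""
    a, b = donor_atom_range(n_query)
    return a <= n_candidate <= b
```

For a query with four donor atoms, `donor_atom_range(4)` gives the range from 2 to 16 quoted in the text.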

Implementation of MetalS3

All back-end scripts are implemented in Python 2.6.6 (http://www.python.org/) on a Linux platform. The front end was implemented using Mako, a template library written in Python included by default with the Pylons Web application framework, JavaScript, and Cascading Style Sheets. By using the Python language, we could also exploit the following resources: SciPy 0.7.2, a library of scientific and numerical routines; NumPy 1.4.1, a language extension that adds support for large and fast, multidimensional arrays and matrices; and p3d [19], a Python module for structural bioinformatics. The MetalS3 server is currently hosted on a 24-CPU (AMD Opteron™ 6234) server.

The MetalS3 Web interface

The Web interface of MetalS3 allows the user to run queries against all representative MFSs of the equistructural MFS clusters defined in MetalPDB. Each of these MFSs represents a group of sites that are found in proteins with the same fold, as judged from sequence similarity and Pfam [20] domain assignments, and occur at the same spatial location within that fold. For example, a single representative MFS represents all the sites of rubredoxins from various organisms and with different metalation. MetalPDB currently contains 17,936 clusters of equistructural MFSs. As mentioned previously, the dataset of representative MFSs against which the query is actually compared is the subgroup of all 17,936 sites that satisfies Eq. 1. Thus, the size and the characteristics of the subgroup depend on the input query MFS, and particularly on the number of donor atoms it contains. In turn, this influences the overall calculation time.

After a calculation is finished, the user is presented with a list of hits having structural similarity to the query, ordered by the total MetalS3 score (the list can be re-sorted according to different parameters, such as individual score components). It is then possible to select a specific hit, i.e., a specific representative MFS, and run a refinement calculation in which the query is compared with each individual site in the corresponding equistructural MFS cluster. A link to the results of the search is e-mailed to the user at the end of each of these two stages.
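The two-stage workflow can be summarized in a short sketch. The dictionaries, the function names, and the `score` callable are hypothetical stand-ins for the MetalPDB data and the MetalS3 scoring function, and we assume that a lower score ranks higher.

```python
def coarse_search(query, representatives, score):
    """Stage 1: rank the cluster representatives against the query.

    representatives -- dict mapping cluster id to its representative MFS
    score           -- callable (query, site) -> number (lower assumed better)
    """
    return sorted((score(query, site), cluster_id)
                  for cluster_id, site in representatives.items())


def refine(query, cluster_members, score):
    """Stage 2: compare the query with every site of one user-selected cluster.

    cluster_members -- dict mapping site id to MFS for the chosen cluster
    """
    return sorted((score(query, site), site_id)
                  for site_id, site in cluster_members.items())
```

Keeping the two stages separate means that the expensive all-against-all comparison within a cluster is run only for the clusters the user actually selects.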

Results

A brief description of the input and output interfaces of MetalS3 is available as electronic supplementary material (text and Figs. S3 and S4). We conducted various experiments to assess our implementation of MetalS3 with respect to its capability to identify relevant hits within the MetalPDB database as well as with respect to the typical times required to obtain the results of a calculation.

Because MetalS3 searches are initially performed only against representative MFSs and not the entire content of the MetalPDB database, it is important to assess whether this approach consistently returns relevant functional information. To do this, we used an example dataset of 100 different MFSs randomly picked from deposited PDB structures (Table S1). These examples, which differed in metal content as well as coordination number and geometry, were used as input queries to MetalS3. Crucially, the examples were selected in order to avoid including any representative MFS as defined in MetalPDB. In this way, we could straightforwardly classify the output of MetalS3 depending on whether the best-scoring hit corresponded to the representative of the cluster to which each query MFS was known to belong. In fact, even though the clustering procedure implemented in MetalPDB does not directly compare the structure of the different MFSs assigned to a cluster, in the large majority of cases the MFSs within a cluster should be similar to each other because the proteins in the cluster can be assumed to be homologous. In 75 % of cases, this was indeed observed. Notably, if we optimize all the poses, instead of a well-scoring subgroup, the above-mentioned result increases only to 76 %. We then analyzed manually the 25 cases for which the best hit identified by MetalS3 was not the representative MFS of the cluster to which the query belongs in the MetalPDB database. For 20 of them we observed that the result obtained depended on the clustering within MetalPDB being incomplete, i.e., failing to group together MFSs that indeed are bound to homologous proteins. In turn, this is due to missing Pfam assignments or, often, to a given protein superfamily being mapped to multiple Pfam domains [21]. 
Instead, in five cases MetalS3 identified a structural similarity between a pair of MFSs (the query and the returned hit) that was higher than that between the query and the representative MFS of its equistructural cluster in MetalPDB. These are cases where either highly similar MFSs are embedded in different folds (three) or the MFS representative does not adequately represent the cluster (two). The representative MFS of a cluster is chosen solely on the basis of the resolution (i.e., quality) of the corresponding 3D structure [18]. Consequently, the representative MFS cannot be regarded as a sort of “average” MFS, and there is no specific property regarding its structural similarity to the other MFSs in the cluster. A third option is that the assignment of the query MFS to the MetalPDB cluster, which was performed automatically, did not reflect the large structural variability of the MFSs within the cluster. This was not observed here. An additional consideration is that, because of the way the score is constructed, smaller query MFSs tend to be less discriminative and therefore may more easily provide high-scoring hits also to MFSs not closely related (but still structurally similar).

If one looks at the five best scoring hits, then in only ten cases from the 100 examples run was the MFS representative of the cluster of the query site not included. As already mentioned, in two instances we observed that the specific representative MFS did not reflect the “consensus” coordination geometry of its cluster. However, in most cases, the reason for the observed behavior was an incomplete clustering of the structures, in turn typically resulting from problems in the mapping of Pfam domains. This caused structures highly similar to the query not to be included in the same equistructural cluster.

The calculation times are dependent on the number of donor atoms (N) in the query MFS, as the number of poses that need to be computed and compared scales with N(N − 1) [12] (Fig. 1). For a given number of atoms, calculations are faster the higher the number of donor atoms from exogenous ligands (such as small metal-binding molecules or ions) because these are not considered in amino acid matching and RMSD computations (see “Methods”). The calculation times are less than 2 h for sites with up to four protein donor atoms, whereas, owing to the quadratic increase of calculation times, they are as long as 10 h for sites with nine donor atoms (if all are from protein ligands) and within 24 h for multinuclear sites with 12 donor atoms from the protein moiety. Of all representative MFS sites collected in MetalPDB, 95.1 % have nine donor atoms or fewer. Under the assumption that MetalPDB adequately describes the diversity of MFSs occurring in nature, the data given above may suggest that users will most often submit queries that can be dealt with in 10 h or less. In any case, results are always sent to the users via e-mail, as even the simplest calculations require at least a few minutes.
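The quadratic growth follows from the N(N − 1) factor alone, as the following sketch shows. It only counts the pose factor for the query; it is not a timing model, and the function name is ours.

```python
def pose_factor(n_donors):
    """Number of ordered donor pairs, N(N - 1), for a query with n_donors donors."""
    return n_donors * (n_donors - 1)


# Pose factor for the donor-atom counts mentioned in the text.
factors = {n: pose_factor(n) for n in (4, 9, 12)}
```

Going from four to nine donor atoms multiplies this factor by six (from 12 to 72), which is of the same order as the increase in the calculation times reported above.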

Fig. 1

Calculation times for MetalS3 queries as a function of the number and type of donor atoms. Dashed lines are the best fit to a second-order polynomial

Discussion

MetalS3 is a Web interface that allows the user to systematically compare an MFS of interest (query) with the contents of the MetalPDB database [18], i.e., with an ensemble representing the diversity of known MFSs. This is achieved through a suitably modified implementation of the MetalS2 algorithm [12]. Typically, the hits returned for a query will comprise sites that are contained within a protein homologous to the protein containing the query MFS as well as sites from unrelated proteins. The presence in the output page of one or the other type of hit, as well as their relative abundance, will depend on the cutoffs defined to exclude hits from the visualization (Fig. S3). The cutoffs can be adjusted also after the calculation has finished, through the “Filter Results” button on the output page (Fig. S4). Increasing the cutoff values will result in a longer list of hits being displayed.

Our test calculations show that the top position in the list of the hits is highly likely to be occupied by an MFS contained within a homolog of the query protein; when the top five hits are considered, this is verified for as many as 90 % of the examples that we ran. According to the definition of the MetalPDB database, on which MetalS3 builds, this situation corresponds to the query and hit MFSs belonging to the same equistructural cluster. For MFSs in MetalPDB to be clustered, it is actually required that the sites occupy the same position within the fold after the entire protein structure has been superimposed, and the structures of the MFSs belonging to a given cluster are not compared with one another. The approach of MetalS3 is entirely different, as it operates only on the MFSs, disregarding the rest of the protein structure. The very good correlation between the fold-based clustering results and the MetalS3 output points to the high similarity of the local 3D structure around the metal site being a possible indicator of metalloprotein homology. This is supported also by the fact that in 20 of the examples, MetalS3 indicated that the clustering within MetalPDB was incomplete. Incomplete clustering typically results from the homology relationship between metalloproteins bearing structurally similar MFSs being hidden by the fact that the Pfam domain assignments we use in the definition of equistructural clusters are fine-grained and may occasionally separate a single superfamily into multiple domain definitions. To address this issue, the user can verify if the Pfam domains of interest belong to the same Pfam clan [21]. One can possibly further speculate that if the MFS properties must be defined tightly to make possible the correct protein function [i.e., to correctly define the reactivity of the metal ion(s) in the MFS], then conservation of the 3D structure of the MFS will be particularly strict among homologous proteins.
Consequently, the intracluster variability of the MFS structure may be informative on the requirements imposed by the catalysis on the MFS features or, in other words, on how the functional and mechanistic properties of the system are encoded in the structure.

A practical application of MetalS3 is to detect MFS structural similarities that are not associated with a homology relationship among the proteins harboring the MFSs (indicated by the MFS mapping to a shared Pfam domain or domain clan). These situations may be indicative of the occurrence of common functional properties that are endowed by the MFS itself. Such observations can provide useful hints for experimental work. In this usage scenario, the best hit returned by MetalS3 is often uninteresting (i.e., when it is bound to a protein with the same domain composition as the protein containing the query MFS), and one should focus on worse-scoring hits. Operatively, the domain composition of a hit MFS can be immediately obtained by looking up that MFS in the MetalPDB database [18]. Below, we briefly discuss some examples not included in the 100-example test dataset.

As a first example, we took one of the two equivalent Fe3S4 clusters in the PDB structure of fumarate reductase from Wolinella succinogenes (PDB ID 1QLB [22]), which is identified as site 1qlb_4 in MetalPDB (hereafter, we will use the PDB code in lowercase letters followed by an underscore and a number to indicate a specific MFS within the MetalPDB database, whereas we will use the PDB code in uppercase letters to indicate the PDB entry). This site is located within a ferredoxin-type domain, and it is likely to be part of the electron transfer pathway. MetalS3 returns as the fifth hit, with a total score of 1.98, a site harboring an Fe3S4 cluster in the D subunit of the structure of the DNA-directed RNA polymerase from Sulfolobus solfataricus P2 (PDB ID 2PA8 [23]). Despite a sequence identity between these two MFSs of only 13 % over 15 amino acids, the superposition is good (RMSD 0.799 Å) (Fig. 2).

Fig. 2

Output result page for a calculation performed using the 1qlb_4 site as the query. The inset shows the structural alignment to the fifth hit, 2pa8_1

The latter cluster, which is possibly an Fe4S4 cluster in vivo, is found in the corresponding subunits of the polymerases from various species of Archaea and Eukarya, but not of Bacteria [24]. The domain containing the MFS within subunit D is not present in all archaeal RNA polymerases, but is characteristic of a specific evolutionary lineage of Archaea. Here we observed that the binding mode of the Fe3S4 cluster within subunit D of S. solfataricus P2 polymerase actually bears some similarity to that of an unrelated epsilonproteobacterial system.

A second example is provided by the MFS containing the magnesium(II) ion identified as residue 9,018 (MetalPDB entry 1g0u_1) within the structure of the core particle of the yeast proteasome (PDB ID 1G0U [25]). This MFS is interfacial, as it contains protein ligands from subunits I and Y. MetalS3 returns hits also to sites containing metal ions other than magnesium. One of these is the MFS defined around the calcium(II) ion identified as residue 501 in the structure of human calcium and integrin binding protein 1 (PDB ID 1Y1A [26]), with a total score of 2.427 and, in particular, a sequence identity of 0 % (Fig. 3). This MFS is located within an EF-hand motif. Such a structural similarity would be extremely hard to identify by any other method, especially a sequence-based method. Magnesium(II) and calcium(II) are known to compete for binding in EF-hand sites [27]. The similarity between the two MFSs may thus underlie commonalities in the atomic mechanism by which the metal affinity is tuned.

Fig. 3

Output result page for a calculation performed using the 1g0u_1 site as the query. The inset shows the structural alignment to the seventh hit, 1y1a_1

3ZFJ is a recently solved NMR structure of a PhtD domain from Streptococcus pneumoniae that binds a single zinc(II) ion [28]. At the time of writing, it is not yet included in the MetalPDB database and therefore simulates well the situation of a real user. MetalS3 identifies the 2CS7 structure [29] as the second best hit. In fact, both proteins contain the Pfam domain “Strep_his_triad,” and have 23 % sequence identity. This is a case where the next update of MetalPDB would put the two in the same equistructural cluster. The above-mentioned proteins have a role in the uptake of zinc(II), by scavenging zinc(II) ions and then providing them to the extracellular membrane-anchored AdcAII transporter at the surface of S. pneumoniae. The first hit is an iron-binding MFS from Escherichia coli galactose 1-phosphate uridylyltransferase (structure 1GUP [30]). This iron ion plays a structural role and is not essential to the enzyme activity [31]. It is useful to compare the hits returned by using either 3ZFJ or 2CS7 as queries. Among the shared top-scoring zinc proteins, one finds an MFS from structure 4HHJ [32], identified by the zinc ion with residue number 1,001. This ion has been proposed to have a structural and/or regulatory role for the activity of this RNA-dependent RNA polymerase [33]. Another common hit is from PDB entry 2E26 [34], identified by the zinc ion with residue number 603, which describes the structure of mouse reelin, a secreted glycoprotein. This ion is observed in the structures of both reelin alone and reelin in complex with apolipoprotein E receptor 2 [35], where it has fractional occupancy. Finally, MetalS3 identifies the zinc-containing MFS of the ZinT protein (PDB ID 1TXL; S. Eswaramoorthy and S. Swaminathan, unpublished) as a further hit to the MFS in 2CS7; the MFSs of 3ZFJ and 1TXL also display good structural similarity (Fig. 4). 
ZinT is a periplasmic zinc transporter that facilitates metal recruitment during zinc shortage by binding zinc(II) with high affinity and subsequently transferring it to the ZnuA component of the ZnuABC membrane transporter [36, 37]. Intriguingly, in the zinc(II)-specific ABC uptake system AdcABC of S. pneumoniae, the AdcA protein, which does not interact with PhtD domains (see above), is a fusion between a ZnuA-like protein and a ZinT-like protein [38]. In summary, the present MetalS3 analysis identified a minimal zinc-binding structure as being associated with reversible metal ion binding in zinc(II) transport, where different protein systems for zinc(II) uptake contain structurally similar MFSs, and in (hypothesized) zinc(II)-dependent regulation of intermolecular interactions.

Fig. 4

Selected high-scoring zinc sites among the search results for a zinc-containing minimal functional site (MFS) from 3ZFJ. The 3ZFJ query structure is always in blue and in the same orientation. The superpositions to the sites a 2cs7_1, b 4hhj_1, c 2e26_5, and d 1txl_1 are displayed. Only protein ligands are shown; ZN(603) in 2E26 is additionally coordinated by two water molecules; ZN(216) in 1TXL is additionally coordinated by a water molecule

An additional example is provided by the 4NAO structure, a homodimer that contains a single iron(II) ion per subunit [39], which was released in the PDB on January 15, 2014, and is not yet included in MetalPDB. This enzyme is an iron(II)/2-ketoglutarate-dependent dioxygenase that hydroxylates an N-(d-lysergyl-aminoacyl) lactam in the ergot fungus Claviceps purpurea. MetalS3 identifies similarities to various other dioxygenases that are active against different substrates. In particular, the best hit is the iron(II) site of the 2CSG structure, an uncharacterized protein addressed by the Midwest Center for Structural Genomics, with 17 % sequence identity between the sites. Both structures feature organic ligands (2-ketoglutarate for 4NAO; succinate, which is a reaction product, and isocitrate for 2CSG) bound to the metal ion in corresponding positions (Fig. 5a). The second hit is an isopenicillin N synthase from Emericella nidulans (PDB ID 1ODM) [40]. This site has lower RMSD and higher sequence similarity to the query, and also features an organic ligand chelating the iron(II) ion in a manner relatively similar to that of 2-ketoglutarate of 4NAO (Fig. 5b). Notably, isopenicillin N synthase is not dependent on 2-ketoglutarate, whose functional role is performed by the tripeptide substrate [41]. The third hit contains a group of dioxygenases more closely related to 4NAO, which includes human phytanoyl-CoA dioxygenase (PhyH; PDB ID 2A1X). The article describing 4NAO provides a detailed comparison with PhyH and its homolog PhyHD1, which are actually the best results returned by a Dali [42] search based on the entire structure [39]. The 2-ketoglutarate molecules present in the 4NAO and PhyH structures chelate the metal ion in a closely similar manner (Fig. 5c). Finally, the fourth hit is a manganese(II) site in the 2-ketoglutarate-dependent dioxygenase AlkB (PDB ID 4JHT) [43] (Fig. 5d).
AlkB is an iron(II)/2-ketoglutarate-dependent dioxygenase that catalyzes the oxidative demethylation of nucleic acids and histones [44]. It can bind manganese(II) in its catalytic site, yielding an inactive enzyme. Indeed, the aforementioned 4jht_1 site is the representative of a relatively large equistructural cluster in MetalPDB that contains the other structurally characterized AlkB MFSs. The cluster also contains, for example, the 3O1T structure [45], where the iron(II) ion is chelated by succinate, again in a position close to that of 2-ketoglutarate in 4NAO. The systems described in this paragraph map to three different Pfam domains that belong to the same superfamily: DUF1479 (2CSG), 2OG-FeII_Oxy (1ODM, 4JHT), and PhyH (4NAO, 2A1X). The results include also a case of a system where the physiological iron(II) ion was substituted in vitro. Thus, even for a large and widely studied protein superfamily such as that of iron(II)/2-ketoglutarate-dependent dioxygenases, MetalS3 proves useful in the analysis of a newly solved structure to identify relationships across different subgroups in a manner that is independent of overall fold similarity.

Fig. 5

The four top-scoring sites among the search results for the iron(II)-containing MFS in the A chain of 4NAO. The 4NAO query structure is always in blue and in the same orientation. The superpositions to the sites a 2csg_1, b 1odm_1, c 2a1x_1, and d 4jht_1 are displayed. The organic iron(II) ligands present in the various MFSs are shown as sticks. Water molecules are not shown

Concluding remarks

MFSs in metal-binding biological macromolecules constitute a novel viewpoint for the elucidation of the mechanisms of function in these systems [11]. In this frame, we have developed the MetalPDB database [18]. MetalPDB contains a systematic analysis of all known MFSs. In particular, within the database all MFSs were grouped into so-called equistructural clusters. Each cluster contains all MFSs located at corresponding positions within the fold of homologous proteins. Recently, we developed the MetalS2 program and Web server to perform pairwise structural superpositions of MFSs, providing a ground for the quantitative evaluation of MFS similarity [12]. MetalS3, which is described in this work, is a Web-based tool (http://metalweb.cerm.unifi.it/tools/metals3/) that adopts the MetalS2 algorithm to perform searches in the MetalPDB database. This is implemented as a first coarse-grained search against the ensemble of the MFSs representing MetalPDB equistructural clusters, followed by a refinement step in which the query MFS is compared with all the MFSs in a user-selected cluster. Although algorithmically very similar, MetalS2 and MetalS3 have somewhat different usage scenarios and make possible access to distinct information. MetalS2 requires the user to have prior knowledge of the structures to be compared, either a pair or a group of related metalloproteins. In contrast, MetalS3 constitutes an unbiased approach to seeking structural similarities between metal sites, independently of the user’s prior knowledge. The hits returned by MetalS3 can be a combination of relatively obvious ones (e.g., homologs of the query metalloprotein) and unexpected ones. The latter can be identified only through the present approach, whereas MetalS2 is a tool to quantify structural similarities within groups of sites already familiar to the user.

The MetalS3 approach may help researchers in the field of bioinorganic chemistry to assess the relationships or evaluate possible evolutionary links between different groups of metalloproteins and may help guide experimentalists’ work in understanding the function of uncharacterized metalloproteins. Overall, this contributes to achieving a better comprehension of the role of metal ions in living systems.