1 Introduction

Owing to the indiscriminate use of antibiotics, pathogenic bacteria have developed marked resistance to antibiotics and xenobiotics and have spread widely all over the world. Efflux pump systems are one of the major causes of bacterial drug resistance, sustaining bacterial survival [1]. Efflux pumps are transporter proteins localized in the cytoplasmic membrane; they are active transporters that require a source of chemical energy to perform their function [2]. These efflux pumps extrude a wide variety of structurally unrelated compounds and belong to five families: small multidrug resistance (SMR), resistance-nodulation-division (RND), ATP binding cassette (ABC), major facilitator superfamily (MFS) and multidrug and toxic compound extrusion (MATE).

MATE transporters facilitate the active efflux of a wide range of chemically and structurally diverse substances, including antimicrobials and chemotherapeutics, contributing to multidrug resistance in pathogenic bacteria and malignancies [3]. MATE proteins are present in all living kingdoms [4] and are secondary active transport systems with a 12-transmembrane-helix topology [5], driven by either Na+ or H+ [6, 7]. MATE efflux family proteins range from ~ 400 to ~ 700 amino acids (aa) in length [8], and their extrusion of toxic compounds is important for maintaining cell equilibrium [9]. The helices in the MATE protein structure are arranged in two lobes, which form the cavities in which substrates bind [10]. Multidrug efflux pumps work at the frontline to shield bacteria against antimicrobials by lowering intracellular drug concentrations [11]. Their transporter function, mostly energized by Na+, confers multidrug resistance on bacterial pathogens and cancer cells [9]. Plants protect themselves against microbial pathogens by producing secondary metabolites through MATE transporters [4], whereas in the human liver and kidney MATE transporters extrude xenobiotic cations [11]. MATE family proteins follow the same mechanism as MFS transporters, i.e. the rocker-switch mechanism [9]. MATE efflux proteins function as antiporters, with the transporter alternating between outward-facing and inward-facing conformations. A chemical substance (drug) attaches to the protein in the inward-facing conformation, causing a switch to the opposite conformation and the extrusion of the drug [12]. MATE has a substrate binding pocket at either the N-lobe or the C-lobe, and the substrate is extruded by bending of the surrounding helix [13].

Protein characterization involves the description of a protein molecule's biological, chemical and physical properties. Computational characterization is the first necessary step in working out the biological role of a protein [14] and is therefore important for determining the protein's current state. It also plays an important role in the study of protein characteristics such as domains, oligomeric state, post-translational modifications, protein–protein interactions and protein–ligand interactions [15]. Early-stage stability comparison of proteins should be based on computational characterization followed by biophysical characterization [16]. A protein may also be visualized as a network of interacting residues, and analysis of this network provides additional information about the structural and functional roles of the residues [17]. A protein interaction network map depicts which proteins physically interact with one another [18]. Interface residue prediction can improve the understanding of the molecular mechanisms of the related processes and functions [19]. Therefore, in the present study, an algorithm for residue interaction network (RIN) generation has been designed and a comprehensive structural analysis has been used to reveal the important aa in MATE efflux transporter proteins.

2 Methodology

2.1 Dataset Retrieval and MATE Efflux Proteins Characterization

The dataset of MATE proteins of P. aeruginosa and S. aureus was retrieved from the NCBI database [20]. These sequences were then sorted and filtered to obtain non-redundant sequences. The unique sequences were extracted by removing partial and uncultured sequences from the dataset, retaining sequences of 400–700 aa in length, and grouping and sorting the most similar sequences using the multiple alignment tool Multalin [21]. Multalin is a tool for multiple sequence alignment of either proteins or nucleic acids and is based on a dynamic programming approach.
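The filtering criteria described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function names `parse_fasta` and `filter_records` are hypothetical, partial and uncultured entries are assumed to be recognizable from the words "partial" and "uncultured" in the FASTA header, and exact-duplicate removal stands in for the similarity-based grouping that was done with Multalin.

```python
def parse_fasta(text):
    """Parse FASTA-formatted text into a list of (header, sequence) tuples."""
    records, header, chunks = [], None, []
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            if header is not None:
                records.append((header, "".join(chunks)))
            header, chunks = line[1:], []
        else:
            chunks.append(line)
    if header is not None:
        records.append((header, "".join(chunks)))
    return records


def filter_records(records, min_len=400, max_len=700,
                   excluded_terms=("partial", "uncultured")):
    """Keep unique sequences inside the length window (assumed criteria)."""
    seen, kept = set(), []
    for header, seq in records:
        # Assumption: partial/uncultured entries are flagged in the header.
        if any(term in header.lower() for term in excluded_terms):
            continue
        # Length window of 400-700 aa, as stated in the text.
        if not (min_len <= len(seq) <= max_len):
            continue
        # Assumption: exact duplicates only; similarity grouping is not modeled.
        if seq in seen:
            continue
        seen.add(seq)
        kept.append((header, seq))
    return kept
```

Calling `filter_records(parse_fasta(text))` on a downloaded FASTA file would return the retained records in their original order.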

2.2 Physicochemical Parameters Characterization

The physicochemical parameters of MATE efflux proteins in P. aeruginosa and S. aureus were analyzed on the basis of the number and composition of aa, molecular weight, instability index (II), theoretical pI, grand average of hydropathicity (GRAVY) and aliphatic index (AI) using the ExPASy–ProtParam tool (http://web.expasy.org/protparam).
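Two of these parameters are simple enough to illustrate directly. The sketch below uses the commonly cited definitions, which are assumed here rather than taken from the text: GRAVY as the mean Kyte-Doolittle hydropathy value over all residues, and the aliphatic index as X(Ala) + 2.9 X(Val) + 3.9 (X(Ile) + X(Leu)), where X is the mole percent of each residue. The function names are hypothetical, and the sketch is not a substitute for ProtParam itself.

```python
# Kyte-Doolittle hydropathy values for the 20 standard amino acids.
KYTE_DOOLITTLE = {
    "A": 1.8, "R": -4.5, "N": -3.5, "D": -3.5, "C": 2.5,
    "Q": -3.5, "E": -3.5, "G": -0.4, "H": -3.2, "I": 4.5,
    "L": 3.8, "K": -3.9, "M": 1.9, "F": 2.8, "P": -1.6,
    "S": -0.8, "T": -0.7, "W": -0.9, "Y": -1.3, "V": 4.2,
}


def gravy(seq):
    """Grand average of hydropathicity: mean hydropathy value per residue."""
    return sum(KYTE_DOOLITTLE[aa] for aa in seq) / len(seq)


def aliphatic_index(seq):
    """Aliphatic index from the mole percent of Ala, Val, Ile and Leu."""
    n = len(seq)

    def mole_percent(aa):
        return 100.0 * seq.count(aa) / n

    return (mole_percent("A")
            + 2.9 * mole_percent("V")
            + 3.9 * (mole_percent("I") + mole_percent("L")))
```

A positive `gravy` value indicates an overall hydrophobic sequence, which is the interpretation used later in the results.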

2.3 Primary Structure Analysis

The primary structure analysis of the polypeptide sequence for each MATE protein sequence was performed using the ExPASy–ProtParam tool [22]. The numbers of hydrophobic and hydrophilic aa and their ratio were calculated for each MATE protein sequence using Eqs. 1, 2 and 3, where nhydrophob, nhydrophil and Ratiohydrophob_hydrophil correspond to the number of hydrophobic aa, the number of hydrophilic aa and their ratio, respectively. The average number of hydrophilic and hydrophobic aa in each protein sequence was also calculated.

$${\text{n}}_{{{\text{hydrophob}}}} = \mathop \sum \limits_{i = 1}^{{\text{n}}} ({\text{n}}_{{\text{G}}} + {\text{ n}}_{{\text{A}}} + {\text{ n}}_{{\text{V}}} + {\text{ n}}_{{\text{L}}} + {\text{ n}}_{{\text{I}}} + {\text{ n}}_{{\text{M}}} + {\text{ n}}_{{\text{P}}} + {\text{ n}}_{{\text{F}}} )$$
(1)
$${\text{n}}_{{{\text{hydrophil}}}} = \mathop \sum \limits_{i = 1}^{{\text{n}}} ({\text{n}}_{{\text{Q}}} + {\text{ n}}_{{\text{N}}} + {\text{ n}}_{{\text{S}}} + {\text{ n}}_{{\text{H}}} + {\text{ n}}_{{\text{T}}} + {\text{ n}}_{{\text{Y}}} + {\text{ n}}_{{\text{C}}} + {\text{ n}}_{{\text{W}}} )$$
(2)

*n is the total number of aa in a protein sequence, nG is the number of glycine aa, nA is the number of alanine aa, nV is the number of valine aa, nL is the number of leucine aa, nI is the number of isoleucine aa, nM is the number of methionine aa, nP is the number of proline aa, nF is the number of phenylalanine aa, nQ is the number of glutamine aa, nN is the number of asparagine aa, nS is the number of serine aa, nH is the number of histidine aa, nT is the number of threonine aa, nY is the number of tyrosine aa, nC is the number of cysteine aa, nW is the number of tryptophan aa.

$${\text{Ratio}}_{{{\text{hydrophob}}\_{\text{hydrophil}}}} = {\text{n}}_{{{\text{hydrophob}}}} /{\text{n}}_{{{\text{hydrophil}}}}$$
(3)
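Eqs. 1 to 3 amount to counting residues in two fixed sets. A minimal sketch follows, assuming one-letter amino acid codes; the function name `hydrophobicity_counts` is hypothetical. Residues that appear in neither equation (for example Lys, Arg, Asp and Glu) are simply not counted, as in the equations.

```python
# Residue sets taken from Eq. 1 (hydrophobic) and Eq. 2 (hydrophilic).
HYDROPHOBIC = set("GAVLIMPF")
HYDROPHILIC = set("QNSHTYCW")


def hydrophobicity_counts(seq):
    """Return (n_hydrophob, n_hydrophil, ratio) for a one-letter sequence."""
    n_hydrophob = sum(1 for aa in seq if aa in HYDROPHOBIC)
    n_hydrophil = sum(1 for aa in seq if aa in HYDROPHILIC)
    # Guard against division by zero when no hydrophilic residue is present.
    ratio = n_hydrophob / n_hydrophil if n_hydrophil else float("inf")
    return n_hydrophob, n_hydrophil, ratio
```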

Furthermore, the helix topologies and the number of transmembrane helices for the MATE protein sequences were identified using the web-based tools TMHMM [23] and HMMTOP [24, 25]. TMHMM predicts the number of transmembrane helices based on a hidden Markov model. HMMTOP is a web-based server for predicting the topology of transmembrane proteins and localizing helical transmembrane segments.

2.4 Secondary Structure Analysis

The secondary structure features of the characterized MATE efflux proteins in P. aeruginosa and S. aureus include the numbers of α-helices, β-turns, extended strands, β-sheets and coils. These features were analyzed with the ExPASy SIB Bioinformatics SOPMA tool (https://npsaprabi.ibcp.fr/cgi-bin/npsa_automat.pl?page=npsa_sopma.html).

2.5 Tertiary Structure Analysis

The tertiary structures for the MATE protein sequences were predicted using Modeller 9.21 [26], which provides a homology modeling-based approach for three-dimensional protein structure prediction. A protein identity threshold of > 25% was applied for each protein. The resulting PDB files for the tertiary structures were energy minimized with Swiss-Pdb viewer [27] to obtain each protein model in an energetically favorable state. The modeled tertiary structures were verified, and Ramachandran plots were constructed, using the PROCHECK tool [28], which helped to visualize the energetically allowed and favorable regions for the phi and psi backbone dihedral angles.

2.6 Molecular Dynamic Simulations and Structure Optimization

Molecular dynamic simulations were performed using GROMACS 5.0 (GROningen MAchine for Chemical Simulations; http://www.gromacs.org/) to test the stability of the best model. The predicted tertiary protein structures were optimized with the GROMACS 5.0 package using the GROMOS96 53a6 force field for 1000 ns. The protein tertiary structures were subjected to a steepest descent energy minimization for 50,000 steps. The topology parameters for the proteins were created using the GROMACS program.

2.7 RIN Generation

A residue interaction network was generated for each of these energy-minimized proteins, and Cytoscape [29] was used to create the network. The C-α atoms of the aa residues were considered as nodes and the distances between them were considered as edges. The distances for all aa residues were calculated computationally. RIN analysis provides network centrality parameters such as degree centrality, closeness and betweenness, which indicate that the network characteristics of residues in quaternary interactions are differentiable from those of other residues. Centrality quantifies the topological importance of a node or edge in a network. The distance threshold for interacting residues was taken as 4.0 to 8.5 Å between the C-α atoms of the aa residues.

Algorithm Design for RIN

The protein tertiary structure data comprise the three-dimensional coordinates of each atom in the protein structure. Let the coordinates in the protein structure data be xi, yi, zi, and let A and B be the first and second C-α atoms.

Choose:

xi, yi, zi ∈ C-α atoms, for each set of structural coordinates.

The input function includes the three-dimensional coordinates (xi, yi, zi) belonging to either A or B such that:

$$\left( {{\text{x}}_{{1}} ,{\text{x}}_{{2}} } \right), \, \left( {{\text{y}}_{{1}} ,{\text{y}}_{{2}} } \right), \, \left( {{\text{z}}_{{1}} ,{\text{z}}_{{2}} } \right)$$

where x1, y1, z1 ∈ A and x2, y2, z2 ∈ B.

The distance between any two C-α atoms may provide information about bond formation between them at an optimal distance. The distances were calculated by the Euclidean distance approach, and a threshold for the optimal distance identifies the bond-forming C-α atoms.

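Distance between two C-α atoms:

$${\text{d}} = \sqrt {\left( {{\text{x}}_{{1}} - {\text{x}}_{{2}} } \right)^{2} + \left( {{\text{y}}_{{1}} - {\text{y}}_{{2}} } \right)^{2} + \left( {{\text{z}}_{{1}} - {\text{z}}_{{2}} } \right)^{2} }$$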

In this equation, x1, y1, z1 are the structural coordinates of the first C-α atom, A, and x2, y2, z2 are the structural coordinates of the second C-α atom, B.

This algorithm generates the RIN by extracting the C-α atom coordinates from the three-dimensional structure coordinates and calculating Euclidean distances. The calculated distances were then used for RIN generation and for the analysis of various centrality parameters.
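The steps described above (C-α extraction, pairwise Euclidean distances, distance threshold) can be sketched as follows. This is a minimal illustration, not the authors' implementation: the function names `parse_ca_atoms` and `build_rin` are hypothetical, the input is assumed to follow the fixed-column PDB ATOM record layout, and the 4.0 to 8.5 Å window is taken from the text.

```python
import math


def parse_ca_atoms(pdb_text):
    """Return [(label, (x, y, z)), ...] for every C-alpha ATOM record.

    Assumes standard fixed-column PDB formatting: atom name in columns
    13-16, residue name in 18-20, residue number in 23-26, x/y/z in 31-54.
    """
    atoms = []
    for line in pdb_text.splitlines():
        if line.startswith("ATOM") and line[12:16].strip() == "CA":
            label = f"{line[17:20].strip()}{int(line[22:26])}"
            xyz = (float(line[30:38]), float(line[38:46]), float(line[46:54]))
            atoms.append((label, xyz))
    return atoms


def build_rin(atoms, d_min=4.0, d_max=8.5):
    """Return edges (label_a, label_b, distance) within the distance window."""
    edges = []
    for i in range(len(atoms)):
        for j in range(i + 1, len(atoms)):
            # Euclidean distance between the two C-alpha coordinates.
            d = math.dist(atoms[i][1], atoms[j][1])
            if d_min <= d <= d_max:
                edges.append((atoms[i][0], atoms[j][0], d))
    return edges
```

The resulting edge list can be written to a delimited text file and imported into a network viewer such as Cytoscape. The all-pairs loop is quadratic in the number of residues, which is acceptable for proteins of a few hundred residues.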

Degree centrality, Cd(V)

In a network graph, the degree centrality of a node is the total number of direct links between that node and other nodes [30]. Nodes with a large number of neighbors (i.e. edges) have high degree centrality.

$${\text{C}}_{{\text{d}}} \left( {\text{V}} \right) \, = {\text{ deg}}\left( {\text{V}} \right)$$
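As an illustration of this definition, degree centrality can be computed by counting how many edges touch each node. The sketch below assumes an undirected edge list of node-label pairs with no duplicate edges; the function name `degree_centrality` is hypothetical.

```python
from collections import defaultdict


def degree_centrality(edges):
    """Return {node: degree} for an undirected edge list of (u, v) pairs."""
    degree = defaultdict(int)
    for u, v in edges:
        # Each undirected edge contributes one to both of its endpoints.
        degree[u] += 1
        degree[v] += 1
    return dict(degree)
```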

Closeness centrality, CC(V)

Closeness centrality is based on the cumulative shortest-path distance from a node to all other nodes [31]. It identifies the vertices that have the shortest paths to all others. A greater cumulative distance signifies lower closeness.

$${\text{C}}_{{\text{C}}} \left( {\text{V}} \right) \, = { 1}/\sum {\text{dist}}\left( {{\text{u}},{\text{ v}}} \right);{\text{ u}} \in {\text{ v}}$$
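A minimal sketch of this reciprocal-of-summed-distances form is given below. It assumes an unweighted, connected graph stored as a hypothetical adjacency dictionary (node mapped to a set of neighbors) and measures distance as the number of edges on the shortest path; network tools may apply a different normalization.

```python
from collections import deque


def shortest_path_lengths(adjacency, source):
    """Breadth-first search: hop counts from source to every reachable node."""
    dist = {source: 0}
    queue = deque([source])
    while queue:
        node = queue.popleft()
        for neighbor in adjacency[node]:
            if neighbor not in dist:
                dist[neighbor] = dist[node] + 1
                queue.append(neighbor)
    return dist


def closeness_centrality(adjacency):
    """Return {node: 1 / sum of shortest-path lengths to all other nodes}."""
    result = {}
    for node in adjacency:
        total = sum(shortest_path_lengths(adjacency, node).values())
        # An isolated node has no paths; report zero closeness for it.
        result[node] = 1.0 / total if total > 0 else 0.0
    return result
```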

Eccentricity centrality, Ce(V)

The eccentricity of a node is the length of the longest shortest path beginning at that node, i.e. the maximum shortest-path length from the node to any other node in the graph [32]. The reciprocal of a node's eccentricity is defined as its eccentricity centrality, Ce(V).

$${\text{E}}_{{{\text{cc}}}} \left( {\text{V}} \right) \, = {\text{ max dist}}\left( {{\text{u}},{\text{ v}}} \right) \, ;{\text{ u}} \in {\text{ v}}$$

The lower value of Ecc corresponds to higher Ce.

$${\text{C}}_{{\text{e}}} \left( {\text{V}} \right) \, = { 1}/{\text{E}}_{{{\text{cc}}}} \left( {\text{V}} \right)$$
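The two equations above can be illustrated with a short breadth-first search sketch. As before, the graph is assumed to be unweighted and connected and is given as a hypothetical adjacency dictionary; the function names are illustrative only.

```python
from collections import deque


def _hop_distances(adjacency, source):
    """Breadth-first search: hop counts from source to every reachable node."""
    dist = {source: 0}
    queue = deque([source])
    while queue:
        node = queue.popleft()
        for neighbor in adjacency[node]:
            if neighbor not in dist:
                dist[neighbor] = dist[node] + 1
                queue.append(neighbor)
    return dist


def eccentricity_centrality(adjacency):
    """Return {node: 1 / eccentricity}, where eccentricity is the longest
    shortest-path length starting at the node."""
    result = {}
    for node in adjacency:
        ecc = max(_hop_distances(adjacency, node).values())
        # A node with no neighbors has eccentricity 0; report 0.0 for it.
        result[node] = 1.0 / ecc if ecc > 0 else 0.0
    return result
```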

Betweenness centrality, Cb(V)

Betweenness centrality quantifies a node's position as a mediator in a network. If a node lies on the only path that other nodes must travel through, then this node is likely to be essential and has a high betweenness centrality [33]. It measures how often a vertex or edge is present in the set of all shortest paths.

$${\text{C}}_{{\text{b}}} \left( {\text{V}} \right) \, = \, \sum \left( {\sigma {\text{st}}\left( {\text{v}} \right)/\sigma {\text{st}}} \right) \, ;{\text{ s}} \ne {\text{t}},{\text{ s}} \ne {\text{v}},{\text{ v}} \ne {\text{t}}$$

σst represents the number of shortest paths from s to t, which may or may not pass through the node v, and σst(v) represents the number of shortest paths from s to t that pass through v.
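One standard way to evaluate this sum is Brandes' algorithm, which is named here as an assumed technique and is not stated in the text. The sketch below handles an unweighted, undirected graph given as a hypothetical adjacency dictionary and counts each unordered pair (s, t) once; other tools may count ordered pairs or normalize the result.

```python
from collections import deque


def betweenness_centrality(adjacency):
    """Brandes' algorithm for an unweighted, undirected graph."""
    centrality = {v: 0.0 for v in adjacency}
    for s in adjacency:
        # Breadth-first search from s, counting shortest paths (sigma).
        stack = []
        predecessors = {v: [] for v in adjacency}
        sigma = {v: 0 for v in adjacency}
        dist = {v: -1 for v in adjacency}
        sigma[s], dist[s] = 1, 0
        queue = deque([s])
        while queue:
            v = queue.popleft()
            stack.append(v)
            for w in adjacency[v]:
                if dist[w] < 0:
                    dist[w] = dist[v] + 1
                    queue.append(w)
                if dist[w] == dist[v] + 1:
                    sigma[w] += sigma[v]
                    predecessors[w].append(v)
        # Back-propagate pair dependencies in order of decreasing distance.
        delta = {v: 0.0 for v in adjacency}
        while stack:
            w = stack.pop()
            for v in predecessors[w]:
                delta[v] += sigma[v] / sigma[w] * (1.0 + delta[w])
            if w != s:
                centrality[w] += delta[w]
    # Every unordered pair was visited from both ends, so halve the totals.
    return {v: c / 2.0 for v, c in centrality.items()}
```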

3 Results and Discussion

3.1 Characterization and Sequence Analysis of MATE Proteins

Based on global features such as amino acid residue length and the number of transmembrane helices, a total of 16 MATE protein sequences were selected after sorting and filtering the data for P. aeruginosa and S. aureus.

3.2 Physicochemical Characterization

The physicochemical characterization provided an overview of the various characteristics and behavior of the MATE proteins of P. aeruginosa and S. aureus (Table 1). MATE proteins consist of 400 to 700 amino acids, exhibit 40% amino acid similarity in their protein sequences [34] and have a molecular mass of ~ 54 kDa [35]. The selected MATE protein sequences comprised 440 to 550 amino acid residues, had a molecular weight of ~ 50,000 Da and were chiefly hydrophobic in nature. The instability index values were below 40, which indicates that these proteins are stable, since proteins with values above 40 may be unstable. The high aliphatic index values corresponded to a higher thermal stability of the MATE proteins. The isoelectric point of the MATE proteins was in the range of 8 to 10. Moreover, the positive GRAVY values confirmed the hydrophobic character of most amino acids in the MATE protein sequences.

Table 1 Physicochemical features of MATE protein sequences of P. aeruginosa and S. aureus

3.3 Primary Structure Analysis

The primary structure analysis of the MATE proteins revealed Ala, Gly and Ile to be the most frequently occurring amino acid residues, with their composition varying between protein sequences (Fig. 1). In contrast, Lys, Cys and Trp were the least frequent residues in the MATE protein sequences. The analysis of hydrophobic and hydrophilic amino acids showed ~ 60–70% of the amino acids in each MATE protein sequence to be hydrophobic (Fig. 2). MATE transporters typically contain 12 transmembrane helices [36]. The number of predicted transmembrane helices ranged from 10 to 12, which confirmed the topological similarity of these protein sequences to existing MATE protein sequences. The presence of transmembrane helices and the hydrophobic nature of these proteins confirmed that MATE proteins occur in the transmembrane region of the cell.

Fig. 1 Major contributing amino acids in MATE protein sequences

Fig. 2 Hydrophobic and hydrophilic composition in MATE protein sequences

3.4 Secondary Structure Analysis

The analysis using the ExPASy tool showed the occurrence of alpha helices, beta turns, extended strands and random coils in the secondary structure of these proteins. The high number of alpha helices indicated thermal stability in the MATE proteins of both P. aeruginosa and S. aureus (Fig. 3).

Fig. 3 Secondary structure composition of MATE protein sequences

3.5 Tertiary Structure Analysis

The predicted structures of the bacterial homologs of MATE proteins resembled the existing structures of MATE transporter proteins. The 12 transmembrane helices are arranged symmetrically, with 6 TMHs forming each of two bundles, named the N- and C-bundles [37]. The modeled proteins showed two lobes, i.e. the N-lobe and the C-lobe, in their topology, with pseudo-twofold symmetry. The PDB files of the energy-minimized tertiary structures were analyzed for their structural coordinates. The homology-modeled, energy-minimized structures were validated with the Ramachandran plot. The Ramachandran plot analysis for P. aeruginosa (> WP_078465331.1) showed 80.4% of residues in most favored regions, 15.2% in additional allowed regions, 3.4% in generously allowed regions and 1% in disallowed regions. The analysis for S. aureus (> WP_006191797.1) showed 88.7%, 9.3% and 1.3% of residues in most favored, additional allowed and generously allowed regions, respectively, and 0.8% of residues in disallowed regions (Fig. 4).

Fig. 4 Homology modeled structures and Ramachandran plots for MATE protein sequences: a P. aeruginosa (> WP_078465331.1) and b S. aureus (> WP_006191797.1)

3.6 Molecular Dynamic Simulations and Optimized Structure

The tertiary structures for the P. aeruginosa (> WP_078465331.1) and S. aureus (> WP_006191797.1) MATE protein sequences were subjected to molecular dynamic simulations in GROMACS 5.0 with the 53a6 force field. The optimized structures, shown in the membrane in Figs. 5 and 6, resembled existing structures of their respective protein families. In these figures, part (a) shows the side view of the protein in the membrane, which illustrates its topological resemblance to its protein family, and part (b) shows the top view, which illustrates the cavity formed for substrate extrusion.

Fig. 5 Energy minimized and membrane embedded structure of P. aeruginosa (> WP_078465331.1) MATE protein; Template PDB ID: 3MKT (Cation-bound Multidrug and Toxin Compound Extrusion (MATE) transporter of E. coli)

Fig. 6 Energy minimized structure of S. aureus (> WP_006191797.1) MATE protein; Template PDB ID: 7SP5 (Eukaryotic phosphate transporter protein)

The uniqueness of the homology-modeled structures was validated using the RCSB-PDB pairwise structure alignment tool (https://www.rcsb.org/alignment). Structure alignment is the technique of comparing the molecular structures of two or more biomolecules to determine three-dimensional equivalences. The homology-modeled structures of P. aeruginosa (> WP_078465331.1) and S. aureus (> WP_006191797.1) were compared to the crystal structure of the E. coli cation-bound Multidrug and Toxin Compound Extrusion (MATE) transporter (PDB id: 3MKT) using rigid body alignment, and the superimposed structures (Fig. 7) were viewed using BIOVIA Discovery Studio 4.0 [38]. In a rigid body alignment, the relative orientations and locations of the atoms inside each structure stay fixed during the alignment process, so only the overall shapes of the structures are aligned in the resulting superposition. Rigid body alignments are well suited to identifying structural equivalences between proteins that are closely related evolutionarily and so have comparable shapes.

Fig. 7 Superimposed structure of Cation-bound Multidrug and Toxin Compound Extrusion (MATE) transporter of E. coli, PDB id: 3MKT (cyan) with a P. aeruginosa (> WP_078465331.1), b S. aureus (> WP_006191797.1)

The parameters root mean square deviation (RMSD), template modeling score (TM-score), Score, sequence identity percentage (SI%), sequence similarity percentage (SS%) and length (Table 2) were used to describe the extent of overlap or similarity between the Cation-bound Multidrug and Toxin Compound Extrusion (MATE) transporter of E. coli (PDB id: 3MKT) and the P. aeruginosa (> WP_078465331.1) and S. aureus (> WP_006191797.1) models. In superposed structures, the RMSD is calculated between aligned pairs of backbone C-alpha atoms. The smaller the RMSD, the better the alignment of the structures. The topological similarity between the template and model structures is measured by the TM-score. The TM-score ranges from 0 to 1, with 1 indicating a perfect match and 0 indicating no match between the two structures. Scores of < 0.2 normally suggest that the proteins are unrelated, whereas scores of > 0.5 usually indicate that the proteins share the same protein fold. Score is a structural similarity metric that is specific to the alignment method used. SI% is the percentage of paired residues in the alignment that are identical in sequence. SS% is the percentage of paired residues in the alignment that are similar in sequence. Length is the number of structurally equivalent residue pairs in the alignment.
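To make the first two parameters concrete, the sketch below computes the RMSD and a TM-score for two lists of already superposed, already paired C-alpha coordinates. It is an illustration only and does not reproduce the RCSB alignment tool: the function names are hypothetical, the TM-score uses the commonly cited distance scale d0 = 1.24 (L − 15)^(1/3) − 1.8 with L the target length (assumed greater than 15), and no search for the optimal superposition is performed.

```python
import math


def rmsd(coords_a, coords_b):
    """Root mean square deviation over paired C-alpha coordinates."""
    squared = [math.dist(a, b) ** 2 for a, b in zip(coords_a, coords_b)]
    return math.sqrt(sum(squared) / len(squared))


def tm_score(coords_a, coords_b, target_length):
    """TM-score for a fixed superposition of paired C-alpha coordinates.

    Assumption: target_length > 15 so that the distance scale d0 is defined;
    a full implementation would also maximize over superpositions.
    """
    d0 = 1.24 * (target_length - 15) ** (1.0 / 3.0) - 1.8
    total = sum(1.0 / (1.0 + (math.dist(a, b) / d0) ** 2)
                for a, b in zip(coords_a, coords_b))
    return total / target_length
```

Because each aligned pair contributes at most 1 and the sum is divided by the target length, unaligned residues lower the TM-score even when the aligned region superposes perfectly.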

Table 2 Pairwise structure alignment characterization parameters

The parameter values revealed a strong three-dimensional structural similarity between the E. coli Cation-bound Multidrug and Toxin Compound Extrusion (MATE) transporter, PDB id: 3MKT, and P. aeruginosa (> WP_078465331.1). When comparing two structures, the most commonly reported statistic is the RMSD; however, it is sensitive to local structural variation. Even if the remainder of the structure is perfectly aligned, the RMSD value is high if a few residues in a loop are poorly aligned. The alignment of S. aureus (> WP_006191797.1) with the E. coli Cation-bound Multidrug and Toxin Compound Extrusion (MATE) transporter, PDB id: 3MKT, shows that the two proteins are structurally comparable, with an RMSD of 3.09 and a TM-score of 0.8, despite the low sequence identity (16%).

3.7 RIN Analysis and Actor Residues Identification

The observed residue interaction networks revealed a scale-free degree distribution, which indicates that the MATE proteins form scale-free networks. The residue interaction networks and various centrality parameters for P. aeruginosa (> WP_078465331.1) and S. aureus (> WP_006191797.1) were computed (Fig. 8), and the major contributing nodes are presented. The centrality values (Table 3) showed maximum betweenness centrality for LEU282, ILE244, MET27, ALA293 and GLN21 in P. aeruginosa and for GLN395, LEU253, MET25, ALA247 and ILE273 in S. aureus, which suggests that these residues control the passing of information to other residues and that their removal may disrupt the connections between residues in the protein structure. The nodes GLY183, PHE182, ALA351, ASP42 and LEU115, and LEU409, ASN219, ALA408, PHE190 and GLY411, were found to have the maximum clustering coefficient, i.e. the degree to which the network tends to cluster together. The residues LEU65, ILE244, ALA368, GLY191 and VAL296 in P. aeruginosa and ALA302, LEU293, ILE80, VAL74 and GLY75 in S. aureus were found to have the maximum degree centrality values, which measure the number of nodes connected to these residues (nodes). The nodes with the maximum degree centrality values may be responsible for steric bulk, which corresponds to steric hindrance in the protein structure. The RIN analysis of the MATE protein sequences showed that hydrophobic amino acids have the maximum centrality values and are therefore the major acting amino acids within the networks, which may be due to the predominance of hydrophobic amino acids throughout the networks of the MATE protein sequences.

Fig. 8 RIN, degree centrality graph and betweenness centrality graph (degree centrality: blue, betweenness centrality: red, clustering coefficient: green): a P. aeruginosa (> WP_078465331.1) and b S. aureus (> WP_006191797.1)

Table 3 Centrality parameters for RIN in P. aeruginosa (> WP_078465331.1) and S. aureus (> WP_006191797.1)

Protein network representations offer a systems approach to the topological study of complicated three-dimensional structures and may contribute to a better understanding of structure–function links. There are a variety of online and stand-alone tools for constructing RINs from protein structures, such as NAPS [39], WebPSN [40], RINalyzer [41,42,43], NeEMO [44], Ring 2.0 [45], RIP-MD [46], PSN Ensemble [47], pyInteraph [48], INTAA [49] and gRINN [50, 51]. These tools have relatively sophisticated user requirements, and some of them deliver protein network information in a more complicated manner. PSN-Ensemble, pyInteraph and INTAA are online web servers that require a MATLAB installation and process results from interaction energy matrices only. gRINN is a standalone tool that requires topology/trajectory files and also provides its results in the form of energy networks. RINalyzer provides simultaneous, interactive 2D visualization and exploration of a RIN together with the corresponding molecular 3D structure, but it is compatible with Cytoscape only and requires a Java plugin. NAPS, RIP-MD, WebPSN and RING 2.0 are web servers that provide results based on user-specified residues but are not available in standalone versions. The present approach may be used as a standalone method for RIN generation based on user-specified residues. In this specific case, the centrality parameters were calculated from the distance topology derived from the amino acid residue coordinates of the PDB files obtained after MD simulation. The Euclidean distances between the Cα atoms of the amino acid residues were calculated, and a distance threshold was then applied. The residue interaction network visualization and centrality calculations can be done directly in any residue interaction viewing programme; in the present case Cytoscape was used. The technique is broad in scope, and it can inspire and suggest strategies for developing standalone RIN analyzers for user-specified residues.

The major findings of this study provide a generalization of the key features and computational physicochemical characteristics of MATE efflux proteins in P. aeruginosa and S. aureus. The primary, secondary and tertiary structure analysis provided a better understanding and a comparative analysis of the features, aa residues and protein folds for both bacterial species. RIN analysis showed the major actor aa residues that are necessary for the specific folds within the protein structures, based on their centrality parameters. The algorithm design provided a standalone approach to RIN analysis for user-specified residues. In detail, a total of 16 MATE protein sequences were chosen after sorting and filtering the data for P. aeruginosa and S. aureus. The physicochemical characterization provided an overview of the diverse properties and behaviour of MATE proteins. The chosen MATE protein sequences were made up of 440 to 550 amino acid residues with a molecular weight of ~ 50,000 Da, and the amino acid residues were mostly hydrophobic. Proteins with an instability index below 40 were stable, whereas proteins with a value above 40 were potentially unstable. The higher the value of the aliphatic index, the greater the thermal stability of the MATE proteins. The isoelectric point of the MATE proteins was between 8 and 10. The GRAVY results confirmed the hydrophobic nature of most amino acids in the MATE protein sequences. The amino acid residues Ala, Gly and Ile were the most frequent in the MATE proteins, with the proportion of these amino acids varying between protein sequences, and Lys, Cys and Trp were the least abundant. The analysis of hydrophobic and hydrophilic amino acids revealed that 60–70% of the amino acids in each MATE protein sequence are hydrophobic.

The number of predicted transmembrane helices ranged from 10 to 12, confirming the topological closeness of these protein sequences to previously identified MATE protein sequences. The presence of transmembrane helices and the hydrophobic nature of these proteins confirm that MATE proteins occur in the cell's transmembrane region. ExPASy analysis revealed the presence of alpha helices, beta turns, extended strands and random coils in the secondary structure of these proteins. The greater the number of alpha helices, the greater the thermal stability of the P. aeruginosa and S. aureus MATE proteins. The predicted structures of the bacterial homologs of MATE proteins resemble the known structures of MATE transporter proteins. The modeled proteins feature two lobes, namely the N-lobe and the C-lobe, and pseudo-twofold symmetry. The structural coordinates of the energy-minimized tertiary structures were obtained from the PDB files. The Ramachandran plot validation for P. aeruginosa (> WP_078465331.1) showed 80.4% of residues in most favored regions, 15.2% in additional allowed regions, 3.4% in generously allowed regions and 1% in disallowed regions. The analysis for S. aureus (> WP_006191797.1) showed 88.7%, 9.3% and 1.3% of residues in most favored, additional allowed and generously allowed regions, respectively, and 0.8% of residues in disallowed regions (Fig. 4). The optimized structures for P. aeruginosa (> WP_078465331.1) and S. aureus (> WP_006191797.1) resembled existing structures of their respective protein families in the membrane and showed the cavity formed for substrate extrusion. The parameter values revealed a strong three-dimensional structural similarity between the E. coli Cation-bound Multidrug and Toxin Compound Extrusion (MATE) transporter, PDB id: 3MKT, and P. aeruginosa (> WP_078465331.1). The alignment of S. aureus (> WP_006191797.1) with the E. coli Cation-bound Multidrug and Toxin Compound Extrusion (MATE) transporter, PDB id: 3MKT, shows that the two proteins are structurally comparable, with an RMSD of 3.09 and a TM-score of 0.8, despite the low sequence identity (16%).

The observed residue interaction networks revealed a scale-free degree distribution, indicating that the MATE proteins form scale-free networks. The centrality values showed that LEU282, ILE244, MET27, ALA293 and GLN21 have the highest betweenness centrality in P. aeruginosa, and GLN395, LEU253, MET25, ALA247 and ILE273 in S. aureus, implying that these residues control the passing of information to other residues and that their removal may disrupt the connections between residues in the protein structure. The nodes GLY183, PHE182, ALA351, ASP42 and LEU115, and LEU409, ASN219, ALA408, PHE190 and GLY411, were found to have the highest degree of network clustering. The residues LEU65, ILE244, ALA368, GLY191 and VAL296 in P. aeruginosa and ALA302, LEU293, ILE80, VAL74 and GLY75 in S. aureus had the highest degree centrality values, which quantify the number of nodes connected to these residues (nodes). These nodes might be responsible for steric bulk, which translates to steric hindrance in the protein structure. The RIN analysis of the MATE protein sequences revealed that hydrophobic amino acids had the highest centrality values and are therefore the major acting amino acids within the networks, which could be attributed to the predominance of hydrophobic amino acids throughout the networks of the MATE protein sequences. NAPS, WebPSN, RINalyzer, NeEMO, Ring 2.0, RIP-MD, PSN Ensemble, pyInteraph, INTAA and gRINN are some of the online and standalone tools for building RINs from protein structures. These tools have relatively advanced user requirements, and some present protein network information in a more intricate way. Some of them are web servers that provide results based on user-specified residues but are not available in standalone versions. The current approach may be used as a standalone RIN method based on user-specified residues.

4 Conclusion

MATE antiporters in bacteria reduce the effective concentration of a drug inside the cell, rendering the bacteria resistant to the toxic effects of an antibiotic. A sound understanding of the structure, the molecular mechanism and the residues involved in the functioning of these efflux transporter proteins is lacking in the literature. In this study, a comparative structural and RIN analysis of MATE protein sequences of P. aeruginosa and S. aureus has been performed. The analysis of physicochemical properties confirmed the major contributing global features of the MATE protein sequences. The analysis of the various structural levels showed that hydrophobic aa contribute most throughout the MATE protein sequences, with Ala, Gly and Ile being the major contributing aa. The secondary structure analysis showed a predominance of alpha helices, and tertiary structure modeling and validation showed a resemblance to existing MATE protein structures. An algorithm was developed for RIN generation, followed by the analysis of the residues that are important in MATE protein tertiary structure formation. These residues were examined on the basis of different centrality parameters through RIN analysis. The structural and RIN analysis of the MATE protein sequences revealed the important aa in MATE proteins, which might be helpful in the identification of new drug target sites. The algorithm designed for RIN generation uses the bond formation between the C-α atoms in the MATE protein structure and may be used for bond formation analysis of other atoms in the MATE protein structure. Although online web servers and tools exist, this algorithm provides a standalone approach to RIN analysis. It may be applied to RIN generation in efflux transporter proteins and to other studies, including protein tertiary structure analysis. The actor residues identified from the RIN provided the major structural insight into the MATE protein structure.