Introduction

Staphylococcus sciuri was first described by Kloos et al. (1976), long after the genus Staphylococcus was defined in 1880. Staphylococci are Gram-positive, facultatively anaerobic, catalase-positive, oxidase-negative, non-motile bacteria with peptidoglycan and teichoic acid in their cell walls (Zimmerman and Kloos 1976, Kloos 1980, Kloos et al. 1997). S. sciuri was reported as a common bacterium thriving in a wide variety of environments and was originally thought to be a commensal of healthy or ill farm and wild animals, but it has also been found in hospitalized individuals. Reports of methicillin-resistant S. sciuri (MRSS) in healthy broilers show that S. sciuri may be a source of virulence and resistance genes and point to the clonal nature of the methicillin-resistant strains. Other studies investigated the prevalence of methicillin-resistant coagulase-negative staphylococci (MRCoNS), whereas studies of MRSS employing selective isolation are rarer (Zimmerman and Kloos 1976, Kloos et al. 1997, Dakic et al. 2005). The bacterium causes serious infections in humans, such as endocarditis, wound infections, peritonitis, septic shock, and urinary tract infections; its pathogenicity in animal populations, however, is poorly understood, although antimicrobial resistance is prevalent. The selective pressure exerted by frequent and non-specific use of antimicrobials for preventive, therapeutic, or growth-promoting purposes, mostly in pigs and poultry, has enhanced resistance, which has been seen primarily in domestic animals. This resistance has spread to wild species that share the same habitats and resources as domestic animals (Hedin and Widerström 1998). These species collect resistance and virulence genes circulating in a particular environment because of their toughness, prolificacy, and dispersal capacities (Kolawole and Shittu 1997). This is made easier because these genes are usually encoded on mobile genetic elements (plasmids, chromosomal cassettes, transposons).
These elements may easily be transferred horizontally between microorganisms, regardless of whether the recipients are pathogenic or non-pathogenic (Horii et al. 2001). Staphylococcus sciuri is likely to be the ancestral staphylococcal species. It is often found on the skin and mucous membranes of warm-blooded animals and people, as well as in the environment (Torres et al. 2020), and is currently linked to mastitis in dairy cattle (Rahman et al. 2005, Nemeghaire et al. 2014b), dermatitis in dogs (Hauschild and Wójcik 2007), and exudative epidermitis in piglets (Hauschild and Schwarz 2003).

S. sciuri has been found to possess a close homolog of the methicillin-resistance gene mecA seen in S. aureus (Wu et al. 2001). The S. sciuri group comprises five species, namely S. sciuri (three subspecies), S. lentus, S. vitulinus, S. fleurettii, and S. stepanovicii, which have been isolated from diseased animals and humans (Hauschild and Wójcik 2007, Nemeghaire et al. 2014a). S. sciuri, the genus’s original bacterium, and its closely related species were shown to carry the possible evolutionary progenitors of numerous resistance genes and might therefore serve as a reservoir for S. aureus resistance and virulence genes (Wu et al. 1996). These include the mec gene complex (classes A to E) and eight cassette chromosome recombinase (ccr) gene complexes (ccrA1B1, ccrA2B2, ccrA3B3, ccrA4B4, ccrC1, ccrA5B3, ccrA1B6, and ccrA1B3) (Katayama et al. 2001).

Multi-drug resistant (MDR) staphylococci, such as S. sciuri, carrying multiple resistance genes against commonly available β-lactams and other antibiotics have recently been reported in Africa (Adesoji et al. 2020, Egyir et al. 2022), North and South America (Meservey et al. 2020, Salazar-Ardiles et al. 2020, Saraiva et al. 2021, de Carvalho et al. 2022, Santos et al. 2022), the Korean Peninsula (Kim et al. 2019), Asia (Zhang et al. 2022, Boonchuay et al. 2023), Europe (Paterson 2020, Gómez-Sanz et al. 2021, Rey Pérez et al. 2021), the Middle East (Khazandi et al. 2018, Al-Hayawi 2022), and other parts of the world. This is an increasing public health concern for the treatment of life-threatening infections.

To find novel therapeutic and prophylactic options, including drugs, vaccines, and diagnostic biomarkers, the primary phase of every protocol in the post-genomics era is target identification. This can be achieved through various experimental as well as commonly used in silico approaches including pangenomics, subtractive genomics, structure-based drug designing (SBDD), comparative genomics, genome mining for metabolic pathway reconstruction, network pharmacology–based analyses, and reverse vaccinology, among other recently established computer-aided techniques (Ibrahim et al. 2017, Dalal et al. 2019, Singhal and Mohanty 2019, Dhankhar et al. 2020, Singh, Dhankhar et al. 2022, Zhang et al. 2023, Aregbesola et al. 2021). These methods have been widely used to identify protein-based therapeutic and vaccine targets in common, XDR, MDR, and pan-drug-resistant pathogenic microorganisms, including viruses, bacteria, parasites, and fungi (Hughes 2002, Somani et al. 2019, Khan et al. 2021a, Khan et al. 2022b, Zhang et al. 2023).

The emergence of antibiotic-resistant pathogens due to excessive and unnecessary medication makes their immediate control challenging. Integrated OMICS strategies, including but not limited to transcriptomics, metabolomics, and proteomics, are therefore used for the discovery of disease targets and regulators/inhibitors, as well as for studying disease origin and prevalence in a variety of infections, to expedite the process at reduced expense (Shouxiang, Xiaojuan et al. 2021, Lvqin, Xuefeng et al. 2021, Linhui, Yutao et al. 2022, Yuan, Zhang et al. 2022, Dindhoria et al. 2022, Laamarti et al. 2022, Deng et al. 2022). The advantages include reduced time, cost, and labor, together with robustness and reproducibility, in developing broad-spectrum therapeutic candidates (Hassan et al. 2014, Radusky et al. 2015, Basharat et al. 2021, Aurongzeb et al. 2022, Irfan et al. 2023).

Methodology, databases, and approaches

Genome selection of Staphylococcus sciuri and prediction of the core genome

S. sciuri was selected on the basis of its broad-spectrum host pathogenesis, and the complete genomes available at the start of this work were retrieved from the Joint Genome Institute-Genome Online Database (JGI-GOLD) (https://gold.jgi.doe.gov/), where the genome data and other statistics are readily available for analyses. This database provides open, comprehensive access to information on metagenome sequencing projects and their associated metadata from around the world (Mukherjee et al. 2017). To construct the core genome of S. sciuri, a high-throughput server, the Pathosystems Resource Integration Center (PATRIC, https://www.patricbrc.org/), was used: one strain, FDAARGOS_285, was randomly chosen as the reference genome, and the remaining ten strains were compared with this reference strain (Wattam et al. 2017).
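Conceptually, a core genome is the set of gene families shared by every strain under comparison. The following is a minimal sketch of that idea only, not of the PATRIC implementation; the function name core_genome, the dictionary layout, and the toy strain and gene-family labels are assumptions made for illustration.

```python
def core_genome(strain_to_families, reference):
    """Return the gene families of the reference strain found in all other strains.

    strain_to_families: dict mapping a strain name to an iterable of gene-family IDs
    reference: name of the strain used as the reference genome
    """
    core = set(strain_to_families[reference])
    for strain, families in strain_to_families.items():
        if strain != reference:
            # Keep only the families that this strain also carries
            core &= set(families)
    return core


# Hypothetical toy input: three strains with made-up family assignments
toy = {
    "reference": {"murA", "secY", "argS", "famX"},
    "strain_2": {"murA", "secY", "argS"},
    "strain_3": {"murA", "secY", "argS", "famY"},
}
shared = core_genome(toy, "reference")
```

In this toy case famX is dropped because it is absent from the other two strains, which mirrors how accessory genes fall out of a core-genome set.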

Non-host homologous, essential genome, and interactome prediction

After prediction of the core genome, the resultant data file was subjected to NCBI-BLASTp (https://www.ncbi.nlm.nih.gov/) (e-value = 0.0001, bit score = 200, and identity = 25%) against the human genome to filter pathogen non-host homologs (Altschul et al. 1990). To identify conserved essential targets of S. sciuri, the set of core conserved proteins was submitted to the DEG (Database of Essential Genes) (http://tubic.tju.edu.cn/deg/) and CEG (Clusters of Essential Genes) (http://cefg.uestc.cn/ceg) servers using the default parameters. Essential genes in a bacterium constitute a minimal genome, forming a set of functional modules that play key roles in the emerging field of synthetic biology; DEG contains all the essential genes currently available (Luo et al. 2014, Liu et al. 2020). The STRING server (https://string-db.org/), used for prediction of the protein–protein interactome, is a biological database and web resource encompassing both known and predicted protein–protein interactions (Szklarczyk et al. 2019).
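The host-homology filter described above can be sketched as follows. This is a minimal illustration, assuming the BLASTp hits against the human proteome were saved in the standard 12-column tabular format (BLAST+ output format 6). The function name non_host_homologs, the example identifiers, and the reading that a protein counts as a host homolog when a single hit satisfies all three thresholds are assumptions, not the authors' script.

```python
import csv
import io


def non_host_homologs(core_ids, blast_tsv, max_evalue=1e-4,
                      min_bits=200.0, min_identity=25.0):
    """Return the core protein IDs that have no significant hit against the host.

    blast_tsv holds tabular BLAST output with the columns
    qseqid sseqid pident length mismatch gapopen qstart qend sstart send evalue bitscore.
    """
    host_like = set()
    for row in csv.reader(io.StringIO(blast_tsv), delimiter="\t"):
        if len(row) < 12:
            continue  # skip blank or malformed lines
        identity = float(row[2])
        evalue = float(row[10])
        bits = float(row[11])
        if evalue <= max_evalue and bits >= min_bits and identity >= min_identity:
            host_like.add(row[0])  # query resembles a host protein
    return [pid for pid in core_ids if pid not in host_like]


# Hypothetical hits: protA has a strong human hit, protB only a weak one
example_hits = (
    "protA\thum1\t45.0\t300\t100\t2\t1\t300\t5\t304\t1e-50\t250\n"
    "protB\thum2\t28.0\t80\t50\t1\t1\t80\t1\t80\t0.5\t30\n"
)
kept = non_host_homologs(["protA", "protB", "protC"], example_hits)
```

Proteins with no reported hit at all (protC here) are retained, since absence from the hit table means no detectable similarity to the host.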

Comparative subcellular localization

The genes/proteins selected as non-redundant, essential, and human-non-homologous in the previous step were further analyzed for subcellular localization. This step is important for classifying the proteins that constitute the secretome and exoproteome of the pathogen, which are considered an excellent source of vaccine candidates. Subcellular localization was predicted with a comparative approach using two online tools: PSORTb (https://www.psort.org/psortb/) and CELLO2GO (http://cello.life.nctu.edu.tw/cello2go/). The protein sequences were submitted in FASTA format with the organism type set to bacteria and the Gram stain set to negative. PSORTb is a protein subcellular localization (SCL) prediction tool for bacteria that assigns a probable localization site to a protein using support vector machines (SVMs). For Gram-negative bacteria, it assigns one of five subcellular locations, i.e., cytoplasm, inner membrane, periplasm, outer membrane, and extracellular. In contrast, CELLO2GO applies SVMs at two levels: first to sequence-derived molecular descriptors and then to the resulting probabilities of each subcellular location. Subcellular localization and functional evaluation of a target protein are vital for a proper drug design process and for identifying the precise biological process involved (Yu et al. 2010, Yu et al. 2014).

Identification of biological pathways and biological function

To identify the metabolic pathways involved, the Kyoto Encyclopedia of Genes and Genomes (KEGG) (https://www.genome.jp/kegg/pathway.html) was used to check which proteins take part in unique or multiple pathways. KEGG is a collection of databases dealing with genomes, diseases, drugs, and biological pathways (Kanehisa et al. 2017). KAAS (KEGG Automatic Annotation Server) was used to filter the essential proteins for metabolic pathway analysis. To determine the biological/molecular function of the proteins, UniProt (https://www.uniprot.org/), a freely accessible database of protein sequence and functional information, was consulted. It contains a large amount of information about the biological function of proteins (Pundir et al. 2017). Furthermore, the molecular weight of each potential target was computed using ProtParam (https://web.expasy.org/protparam/). Virulent proteins were prioritized on the basis of molecular weight (Wilkins et al. 1999).
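The molecular weight of a protein can be estimated from its sequence by summing the average masses of its residues and adding one water molecule for the free termini. The sketch below illustrates that calculation under stated assumptions: the residue masses are standard average values in daltons entered by hand, the function name molecular_weight is hypothetical, and the code is not the ProtParam implementation.

```python
# Average residue masses in Da (amino acid minus one water), standard values
RESIDUE_MASS = {
    "G": 57.0519, "A": 71.0788, "S": 87.0782, "P": 97.1167, "V": 99.1326,
    "T": 101.1051, "C": 103.1388, "L": 113.1594, "I": 113.1594, "N": 114.1038,
    "D": 115.0886, "Q": 128.1307, "K": 128.1741, "E": 129.1155, "M": 131.1926,
    "H": 137.1411, "F": 147.1766, "R": 156.1875, "Y": 163.1760, "W": 186.2132,
}
WATER_MASS = 18.01524  # added once for the N- and C-terminal groups


def molecular_weight(sequence):
    """Return the approximate average molecular weight (Da) of a protein sequence."""
    sequence = sequence.strip().upper()
    if not sequence:
        raise ValueError("empty sequence")
    try:
        residues = sum(RESIDUE_MASS[aa] for aa in sequence)
    except KeyError as err:
        raise ValueError(f"unsupported residue: {err.args[0]}") from None
    return residues + WATER_MASS


glycine = molecular_weight("G")        # free glycine, about 75.07 Da
dipeptide = molecular_weight("GA")     # glycyl-alanine, about 146.15 Da
```

Non-standard or ambiguous residue codes (for example X or B) raise an error here rather than being silently ignored, so that an under-estimated weight cannot pass unnoticed.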

Drug target selection and 3D structure modeling

The potential druggable bacterial protein targets were identified from the reference strain S. sciuri FDAARGOS_285. Because their complete 3D structures were unavailable, comparative homology modeling was considered by evaluating template availability for all three targets. Structural templates that showed at least 30% identity with > 90% query coverage were accepted. This assessment was carried out by comparing each protein sequence against the RCSB-PDB structural resource using the BLASTp functionality provided by NCBI. These steps allowed the current targets to be chosen for CADD analysis (Zhao et al. 2020). Since the three-dimensional structures of MurA, SecY, and ArgS were unavailable, comparative modeling was used for 3D model prediction, taking the sequences filtered through the pipeline from the reference genome of Staphylococcus sciuri. The template for MurA (PDB ID: 2AQ9), from Escherichia coli (Williams et al. 2007), was selected with sequence identity = 91% and query coverage = 98%. The structural model of the target protein was constructed using SWISS-MODEL, an online homology-modeling web server (Colovos and Yeates 1993, Rufino et al. 1997).
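The template acceptance rule stated above (at least 30% identity with more than 90% query coverage) can be written as a small check. This is a hedged sketch: the function names, the definition of query coverage as the aligned query span divided by the query length, and the example alignment coordinates are assumptions introduced for illustration.

```python
def query_coverage(q_start, q_end, query_length):
    """Percentage of the query covered by an alignment (1-based, inclusive ends)."""
    if query_length <= 0 or q_end < q_start:
        raise ValueError("invalid alignment coordinates")
    return 100.0 * (q_end - q_start + 1) / query_length


def accept_template(identity_pct, q_start, q_end, query_length,
                    min_identity=30.0, min_coverage=90.0):
    """Apply the rule: identity >= 30% and query coverage > 90%."""
    coverage = query_coverage(q_start, q_end, query_length)
    return identity_pct >= min_identity and coverage > min_coverage


# Hypothetical alignments against a 420-residue query
good = accept_template(91.0, 5, 416, 420)       # coverage about 98%
low_identity = accept_template(28.0, 1, 420, 420)
short_hit = accept_template(45.0, 1, 300, 420)  # coverage about 71%
```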

Structure validation and energy minimization

Different online servers were utilized, namely, ERRAT (Colovos and Yeates 1993, Kumari et al. 2023), PDBSum (Rufino et al. 1997, Kumari and Dalal 2022), and ProCheck (Laskowski et al. 1993, Dhankhar et al. 2020, Singh, Dhankhar et al. 2022), to measure the quality of the modeled structure. The quality check measurements play a vital role in enhancing the 3D structure qualities, thereby improving the accuracy of drug-target interactions and increasing the efficacy of the drug.

The selected models of the target proteins, i.e., ArgS, MurA, and SecY, were then subjected to energy minimization to improve their quality. UCSF Chimera, a powerful visualization tool, was used to analyze the structures and to minimize their energy. Gasteiger charges were assigned to the proteins, and structural constraints were removed by 1500 rounds of minimization (750 steepest descent followed by 750 conjugate gradient steps) with a step size of 0.02 Å under the ff03.r1 force field. The energy-minimized protein targets were evaluated through the validation process and then used for docking studies (Weiner and Kollman 1981, Malik, Dalal et al. 2019).

Druggability, virtual screening, and docking analyses

The information obtained from 3D structures and druggability analyses is important for prioritizing and authenticating putative pathogen targets. For druggability analyses, the final list of essential non-host and host homologous protein targets was submitted to DoGSiteScorer (https://bio.tools/dogsitescorer) in PDB format. DoGSiteScorer is an automated pocket detection and analysis tool for calculating the druggability of protein cavities (Volkamer et al. 2012). For efficient inhibition, the proper active cavity for molecule binding in the three-dimensional structure of the protein must be examined. The appropriate active site is categorized based on the buriedness, size, shape, and hydrophobicity of the specific site (Pettersen et al. 2004). The active sites of ArgS, MurA, and SecY were determined from different literature sources and were also confirmed manually in the target sequences through sequence alignment. Virtual screening (VS), docking, and visualization were performed in MOE (Molecular Operating Environment, v2016) (https://www.chemcomp.com/index.htm) following a slightly modified protocol adapted from Muneeba et al. (2023) and Hassan et al. (2022; Christoph, Sabine et al. 2015, Syed, Rida et al. 2022, Muneeba, Syed et al. 2023). The grid for molecular docking was centered around the previously selected active site residues/interface residues of the protein according to the protocol modified from Dalal et al. (Dalal et al. 2021). The 2D depiction of some of the residues interacting through H-bonds with the corresponding ligands is shown as well (Fig. 10). The molecular docking strategy was divided into three major steps: active site identification, ligand preparation, and molecular docking. The docking procedure was carried out using reduced protein and ligand molecules.

Molecular dynamics simulation

Molecular dynamics simulations were used to investigate the behavior of the docked proteins. The Assisted Model Building with Energy Refinement program (AMBER) was employed for this purpose, and several of its modules were used for analysis (Weiner and Kollman 1981, Salomon-Ferrer et al. 2013). The biomolecular simulation was broken down into the stages described below.

System preparation

For the simulation of docked proteins, the tLEaP module of AMBER12 was used; it is an essential part of system setup that provides an interface for preparing the initial coordinate and topology files. The protein was solvated in a three-point transferable intermolecular potential (TIP3P) water box with a buffer of 8.0 Å, using the force fields ff03.r1, GAFF, and ff99SB (Fig. 1). The docked protein system was checked to ensure that the bonds, angles, and atom types in the docked complexes were correct. After the starting files had been prepared, the simulation procedure began.

Fig. 1

Solvation box surrounding the docked protein

Minimization, heating, equilibration, and production

Minimization is usually done to eliminate unfavorable steric contacts. At a cutoff value of 8, the steepest descent technique and 1000 steps of conjugate gradient were used. After minimization, 10 ps of heating was carried out using the Langevin dynamics method for temperature control, and 100 ps of equilibration at a constant temperature of 300 K is required before the production run begins. During equilibration, the total energy of the system remains constant, while the kinetic and potential energies vary. The production run for the docked complex, carried out after equilibration, covered 100 ns. Periodic boundary conditions were applied to the simulation box using a canonical ensemble. To keep the temperature constant, the Berendsen coupling integration procedure was applied (Berendsen et al. 1984).

Simulation trajectory analysis

The PTRAJ (Process Trajectory) module of AMBER12 was used to create output files for analysis and to compute four properties, namely, root mean square deviation (RMSD), root mean square fluctuation (RMSF), the radius of gyration (Rg), and the β-factor. Graphical representations were examined in XMgrace (https://plasma-gate.weizmann.ac.il/Grace/) (Vaught 1996).

Root mean square deviation

The coordinates of the alpha carbon (Cα) are commonly taken to indicate an amino acid’s location in three-dimensional space. RMSD is a metric for comparing the relative locations of protein Cα atoms by computing their averaged distance over a period of time (Kuzmanic and Zagrovic 2010). It is written mathematically as

$$\mathrm{RMSD}=\sqrt{\frac{1}{N}\sum_{i=1}^{N}d_{i}^{2}}$$

where N is the number of compared atoms and d_i is the distance between the ith pair of atoms.
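The RMSD definition above can be illustrated with a short sketch in plain Python. It assumes that the two coordinate sets have already been superimposed and list the same atoms in the same order; the function name rmsd and the toy coordinates are illustrative and do not reproduce the PTRAJ implementation.

```python
import math


def rmsd(coords_a, coords_b):
    """Root mean square deviation between two equally ordered lists of (x, y, z)."""
    if len(coords_a) != len(coords_b) or not coords_a:
        raise ValueError("coordinate sets must be non-empty and of equal length")
    squared_sum = 0.0
    for (xa, ya, za), (xb, yb, zb) in zip(coords_a, coords_b):
        # d_i squared: squared distance between the i-th pair of atoms
        squared_sum += (xa - xb) ** 2 + (ya - yb) ** 2 + (za - zb) ** 2
    return math.sqrt(squared_sum / len(coords_a))


# Toy example: every atom is displaced by exactly 1 unit along z
reference = [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0)]
displaced = [(0.0, 0.0, 1.0), (1.0, 0.0, 1.0)]
value = rmsd(reference, displaced)
```

In a trajectory analysis the same function would be applied to each frame against a fixed reference structure to obtain RMSD as a function of time.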

Root mean square fluctuation

RMSF is used to assess the fluctuation of the backbone atoms of the docked target (N, Cα, and C). It is the root mean square of the distance between an atom and its average geometric location over a particular set of dynamics, computed from the atom positions recorded over a specific time scale. The RMSF is calculated using the following equation:

$$\mathrm{RMSF}_{i}=\sqrt{\frac{1}{T}\sum_{t_{k}=1}^{T}{\left(x_{i}\left(t_{k}\right)-\langle x_{i}\rangle \right)}^{2}}$$

where T represents the time interval over which the average is taken, x_i(t_k) represents the position of atom i at time t_k, and ⟨x_i⟩ represents the averaged position of the atom.
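As an illustration of the RMSF definition, the following sketch computes a per-atom value from a list of trajectory frames. It is a minimal example under the assumption that all frames have been aligned to a common reference and contain the same atoms in the same order; the function name rmsf and the two-frame toy trajectory are hypothetical.

```python
import math


def rmsf(trajectory):
    """Per-atom RMSF from a list of frames, each a list of (x, y, z) tuples."""
    n_frames = len(trajectory)
    if n_frames == 0:
        raise ValueError("trajectory is empty")
    n_atoms = len(trajectory[0])
    values = []
    for i in range(n_atoms):
        # Time-averaged position of atom i
        mean = [sum(frame[i][k] for frame in trajectory) / n_frames for k in range(3)]
        # Mean squared displacement of atom i from its averaged position
        msf = sum(
            sum((frame[i][k] - mean[k]) ** 2 for k in range(3))
            for frame in trajectory
        ) / n_frames
        values.append(math.sqrt(msf))
    return values


# Toy trajectory: atom 0 oscillates along x, atom 1 never moves
frames = [
    [(0.0, 0.0, 0.0), (1.0, 1.0, 1.0)],
    [(2.0, 0.0, 0.0), (1.0, 1.0, 1.0)],
]
per_atom = rmsf(frames)
```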

β-Factor

The β-factor, which is closely related to the RMSF, assesses the spatial displacement of atoms around their mean locations as a result of local vibrational and thermal motions (Kuzmanic and Zagrovic 2010). Because both quantities measure fluctuations, the β-factor can be expressed in terms of the RMSF:

$$\upbeta \text{-}\mathrm{Factor}={\mathrm{RMSF}}^{2}\left(\frac{8{\pi }^{2}}{3}\right)$$
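The relation between the two quantities is a direct conversion, sketched below for clarity. The function names are illustrative; the inverse conversion is included only as a convenience and follows from rearranging the same expression.

```python
import math


def b_factor_from_rmsf(rmsf_value):
    """β-factor corresponding to an RMSF value: RMSF^2 * 8π^2 / 3."""
    return (rmsf_value ** 2) * 8.0 * math.pi ** 2 / 3.0


def rmsf_from_b_factor(b_factor):
    """Inverse conversion: RMSF = sqrt(3B / (8π^2))."""
    if b_factor < 0:
        raise ValueError("β-factor must be non-negative")
    return math.sqrt(3.0 * b_factor / (8.0 * math.pi ** 2))


unit = b_factor_from_rmsf(1.0)          # 8π²/3, roughly 26.3
round_trip = rmsf_from_b_factor(unit)   # recovers the RMSF of 1.0
```

Because the β-factor scales with the square of the RMSF, doubling the fluctuation of a residue quadruples its β-factor.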

Radius of gyration

The radius of gyration is used to assess the overall packing quality and density of a structure. It is a physical characteristic that may be estimated experimentally, most commonly via small-angle X-ray scattering (SAXS). The following equation was used to quantify the compactness of a macromolecular system:

$${R}_{\mathrm{g}}=\sqrt{\frac{\sum_{i=1}^{N}{m}_{i}{\left({r}_{i}-{r}_{\mathrm{cm}}\right)}^{2}}{\sum_{i=1}^{N}{m}_{i}}}$$

where N is the total number of atoms, m_i denotes the mass of atom i, r_i denotes the position vector of atom i, and r_cm denotes the molecule’s center of mass. Figure 2 represents an overview of all steps that have been followed in this work, whereas Table 1 shows the number of proteins/genes screened in each step of the subtractive genomic/proteomic approach.
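The radius of gyration can be sketched as the mass-weighted root mean square distance of the atoms from the center of mass. The example below is a minimal illustration in plain Python; the function name radius_of_gyration and the toy two-atom system are assumptions, and the masses are arbitrary illustrative values.

```python
import math


def radius_of_gyration(coords, masses):
    """Mass-weighted radius of gyration for atoms at coords with the given masses."""
    if len(coords) != len(masses) or not coords:
        raise ValueError("coords and masses must be non-empty and of equal length")
    total_mass = sum(masses)
    # Center of mass r_cm
    center = [
        sum(m * c[k] for m, c in zip(masses, coords)) / total_mass
        for k in range(3)
    ]
    # Sum of m_i * |r_i - r_cm|^2
    weighted = sum(
        m * sum((c[k] - center[k]) ** 2 for k in range(3))
        for m, c in zip(masses, coords)
    )
    return math.sqrt(weighted / total_mass)


# Toy system: two equal masses placed one unit either side of the origin
rg = radius_of_gyration([(-1.0, 0.0, 0.0), (1.0, 0.0, 0.0)], [12.0, 12.0])
```

A value that stays roughly constant along a trajectory indicates that the complex keeps its overall compactness, whereas a steady increase would suggest unfolding or loosening of the structure.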

Fig. 2

A schematic block diagram of different steps and tools employed in the subtractive genomics approach for mining druggable targets in Staphylococcus sciuri and identification of novel TCM inhibitors

Table 1 Total number of genes/proteins screened in each step of the subtractive genomic/proteomic approach for druggable targets in Staphylococcus sciuri. The core genome was drastically reduced after host homology and gene essentiality analyses, and then to only three final pathogen targets

Results and discussion

Data retrieval

In the present study, eleven (11) of the one hundred and twelve (112) strains of Staphylococcus sciuri were included, namely those reported as completely sequenced up to 2021. All the sequence data are available at the National Center for Biotechnology Information (NCBI) (https://www.ncbi.nlm.nih.gov/genome/) for downloading and downstream analyses. This study emphasizes exploration of the genomes of the selected strains. A reference strain of S. sciuri (FDAARGOS_285) was randomly selected for further comparative analysis.

Prediction of core and non-host homologous genes/proteins

For the construction of the core genome, the Pathosystems Resource Integration Center (PATRIC) was used. Among the 11 strains of S. sciuri, one strain FDAARGOS_285 was taken as a reference and the rest of strains were compared to it (Fig. 3). The core genome file contained 1784 genes that were then submitted to NCBI-BLASTp (E-value = 0.0001, bit score 100, and identity 25%) against the human genome for filtering pathogen-specific non-host homologs. Among these gene sequences, considering the human genome as the host, we found 170 non-host homologous proteins. This step is important to avoid cross-reactivity and binding of the drugs to undesired host protein sites.

Fig. 3

Circular comparative genome representation of S. sciuri genomes generated through the PATRIC Server. A FDAARGOS_285 vs nine strains (1/9). B FDAARGOS_285 vs one strain 1/1. Different colors and their intensities show the presence or absence of different genes, genic islets, genomic island, or other genetic materials among different strains of S. sciuri

Analyses of essential genes and protein–protein interaction

The 170 non-host homologous core proteins were then subjected to BLASTp against the essential proteins present in DEG (http://tubic.tju.edu.cn/deg/). The search was run through NCBI BLASTp using a Perl script with the thresholds E-value = 10e−4, bit score = 100, and sequence identity ≥ 30% against prokaryotes, eukaryotes, and archaea, which reduced the candidates to 35 potential targets. The STRING (https://string-db.org/) database was used to determine the interrelations between proteins, which are essential for proper functioning, and to give detailed knowledge about proteins involved in single or multiple pathways. Out of 35, 10 proteins showed multiple interactions. Selecting these as drug targets would therefore be expected to give greater specificity and more accurate results (Fig. 4).

Fig. 4

STRING analysis for protein–protein interactions. The nodes in the network represent the proteins, while the network edges represent specific and meaningful protein–protein associations. The different node colors show the different levels of interaction, whereas the edge colors show known, predicted, and other interactions. The colored nodes show the query proteins and the first shell of interactors, the white nodes represent the second shell of interactors, empty nodes represent proteins of unknown 3D structure, and filled nodes indicate that some 3D structure is known or predicted. The edges indicate both functional and physical protein associations, whereas the line color indicates the type of interaction evidence and the line thickness indicates the strength of data support. Among the known interactions, those in cyan are from curated databases and those in purple are experimentally determined. Among the predicted interactions, those in green are from gene neighborhood analyses, those in red are gene fusion events, and those in blue are from gene co-occurrence. The remaining interactions are olive = text-mining, black = co-expression, navy blue = protein homology

Comparative subcellular localization and identification of biological pathways

The 35 druggable targets were then carried forward to subcellular localization prediction. The comparative subcellular localization of all proteins was predicted using PSORTb (https://www.psort.org/psortb/) and CELLO2GO (http://cello.life.nctu.edu.tw/cello2go/). Out of the total 35 proteins, 33 were cytoplasmic proteins and 5 were membrane proteins. The results are shown in Fig. 5 and Table 4.

Fig. 5

Comparative cellular localization prediction using PSORTb and Cello2go web servers. The relative abundance of the predicted targets as membrane and cytoplasmic proteins is shown. In most cases the membrane-bound small molecular structures/proteins are antigenic in nature and are regarded as good adjuvant/vaccine candidates, whereas the cytoplasmic proteins are generally considered as good drug targets for inhibiting vital metabolic cellular processes

These proteins were then subjected to the KEGG database for pathway analysis, which showed that seven proteins were involved in multiple pathways. Determining the molecular pathways is an essential step because it indicates the extent to which a protein is necessary for a given pathway (Supplementary Fig. 1). Table 2 lists the seven functionally annotated, important non-host homologous proteins.

Table 2 Identification of biological pathways using the KEGG (Kyoto Encyclopedia of Genes and Genomes). The table describes the vital pathways of the seven putative targets and are tabulated as gene/protein names, protein functions, and the respective metabolic pathways in which they play key biological role/s

Drug target selection and 3D structure modeling

Drug targets were selected based on their mechanism of function, virulence, molecular weight, pathway analysis, and druggability. The results indicated seven best targets responsible for resistance against antibiotics. Further investigation showed that three of these targets, namely, ArgS (WP_058610923), MurA (WP_058611897), and SecY (WP_058612677), have stronger pathogenic responses according to the literature. The availability of a 3D protein structure is the starting point for CADD analysis. The structures of ArgS (WP_058610923.1), MurA (WP_058611897.1), and SecY (WP_058612677.1) were generated with the online server SWISS-MODEL. The models were selected for further analysis based on physicochemical properties and quality assessment measures. The structures generated through SWISS-MODEL are given below (Fig. 6).

Fig. 6

3D structures generated through SWISS-MODEL. A WP_058612677, B WP_058611897, C WP_058610923

Besides significant coverage, model 1 showed strong stereochemistry with no residue in the disallowed region and the lowest Z-score (Supplementary Fig. 2). Energy minimization was done to relax the structure and remove steric clashes of the side chains. Ramachandran plots of the selected models showed that most residues are present in the most favored regions. The stereochemical properties of the comparative homology-modeled structures are given below (Fig. 7; Table 3).

Fig. 7

Ramachandran plot representing Psi and Phi angles of the selected models and showing the % amino acid residues of the 3D-modeled structures in four different quadrants of the Ramachandran plot

Table 3 Stereo-chemical properties of the predicted final targets. The table shows the values in percentage of the amino acid residues of the 3D-modeled structures in different quadrants of the Ramachandran plot and the Z-scores as a measure of their respective qualities

Validation of 3D models and druggability analysis

Only three cytoplasmic proteins were chosen as potential therapeutic targets out of a total of seven proteins, based on a percentage identity of more than 25% and the pathways in which they are involved. The final list of essential, non-host, good-quality protein targets was submitted to DoGSiteScorer in PDB format. After that, the Target Pathogen Database was used to analyze druggability and other biochemical functions. The druggable pockets of the final three targets are given below in Fig. 8 and Tables 4, 5, and 6.

Fig. 8

Identification of druggable pockets of the top three predicted targets using DoGSiteScorer on the ProteinsPlus server

Table 4 Pocket detection and protein druggability score for WP_058610923.1. The surface topology of the receptor macromolecule/s in terms of different physicochemical descriptors such as volume, surface area, and drug scores, among others, determines the druggability of a pocket via DoGSiteScorer. A pocket with a drug score close to 1 is considered a highly druggable pocket
Table 5 Pocket detection and protein druggability score for WP_058611897.1. The surface topology of the receptor macromolecule/s in terms of different physicochemical descriptors such as volume, surface area, and drug scores, among others, determines the druggability of a pocket via DoGSiteScorer. A pocket with a drug score close to 1 is considered a highly druggable pocket
Table 6 Pocket detection and protein druggability score for WP_058612677.1. The surface topology of the receptor macromolecule/s in terms of different physicochemical descriptors such as volume, surface area, and drug scores, among others, determines the druggability of a pocket via DoGSiteScorer. A pocket with a drug score close to 1 is considered a highly druggable pocket

Molecular docking, inhibitor selection, and ADMET profiling

Active site information for the docking procedure is criterion-based. The procedure involved the following steps.

In the current study, the Traditional Chinese Medicine (TCM) library, containing 36,043 compounds, was used as the source of candidate inhibitors for docking into the ArgS, MurA, and SecY active sites. The top hits from TCM were docked, and the top five compounds were analyzed for each receptor.

A total of 36,043 ligands were docked into the active site of each target using the Molecular Operating Environment (MOE) software. The preferred orientation of each active compound in the selected binding pocket was also identified.

The selected ligand molecules were docked into the active site of the target using MOE (Fig. 9). The corresponding hydrogen bonds and binding affinities were also calculated using MOE (Fig. 10). The highest scores were achieved by compound 1, compound 2, and compound 3, with binding affinities of −7.9, −7.7, and −7.9 kcal/mol against the target proteins, respectively. Docking scores and the respective binding affinities of the top 5 compounds, arranged in descending order, are provided below for each protein target (Tables 7, 8, and 9). Detailed visualization of the preferred ligand-binding orientation was carried out in MOE.

Fig. 9

3D depiction of the three docked complexes with the best inhibitor shown in the binding cavity. The ligand-receptor complexes were generated with the UCSF Chimera tool

Fig. 10

2D depiction of the ligand-receptor complexes of the final protein targets, representing H-bonds with the corresponding best inhibitor. The dotted lines show the H-bond interactions between the inhibitor and the amino acid residues of the target protein. The different colors correspond to the chemical nature of the interactions and amino acids

Table 7 Docking results of inhibitors with corresponding binding affinities via H-bond within the ArgS binding site. The S-score (docking score) of the MOE software reflects the thermodynamic stability of the ligand-receptor complex system
Table 8 Docking results of inhibitors with corresponding binding affinities via H-bond within the SecY binding site. The S-score (docking score) of the MOE software reflects the thermodynamic stability of the ligand-receptor complex system
Table 9 Docking results of inhibitors with corresponding binding affinities via H-bond within the MurA binding site. The S-score (docking score) of the MOE software reflects the thermodynamic stability of the ligand-receptor complex system
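
The selection of the top five compounds per receptor by docking score can be illustrated with a minimal sketch. It assumes that a more negative S-score (kcal/mol) indicates a more favorable pose. The compound identifiers and scores below are hypothetical placeholders, not values from this study, and the function name top_hits is introduced only for illustration.

```python
from typing import List, NamedTuple


class DockingHit(NamedTuple):
    compound_id: str  # hypothetical identifier, not a real TCM entry
    s_score: float    # docking score in kcal/mol; more negative = more favorable


def top_hits(hits: List[DockingHit], n: int = 5) -> List[DockingHit]:
    """Return the n hits with the most favorable (most negative) score."""
    return sorted(hits, key=lambda h: h.s_score)[:n]


# Hypothetical placeholder scores, NOT results from the study.
example = [
    DockingHit("cmpd_A", -6.1),
    DockingHit("cmpd_B", -7.4),
    DockingHit("cmpd_C", -5.2),
    DockingHit("cmpd_D", -7.0),
    DockingHit("cmpd_E", -6.8),
    DockingHit("cmpd_F", -4.9),
]

best = top_hits(example, n=5)
for hit in best:
    print(f"{hit.compound_id}\t{hit.s_score:.1f}")
```

Applied separately to each receptor, a ranking of this kind would yield one five-row list per target, in the manner of Tables 7 to 9.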

In silico prediction of drug-likeness and ADMET profiling of drug candidates helps reduce the expense of synthesis and of preclinical and clinical research (Kar and Leszczynski 2020). The molecular properties of the top-hit compounds were therefore calculated using SwissADME (Table 10).

Table 10 Physicochemical molecular properties and ADME profile of the top hits, calculated via SwissADME. Evaluating these features of the top hits is important when computationally mining for potent inhibitors
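
As an illustration of the kind of drug-likeness filter applied to physicochemical descriptors, the sketch below counts violations of the classical Lipinski rule of five (molecular weight ≤ 500 Da, logP ≤ 5, at most 5 H-bond donors, at most 10 H-bond acceptors). This is an assumed, simplified stand-in: the exact filters and logP model used for Table 10 are not stated here, and the descriptor values in the example are hypothetical.

```python
def lipinski_violations(mol_weight: float, logp: float,
                        h_donors: int, h_acceptors: int) -> int:
    """Count violations of the classical Lipinski rule of five."""
    rules_broken = [
        mol_weight > 500,   # molecular weight in Da
        logp > 5,           # octanol-water partition coefficient
        h_donors > 5,       # number of H-bond donors
        h_acceptors > 10,   # number of H-bond acceptors
    ]
    return sum(rules_broken)


# Hypothetical descriptor values, NOT taken from Table 10.
print(lipinski_violations(350.4, 2.1, 2, 5))    # small, polar-balanced molecule
print(lipinski_violations(620.0, 5.6, 6, 11))   # large, lipophilic molecule
```

A compound is commonly regarded as orally drug-like under this rule when it has at most one violation.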

Molecular dynamics simulation

The most fundamental element associated with the function of proteins is their conformational dynamics. The functional information of a protein molecule is encoded in its structure, so a comprehensive understanding of the structure is needed to unravel its functional variability. In the current study, MD simulation was performed to explore the conformational aspects of protein–ligand interactions and to evaluate the stability of the homology model and the enzyme-inhibitor complex. Data reduction analyses, namely root mean square deviation (RMSD), root mean square fluctuation (RMSF), radius of gyration (Rg), and β-factor values, were used to determine the conformational changes and the stability of the secondary structure elements of the simulated complexes.

Root mean square deviation

RMSD describes the dynamics of the backbone and Cα atoms of the docked protein over the 100-ns simulation. A fluctuation was observed at about 15 ns, after which the trajectory remained stable for the rest of the simulation. The average RMSD value for the docked protein was 1.17 Å. Figure 6 shows a maximum peak of 1.67 Å. Overall, the pattern of the RMSD graph does not support any major domain shifts within the structural framework of the protein–ligand complex. The ligand remained well accommodated within the binding site during the simulation and did not destabilize the protein, as shown in Fig. 11.

Fig. 11

RMSD plot of simulated ArgS (WP_058610923.), MurA (WP_058611897.1), and SecY (WP_058612677.1) protein complex for the 100-ns simulation run
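
For reference, the RMSD of a frame relative to a reference structure is the square root of the mean squared displacement of the selected atoms. The sketch below is a minimal pure-Python illustration that assumes each frame has already been superimposed on the reference; it does not perform the fitting step and is not the analysis tool used in this study. The coordinates are toy values.

```python
import math
from typing import List, Sequence, Tuple

Coord = Tuple[float, float, float]


def rmsd(frame: Sequence[Coord], reference: Sequence[Coord]) -> float:
    """RMSD (same length unit as the input) between two pre-aligned coordinate sets."""
    if len(frame) != len(reference) or not frame:
        raise ValueError("coordinate sets must be non-empty and of equal length")
    squared = sum(
        (a - b) ** 2
        for p, q in zip(frame, reference)
        for a, b in zip(p, q)
    )
    return math.sqrt(squared / len(frame))


def rmsd_series(trajectory: Sequence[Sequence[Coord]],
                reference: Sequence[Coord]) -> List[float]:
    """RMSD of every frame in the trajectory against the same reference."""
    return [rmsd(frame, reference) for frame in trajectory]


# Toy coordinates (two atoms, two frames), NOT simulation data.
ref = [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0)]
traj = [ref, [(0.0, 0.0, 1.0), (1.0, 0.0, 1.0)]]
series = rmsd_series(traj, ref)
print(series, sum(series) / len(series), max(series))
```

The average and maximum of such a series correspond to the kind of summary values quoted for the RMSD plot.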

Root mean square fluctuations

The RMSF captures the structural flexibility and fluctuation of the Cα atoms of each residue over time. The average RMSF of the docked ArgS, MurA, and SecY proteins calculated over the 100 ns was 1.3 Å, with values of 2.4 and 2.7 Å also recorded. Major fluctuations were observed at residues 76, 103, 203 to 263, and 336, and again toward the end of the graph at residues 518 and 560; these positions mostly correspond to loop regions of the protein. Fluctuations continued to appear until the end of the 100 ns, as shown in Fig. 12. One indication of the stability of the protein during the simulation run was that the active site residue His125 had an RMSF value of less than 1.0 Å.

Fig. 12

RMSF of simulated ArgS (WP_058610923.), MurA (WP_058611897.1), and SecY (WP_058612677.1) protein over the 100-ns simulation run
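
The per-residue RMSF is the root of the time-averaged squared deviation of an atom from its own mean position. The sketch below is a minimal illustration of that definition, assuming one Cα atom per residue and frames that have already been aligned. It uses toy coordinates and is not the software used in this study.

```python
import math
from typing import List, Sequence, Tuple

Coord = Tuple[float, float, float]


def rmsf(trajectory: Sequence[Sequence[Coord]]) -> List[float]:
    """Per-atom RMSF over all frames of a pre-aligned trajectory."""
    n_frames = len(trajectory)
    n_atoms = len(trajectory[0])
    values = []
    for i in range(n_atoms):
        # Time-averaged position of atom i.
        mean = [sum(frame[i][k] for frame in trajectory) / n_frames
                for k in range(3)]
        # Mean squared fluctuation of atom i around that average position.
        msf = sum(
            sum((frame[i][k] - mean[k]) ** 2 for k in range(3))
            for frame in trajectory
        ) / n_frames
        values.append(math.sqrt(msf))
    return values


# Toy trajectory: atom 0 is fixed, atom 1 oscillates along x. NOT simulation data.
toy = [
    [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0)],
    [(0.0, 0.0, 0.0), (3.0, 0.0, 0.0)],
]
print(rmsf(toy))
```

Residues in flexible loops would show larger values in such a profile than residues in a well-ordered active site.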

β-Factor analysis

β-Factor describes the thermal stability and flexibility of the protein over time. The β-factor is derived from the RMSF, so the level of localized atomic fluctuation collectively contributes to the global vibrational movement of the protein and its thermal stability. The average β-factor values calculated for ArgS (WP_058610923.), MurA (WP_058611897.1), and SecY (WP_058612677.1) were 86.7, 105.8, and 130.7 Å, respectively, with higher instability at residues 203 to 263, 502, and 599 of the protein (Fig. 13).

Fig. 13

β-Factor graphs of simulated ArgS (WP_058610923.), MurA (WP_058611897.1), and SecY (WP_058612677.1) proteins over the 100-ns simulation run
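
The link between the β-factor and the RMSF can be made explicit with the standard isotropic relation B = (8π²/3)·RMSF². The sketch below applies that textbook formula. Whether the values reported here were obtained with exactly this conversion is an assumption, and the input values are illustrative only.

```python
import math
from typing import Iterable, List


def bfactor_from_rmsf(rmsf_value: float) -> float:
    """Isotropic B-factor from an RMSF value: B = (8 * pi^2 / 3) * RMSF^2."""
    return (8.0 * math.pi ** 2 / 3.0) * rmsf_value ** 2


def bfactor_profile(rmsf_values: Iterable[float]) -> List[float]:
    """Convert a per-residue RMSF profile into a per-residue B-factor profile."""
    return [bfactor_from_rmsf(v) for v in rmsf_values]


# Illustrative RMSF values in angstroms, NOT taken from the simulations.
print(bfactor_profile([0.0, 0.5, 1.0]))
```

Because the relation is quadratic, residues with the largest RMSF dominate the β-factor profile, which is why the same flexible regions tend to stand out in both analyses.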

Radius of gyration

The radius of gyration was calculated to evaluate structural compactness as a function of time over the 100-ns simulation of the protein–ligand complexes ArgS (WP_058610923.), MurA (WP_058611897.1), and SecY (WP_058612677.1). The average values of 17.3, 16.7, and 17.4 Å, respectively, for the docked proteins (Fig. 14) indicate the stability of the protein structures.

Fig. 14

The radius of gyration of simulated proteins ArgS (WP_058610923.), MurA (WP_058611897.1), and SecY (WP_058612677.1) over the 100-ns simulation time period
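
The radius of gyration of a single frame is the mass-weighted root mean square distance of the atoms from their center of mass. The sketch below illustrates that definition in pure Python. It is not the tool used in this study, the masses default to equal weights as a simplifying assumption, and the coordinates are toy values.

```python
import math
from typing import Optional, Sequence, Tuple

Coord = Tuple[float, float, float]


def radius_of_gyration(coords: Sequence[Coord],
                       masses: Optional[Sequence[float]] = None) -> float:
    """Mass-weighted radius of gyration of one frame (same length unit as input)."""
    if masses is None:
        masses = [1.0] * len(coords)  # assumption: equal weights when masses are absent
    total_mass = sum(masses)
    # Center of mass of the frame.
    com = [sum(m * p[k] for m, p in zip(masses, coords)) / total_mass
           for k in range(3)]
    # Mass-weighted sum of squared distances from the center of mass.
    weighted_sq = sum(
        m * sum((p[k] - com[k]) ** 2 for k in range(3))
        for m, p in zip(masses, coords)
    )
    return math.sqrt(weighted_sq / total_mass)


# Toy coordinates, NOT simulation data.
print(radius_of_gyration([(1.0, 0.0, 0.0), (-1.0, 0.0, 0.0)]))
print(radius_of_gyration([(0.0, 0.0, 0.0), (4.0, 0.0, 0.0)], masses=[1.0, 3.0]))
```

Evaluating this quantity frame by frame gives the time series whose average and spread are used to judge whether a complex stays compact.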

Here, by employing a subtractive genomics approach, we have reported essential, non-host-homologous, protein-based putative targets in a principally animal-associated bacterial species whose clinical relevance to humans is steadily increasing. These targets include argS (arginine-tRNA ligase), which has arginine-tRNA ligase and ATP-binding activities and plays an important role in the aminoacyl-tRNA biosynthesis pathway, a key part of protein synthesis; murA (UDP-N-acetylglucosamine 1-carboxyvinyl transferase 1), which is involved in amino sugar/nucleotide sugar metabolism and peptidoglycan biosynthesis; and finally, secY (translocase subunit secY), which has protein transmembrane transporter and signal sequence binding activities. The latter target is involved in multiple vital bacterial processes such as quorum sensing, protein export, and the bacterial secretion system (Holden et al. 2004, Gill et al. 2005). In addition to the protocol followed here for mining therapeutic targets and novel inhibitors in S. sciuri, there are other well-established in silico approaches in the literature that have been used to identify novel and potential antibacterial agents against other important protein-based targets in the Gram-positive genus Staphylococcus, with a main focus on Staphylococcus aureus. For example, the target FmtA is a core member of the S. aureus cell wall stimulon and a factor that affects methicillin resistance in S. aureus strains; it interacts with teichoic acids and has been shown to localize to the cell division septum. As part of its catalytic activity, FmtA hydrolyzes the ester bond between d-Ala and the backbone of teichoic acids, which are polyribitol-phosphate or polyglycerol-phosphate polymers found in the S. aureus cell envelope (Rahman et al. 2016). 
Recently, Vikram Dalal and his group have performed numerous biophysical, structural, and in silico studies to show the binding interactions and complex stabilities of newly identified inhibitors of FmtA from S. aureus. However, the reported screened molecules need to be tested, modified, and experimentally validated in order to develop effective antimicrobial compounds against S. aureus (Dalal et al. 2019, Dalal et al. 2021, Dalal et al. 2022, Singh, Dhankhar et al. 2022). Other related in silico studies have reported potent inhibitors against GraR, a member of the two-component regulatory system GraR/GraS that is involved in resistance against cationic antimicrobial peptides (CAMPs) (Meehl et al. 2007, Dhankhar et al. 2020). Potential lead molecules were identified by structure-based pharmacophore modeling against the lipophilic membrane (LLM) protein, which regulates the bacterial lysis rate and the level of methicillin resistance in S. aureus (Kumari and Dalal 2022). Similarly, two other studies by the same group have reported further novel inhibitors against the ribosome biogenesis GTP-binding protein (YsxC), a GTPase that interacts with the 50S/30S subunits of the ribosome, and against the β′ subunit of RNA polymerase, both of which play an important role in bacterial protein synthesis in S. aureus (Kumari et al. 2022, Kumari et al. 2023). FemC is another methicillin-resistance factor that regulates the synthesis of peptidoglycan in the Gram-positive Staphylococcus aureus. A set of natural product-like compounds from the Selleckchem and Enamine databases was screened for inhibitors against the active site of the validated FemC model (Dalal and Kumari 2022). The methodology employed here, together with other in silico cloning and vaccine design studies, thus reports potent protein-based targets and inhibitors that still need to be validated and that may further be utilized to develop novel scaffolds for antimicrobials against S. aureus targets (Khan et al. 2021b, Khan et al. 2022a, Khan et al. 2022c).

Conclusion

In silico research methodologies were adopted to identify potential therapeutic candidates in the Gram-positive, MDR pathogen S. sciuri. Genome subtraction aids the identification of pathogen-specific, potent drug targets involved in crucial metabolic pathways. Virtual screening and molecular docking were then used to mine inhibitors from the TCM library. Molecular docking resulted in 1326 compounds as the top inhibitors against ArgS (WP_058610923.), MurA (WP_058611897.1), and SecY (WP_058612677.1). Furthermore, MD simulation indicated that, in the physicochemical environment, the drug-receptor complex attains stability through structural rearrangements over time. Apart from minor fluctuations in side-chain and loop movements, the inhibitor remained stable. The structural stability observed in the docked complexes after the simulation studies supports the prospective role of the selected ligands as lead compounds. The ADMET profiling of the final three TCM compounds further supports their practical feasibility, whereas the three predicted protein-based targets could help bridge the gap between existing and novel pathogen targets. The literature survey of the predicted target proteins shows that they play a pivotal role in bacterial survival, pathogenesis, and the establishment of infection. Synthesis of cell-wall components, in particular peptidoglycan biosynthesis, is of utmost importance for retaining structural integrity and for antibiotic resistance. Likewise, protein biosynthesis, nucleotide metabolism, quorum sensing, and the different types of bacterial secretion systems are attractive targets in drug development. The outcomes of the current study could inform pharmacological design to develop more potent, efficient, and specific drugs against MDR S. sciuri.