Introduction

In early 2020, the World Health Organization (WHO) declared coronavirus disease 2019 (COVID-19) a pandemic and a global emergency following its emergence in Wuhan, China, in December 2019 (Kamaz et al. 2020; Anjorin 2020). The ongoing pandemic has resulted in more than 168 million confirmed cases and 3.5 million deaths (as of 28 May 2021) across 218 countries (Sakurai et al. 2021). The causative agent is severe acute respiratory syndrome coronavirus 2 (SARS-CoV 2), a spherical, single-stranded, positive-sense RNA virus. The virus has a genome of approximately 30,000 nucleotides comprising about eleven open reading frames (ORFs) that encode numerous structural and non-structural proteins important to the viral life cycle (Anjorin 2020; Alazmi and Motwalli 2020; Yoshimoto 2020). Once SARS-CoV 2 gains entry into the host cell, the viral RNA is released into the cytoplasm, which results in expression of the replicase gene. The replicase gene contains two overlapping open reading frames, ORF1a and ORF1b, which cover two-thirds of the viral genome. These ORFs are translated to yield polyproteins 1a and 1ab. The polyproteins are cleaved by two cysteine proteases, the 3-chymotrypsin-like protease or main protease (Mpro/3CLpro) and the papain-like protease (PLpro), releasing 16 non-structural proteins (nsp1 to nsp16).

The 3-chymotrypsin-like protease (nsp5) is a three-domain (I to III) protease that is essential for sustaining the viral life cycle, since its maturation leads to the production of nsp4 to nsp16 after it self-cleaves from the precursor polyprotein (Dai et al. 2020; Jin et al. 2020). The cleft between domains I and II features a noncanonical Cys-His dyad that recognizes amino acids in substrates from the N terminus to the C terminus (Dai et al. 2020; Jin et al. 2020). In its active form, Mpro is a homodimer of two protomers (A and B) (Jin et al. 2020). There are no homologs of Mpro in humans, which makes it an ideal target for antiviral drug design. Additionally, the role it plays in the replication and proliferation of SARS-CoV 2 cannot be overlooked. The process of discovering and developing drugs is time-consuming and costly, hence the need for computer-aided therapeutic discovery (CATD), which embraces structure-based, system-based and ligand-based therapeutic design (Dai et al. 2020; Romano and Tatonetti 2019; Prieto-Martínez et al. 2019; Umar et al. 2021a). Today, the role of computational methods in therapeutic design, discovery and development cannot be overemphasized, because they support the assembly, processing, evaluation and interpretation of data (Dai et al. 2020; Romano and Tatonetti 2019; Prieto-Martínez et al. 2019; Umar et al. 2021a).

In our previous computational study, we showed that 3-O-(6-galloylglucoside) is a potential inhibitor of the main protease of SARS-CoV 2. However, although this compound showed a good binding score with the main protease, it was found to have poor druglike properties that might hinder it further down the drug discovery pipeline (Umar et al. 2021a). In the current study, we designed 100 novel molecules through an artificial neural network-driven platform called LigDream (https://playmolecule.org/LigDream/) (Skalic et al. 2019), using 3-O-(6-galloylglucoside) as the parent molecule. These new compounds were subjected to stringent druglikeness screening to select those that could serve as oral drugs. Virtual screening was then performed to identify the best hit compounds against Mpro. The hit compounds were examined for their pharmacokinetic properties in silico. Finally, a quantum mechanics/molecular mechanics study was carried out on the overall hit compound, alongside the controls, to determine the stability and reactivity of the lead molecule.

Materials and methods

Ligand selections

The ligands selected for this in silico study are 3-O-(6-galloylglucoside) (seed molecule, PubChem CID: 73157749), remdesivir (control antiviral drug, PubChem CID: 121304016) and the novel compounds (Fig S1 and Table S1). Three-dimensional conformers of the seed molecule and the control drug in structure data format (SDF) were obtained from the PubChem chemical repository (https://pubchem.ncbi.nlm.nih.gov/compound/).

Generation of novel compounds from 3-O-(6-galloylglucoside)

A total of 100 novel compounds were generated using the LigDream module of the PlayMolecule server (https://playmolecule.org/LigDream/), an artificial neural network-driven platform (Skalic et al. 2019). The SMILES string of 3-O-(6-galloylglucoside) was uploaded to the server, which was run to generate 100 new SMILES strings for different compounds. The platform uses two networks, an auto-encoder and a captioning network, that can design many different compounds (here 100) starting from a single seed compound (3-O-(6-galloylglucoside)). These 100 novel compounds were then filtered for druglikeness using five rules, namely Lipinski's (Lipinski et al. 2012), Veber's (Veber et al. 2002), Muegge's (Muegge et al. 2001), Egan's (Egan et al. 2000) and Ghose's (Ghose et al. 1999), together with the bioavailability score, via the SwissADME server (http://www.swissadme.ch/index.php) (Daina et al. 2017). The SMILES strings of the novel molecules were uploaded to the server, which was run to evaluate their druglikeness. Compounds that violated none of the five rules and had a bioavailability score of 0.55 or above were considered for computer-aided molecular docking against the main protease of SARS-CoV 2.
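The rule-based part of this filter can be illustrated with a minimal Python sketch. It is not the SwissADME implementation: it covers only Lipinski's and Veber's rules with their commonly cited thresholds, and the `Descriptors` class, the function names and the example values are hypothetical. The descriptor values themselves (molecular weight, logP and so on) are assumed to come from an external tool.

```python
from dataclasses import dataclass


@dataclass
class Descriptors:
    """Physicochemical descriptors for one compound (supplied by the user)."""
    mol_weight: float   # molecular weight, g/mol
    logp: float         # octanol/water partition coefficient
    h_donors: int       # number of hydrogen-bond donors
    h_acceptors: int    # number of hydrogen-bond acceptors
    rot_bonds: int      # number of rotatable bonds
    tpsa: float         # topological polar surface area, square angstroms


def lipinski_violations(d: Descriptors) -> int:
    """Count violations of Lipinski's rule of five (commonly cited limits)."""
    checks = [
        d.mol_weight <= 500,
        d.logp <= 5,
        d.h_donors <= 5,
        d.h_acceptors <= 10,
    ]
    return sum(not ok for ok in checks)


def veber_violations(d: Descriptors) -> int:
    """Count violations of Veber's rule (rotatable bonds and TPSA)."""
    checks = [d.rot_bonds <= 10, d.tpsa <= 140]
    return sum(not ok for ok in checks)


def passes_filter(d: Descriptors, bioavailability: float,
                  min_score: float = 0.55) -> bool:
    """Keep a compound only if it violates no rule and meets the score cutoff."""
    return (lipinski_violations(d) == 0
            and veber_violations(d) == 0
            and bioavailability >= min_score)


if __name__ == "__main__":
    # Hypothetical descriptor values, for illustration only.
    candidate = Descriptors(350.0, 2.1, 2, 6, 4, 95.0)
    print(passes_filter(candidate, bioavailability=0.55))
```

A complete filter in the spirit of the screening described above would add the Muegge, Egan and Ghose criteria in the same pattern.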

Selection and preparation of protein target

The three-dimensional (3D) structure of the main protease (Mpro/nsp5) of SARS-CoV 2 (PDB ID: 6LU7) (Jin et al. 2020) was retrieved from the Protein Data Bank (PDB) (http://www.pdb.org/pdb). The protein was stripped of all heteroatoms and then minimized using the protein preparation and minimization tools in Cresset Flare© software, version 4.0 (https://www.cresset-group.com/flare/). The minimization was executed under the General Amber Force Field (GAFF), with a gradient cutoff of 0.200 kcal/mol/Å and a maximum of 2000 iterations (Stroganov et al. 2011).

Protocol validation of virtual docking steps

The molecular docking protocol was validated as done earlier (Umar et al. 2021a, 2021b) to confirm its accuracy and consistency. The aim was to reproduce, by re-docking, the binding pose of the ligand that was co-crystallised with the protein. Thus, the co-crystallised ligand (N-leucinamide) was removed from Mpro and prepared for re-docking using Cresset Flare software. The ligand was then re-docked into the binding region of Mpro using AutoDock Vina integrated in Python Prescription (PyRx) (Trott and Olson 2010). The docked complex was aligned in PyMOL with the cognate crystal structure of Mpro bearing the co-crystallised ligand to obtain the root mean square deviation (RMSD) value.
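The RMSD used in this validation can be illustrated with a short Python sketch. It assumes that the two poses list the same atoms in the same order and already share a coordinate frame, so no superposition step is performed. The function name and the coordinates are hypothetical, and this is not the PyMOL routine.

```python
import math


def rmsd(coords_a, coords_b):
    """Root mean square deviation between two matched lists of (x, y, z) points."""
    if not coords_a or len(coords_a) != len(coords_b):
        raise ValueError("coordinate lists must be non-empty and of equal length")
    total = 0.0
    for (xa, ya, za), (xb, yb, zb) in zip(coords_a, coords_b):
        # Squared distance between the matched atoms of the two poses.
        total += (xa - xb) ** 2 + (ya - yb) ** 2 + (za - zb) ** 2
    return math.sqrt(total / len(coords_a))


if __name__ == "__main__":
    # Hypothetical coordinates (in angstroms), for illustration only.
    native = [(0.0, 0.0, 0.0), (1.5, 0.0, 0.0), (1.5, 1.5, 0.0)]
    redocked = [(0.1, 0.0, 0.0), (1.6, 0.0, 0.0), (1.6, 1.5, 0.0)]
    print(round(rmsd(native, redocked), 3))
```

A low RMSD between the re-docked and native poses indicates that the docking settings reproduce the crystallographic binding mode.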

Preparation of druglike novel compounds for virtual docking

The novel compounds that showed druglikeness and a good bioavailability score were sketched using MarvinSketch© (ver. 15.11.30) and converted to their most stable, lowest-energy conformations with the Merck molecular force field (MMFF94) (Halgren 1996) using Open Babel integrated within Python Prescription (version 0.8).

Virtual docking

The molecular docking followed a flexible docking procedure (Trott and Olson 2010) previously used by Umar et al. (2021b). PyRx 0.8, a suite that integrates AutoDock Vina, was used for the docking study. The target site, corresponding to the substrate-binding region of the receptor, was defined with a grid box of dimensions 18.08 × 26.45 × 26.30 Å, centred on the substrate-binding site formed by the following amino acids: Thr25, Thr26, His41, Cys44, Met49, Tyr54, Phe140, Leu141, Gly143, Cys145, Asn142, His163, His164, Met165, Ser144, Glu166, Pro168, His172, Val186, Asp187, Arg188, Gln189, Phe185, Thr190, and Gln192 (Dai et al. 2020; Jin et al. 2020; Umar et al. 2021a). The compounds with docking scores similar to those of the seed molecule and the control drug were subjected to molecular interaction analysis with the aid of PyMOL© Molecular Graphics (version 2.4, 2016, Schrödinger LLC) and LigPlot+ (Laskowski and Swindells 2011).
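One common way to derive such a grid box is to take the bounding box of the binding-site atoms. The sketch below assumes the atom coordinates of the listed residues have already been extracted from the structure file. The helper names, the padding and the coordinates are hypothetical, and the box reported above was set in PyRx rather than computed this way. The output uses the `center_*` and `size_*` keys of an AutoDock Vina configuration file.

```python
def bounding_box(coords, padding=0.0):
    """Return (center, size) of the axis-aligned box enclosing the coordinates."""
    if not coords:
        raise ValueError("at least one coordinate is required")
    xs, ys, zs = zip(*coords)
    # Box centre is the midpoint of the extremes along each axis.
    center = tuple((min(v) + max(v)) / 2 for v in (xs, ys, zs))
    # Box size is the extent along each axis plus padding on both sides.
    size = tuple((max(v) - min(v)) + 2 * padding for v in (xs, ys, zs))
    return center, size


def vina_box_lines(center, size):
    """Format the box as the center/size lines of a Vina configuration file."""
    axes = ("x", "y", "z")
    lines = [f"center_{a} = {c:.2f}" for a, c in zip(axes, center)]
    lines += [f"size_{a} = {s:.2f}" for a, s in zip(axes, size)]
    return "\n".join(lines)


if __name__ == "__main__":
    # Hypothetical binding-site atom coordinates (in angstroms).
    site_atoms = [(-20.1, 5.3, 60.2), (-8.4, 22.9, 75.0), (-12.0, 14.1, 68.7)]
    center, size = bounding_box(site_atoms, padding=2.0)
    print(vina_box_lines(center, size))
```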

In silico ADMET prediction

ADMET (absorption, distribution, metabolism, excretion and toxicity) profiling is important for analysing the pharmacokinetics of a proposed drug molecule. The admetSAR server was used to predict the ADMET properties of the best hit compounds from the molecular docking analysis (Cheng et al. 2012; Yang et al. 2018). SMILES strings of the ligands from PubChem (https://pubchem.ncbi.nlm.nih.gov/compound/) were entered into the search bar of the server and submitted for prediction.

Quantum mechanics/molecular mechanics analysis

Quantum reactivity descriptors of the lead molecule, seed molecule and control drug were calculated using the Molecular Orbital Package (MOPAC2016). The computations were executed with the semi-empirical Parametric Method 7 (PM7) (Kishor and Bhoop 2013). The output file generated from the geometry optimization of each molecule was used to calculate quantum reactivity descriptors such as the energies of the highest occupied molecular orbital (HOMO) and lowest unoccupied molecular orbital (LUMO), the energy gap, chemical hardness and softness, the molecular surface electrostatic potential (MEP) and the electronic energy.

Results and discussion

Generation of novel compounds and druglikeness screening

We used 3-O-(6-galloylglucoside) to produce 100 new compounds through the LigDream platform (https://playmolecule.org/LigDream/). The value of such platforms for common drug discovery problems lies in their ability to reduce resource consumption (Skalic et al. 2019). The new compounds show new scaffolds and functional groups that cover new regions of chemical space while retaining lead-like characteristics (Supplementary Figure S1). The stringent druglikeness screening indicated that eight (8) new molecules passed without violating any of Lipinski's, Ghose's, Veber's, Egan's and Muegge's rules, and each showed an Abbott bioavailability score of 0.55 (Table 1). Druglikeness is a qualitative assessment of the chance that a compound can become an oral drug with respect to bioavailability (Daina et al. 2017). In the early stages of drug discovery, such assessments are routinely used to filter chemical libraries (in our case, the library of 100 novel molecules derived from 3-O-(6-galloylglucoside)) and remove molecules whose properties are incompatible with an acceptable pharmacokinetic profile. The result indicates that the 8 novel molecules could serve as orally active drugs.

Table 1 Screening of 3-O-(6-galloylglucoside) and 100 new molecules derived using the LigDream online tool

New compounds bind to the main protease of SARS-CoV 2

We employed virtual docking to characterize the potential binding and interactions of the 8 new molecules, 3-O-(6-galloylglucoside) (3O6G) and remdesivir with the main protease of SARS-CoV 2. In our findings, 3O6G had a binding affinity of −8.4 kcal/mol, while remdesivir displayed a binding affinity of −8.2 kcal/mol (Fig. 1 and Table 2). The novel molecules (C1, C2, C17, C23, C30, C33, C35 and C54) also showed good binding (Figs. 1, 2), with C33, C35 and C54 showing the strongest binding affinity among them (−8.3 kcal/mol). The molecular interaction studies using PyMOL and LigPlot+ showed that our compounds interact with amino acid residues located within the substrate-binding domain of Mpro (Figs. 1, 2). Of note is the ability of our compounds to interact with His41 and Cys145, either by hydrogen bonding or by hydrophobic interaction. This interaction could be key to stopping Mpro from processing the replicase polyprotein, and thereby to blocking maturation of the virus, because these two residues are central to the interaction with the enzyme's normal substrate.

Fig. 1

Virtual docking of C33, 3-O-(6-galloylglucoside) and remdesivir against Mpro of SARS-CoV 2. a Validation of the virtual docking steps. The superimposition of the re-docked (cyan) and native (pink) ligands yields an RMSD value of 0.101 Å. The 2D and 3D molecular interactions of 3-O-(6-galloylglucoside) (b, e), C33 (c, f), and remdesivir (d, g)

Table 2 Molecular interactions of lead ligands with the amino acid residues in the binding region of the main protease of SARS-CoV 2

Fig. 2

2D molecular interaction plot of new compounds docked against Mpro. a C1, b C2, c C17, d C23, e C30, f C35 and g C54. Ligands are in blue sticks, amino acids interacting through hydrogen bond are in brown sticks, hydrogen bonds are represented in green dashes while hydrophobic interactions are presented as red curved spikes

The binding of 3-O-(6-galloylglucoside) to the SARS-CoV 2 main protease observed in this study agrees with our previous study, which examined the binding of gallic acid derivatives to five non-structural proteins (nsps) of SARS-CoV 2 (Umar et al. 2021a). Das et al. (2020) demonstrated that most molecules docked against the main protease of SARS-CoV 2 interacted with His41 and Cys145. Similarly, the in silico study of Umar et al. (2021) showed that mangiferin bound to Mpro through interactions with key amino acid residues that are significant for stopping the activity of Mpro, especially the catalytic dyad residues His41 and Cys145.

The molecular interaction fingerprints of the lead molecules in this in silico work are consistent with those reported in previous studies (Kamaz et al. 2020; Jin et al. 2020; Putu et al. 2020; Zhao et al. 2020).

In silico ADMET prediction

For a drug to be effective, a potent small molecule should reach its target in the body in sufficient concentration and remain there in its bioactive form long enough to exert its therapeutic effect. It is therefore important to assess molecules early in drug development for their absorption, distribution, metabolism, excretion, and toxicity (ADMET) parameters, as this considerably reduces attrition. In this in silico work, we assessed the ADMET-linked parameters of the three best hit molecules (C33, C35 and C54), 3-O-(6-galloylglucoside) and remdesivir computationally using the admetSAR server. Our findings are presented in Table 3. The new compounds, especially C33, showed better ADMET properties than our seed molecule (3O6G) and remdesivir. This could be linked to the outcome of the earlier stringent druglikeness screening. Notably, C33 was predicted not to inhibit some key liver enzymes (cytochrome P450 isoforms), the human ether-à-go-go-related gene (hERG) channel (i.e. it might not cause a prolonged QT interval) or the P-glycoprotein transporter, and it was predicted to be non-carcinogenic and non-mutagenic. Furthermore, C33 was predicted to be orally available and to be absorbed in the intestine.

Table 3 In silico ADMET Profiling

Quantum reactivity analysis

The quantum reactivity descriptors of our lead molecule (C33), 3-O-(6-galloylglucoside) and remdesivir were obtained with MOPAC2016. We generated the energy of the highest occupied molecular orbital (EHOMO) and the energy of the lowest unoccupied molecular orbital (ELUMO), together with the charge distribution of the frontier molecular orbitals of our compounds (Fig. 3A–C). In these plots, blue represents the negative phase and red the positive phase of the orbitals. We observe that the HOMO and LUMO are mostly located on the aromatic rings, the sugar moiety (remdesivir), and the amino, fluoro, carbonyl, phospho and methylene groups. The HOMO and LUMO energies can indicate the stability and reactivity of our compounds: a high HOMO energy favours the donation of electrons to an acceptor species, while a low LUMO energy favours the acceptance of electrons (Benbouguerra et al. 2021; Mubarik et al. 2021). C33 was seen to be able to donate electrons (HOMO = −8.898 eV) and also to accept electrons (LUMO = −1.636 eV). The energy gap (ΔE) showed that the lead compound (|ΔE| = 7.262 eV) is more reactive than the seed molecule and the control drug (Table 4) while remaining moderately stable. This indicates that charge transfer occurs within the C33 molecule. Other parameters, termed global reactivity descriptors, such as the chemical potential, electrophilicity index and chemical hardness, which are required to analyze the reactivity of the inhibitor molecules, were also determined. In addition, the ionization potential and electron affinity were determined (Table 4).

Fig. 3

Chemical reactivity descriptors based on HOMO–LUMO analysis of a 3-O-(6-galloylglucoside), b C33 and c Remdesivir. The molecular surface electrostatic potential (MEP) maps of d 3-O-(6-galloylglucoside), e C33 and f Remdesivir

Table 4 Calculated quantum reactivity descriptors of the lead, 3-O-(6-galloylglucoside) and Remdesivir using PM7 Hamiltonian method

The vertical ionization potentials (I) and electron affinities (A) were obtained from the HOMO and LUMO energies and were used to define the global electrophilicity index ω (Table 4), which is a measure of the energy stabilization of the system (Siddiqui et al. 2012). Chemical hardness (η) quantifies the resistance to change in the electron distribution in a collection of nuclei and electrons (Siddiqui et al. 2012; Kalaiarasi and Manivarman 2017). The calculated values of the chemical potential (μ), chemical hardness (η) and global electrophilicity (ω) of C33 are −5.267 eV, 3.631 eV and 3.820 eV, respectively. The small value of η for C33 indicates that it is relatively soft on the scale of hardness. By definition, the electrophilicity index measures the susceptibility of a chemical species to accept electrons (Siddiqui et al. 2012; Kalaiarasi and Manivarman 2017).
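The relationship between the frontier orbital energies and these descriptors can be sketched in Python. The sketch assumes the usual Koopmans-type approximations (I = −EHOMO, A = −ELUMO) and the standard definitions μ = −(I + A)/2, η = (I − A)/2 and ω = μ²/(2η). Softness is taken as 1/η, although some authors use 1/(2η). The function name is hypothetical, and the energy gap is computed as ELUMO − EHOMO, which is positive. With the C33 energies quoted in the text, these definitions reproduce the reported μ, η and ω.

```python
def reactivity_descriptors(e_homo, e_lumo):
    """Global reactivity descriptors (eV) from HOMO and LUMO energies (eV)."""
    if e_lumo <= e_homo:
        raise ValueError("E_LUMO must be higher than E_HOMO")
    ionization = -e_homo                      # I, Koopmans-type approximation
    affinity = -e_lumo                        # A, Koopmans-type approximation
    gap = e_lumo - e_homo                     # HOMO-LUMO energy gap
    hardness = (ionization - affinity) / 2    # eta
    potential = -(ionization + affinity) / 2  # mu
    return {
        "ionization_potential": ionization,
        "electron_affinity": affinity,
        "energy_gap": gap,
        "hardness": hardness,
        "softness": 1 / hardness,             # assumed convention: 1/eta
        "chemical_potential": potential,
        "electrophilicity": potential ** 2 / (2 * hardness),  # omega
    }


if __name__ == "__main__":
    # Frontier orbital energies of C33 as quoted in the text (eV).
    for name, value in reactivity_descriptors(-8.898, -1.636).items():
        print(f"{name}: {value:.3f}")
```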

Molecular surface electrostatic potential

The active regions of our three compounds were identified by computing their molecular surface electrostatic potentials (MEP) through quantum chemical calculations (Benbouguerra et al. 2021; Mubarik et al. 2021). This evaluation locates the sites of chemical reactivity of our molecules, which are represented by the 3D maps at the optimized geometry in Fig. 3D–F. Differences in electrostatic potential are indicated by color gradients, which are valuable for investigating the link between molecular structure and physicochemical properties of molecules, including biomolecules and drugs. On the color scale, positive potentials, corresponding to the nucleophilic reaction sites, are depicted in blue, while negative potentials, corresponding to the electrophilic reaction sites, are shown in yellow and red. Areas of zero potential are presented in green (Fig. 3D–F). The variation in the electrostatic potential of our molecule is likely to be largely responsible for its binding to the Mpro binding domain, since the binding region as a whole is expected to present regions of opposing electrostatic potential. The MEP of C33 clearly indicates that the major negative potential sites cover the fluorine atoms of the phenyl and benzyl rings and the oxygen atoms of the morpholine ring, oxoacetamide chain and carboxylic group, which appear in yellow to red, as these are the likely sites for electrophilic attack. The hydrogen atoms of the amino and carboxylic groups bear the highest positive potential, while the rest of the compound appears to have an almost neutral electrostatic potential (Fig. 4).

Fig. 4

Structure of the lead molecule: 2-({[(2S)-2-(2-amino-3,4,5-trifluorophenyl)morpholin-4-yl](oxo)acetyl}amino)-4,5-difluorobenzoic acid

The IUPAC name of the lead molecule in this computational study was generated using ChemSketch and MarvinSketch, and both programs returned the same name for the molecule: 2-({[(2S)-2-(2-amino-3,4,5-trifluorophenyl)morpholin-4-yl](oxo)acetyl}amino)-4,5-difluorobenzoic acid (see supplementary information).

Conclusion

In this study, we found that eight (8) of the 100 novel molecules generated with an artificial neural network-based platform showed the potential to serve as oral drugs. Of these, C33, C35 and C54 displayed a good binding affinity of −8.3 kcal/mol against Mpro. The ADMET profiling indicated that C33 was better than C35 and C54. Finally, the quantum chemical reactivity descriptor analysis showed that C33 was moderately stable and more reactive than the controls. However, there is an urgent need to carry out molecular dynamics simulation (MDS) of at least 100 ns, to synthesize the lead molecule, and to follow up with extensive experimental studies to ascertain its efficacy against this ongoing pandemic.