Introduction

Neglected tropical diseases (NTDs) are a group of infectious diseases that are endemic in developing countries, mainly in the tropics and sub-tropics. Unlike Human Immunodeficiency Virus (HIV)/Acquired Immunodeficiency Syndrome (AIDS), tuberculosis (TB) and malaria, which receive greater treatment and research funding, NTDs are truly neglected (Wilsher 2011). According to the World Health Organization (WHO), ailments such as human African trypanosomiasis, schistosomiasis, onchocerciasis (river blindness), lymphatic filariasis (elephantiasis), rabies, and Buruli ulcer are among the many diseases classified as NTDs (Hotez et al. 2020). This study focuses on two common filarial diseases: lymphatic filariasis and onchocerciasis. Lymphatic filariasis is caused by Wuchereria bancrofti, Brugia malayi and Brugia timori, while Onchocerca volvulus is the causative organism of onchocerciasis (Bakowski et al. 2019). Elephantiasis has been reported to cause over 2.8 million disabilities worldwide, whereas river blindness is the second leading cause of blindness globally (Jacobs et al. 2019; Cooper and Nutman 2013). Both diseases have also been reported to further aggravate the health of those already suffering from life-threatening infections such as HIV, TB or malaria (McGillan 2017). The parasites causing elephantiasis are transmitted by a wide range of mosquitoes, while Onchocerca volvulus is transmitted to its hosts by blackflies (McGillan 2017).

One approach widely adopted over the years to reduce the impact of these filarial infections on the population is Mass Drug Administration (MDA) (Carter et al. 2020). Notable drugs administered through MDA programs include ivermectin, albendazole, and diethylcarbamazine, given either as a dual-drug (annual to bi-annual) or as a triple-drug (once every 3 years) treatment (Jacobs et al. 2019; Carter et al. 2020). Unfortunately, the drugs administered through MDA programs are not efficacious enough to eliminate the adult worms. It is therefore important to identify new approaches that eliminate adult worms, so as to shorten the time frames for elimination of both diseases (Jacobs et al. 2019; Lakshmi et al. 2010). Fortunately, the nematodes causing filarial diseases are reported to live in endosymbiosis with a Gram-negative bacterium called Wolbachia (Ugbe et al. 2021). Wolbachia is widely distributed and infects a wide range of arthropods, including insects, as well as filarial nematodes (Kurz et al. 2008). Wolbachia pipientis is the species common to the nematodes causing filarial diseases (McGillan 2017). The nature of the endosymbiotic relationship between Wolbachia and the infected worm is unclear; it has, however, been reported that Wolbachia is important during embryonic development in infected nematodes (Townson et al. 2000). It has also been suggested that Wolbachia can synthesize detoxification enzymes such as catalase, while other reports suggest that the bacteria may play a significant role in the nutrition of the host (Henkle-Duhrsen et al. 1998). In the search for new anti-filarial drugs, some researchers have chosen to target Wolbachia, since past research has shown that its elimination from the host filarial nematodes produces antifilarial effects, including a reduction of the adult worm’s lifespan (McGillan 2017; Bouchery et al. 2013).

A clinically relevant antibacterial drug, doxycycline (the reference drug in this study), has been used over the years for the treatment of lymphatic filariasis and onchocerciasis. However, this treatment is not suitable for administration through MDA owing to its long treatment periods of about 4–6 weeks and its contraindication in pregnancy and in children (McGillan 2017). Therefore, efforts to develop novel Wolbachia inhibitors with shorter treatment periods and fewer complications are important.

Computational tools play a major role in the lead optimization phase of drug discovery (Sliwoski et al. 2014). They save cost and time, and tend to be more efficient than traditional methods (Lawal et al. 2021). A Quantitative Structure–Activity Relationship (QSAR) establishes a relationship between the molecular structures of compounds and their experimental activities (Adeniji et al. 2019). Molecular docking simulation is a computer-aided virtual screening method which probes the binding of ligands in the active site of a protein target using a validated docking tool (Ibrahim et al. 2020). Pharmacokinetic analysis is important in the pre-clinical study of new drug compounds in order to ascertain how such compounds behave in the living organism when administered. Some of the most important properties to be determined during pre-clinical testing are Absorption, Distribution, Metabolism, Excretion, and Toxicity (ADMET) (Lawal et al. 2021; Ibrahim et al. 2021). Physico-chemical properties such as molecular weight, Topological Polar Surface Area (TPSA), lipophilicity, water solubility, and the numbers of hydrogen bond donors and acceptors are needed to predict a drug’s likelihood of being orally bioavailable (Lipinski et al. 2001). The selection of molecules for oral bioavailability has been guided by several rules, such as Lipinski’s ‘rule of 5’ (RO5) and the Veber, Ghose, Egan, and Muegge rules (Sun et al. 2020).

Ubiquitination plays a critical role in the regulation of many cellular processes in living cells, especially in eukaryotes (Bailey-Elkin et al. 2014). Manipulation of host ubiquitin signalling is increasingly observed among bacterial and viral pathogens, including Wolbachia. This is possible because many pathogens encode deubiquitinases, which help them to subvert host signalling (Schubert et al. 2020). The viability of the bacteria within the host environment may therefore be heavily affected by inactivation of these deubiquitinases. Some deubiquitinases have been reported to belong to the Ovarian Tumor (OTU) family (Bailey-Elkin et al. 2014). In this study, therefore, the crystal structure of an OTU deubiquitinase from Wolbachia pipientis wMel (PDB: 6W9O) was used as the therapeutic protein receptor for docking with the newly designed compounds of the pyrazolopyrimidine class. The structure of the OTU deubiquitinase used in this study was determined by X-ray diffraction, with the protein expressed in Escherichia coli (Schubert et al. 2020).

Many pyrazolopyrimidine compounds have been reported to show a variety of biological activities, for example as anti-tuberculosis, anti-malarial, and antiviral agents, anti-depressants, and kinase inhibitors (Asati et al. 2021; McGillan et al. 2021). However, most of the drugs of this class marketed to date are said to produce hypnotic and/or anxiolytic effects. In order to exploit the anti-filarial effect of the pyrazolopyrimidine class, McGillan et al. (2017) synthesized a series of 52 pyrazolopyrimidine analogues and reported their biological activities against Wolbachia (wAlbB)-infected insect cells (Aedes albopictus, C6/36). The purpose of this study is to: (i) develop a QSAR model capable of predicting the activities of some pyrazolopyrimidine derivatives as potent anti-wolbachia agents; (ii) carry out a theoretical design of new pyrazolopyrimidine derivatives as anti-wolbachia agents based on the established QSAR model; and (iii) subject the designed compounds to pharmacokinetics and molecular docking studies in order to evaluate their drug-likeness properties and binding interaction patterns, respectively.

Materials and methods

Data collection

McGillan, in his Ph.D. thesis (2017), reported the synthesis of several small-molecule anti-wolbachia agents of the pyrazolopyrimidine class as part of the Anti-Wolbachia drug discovery and development programme, which aims to identify alternative drugs for the treatment of filarial diseases. Their biological activities were tested against Wolbachia (wAlbB)-infected insect cells (Aedes albopictus, C6/36) and reported in nanomolar units. A dataset of 52 pyrazolopyrimidine analogues with relatively better half-maximal effective concentration (EC50) values was extracted and used for this theoretical study. The bioactivities (EC50) were converted from nanomolar (nM) to the logarithmic scale (pEC50) using Eq. (1) (Wang et al. 2020). Figure 1 shows the structural template of pyrazolopyrimidine, while the molecular structures of the various pyrazolopyrimidine derivatives and their observed EC50 and pEC50 values are shown in Table 1.

$$ pEC_{50} = - \log_{10} \left( {EC_{50} \times 10^{ - 9} } \right) $$
(1)
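As a worked illustration of Eq. (1), the following minimal sketch converts EC50 values in nanomolar to pEC50. The function name and the example EC50 values are hypothetical and are not taken from Table 1.

```python
import math


def pec50_from_nanomolar(ec50_nm: float) -> float:
    """Convert an EC50 in nanomolar to pEC50 = -log10(EC50 in mol/L)."""
    if ec50_nm <= 0:
        raise ValueError("EC50 must be positive")
    return -math.log10(ec50_nm * 1e-9)


# Hypothetical EC50 values (nM), chosen only to illustrate the scale.
example_ec50_nm = [1.0, 10.0, 250.0]
example_pec50 = [round(pec50_from_nanomolar(v), 3) for v in example_ec50_nm]
print(example_pec50)
```

A tenfold drop in EC50 therefore raises pEC50 by exactly one unit, which is why the logarithmic scale is convenient for regression.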
Fig. 1 Pyrazolopyrimidine structural template

Table 1 Molecular structures and anti-wolbachia activities of pyrazolopyrimidine derivatives

Molecular geometry optimization

The two-dimensional (2-D) structures of the compounds were drawn using ChemDraw Ultra (version 12.0.2), saved in MDL mol file format, and then imported individually into the Spartan 14 software (version 1.1.4), where they were converted to three-dimensional (3-D) form. The 3-D structures were first subjected to a preliminary energy minimization. Thereafter, the Merck Molecular Force Field (MMFF), a molecular mechanics method, was used to minimize the structures further in order to remove strain energy from the molecules’ conformations. Further optimization was then conducted using Density Functional Theory (DFT) with Becke’s three-parameter Lee-Yang-Parr hybrid functional (B3LYP) and the 6-31G basis set. The fully optimized structures were then saved in SD file format for use in descriptor calculation (Wang et al. 2020; Li et al. 2004).

Molecular descriptor calculation

The optimized structures in SD file format were imported into the Pharmaceutical Data Exploration Laboratory (PaDEL)-descriptor software (version 2.20) to calculate the molecular descriptors for all fifty-two (52) pyrazolopyrimidine derivatives (Lawal et al. 2021).

Data-set pretreatment and splitting into training and test sets

The generated descriptor pool was pretreated using the Drug Theoretical and Cheminformatics Laboratory (DTC Lab)-based software GUI 1.2 so as to remove non-informative descriptors (Adeniji et al. 2020). The pretreated data were then divided into a training set for model building and a test set for external evaluation in the ratio 70:30, using DTC Lab software that implements the Kennard-Stone method for data set division (Kennard and Stone 1969). The division was based on the closeness of the representative points of the test set to those of the training set in the multidimensional descriptor space (Ugbe et al. 2021).
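The idea behind the Kennard-Stone division can be sketched as follows. This is a minimal illustration of the standard max-min distance variant of the algorithm, assuming Euclidean distances on already pretreated descriptors; it is not the DTC Lab implementation, whose details may differ, and the function name and toy data are hypothetical.

```python
import math


def kennard_stone_split(rows, n_train):
    """Return (train_idx, test_idx) chosen by the Kennard-Stone max-min rule."""
    n = len(rows)
    if not 2 <= n_train <= n:
        raise ValueError("n_train must be between 2 and the number of rows")
    dist = [[math.dist(a, b) for b in rows] for a in rows]
    # Seed the training set with the two most distant compounds.
    i0, j0 = max(
        ((i, j) for i in range(n) for j in range(i + 1, n)),
        key=lambda ij: dist[ij[0]][ij[1]],
    )
    selected = [i0, j0]
    remaining = [k for k in range(n) if k not in selected]
    while len(selected) < n_train:
        # Add the candidate whose nearest selected neighbour is farthest away.
        nxt = max(remaining, key=lambda k: min(dist[k][s] for s in selected))
        selected.append(nxt)
        remaining.remove(nxt)
    return selected, remaining


# Toy one-descriptor data set (hypothetical values).
toy_rows = [[0.0], [1.0], [2.0], [10.0], [5.0]]
train_idx, test_idx = kennard_stone_split(toy_rows, n_train=3)
print(train_idx, test_idx)
```

Because the training compounds are picked to span the descriptor space, the compounds left for the test set lie close to at least one training compound, which is the property described above.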

MLR-GFA model building

The Genetic Function Approximation (GFA) technique in the Materials Studio software (version 8.0) was used to generate the models based on the Multi-Linear Regression (MLR) approach. GFA was used to select the optimum descriptor combinations constituting the QSAR models, while MLR establishes the relationship between the biological activity, pEC50 (dependent variable), and the molecular descriptors (independent variables) (Arthur et al. 2020). The multi-linear regression equation takes the following form (Eq. 2) (Adawara et al. 2020):

$$Y={k}_{1}{x}_{1}+{k}_{2}{x}_{2}+{k}_{3}{x}_{3}+ \dots + C$$
(2)

where Y represents the dependent variable; the ‘k’s and ‘x’s represent the regression coefficients and the independent variables, respectively; and ‘C’ is the intercept or regression constant.
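The form of Eq. (2) can be illustrated with an ordinary least-squares fit. The sketch below uses NumPy rather than Materials Studio, covers only the MLR step (not the GFA descriptor selection), and is fitted to synthetic data; the function name is hypothetical.

```python
import numpy as np


def fit_mlr(X, y):
    """Least-squares fit of y = X @ k + C; returns (k, C)."""
    X = np.asarray(X, dtype=float)
    y = np.asarray(y, dtype=float)
    # The appended column of ones carries the regression constant C.
    A = np.column_stack([X, np.ones(len(X))])
    coef, *_ = np.linalg.lstsq(A, y, rcond=None)
    return coef[:-1], coef[-1]


# Synthetic data generated from y = 2*x1 - 1*x2 + 0.5 (not real descriptors).
X_demo = [[0.0, 1.0], [1.0, 0.0], [2.0, 1.0], [3.0, 3.0], [4.0, 2.0]]
y_demo = [2 * a - b + 0.5 for a, b in X_demo]
k_demo, c_demo = fit_mlr(X_demo, y_demo)
print(k_demo, c_demo)
```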

Assessment of model quality (internal validation)

Internal validation of the built models was carried out in Materials Studio by the GFA approach using the Friedman Lack-Of-Fit (LOF), the correlation coefficient (R2) and the cross-validation coefficient (Q2cv). The Friedman LOF provides the fitness score by which the best model is selected. LOF is defined as follows (Eq. 3):

$$\mathrm{LOF}=\frac{\mathrm{SEE}}{{\left(1-\frac{c+d\times p}{M}\right)}^{2}}$$
(3)

where SEE is the Standard Error of Estimation, with low values indicating a high-quality model. SEE is defined thus (Eq. 4):

$$SEE=\sqrt{\frac{\sum {({Y}_{exp}-{Y}_{pred})}^{2}}{N-P-1}}$$
(4)

In Eq. 3, c represents the number of terms in the model, d is a user-defined smoothing parameter, p is the total number of descriptors in the model, and M is the number of compounds in the training set (Adeniji et al. 2018). In Eq. 4, N and P denote the number of training set compounds and the number of descriptors, respectively.

Another important parameter, the squared correlation coefficient (R2), measures the goodness of fit of the regression equation. An R2 value closer to 1 indicates a high-quality model. R2 is a commonly used internal validation parameter and is expressed thus (Eq. 5):

$$ R^{2} = 1 - \left[ {\frac{{\sum \left( {Y_{\exp } - Y_{pred} } \right)^{2} }}{{\sum \left( {Y_{\exp } - \overline{Y}_{training} } \right)^{2} }}} \right] $$
(5)

where Ῡtraining, Yexp, and Ypred are, respectively, the mean experimental pEC50, the experimental activity and the predicted activity of the training set compounds.

R2 never decreases when additional descriptors are included in a model, so it is usually adjusted for the number of descriptors in order to give a more reliable measure of model stability. The adjusted R2 is defined thus (Eq. 6):

$${R}_{adj}^{2}=1-\frac{(1-{R}^{2})(n-1)}{n-p-1}$$
(6)

where: p is the number of descriptors in the model, and n equals the number of compounds in the training set.

Another important parameter is the leave-one-out (LOO) cross-validation regression coefficient (Q2cv), which estimates the ability of a built QSAR model to predict the activity of new compounds. A high value of Q2cv indicates high internal predictive power and good robustness of the QSAR model. Q2cv is defined as follows (Eq. 7):

$$ Q_{cv}^{2} = 1 - \left[ {\frac{{\sum \left( {Y_{pred} - Y_{exp} } \right)^{2} }}{{\sum (Y_{exp} - \overline{Y}_{training} )^{2} }}} \right] $$
(7)

where Yexp is the experimental activity of a training set compound, Ypred is its activity predicted by the model rebuilt with that compound left out, and Ῡtraining is the average pEC50 of the training set.
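The three internal validation parameters can be computed together, as in the minimal sketch below. It assumes an ordinary least-squares MLR model with an intercept, uses the standard adjusted form R2adj = 1 − (1 − R2)(n − 1)/(n − p − 1), and obtains Q2cv by refitting the model n times with one compound left out each time. The helper names and the demonstration data are hypothetical.

```python
import numpy as np


def _fit_predict(X_fit, y_fit, X_new):
    """Fit an MLR model with intercept on (X_fit, y_fit) and predict X_new."""
    A = np.column_stack([X_fit, np.ones(len(X_fit))])
    coef, *_ = np.linalg.lstsq(A, y_fit, rcond=None)
    return np.column_stack([X_new, np.ones(len(X_new))]) @ coef


def internal_metrics(X, y):
    """Return (R2, adjusted R2, leave-one-out Q2) for a training set."""
    X = np.asarray(X, dtype=float)
    y = np.asarray(y, dtype=float)
    n, p = X.shape
    ss_tot = np.sum((y - y.mean()) ** 2)
    r2 = 1 - np.sum((y - _fit_predict(X, y, X)) ** 2) / ss_tot
    r2_adj = 1 - (1 - r2) * (n - 1) / (n - p - 1)
    press = 0.0
    for i in range(n):
        keep = np.arange(n) != i  # leave compound i out of the fit
        pred_i = _fit_predict(X[keep], y[keep], X[i:i + 1])[0]
        press += (y[i] - pred_i) ** 2
    q2 = 1 - press / ss_tot
    return r2, r2_adj, q2


# Hypothetical one-descriptor training set.
X_demo = [[1.0], [2.0], [3.0], [4.0], [5.0], [6.0]]
y_demo = [1.1, 1.9, 3.2, 3.9, 5.1, 5.8]
r2_demo, r2_adj_demo, q2_demo = internal_metrics(X_demo, y_demo)
print(r2_demo, r2_adj_demo, q2_demo)
```

For a least-squares model, Q2cv cannot exceed R2, so a large gap between the two is a sign of over-fitting.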

Assessment of model quality (external validation)

The model’s predictive power was assessed externally to determine whether the model could predict the activity values of the test set compounds. The predictive strength of the model is measured by the predicted R2 (R2test), defined thus (Eq. 8) (Isyaku et al. 2020):

$$ R_{test}^{2} = 1 - \frac{{\sum \left( {Ypred_{test} - Yexp_{test} } \right)^{2} }}{{\sum \left( {Yexp_{test} - \overline{Y}_{training} } \right)^{2} }} $$
(8)

where Ypredtest is the predicted activity of a test set compound, Yexptest is its experimental activity, and Ῡtraining is the mean experimental activity of the training set.
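As a worked example of the external validation parameter, the sketch below evaluates R2test with the test set experimental activities referenced to the training set mean in the denominator. The function name and the numbers are hypothetical.

```python
def r2_test(y_exp_test, y_pred_test, y_train_mean):
    """Predicted R2 for an external test set, referenced to the training mean."""
    ss_res = sum((p - e) ** 2 for p, e in zip(y_pred_test, y_exp_test))
    ss_tot = sum((e - y_train_mean) ** 2 for e in y_exp_test)
    return 1 - ss_res / ss_tot


# Hypothetical test set: residual sum of squares 0.5, total sum of squares 2.0.
demo_r2_test = r2_test([6.0, 7.0, 8.0], [6.5, 7.0, 7.5], y_train_mean=7.0)
print(demo_r2_test)
```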

Furthermore, the model was assessed against the Golbraikh and Tropsha criteria for an acceptable model using the MLRplusValidation tool (version 1.3), as follows (Roy et al. 2013; Edache et al. 2020):

\(\left| {r_{o}^{2} - r_{o}^{\prime 2} } \right|\) (threshold value < 0.3).

\(\frac{{r^{2} - r_{o}^{\prime 2} }}{{r^{2} }}\) (threshold value < 0.1).

k′ (threshold value 0.85 ≤ k′ ≤ 1.15).

where r2 is the squared correlation coefficient of the plot of experimental versus predicted activity; ro2 is the squared correlation coefficient of the plot of experimental versus predicted activity at zero intercept; r′o2 is the squared correlation coefficient of the plot of predicted versus experimental activity at zero intercept; and k′ is the slope of the plot of predicted against experimental activity at zero intercept.
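The three criteria can be checked with a few lines of code. The sketch below is an illustration under stated assumptions, not the MLRplusValidation implementation: each zero-intercept quantity is taken from a least-squares line forced through the origin, with its r2 computed as 1 − SSres/SStot about the mean of the dependent variable, and the second criterion is evaluated as (r2 − r′o2)/r2. All function names and data are hypothetical.

```python
def zero_intercept_fit(x, y):
    """Slope and r0^2 of the regression of y on x forced through the origin."""
    k = sum(a * b for a, b in zip(x, y)) / sum(a * a for a in x)
    y_mean = sum(y) / len(y)
    ss_res = sum((b - k * a) ** 2 for a, b in zip(x, y))
    ss_tot = sum((b - y_mean) ** 2 for b in y)
    return k, 1 - ss_res / ss_tot


def pearson_r2(x, y):
    """Squared Pearson correlation between x and y."""
    mx, my = sum(x) / len(x), sum(y) / len(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy * sxy / (sxx * syy)


def golbraikh_tropsha(y_exp, y_pred):
    """Evaluate the three criteria listed above for one set of predictions."""
    r2 = pearson_r2(y_exp, y_pred)
    _, r0_sq = zero_intercept_fit(y_pred, y_exp)        # experimental vs predicted
    k_prime, r0p_sq = zero_intercept_fit(y_exp, y_pred)  # predicted vs experimental
    checks = {
        "abs(r0^2 - r0'^2) < 0.3": abs(r0_sq - r0p_sq) < 0.3,
        "(r^2 - r0'^2)/r^2 < 0.1": (r2 - r0p_sq) / r2 < 0.1,
        "0.85 <= k' <= 1.15": 0.85 <= k_prime <= 1.15,
    }
    return checks, all(checks.values())


# Hypothetical experimental and predicted pEC50 values.
demo_checks, demo_passes = golbraikh_tropsha(
    [6.0, 6.5, 7.0, 7.5, 8.0], [6.1, 6.4, 7.1, 7.4, 8.0]
)
print(demo_checks, demo_passes)
```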

Y-randomization test

The robustness of the built QSAR model was assessed by the Y-randomization technique, in which MLR models are generated after randomly shuffling the dependent variable while keeping the independent variables unchanged (Adawara et al. 2020). This confirms that the QSAR model built is strong and was not obtained by chance. Low R2 and Q2 values over several iterations indicate that the original model does not result from a chance correlation. The Y-randomization coefficient (\({cR}_{p}^{2}\)) is defined as follows (Eq. 9):

$$ cR_{p}^{2} = R \times \sqrt {R^{2} - \left( {R_{r} } \right)^{2} } $$
(9)

where \({cR}_{p}^{2}\) is the Y-randomization coefficient, R is the correlation coefficient of the original (non-randomized) model, and Rr is the average ‘R’ of the random models. A \({cR}_{p}^{2}\) value greater than 0.50 is required for the model to pass the Y-randomization test.
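The procedure can be sketched for a single-descriptor model as follows. The sketch uses the square-root form of the coefficient, cR2p = R × sqrt(R2 − Rr2), takes Rr2 as the mean R2 of the shuffled models, and works on synthetic data with a fixed random seed; all names and values are hypothetical.

```python
import math
import random


def pearson_r(x, y):
    """Pearson correlation coefficient between x and y."""
    mx, my = sum(x) / len(x), sum(y) / len(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def c_rp2(r2_model, r2_random_mean):
    """cRp^2 = R * sqrt(R^2 - Rr^2), written in terms of squared coefficients."""
    return math.sqrt(r2_model) * math.sqrt(r2_model - r2_random_mean)


rng = random.Random(0)  # fixed seed so the shuffles are reproducible
x_demo = [float(i) for i in range(10)]
y_demo = [2.0 * v + 0.1 * ((-1) ** i) for i, v in enumerate(x_demo)]

# For a one-descriptor model with intercept, R2 equals the squared Pearson r.
r2_model = pearson_r(x_demo, y_demo) ** 2

r2_random = []
for _ in range(50):
    y_shuffled = y_demo[:]
    rng.shuffle(y_shuffled)  # break the structure-activity pairing
    r2_random.append(pearson_r(x_demo, y_shuffled) ** 2)

r2_random_mean = sum(r2_random) / len(r2_random)
score = c_rp2(r2_model, r2_random_mean)
print(r2_model, r2_random_mean, score)
```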

Statistical analysis of the descriptors

Mean effect (ME)

The mean effect (ME) value shows the relative contribution of each descriptor in a model, defined as (Eq. 10):

$$M{E}_{j}=\frac{{\beta }_{j}{\sum }_{i=1}^{n}{D}_{ij}}{{\sum }_{j=1}^{m}\left({\beta }_{j}{\sum }_{i=1}^{n}{D}_{ij}\right)}$$
(10)

where βj is the coefficient of descriptor j in the model, Dij is the value of descriptor j for molecule i in the training set data matrix, m is the number of descriptors in the model, and n is the number of molecules in the training set (Abdullahi et al. 2019).
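A small worked example makes the definition concrete. In the sketch below, each descriptor's coefficient is multiplied by the column sum of that descriptor over the training set and then normalized by the total over all descriptors, so the ME values sum to one. The function name, coefficients and descriptor values are hypothetical.

```python
def mean_effects(coefficients, descriptor_matrix):
    """Mean effect of each descriptor: beta_j * sum_i D_ij, normalized to sum to 1."""
    col_sums = [
        sum(row[j] for row in descriptor_matrix)
        for j in range(len(coefficients))
    ]
    weighted = [b * s for b, s in zip(coefficients, col_sums)]
    total = sum(weighted)
    return [w / total for w in weighted]


# Two descriptors, three training molecules (hypothetical values).
demo_me = mean_effects([2.0, -1.0], [[1.0, 1.0], [2.0, 1.0], [3.0, 2.0]])
print(demo_me)
```

Here the column sums are 6 and 4, the weighted terms are 12 and −4, and the resulting mean effects are 1.5 and −0.5: the sign gives the direction of each descriptor's influence and the magnitude its relative weight.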

Variance inflation factor (VIF)

The degree of multicollinearity (inter-correlation) between the descriptors is measured by the Variance Inflation Factor (VIF), defined as (Eq. 11):

$$VIF=\frac{1}{(1-{R}^{2})}$$
(11)

where R2 is the squared correlation coefficient of the multiple regression of one descriptor on the remaining descriptors in the model. A VIF value of 1 indicates that the descriptor is not inter-correlated with the others; a VIF in the range 1–5 indicates that the model is acceptable; and a VIF greater than 10 indicates that the model is unstable and unacceptable (Abdullahi et al. 2019).
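The calculation can be sketched as follows, assuming that the R2 for each descriptor is obtained by regressing that descriptor (with an intercept) on all the other descriptors in the model. The function name and the two small data sets are hypothetical.

```python
import numpy as np


def variance_inflation_factors(X):
    """VIF of each column of X, from regressing it on the remaining columns."""
    X = np.asarray(X, dtype=float)
    n, p = X.shape
    vifs = []
    for j in range(p):
        target = X[:, j]
        others = np.column_stack([np.delete(X, j, axis=1), np.ones(n)])
        coef, *_ = np.linalg.lstsq(others, target, rcond=None)
        resid = target - others @ coef
        r2 = 1 - (resid @ resid) / np.sum((target - target.mean()) ** 2)
        vifs.append(1.0 / (1.0 - r2))
    return vifs


# Uncorrelated descriptors: both VIF values equal 1.
vif_uncorrelated = variance_inflation_factors(
    [[1.0, 1.0], [2.0, -1.0], [3.0, -1.0], [4.0, 1.0]]
)
# Nearly collinear descriptors: VIF values far above 10.
vif_collinear = variance_inflation_factors(
    [[1.0, 1.1], [2.0, 1.9], [3.0, 3.0], [4.0, 4.1], [5.0, 4.9]]
)
print(vif_uncorrelated, vif_collinear)
```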

Evaluation of the model applicability domain

Evaluating the applicability domain (AD) of a QSAR model is important to ascertain the reliability of its predictions. The AD makes it possible to estimate the uncertainty in the prediction of a compound based on its similarity to the training set compounds used in model building (Tropsha et al. 2003). The leverage approach was used to describe the AD of the developed model. The leverage (hi) of a particular compound i is defined thus (Eq. 12):

$${h}_{i}={x}_{i}({X}^{T}X{)}^{-1}{x}_{i}^{T}$$
(12)

where xi is the row vector of descriptor values for compound i, X is the m × k descriptor matrix of the training set compounds, and XT is the transpose of X.

The warning leverage (h*), the threshold above which a compound is considered influential, is defined below (Eq. 13):

$${\mathrm{h}}^{*}=3\frac{(j+1)}{m}$$
(13)

where: m = number of training set compounds, j = number of descriptors in the model.

A plot of the standardized residuals against the leverages, known as the Williams plot, was used to visualize the model’s applicability domain. As a rule, compounds that fall within the region bounded by the warning leverage and standardized residuals of ± 3 are considered reliably predicted (Adeniji et al. 2020; Veerasamy et al. 2011).
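The leverage calculation can be sketched in a few lines. The example below computes the leverage of each compound from the training descriptor matrix as written in Eq. 12 (without an added intercept column, which is an assumption about how the matrix is set up) together with the warning leverage of Eq. 13. The function names and the toy matrix are hypothetical; the training set size of 36 is likewise an assumption, taken as 70% of 52 compounds.

```python
import numpy as np


def leverages(X_train, X_query=None):
    """Leverage h_i = x_i (X^T X)^-1 x_i^T for each row of X_query."""
    X_train = np.asarray(X_train, dtype=float)
    X_query = X_train if X_query is None else np.asarray(X_query, dtype=float)
    xtx_inv = np.linalg.inv(X_train.T @ X_train)
    # Row-wise quadratic form, i.e. the diagonal of X_query @ xtx_inv @ X_query.T
    return np.einsum("ij,jk,ik->i", X_query, xtx_inv, X_query)


def warning_leverage(n_descriptors, n_train):
    """Warning leverage h* = 3 (j + 1) / m."""
    return 3 * (n_descriptors + 1) / n_train


# Toy training matrix: five compounds, two descriptors (hypothetical values).
X_toy = [[1.0, 2.0], [2.0, 1.0], [3.0, 5.0], [4.0, 3.0], [5.0, 8.0]]
h_toy = leverages(X_toy)
h_star = warning_leverage(n_descriptors=7, n_train=36)
print(h_toy, round(h_star, 2))
```

With seven descriptors and an assumed 36 training compounds, the warning leverage evaluates to about 0.67. A compound whose leverage exceeds h* is structurally distant from the bulk of the training set, so its prediction is an extrapolation.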

Ligand based drug design

The ligand-based drug design approach was adopted in designing six (6) new pyrazolopyrimidine analogues by deletion, substitution and insertion of substituent(s) on the template structure (43), guided by the information provided by the molecular descriptors (mainly GATS2v, GATS6s, ATSC4e and GATS8s) (Adeniji et al. 2020). The newly designed compounds and the reference drug used in this study (doxycycline) were prepared in the four steps reported earlier in the ‘Molecular geometry optimization’ section: drawing of chemical structures, energy minimization, minimization by MMFF, and optimization by the DFT approach.

Prediction of pharmacokinetic properties

Prediction of pharmacokinetic properties constitutes a necessary stage in the early phase of drug discovery, because only molecules with good drug-likeness properties and excellent ADMET profiles advance into the pre-clinical research phase (Lawal et al. 2021). Hence, two (2) pyrazolopyrimidine derivatives (16 and 43) with better inhibitory activities and lower residual values, alongside the newly designed compounds (A1–A6), were investigated for their drug-likeness and ADMET properties using the online web servers http://www.swissadme.ch/index.php and http://biosig.unimelb.edu.au/pkcsm, respectively. Lipinski’s rule of five (RO5) is a widely used criterion for oral bioavailability; the tested compounds were therefore assessed for oral bioavailability using the RO5 criteria (Lipinski et al. 2001).

Molecular docking study

The newly designed compounds and the reference drug (doxycycline) were docked into the binding pocket of the receptor (OTU deubiquitinase) using the iGEMDOCK software, and Biovia Discovery Studio Visualizer was used to analyze the resulting protein–ligand interaction profiles (Ibrahim et al. 2021; Kumar et al. 2016).

Results and discussion

QSAR study

A theoretical (QSAR) study was conducted on fifty-two (52) pyrazolopyrimidine derivatives in order to establish a quantitative relationship between their structures and their anti-wolbachia activities. The built models were subjected to both internal and external validation tests, in which model 3 (Eq. 14) best satisfied the requirements for a good QSAR model. Table 2 describes the descriptors used in the model, while the experimental and predicted activity values of the pyrazolopyrimidine derivatives, together with their residuals, are presented in Table 3. The predicted activity values are plotted against the experimental values for both training and test sets in Fig. 2, and the standardized residuals are plotted against the experimental activities in Fig. 3. The results of the internal and external validation tests, conducted to ascertain the stability, robustness, reliability and predictive power of the built QSAR model, are presented in Table 4.

Table 2 Selected descriptors used in the QSAR model
Table 3 Experimental, predicted and residual values of pyrazolopyrimidine derivatives
Fig. 2 Plot of predicted pEC50 against experimental pEC50 for training and test sets

Fig. 3 Plot of standardized residual against experimental pEC50 for training and test sets

Table 4 Validated parameters of the QSAR model

The combined GFA and MLR approach led to the selection of seven (7) descriptors and the generation of four (4) QSAR models. Model 3 (Eq. 14) was found to best satisfy the requirements for a reliable QSAR model. The low residual values between the experimental and predicted activities shown in Table 3 imply that the model has high predictive strength. The R2 values of 0.8104 and 0.750 for the training and test sets, respectively, obtained from Fig. 2 agree with those obtained from GFA (0.8104 and 0.7501) and MLRplusValidation analysis (0.8104 and 0.7501), as reported in Table 4. The clustering of points along the line of best fit in Fig. 2 shows that the experimental and predicted activity values are well correlated, indicating that the built model is reliable and robust. The random spread of standardized residuals on both sides of zero in Fig. 3 indicates that the built model is free of systematic error.

$$ \begin{aligned} pEC_{50} & = - 1.067945613*ATSC4e - 4.379278216*AATSC4s \\ & + \;4.035879647*GATS2v + 0.922556367*GATS8e \\ & + 1.109246296*GATS6s - 0.097883280*nsSeH + 0.696906031*SssSnH2 - 0.253197268 \\ \end{aligned} $$
(14)

Additionally, Pearson’s correlation analysis was performed on the values of all seven descriptors in the built QSAR model, and the results are reported in Table 5. Another significant validation test, the Y-randomization test, was also performed, and the result is presented in Table 6. The low correlation coefficients (less than 0.50) between each pair of descriptors in the built model (Table 5) indicate the absence of significant inter-correlation among the descriptors. The Variance Inflation Factor (VIF) values for all 7 descriptors range from 1 to 5, an indication of the stability and acceptability of the built model (Table 5). The absolute t-statistic of each descriptor is greater than 2, showing that the selected descriptors were good (Adeniji et al. 2018). Also, Table 5 shows that the p values evaluated at the 95% confidence level were less than 0.05 for all descriptors. This supports the alternative hypothesis that there is a relationship between the inhibitory activities and the descriptors. Additionally, the values of the Mean Effect (ME) reported in Table 5 provide vital information on the direction and degree of each descriptor’s contribution to the model. The magnitudes and signs of the ME values signify the strength and direction of each descriptor’s influence on the molecules’ inhibitory activities. All the descriptors except nsSeH have positive ME values, indicating that increasing or decreasing their values will lead to a corresponding increase or decrease in the anti-wolbachia activities. Increasing the values of nsSeH, on the other hand, will lead to a decrease in the anti-wolbachia activities. GATS2v, with the highest ME value, has the greatest influence on the molecules’ inhibitory activities. GATS2v, the Geary autocorrelation of lag 2 weighted by van der Waals volumes, has a positive ME and is therefore suggested to contribute positively to anti-wolbachia activity. It measures the strength of the relationship between the van der Waals volumes of two atoms in a molecule that are two bonds apart (Adawara et al. 2020).

Table 5 Pearson’s correlation and statistical analyses of descriptors used in the QSAR model
Table 6 Y-Randomization test parameters

The low values of R2 and Q2 obtained from the random reshuffling (Table 6) imply that the built model is stable, robust and reliable. The Y-randomization coefficient, cR2p (0.747636), is greater than 0.50, which supports the claim that the built model is powerful and was not obtained by chance.

The scatter plot of the standardized residuals versus the leverages (Williams plot), obtained to ascertain the model’s applicability domain, is shown in Fig. 4. The Williams plot shows that all the compounds fall within ± 3 standardized cross-validated residual units. It can therefore be inferred that no outlier is present in the data set. However, eight compounds (2, 3, 4, 5, 7, 9, 23 and 42) have leverage values greater than the calculated warning leverage (h* = 0.67) and are regarded as influential molecules.

Fig. 4 Plot of standardized residuals against the leverage values (Williams plot)

Consequently, compounds 16 and 43, with relatively higher predicted inhibitory activities of 7.953 and 7.558 respectively (Table 3) and falling within the model’s applicability domain (Fig. 4), were subjected to a drug-likeness test for possible selection as the lead molecule for designing new analogues.

Ligand-based drug design

The molecular structure of the lead compound (43) and the structural template are presented in Fig. 5A, B, while predicted activities of the newly designed compounds are presented in Table 7.

Fig. 5 (A) Molecular structure of the lead compound; (B) structural template for the ligand-based design

Table 7 Predicted pEC50 of the newly designed pyrazolopyrimidine compounds

One of the objectives of ligand-based design is to design new molecules with better inhibitory activities than their template molecule. Here, the predicted pEC50 values of the designed compounds were all higher than that of the template molecule (43), in the order A4 (8.9601) > A1 > A6 > A5 > A3 > A2 > 43 (7.5581), as shown in Table 7. This suggests that the structural modifications of the template, which were guided by the molecular descriptors in the built QSAR model, had the intended effect on predicted activity.

Pharmacokinetics properties prediction

Results of the pharmacokinetics investigation conducted on 16, 43 and the six (6) newly designed compounds are presented in Tables 8 and 9, while Fig. 6A, B shows the oral bioavailability radars of 16 and 43.

Table 8 Predicted drug-likeness properties of selected and newly designed compounds
Table 9 Predicted ADMET properties of the newly designed compounds
Fig. 6 (A) Oral bioavailability radar of 16; (B) oral bioavailability radar of 43

Lipinski’s rule for oral bioavailability states that a drug molecule is more likely to have poor absorption or permeation when it has more than 5 Hydrogen Bond Donors (HBD), more than 10 Hydrogen Bond Acceptors (HBA), a Molecular Weight (MW) > 500, or high lipophilicity (MLOGP > 4.15 or WLOGP > 5) (Lipinski et al. 2001). Molecules that satisfy at least three of the four requirements are said to obey Lipinski’s rule (Lawal et al. 2021). As shown in Table 8, all tested molecules obeyed Lipinski’s rule with no violations. Also, the reported values of Topological Polar Surface Area (TPSA) for all molecules are less than 140 Å2. Additionally, the synthetic accessibility scores of all tested molecules are in the easy range (< 5.00), indicating easy laboratory synthesis. Notwithstanding the relatively higher inhibitory activity of compound 16, ligand 43 was the preferred lead molecule because it possessed more suitable physico-chemical properties for oral bioavailability, as shown by the oral bioavailability radar in Fig. 6. The estimated water solubility (Log S) ranges from moderately soluble (16, 43, A3 and A4) to soluble (A1, A2, A5 and A6). None of the compounds showed PAINS or Brenk alerts except A6, which showed 2 structural alerts due to the presence of N–C–halogen and 3-membered heterocycle moieties.
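The rule-of-five screen described above amounts to counting threshold violations, as in the minimal sketch below. The thresholds are those quoted in the text (with MLOGP as the lipophilicity measure), a molecule is taken to pass when it has at most one violation, and the two example molecules are hypothetical rather than entries from Table 8.

```python
def lipinski_violations(mw, hbd, hba, mlogp):
    """Count rule-of-five violations using the thresholds quoted in the text."""
    checks = [mw > 500, hbd > 5, hba > 10, mlogp > 4.15]
    return sum(checks)


def obeys_ro5(mw, hbd, hba, mlogp, max_violations=1):
    """A molecule passes when at least three of the four criteria are met."""
    return lipinski_violations(mw, hbd, hba, mlogp) <= max_violations


# Hypothetical property values for two molecules.
small_molecule = dict(mw=350.4, hbd=2, hba=6, mlogp=2.1)
large_molecule = dict(mw=620.0, hbd=6, hba=8, mlogp=3.0)
print(lipinski_violations(**small_molecule), obeys_ro5(**small_molecule))
print(lipinski_violations(**large_molecule), obeys_ro5(**large_molecule))
```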

The predicted ADMET properties in Table 9 show that the Human Intestinal Absorption (HIA) was high (> 90%) for all newly designed compounds. Skin permeability is a significant consideration in the development of transdermal drug delivery. A skin permeability constant (LogKp) greater than − 2.50 indicates low skin permeability. All the newly designed compounds have LogKp values below − 2.50, showing good skin penetration ability. Also, all the tested compounds are non-substrates of P-glycoprotein, an efflux transporter which acts as a biological barrier by extruding toxins and xenobiotics, including drugs, out of cells, while A3 and A4 are inhibitors of both P-glycoprotein I and II, indicating that these molecules may reach their target sites with little or no resistance from P-glycoprotein. For a drug molecule to penetrate the Blood–Brain Barrier (BBB) and the Central Nervous System (CNS), it is recommended that the logarithmic ratio of brain to plasma drug concentration (logBB) be greater than − 1 and the blood–brain permeability-surface area product (logPS) be greater than − 3, respectively. All the newly designed compounds showed logBB greater than − 1, an indication that these molecules can cross the BBB. However, all the molecules showed very poor CNS permeability, i.e. logPS < − 3. Furthermore, Cytochrome P450 enzymes are important detoxification enzymes in the body which oxidize xenobiotics to facilitate their excretion. Predictions for the two major isoforms responsible for drug metabolism, CYP3A4 and CYP2D6, were reported. Only A3, A4 and A6 are substrates of CYP3A4; none of the compounds is a substrate of CYP2D6 or an inhibitor of CYP3A4 or CYP2D6. The rate of drug elimination from the body is measured by the drug’s total clearance, which is within the accepted range for these newly designed compounds. All molecules showed negative AMES toxicity, indicating that they are predicted to be non-mutagenic and therefore unlikely to act as carcinogens. Additionally, the predicted values of the Maximum Recommended Tolerated Dose (MRTD) for all molecules are included in Table 9. An MRTD value of less than or equal to 0.477 log (mg/kg/day) is considered low, and a value greater than 0.477 log (mg/kg/day) is considered high. Overall, the drug-likeness and ADMET properties showed good pharmacokinetic profiles for these molecules. Therefore, the newly designed molecules, except A6 (with 2 structural alerts), could be considered as potential drug candidates for the treatment of lymphatic filariasis and onchocerciasis.

Molecular docking study

The 3D structures of the receptor and ligand A1 are presented in Fig. 7A, B, while the results of the docking study conducted between the target receptor (OTU deubiquitinase) (Fig. 7A), the newly designed compounds, and the reference drug doxycycline (Fig. 8) are presented in Table 10 and Figs. 9, 10, 11, 12, 13, 14, 15.

Fig. 7

A 3D structure of prepared receptor (OTU deubiquitinase); B 3D structure of prepared ligand (A1)

Fig. 8

Molecular structure of doxycycline

Table 10 Predicted binding interaction profile of designed compounds with OTU deubiquitinase
Fig. 9

2-D and 3-D view of the interaction between OTU deubiquitinase and A1

Fig. 10

2-D and 3-D view of the interaction between OTU deubiquitinase and A2

Fig. 11

2-D and 3-D view of the interaction between OTU deubiquitinase and A3

Fig. 12

2-D and 3-D view of the interaction between OTU deubiquitinase and A4

Fig. 13

2-D and 3-D view of the interaction between OTU deubiquitinase and A5

Fig. 14

2-D and 3-D view of the interaction between OTU deubiquitinase and A6

Fig. 15

2-D and 3-D view of the interaction between OTU deubiquitinase and doxycycline

All tested molecules bind well within the target site cavity, in the order A1 (− 87.32 kcal/mol) > A3 > A6 > A2 > A4 > doxycycline > A5 (− 78.70 kcal/mol), as reported in Table 10 and Figs. 9, 10, 11, 12, 13, 14, 15, indicating that the newly designed compounds, with the exception of A5, bind more strongly to the protein target (OTU deubiquitinase) than the reference drug doxycycline. As seen from Table 10 and Fig. 9, A1 interacted well with the binding site of the OTU deubiquitinase receptor through five (5) conventional hydrogen bonds, one (1) carbon-hydrogen bond and one (1) π-donor hydrogen bond. The hydrophobic interactions include one (1) π-anion, four (4) π-alkyl, one (1) alkyl, and two (2) π–π T-shaped interactions, in addition to one (1) halogen interaction. The hydroxyl group on the pyrimidine ring system formed three conventional hydrogen bonds: two with ASP-177 at distances of 2.55 Å and 2.77 Å, and one with ASP-175 at a distance of 3.30 Å. The nitrogen atom of the pyrazole ring system and the linker nitrogen between the pyrimidine and pyridine ring systems each formed one conventional hydrogen bond with VAL-174, at distances of 3.10 Å and 2.68 Å respectively. The other hydrogen bond interactions are a carbon-hydrogen bond with LYS-173 at a distance of 3.30 Å and a π-donor hydrogen bond with TYR-176 at a distance of 2.86 Å. Also observed was a halogen interaction between one of the fluoro groups on the pyrrolidine ring system and ASN-36 at a distance of 3.69 Å. The remaining contacts are hydrophobic interactions with PRO-39 (alkyl and π-alkyl), LYS-173 (π-alkyl), TYR-176 (π–π T-shaped) and ASP-175 (π-anion).
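The comparison against the reference drug amounts to sorting ligands by docking score, with more negative scores taken to mean stronger binding. A minimal Python sketch of that ordering follows; the function names are illustrative, and the ligand names and scores in the example are placeholders rather than values from Table 10.

```python
def rank_by_binding(scores: dict) -> list:
    """Return ligand names ordered from strongest (most negative) to weakest."""
    return sorted(scores, key=scores.get)


def stronger_than(scores: dict, reference: str) -> list:
    """Ligands whose score is more negative than that of the reference ligand."""
    ref_score = scores[reference]
    return [
        name
        for name in rank_by_binding(scores)
        if name != reference and scores[name] < ref_score
    ]


# Placeholder scores in kcal/mol, for illustration only; NOT from Table 10.
placeholder_scores = {"ligand_x": -85.0, "ligand_y": -79.0, "reference": -80.0}

ranking = rank_by_binding(placeholder_scores)
better = stronger_than(placeholder_scores, "reference")
```

Here `ranking` places `ligand_x` first and `ligand_y` last, and only `ligand_x` is reported as binding more strongly than the reference.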

In general, the various ligands were observed to make very close contacts with a large number of amino acid residues through hydrogen bonding and hydrophobic interactions, two very significant interaction types in drug-receptor binding, as shown in Figs. 9, 10, 11, 12, 13, 14, 15. Unlike the newly designed compounds, which show a fair combination of both hydrogen bonding and hydrophobic interactions with the receptor, the interactions of doxycycline with the target protein were predominantly hydrogen bonds, with only one hydrophobic interaction (π-anion) with the aspartic acid residue (ASP) at position 116 of the target receptor. Imberty et al. (1991) reported that a hydrogen bond can be classified as strong or weak based on the distance between the hydrogen donor and the hydrogen acceptor (dis(D-A)) as follows: 2.5 Å < dis(D-A) < 3.1 Å (strong hydrogen bond) and 3.1 Å < dis(D-A) < 3.55 Å (weak hydrogen bond). On this basis, most of the hydrogen bond distances between the new compounds and the receptor indicate strong hydrogen bond interactions with the respective amino acid residues, while doxycycline showed weak hydrogen bond interactions with threonine (THR) at position 109, serine (SER) at position 121 and ASP at position 116. This suggests that the newly designed compounds bind well with OTU deubiquitinase, a protein essential for the survival of the bacterium Wolbachia.
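The distance criterion attributed to Imberty et al. (1991) can be applied mechanically to the conventional hydrogen bonds reported for A1. The Python sketch below is illustrative only: the function name is hypothetical, and because the quoted ranges use strict inequalities, a distance of exactly 3.1 Å is labelled "borderline" here as an assumption rather than being assigned to either class.

```python
from collections import Counter


def classify_hbond(distance: float) -> str:
    """Classify a donor-acceptor distance (in angstroms) by the quoted ranges."""
    if 2.5 < distance < 3.1:
        return "strong"
    if 3.1 < distance < 3.55:
        return "weak"
    if distance == 3.1:
        # Assumption: the quoted ranges leave exactly 3.1 unassigned.
        return "borderline"
    return "unclassified"


# Conventional hydrogen bonds of A1 as reported in the text: (residue, distance).
a1_hbonds = [
    ("ASP-177", 2.55),
    ("ASP-177", 2.77),
    ("ASP-175", 3.30),
    ("VAL-174", 3.10),
    ("VAL-174", 2.68),
]

a1_classes = [(residue, classify_hbond(d)) for residue, d in a1_hbonds]
a1_summary = Counter(label for _, label in a1_classes)
```

Under these assumptions, three of the five conventional hydrogen bonds of A1 fall in the strong range, one (ASP-175, 3.30 Å) in the weak range, and one (VAL-174, 3.10 Å) sits exactly on the boundary.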

Conclusions

In this study, four (4) QSAR models were developed with a series of fifty-two (52) pyrazolopyrimidine derivatives as anti-Wolbachia agents, amongst which Model 3 best satisfied the requirements of both internal and external validation tests. The model was used to predict the anti-Wolbachia activities of the various compounds, including the newly designed analogues. Compound 43 was selected as the lead molecule ahead of compound 16 because of its relatively better drug-likeness properties. All the newly designed compounds showed good pharmacokinetic properties with no violation of Lipinski’s RO5, and are predicted to be orally bioavailable as well as skin permeable. The molecular docking results showed stronger binding affinities between the OTU deubiquitinase receptor and all newly designed molecules, with the exception of A5, than with the chosen reference drug (doxycycline), which is indicative of good protein–ligand binding interactions. Hence, these new molecules have demonstrated the potential to inhibit Wolbachia OTU deubiquitinase, thereby reducing the chances of the bacteria’s survival, which in turn affects the growth and viability of the filarial worms (the causative agents of lymphatic filariasis and onchocerciasis). These new compounds could therefore be developed as potential drug candidates for the treatment of lymphatic filariasis and onchocerciasis. Furthermore, laboratory tests (in vitro and in vivo) could be conducted to validate the computational results.