INTRODUCTION

Efforts to improve the speed of bringing new drugs to patients are sorely needed, as highlighted by the FDA’s Critical Path Initiative. For example, since its introduction in 1995, the Biopharmaceutics Classification System (BCS) has had a significant impact on the drug regulatory process and practice. For an immediate-release orally active dosage form, the rate and extent of absorption are determined by the drug’s aqueous solubility and its permeability in the gastrointestinal tract. The BCS therefore represents a robust model for bioequivalence studies based on physiological parameters and physicochemical properties of drug molecules. The BCS, as adopted by the World Health Organization (WHO), classifies the drug molecules listed on the essential medicines list (EML) into four classes based on their solubility and permeability characteristics (Fig. 1). Accordingly, certain drug classes can be considered for a biowaiver, i.e. approval of products based on in vitro dissolution tests instead of human bioequivalence data, which are costly for drug manufacturers to obtain. Such waivers significantly improve the speed and decrease the cost of bringing orally administered therapeutics to market. Currently, the BCS allows a waiver of in vivo bioequivalence testing of immediate-release solid dosage forms for class 1 drugs (1), whereas waivers for class 3 drugs are recommended only on the basis of scientific justification (2,3).

Fig. 1

(a) The Biopharmaceutics Classification System (BCS) as defined by the FDA, after Kasim et al. (4). (b) The Biopharmaceutics Drug Disposition Classification System (BDDCS) proposed by Wu and Benet, where the major route of elimination (metabolized vs unchanged) serves as the permeability criterion.

Drug classification according to BCS requires knowledge of solubility and permeability data. The determination of drug permeability is typically based on experimental permeability data or well-defined mass balance studies. This information is available only for a small fraction of EML listed drugs (4). A biowaiver currently can be requested for orally active immediate-release dosage forms (≥85% release in 30 min), containing drugs with high solubility over the pH range 1 to 7.5 (dose/solubility ratio <250 ml) and a high permeability (fraction absorbed ≥90%), provided excipients used in the formulation do not interfere with the drug absorption process. Drugs with narrow therapeutic range and drugs designed to be absorbed from the oral cavity may not be considered for biowaivers (5). Thus, the central idea of the BCS classification system is to predict in vivo pharmacokinetic performance of drug products from in vitro drug solubility and permeability characteristics (6).

More recently, Wu and Benet (6) extensively examined about 167 BCS classified drugs. They aptly noticed that pharmacokinetic considerations like effects of food, absorptive transporters, efflux transporters, and routes of elimination (renal/biliary) were important determinants of overall drug absorption and bioavailability for immediate release oral dosage forms. Thus, they suggested that classifying molecules based on the extent of metabolism is less ambiguous as compared to permeability or extent of absorption. This classification may also increase the number of class 1 drugs that would become eligible for biowaivers (7). The BDDCS, like BCS, proposes to classify drug molecules into four classes (Fig. 1), defining the extensive metabolism criterion as ≥50% (±10%) metabolism of an oral dose in vivo in humans. Based on this criterion, a few drugs that were previously BCS class 1 were reclassified as BDDCS class 3 and thus would not be eligible for biowaivers. A study by Takagi et al. (7) observed at least eight drugs in the BDDCS class 1 were eligible for biowaivers. Considering its overall significance, the BDDCS approach could be helpful in successfully classifying drugs in class 1, thereby increasing their eligibility for biowaivers.

A challenge for both BCS and BDDCS is the actual classification of drugs based on the required in vitro data for metabolism, solubility or permeability. However, there has been considerable research over the last decade on computational or in silico methods for prediction of absorption, distribution, metabolism and excretion (ADME) (8,9). The objective of the present study was to enable simple and fast BDDCS classification by developing computational classification models predicting BDDCS class from molecular properties. Computational models were developed based on data for 165 drugs as a training set based on BDDCS data (6). To further test and challenge our models, we have retrieved an additional 56 drugs listed in the WHO EML that were not previously classified under BDDCS but with ample literature data available to enable classification.

MATERIALS AND METHODS

Drug List

A training set of 165 drugs for computational model building was obtained from the published literature (6). An additional set of 56 drugs that were not included in the original BDDCS list was retrieved from the WHO EML publication and used to challenge the computational models (6,10). This collection was classified according to BDDCS criteria and subsequently employed as a test set. Classification was established based on an extensive literature survey of drug disposition data as well as the individual physicochemical parameters described below.

Solubility Definition

Drug solubility data for classification purposes were obtained from standard references (11–13) and expressed in mg/ml. Where solubility data were not available or undefined, guidelines were taken from Kasim et al. (4). Maximum dose strength data were obtained from the WHO Essential Medicines core list and expressed in milligrams (10).

Dose Number calculations

The dose number ($D_0$) was calculated using (14):

$$D_0 = \frac{M_0 / V_0}{C_s}$$
(1)

where $M_0$ is the highest dose strength (mg), $C_s$ is the solubility (mg/ml) and $V_0$ is 250 ml.
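As an illustration of Eq. 1, the following minimal sketch computes the dose number in Python. The function and argument names are hypothetical and are not part of the original study; the example values (5 mg dose, 0.01 mg/ml solubility) are those quoted for glibenclamide in the Results.

```python
def dose_number(max_dose_mg: float, solubility_mg_per_ml: float,
                volume_ml: float = 250.0) -> float:
    """Dose number D0 = (M0 / V0) / Cs, following Eq. 1.

    max_dose_mg          -- highest dose strength M0 (mg)
    solubility_mg_per_ml -- solubility Cs (mg/ml)
    volume_ml            -- reference volume V0 (250 ml in the text)
    """
    if solubility_mg_per_ml <= 0 or volume_ml <= 0:
        raise ValueError("solubility and volume must be positive")
    return (max_dose_mg / volume_ml) / solubility_mg_per_ml


# Example values taken from the text: 5 mg dose, 0.01 mg/ml solubility.
d0 = dose_number(5.0, 0.01)
print(f"D0 = {d0:.2f}")
```

A dose number above 1 means the highest dose does not dissolve in 250 ml, which corresponds to the dose/solubility ratio exceeding 250 ml.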

BDDCS Classification of Compounds

Drug disposition data (Table I) for 56 previously unclassified drugs was obtained from an extensive literature search. Aqueous solubility for each therapeutic drug class was obtained from the Merck Index (14th Edition) and other pertinent literature references (11). Pharmacokinetic data such as plasma half-life, bioavailability, P-glycoprotein (P-gp) affinity, Cytochrome P450 affinity and extent of metabolism were obtained from the literature by MedLine searching using a combination of descriptive keywords and Boolean operators. Additionally, web sources such as DrugBank (15) and http://www.drugs.com were used. Based on this collective information, the drug molecules were assigned to a BDDCS class.
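The class assignment itself reduces to two yes/no decisions once solubility and extent of metabolism have been judged from the literature. The sketch below shows this logic under the usual BDDCS quadrant layout (class 1: high solubility, extensive metabolism; class 2: low solubility, extensive metabolism; class 3: high solubility, poor metabolism; class 4: low solubility, poor metabolism). The function name and boolean inputs are illustrative assumptions; how each boolean is decided from the raw data is left to the criteria described in the text.

```python
def bddcs_class(high_solubility: bool, extensive_metabolism: bool) -> int:
    """Map the two BDDCS criteria onto classes 1-4 (hypothetical helper)."""
    if extensive_metabolism:
        return 1 if high_solubility else 2
    return 3 if high_solubility else 4


# Qualitative examples discussed in the Results:
# benznidazole: high solubility, extensive metabolism
print(bddcs_class(high_solubility=True, extensive_metabolism=True))
# glibenclamide: low solubility, metabolized
print(bddcs_class(high_solubility=False, extensive_metabolism=True))
# doxycycline: low solubility, mainly eliminated unchanged
print(bddcs_class(high_solubility=False, extensive_metabolism=False))
```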

Table I Drug Disposition Data for Drugs Listed in the World Health Organization’s Essential Medicines List

Computational Modeling

Data Collection and Molecule Building

The current dataset comprises 221 drug molecules collected from various literature sources. Molecules were downloaded from PubChem (http://www.ncbi.nlm.nih.gov/entrez/query.fcgi) as 2D SMILES strings which were converted to SDF format and imported into Chem3D Ultra (CambridgeSoft, Cambridge, MA) to generate MOL2 files. The molecules were subsequently energy minimized in SYBYL v.7.1 (Tripos Associates, St. Louis, MO) using the Tripos force field (16) and Gasteiger–Hückel charges with distance-dependent dielectrics and the conjugate gradient method with a convergence criterion of 0.001 kcal/mol.

Descriptor Calculations

The numbers of hydrogen bond donor and hydrogen bond acceptor groups were calculated with ChemDraw Ultra 8.0 (CambridgeSoft, Cambridge, MA); clogP and polar surface area (PSA) were calculated using SYBYL v.7.1. One hundred VolSurf descriptors (17) were calculated from 3D molecular fields using VolSurf 4.0 implemented in SYBYL. Five different probes were used for descriptor calculation: water (OH2), carbonyl oxygen atom (O), amphipathic (BOTH), carboxy oxygen atom (O::), and sp2 nitrogen with one lone pair (N:=). VolSurf descriptors include descriptors for size, shape, hydrophilic and hydrophobic regions, and interaction energy, amongst others. A total of 149 Molconn-Z descriptors (2D topological) were generated (18), including κ-molecular shape indices, topological state, shape, Wiener and Shannon indices.

Model Building and Validation

Recursive partitioning (RP) calculations were performed using the rpart package in R (19). RP can be used to mine large data sets in order to uncover hidden patterns within the data and assign each observation to an appropriate class. RP attempts to determine the relationship between a set of independent variables (X) and a dependent variable (Y) using the simple mathematical function Y = f(X). The result of RP is a “tree”, also referred to as a “decision tree” or “graph”. The data are divided (partitioned) into nodes (branches), where data with similar properties tend to occupy the same node. A tenfold cross-validation study was performed on the training set.
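To make the partitioning step concrete, the sketch below searches for the best binary split on a single descriptor using Gini impurity, a common splitting criterion for classification trees. It is a standalone illustration in Python, not the rpart implementation used in the study, and the descriptor values and class labels are hypothetical.

```python
from collections import Counter


def gini(labels):
    """Gini impurity of a list of class labels (0.0 for a pure node)."""
    n = len(labels)
    if n == 0:
        return 0.0
    return 1.0 - sum((count / n) ** 2 for count in Counter(labels).values())


def best_split(values, labels):
    """Return (threshold, weighted impurity) of the best split on one descriptor.

    Returns (None, impurity of the unsplit node) if no split improves purity.
    """
    pairs = sorted(zip(values, labels))
    best = (None, gini(labels))
    for i in range(1, len(pairs)):
        low, high = pairs[i - 1][0], pairs[i][0]
        if low == high:
            continue  # cannot split between identical descriptor values
        left = [label for _, label in pairs[:i]]
        right = [label for _, label in pairs[i:]]
        score = (len(left) * gini(left) + len(right) * gini(right)) / len(pairs)
        if score < best[1]:
            best = ((low + high) / 2, score)
    return best


# Hypothetical descriptor values and BDDCS classes for six compounds.
descriptor = [0.2, 0.4, 0.5, 1.1, 1.3, 1.6]
classes = [1, 1, 1, 3, 3, 3]
threshold, impurity = best_split(descriptor, classes)
print(threshold, impurity)
```

A full tree is obtained by applying the same search over all descriptors and then recursing into each child node until a stopping rule is met.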

The R program was also used for random forest (RF) calculations (20). The total number of trees was set to 1000. The other optimizable parameter in the random forest approach is m_try, i.e. the number of descriptors randomly sampled as candidates for splitting at each node, out of the p descriptors available. When m_try equals the total number of descriptors (m_try = p), the procedure is commonly termed “bagging.” The value of m_try was increased systematically in increments of 5. In general, the so-called “out of bag error” (OBB) estimate can be considered equivalent to a cross-validation study. In OBB estimation, about one third of the compounds are left out at random as a test set for each tree, and the tree is developed from the remaining compounds. The optimum m_try was chosen such that %OBB was at a minimum, since a lower %OBB indicates a higher accuracy of the model.
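The “one third” figure follows from bootstrap sampling: when n compounds are drawn with replacement n times, roughly 1/e (about 37%) of them are never drawn and so remain out of bag for that tree. The short simulation below illustrates this for a training set of 165 compounds. It is a sketch of the sampling step only, under the assumption of standard bootstrap resampling, and does not build any trees; the function name is hypothetical.

```python
import random


def bootstrap_split(n_items, rng):
    """Draw a bootstrap sample and return (in-bag indices, out-of-bag indices)."""
    in_bag = [rng.randrange(n_items) for _ in range(n_items)]
    out_of_bag = sorted(set(range(n_items)) - set(in_bag))
    return in_bag, out_of_bag


rng = random.Random(0)   # fixed seed so the run is repeatable
n_compounds = 165        # size of the training set in the text
n_trees = 1000           # number of trees used in the text

fractions = []
for _ in range(n_trees):
    _, oob = bootstrap_split(n_compounds, rng)
    fractions.append(len(oob) / n_compounds)

mean_oob_fraction = sum(fractions) / len(fractions)
print(f"mean out-of-bag fraction: {mean_oob_fraction:.3f}")
```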

The kernlab package in R was used for generating support vector machine (SVM) models. Training and test set descriptors were scaled so that no descriptor with a large numerical value dominated the final SVM model. The two optimizable parameters in the radial basis function (RBF) kernel are C and sigma. For sigma, the average value obtained from the automated sigma estimation (sigest) method in R was used. The value of C was determined using k-fold cross-validation (k = 10), and the value of C with the lowest cross-validation error was then used for modeling. In k-fold cross-validation, the entire dataset is divided into k subsets of almost equal size. The model is trained using k − 1 subsets and the remaining subset is then used as the prediction set; this is repeated until each subset has been held out once. The advantage of k-fold cross-validation is that the entire dataset is eventually used for both training and testing.
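The k-fold procedure described above can be sketched as an index-splitting routine. The Python generator below is a minimal illustration with hypothetical names; it shuffles the compound indices, deals them into k folds of almost equal size, and yields each fold once as the held-out prediction set.

```python
import random


def k_fold_indices(n_items, k=10, seed=0):
    """Yield (training indices, held-out indices) for each of k folds."""
    indices = list(range(n_items))
    random.Random(seed).shuffle(indices)
    # Deal shuffled indices round-robin so fold sizes differ by at most one.
    folds = [indices[i::k] for i in range(k)]
    for i, held_out in enumerate(folds):
        train = [idx for j, fold in enumerate(folds) if j != i for idx in fold]
        yield train, held_out


# 165 training compounds, k = 10 as in the text.
splits = list(k_fold_indices(165, k=10))
for train, held_out in splits[:2]:
    print(len(train), len(held_out))
```

In a parameter search such as the one for C, a model would be fitted on each training subset and scored on the corresponding held-out subset, and the parameter value with the lowest average error retained.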

Six different models were generated for each method utilizing different combinations of molecular descriptors. The following descriptor combinations were used: model 1 = ChemDraw (CD) and VolSurf (VS); model 2 = CD, VS, clogP and polar surface area (PSA); model 3 = CD and MolConnZ (MZ); model 4 = CD, MZ, clogP and PSA; model 5 = CD, VS and MZ; model 6 = CD, VS, MZ, clogP and PSA.

Consensus Analysis

The consensus analysis for the test set was performed using predicted data from all three computational modeling methods. Three different rules, namely the arithmetic mean, the harmonic mean and the median, were applied for the generation of consensus classes. A total of four models were generated using each rule: model 1 = best models from RP, RF and SVM; model 2 = best models from RP and RF; model 3 = best models from RP and SVM; model 4 = best models from RF and SVM.
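The three consensus rules can be sketched as follows. This is an illustrative Python example only: it assumes that each rule is applied to the numerical class labels (1 to 4) predicted by the individual methods and that the result is rounded to the nearest class, neither of which is specified in the text. The function name is hypothetical.

```python
from statistics import harmonic_mean, mean, median

RULES = {"mean": mean, "harmonic": harmonic_mean, "median": median}


def consensus_class(predictions, rule="mean"):
    """Combine predicted class labels (1-4) from several models into one class.

    Assumption: the combined value is rounded half up and clipped to 1..4.
    """
    value = RULES[rule](predictions)
    return min(4, max(1, int(value + 0.5)))


# Hypothetical predictions for one drug from RP, RF and SVM.
predicted = [2, 2, 4]
for rule in RULES:
    print(rule, consensus_class(predicted, rule))
```

With these example inputs the arithmetic mean is pulled toward the outlying prediction, whereas the harmonic mean and the median stay with the majority, which shows why the rules can yield different consensus classes.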

Validation Metrics

Three metrics were considered in evaluating model predictive performance. The three metrics are denoted absolute accuracy, consumer’s accuracy, and producer’s accuracy, with the latter two novel metrics attempting to consider the viewpoints of the consumer and the producer of the model. The percent accuracy is the percent of all drugs correctly predicted. In applying percent accuracy to a particular class, absolute accuracy is the percent of all drugs in the class that are correctly predicted to be in that class. Absolute accuracy carries no emphasis for avoiding one type of error over another. However, in practice, it is well appreciated that type I and type II errors represent different categories of inaccuracy. The selection of a final model for each method was based on percent accuracy of the test set. The percent accuracy for each class was calculated using:

$$\text{Percent accuracy} = \frac{\text{True Predictions}}{\text{True Predictions} + \text{False Predictions}}$$
(2)
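As a worked check of Eq. 2, the sketch below computes the per-class percent accuracy from counts of true and false predictions. The function name is hypothetical, and the factor of 100 is made explicit because Eq. 2 as written yields a fraction. The example counts are the training-set results reported for the best RP model in the Results (40/60, 48/51, 31/42 and 4/12 correct for classes 1 to 4).

```python
def percent_accuracy(true_predictions: int, false_predictions: int) -> float:
    """Percent accuracy for one class, following Eq. 2 (scaled to percent)."""
    total = true_predictions + false_predictions
    if total == 0:
        raise ValueError("class contains no predictions")
    return 100.0 * true_predictions / total


# (correct, class size) per BDDCS class, as reported for the best RP model.
rp_training = {1: (40, 60), 2: (48, 51), 3: (31, 42), 4: (4, 12)}
for bddcs, (correct, size) in rp_training.items():
    accuracy = percent_accuracy(correct, size - correct)
    print(f"class {bddcs}: {accuracy:.1f}%")
```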

Generally, a type I error can be described as a “false positive” error and a type II error as a “false negative” error. In the present context, a type I error occurs when a predicted class is biopharmaceutically more favorable than the true class, and was denoted a “consumer risk.” To reflect this perspective in evaluating model predictive performance, a metric denoted “consumer’s accuracy” was developed. In a type II error, a predicted class is biopharmaceutically less favorable than the true class; an error of this type can be denoted a “producer risk”. To reflect this perspective, “producer’s accuracy” was devised.

RESULTS

BDDCS Classification of the Test Set

To validate and challenge the computational models generated from the Wu and Benet data set, a collection of 56 drugs was retrieved from the WHO EML that had not been listed previously under the original BDDCS list (1,6). Drug disposition data was available for 56 drugs after an extensive literature survey (Table I). Interestingly, the collection of 56 drugs extracted from WHO-EML contained 26 compounds that had been classified ambiguously (Table I, column 11) by Lindenberg et al. (5) under BCS. Furthermore, 31 of the drugs in this collection were previously unclassified (Table I, column 12) under BCS by Kasim et al. (4), while 4 drugs were classified ambiguously in this 2004 report. Due to the relatively large number of compounds shared between our test set collection and the compound list classified by Lindenberg and colleagues, we used their BCS classification for comparison against our present BDDCS classification (1,5). Importantly, some of the drugs displayed a shift in BCS class to a new BDDCS class. For example, doxycycline is classified as a BCS class 1 drug, but its drug disposition data, featuring >50% unchanged urinary clearance, and low aqueous solubility would indicate class 4 under BDDCS. This classification is consistent with observations made by Wu and Benet for BDDCS class 4 drugs, which are mainly eliminated, unchanged via biliary or renal routes (6). Eight drugs, namely acetylsalicylic acid, benznidazole, biperidine, methyldopa, nifurtimox, penicillamine, penicillin V and thiamine, that were classified previously as BCS class 3 drugs were reclassified as BDDCS class 1 drugs. As an example, benznidazole has high aqueous solubility at the dose administered, 96% bioavailability and extensive cytochrome P450 metabolism, clearly justifying its classification under BDDCS class 1 (21,22). Similar disposition characteristics rationalize the reclassification of the other seven therapeutics.

Glibenclamide, an oral antidiabetic drug with low solubility (0.01 mg/ml) at the dose administered (5 mg), has been categorized in BCS class 4. However, glibenclamide is a confirmed substrate for both P-gp (23) and cytochrome P450 (24) and should thus be classified within BDDCS class 2. On similar grounds, a BCS 4 to BDDCS 2 class shift can be justified for mercaptopurine (25,26), retinol palmitate (27) and sulfasalazine (28,29). Thus, after extensive literature referencing, the 56 test-set drugs from the BCS list were reclassified according to the BDDCS guidelines. This classification was then used to test the computational models that were built using 165 training set drugs obtained from the original BDDCS list (6). The resulting class distribution data for training and test set molecules (Table II) demonstrate a reasonably even compound allocation across classes 1–3; however, the percentage of molecules classified within class 4 in both data sets is less than 10%. This low frequency of class 4 compounds is likely to affect both model generation and predictive confidence. Therefore, caution should be used in interpreting the ensuing models with regard to class 4 compounds. The probability of randomly selecting a training set class 1, 2, 3 or 4 drug is 36.4% (60/165), 30.9% (51/165), 25.5% (42/165) and 7.3% (12/165), respectively. In contrast, the probability of randomly selecting a test set class 1, 2, 3 or 4 drug is 46.4% (26/56), 19.6% (11/56), 25.0% (14/56) and 8.9% (5/56), respectively.
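The random-selection baselines quoted above follow directly from the class counts. The short sketch below recomputes them in Python; the dictionary and function names are illustrative, while the counts are those given in the text.

```python
# Number of drugs per BDDCS class, as given in the text.
train_counts = {1: 60, 2: 51, 3: 42, 4: 12}
test_counts = {1: 26, 2: 11, 3: 14, 4: 5}


def class_percentages(counts):
    """Percent of drugs in each class, rounded to one decimal place."""
    total = sum(counts.values())
    return {cls: round(100.0 * n / total, 1) for cls, n in counts.items()}


train_pct = class_percentages(train_counts)
test_pct = class_percentages(test_counts)
print("training:", train_pct)
print("test:", test_pct)
```

These percentages serve as the reference against which a model's per-class accuracy can be compared, since a model that does no better than the class frequency adds nothing over random assignment.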

Table II Class Distribution of Drugs in Training and Test Sets

Model Generation

Recursive Partitioning

To guide splitting criteria and optimize decision tree induction a tenfold cross-validation was performed on the training set data. A total of 6 descriptive models were generated with similar average training class accuracy (67, 70.1, 65.7, 66.1, 68.2, and 68.2% respectively). As expected, prediction of class 4 molecules became the defining criterion for a successful model (Tables III and IV). In fact, only model 1, based on VolSurf descriptors and the number of hydrogen bond donor and acceptor atoms per molecule, was capable of designating appropriate node splitting criteria to determine class 4 compounds. The best model can correctly identify 66.7% (40/60) of the compounds in class 1, 94.1% (48/51) of the compounds in class 2, 73.8% (31/42) of the compounds in class 3, and 33.3% (4/12) of the compounds in class 4 (Table V). However, the average performance (33.1%) on the test set is unsatisfactory.

Table III Type I and Type II Errors in Class Prediction
Table IV Confusion Matrix for Training and Test Set using Different Machine Learning Methods
Table V Percent Accuracy of Training and Test Set Molecules

Nevertheless, some simple rules and criteria can be obtained from the decision tree in Fig. 2. The descriptors that are important for BDDCS classification are W1, W3, HB1 and HB7 resulting from an sp2-hybridized nitrogen probe containing one lone pair (N:=), W6 from an sp2 carboxy oxygen atom probe (O::), and W3 and W6 from a water probe (OH2). In general, W1 and W3 account for polarizability and dispersion forces within a molecule, whereas W6, HB1, and HB7 represent polar and hydrogen bond donor and acceptor regions (recorded with different probe atoms). HB1 and HB7 are calculated as the difference in hydrophilic volume between the water (OH2) probe and the (N:=) probe.

Fig. 2

Recursive Partitioning tree for model 1. W1.N., W3.N., HB1.N. and HB7.N. are descriptors arising from sp2 nitrogen with one lone pair probe; W6.O. from sp2 carboxy oxygen atom probe, W3.OH2 and W6.OH2 from water probe.

Random Forest

Two metrics were used as a measure of prediction accuracy for all models, namely the percent accuracy for predicting external test set molecules and the out-of-bag (OBB) estimate. Again, as in recursive partitioning, six random forest models were developed and the best model (model 3, based on ChemDraw and Molconn-Z descriptors) was selected based on OBB error rate convergence. The overall OBB error ranged from 35.1 to 42.4%. In general, the technique of “bagging” (i.e. m_try = p) did not improve the predictive ability of the models (as compared to models with m_try < p). Again, predictive ability is poor for class 4 drugs, as indicated by 25 and 0% accuracy for training and test set compounds, respectively. However, prediction accuracy for test set molecules in classes 1 and 3 is considerably better than that obtained with recursive partitioning, exhibiting a 1.5- to twofold increase over random selection.

Support Vector Machine

The level of training error tolerated is controlled by the parameter C, and models were generated using C values of 0.1, 0.5, 1, and 10 to 100 in increments of 10. The optimum value of C, 40, was determined using k-fold cross-validation studies on the training set data (k = 10). Models 1, 5 and 6 were able to correctly identify all the training set molecules in their respective classes (100% accuracy); however, based on prediction accuracy on the test set data, the performance of model 1 is superior to that of models 5 and 6, largely due to problematic prediction of class 4 compounds. Models 3–5 failed to predict class 4 compounds altogether in the test set (i.e. 0% accuracy), whereas models 1, 2 and 6 displayed 20% prediction accuracy (data not shown). Overall, the predictive performance of the SVM models is better than that of the RP and RF models (Table V). Despite its outstanding internal consistency, the test set prediction for class 1 is actually inferior to that of the other methods, with 15 out of 26 compounds misclassified as class 3. On the other hand, only 3 out of 14 compounds in class 3 were erroneously predicted as class 1. Support vector machines may, by the nature of the algorithm, overfit data within the training set, thereby incurring a performance penalty in predicting test set molecules.

To ascertain the distribution of molecules in descriptor space, a principal component analysis (PCA) was performed on descriptors from the best SVM model, i.e. the model including CD and VS descriptors. The PCA score plot provides an estimate of the descriptor space of training and test molecules (Fig. 3). The first three principal components of the training and test set explain 69.3 and 69.8% of the variance, respectively. Test set molecules such as salbutamol, clomifene, folic acid and cefaxime are outside the descriptor space of the training set molecules. Salbutamol, clomifene and folic acid are class 3 drugs and are accurately predicted by the SVM model. However, cefaxime is a class 4 drug predicted to be a class 3 compound by the SVM model. Consequently, PCA provides a convenient method for identifying outliers, i.e. molecules that are far removed from the training set descriptor space and whose predictions therefore carry lower confidence.

Fig. 3

Principal component analysis score plot for the training and test sets. Training and test set drugs are shown as open and filled symbols, respectively.

Consensus Analysis of Models

RF, RP, and SVM models can correctly predict 73.1% (19/26), 63.6% (7/11) and 78.6% (11/14) of the compounds in class 1, class 2 and class 3, respectively. Both RP and SVM models can be used for class 4 prediction; however, the accuracy is poor, possibly due to the limited number of class 4 molecules in the training and test sets. Along these lines, we investigated consensus modeling approaches in which combinations of different modeling methods are used (30). The consensus models resulted in a marked improvement in predicting class 2 and 4 test set drugs. The prediction accuracy using arithmetic mean models 1 and 3 for class 2 and 4 is 81.8 and 40%, respectively (results not shown). However, consensus modeling does not improve prediction accuracy across all classes: the prediction accuracy for class 3 is generally worse than for the individual models, whereas the prediction accuracy for class 2 and 4 drugs is higher than for the individual models, provided a combination of consensus models is used.

Validation Metrics

The three metrics, absolute accuracy, consumer’s accuracy and producer’s accuracy, were used to describe a model’s predictive performance. Percent accuracy is the percent of all drugs correctly predicted (Eq. 2). Absolute accuracy is the percent of all drugs in the class that are correctly predicted to be in that class. Table III identifies the occurrence of type I (false positive) and type II (false negative) errors, depending on true class. Consumer’s accuracy builds upon the absolute accuracy, but further penalizes type I errors. Correspondingly, consumer’s accuracy de-emphasizes type II errors and attenuates the impact of such errors to protect the interest of the consumer. Analogous to consumer’s accuracy, producer’s accuracy attempts to consider a particular viewpoint in assessing model predictions. Consumer’s accuracy emphasizes avoidance of “false positives,” whereas producer’s accuracy emphasizes avoidance of “false negatives”. Each metric partially de-emphasizes the type of error that it is not focused on. For any real dataset of predictions, consumer’s accuracy and producer’s accuracy can be expected to differ, since type I and type II error rates are generally different. Consumer’s accuracy and producer’s accuracy will span absolute accuracy, except in the case where type I and II error rates are identical, where all three metrics will be identical. In general, a drug producer and consumer may have differing needs for accuracy in the prediction of BDDCS classifications. The best SVM model in the present study illustrates that consumer’s accuracy for the test set is lower than its corresponding producer’s accuracy (Table VI), thereby leading to a higher risk of misclassification for the consumer.

Table VI Percent Consumer’s and Producer’s Accuracy of each of the Computational Models Used to Assign BDDCS Class

DISCUSSION

The BCS has been a helpful guide to classify compounds based on their aqueous solubility and gastrointestinal permeability (31). Wu and Benet (6) emphasize that the clinical impact of efflux transporters in modulating oral absorption and drug pharmacokinetics is most applicable to class 2, and possibly class 4 compounds. For example, high permeability allows facile cellular penetration for class 2 compounds, but low solubility (perhaps mainly due to high lipophilicity) will limit the effective concentration entering the cell, thereby preventing saturation of efflux transporters. Consequently, efflux transport can affect class 2 compounds’ extent of oral bioavailability and their rate of absorption (6). Thus, classification of compounds according to BDDCS guidelines may allow for a scientific basis towards their observed clinical behavior as a result of their interactions with P-gp and other efflux transporters. This, in turn, allows for a deeper understanding of their pharmacokinetic behavior and their potential for drug–drug interactions. Identifying molecules that interact with efflux transporters is important for drug discovery but is also generally reliant on time consuming in vitro and in vivo studies. However, computational models are now available to assist in this process, as we have recently shown for rapidly retrieving substrates or inhibitors for P-gp from commercial databases with in vitro validation (32).

The goal of the current study was to (1) investigate computational methods to produce predictive models; (2) automatically and rapidly classify compounds into BDDCS classes; and (3) use physicochemical properties derived from molecular descriptors alone. In applying these algorithms one can choose from a binary or a quaternary classification system with hard or soft class assignments. A binary system would independently assess the parameters solubility and metabolism for each compound and either uniquely bin the compounds in high/low categories (hard assignment) or use a gradated scale to plot the parameters; class assignment in the latter case would be determined by predefined criteria for low/high solubility and poor/extensive metabolism. The present study chose to apply a unified (quaternary) binning system with unambiguous (hard) class assignment. To accomplish this, we used machine learning methods as these algorithms have been widely used and validated with large datasets and are exceptionally suited to identify important properties and molecular descriptors from diverse arrays of data. In this study, we have captured a range of applicable descriptors for physicochemical properties, including easily interpretable descriptors determined by widely available chemical drawing software (e.g. ChemDraw) or web-based tools (e.g. PubChem or ChemSpider), as well as complex descriptor sets from commercial vendors such as MolconnZ (18) and VolSurf (17).

Among the best models in the current study, SVM model 1 showed the highest level of training and test set prediction accuracy. Interestingly, ChemDraw and VolSurf descriptors are important for classifying class 2, 3 and 4 drugs, whereas a combination of ChemDraw and MolconnZ descriptors is useful for class 1 predictions. Thus, a combination of 2D and 3D descriptors is important for class 2, 3 and 4 drugs, whereas 2D descriptors alone are relevant for class 1 predictions. These observations suggest that combinations of both models and descriptors may be necessary for optimal prediction of the different BDDCS classes.

Since this is the first report on predictive model development for BDDCS, a direct comparison with previously established computational models is not possible; however, we believe that a critical evaluation with respect to earlier BCS models, especially the study by Bergström et al. (33,34), is warranted. Although the overall prediction accuracies of the training and test sets in the Bergström study and in our models are comparable, there are several important differences that should be highlighted: (1) the Bergström study encompasses a very small number of compounds, presumably covering a limited chemical space; (2) PTSA is a conformation-dependent property requiring a 3D structure for calculations, thereby reducing the portability of the models between independent laboratories; and (3) solubility and permeability parameters are independently calculated. Interestingly, in the current study, models that included PSA or cLogP as a descriptor generally underperformed compared with models using alternative descriptors. This is in marked contrast to the work by Bergström et al. (34), who determined that PTSA could satisfy both drug solubility and permeability for BCS calculations. In agreement with our present data, however, they determined that cLogP could be excluded as a descriptor without the model losing predictive power. In fact, they found that the molecular surface areas alone contained sufficient information regarding lipophilicity. Analogous to this, it is likely that MolconnZ and VolSurf descriptors alone sufficiently capture lipophilicity to render cLogP redundant as a separate descriptor.

It is important to point out that most difficulty was encountered in predicting class 4 compounds. This was not entirely unexpected, since the number of training set molecules for class 4 compounds is small compared to the other classes. Inevitably, the models were trained primarily for classes 1–3 even when weighting is applied, leading to a disproportionate bias to predicting these classes. However, this may not be a major concern for the pharmaceutical industry because generally the number of compounds under development in classes 3 or 4 is low; for example, fewer than 10% of current compounds in the drug discovery pipeline of GlaxoSmithKline fall within BCS class 3 or 4 (35). Additionally, the bias towards accurately predicting class 1 and 2 compounds can be viewed as favorable in that these drugs may encounter fewer issues during the subsequent drug development process.

Options for improving the confidence in the computational models include the application of a combination of models. Additional computational methods such as k-Nearest Neighbor (kNN), Kohonen and Sammon mapping could be evaluated in the future alongside additional molecular descriptors and an enlarged training and test set with more examples of class 4 compounds. The utilization of Tanimoto similarity, PCA or other graphical mapping tools to assess the distance of a test set molecule from the training set will also aid in improving the confidence in predictions. We also foresee availability of these models to a global audience using web-based applications or their integration into existing database tools.
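As a sketch of how Tanimoto similarity could be used to gauge the distance of a test molecule from the training set, the Python example below treats each molecule as a set of fingerprint feature indices and reports the highest similarity to any training compound. The fingerprints shown are hypothetical placeholders; in practice they would come from a cheminformatics toolkit, and the cut-off below which a prediction is considered unreliable would have to be chosen empirically.

```python
def tanimoto(a, b):
    """Tanimoto coefficient |A ∩ B| / |A ∪ B| between two feature sets."""
    a, b = set(a), set(b)
    union = a | b
    if not union:
        return 0.0  # assumed convention for two empty fingerprints
    return len(a & b) / len(union)


def max_similarity_to_training(query, training):
    """Highest Tanimoto similarity between a query and any training compound."""
    return max(tanimoto(query, fingerprint) for fingerprint in training)


# Hypothetical fingerprints given as sets of "on" bit positions.
training_fps = [{1, 2, 3, 4}, {2, 3, 5}]
query_fp = {2, 3, 4}
print(max_similarity_to_training(query_fp, training_fps))
```

A low maximum similarity would flag a molecule as lying outside the chemical space covered by the training set, in the same spirit as the PCA outliers identified in the Results.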

In summary, the present study represents a new development for rapidly assigning drugs to BDDCS classifications, providing useful additional insight into bioavailability aspects of a drug. This could have significant application in the drug discovery field to a priori identify molecules that may have future developability issues.