Introduction

Pharmacogenetics is a recent field of research that focuses on investigating the variability in drug effects due to genetic factors. Genetic variation occurs at many levels: drug absorption, distribution and metabolism, receptors for drug action and drug elimination. Single nucleotide polymorphisms (SNPs) have been identified which induce modifications in the pharmacokinetics (drug course through the body) or pharmacodynamics (drug efficacy and safety) of a given drug. SNPs have also been shown to modify bioavailability [1, 4] and decrease excretion [25] – sometimes inducing severe toxicity [10] – and they have been linked to drug efficacy [6, 13]. Thus, pharmacogenetics is the next step to providing individualised treatments.

Studies including pharmacogenetic data have become more numerous over the last few years. In an overwhelming majority of these studies, non-compartmental analysis (NCA) has been used to compare pharmacokinetic measurements, such as AUC (area under the curve) or maximum concentration between groups. This technique requires a large number of sampling points for every subject. On the other hand, modelling approaches can take advantage of sparse individual designs and can be used in patients with routine clinical data [26]; these more sophisticated approaches, however, are seldom used. One issue with these approaches is the method used for covariate selection and hypothesis testing, since detecting a gene effect can be thought of as a model selection problem. A wide variety of approaches have been proposed. The mainstream method consists in stepwise selection [17, 23], possibly following prior screening of relevant candidate covariates. The criterion for model selection is usually the likelihood ratio test (LRT), which is widely used to compare nested mixed-effect models. Tests assessing the statistical significance of the final parameters in the final model, such as the Wald test, can also be used as a selection criterion [26]. Other criteria can be used in model selection, such as the Akaike (AIC) or the Bayesian Information Criterion (BIC) [22]. Regardless of the method used, the clinical relevance is sometimes also assessed by examining the magnitude of the effects found.

In a previous paper, Verstuyft et al. estimated the AUC of digoxin, a probe for the activity of P-glycoprotein, in healthy volunteers using non-compartmental analysis and showed an increase in subjects carrying the TT genotype for the C3435T polymorphism of multi-drug-resistance 1 (MDR1) [38]. The objective of the present paper was to reanalyse the data in the investigation of Verstuyft et al. by a modelling approach, using three covariate model selection methods: (1) likelihood ratio tests; (2) backwards selection from a full model using Wald tests, which take into account potential correlations between covariates; (3) model selection using the BIC, which considers all the potential models. A related problem in covariate selection is that the false positive rate (type I error) of the tests has been shown to increase when the estimation methods rely on linear approximations to the likelihood [7, 40]. In the present work, we therefore use a recent estimation method (EM), the stochastic EM algorithm SAEM [20]. Although the three methods can be applied together with other estimation algorithms, SAEM allows the estimation of the likelihood without approximation, via stochastic simulation, and has been shown to have better statistical properties than linearised methods [33].

Materials and methods

Data

Pharmacokinetic data was collected from 32 healthy volunteers included in three pharmacokinetic interaction studies dealing with oral digoxin [38]. Seven subjects participated in a macrogol-digoxin interaction study [30], 12 in a grapefruit juice-digoxin interaction study [2] and 13 in a dipyridamole-digoxin interaction study [39]. The three studies were performed in accordance with the Declaration of Helsinki and its amendments. Protocols were approved by the Ethics Committee of the Pitié-Salpêtrière Hospital (CCPPRB), Paris, France, and written informed consent was obtained from all subjects. The three studies took place in the same clinical unit under the supervision of the same research team.

All subjects received a 0.5 mg oral dose of digoxin with a glass of water after an overnight fast. Pharmacokinetic samples were obtained at times 15, 30 and 45 min, and 1, 1.5, 2, 4, 6, 8, 12, 24 and 48 h after the dose for two of the studies [2, 39]. For the last study [30], samples were taken at 15, 30 and 45 min, and 1, 1.5, 2, 2.5, 3, 4, 6, 9, 12, 16, 24, 48 and 72 h.

The three studies included 23 men and 9 women, with a mean age of 25.8 ± 5.2 years (range: 19–35 years). The patients were genotyped for two MDR1 polymorphisms, the C3435T polymorphism in exon 26 and the G2677T/A polymorphism in exon 21. In the study of Verstuyft et al. [39], patients were genotyped prior to inclusion to balance the genotypes for the C3435T polymorphism, while in the two other studies, genotyping was performed after inclusion. As a result, the genotypes of the 32 patients for this polymorphism included 10 TT (mutant homozygotes, 31%), 8 CT (heterozygotes, 25%) and 14 CC (wild-type homozygotes, 44%). G2677T/A genotyping revealed 12 GG (38%), 11 GT (34%), 7 TT (22%), 1 GA (3%) and one AA (3%) subjects, with a linkage disequilibrium between the two polymorphisms (Somer’s D′ = 0.72).

Digoxin was measured using a modified enzyme multiplied digoxin immunoassay (EMIT 2000; Dade Behring, Cupertino, Calif.), with a quantification limit of 0.1 ng/ml. The MDR1 C3435T and G2677T/A genotypes were determined by TaqMan allelic discrimination. More details on the analytical methods can be found in Verstuyft et al. [37, 38].

Statistical methods

Pharmacokinetic model

The pharmacokinetics of digoxin were described using a two-compartment model [15] with first-order absorption and elimination, and an absorption time-lag, using the analytical form of the model. We assumed a proportional variance model for the residual error. This model included six parameters: ka, kel, Vc/F, Tlag and the two transfer rate constants, k1,2 and k2,1. Interindividual variability was estimated for the first four parameters, with no covariance between them (diagonal variance matrix Ω).

Denoting f to be the function describing this model, the statistical model for concentration y I j in individual I at time t I j is:

$$ y_{{i\,j}} = f{\left( {\theta _{i} ,t_{{i\,j}} } \right)} + \varepsilon _{{i\,j}} $$
(1)

where θ i denotes the vector of parameters for individual i, and its components are assumed to follow a log-normal distribution:

$$ \theta _{i} = \theta _{0} e^{{\eta _{i} }} $$
(2)

where η i (0, Ω) is the vector of individual random effects.

The residual errors ɛ i j are assumed to be independent, with distribution (0, \( \sigma ^{2}_{{i\,j}} \)), where the variance of the error is modelled using a proportional error model: \( \sigma ^{2}_{{i\,j}} = \sigma ^{2} \,f{\left( {\theta _{i} ,t_{{i\,j}} } \right)}^{2} \).

The model for covariate effect describes the relationship between the individual pharmacokinetic parameters and a given covariate. The effect of polymorphism in exon 26 on a component θ(k) of the vector of parameters θ was modelled as:

$$ \theta ^{{{\left( k \right)}}}_{i} = \theta ^{{{\left( k \right)}}}_{0} {\left( {1 + \beta ^{{{\left( k \right)}}}_{{{\text{CT}}}} } \right)}^{{CT}} {\left( {1 + \beta ^{{{\left( k \right)}}}_{{{\text{TT}}}} } \right)}^{{TT}} e^{{\eta _{i} }} $$
(3)

Thus, the expected value of \( \theta ^{{{\left( k \right)}}}_{i} \) is \( \theta ^{{{\left( k \right)}}}_{0} \) for subjects with genotype CC, \( \theta ^{{{\left( k \right)}}}_{0} {\left( {1 + \beta ^{{{\left( k \right)}}}_{{{\text{CT}}}} } \right)} \) for subjects with genotype CT and \( \theta ^{{{\left( k \right)}}}_{0} {\left( {1 + \beta ^{{{\left( k \right)}}}_{{{\text{TT}}}} } \right)} \) for subjects with genotype TT. This model was used for the four parameters with variability (ka, kel, Vc/F, Tlag). In the following, we will drop the superscript (k) for simplicity. For each parameter in the model, there are five possible models for the gene-parameter relationship: the full model with three classes as in Eq. 3 (denoted H1 in the following), three intermediate models with two classes that we denote H0a :{β CT  = 0}, H0b :{β TT  = 0} and H0c :\( {\left\{ {\beta _{{CT}} - \beta _{{TT}} = 0} \right\}} \) and the model with no gene effect H0:{β CT  = β TT  = 0}. In the following, we first illustrate the three covariate selection approaches using the polymorphism in exon 26, then we apply these methods considering all the available covariates.

Backward covariate selection using the Wald test

One approach to selecting the covariate model is to estimate the parameters of a full model and perform a significance test using the Wald statistics to select which parameters should be kept in the model [26]. The advantage of this method is that model selection is performed in one step and that interactions between covariates are taken into account in the estimation of the parameters. Given the model described in Eq. 3, we test if the three parameters β CT , β TT and (β CT  − β TT ) are significantly different from zero by comparing the corresponding Wald statistics to the critical value of a χ2 with 1 df.

A screening step is often performed to eliminate candidate covariates that have a very small probability of influencing the parameters, thereby improving the estimation of the remaining parameters in the model. We choose an arbitrary value of 0.25 as the significance threshold, and we eliminate the covariates for which the p-values of the three tests corresponding to the three null hypotheses H0a , H0b and H0c are higher than 0.25. This yields a simplified model in which some parameters are modelled according to model H1 and some parameters are the same regardless of the genotype. This step eliminates from the model those relationships that are totally irrelevant and increases the precision of estimation of the other, possibly meaningful, parameters.

In the next stage, we estimate once again the parameters and their standard errors using the simplified model. For each parameter modelled using H1, the p-values of the three Wald tests are used to select the appropriate relationship, after correction for multiple tests by applying the Simes procedure [3, 35]. This method allows control of the family-wise error rate for the three simultaneous tests performed. For a given parameter, the final model for the gene-parameter relationship depends on which hypotheses are rejected. For example, if H0a and H0c are simultaneously rejected for a parameter, Eq. 3 simplifies to:

$$ \theta _{I} = \theta _{0} {\left( {1 + \beta _{{{\text{CT}}}} } \right)}^{{CT}} e^{{\eta _{I} }} $$
(4)

This procedure leads to the final model.

Forward covariate selection using the log-LRT

Convergence problems and non-identifiability may occur when trying to estimate the parameters of a full model with many covariates. The alternative is to build the model using forward selection. Different forms of this approach are used in most studies using nonlinear mixed-effect models [17, 23].

For forward selection, we start from a model without covariates (basic model) and compute empirical Bayes estimates (EBE) of the individual parameters. One-way analysis of variance (ANOVA) is used to test for a difference between the three genotypes for each parameter [23]. As previously, we begin with a screening step: candidate relationships are selected as those where the p value of the ANOVA is less than 0.25. We then model the candidate relationships as in Eq. 3 one at a time, starting with the most significant according to the LRT. We stop when none of the remaining relationships provide a significant improvement in the model according to a LRT. We then test for all parameter-gene relationships the three submodels H0a , H0b and H0c using the LRT again, correcting the p values using the Simes procedure. The best model for the corresponding relationship is selected as in the previous strategy, based on the p-values for the three corresponding tests.

Covariate selection using the Bayesian Information Criterion

We compared the two previous selection methods with model selection using the BIC given by:

$$ BIC = - 2LL + P\log {\left( {n_{{tot}} } \right)} $$
(5)

where P is the number of parameters (fixed and variance) in the model and n tot is the total number of observations.

The best model is defined as the model with the lowest BIC. For model selection using the BIC, we also consider models close to the lowest BIC. From the definition of Bayes factor as a ratio of posterior to prior odds used in Bayesian model selection, Raftery shows that the strength of evidence of one model versus the other is limited when models are within 3 points of BIC, while a larger difference provides positive evidence [18, 29].

A practical problem is the number of models to test. For each parameter in the model, there are five possible models when considering the genotype for exon 26 alone. To test all possible combinations for the four parameters with variability would require generating and fitting 625 models. Although technically feasible here, this would soon become impractical with more covariates or more parameters; therefore, we propose a simplified approach. In a first step, for each parameter we keep the model with the lowest BIC, as well as models within 3 points of BIC to the lowest. The model without covariate (H0) is also added to this list of possible models. In a second step, we build combined models where the possible models for one parameter are combined with each of the models for the other parameters. We estimate the corresponding BIC, and the best model is selected is the one having the lowest BIC overall. Again, we also examine models with BIC close to the lowest value.

Estimation method

The parameters are estimated using maximum likelihood approaches. Because the regression function is nonlinear with respect to the random effects, the likelihood function has no closed form. The most commonly used estimation methods rely on approximations of the likelihood function through first-order Taylor expansions and have been implemented, for example, in the \( {\text{nlme}} \) package in R/Splus [27], and in the NONMEM software [34]. To avoid this approximation, Bayesian approaches have been proposed which integrate the likelihood using Monte-Carlo Markov chains (MCMC) [36]. An alternative approach is to consider random effects as missing data and to use the EM algorithm [9]. An algorithm called SAEM has been recently developed using the EM approach: stochastic approximation combined with MCMC methods to simulate the random effect in the E-step provides a convergent algorithm and consistent estimates of the population parameters [8]. This method has better statistical properties since no linearisation is involved in the computation of the likelihood and, hence, the statistical tests based on the results have better properties [20]. It has also been recently applied in two applications, the study the pharmacokinetics of saquinavir in HIV patients [21] and the modelling of the viral load decrease to compare two treatments in a clinical trial [32].

The SAEM algorithm is implemented in the MATLAB language in the software MONOLIX, available on the author’s website (http://www.math.u-psud.fr/~lavielle/monolix/logiciels.html). We used version 1.1 of MONOLIX, in a Linux environment (Red Hat 9.0; GNU Fortran compiler), with MATLAB version 7. The analysis of the results was handled using the R statistical and graphical environment [28]. MONOLIX provides an estimate of the parameters (fixed effects and variance of the random effects) as well as an estimate of the estimation error via the Fisher information matrix [20].

The likelihood is computed by an importance sampling procedure [31]. Since a good estimate of the log-likelihood was required to perform LRTs, we used the average of five successive estimations of the likelihood to obtain a more stable estimate.

Model building

The three strategies described above were applied to the digoxin data in terms of the exon 26 polymorphism. We then performed the same analysis for exon 21. For the G2677T/A polymorphism in exon 21, five different genotypes were found in the population (GG, GT, GA, TT and TA). We performed first an analysis taking them all into account, and second an analysis where we regrouped the mutant alleles, yielding three groups (group 1, GG; group 2, GT or GA; group 3. TT or TA). The influence of the polymorphism in exon 21 was analysed first independently from the results of the analysis, including exon 26, then including the model developed for exon 26 alone. We also considered the homozygous wild-type diplotype (combined genotype) CC-GG, combining the GG genotype at position 2677 in exon 21 and the CC genotype at position 3435 in exon 26, versus all other diplotypes. The functional haplotype has previously been shown to influence the area under the curve (AUC) of digoxin [16]. Other haplotype analyses were not performed since the number of subjects was too small. Finally, full covariate analysis was performed; the following covariates were available in the study in addition to gene effect: gender, age, weight, body mass index and smoking status. Renal function was not evaluated in these subjects.

We examined the following plots to evaluate the goodness-of-fit of the final model provided by each approach: scatterplots of predictions (population and individual) versus individual observations; population-weighted residuals versus predictions and versus independent variable (time); absolute individual-weighted residuals versus individual predictions. In addition, model validation was performed using prediction distribution errors [5], which are computed as the quantiles of the observations in the predicted distribution. The predicted distribution for each observation was obtained through 1000 simulations of the data set given the final model. The prediction distribution errors were decorrelated as proposed in Brendel et al. [5] to take into account the correlation induced by the multiple observations within one subject. If the model is adequate, the distribution of the prediction distribution errors is expected to follow a uniform distribution over the interval [0–1], and we used a Kolmogorov-Smirnov test to test this assumption.

Results

Backward covariate selection using the Wald test

A full model including the effect of exon 26 genotype on all parameters was fit. The volume of distribution was the only parameter for which at least one of the p values of the Wald tests for the gene effect was lower than 0.25. The results are shown in Fig. 1: for each parameter, we show the estimate of β CT , β TT and the difference β CT  − β TT as well as the corresponding confidence interval. The horizontal line represents the expected value of 0 in the absence of effect. As seen from this figure, only β TT and β CT  − β TT for parameter Vc/F were found to be significantly different from zero using Wald tests.

Fig. 1
figure 1

Estimates of the genetic fixed effect for the different parameters in the model

The model was then re-run with only Vc/F, yielding the following estimates for the gene effects: β CT  = 0.065 (NS), β TT  = −0.164 (p < 0.01), β CT   − β TT   = 0.229 (p < 0.02). A final model was therefore run, including only a different Vc/F for TT subjects.

Forward covariate selection using the LRT

Figure 2 displays the empirical Bayes estimates of the four parameters with intra-individual variability (ka, kel, Vc/F and Tlag), separated according to the genotype for exon 26. As with the Wald test, only Vc/F was found to have a significant relationship with the MDR1 polymorphism on exon 26 (p < 0.017 according to the ANOVA), with the other three tests yielding p values larger than 0.4. Inclusion of the full gene effect in the model for Vc/F led to an improvement in the model (p = 0.007 according to a LRT, df = 2).

Fig. 2
figure 2

Empirical Bayes estimates of ka, kel, Vc/F and Tlag with the model without covariate

In the next and final stage, we then tested the three submodels versus H1 using LRT; this yielded the following p values: p = 0.003 for H0a  = {β CT  = 0}, p = 0.29 for H0b  = {β TT  = 0}, and p = 0.049 for \( {\text{H}}_{{0c}} = {\left\{ {\beta _{{CT}} - \beta _{{TT}} = 0} \right\}} \). Using the Simes procedure, the final model selected was the model where TT subjects have different Vc/F from the two other groups. For the effect of exon 26 polymorphism, the model selected by this strategy was therefore the same as that for the selection based on Wald tests.

Covariate selection using the Bayesian Information Criterion

The selection for each parameter separately yielded the following results: for Vc/F, the best model was a model with a different population mean for the TT subjects; for the other parameters, the best model was a model without covariates, and there was no model within 3 points of BIC of the lowest model. The results are illustrated in Fig. 3, which shows the BIC of the five models tested for each parameter. For each parameter, the model with the lowest BIC is shown as a full circle.

Fig. 3
figure 3

Bayesian Information Criterion (BIC) for the five models tested for each parameter: basic model (no gene effect); submodels with βCT = βTT, βCC = βTT or βCC = βCT; full model with βCC and βTT

The models were then combined, and again, the best model overall was here the model with a different population mean for Vc/F in TT subjects.

Final model

For the analysis of exon 26 alone, the three methods led to the same final model, a model where the carriers of the TT genotype have a different population mean for Vc/F.

The same analyses were done considering the genotype for exon 21. We found no significant parameter-genotype relationship when considering the five genotype groups for exon 21, but some genotypes were present in few subjects, suggesting a lack of power. When we regrouped the subjects into three groups according to the number of mutant alleles, the estimate of the volume of distribution was slightly lower in group 3 (TT or TA) versus the other groups (p < 0.04). However, when the model for exon 26 was taken into account, this relationship disappeared, showing that the difference is accounted for by exon 26 because of the strong linkage between the two exons (Somers D′ = 0.72). The three approaches presented above therefore gave the same results for the selection of genetic covariates in the model.

In addition, 11 subjects carried the CC-GG diplotype, and a slight difference was found between the estimates of the volume of distribution for these patients when using a Wald test (p = 0.045). However, the two other methods (BIC and EBE) did not pick this difference up.

In the final step, we then added the other covariates to the model. Because of the strong linkage disequilibrium between the two exons, the full covariate model included only exon 26. Using the Wald test approach, the final model included a different population mean for Vc/F in TT subjects, as previously, as well as a smaller absorption time-lag in women. Using the LRT approach, we also found a small increase in Tlag (2.5%) for smoking patients, but the size of the effect was not clinically significant and thus the final model was the same as with the Wald approach. The BIC approach was not implemented for the full covariate selection because of time constraints.

The parameter estimates and estimates of the standard errors are given in Table 1. The parameters were all well estimated, with standard errors lower than 20% except for the two covariate effects, for which it was less than 40%. The residual (intra-individual) error was also small (17%). In this model, subjects with the TT genotype have a volume of distribution lower by 17% relative to carriers of at least a C allele, and women have a 54% shorter absorption time relative to men. The within-subject variability was largest for the absorption rate constant ka.

Table 1 Estimates of the population pharmacokinetic parameters for the final model

A plot of the concentrations of digoxin as a function of time for the three genotypes for exon 26 is shown in Fig. 4. Overlayed is the corresponding population predictions for the group. Diagnostic graphs for this model are shown in Fig. 5. The two upper graphs show the population (left) and individual (right) predictions, respectively, versus observed concentrations. The two bottom graphs show the individual predictions for the first two subjects in the dataset. The graphs show a satisfactory fit, and the absorption phase is well described. A slight underestimation can be seen around 24 h, as the model does not capture a small rebound at that time. We tested two alternative models, one with a double absorption phase and one assuming enterohepatic recycling, but both encountered numerical difficulties and unphysiological estimates, and the bias in the model was not improved. Therefore, the two-compartment model was kept. We performed model validation for the final model; using prediction discrepancies, we did not reject the hypothesis that the data observed could have been obtained under the model (NS, p = 0.49) and considered the model to be adequately qualified.

Fig. 4
figure 4

Concentration versus time data for digoxin, for the three genotype classes for exon 26 polymorphism (in log-scale). Overlayed is the line corresponding to the predictions using the population parameters in each group, for men

Fig. 5
figure 5

Goodness-of-fit plots for the final model. Top Population predicted concentrations versus observed concentrations (left); individual predicted concentrations versus observed concentrations (right). Bottom Predicted concentrations (line) overlayed on observed concentrations (dots) for the first man (left) and the first woman (right) in the dataset

Discussion

With the recent availability of cheaper genotyping methods, it is now possible to collect genetic information related to drug transporters, metabolic complexes or receptor structure on a routine basis in clinical trials or before a patient is given a new treatment. In pharmacokinetics and pharmacodynamics, the time course of drug concentrations or effects is described using models with a small number of parameters, and pharmacogenetic data is being increasingly used to characterise their variability. There are now reports of pharmacogenetics studies for a large variety of drug classes, confirming the widespread interest and potential applications of pharmacogenetics.

The statistical analysis in these studies, however, is usually limited to using non-compartmental approaches to study the influence of genotype on AUC, apparent clearance or maximum or trough concentration. Only a few papers have reported the use of more sophisticated methods, such as mixed-effect models or Bayesian analysis, despite the fact that these approaches can be more informative. These latter methods also can take advantage of sparse designs, which could be useful when designing studies for screening genetic factors or for use during therapeutic monitoring. We present here the first pharmacokinetic population model for digoxin that includes a pharmacogenetics analysis.

In this study, we used three different methods to explore the relationships between the pharmacokinetic parameters and the genetic covariates: forward stepwise selection, Wald test-based selection and criteria-based selection. Although in the present application, these led to the same final model, the three methods all have different characteristics and strengths.

The tests for the three approaches are asymptotic; that is, they assume that the number of subjects as well as the number of points per subject is large enough. All three methods require good estimates of the likelihood, and the Wald test additionally requires good estimates of the standard errors. The only approximation in the computation of the estimated standard errors of estimation involved in SAEM lies in the asymptotic approximation applied to the finite dataset [20], so that we expect better statistical properties of the tests based on estimates obtained by SAEM relative to more traditional methods based on first-order linearisation such as are implemented in NONMEM [34] or in the library \( {\text{nlme}} \) for R [28]. Indeed, the standard errors of estimation of the parameters estimated using the SAEM software have been shown to be accurately predicted [33].

Genetic covariates (genotypes or haplotypes) are usually modelled as categorical covariates, except for some genes such as CYP2D6 where a numeric variable representing the number of mutant alleles has been used as the genetic covariate [19]. Categorical covariates bring specific challenges. We need to estimate one parameter for each possible genotype and the number of possible covariate models increases exponentially with the number of genotypes. Also, the dataset is often unbalanced, with sometimes a very small number of patients for the rarer genotypes, which can generate problems for parameter estimation.

Given these specific challenges, the capability of being able to select the covariate model in one step with the Wald test is appealing, and has been proposed by Panhard [26]. All the potential relationships are included in the model, and a simultaneous estimation of the significance of all the parameters is provided. This approach could be most interesting in sparse data settings where the empirical Bayes estimates (EBE) do not contain as much information as they do in our example where the pharmacokinetic sampling was rich. The three methods described above can be applied regardless of the estimation method, and have been used in NONMEM [17] and \( {\text{nlme}} \) [26]. Using the new algorithm SAEM, we can obtain good estimates of the parameters and their estimation error, which allows us to select the covariate model by backwards deletion from a full model. Compared to the two other methods, the Wald test requires an additional assumption in that the confidence interval for the estimated parameters is assumed to be symmetrical, which makes it less robust than the LRT.

The LRT, by contrast, does not require any additional hypothesis beyond that of the asymptotic. Stepwise inclusion is therefore the main method used for covariate model selection in pharmacokinetics/pharmacodynamics models. However, it suffers from a number of known problems: inflation of type I error due to multiple testing during the building process, selection bias, collinear variables and no guarantee that the final model selected using these methods is the correct model [41]. Inflation of the type I error is also inherent to the first-order linearisation of the log-likelihood used by NONMEM [7], while the stochastic approximation of the log-likelihood performed by SAEM retains a type I error closer to the nominal value, as shown in simulation studies [14]. Variants of stepwise methods include building generalised additive models using the empirical Bayes estimates of a model without covariate [24]; however, these do not address the issues mentioned above. An interesting combination of the LRT approach and the Wald approach could be outlined as follows: first, build a full model with all potential covariates included and keep as candidate covariates those for which the Wald test is significant; finally, build the covariate model using LRT-based forward or backward selection from these candidate covariates. This would reduce the number of models to test in the selection while allowing for a combination of covariates to enter the model.

Finally, the advantage of criterion-based strategies lies in their systematic exploration of all possible models. The use of model selection criteria such as AIC or BIC is more frequent in the Bayesian literature [29] and has a solid theoretical background in information theory. In practice, however, AIC often proves to be anti-conservative and has been shown to be non-consistent [42], and here we use the BIC. Criterion-based strategies have two main drawbacks. The first drawback is that the number of possible models increases exponentially with the number of covariates, although we can simplify the number of possible models by considering prior physiological knowledge to eliminate unlikely parameter-genotype relationships. The second drawback is that there is no formal test of the relative performance of two models. Kass and Raftery propose using the difference in BIC as a measure of the strength of evidence of one model versus another [18], but one can be left with several competing models of similar strength using that approach.

In summary, despite known problems that we have discussed here, stepwise selection strategies are less computationally cumbersome than criteria-based selection, while being more robust to poor estimations of the standard errors than selection based on the Wald test. However, it can be useful to explore candidate relationships using this last method, especially in the presence of a large number of covariates, because, as shown here, it can provide reliable estimates in one step and because effects due to a combination of several covariates may be missed by stepwise approaches.

The strategies outlined in this work can be used for all types of covariates (demographic data, clinical characteristics, biological measurements, among others) as well as for building the structural model.

Our main finding, the difference in volume of distribution found for TT subjects, explains the higher AUC observed for these subjects in the previous non-compartmental analysis performed using this data [38]. It can be interpreted as a higher bioavailability in TT subjects relative to CC or CT subjects. This result should be confirmed in patients receiving digoxin and probably does not warrant dose adjustment for digoxin, especially considering the high variability in absorption. A possible exception would be to adjust dosage in certain populations, such as elderly patients or patients receiving other co-medications. The proportion of digoxin-treated patients experiencing therapeutic drug monitoring has been shown to increase with the number of PgP inhibitors received [11], which could make it useful to determine the genotype governing PgP activity [12].

In conclusion, we modelled the pharmacokinetics of digoxin, including pharmacogenetic data, using nonlinear mixed-effect models. Our main finding was that carriers of the TT genotype for the C3435T polymorphism in exon 26 of the MDR-1 gene have a lower apparent volume of distribution. Several methods can be used to test for genetic effects. In addition to the usual stepwise selection method, we recommend using the Wald test to screen candidate covariates.