Introduction

Meropenem (MERO) is a broad-spectrum antibiotic of the carbapenem class, active against both Gram-positive and Gram-negative bacteria, that is widely used in critically ill patients [1]. The antimicrobial activity of MERO is time-dependent, meaning that its efficacy is related to the fraction of the dosing interval during which the free plasma concentration is above the minimum inhibitory concentration (fT>MIC) [2]. Recent recommendations on the optimization of beta-lactam treatment in intensive care unit (ICU) patients include the use of continuous or prolonged infusion in order to maximize clinical efficacy, with a target attainment of 100% fT>4xMIC [3]. Indeed, suboptimal drug exposure can lead to mortality and, possibly, the emergence of antibiotic resistance, while overexposure may be neurotoxic [4].
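As an illustration of the fT>MIC index, the following minimal Python sketch computes the fraction of a sampled interval during which the concentration exceeds a threshold. It assumes linear interpolation between samples; the function name, the concentration values, and the MIC are hypothetical and are not taken from the study data.

```python
def fraction_time_above(times_h, concs_mg_l, threshold_mg_l):
    """Fraction of the sampled interval spent above a concentration threshold.

    Concentrations are linearly interpolated between consecutive samples.
    """
    if len(times_h) != len(concs_mg_l) or len(times_h) < 2:
        raise ValueError("need at least two (time, concentration) pairs")
    time_above = 0.0
    for i in range(len(times_h) - 1):
        t0, t1 = times_h[i], times_h[i + 1]
        c0, c1 = concs_mg_l[i], concs_mg_l[i + 1]
        dt = t1 - t0
        if c0 > threshold_mg_l and c1 > threshold_mg_l:
            # Whole segment is above the threshold.
            time_above += dt
        elif c0 > threshold_mg_l:
            # Profile falls through the threshold inside the segment.
            time_above += dt * (c0 - threshold_mg_l) / (c0 - c1)
        elif c1 > threshold_mg_l:
            # Profile rises through the threshold inside the segment.
            time_above += dt * (c1 - threshold_mg_l) / (c1 - c0)
    return time_above / (times_h[-1] - times_h[0])


# Hypothetical free concentrations (mg/L) over an 8 h dosing interval.
times = [0.0, 2.0, 4.0, 6.0, 8.0]
concs = [40.0, 20.0, 10.0, 5.0, 2.5]
mic = 2.0  # hypothetical MIC in mg/L

ft_above_mic = fraction_time_above(times, concs, mic)
ft_above_4xmic = fraction_time_above(times, concs, 4 * mic)
print(f"fT>MIC = {ft_above_mic:.0%}, fT>4xMIC = {ft_above_4xmic:.0%}")
```

In this hypothetical profile the concentration stays above the MIC for the whole interval but above 4xMIC for only part of it, which is the situation that continuous or prolonged infusion is intended to avoid.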

Population pharmacokinetic (PK) analyses develop mathematical models from a specific patient group; these models describe drug concentrations in various populations and allow drug-dosing regimens to be optimized [5]. Multiple population PK models have been developed during the last few years. However, these models often have not undergone an external evaluation, although this is recommended before a model is applied in a clinical setting [6]. An external evaluation uses an independent dataset to assess the accuracy of an existing model, that is, how well it predicts concentrations in patients with characteristics similar to those of the population in which the model was developed [7]. Knowing whether a model will fit a given population of ICU patients is a challenge. Moreover, studies have shown that patients admitted to intensive care units (ICUs) display large PK variability, primarily due to altered pathophysiology, compared with healthy individuals or non-critically ill patients [8]. As a result, dose optimization is particularly difficult in these patients. Population PK models can be used in dosing simulations to predict the probability of target attainment of different dosing regimens, which is useful for clinical decision-making.

The aim of this study was to perform an external evaluation of published population PK models of MERO in order to test their predictive performance in a cohort of ICU patients.

Material and methods

Patients

This study was a prospective, multicenter observational study performed between October 2015 and May 2017 in 2 ICUs (multidisciplinary critical care (20 beds) and cardiovascular critical care (20 beds)) at Timone University Hospital, Marseille, France. Ethical approval was obtained from the Institutional Review Board (Comité de Protection des Personnes Sud-Est I, Saint-Etienne, France, protocol n° 2017-A01446-47), which waived the need for written consent. Patients or their next of kin were orally informed of the goal and design of the study. The antimicrobial agent assays were performed by the university-affiliated pharmacological laboratory (Department of Clinical Pharmacology and Pharmacovigilance of Timone University Hospital, Marseille, France). Blood samples were collected from patients who received a standard loading dose of 2 g of MERO followed by a maintenance dose of 4 g as a continuous infusion. The maintenance dose was adjusted in case of impaired renal function. Therapeutic drug monitoring was performed on day 1, day 4, day 7, and at the end of treatment, and the dosage was modified at the clinician's discretion to optimize treatment. Samples were transported to the pharmacological laboratory within 5 h and were centrifuged for 10 min at 3000 g and 4 °C. If necessary, plasma samples were stored at −20 °C until analysis, and MES buffer (4-morpholineethanesulfonic acid, 1 M) was added to plasma (1/10). The assay protocol was adapted from the method of Verdier et al. [9] and validated according to the EMA guidelines [10]. Briefly, before high-performance liquid chromatography (HPLC) analysis (Dionex Ultimate 3000 HPLC, ultraviolet detection), sample preparation consisted of protein precipitation using acetonitrile followed by liquid-liquid extraction with dichloromethane. MIAA (5-methoxyindole-3-acetic acid) was chosen as the internal standard. The limit of quantification was 0.5 μg/ml and the upper limit of linearity was validated at 50 μg/ml. When higher concentrations were observed, the sample was diluted before reanalysis.

Published models

The literature review included in this study was conducted according to the PRISMA statement for reporting systematic reviews and meta-analyses [11] (Fig. 1). A literature search of the PubMed/Medline database was performed using the following search terms: (meropenem AND pharmacokinetics/ OR renal elimination/ OR (pharmacokinetic* OR ((pharmaco OR drug) ADJ kinetic*) OR area under curve? OR AUC OR (renal ADJ (elimination? OR excretion? OR clearance?))) OR (((nonlinear OR non-linear) ADJ mixed effect model*) OR NONMEM OR WinNonMix OR P-PHARM OR NLMIXED OR ADAPT) AND EXP population/ OR population groups/ OR (population? or ethnic group?)). Additional relevant studies were identified by manual screening of the retrieved publications.

Fig. 1 PRISMA flow diagram for the inclusion of studies in this review

The inclusion criteria for the published models were as follows: (1) the study was a population pharmacokinetic analysis of MERO in adult ICU patients; and (2) the publication was written in English. Articles were excluded if (1) the model was not developed with software using a parametric approach or (2) insufficient data were available to reproduce the model.

Model evaluation

The external evaluation was conducted using the nonlinear mixed-effects modeling software NONMEM (version 7.4; ICON Development Solutions, Ellicott City, MD, USA). Plots were produced with R version 3.5.1. The final population PK models were re-implemented from the equations and parameter values reported in each of the included publications. The external evaluation was performed without any additional fitting of the model to the data (MAXEVAL = 0 option in NONMEM). To assess the predictive performance of each model, we first compared the predicted and observed concentrations using goodness-of-fit plots. The global fit of the population PK models was also evaluated by visual predictive checks (VPC). Prediction error (PE) was used to quantify the predictive performance of the models. PE was determined by the following equation:

$$ PE=\frac{C_{pred}-{C}_{obs}}{C_{obs}}\times 100\% $$

where Cpred and Cobs are the predicted and observed concentrations, respectively [12]. The median prediction error (MDPE) and the median absolute prediction error (MDAPE) were used as measures of bias and inaccuracy, respectively [13].

1. Inaccuracy (i.e., the size of the typical miss) of model predictions is given, for the ith individual, by the median absolute prediction error (MDAPEi):

$$ {\mathrm{MDAPE}}_{\mathrm{i}}\ \left(\%\right)=\mathrm{median}\left(|{\mathrm{PE}}_{\mathrm{i}\mathrm{j}}|,\mathrm{j}=1,\dots, {\mathrm{N}}_{\mathrm{i}}\right) $$

2. Bias (i.e., the direction and the size of the deviation from the predicted concentration) of model predictions is given, for the ith individual, by the median prediction error (MDPEi):

$$ {\mathrm{MDPE}}_{\mathrm{i}}\ \left(\%\right)=\mathrm{median}\left({\mathrm{PE}}_{\mathrm{i}\mathrm{j}},\mathrm{j}=1,\dots, {\mathrm{N}}_{\mathrm{i}}\right) $$

where PEij is the prediction error for the jth observation of the ith individual and Ni is the number of observations for that individual.

An MDPE between −20% and 20% and an MDAPE ≤ 30% were considered acceptable for bias and inaccuracy, respectively [14]. A Bland-Altman diagram was also produced for comparison purposes and to show the trend in prediction error. If a model presented acceptable bias and inaccuracy, its simulation-based predictive performance was to be evaluated by comparing the statistics of the observed and simulated concentration-time profiles using VPC and normalized prediction distribution error (NPDE) tests [15].
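The bias and inaccuracy metrics and the acceptance criteria above can be sketched in a few lines of Python. This is a minimal illustration, not the code used in the study: the function names are hypothetical, the paired concentrations are invented for the example, and the same functions can be applied either to one individual's observations or to a pooled set.

```python
from statistics import median


def prediction_errors(predicted, observed):
    """Relative prediction error (%) for paired concentrations."""
    if len(predicted) != len(observed):
        raise ValueError("predicted and observed must have the same length")
    return [(p - o) / o * 100.0 for p, o in zip(predicted, observed)]


def mdpe(errors):
    """Median prediction error (%): a measure of bias."""
    return median(errors)


def mdape(errors):
    """Median absolute prediction error (%): a measure of inaccuracy."""
    return median([abs(e) for e in errors])


def is_acceptable(bias, inaccuracy, bias_limit=20.0, inaccuracy_limit=30.0):
    """Apply the criteria: -20% <= MDPE <= 20% and MDAPE <= 30%."""
    return -bias_limit <= bias <= bias_limit and inaccuracy <= inaccuracy_limit


# Hypothetical paired concentrations (mg/L), not study data.
observed = [20.0, 25.0, 40.0, 10.0, 16.0]
predicted = [15.0, 20.0, 44.0, 6.0, 12.0]

pe = prediction_errors(predicted, observed)
bias = mdpe(pe)
inaccuracy = mdape(pe)
print(f"MDPE = {bias:.1f}%, MDAPE = {inaccuracy:.1f}%, "
      f"acceptable = {is_acceptable(bias, inaccuracy)}")
```

In this invented example the negative MDPE indicates under-prediction; the model fails the bias criterion even though its inaccuracy is within the limit, which shows why both metrics are checked together.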

Evaluation of predicted concentration 24 h after the first dose

As the clinical efficacy of meropenem is related to maintaining concentrations above the minimum inhibitory concentration during 100% of the dosing interval, the concentration 24 h after the first dose (C24h) predicted by each of the four population PK models was examined. C24h was calculated for all patients using covariates only (a priori prediction) or using the observed meropenem concentration(s) (Bayesian estimation) [16]. The C24h values obtained with the a priori and Bayesian approaches were compared.
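To illustrate what an a priori C24h prediction involves, the sketch below uses a deliberately simplified one-compartment model. It is not one of the evaluated models (most of which were two-compartment). It assumes that the 2 g loading dose is given as an instantaneous bolus at time zero, that the 4 g maintenance dose is infused at a constant rate over 24 h starting at the same time, and that clearance and volume are hypothetical typical values chosen inside the ranges reported for the evaluated models.

```python
import math


def conc_bolus_plus_infusion(t_h, loading_dose_mg, infusion_rate_mg_h, cl_l_h, v_l):
    """Concentration (mg/L) at time t for a one-compartment model.

    Assumes an instantaneous loading bolus at t = 0 and a constant-rate
    infusion that also starts at t = 0 (simplifying assumptions).
    """
    k = cl_l_h / v_l  # first-order elimination rate constant (1/h)
    decay = math.exp(-k * t_h)
    bolus_part = (loading_dose_mg / v_l) * decay
    infusion_part = (infusion_rate_mg_h / cl_l_h) * (1.0 - decay)
    return bolus_part + infusion_part


# Dosing as described in the Patients section; the 24 h infusion duration
# is an assumption of this sketch.
loading_dose_mg = 2000.0
infusion_rate_mg_h = 4000.0 / 24.0

# Hypothetical typical parameters; in a real a priori prediction they would
# come from each model's covariate equations.
cl_l_h = 10.0
v_l = 20.0

c24h = conc_bolus_plus_infusion(24.0, loading_dose_mg, infusion_rate_mg_h, cl_l_h, v_l)
css = infusion_rate_mg_h / cl_l_h  # steady-state concentration under infusion
print(f"C24h = {c24h:.2f} mg/L, Css = {css:.2f} mg/L")
```

With these parameter values the elimination half-life is short relative to 24 h, so the predicted C24h is essentially the steady-state concentration, which depends only on the infusion rate and clearance.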

Results

Patients

Thirty ICU patients receiving MERO by infusion were enrolled in this study. Three patients were excluded because of incomplete information. The final database consisted of 27 patients (15 men and 12 women) (Table 1). In total, 84 concentrations were used for external evaluation. Creatinine clearance was estimated using the Cockcroft-Gault formula. The median Simplified Acute Physiology Score (SAPS) II was 38 (range 17–87) and the median Sepsis-related Organ Failure Assessment (SOFA) score was 5 (range 2–15). Seventeen patients had documented infections (Enterobacter aerogenes (n = 1), Enterobacter cloacae (n = 3), Enterococcus faecium (n = 2), Escherichia coli (n = 7), Klebsiella pneumoniae (n = 2), Pseudomonas aeruginosa (n = 2), Morganella morganii (n = 1), Raoultella ornithinolytica (n = 1), Serratia marcescens (n = 1), Staphylococcus aureus (n = 1)). Comedication included amikacin (n = 8), gentamicin (n = 1), and vancomycin (n = 2).
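For reference, the Cockcroft-Gault estimate of creatinine clearance can be written as a short Python function. The sketch uses the standard form of the formula with serum creatinine in mg/dL; the function name and the example patient are hypothetical and do not correspond to any patient in the dataset.

```python
def cockcroft_gault(age_years, weight_kg, serum_creatinine_mg_dl, female):
    """Estimated creatinine clearance (mL/min) by the Cockcroft-Gault formula.

    CrCl = (140 - age) * weight / (72 * serum creatinine), times 0.85 for women.
    """
    if serum_creatinine_mg_dl <= 0:
        raise ValueError("serum creatinine must be positive")
    crcl = (140.0 - age_years) * weight_kg / (72.0 * serum_creatinine_mg_dl)
    return crcl * 0.85 if female else crcl


# Hypothetical patient: 60 years, 72 kg, serum creatinine 1.0 mg/dL.
crcl_male = cockcroft_gault(60, 72.0, 1.0, female=False)
crcl_female = cockcroft_gault(60, 72.0, 1.0, female=True)
print(f"CrCl (male) = {crcl_male:.0f} mL/min, CrCl (female) = {crcl_female:.0f} mL/min")
```

Because the estimate depends directly on serum creatinine and body weight, both of which can be unstable in critically ill patients, small changes in these inputs translate into proportionally large changes in the estimated clearance.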

Table 1 Characteristics of the patients in the published studies and the external validation dataset

Published models

Seven models were screened [17,18,19,20, 22,23,24], but three were excluded due to insufficient information. We tried, without success, to reach the author of one article in order to obtain the values of the residual variability. The predictive performance of four models was evaluated using the external evaluation dataset [17,18,19,20]. The characteristics of the patients in the included studies are summarized in Table 1. The details of the evaluated models are listed in Table 2. Three studies described MERO pharmacokinetics with two-compartment models. Several covariates were used: creatinine clearance (n = 3), body weight (n = 2), age (n = 2), and albumin (n = 1). Estimated values of clearance (CL) and central volume of distribution (V or V1) ranged from 9.38 to 13.6 L/h and from 7.9 to 26.2 L, respectively.

Table 2 Summary of the evaluated pharmacokinetic models

Model evaluation

Figure 2 shows plots of predicted concentrations versus observed concentrations for each evaluated model. All evaluated models under-predicted concentrations, with large inaccuracy. The values of bias and inaccuracy of the evaluated models are given in Table 3. Bias ranged from −14.4% to −33.6% and inaccuracy from 32.4% to 52.8%. The Bland-Altman diagram confirmed the trend in prediction error (Supplementary material, Figure S1). The VPCs showed that the ability of the population PK models to predict meropenem concentration-time profiles was highly variable (Supplementary material, Figure S2).

Fig. 2 Population-predicted concentrations versus observed concentrations for meropenem models. a Delattre et al. (R² = 0.576), b Li et al. (R² = 0.629), c Mattioli et al. (R² = 0.403), and d Roberts et al. (R² = 0.616)

Table 3 Prediction error of the evaluated pharmacokinetic models

Evaluation of predicted concentration 24 h after the first dose

The C24h of a course of therapy was predicted a priori (covariates only) and estimated from the observed meropenem concentrations (Bayesian) with each model. The distribution of the predicted meropenem C24h was highly heterogeneous (Fig. 3). Models such as that of Roberts et al. [20] demonstrated similar ranges for the a priori and Bayesian estimated C24h. This indicated that a priori prediction with this model is promising but can be further improved by including observed meropenem concentrations through Bayesian estimation.
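The difference between an a priori prediction and a Bayesian estimate can be made concrete with a minimal maximum a posteriori (MAP) sketch in Python. This is not the NONMEM procedure used in the study. It assumes steady state under continuous infusion (Css = R0/CL), log-normal between-subject variability on CL, and a residual error that is additive on the log scale, which together give a closed-form MAP estimate. The function name, the variability values, and the observed concentrations are all hypothetical.

```python
import math


def map_clearance(observed_css, infusion_rate_mg_h, cl_pop, omega, sigma):
    """MAP estimate of clearance (L/h) from steady-state concentrations.

    Assumptions of this sketch: Css = R0 / CL, ln(CL) ~ N(ln(cl_pop), omega^2),
    and ln(observed) ~ N(ln(predicted), sigma^2).
    """
    log_css_prior = math.log(infusion_rate_mg_h / cl_pop)
    # Deviation of each observation from the a priori prediction (log scale).
    deviations = [math.log(c) - log_css_prior for c in observed_css]
    # Closed-form minimizer of the MAP objective for eta = ln(CL / cl_pop).
    eta = -(sum(deviations) / sigma**2) / (len(deviations) / sigma**2 + 1.0 / omega**2)
    return cl_pop * math.exp(eta)


infusion_rate_mg_h = 4000.0 / 24.0  # 4 g over 24 h (assumed duration)
cl_pop = 10.0                       # hypothetical typical clearance (L/h)
omega, sigma = 0.4, 0.2             # hypothetical variability (log-scale SDs)
observed = [25.0, 30.0]             # hypothetical observed concentrations (mg/L)

css_a_priori = infusion_rate_mg_h / cl_pop
cl_map = map_clearance(observed, infusion_rate_mg_h, cl_pop, omega, sigma)
css_bayesian = infusion_rate_mg_h / cl_map
print(f"a priori Css = {css_a_priori:.1f} mg/L, Bayesian Css = {css_bayesian:.1f} mg/L")
```

The Bayesian estimate lies between the a priori prediction and the observations, and moves closer to the observations as more samples are added or as the assumed residual error decreases. A small gap between the two estimates, as reported here for the model of Roberts et al., means the covariates alone already place the patient close to where the observed data do.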

Fig. 3 A priori predicted (light gray) and Bayesian estimated (dark gray) meropenem concentration 24 h after the first dose. Box plots represent the 25th, 50th, and 75th percentiles

Discussion

Several population PK models of MERO [17,18,19,20, 22,23,24,25,26,27,28,29,30,31] in critically ill patients have been published over the past decades, but their predictive performance was often not fully evaluated [32]. After evaluation, none of the selected models in this study was found to be suitable for clinical application, as they did not meet the defined criteria. Four population PK models of MERO [17,18,19,20] met the inclusion criteria and were evaluated, but none proved valid for the external validation dataset, as bias and inaccuracy were outside the predefined limits. Amongst the four models, two were outright eliminated, while the model of Roberts et al. [20] had an MDPE and an MDAPE outside the acceptable values, but near the limits. However, this model showed a small difference between the a priori and Bayesian estimated C24h. This indicated that a priori prediction with this model could be promising. The best predictive model of MERO was developed by Delattre et al. [17], as its bias was within the criteria and its inaccuracy was close to, albeit outside, the acceptable limit. However, its bias was close to the limit of acceptability and showed the risk of under-prediction with this model. Moreover, the difference between the a priori and Bayesian estimated C24h was high. This indicates that a priori prediction with this model is not promising. Only an integrated assessment of all predictive performance markers (bias, inaccuracy, goodness-of-fit plots, and forecasting performance) can provide evidence of a model’s relevance. Although we initially had four candidate models for MERO, none proved appropriate for clinical use under the criteria defined here.

Our population consisted of patients with severe infections, which corresponded to the patients with severe sepsis or septic shock recruited in all the included studies. However, as bias and inaccuracy were either near or outside the thresholds, other factors that differ between the populations may have to be considered, such as the severity of illness, comedications, and the underlying pathologies common in each group. Moreover, the high between-subject and residual variability in all the included publications also indicates how much each patient population differs from the others. This could be due to several factors, including the patient population from which the PK models were derived and the methods used in the PK modeling. Even though commonly used covariates such as creatinine clearance and body weight were incorporated in most of the models, substantial variability remains, which would explain the difficulty of validating their accuracy in a given population. The different methods used to determine covariates such as serum creatinine can affect the transferability of a model to a clinical setting [6]. Amongst the four evaluated models, certain covariates are more commonly used than others. Renal clearance plays a significant role in the predicted levels of MERO, as the estimated creatinine clearance was incorporated as a covariate in the PK equation for CL in all evaluated models except that of Mattioli et al. [19]. As MERO is mainly eliminated by the renal route, changes in renal function affect its concentrations [33]. Therefore, evaluating a patient’s kidney function is essential to the validity of the model. Estimation of renal function in critically ill patients remains an issue; creatinine level is clearly an imperfect tool for assessing kidney function. All studies estimated creatinine clearance using the Cockcroft-Gault formula.
However, as previously described, the Cockcroft-Gault formula is highly inadequate in the intensive care unit [34]; it nevertheless remains widely used in clinical practice, which could explain the large observed variability. Future investigations should evaluate cystatin C as a marker of renal function to reduce this variability [34]. Body weight was also used as a covariate by Delattre et al. [17] and Li et al. [18]. As described by Marsot et al., body weight has an effect on dose optimization, which could be explained by its relationship with the volume of distribution [35]. The value of the latter can lower or increase antibiotic concentrations. Mattioli et al. [19] included albumin level in their PK equation for the volume of distribution. Indeed, hypoalbuminaemia is a common but frequently neglected condition in the intensive care unit [35]. Its incidence is reportedly as high as 40–50% [36]. With decreasing albumin concentrations, the unbound fraction of drugs can increase [37]. However, this covariate was not included in the other evaluated models. Additionally, the patients in the external validation dataset received meropenem by continuous infusion (CI), which differed from the mode of administration in some of the selected studies (Table 1). However, Roberts et al. [20] compared patients who received meropenem either by intermittent bolus administration or as an equal dose administered by CI. They concluded that while administration by CI maintained higher concentrations in subcutaneous tissue and plasma, it did not have a major impact on the final data. The discrepancy in mode of administration between the populations could still be a potential weakness of the study.

Study design can also explain part of the observed variability [38]. Indeed, the number of patients from whom a model is derived plays a significant role in the model’s robustness [38]. In this study, the number of patients and the number of samples differed greatly between each model and the external validation dataset. Roberts et al. [20] used only ten patients to develop their population PK model, which could suggest decreased reliability due to the small number of patients. In comparison, Delattre et al. [17] collected data from 88 ICU patients treated with amikacin, but only 19 of these also received MERO. Only internal evaluations such as bootstraps and/or VPC were done to assess the chosen models. Moreover, half of the selected models [17, 18] were developed from routine clinical data, possibly leading to incorrect documentation or a lack of information. A study by Alihodzic et al. [39] demonstrated that erroneous records could have a considerable impact on the precision and accuracy of the estimated PK parameters in population PK modeling.

Recently, Dhaese et al. [31] also performed an external evaluation in which eight population PK models of MERO were compared. We found similar results, concluding that there is large variability in the predictive performance of the models when an external validation is performed using an independent dataset of ICU patients. Moreover, Dhaese et al. [31] suggested that the external evaluation of a model should also consider its tendency to under- or over-predict concentrations and that the consequences of under-dosing of MERO are probably more important than the consequences of overdosing. However, several recent studies showed that neurotoxicity exists with beta-lactams [40, 41]. Both high and low MERO concentrations must therefore be avoided. There is a need to develop a population PK model for each cohort of patients, as every intensive care unit has a different population, which can affect the pharmacokinetics.

Some remarks need to be made regarding the design of this study. First, this study used therapeutic drug monitoring data collected in a routine clinical setting, which limited the number of blood samples available per patient. Another limitation was the low number of patients (n = 27) used for the external evaluation. This is also a limitation for assessing forecasting performance, which can itself be a criterion for evaluation [16]. Furthermore, the exclusion criteria restricted the evaluated models to those developed with a parametric approach, which might be considered a limitation.

Conclusion

In conclusion, the predictive performance of the evaluated PK models in ICU patients showed wide variability. External evaluation in the population of interest is a necessary step before defining new dose adaptations. This highlights the difficulty of generalizing models across ICU populations and the need to develop a specific model for each population of critically ill patients.