We read with interest the article by Concordet et al. on the analysis of the bioequivalence study between old and new formulations of Levothyrox®, which was recently published in Clinical Pharmacokinetics [1].

Concordet et al. reported that “more than 50% of healthy volunteers enrolled in a successful regulatory average bioequivalence trial were actually outside the a priori bioequivalence range”, and linked this finding to the numerous adverse drug reactions (ADRs) that were reported following the large-scale switch that occurred when the new Levothyrox® formulation replaced the old formulation. The French media, quoting this article, reported the same claim.

Moreover, Concordet et al. hypothesized that the large proportion of patients with a ratio of adjusted T4 areas under the curve (AUCs) outside the bioequivalence range of 0.9–1.11 may originate from a possible subject-by-formulation interaction, and therefore questioned the ability of the standard average bioequivalence approach to guarantee within-patient switchability between the new and old formulations.

There is a more likely explanation for the occurrence of the observed peak of ADRs concomitant with the switch from the old to the new Levothyrox® formulation.

Regarding a possible subject-by-formulation interaction, although it cannot be fully assessed without a replicated crossover design, it is still possible to evaluate whether the observed large fraction of patients with a ratio of AUCs outside the 0.9–1.11 range is compatible with the absence of such an interaction or whether, as assumed by Concordet et al., it could only have occurred if an interaction truly exists. To this end, we programmed statistical simulations in R (code available upon request) to generate log(AUC) data for a crossover trial under a priori assumptions for (1) the population AUC means for the test and reference formulations; (2) the between-subject variability of the individual means; (3) the correlation coefficient between those individual means, with (2) and (3) defining the covariance matrix of the bivariate Gaussian distribution of the individual means; and (4) the within-subject variability for the test and reference formulations, used to simulate log(AUCs) normally distributed around the individual means. These simulations allowed us to mimic the data-generating process for bioequivalence data as described in the 2001 FDA Guidance document “Statistical Approaches to Establishing Bioequivalence” [2]. We simulated a large number (n = 10,000) of crossover trials with the same characteristics as the Levothyrox® bioequivalence trial (204 subjects, unreplicated, two periods), parameterized with equal population means and, most importantly, without subject-by-formulation interaction (i.e. assuming equal between-subject variability of individual means and a unit correlation between those individual means).
Since our request to the French ANSM for the raw data went unanswered, we derived the assumption for the common within-subject standard deviation (\(\sigma_{WT} = \sigma_{WR} = 0.234\)) from the coefficient of variation (CV) value (23.7%) found in the public report [3], using the formula \(CV = 100 \times \sqrt{\exp\left(s^{2}\right) - 1}\), where \(s\) is the within-subject standard deviation on the log scale. For both formulations, we took the population log mean as the log of the average of the AUC means (\(\mu_{T} = \mu_{R} = \log\left(\left(1852.079 + 1864.359\right)/2\right)\)) from the same report. Finally, for the between-subject standard deviation (\(\sigma_{BT} = \sigma_{BR} = 0.3\)), we used the estimated standard deviation of the subject-specific random effect from a model that we fitted to the (log) AUCs that were calculated and made public by Concordet et al.
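
As an illustration of how the within-subject standard deviation follows from the reported CV, the formula above can be inverted as in the short Python sketch below. This is not the R code used for our simulations; the function names `cv_to_log_sd` and `log_sd_to_cv` are hypothetical and chosen only for this example.

```python
import math


def cv_to_log_sd(cv_percent: float) -> float:
    """Invert CV = 100 * sqrt(exp(s^2) - 1) to recover the log-scale SD s."""
    return math.sqrt(math.log(1.0 + (cv_percent / 100.0) ** 2))


def log_sd_to_cv(s: float) -> float:
    """Forward formula: CV (in percent) from the log-scale SD s."""
    return 100.0 * math.sqrt(math.exp(s ** 2) - 1.0)


# Values taken from the public report cited in the text.
sigma_w = cv_to_log_sd(23.7)                   # common within-subject SD
mu = math.log((1852.079 + 1864.359) / 2.0)     # common population log mean

print(round(sigma_w, 3))  # 0.234
```

The round trip `log_sd_to_cv(cv_to_log_sd(23.7))` returns the reported 23.7%, which is a quick way to confirm that the inversion is consistent with the formula.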

With this large set of simulated trials, we estimated that there was a probability close to 100% that a trial ends up with a proportion of AUC ratios outside the 0.9–1.11 range at least as large as the 67.2% observed in the levothyroxine bioequivalence trial. This is not unexpected, since the bioequivalence range relates to the ratio of means rather than to the ratios of individual AUC measurements, which are expected to have substantially more variability.
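
The simulation logic described above can be sketched in a few lines of Python. This is a minimal illustration, not the R code we used: it assumes no period or sequence effects, uses the rounded 67.2% as the threshold, and the names `simulate_trial` and `prob_at_least` are hypothetical.

```python
import math
import random

# Bounds of the 0.9-1.11 range, expressed on the log-ratio scale.
LOWER, UPPER = math.log(0.9), math.log(1.11)


def simulate_trial(rng, n_subjects=204,
                   mu=math.log((1852.079 + 1864.359) / 2.0),
                   sigma_b=0.3, sigma_w=0.234):
    """Fraction of subjects whose individual test/reference AUC ratio
    falls outside 0.9-1.11 in one simulated two-period crossover trial."""
    outside = 0
    for _ in range(n_subjects):
        # No subject-by-formulation interaction: equal between-subject
        # variability and unit correlation mean that each subject has a
        # single individual mean shared by both formulations.
        subject_mean = rng.gauss(mu, sigma_b)
        log_auc_test = rng.gauss(subject_mean, sigma_w)
        log_auc_ref = rng.gauss(subject_mean, sigma_w)
        log_ratio = log_auc_test - log_auc_ref
        if not (LOWER <= log_ratio <= UPPER):
            outside += 1
    return outside / n_subjects


def prob_at_least(observed=0.672, n_trials=10_000, seed=1):
    """Share of simulated trials whose out-of-range fraction is at least
    as large as the observed one (0.672 is the rounded reported value)."""
    rng = random.Random(seed)
    hits = sum(simulate_trial(rng) >= observed for _ in range(n_trials))
    return hits / n_trials


if __name__ == "__main__":
    # Fewer trials than the 10,000 mentioned in the text, to keep this quick.
    print(prob_at_least(n_trials=2_000))
```

Note that in this simplified setting the subject's individual mean cancels out of the log ratio, so the result depends only on the within-subject standard deviation; the population mean and between-subject standard deviation are kept only to mirror the description of the data-generating process.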

In other words, even without any subject-by-formulation interaction, the mere impact of the within-subject variability can lead individual AUC ratios to display the observed variability. We conclude that the extent of variability observed for AUC ratios is actually highly compatible with the absence of subject-by-formulation interaction, and therefore does not lend particular credibility to the hypothesis that such a phenomenon was at play in the Levothyrox® trial.
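
A rough analytical check makes the same point. It is only a sketch under simplifying assumptions: equal population means, no subject-by-formulation interaction and, as an additional assumption made for this check only, no period or sequence effects. Writing \(\sigma_{W}\) for the common within-subject standard deviation (\(\sigma_{WT} = \sigma_{WR} = 0.234\)) and \(\varepsilon_{T,i}\), \(\varepsilon_{R,i}\) for the independent within-subject errors of subject \(i\), the individual mean cancels out of the log ratio:

\[
\log\frac{AUC_{T,i}}{AUC_{R,i}} = \varepsilon_{T,i} - \varepsilon_{R,i} \sim N\left(0,\, 2\sigma_{W}^{2}\right), \qquad \sqrt{2}\,\sigma_{W} = \sqrt{2} \times 0.234 \approx 0.331
\]

\[
P\left(\text{ratio outside } 0.9\text{–}1.11\right) = 1 - \left[\Phi\left(\frac{\log 1.11}{0.331}\right) - \Phi\left(\frac{\log 0.9}{0.331}\right)\right] \approx 1 - \left[\Phi(0.315) - \Phi(-0.318)\right] \approx 0.75
\]

where \(\Phi\) is the standard normal cumulative distribution function. Under these assumptions, roughly three out of four individual ratios would be expected to fall outside the range, so an observed proportion of 67.2% is not surprising.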

We actually believe that the observed peak of adverse reactions was mainly triggered by the changes in specifications between the old and new formulations. The upper limit of levothyroxine tablet content of the old Levothyrox® formulation at release was 110%, higher than the standard 105% limit used for other pharmaceutical formulations. This was, at that time, authorized worldwide because of the progressive degradation of levothyroxine over time due to spontaneous oxidation. By contrast, for the new Levothyrox® formulation, according to ANSM published documents [4], the limits were set at 98–105% at release and 95–105% at the end of the shelf-life period, which was, in addition, shortened from 3 to 2 years.

Therefore, when the old formulation was switched to the new formulation, some patients experienced a change of up to 10% in their daily levothyroxine dose, depending on the batch used (i.e. the time elapsed since release) and thus on the difference in tablet content between the old and new formulations. Patients who were very sensitive to a 5–10% change in daily levothyroxine dose could therefore have suffered from unbalanced thyroid function with associated symptoms. It is noteworthy that in the pharmacovigilance review published by the French agency ANSM [5], only 1.46% of the 2.3 million patients who switched from the old to the new formulation reported ADRs, most of which were identical to those usually reported during levothyroxine treatment and related to their thyroid status. The unusually large number of patients who switched to the new formulation within a short period of time, i.e. a few months, explains why these ADRs could be detected despite representing a low percentage of the treated population. Dose adjustments and the use of other available levothyroxine formulations resolved the problem, with a return to a ‘normal’ level of ADR notifications a few months later.