We read with great interest the ‘Current Opinion’ on a new levothyroxine (LT4) formulation by Concordet et al. [1] and the follow-up article and comments in Clinical Pharmacokinetics [2, 3, 4], evaluating the conclusions from the original study by Gottwald-Hostalek et al. [5]. The topic is of particular interest because, after the switch to the new formulation, a large number of adverse drug reactions (ADRs) were reported to the French network of pharmacovigilance centres [6]. This was surprising given the prior demonstration of bioequivalence between the new and old formulations as per legal requirements [5]. When two formulations of the same drug have been demonstrated to be bioequivalent according to established criteria, it is widely assumed that they are equivalent in their therapeutic effect and can be used interchangeably.

In the LT4 bioequivalence study, 204 healthy volunteers received a single oral dose of 600 µg of each of the new and old LT4 formulations in a cross-over design, and their total serum T4 concentrations were measured repeatedly over 72 h after ingestion [5]. The ratio of the baseline-adjusted geometric least-squares means between the new and old formulations was reported for both the area under the curve (99.3%) and the maximum concentration (101.7%), and the results were considered indicative of bioequivalence [5]. Concordet and colleagues re-examined the trial data, showing a high individual exposure ratio [1]. These authors argued that a different conceptual framework, that of individual bioequivalence, should have been used to identify possible larger intra-individual variability for the new formulation compared with the old drug [1]. This might have revealed potential issues that surfaced only later, when the drugs were switched on a mass scale [1]. Others did not share their view [3, 4].
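The distinction between an averaged ratio and individual exposure ratios can be illustrated with a minimal sketch. It assumes a simplified paired analysis in which the geometric mean ratio is the exponential of the mean within-subject log difference; the trial itself reported least-squares means from a cross-over model, which this sketch does not reproduce. The function names and the three exposure values are hypothetical and are not trial data.

```python
import math


def geometric_mean_ratio(test, reference):
    """Geometric mean of the paired test/reference exposure ratios.

    Simplifying assumption: no period or sequence effects are modelled.
    """
    log_diffs = [math.log(t / r) for t, r in zip(test, reference)]
    return math.exp(sum(log_diffs) / len(log_diffs))


def individual_ratios(test, reference):
    """Test/reference exposure ratio for each subject."""
    return [t / r for t, r in zip(test, reference)]


# Hypothetical exposures (arbitrary units), not taken from the trial.
new_formulation = [80.0, 125.0, 100.0]
old_formulation = [100.0, 100.0, 100.0]

gmr = geometric_mean_ratio(new_formulation, old_formulation)
ratios = individual_ratios(new_formulation, old_formulation)
print(f"geometric mean ratio: {gmr:.3f}")
print("individual ratios:", ratios)
```

In this constructed example, the averaged ratio is close to 1 although two of the three subjects differ by 20% or more between formulations, which is the kind of pattern an average-based criterion cannot detect.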

It remains unknown what may have caused the surge in ADRs after the mandatory replacement of the old drug with the new drug. Drug interchangeability can be classified as either drug prescribability or drug switchability [7]. Drug prescribability is defined as the choice of the treating physician to prescribe an appropriate drug for a new patient among various approved alternatives. Switchability refers to exchanging one drug for an alternative product in the same patient. The latter is more critical, as a previously successful treatment regimen may be compromised. However, average bioequivalence generally cannot unequivocally imply drug prescribability or drug switchability. For example, two dose-equivalent and presumed bioequivalent synthetic LT4 preparations showed a change in bioavailability and an altered biochemical response (FT4, thyroid-stimulating hormone) in a rigorous, randomised, double-blind cross-over trial [8]. Hence, the exposure-effect relationship of LT4 must be considered in hypothyroid patients. This relationship varies considerably, depending (among other influences) on the variable activation of the pro-drug LT4 into the biologically more active hormone T3 (conversion efficiency) [9].

We note that thyroid hormone measurements in a population display an exceptionally high degree of individuality (low individuality index) [10]. Consequently, the individual group members do not share the characteristic statistical moments of the group, such as the group mean, variance or covariance [11]. In other words, they fail to meet a mathematical prerequisite for statistical averaging [12]. This situation may be of relevance for the current debate.

To assess the situation, we retrieved a publicly available data set of the equivalence trial [1, 5]. For each subject, we estimated the baseline-adjusted area under the curve using the linear trapezoid method and obtained the time-averaged incremental T4 concentration by dividing the total area under the curve by the total time elapsed. The variability of these increments in serum total T4 concentrations among individual subjects is visualised in Fig. 1. Interestingly, despite closely concurring mean T4 concentrations between the two formulations (difference −0.12 ng/mL, p = 0.75, paired t test), a paired difference plot uncovered considerable diversity among individual subjects in both their start and end concentrations and the distances between the two (Fig. 1). The absolute difference exceeded 10% in 67% of the pairs and 30% in 27% of the pairs. The coefficient of variation for this measure was large but comparable between the two formulations (31.2%, 31.5%). The intraclass correlation was estimated to be 0.66.
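The per-subject calculation described above can be sketched as follows. This is a minimal illustration under stated assumptions, not the analysis code used for the letter: the baseline is taken as a single pre-dose value supplied by the caller, the percentage difference between formulations is expressed relative to the old formulation, and all function names and numbers are hypothetical.

```python
from statistics import mean, stdev


def trapezoid_auc(times, conc):
    """Area under a concentration-time curve by the linear trapezoid rule."""
    return sum(
        (t1 - t0) * (c0 + c1) / 2.0
        for t0, t1, c0, c1 in zip(times, times[1:], conc, conc[1:])
    )


def time_averaged_increment(times, conc, baseline):
    """Baseline-adjusted AUC divided by the total elapsed time.

    Assumption: 'baseline' is one pre-dose concentration per subject.
    """
    adjusted = [c - baseline for c in conc]
    return trapezoid_auc(times, adjusted) / (times[-1] - times[0])


def coefficient_of_variation(values):
    """Sample standard deviation as a percentage of the mean."""
    return 100.0 * stdev(values) / mean(values)


def fraction_exceeding(new, old, threshold):
    """Share of pairs with |new - old| / old above the threshold.

    Assumption: the difference is scaled by the old-formulation value.
    """
    flags = [abs(n - o) / o > threshold for n, o in zip(new, old)]
    return sum(flags) / len(flags)


# Hypothetical single-subject profile (hours, arbitrary concentration units).
times = [0.0, 1.0, 2.0, 4.0]
conc = [5.0, 7.0, 9.0, 5.0]
increment = time_averaged_increment(times, conc, baseline=5.0)
print(f"time-averaged increment: {increment:.2f}")

# Hypothetical paired time-averaged increments for four subjects.
new_values = [1.2, 1.0, 1.5, 0.95]
old_values = [1.0, 1.0, 1.0, 1.0]
share_10 = fraction_exceeding(new_values, old_values, 0.10)
share_30 = fraction_exceeding(new_values, old_values, 0.30)
print(f"pairs differing by >10%: {share_10:.0%}, by >30%: {share_30:.0%}")
```

Dividing the area under the curve by the elapsed time expresses each subject's exposure in concentration units, so that the paired values for the two formulations can be compared directly.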

Fig. 1
Time-averaged incremental T4 concentrations in individual subjects receiving a single oral dose of 600 µg of levothyroxine as either the new or the old LT4 formulation. The T4 concentrations shown are based on the baseline-adjusted areas under the curve divided by the elapsed time of 72 h. Publicly accessible data of the trial [5] were retrieved from reference [1]. ID, identification

This pattern suggests that it is important to recognise the intra-personal clustering present in thyroid hormone measurements and to appropriately account for the level properties in the analysis of clinical studies involving thyroid hormones [11]. This can unmask individual differences in the averaged treatment response and identify clinically distinguishable subgroups within an indiscriminate population or group. Dissimilar clusters of individuals may frequently have requirements for treatment success and ADRs that are vastly different from those of the averaged population [9].

We conclude that the unsurprising presence of high variability among individual subjects in the LT4 equivalence trial, together with the large number of reported ADRs, should caution against the fallacy of statistical averaging of thyroid parameters and calls for a more individualised approach in clinical thyroidology. This example may also serve as a reminder that pharmacological bioequivalence and clinical interchangeability are related but different concepts. Thyroid hormones are unique drugs with critical dosing, low therapeutic tolerance, and high variability in their biochemical and symptomatic responses among individual patients [9]. Clinicians should not be over-reliant on drug interchangeability but should always closely monitor the effect following any change in the medication.