1 Introduction

With the use of the QT interval as a sensitive, albeit nonspecific, biomarker for the arrhythmogenicity of medicines [1], the need to eliminate the influence of heart rate on this interval became more prominent. There is currently agreement that the correction proposed nearly a century ago by Fridericia [2] (corrected QT interval [QTc] using Fridericia’s formula [QTcF]) is usually sufficient as long as the medicine under investigation does not have an effect on heart rate [3]. Subject-specific (QTcI) and population-specific (QTcP) corrections can be derived from data on the subjects under investigation and are alternatives to QTcF. However, the guidance on how to proceed if an effect on heart rate is present is rather vague, and trialists have little help when they need to implement these alternatives. Recently, regulators, in particular the US Food and Drug Administration (FDA), have emphasized the need to use these alternatives. Two articles by Malik et al. [4, 5] provide substantial support for the claim that such alternatives are indeed needed and, at the same time, show their limitations.

In this commentary I first provide some theoretical background on heart rate correction, then discuss the merits and limitations of the two articles mentioned, and finally place them within the current research on this topic.

2 Some Basics

There is an obvious dependency between RR and QT that can be measured by calculating the correlation between the two in a sample of measurements. To be more precise, the dependency is between QT and the RR values in the preceding time interval of about 2–3 min (RR-history) [6, 7], but we will ignore this for now. QTc can be defined as the quantity that is maximally correlated with the observed QT under the conditions that (1) it is not correlated with RR and (2) it coincides with QT for RR = 1 s. Depending on the assumptions on the statistical distribution of QT and RR and, therefore, on the domain in which the correlation is determined, different forms of QTc will be obtained. For example, if QT and RR are assumed to be normally distributed, a linear correction will result, while under the assumption of a log-normal distribution we will obtain a log-linear or parabolic correction of the form QTc = QT/RR^β.
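As a minimal illustration of how such a correction can be estimated in practice, the following sketch (synthetic data and illustrative parameters only, not taken from any of the studies discussed) fits the log-linear model by regressing log(QT) on log(RR); the fitted slope is the exponent β, and the corrected interval is QT/RR^β.

```python
import numpy as np

rng = np.random.default_rng(0)
rr = rng.uniform(0.8, 1.2, size=200)                                # RR intervals (s), resting range
qt = 0.40 * rr ** (1 / 3) * np.exp(rng.normal(0, 0.01, size=200))   # QT (s) with multiplicative noise

# Regress log(QT) on log(RR); the slope is the correction exponent beta.
beta, log_alpha = np.polyfit(np.log(rr), np.log(qt), deg=1)
qtc = qt / rr ** beta                                               # corrected QT

print(f"estimated exponent beta = {beta:.3f}")                      # close to 1/3 (Fridericia) here
print(f"corr(QTc, RR)           = {np.corrcoef(qtc, rr)[0, 1]:.3f}") # approximately zero
```

By construction, the resulting QTc is approximately uncorrelated with RR in the sample from which β was estimated.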

A very important aspect when deriving QTc is the distinction between within-subject and between-subject variability. In particular, when QTc is used as a biomarker for arrhythmogenicity we are nearly always interested in the change in QTc from baseline (ΔQTc). The use of ΔQTc removes the between-subject variability, leaving the within-subject variability. If we want to derive a correction along the lines outlined above, we first have to remove the between-subject variability and determine QTc so that it is uncorrelated with RR within each subject. The simplest way is to determine QTcI separately for each subject, but this may become problematic if we do not have enough data points. Alternatively, we may set up a mixed-effects model that accounts for the between-subject variability of QT (mainly by introducing a random intercept for each subject) and the within-subject variability. The latter can then be separated into two parts: one that can be explained by variations in RR and a remaining part, which constitutes QTcI. As a by-product of this analysis, one can also obtain a proper study- or population-specific correction (QTcP), i.e., the mean correction across subjects. This population-specific (within-subject) correction differs from the derivation used by Malik et al. (see discussion in Malik et al. [5]), but has been used elsewhere, e.g., by Mendzelevski et al. [8]. It will not suffer from the problems that arise if within- and between-subject variability are confounded.
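A minimal sketch of this mixed-effects approach is given below, using simulated data and the MixedLM implementation in statsmodels; the variable names and parameter values are illustrative assumptions, not taken from the articles discussed. The fixed-effect slope on log(RR) yields the population (within-subject) exponent for QTcP, and the sum of the fixed and random slopes yields each subject's exponent for QTcI.

```python
import numpy as np
import pandas as pd
import statsmodels.formula.api as smf

# Simulate 20 subjects, each with a subject-specific exponent and QT at RR = 1 s.
rng = np.random.default_rng(1)
frames = []
for subj in range(20):
    beta_i = 0.33 + rng.normal(0, 0.05)       # assumed subject-specific exponent
    qt_1s = 0.40 + rng.normal(0, 0.02)        # assumed QT at RR = 1 s
    rr = rng.uniform(0.8, 1.2, size=40)
    qt = qt_1s * rr ** beta_i * np.exp(rng.normal(0, 0.01, size=40))
    frames.append(pd.DataFrame({"subject": subj,
                                "log_rr": np.log(rr),
                                "log_qt": np.log(qt)}))
data = pd.concat(frames, ignore_index=True)

# Random intercept and random slope on log(RR) for each subject.
model = smf.mixedlm("log_qt ~ log_rr", data,
                    groups=data["subject"], re_formula="~log_rr")
fit = model.fit()

beta_pop = fit.fe_params["log_rr"]            # population (within-subject) exponent -> QTcP
beta_ind = {subj: beta_pop + re["log_rr"]     # subject-specific exponents -> QTcI
            for subj, re in fit.random_effects.items()}
print(f"population exponent (QTcP): {beta_pop:.3f}")
```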

Seen from this perspective, determination of a correction method is a problem of statistical inference and, thus, results will come with a certain uncertainty. In practical terms, QTcI and QTcP will vary from one study to another even if these have been conducted under identical conditions and based on the same population of subjects. Wang et al. [9] nicely show that the improvement with respect to systematic errors when passing from QTcF to QTcP and QTcI is counterbalanced by an increase in variability of the estimates.

There is agreement that if a study- or subject-specific correction is to be derived, it needs to be based on sufficient data and a sufficiently large spread of RR values [1, 3]. If this condition is not met, the estimated correction factor will show increased error and therefore become unreliable.
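The dependence of the estimation error on the spread of RR follows directly from standard regression theory: for a simple regression of log(QT) on log(RR), the standard error of the fitted exponent is inversely proportional to the spread of log(RR) in the data. The following back-of-the-envelope sketch (with an assumed residual standard deviation) illustrates the effect of a narrow resting RR range compared with a wider one.

```python
import numpy as np

def exponent_se(rr_low, rr_high, n=100, sigma=0.01, seed=0):
    """Standard error of the fitted exponent when RR is uniform on [rr_low, rr_high].

    sigma is an assumed residual standard deviation of log(QT); for simple
    regression, SE(beta_hat) = sigma / sqrt(sum((x - mean(x))**2)) with x = log(RR).
    """
    rng = np.random.default_rng(seed)
    x = np.log(rng.uniform(rr_low, rr_high, size=n))
    return sigma / np.sqrt(np.sum((x - x.mean()) ** 2))

print(f"narrow resting range, RR 0.9-1.1 s: SE ~ {exponent_se(0.9, 1.1):.4f}")
print(f"wider range,          RR 0.6-1.2 s: SE ~ {exponent_se(0.6, 1.2):.4f}")
```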

From the exposition given, it becomes clear that if both QTc and RR are modified by a varying covariate at the same time, a correlation between the two will result. In particular, if a medicine affects both QTc and heart rate, it will induce a correlation between the two unless the effect is constant throughout the full sample, a condition that is not realistic. Dang and Zhang [10] pointed to this and concluded that QTc must be derived from drug-free data.
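The mechanism is easy to demonstrate with a toy simulation (illustrative parameters only): a drug effect that varies with concentration and shifts both RR and QTc induces a clear correlation between the two, even though the drug-free QTc is constructed to be independent of RR.

```python
import numpy as np

rng = np.random.default_rng(2)
n = 500
conc = rng.uniform(0, 1, size=n)                    # varying drug concentration (arbitrary units)
rr = 1.0 - 0.15 * conc + rng.normal(0, 0.03, n)     # drug increases heart rate (shortens RR)
qtc = 0.40 + 0.02 * conc + rng.normal(0, 0.005, n)  # drug prolongs the 'true' QTc

# QTc was generated without any direct dependence on RR, yet the shared
# dependence on concentration induces a clear (negative) correlation.
print(f"corr(QTc, RR) on treatment: {np.corrcoef(qtc, rr)[0, 1]:.2f}")
```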

In almost all applications, a correction method is developed with the aim of applying it to new data. In these situations, there is the danger of what has been termed ‘tunnel vision’. By definition, if applied to the dataset it was derived on, QTcI will fit at least as well as QTcF and QTcP. As we have seen, the individual correction factor βi for subject i will also be estimated with a larger standard error than that for QTcP [9]. Therefore, when selecting a method to be applied to new data, validation is highly recommended. This validation must be performed on a dataset different from the one from which QTcI and QTcP were derived. Tornøe et al. [11] describe the method of mean squared slopes, which can be used for this purpose. However, based on the earlier discussion, it seems appropriate that, contrary to the recommendation of Tornøe et al. [11], validation should not be performed on data obtained from individuals while on a drug unless it is clear that the substance does not affect heart rate.
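A minimal sketch of such a slope-based validation check is given below. It follows the spirit of the approach described above, although the exact criterion of Tornøe et al. [11] may differ in detail, and the validation data and candidate exponents are synthetic assumptions. The correction with the smaller mean squared within-subject slope of QTc on RR is the one whose corrected interval is least dependent on heart rate in the new data.

```python
import numpy as np
import pandas as pd

def mean_squared_slope(df, qtc_col):
    """Mean over subjects of the squared within-subject slope of QTc on RR."""
    slopes = [np.polyfit(g["rr"], g[qtc_col], deg=1)[0] ** 2
              for _, g in df.groupby("subject")]
    return float(np.mean(slopes))

# Simulated drug-free validation data (10 subjects, 'true' exponent 0.30).
rng = np.random.default_rng(3)
frames = []
for subj in range(10):
    rr = rng.uniform(0.8, 1.2, size=30)
    qt = 0.40 * rr ** 0.30 * np.exp(rng.normal(0, 0.01, size=30))
    frames.append(pd.DataFrame({"subject": subj, "rr": rr, "qt": qt}))
val = pd.concat(frames, ignore_index=True)

val["qtcf"] = val["qt"] / val["rr"] ** (1 / 3)   # Fridericia
val["qtcp"] = val["qt"] / val["rr"] ** 0.30      # candidate study-specific exponent
for col in ("qtcf", "qtcp"):
    print(f"{col}: mean squared slope = {mean_squared_slope(val, col):.2e}")
```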

3 What Do Malik et al. Add?

The first of the two articles by Malik et al. [4] investigates a situation where the correction—individual- or study-specific—is derived from a set of drug-free baseline data obtained in resting conditions, i.e., with a range of RR intervals that is substantially smaller than that seen in individuals receiving a drug that has an effect on heart rate. They confirm that in this situation there is a substantial random error in the estimates of correction factors, which results in high variability of the results between simulated studies. More importantly, however, they also show that there is a systematic bias in the estimated effect on QTc. They attribute this to the smaller spread of RR in the data used to determine the correction, which is assumed to result in an underestimation of the correction factor. This is a plausible explanation, but would benefit from experimental or theory-based verification.

In the second article, Malik et al. [5] use a similar setting to investigate what happens if correction factors or exponents are derived from a dataset that includes data obtained from individuals receiving a drug. Their results confirm that there is a systematic error. The striking result is that with an increasing effect of the drug on heart rate, the estimated effect on QTc will tend to zero. This could also have been predicted on theoretical grounds, but it is instructive to see that completely misleading results can already arise with a heart rate increase of 15–20 bpm.
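The mechanism behind this shrinkage can be illustrated with a toy simulation (simulated data and illustrative parameters; a strong simplification of the designs studied by Malik et al. [5]): when the exponent is fitted to pooled baseline and on-treatment data and the drug both prolongs QT and increases heart rate, part of the drug effect is absorbed into the exponent and the estimated ΔQTc moves towards zero.

```python
import numpy as np

rng = np.random.default_rng(4)
n = 300
true_prolongation = 0.010                          # assumed true QTc effect of 10 ms

rr_off = rng.normal(1.00, 0.05, n)                 # baseline RR (s)
rr_on = rng.normal(0.80, 0.05, n)                  # on treatment: heart rate ~15-20 bpm higher
qt_off = 0.40 * rr_off ** (1 / 3) * np.exp(rng.normal(0, 0.01, n))
qt_on = (0.40 * rr_on ** (1 / 3) + true_prolongation) * np.exp(rng.normal(0, 0.01, n))

def delta_qtc(beta):
    """Estimated mean QTc change for a given correction exponent."""
    return np.mean(qt_on / rr_on ** beta) - np.mean(qt_off / rr_off ** beta)

beta_free, _ = np.polyfit(np.log(rr_off), np.log(qt_off), 1)          # drug-free derivation
beta_pooled, _ = np.polyfit(np.log(np.r_[rr_off, rr_on]),
                            np.log(np.r_[qt_off, qt_on]), 1)          # on-treatment data included

print(f"delta-QTc, exponent from drug-free data: {1000 * delta_qtc(beta_free):5.1f} ms")
print(f"delta-QTc, exponent from pooled data   : {1000 * delta_qtc(beta_pooled):5.1f} ms")
```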

The innovative contribution of the two articles by Malik and colleagues is that they introduce the tool of simulation based on real data to the field of heart rate correction. Previously, the same authors published results based on similar techniques to investigate the effect of RR history on heart rate correction [12]. Simulations have the advantage that the ‘truth’ is known and, therefore, systematic errors can be assessed reliably. Given the complexity of the field, simulation-based results are highly welcome, particularly if they can be combined with theoretical considerations. The two contributions are timely and show that the current discussion on heart rate correction is relevant whenever a medicine with the potential to change heart rate is involved.

4 Further Research Needed

The simulations reported in the two articles by Malik et al. [4, 5] are restricted to two basic scenarios, one with a large number of subjects and relatively constant concentrations and one with a small sample size and varying concentrations. They therefore do not allow the individual factors contributing to the bias to be identified, let alone their contributions to be quantified. Above all, it would be important to observe what happens if data covering a wider range of heart rates were used to determine the correction factors. The authors suggest that the narrow RR range results in a lower correction factor, but provide no further details, either from a theoretical regression model or from experimental data. Likewise, the relative contributions of the other design parameters, i.e., sample size and range of drug concentrations, remain confounded with only two scenarios.

The results given provide a stimulus to address the open questions mentioned in the preceding paragraph, be it with similar simulation methods, based on theory, or using real data. For the practitioner who has to design a study to assess the arrhythmogenic effect of a drug that is likely to also affect heart rate, they give a warning but little help.

Therefore, the next steps in research should include a better understanding of existing and novel methods to widen the range of heart rates used to determine a correction method. Clearly, and this is also supported by the second contribution of Malik et al. [5], this must not be done by including on-treatment data. The methods proposed to achieve a wider distribution of heart rate include postural manoeuvres, although there is doubt that these can be performed reliably. If RR history is taken into account properly, one could also consider ECG measurements obtained under less restrictive conditions or the use of all reliably measured heartbeats on a baseline day to determine the heart rate correction.
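One simple way to account for RR history (a generic exponential-weighting scheme, not necessarily the method used by Malik et al. [12]) is to replace the instantaneous RR by a weighted average of the preceding RR intervals, with a decay constant on the order of the 2–3 min adaptation time mentioned earlier, and to use this effective RR in the correction. A minimal sketch under these assumptions:

```python
import numpy as np

def effective_rr(rr_intervals, tau_s=120.0):
    """Exponentially weighted average of preceding RR intervals (in seconds).

    rr_intervals: consecutive RR intervals ending just before the QT measurement.
    tau_s: assumed decay time constant of QT/RR adaptation (~2 min here).
    """
    rr = np.asarray(rr_intervals, dtype=float)
    age = np.cumsum(rr[::-1])[::-1]          # elapsed time from each beat to the measurement
    weights = np.exp(-age / tau_s)           # recent beats weighted most heavily
    return float(np.sum(weights * rr) / np.sum(weights))

# Example: heart rate has just risen from 60 to 80 bpm; the effective RR lags
# behind the instantaneous RR, as QT itself does.
history = [1.0] * 120 + [0.75] * 20          # 2 min at 60 bpm, then 15 s at 80 bpm
print(f"instantaneous RR: {history[-1]:.2f} s, effective RR: {effective_rr(history):.2f} s")
```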

The results presented are a timely, clear, and necessary warning that the problem of heart rate correction cannot be ignored. On the other hand, there is ample experience with studies based on QTcF and no reason to believe that these are flawed when the drug has no effect on heart rate. Further research should help practitioners decide when they have to go the extra step of obtaining QTcI and, more importantly, how to do so. Experience is needed with methods that yield high-quality drug-free data covering a wider range of heart rates and that can be accommodated within the already tight schedule of a first-in-man study.