Key Points

The previously proposed QTc quality metrics, which quantify intra- and intersubject consistency in QTc differences, capture factors of QTc quality.

Of the factors evaluated for their influence on QTc quality, QTc measurement methodology is the main driver of QTc quality.

1 Introduction

Most drugs with systemic bioavailability have to undergo a thorough QT (TQT) study, which aims to establish whether the drug affects cardiac repolarization as assessed by the rate-corrected QT interval (QTc). A QTc prolongation of regulatory interest is defined as the upper bound of the one-sided 95 % confidence interval (CI) of the mean baseline- and placebo-corrected QTc change exceeding 10 ms at any post-dosing timepoint [1, 2].
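The 10-ms criterion can be illustrated with a minimal Python sketch. It is a simplification under stated assumptions, not the regulatory analysis: the per-subject baseline- and placebo-corrected QTc changes are treated as a simple independent sample at each timepoint, a large-sample normal quantile is used instead of a model-based or t-based bound, and the function names and input layout are hypothetical.

```python
from math import sqrt
from statistics import NormalDist, mean, stdev


def upper_bound_one_sided_95(ddqtc_ms):
    """Upper bound of the one-sided 95 % CI of the mean (normal approximation).

    ddqtc_ms: per-subject baseline- and placebo-corrected QTc changes (ms)
    at one timepoint. Assumption: independent observations, large sample.
    """
    n = len(ddqtc_ms)
    if n < 2:
        raise ValueError("at least two observations are needed")
    standard_error = stdev(ddqtc_ms) / sqrt(n)
    z_95 = NormalDist().inv_cdf(0.95)  # one-sided 95 % quantile
    return mean(ddqtc_ms) + z_95 * standard_error


def exceeds_threshold(ddqtc_by_timepoint, threshold_ms=10.0):
    """True if the upper bound exceeds the threshold at any timepoint."""
    return any(
        upper_bound_one_sided_95(values) > threshold_ms
        for values in ddqtc_by_timepoint.values()
    )
```

In an actual TQT analysis the bound would usually come from the statistical model specified in the study protocol; the sketch only shows how the "at any post-dosing timepoint" rule combines the per-timepoint bounds.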

Overall quality of QTc data is an important factor of TQT studies. Improvement in QTc quality allows the sample size to be decreased, making the study more economical. It could also make early clinical studies sufficiently powered to satisfy the International Conference on Harmonisation (ICH) E14 requirement [1, 2]. In early clinical studies, such as the first-in-man investigations, the necessity of demonstrating assay sensitivity by introducing a separate positive control arm or period is highly impractical. Rather, different proofs of study quality need to be established. Related to this, a set of QTc-quality tests has recently been proposed with the possible aim of eventually removing the need for a positive control [3]. Notably, the proposed tests address only the systematicity (consistency) of QTc measurements and not their bias, which is of less importance when measuring the placebo-corrected QTc change. However, before these QTc-quality tests can be considered further, an evaluation of the factors that affect their results is needed.

With this in mind, this investigation used the data from a spectrum of existing TQT studies to evaluate the proposed QTc-quality tests. In addition, to propose strategies for systematic improvement of QTc data quality, we investigated the influence of different QTc measurements and computations that have been demonstrated to improve the responses to the pharmacological positive control [4].

2 Methods

2.1 Data

TQT studies with multiple drug-free baselines submitted to the US FDA between January 2006 and December 2012 (N = 34) were available for the purposes of this investigation. For each study, the baseline QT and R-R interval measurements at each nominal timepoint were obtained, together with the annotated ECG waveforms from the ECG Warehouse (http://www.ecgwarehouse.com).

2.2 Proposed QTc-Quality Tests

A detailed description of the proposed QTc-quality tests has been previously published [3]. Briefly, the concept is to quantify known physiological relationships and to verify that the consistency of baseline-to-baseline or subject-to-subject differences is preserved, providing confidence in the measurements of the QT and RR (R-R interval).

There are three components of the QTc-quality tests, all of which are evaluated using the randomized treatment allocation to avoid period effects. Test 1 evaluates whether or not the study captures physiologically known differences in individual QTc values, i.e. that intersubject variability is larger than intrasubject variability, with higher intrasubject variability in females compared with males. The test 1 values, or intrasubject variability, are computed by gender. Test 2 quantifies the stability of QT measurements by timepoint across treatment periods. The assumption of test 2 is that the individual baselines tightly reproduce each other. Test 2 is computed as the lower and upper CIs of the time-matched inter-baseline differences. As the CIs might narrow due to increasing sample size, we also included a test for the standard deviation of the time-matched inter-baseline differences. Lastly, test 3 evaluates the stability of intersubject differences across treatment periods; i.e. whether the same QTc difference between different subjects is maintained from baseline to baseline. For test 3, the intersubject differences can either be computed in the averaged QTc values (test 3 − average) or time-matched intersubject differences (test 3 − time-matched). In this analysis, the same software implementation of the tests was used as in their seminal proposal [3] (see the original publication of the tests for further details).
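As a rough illustration of the quantities involved in test 2, the following Python sketch summarizes time-matched inter-baseline differences by their mean, standard deviation, and CI bounds. It is not the software implementation of [3]: the pooling of all subject-timepoint pairs, the normal-approximation CI, and all function and key names are assumptions made for this example only.

```python
from math import sqrt
from statistics import NormalDist, mean, stdev


def inter_baseline_summary(baseline_a, baseline_b, level=0.95):
    """Summarize time-matched QTc differences between two baselines.

    baseline_a, baseline_b: dicts mapping (subject_id, timepoint) -> QTc (ms).
    Only subject-timepoint pairs present in both baselines are compared.
    Assumption: differences are pooled and the CI uses a normal quantile.
    """
    shared_keys = sorted(set(baseline_a) & set(baseline_b))
    diffs = [baseline_a[key] - baseline_b[key] for key in shared_keys]
    if len(diffs) < 2:
        raise ValueError("at least two time-matched pairs are needed")
    centre = mean(diffs)
    spread = stdev(diffs)
    half_width = NormalDist().inv_cdf(0.5 + level / 2) * spread / sqrt(len(diffs))
    return {
        "n": len(diffs),
        "mean": centre,
        "sd": spread,  # does not shrink with sample size, unlike the CI
        "ci_lower": centre - half_width,
        "ci_upper": centre + half_width,
    }
```

The sketch also shows why the standard deviation is reported alongside the CIs: the CI half-width scales with 1/sqrt(n) and so narrows in larger studies, whereas the standard deviation of the differences does not.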

2.3 Protocol Factors

Two protocol factors potentially influencing the QT measurement quality were considered: QT measurement methodology and recorder type. QT measurement methodology was obtained from the study protocols and defined as either semi-automatic (partial use of a computerized ECG measurement), fully manual, or unknown (if not explicitly stated in the study protocol). Recorder type was defined as continuous (e.g. 10-s extractions from a 12-lead Holter) or standard 10-s bedside; studies in which the recorder type was not explicitly stated in the protocol were classified as standard 10-s bedside.

2.4 Influence of QTc Computation

To investigate the influence of QT computation, the QT intervals were remeasured by applying the pattern-matching technique as proposed by Hnatkova et al. [5]. This technique has been shown to decrease the intrasubject variability [5] and to lower the overall data variability when evaluating the exposure-response of moxifloxacin [4].

In addition, as in the study investigating the moxifloxacin response [4], the impact of using the average heart rate from the 10-s ECG instead of the originally reported heart rate was also investigated. Four different QT and heart-rate combinations were investigated, resulting in different data variants: originally reported QT and heart rate (data 1); originally reported QT and 10-s average heart rate (data 2); QT interval corrected using pattern matching with originally reported heart rate (data 3); and QT interval corrected using pattern matching with 10-s average heart rate (data 4).

The Fridericia-corrected QTc values [6] were used since no substantial heart rate changes in drug-free recordings were anticipated.
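For reference, the Fridericia correction divides the QT interval by the cube root of the RR interval expressed in seconds. The short Python sketch below applies it and converts a heart rate to an RR interval, which is how the originally reported and the 10-s average heart rates would enter the different data variants; the function names and the numerical values are hypothetical and serve only as an illustration.

```python
def rr_from_heart_rate(hr_bpm: float) -> float:
    """Convert heart rate (beats/min) to the mean RR interval in ms."""
    if hr_bpm <= 0:
        raise ValueError("heart rate must be positive")
    return 60000.0 / hr_bpm


def qtc_fridericia(qt_ms: float, rr_ms: float) -> float:
    """Fridericia correction: QTcF = QT / (RR in seconds) ** (1/3)."""
    if rr_ms <= 0:
        raise ValueError("RR interval must be positive")
    return qt_ms / (rr_ms / 1000.0) ** (1.0 / 3.0)


# Hypothetical example: the same QT interval corrected with two heart rates,
# e.g. an originally reported value and a 10-s average.
qt_ms = 400.0
qtcf_reported = qtc_fridericia(qt_ms, rr_from_heart_rate(75.0))
qtcf_average = qtc_fridericia(qt_ms, rr_from_heart_rate(72.0))
```

At an RR interval of exactly 1,000 ms (60 beats/min) the correction leaves the QT interval unchanged, which is a convenient sanity check for any implementation.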

2.5 Statistical Analysis

The difference between the QTc-quality test results in groups based on protocol factors was assessed using the Mann–Whitney test. The differences between the numerical values of each of the QTc-quality tests derived from the four data variants were assessed using the Mann–Whitney test. p-values <0.05 were considered statistically significant. Statistical evaluations were performed using R version 2.15.3 (R Foundation for Statistical Computing, Vienna, Austria).
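The group comparison can be sketched in a few lines of Python. This is not the R code used in the analysis: it is a minimal standard-library illustration of the Mann–Whitney U statistic with a two-sided p-value from the normal approximation, without tie or continuity correction, and the function name is hypothetical. Statistical packages typically provide exact or corrected variants that are preferable for small samples.

```python
from math import sqrt
from statistics import NormalDist


def mann_whitney_u(x, y):
    """Return (U, two-sided p-value) for two independent samples.

    U counts the pairs in which a value from x exceeds a value from y,
    with ties counted as 0.5. The p-value uses the normal approximation
    without tie or continuity correction (an assumption of this sketch).
    """
    n1, n2 = len(x), len(y)
    if n1 == 0 or n2 == 0:
        raise ValueError("both samples must be non-empty")
    u = sum(1.0 if a > b else 0.5 if a == b else 0.0 for a in x for b in y)
    expected_u = n1 * n2 / 2.0
    sd_u = sqrt(n1 * n2 * (n1 + n2 + 1) / 12.0)
    z = (u - expected_u) / sd_u
    p_value = 2.0 * (1.0 - NormalDist().cdf(abs(z)))
    return u, p_value
```

For example, the inputs could be the values of one QTc-quality test in semi-automatically read studies and in fully manually read studies, with p < 0.05 taken as statistically significant as stated above.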

3 Results

The 34 investigated studies (Table 1) included a total of 1,874 subjects (773 females, 41 %). The median number of subjects per study was 53 (range 16–123), with a median of 46 % of female subjects (range 0–100 %). The median numbers of nominal timepoints and treatment periods were 11 (range 4–28) and 4 (range 3–6), respectively. The median number of 10-s ECGs included in each study was 17,724 (range 4,751–67,007). The median percentage of measurable 10-s ECGs was 98 % (range 90–100 %). The ECGs were sampled at 1,000 Hz in 20 studies, at 500 Hz in 13 studies, and at 180 Hz in one study. The median amplitude resolution was 2.5 μV (range 1.0–6.25 μV).

Table 1 Investigated thorough QT studies and their ECG characteristics

3.1 Influence of Protocol Factors

Table 2 shows the comparison of the QTc-quality test for groups of studies based on the protocol factors. There was no difference for ECG recorder type, while the QTc-quality test values were significantly lower (i.e. the QTc data quality was better) in a subset of the tests in studies with semi-automatic QT measurement compared with studies using fully-manual QT measurement.

Table 2 QTc-quality test values grouped by protocol factors

3.2 Influence of QTc Computation

Figure 1 shows the QTc-quality test values obtained from four different data variants. For all QTc-quality tests, the test values were significantly lower (i.e. better data quality) with pattern-matched QT measurements (data 3, 4) compared with original QT measurements (data 1, 2), irrespective of method used to obtain RR intervals (original vs. derived from 10-s signals). There was also a numerical trend of lower QTc-quality test values when combining the pattern-matched QT measurements with RR intervals from full 10-s signals compared with the pattern-matched QT combined with original RR data (data 4 vs. data 3).

Fig. 1

Boxplot for each of the QTc-quality tests grouped by data variants of QT [original (data 1, 2) or pattern-matching (data 3, 4)] and RR (R-R interval) [original (data 1, 3) or 10 s average (data 2, 4)] interval measurements. Comparisons between each of the QTc-quality tests by data variant were done using the t-test with the p-values shown (where no p-value is shown, the difference was not statistically significant). Lower and upper for test 3 refer to the proportion of differences below and above −10 and 10 ms, respectively. abs absolute values, avg average, tm time-matched, CI confidence interval, SD standard deviation

4 Discussion

A statistically significant improvement in all the QTc-quality test values was found when QT was measured with pattern matching, with or without heart rate data derived from full 10-s signals (data 3 and 4), compared with original QT and RR measurements (data 1). The combination of improved QT measurement consistency by pattern matching with heart rate data from full 10-s ECGs (data 4) led to further lowering of the test values (compared with data 3), and thus to further improvement in data quality.

The significant difference in some of the QTc-quality tests between studies using manual and semi-automatic QT reading suggests more consistent QTc data in semi-automatically read studies. Unfortunately, we have no data available on how many automatic ECG readings were manually over-read in individual studies. There was no noticeable difference between studies with continuous recordings and the others. The lack of a difference might be caused by the procedures of ECG extraction from continuous recordings. In some studies, a fixed timepoint relative to the nominal timepoint was used for the extraction (i.e. the extractions from continuous Holter recordings paid no attention to the quality of the signal).

The observation that pattern-matching adjustment of QT measurement and the use of 10-s average RR interval values improve the data quality is important from a practical point of view. It is also consistent with the observation that these techniques increase the precision of other characteristics of TQT measurements [4]. The implementation of these techniques can thus be universally recommended for future TQT studies, as well as other investigations that require accurate and systematic QT and RR measurements.

It has previously been reported that extracting 10-s ECG segments from continuous recordings at stable heart rates with low noise results in more robust QTc data [7]. Our findings do not confirm this observation. Possibly, however, different studies might have used different techniques for extracting the 10-s ECGs from the continuous signals, not necessarily following the previously published assumptions [7]. It should also be noted that in some cases, the ECG recording methodology was not explicitly mentioned, and standard 10-s bedside recordings were assumed. However, it is unlikely this influenced the results because if the application of continuous recording had led to increased quality, and if some of the Holter studies had been incorrectly classified as standard, the quality metric for continuous would be trending to lower values compared with standard, which it did not.

Our observations related to the pattern-matching technique are consistent with the original report by Hnatkova et al. [5]. Using the technology, we found a similar reduction in intrasubject QTc variability, which was also found by Meyer et al. [8]. The finding is also consistent with our observation that the pattern-matching technique of QT measurement improves the response to moxifloxacin-based positive control [4]. The improvement in QTc quality by using 10-s average heart rate instead of the heart rate derived from the preceding RR interval is consistent with the reduction in QTc intrasubject variability achieved by QT/RR hysteresis correction [9]. It is likely that applying proper hysteresis correction, i.e. using a longer heart rate history, will lead to further reduction of intrasubject QTc variability. Not surprisingly, and consistent with previous reports, we also found intrasubject QTc variability to be larger in females than in males.

4.1 Limitations

The relatively small number of studies available for this analysis makes subgroup comparisons difficult (e.g. continuous vs. standard recordings). There are several aspects that we were unable to analyse, including the influence of different computer algorithms used in semi-automatic measurement, ECG extraction procedures, and the differences among individual readers involved in the processing of ECGs in the same study. These details are not provided by pharmaceutical sponsors submitting a study for regulatory approval. Because of the small numbers, we were also unable to evaluate any differences between studies using single-lead and global QT measurements. It is also likely that the number of ECG replicates could influence the results [10], but as the majority of studies that were included used three replicates (27 of 34 studies), we could not explore the influence of the number of replicates on the QTc quality metrics. Finally, the large spread of data also prevented us from evaluating the influence of the duration of the period between study baselines. Nevertheless, these limitations are unlikely to have biased our main findings.

5 Conclusion

From a practical perspective, the study offers two principal conclusions. First, the previously proposed tests for assessing QTc data quality provide a valid characterization of the study data in a variety of conditions. Second, the QTc data quality can be systematically improved by applying adjustment based on the pattern-matching procedures. Similarly, the use of RR interval values derived from longer sections of continuous recordings (most likely longer than the 10-s data used in this study [9]) could substantially improve the data quality in investigations that require accurate QTc measurements.