Introduction

Since the first clinical reports more than 35 years ago, 123I-MIBG imaging has developed into a widely accepted tool for diagnosing disorders of cardiac sympathetic innervation.1-3 In the clinical context, 123I-MIBG is mainly used in patients with documented chronic heart failure and reduced left ventricular ejection fraction to identify those at increased short-term risk of cardiac mortality.4-7

Because absolute myocardial uptake of 123I-MIBG cannot currently be determined, the heart-to-mediastinum ratio (HMR) was introduced and validated for characterizing global myocardial sympathetic innervation.8-10 Based on the results of a large prospective multicenter clinical trial, an HMR cut-off of 1.6 is now generally used to identify patients at high mortality risk.4

The general limitations of translating a cut-off value of the continuous variable HMR into the categories of high and low mortality risk have recently been discussed elsewhere.11 From a technical point of view, the choice of collimator, the implementation of scatter correction, and the introduction of quantitative SPECT imaging may all influence HMR systematically and may therefore lead to different cut-off values.12-14

Another potentially relevant technical factor that may influence both the value and the variability of HMR is the method of ROI definition.15-17 Most publications in this context have reported generally good observer reproducibility.15,17-19 However, even relatively small variability may be clinically relevant, and this issue has probably been underestimated in the past.11,20

Additionally, in everyday clinical practice, ROI setting is performed by different certified nuclear medicine technicians and is not completely standardized.

The aim of our study was to compare the inter-observer and intra-observer variability of a minimally standardized manual method of ROI definition with that of a previously published semi-automated tool.21 Furthermore, we tested the comparability of HMRs obtained with these methods with those generated in routine clinical diagnostics.

Material and Methods

We retrospectively analyzed 32 digitally archived planar images of 20 consecutive patients (median age 63.5 years; 2 females, 18 males) who were referred for 123I-MIBG imaging between January 2014 and October 2015. Of these patients, 60% (12/20) had cardiac arrhythmias, and in 40% (8/20), a neurodegenerative disease was suspected. The patients had a mean heart rate of 74 ± 20/min (range 50-134/min) and a mean blood pressure of 124/75 ± 14/12 mmHg (range 95/50-150/90 mmHg). Overall, 40% (8/20) had hypertension, 20% (4/20) diabetes, and 35% (7/20) dyslipidemia. Among the patients with cardiac arrhythmias, 16.7% (2/12) had an implantable cardioverter defibrillator (ICD), 25% (3/12) had been treated with ventricular tachycardia (VT) ablation, and 83.3% (10/12) with pulmonary vein isolation (PVI). Furthermore, all of them (12/12) received beta blockers, and 58.3% (7/12) received antihypertensive medication. Among the patients with suspected neurodegenerative disease, 87.5% (7/8) received antiparkinsonian medication. All patients gave written informed consent. The retrospective evaluation was approved by the local Ethics Committee (No.: WF-056/15).

123I-MIBG Imaging Protocol

Patients underwent planar imaging at 15 minutes and 4 hours (n = 12) or only at 4 hours (n = 8) after injection of 185 MBq (n = 10) or 370 MBq (n = 10) 123I-MIBG (AdreView™, GE Healthcare B.V., Eindhoven, The Netherlands). Images of the thoracic region were acquired in anterior view with an acquisition time of 10 minutes (zoom 1.0, matrix 256 × 256, pixel size 2.4 × 2.4 mm). We used an ECAM variable-angle or a Symbia T gamma camera system (Siemens Medical Solutions, Hoffman Estates, USA) equipped with the manufacturer's standard low-energy high-resolution (LEHR) collimators.

Image Processing

Three different methods for ROI definition were used to determine HMR (Figure 1):

Figure 1

Examples of ROI definitions on planar 123I-MIBG images using different methods. (A) ROIs defined during routine clinical evaluation (Method CLI); region 1 corresponds to the myocardium and region 2 to the mediastinal background; (B) ROIs defined according to a standardized protocol (Method STA); (C) ROIs defined semi-automatically (Method AUT) using a dedicated software program

  • Method CLI: Heart and mediastinal ROIs were defined manually in daily clinical routine by experienced nuclear medicine technologists, who did not receive further specific instructions.

  • Method STA: The cardiac ROI was set manually, carefully following the outline of the myocardium and the plane of the left ventricular base. A circular mediastinal ROI with an area of at least 4 cm² was drawn in the upper mediastinum, centered on the point of visually lowest regional activity. This procedure was performed by an experienced staff member of the University Hospital Hamburg-Eppendorf (JM) and repeated three weeks after the first evaluation (intra-observer variability). A third evaluation was performed by a second staff member (CK) with about nine months of clinical experience (inter-observer variability).

  • Method AUT: ROIs were defined with dedicated semi-automated software, as described previously,21 by two staff members of Kanazawa University (KN) and Kanazawa Medical University (KO). Briefly, after the operator pointed to the center of the heart, a circular ROI was automatically placed on the heart; its radius could be changed manually if required. A rectangular ROI was automatically placed in the upper mediastinum, with a width of 10% of the body width and a height of 30% of the distance from the center of the heart to the upper border of the mediastinum. Even with manual adjustment, the operator could not move the mediastinal ROI directly but could only adjust the chest range (right and left borders), the upper limit of the mediastinum, and the center of the heart. KN repeated the evaluation after 3 weeks (intra-observer variability), and the evaluations of KN and KO were compared (inter-observer variability).
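To make the ROI geometry of Method AUT more concrete, the Python sketch below computes an HMR from a circular heart ROI and a rectangular mediastinal ROI on a synthetic image. It is not the published software: all function names are hypothetical, and the exact placement of the rectangle (centered on the chest midline, starting at the upper mediastinal limit, with the chest borders standing in for body width) is an assumption, since only its relative width and height are described above.

```python
# Hypothetical sketch: HMR from a circular heart ROI and a rectangular
# mediastinal ROI, loosely modeled on the description of Method AUT.
# The placement rules of the published software are NOT reproduced here.


def circle_mean(image, cx, cy, radius):
    """Mean counts per pixel inside a circle centered at column cx, row cy."""
    values = [
        value
        for row, line in enumerate(image)
        for col, value in enumerate(line)
        if (col - cx) ** 2 + (row - cy) ** 2 <= radius ** 2
    ]
    return sum(values) / len(values)


def rect_mean(image, col0, row0, width, height):
    """Mean counts per pixel inside an axis-aligned rectangle."""
    values = [
        image[row][col]
        for row in range(row0, row0 + height)
        for col in range(col0, col0 + width)
    ]
    return sum(values) / len(values)


def semi_auto_hmr(image, heart_center, heart_radius,
                  chest_left, chest_right, mediastinum_top):
    """Return HMR = mean heart counts / mean mediastinal counts.

    Assumptions (hypothetical): the mediastinal rectangle is centered on the
    chest midline, starts at the upper mediastinal limit, is 10% of the chest
    width wide and 30% of the heart-center-to-upper-mediastinum distance high.
    """
    cx, cy = heart_center
    heart = circle_mean(image, cx, cy, heart_radius)
    width = max(1, round(0.10 * (chest_right - chest_left)))
    height = max(1, round(0.30 * (cy - mediastinum_top)))
    col0 = (chest_left + chest_right) // 2 - width // 2
    mediastinum = rect_mean(image, col0, mediastinum_top, width, height)
    return heart / mediastinum


# Synthetic 64 x 64 image: background of 10 counts, a disc of 25 counts as "heart".
image = [
    [25.0 if (c - 40) ** 2 + (r - 40) ** 2 <= 64 else 10.0 for c in range(64)]
    for r in range(64)
]
hmr = semi_auto_hmr(image, heart_center=(40, 40), heart_radius=6,
                    chest_left=8, chest_right=56, mediastinum_top=10)
print(round(hmr, 2))  # 2.5
```

With a heart disc of 25 counts per pixel on a 10-count background, the sketch returns an HMR of 2.5; on real images the result depends entirely on where the two ROIs land, which is the point of this study.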

Statistical Analysis

Agreement between methods or between repeated evaluations was investigated by Bland-Altman analysis.22 The 95% limits of agreement (LoA) were calculated as the mean of the differences ± 1.96 standard deviations (SD) of the differences.
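For readers who want to reproduce the agreement statistics, the following minimal Python sketch computes the mean difference, the SD of the differences, and the 95% LoA as defined above. The HMR values are invented for illustration, the function name is our own, and the study itself used SPSS.

```python
import statistics


def bland_altman(a, b):
    """Bland-Altman summary for paired measurements a and b.

    Returns the mean of the differences (a - b), their sample SD, and the
    95% limits of agreement (mean difference +/- 1.96 SD).
    """
    diffs = [x - y for x, y in zip(a, b)]
    mean_diff = statistics.mean(diffs)
    sd_diff = statistics.stdev(diffs)  # sample SD (n - 1 denominator)
    loa = (mean_diff - 1.96 * sd_diff, mean_diff + 1.96 * sd_diff)
    return mean_diff, sd_diff, loa


# Illustrative, made-up HMRs from two ROI methods applied to the same four images.
method_1 = [1.5, 1.8, 2.1, 1.4]
method_2 = [1.6, 1.9, 2.1, 1.6]
mean_diff, sd_diff, loa = bland_altman(method_1, method_2)
print(f"mean diff {mean_diff:.2f}, SD {sd_diff:.3f}, LoA {loa[0]:.2f} to {loa[1]:.2f}")
```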

Inter-method variability was determined by direct pairwise comparison for all combinations of the methods CLI, STA, and AUT. For methods STA and AUT, the HMRs from the first evaluation, which also served as the reference for determining intra-observer variability, were used.23

Furthermore, the total inter-method variability between methods STA and AUT was assessed with the formula reported by Bland and Altman:22,24

$$ {\text{SD}}_{\text{IMV, BC }} = \sqrt {\left({\text{SD}}_{\text{BC}}^{2} + \frac{1}{2}{\text{SD}}_{\text{B,intra}}^{2} + \frac{1}{2}{\text{SD}}_{\text{C,intra}}^{2} \right)} , $$

where \( \text{SD}_{\text{IMV,BC}} \) is the total inter-method variability between STA and AUT, \( \text{SD}_{\text{BC}} \) is the inter-method variability determined by direct comparison of the first observed HMRs of methods STA and AUT, \( \text{SD}_{\text{B,intra}} \) is the intra-observer variability of method STA, and \( \text{SD}_{\text{C,intra}} \) is the intra-observer variability of method AUT.
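As a worked illustration of the formula, the sketch below plugs in the rounded SDs reported in the Results (0.09 for the direct STA vs. AUT comparison and 0.05 for each intra-observer comparison). Because the inputs are rounded, the resulting value of about 0.10 is only indicative and is not a result reported by the study; the function name is hypothetical.

```python
import math


def total_inter_method_sd(sd_bc, sd_b_intra, sd_c_intra):
    """Combine the direct inter-method SD with half of each method's
    intra-observer variance, following the formula in the text."""
    return math.sqrt(sd_bc ** 2 + 0.5 * sd_b_intra ** 2 + 0.5 * sd_c_intra ** 2)


# Rounded SDs from the Results: STA vs. AUT 0.09; intra-observer 0.05 for each method.
sd_total = total_inter_method_sd(0.09, 0.05, 0.05)
print(round(sd_total, 3))  # 0.103
```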

One-way analysis of variance (ANOVA) with the post-hoc Tukey test was used to assess the statistical significance of differences between the methods.
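The following sketch shows how the F statistic of a single-factor ANOVA is formed from between-group and within-group sums of squares. It treats the groups as independent samples, as a single-factor ANOVA does, and uses toy numbers rather than study data. Obtaining the P value (from the F distribution) and the Tukey post-hoc comparisons (from the studentized range distribution) is left to a statistics package such as the SPSS version used here.

```python
def one_way_anova_f(*groups):
    """Return the one-way ANOVA F statistic and its degrees of freedom."""
    k = len(groups)
    n_total = sum(len(g) for g in groups)
    grand_mean = sum(sum(g) for g in groups) / n_total
    means = [sum(g) / len(g) for g in groups]
    # Between-group sum of squares: spread of group means around the grand mean.
    ss_between = sum(len(g) * (m - grand_mean) ** 2 for g, m in zip(groups, means))
    # Within-group sum of squares: spread of observations around their group mean.
    ss_within = sum((x - m) ** 2 for g, m in zip(groups, means) for x in g)
    df_between, df_within = k - 1, n_total - k
    f_stat = (ss_between / df_between) / (ss_within / df_within)
    return f_stat, df_between, df_within


# Toy data, not study values: three "methods" applied to three images each.
f_stat, df1, df2 = one_way_anova_f([1.0, 2.0, 3.0], [2.0, 3.0, 4.0], [3.0, 4.0, 5.0])
print(f_stat, df1, df2)  # 3.0 2 6
```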

We also calculated the washout rates (WRs) for the 12 patients who underwent both early and late imaging, using the following formula:10

$$ {\text{WR}} = \frac{{\left( {H_{\text{e}} - M_{\text{e}} } \right) - (H_{\text{l}} - M_{\text{l}} ) \cdot k}}{{H_{\text{e}} - M_{\text{e}} }} \cdot 100\, \% , $$

where \( H \) and \( M \) are the mean counts per pixel in the ROIs placed over the left ventricle and the upper mediastinum, respectively. The indices e and l refer to the early and late images, respectively, and \( k \) is the decay correction factor, assuming a half-life of 13 hours for 123I.
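A minimal Python sketch of the WR formula follows. The text does not state how \( k \) was computed; here we assume \( k = 2^{\Delta t / T_{1/2}} \), with \( \Delta t \) the interval between the early and late acquisitions (3.75 hours for acquisitions at 15 minutes and 4 hours after injection) and \( T_{1/2} = 13 \) hours. The function names and count values are illustrative only.

```python
HALF_LIFE_H = 13.0  # half-life of 123I in hours, as assumed in the text


def decay_factor(delta_t_h, half_life_h=HALF_LIFE_H):
    """Assumed factor k correcting late counts for decay over delta_t_h hours."""
    return 2.0 ** (delta_t_h / half_life_h)


def washout_rate(h_early, m_early, h_late, m_late, delta_t_h):
    """Background-corrected, decay-corrected washout rate in percent."""
    early_net = h_early - m_early
    late_net = (h_late - m_late) * decay_factor(delta_t_h)
    return (early_net - late_net) / early_net * 100.0


# Illustrative counts per pixel; 15 min and 4 h acquisitions give delta_t = 3.75 h.
wr = washout_rate(h_early=200.0, m_early=100.0, h_late=150.0, m_late=90.0, delta_t_h=3.75)
print(f"k = {decay_factor(3.75):.3f}, WR = {wr:.1f}%")
```

Because the mediastinal counts enter both the early and the late term, any shift in mediastinal ROI placement propagates directly into the WR.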

All statistical analyses were performed using SPSS (version 23.0; IBM Corp., Armonk, NY, USA). Data are expressed as mean ± SD. Mean differences in HMR were tested using Student's t test for paired samples. P ≤ 0.05 was considered statistically significant. To avoid overestimating variability, outliers caused by unequivocal ROI misplacement during routine data processing were identified and excluded.
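The paired t test used for mean HMR differences can be sketched as follows: the statistic is the mean of the paired differences divided by its standard error. The data and function name are invented for illustration, and the P value, which requires the t distribution with \( n - 1 \) degrees of freedom, would come from a statistics package.

```python
import math
import statistics


def paired_t_statistic(a, b):
    """Paired-samples t statistic and degrees of freedom for differences a - b."""
    diffs = [x - y for x, y in zip(a, b)]
    n = len(diffs)
    standard_error = statistics.stdev(diffs) / math.sqrt(n)
    return statistics.mean(diffs) / standard_error, n - 1


# Illustrative, made-up HMRs of the same four images evaluated with two methods.
t_stat, df = paired_t_statistic([1.5, 1.8, 2.1, 1.4], [1.6, 1.9, 2.1, 1.6])
print(f"t = {t_stat:.2f} with {df} degrees of freedom")
```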

Results

Outliers

Analysis of the Bland-Altman plots revealed three outliers. In one case, the cardiac ROI was obviously misplaced: it overestimated the extent of the heart and included adjacent parts of the lung. Consequently, we excluded this result from further analysis. In the other two outliers, no obvious ROI misplacement was identified, so these data were retained for further analysis (Figure 2).

Figure 2

Bland-Altman plots of heart-to-mediastinum ratios (HMR) obtained with different analysis methods. (A) Method CLI (clinical routine) vs. method STA (standardized protocol); (B) method CLI vs. method AUT (semi-automated program); (C) method STA vs. method AUT

Comparison of Different Methods of ROI Definition

Table 1 shows the pairwise comparison of HMRs obtained with the different methods of ROI definition, and the corresponding Bland-Altman plots are presented in Figure 2. The mean differences for CLI vs. STA, CLI vs. AUT, and STA vs. AUT were −0.14, −0.22, and −0.08, respectively. The SD of the differences was 0.17 for CLI vs. STA, 0.19 for CLI vs. AUT, and 0.09 for STA vs. AUT. One-way ANOVA revealed a significant difference among the three methods (P = 0.01). The post-hoc Tukey test localized this difference: methods CLI and AUT differed significantly (P = 0.01) (Table 1), whereas the comparisons of CLI with STA (P = 0.13) and of STA with AUT (P = 0.45) showed no significant differences.

Table 1 Comparison of the HMR values obtained with different ROI definition methods

Intra- and Inter-observer Variability

Table 2 summarizes the estimated inter- and intra-observer variability for methods STA and AUT, including the respective 95% LoA and the corresponding numbers of HMRs falling within the gray zone around 1.60, which was considered the lower limit of normal sympathetic innervation.

Table 2 Inter- and intra-observer variability of different analysis methods

For both methods, the intra-observer variability was 0.01 ± 0.05 (mean difference ± SD); the 95% LoA were −0.09 to 0.11 for STA and −0.08 to 0.10 for AUT.

The inter-observer variability was −0.07 ± 0.07 for method STA and 0.01 ± 0.03 for method AUT, with 95% LoA of −0.21 to 0.07 and −0.06 to 0.07, respectively (Table 2). Method AUT thus showed the smaller inter-observer variability and appeared to be the more stable of the two methods.

Taking the width of the 95% LoA as the width of a gray zone centered on 1.60, the normal limit according to the ADMIRE-HF study, yields limits of 1.46-1.74 for method STA and 1.54-1.66 for method AUT (Figure 3). Consequently, 13 of 32 HMR measurements (41%) for method STA and 5 of 32 (16%) for method AUT fell within this gray zone and could not be reliably assigned to normal or abnormal cardiac sympathetic innervation.
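The gray-zone construction above can be reproduced with a short Python sketch: an interval as wide as the inter-observer 95% LoA is centered on 1.60, and HMR values inside it are counted. Using the rounded LoA from the text gives 1.46-1.74 for STA and about 1.535-1.665 for AUT, which matches the reported 1.54-1.66 only up to rounding. Treating the boundaries as inclusive is our assumption, and the example HMRs are invented.

```python
def gray_zone(cutoff, loa_lower, loa_upper):
    """Center an interval as wide as the 95% LoA on the cut-off value."""
    half_width = (loa_upper - loa_lower) / 2.0
    return cutoff - half_width, cutoff + half_width


def count_in_gray_zone(values, zone):
    """Number of values falling inside the gray zone (boundaries inclusive)."""
    low, high = zone
    return sum(low <= v <= high for v in values)


# Inter-observer LoA from the text, centered on the ADMIRE-HF cut-off of 1.60.
sta_zone = gray_zone(1.60, -0.21, 0.07)  # approx. (1.46, 1.74)
aut_zone = gray_zone(1.60, -0.06, 0.07)  # approx. (1.535, 1.665)

example_hmrs = [1.45, 1.50, 1.58, 1.70, 1.80]  # invented values
print(count_in_gray_zone(example_hmrs, sta_zone), count_in_gray_zone(example_hmrs, aut_zone))
```

The narrower AUT zone leaves fewer values unclassifiable, which is the pattern reported for the study data.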

Figure 3

Scatter plots of heart-to-mediastinum ratios (HMR) obtained with different analysis methods, with the inter-observer variabilities referred to 1.60. The left panel depicts method STA (standardized protocol), and the right panel method AUT (semi-automated program). The horizontal bars show the 95% LoA of the inter-observer variabilities from Table 2 centered on 1.60, the cut-off according to the ADMIRE-HF study. No diagram could be presented for method CLI (clinical routine), since its HMRs were determined only once to avoid stabilizing effects of repeated evaluations

Washout Rate

Table 3 shows the calculated WRs of the 12 patients for methods CLI, STA, and AUT. For the differences between two methods, the SDs of the WRs were 7.9% for CLI vs. STA, 5.8% for CLI vs. AUT, and 3.6% for STA vs. AUT. For the differences between two observers within one method, the SDs of the WRs were 5.3% for method STA and 2.6% for method AUT.

Table 3 Washout rate (%) obtained by different ROI definition methods

Discussion

123I-MIBG scintigraphy is an established method worldwide for evaluating the sympathetic neuronal innervation of the myocardium. It is relevant for assessing the prognostic risk of sudden cardiac death in patients with cardiomyopathies and chronic heart failure. Different quantification methods are used in daily clinical routine to determine the HMR, which is used to classify patients into different risk groups. Based on the international multicenter ADMIRE-HF study,4 the cut-off value defining patients at risk of fatal arrhythmias has been set at 1.60.

Methodological Considerations

Several factors influence the HMR. Apart from the time points of data acquisition and the specific activity of 123I-MIBG, acquisition and processing parameters, including the choice of collimator and especially ROI definition, are critical factors. Surprisingly, the guidelines for cardiac 123I-MIBG imaging10 do not give explicit advice on ROI definition. Somsen et al. pointed out the influence of the shape of myocardial ROIs and recommended a standardized myocardial ROI shape that includes the myocardium and the left ventricular cavity.16,17 Some studies consider the size of the mediastinal ROI, but they neither give general recommendations for exact ROI placement nor discuss the effect of ROI positioning on HMR values.17 However, analyses of intra- and inter-observer variability have shown that the HMR is generally highly reproducible.15 This was supported by a recent study reporting that the size of a manually set ROI in the visually best position has no clinically relevant effect on the HMR.19 Other studies confirmed good intra- and inter-observer agreement using purely manual analysis.19 As described by Okuda et al.,21 semi-automated ROI definition also yields low intra-observer variability (less than 0.1). In line with that study, we found low intra- and inter-observer variability with semi-automated HMR analysis (Method AUT).

Interestingly, minimal standardization of manual ROI definition (Method STA), namely placing the mediastinal ROI around the area of lowest regional activity, resulted in variability similar to that of the semi-automated method (Method AUT). However, we also showed that comparing the standardized or semi-automated methods (STA and AUT) with the non-standardized method (CLI) results in significantly higher variability. Table 4 gives a literature overview of different HMR analysis methods and the respective inter- and intra-observer variabilities.

Table 4 Literature overview of inter- and intra-observer variabilities

Owing to the limited number of patients, the WR data should be interpreted with caution. Nevertheless, HMR and WR show the same tendency: method CLI has the highest variability and method AUT the lowest.

Because WR is calculated from the same heart and mediastinal ROIs, standardizing ROI definition might have a stabilizing effect on WR as well.

The Gray Zone and Its Consequences for Patient Classification

A major challenge in HMR analysis is that dichotomizing a continuous variable with a single cut-off value for a binary decision creates a relevant gray zone.11

Applying the measurement uncertainty of HMR analysis to our patient data shows that, with 95% limits, up to 40% of values lie in the measurement gray zone, even when a standardized evaluation protocol is used. HMR values close to the cut-off value make patient classification difficult. Moreover, when a single HMR measurement falls between 1.50 and 1.60, there is a relevant chance that the classification will change if the measurement is repeated. A practical question arising from these data is how confidently an individual patient can be assigned to either risk category,20 and how reliable 123I-MIBG studies are.
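To illustrate why a single HMR between 1.50 and 1.60 is fragile, the sketch below estimates the probability that a repeat evaluation lands on the other side of the 1.60 cut-off. It rests on a simplifying assumption not made in the study: that the difference between a repeat and the observed value is normally distributed with zero mean and an SD equal to the inter-observer SD of method STA (0.07). The function names are hypothetical, and the numbers it produces are only indicative.

```python
import math


def normal_cdf(z):
    """Standard normal cumulative distribution function."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))


def reclassification_probability(observed_hmr, cutoff, sd_diff, mean_diff=0.0):
    """Probability that a repeat measurement falls on the other side of the cut-off.

    Simplifying assumption: repeat - observed ~ Normal(mean_diff, sd_diff).
    """
    z = (cutoff - observed_hmr - mean_diff) / sd_diff
    if observed_hmr < cutoff:
        return 1.0 - normal_cdf(z)  # repeat ends up at or above the cut-off
    return normal_cdf(z)            # repeat ends up below the cut-off


# SD of 0.07 taken from the inter-observer SD of method STA in the text.
for hmr in (1.50, 1.55, 1.58):
    print(hmr, round(reclassification_probability(hmr, 1.60, 0.07), 2))
```

Under these assumptions, an observed HMR of 1.55 would be reclassified in roughly one of four repeat evaluations.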

Reproducibility of HMR is especially relevant when repeated measurements are needed to evaluate disease progression or changes after interventions.25 Some studies found that HMR on 123I-MIBG imaging provides prognostic information in patients with chronic heart failure26,27 and dilated cardiomyopathy,28 although none of them assessed the reproducibility of 123I-MIBG imaging.

On the other hand, some authors reported high reproducibility of planar HMRs18 and of SPECT defect scores in heart failure patients.15 Our analysis shows that HMR results can vary from measurement to measurement and should not be taken as strict values for determining a patient's prognosis or treatment, especially close to the cut-off value. This problem can be at least partly mitigated by estimating the probability of cardiac events and treating HMR as merely one parameter among others in individual risk estimation. This view is supported by a study by Nakajima et al.29 that used a model to predict different 5-year mortality rates for patients with chronic heart failure.

No specific WR has been established as the optimal threshold. In the 123I-MIBG literature on prognostic evaluation, thresholds for discriminating good from poor prognosis ranged from 27% to 67%, and a Japanese normal 123I-MIBG database showed an upper limit of 34%.30 Our study, however, showed that WR may vary by approximately ±10% depending on ROI setting. If this 20% range is taken as a tentative gray zone for WR, 30%-40% of the data points in our study would fall within it. Since the standardized and semi-automated methods reduced the variability of WR, ROI setting should also be standardized for stable WR calculation.

Considering that intra- and inter-observer variabilities are either not taken into account or vary from study to study (Table 4), it may be useful for each center to establish its own internal measurement gray zone. Anonymized data exchange between clinics could facilitate building the required database.

Limitations

Concerning method CLI, HMRs were determined only once in order to avoid stabilizing effects of repeated evaluations.32 Consequently, the intra- as well as the inter-observer variability could not be determined for this method. The corresponding variability and its effect on the gray zone could therefore only be discussed indirectly.

Owing to the limited number of patients, the WR results are mainly illustrative and do not allow definitive conclusions. Nevertheless, they show that ROI setting has a crucial influence on WR. A further possible limitation is the lack of a clinical gold standard, partly due to the inclusion of both cardiology and neurology patients. Without a gold standard, we could only demonstrate that the width of the limits of methodological uncertainty depends on the evaluation method and might be clinically relevant. Using a validated endpoint could strengthen this discussion.

New Knowledge Gained

The effect of ROI definition on the HMR in cardiac sympathetic imaging has probably been underestimated in the past. We demonstrated that in the clinical context, depending on the method of ROI definition, up to about 40% of results may lie in a gray zone around 1.6, where assignment to normal or abnormal innervation is not possible. This problem may be mitigated by standardized evaluation methods, whose validation by means of clinical software phantoms is strongly recommended.

Conclusion

In 123I-MIBG imaging analysis, calculation of the HMR is crucial for the risk classification of patients with chronic heart failure. The lack of exact and standardized measurements leads to a variable gray zone that influences clinical decision-making. To keep the consequences of methodological variability under control, standardization and subsequent validation of the analysis are essential. Internal validation of evaluation methods using anonymized datasets is strongly recommended.