Introduction

Neuroblastoma (NB) is the most frequent solid extracranial malignant tumour in children [1]. With risk-adapted therapy, the prognosis in these patients has continuously improved during recent decades. However, NB is still the cause of 15% of all cancer-related deaths in childhood [2]. In Germany, treatment and follow-up are performed in accordance with the NB2004 trial protocol [3]. This allows refined risk stratification into three treatment groups based on clinical factors (age, tumour stage) and genetic determinants (MYCN gene amplification, chromosome 1p36 abnormalities). The high-risk group includes all patients with MYCN amplification as well as patients older than 1 year with stage 4 disease. Despite a highly intensive treatment regimen, including high-dose chemotherapy, external beam irradiation, 123I-metaiodobenzylguanidine (MIBG) therapy and immunotherapy, 5-year event-free survival (EFS) and overall survival (OS) of high-risk patients are only about 40% and 50%, respectively [4, 5]. Moreover, long-term survivors may suffer from therapy-associated morbidity and mortality (e.g. hearing loss, infertility or secondary malignancies) [6, 7]. Furthermore, low-risk patients (no genetic abnormalities, tumour stage 1–3 or 4S) who are primarily treated in the observation group may experience progression to localized or disseminated disease and thus require adapted treatment. Early identification of this subgroup of patients might allow a reduction in intensity of therapy, and therefore of toxicity and complications, by enabling early chemotherapy or primary resection of initially small tumours. However, this would require accurate risk stratification.

MIBG scintigraphy with single-photon emission computed tomography (SPECT) is an integral part of the diagnostic work-up for staging and restaging in NB. Furthermore, the Curie and SIOPEN scores, which represent the number of body segments affected by MIBG-avid skeletal metastases in patients with stage 4 disease, correlate with EFS and OS [8, 9]. Nevertheless, there is currently no quantitative parameter derived from MIBG SPECT that can predict a patient’s outcome independently of the tumour stage. Among other parameters, the metabolic tumour volume (MTV) obtained by positron emission tomography (PET) has shown prognostic value in different entities [10, 11, 12]. Furthermore, the asphericity (ASP) of the distribution of metabolic activity within the tumour can be quantified and serves as a surrogate marker for the aggressiveness of the tumour biology. It can predict progression-free survival and OS [13, 14]. Recently, the concept of ASP has been successfully transferred to SPECT imaging [15].

Therefore, the aim of the current study was to evaluate the prognostic significance of quantitative measures derived from pretherapeutic MIBG SPECT images as well as laboratory parameters with respect to EFS. The goal was to identify prognostic parameters that might help differentiate and improve the existing risk assessment and identify patients with a high risk of impaired EFS despite having already received risk-adapted treatment. Finally, multifactorial predictive models were calculated to utilize the additional and complementary value of different parameters and their combinations for individualized risk assessment.

Materials and methods

Patients

This retrospective study included 23 consecutively examined children (11 girls, 12 boys; median age 1.8 years, range 0.3 to 6.8 years) newly diagnosed with NB and referred to the two participating departments (Charité – Universitätsmedizin Berlin, Germany, and University Medicine Greifswald, Germany) for pretherapeutic 123I-MIBG scintigraphy/SPECT between 2008 and 2016. All patients fulfilled the following criteria: the scintigraphic examination included SPECT or SPECT/CT, the primary tumour was MIBG-avid, and MIBG SPECT was performed before initiation of any antineoplastic treatment (18 patients) or at the latest 1 week after the start of chemotherapy (5 patients). All patients were part of the German therapy optimization trial NB2004; i.e. risk stratification, treatment and follow-up visits were performed according to the NB2004 protocol [3]. An event was defined as disease progression, relapse or death from any cause.

Laboratory parameters and genetic analysis

Laboratory parameters were obtained at the time of diagnosis. Urinary catecholamine metabolites were extracted from samples of spontaneously voided urine and levels of excreted vanillylmandelic acid (VMA) and homovanillic acid (HVA) were calculated relative to that of creatinine. Maximum normal values for the VMA-to-creatinine ratio (VMA/C) were 95 μmol/g in infants and declined to 42 μmol/g in children >4 years of age. Maximum normal values for the HVA-to-creatinine ratio (HVA/C) also declined with increasing age and ranged from 70 to 179 μmol/g. Maximum normal values for serum neuron-specific enolase (NSE) ranged from 20 to 25 μg/l while the upper limits for serum lactate dehydrogenase (LDH) declined from 600 U/l in newborns to 300 U/l in children between 2 and 15 years of age. MYCN amplification status as well as chromosomal aberrations in 1p36 were assessed by fluorescence in situ hybridization (FISH).

123I-MIBG SPECT acquisition

The activity of 123I-MIBG administered was weight-adapted based on the recommendations of the European Association of Nuclear Medicine (EANM) [16]. SPECT imaging was performed 24 h after tracer injection in addition to planar whole-body projections. In 9 of 23 patients (2008 to 2011), MIBG SPECT imaging was performed without attenuation correction using a GE Millennium VG Hawkeye SPECT/CT system (GE Medical Systems, Milwaukee, WI, USA) equipped with a low-energy high-resolution (LEHR) collimator and iterative reconstruction (2D OSEM algorithm, matrix 128 × 128 × 128, voxel size 4.42 × 4.42 × 4.42 mm). In 13 of 23 patients (2012 to 2016), MIBG SPECT was performed using a Siemens Symbia T6 SPECT/CT (Siemens Healthcare, Erlangen, Germany). Images were acquired with a LEHR collimator and reconstructed iteratively (Flash 3D OSEM algorithm, matrix 128 × 128 × 81, voxel size 4.8 × 4.8 × 4.8 mm). In 10 of these 13 patients, combined SPECT/low-dose CT (X-ray tube current 24 mA, voltage 130 kV, slice thickness 1.25 to 5 mm, convolution kernel B08s) was performed instead of SPECT alone, and CT-based attenuation correction was included in image reconstruction.

In one patient, SPECT images were acquired with a GE DST XL dual-head camera equipped with a LEHR collimator (GE Medical Systems, Milwaukee, WI, USA). Reconstruction was performed iteratively (2D OSEM algorithm, matrix 128 × 128 × 128, voxel size 3.95 × 3.95 × 3.95 mm).

123I-MIBG SPECT quantification

SPECT data were quantified with dedicated software (ROVER, version 3.0.23; ABX advanced biochemical compounds GmbH, Radeberg, Germany). The primary tumour was delineated using a semiautomatic, background-adapted algorithm [17], and subsequent manual correction to exclude nonmalignant tissue if required. The maximum count rate (T max), average count rate (T mean), MTV and ASP of the primary tumour were calculated. ASP was defined as the relative deviation of the MTV surface from that of an isovolumetric sphere (as described by Apostolova et al. [14]). The average hepatic count rate (L mean) was defined based on a 30-ml sphere placed in physiological liver tissue to calculate tumour-to-liver count rate ratios (TLCRR; T max/L mean, T mean/L mean).
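The ASP definition above can be written out as a short calculation. The following is a minimal sketch that takes the definition literally (relative deviation of the MTV surface from the surface of a sphere with the same volume); it does not reproduce how ROVER estimates surface and volume from voxel data, and the function names are hypothetical.

```python
import math


def asphericity_percent(surface_mm2: float, volume_mm3: float) -> float:
    """Relative deviation (in %) of a surface from that of an isovolumetric sphere."""
    if surface_mm2 <= 0 or volume_mm3 <= 0:
        raise ValueError("surface and volume must be positive")
    # A sphere of volume V has the surface (36 * pi * V^2)^(1/3).
    sphere_surface = (36.0 * math.pi * volume_mm3 ** 2) ** (1.0 / 3.0)
    return 100.0 * (surface_mm2 / sphere_surface - 1.0)


def tumour_to_liver_ratio(tumour_count_rate: float, liver_mean_count_rate: float) -> float:
    """TLCRR: tumour count rate (T max or T mean) divided by L mean."""
    if liver_mean_count_rate <= 0:
        raise ValueError("liver count rate must be positive")
    return tumour_count_rate / liver_mean_count_rate


# A perfect sphere (radius 10 mm) has an ASP of about 0%; a cube is about 24% aspherical.
sphere_asp = asphericity_percent(4 * math.pi * 10 ** 2, 4 / 3 * math.pi * 10 ** 3)
cube_asp = asphericity_percent(6 * 10 ** 2, 10 ** 3)
```

By construction, ASP is zero for a sphere and grows as the delineated volume becomes more irregular, independently of its absolute size.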

Statistical analysis

Statistical analysis was performed using SPSS 22 (IBM Corp., Armonk, NY, USA) and R 3.3.2 (Foundation for Statistical Computing, Vienna, Austria, 2016; http://www.R-project.org). Descriptive parameters are expressed as medians, interquartile range (IQR) and range, unless otherwise specified. The associations between the EFS and clinical parameters (sex, age, tumour stage), genetic factors (MYCN amplification status, 1p36), laboratory parameters (VMA/C, HVA/C, NSE, LDH), histology (Hughes grade [18]) and all quantitative SPECT parameters (ASP, MTV, T max/L mean, T mean/L mean) were analysed using univariate Cox proportional hazards regression. Additionally, Kaplan-Meier curves with respective log-rank tests were calculated following binarization of metric parameters using cut-off values defined by receiver operating characteristic (ROC) curves (Supplementary material). The optimal cut-off value was defined as the point on the ROC curve with the shortest distance d to the point (0,1) calculated as follows:

$$ d=\sqrt{{\left(1- Sensitivity\right)}^2+{\left(1- Specificity\right)}^2} $$
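As an illustration of this cut-off rule, the sketch below scans all observed values of a metric parameter and returns the threshold with the smallest distance d. It assumes that patients with a value above the threshold are classed as test-positive and that the event status is treated as a binary label; the data in the usage lines are invented for demonstration and are not taken from the study.

```python
import math


def optimal_cutoff(values, events):
    """Return (cutoff, d) minimising the distance of the ROC point to (0, 1).

    A patient counts as test-positive when value > cutoff.
    """
    positives = sum(1 for e in events if e)
    negatives = len(events) - positives
    if positives == 0 or negatives == 0:
        raise ValueError("need patients both with and without an event")
    best = None
    for cutoff in sorted(set(values)):
        true_pos = sum(1 for v, e in zip(values, events) if e and v > cutoff)
        true_neg = sum(1 for v, e in zip(values, events) if not e and v <= cutoff)
        sensitivity = true_pos / positives
        specificity = true_neg / negatives
        d = math.sqrt((1 - sensitivity) ** 2 + (1 - specificity) ** 2)
        if best is None or d < best[1]:
            best = (cutoff, d)
    return best


# Invented example: the parameter separates the two outcome groups perfectly.
example_values = [5, 10, 20, 40, 60, 80]
example_events = [0, 0, 0, 1, 1, 1]
cutoff, distance = optimal_cutoff(example_values, example_events)
```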

All variables with p ≤ 0.1 in the univariate Cox regression were also subjected to step-wise inclusion and exclusion into a predictive model for EFS based on their contribution to the likelihood ratio (LR) χ² and Harrell’s C index [19] of the final model. The single parameter with the highest χ² was chosen as the starting parameter. Metric variables were included after binarization as described above. The contribution of single parameters to the overall predictive accuracy of the model was evaluated with the LR χ² test using the survival and rms packages of R. Statistical significance was assumed at a p value of ≤0.05.
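Harrell’s C index, one of the two criteria used for model comparison above, is the proportion of comparable patient pairs in which the patient with the earlier event also has the higher predicted risk. The sketch below is a simplified pure-Python version for illustration only; it is not the rms implementation, it ignores pairs with tied event times, and the example data are invented.

```python
def harrell_c(times, events, risk_scores):
    """Concordance index for right-censored data (ties in risk count as 0.5)."""
    concordant = 0.0
    comparable = 0
    n = len(times)
    for i in range(n):
        if not events[i]:
            continue  # a pair is only comparable if the earlier time is an event
        for j in range(n):
            if times[i] < times[j]:
                comparable += 1
                if risk_scores[i] > risk_scores[j]:
                    concordant += 1.0
                elif risk_scores[i] == risk_scores[j]:
                    concordant += 0.5
    if comparable == 0:
        raise ValueError("no comparable pairs")
    return concordant / comparable


# Invented example: risk score = number of risk factors per patient.
example_c = harrell_c(times=[5, 10, 15, 20], events=[1, 1, 0, 0], risk_scores=[2, 1, 1, 0])
```

A value of 0.5 corresponds to random prediction and 1.0 to perfect ranking of patients by risk.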

A Kaplan-Meier subanalysis separated patients based on the original risk group according to the NB2004 trial. The medium-risk group included one patient only and was therefore not considered in the subgroup analysis. Calculation of p values was omitted due to the low sample sizes in the subgroups. Cut-off values for metric variables calculated for the total population were retained unchanged for this subanalysis.
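For readers less familiar with the Kaplan-Meier method used here, the product-limit estimate can be sketched in a few lines. This is a generic illustration with invented follow-up data, not the SPSS or R code used in the study; censored patients (event = 0) leave the risk set without lowering the curve.

```python
def kaplan_meier(times, events):
    """Return [(time, event_free_probability)] at each distinct event time."""
    at_risk = len(times)
    survival = 1.0
    curve = []
    for t in sorted(set(times)):
        n_events = sum(1 for ti, e in zip(times, events) if ti == t and e)
        n_leaving = sum(1 for ti in times if ti == t)
        if n_events > 0:
            # Product-limit step: multiply by the fraction surviving this time point.
            survival *= 1.0 - n_events / at_risk
            curve.append((t, survival))
        at_risk -= n_leaving  # events and censored patients both leave the risk set
    return curve


# Invented follow-up times in months; 1 = event, 0 = censored.
example_curve = kaplan_meier(times=[3, 5, 5, 8, 12], events=[1, 1, 0, 1, 0])
```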

Results

Table 1 provides an overview of all patients. Of the 23 patients, 12 had been assigned to the observation (low-risk) group, 1 patient to the medium-risk group and 10 patients to the high-risk group according to the NB2004 protocol [3]. The median follow-up duration of the survivors was 36.0 months (range 12 to 107 months; IQR 29 to 65 months). Events occurred in 8 of the 23 patients (34.8%; progression in 4; relapse in 3; death without progression/relapse in 1), and 4 patients (17.4%) died during follow-up. All patients in the medium-risk and high-risk groups showed a partial response (10 patients) or a complete response (1 patient) to induction chemotherapy. Both patients in the observation group treated primarily with short-term chemotherapy (patients 12 and 18) also showed a partial response (Table 1).

Table 1 Overview of all patients with all event data

Prognostic relevance of single parameters

In the univariate Cox regression analysis (Table 2), ASP (p = 0.029; hazard ratio, HR, 1.032 for a one-unit increase; 95% CI 1.0 to 1.06), MTV (p = 0.038; HR 1.012; 95% CI 1.0 to 1.024) and MYCN amplification status (p = 0.047; HR 4.7; 95% CI 1.02 to 21.4) were significant predictors of EFS. Patient age (p = 0.09; HR 0.6; 95% CI 0.3 to 1.1), chromosome 1p36 status (p = 0.09; HR 4.1; 95% CI 0.8 to 21.4) and elevated HVA/C (p = 0.071; HR 1.002; 95% CI 1.0 to 1.0) showed a tendency to significance. All other parameters were not predictive of EFS.

Table 2 Univariate Cox regression analysis for all examined parameters

ROC analysis for metric parameters identified ASP >32%, MTV >46.7 ml and HVA/C >181 μmol/g as optimal cut-off values (Supplementary material). According to Kaplan-Meier analysis, the mean EFS in patients with high ASP (>32.0%) and low ASP was 20.8 months (95% CI 10.5 to 31.2 months) and 88.1 months (95% CI 69.3 to 107.0 months), respectively (log-rank test, p = 0.013; Figs. 1 and 2). The mean EFS in patients with high MTV (>46.7 ml) and low MTV was 21.8 months (95% CI 10.5 to 33.1 months) and 87.3 months (95% CI 67.9 to 106.7 months), respectively (p = 0.023). The mean EFS in patients with and without MYCN amplification was 26.5 months (95% CI 4.2 to 48.8 months) and 85.1 months (95% CI 66.5 to 103.6 months), respectively (p = 0.029).

Fig. 1

Imaging in two example patients with high and low ASP: transverse (a, e) and coronal (c, g) MIBG SPECT images and transverse (b, f) and coronal (d, h) fused SPECT/CT images in the two patients with the MTV delineated (yellow line). a–d A 1-year-old boy (patient 17) with NB of the right thoracic sympathetic trunk treated in the observation group (tumour stage 1). ASP was very low (5%). No event occurred within 29 months of follow-up. e–h A 3-year-old girl in the high-risk group (patient 11) with NB originating from the right adrenal gland (stage 4, MYCN amplified). ASP was high (76%), mostly due to extensive central necrosis (red arrow). Relapse occurred after 15 months, and the patient died after 36 months

Fig. 2

Kaplan-Meier curves with respective log-rank tests for EFS for selected parameters with p ≤ 0.1 in the univariate Cox regression (pEFS predicted EFS)

Predictive models

The optimal model for predicting EFS combined ASP and HVA/C and yielded a χ² of 11.5 and Harrell’s C of 0.8. Only slightly higher predictive accuracies were obtained when MTV or MYCN amplification status was added to the model (χ² 11.7, C 0.82, LR χ² test p = 0.64, for MTV, and χ² 13.0, C 0.85, p = 0.47, for MYCN). In contrast, a slightly but significantly lower predictive value was obtained when MTV and HVA/C were combined in one model (χ² 10.6, C 0.81, p < 0.001).

Based on the predictive model including ASP and HVA/C, none of the seven patients without metabolic risk factors (mRF) suffered an event while three of ten patients (30%) with one mRF and five of six patients (83.3%) with two mRF suffered an event (log-rank test p = 0.005; Fig. 3). In the risk model including MTV and HVA/C, none of the seven patients without mRF suffered an event, three of nine patients (33.3%) with one mRF and five of seven patients (71.4%) with two mRF suffered an event (log-rank test p = 0.01).
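The grouping by number of mRF in the ASP and HVA/C model amounts to counting how many of the two cut-offs a patient exceeds. The sketch below uses the cut-off values reported above (ASP >32%, HVA/C >181 μmol/g) and the event counts per group from this paragraph; the function names are hypothetical, and the individual HVA/C values in the usage lines are invented because per-patient laboratory values are not given in the text.

```python
ASP_CUTOFF_PERCENT = 32.0
HVA_C_CUTOFF_UMOL_PER_G = 181.0


def count_mrf(asp_percent: float, hva_c_umol_per_g: float) -> int:
    """Number of metabolic risk factors (0, 1 or 2) in the ASP and HVA/C model."""
    return int(asp_percent > ASP_CUTOFF_PERCENT) + int(hva_c_umol_per_g > HVA_C_CUTOFF_UMOL_PER_G)


# Reported (events, patients) per number of mRF for the ASP and HVA/C model.
REPORTED_EVENTS = {0: (0, 7), 1: (3, 10), 2: (5, 6)}


def event_rate_percent(n_mrf: int) -> float:
    """Crude event rate in % for a risk group, from the reported counts."""
    events, patients = REPORTED_EVENTS[n_mrf]
    return round(100.0 * events / patients, 1)


# ASP values as in Fig. 1 (5% and 76%); the HVA/C values are invented.
low_risk_example = count_mrf(asp_percent=5.0, hva_c_umol_per_g=100.0)
high_risk_example = count_mrf(asp_percent=76.0, hva_c_umol_per_g=250.0)
```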

Fig. 3

Kaplan-Meier curves with log-rank tests for EFS in patients with zero, one and two metabolic risk factors (mRF) according to the predictive models including ASP and HVA/C or MTV and HVA/C. The log-rank test was performed for the overall model (pEFS predicted EFS)

Subanalysis of different risk groups according to NB2004

In patients in both the observation group and the high-risk group (according to NB2004 [3]), the mRF ASP, MTV and HVA/C as well as the predictive models were able to separate patients with a high risk from those with a low risk of progression/relapse (Fig. 4).

Fig. 4

Kaplan-Meier curves for selected single parameters and the predictive models in patients in the observation group (OG) and the high-risk group (HRG) (pEFS predicted EFS)

Patients in the observation group received treatment (chemotherapy) only if organ-threatening tumour growth or tumour progression occurred. Seven of twelve patients in the observation group had at least one mRF (ASP and HVA/C model). An event occurred in four of these seven patients (57.1%), who subsequently required treatment (Table 1). Due to organ-threatening tumour growth, one of these four patients (patient 18) underwent limited chemotherapy at initial diagnosis, but experienced disease progression during follow-up necessitating tumour resection. None of the three patients with one mRF but a favourable outcome had been treated initially. No event occurred in any of the five patients in the observation group without a mRF. All patients in the high-risk group were treated with multidrug chemotherapy including high-dose chemotherapy followed by autologous stem-cell transplantation. Five of ten patients in the high-risk group had both mRF, and an event (progression/relapse) occurred subsequently in four of these patients (80%). No event occurred in any of the five patients with only one or no mRF.

Discussion

This retrospective, explorative study evaluated the potential of MIBG SPECT-derived, clinical, genetic and laboratory parameters for individualized prediction of the risk of disease progression/relapse in 23 children with NB.

To the best of our knowledge, this is the first study demonstrating evidence that quantitative parameters obtained from pretherapeutic 123I-MIBG SPECT can help stratify patients with NB according to their risk of disease progression/relapse under current treatment regimens. ASP of the primary tumour was a significant predictor of EFS in the univariate Cox regression analysis. The HR of 1.032 per one-unit increase represents a 15-fold increased risk of progression/relapse in the patient with the highest measured ASP (87%) compared to the patient with the lowest ASP (0%). High ASP is regarded as a surrogate marker for aggressive tumour biology as it results from either intratumoral parts with low tracer avidity (e.g. low tumour perfusion with central necrosis; Fig. 1) or a heterogeneous, spiculated outer surface of the MTV. In patients with stage 4 disease, metastases were not delineated to ensure a uniform methodology throughout the cohort. Applying ASP for the first time in SPECT imaging, Wetz et al. recently showed that high ASP in somatostatin receptor imaging significantly predicts failure to respond to peptide receptor radionuclide therapy in patients with metastatic gastroenteropancreatic neuroendocrine neoplasms [15].
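The step from a per-unit HR of 1.032 to a roughly 15-fold risk follows from the multiplicative nature of the Cox model: a difference of k units in a covariate scales the hazard by HR to the power of k. A minimal sketch of this arithmetic, assuming the log-linear effect of ASP holds over the whole observed range (the function name is hypothetical):

```python
def hazard_ratio_for_difference(hr_per_unit: float, difference: float) -> float:
    """Hazard ratio implied by a covariate difference under a Cox model."""
    return hr_per_unit ** difference


# Highest (87%) versus lowest (0%) measured ASP, HR 1.032 per percentage point.
hr_extremes = hazard_ratio_for_difference(1.032, 87 - 0)
```

Because the 95% CI of the per-unit HR (1.0 to 1.06) is exponentiated in the same way, the uncertainty of the extrapolated ratio is considerable.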

The MTV of the primary tumour also significantly predicted EFS. Notably, the size of the MTV and its ASP are independent parameters in principle because the ASP of the MTV is calculated after correction for its volume (according to the equation used to define ASP by Apostolova et al. [14]). Nevertheless, large size is a risk factor in itself, and large tumours are also more likely to show higher ASP and therefore less favourable tumour biology. The TLCRR did not predict EFS in our cohort. Fendler et al. found that a high TLCRR is helpful in differentiating neuroblastic tumours with an unfavourable histopathology (i.e. NB) from ganglioneuroma or ganglioneuroblastoma [21]. However, no dedicated evaluation of its prognostic value in NB has been performed.

In addition to these SPECT-derived parameters, common clinical and genetic risk factors in NB patients were evaluated. However, these established and proven prognostic factors were only included for the purpose of comparison; the aim was not to verify whether or not the factors are relevant. In this small study, although tumour stage was not predictive of EFS, patient age showed a tendency towards significance. It should be noted that all patients underwent risk-adapted treatment which aims to account for these two variables. Moreover, neither of these parameters was found to be a stand-alone prognostic factor in the current risk assessment [3]. In contrast, the MYCN amplification status – the most important single conventional prognostic parameter – predicted EFS in the univariate Cox regression analysis even in patients under risk-adapted treatment.

A higher HVA/C ratio tended to predict EFS. The urinary VMA/HVA ratio has been studied more extensively. However, it was prognostic only in a subgroup of patients with localized disease and non-amplified MYCN [22, 23]. The current results on HVA/C are supported by the findings of two previous studies. Strenger et al. showed in 114 children with NB that an elevated urinary HVA (higher than 2.5 times the mean of healthy patients) predicted shorter OS (p = 0.015) [24]. Zambrano and Reyes-Múgica found in 33 patients with NB significantly higher urinary HVA excretion in patients with stage 4 than in patients with stage 1–3 tumours [25]. A meta-analysis showed that urinary levels of VMA or HVA alone – in contrast to serum LDH or NSE – are not significant predictors, although some of the available data were insufficient for analysis. The authors concluded that interstudy comparison is limited due to a lack of uniform methodology in data acquisition (study sample composition, laboratory sample collection and reporting, definition of “elevated levels”, and statistical parameters) [26]. Thus, the universal usage of any laboratory parameter for risk stratification would first require methodological standardization. In the current cohort, the predictive significance of all laboratory markers and their optimal cut-off values were calculated independently of patient age and age-dependent ranges of normal values. Normal values are only provided for interstudy comparison.

As no single parameter will be sufficient to predict the outcome in any individual patient, multifactorial risk models were designed and calculated. Both final models included the HVA/C ratio and one SPECT-derived parameter (ASP or MTV) and showed only slightly different predictive accuracies. Most importantly, both predictive models allowed the identification of a subgroup of patients in whom no events occurred. These patients would be ideal candidates for treatment de-escalation. In contrast, patients with two combined mRF were very likely to experience disease progression or relapse and could benefit from intensified treatment. Further characterization of patients with intermediate risk (one mRF) is pending – although even the presence of only one mRF in the patients in the observation group substantially increased the risk of an unfavourable EFS. Calculation of these risk models also underlined that the optimal combination was not necessarily defined by the two parameters with the highest single predictive accuracy but by those parameters that provided the highest supplementary value and thus the best combined accuracy (assessed in terms of Harrell’s C and the LR χ² value).

The implications of improved initial risk stratification are substantially different between low-risk (observation group) and high-risk patients. While OS in patients with low-risk disease is very good, disease progression may still occur necessitating intensive treatment analogous to that in high-risk patients at worst [3] – which was true for patient 9 in our cohort (Table 1). Early identification of patients with a substantial risk of tumour progression may allow less intensive treatment compared to the situation when tumour progression has already occurred. On the other hand, identification of a subgroup of high-risk patients with a better prognosis than “ultra-high-risk” patients might allow a reduction of therapy intensity in these patients in the future. Kaplan-Meier analysis demonstrated that patients with a low risk and a high risk of tumour progression/relapse can be differentiated among those current “low-risk” and “high-risk” patients using ASP, MTV, HVA/C and the combined risk models (Fig. 4). However, due to the small number of patients, these first results must be interpreted with caution and require validation in a larger cohort. Moreover, these results do not imply that mRF should supplant any of the existing risk factors; as stated above, the aim was to identify parameters derived from previously potentially under-represented modalities to further differentiate the risk assessment in this heterogeneous tumour entity.

The limited sample size was the major limitation of the current study and may have resulted in either underestimation of the predictive relevance of some of the parameters examined (beta error, large 95% CI) or premature confirmation of the predictive value (alpha error) due to a selection bias. However, the latter was minimized by including all consecutive patients treated at both institutions who met basic inclusion criteria. In addition, the medium-risk group according to NB2004 [3] comprised only one patient, which must be taken into consideration, as no conclusions in this risk group can be drawn. Three different SPECT devices were used, and attenuation correction (CT-based) was performed in only ten patients. However, post-hoc intraindividual comparison of SPECT parameters obtained from noncorrected and corrected data did not yield substantially different results in these patients.

Conclusions

To improve the outcome of patients with NB, a more sophisticated, individualized risk stratification is required. In this explorative study, the imaging parameters ASP and MTV, which relate to tumoral metabolic activity and were derived from pretherapeutic 123I-MIBG SPECT, allowed identification of children with a high risk and a low risk of progression/relapse under current therapy protocols, in particular in combination with the level of urinary HVA excretion. These results, however, need validation in a considerably larger patient cohort in a multicentre setting to verify the correct selection of predictive parameters as well as the optimal cut-off values.