Introduction

Chronic myeloid leukemia (CML) is a myeloproliferative disorder originating from a multipotent stem cell. If untreated, the disease follows a typical course, from chronic phase (CP) through an ill-defined accelerated phase (AP) to the final blastic phase (BP) within 3 to 5 years. In individual patients, this time course may vary considerably, but only single cases from the pre-imatinib era are reported to have survived in CML-CP for longer than 20 years [58, 59]. The majority of patients are diagnosed in CML-CP [13]. The chance for long-term survival of patients in AP is still very limited, while the median survival once the disease has evolved to BP is only 3 to 6 months [10]. The goal of any treatment approach is therefore to prevent the progression of CML, which raises the problem of how to identify low- or high-risk individuals.

In adult CML, the median age at diagnosis is around 50 years, while for subgroups like adolescents and young adults (14 to 30 years old), a median age of 24 years was reported [44]. CML manifesting before the age of 18 years is a very rare disease, accounting for only 3 % of all pediatric leukemias with an incidence of 1.2 per million children [7, 11, 51, 55]. In pediatric cohorts, the age is almost equally distributed (median 11 years, range 1–18 years). It is a matter of ongoing debate whether the biologic behavior of CML, especially the speed at which the disease progresses, differs between children and adolescents or adults [2, 21, 35]. Undoubtedly, however, host factors in growing children are distinct from those of adults, which raises issues specific to the care of pediatric patients with CML [22].

For three decades, different working groups have attempted to predict the prognosis of individual CML cases at diagnosis against the background of a standardized treatment [7, 20, 23, 43]. Sokal et al. were the first to demonstrate that, by combining physical findings (e.g., spleen size) with laboratory parameters (e.g., number of platelets in the blood count), patients treated with conventional chemotherapy (busulfan or hydroxyurea) could be categorized into three risk groups [47, 49]. For the prediction of survival, age in particular exerted a significant influence on this score. This led to the establishment of the Sokal Young score (Sy) for adolescents and younger adults, describing survival in the 1980s [48]. Later on, the Hasford score was considered the best predictor of survival for patients treated with interferon alpha-based regimens, mostly replacing the Sokal score but maintaining the categorization of patients into three risk groups [20, 23].

Whether the Sokal and Hasford scores are also valid for patients treated with tyrosine kinase inhibitors (TKIs) has been a matter of controversy [20, 43]. Consequently, some years ago, the EUTOS score was introduced as a new and simple calculation requiring just two parameters, spleen size and percentage of blood basophils, to group patients into low- or high-risk individuals for prediction of achieving early cytogenetic response under imatinib treatment at month 18 as a solid surrogate marker of progression-free survival [21]. More generally, all scoring systems have been extensively evaluated in adults, have proved to be good predictors of an individual patient’s risk, and are therefore currently included in clinical guidelines on the diagnosis and treatment of CML [4, 39]. However, it must be stressed that the endpoints of the scoring systems vary (Sokal and Hasford: survival; EUTOS: cytogenetic response).

Allogeneic stem cell transplantation (SCT) was the classical approach for an attempt to cure CML in the last millennium [17]. Given the better outcome of this procedure in younger patients and the increasing size of donor registries, children and young adults almost invariably qualified for SCT before the year 2000 [16, 38, 53, 54]. Multiple studies also demonstrated that the outcome of SCT was superior if it was performed early after diagnosis. Therefore, in the past, most children and adolescents underwent SCT early after diagnosis, and thus, the natural course of CML was superimposed by the risk of toxicity and GvHD-associated deaths in pediatric patients [10, 53]. The highly successful application of imatinib has changed this scenario completely, as SCT has become a second- or even third-line treatment after failure of TKI treatment, also in children [2, 11, 12, 25, 28]. Presently, combined therapies, variability of dose, and even temporary or prolonged discontinuation of TKIs are challenging questions addressed by most clinical trials [2, 14, 15, 24, 46, 57, 58, 59]. Together with the rarity of the disease, this may explain why no scoring system for the age group below 18 years has been formally established so far and why most, if not all, therapeutic algorithms in pediatric CML are derived from the experience generated in adults [1, 2, 11, 39, 51, 53, 52, 55].

Here, we analyze a cohort of pediatric patients with CML in chronic phase at diagnosis using the prognostic scores as established in adults in a comparative fashion and question the value of the scoring systems especially with regard to grouping individual children differently or homogeneously into a defined risk category. In addition, we analyzed which scoring system would classify the early treatment response of pediatric CML most specifically in comparison to the data of an early molecular response to imatinib treatment at month 3 and month 6.

Material and methods

Data of pediatric patients enrolled in the prospective trial CML-PAED-II [51] were collected on standardized forms filled in by the treating physicians, covering the recruitment period from February 2006 to June 2012. The study documentation team attempted to collect data missing from the registry. Written informed consent according to the Helsinki Declaration was obtained from all patients or their legal guardians. The study was approved by the Ethical Committee of the Medical Faculty of the Technical University of Dresden (ethical vote no. EK282122006).

Children and adolescents aged 0 to 18 years with a diagnosis of CML-CP were eligible for the analysis. Standard treatment with imatinib 260–340 mg/m² was initiated within 3 to 5 days after the diagnosis of Ph+ chromosome or BCR-ABL1 had been confirmed by either cytogenetic or molecular analysis. Age, sex, data on physical examination, and blood count parameters at the time of diagnosis were extracted from the study database (e.g., basophils as a percentage of the differential WBC; the spleen in centimeters below the costal margin). The minimal essential data set to calculate each score is listed in Table 1.

Table 1 Parameters required for the calculation of each score (indicated by X) and number of patients of whom the necessary parameters could be extracted from the database

For score calculation, mathematical equations were applied as given in Table 2. The calculations for the EUTOS, Hasford, and Sokal scores were performed using the statistics software R (version 3.1.1 (2014-07-10)) and confirmed with the online resources of both the European Leukemia Network and the German “Kompetenznetz Leukämien.” The Sy score was calculated manually [3, 48]. Sokal scores were rounded to two decimal places. On the basis of the calculated index, the patients were grouped into risk categories using the original categorization of low risk, intermediate risk, and high risk for the Hasford, Sokal, and Sy scores, while the EUTOS score categorizes into only two risk groups (low and high) [20, 21, 47, 48].

Table 2 Mathematical formulas used to calculate the different prognostic scores

The measurements of the spleen and liver size were in most cases reported using the costal arch as anatomical reference. In a minority of cases, findings from a computed tomography or ultrasound examination were reported. In some cases, organomegaly was indicated in finger’s breadths below the costal margin and not measured in centimeters. We assumed that a pediatrician’s finger’s breadth ranges from 1 to 2 cm and calculated the size of the spleen below the costal margin for the minimal and maximal assumption. If the calculated minimal and maximal values of the risk score placed a patient into the same risk category, the mean risk score was calculated from the minimal and maximal values and used further on. If the risk categorization based on the calculated minimal and maximal values differed, the maximum spleen size value was used to calculate the risk score.
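The rule for spleen sizes reported in finger’s breadths can be sketched in a few lines of Python. This is only an illustration and not the code used in the study: it assumes the commonly published EUTOS formula (7 × basophils in % plus 4 × spleen size in cm below the costal margin, high risk above 87 points), which is presumed to correspond to the entry in Table 2, and all function names are hypothetical.

```python
FINGER_CM_MIN = 1.0   # minimal assumption for one finger's breadth (cm)
FINGER_CM_MAX = 2.0   # maximal assumption for one finger's breadth (cm)
EUTOS_CUTOFF = 87.0   # cutoff between low and high risk


def eutos_score(basophils_pct, spleen_cm):
    """Commonly published EUTOS formula (assumed to match Table 2)."""
    return 7.0 * basophils_pct + 4.0 * spleen_cm


def eutos_category(score):
    """Two risk groups only: 'high' above the cutoff, otherwise 'low'."""
    return "high" if score > EUTOS_CUTOFF else "low"


def eutos_from_finger_breadths(basophils_pct, finger_breadths):
    """Apply the min/max rule to a spleen size given in finger's breadths."""
    score_min = eutos_score(basophils_pct, finger_breadths * FINGER_CM_MIN)
    score_max = eutos_score(basophils_pct, finger_breadths * FINGER_CM_MAX)
    if eutos_category(score_min) == eutos_category(score_max):
        # Same category under both assumptions: use the mean score.
        score = (score_min + score_max) / 2.0
    else:
        # Categories differ: fall back to the maximal spleen size.
        score = score_max
    return score, eutos_category(score)


# Hypothetical patients, not taken from the study database.
print(eutos_from_finger_breadths(3.0, 4))   # both assumptions give low risk
print(eutos_from_finger_breadths(5.0, 8))   # assumptions disagree
```

The same pattern would apply to the Sokal and Hasford scores by swapping in the respective formula and category limits.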

Therapeutic response was evaluated by assessing the molecular response through measurement of the transcript ratio BCR-ABL1/ABL1 in blood specimens sent to the central reference laboratory. Measurements were performed according to the international scale as reported elsewhere [27]. The study protocol did not include measurements of BCR-ABL1 levels at diagnosis using an independent control gene other than ABL1. For this reason, BCR-ABL1 levels at diagnosis are not eligible for any predictive analyses. The two single time-point measurements of BCR-ABL1/ABL1 at month 3 and at month 6 after the start of imatinib treatment (interval ±10 days at day 90 and day 180 after start) were analyzed for evaluation of the risk categorization by the four scoring systems. The decision to select these two time points was based on several recent studies demonstrating in adults with CML that the achievement of an early molecular response is a good surrogate marker for long-term progression-free survival and overall survival [19, 32, 34, 42, 40]. Also, treatment guidelines have now incorporated a molecular response goal of a transcript ratio BCR-ABL1/ABL1 < 10 % at month 3 and recommend that patients not reaching this goal should be considered for a treatment change [39]. Preliminary, retrospectively analyzed data from small adult cohorts confirmed that early intervention strategies in adults could be based successfully on the early response data [8].
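As an illustration of how the two time points and response thresholds described above translate into a classification, consider the following minimal Python sketch. The function name and the handling of a ratio exactly equal to the threshold (counted as poor here) are assumptions made for illustration; the thresholds and the ±10-day window are taken from the text.

```python
THRESHOLD_PCT = {3: 10.0, 6: 1.0}   # BCR-ABL1/ABL1 goal (%) at month 3 and 6
TARGET_DAY = {3: 90, 6: 180}        # nominal day after start of imatinib
WINDOW_DAYS = 10                    # accepted deviation from the nominal day


def classify_response(month, day_of_sample, ratio_pct):
    """Return 'good', 'poor', or None if the sample lies outside the window."""
    if abs(day_of_sample - TARGET_DAY[month]) > WINDOW_DAYS:
        return None  # measurement not usable for this time point
    # Assumption: a ratio exactly at the threshold counts as poor response.
    return "good" if ratio_pct < THRESHOLD_PCT[month] else "poor"


# Hypothetical measurements, not study data.
print(classify_response(3, 92, 4.5))    # good responder at month 3
print(classify_response(6, 185, 4.5))   # poor responder at month 6
print(classify_response(3, 120, 4.5))   # outside the +/-10-day window
```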

Statistical analysis was performed using software R (version 3.1.1 (2014-07-10)). Venn diagrams were generated using R package “VennDiagram,” Version 1.6.5. Logistic regression analysis was performed to evaluate the predictiveness of the different scoring systems (EUTOS, Hasford, Sokal, as independent factors) on the binarized treatment outcome (good responders [ratio BCR-ABL1/ABL1 < 10 % (<1 %)] and poor responders [ratio >10 % (>1 %)] at month 3 (or month 6, respectively) after diagnosis). Furthermore, the age of the patients (in nominal) was also included in the analysis as an independent factor. In order to obtain interpretable odds ratios (OR), we performed a logistic regression with categorization of the EUTOS score in low- and high-risk groups according to a given cutoff. An alternative cutoff value was chosen based on a maximization of the OR for different values of the cutoff.
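The categorical part of this analysis can be illustrated with a short Python sketch. It is not the R code used in the study. For a single binary predictor, the odds ratio from a logistic regression equals the cross-product ratio of the corresponding 2 × 2 table, so the sketch works directly on that table and uses a Wald-type 95 % confidence interval. The function names, the candidate cutoffs, and the example data are hypothetical.

```python
import math


def odds_ratio(a, b, c, d):
    """OR and Wald 95 % CI for a 2x2 table.

    a: high risk, poor response   b: high risk, good response
    c: low risk, poor response    d: low risk, good response
    """
    or_value = (a * d) / (b * c)
    se_log = math.sqrt(1 / a + 1 / b + 1 / c + 1 / d)
    lower = math.exp(math.log(or_value) - 1.96 * se_log)
    upper = math.exp(math.log(or_value) + 1.96 * se_log)
    return or_value, lower, upper


def table_for_cutoff(scores, poor, cutoff):
    """Build the 2x2 table when scores above the cutoff count as high risk."""
    a = sum(1 for s, p in zip(scores, poor) if s > cutoff and p)
    b = sum(1 for s, p in zip(scores, poor) if s > cutoff and not p)
    c = sum(1 for s, p in zip(scores, poor) if s <= cutoff and p)
    d = sum(1 for s, p in zip(scores, poor) if s <= cutoff and not p)
    return a, b, c, d


def best_cutoff(scores, poor, candidates):
    """Return (cutoff, OR) with the largest OR; skip tables with empty cells."""
    best = None
    for cutoff in candidates:
        a, b, c, d = table_for_cutoff(scores, poor, cutoff)
        if min(a, b, c, d) == 0:
            continue  # OR undefined or infinite for this cutoff
        or_value = odds_ratio(a, b, c, d)[0]
        if best is None or or_value > best[1]:
            best = (cutoff, or_value)
    return best


# Hypothetical scores and responses (True = poor responder), not study data.
scores = [10, 20, 30, 40, 50, 60, 70, 80, 90, 100]
poor = [False, False, True, False, False, True, True, False, True, True]
print(best_cutoff(scores, poor, [25, 45, 55, 65, 87]))
```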

Results

From the total cohort of 122 patients enrolled into the registry CML-PAED-II, the mandatory data to calculate the EUTOS, Hasford, and Sokal scores could be retrieved in 90 patients. Missing data and implausible differential blood counts (e.g., relative sum of white cells >100 %; spleen size not indicated) excluded 32 patients from the analysis. The necessary parameters for calculation of the Sy score were complete in only 46 out of 90 patients, because the hematocrit value was not collected systematically on the diagnosis documentation form. Fifty-five of these 90 patients were male (61 %) and 35 were female (39 %). According to the categorization used in established guidelines of the American Academy of Pediatrics [60], our cohort comprised 5 toddlers, 37 children, and 48 adolescents. The mean age was 10.97 years (median 12; lower quartile 8; upper quartile 15). The distribution by age is shown in Fig. 1. The mean hemoglobin level was 10.0 g/dl (median 10.0; lower quartile 8.2; upper quartile 12.3), the mean white blood count (WBC) was 217,411/μl (median 76,165; lower quartile 31,550; upper quartile 285,400), and the mean platelet number was 513,100/μl (median 405,500; lower quartile 266,200; upper quartile 551,800). The mean relative blast count was 3.4 % (median 0.7; lower quartile 0.0; upper quartile 3.0). The mean basophil count was 3.5 % (median 3; lower quartile 1.7; upper quartile 4.9). The mean spleen size below the costal margin was 6.2 cm (median 3.0; lower quartile 0.0; upper quartile 12.0). After the start of imatinib treatment in this cohort, no serious events such as death, progression to advanced phases of CML, or overt hematological relapse occurred during the first 6 months after diagnosis.

Fig. 1 Age distribution of the cohort investigated

The absolute and relative distribution (%) of the individuals from the cohort to the risk categories by either one of the applied scoring systems varied. Applying the Sokal formula, the low-, intermediate-, and high-risk groups included 56/90 (62 %), 14/90 (16 %), and 20/90 (22 %) patients, respectively, while the Hasford score identified 53/90 (59 %) patients as low risk, 24/90 (27 %) as intermediate risk, and 13/90 (14 %) as high risk. Using the Sy score, the low-, intermediate-, and high-risk groups included 45/46 (98 %), 1/46 (2 %), and 0/46 patients (0 %), respectively. Applying the EUTOS score stratified 73/90 (81 %) patients into the low risk and 17/90 (19 %) patients into the high-risk group. When comparing only the Sokal score to the Sy score, discordant results were obtained in 21/46 (46 %) patients. The Sy score was excluded from further comparisons because the number of patients with sufficient data to calculate the Sy in the pediatric cohort presented here was too low and also because the original cohort analyzed by Sokal et al. comprised only 25 patients under the age of 16 years out of a total cohort of 625 adult patients <45 years [41].

Comparing the risk categorization of individual patients across the three scoring systems yielded the following results: 78/90 patients were categorized as low risk and 25/90 as high risk by at least one of the scores. Venn diagrams demonstrate (Fig. 2) that within the low-risk cohort, 49/78 patients (63 %) were categorized homogeneously as low risk, while within the high-risk cohort only 6/25 patients (24 %) were homogeneously categorized as high risk by all three scoring systems. In 10 patients, the results from each applied scoring system differed completely from each other (Supplemental material, Table S1). In those 16/90 patients in whom splenomegaly had not been indicated in centimeters below the costal margin but in finger’s breadths only, no change in risk categorization occurred when the maximal assumption of the spleen size instead of the mean size formed the basis for calculation of a score.

Fig. 2 Venn diagrams for low-risk and high-risk scores. The Venn diagrams indicate the overlap of the risk categorization based on the Sokal, Hasford, and EUTOS scores. a Seventy-eight out of 90 patients are categorized as low risk in any of the three scoring systems. The concordance of the risk estimation is indicated by the number of individuals being categorized as low risk in one or more of the scores, shown in the non-overlapping and overlapping areas. b Twenty-five out of 90 patients are categorized as high risk in any of the three scoring systems. Similarly, the overlap is indicated by the number of individuals being categorized as high risk in one or more of the scores. Note that the areas of the ellipses and overlaps are not proportional to the number of individuals in each group
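The counts shown in such a three-set Venn diagram follow from simple set operations. The Python sketch below is an illustration only (the diagrams in this study were generated in R); the function name and the patient identifiers are hypothetical. The share of homogeneously categorized patients reported in the text corresponds to the "all_three" count divided by the "any" count.

```python
def venn_regions(sokal, hasford, eutos):
    """Count patients in each region of a three-set Venn diagram.

    Each argument is the set of patient IDs that the respective score
    places into the risk category of interest (e.g., low risk).
    """
    return {
        "any": len(sokal | hasford | eutos),
        "all_three": len(sokal & hasford & eutos),
        "sokal_only": len(sokal - hasford - eutos),
        "hasford_only": len(hasford - sokal - eutos),
        "eutos_only": len(eutos - sokal - hasford),
        "sokal_hasford_only": len((sokal & hasford) - eutos),
        "sokal_eutos_only": len((sokal & eutos) - hasford),
        "hasford_eutos_only": len((hasford & eutos) - sokal),
    }


# Hypothetical patient IDs, not study data.
regions = venn_regions({1, 2, 3, 4}, {2, 3, 5}, {3, 4, 6})
print(regions)
print(regions["all_three"] / regions["any"])  # share categorized homogeneously
```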

By applying the cutoff limits for molecular response established in adult CML guidelines (transcript ratio <10 % at month 3 and <1 % at month 6 [39]) to the pediatric cohort, a total of 72 patients could be analyzed at month 3 and 73 patients at month 6 (Supplemental material, Table S1). At month 3, the subcohort categorized homogeneously as low risk by all three scoring systems (excluding Sokal young) comprised 41 out of 72 patients, of whom 30 (73 %) were good responders. In the heterogeneously categorized subcohort (27 out of 72 patients), there were 52 % good responders (14 out of 27 patients), and in the homogeneously categorized high-risk subcohort (4 out of 72 patients), there was only one good responder (25 %). Accordingly, at month 6, the subcohort of patients homogeneously categorized as low risk (39 out of 73 patients) comprised 74 % good responders (29 out of 39 patients). In the subcohort categorized heterogeneously (28 out of 73 patients), 61 % (17 out of 28) were good responders, and in the homogeneously categorized high-risk group (6 out of 73 patients), four were good responders (67 %).

Using a logistic regression model, we analyzed which of the scores was predictive of an individual patient’s response at month 3. We treated the EUTOS, Sokal, and Hasford scores as independent, continuous variables. Furthermore, we included age as an additional variable in the regression model. In total, 72 patients yielded all necessary data for this analysis. Starting from a model with all possible interactions, we progressively reduced the complexity by sequential model comparison with χ²-based ANOVA. Both the Sokal and Hasford scores explicitly account for patient age in the risk estimation. However, neither of these two scores nor age at diagnosis had a significant effect on the predictiveness of the regression model. In fact, we found that only the EUTOS score significantly improved the predictability of the patient’s response at month 3 compared to random guessing (p = 0.008).

Next, we asked whether the usual risk categorization (low risk vs high risk) retained the predictive value of the EUTOS score. We therefore repeated the logistic regression analysis with the score as a categorical factor. Using only the categorization of the EUTOS score with the cutoff at 87 points, the EUTOS score showed an odds ratio of OR = 2.8 (95 % CI 0.8 to 10.5) to predict poor imatinib response in the high-risk group as compared to the low-risk group, although the discrimination did not reach statistical significance (p = 0.108). However, by reducing the cutoff point for the discrimination between low risk and high risk for the EUTOS score from 87 to 48 points, an OR = 3.8 (95 % CI 1.4 to 10.6) with p = 0.008 could be achieved. This shifted 21 patients from the EUTOS low-risk group to a “newly defined EUTOS high-risk group.” Among these “re-categorized patients,” 10 showed a poor imatinib response at month 3.

We repeated our analysis to estimate the individual patient’s response at month 6 (total of 73 patients), with transcript ratios <1 % categorized as good response. Here, even the weak correlation of the EUTOS score with the patients’ response could no longer be detected. In fact, predicting the patients’ response with any of the measures was not significantly better than random guessing (p > 0.1). Analyzing the relation between response at month 3 and at month 6, we observed that 33 patients were good responders at both time points, while 17 remained poor responders. However, 5 patients showed a good response at month 3 but not at month 6, while 7 patients showed a better response at month 6 compared to month 3.

Discussion

The biology and the clinical characteristics of CML in both children and adults have been extensively reviewed [1, 2, 35, 52, 55]. Because of the rarity of CML in the first two decades of life, data on pediatric CML were mostly collected retrospectively or pooled from different studies or preexisting publications on all age cohorts and not described separately [29, 31, 34, 35]. In this study, data exclusively of pediatric patients (0 to 18 years) were collected prospectively and analyzed. While the total number of patients in studies on adults is clearly higher, relevant clinical parameters collected in these retrospective analyses were comparable. Compared to adults, pediatric CML patients present at diagnosis with a higher leukemic burden (WBC, splenomegaly), and more children are diagnosed in advanced phases of CML [35, 36, 52]. Also, anemia and thrombocytosis are present more frequently (60 %) in children [36]. With regard to these clinical parameters, our cohort did not differ from earlier pediatric reports (data not shown here).

In the TKI era, it has remained a gold standard to publish the observed survival time of patients along with their risk profile categorized at diagnosis [43]. Also, additional cytogenetic aberrations besides the Ph+ have been correlated with a dismal prognosis, but these aberrations have so far not formed the basis for a risk scoring system of their own [9]. Interestingly, the Sokal score as the oldest established risk categorization also shows predictive power concerning molecular response, risk of progression to AP/BP, and overall survival in adult patients treated with imatinib and thus still seems to be useful in the TKI era as a valid prognostic marker [43]. However, in clinical practice, the question remains to what extent these available scores can be transferred to pediatric patients.

The risk categorization of the pediatric cohort by the Sokal score as presented here exhibited marked differences from some reports on adults. Oyekunle et al. reported that 40 % of adult patients (n = 134) had a low-risk, 34 % an intermediate-risk, and 26 % a high-risk Sokal score, while in our cohort, 66 % had a low-risk, 22 % an intermediate-risk, and 12 % a high-risk Sokal score, respectively [43]. Also, in another paper on 559 adult patients, the proportions of the low-risk, intermediate-risk, and high-risk groups were 47, 30, and 23 %, respectively; thus, more patients with higher-risk profiles were described than in our cohort [18]. However, it must be kept in mind that age exerts a major impact on the calculation of the Sokal score, and this factor might well compensate for a high WBC and/or splenomegaly. According to the Sokal formula (Table 2), a 10-year-old with CML would have a lower risk of mortality than a 70-year-old patient if they both had the same spleen size and blood cell counts. From a theoretical point of view, correlating the organ volume with the body surface should perform best. Age-dependent reference tables based on body weight and body length are used in pediatric sonography, allowing calculation of the relative organ enlargement. However, this approach works only in moderate splenomegaly; a massive organ enlargement is beyond the scope of the ultrasound transducer’s angle width. As a major factor, this limitation may hamper the application of scoring systems based on the spleen size in pediatric CML patients [21, 33, 56].
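The influence of age on the Sokal score mentioned above can be made concrete with a small Python sketch. It assumes the commonly published Sokal formula and risk limits (low <0.8, intermediate 0.8 to 1.2, high >1.2), which are presumed to correspond to Table 2; the function names and the clinical values of the two example patients are hypothetical. Because age enters the exponent linearly with a coefficient of 0.0116, a 60-year age difference alone multiplies the score by exp(0.0116 × 60), i.e., roughly a factor of two.

```python
import math


def sokal_score(age_years, spleen_cm, platelets_per_nl, blasts_pct):
    """Commonly published Sokal formula (assumed to match Table 2).

    spleen_cm: centimeters below the costal margin
    platelets_per_nl: platelet count in 10^9/l
    blasts_pct: blasts in peripheral blood (%)
    """
    return math.exp(
        0.0116 * (age_years - 43.4)
        + 0.0345 * (spleen_cm - 7.51)
        + 0.188 * ((platelets_per_nl / 700.0) ** 2 - 0.563)
        + 0.0887 * (blasts_pct - 2.10)
    )


def sokal_category(score):
    """Three risk groups using the commonly published limits."""
    if score < 0.8:
        return "low"
    if score <= 1.2:
        return "intermediate"
    return "high"


# Two hypothetical patients with identical findings except for age.
child = sokal_score(10, spleen_cm=6.0, platelets_per_nl=400.0, blasts_pct=1.0)
elder = sokal_score(70, spleen_cm=6.0, platelets_per_nl=400.0, blasts_pct=1.0)
print(sokal_category(child), sokal_category(elder))
print(elder / child)  # equals exp(0.0116 * 60), independent of other values
```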

While older adults usually exhibit no or only moderate spleen enlargement (<5 cm), splenomegaly at diagnosis is significantly more frequent in adolescents [21, 26, 30, 36, 40, 44]. Splenomegaly in our cohort exhibited a median size of 5.8 cm (range 0 to 25) below the costal margin. Although one study on splenomegaly not restricted to CML (age range 0 to 88 years) attempted to measure organ size more objectively by ultrasound, no further steps in this direction have so far been taken successfully in the clinical setting of CML [41]. This may in part be explained by the obstacle an excessive spleen size poses to correct measurements, given the limited angle an ultrasound transducer covers. However, a correct measurement of the spleen size is especially important for calculating the EUTOS score, since this score is based solely on two parameters, spleen size and percentage of basophils [21]. This important aspect must be taken into consideration, as excessive splenomegaly is diagnosed more frequently in children with CML than in adults [35, 36].

Older age is a dismal predictor only in adults pretreated with hydroxyurea and interferon when undergoing SCT [17]. Of note, when imatinib treatment results were analyzed in adults, a survey demonstrated that the response and mortality rates were not affected by age [18]. This observation is supported by our findings, which indicate no statistically significant effect of age at diagnosis on the response at month 3 or at month 6. However, adolescents and young adults may experience a worse outcome, partly related to treatment non-adherence [40, 44].

In a recent analysis of the Hasford score, data of young adults and adolescents showed age-dependent differences: while relative proportions of patients with a low-risk and an intermediate-risk Hasford score of 54 and 36 %, respectively, were reported [43], in our cohort, 63 and 28 % were classified into these categories, respectively. No differences were found with regard to the proportion of patients with a high-risk Hasford score (10 % in adults and 9 % in our pediatric cohort). Also, no significant correlation was found between a BCR-ABL1 transcript level at month 3 lower or higher than 10 % and a low-risk or high-risk Hasford score in this pediatric cohort [56].

In an analysis on younger (<65 years) and older adults, the percentage of CML patients with a high EUTOS score was 19 %, which is in line with some studies in adults, but higher than in one recent study (12.5 %), and higher than in the cohort comprising 2060 adult patients under imatinib treatment (10.5 %) that was used to define the EUTOS score [18, 21, 26, 29, 34]. This score has been claimed as the most sensitive score to identify adult patients with a very unfavorable outcome, and also to predict response to nilotinib. However, these statements are not accepted by all experts [5, 6, 26, 29, 33]. In pediatric CML cohorts, the EUTOS score has not been studied so far.

The IRIS trial first demonstrated the prognostic impact of monitoring BCR-ABL1 transcripts [12, 27]. While on imatinib treatment, achievement of major molecular remission at month 18 was defined as one important milestone, failing an early response also turned out as a relevant parameter to identify the aggressiveness of CML [27]. In adults with CML, the analysis of Marin et al. pointed out that a transcript ratio lower than 9.84 % at month 3 after treatment initiation was the most informative independent predictor for survival while the ratios at month 6 or month 12 contributed little to identify high-risk patients [34]. In this cohort, the transcript levels at these time points contributed little to identify patients with CML progression (data not shown here).

Two groups independently reported a significant difference between EUTOS low- and high-risk adult patients in achieving fast molecular response [6, 50]. Significance could also be demonstrated in 1288 patients who had not been recruited to such studies and also in patients treated with second-generation TKIs [26]. However, the score remains under discussion, as some authors found no prognostic difference in major molecular response, event-free survival, or overall survival between patients on imatinib treatment with a high or a low EUTOS score [29, 43]. In this pediatric cohort, our results (compare Venn diagrams in Fig. 2) indicate that the risk classification is not fully homogeneous for the low-risk group but shows a strong overlap between the different scoring systems. The overlap is weaker for the high-risk categorization, where only 6 out of 25 patients were grouped homogeneously. Although the EUTOS score as a continuous variable showed a correlation with the individual patient’s response at month 3, this effect was lost when using 87 score points as cutoff for categorization into the high-risk group. However, we could demonstrate that choosing an alternative cutoff might raise the predictiveness of the EUTOS score. We are by no means arguing that a lower cutoff for the EUTOS score should be applied generally in pediatrics, but we interpret our findings at least as a hint that a refined risk categorization may be beneficial for the pediatric CML cohort. Thus, we consider it appropriate to issue a strong warning against any uncritical use of adult patient-based scoring systems in pediatric patients. The fact that none of the scoring systems showed any correlation with the 6-month response makes us even more confident that the established risk scores are not suitable for pediatric patients.
Also, two other groups recently presented pediatric data in abstract form on the TKI treatment response in children with CML showing that none of the scores was predictive [37, 45]. However, at present, the paucity of pediatric data does not allow definite conclusions to be drawn from small cohorts with a short follow-up. It must be kept in mind for comparison that the initial publications describing the scores used in adults, namely Sokal, Hasford, and EUTOS, were based on the analysis of 813, 1573, and 2060 patients, respectively, while the analysis presented here comprises only 90 pediatric patients.

Thus, as the cohort analyzed is still rather small, cutoff points such as those for the transcript ratio BCR-ABL1/ABL1 should be extensively analyzed in larger cohorts of children with CML. With regard to the established scores, the measurement of massive splenomegaly should be standardized in an age-dependent manner in pediatric CML to clarify the exact correlation with prognosis in younger patients. Given the rarity of CML in childhood, only future international collaboration will evidently enable a larger compilation of data. Building a larger database in order to extract follow-up parameters in pediatric patients would facilitate future analyses, which may result in a better approach to assessing the prognosis, e.g., a specific predictive score for children with CML.