Introduction

Myelodysplastic syndromes (MDS) are a heterogeneous group of diseases that present with cytopenia and dysplastic changes in bone marrow (BM) and peripheral blood (PB) and carry a variable risk of transformation to acute myeloid leukemia [1]. Because of the wide variation in clinical outcome, the classification of MDS remains a challenge. In 2008, a new edition of the WHO classification of MDS was published, with several changes from the previous edition published in 2001 [2, 3]. Importantly, the revised classification allows a more precise definition of patients previously considered “unclassifiable.” The category of refractory anemia (RA) has been expanded to refractory cytopenia with unilineage dysplasia (RCUD), the category of refractory cytopenia with multilineage dysplasia and ringed sideroblasts (RCMD-RS) has been incorporated into RCMD, and the prognostic significance of increased blasts in PB is emphasized by a change in the definition of refractory anemia with excess blasts type 1 (RAEB-1) [4].

As with any new system, it is essential to demonstrate whether the categories of the 2008 WHO classification are easy to identify and whether the decision to assign an individual case to a category rests on reproducible criteria.

Very few publications have evaluated discrepancies in the diagnosis of MDS. Howe et al. reported an excellent correlation among three observers studying 103 samples from patients diagnosed with MDS [5]. However, the three pathologists belonged to the same center. Very recently, Naqvi et al. reported discrepancies in the morphological diagnosis of MDS between the MDACC and the referring centers [6]. The rate of discrepancy, 12 %, demonstrates the complexity of diagnosing MDS.

The present study was designed with two objectives. First, we explored whether the 2008 WHO classification is more efficient than the previous edition of 2001 in diagnosing 100 patients with dysplastic changes. Second, we assessed whether the 2008 WHO classification allows good concordance among several hematologists in distinguishing all the current categories of MDS.

Methods

A total of 100 BM samples from patients previously diagnosed with MDS using the 2001 WHO classification were evaluated in a clinical setting. The study was performed in ten hospitals, and each hematologist provided ten cases. Selection of the cases was made following the criteria of each investigator, and the proportion of samples from each category was not predefined.

The morphological review was done by the same ten hematologists, working in five pairs. First, the observers selected the samples from their centers and reviewed them using both the 2001 and 2008 WHO classifications. Later, the specimens were sent to a central coordinator, who exchanged the samples among the observers in a reciprocal manner. Thus, each observer evaluated 20 samples, and each sample was analyzed independently by two morphologists. The second observer was blinded to the clinical and laboratory data, except for the PB counts (Fig. 1). The study was approved by the central Ethical Committee of Clinical Research in accordance with the Helsinki Declaration.

Fig. 1 One hundred cases previously diagnosed with MDS were evaluated by 10 hematologists working in five pairs. Each observer evaluated 20 samples, and each sample was analyzed independently by two morphologists. The specimens were exchanged in a reciprocal manner by a central coordinator.

Morphological analysis

The MDS samples consisted of BM smears from aspirate, with their corresponding PB smears. Specimens were stained with Wright stain in 80 cases and May–Grünwald–Giemsa in 20 cases. For evaluation of ring sideroblasts (RS), smears were stained with Prussian blue. The degree of dysplasia in the three hematopoietic lineages and the percentage of blasts and RS were reported according to the criteria of the 2001 and 2008 WHO classifications. At least 500 nucleated BM cells were counted.

Cytogenetics

Cytogenetics was available in 65 cases. Cytogenetic aberrations were grouped according to the original IPSS [7]. Cytogenetic data are not shown because they have no bearing on the purpose of the present study.

Statistical methods

The interobserver concordance was evaluated using the Cohen κ test.
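As a minimal sketch of the statistic named above, the following computes an unweighted Cohen κ for two raters from paired category labels. The function name `cohen_kappa` and the six example diagnoses are hypothetical and serve only as illustration; they are not study data, and the study's own software is not specified in the text.

```python
from collections import Counter


def cohen_kappa(rater_a, rater_b):
    """Unweighted Cohen kappa for two raters labelling the same items."""
    if len(rater_a) != len(rater_b) or not rater_a:
        raise ValueError("both raters must label the same non-empty set of items")
    n = len(rater_a)
    # Observed agreement: fraction of items given the same label by both raters.
    observed = sum(a == b for a, b in zip(rater_a, rater_b)) / n
    # Chance agreement: sum over categories of the product of marginal proportions.
    freq_a = Counter(rater_a)
    freq_b = Counter(rater_b)
    expected = sum(freq_a[c] * freq_b[c] for c in freq_a) / (n * n)
    if expected == 1.0:
        # Both raters used a single identical category; agreement is trivially perfect.
        return 1.0
    return (observed - expected) / (1.0 - expected)


# Hypothetical labels for six samples (illustration only, not study data).
first_observer = ["RCMD", "RCMD", "RAEB-1", "RCUD", "RARS", "RCMD"]
second_observer = ["RCMD", "RAEB-1", "RAEB-1", "RCMD", "RCMD", "RCMD"]
kappa = cohen_kappa(first_observer, second_observer)
```

κ corrects the raw percentage of agreement for the agreement expected by chance, which is why it is lower than the raw agreement whenever one category dominates the series.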

Results

All the samples were originally diagnosed by the FAB [8] and the 2001 WHO criteria. The first step was to reclassify the samples in the referring centers by the 2008 WHO classification. Table 1 shows the cases that were reclassified according to the new definitions of the 2008 WHO classification.

Table 1 Cases of MDS reclassified by the referring observers according to the 2008 WHO classification

The percentage of myeloblasts in PB led to a change in the diagnosis of three cases with multilineage dysplasia. Two cases of RCMD were classified as RAEB-1 because they showed 2 % blasts in PB. One case of RCMD-RS was defined as unclassified MDS (MDS-U) by the 2008 WHO classification, with 1 % of myeloblasts in the PB.

Nineteen cases were considered as MDS-U by the 2001 WHO classification. Twelve cases were reclassified using the 2008 WHO classification as RCMD and five as RCUD. Finally, only three cases remained as MDS-U by the 2008 WHO proposal.

After this, the samples were exchanged for the second review. Five samples were rejected by the second observers because of poor quality of the BM or PB smears, and 95 samples were considered suitable for the study. Discordance was observed in 26 samples (27 %). The inter-observer agreement for the whole series was 73 % (κ = 0.59).

Regarding the five pairs of observers, all of them found some discrepancies (Table 2). The percentage of discordance was 21 % by the first pair (four discrepancies of 19 suitable samples), 26 % by the second pair (five of 19), 26 % by the third pair (five of 19), 33 % by the fourth pair (six of 18), and 30 % by the fifth (six of 20). The 26 cases with interobserver discrepancy are shown in Table 3.
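The per-pair percentages above follow directly from the reported counts. The short sketch below recomputes them and pools the five pairs to recover the overall figures; the variable names are illustrative, and rounding to whole percentages is an assumption about how the values were reported.

```python
# (discordant samples, suitable samples) for each pair, as reported in the text.
pairs = {
    "pair 1": (4, 19),
    "pair 2": (5, 19),
    "pair 3": (5, 19),
    "pair 4": (6, 18),
    "pair 5": (6, 20),
}


def discordance_pct(discordant, suitable):
    """Percentage of suitable samples on which the two observers disagreed."""
    return 100 * discordant / suitable


# Rounded to whole percentages (assumed reporting convention).
per_pair = {name: round(discordance_pct(d, n)) for name, (d, n) in pairs.items()}

# Pooling the pairs gives the figures for the whole series.
total_discordant = sum(d for d, _ in pairs.values())
total_suitable = sum(n for _, n in pairs.values())
overall_discordance = round(discordance_pct(total_discordant, total_suitable))
overall_agreement = round(100 - discordance_pct(total_discordant, total_suitable))
```

The pooled totals match the 26 discordant samples among the 95 suitable ones reported for the series as a whole.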

Table 2 Distribution of samples among the five pairs of observers, regarding 2008 WHO classification
Table 3 Twenty-six cases with interobserver discrepancies regarding 2008 WHO classification

Since the results by pairs were quite homogeneous, we considered the series as a whole to study the rate of agreement by categories. First, we focused on the enumeration of RS and blast cells. There were 20 cases with 15 % or more of RS. All these cases were easily identified by the second observers. Regarding the study of blast cells, we divided the series into two groups. The first group included the cases with an increased number of blasts in BM (≥5 %) or PB (≥2 %); it comprised 22 cases diagnosed as RAEB-1 or RAEB type 2 (RAEB-2) by the referring centers. The second group included the cases with MDS without excess blasts (73 cases). All the referring diagnoses with an excess of blasts were confirmed, and only six cases with less than 5 % of BM blasts were diagnosed as RAEB-1 by the second observers. The inter-observer agreement regarding specimens with and without excess blasts was 94 %.

We then studied the inter-observer concordance by the specific categories of the 2008 WHO classification (Table 4). Concordance was detected in 83 % of cases with RAEB-1 (10 of 12 cases), 90 % with RAEB-2 (nine of 10), 100 % of cases with myelodysplastic/myeloproliferative neoplasms (MDS/MPN) (four of four), and 100 % with MDS-U (three of three).

Table 4 Agreement by second observers regarding all categories of 2008 WHO classification

The rate of concordance of cases with RCMD (74 %) was quite similar among the five pairs of observers (Table 2). They noted discrepancies in 13 of 50 samples: Five samples were identified as RAEB-1 and eight as RCUD. Multilineage dysplasia was recognized in 42 of 50 cases (84 % agreement). The observers found concordance in only two of five cases with RCUD (40 %) and in two of eight cases with refractory anemia with ring sideroblasts (RARS) (25 %), the discordance being due to multilineage dysplasia identified by the second observer (Table 3).
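The two RCMD figures differ only in what counts as agreement. The sketch below makes that explicit under one stated assumption, which is not spelled out in the text: the five RCMD cases called RAEB-1 by the second observer are counted as agreeing on multilineage dysplasia, whereas the eight called RCUD are not. Variable names are illustrative.

```python
def pct(part, whole):
    """Whole-number percentage of `part` in `whole`."""
    return round(100 * part / whole)


# Counts reported in the text for cases referred as RCMD.
rcmd_total = 50
rcmd_called_raeb1 = 5  # second observer: RAEB-1 (assumed still multilineage)
rcmd_called_rcud = 8   # second observer: RCUD (unilineage)

# Strict criterion: the second observer assigned the same category (RCMD).
rcmd_exact = rcmd_total - (rcmd_called_raeb1 + rcmd_called_rcud)
# Looser criterion: the second observer recognized multilineage dysplasia,
# regardless of the blast count (assumption: includes the RAEB-1 calls).
rcmd_multilineage = rcmd_exact + rcmd_called_raeb1

agreement = {
    "RCMD, same category": pct(rcmd_exact, rcmd_total),
    "RCMD, multilineage dysplasia": pct(rcmd_multilineage, rcmd_total),
    "RCUD": pct(2, 5),
    "RARS": pct(2, 8),
}
```

Under this reading, the gap between 74 % and 84 % is explained entirely by disagreement on blast enumeration rather than on the presence of multilineage dysplasia.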

Discussion

The fourth edition of the WHO classification includes significant changes intended to improve on the previous system published by the WHO in 2001. One important characteristic of a good taxonomy system is reproducibility with little inter-observer variation. This is essential in MDS, as the current classification offers prognostic information that can guide treatment. The objective of our study was to explore the reliability of the 2008 WHO classification, compared with the 2001 WHO classification, among several observers from ten institutions. When the observers recoded their own samples according to the 2008 WHO classification, 17 of the 19 cases that were previously “unclassified” by the 2001 classification could be assigned to a specific category.

Most of the cases coded as “unclassified” by the 2001 WHO classification showed only one cytopenia with dysplasia in two or three lineages. The 2008 WHO classification identifies these cases as RCMD. These patients with one cytopenia in the context of multilineage dysplasia seem to have a longer survival than patients with two cytopenias [9]. In any case, the 2008 WHO classification gives a clearer definition of RCMD, and it enabled a more accurate stratification of the patients in our series.

After this reclassification, the specimens were exchanged in a reciprocal manner among the group. Although a large number of observers took part in our study, the concordance among the different pairs was very similar. The highest rate of discrepancy (33 %) was detected by the pair of observers who included the most cases without excess blasts.

We noted that, in cases with morphological discrepancy, the second observer usually assigned a higher risk category than the referring observer. Eight cases with unilineage dysplasia were identified as RCMD. Five cases with RCMD and one with del5q were defined as RAEB-1, and two cases with RAEB-1 were reclassified as RAEB-2. There was probably a bias toward over-diagnosis because all the participants knew that they were studying samples previously diagnosed with MDS.

Regarding patients with or without excess blasts (threshold ≥5 % BM blasts), the inter-observer agreement was excellent (94 %). Other authors have also demonstrated very good concordance in recognizing blast cells [5, 6]. Detection of BM blasts by morphology is essential for defining prognosis. The threshold of 5 % of BM blasts was defined by the FAB system in 1982, and it is still valid in the context of the WHO classification. There was also good agreement between the categories RAEB-1 and RAEB-2. In 2008, the International Working Group of Morphology of MDS developed standardized definitions for RS, myeloblasts, and promyelocytes [10]. These clear definitions help improve precision in the diagnosis of MDS.

The category of RCMD comprised the highest number of samples because it included most patients with a previous diagnosis of MDS-U and RCMD-RS. The definition of RCMD requires dysplastic changes in ≥10 % of cells in two or more myeloid lineages. Multilineage dysplasia, in the absence of excess BM blasts, is associated with a worse prognosis than unilineage dysplasia [11]. Matsuda et al. have identified some morphological features, such as pseudo-Pelger–Huet anomalies or micromegakaryocytes, as unfavorable prognostic factors [12]. However, this information does not modify the category in the current WHO classification, so we considered the observers to agree when both assessed dysplasia in two or more lineages, regardless of the lineages affected. The study showed 74 % agreement by the second observers in this category. The interobserver concordance was higher (84 %) when we focused on the detection of multilineage dysplasia, independently of the enumeration of blasts. Dysplastic changes in two or more lineages were easily recognized in 42 cases. This consistent agreement was expected, as RCMD is a good example of a clonal hematopoietic stem cell disease, with evident dysplasia in at least two BM lines.

Few cases showed unilineage dysplasia (RCUD, RARS) in our series. This seems reasonable, as these categories are less frequent than RCMD or RAEB, and we did not predefine the number of cases per category.

However, the low agreement among observers is striking. These categories are probably misdiagnosed. The arbitrary threshold of 10 % of dysplasia in each cell line may be too low when observers review samples that they know were previously diagnosed with MDS. Other authors have reported similar results. Howe et al. showed very good agreement in their interobserver variation study, but they found discrepancy in four of nine cases with RA according to the 2001 WHO classification. Furthermore, there were only eight cases with discrepancy in their series: Seven showed unilineage dysplasia (four RA, three RARS) [5].

Within patients with unilineage dysplasia, cases diagnosed with RARS by the referring observers also showed high discrepancy. The second observer found dysplasia ≥10 % in neutrophil precursors or megakaryocytes in six out of eight cases. The distinction between unilineage and multilineage dysplasia in the RA and RARS categories is very important, as Germing et al. demonstrated in their retrospective and prospective series [13, 14]. The negative impact of multilineage dysplasia, confirmed by these authors, led to the merging of the entities RCMD and RCMD with ringed sideroblasts in the 2008 WHO classification. In concordance with Germing and colleagues, Patnaik et al. also showed that the percentage of ring sideroblasts per se does not influence the prognosis of MDS without excess of blasts, but the diagnosis of RARS was one of the independent indicators of overall survival [15].

Unilineage dysplasia as defined by the WHO has been associated with a favorable prognosis, and it is one of the factors that influence the WPSS [11]. This system therefore allocates a different risk based on the subjective criteria of the observer who makes the diagnosis. The IPSS and the risk model developed by Kantarjian et al. are not influenced by the degree of dysplasia, so all of the cases with discrepancies showed the same risk by these scores [16]. Although our series included only a few of these cases, we believe that our study illustrates the difficulty of diagnosing unilineage dysplasia.

We conclude that our study shows good agreement in detecting cases with or without excess blasts and in detecting multilineage dysplasia, both of which are very relevant prognostic factors. However, the diagnosis of unilineage dysplasia was not reproducible. A larger series of patients is needed to confirm these results, but we stress the importance of making the diagnosis of unilineage MDS carefully. In the future, new prognostic tools, such as molecular and immunophenotypic findings, may help to explain the current morphological discrepancies.