Introduction

Post-transplantation cyclophosphamide (PTCy) is a platform for graft-versus-host disease (GVHD) prophylaxis that was first implemented in haploidentical hematopoietic stem cell transplantation (HSCT) [1, 2]. It is now gaining popularity in matched-related and unrelated HSCT [3,4,5] owing to its lower immunosuppression burden and low incidence of chronic GVHD [6, 7]. Single-agent PTCy is predominantly used for matched-related bone marrow (BM) transplantation, while combinations with other immunosuppressive agents are used for peripheral blood stem cell grafts [3, 8, 9].

Graft composition is a well-known factor affecting the outcome of HSCT. While the number of CD34+ cells is the gold standard for assessing the quality of HSCT grafts, at least for peripheral blood stem cells (PBSC) [10], other cell populations are assessed for prognostic purposes. The total number of CD3+ cells was shown to moderately increase the risk of acute GVHD [11], while the results for the overall populations of CD4+ and CD8+ cells are controversial [12, 13]. Other studies characterizing lymphocyte subsets in the graft demonstrated associations between acute GVHD and T-regulatory cells (Treg) [14], TCR alpha/beta cells [15], TCR Vbeta11+ T cells [16], and iNKT cells [17]. On the other hand, higher numbers of NK and NKT cells were reported to facilitate engraftment and the graft-versus-leukemia (GVL) effect [18, 19]. However, all of the above-mentioned studies evaluated the predictive role of lymphocyte subpopulations in the setting of classical GVHD prophylaxis, consisting of calcineurin inhibitors with methotrexate or mycophenolate mofetil (MMF) as the second agent, with or without antithymocyte globulin (ATG). PTCy induces tolerance through different mechanisms than classical GVHD prophylaxis [20]. Thus, the predictive value of lymphocyte subpopulations in the graft might differ from that observed with classical GVHD prophylaxis. Only a few studies of PTCy have been published, with analysis of only CD3 levels in the graft [21], and none focused solely on related and unrelated transplantations.

In a previous study of plasma cytokines in patients after PTCy, our group demonstrated an inverse association of interleukin (IL)-8 and interferon (IFN) gamma concentrations with GVHD: higher levels of these cytokines were associated with reduced GVHD incidence [22]. Thus, we hypothesized that more prominent activation of lymphocytes and NK cells after graft infusion, which is characterized by higher cytokine levels, results in more profound depletion of lymphocytes. The extent of lymphodepletion by ATG was reported to predict relapse and mortality after transplantation [23]. In this prospective study, we evaluated whether the same effect is observed for PTCy due to excessive activation of lymphocytes. Owing to the limited information about the impact of graft composition on the outcome with PTCy, a large number of parameters were screened for associations. We evaluated the graft cellular subpopulations mentioned earlier [10,11,12,13,14,15,16,17,18,19] and immunological recovery. Additionally, the most common inflammatory plasma biomarkers were studied, including ST2, IL-2 receptor, soluble receptor to TNF alpha (srTNFalpha), IL-8, IL-17, and VEGF A [24,25,26,27,28]. Extended information about the donors was also obtained.

Patients and methods

Seventy-nine consecutive adult patients were prospectively included in the study during 2016–2017 (Supplement Fig. S1). The enrolled patients participated in the expansion phase of the NCT02294552 study of risk-adapted PTCy prophylaxis [27]. All patients signed informed consent for blood collection, graft analysis, and collection of clinical and personal data. The study was approved by the ethical committee of the First Pavlov Medical University. Conditioning was performed with fludarabine 180 mg/m2 and oral busulfan 10–14 mg/kg according to the institutional operating procedures. The busulfan dose of 14 mg/kg is labeled as myeloablative conditioning (MAC) in the analysis and the dose of 10 mg/kg as reduced-intensity conditioning (RIC). GVHD prophylaxis was performed with cyclophosphamide 50 mg/kg on days +3 and +4. Recipients of matched-related donor (MRD) grafts received no further immunosuppression. Recipients of matched unrelated donor (MUD) grafts received tacrolimus 0.03 mg/kg/day and MMF 30–45 mg/kg starting from day +5. All clinical procedures and supportive care were performed as in the original study [28]. The criteria for inclusion in this biomarker study were the following: acute myeloid leukemia (AML) or acute lymphoblastic leukemia (ALL) as the indication for HSCT; complete remission at the time of HSCT; and transplantation of either BM from an MRD with single-agent PTCy or PBSC from a MUD with PTCy, tacrolimus, and MMF prophylaxis. A single-antigen HLA mismatch was allowed for MUD. Twenty-six patients had transplantation from an MRD and 53 from a MUD (Table 1). The planned group size for the study was 75 patients, including 25 MRD and 50 MUD HSCTs. Four additional patients were enrolled to allow for possible flow cytometry analysis failures, which did not occur during the study. The planned median follow-up was 2 years. The group size was calculated based on data from ATG studies, in which up to 40% of patients exhibit excessive lymphodepletion and slow recovery afterwards. The difference in survival for the excessively lymphodepleted patients was 20% [23]. The power of the study was 70%. Characteristics of patients in the MRD and MUD groups were not significantly different, except for a higher percentage of CD34 cells in the MUD grafts due to the use of PBSC (Supplement Table S1).

Table 1 Clinical characteristics of the study group

Flow cytometry

Fresh, non-frozen samples of BM and peripheral blood apheresis products were studied by flow cytometry on the day of HSCT (FACS Aria II, BD Biosciences, CA, USA; antibodies by Miltenyi Biotec, Bergisch Gladbach, Germany). At least 300,000 events were collected. The following populations were analyzed: CD3, CD4, CD8, CD19, CD16CD56, NKT, iNKT (including CD4 and CD8 subpopulations), Treg (CD4+CD25+CD127+), double-positive T cells (CD4+CD8+), double-negative T cells, TCR alpha/beta, and TCR Vβ11 cells. The gating strategies are presented in Supplement Fig. S1. Immunological recovery was assessed with the same lymphocyte subpopulations in the peripheral blood on days +30, +60, and +100 in the last 40 consecutive patients of the study group. Total lymphocyte recovery was available for all patients.

Enzyme-linked immunoassay

Blood was collected from the central venous line into EDTA-containing tubes, after discarding the first draw, on days 0, +7, and +30. The selection of days was based on the previous study of cytokines after PTCy [22]. The samples were centrifuged at +4 °C and 1000 × g for 15 min, aliquoted, and stored at −80 °C until the day of the assay. The samples were tested for IL-6, soluble TNF receptor, IL-2 receptor, VEGF A, IL-17, IL-8 (eBioscience, CA, USA), and ST2 (Critical Diagnostics, CA, USA) with commercially available kits according to the manufacturers' instructions. Concentrations were determined without knowledge of the clinical data. The detection ranges of the IL-17, IL-6, IL-8, soluble TNF-α receptor, VEGF A, and IL-2 receptor assays were 0.5–100, 0.92–100, 2.0–1000, 100–10000, 7.9–1000, and 270–15000 pg/ml, respectively. VEGF A was measured on day 0. IL-6, IL-8, soluble TNF receptor, and IL-2 receptor were measured on day +7. ST2, soluble TNF receptor, and IL-17 were measured on day +30 (Supplement Fig. S2). Samples with concentrations above the calibration curves were re-tested in dilution.
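The handling of out-of-range readings described above can be sketched as follows. This is a minimal illustration, not the laboratory's actual procedure: the function names, the dictionary, and the dilution factor in the usage note are hypothetical, and only the upper limits of the ranges listed in the text are taken from the source.

```python
# Hypothetical helper names; only the numeric ranges come from the text.
# Assay ranges in pg/ml as listed for each analyte.
ASSAY_RANGE_PG_ML = {
    "IL-17": (0.5, 100.0),
    "IL-6": (0.92, 100.0),
    "IL-8": (2.0, 1000.0),
    "sTNF receptor": (100.0, 10000.0),
    "VEGF A": (7.9, 1000.0),
    "IL-2 receptor": (270.0, 15000.0),
}


def needs_dilution(analyte, measured_pg_ml):
    """Return True when a reading exceeds the top of the assay range."""
    _, upper = ASSAY_RANGE_PG_ML[analyte]
    return measured_pg_ml > upper


def back_calculate(measured_pg_ml, dilution_factor):
    """Concentration in the undiluted sample after a diluted re-test."""
    if dilution_factor < 1:
        raise ValueError("dilution_factor must be >= 1")
    return measured_pg_ml * dilution_factor
```

For example, an IL-8 reading above 1000 pg/ml would be flagged by `needs_dilution`, and a re-test at an assumed 1:10 dilution would be converted back with `back_calculate(reading, 10)`.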

Clinical definitions

For relapse incidence (RI), acute GVHD (aGVHD), moderate to severe chronic GVHD (cGVHD), non-relapse mortality (NRM), overall survival (OS), event-free survival (EFS), and GVHD-relapse-free survival (GRFS), time was defined as the interval from transplantation to the event. The incidence of aGVHD was calculated at 125 days after HSCT, and the time frame for the other outcomes was 2 years. Events for EFS were relapse or death. Disease relapse was defined as morphologic or cytogenetic evidence of disease with pre-transplantation characteristics, or morphologic evidence without pre-transplantation characteristics. GVHD-related mortality was defined as any death without relapse in a patient within the time frame of 2 years. The Consensus Conference criteria and National Institutes of Health criteria were used for aGVHD and cGVHD grading, respectively [29, 30]. Measurable residual disease (MRD) was assessed before transplantation in all patients. In ALL, MRD was measured by 8-color flow cytometry. In AML, it was evaluated by a combination of WT1, PCR, and FISH (for patients with chromosomal aberrations).
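As an illustration of how these definitions translate into analysis variables, the sketch below derives an EFS time/event pair and a competing-risk code (relapse vs. death without relapse) from a patient record. The `Patient` structure, field names, and the administrative censoring at 730 days are assumptions made for the example; the source states only the 2-year time frame and the event definitions.

```python
from dataclasses import dataclass
from typing import Optional

HORIZON_DAYS = 730  # assumed encoding of the 2-year time frame


@dataclass
class Patient:  # hypothetical record layout
    follow_up_days: int          # days from HSCT to death or last contact
    relapse_day: Optional[int]   # None if no relapse was documented
    died: bool                   # True if the patient died at follow_up_days


def efs_event(p):
    """(time, event) for EFS: the event is relapse or death, whichever is first."""
    if p.relapse_day is not None:
        time, event = p.relapse_day, True
    else:
        time, event = p.follow_up_days, p.died
    if time > HORIZON_DAYS:      # censor at the end of the time frame
        return HORIZON_DAYS, False
    return time, event


def competing_event(p):
    """(time, code): 0 = censored, 1 = relapse, 2 = death without relapse."""
    if p.relapse_day is not None:
        time, code = p.relapse_day, 1
    elif p.died:
        time, code = p.follow_up_days, 2
    else:
        time, code = p.follow_up_days, 0
    if time > HORIZON_DAYS:
        return HORIZON_DAYS, 0
    return time, code
```

A death after relapse is counted under relapse in `competing_event`, which is what makes relapse and death without relapse mutually exclusive competing events.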

Statistical analysis

In the group description, overall survival (OS) and event-free survival (EFS) were calculated using the Kaplan–Meier method, and comparisons were made using the log-rank test. Cumulative incidence analysis was used for aGVHD, cGVHD, relapse incidence, and NRM, with comparisons made using Gray's test. The competing risks for acute and chronic GVHD were relapse and donor lymphocyte infusion (DLI). No prophylactic DLIs were used in the study group. NRM and RI were also analyzed as competing risks. Multivariate analysis of EFS was performed with Cox regression. Non-parametric data were analyzed with the Chi-square, Fisher exact, and Mann–Whitney tests according to the type of data in each group.
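The difference between the two estimators named above can be made concrete with a small sketch. The code below is a plain-Python illustration of the Kaplan–Meier estimator and of the cumulative incidence (Aalen–Johansen) estimator for one cause in the presence of competing events; it is not the authors' code (the study used R packages), the function names are chosen for the example, and the log-rank and Gray tests are not implemented.

```python
def kaplan_meier(times, events):
    """Kaplan-Meier curve as a list of (time, S(t)) at each distinct event time.

    events: True for an event, False for a censored observation.
    """
    data = sorted(zip(times, events))
    at_risk = len(data)
    surv = 1.0
    curve = []
    i = 0
    while i < len(data):
        t = data[i][0]
        tied = [e for tt, e in data if tt == t]
        n_events = sum(1 for e in tied if e)
        if n_events:
            surv *= 1.0 - n_events / at_risk
            curve.append((t, surv))
        at_risk -= len(tied)
        i += len(tied)
    return curve


def cumulative_incidence(times, codes, cause):
    """Cumulative incidence of one cause with competing events.

    codes: 0 = censored, any other integer = an event type.
    """
    data = sorted(zip(times, codes))
    at_risk = len(data)
    surv_prev = 1.0  # probability of being free of any event just before t
    cif = 0.0
    curve = []
    i = 0
    while i < len(data):
        t = data[i][0]
        tied = [c for tt, c in data if tt == t]
        d_cause = sum(1 for c in tied if c == cause)
        d_any = sum(1 for c in tied if c != 0)
        if d_cause:
            cif += surv_prev * d_cause / at_risk
            curve.append((t, cif))
        surv_prev *= 1.0 - d_any / at_risk
        at_risk -= len(tied)
        i += len(tied)
    return curve
```

Treating a competing event as censoring in `kaplan_meier` would overstate the incidence of the event of interest, which is why the cumulative incidence weights each cause-specific event by the all-cause event-free probability.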

The analysis of the laboratory data included several steps. First, a preliminary univariate Cox regression screening was performed for initial feature extraction. Cluster analysis was then performed to visualize patterns in graft composition and cytokine profiles. Further identification of patterns in graft composition and cytokine profiles was performed with principal component analysis (PCA) [31]. Based on the PCA results, cluster extraction was performed. The identified clusters were further characterized and evaluated against the outcomes highlighted in the initial screening [32]. Based on the experience with the selected patient population, the expected number of NRM and aGVHD events was <10, which would yield low statistical power; thus, the main focus of the statistical plan was on EFS and chronic GVHD. Based on the results of clustering, the whole group was split and cumulative incidence or Kaplan–Meier curves were plotted. Data processing and visualization were implemented with R statistical packages.
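To make the PCA step more tangible, the sketch below standardizes a small feature matrix and extracts the first principal component by power iteration on the correlation matrix. It is a simplified stand-in written for illustration: the authors used R packages whose exact calls are not given in the text, the function names are hypothetical, and only the first component is computed.

```python
import math


def standardize(rows):
    """Column-wise z-scores using the sample standard deviation."""
    n, k = len(rows), len(rows[0])
    means = [sum(r[j] for r in rows) / n for j in range(k)]
    sds = [math.sqrt(sum((r[j] - means[j]) ** 2 for r in rows) / (n - 1))
           for j in range(k)]
    if any(sd == 0 for sd in sds):
        raise ValueError("constant column cannot be standardized")
    return [[(r[j] - means[j]) / sds[j] for j in range(k)] for r in rows]


def first_component(rows, iterations=200):
    """Loadings of the first principal component and per-sample scores.

    Power iteration on the correlation matrix; assumes the start vector
    is not orthogonal to the leading eigenvector.
    """
    z = standardize(rows)
    n, k = len(z), len(z[0])
    corr = [[sum(r[a] * r[b] for r in z) / (n - 1) for b in range(k)]
            for a in range(k)]
    v = [1.0 / (j + 1) for j in range(k)]  # deliberately asymmetric start
    for _ in range(iterations):
        w = [sum(corr[a][b] * v[b] for b in range(k)) for a in range(k)]
        norm = math.sqrt(sum(x * x for x in w))
        v = [x / norm for x in w]
    scores = [sum(r[j] * v[j] for j in range(k)) for r in z]
    return v, scores
```

In an analysis like the one described, each row would hold one patient's graft subpopulation levels (and, where relevant, ALC on day +30), and the scores would be inspected for groups of patients that separate along the leading components.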

Results

Clinical results

Median follow-up was 22 months (range 15–33). In the whole group, overall survival (OS) was 77%, event-free survival (EFS) 73%, grade II–IV aGVHD 13%, moderate and severe (m&s) chronic GVHD 20%, NRM 6%, and RI 20%. The major cause of NRM was severe bacterial infection. There was no difference between BM/related and PBSC/unrelated grafts in OS, EFS, or the incidence of aGVHD, cGVHD, NRM, and RI (p > 0.11, Supplement Fig. S3). There were also no differences in outcomes between ALL and AML patients (p > 0.09, Supplement Fig. S4). RI in the study group was not affected by the remission number (p = 0.1941), but was negatively impacted by cytogenetic risk (p = 0.0032) and the presence of MRD (p = 0.0411).

Predictors of HSCT outcome

The cluster analysis of graft composition revealed two major groups of grafts. The first had a high prevalence of CD3 and NK effector subpopulations and a low concentration of regulatory subpopulations. The second, in contrast, had a higher prevalence of regulatory populations (iNKT, Treg) and lower numbers of CD16-CD56+ and TCR alpha/beta cells (Supplement Fig. S5). The graft variant with high effector prevalence was observed in 35% of cases, with comparable frequency in the BM and PBSC groups (p = 0.3). None of the donor characteristics, including age, gender, and donor white blood cell differential count before stimulation, were associated with the graft composition clusters (p > 0.4). No distinct groups of patients were identified by cytokine cluster analysis (Supplement Fig. S6). There was also no association between cytokines and graft cluster (p > 0.2); thus, further analysis focused on graft composition.

Based on cluster discrimination, patients who received grafts with high levels of CD3 and NK cells had significantly lower EFS (61% vs. 85%, p = 0.0393, Supplement Fig. S7) due to higher relapse incidence (36% vs. 12%, p = 0.0105). The regression screening (Fig. 1a) identified a possible association with lower EFS not only for effector subpopulations but also for early lymphocyte recovery. Thus, levels of CD3, NK, and NKT cells in the graft, along with absolute lymphocyte count (ALC) on day +30, were analyzed by principal component analysis, which identified a group of patients in whom high levels of CD3, NK, and NKT cells were associated with poor ALC recovery at day +30 and reduced EFS (50% vs. 80%, p = 0.0106, Fig. 1b, c). The difference was due to both increased NRM (17% vs. 3%, p = 0.0437) and relapse (33% vs. 16%, p = 0.1330). No such association was observed for BM grafts: further characterization showed that this subgroup was restricted to PBSC grafts (34%), with no cases in the BM group (0%). The number of CD34+ cells did not differ between the clusters (5.6 vs. 4.8 × 10^6/kg, p = 0.06). There was also no association with the levels of the cytokines measured (p > 0.39). The incidence of aGVHD (p = 0.9) and cGVHD (p = 0.6) did not differ between these two groups.

Fig. 1

a “Heat” plot of univariate p-values of the predictive parameters from Cox regression analyses. p-values > 0.1 are marked in white. More significant p-values are marked with darker colors. b Results of principal component analysis. The highlighted area comprises patients with a high proportion of CD3, CD16-CD56+, and NKT cells in the graft, slow engraftment, and slow lymphocyte recovery. c Event-free survival in patients with a high proportion of CD3, CD16-CD56+, and NKT cells in the graft and in patients with a lower proportion of these cells

In the multivariate analysis, the above-mentioned phenotype of the graft was still significantly predictive for EFS (HR 2.93, 95% CI 1.23–6.97, p = 0.015) along with the MRD-positive status before HSCT (HR 3.32, 95% CI 1.37–8.05, p = 0.008). Remission number (HR 1.7, 95% CI 0.81–3.37, p = 0.168) and cytogenetic risk (HR 2.57, 95% CI 0.83–4.34, p = 0.128) did not reach statistical significance.
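Readers who want to check that a reported hazard ratio, confidence interval, and p-value are mutually consistent can use the standard Wald relationship on the log scale. The sketch below assumes a symmetric 95% interval on the log hazard ratio and a two-sided Wald test; whether the reported p-values were obtained this way is an assumption, and the function name is hypothetical.

```python
import math


def wald_p_from_ci(hr, lower, upper, z_crit=1.959964):
    """Two-sided Wald p-value implied by a hazard ratio and its 95% CI.

    Assumes the interval is exp(log(HR) +/- z_crit * SE).
    """
    se = (math.log(upper) - math.log(lower)) / (2.0 * z_crit)
    z = math.log(hr) / se
    # Two-sided tail area of the standard normal distribution
    return math.erfc(abs(z) / math.sqrt(2.0))
```

Applied to the graft phenotype (HR 2.93, 95% CI 1.23–6.97) and to MRD-positive status (HR 3.32, 95% CI 1.37–8.05), this calculation gives values close to the reported p = 0.015 and p = 0.008, respectively.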

No significant predictors of aGVHD were identified; however, a subgroup of patients (25% of the whole group) with young donors and a high prevalence of CD4+CD8+ cells, iNKT cells, and TCR Vβ11 cells was identified in the principal component analysis (Supplement Fig. S8). After adjustment for previous grade III–IV aGVHD (HR 6.441, 95% CI 1.400–22.544, p = 0.0007), patients in this cluster had a significantly higher incidence of moderate and severe cGVHD (HR 2.754, 95% CI 0.965–7.679, p = 0.0483). The median donor age was 24 years compared to 31 in the rest of the study group (p = 0.010). The phenotype with a high prevalence of double-positive, iNKT, and TCR Vβ11 cells was observed significantly more often with PBSC grafts than with BM grafts (36% vs. 4%, p = 0.002). The cGVHD clusters and adverse EFS clusters overlapped significantly (p < 0.0001). None of the other donor characteristics, including gender, CMV serostatus, and steady-state differential WBC count, were associated with the above-mentioned phenotype.

The confidence intervals of lymphocyte subsets in the graft determining adverse EFS and cGVHD are presented in Supplement Table S2.

Immunological recovery

Since the screening for predictive biomarkers identified a link between graft composition, lymphocyte recovery, and survival, an additional analysis of immunologic recovery was performed in the subgroup of consecutive patients to determine the synchronicity of subpopulation recovery. Engraftment was faster in the PBSC group (median 20 vs. 22 days for neutrophils, p = 0.015, and 14 vs. 21 days for platelets, p < 0.001); however, by day +30 there were no significant differences between the groups in the levels of WBC (p = 0.80) and lymphocytes (p = 0.49). Likewise, no differences between the PBSC and BM groups were observed in the overall lymphocyte count on days +60–180 (p > 0.1).

The absolute numbers of lymphocytes, CD3, CD8, TCR alpha/beta, and NK cells recovered rather rapidly after PTCy, reaching the normal range in the majority of patients by day +60 [33]. However, CD4+ T-cell and B-cell recovery was significantly delayed (Fig. 2a). The cluster analysis demonstrated four sets of populations according to the pace of recovery from day +30 to day +100: CD3, CD8, and TCR alpha/beta cells; CD4 and NKT cells; NK cell subpopulations; and Treg and iNKT cells. All the subpopulations within the first three clusters correlated significantly with each other, while there was no correlation with the Treg and iNKT cells (Fig. 2b and Supplement Table S3). Although the size of the reconstitution cohort was small, there was a trend toward higher levels of iNKT cells (2.6 ± 3.1 vs. 0.9 ± 1.3 × 10^6/ml, p = 0.002) and Treg cells (32 ± 10 vs. 6 ± 1, p = 0.082) and lower day +30 levels of CD3+ cells (100 ± 100 vs. 400 ± 680 × 10^6/ml, p = 0.053) and NKT cells (0.8 ± 0.8 vs. 12.0 ± 23.7 × 10^6/ml, p = 0.027) in the cluster of patients with an initially high prevalence of CD3, NKT, and CD16-CD56+ cells in the graft.

Fig. 2

a Recovery of lymphocytes in patients with relapse, non-relapse mortality, and alive without relapse. b Cluster analysis of the immunologic reconstitution. The color represents the mean absolute values for each time point normalized to the highest absolute level of reconstitution. Lighter colors indicate lower levels, darker colors indicate higher levels of reconstitution. The clusters demonstrate the synchronicity of reconstitution

Despite differences in the pace of reconstitution between lymphocyte subtypes, ALC on day +30 with a cutoff of 400 × 10^6/ml had very high predictive power for both RI (p = 0.0059) and NRM (p = 0.0074). Only 18% of relapses and one case of non-relapse mortality were documented in patients with day +30 ALC > 400 × 10^6/ml (Supplement Fig. S9). The difference in EFS with this cutoff was 91% vs. 53% (p < 0.0001). Thirty-eight percent of patients had day +30 ALC below this level, and 10% were in the cluster with a high prevalence of effector lymphocyte populations in the graft.

Discussion

The clinical results in the study population were comparable to those previously published for MRD [6] and MUD [3, 9] HSCT with PTCy. There was some heterogeneity in the study population: it included both AML and ALL patients in CR, and both MRD transplantations with single-agent PTCy and MUD or mismatched unrelated donor (MMUD) transplantations with the PTCy-tacrolimus-MMF regimen. However, heterogeneous patient populations are used in the majority of biological studies in HSCT recipients [25]. Moreover, in the current study the outcomes of ALL and AML patients were not significantly different, which is in line with the CIBMTR data [34]. PTCy-based prophylaxis also produced comparable results for MRD and MUD transplantations, as in the previous study [28]. Thus, the clinical heterogeneity might not have significantly impacted the results. On the other hand, the relatively small study size might have prevented certain thoroughly validated biomarkers from reaching statistical significance.

The main conclusion of the study is that, like ATG [23, 35], PTCy can cause excessive depletion of lymphocyte subpopulations in a subgroup of patients, leading to delayed immunological recovery and adverse clinical outcome. Moreover, poor immunological recovery is a long-known factor of dismal prognosis [36, 37]. However, unlike with ATG, this effect of PTCy was not dependent on increased exposure but rather on the composition of the graft. We have determined that in one-third of donors there is an “activated” phenotype of the graft with a high prevalence of CD3, NKT, and CD16-CD56+ cell populations and a low prevalence of Treg populations. It is observed equally in BM and PBSC grafts; however, a significant clinical impact was observed only for PBSC. Patients who received grafts with this phenotype had a higher risk of relapse and worse EFS. This observation is supported by the mechanism of action of PTCy, in which only activated cells undergo apoptosis due to differences in aldehyde dehydrogenase expression in activated lymphocytes [20].

This phenomenon seems to be unique to PTCy-based prophylaxis. The efficacy of classical prophylaxis depends on suppression of cytokine production [35, 38, 39, 40] and subsequent T-cell activation. Thus, with classical prophylaxis, this “activated” phenotype of the donor immune system only increases the risk of aGVHD, as was demonstrated in previous studies [11, 15], or increases GVL in the case of a high prevalence of the NK cell population [18]. In our study, we observed that NK cell subpopulations play a completely opposite role. It was demonstrated that CD16-CD56+ cells are more prone to rapid activation than the other NK subtypes [41, 42]. It was also demonstrated that NK cells are one of the drivers of lymphocyte proliferation after graft infusion and before PTCy administration [43]. Since donors with a high prevalence of NK cells also had increased numbers of T cells in the graft, it is unclear which cell population determines the degree of activation after graft infusion. Further elucidation of these aspects in preclinical models might be complicated, as the majority of them utilize bone marrow as the graft source [44].

A previous study of immunological recovery after haploidentical transplantation with PTCy identified delayed recovery of CD3 cells compared to MRD transplantation with classical prophylaxis [45], while in the MRD setting with single-agent PTCy and BM as the graft source there was a relatively rapid recovery [46]. A probable explanation is that this effect of “over-activation” and profound depletion by PTCy is more prominent after haploidentical transplantation than after MRD and MUD transplantation. As in the previous study by Kanakry et al. [47], we observed that the recovery of cell populations after PTCy was relatively synchronous. Thus, ALC is an easily assessed parameter that could predict the majority of adverse events after HSCT.

The identified subpopulations responsible for cGVHD represent cells of thymic origin. An imbalance in CD4+CD8+ cells [48, 49], Vbeta11 cells [50], and iNKT cells [51] was reported during chronic T-cell activation and in autoimmune diseases such as lupus or rheumatoid arthritis. Thus, these expanded subpopulations in the graft might represent a predisposition of the donor immune system to autoimmune disease, which translates into cGVHD after HSCT.

While PTCy has been used for more than a decade, there might be several explanations for why these associations have not been reported previously. First, relatively few studies have looked at graft composition. Mussetti et al. [21] reported no impact of a higher number of CD3+ cells in the graft on the outcome of haploidentical transplantations with PTCy. This might be due to differences in the protocol: cyclosporine A was given starting on day +1, which could have prevented the hyperactivation of lymphocytes. Also, as described previously, after haploidentical transplantation the effects of over-activation might affect more patients. Second, the studies that evaluated cytokine proteomics started the assessment by day 30, when the consequences of day +1 and +2 events might not be so easily captured [47]. The fact that the cytokines tested in this study had low predictive value for prognosis highlights the importance of blood sampling before graft infusion and before and after PTCy administration.

In conclusion, the study demonstrated that high concentrations of effector populations in the graft negatively impact the outcome after MRD and MUD PBSC transplantations with PTCy due to compromised immunological recovery. Multicenter studies are required to confirm this observation and determine safe cutoffs. It should also be studied whether this “activation” phenotype can be captured by analysis of lymphocyte subpopulations in donor peripheral blood at steady state.