Introduction

Recent studies have shown that, at the time of diagnosis, rates of bone marrow infiltration (BMI) in Hodgkin's lymphoma (HL) and non-Hodgkin's lymphoma (NHL) can reach up to 14% and 25%, respectively, in the pediatric population [1, 2]. BMI is an important factor in clinical staging, treatment planning, and prognosis assessment. Bone marrow biopsy (BMB), which can provide a definitive histological diagnosis, is commonly used by clinicians to identify BMI. Nevertheless, BMB is an invasive and painful procedure that in children is typically performed under general anesthesia. Because it evaluates only a small percentage of the bone marrow (BM), it is also prone to false-negative results due to sampling error [3, 4]. Moreover, lymphomas are highly aggressive tumors, and BMB is not ideal for diagnosing lymphoma BMI [5]. Pain is an inevitable adverse effect of this invasive examination, and other adverse events include bleeding and pyoderma gangrenosum at the puncture site [6].

As a non-invasive, whole-body metabolic imaging technique, 18F-FDG PET/CT is widely used in clinical practice and has shown high sensitivity in most malignancies. It has gained an increasingly important role in lymphoma because of its high accuracy, and it can be used for nodal and extranodal staging, including bone marrow assessment [7]. In recent years, numerous studies have investigated whether 18F-FDG PET/CT can partially or completely replace BMB for evaluating BMI in lymphoma patients, and most have suggested that PET/CT can completely replace BMB in the evaluation of BMI in HL. However, its performance for BMI in NHL subtypes such as diffuse large B-cell lymphoma and follicular lymphoma remains controversial [8,9,10,11,12,13].

Because the included studies differed in quality, patient population, and design, the results of previous meta-analyses vary considerably. For instance, Pakos et al. [14] reported a pooled sensitivity and specificity of 18F-FDG PET/CT for detecting BMI of only 51% (95% confidence interval [CI], 38%–64%) and 91% (95% CI, 85%–95%), respectively, while Chen et al. [15] reported a relatively low sensitivity and specificity of 74% (95% CI, 65%–83%) and 84% (95% CI, 80%–89%), respectively. Given that 18F-FDG PET/CT could help pediatric lymphoma patients avoid BMB-related discomfort and complications and could improve the detection rate of lymphoma BMI, its use in this group of patients is worth promoting. Therefore, we systematically reviewed and analyzed published studies on the accuracy of 18F-FDG PET/CT in the diagnosis of BMI in pediatric lymphoma [16].

Materials and methods

Search strategy

The PubMed (including Medline), Cochrane, and Embase databases were searched for published articles on the use of 18F-FDG PET/CT for the detection of BMI in pediatric lymphoma patients. The language was limited to English, there was no start date restriction, and the search included articles published until August 10, 2020. To ensure that the relevant literature was fully covered, we also screened the reference lists of the articles judged relevant after screening to find further potentially suitable studies. This systematic review and meta-analysis complies with the most recent PRISMA statement on systematic reviews and meta-analyses of diagnostic test accuracy studies [16].

Study selection

Studies investigating the diagnostic performance of 18F-FDG PET/CT for BMI in pediatric lymphoma met the inclusion criteria. Review articles, guidelines, editorials, letters, conference abstracts, meta-analyses, case reports, studies with fewer than ten patients, non-English publications, and animal studies were excluded. In addition, only studies using integrated PET/CT systems were included; studies using stand-alone PET systems without CT-based attenuation correction of the PET data were excluded. Articles that focused only on staging and post-treatment assessment were excluded, as were articles that used 18F-FDG PET/CT only as the reference standard. At the full-text stage, an article was excluded when its data were insufficient to derive the true-positive (TP), true-negative (TN), false-positive (FP), and false-negative (FN) counts for 18F-FDG PET/CT diagnosis of BMI in pediatric lymphoma. When the same data appeared in more than one article, the article with the largest number of patients or the most detail was selected.

Two reviewers independently screened the title and abstract of each retrieved record and obtained the full text of those that might meet the inclusion criteria. Disagreements were resolved by discussion until consensus was reached.

Study quality

The Quality Assessment of Diagnostic Accuracy Studies-2 (QUADAS-2) tool, which covers four domains (patient selection, index test, reference standard, and flow and timing), was used to assess the methodological quality of the included studies [17]. All enrolled articles were assessed by two reviewers according to these domains. Risk of bias and applicability concerns were rated as L (low risk/concern), H (high risk/concern), or U (unclear).

Statistical analysis

Because BMB can yield false-negative results owing to inadequate sampling of focal bone marrow invasion, BMB alone is not a sufficient reference standard for calculating the sensitivity and specificity of 18F-FDG PET/CT in the diagnosis of BMI in pediatric lymphoma [18, 19]. 18F-FDG PET/CT findings were therefore judged against a reference standard combining at least BMB and follow-up results to determine the number of TP, FP, TN, and FN in each enrolled study. Pooled sensitivity and specificity, with 95% confidence intervals (CIs), were estimated using hierarchical logistic regression models, including bivariate and hierarchical summary receiver operating characteristic (HSROC) modeling [20, 21]. Publication bias was assessed by visual inspection of Deeks's funnel plot, and P values were calculated with Deeks's asymmetry test [22]. Heterogeneity between studies was evaluated with Cochran's Q test and the Higgins I² statistic, with p < 0.05 and I² > 50%, respectively, indicating heterogeneity [23]. If heterogeneity was found, meta-regression was performed for subgroup analysis.
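As an illustration of the quantities named above, the following minimal Python sketch derives per-study sensitivity and specificity from a 2 × 2 table and then computes Cochran's Q and I² on logit-transformed sensitivities with inverse-variance weights. It is a simplified univariate calculation, not the bivariate/HSROC model fitted in Stata; the function names, the 0.5 continuity correction, and the study counts are assumptions made for the example and are not data from the included studies.

```python
import math


def sens_spec(tp, fp, fn, tn):
    """Return (sensitivity, specificity) for one study's 2 x 2 table."""
    return tp / (tp + fn), tn / (tn + fp)


def logit_with_variance(events, total):
    """Logit of a proportion and its approximate variance.

    A 0.5 continuity correction is added to both cells (an assumption of
    this sketch) so that proportions of 0 or 1 remain finite.
    """
    a = events + 0.5
    b = total - events + 0.5
    return math.log(a / b), 1.0 / a + 1.0 / b


def cochran_q_and_i2(estimates, variances):
    """Cochran's Q and Higgins I^2 (percent) with inverse-variance weights."""
    weights = [1.0 / v for v in variances]
    pooled = sum(w * e for w, e in zip(weights, estimates)) / sum(weights)
    q = sum(w * (e - pooled) ** 2 for w, e in zip(weights, estimates))
    df = len(estimates) - 1
    i2 = max(0.0, (q - df) / q) * 100.0 if q > 0 else 0.0
    return q, i2


# Hypothetical (TP, FP, FN, TN) counts, NOT taken from the included studies.
studies = [(18, 2, 1, 60), (9, 1, 3, 40), (25, 4, 0, 90)]

logits, variances = [], []
for tp, fp, fn, tn in studies:
    sens, spec = sens_spec(tp, fp, fn, tn)
    est, var = logit_with_variance(tp, tp + fn)
    logits.append(est)
    variances.append(var)
    print(f"sensitivity={sens:.2f} specificity={spec:.2f}")

q, i2 = cochran_q_and_i2(logits, variances)
print(f"Q={q:.2f}, I2={i2:.1f}%")
```

The same two functions can be applied to specificity by passing TN and TN + FP to `logit_with_variance`. A hierarchical model additionally accounts for the correlation between sensitivity and specificity across studies, which this sketch ignores.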

The “midas” module in Stata version 14.0 (StataCorp LP, College Station, TX, USA) was used for statistical analysis, and p < 0.05 indicated statistical significance.

Results

Literature search

The computer-assisted search of the PubMed, Embase, and Cochrane Library databases returned a total of 564 articles; detailed data are shown in Table 1. After 71 duplicate articles were discarded, screening of titles and abstracts identified 23 potentially eligible original articles. After reading the full text, 14 research articles were further excluded, including 5 articles not in English and 4 articles with insufficient data to construct a 2 × 2 contingency table. In addition, 4 articles did not focus on the diagnostic performance of 18F-FDG PET/CT for BMI in pediatric lymphoma, and 2 studies used PET alone rather than PET/CT. Ultimately, 8 studies with 1417 patients that evaluated the diagnostic performance of 18F-FDG PET/CT for lymphoma BMI were included in this meta-analysis [11,12,13, 24,25,26,27,28]. The detailed screening process is shown in Fig. 1.

Fig. 1 Flowchart of literature selection in the research process

Table 1 Search strategy and results as of August 10, 2020

Characteristics of included studies

The characteristics of the patients from the enrolled studies are shown in Table 2. The included studies were published from 2011 to 2019. The number of patients in the included studies ranged from 31 to 784, and the proportion of patients with BMI ranged from 7.4% to 51.4%. The mean or median age of patients in the 8 studies ranged from 8.0 years to 14.8 years. In 3 of the 8 studies, only HL was included [13, 25, 26]; in 1 study, only NHL [12]; in the remaining 4 studies, both HL and NHL were included [11, 24, 27, 28]. Four studies included patients newly diagnosed with lymphoma [11, 12, 27, 28], 1 included patients who had begun treatment for lymphoma [26], and 1 included both newly diagnosed patients and patients who had begun treatment [24], while the remaining 2 did not report the clinical setting [13, 24]. All enrolled patients had pathologically confirmed lymphoma.

The characteristics of the included studies are presented in Table 3. All 8 studies included in this meta-analysis are retrospective studies. All but 1 of the 8 studies were single-center studies. BMB and follow-up were used as reference criteria in 5 studies, and BMB and MRI/CT and follow-up were used as reference criteria in the remaining 3 studies. 18F-FDG PET/CT was interpreted by clinical examiners blinded to the reference standard in all but one study, where it was not described.

The characteristics of 18F-FDG PET/CT are summarized in Table 4. Vendors of the PET/CT scanners used in the studies included Philips, GE, and Siemens. The time between 18F-FDG administration and the scan ranged from 45 to 90 min. One study used both low-dose CT and contrast-enhanced CT, three studies used low-dose CT only, and two studies used diagnostic CT; the CT technique was not clearly documented in the remaining three studies. All studies specified criteria for the diagnosis of lymphoma BMI with 18F-FDG PET/CT.

Table 2 Features of patients enrolled in the study
Table 3 Features of the inclusion studies
Table 4 FDG PET/CT characteristics

Quality assessment

The QUADAS-2 assessment results are shown in Table 5. Overall, the quality of the included studies was considered relatively good. Each of the eight enrolled studies met at least four of the seven QUADAS-2 domains. In terms of patient selection, one study included only stage II–IV lymphoma patients and was therefore considered to have a high risk of selection bias [27]. In addition, two studies did not describe the stage of the enrolled patients and thus were considered to have an unclear risk of selection bias [27, 28]. In the index test domain, there was an unclear risk of bias in one study because it was not clear whether a preset threshold for lymphoma BMI was used with 18F-FDG PET/CT [13]. In the reference standard domain, there was an unclear risk of bias in one study because it was not clear whether the reference standard was derived blinded to the 18F-FDG PET/CT results [11]. In another study, some biopsy specimens could not yield a definitive result because of insufficient material, and the handling of these cases was not explained, resulting in a high risk of bias [27]. In the flow and timing domain, all studies specified the follow-up duration used in the reference standard, which was at least 3 months, so the risk of bias was considered low. Because all eight studies used at least BMB as a reference standard, there was low concern regarding applicability.

Table 5 Risk of bias and application concerns for included studies were assessed by the QUADAS-2 tool

Diagnostic accuracy

Across the individual studies, the specificity and sensitivity of 18F-FDG PET/CT in detecting bone marrow involvement in pediatric lymphoma ranged from 81.0% to 100% and from 63.0% to 100%, respectively. Cochran's Q test gave a Q value of 0.272 (P = 0.436), which did not indicate heterogeneity, but the Higgins I² statistic suggested significant heterogeneity in both sensitivity (I² = 66.27%) and specificity (I² = 87.25%). The pooled sensitivity and specificity of 18F-FDG PET/CT in detecting bone marrow involvement in pediatric lymphoma are shown in Fig. 2, with values of 95.0% (95% CI: 86.0%–98.0%) and 95.0% (95% CI: 90.0%–98.0%), respectively.

Fig. 2 Coupled forest plot with pooled sensitivity and specificity. Numbers are pooled estimates with 95% CIs in parentheses. Corresponding heterogeneity statistics are provided at bottom right corners. Horizontal lines indicate 95% CIs. CI = confidence interval

The SROC curve of 18F-FDG PET/CT for BMI detection in children or adolescents with lymphoma is shown in Fig. 3. The 95% prediction region differed markedly from the 95% confidence region, which indicated heterogeneity among studies. The area under the SROC curve was 0.99 (95% CI 0.97–0.99). Deeks's funnel plot [22] for the publication bias analysis of the enrolled studies is shown in Fig. 4; the p-value of the slope coefficient was 0.60. The study of Yağci-Küpeli et al. [26] showed significant bias, which may be related to the positivity criterion used in that study, defined as ≥ 1 lesion with uptake higher than liver parenchymal intensity. In addition, for all FDG-avid BM lesions in that study, the maximum standardized uptake value (SUVmax, defined as the average of the three highest-value pixels) was calculated for analysis, and the resulting sensitivity and specificity were lower than in the other included studies. The article also did not explain how cases without a definitive result due to insufficient biopsy material were handled.
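The slope coefficient reported above comes from Deeks's asymmetry test, which regresses the log diagnostic odds ratio of each study on the inverse square root of its effective sample size (ESS), weighted by ESS. The sketch below computes only that weighted slope, as a rough illustration; the function name, the 0.5 continuity correction, and the example counts are assumptions of this sketch, and the p-value (which requires the slope's standard error and a t distribution) is deliberately left out.

```python
import math


def deeks_slope(studies):
    """Weighted least-squares slope of ln(DOR) on 1/sqrt(ESS).

    studies: iterable of (tp, fp, fn, tn) counts, one tuple per study.
    Weights are the effective sample sizes. At least two studies with
    different ESS values are needed, otherwise the slope is undefined.
    """
    xs, ys, ws = [], [], []
    for tp, fp, fn, tn in studies:
        # 0.5 continuity correction (assumed) so zero cells stay finite.
        a, b, c, d = (v + 0.5 for v in (tp, fp, fn, tn))
        n_dis, n_non = tp + fn, fp + tn
        ess = 4.0 * n_dis * n_non / (n_dis + n_non)  # effective sample size
        xs.append(1.0 / math.sqrt(ess))
        ys.append(math.log((a * d) / (b * c)))  # ln diagnostic odds ratio
        ws.append(ess)
    w_sum = sum(ws)
    x_bar = sum(w * x for w, x in zip(ws, xs)) / w_sum
    y_bar = sum(w * y for w, y in zip(ws, ys)) / w_sum
    sxy = sum(w * (x - x_bar) * (y - y_bar) for w, x, y in zip(ws, xs, ys))
    sxx = sum(w * (x - x_bar) ** 2 for w, x in zip(ws, xs))
    return sxy / sxx


# Hypothetical counts in which smaller studies report larger odds ratios.
example = [(10, 1, 1, 10), (40, 8, 8, 40), (100, 30, 30, 100)]
print(f"slope = {deeks_slope(example):.2f}")
```

A slope far from zero suggests that small studies report systematically different diagnostic odds ratios from large ones; with only eight studies, such a test has limited power.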

Fig. 3 SROC curve of the diagnostic performance of 18F-FDG PET/CT for detection of BMI in patients with pediatric lymphoma. AUC = area under the curve; MRI = magnetic resonance imaging; SENS = sensitivity; SPEC = specificity; SROC = summary receiver operating characteristic; BMI = bone marrow infiltration

Fig. 4 Deeks's funnel plot used for the publication bias analysis of the enrolled studies. ESS = effective sample size

Exploration of heterogeneity

The results of the meta-regression analysis of 18F-FDG PET/CT for lymphoma bone marrow infiltration are presented in Table 6. Whether follow-up CT/MRI results were used as part of the reference standard was an important factor affecting heterogeneity (p < 0.01). Specifically, studies that used follow-up CT/MRI results as part of the reference standard showed higher sensitivity and specificity than those that did not: sensitivity was 0.97 [95% CI 0.87–1.00] versus 0.88 [95% CI 0.86–0.97], and specificity was 0.97 [95% CI 0.93–1.00] versus 0.87 [95% CI 0.86–0.94]. In contrast, the pathological type of lymphoma, clinical setting, CT technique, and sample size were not important factors affecting heterogeneity.

Table 6 The results of meta-regression analysis of PET/CT diagnosis of pediatric lymphoma invading bone marrow

Discussion

Lymphoma is the most common pediatric malignant tumor after leukemia and glioma, and its detection rate has been gradually increasing in recent years [13]. Accurately determining whether lymphoma has infiltrated the BM directly informs the clinician's treatment plan. In the published literature on lymphoma involvement of the BM, opinions differ on whether routine BMB should be used to determine the stage of lymphoma [29, 30]. Previously published meta-analyses have shown high sensitivity and specificity of 18F-FDG PET/CT in evaluating the BMI of lymphoma in children and adults [7, 19,20,21]. This meta-analysis focused on determining the diagnostic performance of 18F-FDG PET/CT for evaluating the BMI of pediatric lymphoma.

The current systematic review and meta-analysis included 8 studies with 1417 pediatric lymphoma patients who underwent 18F-FDG PET/CT. Overall, the quality of the included studies was good, and the results were consistent across studies. Our results suggested that the pooled sensitivity and specificity of the enrolled studies were 95.0% (95% CI 86.0%–98.0%) and 95.0% (95% CI 90.0%–98.0%), respectively. Given its excellent diagnostic performance for BMI of pediatric lymphoma, 18F-FDG PET/CT can be used as one of the main methods for screening newly diagnosed or treated pediatric lymphoma patients for BMI. As a non-invasive method, 18F-FDG PET/CT can spare pediatric patients the pain caused by BMB and help determine the optimal treatment regimen. In the present study, we did not compare the diagnostic performance of BMB and 18F-FDG PET/CT for pediatric lymphoma BMI in a well-designed controlled trial. Nevertheless, given the high false-negative rate of BMB in detecting lymphoma BMI reported in the literature, we believe that 18F-FDG PET/CT is a superior method for detecting lymphoma BMI in terms of sensitivity, specificity, positive predictive value, negative predictive value, and related measures. One issue that needs to be considered in pediatric patients is the standardization of radiation doses. A previous study confirmed that an effective dose of 100 mSv was associated with a 1% estimated lifetime attributable risk of developing radiation-induced cancer [31]. Therefore, when pediatric patients undergo PET/CT examinations, the radiation dose of the scan should be optimized without affecting the diagnosis. For the PET component of PET/CT, studies and guidelines suggest an injected 18F-FDG activity for pediatric patients of 3.7–5.2 MBq/kg [32,33,34].
Also, previous studies have confirmed that using the longest reasonably achievable scan time per bed position can optimize the administered radiopharmaceutical activity and result in a 40–50% reduction in administered radiotracer activity [32, 34, 35]. As for the CT component of PET/CT, the CT dose can be reduced by lowering mAs, decreasing kVp, and increasing pitch [32, 33]. In recent years, numerous studies have used MRI enhanced with ultrasmall superparamagnetic iron oxide particles, or PET/MRI, in pediatric lymphoma patients to avoid or reduce radiation exposure [36,37,38,39]. MRI involves no ionizing radiation, and PET/MR imaging can reduce the radiation dose by about 40%–70% compared with PET/CT [37, 38, 40]. Moreover, an 18F-FDG PET/MR study [41] involving 21 pediatric tumor patients aged 7–17 years confirmed the feasibility of reducing the injected activity to 1.5 MBq/kg while preserving the clinical usefulness of the images, which is beneficial to pediatric patients. Ferumoxytol-enhanced MRI is promising, but its effectiveness is still slightly inferior to that of PET/CT and needs to be further verified by future studies [36]. Although PET/MRI has been shown to be equivalent to PET/CT for the detection and staging of pediatric lymphoma lesions [39,40,41,42], it is more expensive than PET/CT and is not yet widely available in most hospitals around the world. Therefore, we believe that 18F-FDG PET/CT should be recommended as a pre-treatment evaluation method for pediatric lymphoma patients. In most medical institutions, 18F-FDG PET/CT is a routine examination for lymphoma patients, and as such, it does not bring additional economic burden to patients.
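To make the weight-based activity figures in this paragraph concrete, the short Python sketch below computes the injected activity range for a given body weight and the effect of a fractional reduction. It assumes the per-kilogram range of 3.7–5.2 is expressed in MBq/kg; the function names and the 30 kg example patient are hypothetical, and the calculation is purely illustrative arithmetic rather than a dosing tool.

```python
def fdg_activity_range_mbq(weight_kg, low_per_kg=3.7, high_per_kg=5.2):
    """Return the (low, high) injected 18F-FDG activity in MBq.

    The default per-kilogram limits are the 3.7-5.2 range quoted in the
    text, assumed here to be in MBq/kg.
    """
    if weight_kg <= 0:
        raise ValueError("weight_kg must be positive")
    return weight_kg * low_per_kg, weight_kg * high_per_kg


def reduced_activity_mbq(activity_mbq, reduction_fraction):
    """Activity remaining after a fractional reduction (0.4 means 40%)."""
    if not 0.0 <= reduction_fraction < 1.0:
        raise ValueError("reduction_fraction must be in [0, 1)")
    return activity_mbq * (1.0 - reduction_fraction)


# Hypothetical 30 kg patient, used only to show the arithmetic.
low, high = fdg_activity_range_mbq(30.0)
print(f"standard range: {low:.0f}-{high:.0f} MBq")
print(f"after a 40% reduction: {reduced_activity_mbq(low, 0.4):.0f}-"
      f"{reduced_activity_mbq(high, 0.4):.0f} MBq")
```

Passing `low_per_kg=1.5, high_per_kg=1.5` reproduces the reduced PET/MR activity of 1.5 MBq/kg mentioned in the text for the same body weight.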

The meta-regression analysis showed that the use of CT/MRI follow-up results as part of the reference standard was an important factor affecting heterogeneity. It is possible that in studies that used only BMB as the reference standard, the small area sampled by biopsy resulted in false negatives, which ultimately led to lower sensitivity and specificity. Our results also showed that 18F-FDG PET/CT had high sensitivity and specificity regardless of whether the pathological type was HL or NHL and whether patients were newly diagnosed or already treated. As for pathological type, whether a study included NHL patients was not a factor affecting the performance of 18F-FDG PET/CT in the diagnosis of lymphoma bone marrow invasion: the sensitivity and specificity were 0.94 [0.89–1.00] and 0.97 [0.79–0.98] without NHL, and 0.96 [0.82–1.00] and 0.94 [0.92–1.00] with NHL, respectively. The sensitivity and specificity in newly diagnosed pediatric lymphoma patients (0.97 [95% CI 0.92–1.00] and 0.97 [95% CI 0.93–1.00], respectively) were slightly higher than in studies of patients who had started treatment or of both groups (0.92 [95% CI 0.89–0.98] and 0.93 [95% CI 0.93–0.97], respectively). The reason may be that in patients who have already started treatment, tumor activity is inhibited and some lesions are false-negative on 18F-FDG PET/CT imaging, which lowers sensitivity or specificity; however, the difference between the two groups was not statistically significant (p = 0.10).

The reference standard used to judge 18F-FDG PET/CT positivity in the included studies comprised at least two methods, of which BMB (in at least part of the patients) was essential. BMB has long been considered the standard method for bone marrow staging. It is performed by unilateral or bilateral iliac crest biopsy to confirm lymphoma BMI, and bilateral iliac crest biopsy reduces missed diagnoses by 20% compared with unilateral iliac crest biopsy [6, 42]. However, BMB can only obtain information from a limited area, and the false-negative rate remains high even after bilateral iliac crest biopsy [11, 26, 27]. In their study on 18F-FDG PET/CT detection of lymphoma BMI in children, Hassan et al. [13] reported that PET/CT revealed focal BMI in 44% of HL patients that was not found on iliac crest biopsy. Another study, involving 34 HL and 23 NHL patients, also showed a high false-negative rate of 46% for pediatric lymphoma BMI detected by BMB [23]. Furthermore, several studies [11, 12, 24, 26, 27] have reported lower sensitivity for BMB (62.5%, 32%, 56%, 38%, and 53.1%, respectively) than for PET/CT in the diagnosis of pediatric lymphoma BMI. In addition to its high false-negative rate and lower sensitivity, BMB may present an ethical issue, as some pediatric patients cannot tolerate the pain it causes, which has been reported by a number of studies [11,12,13, 26,27,28].

One of the major limitations of this study is that the reference standard used to determine lymphoma BMI was only partially based on BMB. In addition to BMB, the studies used different reference criteria, such as MRI, CT, radionuclide bone imaging, and post-chemotherapy follow-up, thus potentially introducing a substantial risk of differential verification bias among the included patients. Nevertheless, a study using only biopsy-derived histopathological results as the reference standard is not feasible in clinical practice and would not be ethically justified for determining lymphoma BMI. Another limitation is that the included studies enrolled patients with HL, NHL, or a mixture of the two. Moreover, the reported studies of lymphoma BMI were all based on per-patient analysis; further studies based on each pathological type or on per-lesion analysis are therefore warranted. In addition, the high heterogeneity among the included studies limits the generalizability of the results. However, the current meta-analysis remains reliable in demonstrating the important role of 18F-FDG PET/CT imaging in the diagnosis of lymphoma BMI.

Conclusion

18F-FDG PET/CT imaging showed excellent diagnostic performance in detecting lymphoma BMI, with a pooled sensitivity and specificity of 95%. Although there was heterogeneity between the studies included in this systematic review and meta-analysis, our results suggest that 18F-FDG PET/CT may be an appropriate alternative to BMB in evaluating lymphoma BMI in newly diagnosed and previously treated pediatric lymphoma patients.