Introduction

Tissue diagnosis remains the cornerstone of clinical management decisions in modern oncology [1]. For brain tumors located in regions that are difficult to access by open surgical approaches, stereotactic needle biopsy is the mainstay procedure for sample acquisition as a means of tissue diagnosis [2, 3]. The available literature on stereotactic needle biopsy largely consists of single institutional experiences, involving surgeries performed by select surgeons [4,5,6]. The reported diagnostic yield is generally favorable, ranging from 80 to 95% [7, 8]. Depending on the study, the risk of procedure-related morbidity spans an order of magnitude, from 0 to > 10% [9, 10]. Post-procedure mortality ranges from 0 to 9% [5, 9]. While risk factors for morbidity have been identified in terms of anatomic location [11, 12], tumor histology [11], patient comorbidities [12], radiographic appearance [13, 14], and surgical technique [12, 15], there is a paucity of information on the relationship between the number of biopsies taken and surgical morbidity.

The available literature suggests that increasing the number of samples taken during the biopsy would be accompanied by a higher likelihood of postoperative morbidity [12, 16]. It is also anticipated that increasing the number of samples taken during biopsy may facilitate sampling different regions of the tumor, thereby enhancing diagnostic yield [3, 17]. Whether there is an optimal number of samples taken or trajectory adopted that would maximize the diagnostic yield without compromising surgical safety remains an open question. Since most surgeons perform biopsies in a standardized manner, addressing this question would require comparison of surgeons who adopted distinct practices and the creation of a study cohort with sufficient statistical power to address the issue [4, 5, 18, 19]. While population databases provide data from large patient cohorts, information on the number of samples taken during biopsy is not collected in the currently available databases.

Here, we performed a meta-analysis of the published literature on stereotactic needle biopsy and compared the diagnostic yield, morbidity, and mortality for studies with differing mean numbers of samples taken during biopsy. No significant difference in diagnostic yield was noted between studies reporting different numbers of biopsies. However, higher surgical morbidity was noted in studies where the number of biopsies exceeded a threshold. The results suggest a morbidity model for stereotactic needle biopsy where the risk of complication is non-linearly associated with the number of samples taken.

Methods

Search algorithm

This systematic review and meta-analysis was conducted in accordance with the PRISMA guidelines [20]. A comprehensive PubMed database search was performed on 11/1/2019 for articles focusing on the safety and efficacy of stereotactic brain biopsy procedures, primarily for tumor indications.

The following search strategy was used: ((stereotactic* OR stereotaxis* OR frame based* OR frameless*)[Title/Abstract] AND (biopsy* OR neurobiopsy* OR resection* OR surgery*)[Title/Abstract] AND ((brain* OR neurosurgery* OR neurosurgeon* OR intracranial* OR neurological* OR neuropathology* OR cerebral* OR central nervous system*)[Title/Abstract] OR brain/pathology[MeSH Terms] OR brain diseases[MeSH Terms])) NOT case reports.

The protocol for this meta-analysis was registered with PROSPERO (#CRD42019141383) [21].

The studies included in the meta-analysis met the following inclusion criteria: (1) written in English (or English language translation available); (2) involved human subjects; (3) fully reported peer-reviewed clinical studies; (4) focused on diagnostic yield (DY) and complications in stereotactic brain biopsy procedures, and reported the mean number of biopsy samples. Two authors independently extracted the data from the included articles. The variables used in data compilation were: first author, publication year, quality score, stereotactic biopsy method, sample size, mean biopsy samples, mean maximum lesion size (cm), diagnostic yield, morbidity, and mortality. For three studies, the approximate lesion size, d, was calculated from the available data for lesion volume, V, by treating the lesion as a sphere and using the formula: $d = 2\sqrt[3]{\frac{3V}{4\pi}}$.
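The volume-to-diameter conversion above can be sketched as follows. This is a minimal illustration that assumes a spherical lesion, as the formula implies; the function name `equivalent_diameter_cm` is hypothetical and not taken from the study.

```python
import math


def equivalent_diameter_cm(volume_cm3: float) -> float:
    """Diameter (cm) of a sphere whose volume equals volume_cm3.

    Assumption: the lesion is approximated as a sphere, so
    V = (4/3) * pi * r**3 and d = 2 * cbrt(3V / (4 * pi)).
    """
    if volume_cm3 < 0:
        raise ValueError("volume must be non-negative")
    radius = (3.0 * volume_cm3 / (4.0 * math.pi)) ** (1.0 / 3.0)
    return 2.0 * radius
```

Feeding the function the volume of a sphere of known diameter should return that diameter, which is a quick way to confirm that the constant 3/(4π) sits inside the cube root.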

A quality score was assigned to each study using the Newcastle–Ottawa scale (NOS), in the same manner as in our previously published analyses [22]. A score of 1 (versus 0) was assigned for satisfactory fulfillment of each criterion. Studies with a NOS ≥ 5 were classified as high-quality studies and those with NOS < 5 were categorized as low-quality studies.

Statistical analysis

Meta-analysis by proportions was performed to calculate the event rate for diagnostic yield, morbidity and mortality for different studies divided into sub-groups, based on the number of mean biopsies. Based on cumulative analysis, the identified studies were categorized into three groups on the basis of mean number of samples taken during biopsy: < 3 biopsy group (mean biopsies < 3), 3–6 biopsy group (3 ≤ mean biopsies ≤ 6) and > 6 biopsy group (mean biopsies > 6).
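The three-group stratification can be expressed as a small helper. This is only a sketch of the stated cut-offs; the function name `biopsy_group` and the string labels are illustrative choices, not part of the original analysis.

```python
def biopsy_group(mean_samples: float) -> str:
    """Assign a cohort to a stratum by its mean number of biopsy samples.

    Cut-offs follow the text: < 3, 3 to 6 (inclusive), and > 6.
    """
    if mean_samples < 3:
        return "<3"
    if mean_samples <= 6:
        return "3-6"
    return ">6"
```

Note that the boundaries are inclusive for the middle group, so a cohort with a mean of exactly 3 or exactly 6 samples falls into the 3–6 stratum.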

The effect size was reported in terms of odds ratios (ORs) with 95% confidence intervals (CIs). Heterogeneity across the studies was gauged using the Higgins inconsistency index (I2) and Cochran’s Q χ2 test [23]. I2 > 50% was considered high heterogeneity, 25–50% moderate heterogeneity, and < 25% was considered absence of heterogeneity [23]. The DerSimonian and Laird random effects model was used to pool the meta-analysis results [24].
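For readers unfamiliar with these quantities, the following is a minimal sketch of Cochran's Q, I2, and DerSimonian–Laird pooling applied to logit-transformed proportions. It is not the CMA implementation used in the study. The logit transform and the 0.5 continuity correction are assumptions, and the cohort counts are hypothetical values chosen only for illustration.

```python
import math


def logit_effect(events: float, n: float):
    """Return (logit proportion, variance) for one cohort.

    Assumption: a 0.5 continuity correction is applied when a cohort has
    zero events or zero non-events; the source does not state its correction.
    """
    if events == 0 or events == n:
        events, n = events + 0.5, n + 1.0
    non_events = n - events
    return math.log(events / non_events), 1.0 / events + 1.0 / non_events


def dersimonian_laird(effects, variances):
    """Pool effects with the DerSimonian-Laird random-effects estimator."""
    w = [1.0 / v for v in variances]          # inverse-variance weights
    sum_w = sum(w)
    fixed = sum(wi * yi for wi, yi in zip(w, effects)) / sum_w
    q = sum(wi * (yi - fixed) ** 2 for wi, yi in zip(w, effects))  # Cochran's Q
    df = len(effects) - 1
    c = sum_w - sum(wi ** 2 for wi in w) / sum_w
    tau2 = max(0.0, (q - df) / c) if c > 0 else 0.0   # between-study variance
    i2 = max(0.0, (q - df) / q) * 100.0 if q > 0 else 0.0
    w_re = [1.0 / (v + tau2) for v in variances]      # random-effects weights
    pooled = sum(wi * yi for wi, yi in zip(w_re, effects)) / sum(w_re)
    se = math.sqrt(1.0 / sum(w_re))
    return {"pooled": pooled, "se": se, "Q": q, "I2": i2, "tau2": tau2}


def inverse_logit(x: float) -> float:
    """Back-transform a logit value to a proportion."""
    return 1.0 / (1.0 + math.exp(-x))


# Hypothetical cohorts (events, sample size); NOT data from the included studies.
cohorts = [(4, 100), (9, 120), (2, 80)]
pairs = [logit_effect(e, n) for e, n in cohorts]
result = dersimonian_laird([y for y, _ in pairs], [v for _, v in pairs])
pooled_rate = inverse_logit(result["pooled"])
```

The `I2` value returned here is a percentage, so it can be compared directly against the 25% and 50% cut-offs described above.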

Publication bias was assessed using funnel plots, Egger’s regression intercept test, and Duval and Tweedie’s trim and fill test [25, 26]. The overall stability of our analysis was determined using a cumulative meta-analysis [27], performed after arranging the studies from largest to smallest with respect to sample size (and from most to least precise). Sensitivity analysis was performed by excluding one study at a time from the pooled analysis to determine the influence of individual studies [28]. Meta-regression analyses using the number of biopsies as the moderator variable were performed. Different threshold values were tested in terms of number of biopsies.
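The leave-one-out sensitivity analysis can be illustrated with the sketch below. For brevity it swaps in simple inverse-variance (fixed-effect) pooling rather than the random-effects model described above; any pooling function with the same signature could be passed in instead. The function names are hypothetical.

```python
def inverse_variance_pool(effects, variances):
    """Fixed-effect pooled estimate using inverse-variance weights."""
    weights = [1.0 / v for v in variances]
    return sum(w * y for w, y in zip(weights, effects)) / sum(weights)


def leave_one_out(effects, variances, pool=inverse_variance_pool):
    """Re-pool the data k times, omitting one study each time.

    Returns a list whose i-th entry is the pooled estimate without study i.
    """
    effects, variances = list(effects), list(variances)
    results = []
    for i in range(len(effects)):
        kept_effects = effects[:i] + effects[i + 1:]
        kept_variances = variances[:i] + variances[i + 1:]
        results.append(pool(kept_effects, kept_variances))
    return results
```

A study is usually flagged as influential when omitting it shifts the pooled estimate well outside the range produced by omitting any of the others.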

All analyses were performed using Comprehensive Meta-Analysis (CMA) software, version 3.3070 (Biostat, Englewood, New Jersey, USA). A P-value < 0.05 was considered statistically significant. An unpaired t-test was used for other statistical analyses.

Results

Study selection

A total of 5398 studies were identified from the PubMed database, and 6 articles were identified from the reference sections of other relevant studies. After excluding 482 studies that were not available in English, 4898 studies were manually screened by title and abstract. Of these, 3254 studies were not directly related to stereotactic intracranial biopsy, 574 were related to stereotactic radiosurgery, 298 did not involve human subjects, 123 were case reports, 50 were related to basic sciences, 35 were related to techniques in neurosurgery, and 25 were editorials, commentaries, or proceedings. The remaining 539 studies were evaluated in detail. Of these, 40 articles evaluated the diagnostic yield, morbidity, or mortality in stereotactic brain biopsy procedures. Data on the number of biopsy samples were available in only 18 articles. These studies were included in our meta-analysis (Fig. 1) [4,5,6, 12, 19, 29,30,31,32,33,34,35,36,37,38,39,40,41].

Fig. 1 Preferred Reporting Items for Systematic Reviews and Meta-Analysis (PRISMA) flow diagram of the search strategy and study selection with reasons for exclusion of studies
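The title-and-abstract screening step of the flow can be checked arithmetically. The short sketch below uses only the counts reported in the study selection paragraph; the variable names are illustrative.

```python
# Counts as reported for the title/abstract screening stage.
screened = 4898
excluded_at_screening = {
    "not directly related to stereotactic intracranial biopsy": 3254,
    "stereotactic radiosurgery": 574,
    "no human subjects": 298,
    "case reports": 123,
    "basic sciences": 50,
    "techniques in neurosurgery": 35,
    "editorials, commentaries, or proceedings": 25,
}

# Studies left for detailed evaluation after screening exclusions.
evaluated_in_detail = screened - sum(excluded_at_screening.values())
```

The result matches the 539 studies reported as evaluated in detail.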

Study characteristics

The eighteen studies identified were published from 2005 to 2019 and were either prospective observational studies or retrospective analyses of institutional experiences. Using the Newcastle–Ottawa scale [21], fifteen studies were categorized as high-quality, and three were graded as low-quality studies. In total, the 18 studies yielded results for 2416 patients. Demographics for each study are shown in Table 1. The mean age of patients in the study cohorts ranged from 40.5 to 82.8 years. The ratio of male to female patients ranged from 0.72:1 to 5:1. Mean size of biopsied lesions ranged from 3.2 to 5.1 cm (Table 2). The predominant pathology was glioblastoma.

Table 1 Demographics of included studies and cohorts
Table 2 Mean biopsy samples, diagnostic yield, morbidity, mortality and mean lesion size reported in the study cohorts

Some studies reported different diagnostic yield, morbidity, or mortality for groups within the same study who underwent differing types of biopsies. For instance, studies that compared frame-based and frameless biopsies reported distinct diagnostic yield, morbidity, and mortality for the frame-based and frameless cohorts. As such, we considered these cohorts as distinct entities in our meta-analysis. In total, 28 such cohorts were analyzed in our meta-analysis (Table 2).

Descriptive analysis

In an exploratory analysis, we plotted the diagnostic yield, morbidity, and mortality reported for each cohort as a function of the mean number of samples acquired during biopsy. We divided the studies into three groups of approximately equal size. While there does not appear to be an association between the mean number of samples acquired and diagnostic yield (Fig. 2a) or mortality (Fig. 2b), there appears to be a correlation between the mean number of samples acquired and biopsy-related morbidity (Fig. 2c).

Fig. 2 Scatter plot demonstrating diagnostic yield (a), mortality (b) and morbidity (c) in < 3, 3–6 and > 6 biopsy samples groups

Because the quality of the studies varies, this analysis should be considered preliminary. We next performed meta-analysis and regression studies that controlled for study effect sizes.

Cumulative meta-analyses, sensitivity analysis, and publication bias

We wished to determine whether the three-group stratification was justified. To this end, we first examined whether the data reported by studies within each group were sufficiently homogeneous. Within each group, cumulative meta-analyses and sensitivity analyses revealed no outlier studies whose data differed significantly from those of other studies (Supplementary Figure S1–S3). The funnel plot analysis showed gross symmetry in the reported diagnostic yield, morbidity, and mortality as a function of the number of samples acquired during biopsy (Supplementary Figure S4). The results from Egger’s regression intercept test are shown in Supplementary Table 1. These results suggest sufficient homogeneity in study results within each category (as defined by the mean number of samples taken during biopsy) for meta-analyses. As such, all studies were included in the subsequent meta-analyses. Of note, the homogeneity assessment described above was sufficient to qualify studies for inclusion in meta-analysis but does not imply the absence of heterogeneous results within each study group.

Morbidity

Data for post-biopsy morbidity, which included hemorrhage and neurological deficit, were available for 22 study cohorts. We first performed a cumulative regression analysis in which morbidity risk was calculated for studies taking up to two biopsy samples (≤ 2), three biopsy samples (≤ 3), four biopsy samples (≤ 4), and so on (Supplementary Table 2). In this meta-analysis, we did not assume the groupings that were used in our pilot analysis and treated the number of samples taken during biopsy as a continuous variable. We reasoned that if our grouping of studies into three strata was justified, we should observe a notable increase in cumulative morbidity risk at the respective threshold values of < 3 biopsies, 3–6 biopsies, and > 6 biopsies. Indeed, this pattern was observed. The cumulative morbidity risk for studies that secured ≤ 2 samples differed from that reported for studies involving 3–5 biopsy samples, which showed comparable morbidity risk among themselves. Of note, ≤ 2 samples was the lowest threshold value since there were no studies that secured only a single sample on all biopsies. Morbidity risks for studies that secured ≥ 6 samples were comparable to one another and differed from those for studies involving 3–5 biopsy samples. This pattern largely recapitulated that observed in our descriptive analysis presented in Fig. 2c, suggesting that the grouping into three strata was justified.

We therefore performed the subsequent analysis in a stratified manner, with study groups defined based on the mean number of biopsies (< 3, 3–6, > 6). For this analysis, acceptable within-group heterogeneity in morbidity results was noted: < 3 samples (I2 = 0.00, p = 0.79), 3–6 samples (I2 = 72.45, p < 0.001) and > 6 samples (I2 = 79.30, p < 0.001) (Supplementary Table 3). Pooled estimates of morbidity for these groups were 4.3%, 16.3% and 17% for < 3, 3–6 and > 6 biopsy samples, respectively (Fig. 3a). The point estimates differed significantly when comparing studies with means of 3–6 and > 6 biopsy samples to those with a mean of < 3 biopsy samples [RC (regression coefficient) = 1.54, p < 0.001 and RC = 1.57, p < 0.001, respectively] (Fig. 3b).

Fig. 3 a Forest plot showing pooled estimates for morbidity in < 3, 3–6 and > 6 biopsy samples groups; b Meta-regression analysis for morbidity with biopsy count as the moderator variable; c Forest plot showing pooled estimates for diagnostic yield in < 3, 3–6 and > 6 biopsy samples groups; d Meta-regression analysis for diagnostic yield with biopsy count as the moderator variable; e Forest plot showing pooled estimates for mortality in < 3, 3–6 and > 6 biopsy samples groups; f Meta-regression analysis for mortality with biopsy count as the moderator variable
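One way to read the morbidity regression coefficients is as differences on the logit (log-odds) scale relative to the < 3 biopsy reference group. The source does not state the scale of the RC values, so this is an assumption. The sketch below compares the logit differences of the reported pooled morbidity estimates (4.3%, 16.3%, 17%) with the reported coefficients.

```python
import math


def logit(p: float) -> float:
    """Log-odds of a proportion p, with 0 < p < 1."""
    return math.log(p / (1.0 - p))


# Pooled morbidity estimates reported for the three strata.
morbidity = {"<3": 0.043, "3-6": 0.163, ">6": 0.17}

# Logit differences versus the < 3 biopsy reference group.
diff_3_to_6 = logit(morbidity["3-6"]) - logit(morbidity["<3"])
diff_over_6 = logit(morbidity[">6"]) - logit(morbidity["<3"])

# Under the logit-scale assumption, exp(RC) is an odds ratio versus reference.
odds_ratio_3_to_6 = math.exp(1.54)
odds_ratio_over_6 = math.exp(1.57)
```

The logit differences come out at approximately 1.47 and 1.52, close to but not identical with the reported coefficients of 1.54 and 1.57. Some discrepancy is expected, since a meta-regression coefficient is estimated from cohort-level data rather than from the difference of two subgroup pooled estimates.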

Diagnostic yield

No significant heterogeneity in diagnostic yield was noted across the studies that carried out < 3 samples during biopsy (I2 = 0.00, p = 0.85) (Supplementary Table 3). Heterogeneity noted across studies for 3–6 and > 6 biopsy sample groups was I2 = 39.86, p = 0.08, and I2 = 83.83, p < 0.001, respectively (Supplementary Table 3). Pooled estimates of diagnostic yield for studies reporting a mean of < 3, 3–6, and > 6 samples taken during biopsy were 90.4%, 93.8% and 88.1%, respectively (Fig. 3c). These point estimates did not significantly differ, suggesting that securing > 3 samples during biopsy did not significantly increase diagnostic yield [< 3 biopsies (reference), 3–6 biopsies (RC = 0.42, p = 0.29), > 6 biopsies (RC = − 0.32, p = 0.44)] (Fig. 3d).

Mortality

Data for mortality associated with stereotactic brain biopsy procedure were available for 19 cohorts. Mortality ranged from 0 to 4.2% in our study cohorts. No significant heterogeneity in mortality was noted across the results in studies that took < 3 samples (I2 = 0.00, p = 0.81), 3–6 samples (I2 = 0.00, p = 0.64), and > 6 samples during biopsy (I2 = 0.00, p = 0.65) (Supplementary Table 3). Pooled estimates of mortality for studies reporting a mean of < 3, 3–6, and > 6 biopsy samples were 1.4%, 1.9%, and 3.4%, respectively (Fig. 3e). These point estimates did not significantly differ, suggesting that the number of samples taken during biopsy did not significantly increase the mortality risk [< 3 biopsy group (reference): 3–6 (RC = 0.25, p = 0.71), > 6 (RC = 0.88, p = 0.11)] (Fig. 3f). These meta-analysis pooled estimates are generally consistent with those reported in the biopsy literature [5, 6, 42].

Comparison of biopsy count between biopsy methods

We compared the number of mean biopsies between frame-based (FB) and frameless (FL) methods of stereotactic needle biopsy. Biopsy data was available for 11 frame-based, and 15 frameless biopsy cohorts (Table 2). No significant difference was found between the two groups in terms of the samples taken (p = 0.15) (Supplementary Figure S5), in diagnostic yield, morbidity, or mortality [22].

Sub-group analysis

Post-biopsy hemorrhage in all patients

Data for hemorrhage associated with stereotactic brain biopsy procedure were available for 22 cohorts. Pooled estimates of hemorrhage for cohorts reporting < 3, 3–6 and > 6 biopsy samples were 3%, 7.4% and 10.4%, respectively [< 3 biopsy group (reference): 3–6 (RC = 0.80, p = 0.28), > 6 (RC = 1.49, p = 0.04)] (Supplementary Figure S6).

Diagnostic yield and morbidity in patients with glioblastoma

Data for diagnostic yield for patients with glioblastoma were available in 7 cohorts. Diagnostic yield was 91.7%, 93.7% and 96.6% in patients who received < 3, 3–6 and > 6 biopsies, respectively. No significant difference was found in the diagnostic yield between the three groups [< 3 biopsy group (reference): 3–6 (RC = 0.30, p = 0.85), > 6 (RC = 0.96, p = 0.59)] (Supplementary Figure S7). Data pertinent to morbidity in patients with glioblastoma were only available for 3 cohorts in the 3–6 biopsy group and 2 cohorts in the > 6 biopsy group. Pooled estimates of morbidity for cohorts reporting 3–6 and > 6 biopsy samples were 9.6% and 52.6%, respectively [3–6 biopsy group (reference): > 6 (RC = 2.34, p = 0.01)] (Supplementary Figure S7).

Discussion

A priority in modern medicine involves reduction of variation in clinical practice in order to optimize standardized care and outcomes. Such reduction is difficult to implement in surgical disciplines, where many practices are propagated as a matter of teaching or tradition. The number of samples taken during stereotactic needle biopsy is a case in point. In our curation of the available literature, the average number of samples taken during stereotactic needle biopsy ranged nearly ten-fold between studies (Table 2). While this variation in practice was not associated with differences in diagnostic yield, an increased number of samples taken during a needle biopsy was accompanied by elevated morbidity risk. Meta-analysis of this data set of ~ 2400 patients revealed that the relationship between morbidity risk and the number of samples taken was non-linear. Both continuous and non-continuous meta-analyses revealed a threshold in terms of the number of samples taken, below which the morbidity risk is minimal; the morbidity risk is significantly elevated beyond this threshold.

Meaningful interpretations of the results presented here require thoughtful consideration given variations in surgical practice, patient selection, and tumor type. There are considerable divergences in the practice of stereotactic needle biopsies, including the stereotactic system adopted [36, 43, 44], type of biopsy needles [18, 45], method of sampling [46, 47], and pressure applied for suction [46]. Similarly, surgeons differ in criteria for patient selection in terms of pre-operative condition as well as in experience with stereotactic needle biopsy [48, 49]. Additionally, tumors differ in intrinsic vascularity and anatomic locations. Undoubtedly, all of these factors influence postoperative morbidity. As such, our conclusion should not be taken in absolute or universal terms. It is not the case that > 3 samples taken in biopsy will be associated with a significant increase in morbidity risk for all biopsies by all surgeons. Instead, our study suggests that when the needle biopsy literature is analyzed in aggregate, there is a non-linear relationship between the number of samples taken and the risk of postoperative morbidity. The exact thresholds and incremental risk will necessarily depend on the specific clinical context.

Irrespective of the type of biopsy needle/forceps [18, 45], each additional biopsy samples tissue farther away from the needle or another region of the abnormal tissue. In this context, the threshold morbidity risk model suggests a minimal distance between the biopsy site and regions of tumor vascularity or the eloquent cortex, beyond which morbidity risk is escalated. The non-linearity in incremental risk likely reflects regional histologic heterogeneity. While modeling of these risks is difficult on a case-by-case basis, probabilistic modeling can be performed on aggregate datasets, such as the one generated here, using models developed to study rare collision events [50, 51]. Such modeling can inform surgical decisions in stereotactic needle biopsies when applied in the context of the MR imaging features that proxy histologic features in the tumor microenvironment [52, 53]. For instance, knowledge of the density of microvasculature and neoplastic cells in a contrast enhancing region can inform the optimal number of biopsy samples in a procedure given estimates of hemorrhagic risk in these regions.

Although not statistically significant, there is a trend toward decreased diagnostic yield with an increasing number of samples taken during biopsy in our analyses. This finding is somewhat counterintuitive in light of reports demonstrating that an increased number of samples taken during a biopsy improves sampling of the abnormal tissue, thereby increasing diagnostic yield [17]. We believe that the findings presented by these studies are robust and sound. Plausible interpretations of our finding in this context include: (1) an increased number of samples may reflect intra-operative decisions to secure additional samples in response to non-diagnostic samples, which can result from challenging pathologies [54, 55] or from sub-optimal biopsy trajectories/sites [19, 56]; or (2) the finding may reflect institutional variation in pathology expertise in the interpretation of biopsy samples. Needle biopsy samples are typically limited. As such, neuropathology expertise influences the likelihood of a definitive diagnosis [3].

This study represents the first of its kind to assess morbidity risk as a function of the number of samples taken during stereotactic needle biopsies. Given the state of the current literature in this arena and the low morbidity of the procedure, we feel that meta-analysis of the aggregate literature is necessary to provide sufficient statistical power to address the question. However, our study design bears a number of intrinsic limitations. First, by nature, this meta-analysis incorporates studies involving surgeons with distinct preferences, differing institutional practice patterns, and varied reporting biases. The lesions biopsied in the various studies also span a wide anatomic distribution in the cerebrum. Despite these sources of heterogeneity, cumulative meta-analyses and sensitivity analysis demonstrated sufficient homogeneity for this meta-analysis. Second, the mean diameter of the lesions biopsied in the various studies collated here is uniformly > 3 cm. While larger lesions with increased vascularity may increase the risk of hemorrhage, there is a greater likelihood of “missing” the lesion or of limited sampling for smaller lesions. Thus, whether the results proposed here are applicable to biopsies of smaller lesions remains an open question. Finally, prospective validation of the results of this meta-analysis is warranted. Design of such studies should take into consideration technologies that aid in the safety of needle biopsies [56, 57], tools that augment diagnostic yield [35, 58, 59], and algorithms for optimal target selection [60].

Conclusions

Meta-analysis of the available literature on stereotactic needle biopsy revealed a morbidity model where risk is non-linearly associated with the number of samples taken, with significant risk escalation beyond a threshold in the number of samples taken.