Introduction

Breast-conserving therapy (BCT) is an appropriate treatment for invasive and in situ breast carcinomas, with survival outcomes at least equivalent to those of mastectomy [1,2,3,4,5]. The most important surgical challenge of BCT is obtaining clear margins while maintaining an optimal cosmetic result. Leaving involved margins doubles the incidence of local recurrence [6] and may therefore affect mortality [7]. If not recognized during the initial operation, involved margins necessitate reoperation. Multiple tools have emerged over the past decades to assess the margin status of the specimen intra-operatively. Still, recent studies have shown that approximately 20% of patients undergo at least one reoperation to obtain clear margins [8,9,10]. In patients with in situ components, this percentage may be as high as 30% [8, 11].

Specimen radiography (SR) is a widely used intra-operative imaging tool to verify that the lesion is present in the specimen and to provide information about margin involvement. If the tumor appears close to the specimen edge, surgeons can excise additional breast tissue during the same operation, thereby attempting to convert an initially positive margin into a final negative margin. Several studies have shown SR to be of value in reducing the positive margin rate [12, 13].

However, a recent meta-analysis showed that SR is inferior to other intra-operative margin assessment tools (i.e., cytology and frozen section) in terms of diagnostic efficacy, with a pooled sensitivity and specificity of 0.53 and 0.84, respectively [14]. That meta-analysis did not distinguish between histological subgroups, which matters because the extent of ductal carcinoma in situ (DCIS) tends to be underestimated on mammography [15,16,17]. In patients with unsuccessful BCT, the histological DCIS size is usually substantially larger than the radiological size [17,18,19]. While some authors advocate the use of SR for DCIS [20, 21], others question its reliability [16].

The aim of the present study was to review the literature on whether SR is a reliable method for determining intra-operative margin status in DCIS and in invasive cancers with a DCIS component. To this end, we performed a systematic review of the literature on specimen radiography and DCIS.

Methods

This systematic review was reported in accordance with the Preferred Reporting Items for Systematic Reviews and Meta-Analyses (PRISMA) guidelines [22].

Article search

Articles were identified through an electronic search of PubMed, EMBASE, and the Cochrane Library up to April 1, 2017. To retrieve all articles related to specimen radiography in breast cancer, the search strategy was developed in consultation with a medical librarian. The final search string consisted of the following terms: ("Mastectomy, Segmental"[Mesh], Segmental Mastectom*, partial mastectom*, lumpectom*, breast surg*, breast conserving Surger*, breast conserving therap*, Breast carcinom* OR breast cancer surgery) AND (specimen mammogra* OR specimen radiogra*). No additional filters were used. Further articles were identified by searching the reference lists of relevant studies.

Selection of studies

The literature search was conducted by one investigator (DV). Titles and abstracts were screened by the same author against preset inclusion criteria and checked by a second author (LS). If no final decision could be made from the title and abstract, the full text was retrieved.

Eligibility criteria

We included studies that met the following criteria: (I) adults with breast cancer; (II) assessment of the diagnostic accuracy of specimen radiography as an intra-operative margin assessment tool; (III) sensitivity and specificity data, with final histopathology as the reference standard; reporting of true positive (TP), false positive (FP), true negative (TN), and false negative (FN) counts, positive predictive values (PPV), and negative predictive values (NPV) was not required, as these were calculated from raw data where possible; (IV) a subgroup analysis of either pure DCIS or invasive cancer with a DCIS component; and (V) full text available in English, French, German, or Dutch.
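For readers less familiar with how the outcome measures in criterion (III) relate to a 2 × 2 table, the following Python sketch computes them from TP, FP, TN, and FN counts, with accuracy taken as (TP + TN)/total specimens. The class name `TwoByTwo` and the example counts are hypothetical and are not taken from any included study; returning NaN for an empty denominator is a choice of this sketch.

```python
from dataclasses import dataclass


def _ratio(numerator: int, denominator: int) -> float:
    # Return NaN instead of raising ZeroDivisionError for an empty denominator
    return numerator / denominator if denominator else float("nan")


@dataclass(frozen=True)
class TwoByTwo:
    """SR result versus histopathological margin status (hypothetical helper)."""
    tp: int  # SR positive, pathological margin positive
    fp: int  # SR positive, pathological margin negative
    tn: int  # SR negative, pathological margin negative
    fn: int  # SR negative, pathological margin positive

    def sensitivity(self) -> float:
        return _ratio(self.tp, self.tp + self.fn)

    def specificity(self) -> float:
        return _ratio(self.tn, self.tn + self.fp)

    def ppv(self) -> float:
        return _ratio(self.tp, self.tp + self.fp)

    def npv(self) -> float:
        return _ratio(self.tn, self.tn + self.fn)

    def accuracy(self) -> float:
        # (TP + TN) / total number of specimens
        return _ratio(self.tp + self.tn, self.tp + self.fp + self.tn + self.fn)


# Hypothetical counts for illustration only
table = TwoByTwo(tp=20, fp=10, tn=60, fn=10)
print(round(table.sensitivity(), 3), round(table.specificity(), 3), table.accuracy())
# prints 0.667 0.857 0.8
```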

Because the imaging quality of digital mammography is superior to that of screen-film mammography, especially for the detection of DCIS, we excluded articles older than 15 years, as these were more likely to have used screen-film technology. We also excluded studies that used sliced SR, as well as reviews, meta-analyses, case reports, conference abstracts, and animal studies.

Data extraction

Two reviewers (DV and LK) independently collected data from the selected studies using a standardized form. Reviewers were not blinded to the authors or publication source of the studies. The following data were collected: author; year of publication; country; study design; type of specimen radiography used; number of samples; number of DCIS-associated samples; mean age; mean tumor size; diagnostic test accuracy data, including TP, FP, TN, FN, sensitivity, specificity, PPV, NPV, and diagnostic accuracy; positive margin rate; intra-operative re-excision rate; and reoperation rate. Missing diagnostic accuracy data were calculated from the raw data where possible, with diagnostic accuracy defined as (TP + TN)/total number of specimens. Authors were contacted when clarification was required to interpret the results correctly. When multiple radiological margins were given, we used the threshold for which TP, FP, TN, and FN could be determined; otherwise, we used the radiological margin closest to 10 mm. When multiple histological margin thresholds were given, the smallest margin was used.

We distinguished between the initial and the final specimen. The initial specimen is defined as the specimen without any additional tissue taken during the first operation. By contrast, the final specimen includes any additional tissue resected during the same operation, either because of a positive SR or as a systematic procedure. Likewise, we used two definitions of positive margin rate (PMR): initial and final. Initial-PMR is the percentage of specimens that were positive before any additional tissue was taken. Final-PMR is the percentage of specimens that were positive after the first surgery, including any intra-operative re-excisions.
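The distinction between initial-PMR and final-PMR can be made concrete with a short Python sketch that computes both rates from per-specimen records. The `Specimen` record and its field names are hypothetical; the sketch assumes that margin status is available for each specimen both before and after intra-operative re-excision.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Specimen:
    """Hypothetical per-specimen record; field names are illustrative."""
    initial_positive: bool  # margin of the initial specimen, before any additional tissue
    final_positive: bool    # margin after first surgery, incl. intra-operative re-excisions


def positive_margin_rate(flags: list[bool]) -> float:
    # Percentage of specimens with a positive margin
    return 100.0 * sum(flags) / len(flags)


def initial_pmr(specimens: list[Specimen]) -> float:
    return positive_margin_rate([s.initial_positive for s in specimens])


def final_pmr(specimens: list[Specimen]) -> float:
    return positive_margin_rate([s.final_positive for s in specimens])


# Four hypothetical specimens; one initially positive margin is cleared by re-excision
cohort = [
    Specimen(initial_positive=True, final_positive=False),
    Specimen(initial_positive=True, final_positive=True),
    Specimen(initial_positive=False, final_positive=False),
    Specimen(initial_positive=False, final_positive=False),
]
print(initial_pmr(cohort), final_pmr(cohort))  # 50.0 25.0
```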

Quality assessment

The methodological quality of the included articles was assessed using the Quality Assessment of Diagnostic Accuracy Studies 2 (QUADAS-2) checklist [23]. This checklist is designed to evaluate the risk of bias and applicability of primary diagnostic accuracy studies. It consists of four key domains (patient selection, index test, reference standard, and flow and timing), each of which is rated as having a low, high, or unclear risk of bias. Signaling questions help determine the risk of bias. All signaling questions were considered appropriate for assessing study quality in the present review. We added one signaling question to the reference standard domain: did the study differentiate between the initial and the final specimen? We chose to add this question because some studies did not clearly report this distinction. Our reasoning was that if SR is positive during the operation, additional tissue is taken, which may convert a positive margin of the initial specimen into a negative margin at final pathology. In other words, a true positive would be wrongly classified as a false positive, adversely affecting diagnostic accuracy. One author (DV) performed this assessment.
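The reasoning behind the added signaling question can be illustrated with a small Python sketch that scores the same SR results against two reference standards. The `Case` records are hypothetical; the first case represents a positive SR that prompted a re-excision which cleared an involved margin. Scored against the initial specimen it counts as a true positive, but scored against the final specimen it becomes a false positive.

```python
from collections import Counter
from typing import NamedTuple


class Case(NamedTuple):
    """Hypothetical specimen: SR result and margin status before/after re-excision."""
    sr_positive: bool
    initial_positive: bool  # pathology of the initial specimen
    final_positive: bool    # pathology of the final specimen


def classify(sr_positive: bool, margin_positive: bool) -> str:
    # 2 x 2 cell for one specimen against a given reference margin
    if sr_positive:
        return "TP" if margin_positive else "FP"
    return "FN" if margin_positive else "TN"


def tally(cases: list[Case], reference: str) -> Counter:
    # reference is the Case field used as the reference standard
    return Counter(classify(c.sr_positive, getattr(c, reference)) for c in cases)


cases = [
    Case(True, True, False),    # SR-prompted re-excision cleared the margin
    Case(True, True, True),     # re-excision did not clear the margin
    Case(True, False, False),   # SR positive, margin truly negative
    Case(False, True, True),    # SR missed an involved margin
    Case(False, False, False),  # SR and pathology both negative
]

print(tally(cases, "initial_positive"))  # counts: TP=2, FP=1, FN=1, TN=1
print(tally(cases, "final_positive"))    # counts: TP=1, FP=2, FN=1, TN=1
```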

After consultation with a medical statistician, we considered meta-analysis not feasible because of clinical diversity and differences in index test and reference standard measurements.

Results

Finding and selecting studies

A total of 232 studies were found through the electronic search and an additional 3 articles through cross-referencing (Fig. 1). After removing duplicate publications, 157 unique articles remained, of which 120 failed to meet the inclusion criteria based on title and abstract alone. A total of 37 articles were assessed for eligibility by review of the full text. Of these, 28 articles were excluded because (I) no DCIS sub-analysis was possible from the raw data (n = 26); (II) intra-operative margin assessment tools other than SR were used during the same operation (n = 5); (III) the article did not meet the publication date limit (n = 4); or (IV) SR was not used as a margin assessment tool (n = 2). Studies could be excluded on more than one criterion. The nine remaining studies reported sensitivity and specificity data and were included in the qualitative review (Fig. 1).
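The study-selection counts above can be checked with a few lines of Python. The sketch only re-uses the numbers reported in this section; it shows that 37 full-text articles minus 28 exclusions leaves nine studies, and that the exclusion reasons sum to more than 28 because a study could be excluded on more than one criterion.

```python
# Counts as reported in the study-selection text
identified = 232 + 3                 # electronic search + cross-referencing
unique_after_deduplication = 157
excluded_on_title_abstract = 120
excluded_after_full_text = 28

# Reasons are not mutually exclusive, so their sum may exceed the number excluded
exclusion_reasons = {
    "no DCIS sub-analysis possible": 26,
    "other intra-operative margin tools used": 5,
    "outside publication date limit": 4,
    "SR not used for margin assessment": 2,
}

duplicates_removed = identified - unique_after_deduplication
full_text_assessed = unique_after_deduplication - excluded_on_title_abstract
included = full_text_assessed - excluded_after_full_text

print(duplicates_removed, full_text_assessed, included)  # 78 37 9
print(sum(exclusion_reasons.values()) > excluded_after_full_text)  # True
```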

Fig. 1

PRISMA flow chart of included studies

Main results

The study characteristics are shown in Table 1. All studies included consecutive patients over periods ranging from 1 to 15 years. Inclusion criteria varied between studies, as shown in Table 1. A total of 1141 specimens were reported, including specimens without DCIS components. One study [16] did not report the sample size from which sensitivity and specificity for specimens with DCIS components were calculated, so the number of specimens in its DCIS sub-analysis could not be determined. After exclusion of this study, the total sample size with DCIS components was 881. Sample size ranged from 22 to 266 in the whole population and from 22 to 164 when only DCIS-associated lesions were considered. Mean age ranged from 52 to 59 years. Mean tumor size on final histopathology ranged from 10.2 to 24.2 mm. The index test was assessed intra-operatively by a radiologist in three studies, whereas in five studies radiologists measured radiological distances retrospectively. In one study, it was unclear whether the specimen was assessed intra-operatively or retrospectively. Five studies distinguished between initial and final specimen histology, whereas four did not or this remained unclear from the full text.

Table 1 Study characteristics of included studies

Nine studies reported sensitivity and specificity or data from which these values could be calculated (Table 2). Sensitivity ranged from 22 to 77% and specificity from 52 to 100%. The positive predictive value (PPV) and negative predictive value (NPV) were available for seven studies; PPV ranged from 53 to 100% and NPV from 32 to 95%. Only five studies presented diagnostic data as a 2 × 2 table or provided data from which one could be constructed. Hence, accuracy was available only for these studies and ranged from 55 to 95%. The final-PMR of DCIS-associated lesions was documented in six studies and ranged from 28 to 63%. In studies that also documented lesions without DCIS components, the final-PMR was higher when only lesions with DCIS components were considered, except in one study [20].

Table 2 Diagnostic accuracy and secondary outcome data of specimens with either pure DCIS or invasive cancer with DCIS components

Methodological quality assessment

The risk of bias and applicability assessment using the QUADAS-2 checklist [23] for each individual study is shown in Table 3. A high risk of bias for patient selection was found in four of the nine included studies, mainly because of inappropriate exclusion of specimens. In one study, the risk of bias in patient selection was increased because SR was used less often when the preoperative biopsy showed an invasive lesion; frozen section was used for these patients instead [24]. Applicability of patient selection was considered adequate in seven of nine studies (78%). We found a high risk of bias and high concerns regarding applicability for the index test in six of nine studies, mainly because the index test was assessed retrospectively (i.e., radiologists or surgeons reviewed the specimen radiographs afterwards and evaluated margin involvement based on a preset threshold or measured margin width). We also noted a high risk of bias in the reference standard domain. As stated above, we considered it important that studies differentiated between initial and final specimen histopathology, as this could greatly influence outcomes. Accordingly, we rated the risk of bias for the reference standard as high in four studies. The flow and timing domain was considered inadequate in three of nine studies: in two studies, it was unclear how diagnostic values were calculated [16, 25], and in one study not all specimens were included in the analysis [20].

Table 3 Risk of bias and concerns regarding applicability assessed using the QUADAS-2 checklist

Discussion

This systematic review was conducted to assess the performance characteristics of specimen radiography in specimens with pure DCIS or invasive cancer with DCIS components. The results suggest that SR may be an unreliable tool for margin assessment in DCIS and in invasive carcinoma with DCIS components. Sensitivity (i.e., the probability that SR is positive when the pathological margin is positive) varied widely, from 22 to 77%, but was generally low. Specificity (i.e., the probability that SR is negative when the pathological margin is negative) was generally moderate, ranging from 52 to 100%, with most studies reporting values of around 75%.

This low diagnostic performance can partly be explained by the limited intrinsic accuracy of SR in general. In a systematic review and meta-analysis, St John et al. found that SR was substantially inferior to intra-operative cytology and frozen section in terms of diagnostic accuracy, with a pooled area under the receiver operating characteristic curve (AUROC) of 0.73 for SR versus 0.98 and 0.96 for cytology and frozen section, respectively [14]. Nevertheless, frozen section has its limitations. Freezing artifacts, folding of the specimen, and air bubbles can jeopardize adequate interpretation of the slides [26, 27]. Furthermore, transporting, preparing, and analyzing the specimen takes time, leading to longer operation times [28]. As no meta-analysis was feasible in the present study, we could not directly compare our results with theirs.

A possible flaw of SR is erroneous interpretation of orientation and specimen handling. In only 48–56% of specimens did the direction of the shortest distance measured on SR correspond to the direction of the shortest distance in the final pathology report [16, 29]. Slight movement of the specimen between excision and final pathology can change the apparent orientation of the involved margins; for example, an involved lateral margin may be incorrectly identified as an involved cranial margin. This finding also calls into question the adequacy of pathology as the reference standard. Future investigations can minimize this bias by ensuring correct orientation, for instance through immediate intra-operative inking or additional marking of the specimen.

When estimated from microcalcifications on mammography, the size of DCIS tends to be underestimated [15, 17,18,19], leading to inadequate excisions. Re-excision rates for DCIS-associated specimens are higher than those for all other breast cancers combined [8, 11]. We also found that studies documenting both the total PMR and the PMR of DCIS-associated specimens reported a higher PMR in the latter group [24, 25, 29, 30], except for one study [20]. The presence of DCIS is a known risk factor for involved resection margins [10, 31,32,33,34]. In specimens with both invasive and DCIS components, 78–96% of involved margins were due to DCIS [29, 34]. The radiological extent of DCIS is mainly based on architectural distortion and/or microcalcifications, but these mammographic characteristics do not correlate well with the actual DCIS size.

These observations call into question the adequacy of current preoperative imaging protocols, especially for DCIS-associated lesions. Of the reviewed series, only one used preoperative magnetic resonance imaging (MRI) combined with mammography and, notably, it had the best performance characteristics [20]. Recent studies on preoperative MRI in DCIS consistently show a more accurate estimation of DCIS extent than mammography or ultrasound alone [35,36,37,38,39,40], although a meta-analysis showed no improvement in surgical outcomes for patients with DCIS [41]. Furthermore, routine preoperative MRI for all lesions has not been shown to reduce PMR or reoperation rates in several randomized controlled trials and observational studies [42,43,44]. A meta-analysis even showed increased mastectomy rates in patients with preoperative MRI, with no reduction in incomplete resections or reoperation rates after initial breast-conserving resection [45]. Reoperation rates following preoperative MRI may even be paradoxically higher than in a control group [43]. Therefore, routine preoperative MRI for all histological types does not appear beneficial. For selected cases, such as DCIS, it would be worthwhile to compare the value of preoperative MRI with that of intra-operative SR.

There is no consensus on the tumor-to-margin distance (in mm) on specimen radiography at which excision of additional tissue should be recommended. A greater threshold leads to fewer missed positive margins but increases unnecessary resection of healthy tissue. Some studies noted increased sensitivity when a greater radiological margin was used, at the cost of decreased specificity [16, 21, 46]. Efforts have been made to define the optimal threshold, and radiological margin widths of 4–11 mm have been proposed [18, 29]. Using receiver operating characteristic (ROC) curve analysis, one study found that a 15-mm margin width gave the optimal combination of sensitivity and specificity [21]. An optimal radiological threshold for DCIS-associated specimens has yet to be determined.
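The trade-off between sensitivity and specificity as the radiological threshold changes can be sketched in Python. Here SR is called positive when the measured tumor-to-edge distance is below the threshold, so a greater threshold flags more specimens. The distances and pathology results are hypothetical, and the selection criterion, Youden's J (sensitivity + specificity - 1), is one common way to pick a point on an ROC curve; it is an assumption of this sketch, not necessarily the method used in [21].

```python
def sens_spec(distances_mm: list[float], pathology_positive: list[bool],
              threshold_mm: float) -> tuple[float, float]:
    """Sensitivity and specificity of SR at one radiological threshold."""
    tp = fn = tn = fp = 0
    for distance, positive in zip(distances_mm, pathology_positive):
        sr_positive = distance < threshold_mm  # tumor closer to the edge than the threshold
        if positive and sr_positive:
            tp += 1
        elif positive:
            fn += 1
        elif sr_positive:
            fp += 1
        else:
            tn += 1
    return tp / (tp + fn), tn / (tn + fp)


def youden_best_threshold(distances_mm: list[float], pathology_positive: list[bool],
                          candidates_mm: list[float]) -> float:
    """Candidate threshold maximizing Youden's J = sensitivity + specificity - 1."""
    def youden(threshold_mm: float) -> float:
        sens, spec = sens_spec(distances_mm, pathology_positive, threshold_mm)
        return sens + spec - 1
    return max(candidates_mm, key=youden)


# Hypothetical data: closest tumor-to-edge distance on SR (mm) and final margin status
distances = [1, 2, 3, 5, 7, 6, 9, 12, 14, 20]
positive = [True] * 5 + [False] * 5
candidates = [2, 4, 8, 10, 15]

for t in candidates:
    print(t, sens_spec(distances, positive, t))
print("best:", youden_best_threshold(distances, positive, candidates))  # best: 8
```

In this toy data, sensitivity rises from 0.2 to 1.0 and specificity falls from 1.0 to 0.2 as the threshold grows, mirroring the trade-off described above.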

As stated above, the outcome data were highly diverse. The lowest sensitivity was 22% [16]; however, it remains unclear how many specimens were included in this analysis. Hisada and colleagues [20] reported contrasting results in their analysis of 22 specimens with DCIS components, with a PPV and NPV of 100% and 95%, respectively. Margin involvement was incorrectly identified in only one of 22 retrospectively assessed specimen radiographs. However, they excluded cases with re-excisions based on a positive intra-operative SR, thereby introducing bias.

The heterogeneity of inclusion and exclusion criteria partly explains the diversity in outcomes (Table 1). Even within the DCIS subgroup, the study populations were diverse: for instance, one study [16] explicitly excluded pure DCIS specimens, whereas another [21] included only pure DCIS lesions. Another explanation for the diversity is the difference in how the index test was performed: in five studies it was assessed retrospectively, in three intra-operatively, and in one this was unclear.

The overall methodological quality was poor, with a high risk of bias in multiple studies. Publications did not consistently differentiate between initial and final specimen margin involvement. A positive intra-operative SR will likely lead to immediate excision of additional tissue, potentially converting a positive initial margin into a negative final margin. When final specimen pathology is used as the reference standard, an unknown number of true positives are therefore erroneously classified as false positives. In six studies, this was either the case or unclear. This major flaw may contribute to the low diagnostic accuracy reported in the literature.

Ultimately, the main purpose of using SR as a margin assessment tool is to lower reoperation rates while optimizing cosmetic outcomes. Diagnostic accuracy in terms of sensitivity and specificity is hard to translate into daily practice. A more informative measure of the effectiveness of SR is its ability to convert positive into negative margins: how many SR procedures are required to prevent one re-excision? Only a few studies have reported this conversion rate. We believe this ‘number needed to treat’ (NNT) is a more valuable parameter for determining the usability of a margin assessment tool. Its counterpart is the ‘number needed to harm,’ which reflects how often a false-positive SR leads to excision of healthy tissue or a false-negative SR leads to a postponed re-excision.
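Expressed as simple ratios, these two measures can be sketched in Python. The sketch assumes, as its own working definition, that NNT is the number of SR procedures divided by the number of initially positive margins converted to negative by SR-guided re-excision, and that the number needed to harm is the number of SR procedures divided by the number of harmful events (here, unnecessary excisions after a false-positive SR). The function name `number_needed` and all counts are hypothetical.

```python
import math


def number_needed(procedures: int, events: int) -> float:
    """SR procedures per event; infinite when the event never occurred."""
    if procedures <= 0 or events < 0:
        raise ValueError("procedures must be positive and events non-negative")
    return math.inf if events == 0 else procedures / events


# Hypothetical cohort, not data from any included study
sr_procedures = 120
converted_margins = 8        # initially positive margins made negative by SR-guided re-excision
unnecessary_excisions = 30   # false-positive SR leading to excision of healthy tissue

nnt = number_needed(sr_procedures, converted_margins)
nnh = number_needed(sr_procedures, unnecessary_excisions)
print(nnt, nnh)  # 15.0 4.0
```

Returning infinity when no margins are converted describes a series in which SR-guided re-excision never changed the final margin status; a real analysis would also report confidence intervals, which this sketch omits.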

To assess the NNT in future investigations, a clear distinction between the initial and the final specimen is mandatory. For each individual specimen, margin status must be reported before and after direct re-excision; only then can the true value of SR be assessed. Prospective studies should focus on determining the number of SR procedures needed to convert one initially positive margin into a final negative margin. We believe that the NNT offers a more practical measure and opens opportunities for cost–benefit analysis. Additionally, the role of preoperative MRI remains unclear, and further research into its value is warranted. Finally, efforts must be made to minimize inaccurate interpretation of SR due to incorrect orientation or problems with specimen handling. Immediate inking or marking of the specimen during the operation may prevent these issues.

Differentiating between the initial and the final specimen also addresses the issue that a re-excision prompted by a true-positive SR may fail to convert the margin to negative. For instance, Rua et al. noted 16 intra-operative re-excisions based on a positive SR in a series of 62 DCIS-associated specimens, none of which converted a positive margin into a negative margin [25]. It is unclear what proportion of these patients had positive margins in the initial specimen; although none of them reportedly benefited from SR, it is likely that some did.

Our review has certain limitations. First, we noted a wide range in outcomes and methodological designs, and found a high risk of bias in a substantial proportion of the included studies. We therefore considered pooling of data not feasible, and our findings should be interpreted with caution. Future prospective studies are needed that address the flaws of the current studies described in this review. Second, we observed different definitions of pathologically free margins, reflecting differences in current guidelines across the world. For future research to be widely applicable, a uniform definition of a pathologically free margin needs to be established. Third, we evaluated the reliability of SR only in DCIS-associated specimens; the findings are thus not representative of all breast lesions.

Conclusion

The present results do not support the routine use of intra-operative specimen radiography to reduce the rate of positive margins in patients undergoing breast-conserving surgery for pure DCIS or invasive cancer with a DCIS component. We recommend that future prospective studies discriminate between margin status before and after re-excision and focus on the number needed to treat rather than on diagnostic accuracy. This should yield more clinically applicable results and provide opportunities for cost–benefit analysis.