Intraductal papillary mucinous neoplasms (IPMNs) of the pancreas represent the most common radiographically identifiable precursor lesions for pancreatic adenocarcinoma.1 Compared with main duct type IPMNs (MD-IPMNs), the clinical management of branch duct type IPMNs (BD-IPMNs) with or without main pancreatic duct (MPD) dilation remains controversial.2,3,4 Therapeutic strategies for BD-IPMN depend mainly on the suspicion of high-grade dysplasia (HGD)/malignancy emerging from preoperative assessment. Therefore, accurate prediction of the risk for HGD/malignancy is important for patients with suspicious BD-IPMN, which would assist in the therapeutic decision of resection for BD-IPMN harboring HGD/malignancy or in the surveillance of patients who have a low risk of HGD/malignancy to avoid potentially morbid and life-threatening surgery.

Radiologic imaging evaluation of HGD/malignancy in patients affected by BD-IPMN has been investigated in many studies and used in clinical guidelines. The International Association of Pancreatology (IAP) published the first International Consensus Guidelines (widely known as the Sendai guidelines) for the evaluation and management of IPMN in 2006 and recommended resection only for BD-IPMN with the following features5: symptomatic cysts, asymptomatic cysts 3 cm or larger, MPD dilation of 6 mm or more, or presence of a mural nodule (MN).

In 2012, the IAP updated these criteria, referred to as the Fukuoka guidelines,6 and established the following new classification of features: high-risk stigmata (HRS) and worrisome features (WFs) based on potential clinical and radiologic predictors of HGD/malignancy. In brief, compared with the Sendai guidelines, the Fukuoka guidelines introduced more detailed radiologic characteristics. Factors associated with HRS (with surgical resection recommended) due to the high risk of HGD/malignancy included obstructive jaundice, an enhancing solid component, and an MPD dilation of 10 mm or more. The WFs (with further assessment by endoscopic ultrasonography [EUS] and/or cytology recommended) included a cyst size of 3 cm or larger, thickened/enhancing cyst walls, an MPD diameter of 5 to 9 mm, non-enhancing MN, and abrupt changes in MPD caliber with distal pancreatic atrophy. The latest update, published in 2017, made only minor revisions and put particular emphasis on the size of enhancing MN for predicting HGD/malignancy, while adding lymphadenopathy and cyst growth rate as WFs.2

Other guidelines for asymptomatic pancreatic neoplastic cysts published in 2015 by the American Gastroenterological Association (AGA)3 suggested that pancreatic cysts with at least two high-risk features (size ≥3 cm, MPD dilation, and solid component) should be examined by endoscopic ultrasound-fine needle aspiration (EUS-FNA) and recommended resection for positive cytology and/or the presence of a solid component and MPD dilation.

The European evidence-based guidelines in 2018 established “absolute indications” and “relative indications” for surgery4 that correspond to the concepts of HRS and WFs, respectively, established by the IAP. Compared with the imaging features in the latest Fukuoka guidelines, the European guidelines added solid mass to the absolute indications, while including only a cyst size of 4 cm or larger, an enhancing MN smaller than 5 mm, and a cyst growth rate of 5 mm or more per year as relative indications.

The role of imaging features in predicting HGD/malignancy of BD-IPMN requires constant re-evaluation. For example, cyst size remains controversial in the current literature. The Sendai guidelines recommended surgery when a cyst is 3 cm or larger. However, this threshold was subsequently questioned by further studies, and the revised Fukuoka, AGA, and European guidelines relaxed the cyst size threshold for resection.

Currently, only the European Study Group guidelines are evidence-based, but most studies on which the guidelines were based provided a low level of evidence. Three previous meta-analyses7,8,9 evaluated imaging findings suggestive of HGD/malignancy in BD-IPMN, but included limited imaging features. In this study, we performed a comprehensive and methodologically rigorous systematic review and meta-analysis to determine the imaging features predicting HGD/malignancy in BD-IPMN, including mixed type, and their diagnostic accuracy.

Methods

Search Strategy

This systematic review was performed in accordance with the Preferred Reporting Items for Systematic Reviews and Meta-Analyses (PRISMA) guidelines.10 We searched the PubMed, Embase, and Cochrane databases for relevant articles published in the English language until 14 July 2020. The search strategy used the following terms: [(pancreas OR pancreatic OR pancrea*) AND (“intraductal papillary mucinous” OR IPMN)] AND (imaging OR “computed tomography” OR CT OR “magnetic resonance imaging” OR MRI OR “magnetic resonance cholangiopancreatography” OR MRCP OR “endoscopic sonography” OR endosonography OR “endoscopic ultrasound” OR EUS). We also checked the reference lists of included studies and review articles for possible additional studies.

Inclusion Criteria

In our study, patients with cystic lesions originating from the branch ducts, regardless of MPD dilation, on radiologic imaging or EUS were regarded as having BD-IPMN. According to the guidelines of the World Health Organization, IPMNs are histologically diagnosed as low-grade dysplasia (LGD)/adenoma, moderate dysplasia (MGD)/borderline, HGD/carcinoma in situ (CIS), or invasive carcinoma. We divided BD-IPMNs into “LGD/MGD” (including LGD and MGD) and “HGD/malignancy” (including HGD and invasive carcinoma) on the basis of the pathologic assessment.

Studies were selected based on the following inclusion criteria: (1) written in English with full text available, (2) involved patients evaluated for BD-IPMN by computed tomography (CT), magnetic resonance imaging (MRI), magnetic resonance cholangiopancreatography (MRCP), or EUS, (3) had a reference standard formed by histopathologic diagnosis in surgical resection specimens or autopsy specimens, (4) were prospective or retrospective studies with more than 10 patients, and (5) provided sufficient data for construction of diagnostic 2 × 2 tables.

Exclusion Criteria

Studies that did not meet the inclusion criteria were excluded. In addition, the exclusion criteria ruled out (1) editorials, review articles, letters, case reports, conference abstracts, and comments, (2) a reference standard formed by clinical follow-up evaluation or cytology via EUS-FNA without histopathologic confirmation, (3) studies of fewer than 10 patients, (4) studies reporting overlapping data (duplicate studies were included if they reported different imaging characteristics of the same patient cohort; for these studies, we extracted the different parameters for analysis to avoid overlapping data of the same variable), and (5) studies with insufficient original data to derive 2×2 tables.

Study Selection

Two reviewers screened the titles and abstracts independently for eligibility based on the inclusion and exclusion criteria. Then the full-text screening for the remaining potentially relevant studies was performed. Disagreements were resolved by discussion, and consensus was achieved. Finally, eligible studies were included in the systematic review and meta-analysis.

Quality Assessment

Two review authors independently assessed the methodologic quality of the included studies using the Quality Assessment of Diagnostic Accuracy Studies (QUADAS-2) tool,11 resolving differences by discussion.

Data Extraction

One reviewer extracted data from each study according to a prespecified protocol, and a second reviewer checked the data. A consensus was reached on all items. The following were extracted: (1) study characteristics (first author, year of publication, country of origin, and study period and design), (2) participant characteristics (number of BD-IPMN patients, mean age, sex distribution, proportion of HGD/malignancy in BD-IPMN, and presence of symptoms), (3) imaging characteristics (imaging method, maximum cyst size, thickened/enhancing cyst walls, multiplicity, MN, solid component, enhanced solid component/MN, non-enhanced MN, MPD dilation, abrupt change in MPD caliber with distal pancreatic atrophy, and lymphadenopathy), and (4) outcome data (numbers of true-positives [TP], false-positives [FP], false-negatives [FN], and true-negatives [TN]). The cutoff values of cyst size were set at 2 cm, 3 cm, and 4 cm. The definition of MPD dilation varied across the studies (different cutoff values of MPD diameter were used), and MPD diameters of 5 mm and 10 mm were recorded.

Data Synthesis and Statistical Analysis

For individual imaging features in each study, 2×2 contingency tables were populated with TP, FP, FN, and TN data. As a single indication of test accuracy, the current study used the diagnostic odds ratio (DOR) as the overall and primary outcome measure of imaging accuracy. The DOR is the ratio of the odds of the present imaging characteristic when the pathology diagnosis is truly HGD/malignancy relative to the odds of the present imaging characteristics when the pathology diagnosis is truly LGD/MGD.
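As an illustration of how a single 2×2 table yields a DOR, the following minimal sketch computes the DOR and a 95 % CI on the log scale. The function name, the 0.5 continuity correction for tables with an empty cell, and the example counts are assumptions made for illustration; they are not taken from the included studies or from the software used in this analysis.

```python
import math

def diagnostic_odds_ratio(tp, fp, fn, tn, correction=0.5):
    """Return (DOR, lower, upper) using a Wald 95% CI on the log scale.

    Assumption: when any cell is zero, `correction` is added to all four
    cells so that the ratio and its standard error remain defined.
    """
    if 0 in (tp, fp, fn, tn):
        tp, fp, fn, tn = (x + correction for x in (tp, fp, fn, tn))
    # Odds of a positive imaging feature in HGD/malignancy (tp/fn)
    # divided by the odds of a positive feature in LGD/MGD (fp/tn).
    dor = (tp / fn) / (fp / tn)
    se_log = math.sqrt(1 / tp + 1 / fp + 1 / fn + 1 / tn)
    z = 1.959964  # two-sided 95% normal quantile
    lower = math.exp(math.log(dor) - z * se_log)
    upper = math.exp(math.log(dor) + z * se_log)
    return dor, lower, upper

# Hypothetical counts for one imaging feature in one study.
dor, lower, upper = diagnostic_odds_ratio(tp=20, fp=10, fn=30, tn=90)
print(f"DOR = {dor:.2f} (95% CI, {lower:.2f}-{upper:.2f})")
```

The CI is computed on the log scale because the log DOR is approximately normally distributed, which is also why pooled DORs are usually combined as log values.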

This ratio, calculated as DOR = (TP/FN)/(FP/TN) = (TP × TN)/(FP × FN), ranges from zero to infinity, with a value of 1 indicating that the test does not discriminate between patients with HGD/malignancy and those with LGD/MGD. Higher values indicate better diagnostic accuracy.12 We calculated the pooled DOR and the 95 % confidence interval (CI) using the DerSimonian-Laird random-effects model.
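The random-effects pooling step can be sketched as follows. This is a generic textbook version of the DerSimonian-Laird estimator applied to log DORs, not the implementation inside Meta-Disc or Stata; the function name and the example inputs are hypothetical.

```python
import math

def dersimonian_laird(log_effects, variances):
    """Pool log effect sizes with the DerSimonian-Laird random-effects model.

    Returns (pooled_log_effect, standard_error, tau_squared, cochran_q).
    """
    w = [1.0 / v for v in variances]            # fixed-effect weights
    sum_w = sum(w)
    fixed = sum(wi * yi for wi, yi in zip(w, log_effects)) / sum_w
    # Cochran's Q: weighted squared deviations from the fixed-effect mean.
    q = sum(wi * (yi - fixed) ** 2 for wi, yi in zip(w, log_effects))
    df = len(log_effects) - 1
    c = sum_w - sum(wi ** 2 for wi in w) / sum_w
    # Method-of-moments between-study variance, truncated at zero.
    tau2 = max(0.0, (q - df) / c) if c > 0 else 0.0
    w_re = [1.0 / (v + tau2) for v in variances]  # random-effects weights
    pooled = sum(wi * yi for wi, yi in zip(w_re, log_effects)) / sum(w_re)
    se = math.sqrt(1.0 / sum(w_re))
    return pooled, se, tau2, q

# Hypothetical per-study log DORs and their variances.
log_dors = [math.log(3.0), math.log(6.5), math.log(4.2), math.log(9.0)]
variances = [0.20, 0.35, 0.15, 0.50]
pooled, se, tau2, q = dersimonian_laird(log_dors, variances)
z = 1.959964
print(f"pooled DOR = {math.exp(pooled):.2f} "
      f"(95% CI, {math.exp(pooled - z * se):.2f}-{math.exp(pooled + z * se):.2f})")
```

When the between-study variance estimate is zero, the random-effects weights reduce to the fixed-effect weights and the two models give the same pooled value.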

The I2 index evaluates the extent of heterogeneity among studies, and statistical heterogeneity was considered high when I2 was greater than 50 %. Publication bias was assessed by visual examination of funnel plots as well as by statistical analysis using the Egger test when the number of studies reporting the outcomes was 10 or more, with trim and fill analysis performed to yield publication bias-adjusted DORs.
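The two quantities described above can be sketched in a few lines. The I2 calculation follows the usual definition based on Cochran's Q, and the Egger test is shown in its common regression form (standardized effect regressed on precision, with the intercept measuring funnel-plot asymmetry). Function names and example data are hypothetical, and the sketch returns only the intercept and its standard error rather than a P value.

```python
import math

def i_squared(q, df):
    """I2 (%) = max(0, (Q - df) / Q) * 100."""
    if q <= 0:
        return 0.0
    return max(0.0, (q - df) / q) * 100.0

def egger_intercept(effects, std_errors):
    """Egger regression: (effect / SE) on (1 / SE).

    Returns (intercept, se_of_intercept). An intercept far from zero
    relative to its standard error suggests small-study effects.
    Requires at least three studies.
    """
    n = len(effects)
    x = [1.0 / se for se in std_errors]
    y = [e / se for e, se in zip(effects, std_errors)]
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    slope = sxy / sxx
    intercept = y_bar - slope * x_bar
    sse = sum((yi - intercept - slope * xi) ** 2 for xi, yi in zip(x, y))
    se_intercept = math.sqrt(sse / (n - 2) * (1.0 / n + x_bar ** 2 / sxx))
    return intercept, se_intercept

# Hypothetical log DORs and standard errors for one imaging feature.
log_dors = [1.1, 1.6, 0.9, 2.0, 1.4, 1.2, 1.8, 0.7, 1.5, 1.3]
ses = [0.30, 0.55, 0.25, 0.70, 0.40, 0.35, 0.60, 0.20, 0.45, 0.38]
intercept, se_int = egger_intercept(log_dors, ses)
print(f"I2 for Q = 18.0 with 9 df: {i_squared(18.0, 9):.1f} %")
print(f"Egger intercept = {intercept:.2f} (SE, {se_int:.2f})")
```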

We used a bivariate random-effects approach to construct the summary receiver operating characteristic (SROC) curve and to estimate the summary area under the curve (AUC), sensitivity, and specificity of imaging features, which were used as other outcome measures to compare diagnostic accuracy.13 All statistical analyses were performed using Meta-Disc version 1.4 (Meta-Disc, Unit of Clinical Biostatistics team of the Ramón y Cajal Hospital, Madrid, Spain) and Stata version 15.1 (StataCorp LP, College Station, TX, USA). All tests were two-tailed, and a P value lower than 0.05 was considered statistically significant.

Results

Study Selection

For this study, 3027 studies were title- and abstract-screened for inclusion. Of these studies, 2909 were excluded because of ineligibility. Of the 118 potentially eligible studies that were full text-screened, 80 were excluded. Of the excluded studies, 2 investigated patients without histopathologic confirmation, 27 did not separate the data of BD-IPMN, 4 did not meet the HGD/malignancy criteria,14,15,16,17 19 did not have the data required for the construction of 2×2 tables, 5 had overlapping data,18,19,20,21,22 and 23 were irrelevant. Finally, 38 studies that met the eligibility criteria were included in this meta-analysis. The process of article selection for inclusion is shown in Fig. 1.
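The selection counts reported above can be checked with simple arithmetic. The short sketch below only restates the numbers given in the text; the variable names are illustrative.

```python
# Counts taken from the study selection paragraph.
screened = 3027
excluded_at_screening = 2909
full_text_screened = screened - excluded_at_screening

full_text_exclusions = {
    "no histopathologic confirmation": 2,
    "BD-IPMN data not separated": 27,
    "did not meet HGD/malignancy criteria": 4,
    "insufficient data for 2x2 tables": 19,
    "overlapping data": 5,
    "irrelevant": 23,
}
excluded_full_text = sum(full_text_exclusions.values())
included = full_text_screened - excluded_full_text

print(full_text_screened, excluded_full_text, included)  # 118 80 38
```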

Fig. 1 Flow diagram of the literature search and selection process

Study Characteristics

The study characteristics are summarized in Table 1. The included studies investigated 3114 patients during a period ranging from 1982 to 2017, with publication years from 2001 to 2020. The mean age ranged from 57.9 to 68.9 years, and the proportion of men ranged from 36.4 % to 80 %.

Table 1 Characteristics of the included studies

In terms of imaging methods used to evaluate the lesion characteristics, 29 studies used multiple imaging methods23,24,25,30,31,32,33,34,35,37,40,41,42,43,44,45,46,47,48,49,50,51,52,53,54,55,56,59,60 including CT, EUS, and MRI/MRCP; 7 studies used only one imaging method27,28,36,38,39,57,58; and the imaging methods for 2 studies were not stated.26,29 The proportion of HGD/malignancy ranged from 11.7 to 59.4 %, and the presence of symptoms in patients with BD-IPMN ranged from 16.3 to 93.2 %.

Quality Assessment

The overall quality of the studies included in this meta-analysis was evaluated using QUADAS-2. The study quality was generally high, with a low risk of bias and low concerns of applicability (Fig. S1). All the studies fulfilled five or more of the seven items in the QUADAS-2 tool. The common weakness was a lack of blinding of imaging interpretation to the reference standard. Blinding was documented in only 10 studies,24,27,29,31,38,47,51,52,53,55 leading to an “unclear” assessment of imaging analysis bias in the remaining 28 studies.

Imaging Features for the Diagnosis of HGD/Malignancy in BD-IPMN

As the main diagnostic accuracy index, the pooled DORs of individual imaging features for the diagnosis of HGD/malignancy in BD-IPMNs were calculated (Table 2; Fig. 2). The significant imaging features for HGD/malignancy were enhanced solid component/MN (DOR, 12.21; 95 % CI, 6.14–24.27), an MPD diameter of 10 mm or greater (DOR, 7.93; 95 % CI, 3.02–20.83), solid component (DOR, 4.85; 95 % CI, 2.49–9.42), lymphadenopathy (DOR, 4.84; 95 % CI, 1.11–21.06), MN (DOR, 4.48; 95 % CI, 3.15–6.39), an MPD diameter of 5 mm or greater (DOR, 3.69; 95 % CI, 2.62–5.19), an abrupt change in MPD caliber with distal pancreatic atrophy (DOR, 2.65; 95 % CI, 1.66–4.24), thickened/enhancing walls (DOR, 2.38; 95 % CI, 1.57–3.60), and cyst size of 3 cm or greater (DOR, 1.98; 95 % CI, 1.48–2.64). On the other hand, the pooled DORs of cyst 2 cm in size or larger (DOR, 1.52; 95 % CI, 0.90–2.53), cyst 4 cm in size or larger (DOR, 1.96; 95 % CI, 0.81–4.75), multiplicity (DOR, 0.78; 95 % CI, 0.59–1.05), and non-enhanced MN (DOR, 0.96; 95 % CI, 0.50–1.85) did not show statistical significance. Among all the imaging features analyzed, only solid component showed substantial heterogeneity (I2 = 53.5 %).
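Because the pooled DORs are estimated on the log scale, each point estimate should sit at the geometric midpoint of its CI, and the CI width determines the standard error. The sketch below applies this to the lymphadenopathy result reported above (DOR, 4.84; 95 % CI, 1.11–21.06) to show why a wide interval can still exclude 1. The helper name is hypothetical, and the normal approximation is an assumption of this illustration rather than a statement about how the software computed P values.

```python
import math

def summarize_ratio_ci(estimate, lower, upper, z_crit=1.959964):
    """Back-calculate log-scale quantities from a ratio and its 95% CI.

    Returns (geometric_midpoint, se_log, z, two_sided_p), with the P value
    from a normal approximation.
    """
    midpoint = math.exp((math.log(lower) + math.log(upper)) / 2.0)
    se_log = (math.log(upper) - math.log(lower)) / (2.0 * z_crit)
    z = math.log(estimate) / se_log
    p = math.erfc(abs(z) / math.sqrt(2.0))  # two-sided normal P value
    return midpoint, se_log, z, p

# Lymphadenopathy: DOR 4.84 (95% CI, 1.11-21.06), as reported in the text.
mid, se_log, z, p = summarize_ratio_ci(4.84, 1.11, 21.06)
print(f"midpoint = {mid:.2f}, SE(log DOR) = {se_log:.2f}, z = {z:.2f}, P = {p:.3f}")
```

The same check applied to thickened/enhancing walls (95 % CI, 1.57–3.60) gives a midpoint near 2.38, which matches the reported pooled DOR.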

Table 2 The pooled diagnostic odds ratios (DORs) of imaging features for the diagnosis of HGD/malignancy in BD-IPMNs

Fig. 2 Forest plots of the pooled diagnostic odds ratios (DORs) of imaging features for predicting high-grade dysplasia (HGD)/malignancy in branch-duct intraductal papillary mucinous neoplasms (BD-IPMNs)

Based on the results of the symmetric funnel plot (Fig. S2) and Egger test (Table 2), no publication bias was found for cyst size of 3 cm or greater or for MN. However, thickened/enhancing cyst walls and an MPD diameter of 5 mm or greater showed asymmetric funnel plots. Nevertheless, the trim and fill method showed that the publication bias-adjusted DORs for the aforementioned four imaging features were similar to the original unadjusted DORs and remained statistically significant, which indicated that the original pooled DORs were robust.

Diagnostic Accuracy of Imaging Features for Diagnosing HGD/Malignancy in BD-IPMN

Table 3 and Fig. 3 show the diagnostic accuracy of each statistically significant imaging feature identified for diagnosing HGD/malignancy in BD-IPMNs. Among these findings, an MPD diameter of 10 mm or greater, enhanced solid component/MN, and lymphadenopathy showed a large AUC (0.95, 0.89, and 0.89, respectively) and a high specificity (0.98, 0.95, and 0.97, respectively), but a low sensitivity (0.14, 0.38, and 0.09, respectively). The remaining imaging features showed an AUC ranging from 0.54 to 0.77, a specificity ranging from 0.62 to 0.93, and a sensitivity ranging from 0.17 to 0.59.
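To make the trade-off between sensitivity and specificity concrete, the following sketch converts a sensitivity/specificity pair into likelihood ratios and the DOR they imply. It uses the pooled values reported above for enhanced solid component/MN (sensitivity, 0.38; specificity, 0.95). The function name is illustrative, and the implied DOR is not expected to equal the separately pooled DOR exactly, because the two were estimated with different models.

```python
def likelihood_ratios(sensitivity, specificity):
    """Return (LR+, LR-, implied DOR) for one sensitivity/specificity pair."""
    lr_pos = sensitivity / (1.0 - specificity)   # how much a positive finding raises the odds
    lr_neg = (1.0 - sensitivity) / specificity   # how much a negative finding lowers the odds
    return lr_pos, lr_neg, lr_pos / lr_neg

# Pooled values for enhanced solid component/MN, as reported in the text.
lr_pos, lr_neg, implied_dor = likelihood_ratios(0.38, 0.95)
print(f"LR+ = {lr_pos:.1f}, LR- = {lr_neg:.2f}, implied DOR = {implied_dor:.1f}")
```

A high positive likelihood ratio with a negative likelihood ratio close to 1 is the pattern expected for a feature that is useful for ruling in HGD/malignancy when present but uninformative when absent.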

Table 3 Summary of pooled indices of diagnostic accuracy for imaging features suggestive of HGD/malignancy in BD-IPMN

Fig. 3 Summary receiver operating characteristic (SROC) curve of imaging features for predicting high-grade dysplasia (HGD)/malignancy in branch-duct intraductal papillary mucinous neoplasms (BD-IPMNs) (prediction and confidence contours for the 38 studies included in the meta-analysis)

Discussion

This study aimed to determine imaging features and their diagnostic accuracy for predicting HGD/malignancy in patients with BD-IPMN. The findings showed that among the many suggested imaging features in the current guidelines, two imaging features (enhanced solid component/MN and an MPD diameter of ≥10 mm) were the most highly suspicious features for HGD/malignancy in BD-IPMN, with the highest DORs (12.21 and 7.93, respectively) and largest AUC (0.89 and 0.95, respectively). Another seven imaging features (solid component, lymphadenopathy, MN, MPD diameter of 5 mm or greater, abrupt change in MPD caliber with distal pancreatic atrophy, thickened/enhancing walls, and cyst ≥3 cm) showed overall low DORs (1.98 to 4.85) and low AUC (< 0.70 for 5 imaging features).

Our results were generally in accordance with the current guidelines, especially the revised Fukuoka guidelines2,6 and the European guidelines,4 which recommend unique therapeutic strategies for BD-IPMN through classifying imaging features as indications for surgery or surveillance on the basis of differences in the risk for HGD/malignancy of each imaging feature. In addition, diagnostic accuracy analysis showed an overall high specificity (>0.80 for 7 imaging features), but a relatively low sensitivity (<0.60 for the aforementioned 9 imaging features). From these results, we concluded that the diagnostic accuracy of these imaging features may be limited by poor sensitivity.

Recently, the updated Fukuoka guidelines and the European guidelines have placed special emphasis on the role of MN (enhancement and size) in predicting HGD/malignancy, suggesting that the presence of an enhancing MN of 5 mm or larger represents an indication for surgery (HRS and absolute indications, respectively). Several studies also have reported that MN size was a significant independent factor associated with HGD/malignancy in BD-IPMN.33,53,61,62 However, this diagnostic value could not be evaluated because only one study37 reporting both the enhancement and size of MN was included in our study, with another two studies33,50 reporting only the size of MN, and one additional study reporting only the enhancement of MN.53 Therefore, we used “enhanced solid component/MN” as a composite and finally included four studies.

Our study showed that the presence of enhanced solid component/MN, regardless of its diameter, had the highest DOR (12.21), corresponding to approximately 12-fold higher odds of HGD/malignancy, indicating that it was a strong feature suggestive of HGD/malignancy for patients with BD-IPMN. Furthermore, enhanced solid component/MN showed a high AUC (0.89) and specificity (0.95), but a low sensitivity (0.38), indicating that reliance on this feature alone may miss a substantial number of patients with HGD/malignancy.

These results were consistent with those of previous studies37,53 reporting that enhanced MN had a high specificity (0.85 to 0.95) and a low sensitivity (0.49 to 0.63). These results demonstrated that enhanced solid component/MN is effective in determining which patients should undergo resection and could be used as an indication for surgery. However, because data were insufficient to investigate a possible dimensional cutoff of enhanced solid component/MN related to an increased risk of HGD/malignancy, the cutoff (a threshold of 5 mm adopted by the Fukuoka and European guidelines) of enhanced solid component/MN in HGD/malignancy risk prediction could not be confirmed in this study.

In addition, applying MN (an indication for surgery in the 2006 Sendai guidelines) instead of enhanced solid component/MN for identifying HGD/malignancy in BD-IPMN showed a lower DOR (4.48), similar to two recent meta-analyses7,9 (DOR, 4.1−6.0), and an AUC of 0.77 (sensitivity, 0.53; specificity, 0.81), indicating that evaluation of enhancement could help to identify a true MN and to distinguish mucin globules from MN.

Similar to MN, solid component (absolute indications in the latest European guidelines) showed a lower DOR (4.85) with an extremely low AUC (0.54) (sensitivity, 0.34; specificity, 0.92), emphasizing the importance of using contrast-enhanced imaging to improve diagnostic accuracy.

Based on these results, we believe that MN or solid component used as a clear indication for surgery is of concern, and that the new European guidelines regarding solid component should be applied with caution. Non-enhancing MN, suggested as a WF in the 2012 Fukuoka guidelines6 but not recommended in the 2017 Fukuoka guidelines, did not show statistical significance associated with HGD/malignancy.

An MPD diameter of 10 mm or greater, another HRS and an absolute indication for surgical management in BD-IPMN (in the latest Fukuoka guidelines2 and European guidelines,4 respectively), had the second highest DOR (7.93), the best AUC (0.95) and specificity (0.98), and an extremely low sensitivity (0.14), indicating that its diagnostic value as an indicator of HGD/malignancy was approximately equal to that of enhanced solid component/MN. In addition, an MPD diameter of 5 mm or greater showed a low DOR of 3.69 and an AUC of 0.67 (sensitivity, 0.59; specificity, 0.75). Our study is in accordance with the latest Fukuoka and European guidelines regarding the role of MPD diameter cutoffs in risk prediction.

Another four WFs in the 2017 Fukuoka guidelines, namely, lymphadenopathy (newly added), abrupt change in MPD caliber with distal pancreatic atrophy (another change in MPD), thickened/enhanced walls, and cyst size of 3 cm or greater, were significantly associated with malignant BD-IPMN (DOR, 1.98–4.84) but showed variable diagnostic values. Despite the high AUC (0.89) and specificity (0.97) of lymphadenopathy, its sensitivity was the lowest (0.09).

These results are consistent with those of a validation study22 involving 350 patients with BD-IPMN, which showed that lymphadenopathy was a significant predictor of HGD/malignancy in the univariable analysis, with a sensitivity of 0.07 and a specificity of 0.97. Although lymphadenopathy was associated with approximately fivefold higher odds of HGD/malignancy (DOR, 4.84), its diagnostic value for predicting HGD/malignancy in BD-IPMN remains to be validated further. Abrupt change in MPD caliber with distal pancreatic atrophy and thickened/enhancing walls showed similar DORs (2.65 and 2.38, respectively) and diagnostic accuracy (AUC, 0.67 and 0.59; specificity, 0.92 and 0.93; and sensitivity, 0.20 and 0.17, respectively). Cyst size of 3 cm or greater had the lowest DOR (1.98) and a poor absolute diagnostic value (with the lowest specificity of 0.62).

Our results were in accordance with those of a previous meta-analysis7 that reported a pooled specificity of 0.64. Compared with the 2006 Sendai guidelines5 that used a cyst size of 3 cm or greater as an indication for surgical resection, the revised guidelines placed less emphasis on cyst size. The AGA3 and Fukuoka guidelines2,6 recommended it as a WF for observational follow-up evaluation, whereas the European guidelines4 suggested a more conservative size of 4 cm or greater as a relative indication for surgery. We suggest that the above four imaging features should be used as WFs, as recommended by the Fukuoka guidelines.

This study had several limitations. First, because various imaging methods (CT, MRI/MRCP, or EUS) were used across different studies, the imaging features of suspected HGD/malignancy were analyzed with integration across various methods. Measurement errors may have occurred because individual imaging features obtained by different diagnostic methods across studies may have been affected by the preferences of institutions for specific diagnostic methods. Because only a small number of studies were available for comparison of diagnostic performance, the diagnostic accuracy of specific imaging methods in predicting HGD/malignancy in BD-IPMN could not be determined in this study. Furthermore, the studies included in this meta-analysis were conducted from 1984 to 2017, and imaging technologies advanced dramatically during this period. Therefore, better images were provided by the current techniques than by the earlier techniques, introducing heterogeneity of the imaging features across this period.

Second, because only surgically resected BD-IPMN was included in our study, the study population did not reflect the entire spectrum of tumors. Because the risk of spectrum bias was introduced, suspicious imaging features might be overrepresented for the resected patients compared with those who remained under surveillance.

Third, our results cannot quantitatively predict HGD/malignancy in patients with BD-IPMN. A more sophisticated quantitative system to enable the determination of individual risk would be useful in personalizing the treatment of patients with BD-IPMN, such as the development of a preoperative nomogram/risk score for predicting HGD/malignancy and its validation for clinical efficacy.

Fourth, because invasive carcinoma behaves very differently from HGD (HGD BD-IPMNs behave much more like MGD and LGD, especially after resection), HGD and invasive carcinoma should be analyzed separately. However, such separate analyses were not performed because of the limited data obtained from the studies identified.

Conclusion

The following imaging features were significantly associated with HGD/malignancy in BD-IPMN: enhanced solid component/MN, an MPD diameter of 10 mm or greater, solid component, lymphadenopathy, MN, an MPD diameter of 5 mm or greater, abrupt change in MPD caliber with distal pancreatic atrophy, thickened/enhancing walls, and cyst size of 3 cm or greater. However, these imaging features should be weighted differentially in predicting HGD/malignancy to facilitate appropriate management. The presence of enhanced solid component/MN and an MPD diameter of 10 mm or greater were the most highly suspicious features for HGD/malignancy in BD-IPMN and should be considered as indications for surgery.

Unfortunately, due to incomplete data, the cutoffs of enhanced solid component/MN in HGD/malignancy risk prediction could not be confirmed in this study. In addition, our study proposes applying the remaining seven imaging features as indications for close observation, further evaluation, or both. Solid component without assessment of enhancement was less effective in risk prediction than enhanced solid component/MN, so we recommend that the new European guidelines regarding solid component be used with caution. Further studies assessing multiple parameters including clinical, imaging, cytologic, and molecular indicators in a sophisticated quantitative system could better risk-stratify and improve the management of BD-IPMN.