Introduction

Differentiating acute pelvic inflammatory disease (PID) from acute appendicitis (AA) in premenopausal women with acute lower abdominal pain and an inflammatory syndrome often poses a clinical diagnostic dilemma. Although laparoscopy has been used as the reference standard for PID diagnosis, it is not yet recommended for assessing early and mild stages of PID [1, 2], yet the clinical diagnosis of PID is inaccurate [3, 4]. Cervical or uterine motion tenderness, adnexal tenderness, and an elevated C-reactive protein (CRP) level and leukocyte count are key features suggesting PID, but they can also be found in AA [5,6,7]. These entities must be differentiated early in the course of the disease, since an incorrect diagnosis precludes proper management: delayed diagnosis of PID increases the risk of tubal infertility and ectopic pregnancy [8, 9], whereas undiagnosed appendicitis may be complicated by peritonitis. Moreover, the treatments differ: PID is treated primarily with broad-spectrum antibiotics, whereas AA requires surgical appendectomy.

Pelvic ultrasound is the initial investigation of choice in most reproductive-age women presenting with acute pelvic pain [10, 11]. Although computed tomography (CT) is not the primary imaging method for assessing patients with gynaecological pathology, it may be performed first because of the nonspecific nature of the presenting signs and symptoms, or when ultrasound results are equivocal, especially if AA is suspected [10].

Two previous case–control studies [12, 13] focused on CT features of acute mild PID and demonstrated that pelvic fat haziness, tubal thickening and hepatic capsular enhancement in the arterial phase were associated with acute PID. In these studies, the control groups consisted of patients with various causes of abdominal pain, including pyelonephritis, pancreatitis and ureteral stones. The CT features of these conditions generally do not overlap with those of acute PID. The specificities of CT signs for predicting acute PID may thus have been overestimated in these studies because of the heterogeneous control groups. Indeed, most CT findings described as specific for acute PID can also be encountered in AA, especially if the appendix is located in the pelvic region. Conversely, although reported CT signs of AA, including appendiceal enlargement and mesenteric fat stranding, are highly accurate [14,15,16,17,18], these same signs can also be found in alternative conditions that clinically mimic AA, especially PID, so that differential diagnosis based on imaging can be equivocal. Two previous studies specifically investigated CT features that could differentiate right-sided tubo-ovarian abscess (TOA) from AA [19, 20], but these studies excluded PID cases without abscess formation. To our knowledge, CT signs that can differentiate acute PID (complicated or not) from AA have not been systematically investigated.

The aim of our study was therefore to construct a decision tree based on CT findings to differentiate acute PID from AA in women of childbearing age with acute lower abdominal pain and inflammatory syndrome.

Material and methods

Patient selection/reference standard

This retrospective, single-institution comparative study was approved by our institutional review board, and informed consent was waived owing to the retrospective nature of the study. A computerised search of the diagnostic database at our urban university hospital was performed to identify non-menopausal women aged at least 15 years: (1) consecutively admitted to the general or gynaecological emergency department from January 2005 to October 2015 with acute lower abdominal pain, (2) who had undergone abdominopelvic contrast-enhanced helical CT, and (3) who were subsequently diagnosed with acute PID (N70–N74) or acute appendicitis (K35–K37) according to International Classification of Diseases, 10th Revision (ICD-10) codes.

The diagnostic reference standard of acute PID was based on either laparoscopic findings consistent with PID or clinical signs with laboratory evidence of PID in cases where no surgery had been performed [21, 22].

The diagnostic reference standard of AA was based on surgical and histopathological reports in the absence of surgical evidence of PID. Cases with pathological evidence of isolated appendiceal serositis without mucosal involvement were not eligible for the AA group.

Finally, 136 patients fulfilled the diagnostic criteria for acute PID and 391 patients fulfilled the diagnostic criteria for AA (Fig. 1).

Fig. 1

Flowchart depicting the two groups of patients. AA acute appendicitis, PID pelvic inflammatory disease

One author (E.P.) reviewed the electronic medical records of these patients to record clinical findings and constituted the final groups as follows:

Among the 136 PID patients, 27 were excluded for the reasons mentioned in Fig. 1. The remaining 109 patients constituted the final PID group (age range: 15–57 years).

Among the 391 AA patients (age range: 15–99 years), all patients older than 57 years (n = 75) were excluded. Among the remaining 316 patients (age range: 15–57 years), 11 were excluded (Fig. 1).

For each PID patient, two age-matched control subjects were selected among the AA patients as follows: (1) The 305 AA subjects were sorted by age, and AA subjects of the same age were sorted by the date of the CT examination, starting with the most recent CT. (2) For each PID case, the author (E.P.) manually selected the first two women with an age similar to that of the PID case. This procedure was repeated to obtain 218 control subjects.
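The matching procedure above amounts to a greedy, age-based selection of two controls per case without replacement. The Python sketch below illustrates it under stated assumptions: the `Patient` record, its field names and the function name are hypothetical, and falling back to the nearest available age when no same-age control is left is an assumption, since the text only speaks of a similar age. It is not the authors' procedure, which was performed manually.

```python
from dataclasses import dataclass
from datetime import date


@dataclass(frozen=True)
class Patient:
    patient_id: str   # hypothetical identifier
    age: int          # age in years at the time of CT
    ct_date: date     # date of the CT examination


def select_age_matched_controls(cases, pool, per_case=2):
    """Greedy age matching of `per_case` controls per case, without replacement.

    The pool is sorted by age, then by CT date with the most recent first,
    as described in the text. For each case, the unused controls closest in
    age are taken in that order; using the nearest age when too few
    same-age controls remain is an assumption of this sketch.
    """
    ordered = sorted(pool, key=lambda p: (p.age, -p.ct_date.toordinal()))
    used = set()
    matches = {}
    for case in cases:
        available = [p for p in ordered if p.patient_id not in used]
        # list.sort() is stable, so equally close controls keep the age/date order.
        available.sort(key=lambda p: abs(p.age - case.age))
        chosen = available[:per_case]
        if len(chosen) < per_case:
            raise ValueError(f"not enough controls left for case {case.patient_id}")
        used.update(p.patient_id for p in chosen)
        matches[case.patient_id] = [p.patient_id for p in chosen]
    return matches
```

Because the selection is greedy, the order in which cases are processed can change which controls each case receives; the text does not specify that order.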

The final study population thus consisted of 327 patients.

CT technique

CT was performed using a 64-detector row scanner (LightSpeed VCT; GE Healthcare, Milwaukee, WI, USA) at 120 kVp, with dose-modulation software to determine the mAs value on the basis of body weight (noise index, 20; 130–700 mA). The CT images were reconstructed at 3-mm section thickness in the axial plane, with 0.625-mm native images available for interpretation. All patients underwent CT during the portal venous phase (delay, 70–80 s) with intravenous contrast material (iohexol, Omnipaque 300, GE Healthcare; or iobitridol, Xénétix 350, Guerbet, Aulnay-sous-Bois, France), administered with a power injector at 2–3 ml/s. No oral or rectal contrast material was administered.

Image interpretation

Two radiologists (F.C.D. and K.E.), with 14 and 5 years of experience in abdominal imaging, respectively, independently reviewed all CT scans in random order on a picture archiving and communication system workstation (Centricity PACS; GE Healthcare). Before the review, the radiologists were trained to recognise the evaluated CT findings on ten CT images obtained in patients who were not included in the final study.

Interpretation discrepancies were resolved by consensus with a third radiologist (I.M.), who had 10 years of experience in abdominal imaging.

All the reviewers were blinded to the clinical and surgical outcomes, as well as to the initial imaging reports, but they were aware that all CT examinations involved patients with either PID or AA.

The images were specifically evaluated for the following CT findings, which were expected to help in the differential diagnosis between AA and acute PID:

(a) appendiceal outer wall-to-outer wall diameter [14,15,16,17,18, 22];
(b) periappendiceal fat stranding [14,15,16,17,18, 22] (Fig. 2);
(c) appendix contents [15, 18, 22, 23] (Fig. 2);
(d) tubal thickening [12, 13], graded as moderate if 5 mm ≤ axial tubal diameter < 10 mm, and as marked if axial tubal diameter ≥ 10 mm and/or fluid was present within the fallopian tube (Figs. 2, 3 and 4);
(e) anterior pelvic fat stranding [12, 13] (Fig. 5), qualified as symmetrical or asymmetrical;
(f) uterine serosal enhancement [13] (Fig. 5);
(g) inner myometrial enhancement [12, 13] (Fig. 5);
(h) intraperitoneal pelvic fluid;
(i) intraperitoneal extrapelvic fluid, qualified as moderately or very abundant;
(j) pelvic peritoneal enhancement [13] (Fig. 4);
(k) thickening of the uterosacral ligaments;
(l) obliteration of presacral and perirectal fascial planes;
(m) loss of definition of the uterine border;
(n) ileocaecal lymph node(s) ≥ 5 mm.
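The grading rule for tubal thickening in item (d) is a simple threshold scheme. The short Python sketch below encodes it for a single fallopian tube; the function name is hypothetical, and grading a tube that was not identified (no diameter, no fluid) as "none" is an assumption rather than a rule stated in the text.

```python
from typing import Optional


def grade_tubal_thickening(axial_diameter_mm: Optional[float],
                           fluid_filled: bool = False) -> str:
    """Grade one fallopian tube with the thresholds of item (d).

    'marked'   : axial diameter >= 10 mm and/or fluid within the tube
    'moderate' : 5 mm <= axial diameter < 10 mm
    'none'     : otherwise; a tube not identified on CT (diameter None)
                 and without fluid is graded 'none' (assumption).
    """
    if fluid_filled or (axial_diameter_mm is not None and axial_diameter_mm >= 10):
        return "marked"
    if axial_diameter_mm is not None and axial_diameter_mm >= 5:
        return "moderate"
    return "none"
```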

Fig. 2

Transverse portal venous CT image in a 17-year-old woman with pathologically proven non-perforated acute appendicitis shows a fluid-filled, enlarged appendix (appendiceal diameter: 11 mm) and periappendiceal fat stranding. The left fallopian tube was not identified in this patient

Fig. 3

Portal venous CT images in a 21-year-old woman with surgically proven bilateral salpingitis and pelvic peritonitis. (A) The axial image shows an air-filled appendix with a 6-mm diameter (arrow). (B) The coronal image shows a tubular structure in an adnexal location corresponding to the left fallopian tube (arrows), with a transverse diameter of 10 mm

Fig. 4

Portal venous CT images in a 23-year-old woman with surgical confirmation of PID. (A) The axial image shows a retrocaecal appendix, with a 7-mm diameter without intraluminal air. Note the presence of free fluid in the right and left paracolic gutters. (B) The more caudal axial image shows marked bilateral tubal thickening (arrows) and pelvic peritoneal enhancement (arrowhead). Note the presence of free fluid in the pelvis (*)

Fig. 5

(A) Transverse portal venous CT image in a 41-year-old woman with surgically proven salpingitis shows symmetrical anterior pelvic fat stranding (white arrowheads), uterine serosal enhancement (arrow) and inner myometrial enhancement (black arrowhead). (B) Coronal portal venous CT reformation in a 19-year-old woman with clinical and bacteriological evidence of pelvic inflammatory disease shows uterine serosal enhancement (arrow) and inner myometrial enhancement (arrowheads)

Statistical method

Interobserver agreement for all qualitative CT findings was determined with the κ statistic and was classified as follows: κ = 0–0.20, slight agreement; κ = 0.21–0.40, fair agreement; κ = 0.41–0.60, moderate agreement; κ = 0.61–0.80, substantial agreement; and κ = 0.81–1.00, almost perfect agreement. Weighted κ was used for variables with more than two ordered categories. The consensus reading results were used for all subsequent statistical analyses.
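The κ statistic compares the observed agreement between two readers with the agreement expected by chance from each reader's marginal frequencies. The Python sketch below computes Cohen's κ and a weighted κ, and maps the result to the agreement bands listed above. The text does not say which weighting scheme was used; linear weights are an assumption here, and the function names are hypothetical.

```python
# Agreement bands used in the text (upper bound of each band).
AGREEMENT_BANDS = [(0.20, "slight"), (0.40, "fair"), (0.60, "moderate"),
                   (0.80, "substantial"), (1.00, "almost perfect")]


def cohen_kappa(rater1, rater2, categories, weighting=None):
    """Cohen's kappa for two raters.

    weighting=None gives the unweighted kappa; weighting="linear" gives a
    linearly weighted kappa (an assumption), for which `categories` must be
    listed in their natural order, e.g. ["none", "moderate", "marked"].
    """
    if len(rater1) != len(rater2) or not rater1:
        raise ValueError("ratings must be non-empty and of equal length")
    k = len(categories)
    index = {c: i for i, c in enumerate(categories)}
    n = len(rater1)
    # Observed joint proportions of (reader 1, reader 2) ratings.
    observed = [[0.0] * k for _ in range(k)]
    for a, b in zip(rater1, rater2):
        observed[index[a]][index[b]] += 1 / n
    row = [sum(observed[i]) for i in range(k)]
    col = [sum(observed[i][j] for i in range(k)) for j in range(k)]

    def disagreement(i, j):
        # Penalty for the rating pair (i, j); 0 means full agreement.
        if weighting == "linear":
            return abs(i - j) / (k - 1)
        return 0.0 if i == j else 1.0

    d_obs = sum(disagreement(i, j) * observed[i][j] for i in range(k) for j in range(k))
    d_exp = sum(disagreement(i, j) * row[i] * col[j] for i in range(k) for j in range(k))
    if d_exp == 0:
        raise ValueError("kappa is undefined when chance disagreement is zero")
    return 1.0 - d_obs / d_exp


def agreement_label(kappa):
    """Map a kappa value to the agreement bands given in the text."""
    if kappa < 0:
        return "poor"  # outside the bands in the text; 'poor' in Landis and Koch's scheme
    for upper, label in AGREEMENT_BANDS:
        if kappa <= upper:
            return label
    return "almost perfect"
```

With unweighted disagreement penalties, `1 - d_obs / d_exp` reduces to the familiar (p_o − p_e)/(1 − p_e).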

Continuous variables were compared between the PID and AA groups using a t-test or the Wilcoxon–Mann–Whitney rank-sum test, depending on whether their distribution was normal or not, and categorical variables were compared using the chi-square test or Fisher’s exact test, as appropriate.
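For a 2 × 2 comparison such as IUD use by group, "as appropriate" usually means Fisher's exact test when an expected cell count is below 5 and the chi-square test otherwise; that rule is a common convention assumed here, not one stated in the text. The standard-library Python sketch below implements both tests (function names hypothetical). Applied to the IUD counts reported in the Results (34/109 PID vs. 15/218 AA), it selects the chi-square test and gives p < 0.0001.

```python
import math


def chi_square_2x2(table):
    """Pearson chi-square (no continuity correction) for a 2x2 table, df = 1."""
    (a, b), (c, d) = table
    n = a + b + c + d
    rows, cols = (a + b, c + d), (a + c, b + d)
    stat = 0.0
    for i, obs_row in enumerate(table):
        for j, obs in enumerate(obs_row):
            expected = rows[i] * cols[j] / n
            stat += (obs - expected) ** 2 / expected
    # With one degree of freedom, P(chi2 > x) = erfc(sqrt(x / 2)).
    return stat, math.erfc(math.sqrt(stat / 2))


def fisher_exact_2x2(table):
    """Two-sided Fisher exact test: sum of the hypergeometric probabilities of
    all tables with the observed margins that are no more likely than the
    observed table."""
    (a, b), (c, d) = table
    r1, r2, c1 = a + b, c + d, a + c
    n = r1 + r2

    def prob(x):
        return math.comb(r1, x) * math.comb(r2, c1 - x) / math.comb(n, c1)

    p_obs = prob(a)
    lo, hi = max(0, c1 - r2), min(r1, c1)
    # Small relative tolerance so that tables tied with the observed one count.
    return sum(p for p in map(prob, range(lo, hi + 1)) if p <= p_obs * (1 + 1e-7))


def compare_proportions(table):
    """Chi-square test unless an expected count is < 5 (assumed convention)."""
    (a, b), (c, d) = table
    n = a + b + c + d
    expected = [(a + b) * (a + c) / n, (a + b) * (b + d) / n,
                (c + d) * (a + c) / n, (c + d) * (b + d) / n]
    if min(expected) < 5:
        return "fisher", fisher_exact_2x2(table)
    return "chi-square", chi_square_2x2(table)[1]


# Rows: PID, AA; columns: IUD, no IUD (counts from the Results section).
method, p_value = compare_proportions([[34, 75], [15, 203]])
```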

To generate a clinical guideline for differentiating PID from AA, a decision tree was built with the classification and regression tree (CART) algorithm, with all CT findings as predictor variables. CART selected the best splitting variables to formulate diagnostic criteria for differentiating PID from AA and automatically calculated optimal cutoff points for continuous variables when necessary. The decision tree model was developed using a two-step approach to simulate the radiological decision process for predicting the binary outcome. The accuracy of this model was calculated with its 95 % confidence interval.
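At each node, CART searches for the split that makes the two child groups as pure as possible; for a continuous predictor such as appendiceal diameter, this means scanning candidate cutoffs. The Python sketch below shows that search with the Gini impurity criterion on one predictor. It is a simplified illustration: the text does not name the splitting criterion or software options, so Gini impurity is an assumption, the function names are hypothetical, and the toy values in the example are invented for illustration, not study data. Full CART implementations also scan all predictors and prune the tree.

```python
def gini(labels):
    """Gini impurity of a list of class labels (0 for a pure group)."""
    n = len(labels)
    if n == 0:
        return 0.0
    counts = {}
    for y in labels:
        counts[y] = counts.get(y, 0) + 1
    return 1.0 - sum((c / n) ** 2 for c in counts.values())


def best_cutoff(values, labels):
    """Exhaustive CART-style search for the threshold on one continuous
    predictor that minimises the size-weighted Gini impurity of the two
    children. Candidates are midpoints between consecutive distinct values;
    the split is `value < cutoff` versus `value >= cutoff`.
    Returns (cutoff, weighted impurity)."""
    pairs = sorted(zip(values, labels))
    n = len(pairs)
    best_impurity, best_cut = float("inf"), None
    for i in range(1, n):
        low, high = pairs[i - 1][0], pairs[i][0]
        if low == high:
            continue  # no threshold can separate identical values
        cut = (low + high) / 2
        left = [y for _, y in pairs[:i]]
        right = [y for _, y in pairs[i:]]
        impurity = (len(left) * gini(left) + len(right) * gini(right)) / n
        if impurity < best_impurity:
            best_impurity, best_cut = impurity, cut
    return best_cut, best_impurity


# Toy appendiceal diameters in mm (invented values, not study data).
diameters = [5, 5, 6, 6, 8, 9, 10, 11, 13]
diagnoses = ["PID", "PID", "PID", "PID", "AA", "AA", "AA", "AA", "AA"]
cutoff, impurity = best_cutoff(diameters, diagnoses)
```

On these toy values the search places the cutoff midway between 6 and 8 mm, where both child groups are pure.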

Computer software packages (SAS, version 9.3; SAS Institute, Cary, NC, USA and R, version 3.0.2; R Foundation for Statistical Computing) were used to perform the statistical analyses.

Results

Clinical findings

The final study group consisted of 327 patients with a median age of 28 years (range, 15–57 years; interquartile range [IQR], 22–39 years).

An intrauterine device (IUD) was in place or had been removed in the previous 15 days in 34 patients of the PID group (31.2 %) and in 15 patients of the AA group (6.9 %) (p < 0.0001).

Reference standard

In the PID group, a surgical diagnosis was obtained for 49 patients (45 %) and a diagnosis based on laboratory findings was obtained for 60 patients (55 %), within 5 days after diagnostic CT.

The microbiological sampling procedures and bacteriological results of the PID cases are detailed in Table 1.

Table 1 Bacteriological sampling procedures and bacteriological results in the pelvic inflammatory disease (PID) group

All the patients of the AA group underwent laparoscopic appendectomy within 24 h after diagnostic CT. All AA cases were confirmed by pathological findings. The surgical report described non-complicated AA in 208 cases (95.9 %) and complicated AA with abscess formation in nine cases (4.1 %).

No patient in either group (AA or PID) returned to the same hospital for abdominal pain within the 6-month follow-up.

Univariate comparative CT findings

The frequencies of the CT findings in the two groups (AA and acute PID) are shown in Table 2.

Table 2 Univariate comparative CT findings between acute pelvic inflammatory disease (PID) and acute appendicitis (AA)

Appendiceal diameter, periappendiceal fat stranding and fluid-filled appendix were significantly associated with AA. In the PID group, the median appendiceal diameter was 5 mm (IQR, 5–6 mm), whereas in the AA group, the median appendiceal diameter was 10 mm (IQR, 9–13 mm; p<0.0001). In five PID patients, the appendix could not be confidently identified on CT and so the CT findings related to the appendix were missing for these cases. These five women had all undergone surgery, with a macroscopically normal appendix being found; they were thus considered to have a normal appendiceal diameter on CT. The appendix was identified in all AA patients.

Right tubal thickening, left tubal thickening, anterior pelvic fat stranding, uterine serosal enhancement, inner myometrial enhancement, obliteration of presacral and perirectal fascial planes, loss of definition of the uterine border and pelvic peritoneal enhancement were significantly associated with acute PID. There were 124 women (38 %) with tubal thickening in the entire cohort; 95/124 (77 %) had PID and 29/124 (23 %) had AA. In the PID group, 14/109 (13 %) of the patients had no tubal thickening, while 189/218 (87 %) had no tubal thickening in the AA group.

CT decision-tree analysis

According to the CART algorithm, an appendiceal diameter with a 7-mm cut-off value and left tubal thickening with a 10-mm cut-off were the two most discriminating CT criteria for differentiating acute PID and AA (Table 3 and Fig. 6). The accuracy of this model was 98.2 % (95 % CI: 96–99.4).
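The text does not state how the confidence interval of the accuracy was computed. One common choice for a proportion is the exact Clopper–Pearson interval, sketched below in standard-library Python by bisection on the binomial distribution function. Assuming 321 correct classifications out of 327 patients (the count implied by 98.2 %, back-calculated here rather than reported), it yields an interval of roughly 96 % to 99 %, close to the one reported. The function names are hypothetical.

```python
import math


def binom_cdf(k, n, p):
    """P(X <= k) for X ~ Binomial(n, p), with terms computed in log space."""
    if p <= 0.0:
        return 1.0
    if p >= 1.0:
        return 1.0 if k >= n else 0.0
    total = 0.0
    for i in range(k + 1):
        log_term = (math.lgamma(n + 1) - math.lgamma(i + 1) - math.lgamma(n - i + 1)
                    + i * math.log(p) + (n - i) * math.log1p(-p))
        total += math.exp(log_term)
    return min(total, 1.0)


def clopper_pearson(x, n, alpha=0.05, tol=1e-10):
    """Exact (Clopper-Pearson) confidence interval for x successes out of n."""
    def root_of_decreasing(f):
        # Bisection for the p in [0, 1] where the decreasing function f crosses 0.
        lo, hi = 0.0, 1.0
        while hi - lo > tol:
            mid = (lo + hi) / 2
            if f(mid) > 0:
                lo = mid
            else:
                hi = mid
        return (lo + hi) / 2

    # Lower bound: P(X >= x | p) = alpha / 2.
    lower = 0.0 if x == 0 else root_of_decreasing(
        lambda p: alpha / 2 - (1 - binom_cdf(x - 1, n, p)))
    # Upper bound: P(X <= x | p) = alpha / 2.
    upper = 1.0 if x == n else root_of_decreasing(
        lambda p: binom_cdf(x, n, p) - alpha / 2)
    return lower, upper


lower, upper = clopper_pearson(321, 327)
```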

Table 3 Cross-tabulation with significant CT findings in the CART analysis (left tubal thickening and appendiceal diameter) for the pelvic inflammatory disease (PID) and acute appendicitis (AA) groups
Fig. 6

CT decision tree with the CART algorithm to differentiate acute appendicitis (AA) from pelvic inflammatory disease (PID)

Only one (0.5 %) of the 218 AA patients had an appendiceal diameter < 7 mm. In this patient, the appendix measured 6 mm but was fluid-filled with periappendiceal fat stranding, whereas tubal thickening and anterior pelvic fat stranding were absent.

Ninety-one of the 109 PID patients (83.5 %) had an appendiceal diameter < 7 mm and, among those, 52 (57 %) had marked left tubal thickening (Fig. 3).

Among the 18 PID patients (16.5 %) with an appendiceal diameter ≥ 7 mm, nine (50 %) had a left tubal diameter ≥ 10 mm (Fig. 4) and nine (50 %) had a left tubal diameter < 10 mm.

Five PID patients (5 %) had normal CT scans, i.e. none of the investigated CT signs was present.

Reproducibility

The interobserver agreement for the CT findings is detailed in Table 2. The correlation coefficient indicated almost perfect agreement for the appendiceal diameter. Agreement was substantial for periappendiceal fat stranding, appendix contents, right and left tubal thickening, anterior pelvic fat stranding and free pelvic fluid; moderate for the other CT findings; and slight for thickening of the uterosacral ligaments.

Discussion

Our results show that two CT findings alone (appendiceal diameter and left tubal thickening) differentiated PID from AA very accurately. Consequently, the first step in interpreting abdominopelvic CT in non-appendectomised women of reproductive age with acute lower abdominal pain is to measure the appendiceal diameter. If the appendiceal diameter is < 7 mm, AA is very improbable and CT findings of PID should thus be sought. If the appendiceal diameter is ≥ 7 mm, the left fallopian tube should be analysed, since a left tubal diameter ≥ 10 mm would indicate PID rather than AA.
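The two-step reading strategy described above can be written as a short rule. The Python function below is only an illustration of the decision tree's logic, not a validated clinical tool; its name and parameters are hypothetical. Following the Results, a non-visualised appendix (passed as `None`) is treated as having a normal diameter.

```python
def favoured_diagnosis(appendix_diameter_mm, left_tube_diameter_mm):
    """Two-step CT rule from the decision tree; returns 'PID' or 'AA'.

    Step 1: appendix < 7 mm (or not visualised) -> AA very improbable,
            so PID is favoured and its CT signs should be sought.
    Step 2: appendix >= 7 mm -> a left tubal diameter >= 10 mm favours PID,
            otherwise AA is favoured.
    """
    if appendix_diameter_mm is None or appendix_diameter_mm < 7:
        return "PID"
    if left_tube_diameter_mm is not None and left_tube_diameter_mm >= 10:
        return "PID"
    return "AA"
```

For example, an 11-mm appendix with a 4-mm left tube points to AA, whereas an 8-mm appendix with a 12-mm left tube points to PID.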

To our knowledge, this is the largest consecutive cohort of acute PID investigated by contrast-enhanced CT. Two previous studies assessed CT findings in acute mild PID [12, 13], including 48 and 32 patients in the PID group, respectively. Our study differed from these previous studies in using a homogeneous comparison group of AA patients. The specific investigation of CT signs differentiating acute PID from AA is justified because PID is the principal differential diagnosis of AA in young women with acute lower abdominal pain and inflammatory syndrome [24]. Because all AA patients had undergone laparoscopic appendectomy, a strength of our study design is that PID was surgically excluded in all patients of the AA group, which substantially limits classification bias. In contrast, most control subjects in previous studies had not undergone any reference test able to reliably exclude PID and were diagnosed through clinical follow-up. Moreover, the diagnostic reference standard of PID in the previous studies was based on nonspecific clinical findings and serum markers of inflammation, and laboratory documentation of a cervical infection with Neisseria gonorrhoeae or Chlamydia trachomatis was optional. To enhance the specificity of the diagnostic reference standard, the PID diagnosis in the present study was based on either surgical confirmation or microbiological confirmation of the causative organism in the presence of an inflammatory syndrome. Furthermore, all cases of AA were confirmed by pathological examination of the removed appendix. Appendiceal serositis, as opposed to AA, is an inflammatory reaction of the appendiceal surface caused by an extra-appendiceal source of inflammation, and cases of PID-associated appendiceal serositis have been reported [25]. Patients with pathological findings of appendiceal serositis were not eligible for the AA group in the present study. Consequently, the appendiceal origin of the inflammatory process was confirmed in all AA patients and none of the AA cases could have been secondary to PID, thus avoiding classification bias.

We built a CT decision tree for PID/AA differentiation in order to integrate our results into clinical practice. CT interpretation can be difficult in young women because of the frequently small amount of intra-abdominal fat. Whereas CT differentiation between right-sided TOA and periappendiceal abscess has been investigated in previous studies [19, 20], to our knowledge the CT signs for differential diagnosis between consecutive PID patients and an age-matched AA group have never been analysed.

We demonstrated that the first step in CT interpretation in women of reproductive age with acute lower abdominal pain should be to measure the appendiceal diameter. In our study population, the most reliable appendiceal diameter cutoff was 7 mm, which is the generally accepted cutoff for the CT diagnosis of AA [14, 15, 17, 26]. However, normal appendiceal diameters of up to 10 mm have been reported [15, 16, 18]. In the study of Yves et al., a diameter threshold of 8–9 mm was the most effective for maximising sensitivity and specificity [16]. The relatively low appendiceal diameter threshold in our study might be related to a high percentage of mild appendicitis, as reflected by the low proportion of complicated AA in the AA group (nine cases, 4.1 %). Consistent with previous studies showing that non-visualisation of the appendix reliably excludes AA [27, 28], none of the five patients with a non-visualised appendix had AA in the present study.

Jung et al. demonstrated that thickened fallopian tubes were a predictive sign of PID, with sensitivity and specificity of up to 58 % and 92 %, respectively [12]. We further distinguished moderately thickened tubes from markedly thickened or fluid-filled tubes and found that marked tubal thickening was more strongly associated with acute PID than moderate tubal thickening. In the present study, left tubal thickening was the second discriminating criterion, after the appendiceal diameter, for the differential diagnosis between AA and acute PID. The frequency of left tubal thickening in acute PID was higher than in the study of Jung et al., possibly because the expert radiologists reviewing our cases detected subtle tubal thickening. However, no tubal thickening was identified on CT in 13 % of PID cases. This might occur in cases of endometritis without concomitant salpingitis. Since our clinical and microbiological diagnostic reference standard did not allow these two entities to be differentiated, our study sample may have included cases of endometritis without salpingitis, which represents an early stage of acute PID. The present study further demonstrated that fallopian tube thickening was not entirely specific for acute PID. Pelvic inflammatory processes of other origins, particularly pelvic appendicitis, can cause tubal thickening, especially on the right side, through spread of the inflammatory process from the appendix to the adjacent mesosalpinx. These considerations probably explain why right tubal thickening (moderate or marked), seen in 11 % of AA patients in the present study, was a less reliable predictor in the CART analysis than left tubal thickening (seen in 7 % of AA patients) for differentiating acute PID from AA.

In line with previous studies [12, 13], the present study demonstrated that anterior pelvic fat stranding, uterine serosal enhancement and pelvic peritoneal enhancement were significantly associated with PID. However, these signs represent nonspecific inflammatory changes in the pelvic fat and peritoneum, which probably explains why they were not significant predictors in the CT decision-tree analysis for differentiating PID from AA.

In the present study, the presence of an IUD was significantly associated with PID. However, this result may reflect selection bias: PID in the presence of an IUD is more often complicated by abscess formation than PID without an IUD, so PID patients with an IUD were more likely to undergo CT and thus to be included in our study. Consequently, we do not consider the presence of an IUD a useful finding for differentiating PID from AA.

Our study had several limitations. First, it was inherently limited by its retrospective design. Second, only women who had undergone CT examinations were enrolled in this retrospective study. This might have resulted in selection bias because more symptomatic women or more clinically difficult cases were more likely to be imaged with CT.

Third, we did not investigate CT in healthy women as a ‘control group’ because we considered that: (1) it did not correspond to a clinical question in healthy women; and (2) vaginal or endocervical microbiological sampling and bacteriological results would have been required for all women in order to exclude PID. Indeed, PID must be actively excluded because it can be paucisymptomatic [1, 3, 4]. Hence, we chose to investigate CT findings that specifically differentiate PID from AA, because: (1) this is the main clinical differential diagnosis in women of childbearing age with acute lower abdominal pain and inflammatory syndrome; (2) it can be a diagnostic challenge on CT, especially in thin women and/or when the appendix is located in the pelvic region; and (3) PID was reliably excluded in our study since surgery was performed in all patients of the AA group. For the same reason, we did not report the diagnostic performance of CT findings for PID, which would not have been meaningful because the ‘control’ group consisted of ill rather than healthy patients.

Fourth, the radiologists’ awareness that all CT examinations involved patients with either PID or AA might have introduced interpretation bias. When a pathological appendix was apparent, the reviewers might have underestimated PID-associated CT findings such as tubal thickening and pelvic fat stranding; conversely, when a normal appendix was visible, they could infer that the patient had PID and might thus have overestimated tubal thickening and pelvic fat stranding.

Finally, we are aware that the case–control design of our study skewed the predictive values of the CT findings because of the high proportion of patients with PID in our population (a 2:1 ratio of AA to PID), which does not reflect the prevalence of these two conditions in daily practice. However, this is unlikely to have biased our main comparative analysis.
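The dependence of predictive values on prevalence follows from Bayes' theorem: for a fixed sensitivity and specificity, the positive predictive value falls as the condition becomes rarer. The Python sketch below makes this concrete. The sensitivity (0.90), specificity (0.95) and the 10 % prevalence are purely hypothetical values chosen for illustration, not results of this study; only the one-third PID prevalence (109 of 327 patients) comes from the study population.

```python
def predictive_values(sensitivity, specificity, prevalence):
    """Positive and negative predictive values via Bayes' theorem."""
    tp = sensitivity * prevalence                # true positives
    fp = (1 - specificity) * (1 - prevalence)    # false positives
    fn = (1 - sensitivity) * prevalence          # false negatives
    tn = specificity * (1 - prevalence)          # true negatives
    return tp / (tp + fp), tn / (tn + fn)


# Same hypothetical test characteristics at two prevalences of PID.
ppv_study, npv_study = predictive_values(0.90, 0.95, 109 / 327)  # 1 in 3, as in the study sample
ppv_rarer, npv_rarer = predictive_values(0.90, 0.95, 0.10)       # hypothetical lower prevalence
```

With these assumed values, the positive predictive value drops from 0.90 to about 0.67 when prevalence falls from one third to 10 %, while the negative predictive value rises.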

In conclusion, in women of childbearing age with acute lower abdominal pain, the present study demonstrated that appendiceal diameter and left tubal thickening were the most discriminating CT criteria for differentiating acute PID from AA, with excellent accuracy.