Introduction

Suspected acute appendicitis is the most common reason for urgent laparotomies in children. One child in 500 will undergo emergency appendicectomy annually [1]. Acute appendicitis is one of the few surgical diagnoses that is made clinically, and appendicectomy remains an operation that is often performed without certainty of the diagnosis. Failure to diagnose and treat the condition can lead to progression of the disease, with associated morbidity and mortality. Delayed management of appendicitis is associated with prolonged hospitalization, a delayed return to normal life, and increased rates of perforation (34%–75%) [2–4], wound infection (0%–11%) [5–7], pelvic abscess (1%–5%) [5–7], and late intra-abdominal adhesions. On the other hand, 10%–30% of all patients undergo surgery unnecessarily on the basis of a false-positive diagnosis of appendicitis [6, 7].

Clinical and computer-aided scoring systems have been shown to increase diagnostic accuracy and reduce unnecessary appendicectomies in adults [8–13]. However, diagnostic scores derived from adult data have not been found useful in children [8, 12], and, to our knowledge, only one study has addressed the issue of a prognostic scoring system unique to children with suspected appendicitis [14].

The present study was undertaken to create a prognostic score to improve diagnostic accuracy and to minimize unnecessary appendicectomies in children with suspected appendicitis. The score was constructed from a prospectively collected sample and further validated in a separate, prospective cohort. The unnecessary appendicectomy rate, potential perforation rate, missed perforation rate, and missed appendicitis rate for the model were calculated and compared with those of clinicians.

Patients and methods

Patients

The study was conducted at Kuopio University Hospital, Kuopio, Finland. Children aged 4–15 years who presented at the Emergency Department (ED) with suspected acute appendicitis were included. The diagnostic criteria of acute appendicitis were those set by the World Organization of Gastroenterology Research Committee [15, 16]. Patients with a history of previous appendicectomy and those with abdominal trauma or obvious hernia were excluded. The attending surgeon decided which children had suspected acute appendicitis.

In the first phase, 35 items of clinical data in 131 consecutive patients were prospectively recorded between December 1999 and November 2000 (Table 1). Four children were excluded for having surgical conditions other than appendicitis. The appendicitis score was constructed and the cut-off points determined for the presence and absence of appendicitis as the final diagnosis.

Table 1 Patients’ characteristics. Data are number of cases or mean (SD)

In the second phase the performance of the score was prospectively assessed on 109 non-consecutive children who presented between December 2001 and December 2003 (Table 1). The results of the scoring system were compared with the operative and histological findings and clinical outcome. The study was approved by the local ethics committee and was conducted in accordance with the Declaration of Helsinki. This report is part of our “Acute Abdomen in Children” study, and some results have already been published [17].

Construction of the appendicitis score

Altogether, 35 history variables and clinical findings were recorded on a predefined structured data sheet based on the modified abdominal pain chart of the World Society of Gastroenterology [15, 16]. To facilitate data analysis, and for ease of comparison between the two groups, we dichotomized the multinomial and continuous variables. We used the χ2 test and Fisher’s exact test to compare groups with and without appendicitis (SPSS for Windows 10.0, SPSS, Chicago, USA). A P value ≤0.05 was considered statistically significant. Fifteen of the 35 variables analysed had no prognostic significance in differentiating between acute appendicitis and non-appendicitis (Table 2), and the menstrual period variable was not included in the final calculation. The remaining 19 variables were therefore entered into a backward stepwise binary logistic regression analysis, which resulted in a model that included nine variables: gender, intensity of pain, relocation of pain, vomiting, fever, pain in the right lower quadrant (RLQ), guarding, bowel sounds, and rebound tenderness. According to the model, the probability of acute appendicitis (PAA) for an individual patient can be calculated as PAA = 1/(1 + exp(−z)), in which z = gender (male 1.6, female 0) + intensity of pain (severe 2.4, mild or moderate 0) + relocation of pain (yes 3.6, no 0) + pain in the RLQ (yes 3.9, no 0) + vomiting (yes 1.8, no 0) + fever (yes 3.0, no 0) + guarding (yes 3.5, no 0) + abnormal bowel sounds (yes 4.1, no 0) + rebound tenderness (yes 6.6, no 0) − 17.7 (constant). Rounding the coefficients of the model to the nearest integer yielded the appendicitis score (Table 3).
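The calculation above can be sketched in Python. This is a minimal illustration, not the authors' software (they used SPSS): the coefficients and the constant are those given in the text, while the dictionary keys and function names are hypothetical labels. Each finding is coded as a boolean, matching the dichotomized variables. The integer points assume rounding half up (so 3.5 becomes 4), which reproduces the stated 0–32 range; Table 3 remains the authoritative source for the points.

```python
import math

# Logistic regression coefficients as reported in the text.
# The keys are hypothetical labels for the nine dichotomized variables.
COEFFICIENTS = {
    "male": 1.6,
    "severe_pain": 2.4,
    "relocation_of_pain": 3.6,
    "rlq_pain": 3.9,
    "vomiting": 1.8,
    "fever": 3.0,
    "guarding": 3.5,
    "abnormal_bowel_sounds": 4.1,
    "rebound_tenderness": 6.6,
}
CONSTANT = -17.7


def probability_of_appendicitis(findings: dict[str, bool]) -> float:
    """PAA = 1 / (1 + exp(-z)); z adds the coefficient of each positive finding."""
    z = CONSTANT + sum(
        coef for name, coef in COEFFICIENTS.items() if findings.get(name, False)
    )
    return 1.0 / (1.0 + math.exp(-z))


# Assumed points: each coefficient rounded half up to the nearest integer.
# floor(x + 0.5) is used instead of round(), which rounds halves to even.
POINTS = {name: math.floor(coef + 0.5) for name, coef in COEFFICIENTS.items()}


def appendicitis_score(findings: dict[str, bool]) -> int:
    """Sum of the integer points for every positive finding (0 to 32)."""
    return sum(p for name, p in POINTS.items() if findings.get(name, False))
```

For example, a girl with only pain in the RLQ, guarding, and rebound tenderness scores 4 + 4 + 7 = 15 points, the same END SCORE as the child described in the Discussion.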

Table 2 Variables with no prognostic significance for differentiating between appendicitis and non-appendicitis. Data are presented as number (%) of cases
Table 3 Appendicitis score. Numbers are presented as points

The score ranged from a minimum of zero to a maximum of 32 points. The cut-off level for acute appendicitis (AA) was ≥21, which corresponded to an appendicitis probability of 100%, and the cut-off level for non-appendicitis (NA) was ≤15, at which the probability of appendicitis was zero. Using these two cut-off points, the children could be divided into three groups: the NA group (low probability of appendicitis, amenable to discharge); the observation group (intermediate probability of appendicitis, necessitating further observation); and the AA group (high probability of appendicitis, justifying emergency laparotomy).
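The three-way classification by the two cut-off points can be expressed as a small function. This is a sketch that assumes the score has already been computed as an integer from 0 to 32; the `Group` enum and the `triage` name are illustrative only.

```python
from enum import Enum


class Group(Enum):
    NA = "low probability: amenable to discharge"
    OBSERVATION = "intermediate probability: further observation"
    AA = "high probability: emergency laparotomy"


NA_MAX = 15  # scores at or below this fall into the NA group
AA_MIN = 21  # scores at or above this fall into the AA group


def triage(score: int) -> Group:
    """Assign a child to one of the three groups from the appendicitis score."""
    if not 0 <= score <= 32:
        raise ValueError(f"score must lie between 0 and 32, got {score}")
    if score >= AA_MIN:
        return Group.AA
    if score <= NA_MAX:
        return Group.NA
    return Group.OBSERVATION  # 16 to 20 points
```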

Validation of the appendicitis score

The appendicitis score was further assessed in 109 patients who presented at the ED with abdominal pain suggestive of appendicitis. The variables of the score were recorded by the attending surgeon at the time of admission (INITIAL SCORE) and 1 h after the first examination (END SCORE). The surgeon was not asked to express any probabilities but only to record the clinical data and state what he considered to be the most likely diagnosis (AA or NA). The decision to operate was based on overall clinical assessment and the laboratory tests (C-reactive protein, leukocyte count and urine sample).

The results of the scoring system were compared with the final diagnosis based on the operative and histological findings and clinical outcome. The criteria for rates of unnecessary appendicectomy, potential perforation, missed perforation, and missed appendicitis were determined. These performance criteria were used to assess the diagnostic accuracy of the appendicitis score:

  • Unnecessary appendicectomy rate was defined as the proportion of patients who did not have AA but who were assigned to the operation group. The rate was calculated as the number of patients with NA assigned to the operation group/number of patients in the operation group.

  • Potential perforation rate was defined as the proportion of patients with AA not assigned to the AA group. The rate was calculated as the number of patients with AA not assigned to the operation group/number of patients with AA.

  • Missed perforation rate was defined as the proportion of patients with a perforated appendix not assigned to the AA group. The rate was calculated as the number of patients with a perforated appendix not assigned to the operation group/number of patients with a perforated appendix.

  • Missed appendicitis rate was defined as the proportion of patients with AA assigned to discharge. The rate was calculated as the number of patients with AA assigned to the NA group/number of patients with AA.
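The four definitions above can be computed from a list of classified children. In this sketch the `Child` record and its field names are hypothetical; the operation group is taken to be the AA group, and each rate returns NaN when its denominator is zero instead of raising an error.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Child:
    has_aa: bool      # final diagnosis of acute appendicitis
    perforated: bool  # perforated appendix (only meaningful if has_aa)
    group: str        # assigned group: "NA", "observation" or "AA" (operation)


def _ratio(numerator: int, denominator: int) -> float:
    return numerator / denominator if denominator else float("nan")


def performance(children: list[Child]) -> dict[str, float]:
    """Return the four performance rates as proportions (0 to 1)."""
    operated = [c for c in children if c.group == "AA"]
    with_aa = [c for c in children if c.has_aa]
    perforated = [c for c in children if c.perforated]
    return {
        # NA children in the operation group / all children in the operation group
        "unnecessary_appendicectomy": _ratio(sum(not c.has_aa for c in operated), len(operated)),
        # AA children outside the operation group / all AA children
        "potential_perforation": _ratio(sum(c.group != "AA" for c in with_aa), len(with_aa)),
        # perforated appendices outside the operation group / all perforated appendices
        "missed_perforation": _ratio(sum(c.group != "AA" for c in perforated), len(perforated)),
        # AA children assigned to discharge (NA group) / all AA children
        "missed_appendicitis": _ratio(sum(c.group == "NA" for c in with_aa), len(with_aa)),
    }
```

Multiplying each proportion by 100 gives the percentages used in Table 4.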

Results

The validation sample consisted of 109 children (Table 1). Forty appendicectomies were performed on the basis of clinical judgement. One child was initially misdiagnosed as not having appendicitis but was later operated on for a non-perforated appendix. Another child was discharged but was later re-admitted and operated on for a perforated appendix with localized peritonitis. In 79 children who did not have appendicitis, the abdominal pain resolved spontaneously before a definitive diagnosis was made. When three children with other surgical conditions were excluded, the unnecessary appendicectomy rate was 27% (Table 4).

Table 4 Unnecessary appendicectomy rate, potential perforation rate, missed perforation rate, and missed appendicitis rate of the initial score and the end score compared with those of the clinical decision in detecting appendicitis. Values are percentages (numbers)

The classification according to the appendicitis score was compared with the final diagnosis of the children. The INITIAL SCORE would have suggested discharge in four children (15%) with acute appendicitis and appendicectomy in four children (13%) who did not have appendicitis. Twenty-four children, seven with acute appendicitis and 17 without appendicitis, would have been observed. Therefore, the INITIAL SCORE would have resulted in an unnecessary appendicectomy rate of 13%, a potential perforation rate of 41%, a missed perforation rate of 33%, and a missed appendicitis rate of 15% (Table 4).

With repeated application of the appendicitis score (END SCORE), three children (11%) with acute appendicitis would have been discharged, and four children (13%) who did not have appendicitis would have been operated on. Twenty-three children, six with acute appendicitis and 17 without appendicitis, would have been observed. The END SCORE would have yielded an unnecessary appendicectomy rate of 13%, a potential perforation rate of 33%, a missed perforation rate of 0%, and a missed appendicitis rate of 11% (Table 4).

Repeated application of the appendicitis score would have reduced the unnecessary appendicectomy rate from 27% (clinical judgement) to 13% (END SCORE) (Table 4). On the other hand, the appendicitis score would have suggested discharge in three children (11%) with acute appendicitis. All of them, however, had typical tenderness in the RLQ, and none was discharged before definitive management.

The mean (SD) END SCORE was 21 (4.6) in children with appendicitis compared with 12 (5.4) in children without appendicitis (mean difference 9.5, 95% CI 7–12, P=0.001). The END SCORE was ≥21 in all three children with a surgical condition mimicking appendicitis.

Discussion

In the present study, backward stepwise binary logistic regression analysis of 19 medical history and clinical attributes, and laboratory tests, yielded a diagnostic model that comprised six medical history variables and three clinical finding variables. In contrast to most scores [8–11, 18–22], the appendicitis score includes no biochemical test. The scoring system was then tested on the same patient group from which it was derived, and the cut-off levels for recommending surgery, observation, and discharge were defined. The score was validated on a separate, prospectively collected test sample, without the predictions from the appendicitis score being used in clinical decision making. Our results suggest that the appendicitis score may facilitate the diagnosis of acute appendicitis.

Several diagnostic scoring systems have been developed, characterized as non-invasive, user-friendly, cost-effective, and comprehensible to the physician (Table 5) [8–11, 14, 18–23]. Initial assessment studies have reported excellent performance for some of the diagnostic scores [8, 9, 19, 20, 23]. However, the ability of the scores to fulfill standardized performance criteria has varied. Ohmann and co-workers [24] have suggested the following performance criteria for scores in acute appendicitis: an unnecessary appendicectomy rate of less than 15%, a potential perforation rate of less than 35%, a missed perforation rate of less than 15%, and a missed appendicitis rate of less than 5%. Ohmann and co-workers [24] measured the performance of ten scores and found that only the Alvarado score fulfilled all four criteria, while the Lindberg, Fenyö and Christian scores fulfilled two criteria each. However, when applied to a large German database, none of the scores fulfilled any of the performance criteria. Scoring systems are known to perform better in the setting in which they were developed than when tested in different clinical environments [24].
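As an illustration, Ohmann's four thresholds can be checked programmatically. The sketch treats each criterion as a strict upper limit ("less than"), as stated above; the dictionary keys and the `criteria_fulfilled` name are hypothetical. Applied to the END SCORE rates reported in the Results (13%, 33%, 0%, and 11%), it marks the first three criteria as fulfilled and the missed appendicitis criterion as not fulfilled.

```python
# Ohmann and co-workers' performance criteria: upper limits in percent.
OHMANN_CRITERIA = {
    "unnecessary_appendicectomy": 15.0,
    "potential_perforation": 35.0,
    "missed_perforation": 15.0,
    "missed_appendicitis": 5.0,
}


def criteria_fulfilled(rates_percent: dict[str, float]) -> dict[str, bool]:
    """True where a rate is strictly below the corresponding limit."""
    return {name: rates_percent[name] < limit for name, limit in OHMANN_CRITERIA.items()}


# END SCORE rates from the Results section of this study.
END_SCORE_RATES = {
    "unnecessary_appendicectomy": 13,
    "potential_perforation": 33,
    "missed_perforation": 0,
    "missed_appendicitis": 11,
}
print(criteria_fulfilled(END_SCORE_RATES))  # only missed_appendicitis is False
```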

Table 5 Diagnostic scores for appendicitis. Comparison according to performance criteria. Values are percentages (numbers)

The clinical benefit of the scoring system integrated into the diagnostic process has been assessed in one prospective study [25]. The study was performed in two consecutive phases. In the first phase no additional diagnostic support was available, but in the second phase the diagnostic scoring system was used. Ohmann and co-workers [25] found that the diagnostic performance of the final examiner decreased with the score (specificity 86% vs 78%; diagnostic accuracy 88% vs 81%). However, the delayed appendicectomy rate (2% vs 8%) and the delayed discharge rate (11% vs 22%) were significantly lower with the score. The authors concluded that the score could not be recommended as a standard investigation method in diagnosing acute appendicitis.

Most diagnostic scoring systems were originally developed for adult populations, and their application to children has met with varying success. In one prospective study the use of the Alvarado score [8] reduced the false-positive appendicectomy rate from 44% to 14% [26]. Dado and co-workers retrospectively tested a modified Lindberg score [9] and showed that the scoring system could have reduced unnecessary surgery from 23% to 8%, but 8% of children with appendicitis would have been discharged [27]. In contrast to these reports, some authors have claimed that clinical scoring systems do not contain variables that allow appendicitis to be separated from the other conditions that mimic it in children [12, 28].

To our knowledge, only one study has addressed the issue of a diagnostic score unique to children with suspected appendicitis [14]. Madan prospectively assessed 1,170 children with acute abdominal pain suggestive of acute appendicitis and constructed a diagnostic scoring system comprising eight variables. These variables were cough/percussion/hopping tenderness in the RLQ, anorexia, pyrexia, nausea/emesis, tenderness in the RLQ, leukocytosis, polymorphonuclear neutrophilia, and relocation of pain. The predictive score was prospectively validated on 66 children and resulted in a sensitivity of 1.0, a specificity of 0.87, a positive predictive value of 0.90, and a negative predictive value of 1.0. Madan did not report whether the predictions from the scoring system were actually used in clinical decision making [14].

In the present study, application of the score to the separate test sample would have resulted in management errors in three (11%) children with a final diagnosis of simple AA and in unnecessary appendicectomy in four (13%) children who did not have appendicitis. Like the unnecessary appendicectomy rate, our rates of potential perforation (33%) and missed perforation (0%) were within the performance criteria for diagnostic scores suggested by Ohmann and co-workers [24]. One child with an END SCORE of 15 had tenderness in the RLQ and positive rebound and guarding signs. She had three physical findings typical of acute appendicitis, but, in the absence of other signs, her END SCORE remained at 15. Two children with an END SCORE of 7 had fewer symptoms and signs than the other children with AA, and they were initially diagnosed as not having appendicitis. Each of them had tenderness in the RLQ at the ED, but guarding and rebound tenderness did not develop until 4–5 h after admission. On the other hand, the scoring system would have allocated six of the ten children who underwent unnecessary appendicectomy to observation and four to surgery.

Because the nine variables in the appendicitis score also occur in non-surgical conditions, the score cannot be 100% reliable. It is known that in a minority of patients the diagnosis of appendicitis may not become clear until some hours, or even days, after the onset of symptoms, and delay often ensues before an accurate diagnosis is established [10]. Thus, a decision to operate on or discharge the patient cannot be based solely on initial scoring but must also be based on repeated structured clinical examination. Ultrasonography or diagnostic laparoscopy may provide additional aid in diagnosing appendicitis, especially in adolescent girls after menarche.

The appendicitis score can be used as a diagnostic aid, but it cannot supplant careful clinical judgment. The score should be integrated into the diagnostic process, in which children with an uncertain diagnosis are re-assessed, for example, at 3-h intervals. A given cut-off point may fulfill some performance criteria but not others; the performance of the scoring system therefore depends on the choice of cut-off point. The appendicitis score, combined with repeated clinical examination, would probably have reduced our unnecessary appendicectomy rate; therefore, the cut-off level (≥21) for recommending appendicectomy can be considered acceptable. Because pain in the RLQ, rebound tenderness, and guarding are indicative of appendicitis, we recommend that children with these findings be observed in hospital, even if the score is ≤15. We plan to construct decision rules based on the predictions of the scoring system and to test them in a prospective controlled trial. Further testing in different clinical environments is essential to develop the scoring system for clinical use.
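The recommendation above can be combined with the cut-off points into a hypothetical decision rule. This is only a sketch of one reading of the text: it assumes that any one of RLQ pain, rebound tenderness, or guarding is enough to keep a child with a score ≤15 in hospital, and the function name and return labels are illustrative. It is not a validated rule; the authors state that such rules remain to be tested in a prospective trial.

```python
AA_MIN = 21  # recommend appendicectomy at or above this score
NA_MAX = 15  # consider discharge only at or below this score


def recommend(score: int, rlq_pain: bool, rebound: bool, guarding: bool) -> str:
    """Hypothetical rule: score cut-offs plus an override for typical findings."""
    if score >= AA_MIN:
        return "appendicectomy"
    typical_finding = rlq_pain or rebound or guarding
    if score <= NA_MAX and not typical_finding:
        return "discharge"
    # Intermediate score, or a low score with a finding typical of appendicitis:
    # keep the child in hospital and re-score, e.g. at 3-h intervals.
    return "observe and re-assess"
```

Under this rule, the child with an END SCORE of 15 and the two children with an END SCORE of 7 described above, all of whom had RLQ tenderness, would have been kept under observation rather than discharged.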

In conclusion, the appendicitis score, combined with repeated clinical examination, could be of help in the clinical diagnosis of appendicitis to reduce unnecessary appendicectomies in children. Caution and careful surgical judgment are advised for children with a low diagnostic score and physical findings typical of appendicitis.