Introduction

Acute appendicitis (AA) is the most common surgical emergency in children [1]. Over a 1-year period, one child out of 1,000 will undergo emergency appendicectomy [2]. AA is one of the few surgical diagnoses that is made clinically, and appendicectomy remains an operation that is performed without certainty of the definitive diagnosis. Despite a recent increase in knowledge concerning AA, diagnostic accuracy has remained suboptimal. Diagnostic imprecision is reflected in the high appendiceal perforation rate of 17–33% [1, 3] and in the unnecessary appendicectomy rate of 3–54% [4–6].

Clinical scoring systems for adults have been developed to increase the diagnostic accuracy and decrease the unnecessary appendicectomy rate. Some developers of the diagnostic scores have suggested a decrease of unnecessary appendicectomy rate of up to 50% [7–10]. However, diagnostic scores abstracted from adults’ data have not been found to be useful in children [8, 11], and only a few studies have addressed the issue of a clinical scoring system unique to children with suspected AA [12, 13]. To our knowledge, no score has been evaluated in prospective clinical trials in children.

We have previously constructed and validated a diagnostic score for AA in children [13]. The purpose of this prospective controlled study was to determine whether diagnosis by using the appendicitis score improves clinical outcomes for children with suspected appendicitis. The main outcome measures were the diagnostic accuracy (primary endpoint) including unnecessary appendicectomies, duration of hospital stay and adverse events, including delayed treatment in association with perforation.

Patients and methods

Patients

The study was conducted at Kuopio University Hospital between January 2005 and January 2007. The trial was approved by the Local Ethics Committee and was conducted in accordance with the Declaration of Helsinki. Children aged 4–15 years presenting at the Emergency Department with suspected AA were included. The diagnostic criteria of AA were those set by the World Organization of Gastroenterology Research Committee [14]. Patients with a history of previous appendicectomy and those with abdominal trauma or hernia were excluded. Each child could enter the study only once. Children who were old enough gave their assent, and parents gave written informed consent.

The appendicitis score

The appendicitis score was constructed and validated in our previous study [13]. The scoring model included nine variables for predicting AA: gender (male 2 points, female 0 points), intensity of pain (severe 2 points, mild or moderate 0 points), relocation of pain (yes 4 points, no 0 points), pain in the right-lower abdominal quadrant (RLQ) (yes 4 points, no 0 points), vomiting (yes 2 points, no 0 points), body temperature (≥37.5°C 3 points, <37.5°C 0 points), guarding (yes 4 points, no 0 points), bowel sounds (absent, tinkling or high-pitched 4 points, normal 0 points) and rebound tenderness (yes 7 points, no 0 points). The appendicitis score has a minimum of 0 points and a maximum of 32 points, and it is used to predict the presence or absence of AA. The cutoff level for AA is ≥21 points, which corresponds to a high probability of AA, and the cutoff level for non-AA is ≤15 points, at which the probability of AA is low. Therefore, by choosing the two cutoff points in the score, the patients could be divided into three groups: non-AA group (low probability of AA, amenable to discharge), observation group (intermediate probability of AA, necessitating further observation) and AA group (high probability of AA, justifying emergency appendicectomy).
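The scoring rule described above can be illustrated with a minimal Python sketch. The point values and the two cutoffs are taken from the text; the class, field and function names are hypothetical and chosen only for illustration, and the sketch is not the software (if any) used in the study.

```python
from dataclasses import dataclass


@dataclass
class Findings:
    """Clinical findings for one examination (field names are illustrative)."""
    male: bool
    severe_pain: bool
    pain_relocation: bool
    rlq_pain: bool
    vomiting: bool
    temperature_c: float
    guarding: bool
    abnormal_bowel_sounds: bool  # absent, tinkling or high-pitched
    rebound_tenderness: bool


def appendicitis_score(f: Findings) -> int:
    """Sum the nine weighted variables; the result ranges from 0 to 32."""
    return (
        2 * f.male
        + 2 * f.severe_pain
        + 4 * f.pain_relocation
        + 4 * f.rlq_pain
        + 2 * f.vomiting
        + 3 * (f.temperature_c >= 37.5)
        + 4 * f.guarding
        + 4 * f.abnormal_bowel_sounds
        + 7 * f.rebound_tenderness
    )


def risk_group(score: int) -> str:
    """Map a score to the three groups defined by the two cutoffs."""
    if score >= 21:
        return "AA"
    if score <= 15:
        return "non-AA"
    return "observation"


# Hypothetical patient: boy with RLQ pain, fever and rebound tenderness.
example = Findings(True, False, False, True, False, 37.8, False, False, True)
print(appendicitis_score(example), risk_group(appendicitis_score(example)))
```

For this hypothetical patient the sum is 2 + 4 + 3 + 7 = 16 points, which falls in the observation group.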

Study design

In the present study, a randomised parallel group and prospective study design was used to evaluate the use of the appendicitis score on diagnostic accuracy and final clinical outcomes in children with suspected AA. The study nurses assessed all children with suspected appendicitis for eligibility. After obtaining written informed consent, the children were assigned to either the appendicitis-score group or the no-score group. Randomisation was performed with consecutively numbered sealed opaque envelopes containing a random number.

All 31 general surgeons working at the Emergency Department were briefed on the abdominal examination technique and the use of the appendicitis score. The attending surgeon indicated a provisional diagnosis (AA, non-specific abdominal pain or other), a differential diagnosis and a provisional disposition (discharge, observation or operation) in all children.

The same surgeon re-examined the child at 3 h after the initial examination. The surgeon again indicated a provisional diagnosis, a differential diagnosis and a provisional disposition in all children. If the diagnosis and final disposition could not be established at 3 h, the patient was re-evaluated at 6 h and, if necessary, at 9 and 12 h. All patients, for whom discharge was decided, were kept in hospital for six, nine or, if necessary, more hours.

In the appendicitis-score group, the decision to operate was based on the use of the diagnostic scoring system. The surgeon recorded the variables of the appendicitis score and calculated the sum of the score for each child. All variables of the score were repeatedly recorded at 3 h and, if necessary, at 6, 9 and 12 h after the initial examination of the patient. Those children who were interpreted as having positive abdominal guarding or rebound test were recommended to be observed/operated on, even if the sum of the appendicitis score was ≤15. In the no-score group, the decision to operate was based on overall clinical assessment and the laboratory tests (C-reactive protein, leucocyte count and urine sample). No imaging studies were used.
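As a rough illustration of how the score and the guarding/rebound safeguard could combine into a recommendation at each re-examination, consider the sketch below. The cutoffs (≥21 and ≤15) come from the description of the score; the function name and the labels are hypothetical. The text recommends that children with guarding or a positive rebound test be "observed/operated on", and this sketch simplifies that to "observe", which is an assumption.

```python
def recommended_disposition(score: int, guarding: bool, rebound: bool) -> str:
    """Suggest a disposition from the score total and two examination signs.

    Assumption: guarding or rebound with a low score maps to "observe";
    the source allows observation or operation in that situation.
    """
    if score >= 21:
        return "operate"
    if score >= 16 or guarding or rebound:
        # Intermediate score, or a low score overridden by guarding/rebound.
        return "observe"
    return "discharge"


# Hypothetical example: a score of 14 with persistent guarding is not discharged.
print(recommended_disposition(14, guarding=True, rebound=False))
```

The same function would simply be called again with the findings recorded at 3 h and, if necessary, at 6, 9 and 12 h.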

All children except four were taken to the paediatric surgery ward and followed up for hospital course, discharge diagnosis and adverse events. Children for whom a definite diagnosis was not obtained were followed up until symptoms resolved spontaneously. The histological criterion for AA was an inflammatory reaction with polymorphonuclear leucocytes in the mucous layer of the appendix and oedema [14]. Patients with no pathologic evidence of appendiceal inflammation were defined as having undergone unnecessary appendicectomies. The appendix was considered perforated if the surgeon’s operative dictation reported perforation. All patients were followed up by a telephone call at 4 weeks.

Outcome measures

The main outcome measures were the diagnostic accuracy (the primary endpoint) and adverse events between the appendicitis-score group and the no-score group. We compared the clinical examination sensitivity (the ability to diagnose AA), specificity (the ability to diagnose non-appendicitis), positive predictive value (the proportion of the patients diagnosed with AA who truly had AA), negative predictive value (the proportion of the patients diagnosed with non-appendicitis who truly did not have AA) and diagnostic accuracy (true cases of AA and true cases of non-appendicitis as a proportion of all results) in the two groups at baseline and at 3, 6, 9 and 12 h after the initial evaluation of the child. Sensitivity was calculated as a/(a + c), specificity as d/(b + d), positive predictive value as a/(a + b), negative predictive value as d/(c + d) and diagnostic accuracy as (a + d)/(a + b + c + d), where a represents true AA, d represents true non-appendicitis, b represents false-positive decisions and c represents false-negative decisions.
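The five measures defined above can be computed from the four cell counts as in the following sketch. The symbols a, b, c and d follow the definitions in the text; the function name and the example counts are hypothetical and are not study data.

```python
def diagnostic_metrics(a: int, b: int, c: int, d: int) -> dict:
    """Compute diagnostic measures from a 2x2 table.

    a: true AA (true positives), b: false-positive decisions,
    c: false-negative decisions, d: true non-appendicitis (true negatives).
    """
    return {
        "sensitivity": a / (a + c),
        "specificity": d / (b + d),
        "ppv": a / (a + b),
        "npv": d / (c + d),
        "accuracy": (a + d) / (a + b + c + d),
    }


# Hypothetical counts, for illustration only.
metrics = diagnostic_metrics(a=40, b=10, c=5, d=45)
for name, value in metrics.items():
    print(f"{name}: {value:.2f}")
```

With these illustrative counts the positive predictive value is 40/50 = 0.80 and the diagnostic accuracy is 85/100 = 0.85.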

Statistical analysis

No formal sample size calculation was performed, but a cohort of 60 patients in each group was considered sufficient to show any significant difference between the two groups. To compare the two study groups, we analysed continuous variables by means of a two-tailed t test for two independent samples. For categorical variables, we used the χ² test and Fisher’s exact test. A two-sided P value ≤0.05 was considered statistically significant. All analyses were performed using a statistical programme (SPSS for Windows 14.0, SPSS, Chicago, IL, USA).

Results

A total of 126 children were enrolled and randomised to either the appendicitis-score group (N = 66) or the no-score group (N = 60). The baseline characteristics of the patients in the two groups were similar in terms of age, gender, weight, height and laboratory findings (Table 1).

Table 1 Patients’ characteristics

The diagnostic accuracy was significantly greater in the appendicitis-score group compared with that in the no-score group (92% vs 80%; P = 0.04). A significantly higher rate of unnecessary appendicectomies was observed in the no-score group compared with that in the appendicitis-score group (29% vs 17%; P = 0.05). Following repeated clinical examination, the diagnostic accuracy was significantly improved in each group, 74% vs 92% in the appendicitis-score group (P = 0.01) and 67% vs 80% in the no-score group (P = 0.01) (Table 2). A post hoc power analysis showed that, with the obtained sample size and effect size, the study had a 50% probability to show a significant difference between the two groups with a two-sided significance level of 0.05.

Table 2 Diagnostic sensitivity, specificity, and accuracy in the appendicitis-score group and in the no-score group. Values are percentages (patients)

A total of 122 children were taken to the hospital ward. Four children in the appendicitis-score group had self-limited abdominal pain, and they were discharged from the ED. Emergency appendicectomy was performed in 67 children, in 29 children in the appendicitis-score group and in 38 in the no-score group. For 59 children, 37 children in the appendicitis-score group and 22 in the no-score group, who did not have AA, the abdominal pain resolved spontaneously before a definitive diagnosis was established. One child in the appendicitis-score group underwent unnecessary appendicectomy and she was postoperatively diagnosed as having right-sided pneumonia. She was operated on despite the fact that the appendicitis score would have suggested discharge.

In total, unnecessary appendicectomy was performed in 5 out of 29 children in the appendicitis-score group and in 11 out of 38 children in the no-score group (diff. 12%, 95% confidence interval of diff: −9% to 33%, P = 0.05). At the time of admission, AA was indicated as a provisional diagnosis in 33 out of 66 children in the appendicitis-score group and in 40 out of 60 children in the no-score group. After the initial examination, the decision to operate was made in 16 children in the appendicitis-score group and in 31 children in the no-score group. Of those 47 children, one out of 16 children in the appendicitis-score group underwent unnecessary appendicectomy compared with eight out of 31 children in the no-score group. Following repeated clinical examination, the provisional diagnosis of AA was changed to the final diagnosis of non-specific abdominal pain in eight children in the appendicitis-score group and in six children in the no-score group. Eight children (four in each group) with final diagnosis of AA were initially diagnosed as having non-specific abdominal pain. All except one were finally operated on for non-perforated appendicitis. One girl in the no-score group was discharged, but she was later re-admitted and operated on for a perforated appendix. One boy in the appendicitis-score group was taken to the hospital ward for suspected AA, but his abdominal pain resolved spontaneously and he was discharged. Two months later, he again experienced acute abdominal pain and, at that time, he was operated on for uncomplicated appendicitis.
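The difference in unnecessary appendicectomy rates reported at the start of this paragraph (5 of 29 versus 11 of 38) can be examined with a simple interval for a difference of two proportions. The sketch below uses a plain Wald (normal-approximation) interval; this is an assumption, since the text does not state which interval method was used, and the function name is hypothetical.

```python
import math


def wald_diff_ci(x1: int, n1: int, x2: int, n2: int, z: float = 1.96):
    """Difference p1 - p2 with a Wald 95% confidence interval.

    Assumption: normal approximation without continuity correction; the
    study may have used a different interval method.
    """
    p1, p2 = x1 / n1, x2 / n2
    diff = p1 - p2
    se = math.sqrt(p1 * (1 - p1) / n1 + p2 * (1 - p2) / n2)
    return diff, diff - z * se, diff + z * se


# No-score group (11 of 38) minus appendicitis-score group (5 of 29).
diff, lower, upper = wald_diff_ci(11, 38, 5, 29)
print(f"difference {diff:.1%}, 95% CI {lower:.1%} to {upper:.1%}")
```

Under this approximation the difference is about 12% with an interval of roughly −8% to 32%, close to but not identical with the reported −9% to 33%; the small discrepancy may reflect a different interval method.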

By strictly following the diagnostic guidelines of the appendicitis score, an initial diagnostic accuracy of 86% would have been obtained. The appendicitis score would have initially suggested discharge in three children with eventual AA and operation in two children with non-specific abdominal pain as their final diagnosis. The final appendicitis score would have yielded a diagnostic accuracy of 88%. By repeated application of the appendicitis score, three children with non-specific abdominal pain would have been operated on, and six children with non-perforated appendicitis would have been observed.

There were no differences between the appendicitis-score group and the no-score group in terms of hospital stay, histological findings of the appendices, rate of complications and final diagnosis. There was no mortality, and all patients healed eventually. One patient in each group had a superficial postoperative wound infection, which was cured by local debridement (Table 3).

Table 3 Outcomes between the two groups

Discussion

In this randomised clinical trial, the previously constructed and validated appendicitis score was used as a diagnostic aid to differentiate AA from non-surgical abdominal pain. Diagnostic accuracy after the repeated use of the appendicitis score was significantly improved compared with that obtained by unstructured clinical examination. Clinical examination sensitivity was 100% in the appendicitis-score group, and no child with AA was discharged before definitive treatment. Moreover, the rate of unnecessary appendicectomy was significantly reduced in the appendicitis-score group compared with that in the no-score group.

In the present study, no imaging techniques were used. However, imaging techniques such as ultrasonography and computed tomography have been suggested to increase the diagnostic accuracy and decrease the rate of unnecessary appendicectomy in children. Three prospective studies using ultrasonography documented a sensitivity of 88–93% and specificity of 96–97% in diagnosing AA [15–17], and the authors of these studies recommended the use of ultrasonography as an adjunct in equivocal cases. In contrast, two studies have shown that there would be no role for ultrasonography where clinical evidence is convincing, given the known false-negative rate of ultrasonography and the knowledge that the technique may delay surgical treatment [18]. The main disadvantage is that ultrasonographic examination is operator-dependent, and thus the technique requires considerable training and experience [19]. Two retrospective studies of focused helical computed tomography suggested sensitivity of 95–97% in diagnosing AA [20, 21]. Disadvantages of computed tomography include potential anaphylactoid reaction if intravenous contrast is used and radiation exposure. Recent reports have shown that computed tomography in children is associated with a one in 1,000 risk of malignancy developing in later life [22, 23].

Several authors have created diagnostic scoring systems in which a finite number of clinical variables is elicited from the patient and each is given a numerical value [7, 8, 12, 24, 25]. The sum of these values has been used to predict the likelihood of AA. All developers of the diagnostic scores have reported promising results in adults, and some have suggested a decrease of the unnecessary appendicectomy rate of up to 50% [7, 8]. Diagnostic scores have been applied to children with varying success. In one prospective study, the use of the Alvarado score [8] decreased the false-positive appendicectomy rate from 44% to 14% [10]. Dado and co-workers [26] tested retrospectively a modified Lindberg’s score [24] and showed that the scoring system could have reduced unnecessary surgery from 23% to 8%. Similarly, in the present study, the unnecessary appendicectomy rate was 29% in the control group, compared to 17% among those children where the appendicitis score was applied.

To our knowledge, only a few authors have addressed the issue of a diagnostic score unique to children with suspected appendicitis [12, 13]. Samuel [12] constructed a diagnostic score comprising eight variables: cough–percussion–hopping tenderness in the RLQ, anorexia, pyrexia, nausea, tenderness in the RLQ, leucocytosis, polymorphonuclear neutrophilia and relocation of pain. The author validated the diagnostic score in a separate test sample, resulting in a sensitivity of 100%, a specificity of 87%, a positive predictive value of 90% and a negative predictive value of 100% [12]. Schneider and co-workers [27] evaluated the performance of the Samuel score in a prospectively identified paediatric cohort and reported a sensitivity of 82%, specificity of 65%, negative predictive value of 88% and positive predictive value of 54%. The authors concluded that the score could not be used in clinical practice as the sole method for determination of the need for appendicectomy.

In the present study, the diagnostic score comprised nine variables. In contrast to other scores [7, 8, 12, 24, 25], no laboratory test was included in the appendicitis score. Therefore, we were able to use the score as a diagnostic aid repeatedly at 3 h and, if necessary, at 6, 9 and 12 h after the initial examination of the child. Diagnostic accuracy was improved from 80% in the control group to 92% with the repeated use of the appendicitis score. On the other hand, this improvement may have been related to the fact that structured history and data collection sheets were used repeatedly in the appendicitis-score group. The attending surgeons were able to focus on the most important symptoms and signs that are indicative of AA. It is known that the diagnosis of AA may not become clear, in a minority of patients, until some hours, or even days, after the onset of symptoms, and delay often ensues before a definitive diagnosis is established [28].

We observed an extraordinarily high rate of unnecessary appendicectomies (29%) in the no-score group. This finding is related to the difficulty in diagnosing AA in children. Accurate information can be elicited in older children, but children under 10 years old cannot properly express themselves and this inability to verbally describe symptoms may lead to incorrect interpretation of the clinical signs [4]. Furthermore, there exist several non-surgical abdominal conditions mimicking appendicitis, and in the first few hours of acute abdominal pain, it can be difficult to distinguish children who have AA from those who do not have AA. It is also known that the diagnosis of AA is difficult in adolescent girls because gynaecological problems, such as ovarian cysts and dysmenorrhea, can produce symptoms that mimic appendicitis [14]. The high rate of unnecessary appendicectomies in the no-score group might also be explained by the fact that general surgeons in our hospital system mainly treat adults.

There were five unnecessary appendicectomies in the appendicitis-score group and 11 in the no-score group. One child in the appendicitis-score group underwent unnecessary appendicectomy despite the fact that the sum of the appendicitis score was 14 points, indicating discharge. The child was interpreted as having persistent guarding in the RLQ, and she was operated on for suspected appendicitis. After the operation, the child developed respiratory symptoms and she was diagnosed as having right-sided pneumonia. One child in the appendicitis-score group and eight children in the no-score group underwent unnecessary appendicectomy after the initial clinical examination. Moreover, there were no cases of missed appendicitis in the appendicitis-score group, compared with one case in the no-score group. This difference may reflect the fact that we have recommended those children with the presence of abdominal pain in the RLQ, rebound or guarding to be observed in the hospital, even if the sum of the appendicitis score is ≤15.

In conclusion, this study shows that the use of the appendicitis score can reduce the unnecessary appendicectomy rate among general surgeons treating children with suspected appendicitis. The appendicitis score can be used as a diagnostic aid, but it cannot supplant detailed clinical judgment. The appendicitis score can be integrated into the diagnostic process, in which children with equivocal diagnosis are re-assessed at certain intervals.