Introduction

The estimated lifetime incidence of acute appendicitis is approximately 7 % [1]. Despite this, the majority of patients who present with acute right iliac fossa (RIF) pain do not have appendicitis and the scenario continues to pose a diagnostic challenge [2]. The Alvarado Score is the best known clinical prediction rule for estimating risk of appendicitis [3]. It is based on a combination of symptoms, signs and basic laboratory results and has been the subject of many validation studies [4, 5]. Its use in routine clinical practice is varied and limitations including overestimating the risk of appendicitis in women and children have been noted [6].

The Appendicitis Inflammatory Response (AIR) score is based on the same principles as the Alvarado score, assigning patients to low, medium or high probability of acute appendicitis [7]. It incorporates C-reactive protein (CRP) as a variable in the score, a widely available laboratory test that has not shown sufficient sensitivity or specificity to be used as a stand-alone test to predict risk of appendicitis [8]. Initially developed on a retrospective cohort, the AIR score has been the subject of one retrospective validation study that confirmed an improved discriminating power when compared to the Alvarado score [9].

The aim of this study was to evaluate the AIR score and compare its performance in predicting the risk of appendicitis to both the Alvarado score and the clinical impression of a senior surgeon.

Materials and methods

All patients referred to the General Surgical team on call with acute RIF pain from 1st January to 1st July 2013 were included. Data were collected prospectively on a proforma completed by the surgical resident who assessed the patient in the Emergency Department. All parameters included in both the Alvarado and AIR scores were documented (Table 1).

Table 1 Comparison of parameters included in Alvarado and appendicitis inflammatory response (AIR) Scores

A consultant surgeon or senior resident (postgraduate year 5 or greater) was asked to categorise each patient into a low, medium or high probability group for appendicitis based on their initial assessment. The findings of any operative interventions were recorded, and a diagnosis of appendicitis was based on the histological finding of transmural inflammation. Advanced appendicitis was defined as the presence of either transmural gangrene or perforation on final histology.

At present, no clinical scoring system for estimating risk of appendicitis is used within the department. Decisions regarding radiological investigations or surgery are at the discretion of the consultant surgeon in charge of the care of the patient. The Alvarado and AIR scores were calculated and patients stratified into the relevant high, medium and low risk groups at the end of the study period. The original Alvarado score separates those with a likely diagnosis of appendicitis into ‘probably appendicitis (score 7 or 8)’ and ‘highly likely appendicitis (score 9 or 10)’ yet recommended surgery for all with a score of 7 or more [3]. For the purposes of this analysis, any patient with an Alvarado score ≥7 was considered as ‘High probability of appendicitis’ to allow comparison with the AIR score and the clinical judgement, both of which stratified patients into three groups.
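The grouping step described above can be sketched as a simple threshold function. This is an illustration only, not the study's own tooling. The Alvarado high cut-off of 7 comes from the text; the Alvarado medium cut-off of 5 and the AIR cut-offs of 5 and 9 are assumptions taken from common usage of the two scores and are not stated in this paper. The function and constant names are hypothetical.

```python
def stratify(score: int, medium_from: int, high_from: int) -> str:
    """Assign a total score to a low, medium or high probability group."""
    if score >= high_from:
        return "high"
    if score >= medium_from:
        return "medium"
    return "low"


# Alvarado: high cut-off (>= 7) is from the text; medium cut-off is assumed.
ALVARADO_CUTOFFS = {"medium_from": 5, "high_from": 7}
# AIR: both cut-offs are assumptions, not stated in this paper.
AIR_CUTOFFS = {"medium_from": 5, "high_from": 9}

alvarado_group = stratify(7, **ALVARADO_CUTOFFS)  # "high"
air_group = stratify(7, **AIR_CUTOFFS)            # "medium"
```

The same total of 7 falls into different groups under the two sets of cut-offs, which is one reason the proportions assigned to the high probability group can differ between scores.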

Statistical analysis was performed using Minitab statistical software (Minitab Inc, PA, USA). Categorical variables were assessed using Fisher's exact test. A p value of less than 0.05 was considered significant. The diagnostic performances were assessed on calculated values for sensitivity, specificity, positive and negative predictive values as well as area under the receiver operating characteristic (ROC) curves. Differences in sensitivity and specificity were calculated using the McNemar test.
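As a minimal sketch of the measures named above, the following shows how sensitivity, specificity and predictive values follow from a 2×2 table, and how a McNemar test compares two paired classifications through their discordant pairs. The analysis in this study was run in Minitab; this code is not that analysis. The counts are hypothetical and chosen only for illustration, and the exact binomial form of the McNemar test is an assumption, since the text does not say which variant was used.

```python
from math import comb


def diagnostic_metrics(tp: int, fp: int, fn: int, tn: int) -> dict:
    """Standard diagnostic accuracy measures from a 2x2 table."""
    return {
        "sensitivity": tp / (tp + fn),  # diseased patients testing positive
        "specificity": tn / (tn + fp),  # non-diseased patients testing negative
        "ppv": tp / (tp + fp),          # positive tests that are true disease
        "npv": tn / (tn + fn),          # negative tests that are truly disease-free
    }


def mcnemar_exact(b: int, c: int) -> float:
    """Two-sided exact McNemar p value.

    b and c are the discordant pair counts: cases classified correctly
    by one method but not the other, in each direction.
    """
    n = b + c
    if n == 0:
        return 1.0
    k = min(b, c)
    tail = sum(comb(n, i) for i in range(k + 1)) / 2 ** n
    return min(1.0, 2 * tail)


# Hypothetical counts, not taken from the study tables.
metrics = diagnostic_metrics(tp=30, fp=10, fn=20, tn=140)
p_value = mcnemar_exact(b=0, c=5)
```

With these hypothetical counts the sensitivity is 30/50 and the positive predictive value is 30/40, which illustrates how a test can miss many cases yet still be reliable when positive.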

Results

A total of 182 patients with a median age of 19 years were included (Table 2). Appendicitis was the final diagnosis in 37 % of patients. Only 10 patients over the age of 16 did not undergo either radiological imaging or surgery. Of patients proceeding to surgery, 86 % underwent a laparoscopic procedure.

Table 2 Basic demographics and outcomes of study cohort

The Alvarado score assigned a higher proportion of patients to the high probability for appendicitis group (45 %) than the AIR score (14 %) or clinical assessment by a senior surgeon (29 %) (p < 0.001) (Table 3). In those deemed high probability of appendicitis, the AIR score had considerably higher specificity (97 %) and positive predictive value (88 %) than the Alvarado score for predicting a diagnosis of appendicitis (Table 4).

Table 3 Patients grouping and correlation with final diagnosis of appendicitis as stratified by AIR score, Alvarado score and senior surgeon assessment
Table 4 Comparison of sensitivity, specificity, positive and negative predictive values of AIR score, Alvarado score and senior surgeon assessment for predicting acute appendicitis

Assessment by either scoring system or clinical judgement alone assigned similar proportions of patients to the low probability of appendicitis group (p = 0.615). The number of patients in the low risk group who had a final diagnosis of appendicitis was 8 % or less with the scoring systems and with clinical assessment. While the AIR score had the lowest false negative rate, there was no statistical difference (p = 0.784) (Table 3). The percentage of patients who did not have appendicitis and were correctly assigned to the low risk group was 67 % for surgeon assessment, 62 % for the AIR score and 55 % for the Alvarado score (p = 0.164).

No patient with advanced appendicitis was stratified as low probability by either the AIR or Alvarado score. While not of statistical significance (p = 0.502), two patients deemed low probability of acute appendicitis by clinical assessment had a final diagnosis of advanced appendicitis, suggesting that use of a scoring system is at least as good as clinical assessment at identifying those patients with advanced appendicitis (Table 4). All three methods of assessment stratified a similar, high proportion (≥75 %) of those with advanced appendicitis to the high probability of appendicitis group (p = 0.350). Of those deemed high probability of appendicitis by the AIR score, 48 % had a final diagnosis of advanced appendicitis, a figure higher than that for the Alvarado score (19 %) and clinical assessment by a senior surgeon (25 %) (p = 0.012).

Radiological imaging was used as a diagnostic adjunct in 49 % of patients with ultrasound employed twice as frequently as CT. The decision to order imaging was based on clinician assessment and not the AIR or Alvarado score. Those deemed high probability for appendicitis were more likely to undergo a CT (p = 0.003) than those deemed less likely to have appendicitis. Overall, ultrasound had a false negative rate of 14 % for acute appendicitis that did not differ according to surgeon stratified risk of appendicitis.

Comparison of area under the curve (AUC) values from analysis of receiver operating characteristic (ROC) curves (Table 5) shows that the discriminatory capacities of all three methods of assessment are similar, with almost identical ROC curves for the AIR and Alvarado scores when plotted across all data points in each scoring system (AUC: 0.850 AIR Score, 0.84 Alvarado score). The AIR score performed better in children and in men. Assessment by a senior surgeon performed as well as both scoring systems in children and adults and had greater discriminatory capacity when assessing men but showed greatest inaccuracy when assessing females. The Alvarado score had the greatest discriminatory capacity in females.
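For readers less familiar with the AUC, a minimal sketch of how it can be computed from raw scores is given here. It uses the rank-based (Mann-Whitney) interpretation: the AUC is the probability that a randomly chosen patient with appendicitis has a higher score than a randomly chosen patient without, with ties counted as one half. The scores are hypothetical, the function name is illustrative, and this is not the routine used in the study.

```python
def auc_from_scores(scores_with_disease, scores_without_disease):
    """AUC as the proportion of diseased/non-diseased pairs ranked correctly.

    Ties contribute 0.5, which suits integer scores with many equal values.
    """
    wins = 0.0
    for pos in scores_with_disease:
        for neg in scores_without_disease:
            if pos > neg:
                wins += 1.0
            elif pos == neg:
                wins += 0.5
    return wins / (len(scores_with_disease) * len(scores_without_disease))


# Hypothetical total scores, not taken from the study data.
appendicitis = [9, 7, 6, 5]
no_appendicitis = [6, 4, 3, 2]
auc = auc_from_scores(appendicitis, no_appendicitis)
```

An AUC of 0.5 corresponds to no discrimination and 1.0 to perfect separation of the two groups, so values such as those reported in Table 5 can be read on that scale.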

Table 5 Discriminatory capacity of the AIR score, Alvarado score and senior surgeon assessment according to patient age and gender using receiver operating characteristic (ROC) curve analysis

Discussion

This is the first evaluation of the AIR score to compare its ability to estimate risk of appendicitis with the Alvarado score and the clinical impression of the senior assessing surgeon. The results show that both the AIR and Alvarado scores are accurate in ruling out appendicitis in those stratified as low risk with high negative predictive values. The ability of the scoring systems to accurately rule out appendicitis was equal to the clinical judgement of a senior surgeon suggesting that the scoring systems are well placed to be used as a decision support tool for junior surgeons or emergency department doctors when evaluating patients with a low probability of appendicitis who could be safely selected for observation on an out-patient basis.

Important differences do exist between the AIR and Alvarado score when it comes to selecting those at high probability for acute appendicitis. A high AIR score has excellent specificity and positive predictive values that exceed those of the Alvarado score and the clinical impression of a senior surgeon. A 2011 systematic review of the Alvarado score recorded a pooled specificity of 82 % for those with a high Alvarado score for accurate prediction of appendicitis, similar to the 76 % reported here [6]. This low specificity is one factor that may explain the low rates of utilisation of the Alvarado score in clinical practice. These prospective results evaluating the specificity of a high AIR score mirror those of the original study [7] and the validation study published in 2012 [9] that demonstrate a markedly superior specificity for a high AIR score in predicting acute appendicitis. A high AIR score does not identify all patients with appendicitis but in those it does deem high risk it does so with substantially greater accuracy than the Alvarado score with a specificity upwards of 95 %, a level that should allow confidence in the test in a clinical setting.

Many aspects of acute appendicitis have changed in the last 20 years with the advent of minimal access surgery as a diagnostic and a therapeutic procedure as well as increasing use of computed tomography (CT) in achieving a pre-operative diagnosis [10]. CT has a sensitivity rate of over 95 % and its use has helped reduce negative appendectomy rates [11–13]. However, the liberal use of CT has raised concerns regarding unnecessary exposure to ionising radiation [14]. It is unrealistic to expect a clinical scoring system to achieve the same degree of sensitivity and specificity associated with CT. Symptoms and signs associated with appendicitis frequently overlap with other inflammatory conditions such as terminal ileitis, mesenteric adenitis and pelvic conditions that can affect women of child-bearing age.

The low sensitivity of a high AIR score (33 %) is in keeping with that from the two retrospective studies (37 %) and reflects the high proportion of patients with appendicitis stratified to the medium probability group [7, 9]. This need not be construed as a failure of the AIR score and could help rationalise the use of imaging. The AIR score more confidently identifies those patients with a high probability of appendicitis in whom supplemental imaging is unlikely to change management. Selective radiological imaging may be better reserved for patients deemed medium probability from the AIR score.

It is not clear why clinical prediction rules for acute appendicitis have not attained routine clinical use, especially as other areas of medicine have assimilated clinical scoring systems into daily practice. The Wells score for estimating risk of deep venous thrombosis [15] and the CHADS2 scoring system [16] for predicting risk of stroke in patients with atrial fibrillation were both developed within the last 20 years yet are commonly used. The widespread use of smartphone and app technology means that these scoring systems need not be committed to memory and should be easily accessible. The AIR score could be incorporated as a routine diagnostic tool for use by junior clinicians in Emergency Medicine and General Surgery, who are frequently faced with patients with suspected appendicitis but have yet to appreciate some of the nuances learned by senior colleagues through years of experience in assessing such patients. The findings of this study show that a low AIR score is as accurate in excluding appendicitis as, and a high score more accurate at predicting appendicitis than, the clinical assessment of a senior surgeon.

Admittedly, this study was confined to one surgical department (seven consultants and seven senior residents) and the figures for the diagnostic accuracy of clinical judgement cannot be extrapolated to other departments. However, multiple studies have shown that while clinical assessment is not 100 % accurate, it does have a role in estimating risk of appendicitis and that a slide towards routine use of CT for patients with suspected appendicitis is not only unnecessary but also potentially harmful [17, 18].

In conclusion, the AIR score is a reproducible assessment tool that is accurate in excluding appendicitis in those deemed low probability for appendicitis. A high specificity, greater than that of the Alvarado score, makes it well placed to be used as a decision support tool in identifying patients at high probability of appendicitis who should proceed to surgery. A randomised controlled trial should be considered to study the AIR score as grounds for selective use of CT in those deemed medium probability for appendicitis.