Abstract
Background
Clinical prediction rules (CPRs) provide an objective method of assessment in the diagnosis of acute appendicitis. There are a number of available CPRs for the diagnosis of appendicitis, but it is unknown which performs best.
Aim
The aim of this study was to identify what CPRs are available and how they perform when diagnosing appendicitis in adults.
Method
A systematic review was performed in accordance with the PRISMA guidelines. Studies that derived or validated a CPR were included. Their performance was assessed using sensitivity, specificity and area under the receiver operating characteristic curve (AUC) values.
Results
Thirty-four articles were included in this review. Of these, 12 derived a CPR and 22 validated these CPRs. A narrative analysis was performed, as meta-analysis was precluded by study heterogeneity and the quality of included studies. The results from validation studies showed that the overall best performer in terms of sensitivity (92%), specificity (63%) and AUC values (0.84–0.97) was the AIR score, but only a limited number of studies investigated this score. Although the Alvarado and modified Alvarado scores were the most commonly validated, results from these studies were variable. The Alvarado score outperformed the modified Alvarado score in terms of sensitivity, specificity and AUC values.
Conclusion
There are 12 CPRs available for diagnosis of appendicitis in adults. The AIR score appeared to be the best performer and most pragmatic CPR.
Introduction
Appendicitis is one of the most common acute surgical illnesses, with a lifetime prevalence of one in seven [1]. It remains clinically challenging to diagnose because it mimics a variety of other pathologies, especially in females [1]. Diagnosis is usually based on clinical history and examination, correlated with laboratory and imaging investigations. A final diagnosis may require diagnostic laparoscopy, which itself is not without risk.
Clinical prediction rules (CPRs) are one of the most commonly described tools used to aid the diagnosis of appendicitis. CPRs are derived from systematic clinical observations and aim to reduce uncertainty by standardising the collection and interpretation of clinical data [2, 3]. They have been shown to provide a more objective method of assessment and standardisation of care for patients with suspected appendicitis, thereby reducing the number of unnecessary operations and patient exposure to radiation [4]. Although a plethora of CPRs exist for the diagnosis of appendicitis, it is unclear which of these performs most reliably.
The aim of this systematic review was to identify all current CPRs for the diagnosis of appendicitis in adults and assess their performance.
Methods
Search strategy
This study was completed in accordance with the Preferred Reporting Items for Systematic Reviews and Meta-Analyses (PRISMA) statement [5]. A comprehensive literature search was performed in the MEDLINE, EMBASE, PubMed and Cochrane Central Register of Controlled Trials databases from inception to February 2016. The search strategy is outlined in Table 1. Studies were restricted to the English language and to humans only. The reference lists of all included studies and relevant review articles were also searched to identify further potentially eligible manuscripts.
Inclusion and exclusion criteria
Only studies that derived a CPR or validated the impact of a CPR for use in adults presenting with right lower quadrant (RLQ) pain, right iliac fossa (RIF) pain or abdominal pain suspicious of appendicitis were included. For the purposes of this study, a CPR was defined as one that [2, 3, 6]:
- Had three or more predictive variables obtained from the history, physical examination and simple diagnostic tests;
- Provided a probability of an outcome or suggested a diagnostic/therapeutic course of action; and
- Was not a decision analysis, decision tree or practice guideline.
Both CPR derivation and validation studies were included. A derivation study was defined as a study that described the method of how a new CPR was formed and explained how it should be applied in a clinical setting. A validation study assessed performance of an existing CPR by ascertaining the sensitivity, specificity and/or AUC. If derivation studies included an internal validation component, the validation component was excluded from the validation study analysis due to a high risk of potential bias [2].
Exclusion criteria for derivation studies
When assessing articles which derived a CPR, studies that modified an existing scoring system in order to generate a new scoring system were included if the new parameters and cut-off values were clearly defined. There was no restriction on study design. Scores which were derived for use solely in paediatric, elderly, pregnant or single gender populations and those that did not assess the primary outcomes of appendicitis versus non-appendicitis and/or required the use of neural networks were excluded.
Exclusion criteria for validation studies
Studies that validated CPRs in elderly populations or in a single gender only, or that included patients younger than 14 years, were excluded. Studies that looked at only a subset of a scoring system or only at patients who had imaging were also excluded, as were three studies that did not state the age of the participants. Studies that included patients younger than 14 years but reported a separate analysis for adults were included.
Selection of studies
The initial search and title and abstract screening were performed independently by MK and CH. Any discrepancies between the two reviewers were discussed with the senior author, AM. A total of 224 articles were identified as relevant and underwent full-text review by authors MK, CH, ML, WM and LS.
Data extraction and statistical analysis
Derivation studies
Data from studies describing derivation of a CPR were extracted using a standard pro forma. Study characteristics, derivation methodology, scoring system characteristics (e.g. use of weighting, positive versus negative scoring) and the variables comprising the CPR were recorded for each study.
Validation studies
Data from CPR validation studies were also extracted using a standard pro forma. These included study design and the reported sensitivity, specificity, positive predictive value (PPV), negative predictive value (NPV), likelihood ratios and AUC values from receiver operating characteristic (ROC) curve analysis.
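To make the relationship between these indices concrete, the following Python sketch computes them from a two-by-two table of CPR result against final diagnosis. The names `TwoByTwo` and `diagnostic_indices` and the counts in the example are hypothetical; this illustrates the standard definitions and is not the authors' extraction pro forma.

```python
import math
from dataclasses import dataclass


@dataclass(frozen=True)
class TwoByTwo:
    """Counts from a two-by-two table of CPR result against final diagnosis."""
    tp: int  # CPR positive, appendicitis confirmed
    fp: int  # CPR positive, no appendicitis
    fn: int  # CPR negative, appendicitis confirmed
    tn: int  # CPR negative, no appendicitis


def diagnostic_indices(t: TwoByTwo) -> dict[str, float]:
    """Return sensitivity, specificity, PPV, NPV and likelihood ratios.

    Assumes every row and column of the table is non-empty and tn > 0.
    """
    sensitivity = t.tp / (t.tp + t.fn)  # appendicitis cases scored positive
    specificity = t.tn / (t.tn + t.fp)  # non-appendicitis cases scored negative
    # LR+ is unbounded when specificity is 1 (no false positives).
    lr_positive = sensitivity / (1 - specificity) if specificity < 1 else math.inf
    return {
        "sensitivity": sensitivity,
        "specificity": specificity,
        "ppv": t.tp / (t.tp + t.fp),  # P(appendicitis | CPR positive)
        "npv": t.tn / (t.tn + t.fn),  # P(no appendicitis | CPR negative)
        "lr_positive": lr_positive,
        "lr_negative": (1 - sensitivity) / specificity,
    }


if __name__ == "__main__":
    # Hypothetical counts, not taken from any included study.
    table = TwoByTwo(tp=90, fp=30, fn=10, tn=70)
    for name, value in diagnostic_indices(table).items():
        print(f"{name}: {value:.3f}")
```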
When more than two cut-off values were evaluated for predicting a high risk of appendicitis, only the cut-off recommended in the original derivation paper was used for analysis. When sensitivity and specificity were not calculated in the validation studies, they were calculated from the available data using a two-by-two table by author MK and confirmed by YT. Forest plot confidence intervals (CIs) were calculated using the variance method for all studies to minimise bias [7].
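The variance method is not spelled out here. Assuming it refers to the normal-approximation (Wald) interval, p ± z·sqrt(p(1 − p)/n), a minimal Python sketch for a sensitivity estimate might look like the following; the function names and counts are hypothetical, and reference [7] remains the authority on the method actually used. Sensitivity needs only true positives and false negatives, and this interval has zero width whenever the observed proportion is exactly 0 or 1.

```python
import math


def sensitivity_from_counts(tp: int, fn: int) -> float:
    """Sensitivity = TP / (TP + FN); only patients with appendicitis are needed."""
    return tp / (tp + fn)


def wald_interval(successes: int, n: int, z: float = 1.96) -> tuple[float, float]:
    """Normal-approximation (Wald) confidence interval for a proportion.

    Uses the binomial variance p(1 - p)/n and clips the bounds to [0, 1].
    When p is exactly 0 or 1 the variance is zero, so the interval has zero width.
    """
    if n <= 0:
        raise ValueError("n must be positive")
    p = successes / n
    se = math.sqrt(p * (1 - p) / n)
    return max(0.0, p - z * se), min(1.0, p + z * se)


if __name__ == "__main__":
    tp, fn = 90, 10  # hypothetical counts, not study data
    sens = sensitivity_from_counts(tp, fn)
    low, high = wald_interval(tp, tp + fn)
    print(f"sensitivity {sens:.3f} (95% CI {low:.3f} to {high:.3f})")
```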
Assessment of methodological quality of validation studies
The quality of included validation studies was assessed and scored using 15 predefined criteria described by Wasson et al. (Table 2) [3]. These criteria were specifically designed to assess articles describing clinical prediction rules.
Results
Study selection
The initial database search identified 7696 titles, and a further 56 were identified through manual searching. Of these, 4398 were potentially relevant after removal of duplicates and further screening. Following abstract review, 257 papers met the criteria for full-text review. Of these, 12 papers describing derivation of CPRs and 22 describing validation were included. The PRISMA flow diagram is presented in Fig. 1 [5].
Derivation studies
Characteristics of CPRs derived for use in adults with suspected appendicitis demonstrated significant heterogeneity in both study population and methodology (Table 3). Among the discrepancies in methodology was the variation in statistical analyses. Three studies used univariate analysis, while seven studies used multivariate analysis (Table 3) [8–16].
The most commonly incorporated variable was the white cell count, which appeared in all 12 studies (Table 4) [8–18]. Temperature, rebound tenderness and migratory pain were the next most common across all studies (Table 4) [8–11, 16–18]. Studies that used multivariate analysis identified gender, elevated C-reactive protein, RIF pain, neutrophilia, vomiting and signs of peritonism (guarding, rigidity) as predictive variables [9, 10, 13–16]. Rectal tenderness, diarrhoea and Rovsing’s sign were the least commonly used variables and appeared only in CPRs that used univariate analysis [11, 12, 18].
Validation studies
The 22 included validation studies demonstrated heterogeneity with respect to study population, study design and cut-off values evaluated (Table 5). For two of the 22 studies, only AUC values were available for the adult population. A scatter plot of all sensitivity and specificity values, adjusted for sample size, is shown in Fig. 2. A forest plot could be generated only for sensitivity, because the number of true negatives could not be calculated for the majority of studies owing to incomplete follow-up of discharged patients (Fig. 3). As the CIs displayed in the forest plot were calculated using the variance method, the values presented in Fig. 3 may differ from those published in the original studies. No CIs were calculated for the studies by Scott et al. [20] and Erdem et al. [38], as their sensitivity and sample size values were too similar.
The majority of studies had a quality score between six and eight, while only six studies scored ten or more out of fifteen (Table 5). Of these, the two highest-quality studies validated the appendicitis inflammatory response (AIR) and Lintula scores [19, 20].
A general trend showed that at higher cut-off values the specificity of scoring systems improved at the expense of sensitivity. Clinically, this means that CPRs with high cut-off values are better for ruling in a diagnosis of appendicitis, owing to their higher positive predictive value (Table 5; Figs. 2, 3). This was especially apparent for the Alvarado and AIR scores.
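The trade-off between cut-off, sensitivity and specificity can be illustrated with a short Python sketch. The scores below are invented for illustration, and `sens_spec_at_cutoff` is a hypothetical helper that treats a score at or above the cut-off as CPR-positive.

```python
def sens_spec_at_cutoff(
    case_scores: list[int], control_scores: list[int], cutoff: int
) -> tuple[float, float]:
    """Treat a score >= cutoff as CPR-positive; return (sensitivity, specificity)."""
    sensitivity = sum(s >= cutoff for s in case_scores) / len(case_scores)
    specificity = sum(s < cutoff for s in control_scores) / len(control_scores)
    return sensitivity, specificity


if __name__ == "__main__":
    # Hypothetical integer scores; not data from any included study.
    appendicitis = [5, 6, 7, 8, 9]
    no_appendicitis = [2, 3, 4, 5, 7]
    for cutoff in (5, 7, 9):
        sens, spec = sens_spec_at_cutoff(appendicitis, no_appendicitis, cutoff)
        print(f"cut-off {cutoff}: sensitivity {sens:.2f}, specificity {spec:.2f}")
```

On these toy data, raising the cut-off from 5 to 9 lowers sensitivity from 1.00 to 0.20 while specificity rises from 0.60 to 1.00.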
The most commonly validated CPR was the Alvarado score, followed by Kalan’s modified Alvarado score (Figs. 2, 3) [21–37]. AUC values for the Alvarado score ranged from 0.74 to 0.88, higher than the AUC of 0.69 reported for the modified Alvarado score in a single study (Table 5).
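The AUC summarises discrimination across all possible cut-offs. One standard way to obtain the empirical AUC, shown here as a sketch with invented scores rather than as the method used by the included studies, is the Mann–Whitney statistic: the proportion of (appendicitis, non-appendicitis) patient pairs in which the appendicitis patient has the higher score, with ties counted as one half. The function name `auc_mann_whitney` is hypothetical.

```python
from itertools import product


def auc_mann_whitney(case_scores: list[float], control_scores: list[float]) -> float:
    """Empirical ROC AUC: probability that a randomly chosen appendicitis case
    scores higher than a randomly chosen non-appendicitis case (ties count 1/2)."""
    wins = 0.0
    for case, control in product(case_scores, control_scores):
        if case > control:
            wins += 1.0
        elif case == control:
            wins += 0.5
    return wins / (len(case_scores) * len(control_scores))


if __name__ == "__main__":
    # Hypothetical scores; not study data.
    print(auc_mann_whitney([5, 6, 7, 8, 9], [2, 3, 4, 5, 7]))  # 0.88
```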
The sensitivity of the Alvarado score ranged from 67.65 to 96.3% and its specificity from 58.18 to 89.39% when the originally recommended cut-off of seven was used. Similar variability was seen with Kalan’s modified Alvarado score, for which sensitivity ranged from 53.8 to 97.6% and specificity from 28.57 to 80% at the same cut-off value. This variability remained regardless of study quality (Table 5).
The AIR, Raja Isteri Pengiran Anak Saleha Appendicitis (RIPASA), Ohmann, Lintula and Eskelinen scores each had only a single validation study from which sensitivity and specificity could be obtained [19, 20, 38].
The AIR score showed high sensitivity (92%) and moderate specificity (63%) at a cut-off value above five. These values changed to 20% and 97%, respectively, at a cut-off value above eight, which was the original cut-off recommended by the authors. The AUC values reported for this CPR ranged from 0.805 to 0.97, with an average of 0.872 [20, 39, 40].
The Lintula score, which was originally derived for use in children, showed high performance in adults, with a sensitivity of 87% and a specificity of 96% [19]. However, the final score in this study was based on repeated calculations in patients observed as inpatients, whereas other studies reported diagnostic indices based only on scores at admission. No AUC values were available for this CPR.
Erdem et al. validated the Alvarado, RIPASA, Eskelinen and Ohmann CPRs in a single study with a quality score of ten [38]. Although the RIPASA, Eskelinen and Ohmann scores showed superior sensitivity and AUC values compared with the Alvarado score, they showed poor specificity.
Assessment of the pragmatic utility of these scoring systems (Table 5) showed that the modified Alvarado, Alvarado and AIR scores are the most user-friendly CPRs. The use of decimal points and multiple weightings makes the other scores difficult to calculate in a busy clinical setting.
Discussion
There are currently 12 published CPRs available to aid diagnosis of adults presenting with suspected appendicitis. These have been validated in 22 separate studies. The aim of this systematic review was to ascertain which of these available scores performed the best. The heterogeneity of included studies precluded the possibility of performing a meta-analysis. Based on a narrative review, however, it appears the AIR score performs the best.
Assessing the best-performing CPR without meta-analysis meant narratively assessing sensitivity, specificity, AUC values, usability and the quality of available studies. Although the Lintula score performed well in terms of sensitivity and specificity, it is difficult to use in a busy clinical setting, and the comparability of its results remains in question because the final score was based on repeated calculations rather than calculation at a single point in time. While the Eskelinen, RIPASA and Ohmann scores had good sensitivity and AUC values, they are difficult to calculate given the number of variables and range of weightings used. Thus, the overall best performer in terms of study quality, results and usability was the AIR score. It is easy to calculate manually, and all parameters are easy to interpret, except perhaps the recommended subjective grading of rebound tenderness, which requires clinical experience that junior doctors may lack. A score of five or more appears to be a better cut-off than the originally recommended nine or more, as it results in fewer missed diagnoses without a significant reduction in specificity.
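As an illustration of why the AIR score is easy to calculate manually, the sketch below sums its seven items, assuming the item structure published by Andersson and Andersson [9]: vomiting, RIF pain, graded rebound tenderness or muscular defence (0 to 3), fever, and three laboratory markers (polymorphonuclear leucocyte proportion, white cell count, C-reactive protein) each graded 0 to 2, for a maximum of 12 points. To avoid restating clinical thresholds, fever and the laboratory values are taken as already categorised; the category boundaries should be checked against [9]. `AirFindings`, `air_score` and `is_positive` are hypothetical names.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class AirFindings:
    """AIR inputs, already graded into the categories of the derivation paper [9].

    Category boundaries (temperature, laboratory cut-points) are NOT encoded here.
    """
    vomiting: bool
    rif_pain: bool
    rebound_grade: int  # rebound tenderness/muscular defence: 0 none, 1 light, 2 medium, 3 strong
    fever: bool         # temperature meeting the threshold defined in [9]
    pmn_category: int   # polymorphonuclear leucocyte proportion: 0, 1 or 2
    wbc_category: int   # white cell count: 0, 1 or 2
    crp_category: int   # C-reactive protein: 0, 1 or 2


def air_score(f: AirFindings) -> int:
    """Sum the seven AIR items; the maximum possible score is 12."""
    if not 0 <= f.rebound_grade <= 3:
        raise ValueError("rebound_grade must be between 0 and 3")
    for value in (f.pmn_category, f.wbc_category, f.crp_category):
        if not 0 <= value <= 2:
            raise ValueError("laboratory categories must be between 0 and 2")
    return (int(f.vomiting) + int(f.rif_pain) + f.rebound_grade + int(f.fever)
            + f.pmn_category + f.wbc_category + f.crp_category)


def is_positive(score: int, cutoff: int) -> bool:
    """CPR-positive if the score meets the chosen cut-off (e.g. 5 or 9)."""
    return score >= cutoff
```

For example, a patient with vomiting, RIF pain, light rebound tenderness and the lowest non-zero category for both polymorphonuclear proportion and white cell count scores 5, which is positive at a cut-off of five but negative at nine.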
The majority of published validation studies evaluated the Alvarado score and Kalan’s modified Alvarado score, probably because Alvarado was among the first to generate a CPR as a diagnostic aid for appendicitis [8]. Although the Alvarado score is simple to calculate, determining the neutrophil left shift is time-consuming. The available studies showed wide variation in both sensitivity and specificity. This variation became more pronounced as the cut-off value increased and was also attributable to study design (e.g. prospective versus retrospective), differences in the characteristics of the evaluated patients, interpretation of the CPR variables by different clinicians in different settings, and the clinical expertise of those clinicians. While overall sensitivity did not appear to differ much between the Alvarado and modified Alvarado scores, specificity appeared to be lower for the modified Alvarado score [8, 41]. Thus, although the modified Alvarado score is a more user-friendly CPR, removing the neutrophil left shift appeared to increase the number of false positives, making it less accurate than the original CPR [8, 41].
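The effect of Kalan’s modification can be shown with a short Python sketch that assumes the item weights commonly published for the Alvarado (MANTRELS) score [8]: two points each for RIF tenderness and leukocytosis and one point for each of the other six items, with the modified score [41] omitting the one-point left shift. Findings are booleans, so the clinical thresholds that define each item are deliberately not encoded; the dictionary and function names are hypothetical.

```python
# Assumed item weights of the Alvarado (MANTRELS) score [8], maximum 10 points.
# Inputs are booleans: thresholds for "elevated_temperature", "leukocytosis"
# and "left_shift" must be taken from the original paper.
ALVARADO_WEIGHTS: dict[str, int] = {
    "migratory_rif_pain": 1,
    "anorexia": 1,
    "nausea_vomiting": 1,
    "rif_tenderness": 2,
    "rebound_tenderness": 1,
    "elevated_temperature": 1,
    "leukocytosis": 2,
    "left_shift": 1,
}

# Kalan's modified score [41] drops the neutrophil left shift (maximum 9 points).
MODIFIED_ALVARADO_WEIGHTS: dict[str, int] = {
    item: weight for item, weight in ALVARADO_WEIGHTS.items() if item != "left_shift"
}


def weighted_score(findings: dict[str, bool], weights: dict[str, int]) -> int:
    """Add the weight of every scored item recorded as present.

    Items absent from `findings` count as not present; items not in `weights`
    (e.g. left_shift for the modified score) are ignored.
    """
    return sum(weight for item, weight in weights.items() if findings.get(item, False))
```

A patient with migratory RIF pain, nausea or vomiting, RIF tenderness, leukocytosis and a left shift scores 7 on the original score and 6 on the modified score, because the left shift contributes a point only to the original.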
Among derivation studies, there was wide discrepancy in the derivation methodology used. Multivariate logistic regression is known to be more reliable than univariate analysis. This is highlighted by the fact that CPRs derived with multivariate methods consistently identified variables used in clinical practice [7, 42–44]. Variables such as rectal tenderness and diarrhoea, which were identified in studies employing univariate analysis, are seldom used clinically in the diagnosis of appendicitis [44–47]. The reliability of multivariate logistic regression is further emphasised by CPRs derived using this methodology, such as the Lintula, AIR and Eskelinen scores, showing better sensitivity and AUC values than the Alvarado score, which was derived using univariate analysis.
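The sketch below, in Python, shows how a multivariate logistic model turns several predictors into a probability of appendicitis, and one generic way of converting its coefficients into whole-number points. How the individual derivation studies assigned points is not described here, so the scaling-and-rounding step is an assumption, and the predictor names and coefficients are hypothetical rather than taken from any derived CPR.

```python
import math


def predicted_probability(intercept: float, coefficients: dict[str, float],
                          findings: dict[str, int]) -> float:
    """Logistic model: p = 1 / (1 + exp(-(b0 + sum(b_i * x_i)))).

    Predictors missing from `findings` are treated as 0 (absent).
    """
    logit = intercept + sum(b * findings.get(name, 0) for name, b in coefficients.items())
    return 1.0 / (1.0 + math.exp(-logit))


def points_from_coefficients(coefficients: dict[str, float]) -> dict[str, int]:
    """One simple way to build an integer score (an assumption, not a CPR's method):
    divide each coefficient by the smallest absolute coefficient and round."""
    reference = min(abs(b) for b in coefficients.values())
    return {name: round(b / reference) for name, b in coefficients.items()}


if __name__ == "__main__":
    # Hypothetical coefficients for three binary predictors; not from any derived CPR.
    coefs = {"vomiting": 0.7, "rebound": 1.4, "raised_crp": 2.1}
    print(points_from_coefficients(coefs))
    all_present = {"vomiting": 1, "rebound": 1, "raised_crp": 1}
    print(round(predicted_probability(-3.0, coefs, all_present), 3))
```

Rounding to whole points trades some of the model's precision for the ease of manual calculation discussed above.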
Several studies investigating CPRs for appendicitis conclude that clinical judgement is comparable to CPR stratification, especially when performed by a senior surgeon [21, 27, 30, 34, 48]. While this could imply that CPRs do not improve diagnostic accuracy compared with a senior surgeon, it also provides evidence that CPRs can raise the diagnostic accuracy of less experienced staff to the level of an experienced surgeon [21, 30, 48, 49]. Given that junior staff usually undertake the initial evaluation of patients with suspected appendicitis, the use of a CPR is valuable in this context. Patient care is likely to be more standardised, and unnecessary exposure to radiation and invasive investigations, including laparoscopy, minimised.
The heterogeneity and quality of included studies precluded meta-analysis of the available data. A further limitation was the predefined age criteria, as many studies that included children were excluded because the findings for children and adults could not be separated. The exclusion of non-English publications may also have excluded important validation studies conducted in other populations.
Conclusion
There are currently 12 CPRs available for use in adults with suspected appendicitis. Heterogeneity in methodology and the quality of available studies precluded a meta-analysis. The AIR score performed best in terms of sensitivity, specificity, AUC values and usability, but has been validated in only a small number of studies. The Alvarado and modified Alvarado scores were the most commonly validated CPRs, but their performance was variable. The original Alvarado score outperformed the modified Alvarado score across all three criteria (sensitivity, specificity and AUC values).
References
Stephens PL, Mazzucco JJ (1999) Comparison of ultrasound and the Alvarado score for the diagnosis of acute appendicitis. Conn Med 63:137–140
Laupacis A, Sekar N, Stiell IG (1997) Clinical prediction rules: a review and suggested modifications of methodological standards. JAMA 277:488–494
Wasson JH, Sox HC, Neff RK et al (1985) Clinical prediction rules. N Engl J Med 313:793–799
Ohle R, O’Reilly F, O’Brien KK et al (2011) The Alvarado score for predicting acute appendicitis: a systematic review. BMC Med 9:139
Moher D, Liberati A, Tetzlaff J et al (2009) Preferred reporting items for systematic reviews and meta-analyses: the PRISMA statement. Ann Intern Med 151:264–269
McGinn TG, Guyatt GH, Wyer PC et al (2000) Users’ guides to the medical literature: XXII: how to use articles about clinical decision rules. JAMA 284:79–84
Zhou X-H, Obuchowski NA, McClish DK (2011) Statistical analysis for meta-analysis. In: Statistical methods in diagnostic medicine. Wiley, Hoboken, pp 435–448
Alvarado A (1986) A practical score for the early diagnosis of acute appendicitis. Ann Emerg Med 15:557–564
Andersson M, Andersson RE (2008) The appendicitis inflammatory response score: a tool for the diagnosis of acute appendicitis that outperforms the Alvarado score. World J Surg 32:1843–1849. doi:10.1007/s00268-008-9649-y
Andersson M, Ruber M, Ekerfelt C et al (2014) Can new inflammatory markers improve the diagnosis of acute appendicitis? World J Surg 38:2777–2783
Fenyo G (1987) Routine use of a scoring system for decision-making in suspected acute appendicitis in adults. Acta Chirurgica Scandinavica 153:545–551
Jahn H, Mathiesen FK, Neckelmann K et al (1997) Comparison of clinical judgment and diagnostic ultrasonography in the diagnosis of acute appendicitis: experience with a score-aided diagnosis. Eur J Surg 163:433–443
Lindberg G, Fenyö G (1988) Algorithmic diagnosis of appendicitis using Bayes’ theorem and logistic regression. Bayesian Statistics 3:665–668
Sammalkorpi HE, Mentula P, Leppaniemi A (2014) A new adult appendicitis score improves diagnostic accuracy of acute appendicitis—a prospective study. BMC Gastroenterol 14(1):114. doi:10.1186/1471-230X-14-114
Tzanakis NE, Efstathiou SP, Danulidis K et al (2005) A new approach to accurate diagnosis of acute appendicitis. World J Surg 29:1151–1156 (discussion 1157)
van den Broek WT, Bijnen BB, Rijbroek B et al (2002) Scoring and diagnostic laparoscopy for suspected appendicitis. Eur J Surg 168:349–354
Christian F, Christian GP (1992) A simple scoring system to reduce the negative appendicectomy rate. Ann R Coll Surg Engl 74:281–285
Goh PL (2010) A simplified appendicitis score in the diagnosis of acute appendicitis. Hong Kong J Emerg Med 17:230–235
Lintula H, Kokki H, Pulkkinen J et al (2010) Diagnostic score in acute appendicitis. Validation of a diagnostic score (Lintula score) for adults with suspected appendicitis. Langenbecks Arch Surg 395:495–500
Scott AJ, Mason SE, Arunakirinathan M et al (2015) Risk stratification by the appendicitis inflammatory response score to guide decision-making in patients with suspected appendicitis. Br J Surg 102:563–572
Al-Hashemy AM, Seleem MI (2004) Appraisal of the modified Alvarado score for acute appendicitis in adults. Saudi Med J 25:1229–1231
Bhattacharjee PK, Chowdhury T, Roy D (2002) Prospective evaluation of modified Alvarado score for diagnosis of acute appendicitis. J Indian Med Assoc 100:310–311, 314
Bulus H, Tas A, Morkavuk B et al (2013) Can the efficiency of modified Alvarado scoring system in the diagnosis acute appendicitis be increased with tenesmus? Wien Klin Wochenschr 125:16–20
Gurav P, Hombalkar N, Dhandore P et al (2013) Evaluation of right iliac fossa pain with reference to Alvarado score: can we prevent unnecessary appendicectomies? JKIMSU 2(2):24–29
Huang TH, Huang YC, Tu CW (2013) Acute appendicitis or not: facts and suggestions to reduce valueless surgery. J Acute Med 3:142–147
Kariman H, Shojaee M, Sabzghabaei A et al (2014) Evaluation of the Alvarado score in acute abdominal pain. Ulusal Travma ve Acil Cerrahi Dergisi 20:86–90
Kim K, Rhee JE, Lee CC et al (2008) Impact of helical computed tomography in clinically evident appendicitis. Emerg Med J 25:477–481
Limpawattanasiri C (2011) Alvarado score for the acute appendicitis in a provincial hospital. J Med Assoc Thai 94:441–449
Malik AA, Wani NA (1998) Continuing diagnostic challenge of acute appendicitis: evaluation through modified Alvarado score. Aust N Z J Surg 68:504–505
Man E, Simonka Z, Varga A et al (2014) Impact of the Alvarado score on the diagnosis of acute appendicitis: comparing clinical judgment, Alvarado score, and a new modified score in suspected appendicitis: a prospective, randomized clinical trial. Surg Endosc 28:2398–2405
Meltzer AC, Baumann BM, Chen EH et al (2013) Poor sensitivity of a modified Alvarado score in adults with suspected appendicitis. Ann Emerg Med 62:126–131
Owen TD, Williams H, Stiff G et al (1992) Evaluation of the Alvarado score in acute appendicitis. J R Soc Med 85:87–88
Pouget-Baudry Y, Mucci S, Eyssartier E et al (2010) The use of the Alvarado score in the management of right lower quadrant abdominal pain in the adult. J Visc Surg 147:e40–44
Pruekprasert P, Maipang T, Geater A et al (2004) Accuracy in diagnosis of acute appendicitis by comparing serum C-reactive protein measurements, Alvarado score and clinical impression of surgeons. J Med Assoc Thai 87:296–303
Rodrigues G, Rao A, Khan SA (2006) Evaluation of Alvarado score in acute appendicitis: a prospective study. Internet J Surg 9:1–5
Tade AO (2007) Evaluation of Alvarado score as an admission criterion in patients with suspected diagnosis of acute appendicitis. West Afr J Med 26:210–212
Yuksel Y, Dinc B, Yuksel D et al (2014) How reliable is the Alvarado score in acute appendicitis? Ulusal Travma ve Acil Cerrahi Dergisi 20:12–18
Erdem H, Cetinkunar S, Das K et al (2013) Alvarado, Eskelinen, Ohhmann and Raja Isteri Pengiran Anak Saleha Appendicitis scores for diagnosis of acute appendicitis. World J Gastroenterol 19:9057–9062
de Castro SM, Unlu C, Steller EP et al (2012) Evaluation of the appendicitis inflammatory response score for patients with acute appendicitis [Erratum appears in World J Surg. 2012 Sep; 36(9):2271]. World J Surg 36:1540–1545
Kollar D, McCartan DP, Bourke M et al (2015) Predicting acute appendicitis? A comparison of the Alvarado score, the Appendicitis Inflammatory Response Score and clinical assessment [Erratum appears in World J Surg. 2015 Jan; 39(1):112; PMID: 25315090]. World J Surg 39:104–109
Kalan M, Talbot D, Cunliffe WJ et al (1994) Evaluation of the modified Alvarado score in the diagnosis of acute appendicitis: a prospective study. Ann R Coll Surg Engl 76:418–419
Zhou X-H, Obuchowski NA, McClish DK (2011) Regression analysis for independent ROC data. In: Statistical methods in diagnostic medicine. Wiley, Hoboken, pp 261–296
Zhou X-H, Obuchowski NA, McClish DK (2011) Analysis of multiple reader and/or multiple test studies. In: Statistical methods in diagnostic medicine. Wiley, Hoboken, pp 297–328
Beasley SW (2000) Can we improve diagnosis of acute appendicitis? BMJ 321:907–908
Abou Merhi B, Khalil M, Daoud N (2014) Comparison of Alvarado score evaluation and clinical judgment in acute appendicitis. Med Arh 68:10–13
Kasimov RR, Mukhin AS (2013) Current state of acute appendicitis diagnosis. Sovremennye Tehnologii v Medicine 5:112–116
Wagner JM, McKinney WP, Carpenter JL (1996) Does this patient have appendicitis? JAMA 276:1589–1594
Yegane R, Peyvandi H, Hajinasrollah E et al (2008) Evaluation of the modified Alvarado score in acute appendicitis among Iranian patients. Acta Medica Iranica 46:501–506
D’Souza C, Martis J, Vaidyanathan V (2013) Diagnostic efficacy of modified Alvarado score over graded compression ultrasonography. Nitte Univ J Health Sci 3:105–108
Acknowledgements
Irene Zeng—Research Biostatistician, Department of knowledge and information management, Middlemore Hospital.
Authors’ contribution
MK designed the study, performed the initial screen and review of all included articles and composed the manuscript. ML was involved in data extraction from the included articles and assisted with preparing the final manuscript. CH was involved in the initial article screen and data extraction from the included articles. WM was involved in data extraction from the included articles. LS was involved in data extraction from the included articles. YH provided statistical expertise used to develop the dot plot. JM is a senior author and the primary supervisor of the master’s student who completed this project, and also assisted in preparing the final manuscript. AM, a senior author and the principal investigator, provided supervision to the co-authors. All authors have approved the manuscript as representing honest work, and each author meets the requirements for authorship.
Cite this article
Kularatna, M., Lauti, M., Haran, C. et al. Clinical Prediction Rules for Appendicitis in Adults: Which Is Best?. World J Surg 41, 1769–1781 (2017). https://doi.org/10.1007/s00268-017-3926-6
DOI: https://doi.org/10.1007/s00268-017-3926-6