Appendicitis is the most common reason for abdominal surgery in children and the most common surgical emergency worldwide [1], with an estimated 17.7 million cases in 2019 [2], although the incidence differs across geographical regions [3]. Appendicitis can be challenging to diagnose because of its varied clinical presentations, including diarrhea or nonspecific symptoms, and it is therefore often misdiagnosed as gastroenteritis [4]. The risk of misdiagnosing appendicitis is higher in specific populations such as women, certain ethnicities [5], and older patients, owing to their comorbidities and various differential diagnoses [6]. Conversely, a misdiagnosis of appendicitis can lead to a negative appendectomy, for which women [7] and patients aged over 40 years [8] are at greater risk. Patients undergoing a negative appendectomy have a higher risk of postoperative complications such as wound infections and incisional hernia than patients operated on for uncomplicated appendicitis [9]. They also have higher short- and long-term mortality than patients with uncomplicated appendicitis [10]. This indicates a need for better preoperative diagnostic methods to avoid both negative appendectomies and missed diagnoses of appendicitis. One method is the use of a diagnostic tool, as incorporating a diagnostic tool has been shown to reduce the number of admissions and the surgical rate [11]. Typically, these tools reach a diagnosis of appendicitis from a predetermined combination of criteria in one or more categories, such as patient characteristics, symptoms, physical examination, laboratory values, and imaging. One category alone, e.g. patient characteristics, physical examination, or laboratory values, may not be sufficient to diagnose appendicitis [12]. In combination, however, the categories have higher discriminatory power. On the other hand, a diagnostic tool may require access to special equipment that is unavailable at the hospital [13, 14], making the tool unusable.

No overview of all available diagnostic tools for appendicitis that includes non-English publications currently exists. This scoping review aimed to provide an overview of all existing diagnostic tools for diagnosing appendicitis. Furthermore, we wanted to characterize these tools with respect to target population, accuracy, and need for diagnostic equipment.

Materials and methods

Protocol and registration

This review was reported according to the Preferred Reporting Items for Systematic reviews and Meta-Analyses extension for Scoping Reviews (PRISMA-ScR) [15]. The protocol was uploaded to the Open Science Framework (OSF) prior to data extraction [16].

Eligibility criteria

This study investigated diagnostic tools specific to diagnosing appendicitis. The primary outcomes were to characterize the variables incorporated in the diagnostic tools by category: patient characteristics, symptoms, physical examination, vital signs, laboratory values, and/or imaging. Additionally, we characterized the hospital staff and special equipment needed to use these diagnostic tools, which could be a medical doctor/surgeon, other health professionals, a thermometer, and laboratory and/or imaging equipment. The secondary outcomes were to investigate whether the diagnostic tools were targeted at specific populations, e.g. with regard to age (children/adults/other), sex, ethnicity, etc., and to note the geographical regions of these populations according to the United Nations [17]. In addition, we wanted to illustrate the accuracy of these diagnostic tools in terms of sensitivity, specificity, positive predictive value (PPV), and negative predictive value (NPV), which were either extracted or calculated [18, 19].
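Where a study reports only the counts of a 2 × 2 table, the four accuracy measures can be calculated directly. The following is a minimal sketch in Python, assuming the standard definitions of the measures; the function name and the example counts are illustrative and are not taken from any included study.

```python
from typing import Dict, Optional


def accuracy_measures(tp: int, fp: int, fn: int, tn: int) -> Dict[str, Optional[float]]:
    """Return sensitivity, specificity, PPV, and NPV as fractions.

    tp, fp, fn, and tn are the counts of a 2x2 table comparing the result
    of a diagnostic tool with the reference diagnosis. A measure is None
    when its denominator is zero.
    """
    def ratio(numerator: int, denominator: int) -> Optional[float]:
        return numerator / denominator if denominator else None

    return {
        "sensitivity": ratio(tp, tp + fn),  # true positives among all with appendicitis
        "specificity": ratio(tn, tn + fp),  # true negatives among all without appendicitis
        "ppv": ratio(tp, tp + fp),          # true positives among all positive results
        "npv": ratio(tn, tn + fn),          # true negatives among all negative results
    }


# Hypothetical counts, for illustration only.
example = accuracy_measures(tp=90, fp=10, fn=10, tn=90)
```

Returning `None` for an empty denominator keeps a measure that cannot be calculated distinct from a measure of 0%.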

Inclusion criteria

Studies were included in this review based on the following eligibility criteria. We included original articles that stated their purpose as deriving an accessible diagnostic tool (a score, an app, a website, etc.) that approximates the risk of appendicitis or proposes a cut-off score for performing an appendectomy. The variables incorporated in the diagnostic tool had to be presented; however, the individual weights of the incorporated variables were not required. Modified diagnostic tools and subsequent variations were only included if they had recalculated the risk of appendicitis with the new variables incorporated. A minimum of three predictive variables was required in any of the following diagnostic categories: patient characteristics, symptoms, physical examination, vital signs, laboratory values, and/or imaging. Articles of all languages and publication years were included.
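To make the notion of a score with a cut-off concrete, the sketch below sums the weights of the variables that are present and compares the total with a threshold. It is only a structural illustration: the variable names, weights, and cut-off are hypothetical and do not reproduce any published diagnostic tool.

```python
from typing import Dict

# Hypothetical weights and cut-off, invented for illustration only;
# they do not reproduce any published diagnostic tool.
WEIGHTS: Dict[str, int] = {
    "pain_migration": 1,
    "rlq_tenderness": 2,
    "raised_leucocytes": 2,
}
CUT_OFF = 3


def total_score(findings: Dict[str, bool]) -> int:
    """Sum the weights of all variables recorded as present."""
    return sum(weight for name, weight in WEIGHTS.items() if findings.get(name, False))


def reaches_cut_off(findings: Dict[str, bool]) -> bool:
    """True when the summed score is at or above the cut-off."""
    return total_score(findings) >= CUT_OFF
```

For example, `total_score({"rlq_tenderness": True, "raised_leucocytes": True})` sums the two weights of 2, so this hypothetical score reaches its cut-off of 3.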

Exclusion criteria

Articles were excluded if the diagnostic tool depended only on an imaging modality (e.g. computed tomography, ultrasound, or magnetic resonance imaging) or only added an imaging modality to an existing diagnostic tool. Articles that aimed to differentiate between simple and complex appendicitis were also excluded, since they did not align with the aim of this study.

Information sources

The author group developed a search string in collaboration with a research librarian and subsequently adapted it specifically to the individual databases. The following five databases were searched: PubMed (1966 to present), China National Knowledge Infrastructure (1951 to present), Latin American and Caribbean Health Sciences Literature (1982 to present), Índice Bibliográfico Espanhol de Ciências da Saúde (2000 to present), and Embase (1974 to present). The search was performed on 3rd of March, 2022. Scientific journals and libraries were contacted if the full text articles were not available online. Furthermore, the reference lists of the included studies were screened using a snowball search [20]. For PubMed the following search string was used: ("appendicitis"[MeSH Terms] OR "appendicitis" OR "appendectomy"[MeSH Terms] OR "appendectomy" OR "appendectomies" OR "appendicectomy" OR "appendicectomies") AND (score OR classify OR "index"). The other adapted search strings can be viewed in the protocol [16].

Data processing

Records were uploaded to Mendeley (version 1.19.8, Elsevier, UK), where duplicates were removed. Afterwards, the records were uploaded to and screened in Covidence [21], an online screening tool. First, two authors independently screened the titles and abstracts. Second, two authors independently screened the relevant reports in full text. Conflicts were resolved by discussion within the author group. The data to be extracted were explored during the data charting process, in which the first author chose ten studies reporting different diagnostic tools. A template for the data charting process was made in Microsoft Excel (Microsoft, Redmond, Washington, USA) with variables detected in the full-text articles. Data charting was then performed independently by two authors, and conflicts were resolved by discussion within the author group. Double data entry for the data extraction was performed by the first author. The variables extracted were study characteristics, such as study period and country of population, and whether the article reported the ages of patients with and without appendicitis separately. We extracted the characteristics of the diagnostic tools, such as the target population (age group, sex, and ethnicity), type of diagnostic tool (decision tree, score, etc.), and the accuracy of the diagnostic tools (positive predictive value, negative predictive value, sensitivity, and specificity) for the validation cohort. However, if these were not reported for a validation cohort, the accuracy variables were extracted for the derivation cohort. If necessary, the variables were simplified so that they could be categorized or grouped together.
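The rule for choosing which cohort's accuracy to extract can be sketched as a small function. This is an illustration only: it assumes that the fallback is applied to a tool's accuracy values as a whole rather than to each measure separately, which the text does not specify, and the function and type names are hypothetical.

```python
from typing import Dict, Optional, Tuple

Accuracy = Dict[str, float]  # e.g. {"sensitivity": 0.9, "specificity": 0.8}


def select_accuracy(
    validation: Optional[Accuracy],
    derivation: Optional[Accuracy],
) -> Optional[Tuple[str, Accuracy]]:
    """Prefer validation-cohort accuracy and fall back to the derivation cohort.

    Returns the name of the cohort used together with its values, or None
    when neither cohort has reported accuracy.
    """
    if validation:
        return ("validation", validation)
    if derivation:
        return ("derivation", derivation)
    return None
```

Returning the cohort name alongside the values keeps it visible which tools were summarised from a derivation cohort only.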

Synthesis of results

Data were plotted in histograms, and the median and range were calculated for the population size and accuracy across studies. Data were categorized into six diagnostic groups: patient characteristics, symptoms (where initial pain located elsewhere than the right lower quadrant was grouped together with migration, and appetite was grouped together with anorexia), physical examination, vital signs, laboratory values (where absolute neutrophil count, neutrophil–lymphocyte ratio, and polymorph neutrophils were grouped as neutrophils), and imaging modalities (computed tomography, ultrasound, and/or magnetic resonance imaging).
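The grouping rules and the summary statistics described above can be sketched as follows. This is a minimal illustration: the groupings are those stated in the text, but the exact labels used as dictionary keys, the function names, and the example values are assumptions.

```python
from statistics import median
from typing import Dict, List, Tuple

# Groupings stated in the text. The keys are lower-cased labels as they might
# appear in a data charting sheet; the exact wording is an assumption.
GROUPED_AS: Dict[str, str] = {
    "absolute neutrophil count": "neutrophils",
    "neutrophil-lymphocyte ratio": "neutrophils",
    "polymorph neutrophils": "neutrophils",
    "appetite": "anorexia",
    "initial pain outside right lower quadrant": "migration",
}


def normalise(label: str) -> str:
    """Map a charted variable label to its grouped name, if one is defined."""
    key = label.strip().lower()
    return GROUPED_AS.get(key, key)


def median_and_range(values: List[float]) -> Tuple[float, float, float]:
    """Return (median, minimum, maximum) of a non-empty list of values."""
    if not values:
        raise ValueError("no values to summarise")
    return median(values), min(values), max(values)


# Hypothetical cohort sizes, for illustration only.
summary = median_and_range([100, 200, 400])
```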

Results

After the removal of duplicates, the literature search identified 6419 unique records that underwent title and abstract screening. We then screened 159 records in full text and included 77 eligible studies [13, 14, 22–96], see Fig. 1. Six of these studies [22, 23, 25, 26, 56, 57] reported on one or more similar tools (Supplementary Table 1); thus, 74 unique studies remained, and 82 diagnostic tools were found (Fig. 1 and Supplementary Table 1).

Fig. 1

PRISMA 2020 flow diagram of study selection process. CNKI China National Knowledge Infrastructure, LILACS Latin American and Caribbean Health Sciences Literature, IBECS Índice Bibliográfico Espanhol de Ciências da Saúde, n number

Study characteristics

The studies reported on various types of diagnostic tools, comprising 54 scoring systems (66%), 12 equations (15%), eight diagnostic trees (10%), and eight others (nomogram, website, app, or desktop software) (10%), see Table 1 and Supplementary Table 1. Most of these studies were reported in English (84%), and the populations of the derivation cohorts were primarily from Europe (43%) and Asia (32%). For the respective country, region, and population cohort of the individual studies, see Supplementary Table 1. Across studies, the derivation cohort had a median size (range) of 315 (49–2423) patients, while the validation cohort had a median size of 171 (40–1426) patients.

Table 1 Summary of characteristics of included studies and diagnostic tools

Target population

The studies’ target populations were mainly categorized by age, sex, and ethnicity, see Table 1 and Supplementary Table 1. The remaining studies did not specify or report on the target population. Age was the most frequently used target population with a total of 28 studies and 33 diagnostic tools. Twenty-nine of these diagnostic tools were developed for children [29, 30, 34, 35, 37, 40, 42, 45, 53, 56, 70, 71, 73, 79–82, 87, 91–94], where age ranged from 0 to 20 years and the median of the mean and median ages was 10 years. The diagnostic tools consisted of 20 scoring systems, two websites, one equation, and one nomogram (Supplementary Table 1). Furthermore, two studies and two diagnostic tools targeted adolescent and adult patients [43, 55], with ages ranging from 12 to 58 years, and were aimed at females in their reproductive years. One was a scoring system and one a diagnostic tree (Supplementary Table 1). For older patients, with the lowest age limit being over 50 years, there were two studies and four diagnostic tools, all of which were equations [24–26].

For the studies differentiating the presentation of appendicitis between sexes, there were a total of six studies and nine diagnostic tools [22, 28, 38, 41, 43, 55]. Three of the studies included only females [28, 43, 55] in the derivation cohort, one study only included males [38], and two studies included both females and males with 48% males [22] and 42% males [41]. The diagnostic tools consisted of four equations, three scoring systems, and two diagnostic trees (Table 1 and Supplementary Table 1).

Lastly, studies targeting ethnicity as their desired population were the least frequent with a total of three studies and three diagnostic tools. Their methodology was to use patients from their region, by either making use of a local database or having a derivation cohort from their region and calculating the best predictors to diagnose appendicitis. The specific populations were from Brunei [65], Pakistan [74], and New Zealand [33], respectively. The diagnostic tools consisted of three scoring systems (Table 1 and Supplementary Table 1).

Diagnostic tools characteristics

The diagnostic tools’ characteristics were categorized into six groups, see Table 2. Patient characteristics were included in 35% of the diagnostic tools, the most frequent of these were sex (32%) and age (16%). Symptoms were included in 85% of the diagnostic tools where the most frequent were nausea/vomiting (41%) and migration (40%). Physical examinations were included in 93% of the diagnostic tools, the most frequent ones being right lower quadrant tenderness (51%) and rebound pain (50%). Laboratory values were included in 78% of the diagnostic tools, the most frequent being leucocytes (66%) and neutrophils (34%). The imaging category was included in 16% of the diagnostic tools and involved ultrasound (15%) and CT (1%). None of the diagnostic tools included magnetic resonance imaging.

Table 2 Characteristics of categories used in the diagnostic tools. Characteristics used in only one or two diagnostic tools were grouped under "other", and details can be found in the footnotes, where n = 1 unless otherwise stated

To characterize the needed hospital access to staff and equipment to utilize these 82 diagnostic tools under different circumstances, we developed a flow diagram (Fig. 2). Many diagnostic tools (35%) relied solely on a medical doctor/surgeon who could perform a physical examination and access to laboratory tests (n = 29). Often (26%) a thermometer was needed in addition (n = 21) and only 7% of diagnostic tools did not require a medical doctor/surgeon (n = 6) to perform a physical examination. However, for all of these (n = 6) another health professional was deemed necessary to collect patient characteristics and/or symptoms.
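The idea behind the flow diagram, matching tools to the staff and equipment at hand, can be sketched as a subset check. The resource labels follow the categories named in the text; the tool names and their requirements are hypothetical placeholders and do not correspond to any tool in Supplementary Table 1.

```python
from typing import AbstractSet, Dict, FrozenSet, List

# Resource labels follow the categories named in the text.
DOCTOR = "medical doctor/surgeon"
OTHER_STAFF = "other health professional"
THERMOMETER = "thermometer"
LAB = "laboratory"
IMAGING = "imaging"

# Hypothetical tools; names and requirements are placeholders only.
TOOLS: Dict[str, FrozenSet[str]] = {
    "tool_a": frozenset({DOCTOR, LAB}),
    "tool_b": frozenset({DOCTOR, LAB, THERMOMETER}),
    "tool_c": frozenset({OTHER_STAFF}),
    "tool_d": frozenset({DOCTOR, LAB, IMAGING}),
}


def usable_tools(available: AbstractSet[str]) -> List[str]:
    """Names of tools whose every required resource is available, sorted."""
    return sorted(name for name, needs in TOOLS.items() if needs <= available)
```

With these placeholders, a setting that has only a medical doctor/surgeon and a laboratory, `usable_tools({DOCTOR, LAB})`, is left with `tool_a` alone.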

Fig. 2

Flow diagram of needed equipment and hospital staff for the 82 diagnostic tools. For further details on each diagnostic tool see Supplementary Table 1 (language, target population, and type of tool) and Supplementary Table 2 (accuracy). References to diagnostic tools in each equipment category are: A: [22–24, 28–54], B: [13, 22–27, 55–58], C: [22, 23, 59–78], D: [79–86], E: [14, 22, 23, 25–27, 87, 88], and F: [89–96]. n number; medical doctor/surgeon: e.g. physical examination needed; other health professionals: e.g. patient characteristics and/or symptoms needed; lab: laboratory values e.g. urine, stool, or blood test; imaging: computed tomography or ultrasound; therm: thermometer

Accuracy

The diagnostic accuracy of the tools is depicted in Fig. 3 and the exact values can be seen in Supplementary Table 2. The total number of patients in the derivation cohort across all diagnostic tools was 34,603 patients with a median (range) of 320 (49–2423), while the validation cohort across all diagnostic tools comprised 6034 patients with a median of 176 (40–1426). A large dispersion in the accuracy was also observed, see Fig. 3. PPV was reported or possible to calculate in 56% of diagnostic tools with a median value (range) of 91% (34–100%), and > 90% in 48% of the reported diagnostic tools. NPV was reported or possible to calculate in 48% of diagnostic tools with a median of 94% (0–100%), and > 90% in 62% of the reported diagnostic tools. The sensitivity was reported or possible to calculate in 76% of diagnostic tools with a median of 89% (15–100%), and > 90% in 52% of the reported diagnostic tools. The specificity was reported or possible to calculate in 76% of diagnostic tools with a median of 86% (34–100%), and > 90% in 39% of the reported diagnostic tools.

Fig. 3

Scatterplot of the accuracy in percent (y-axis) of the diagnostic tools where each grey dot represents a tool. The black horizontal lines represent the medians. The exact values can be seen in Supplementary Table 2. Sensitivity and specificity (n = 62 tools). PPV positive predictive value (n = 46 tools), NPV negative predictive value (n = 39 tools), n number
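The tool counts in the caption of Fig. 3 can be related to the percentages given in the Accuracy section by dividing by the 82 diagnostic tools. A short arithmetic sketch, with a hypothetical function name:

```python
TOTAL_TOOLS = 82  # number of diagnostic tools in the review

# Number of tools per accuracy measure, as given in the caption of Fig. 3.
REPORTED = {"sensitivity": 62, "specificity": 62, "ppv": 46, "npv": 39}


def share_of_tools(count: int, total: int = TOTAL_TOOLS) -> int:
    """Percentage of all tools, rounded to the nearest whole percent."""
    return round(100 * count / total)


shares = {name: share_of_tools(count) for name, count in REPORTED.items()}
```

The rounded shares are 76% for sensitivity and specificity, 56% for PPV, and 48% for NPV, which agrees with the percentages stated in the text.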

Discussion

A total of 82 diagnostic tools were included in this scoping review, with 12 tools reported in a language other than English. Both symptoms (85%) and physical examination (93%) were included in the majority of the diagnostic tools, and one third of the tools relied on a medical doctor/surgeon with access to laboratory equipment. Six diagnostic tools did not require a medical doctor/surgeon, so no physical examination of the patient was needed. The accuracy was high for most of the diagnostic tools, meaning that a tool's categorization of a patient as high or low risk for appendicitis corresponded closely to the patient having or not having appendicitis.

We used a comprehensive search string with the assistance of a research librarian, and the literature search was performed in five databases from three different regions. Furthermore, there were no restrictions on publication year or language. This led to a larger number of included studies and identified diagnostic tools compared with previous reviews on this subject [7, 97]. Prior to the data extraction, a protocol was uploaded to the Open Science Framework, and the study was reported according to PRISMA-ScR [15]. A scoping review approach was chosen because our aim was to provide an overview of the diagnostic tools used in diagnosing appendicitis and whether diagnostic equipment was required. The scoping method allowed us to group data into variables meaningfully [98]. However, our review had some limitations. Even though we searched multiple databases, we did not cover all regions, e.g. the Middle East, which could lead to bias if studies were published but not indexed in the included databases. However, the included databases cover most of the studies published worldwide. The accuracy of the diagnostic tools was generally high. Nevertheless, some of the diagnostic tools need to be reexamined and revalidated in cohorts other than that of the original study.

In conclusion, this scoping review provided an overview of 82 diagnostic tools, including 12 tools reported in languages other than English. Some diagnostic tools were developed for specific target populations defined by age, sex, or ethnicity. Furthermore, most diagnostic tools relied on a medical doctor/surgeon with access to laboratory values. The accuracy of the diagnostic tools showed large variation but was good overall, with median PPV and NPV above 90% and median sensitivity and specificity above 85%. This study can be used as a guide for clinicians worldwide to choose a fitting diagnostic tool according to the patient population, hospital staff, and hospital equipment.