Introduction

In 2014, the American Association of Poison Control Centers (AAPCC) reported 2,165,142 phone calls for possible human poisonings, a figure that underrepresents total poisonings [1]. In 2004, the World Health Organization (WHO) reported that 346,000 people worldwide died from unintentional poisoning [2]. Given the magnitude of the problem, timely, appropriate, and cost-effective medical management of poisonings is critically important. The clinical severity of poisoning can range from asymptomatic to life-threatening depending on specifics related to the toxin and the timing of ingestion in relation to the availability of medical treatment. As such, the ability to predict the severity of toxicity, and therefore the patient’s prognosis, is of critical significance. Clinical severity scores are used for medical conditions or applied to critically ill patients to predict the progression of their illness. They are also useful in research when comparing groups of patients. An example is the Glasgow Coma Scale (GCS). The GCS was first described in 1974 as a way to prognosticate the severity of a traumatic brain injury; it is an easy bedside tool that, in addition to providing prognostic information, now serves as a simple way to communicate the severity of illness in a variety of conditions beyond traumatic brain injury [3].

A variety of static and dynamic severity scores are used in patient care and medical research. Static severity scores use the worst physiologic data points at a certain point in time, while dynamic severity scores collect data points over time. Examples of commonly used static scores include the Acute Physiology and Chronic Health Evaluation (APACHE II) and the Simplified Acute Physiology Score (SAPS). The Model for End-Stage Liver Disease Score (MELD) is a dynamic score, while the Mortality Probability Model (MPM) can be static if it is only done at admission or dynamic if it is completed every 24 h [4]. There are also organ dysfunction scores such as the Multiple Organ Dysfunction Score (MODS) or Sepsis-Related Organ Failure Assessment (SOFA), which aim to trend overall organ dysfunction as it relates to morbidity and mortality [4]. Some of these scores have been applied to poisonings such as organophosphates [5–8] and paraquat [9].

The Poisoning Severity Score (PSS) was developed between 1990 and 1994 by the European Association of Poisons Centres and Clinical Toxicologists (EAPCCT), the International Programme on Chemical Safety (IPCS), and the Commission of the European Union to provide a simple and reliable scoring system to describe poisonings and define their severity [10]. The PSS uses a collection of clinical signs and symptoms to give a score of 0 to 4. The score is applied according to the patient’s most severe clinical effects, regardless of the timing of those effects. It is not meant to provide prognostic information. A score of 0 equates to being asymptomatic, 1 is minor, 2 is moderate, 3 is severe, and 4 is given if the patient dies. As such, all patients that die should only receive a score of 4.
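The grading rule described above can be illustrated with a minimal sketch. It assumes that each observed sign or symptom has already been graded 0 to 3 by the rater, which is the subjective step the PSS tables govern and is not modeled here. The names PSSGrade and overall_pss are hypothetical and are not part of the published score.

```python
from enum import IntEnum


class PSSGrade(IntEnum):
    NONE = 0      # asymptomatic
    MINOR = 1
    MODERATE = 2
    SEVERE = 3
    FATAL = 4     # reserved for patients who die


def overall_pss(effect_grades, died):
    """Return the overall grade for one patient.

    effect_grades: grades (0-3) already assigned to each observed clinical
    effect over the whole course; timing of the effects is ignored.
    died: True if the patient died, which always yields a grade of 4.
    """
    if died:
        return PSSGrade.FATAL
    grades = list(effect_grades)
    for g in grades:
        if g not in (0, 1, 2, 3):
            raise ValueError(f"grade for a nonfatal effect must be 0-3, got {g!r}")
    # The most severe effect determines the score; no effects means grade 0.
    return PSSGrade(max(grades, default=0))
```

In this sketch, a patient with effects graded 1, 3, and 2 who survives receives a 3, and any patient who dies receives a 4 regardless of the grades recorded earlier in the course.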

The PSS has been applied to a wide range of poisonings including hydrocarbons [11], organophosphates [5, 6, 12], antipsychotics [13, 14], and envenomations [15]. However, the PSS includes a large number of data points from 12 different organ systems and multiple subjective variables such as “mild hemolysis,” “mild hypotension,” and “prolonged coughing,” which decrease its inter-rater reliability. Given these significant limitations, we are skeptical that the PSS, in its present form, can reliably be used as a research or clinical tool. We also wonder if it is possible for a single scoring system to accurately predict outcomes in all toxic exposures given the wide variation and breadth of human poisonings and envenomations. We aim to review the literature to obtain a better understanding of how the PSS is utilized, determine if it is applied correctly, and describe its limitations.

Methods

We searched the medical literature in all languages using PUBMED, EMBASE, and SCOPUS from inception through August 2013 applying search terms “poison severity score” or “toxicology severity score” or “toxicology score” or “toxicologic score” or “toxicologic scoring system” or “overdose scoring system” or “toxicology scoring system” or “overdose severity score” or “toxic scores overdose” or “poisoning severity score.” EMBASE was only searched with the term “poisoning severity score” due to the large number of publications returned with the other search strategies. The strategy returned a total of 2564 articles. Some articles were identified on more than one search. One of the authors reviewed the titles of the 2564 articles identified by the search. Based on the title, the abstract of any relevant article was reviewed. If the “Poisoning Severity Score” was included in the abstract, the publication was included for review.

Our search identified 204 publications of which 141 were abstracts only and 1 was a letter to the editor, all of which were excluded. Of the remaining 62 publications, 23 were only available in foreign languages; one was successfully translated. Attempts to translate the remaining foreign language articles were unsuccessful so the manuscripts were excluded, leaving 40 for review. The reference lists of all 40 manuscripts were reviewed for additional references but only returned two new abstracts, both of which were excluded.

The Original Publications

In 1990, the EAPCCT convened a working group to develop a scoring system to standardize the evaluation of poisoned patients [16]. The group proposed two schemes: a detailed clinical scheme referred to as the TOXscore and a simplified version called the PhoneTOXscore, which was evaluated in a small pilot study and later renamed the PSS. After being renamed, the finalized version was published in 1998 as part of a two-phase trial originally involving 14 poison centers [10].

The initial phase included 14 poison centers from different parts of the world, including but not limited to Turkey, New Zealand, France, Toronto, and Uruguay; no sites from the USA were included. Each of the participating poison centers submitted 25–30 cases (convenience sample), which were summarized on a standardized form. A total of 371 cases were collected and redistributed to each site.

The 14 sites produced 5194 scores (14 × 371). Staff members at each site scored each of the cases. The percentage of times each of the centers assigned a case the same score was referred to as the concordance. Acceptable concordance was defined as 70% (10/14 centers with the same score). For example, there were 55 different cases of poisoning from corrosives. For 12 cases, the concordance between centers was 50–69%; for 18 cases, it was 70–90%; and for 25 cases, it was >90%, for an overall acceptable concordance of 78% for corrosives. Overall, concordance ranged from 55% to 81% for all categories for an overall average of 72%.
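The concordance arithmetic can be sketched as follows. The sketch assumes that the concordance for a case is the fraction of centers that assigned the most common score; the original publication may have computed it differently. The function names and the threshold constant are illustrative only, and the corrosives counts are the ones quoted above.

```python
from collections import Counter

ACCEPTABLE = 0.70  # threshold quoted in the text (about 10 of 14 centers)


def case_concordance(scores):
    """Fraction of centers that gave the most common score to one case."""
    if not scores:
        raise ValueError("at least one score is required")
    modal_count = Counter(scores).most_common(1)[0][1]
    return modal_count / len(scores)


def acceptable_share(cases, threshold=ACCEPTABLE):
    """Share of cases whose concordance meets the threshold."""
    if not cases:
        raise ValueError("at least one case is required")
    ok = sum(1 for scores in cases if case_concordance(scores) >= threshold)
    return ok / len(cases)


# Corrosives example from the text: cases binned by concordance.
bins = {"50-69%": 12, "70-90%": 18, ">90%": 25}
total = sum(bins.values())                      # 55 cases
acceptable = bins["70-90%"] + bins[">90%"]      # 43 cases at or above 70%
corrosives_share = acceptable / total           # 43 / 55, about 78%
```

Under this reading, 10 of 14 centers agreeing gives a concordance of about 71%, which meets the threshold, while 9 of 14 (about 64%) does not.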

After the results from phase 1 were analyzed, the scoring system was revised with “minor modifications” that were not specified. An additional three centers were included; however, these new centers did not submit any new cases. The same 371 cases were distributed to each of the 17 centers for scoring using the new system. Results were again analyzed between centers and acceptable concordance improved from 72% to 80%.

Even though the PSS was not intended as a prognostic score, the Birmingham Centre of the United Kingdom National Poisons Information Service (NPIS) published a validation study assessing the prognostic utility of the PSS in 1998 [17]. NPIS scored 718 consecutive inquiries made to the center. Cases were scored following the initial contact with NPIS. Of the 718 inquiries, 397 had a score of 0 (55.3%), 225 had a score of 1 (31.3%), 71 had a score of 2 (9.9%), and 25 had a score of 3 (3.5%). No patients were initially given a score of 4. The center then made a follow-up call 12–24 h after the initial contact and continued follow-up until the patient recovered or died; during follow-up, the PSS was scored again. Follow-up was completed for 638 cases (89%). Follow-up was obtained for 548 of 622 patients (88%) who originally had a PSS of 0 or 1; at follow-up, five patients developed a PSS of 2 but did not deteriorate further. Six patients who were originally scored as a PSS of 2 received a score of 3 at follow-up. Five patients died, and each was originally given a PSS of either 2 or 3. The authors concluded that the PSS is a simple and reliable system for describing a poisoning, describing its ultimate severity, and identifying the most severe cases.

Articles Incorporating the Poisoning Severity Score

Forty manuscripts were reviewed (Table 1). Foreign language manuscripts that could not be translated and were excluded were published in China, Poland, France, Turkey, Belgium, Slovakia, Germany, Hungary, and Sweden. One manuscript from Hungary was translated into English and included in the review. Twenty-two of the studies correctly used the PSS. One study included the PSS in the abstract but not in the body of the manuscript so it is not clear how it was used [27]. Another study only mentioned that a PSS ≤ 2 was an independent prognostic factor but offered no other information regarding how the PSS was used [23]. Seven articles used the PSS in a unique way such as comparing it to another severity score. The remaining articles either misapplied the PSS or modified the PSS. Some articles are included in more than one category. For instance, the authors used the PSS in a unique way but also used the score incorrectly or misapplied it. The review divides the articles into those that used the score correctly, those that used it in a unique way, and those that used it incorrectly or modified it. Table 1 includes a brief review of each manuscript with key critiques of how the PSS was incorporated in each article. For an in-depth review of each manuscript, please see the online supplement.

Table 1 Manuscripts including the Poisoning Severity Score

Studies that Properly Incorporated the Poisoning Severity Score

Many studies correctly applied the PSS [15, 18, 19, 21, 25, 31, 33, 35, 39–41, 44, 46, 47]. The score was included in these studies to describe the severity of illness. In some cases, it was used in a dynamic fashion as opposed to a static score. These articles are not included in this section as the PSS was misapplied even though it was calculated correctly. In many of the studies, the PSS was reported along with clinical and demographic data such as age, sex, or laboratory values. The studies were mostly from different authors and took place in a wide variety of locations. This indicates that the scoring system can be properly used in a wide variety of exposures and by authors not originally involved in developing the PSS. Two authors used the PSS correctly in multiple studies [7, 15, 24, 39]. Interestingly, four other authors used the PSS correctly in one study but then misapplied the PSS in a second study [25, 26, 30, 45, 47, 48]. If the PSS were demonstrated to be a useful research tool, it could be used to compare groups of patients such as occurred in these studies.

However, to be considered useful, the PSS must do more than just be applied correctly. In most of the included studies, the PSS was incorporated as just another piece of clinical information, like the patient’s age or sex. There are simpler ways to communicate severity of illness that do not require learning a new system, especially one as complex as the PSS. A few studies attempted to determine if other variables could predict the severity of an exposure by comparing them to the patient’s PSS. These included correlating CO levels [41], bicarbonate concentrations [40], and blood glucose concentrations [38] with the patient’s PSS. One author attempted to use the PSS to determine if intentional exposures caused more severe poisonings than accidental exposures [33]. It is not apparent that the PSS added anything to these manuscripts’ results or conclusions. In many of these studies, the results would be easier for the reader to understand if the authors had used a simpler system or excluded the PSS altogether. This is especially true given that most physicians are not familiar with the PSS and so do not know how to interpret it. Stating that a patient has a PSS of 2 likely has much less meaning to another physician than stating that a patient has a GCS of 5. Two other studies either completely excluded the PSS in the body of the manuscript or only mentioned that a PSS ≤ 2 was an independent prognostic factor [23, 27]. Whether these were oversights or evidence that the scoring system is not useful or valuable is unknown. While collecting data such as a patient’s age or sex, either prospectively or retrospectively, is relatively simple, calculating the PSS is cumbersome and is not worth including if it does not provide any additional, useful information beyond what is already routinely and more easily collected.

Studies Comparing the Poisoning Severity Score to Another Severity Score or Using the Poisoning Severity Score Uniquely

A few authors used the PSS in nontraditional, interesting, or unique ways. In some studies, the PSS was modified or misapplied or used in a dynamic fashion. However, they are included in this section due to how the PSS was incorporated.

Four studies compared the PSS to other severity scores. Two compared the PSS to the GCS [5, 24]; one to the GCS, APACHE II, and SAPS II [7]; and one to the GCS, predicted mortality rate (PMR), and APACHE II [8]. Overall, the authors felt that all scores performed well, although this is certainly debatable. Churi et al. noted a moderate correlation between the PSS and GCS (r = 0.51, p < 0.001) [24]. Akdur also noted that patients with a PSS of 3 or 4 had a significantly lower GCS (mean 4.7) and so determined that both scores were effective tools [5]. However, both groups of authors concluded that while the two scores were equally effective, the GCS was much easier to apply than the PSS. In a second paper by Churi et al., the authors did not state which scoring system was the best but concluded that the PSS, GCS, and APACHE II all predicted mortality well [7]. This would be expected with the PSS, as only patients who died could receive a score of 4. Sam et al. noted a significant negative linear relationship between the APACHE II, PMR, and PSS (r = −0.660, r = −0.636, and r = −0.583, respectively; p < 0.001) [8]. They also found significant linear correlations between the clinical outcome and the different scores: APACHE II (r = 0.347, p = 0.003), PMR (r = 0.419, p < 0.001), and PSS (r = 0.557, p < 0.001). They concluded that all four scores were useful in determining the severity of organophosphate poisonings. These studies should be praised for attempting to compare and validate the clinical significance and use of the PSS. While it appears that the PSS performed well compared to other severity scores, two authors explicitly mentioned that it was more difficult to use than the GCS. Because multiple authors found the PSS difficult to use, we wonder whether it would be simpler to incorporate a different severity score that performs just as well as, if not better than, the PSS. Even if the PSS performed as well as other scores, physicians are not familiar with it or how to interpret it, which limits its utility.

However, not all authors drew similar conclusions. Peter et al., in a study first available online in September 2013 (after the literature search was completed), compared the APACHE II, Mortality Prediction Model II, SAPS II, and PSS in organophosphate poisoning [49]. They found that the area under the curve (AUC) for mortality was significantly higher for the APACHE II (0.77) and SAPS II (0.77) than for the PSS (0.67). Overall, the authors concluded that the PSS was a poorer discriminator and inferior to the other scoring systems, which is likely an appropriate conclusion.

The PSS includes a large number of variables, many of which are subjective. In comparison, scores such as the APACHE II only include 15 variables derived from basic laboratory values and vital signs, and an additional 16th variable, which is slightly subjective (severe organ system dysfunction or immunocompromised). This may partly explain why scoring systems such as the APACHE II are used more frequently than the PSS. Familiarity could also account for this, as the APACHE II can be used with all patients but the PSS is only applicable in poisoned patients. However, the APACHE II was used in studies of poisoned patients instead of the PSS [50, 51]. If future research demonstrates that the PSS or a modified form of it is equal or superior to other severity scores and is easy to use, then it may be worth using clinically or incorporating into more studies. The introduction of more sophisticated electronic health records and data extraction tools could also make the PSS easier to use.

Modification or Misapplication of the Poisoning Severity Score

Sixteen of the 40 included studies (40%) either modified or misapplied the PSS. There were 10 studies that used the complete PSS but misapplied it [6, 11, 28–30, 32, 34, 42, 43, 45]. The most common mistake noted in these studies was not accurately applying a PSS of 4, which is to be used for all deaths. For instance, Jimmink et al. misclassified two patients who died as a PSS of 3 [32]. These studies often scored patients who died as a PSS of 3. This could be due to scoring patients too early.

In nine studies, the PSS was modified [5, 6, 8, 13, 14, 20, 29, 32, 37]. Generally, the modification simplified the PSS. Signs and symptoms were either grouped or, in some cases, eliminated. For instance, Davies et al. modified the PSS by not including oxygen saturations, ECG findings, or laboratory data such as arterial blood gases [6]. By modifying the PSS, the authors technically created a new scoring system, invalidating the actual PSS. While some authors were exploring new applications of the PSS, this also suggests that the PSS as derived was potentially inadequate. Even if the score was used properly, it would not carry the same prognostic abilities as scores such as the GCS, which would still be a significant limitation. The multiple modifications and misapplications of the PSS may serve as further evidence that researchers do not find the scoring system to be useful and likely further explain its poor utilization.

The Utility of the Poisoning Severity Score

Severity scoring systems are useful for both research and clinical purposes. For research, a severity score assists in making comparisons between groups of patients. Clinically, a severity score could assist in determining prognosis, need for admission, and place of admission (i.e., floor versus ICU). To be useful, a scoring system must be simple to use. Ideally, one should be able to simply place its variables into a smartphone application and derive the score at the patient’s bedside. After completing this review, we found that the PSS was rather complex and, in comparison to other severity scores, was not as easy to use in poisoned patients [52, 53]. As such, we feel that the PSS has limited utility in either clinical use or research.

Scoring systems need to be reliable and have high degrees of inter-rater reliability. Scoring systems that use objective data such as laboratory values or hard numbers such as vital sign measurements, and that use simple yes/no questions such as whether a seizure occurred, are more likely to be reproducible. The PSS has multiple subjective terms such as frequent versus infrequent seizures. When more subjective information is included, severity scores will have lower inter-rater reliability. Even well-known and practiced scoring systems have been criticized for this. For instance, the GCS has objective values (eyes open versus closed) but also more subjective findings (inappropriate versus confused speech). While it is used in hospitals every day, some studies have shown poor inter-rater reliability [54, 55]. If inter-rater reliability is questioned in commonly used scores, we worry that the PSS would be greatly limited as it is used less often. While it appears that a few toxicologists and physicians do use the PSS for research, albeit not a large number given how infrequently we found it included in studies, it is not clear if they use it clinically. At least anecdotally, it does not appear that most physicians are incorporating the PSS into their clinical practice, which may indicate that it is not very useful or is difficult to use.

The PSS incorporates 12 different categories based on the organ system affected with each category including a large number of variables. In comparison, other scoring systems incorporate far fewer variables and categories or the practitioner only has to enter a known, commonly available laboratory value, making the scoring systems easier to remember and use. Some scoring systems such as the APACHE II are more complex in that they incorporate many variables. However, the practitioner only has to enter a few specific numbers such as the patient’s age, heart rate, or potassium. Scores such as these can easily be placed into a computerized or smartphone application or even abstracted by an electronic health record, which likely increases their utilization and makes them easy to use. Based on the breadth of information incorporated into the PSS, it is not reasonable to remember the complete score. While a computerized application would solve that issue, the application would be long and cumbersome and, therefore, would be unlikely to be used. For researchers, variables used in other scoring systems (e.g., APACHE, SAPS) can easily be derived from the chart. This is not true for the PSS as variables such as vomiting, diarrhea, coughing, and isolated extrasystoles are not always documented in the chart or are not documented in enough detail to be accurately scored. Still, application of advanced data extraction tools to the electronic health record could simplify or complement the overall PSS calculation. In addition, a medical record could be constructed in advance so that recording data related to poisonings was easier.

An underlying question, not directly answered in this review but very relevant to this topic, is whether it is even reasonable to expect one scoring system to be both easy to use and to perform well in all poisoned patients. This was one goal of reviewing this literature, since if the PSS was found to be a useful severity score, then more physicians or researchers should use it. However, we do not believe this to be the case. While we commend the original authors of the PSS for a very worthy effort, we wonder if their goal is practical. If not, then any score developed exclusively for poisoned patients will be either poorly utilized or commonly modified or misapplied. Medical toxicologists evaluate patients who are poisoned by many substances (e.g., medications, illicit drugs, plants, and animals), each with its own unique pathophysiology. While patients may develop similar signs and symptoms from many different exposures, their significance changes in the context of the exposure. For instance, the nausea and vomiting that occur after an overdose of ibuprofen carry very different implications than nausea and vomiting that develop after eating an unknown mushroom. In addition, many poisoned patients present with altered mental status or lethargy, but the significance of this varies greatly depending on the specific exposure. Patients with an isolated benzodiazepine overdose can present with diminished mental status but are unlikely to experience any significant complications or require invasive treatments. However, the significance, prognosis, and treatment implications are very different if the lethargy or altered mental status is due to either large amounts of opioids or acute salicylism. In comparison, if a patient with a head injury presents with a very low GCS, no matter the underlying injury, it can be expected that the patient will have a poor prognosis and require multiple resources. Of course, this is a limitation of using even well-known scoring systems in toxicology patients. While a GCS of 3 in a patient with a head injury is an incredibly poor prognostic sign, a patient with a GCS of 3 after an overdose of benzodiazepines could be expected to do very well. As such, it may be best to either develop a scoring system that is specific to an exposure (e.g., the Snakebite Severity Score) or to use a scoring system that is already known to perform well in sick patients, even if it is not validated in poisoned patients and may not be perfect in all poisonings. Developing exposure-specific scores would likely make the scoring systems more reliable and easier to use. However, they would not be externally valid to other poisonings and would lead to the development of multiple severity scores.

Locations of Studies that Incorporated the PSS

The included studies took place in a variety of locations ranging from Europe to the Middle East to Asia. Notably, none of the studies were completed in North America and only two were from South America. It could be argued that this is not surprising as the score was mainly derived in Europe. Therefore, it could be expected that researchers from those areas would be more familiar with it and, therefore, more likely to use it. However, centers from North and South America were involved in the original derivation of the score. In addition, the PSS was published in Clinical Toxicology. Since its initial publication, multiple abstracts presented at the North American Congress of Clinical Toxicology have included the PSS [55–60]. In addition, researchers from countries that were not part of the original derivation published studies that incorporated the PSS.

Since its original publication, advances in technology have greatly increased the ease with which information is disseminated and shared. Increased access to high-speed internet and portable internet-ready devices (e.g., smartphones, tablets), in addition to the development of blogs, podcasts, and other social media applications, has made sharing of information easier and improved knowledge translation. As such, it is unlikely that the paucity of studies from the USA is solely due to lack of awareness of the PSS. We believe the omission serves as evidence that many North American medical toxicologists do not perceive the PSS to be useful or find it difficult to use and, therefore, do not include it in their research.

Limitations of the Review

While multiple search terms were used in multiple databases, it is possible that we missed manuscripts that included the PSS and that, had they been included, may have changed our findings and conclusions, although we believe this to be unlikely. Even if studies were accidentally omitted, we do not believe that would change our conclusions regarding the utility of the score and its ease of use for either clinical or research purposes. If we did miss other studies, it is unlikely that the missing studies would only be from North or South American countries. In addition, almost all of the manuscripts written in languages other than English were excluded. Had we been able to translate more of the foreign language manuscripts or found additional articles, our conclusions regarding misapplying or modifying the PSS or how it was used in comparison to other severity scores may have changed. However, as none of the foreign language articles were from North or South America, their inclusion would have skewed our findings even further regarding the origin of the manuscripts. As such, we believe that the pattern of very few researchers in North or South America using the PSS as compared to Europe is real.

Conclusion

The PSS is infrequently used in North and South America. When used, it is frequently modified or misapplied, making it difficult to assess its accuracy or utility. The PSS is not the ideal score to be used for all global toxicologic exposures because of the difficulty in using it, its poor inter-rater reliability, and its length. Further collaborative research could be directed at developing a severity scoring system that could be applied in all poisoned patients. However, this could suffer the same limitations as the PSS. Another option would be to develop exposure-specific poisoning scores. However, these scores would be limited in their generalizability and, likely, their adaptability.