Key Points

The role of pharmacists should be to determine the value of pharmacist interventions (PIs) and to target those with the most value.

The majority of tools for assessing the potential significance of PIs focused primarily on assessing clinical aspects and failed to detect other impacts.

We propose optimal pragmatic, psychometric, and theoretical properties for the development of new tools assessing the potential significance of PIs.

1 Introduction

Adverse drug events (ADEs) are a major problem relating to patient safety. They are associated with increased morbidity and mortality, prolonged hospitalizations, and higher costs of care [1, 2]. Nearly half of all ADEs are considered preventable [1]. Therefore, the detection, resolution, and prevention of actual or potential drug-related problems (DRPs) through pharmacist interventions (PIs) are considered key to reducing ADEs [1]. In this article, a DRP is defined as “an event or circumstance involving drug treatment that actually or potentially interferes with the patient experiencing an optimum outcome of medical care” [2], and PIs are defined as “discrete activities by pharmacists related to patient care” [3].

Assessing the significance of a PI is now recognized as essential for demonstrating the added value of pharmacists to the healthcare system and justification for obtaining additional resources in clinical pharmacy practice. This assessment is also used as an indicator of a pharmacist’s performance and continuing quality improvement, research, and education [1].

Approaches to assessing the significance of an individual PI in the literature can be classified into three main types: Approach 1, the evaluation of actual consequences of DRPs (e.g., actual severity of harm); Approach 2, the evaluation of actual consequences after performing a PI and follow-up with the patient (e.g., actual clinical outcomes); and Approach 3, the estimation of potential significance of a PI (Fig. 1). The term ‘actual’ is understood to mean the entity that has appeared in the patient, while the term ‘potential’ refers to a situation in which it is possible that the entity could appear in the patient [4].

Fig. 1

Different approaches to evaluating the significance of a pharmacist intervention. DRP drug-related problems, PI pharmacist intervention

According to ‘Approach 1’, the earlier the pharmacist intervenes to prevent harm to the patient, the more significant a PI is likely to be. In fact, actual harm to the patient as a result of DRPs is rare. For example, Vessal [5] found that about 90 % of prescription errors resulted in no harm to patients because the great majority of errors were corrected early by pharmacists. Two limitations of this approach are that it offers little guidance for improving the quality of future PIs and that it reflects the quality of the whole system of patient care rather than only the contribution of a PI [6].

According to ‘Approach 2’, the assessment of actual consequences, commonly clinical outcomes, in the patient after a PI and follow-up of the patient is the only valid indicator of the quality of a PI. It is helpful in the daily decision making of physicians and pharmacists [1]. However, assessment of an actual clinical outcome in patients is associated with some primary difficulties: criteria/technology of follow-up, timeframe, and determination of causal relationships between PIs and health outcomes [7–10].

According to ‘Approach 3’, the potential significance of PIs may be assessed via two sub-types: Approach 3A—prediction of the potential consequences of DRPs in the absence of a PI; and Approach 3B—prediction of the potential consequences of an implemented PI [11, 12]. The assessment of the potential significance of a PI is associated with metrological problems such as subjectivity, validity, and reliability of predictions. However, this method is frequently used as a means of commenting on the significance and quality of a PI because of its practicability when data are lacking for evaluation of actual consequences and its usefulness in guidance for improving the quality of a PI (e.g., hierarchy of potential significance of a PI and targeting the potentially most significant PIs). Therefore, for this review, we only synthesized tools for assessing the potential significance of PIs—Approach 3.

Methods and tools to assess the significance of PIs are diverse, and their pragmatic, psychometric, and theoretical properties are questionable. The only literature review of tools for rating PIs was reported in 1999 by Overhage and Lukes [12], who noted that only ten of 51 identified articles included an explicit description of the rating tool used. Thus, the authors developed a two-dimensional tool that could characterize a hospital pharmacist’s recommendations based on the severity of the DRP and the value of that intervention. A broad variation of this validated tool has been adopted for characterizing clinical activities in different settings. However, to our knowledge, no other up-to-date literature review has been conducted. Furthermore, since then, with increases in economic constraints, aging, burden of chronic disease, and patient lack of compliance, the quality assessment of PIs is shifting from solely clinical to include economic and humanistic impacts (e.g., patient quality of life, compliance, and satisfaction) [13]. Therefore, the purpose of this systematic review is to summarize the tools available for assessment of the potential significance of a PI and to propose the pragmatic, psychometric, and theoretical properties of ideal tools.

2 Methods

2.1 Research Strategy

We performed a systematic search of the databases MEDLINE (PubMed) (1986–February 2013), PASCAL (1997–February 2013), PsycINFO (1999–February 2013), and CINAHL with full-text (1993–February 2013) to collect studies using tools to assess the potential significance of an individual PI.

We combined two groups of keywords for the following search: drug-related problems AND pharmacist interventions (‘drug related problems’ OR ‘drug therapy problems’ OR ‘medication therapy problems’ OR ‘medication inappropriateness’ OR ‘pharmaceutical care issues’ OR ‘medicine related problems’ OR ‘medication related problems’ OR ‘medication errors’) AND (‘pharmaceutical care’ OR ‘pharmaceutical services’ OR ‘medication order review’ OR ‘medication review’ OR ‘pharmacotherapy interventions’ OR ‘pharmacy interventions’ OR ‘drug utilization review’ OR ‘pharmacist recommendations’ OR ‘pharmacist interventions’).
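The combination of the two keyword groups can be sketched programmatically. The following is a minimal illustration, not the authors' actual search script: the function names are hypothetical, and the quoting and operator syntax are assumptions, since each database (PubMed, PASCAL, PsycINFO, CINAHL) has its own query conventions.

```python
# Keyword groups copied from the search strategy described above.
DRP_TERMS = [
    "drug related problems",
    "drug therapy problems",
    "medication therapy problems",
    "medication inappropriateness",
    "pharmaceutical care issues",
    "medicine related problems",
    "medication related problems",
    "medication errors",
]
PI_TERMS = [
    "pharmaceutical care",
    "pharmaceutical services",
    "medication order review",
    "medication review",
    "pharmacotherapy interventions",
    "pharmacy interventions",
    "drug utilization review",
    "pharmacist recommendations",
    "pharmacist interventions",
]


def or_group(terms):
    """Quote each term, join the terms with OR, and wrap the group in parentheses."""
    return "(" + " OR ".join(f"'{term}'" for term in terms) + ")"


def build_query(group_a, group_b):
    """Combine two OR-groups with a single AND (hypothetical helper)."""
    return f"{or_group(group_a)} AND {or_group(group_b)}"


query = build_query(DRP_TERMS, PI_TERMS)
print(query)
```

Keeping the two groups as separate lists makes it easy to adapt the same strategy to each database's syntax without retyping the terms.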

2.2 Inclusion and Exclusion Criteria

The inclusion criteria were as follows: (1) original articles published in English or French; (2) abstract available; (3) published in peer-reviewed journals; (4) involved pharmacists alone or in cooperation with other healthcare professionals; and (5) included an explicit description of a method for rating the impacts of a PI, called ‘a tool’ in this review.

The exclusion criteria for articles included the following: (1) literature reviews; (2) studies related to one specific type of DRP/PI (e.g., administration errors, drug information service); (3) tools only assessing the actual consequences of DRPs [e.g., ADEs/adverse drug reactions (ADRs)]; (4) tools only assessing the actual consequences of a PI; (5) studies assessing economic impact only; and (6) non-accessible articles. In addition, reference lists of articles that met our inclusion criteria, of systematic reviews, and of review articles were assessed and, if relevant, were retrieved; 11 additional articles were also retrieved from a thesis by Quélennec [14], which performed a literature review of tools for evaluation of potential clinical impacts of medication errors (MEs) intercepted through medication reconciliation. Finally, a hand-search was conducted to identify articles that had not been captured in the electronic database search.

2.3 Screening and Data Extraction

In February 2013, one author (THV) screened all titles, abstracts, and then full-text articles for the first time. Another author (CC) independently screened with the same strategies. Additional articles retrieved by the second reviewer were added to the final results. The second reviewer also verified the extraction of relevant data from articles included by the first reviewer. We resolved any disagreement through discussion until consensus was reached.

2.3.1 Content of Tools

To identify the indicators used in existing tools, we reviewed theoretical models that can be applied to assess PIs. The conceptual “structure-process-outcome” model by Donabedian [15] suggests that the quality of healthcare interventions be assessed through three types of indicators related to “structural features” (appropriate resources and system design); “process of care” (the method by which healthcare is provided); and “outcome” (the consequence of the healthcare provided). The model provided by Kozma et al. [16] placed outcomes into three categories, Economic, Clinical, and Humanistic Outcomes (the ECHO model), depicting the value of pharmaceutical services. Figure 2 demonstrates the combination of the above two models.

Fig. 2

Evaluation model of pharmacist interventions based on the models by Donabedian [15] and Kozma et al. [16]

According to a risk model [17], risks are analyzed by combining the severity of consequences and probability in the context of an existing situation. Risk matrices are used predominantly in safety risk management of MEs, for example, the National Patient Safety Risk Matrix in the UK [17], the Safety Assessment Code Matrix in the USA [18], and the Standard for Risk Management in Australia [19]. An original safety–risk matrix assesses a broad range of risks, including clinical and financial risks and risks related to reputation, business processes, and systems. The matrix of clinical risk was simplified to develop some tools assessing the potential significance of a PI [20–24].
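The idea of combining severity and probability in a matrix can be illustrated with a short sketch. The 5 × 5 scale, the level labels, and the band thresholds below are assumptions chosen for illustration only; they are not taken from any of the cited matrices or tools.

```python
# Hypothetical ordinal scales (illustrative labels, not from the cited matrices).
SEVERITY = {"negligible": 1, "minor": 2, "moderate": 3, "major": 4, "catastrophic": 5}
PROBABILITY = {"rare": 1, "unlikely": 2, "possible": 3, "likely": 4, "almost certain": 5}


def risk_score(severity, probability):
    """Combine the two ordinal ratings by multiplying their numeric levels."""
    return SEVERITY[severity] * PROBABILITY[probability]


def risk_band(score):
    """Map a numeric score to a band; the cut-offs are assumed, not sourced."""
    if score <= 3:
        return "low"
    if score <= 6:
        return "moderate"
    if score <= 12:
        return "high"
    return "extreme"


score = risk_score("major", "possible")
print(score, risk_band(score))
```

A tool built on this principle would need its own validated definitions for each severity and probability level; the multiplication rule is only one of several possible ways to combine the two ratings.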

According to a basic pharmacoeconomic model [25], the value of a PI considers both inputs and outputs of a PI compared with the absence of a PI (Fig. 3). Inputs can be thought of as resources required to implement the PI. Outputs can be thought of as consequences of a PI, in the form of clinical, humanistic, or process-related consequences. The difference between the cost of the original therapy and the new therapy gives the cost savings (or the increase in the cost of therapy). Cost avoidance refers to the prevention of additional health resources that are required to treat ADEs if a pharmacist does not intervene, such as hospitalization or a medical visit. The cost of implementation of a PI refers to the expenses of providing the PI such as pharmacist’s time, phone calls, etc. In some studies [26, 27], the economic value of a PI is estimated through cost savings plus cost avoidance less cost of implementation of a PI.
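The estimate described in the last sentence (cost savings plus cost avoidance less cost of implementation) can be written as a small worked example. This is a minimal sketch: the function names and all monetary figures are hypothetical and serve only to show the arithmetic.

```python
def cost_savings(original_therapy_cost, new_therapy_cost):
    """Difference between the original and the new therapy cost.

    A negative result means the PI increased the cost of therapy.
    """
    return original_therapy_cost - new_therapy_cost


def economic_value(savings, avoidance, implementation_cost):
    """Cost savings plus cost avoidance less the cost of implementing the PI."""
    return savings + avoidance - implementation_cost


# Hypothetical figures for a single PI (same currency unit throughout).
savings = cost_savings(original_therapy_cost=120.0, new_therapy_cost=80.0)
value = economic_value(savings, avoidance=500.0, implementation_cost=15.0)
print(savings, value)
```

Here the PI saves 40.0 in drug costs, avoids 500.0 in care that would have been needed to treat an ADE, and costs 15.0 to deliver, giving a net value of 525.0.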

Fig. 3

Economic model for estimation of a pharmacist intervention. PI pharmacist intervention

Regarding the content of tools, after combining the above four models that can be applied to assess PIs, we determined and classified indicators used in existing tools into five main types of indicators: those related to economic, clinical, and humanistic outcomes, and process and probability of the impact.

2.3.2 Structure of Tools

We classified the structure of tools as mono-dimensional or multi-dimensional. One dimension was defined as an independent rating to answer one question related to impacts of a PI. Each dimension was also classified as nominal (two or more categories, but there is no intrinsic ordering to the categories, for example, rating PIs into two categories: technical or clinical problems [28]) or ordinal (there is a clear ordering of the dimension, for example, ordering clinical impacts of PIs into three categories such as minor, moderate, or major significance [29]). Each aspect of impact of a PI (e.g., clinical, economic aspect) was evaluated independently in one dimension or combined with other aspects within a single ‘significance’ dimension. For example, in the tool by Briceland et al. [30], clinical impact was evaluated independently in a six-category dimension (adverse significance, no significance, somewhat significant, very significant, extremely significant), and drug cost saving of a PI was evaluated independently in a three-category dimension (drug cost reduction, drug cost increase, no change). Conversely, drug cost savings was integrated with clinical impact into a four-category dimension (low, mild, moderate, high significance) in the tool by Williams et al. [31].

2.3.3 Psychometric Parameters of Tools

Regarding the psychometric parameters of tools, validity aims to check whether the tool is measuring what it is supposed to measure; inter-rater reliability measures whether the same results are produced when the same test is applied to the same scenarios by different raters; intra-rater reliability measures whether the same results are produced when the same test is applied to the same scenarios by the same rater on two different occasions [32]. We assessed risk of bias in studies that reported validity and/or reliability results according to the Cochrane Handbook for Systematic Reviews of Interventions [33]. We addressed the following main components: selection bias, performance bias, detection bias, and other biases. We classified each study as having a low, high, or unclear/unknown risk of bias [see the Electronic Supplementary Material (ESM) 1].
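Inter-rater reliability for categorical ratings is often summarized with Cohen's kappa, which corrects the observed agreement between two raters for the agreement expected by chance. The sketch below is a plain-Python illustration of that statistic; the ratings are invented example data, and the function name is hypothetical.

```python
from collections import Counter


def cohen_kappa(ratings_a, ratings_b):
    """Cohen's kappa for two raters who rated the same cases.

    kappa = (p_observed - p_expected) / (1 - p_expected)
    """
    if len(ratings_a) != len(ratings_b) or not ratings_a:
        raise ValueError("Both raters must rate the same non-empty set of cases.")
    n = len(ratings_a)
    # Proportion of cases on which the two raters chose the same category.
    observed = sum(a == b for a, b in zip(ratings_a, ratings_b)) / n
    # Chance agreement from each rater's marginal category frequencies.
    freq_a = Counter(ratings_a)
    freq_b = Counter(ratings_b)
    expected = sum(freq_a[c] * freq_b[c] for c in freq_a) / (n * n)
    if expected == 1.0:
        return 1.0  # both raters used a single identical category throughout
    return (observed - expected) / (1 - expected)


# Invented ratings of eight PIs by two raters.
rater_1 = ["minor", "moderate", "major", "minor", "moderate", "major", "minor", "moderate"]
rater_2 = ["minor", "moderate", "major", "minor", "major", "major", "minor", "minor"]
kappa = cohen_kappa(rater_1, rater_2)
print(round(kappa, 3))
```

This unweighted form treats all disagreements equally. For ordinal scales a weighted kappa, which penalizes distant disagreements more heavily, may be more appropriate.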

2.3.4 Assessment of Quality of Tools

We assessed the quality of each tool used in included studies using the criteria outlined in the ESM 2. One point is awarded when a criterion is clearly satisfied. The sum of scores represents the quality of a tool for assessing the significance of PIs in an included study.

We designed two forms to extract data. The articles were evaluated and summarized by (1) authors, published year, country; (2) structure of tools; (3) approach of assessment; (4) content of tools; (5) notes (see ESM 3); and by (6) setting, sample size, sampling; (7) qualification and number of raters; (8) rating methods; (9) definitions of consensus; (10) validation; (11) inter-rater reliability; (12) intra-rater reliability; (13) risk of bias; and (14) score of quality of a tool (see ESM 4). For eligible studies, two review authors (THV and CC) independently extracted the data using these forms. We resolved discrepancies through discussion until consensus was reached. When information regarding any of the above was unclear, we attempted to contact authors of the original reports to provide further details. We conducted this systematic review according to the PRISMA (Preferred Reporting Items for Systematic Reviews and Meta-Analyses) guidelines [34].

3 Results

3.1 Studies Identified

A total of 873 articles were retrieved from PubMed (646), PASCAL (96), PsycINFO (33), and CINAHL with full-text (98). Of these, 833 articles were removed because of repetition or irrelevance, and 93 articles were added from reference lists, the review by Quélennec [14], an independent search by the second reviewer, and other sources. Finally, 133 articles [3, 12, 20–24, 28–30, 35–157] were selected for inclusion in the reviewed dataset (see ESM 3, 4). Some studies used a tool or multiple tools that were described in previous studies; therefore, the 133 selected articles comprise only 82 distinct tools. Figure 4 presents the systematic review flowchart.
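The counts reported above can be cross-checked with simple arithmetic. The short sketch below uses only the numbers stated in this paragraph; the variable names are illustrative.

```python
# Records retrieved per database, as reported in the text.
retrieved = {"PubMed": 646, "PASCAL": 96, "PsycINFO": 33, "CINAHL": 98}

total_retrieved = sum(retrieved.values())
removed = 833  # duplicates or irrelevant records
added = 93     # reference lists, thesis, second reviewer, other sources

included = total_retrieved - removed + added
print(total_retrieved, included)
```

The database counts sum to the 873 retrieved records, and 873 − 833 + 93 gives the 133 included articles.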

Fig. 4

Inclusion and exclusion criteria for the systematic review. ADR adverse drug reaction, DRP drug-related problem, PI pharmacist intervention

Tools were created by research teams in the USA (43 studies), the UK (19 studies), Canada (16 studies), Australia (15 studies), France (seven studies), the Netherlands (five studies), Sweden (four studies), Norway (four studies), Spain (four studies), Germany (three studies), Switzerland (three studies), Belgium (two studies), Denmark (one study), Iran (one study), Israel (one study), Taiwan (one study), Ethiopia (one study), India (one study), Malaysia (one study), and UK and Saudi Arabia (one study).

3.2 Content of Tools

3.2.1 Main Approaches for Assessment of Significance of Pharmacist Interventions (PIs)

Of 82 distinct tools identified, 30 tools assessed the potential consequences of DRPs (Approach 3A in Fig. 1), while 46 tools assessed the potential significance of a PI (Approach 3B). Six tools applied multiple approaches [3, 12, 56, 86, 100, 120]. For example, the tool by Overhage and Lukes [12] assessed both the potential consequences of DRPs (approach 3A) and the potential significance of a PI (approach 3B).

3.2.2 Indicators Used in the Content of Existing Tools

The tools could cover one aspect or a range of aspects of impacts simultaneously. Indicators (not exhaustive) used in existing tools for assessment of potential significance of PIs are summarized in the following (see also ESM 5).

3.2.2.1 Clinical Impact

All tools reported clinical aspects as indispensable when rating the significance of a PI. Ranking the clinical significance of PI was realized by assessing effects of DRPs/PIs on safety (e.g., adverse health consequence [48], toxicity [44, 55], morbidity [21, 29, 86, 106, 113]); effectiveness (e.g., response to medication [87], disease control [53]); and necessity [134] of drug therapy; or characteristics of effects (e.g., short-term/long-term [106], permanent/temporary [23, 105, 113]), etc.

3.2.2.2 Humanistic Impact

Humanistic outcomes, also called patient-reported outcomes, are the consequences of the disease and/or its treatment as expressed by the patient. Humanistic outcomes are now more commonly used in clinical practice [158]. In this review, distinct tools clearly stated some indicators of humanistic outcomes: patient’s knowledge, compliance, patient’s satisfaction, inability to work, and quality of life. Humanistic aspects were often evaluated in combination with clinical aspects as a ‘significance’ dimension and classified as ‘low significance’ [31, 59, 71, 77, 79, 82, 108, 130], while some distinct tools evaluated certain indicators of humanistic impact of a PI independently [55, 87, 111, 120, 150].

3.2.2.3 Economic Impact

Different studies on the economic impact of a PI employ different terminologies, leading to some confusion in the perspective and components of costs, making the comparison of studies difficult. Cost savings and/or cost avoidance were rated independently in some tools [20, 30, 38, 41, 43, 48, 50, 53, 55, 56, 61, 65, 68, 73, 87, 93, 111, 123, 134]. In some studies, independent rating of the economic impact of a PI was used as the first step to determine the monetary value of a PI program [38, 41, 43, 48, 50, 68, 73, 80, 94, 123]. Cost avoidance was estimated through the types of healthcare resources avoided (e.g., readmission [102, 105, 113] or a scheduled visit to the physician [31, 48]), while cost savings were evaluated through costs related to drug therapy [20, 38, 43, 61, 66, 109, 111], drug therapy monitoring [38, 61], treatment cost [29], patient cost [43], or reimbursement [66].

3.2.2.4 Process-Related Impact

Like humanistic impacts, the process-related impacts of a PI were often ignored, reported incompletely or ambiguously, or mentioned only arbitrarily in some tools. They may be grouped into resolving technical problems [28, 57, 82, 91], informational intervention [31, 38, 53, 57, 71, 75, 82, 94], physician’s satisfaction [120], facilitation of continuity of care [55], teamwork support [82], adherence to evidence-based therapy [104, 135], and others [93].

3.2.2.5 Structure-Related Impact

No structure-related indicators (e.g., a comprehensive inventory, record-keeping amenities such as a computer database, a designated area of the pharmacy, trained pharmacists/technicians [159]) were found in the reviewed tools.

3.2.2.6 Probability

The determination of probability of a consequence for each DRP/PI was used in 20 of 82 distinct tools [3, 20–24, 48, 56, 70, 86, 89, 109, 112, 113, 115, 116, 121, 132, 135, 138]. The definitions of each level of probability were based on concrete terms with or without a range of numeric probabilities or a Likert score. The number of levels ranged from 2 to 11. Evaluation of the probability of a consequence of a DRP was useful to evaluate the confidence of judgment [70, 116, 135]; classify the risk of an adverse health consequence by combining the severity and the probability of occurrence [20–24]; and/or clarify the estimation of cost avoidance of a PI by combining the type of healthcare resources required to treat an adverse health consequence and its probability [48, 56].

3.3 Structure of Tools

The tools were mono-dimensional (one dimension with 2–20 categories, 39/82) or multi-dimensional (2–9 dimensions, 43/82), ordinal or nominal (see ESM 3). The majority were presented as classification systems with associated definitions, but other tools were based on visual analog scales [69, 132] or ordinal Likert scales [127].

3.4 Validation Process

The validation process was heterogeneous in terms of qualification and number of raters, rating methods, determination of psychometric parameters, etc. (see ESM 4).

3.4.1 Raters and Rating Methods

The profile of raters differed—internal or external, blinded or not, junior or senior, generalists or specialists—and they had various qualifications (e.g., pharmacist, physician, nurse, or pharmacologist). Rating methods varied—some studies were simply based on a single professional’s view (individual-based rating), while others used an inter-disciplinary group (group-based rating) with up to 30 raters and up to five different specialties.

There were a few instances in which a clear definition was presented outlining precisely what constituted consensus. For example, a panel of experts was asked to independently judge an event, and their opinions were then combined using various mathematical approaches (e.g., mode [38, 39, 41, 56, 81, 100, 101, 119], median [24, 100, 130], mean [39, 41, 53, 56, 60, 69, 81, 83, 89, 100, 122, 136], sum [59]). Alternatively, a conservative approach was used, taking the lower category of significance [138, 139], or a hierarchical approach in which a more senior expert was consulted when there was a disagreement among the clinical panel [37, 48, 49, 54, 55, 65, 68, 77, 91, 99, 103, 108, 113, 116, 124, 125, 128, 144, 151–153, 156]. In most studies, the consensus may have been arbitrarily determined; in other words, it was defined simply as a consensus-based approach (reached through discussion) [3, 22, 37, 38, 43, 44, 46, 48, 49, 54, 62, 68, 72, 77, 80, 82, 84, 91, 92, 97, 103, 104, 107, 108, 110, 113, 115–118, 120, 121, 123, 124, 132, 134, 135, 150, 152, 153, 160, 161].
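The mathematical approaches to combining independent expert ratings (mode, median, mean, or the conservative lower category) can be compared on a small example. The three-level scale and the panel ratings below are hypothetical, and the helper names are illustrative rather than taken from any included study.

```python
import statistics

# Hypothetical ordinal scale, lowest to highest significance.
LEVELS = ["minor", "moderate", "major"]


def to_scores(ratings):
    """Convert category labels to 0-based ordinal scores."""
    return [LEVELS.index(r) for r in ratings]


def consensus_mode(ratings):
    """Most frequent category (first one encountered on ties, Python 3.8+)."""
    return LEVELS[statistics.mode(to_scores(ratings))]


def consensus_median(ratings):
    """Middle category; median_low keeps the result an actual scale level."""
    return LEVELS[statistics.median_low(to_scores(ratings))]


def consensus_mean(ratings):
    """Mean of the ordinal scores, reported on the 0-based numeric scale."""
    return statistics.mean(to_scores(ratings))


def consensus_conservative(ratings):
    """Lowest category of significance assigned by any rater."""
    return LEVELS[min(to_scores(ratings))]


panel = ["minor", "moderate", "moderate", "major", "moderate"]
print(consensus_mode(panel), consensus_median(panel),
      consensus_mean(panel), consensus_conservative(panel))
```

On this panel the mode and median both give "moderate", whereas the conservative rule gives "minor", which shows how the choice of aggregation rule alone can change the reported significance of the same PI.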

3.4.2 Psychometric Parameters of Tools

Validity was only reported in eight studies (8/133, 6 %) [23, 45, 61, 69, 83, 106, 127, 131]. These explored face validity [127] or criteria-based validity (the results of coding by raters were compared with known outcomes in the literature [69, 83] or evidence in patients’ medical records [61] or with those of other skilled people or the consensus of an expert panel [23, 45, 106, 131]). Dean and Barber [69] and Taxis et al. [83] found a clear relationship between potential harm as assessed using their tools and actual harm. Eadon [45] found no significant difference between a pharmacist’s scores and those of three physicians (Mann–Whitney U test, U = 933.5, z = 0.034). Elliott and Woodward [23] found 93–100 % agreement between two pharmacists and one geriatrician, while Knez et al. [131] found 46 % agreement between a panel of three pharmacists and a physician. In three studies [61, 106, 127], descriptive information was given but no statistical information presented.

Measures of inter- and intra-rater reliability were established in 49 studies (36.8 %) (see ESM 4). High inter-rater reliability was found in 24 studies: Lesar et al. [63], Rupp [48], Overhage and Lukes [12], Caleo et al. [56], Lewinski et al. [24], Gleason et al. [128], Kwan et al. [110], Wong et al. [118], Chua et al. [146], Midlov et al. [115], Pippins et al. [116], Granas et al. [120], Lee et al. [132] with κ ≥ 0.7; Chedru et al. [59] with sigma x, y ≥ 0.7; Goarin et al. [129] with t test p < 0.05; Hawkey et al. [20] with Spearman’s rank correlation p < 0.05; Bayliff and Einarson [41], Strong and Tsang [50], and Virani and Crown [87] with coefficient of agreement ≥0.7; Khalili et al. [151], Hick et al. [81], and Bobb et al. [88] with agreement ≥80 %; Gisev et al. [127] with W ≥ 0.3; and Coffey et al. [119] with AC1 = 0.69, p < 0.01. Intra-rater reliability was only reported in two studies (1.5 %), with poor agreement in the study by Cousins et al. [61] and good agreement in a study by Dean and Barber [69].

While many studies showed that reliability was not affected by the profession of the rater [45, 69, 102, 124, 129], others found that physicians rated DRPs/PIs with lower severity/value than did pharmacists [12, 23, 38, 98]; or conversely, pharmacists tended to score PIs as being less clinically significant than physicians [53, 79]. A study by Lee and McPherson [100] found that ratings were more consistent between pharmacists than between physicians and pharmacists. However, even within the same profession, reliability was difficult to obtain. Fernández-Llamazares et al. [149] demonstrated that senior pharmacists rated more consistently than junior pharmacists.

3.5 Assessment of Quality of Tools

The quality scores of the tools for assessing the significance of PIs in the 133 included studies are presented in Table 1.

Table 1 Scores of quality of tools for assessing the significance of pharmacist interventions in the 133 studies included

4 Discussion

4.1 Limitations of this Review

It was difficult to identify all tools in the literature. We searched only four databases. Tools were sometimes mentioned but not described in detail [162]. Tools only assessing the actual consequences of DRPs (Approach 1 in Fig. 1) or the actual consequences of a PI (Approach 2 in Fig. 1) were not used for this review because these cover different concepts. We used the outcome terminology proposed by Holdford and Smith [13]. However, the identification of classifications of indicators mentioned in existing tools was complicated because of the different terminologies used by authors and institutions. For example, determining whether a tool evaluated humanistic impacts of a PI was difficult for the following reasons: (1) not all indicators of humanistic outcomes are theoretically well defined; (2) in some tools, the terminology of humanistic indicators is confusing; and (3) the relationships between humanistic, clinical, and economic outcomes are complex. An assessment of the significance of PIs is key to justifying the value of pharmacy services. However, methods differ between studies, which hinders their review and synthesis. Our review is a first attempt to (1) distinguish different approaches used to assess the significance of PIs, (2) evaluate the quality of tools based on theoretical models, and (3) discuss the strengths and weaknesses of existing tools and validation processes. We suggest recommendations for an optimal method of evaluation of the significance of PIs.

4.2 Content of Tools

The principal indicators of the impact of a PI concern the process; the clinical, humanistic, and economic outcomes; and probability. These indicators are inconsistently mentioned in tools. Some tools cover many indicators, but a comprehensive tool is not available. One reason for this may be that few tools were constructed based on theoretical models, a systematic literature review, and input from healthcare professionals.

Pharmacy practitioners and pharmacy managers need to demonstrate that for each PI the benefits outweigh the costs for a given patient, healthcare system, and society. According to the economic model, the cost of implementing a PI, cost savings, and cost avoidance should be evaluated. Tools should be constructed so as to capture the potential significance of a PI together with an estimation of its economic impact (e.g., using the Williams et al. [31] tool, the potential significance had a fairly good correlation with the economic value); such an estimation is the first step to conducting a more sophisticated economic evaluation [38, 48, 73, 77, 93, 123].

Most tools focus on patient outcomes. However, PIs are also useful for the health practitioner. Tools therefore should reflect the possible impacts on both. In order to assign a probability for a potential consequence, it is ideal to know how often it has been described in the literature as well as how often it occurs at the local healthcare facility. However, in most cases, this probability was difficult to estimate. This is primarily because such probabilities are rarely available in the literature and can vary based on patient risk, co-morbidities, or other factors [138]. Generally, in order to improve the consistency of judgment of probability between raters, studies only select and code the most likely harm prevented [19, 23, 48, 56] and request the opinions of staff most familiar with these events. A multi-dimensional matrix of risk that considers many aspects of impacts and the probability of each aspect, such as the matrix developed by the National Patient Safety Agency [17], could be used as a framework to construct a new tool for assessing PIs.

Assessing the potential significance of a PI is primarily based on the potential severity of consequences of DRPs that might have occurred if a pharmacist had not intervened. It makes sense to use the same definitions, terminology, and grading systems for both the potential significance of a PI and the actual severity of consequence of MEs, ADEs, or ADRs [19, 93, 163]. Indeed, the National Coordinating Council for Medication Error Reporting and Prevention (NCC MERP) Index [164] has been used to design new tools to assess PIs [84, 86, 88, 128]. However, most tools use a variety of similar terminologies without precise definitions, which risks inconsistent rating.

4.3 Structure of Tools

One can argue that a tool for evaluating impacts of a PI should be as simple as possible. However, a simple tool can hardly detect all possible impacts of PIs and would not provide enough information for practice and research. Therefore, a well-structured tool should provide the main dimensions and the main levels. A stepwise instruction should be developed to guide the use of tools in practice, so results of different studies can be compared.

An ordinal tool is preferred to prioritize the most significant PIs. Half of the tools were mono-dimensional and often concentrated on clinical impacts of PIs, failing to detect other impacts. Multi-dimensional tools and the independent evaluation of different impacts of a PI improve the sensitivity and flexibility of evaluation methods. For example, the tool by Lindblad et al. [111] separates the evaluation of economic impacts (cost savings) and clinical impacts, thereby facilitating the estimation of cost savings by the whole PI program. The number-based levels facilitate interpretation of results.

Although many studies used multi-dimensional tools, the results of each dimension were interpreted separately. Only Lindblad et al. [111] used the method of simultaneous interpretation of mean impacts of many dimensions for all PIs. For all interventions, this study found a mean of 1.4 clinical, 0.8 humanistic, and 0.1 economic outcomes. This method of interpretation of results gives the added value of the whole PI program rather than the individual PI. There is no method for determining these multi-dimensional impacts of each PI.

Many authors adapted existing tools in the literature to their study. In the ESM 3 and 4, we grouped studies into sub-groups that used the same or a slightly modified tool. The most commonly adapted tools used in other studies include the following: Folli et al. in 1987 [36] (eight studies), Hatoum et al. in 1988 [38] (26 studies), Lesar et al. in 1990 [42] (four studies), Western Australian Clinical Pharmacists Group in 1991 [44] (three studies), Rupp in 1992 [48] (three studies), Chedru and Juste in 1997 [59] (five studies), Alderman in 1997 [29] (three studies), Overhage and Lukes in 1999 [12] (11 studies), Dean and Barber in 1999 [69] (six studies), Hawksworth et al. in 1999 [70] (three studies), NCC MERP Index in 2001 [164] (five studies), Society of Hospital Pharmacists of Australia guideline in 2005 [19] (four studies), Cornish et al. in 2005 [22] (five studies), and Blix et al. in 2006 [97] (three studies). The advantages of using existing structured measures are that they have already been validated and their reliability confirmed, and that using measures applied by others allows comparison between studies. However, limitations include difficulties in finding a suitable tool for local use and the fact that the reliability of a specific tool is not always reproducible. For example, the tool by Overhage and Lukes [12] showed high inter-rater reliability in their study but exhibited low inter-rater reliability when adapted by Bosma et al. [98], Lee and McPherson [100], Fernández-Llamazares et al. [149], and Somers et al. [157].

4.4 Validation Process

The criterion-based validity of any method measuring the potential significance of a PI is difficult to assess because there is no generally accepted standard with which to compare [12]. The comparison of the scores given to MEs with known outcomes has limitations because errors resulting in more severe outcomes may be more likely to be reported in the literature [69]. The alternative, comparing individual scores with the consensus results of a group of experts, has its own limitations. The existence of a consensus does not mean that the ‘correct’ answer has been found [165]. The consensus method is just a means of identifying current medical opinion and areas of disagreement. It is recommended that the results should, when possible, be matched to other data in the literature [102], to the actual outcomes in the patient after follow-up [61], to observable events [165], or to other systems of reporting such as MEs and ADEs [89].

Measuring the inter- and intra-rater reliability of methods for assessment of impacts of PIs is a scientific and practical requirement. Indeed, this information not only provides useful data about the reliability of a subjective assessment but can also be used for teaching, peer review, and audit purposes [65, 149]. However, this measure has not been established for all tools. It is not possible to directly compare the reliability of tools as they used different methods to assess reliability.
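As a minimal sketch of one way inter-rater reliability can be quantified, the example below computes Cohen's kappa for two raters who each assign one significance level to every PI. This statistic is shown only as an illustration; it is not necessarily the method used in the reviewed studies, and the rating levels, the data, and the function name `cohen_kappa` are hypothetical.

```python
from collections import Counter


def cohen_kappa(rater_a, rater_b):
    """Cohen's kappa for two raters assigning one category per case."""
    if len(rater_a) != len(rater_b) or not rater_a:
        raise ValueError("ratings must be non-empty and of equal length")
    n = len(rater_a)
    # Observed agreement: share of cases where both raters chose the same level.
    observed = sum(a == b for a, b in zip(rater_a, rater_b)) / n
    # Chance agreement: derived from each rater's marginal distribution.
    freq_a = Counter(rater_a)
    freq_b = Counter(rater_b)
    expected = sum(freq_a[c] * freq_b[c] for c in freq_a) / (n * n)
    if expected == 1.0:
        return 1.0  # both raters used the same single category throughout
    return (observed - expected) / (1 - expected)


# Hypothetical significance levels assigned to the same ten PIs.
pharmacist = ["minor", "moderate", "moderate", "major", "minor",
              "moderate", "major", "minor", "moderate", "minor"]
physician = ["minor", "moderate", "minor", "major", "minor",
             "major", "major", "minor", "moderate", "moderate"]

kappa = cohen_kappa(pharmacist, physician)
print(round(kappa, 2))  # 0.55
```

Here the raters agree on 7 of 10 cases (observed agreement 0.70), while chance agreement from the marginal distributions is 0.34, giving a kappa of about 0.55. For ordinal scales a weighted kappa may be more appropriate, and other indices (e.g., intraclass correlation) are also in use, which is one reason reliability results are hard to compare across tools.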

As with the actual severity ratings of ADEs [166–168] or MEs [169, 170], the literature shows many inter-rater and intra-rater inconsistencies within and between healthcare professional groups. Such inconsistencies can be partly attributed to a lack of clarity in the tools and scenarios used for validation, a shortage of time for proper case reading and coding, and different assessor viewpoints.

The inconsistency of coding between raters prevents individual evaluation. Many studies used an expert panel; however, no strict criteria govern the selection of experts. With regard to medical research, Jones and Hunter [165] defined the term ‘expert’ to be “clinicians practicing in the field under consideration”. According to this definition, suitable experts for studies such as those proposed in this paper include pharmacists and medical practitioners. It has been recommended that experts should be selected based on their appropriateness for the study in terms of experience in the therapeutic area, reputation, geographic representation, practice type and specialty, heterogeneity in treatment patterns, and willingness to participate in the study [11, 171]. Wright et al. [172] demonstrated that community pharmacists, hospital pharmacists, general practitioners, and specialist physicians attribute significantly different values when undertaking these assessments.

4.5 Properties of Ideal Tools for Assessing the Potential Significance of a PI

Currently, there are no formal guidelines or standardized methodology for assessing the potential significance of PIs. Given the results of this review, we suggest some desirable theoretical, psychometric, and pragmatic properties, as follows.

4.5.1 Theoretical Properties

  1. Tools should be developed based on (1) comprehensive theoretical models, (2) a systematic literature review of available evidence that reflects the whole range of impacts of a PI, (3) an evaluation of existing tools, and (4) input from healthcare professionals.

  2. Tools should be able to demonstrate that the benefits outweigh the costs in a given patient, healthcare system, and society at the level of each PI.

  3. An evaluation from a multi-impact perspective, rather than simply focusing on clinical impact, should be used to enhance understanding of the comprehensive effect of PIs. An example would be a tool integrating clinical, humanistic, economic, and process-related impacts and the probability of these impacts.

  4. The views of patients, healthcare providers, institutions, payers, and society should be considered.

4.5.2 Psychometric Properties

  1. Tools should be validated prior to use.

  2. Along with information on the clinical case, experts should be provided with a literature review, coding instructions, and examples. Indices for agreement/validity/reliability should conform to the current guidelines [173].

  3. The guideline proposed for the use of experts in pharmacoeconomic studies [174] is suitable for this type of study: description of consensus techniques (e.g., Delphi process, Nominal Group Technique, expert panels); justification for using such methods; description of the selection of experts; provision of a definition of consensus in advance of the execution of a study; provision to panelists, in advance, of information that is as objective and as comprehensive as possible; modification of the tool as appropriate, with input from independent experts or a pilot test; and appropriate presentation and interpretation of findings.

4.5.3 Pragmatic Properties

  1. Tools must be brief, not time-consuming, and acceptable to evaluators.

  2. Tools should be well-defined.

  3. Tools must be well-structured and flexible enough to adapt to specific needs (e.g., a multi-dimensional tool; the possibility of modifying the terminology of economic impact according to different perspectives, or of modifying the number of levels; independence between dimensions).

  4. Tools should have an open, numeric, and hierarchical structure (with main dimensions, main levels of each dimension, and an open structure to include the option ‘non-determinable’).

  5. The same definitions, terminology, and grading systems should be used for both the potential significance of a PI and the actual severity of consequence of MEs/ADEs/ADRs.

4.6 Assessment of Quality of Tools

Researchers and clinicians may have different needs in relation to a tool for assessing the potential significance of PIs. Because of the wide range of tools used in the literature, researchers need a basis for comparing them. We therefore assessed the quality of each tool in the included studies using ten criteria to assist in comparing tools across studies (see ESM 4). According to these criteria, the tools with the highest scores were those by Caleo et al. [56] and Hick et al. [81] (a score of seven), followed by Eadon [45], Overhage and Lukes [12], Kopp et al. [109], Virani and Crown [87], Lee and McPherson [100], and Lewinski et al. [24] (a score of six). No tool met all of our criteria. It appears that further research in this field is necessary.

5 Conclusion

Various structures and contents of tools for the evaluation of impacts of PIs were highlighted, as well as suggestions for an optimal evaluation method. The majority of tools focused primarily on assessing clinical aspects and failed to detect other impacts. Our summary was hindered by variations in tools and assessment processes. Limited and varied validity and reliability call into question the level of evidence that evaluations of the potential significance of PIs provide for justifying the added value of PIs. The development of tools with optimal theoretical, pragmatic, and psychometric properties and their integration into pharmacists’ daily practice through rational assessment processes (e.g., peer review) and standardized documentation systems (e.g., information technology tools) are needed.