Key Points

While there is a range of tools for the assessment of the appropriateness of prescribing in the literature, the results of this systematic review highlight the need for evidence-based tools that combine the various aspects of medication management in order to optimise health outcomes.

Less than 50% of available tools have been externally validated, limiting their use in clinical practice. It is important to develop tools that are proven to improve common patient-related outcomes such as falls and hospitalisation.

1 Background

Multimorbidity, defined as the co-existence of two or more chronic health conditions, is common in the older population [1]. The complexity of the therapeutic management of multimorbidity, for health professionals as well as for patients and carers, is well recognised. Multimorbidity is associated with decreased quality of life, self-rated health, mobility and functional ability, as well as with increased hospitalisation, psychological distress, use of healthcare resources, mortality and costs [2,3,4]. Clinical guidelines often recommend multiple medications for the management of a single disease, with limited consideration of comorbidities and concurrent medications [5]. As a result, even when best practice guidelines are followed, patients are frequently prescribed multiple medications (commonly referred to as polypharmacy), which can interact with each other and with patients' comorbid conditions [6].

Polypharmacy is associated with adverse health outcomes including mortality, falls, adverse drug reactions (ADRs), increased length of hospital stay and early readmission after discharge [7,8,9]. A 2014 systematic review of 50 studies in the community setting found that the majority demonstrated relationships between polypharmacy and a range of outcomes including falls, ADRs, hospitalisation, mortality, functional status and cognition [10]. The risk of adverse effects and harm rises with the number of medications used. Harm can result from many factors, including drug–drug and drug–disease interactions in older patients with multimorbidity. Older patients are at even greater risk of adverse effects due to decreased renal and hepatic function, lower lean body mass and reduced hearing, vision, cognition and mobility [11]. Although the use of multiple medicines may often be clinically appropriate, it is important to identify inappropriate polypharmacy that places patients at increased risk of adverse events and poor health outcomes.

While there is no consensus definition of polypharmacy, the definition most commonly encountered in the literature is the use of five or more medications [12,13,14,15,16]. Such a numerical definition cannot distinguish appropriate from inappropriate medication use and therefore does not provide a meaningful clinical evaluation of the risk of harm in everyday practice. Distinguishing appropriate from inappropriate polypharmacy, and thereby identifying patients at risk of poor health outcomes, requires a holistic assessment of the appropriateness of therapy that considers the concurrent medication classes and comorbidities present.
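To make this limitation concrete, the sketch below implements the count-only definition (five or more medications) in Python. The function name `is_polypharmacy` and the placeholder drug names are hypothetical illustrations, not part of any published tool; the point is only that two regimens with the same count receive the same flag, whether or not either is clinically appropriate.

```python
# Count-only polypharmacy flag using the commonly cited threshold of five or
# more medications. There is no consensus threshold, so it is a parameter.
POLYPHARMACY_THRESHOLD = 5


def is_polypharmacy(medications: list[str], threshold: int = POLYPHARMACY_THRESHOLD) -> bool:
    """Return True when the number of distinct medications meets the threshold."""
    return len(set(medications)) >= threshold


# Two hypothetical regimens with the same number of medicines. A count-based
# rule flags both identically, even if one were clinically appropriate and
# the other were not: appropriateness information is simply not used.
regimen_a = ["drug_a", "drug_b", "drug_c", "drug_d", "drug_e"]
regimen_b = ["drug_v", "drug_w", "drug_x", "drug_y", "drug_z"]

print(is_polypharmacy(regimen_a), is_polypharmacy(regimen_b))  # True True
print(is_polypharmacy(regimen_a[:4]))  # False
```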

A number of prescribing tools and criteria in the literature aim to help identify appropriate and inappropriate medications, support the deprescribing of potentially inappropriate medications and optimise the use of appropriate therapy. Important considerations during medication review and rationalisation include stopping or minimising the use of inappropriate medications, starting or optimising the use of appropriate medications, dosing, the impact of renal function on drug clearance and any drug–drug or drug–disease interactions [17, 18]. Some tools present a scoring system, in which a rating or score indicates the degree of polypharmacy or the potential for harm, such as the Drug Burden Index (DBI) [19] and anticholinergic scales such as the Anticholinergic Risk Scale (ARS) [20] and the Anticholinergic Drug Scale (ADS) [21]. Other tools do not provide a score or rating but instead list criteria for appropriate or inappropriate prescribing, such as the Beers criteria [22] and the Screening Tool of Older Person’s Prescriptions and Screening Tool to Alert doctors to Right Treatment (STOPP START) criteria [23]. Given the large number of tools and criteria available, a summary would help clinicians understand which aspects of the medication review and rationalisation processes each tool covers, and therefore which tools may be relevant to specific patient scenarios in everyday practice. Additionally, it is unclear whether each of these tools and criteria has been validated as clinically relevant in terms of its association with common patient-related outcomes such as falls, hospitalisation and mortality.

Some tools and criteria have been studied in great detail. The DBI, for example, has been associated with functional and cognitive decline, falls, hospitalisation and mortality across populations in the US, UK, Australia, Finland, the Netherlands, Canada and New Zealand, in community, nursing home and hospital settings [24]. Other tools have only been proposed theoretically, without external validation.

This study aimed to address these gaps by conducting systematic reviews to (1) summarise the range of tools and criteria in the existing literature for assessing the appropriateness of medication prescribing and (2) determine, for each identified tool and criterion, its association with patient-related outcomes (external validation).

2 Methods

2.1 Data Sources and Search Strategy

2.1.1 Systematic Review 1—Tools and Criteria for Assessing Prescribing Appropriateness

MEDLINE, EMBASE and Informit (Health Collection) databases were searched for the years 2000 to 2016, as these databases are the most likely to cover medicine-related topics, including tools and criteria for assessing the appropriateness of prescribing. Databases and search terms were selected in consultation with an academic librarian specialising in health-related database searches.

The following search terms (Medical Subject Headings [MeSH] and keywords) were used in EMBASE (relevant MeSH headings were used in MEDLINE):

‘polypharmacy/’ (MeSH) OR ‘multiple medication*’ (keyword) OR ‘multiple medicine*’ (keyword) OR ‘multiple drug*’ (keyword) OR ‘many medication*’ (keyword) OR ‘many medicine*’ (keyword) OR ‘many drug*’ (keyword) OR ‘potentially inappropriate prescribing’ (keyword) OR ‘potentially inappropriate medication’ (keyword) OR ‘potentially inappropriate medicine’ (keyword) OR ‘potentially inappropriate drug’ (keyword) (for all articles exploring polypharmacy and inappropriate prescribing)

AND

‘tool*’ (keyword) OR ‘criter*’ (keyword to include all words related to the word criteria) OR ‘index’ (keyword) OR ‘clinical assessment tool’ (MeSH) (for all articles exploring tools or criteria).

The following search terms were used in Informit (Health Collection) and applied to all fields including the title and abstract:

‘polypharmacy’ OR ‘multiple medication*’ OR ‘multiple medicine*’ OR ‘multiple drug*’ OR ‘many medication*’ OR ‘many medicine*’ OR ‘many drug*’ OR ‘potentially inappropriate prescribing’ OR ‘potentially inappropriate medication*’ OR ‘potentially inappropriate medicine’ OR ‘potentially inappropriate drug*’

AND

‘tool’ OR ‘criter*’ (to include all words related to the word criteria) OR ‘index’ OR ‘assess*’ (to include all words branching from the word assess such as assessment).

Prescribing can be inappropriate regardless of the number of medications used. The probability of inappropriate prescribing, however, rises with the number of medications, a situation commonly defined as polypharmacy. We therefore included both ‘polypharmacy’ (and related terms such as ‘multiple medication*’ and ‘many medication*’) and potentially inappropriate prescribing (and related terms such as ‘potentially inappropriate medication*’) to capture the range of tools in the literature for assessing the appropriateness of prescribing. Given the various definitions of polypharmacy in the literature, the term ‘polypharmacy’ was not restricted to a specific definition, so that all studies referring to tools that assess the appropriateness of polypharmacy were included.
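The two-block structure of the Informit strategy (an OR-group of polypharmacy and inappropriate-prescribing terms, combined with AND with an OR-group of tool terms) can be reproduced programmatically, which helps keep searches consistent when rerun. The Python sketch below uses hypothetical helper names (`or_group`, `build_query`); its parenthesised, single-quoted output is generic Boolean syntax, not a documented Informit query format, so each database's own truncation and field syntax would still need to be checked.

```python
# Term lists transcribed from the Informit (Health Collection) strategy above.
POLYPHARMACY_TERMS = [
    "polypharmacy", "multiple medication*", "multiple medicine*", "multiple drug*",
    "many medication*", "many medicine*", "many drug*",
    "potentially inappropriate prescribing", "potentially inappropriate medication*",
    "potentially inappropriate medicine", "potentially inappropriate drug*",
]
TOOL_TERMS = ["tool", "criter*", "index", "assess*"]


def or_group(terms: list[str]) -> str:
    """Quote each term, join the terms with OR and wrap the group in parentheses."""
    return "(" + " OR ".join(f"'{term}'" for term in terms) + ")"


def build_query(*groups: list[str]) -> str:
    """Combine OR-groups with AND, so every group must be matched."""
    return " AND ".join(or_group(group) for group in groups)


query = build_query(POLYPHARMACY_TERMS, TOOL_TERMS)
print(query)
```

The EMBASE strategy, with MeSH headings such as ‘polypharmacy/’ and ‘clinical assessment tool’, would follow the same pattern with different term lists.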

If articles referred to tools or criteria but were not the original research article describing the formulation and development of the tool, the original article describing the tool was found and included in the review.

2.1.2 Systematic Review 2—Shortlisted Tools and Their Association with Patient-Related Outcomes (External Validation)

MEDLINE, EMBASE and Informit (Health Collection) databases were searched for the years 2000 to 2016, as these databases are the most likely to cover medicine-related topics, including associations between the shortlisted prescribing appropriateness tools and patient-related outcomes. Databases and search terms were selected in consultation with an academic librarian specialising in health-related database searches.

The following search terms (MeSH and keywords) were used for each shortlisted tool in EMBASE (relevant MeSH headings were used in MEDLINE):

‘Name of tool or criteria’ (as a keyword, for example, ‘Beers criteria’)

AND

‘outcome*’ (keyword) OR ‘Treatment Outcome/’ (MeSH) OR ‘Outcome Assessment/’ (MeSH) OR ‘validat*’ (keyword to include all words related to the word validate) to find articles exploring each of the shortlisted tools and their associations with patient-related outcomes.

The following search terms were used in Informit (Health Collection) and applied to all fields including the title and abstract:

‘Name of tool’ (for example, ‘Beers criteria’)

AND

‘outcome’ OR ‘validat*’ (to include all words related to the word validate).

Articles were required to examine the association between a shortlisted tool and patient-related outcomes, or the tool's ability to predict such outcomes, for example hospitalisation, falls, mortality, ADRs, mobility and cognition. Specific outcomes were not used as search terms, to avoid restricting results to particular outcomes and to capture the full range of outcomes studied in the literature. No limits were placed on the location (country), setting or age of participants, so that studies conducted across various settings and populations were included. The only exclusion criterion was duplicate articles.

2.2 Systematic Reviews 1 and 2

Results were limited to studies in English that were already published or in press. Reference lists of relevant articles and the grey literature were screened to identify other relevant articles. The search strategy was developed in consultation with a librarian specialising in health databases, with a pre-determined protocol developed collaboratively with the authors for search methods and selection of relevant articles. The reporting of this systematic review conforms to the Preferred Reporting Items for Systematic reviews and Meta-Analyses (PRISMA) checklist [25].

2.3 Critical Appraisal (Risk of Bias Assessment)

Once articles were shortlisted for prescribing assessment tools and associations with patient-related outcomes, the Critical Appraisal Skills Programme (CASP) tool was applied to each of those articles for critical appraisal and quality assessment, as shown in Electronic Supplementary Material Appendix S1 [26].

For the quality assessment of shortlisted articles on tools and external validation, the data items extracted depended on the study design, as guided by the CASP tool. For cohort studies, for example, the items extracted included whether the study addressed a clearly focused issue, whether the cohort was recruited in an acceptable way, whether the exposure and outcome were accurately measured to minimise bias, whether the authors identified and accounted for all important confounders, and whether the follow-up of subjects was complete and of sufficient duration. While the CASP tool presents a list of questions for quality assessment, it provides no clear guidance on scoring or grading studies according to the answers to those questions. The four authors therefore agreed to score cohort and case control studies out of 8 (from a total of 12 questions for cohort studies and 11 questions for case control studies), because the answers to questions 7, 8, 11 and 12 for cohort studies (results, precision of results, whether results fit with findings from other studies and implications for practice) and to questions 7, 8 and 11 for case control studies (results, precision of results and whether results fit with findings of other studies) do not affect study quality. For cohort and case control studies, a score of 0–4 was considered low quality, 5–6 medium quality and 7–8 high quality. Randomised controlled trials were scored out of 9 (from a total of 11 questions), because the answers to questions 7 and 8 do not affect study quality; a score of 0–5 was considered low quality, 6–7 medium quality and 8–9 high quality.
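The scoring rules above can be expressed compactly. The Python sketch below assumes one point per ‘yes’ answer on each quality-relevant question; the text does not state how ‘can't tell’ answers were handled, so here they simply score zero. The function and dictionary names are hypothetical, while the excluded questions and grade bands are taken from the description above.

```python
# Questions treated as not affecting study quality, by study design (from the text).
EXCLUDED_QUESTIONS = {
    "cohort": {7, 8, 11, 12},      # 12 questions -> scored out of 8
    "case_control": {7, 8, 11},    # 11 questions -> scored out of 8
    "rct": {7, 8},                 # 11 questions -> scored out of 9
}
TOTAL_QUESTIONS = {"cohort": 12, "case_control": 11, "rct": 11}
# Upper bounds of the low and medium bands; anything higher is high quality.
GRADE_BANDS = {"cohort": (4, 6), "case_control": (4, 6), "rct": (5, 7)}


def casp_score(design: str, yes_answers: set[int]) -> tuple[int, int]:
    """Return (score, maximum): one point per 'yes' on a quality-relevant question."""
    scored = set(range(1, TOTAL_QUESTIONS[design] + 1)) - EXCLUDED_QUESTIONS[design]
    return len(scored & yes_answers), len(scored)


def casp_grade(design: str, yes_answers: set[int]) -> tuple[int, int, str]:
    """Map a study's 'yes' answers to (score, maximum, quality label)."""
    score, maximum = casp_score(design, yes_answers)
    low_max, medium_max = GRADE_BANDS[design]
    if score <= low_max:
        label = "low"
    elif score <= medium_max:
        label = "medium"
    else:
        label = "high"
    return score, maximum, label


# A cohort study answering 'yes' to every question: excluded questions add nothing.
print(casp_grade("cohort", set(range(1, 13))))  # (8, 8, 'high')
print(casp_grade("rct", {1, 2, 3, 4, 5, 6}))    # (6, 9, 'medium')
```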

2.4 Study Selection and Data Extraction

Articles were shortlisted according to the inclusion and exclusion criteria. After the initial database search and primary screening of article titles and abstracts, articles were categorised as relevant, irrelevant or unsure. The appropriateness of inclusion of each article classed as relevant or unsure was discussed by all four authors. A pre-defined data extraction template was developed by all authors and then applied to ensure consistent data extraction from each of the identified studies.

Data items extracted for systematic review 1 included whether (1) the tool presented a scoring system for polypharmacy; (2) the Delphi method or an expert panel was used to develop the tool; (3) information on stopping inappropriate medications and starting appropriate medications was provided; (4) alternative treatment options were suggested; (5) dosing of medications was considered (further divided into whether the tool simply mentioned dosing or was predominantly based on dosing); (6) the impact of renal function on drug clearance was considered; (7) the tool focused on specific drug class(es); (8) the tool considered drug–drug or drug–disease interactions; and (9) the tool was implicit (requiring clinicians to apply the tool to a specific patient scenario) or explicit (criteria based). These items were chosen because they are important considerations for prescribing appropriateness tools and criteria.
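A pre-defined extraction template like the one described above maps naturally onto a typed record, which makes the summary proportions reported in the Results straightforward to recompute. The Python sketch below is hypothetical: the field names paraphrase items (1) to (9) and are not taken from the authors' template, and the three example tools are placeholders rather than tools from the review.

```python
from dataclasses import dataclass


@dataclass
class ToolRecord:
    """One extraction row; field names paraphrase items (1)-(9) (hypothetical)."""
    name: str
    scoring_system: bool = False            # (1)
    delphi_or_expert_panel: bool = False    # (2)
    stop_guidance: bool = False             # (3) stopping inappropriate medications
    start_guidance: bool = False            # (3) starting appropriate medications
    suggests_alternatives: bool = False     # (4)
    mentions_dosing: bool = False           # (5) dosing considered at all
    dose_based: bool = False                # (5) predominantly based on dosing
    renal_function: bool = False            # (6)
    specific_drug_classes: bool = False     # (7)
    interactions: bool = False              # (8)
    explicit: bool = False                  # (9) False means implicit


def proportion(records: list[ToolRecord], attribute: str) -> tuple[int, float]:
    """Count records where the attribute is True and express it as a percentage."""
    count = sum(getattr(record, attribute) for record in records)
    return count, round(100 * count / len(records), 1)


records = [
    ToolRecord("Tool A", scoring_system=True, mentions_dosing=True, dose_based=True),
    ToolRecord("Tool B", stop_guidance=True, start_guidance=True, explicit=True),
    ToolRecord("Tool C", stop_guidance=True, explicit=True),
]
print(proportion(records, "stop_guidance"))  # (2, 66.7)
```

Applied to all 42 shortlisted tools, the same calculation reproduces figures such as 33/42 = 78.6% for tools giving guidance on stopping medications.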

Data items extracted for systematic review 2 included the name of the tool or criteria; the outcomes investigated (such as hospitalisation, mortality and decline in cognition); the number of studies showing a positive, negative or no association between the specific tool and patient-related outcome; as well as study characteristics such as location (country), setting and age of participants. Data items extracted for critical appraisal of validation studies have been outlined in Sect. 2.3.

Once the primary data extraction was complete, all authors met and reviewed the content analysis for each of the extracted studies, with data further categorised and summarised in tables.

The PROSPERO registration number for this systematic review (systematic reviews 1 and 2) is CRD42017067233.

3 Results

A total of 1710 articles were identified, and 42 prescribing appropriateness tools and criteria met the inclusion criteria for systematic review 1 [11, 19,20,21,22,23, 27,28,29,30,31,32,33,34,35,36,37,38,39,40,41,42,43,44,45,46,47,48,49,50,51,52,53,54,55,56,57,58,59,60,61,62]. Figure 1 shows a flowchart of study selection according to the PRISMA checklist for systematic review 1.

Fig. 1 Study selection flowchart according to PRISMA checklist

3.1 Systematic Review 1—Tools and Criteria for Assessing Prescribing Appropriateness

Table 1 summarises the prescribing appropriateness tools in the existing literature and their characteristics, such as whether they provide guidance on stopping inappropriate medications, starting appropriate medications and drug dosing. Table 2 breaks down each tool by specific characteristic. Tools were divided into two broad categories: (1) tools with a scoring system, in which a rating or score is provided (n = 9, 21.4%) [19,20,21, 42, 58,59,60,61,62], and (2) tools that do not provide a score or rating but instead list criteria for appropriate or inappropriate prescribing (n = 33, 78.6%) [11, 22, 23, 27,28,29,30,31,32,33,34,35,36,37,38,39,40,41, 43,44,45,46,47,48,49,50,51,52,53,54,55,56,57]. Tools that provide a scoring or rating system, allowing users to evaluate the level of appropriateness or inappropriateness of prescribing, include the DBI, ADS, ARS and the Medication Appropriateness Index (MAI) [19,20,21, 58]. Of the tools providing a score, the Anticholinergic Cognitive Burden Scale was the only one to provide guidance on interpreting scores [61], stating the need for clinical intervention at a score ≥ 3. The other tools with a scoring system offer no such guidance. Tools with a scoring system quantify the burden of polypharmacy but do not provide direct guidance on stopping specific inappropriate medications or starting appropriate therapy. Conversely, commonly used tools without a scoring system do provide criteria for appropriate and/or inappropriate prescribing. For example, while the 2015 Beers criteria do not provide a scoring system, they identify potentially inappropriate medications (PIMs) for older adults, drug–drug interactions and drug–disease/syndrome interactions, drugs to be used with caution in older adults, drugs requiring dose adjustment in renal impairment and drugs with strong anticholinergic properties [22]. The 2015 Beers criteria also state the quality of evidence and the strength of each recommendation, which are important considerations in evidence-based practice.
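Cumulative anticholinergic scales such as the ARS and the Anticholinergic Cognitive Burden Scale broadly work by assigning each listed medication a point value (typically 1 to 3) and summing the points across the patient's regimen. The Python sketch below shows that summation together with the score ≥ 3 intervention threshold described above for the Anticholinergic Cognitive Burden Scale. The drug names and point values are hypothetical placeholders, not values from any published scale, and real scales differ in which drugs they list.

```python
# Hypothetical per-drug point values; each published scale supplies its own list.
HYPOTHETICAL_POINTS = {"drug_a": 3, "drug_b": 1, "drug_c": 2}
INTERVENTION_THRESHOLD = 3  # cumulative score at which intervention is suggested


def cumulative_score(medications: list[str], points: dict[str, int] = HYPOTHETICAL_POINTS) -> int:
    """Sum per-drug points; drugs not on the scale's list contribute zero."""
    return sum(points.get(drug, 0) for drug in medications)


def needs_review(medications: list[str], threshold: int = INTERVENTION_THRESHOLD) -> bool:
    """Flag a regimen whose cumulative score reaches the threshold."""
    return cumulative_score(medications) >= threshold


print(cumulative_score(["drug_b", "drug_c", "drug_unlisted"]))  # 3
print(needs_review(["drug_b"]))  # False
```

Such a score indicates how much anticholinergic burden a regimen carries, but on its own it says nothing about which drug to stop or what to start instead.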

Table 1 Summary of characteristics of various prescribing appropriateness assessment tools and criteria
Table 2 Breakdown and comparison of various prescribing appropriateness assessment tools in existing literature

Of the 42 shortlisted tools, 61.9% (n = 26) were explicit or criteria based [22, 23, 27,28,29,30,31, 33,34,35,36,37,38,39,40,41, 44, 49,50,51,52,53,54,55,56,57], with the remainder being implicit tools that require input from clinicians depending on the given patient scenario (n = 16, 38.1%) [11, 19,20,21, 32, 42, 43, 45,46,47,48, 58,59,60,61,62]. All tools with a scoring system were implicit. Additionally, seven tools without a scoring system were implicit: the Assess, Review, Minimize, Optimize, Reassess (ARMOR) tool; the Tool to Improve Medications in the Elderly via Review (TIMER); the Individualized Medication Assessment and Planning (iMAP) tool; the Prescribing Optimization Method (POM) tool; the Hyperpharmacotherapy Assessment Tool (HAT); the Need and indication, Open questions, Tests and monitoring, Evidence and guidelines, Adverse events, Risk reduction or prevention, Simplification and switches (NO TEARS) tool; and the Tool to Reduce Inappropriate Medications (TRIM).

While deprescribing inappropriate therapy and optimising appropriate therapy are both important aspects of medication management, 33 tools (78.6%) provided guidance on stopping inappropriate medications [11, 22, 23, 27,28,29,30,31,32,33,34,35,36,37,38,39,40,41,42,43,44,45, 47,48,49,50,51,52,53,54,55,56,57] but only 12 tools (28.6%) provided guidance on starting appropriate medications [23, 27, 32, 37, 40, 44,45,46, 48, 50, 53, 55]. Of the 42 tools and criteria, 11 (26.2%) suggested safer alternative treatment options to replace inappropriate therapy [28, 31, 36, 37, 39,40,41, 49, 52, 56, 57]. Although the dose of a medication is an important consideration in clinical practice, only 64.3% (n = 27) of tools considered drug dosing, and the degree to which they did so varied considerably. Of the 27 tools that considered dosing, only the DBI was predominantly based on dosing: it uses the exact doses of a patient's anticholinergic and sedative medications to calculate a score [19]. Other tools, such as the Beers criteria, simply mentioned the appropriate dose of one or more selected medications [22]. Similarly, the impact of renal function on drug clearance is a significant consideration in practice, yet fewer than half of the tools (n = 19, 45.2%) took it into account.

Four tools (9.5%) focus on specific drug class(es), and all four have a scoring system: the DBI, ADS, ARS and Anticholinergic Cognitive Burden Scale focus on anticholinergics (the DBI also covers sedatives) [19,20,21, 61].

3.2 Systematic Review 2—Shortlisted Tools and Their Association with Patient-Related Outcomes (External Validation)

Table 3 summarises the associations of the shortlisted prescribing tools and criteria with patient-related outcomes (external validation). Nine patient-related outcomes were investigated across the different tools: hospitalisation, mortality, falls, cognitive decline, functional decline, ADRs, decline in quality of life, discharge home after hospitalisation and renal failure. Hospitalisation measures included admission to hospital, readmission to hospital within 30 days or 12 months, length of hospital stay and drug-related hospitalisation. Studies measured mortality as death (regardless of setting) or in-hospital mortality. Falls were measured as the occurrence of falls in hospital, the occurrence of falls regardless of setting or recurrent falls regardless of setting (two or more falls in the previous year). Studies measured cognitive decline using the Mini Mental State Examination, Abbreviated Mental Test or Short Blessed Test. Functional decline included measures of physical function using the Barthel Index, Instrumental Activities of Daily Living, the Short Physical Performance Battery, dynamic balance tests using coordinated stability tasks, the Frailty Deficit Index, grip weakness and the Cardiovascular Health Study criteria. ADRs were defined as adverse effects of medications occurring within a specified period (such as 30 days) or with no specified time frame. Decline in quality of life was measured using the EuroQol five-dimension (EQ-5D) questionnaire and the EuroQol Visual Analogue Scale (EQ VAS), which measure a person's health-related quality of life.

Table 3 Proportion of studies showing positive association between externally validated prescribing appropriateness assessment tools and patient-related outcomes

Fewer than half of the 42 shortlisted prescribing tools and criteria (n = 14, 33.3%) have been investigated for association with at least one patient-related outcome, in 53 separate studies conducted across different countries, age groups and settings, as shown in Table 4 [20, 63,64,65,66,67,68,69,70,71,72,73,74,75,76,77,78,79,80,81,82,83,84,85,86,87,88,89,90,91,92,93,94,95,96,97,98,99,100,101,102,103,104,105,106,107,108,109,110,111,112,113,114]. Most of these studies used a cohort design (n = 46) [20, 63, 65, 66, 68,69,70,71,72, 74,75,76,77,78,79,80,81,82,83,84,85,86,87,88,89, 93, 94, 96,97,98,99,100,101,102,103,104,105,106,107,108,109,110,111,112,113,114], with the remainder being randomised controlled trials (n = 2) [67, 92] and case control studies (n = 5) [64, 73, 90, 91, 95].

Table 4 Details of validation studies of prescribing appropriateness tools

Of the 14 tools investigated for association with outcomes, 13 (92.9% of the tools for which external validation was explored and 31.0% of all shortlisted tools) were positively associated with one or more patient-related outcomes [19,20,21,22,23, 50, 55, 56, 58,59,60,61,62]. None of the studies showed a negative association between the tools and outcomes. Most of the 14 investigated tools had a scoring system (n = 8, 57.1%), for example the DBI and ARS. Validation studies for prescribing assessment tools were most commonly conducted in Europe (n = 18, 34.0%), across Belgium, Finland, France, Germany, Italy, the Netherlands, Norway, Austria and Sweden, in community, nursing home and hospital settings and in various age groups; the youngest population studied comprised patients aged 60 years or older, in a German study validating the Fit fOR The Aged (FORTA) tool. The remaining validation studies were conducted in the US, Australia, UK and Israel.

Of the 14 tools for which external validation was explored, the DBI, ARS, Beers criteria (1991, 1997, 2003 and 2012 versions), STOPP START and ADS were studied most extensively. The Beers criteria, STOPP START, ARS and DBI were associated with the largest number of patient-related outcomes (six outcomes each for the Beers criteria, STOPP START and ARS, and five for the DBI), and all four were associated with hospitalisation, mortality, falls and functional decline.

Of the nine patient-related outcomes investigated, hospitalisation was the most commonly reported, with nine tools (21.4% of the 42 shortlisted tools) associated with it. The next most commonly reported outcomes were mortality (n = 7, 16.7%) and functional decline (n = 7, 16.7%). The least commonly reported outcomes were discharge home after hospitalisation (n = 1, 2.4%, namely the Medication Regimen Complexity Index [MRCI]) and renal failure (n = 1, 2.4%, namely the FORTA).

4 Discussion

Many tools for assessing the appropriateness of prescribing exist in the literature, covering different aspects of the medication review and rationalisation processes. To our knowledge, this is the first systematic review both to summarise the range of tools and criteria available in the literature and to assess the association of each of these tools with patient-related outcomes.

Of the 42 shortlisted tools, nine provided a scoring system indicating the degree of polypharmacy and the resulting potential for harm. The remaining 33 tools provided a list of criteria for appropriate or inappropriate prescribing. None of the tools provided both a rating of the level of polypharmacy and guidance on stopping inappropriate medications and starting appropriate medications, which limits their clinical applicability for informing clinicians in practice and facilitating the quality use of medicines.

While deprescribing inappropriate therapy and optimising appropriate therapy are both important aspects of medication management, 33 tools (78.6%) provided guidance on stopping inappropriate medications and only 12 tools (28.6%) provided guidance on starting appropriate medications, limiting the relevance of the remaining tools to the medication optimisation process. In practice, part of this process often involves choosing a safer alternative medication to minimise harm and optimise health outcomes; however, only 26.2% (n = 11) of tools suggested alternative treatment options. Of the 42 shortlisted tools, 61.9% (n = 26) were explicit (criteria based) and the remainder (n = 16, 38.1%) were implicit tools requiring clinician input for the given patient scenario. Similarly, a systematic review of prescribing appropriateness criteria that included 46 tools (excluding tools focusing on specific drug classes) found that 61% were explicit criteria [115]. That review concluded that none of the currently available tools combined the various aspects of inappropriate prescribing, with underprescribing addressed by only 13.0% of the 46 tools, even though underprescribing is an important aspect of inappropriate prescribing.

Although the dose of a medication is an important consideration in clinical practice, only 64.3% (n = 27) of tools considered drug dosing. Only one tool, the DBI, uses a patient's specific doses to calculate a score, and it considers only the doses of anticholinergic and sedative medicines. The remaining tools either simply mentioned the appropriate dose of one or more selected medications or did not consider dosing at all, which makes them less clinically relevant, given that dosing is important in assessing the safety, efficacy and side effects of medications. Similarly, accounting for the impact of renal function on drug clearance is a significant consideration in practice, especially in multimorbid geriatric patients with compromised kidney function, who may be at increased risk of harm from some medication classes at specific doses. Fewer than half of the tools, however, took this into account.

Of the 42 shortlisted tools, 9.5% (n = 4) focused on specific drug classes (anticholinergic agents and/or sedatives only). These medication classes are associated with harm, and it is important to consider them when assessing medication management in the geriatric setting. However, tools limited to specific drug classes are likely to be less relevant in clinical practice, as patients are often prescribed medications beyond classes such as anticholinergics and sedatives, which makes these tools less generalisable across the wider patient population.

Examples of specific tools and their limitations include the MRCI, which is appropriate for evaluating the medication administration burden for patients but does not account for the pharmacology of medications or the cumulative risks and side effects of combinations of specific medications or medication classes [59]. The DBI focuses on anticholinergics and sedatives only, but the original article does not clearly define which medications count as anticholinergics and sedatives. Additionally, the formula for calculating the DBI requires the minimum effective daily dose of each drug, which may not be clearly defined for certain medications and can vary between countries' recommendations [116]. Some tools were developed for specific populations only, such as the tool by Chang et al. for the Taiwanese population and the PRISCUS list for the German population, and have not yet been validated in other healthcare settings [56, 57]. Other criteria, such as the Healthcare Effectiveness Data and Information Set (HEDIS) 2016 and the NO TEARS tool, consist of statements without guidance on how to assess them; for example, the HEDIS includes ‘Adherence to Antipsychotic Medications for Individuals With Schizophrenia’ with an attachment listing antipsychotic medications but no guidance on how to assess adherence, making these criteria less useful for clinicians in practice [43, 50].
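The sensitivity of the DBI to the minimum daily dose follows from its formula. In its original formulation [19], the DBI sums D / (δ + D) over a patient's anticholinergic and sedative medicines, where D is the daily dose taken and δ is the minimum effective daily dose (described in some sources as the minimum recommended daily dose). Each drug therefore contributes between 0 and 1, and exactly 0.5 when the patient takes the minimum daily dose. The Python sketch below implements that sum; the function names and doses are hypothetical, and deciding which medicines count as anticholinergic or sedative is left to the caller, reflecting the definitional gap noted above.

```python
def dbi_contribution(daily_dose: float, min_daily_dose: float) -> float:
    """D / (delta + D) for one anticholinergic or sedative medicine.

    Both doses must be in the same units (e.g. mg/day). The result lies in
    [0, 1) and equals 0.5 when the daily dose equals the minimum daily dose.
    """
    if min_daily_dose <= 0:
        raise ValueError("minimum daily dose must be positive")
    if daily_dose < 0:
        raise ValueError("daily dose cannot be negative")
    return daily_dose / (min_daily_dose + daily_dose)


def drug_burden_index(regimen: list[tuple[float, float]]) -> float:
    """Sum contributions over (daily_dose, min_daily_dose) pairs.

    The caller decides which medicines count as anticholinergic or sedative.
    """
    return sum(dbi_contribution(dose, delta) for dose, delta in regimen)


# Hypothetical regimen: two medicines, each taken at its minimum daily dose.
print(drug_burden_index([(10.0, 10.0), (5.0, 5.0)]))  # 1.0

# The same 10 mg/day dose scored against two hypothetical national minimum doses.
print(round(dbi_contribution(10.0, 5.0), 2), round(dbi_contribution(10.0, 10.0), 2))  # 0.67 0.5
```

In this example the same dose contributes 0.67 or 0.5 depending on which minimum daily dose is used, which is why variation in δ between national recommendations can change a patient's score.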

Of the 42 shortlisted tools, external validation has been explored for only 33.3% (n = 14), and 31.0% (n = 13) have been validated against at least one patient-related outcome. Most of these 13 externally validated tools (8/13, 61.5%) have a scoring system, which helps clinicians quantify the risk of harm. A recent study argues that risk stratification is a key component of assessing the appropriateness of therapy, and that a scoring system can therefore help quantify the burden of polypharmacy [117]. The 2014 systematic review by Kaufmann et al. found that only 17.9% of tools (n = 8) had been clinically validated with regard to outcomes [115]. The current systematic review updates that study, including studies of associations with clinical outcomes published after 2014, such as the 2016 validation of the FORTA.

The DBI, ARS, Beers criteria (1991, 1997, 2003 and 2012 versions), STOPP START and ADS were the most extensively studied. The Beers criteria, STOPP START, ARS and DBI were associated with the highest number of patient-related outcomes (six each for the Beers criteria, STOPP START and ARS, and five for the DBI), and all four tools have been associated with hospitalisation, mortality, falls and functional decline. This underlines the importance of using both a scoring system that gives clinicians an assessment of medication burden and potential for harm (ARS and DBI) and a list of criteria for appropriate or inappropriate prescribing (Beers criteria and STOPP START) to guide medication review, rationalisation and deprescribing. Based on validation studies of patient-related outcomes, the Beers criteria, STOPP START, ARS and DBI may be more clinically relevant than the other tools and criteria available.
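To illustrate the distinction drawn above between scoring tools and list-based criteria, the Python sketch below applies both approaches to one hypothetical regimen. The scoring function follows the general pattern of the ARS, assuming each listed medication is rated from 1 to 3 for anticholinergic potential, unlisted medications count as zero, and ratings are summed. The list-based function mimics explicit criteria such as the Beers criteria or STOPP START by returning the statements a regimen triggers. All drug names, ratings and criterion wording are hypothetical placeholders, not content from any of these tools.

```python
from typing import Dict, Iterable, List

# Hypothetical ARS-style ratings (1 = moderate ... 3 = very strong anticholinergic
# potential); medications not in the table are treated as 0.
HYPOTHETICAL_RATINGS: Dict[str, int] = {"drug_a": 3, "drug_b": 1, "drug_c": 2}

# Hypothetical explicit criteria in the style of the Beers criteria or STOPP:
# medication name -> reason it is flagged as potentially inappropriate.
HYPOTHETICAL_CRITERIA: Dict[str, str] = {
    "drug_a": "strongly anticholinergic; consider alternatives in older adults",
}


def risk_score(meds: Iterable[str], ratings: Dict[str, int]) -> int:
    """Scoring approach: sum per-drug ratings to quantify cumulative burden."""
    return sum(ratings.get(med, 0) for med in meds)


def flag_medications(meds: Iterable[str], criteria: Dict[str, str]) -> List[str]:
    """List-based approach: return the explicit criteria triggered by the regimen."""
    return [f"{med}: {criteria[med]}" for med in meds if med in criteria]


regimen = ["drug_a", "drug_b", "drug_d"]
print(risk_score(regimen, HYPOTHETICAL_RATINGS))  # 3 + 1 + 0 = 4
print(flag_medications(regimen, HYPOTHETICAL_CRITERIA))
```

The score yields a single number that can be compared across patients or tracked between medication reviews, while the list output tells the clinician which medications to review and why; neither output alone provides both kinds of information.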

Validation studies of prescribing assessment tools have most commonly been conducted in Europe, across community, nursing home and hospital settings and in various age groups, with other studies conducted in the US, Australia, the UK and Israel. Although prescribing appropriateness tools have been validated in different parts of the world, researchers have called for internationally validated criteria that account for differences in cost and patterns of medication use between countries [118].

A recent commentary argued that although some tools have been designed for and studied in the older population, tools specifically designed and validated for complex older patients with multiple comorbidities are still needed [119]. The commentary notes that older patients often have compromised organ function, such as impaired renal and hepatic function; in these patients, simple recommendations to stop or start medications are less useful, and more detailed, specific guidance is required because clinical decision making can be very challenging [119]. Another commentary stated that the usefulness of different criteria is established once they have been validated in different subgroups of the older population and refined according to the results of those validation studies [120]. Additionally, one study argued that while criteria may exist to support clinical decision making, there appears to be no guidance on how to prioritise these decisions [121]. The challenge in developing new evidence-based prescribing assessment tools lies in ensuring comprehensiveness and clinical relevance while keeping the criteria easy to use in everyday practice [119].

Ideally, a holistic tool would combine these different characteristics, providing clinicians in practice with structured guidance on appropriate and inappropriate medication use in the older population, who are more susceptible to adverse medicine events, together with a scoring system that indicates the burden of polypharmacy and the resulting risk of harm. Although some tools for assessing prescribing appropriateness have been studied extensively with regard to external validation, studies point to the need for validation in interdisciplinary models of care, such as pharmacist-led clinics, where the clinical relevance of these tools is unclear [117]. Researchers have also found that visiting multiple prescribers can increase a patient's risk of adverse medicine events, and that prescribing indicators therefore need to consider the number of prescribers when assessing appropriate and inappropriate polypharmacy [117].

A strength of this systematic review is its novel summary of the range of prescribing appropriateness assessment tools available in the literature. While previous studies have explored external validation for subgroups of polypharmacy assessment tools, such as anticholinergic scales and the DBI, this review assessed external validation across the full range of tools and criteria, without restricting the types studied [116, 122]. Other studies have excluded tools that focus on specific drug classes, whereas this review included them [115]. Additionally, while previous studies have only explored associations between outcomes and a selected range of prescribing criteria, this review used the CASP tool to assess and grade the quality of the studies underlying each association [116].

Given the substantial variability in study designs, methodologies, patient characteristics and settings (such as community dwelling, nursing home and hospital), and the resulting heterogeneity, it was not possible to group and analyse outcomes by participant characteristics and setting. Such an analysis could have clarified for clinicians which outcomes are clinically significant for particular subgroups of patients (for example, community-dwelling patients compared with those admitted to hospital). Only articles published in English were included, so clinically useful and validated tools and criteria published in other languages may have been missed. The CASP tool was used to assess study quality; however, the CASP has limitations, so the quality of some studies may have been over- or under-estimated.

The results of this review highlight the need for evidence-based tools that are internationally recognised, externally validated and easy to use in everyday practice, and that account for both over- and under-prescribing [119, 123]. Researchers have called for tools that contain accurate clinical information and are practical to use [54, 119, 124]. There is a need for evidence-based resources and tools that cater specifically to drug use in multimorbid geriatric patients [125] and that have been externally validated to establish their relevance in clinical practice.

5 Conclusion

Many tools and criteria in the existing literature assess the appropriateness of prescribing, each covering different aspects of medication review and management. No single tool appears to combine all of these aspects and to have been validated against key patient-related outcomes such as hospitalisation, mortality and falls. Such a tool is needed to help clinicians optimise medication use and tailor medication management for patients who may be at risk of medication-related harm.