Detecting Signals of Disproportionate Reporting from Singapore’s Spontaneous Adverse Event Reporting System: An Application of the Sequential Probability Ratio Test

Chan, Cheng Leng; Rudrappa, Sowmya; Ang, Pei San; Li, Shu Chuen; Evans, Stephen J. W.

doi:10.1007/s40264-017-0531-4

Original Research Article
Published: 28 April 2017

Volume 40, pages 703–713, (2017)

Drug Safety


Cheng Leng Chan¹,²,
Sowmya Rudrappa¹,³,
Pei San Ang¹,
Shu Chuen Li² &
…
Stephen J. W. Evans⁴


Abstract

Introduction

The ability to detect safety concerns from spontaneous adverse drug reaction reports in a timely and efficient manner remains important in public health.

Objective

This paper explores the behaviour of the Sequential Probability Ratio Test (SPRT) and its ability to detect signals of disproportionate reporting (SDRs) in the Singapore context.

Methods

We used SPRT with a combination of two hypothesised relative risks (hRRs) of 2 and 4.1 to detect signals of both common and rare adverse events in our small database. We compared SPRT with other methods in terms of number of signals detected and whether labelled adverse drug reactions were detected or the reaction terms were considered serious. The other methods used were reporting odds ratio (ROR), Bayesian Confidence Propagation Neural Network (BCPNN) and Gamma Poisson Shrinker (GPS).

Results

The SPRT produced 2187 signals in common with all other methods, 70 signals in common with at least one other method, and 268 unique signals. It did not produce signals in 178 cases where two other methods detected them, and a further 403 signals were unique to one of the other methods. In terms of sensitivity, ROR performed better than the other methods, but the SPRT method found more new signals. The performances of the methods were similar for negative predictive value and specificity.

Conclusions

Using a combination of hRRs for SPRT could be a useful screening tool for regulatory agencies, and more detailed investigation of the medical utility of the system is merited.


Key Points

Sequential Probability Ratio Tests (SPRTs) may have a role in detecting signals from spontaneous reports of suspected adverse drug reactions. They have somewhat different properties to other commonly used statistical methods for that purpose.
Using a combination of variable hypothesised relative risks (hRRs) allows for the detection of different types of adverse events (AEs). For drug–AE pairs that are rare with low expected counts, we need to apply a higher hRR for the SPRT method to pick up signals of disproportionate reporting of concern. On the other hand, a lower hRR will be useful for the more common drug–AE pairs.

1 Introduction

Post-marketing surveillance of drugs and vaccines is important to minimise risks with marketed drugs. In practice, spontaneous suspected adverse drug reaction (ADR) reporting remains the main source of information for regulators in this monitoring [1]. Pharmacovigilance distinguishes true ADRs from ‘adverse events’ (AEs) that are not caused by the drug in question. A signal is a potential safety concern that a drug may be associated with a previously unrecognised hazard requiring further investigation [2].

Since 1993, the Vigilance and Compliance Branch of the Singapore Health Sciences Authority (HSA) has received spontaneous local ADR reports from healthcare professionals (83.2% from clinicians, 12.2% from pharmacists, 2.7% from other healthcare professionals and 1.9% from pharmaceutical companies) via facsimile, mail or online. In recent years, its Spontaneous Reports System (SRS) database has also received reports from the public healthcare institutions in real time via the Critical Medical Information Store (CMIS); as a result, the number of ADR reports received regularly has increased 40-fold [3]. About 20,000 reports are now received annually.

Statistical ‘data mining’ methods emerged in the late 1990s to complement the traditional manual review, and these are commonly called ‘disproportionality analysis’ [4, 5]. Some examples of frequentist statistical methods are the proportional reporting ratio (PRR) and reporting odds ratio (ROR) [6,7,8]. Examples of Bayesian methods are the Bayesian Confidence Propagation Neural Network (BCPNN) [7, 9], Gamma Poisson Shrinker (GPS) [10] and Multi-item Gamma Poisson Shrinker (MGPS). The MGPS is now used by the US Food and Drug Administration (FDA) and the UK Medicines and Healthcare products Regulatory Agency (MHRA) [11, 12]. All these methods are based around the ratio of observed-to-expected counts of reports to obtain signals, and many studies have shown that no single signal detection algorithm (SDA) provides uniformly better performance [13]. The commonly used methods do not allow for multiple looks at the accumulating data over time, which can result in large numbers of false positive findings [12].
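As a rough illustration of the frequentist measures named above, the following sketch computes the PRR and ROR from a 2 × 2 table using their textbook definitions. The cell labels (a, b, c, d), the function name and the example counts are assumptions made for illustration; the signalling criteria actually used in this study are not reproduced here.

```python
import math

def prr_and_ror(a, b, c, d):
    """Return (PRR, ROR, lower 95% confidence limit of ROR).

    Assumed cell layout (hypothetical labels):
      a: reports with the drug and the AE
      b: reports with the drug and other AEs
      c: reports with other drugs and the AE
      d: reports with other drugs and other AEs
    """
    prr = (a / (a + b)) / (c / (c + d))
    ror = (a * d) / (b * c)
    # Standard error of log(ROR) from the usual large-sample formula.
    se_log_ror = math.sqrt(1 / a + 1 / b + 1 / c + 1 / d)
    ror_lower = math.exp(math.log(ror) - 1.96 * se_log_ror)
    return prr, ror, ror_lower

# Hypothetical counts, for illustration only.
prr, ror, ror_lower = prr_and_ror(a=10, b=90, c=100, d=9800)
print(f"PRR={prr:.2f}, ROR={ror:.2f}, lower 95% limit={ror_lower:.2f}")
```

Both measures compare the reporting of the AE for the drug of interest with its reporting for all other drugs, which is why they depend only on the four cell counts.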

One method, the Sequential Probability Ratio Test (SPRT), has less concern associated with multiple testing over time; it is specifically designed to make allowance for multiple looks at data over time.

SPRT was developed by Wald in the 1940s [14, 15] and has mainly been used in process monitoring. A review of the literature showed that the SPRT may offer advantages over the other methods to overcome multiple testing problems [16,17,18]. It has been used in the context of scanning electronic health records, but has not had extensive evaluation for spontaneous reports.

Specifically, SPRT compares two hypotheses based on the likelihood of observing the data given those hypotheses [17, 19]. Unlike the other methods, SPRT is based on the difference between (rather than the ratio of) the observed and expected values. However, there is limited research conducted on SPRT to evaluate its performance in an SRS database similar to Singapore’s, and a previous evaluation of SPRT used a single alternative hypothesis on simulated data [20]. On theoretical grounds, at least, the methods that do not allow for this form of multiple testing over time will have a higher rate of false positive findings—false signals.

This paper explores how SPRT behaves and reviews its utility and applicability to pick up signals of disproportionate reporting (SDRs) for potential drug safety signals in the Singapore context. We also compare the performance of SPRT with three other SDAs, namely ROR, BCPNN and GPS.

2 Methods

The spontaneous reporting database in Singapore has been described by Ang et al. [21]. In the database, each valid report has at least one product and at least one suspected ADR term included. Products are coded using standardised drug names, and adverse reaction terms are coded using the World Health Organization (WHO) Adverse Reaction Terminology (WHO-ART) (version 151) [22].

The SPRT method requires that specific hypotheses regarding a relative risk to be detected are set out, and in the context of signals of ADRs, some arise from relatively frequently occurring AEs where small relative risks are nevertheless potentially important, while others are from rare events where only higher relative risks can be detected. The details of the SPRT method are described in the “Appendix”, together with a brief description of the other methods used here, including the criteria that determine whether the counts for particular drug–AE pairs constitute a signal.

We analysed signals using all data from 1993 to 2013, and in additional analyses, also reviewed the data as they were up to 2011 and examined the new signals that arose in 2012 and 2013, mirroring what is done in practice with accumulating data. We also classified the drug–AE pairs in terms of seriousness of AE and whether the AE was labelled for that drug or not.

We evaluated the methods, as most other comparisons have done, using sensitivity, specificity, negative predictive value (NPV) and positive predictive value (PPV), considering factors such as whether the pair is a significant SDR and whether the drug–AE pair is labelled. The drug–AE pairs were reviewed by a senior pharmacist and considered labelled if they were mentioned in Micromedex® [23] or in current regulatory agency-approved drug labels in Singapore. A match could be exact (word for word) or synonymous (same meaning). The AE terms were considered serious if they appeared in the WHO critical terms list, were considered medically significant suspected serious ADRs by the US FDA, or appeared in the Important Medical Event Terms (IME) list developed by the EudraVigilance Expert Working Group [22, 24, 25]. All the analyses were performed using R software, version 3.3.1 [26], including the signal detection package PhViD [27].
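The four performance measures can be sketched as follows, assuming that a labelled drug–AE pair is treated as the reference "positive" and a significant SDR as the test "positive". The function name and the counts are hypothetical and are not taken from the study's tables.

```python
def screening_metrics(tp, fp, fn, tn):
    """Compute screening measures from a 2 x 2 classification table.

    tp: signalled and labelled      fp: signalled but not labelled
    fn: not signalled but labelled  tn: neither signalled nor labelled
    """
    return {
        "sensitivity": tp / (tp + fn),
        "specificity": tn / (tn + fp),
        "ppv": tp / (tp + fp),
        "npv": tn / (tn + fn),
    }

# Hypothetical counts, for illustration only.
metrics = screening_metrics(tp=40, fp=10, fn=60, tn=890)
for name, value in metrics.items():
    print(f"{name}: {value:.3f}")
```

Because labelling is an imperfect reference standard, a "false positive" under this scheme may still be a true but not yet labelled reaction.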

3 Results

The HSA received a total of 151,180 AE reports from 1993 to 2013; these reports involved 23,183 unique drug–AE pairs. There were 1569 different suspected drug substances and 1014 different AE terms. If every drug had reports for every possible AE, there would be 1,590,966 possible pairs (1569 × 1014). Of those possible combinations, only 23,183 (1.5%) unique pairs actually occurred. Each of the methods was applied to the counts in each of these 23,183 cells.

The effect of the two hypothesised relative risks (hRRs) used for detecting signals using SPRT with hRR = 2 or hRR = 4.1 is shown in Table 1, where we show the distribution of observed and expected counts where signals are detected, giving medians and 5th and 95th centiles. We analysed the SDRs in terms of seriousness of AE and whether the AE is labelled or not; for the period from 1993 to 2013, a total of 137 unique serious and non-labelled drug–AE pairs were signalled by SPRT. Of the 137 drug–AE pairs, 88 drug–AE pairs were not picked up by hRR = 2, as the number of observed counts was less than five.

Table 1 Comparing SPRT with hRR = 2 and hRR = 4.1 in terms of seriousness of AE, whether the AE is labelled or not labelled, observed and expected values


The ROR and SPRT methods detected more SDRs than BCPNN and GPS. Figure 1 is a Venn diagram showing the overlap of significant pairs detected by each method for the complete data from 1993 to 2013. The SDRs detected by the BCPNN method are a subset of those detected by the other methods. The large majority (70%) of signals were detected by all methods (2187/3106). SPRT detected 268 signals (N < 3) that were not detected by other methods, while ROR detected 400 signals that were not detected by the other methods.

Comparisons were done by reviewing the numbers of new signals based on drug–AE combinations that had some reports in the database prior to that year but were not signals previously, and totally new signals, where the combination had reports for the first time in the relevant period. Table 2 gives the number of new significant SDRs for each method for different quarters from 2012 to 2013. In general, SPRT tends to generate a higher percentage of new significant pairs compared to the other methods.

Table 2 Breakdown of number of SDRs generated by each method according to different quarters of years


To evaluate which methods performed better than the others, we reviewed the significant SDRs in terms of seriousness of AE, whether the AE is labelled or not labelled, PPV, NPV, sensitivity and specificity. In this analysis, the SPRT method detected more non-labelled drug–AE pairs. In terms of PPV, ROR, BCPNN and GPS performed better than SPRT. In terms of sensitivity, ROR performed better than the other methods. The performances of the methods were similar for NPV and specificity (see Table 3).

Table 3 Comparisons of methods in terms of seriousness of AE, whether the AE is labelled or not labelled, PPV, NPV, sensitivity and specificity (1993–2013)


4 Discussion

We have shown that the SPRT method has some different properties to the other methods and that it can be ‘tuned’ to detect signals for rare events as well as more frequent ones. The fact that the hRR has to be pre-specified, while apparently a disadvantage, can be utilised to obtain signals in different circumstances. This method may be suitable for databases with smaller total numbers of reports and where a signal would be detected even with smaller numbers, compared with databases containing many millions of reports, such as those of the FDA and the EU.

More work may need to be done to investigate its use in practice and whether it should be an additional or an alternative method for use in the context of smaller databases. There is no general ‘gold standard’ to define which of the drug–AE pairs are really true ADRs and which are not. Methods like ROR and GPS that have been used in the past may have led to labelling, but it is not certain that all such associations are true ADRs. Individual regulatory authorities may need to examine the characteristics of the signals detected and not detected by the different methods in their own data.

Singapore, although having a high reporting rate based on number of AE reports received per million inhabitants, is a small country and, therefore, its total number of reports is not that high. In this situation, there are drug–AE pairs of interest with small numbers of reports, and detecting SDRs using SPRT only with a small value of hRR will be problematic.

For rarer events, it could be useful to adopt a higher hRR for early signalling purposes. For example, dabigatran (anticoagulant) and cerebral infarction was signalled earlier by hRR = 4.1, when the number of observed counts was two. Using hRR = 2 only gave a signal 9 months later, when the number of observed counts reached five. ROR, BCPNN and GPS signalled it 3 months later than SPRT. Diltiazem (antihypertensive/anti-anginal) and vestibular disorders was signalled with hRR = 4.1 in 2011, but not by hRR = 2, ROR, BCPNN or GPS. Diltiazem and vestibular disorders is not included in the product label, but studies have shown that migraine-related dizziness or vertigo has been reported in 7% of patients [28]. Letrozole (anticancer) and epidermal necrolysis was signalled with hRR = 4.1, ROR, BCPNN and GPS in 2011, but not by hRR = 2. Letrozole and epidermal necrolysis is included in the product labels as either uncommon or rare. Vancomycin (antibiotic) and acute generalised exanthematous pustulosis was signalled using hRR = 4.1, ROR and GPS in 2011, and 6 months later by BCPNN. These findings suggest that SPRT could have a useful role, but it is not clearly superior to other methods.

There are some signals probably resulting from confounding by indication, such as clozapine (antipsychotic) and neurosis, dapsone-pyrimethamine (combination of antibiotic and antimalarial) and infection, dasatinib (anticancer) and malignant neoplasm, hepatitis B immunoglobulin and viral hepatitis, pentamidine (antimicrobial) and pneumonia, and rivaroxaban (anticoagulant) and melaena. They may also be markers of the drug being ineffective, but deciding which is true is difficult, if not impossible, from spontaneous reports.

While the SPRT method is intended to allow for multiple looks at accumulating data, it does not explicitly address other issues of multiplicity. There are over 20,000 drug–AE pairs that are tested, and none of the methods make explicit allowance for this form of multiplicity. Here, reports are of suspected ADRs, so the possibility that they are all chance effects is not tenable, and the application of Bonferroni types of correction would be too extreme and lead to a notable loss of power. False discovery type methods, as described by Gould [12] and Ahmed et al. [29], do address these forms of multiplicity, and they could be applied to SPRT methods as well.

Therefore, it is not possible to choose the hRR to be detected based solely on statistical grounds. Furthermore, it should be noted that the SPRT is a sequential test, and applying it to an existing database is not the most appropriate approach to its evaluation. The most appropriate approach would be to look at the newly arrived data and see what SDRs are detected. It is also clear that the actual hRR that is most likely to be a real effect is of relevance. It is very likely that very high hRRs for reasonably common effects will have been detected in randomised trials used for licensing. However, extremely rare reactions would not be detected, and spontaneous AE reports are the best tool for detecting them.

Having a large value of hRR will generate SDRs at very low observed or expected counts, but at high observed or expected counts, the signals might be missed. Hence, a detailed analysis of the effect of different hRR values on the database is necessary, and we have tried having different thresholds or hRR values for different observed counts or expected counts, but this does not seem practically sensible. However, having different thresholds for different types of AEs, depending on their rarity, may well be sensible. This may not be easy to define, but is worth exploring in the future.

5 Conclusions

It appears that SPRT may have some applications in Singapore’s SRS. For AEs that are rare and thus expected to have low expected counts, applying a higher hRR for the SPRT method may pick up SDRs of concern. On the other hand, AEs that are more common need a lower hRR to weed out false positives. To appreciate the value of SPRT in the Singapore database, more in-depth analysis comparing the value of the signals picked up by varying the hRRs would be a useful next step of investigation. Other countries, especially those with smaller databases, may find that this simple SPRT method can be applied very easily to their databases and may provide signal detection for some rare events of significance to them. Assuming they have a database and the ability to produce the counts of the pairs, it is easy to apply the method, and this could be done using any spreadsheet or statistical software.

References

Waller PC, Evans SJW. A model for the future conduct of pharmacovigilance. Pharmacoepidemiol Drug Saf. 2003;12(1):17–29. doi:10.1002/pds.773.
Waller P. An introduction to pharmacovigilance. UK: Wiley-Blackwell; 2010.
Koh Y LA, Tan L, Ang PS, Tan SH, Toh D, Chan CL. Pharmacovigilance in Singapore—harnessing IT and genomics to detect safety signals. Regul Aff J-Pharma. 2012;(Oct issue):13–5. https://pink.pharmamedtechbi.com/PS116976/Pharmacovigilance-in-Singapore--harnessing-IT-and-genomics-to-detect-safety-signals. Accessed 15 Mar 2017.
Hauben M, Bate A. Decision support methods for the detection of adverse events in post-marketing data. Drug Discov Today. 2009;14(7–8):343–57. doi:10.1016/j.drudis.2008.12.012.
Zink RC, Huang Q, Zhang L-Y, Bao W-J. Statistical and graphical approaches for disproportionality analysis of spontaneously-reported adverse events in pharmacovigilance. Chin J Nat Med. 2013;11(3):314–20. doi:10.1016/S1875-5364(13)60035-7.
Evans SJW, Waller PC, Davis S. Use of proportional reporting ratios (PRRs) for signal generation from spontaneous adverse drug reaction reports. Pharmacoepidemiol Drug Saf. 2001;10(6):483–6. doi:10.1002/pds.677.
Bate A, Evans SJW. Quantitative signal detection using spontaneous ADR reporting. Pharmacoepidemiol Drug Saf. 2009;18(6):427–36. doi:10.1002/pds.1742.
Egberts AC, Meyboom RH, van Puijenbroek EP. Use of measures of disproportionality in pharmacovigilance: three Dutch examples. Drug Saf. 2002;25(6):453–8. doi:10.2165/00002018-200225060-00010.
Bate A, Lindquist M, Edwards IR, Olsson S, Orre R, Lansner A, et al. A Bayesian neural network method for adverse drug reaction signal generation. Eur J Clin Pharmacol. 1998;54(4):315–21. doi:10.1007/s002280050466.
Dumouchel W. Bayesian data mining in large frequency tables, with an application to the FDA Spontaneous Reporting System. Am Stat. 1999;53(3):177–90. doi:10.1080/00031305.1999.10474456.
Szarfman A, Machado SG, O’Neill RT. Use of screening algorithms and computer systems to efficiently signal higher-than-expected combinations of drugs and events in the US FDA’s Spontaneous Reports Database. Drug Saf. 2002;25(6):381–92. doi:10.2165/00002018-200225060-00001.
Gould AL. Detecting potential safety issues in clinical trials by Bayesian screening. Biom J. 2008;50(5):837–51. doi:10.1002/bimj.200710469.
Candore G, Juhlin K, Manlik K, Thakrar B, Quarcoo N, Seabroke S, et al. Comparison of statistical signal detection methods within and across spontaneous reporting databases. Drug Saf. 2015;38(6):577–87. doi:10.1007/s40264-015-0289-5.
Wald A. Sequential tests of statistical hypotheses. Ann Math Stat. 1945;16(2):117–86.
Wald A. Sequential analysis. New York: Wiley; 1947.
Evans S. Sequential probability ratio tests applied to public health problems. Control Clin Trials. 2003;24:67S.
Talbot J, Aronson JK. Stephens’ detection and evaluation of adverse drug reactions: principles and practice. 6th ed. Oxford: Wiley-Blackwell; 2012.
Leite A, Andrews NJ, Thomas SL. Near real-time vaccine safety surveillance using electronic health records—a systematic review of the application of statistical methods. Pharmacoepidemiol Drug Saf. 2016;25(3):225–37. doi:10.1002/pds.3966.
Spiegelhalter D, Grigg O, Kinsman R, Treasure T. Risk-adjusted sequential probability ratio tests: applications to Bristol, Shipman and adult cardiac surgery. Int J Qual Health Care. 2003;15(1):7–13.
Roux E, Thiessard F, Fourrier A, Begaud B, Tubert-Bitter P. Evaluation of statistical association measures for the automatic signal generation in pharmacovigilance. IEEE Trans Inf Technol Biomed. 2005;9(4):518–27. doi:10.1109/TITB.2005.855566A.
Ang PS, Chen Z, Chan CL, Tai BC. Data mining spontaneous adverse drug event reports for safety signals in Singapore—a comparison of three different disproportionality measures. Expert Opin Drug Saf. 2016;15(5):583–90. doi:10.1517/14740338.2016.1167184.
Stahl M, Lindquist M, Edwards IR, Brown EG. Introducing triage logic as a new strategy for the detection of signals in the WHO Drug Monitoring Database. Pharmacoepidemiol Drug Saf. 2004;13(6):355–63. doi:10.1002/pds.894.
Micromedex® 2.0 (electronic version). Truven Health Analytics, Greenwood Village, Colorado, USA. http://www.micromedexsolutions.com/. Accessed 19 Sept 2015.
Safety reporting requirements for human drug and biological products; Federal Register. Department of Health and Human Services. US Food and Drug Administration. 14 March 2003. http://www.fda.gov/OHRMS/DOCKETS/98fr/03-5204.pdf. Accessed 19 Sept 2015.
Important Medical Event Terms list (based on MedDRA version 18.1). EudraVigilance Expert Working Group. https://eudravigilance.ema.europa.eu/human/textforIME.asp. Accessed 19 Sept 2015.
R Development Core Team. R: a language and environment for statistical computing. R Foundation for Statistical Computing. 2014. http://www.R-project.org. Accessed 23 Sept 2016.
Ahmed I, Poncet A. Package ‘PhViD’. Version:1.0.6. 2013. https://cran.r-project.org/web/packages/PhViD/PhViD.pdf Accessed 19 Sept 2015.
Bisdorff AR. Management of vestibular migraine. Ther Adv Neurol Disord. 2011;4(3):183–91. doi:10.1177/1756285611401647.
Ahmed I, Thiessard F, Miremont-Salame G, Haramburu F, Kreft-Jais C, Bégaud B, et al. Early detection of pharmacovigilance signals with automated methods based on false discovery rates. Drug Saf. 2012;35(6):495–506. doi:10.2165/11597180-000000000-00000.
Noren GN, Hopstadius J, Bate A. Shrinkage observed-to-expected ratios for robust and transparent large-scale pattern discovery. Stat Methods Med Res. 2013;22(1):57–69. doi:10.1177/0962280211403604.


Author information

Authors and Affiliations

Health Products Regulation Group, Health Sciences Authority, 11 Biopolis Way #11-01 Helios, Singapore, 138667, Singapore
Cheng Leng Chan, Sowmya Rudrappa & Pei San Ang
School of Biomedical Sciences and Pharmacy, The University of Newcastle, Callaghan, Australia
Cheng Leng Chan & Shu Chuen Li
Genome Institute of Singapore (A-Star), Singapore, Singapore
Sowmya Rudrappa
Department of Medical Statistics, London School of Hygiene and Tropical Medicine, University of London, London, UK
Stephen J. W. Evans


Corresponding author

Correspondence to Cheng Leng Chan.

Ethics declarations

Ethical approval

All data used in this study are anonymised, and ethical approval was not required.

Funding

The SAPhIRE (Surveillance and Pharmacogenomics Initiative for Adverse Drug Reactions) Project was funded by a Strategic Positioning Funds grant from the Biomedical Research Council of the Agency for Science, Technology and Research of Singapore (SPF2014/001).

Conflict of interest

Cheng Leng Chan, Sowmya Rudrappa, Pei San Ang, Shu Chuen Li and Stephen J.W. Evans have no conflicts of interest that are directly relevant to the content of this study.

Appendix: Statistical Details of the Sequential Probability Ratio Test (SPRT)

1.1 Introduction

The SPRT was developed by A. Wald in the 1940s [14, 15] and was mainly used in process monitoring. A review of the pharmacovigilance literature showed that the SPRT may have useful properties not shared by other methods [16,17,18]. It has been used in the context of scanning electronic health records, particularly for vaccine studies, but has not had extensive evaluation for spontaneous reports. Specifically, SPRT compares two hypotheses based on the likelihood of observing the data given those hypotheses [19]. Unlike the other methods, SPRT is based on the difference between (rather than the ratio of) the observed and expected values. However, there is limited research conducted on SPRT to evaluate its performance in a Spontaneous Reports System (SRS) database similar to Singapore’s. A previous evaluation of SPRT used a single alternative hypothesis on simulated data [20]. On theoretical grounds, at least, the methods that do not allow for this form of multiple testing will have a higher rate of false positive findings (false signals). The method is essentially applied to a 2 × 2 table of data in the same way as other methods, with the emphasis on obtaining an expected count assuming no association between the drug and the adverse event (AE). The method is applied to a single drug and adverse event, and a determination is made of whether there is evidence of an association. The process is then repeated, doing the same calculations for the next possible pairing and so on, for every drug–AE combination that occurs in the database.

1.2 The Methods of Use for SPRT in Signal Detection

For any drug–AE pair, the analysis of quantitative data mining methods can be based on 2 × 2 contingency tables (Table 4). We focus on one particular drug–AE pair and include the totals of reports of other drugs and AEs.

Table 4 2 × 2 contingency table of AEs and drugs


The number of reports listing both the drug and the AE is the observed value, and the expected value for each drug–AE pair is calculated (just as for a chi-square test), assuming independence, by the formula:

$$ E = \frac{(a + c) \times (a + b)}{a + b + c + d} \tag{1} $$
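A minimal sketch of the expected-count calculation in (1) follows. It assumes the usual cell layout in which a is the count for the drug–AE pair of interest, a + b is the total for the drug, a + c is the total for the AE, and a + b + c + d is the total number of reports; the function name and the counts are hypothetical.

```python
def expected_count(a, b, c, d):
    """Expected count for the drug-AE cell under independence, as in (1)."""
    total = a + b + c + d
    return (a + c) * (a + b) / total

# Hypothetical counts, for illustration only.
e = expected_count(a=3, b=97, c=47, d=9853)
print(e)
```

With these counts, the drug accounts for 100 of 10,000 reports and the AE for 50, so 0.5 reports of the pair would be expected if the drug and the AE were unrelated.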

We assume that the continuously accumulating data in the SRS are described by the Poisson distribution [17], which is widely used in pharmacoepidemiology and pharmacovigilance.

The SPRT uses the log likelihood ratio (LLR) for the Poisson distribution calculated [19] by the formula:

$$ \text{LLR} = O \times \log_{e}(\text{hRR}) - E \times (\text{hRR} - 1), \tag{2} $$

where for any drug–AE pair,

O is the observed number of reports, which is equal to cell a in Table 4.
hRR is the hypothesised relative risk of interest to constitute a signal.
E is the expected count as calculated at (1).

This is a very simple calculation which uses the observed and expected counts multiplied by factors related to the hypothesised relative risk.
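The calculation in (2) can be sketched in a few lines. The function name `sprt_llr` is an assumption made for illustration; the example uses an observed count of 3 and an expected count of 1.

```python
import math

def sprt_llr(observed, expected, hrr):
    """Poisson log likelihood ratio of (2): O * ln(hRR) - E * (hRR - 1)."""
    return observed * math.log(hrr) - expected * (hrr - 1)

# Observed count 3, expected count 1, hypothesised relative risk 3.
llr = sprt_llr(observed=3, expected=1, hrr=3)
print(round(llr, 3))
```

The first term rewards observed reports, while the second penalises the expected count more heavily as the hypothesised relative risk grows.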

1.3 Varying Thresholds in Signal Detection

A and B are two thresholds in the SPRT method that determine, in this context, whether a signal has been detected.

The signalling rules for SPRT are:

If LLR > B, then reject H₀ and accept H₁.
If LLR < A, then reject H₁ and accept H₀.

In the conventional use of SPRT in process control, one can stop monitoring if there is convincing evidence that there is no effect. However, in post-marketing surveillance, monitoring continues indefinitely, as signals can still occur subsequently due to factors such as changes in patterns of drug use and prescribing, so the threshold A is irrelevant in this context.

The thresholds are calculated using α and β. Here α (type I error) is the probability of incorrectly rejecting a true null hypothesis, also called a “false positive”, and β (type II error) is the probability of accepting a false null hypothesis, also called a “false negative”; 1 − β gives the power of the test.

The values A and B are obtained using the formulae:

$$ A = \log_{e}\{\beta/(1 - \alpha)\} \tag{3} $$

$$ B = \log_{e}\{(1 - \beta)/\alpha\}. \tag{4} $$

To minimize the false positives, the value of α should be made small. To lessen the false negatives, the value of β should also be made small. We have used conventional α and β values at 0.05 and 0.20, respectively. This gives the approximate threshold value to be 2.77 for B. Different threshold values based on different choices of α and β can also be derived (see Table 5).
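The thresholds in (3) and (4) can be computed directly. The sketch below uses a hypothetical function name and the α and β values stated above.

```python
import math

def sprt_thresholds(alpha, beta):
    """Return (A, B), the lower and upper SPRT thresholds of (3) and (4)."""
    lower = math.log(beta / (1 - alpha))
    upper = math.log((1 - beta) / alpha)
    return lower, upper

A, B = sprt_thresholds(alpha=0.05, beta=0.20)
print(round(A, 2), round(B, 2))
```

With α = 0.05 and β = 0.20, the upper threshold is the natural logarithm of 16, which rounds to 2.77.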

Table 5 Different threshold values based on different choices of α and β


1.4 Observed and Expected Counts and their Effect on LLR

In order to understand how SPRT works, it is helpful to see what happens to the LLR when its parameters change. Fig. 2 shows a graph of the LLR for varying values of hRR (in the range of 1–6) with a fixed observed count of 3 and an expected count of 1. The graph shows that the most likely value of hRR (the one that maximises the LLR) is 3, and that the likelihood of other values of hRR falls off the further hRR is from 3.
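The location of this maximum can be checked from (2) by differentiating with respect to hRR, treating O and E as fixed. This is a short worked derivation added for illustration:

$$ \frac{d\,\text{LLR}}{d\,\text{hRR}} = \frac{O}{\text{hRR}} - E = 0 \quad \Rightarrow \quad \text{hRR} = \frac{O}{E} = \frac{3}{1} = 3 $$

The second derivative, −O/hRR², is negative, so this is a maximum. The maximum value is 3 × log_e(3) − 1 × (3 − 1) ≈ 1.30, which is below a threshold of 2.77; on this calculation, an observed count of 3 with an expected count of 1 would not signal for any choice of hRR.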

1.5 Observed and Expected Counts and the Relation to hRR

With a threshold value of 2.77 (i.e. α = 0.05 and β = 0.20), an expected count of 1 and an alternative hypothesis of hRR = 2, the minimum observed count needed to detect a signal is 6 (see Fig. 3). If the hRR is increased to 3, the LLR will cross the threshold when the observed count is greater than or equal to 5.
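The minimum observed counts quoted above can be checked with a short search. As before, the Poisson form of the LLR is an assumption about formula (2), and `llr` and `min_observed_count` are hypothetical helper names.

```python
import math


def llr(observed, expected, hrr):
    # Assumed Poisson form of formula (2): O * ln(hRR) - E * (hRR - 1)
    return observed * math.log(hrr) - expected * (hrr - 1)


def min_observed_count(expected, hrr, threshold, max_count=1000):
    """Smallest observed count whose LLR exceeds the threshold, or None."""
    for observed in range(1, max_count + 1):
        if llr(observed, expected, hrr) > threshold:
            return observed
    return None


# Threshold B from formula (4) with alpha = 0.05 and beta = 0.20
B = math.log((1 - 0.20) / 0.05)

print(min_observed_count(expected=1, hrr=2, threshold=B))  # 6
print(min_observed_count(expected=1, hrr=3, threshold=B))  # 5
```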

If we reduce the expected value towards zero (it can never be exactly zero in practice, but it can be very close to zero), then the LLR formula in (2) is approximately LLR = O × log_e (hRR). Hence, with an extremely small expected count, the LLR for any observed count, including a low one, can be made to cross the threshold by choosing a sufficiently high hRR for the alternative hypothesis. Although this at first seems counter-intuitive, with an extremely small expected count a very high hRR becomes more likely than the null even when the observed count is low. This is because the SPRT is a test comparing two simple hypotheses. Table 6 gives the hRR for the alternative hypothesis with the corresponding minimum observed count needed to cross a threshold of 2.77.

When SPRT was applied to our SRS database with a low hRR of 2 as the alternative hypothesis, it produced few SDRs. This could be explained by the fact that more than 90% of the drug–AE pairs in our SRS had fewer than 10 reports (see Fig. 4).
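As a worked illustration of the small-expected-count limit described above (derived here from the approximation LLR = O × log_e (hRR) and formula (4), not read from Table 6), the signalling condition LLR > B can be rearranged as:

$$ O \times \log_{e} (\mathrm{hRR}) > B \quad \Longleftrightarrow \quad \mathrm{hRR} > e^{B/O} = \{ (1{-}\beta )/\alpha \}^{1/O} $$

With α = 0.05 and β = 0.20, (1 − β)/α = 16, so the limiting hRR is 16 for O = 1, 4 for O = 2, about 2.52 for O = 3 and 2 for O = 4. Because the inequality is strict and the expected count is never exactly zero, the hRR has to lie slightly above these limiting values for the corresponding observed count to signal.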

Table 6 Varying values of hRR with corresponding qualifying SDRs

In all databases of spontaneous reports, low cell counts occur very frequently (and zero counts are the most frequent of all, as noted above). This is a particular issue for a relatively small database such as Singapore's, compared with the US FDA Adverse Event Reporting System or the European EudraVigilance database. On the other hand, if the hRR for the alternative hypothesis is sufficiently high, signals can be obtained even with a single observed count. However, a high hRR also means that signals become more difficult to detect for pairs with higher observed and expected values but a lower relative risk.

1.6 Use of Two Hypothesised Values

Initially, SPRT was applied with hRR = 2, as used by others in the references cited above. However, rare events with small observed counts and small expected counts did not produce signals. Figure 5 shows that the numbers of reports rise when the value of hRR reaches 4.1. This is because, with an expected count close to zero, the minimum observed count needed for a signal becomes 2, and potentially many more cells have a count of 2 than a count of 3. Rarer AEs that occurred only twice but had a very small expected value, and hence were not detected as signals using SPRT with hRR = 2, could therefore be signalled more easily. Figure 6 shows the overlap of significant pairs detected by SPRT with hRR = 2 and hRR = 4.1.

Signals with a notable excess based on a low number of reports with very low expected values are of medical interest as well as those with larger numbers of reports, so we decided to include all the SDRs generated by SPRT with hRR = 2 or with hRR = 4.1, if either crossed the threshold of 2.77.

All these calculations are repeated for each drug–AE combination that exists. The calculations are quite simple; the complexity is in doing them for every combination and classifying each pair as giving rise to a signal or not.
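The combined rule (signal if either hRR = 2 or hRR = 4.1 crosses the threshold) and its application to every drug–AE pair might be sketched as follows. The Poisson form of the LLR is again an assumption about formula (2), the drug and AE names and their counts are invented purely for illustration, and `llr` and `crossing_hrrs` are hypothetical helper names.

```python
import math


def llr(observed, expected, hrr):
    # Assumed Poisson form of formula (2): O * ln(hRR) - E * (hRR - 1)
    return observed * math.log(hrr) - expected * (hrr - 1)


def crossing_hrrs(observed, expected, hrrs=(2.0, 4.1), alpha=0.05, beta=0.20):
    """Return the hypothesised relative risks whose LLR exceeds threshold B."""
    threshold = math.log((1 - beta) / alpha)  # formula (4)
    return [h for h in hrrs if llr(observed, expected, h) > threshold]


# Hypothetical (observed, expected) counts for three drug-AE pairs
pairs = {
    ("drug_a", "rash"): (2, 0.01),      # rare event, tiny expected count
    ("drug_b", "nausea"): (30, 14.0),   # common event, moderate excess
    ("drug_c", "headache"): (3, 2.5),   # no real excess
}

# A pair is classified as an SDR if at least one hRR crosses the threshold
signals = {pair: crossing_hrrs(o, e) for pair, (o, e) in pairs.items()}
for pair, crossed in signals.items():
    print(pair, "SDR" if crossed else "no SDR", crossed)
```

In this toy example the rare pair signals only with hRR = 4.1 and the common pair only with hRR = 2, which illustrates why taking the union of the two rules picks up both kinds of signal.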

1.7 Comparison of SPRT with Other Methods

In our study, three methods in common use, namely reporting odds ratio (ROR), Bayesian Confidence Propagation Neural Network (BCPNN) and Gamma Poisson Shrinker (GPS), were compared with SPRT. Virtually all users of these methods apply a criterion based on a minimum count of drug–AE pairs (N ≥ 3), so this was also imposed here. The reason is that applying the ROR and GPS methods to small cell counts of one or two reports can result in statistically unstable disproportionality estimates. If the criterion of N ≥ 3 were not applied, the ROR and GPS methods would generate many false-positive SDRs. When we did not specify N ≥ 3 for GPS, the method signalled several drug–AE pairs with two counts. As explained in the study by Ang et al. [21], using a stricter threshold criterion for these methods will not give a better performance. Table 7 gives the formulae and thresholds for all four methods compared in this study. Table 8 shows the formulae for calculating the sensitivity, specificity, positive predictive value and negative predictive value.
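To illustrate why a minimum count matters for the ROR, the sketch below uses the standard 2 × 2 formulation (ROR = ad/bc with a 95% confidence interval on the log scale) and a commonly used rule that signals when the lower confidence limit exceeds 1. This rule is an assumption for illustration and may differ from the exact thresholds in Table 7; the counts and the function name `ror_signal` are hypothetical.

```python
import math


def ror_signal(a, b, c, d, min_count=3, z=1.96):
    """Signal if a >= min_count and the lower 95% CI limit of the ROR exceeds 1.

    a: reports with the drug and the AE    b: the drug, other AEs
    c: other drugs with the AE             d: other drugs, other AEs
    All four cells are assumed to be non-zero.
    """
    if a < min_count:
        return False
    log_ror = math.log((a * d) / (b * c))
    se = math.sqrt(1 / a + 1 / b + 1 / c + 1 / d)  # standard error of ln(ROR)
    lower = math.exp(log_ror - z * se)
    return lower > 1


# Hypothetical pair with only two reports: signals only if N >= 3 is dropped
print(ror_signal(2, 18, 50, 9930))               # False
print(ror_signal(2, 18, 50, 9930, min_count=1))  # True
# Hypothetical pair with ten reports
print(ror_signal(10, 90, 50, 9850))              # True
```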

Table 7 Formulas and thresholds of methods used

Table 8 Concepts of sensitivity, specificity, positive predictive value, and negative predictive value
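The four performance measures can be computed from a 2 × 2 classification of signals against a reference standard. The sketch below uses the standard textbook definitions, which are assumed to match Table 8; the counts and the function name `performance` are hypothetical.

```python
def performance(tp, fp, fn, tn):
    """Standard definitions from true/false positive and negative counts."""
    return {
        "sensitivity": tp / (tp + fn),  # true signals that were detected
        "specificity": tn / (tn + fp),  # non-signals correctly not flagged
        "ppv": tp / (tp + fp),          # flagged pairs that are true signals
        "npv": tn / (tn + fn),          # unflagged pairs that are non-signals
    }


# Hypothetical counts, for illustration only
metrics = performance(tp=40, fp=10, fn=20, tn=130)
for name, value in metrics.items():
    print(f"{name}: {value:.3f}")
```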

Chan, C.L., Rudrappa, S., Ang, P.S. et al. Detecting Signals of Disproportionate Reporting from Singapore’s Spontaneous Adverse Event Reporting System: An Application of the Sequential Probability Ratio Test. Drug Saf 40, 703–713 (2017). https://doi.org/10.1007/s40264-017-0531-4

Published: 28 April 2017
Issue Date: August 2017
DOI: https://doi.org/10.1007/s40264-017-0531-4
