Introduction

While clinical trials usually have sufficient sample size to demonstrate efficacy, few are powered to detect rare adverse events or adverse events that occur after long-term exposure. As a result, pharmacovigilance and pharmacoepidemiologic studies using spontaneous adverse drug reaction (ADR) reporting databases, electronic health records, or medical encounter claims data are critical to monitor the safety of newly marketed medications [1, 2]. The rapid growth in computing power and the development of large administrative datasets provide an opportunity for researchers and regulatory agencies to conduct active post-marketing surveillance of medications.

In 1988, Petri et al. [3] introduced ‘prescription sequence analysis’ as a new and fast cohort crossover approach for detecting safety issues associated with medication use. A more general term, sequence symmetry analysis (SSA), was proposed by Hallas in 1996 and first used to examine whether the initiation of cardiovascular medications was associated with the onset of depression [4]. Since its initial application, the use of SSA has been increasing, both as a method of studying specific side effects of medication use and as a data mining tool to detect unknown and unsuspected safety issues [5,6,7]. A validation study of the method indicated that SSA has moderate sensitivity and high specificity in detecting heart failure as an adverse event [8]; however, the new signals identified by that study have yet to be confirmed. SSA has also been found to have robust performance when the same association is analyzed across several different databases [9, 10]. Additionally, SSA may result in more rapid detection of safety issues as it requires only a minimal dataset and is computationally efficient [9, 10]. More recently, SSA has been applied as a signal generation tool and has the potential to provide a complementary approach to adverse event detection alongside routine pharmacovigilance using spontaneous reports [8].

A common limitation in observational studies is the potential for confounding [2]. The advantage of SSA is that it is robust to confounders that are stable over time, e.g., gender, and genetic factors [4, 11, 12]. In this review article, we explain the theoretical and conceptual framework of SSA and discuss the strengths and advantages of SSA based on currently available literature. We also highlight the challenges and pitfalls in applying SSA. Finally, we summarize the application of SSA in practice.

Theoretical and conceptual framework of sequence symmetry analysis

Sequence symmetry analysis is based on examining the sequence of events in relation to initiating a medication [4]. If a medication (referred to as an index medication) is suspected of causing an adverse event, it may be more often followed by the initiation of a medication commonly used to counteract or treat the adverse event (referred to as an outcome medication). For example, if a particular medication is associated with diarrhea, we would notice more people initiating anti-diarrheal medication after initiating the index medication than before the index medication. Initially, Hallas [4] described the method to include all sequences of events irrespective of their proximity; Tsiropoulos et al. [7] later published a variation of the method in which a limit was placed on the time window between events.

The statistic of interest in SSA is the sequence ratio (SR), which is a measure of asymmetry of sequences. The SR is calculated by dividing the number of people for whom the outcome medication was initiated after the index medication by the number of people for whom the outcome medication was initiated before the index medication. As such, the SR could also be regarded as an estimate of the incidence rate ratio of the outcome in the exposed period versus that of the non-exposed period [4, 6].

$${\text{Sequence ratio}} = \frac{{{\text{Number of people using index medication}} \to {\text{outcome medication }}}}{{{\text{Number of people using outcome medication}} \to {\text{index medication}}}}$$

In the absence of an association, one would expect a symmetrical pattern in the distribution of initiation of the outcome medication before and after the initiation of the index medication.
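As a minimal sketch of the sequence ratio calculation, the following Python fragment counts index → outcome and outcome → index sequences from one pair of initiation dates per patient. The function name `crude_sequence_ratio`, the 365-day default window (following the time-window variation mentioned above) and the decision to drop same-day initiations are illustrative assumptions, not part of the published method description.

```python
from datetime import date


def crude_sequence_ratio(pairs, window_days=365):
    """Count ordered sequences and return (index_first, outcome_first, SR).

    pairs: iterable of (index_start, outcome_start) dates, one per patient.
    window_days: assumed maximum gap between the two initiations.
    """
    index_first = 0    # index medication -> outcome medication
    outcome_first = 0  # outcome medication -> index medication
    for index_start, outcome_start in pairs:
        gap = (outcome_start - index_start).days
        # Assumption: same-day starts carry no order information and
        # pairs further apart than the window are excluded.
        if gap == 0 or abs(gap) > window_days:
            continue
        if gap > 0:
            index_first += 1
        else:
            outcome_first += 1
    if outcome_first == 0:
        raise ValueError("no outcome -> index sequences; SR is undefined")
    return index_first, outcome_first, index_first / outcome_first


# Hypothetical initiation dates for six patients.
pairs = [
    (date(2015, 1, 10), date(2015, 3, 1)),   # index first
    (date(2015, 2, 1), date(2015, 6, 15)),   # index first
    (date(2015, 5, 20), date(2015, 8, 2)),   # index first
    (date(2015, 9, 1), date(2015, 4, 1)),    # outcome first
    (date(2015, 7, 7), date(2015, 7, 7)),    # same day, dropped
    (date(2013, 1, 1), date(2015, 1, 1)),    # outside window, dropped
]
n_after, n_before, sr = crude_sequence_ratio(pairs)
print(n_after, n_before, sr)  # 3 1 3.0
```

With symmetric prescribing the two counts would be similar and the ratio close to 1; here the hypothetical data give three index → outcome sequences against one in the opposite order.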

To illustrate the calculation of the sequence ratio, consider the scenario of medication-induced hypothyroidism. Amiodarone, an anti-arrhythmic drug, is known to induce hypothyroidism. We would therefore expect a higher chance of a person receiving thyroid hormone supplementation, in the form of thyroxin, after initiating amiodarone [13, 14] (Fig. 1a). In the absence of an association, we would expect the pattern to be symmetrical, as represented by the volume of patients in blue in Fig. 1b. The relative excess of patients shown in red (Fig. 1b) may reflect the adverse effect, and its size is quantified by the SR (Fig. 1a).

Fig. 1 Theoretical and conceptual framework of SSA. (a) Asymmetrical prescribing pattern of potential causal relationship. (b) Estimation of background rate of natural occurrence from non-causal sequence. The background rate represents the group of patients who received thyroxin before amiodarone due to chance rather than the pharmacological effects of amiodarone

SSA requires the identification of new users of both the index and outcome medications. An efficient graphical approach to the identification of incident users is the waiting time distribution method, also proposed by Hallas [4, 15]. The waiting time distribution method graphs a group of medication users by the time of their first prescription within a specified time window. Those patients who have their first prescription at the beginning of the window may be prevalent users and are excluded from the analysis. After a specified waiting time, the number of medication users could be constant over time and the graph will be dominated by incident users. As a result, we are able to efficiently select incident users after the waiting window. This graphical approach strengthens the efficiency of large scale surveillance across multiple datasets [6, 7] (Fig. 2).

Fig. 2 Hypothetical waiting time distribution method to capture incident drug users. The waiting time distribution provides a graphical representation of a group of medication users by the time of their first prescription within a specified time window, with most past users captured at the beginning of the window (purple bar). After a specified waiting time (12 months), the number of medication users will be constant over time and the graph will be dominated by incident users (blue bar). (Color figure online)
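The selection of incident users described above can be sketched in a few lines of Python. This is an illustration under stated assumptions rather than the authors' program: the function names are hypothetical, the input is a list of (patient identifier, dispensing date) pairs for a single medication, and the 365-day waiting period mirrors the 12-month example in Fig. 2.

```python
from collections import Counter
from datetime import date, timedelta


def first_dispensings(dispensings):
    """Map each patient to the date of their first observed dispensing."""
    first = {}
    for patient_id, when in dispensings:
        if patient_id not in first or when < first[patient_id]:
            first[patient_id] = when
    return first


def waiting_time_distribution(first):
    """Count patients by the (year, month) of their first observed dispensing."""
    return Counter((d.year, d.month) for d in first.values())


def incident_users(first, data_start, waiting_days=365):
    """Keep patients whose first dispensing falls after the waiting period."""
    cutoff = data_start + timedelta(days=waiting_days)
    return {pid: d for pid, d in first.items() if d >= cutoff}


# Hypothetical dispensing records for one medication.
dispensings = [
    ("A", date(2014, 2, 1)), ("A", date(2015, 2, 1)),
    ("B", date(2015, 3, 1)),
    ("C", date(2015, 6, 1)), ("C", date(2014, 12, 31)),
]
first = first_dispensings(dispensings)
histogram = waiting_time_distribution(first)
new_users = incident_users(first, data_start=date(2014, 1, 1))
print(sorted(histogram.items()))
print(new_users)  # only patient B starts after the waiting period
```

Plotting `histogram` as a bar chart would give the kind of waiting time distribution shown in Fig. 2; the cutoff itself remains a judgment that depends on how the medication is typically refilled.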

The choice of outcome has been described as a medication initiated that has the potential to be used to treat the occurrence of an adverse event (e.g., thyroxin for amiodarone induced hypothyroidism). If data are available, it is also possible to use diagnosis events as outcomes. For example, Caughey et al. [16] used hospitalization for hip fracture as an outcome and performed SSA to test the association between prochlorperazine and hip fracture. It is also possible to investigate symmetry between two events, for example Cole et al. [17] evaluated the association between hysterectomy and the risk of irritable bowel syndrome. We provide examples of index and outcome drugs or events in Table 1.

Table 1 Published studies using sequence symmetry analysis from 1996–2016

The validity of SSA is dependent on the availability of a good indicator for the outcome of interest. Whether a medication or a hospitalization event is used, the specificity of the outcome measure as an indicator of the adverse event is important. The use of non-specific drug indicators requires an additional level of interpretation. Further, when the adverse event in question accounts for only a small proportion of the use of the outcome drug, associations might be attenuated. Since SSA relies on medication indicators, associations may be underestimated if patients who experience side effects discontinue the suspected medication, are not treated with a medicine for the side effect or use an over-the-counter (OTC) medication that is not recorded in the dataset. As a result, the drug → outcome sequences that would otherwise have occurred will not be included in the analysis and the signal may be attenuated. This will only be a problem to the extent that the adverse drug effect is commonly known or suspected, which will lead some clinicians to discontinue the offending drug rather than just treating the outcome symptom. The SSA can thus be expected to work best with unsuspected associations.

Strengths and advantages of sequence symmetry analysis

SSA is a case-only design, as it includes only those patients who have the outcome of interest [18]. Additionally, it only includes those patients who have the exposure or medication of interest. SSA has been shown to capture signals even when the adverse event is rare [12]. For example, Lai et al. [6] found a significant signal of hyperprolactinemia among patients receiving sulpiride and amisulpride who were later dispensed prolactin inhibitor treatment, even though the incidence rate of hyperprolactinemia in the population was relatively low. As another example, Tsiropoulos et al. [7] detected an association between use of antiepileptic drugs and use of antibiotics with only a few users of both medications.

SSA has been shown to have high specificity but moderate sensitivity. A validation study by Wahab et al. [8] found SSA to have a specificity of 93% (95% CI 87–96%), a sensitivity of 61% and a positive predictive value of 77% (95% CI 61–88%) when tested against adverse events identified in 120 clinical trials for 19 medications. Additional research has demonstrated that when applied to administrative data sources, SSA can be a complementary tool to traditional pharmacovigilance methods. The rates of detected events increased 21% after supplementary use of the SSA beyond traditional methods and different signals were detected using the different methods [19, 20].

SSA inherently controls for time-constant patient-specific confounders such as genetic or environmental factors [4]. Traditional methodologies for evaluating exposure-outcome associations, such as cohort or case–control studies, require data on a large number of confounders to produce unbiased risk estimates. SSA has the advantage that it only requires three variables: a patient identifier, a medication code and a medication dispensing date. Data for other potential confounders are not required as the design controls for them implicitly [10]. This ensures computational efficiency, which is an important feature of the SSA method [7]. An additional benefit of the limited data set is that it is very suitable for distributed network analyses, i.e., when structured queries are sent to data owners and applied locally, and where summary results of those queries are returned to a coordinating center for collation. Since this obviates the need for exchange of raw data, the method aids in preserving the confidentiality and privacy of patients [10, 13, 21, 22]. SSA has become one of the routine methodological approaches for the Asian Pharmacoepidemiology Network (AsPEN) [13], a multi-national research network established to support pharmacoepidemiology research among several Asian countries [23].

Lastly, a graphical output of SSA can be generated to aid in the interpretation of generated signals. The premise behind SSA is the concept of symmetry. While the SR is the statistic used to summarize potential asymmetry, a visual representation of the sequence of events can help to understand the temporality of the association (Fig. 1). Review of the SSA graphs and temporality of the association may help to further validate the plausibility of an identified signal.

Challenges and pitfalls of sequence symmetry analysis

While there are advantages to the use of SSA, there are potential challenges and pitfalls. As described in the previous section SSA utilizes a non-symmetrical pattern of treatment orders as evidence of a potential adverse effect of the medication of interest. There may be other reasons, apart from a true effect, that could create such asymmetry, and these are discussed below.

Prescribing trends

The SSA may be affected by prescribing trends over time, which may lead to a biased effect estimate [4]. For example, an excess of index medication → outcome medication sequences could occur if the use of the outcome medication is increasing, e.g., because of changes in reimbursement or other drivers in utilization. This would result in the SR overestimating the true incidence rate ratio. To remedy such bias, a null-effect SR can be calculated which adjusts the SR for the background rate of the medications under study. As described by Hallas [4], the null-effect SR takes the prescription trends in the background population into account, by computing an expected SR based on the probability of the sequence of initiation of outcome drugs after index drugs in the absence of any causal association. The calculation of the null-effect SR has been described by Hallas and revised by Tsiropoulos et al. [7], who placed a restriction on the exposure window between sequences. The null-effect SR is derived from the calculation of the probability, P, of each incident index drug user being exposed to an outcome drug within the specified exposure window after the day the index drug was initiated.

$${\text{P}} = \frac{{\mathop \sum \nolimits_{\text{n = x + 1}}^{\text{x + d}} {\text{M}}_{\text{n}} }}{{\mathop \sum \nolimits_{\text{n = x - d}}^{\text{x - 1}} {\text{M}}_{\text{n}} { + }\mathop \sum \nolimits_{\text{n = x + 1}}^{\text{x + d}} {\text{M}}_{\text{n}} }}$$

Here, P is the probability that a person who initiates the index drug on day x will receive their first prescription for the outcome drug after day x rather than before it, within the specified time window; n indexes consecutive days of the study period; M_n is the number of persons receiving their first outcome drug on day n; and d is the length of the observation time window in days (e.g., 365 days to capture the pairs of index and outcome drugs).

The overall average probability, P_a, is then calculated by weighting P by the number of incident users of the index drug on each day of the study and averaging over all days [7], as:

$${\text{P}}_{\text{a}} = \frac{\sum\nolimits_{\text{m} = 1}^{\text{u}} \left[ {\text{I}}_{\text{m}} \times \left( \sum\nolimits_{\text{n} = \text{m} + 1}^{\text{m} + \text{d}} {\text{M}}_{\text{n}} \right) \right]}{\sum\nolimits_{\text{m} = 1}^{\text{u}} \left[ {\text{I}}_{\text{m}} \times \left( \sum\nolimits_{\text{n} = \text{m} - \text{d}}^{\text{m} - 1} {\text{M}}_{\text{n}} + \sum\nolimits_{\text{n} = \text{m} + 1}^{\text{m} + \text{d}} {\text{M}}_{\text{n}} \right) \right]}$$

Here, P_a is the overall average probability that the outcome drug will be prescribed after the index drug, taking into account the prescription pattern in the background population. In the formula, m and n index consecutive days of the study period, u is the last day of the study period, M_n is the number of persons receiving their first outcome drug on day n, I_m is the number of persons receiving their first index drug on day m, and d is the length of the observation time window in days (e.g., 365 days to capture the pairs of index and outcome drugs).

Finally, a null-effect SR is calculated as P_a/(1 − P_a). The adjusted sequence ratio is then calculated as the crude SR divided by the null-effect SR [4].

$${\text{Adjusted}}\,{\text{SR}} = {\text{crude}}\,{\text{SR}}/{\text{Null-effect}}\,{\text{SR}}$$
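The null-effect adjustment can be illustrated with a short Python sketch that follows the formulas above term by term. The inputs are assumed to be two equal-length lists of daily counts over the study period: incident index drug users (I_m) and incident outcome drug users (M_n). The function names and the choice to truncate the window at the boundaries of the study period are assumptions made for this example.

```python
import math


def null_effect_sr(index_starts, outcome_starts, window_days):
    """Null-effect SR, P_a / (1 - P_a), from daily counts of incident users.

    index_starts[m]: persons starting the index drug on day m (I_m).
    outcome_starts[n]: persons starting the outcome drug on day n (M_n).
    window_days: length d of the observation window on either side.
    """
    if len(index_starts) != len(outcome_starts):
        raise ValueError("daily count series must cover the same days")
    after_total = 0
    both_total = 0
    for m, i_m in enumerate(index_starts):
        # Assumption: windows are truncated at the study boundaries.
        after = sum(outcome_starts[m + 1 : m + 1 + window_days])
        before = sum(outcome_starts[max(0, m - window_days) : m])
        after_total += i_m * after
        both_total += i_m * (before + after)
    if both_total == 0:
        raise ValueError("no outcome initiations within any window")
    p_a = after_total / both_total
    if p_a >= 1.0:
        raise ValueError("null-effect SR is undefined when P_a equals 1")
    return p_a / (1 - p_a)


def adjusted_sr(crude_sr, null_sr):
    """Adjusted SR = crude SR / null-effect SR."""
    return crude_sr / null_sr


# Hypothetical five-day study period with a two-day window.
flat = null_effect_sr([1, 1, 1, 1, 1], [2, 2, 2, 2, 2], window_days=2)
rising = null_effect_sr([1, 1, 1, 1, 1], [1, 2, 3, 4, 5], window_days=2)
print(flat)                       # 1.0 under stable utilization
print(round(rising, 3))           # 1.625 when outcome drug use is rising
print(round(adjusted_sr(3.25, rising), 3))  # 2.0
```

In the hypothetical rising-use scenario the null-effect SR exceeds 1, so an unadjusted SR would overstate the association; dividing by the null-effect SR removes that component.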

The limitations of the method described to adjust for the underlying trends in medication use and their effect on the calculation of the SR are that individual level data on initiation of new drugs for the entire population is required and that it can be computationally demanding. Other approaches for adjustment, such as bootstrap resampling methods, have been described by Garrison et al. [24]. A simulation has tested a limited set of potential utilization patterns of the underlying trends and found that adjustment of the crude SR by the null-effect SR effectively removes bias related to changes in underlying utilization trends [25]. However, more work is required to study potential bias in other scenarios.

Inappropriate identification of new use

As discussed in the theoretical background to SSA, the method requires the identification of initiation of the medication of interest and of the outcome medication. The reason for this is that adverse events are more likely to occur soon after treatment initiation and that the initiation of the outcome medication is more likely to reflect treatment for the onset of an adverse event rather than treatment for an ongoing condition. When employing SSA, the medication of interest may be a specific medication or a medication class. When examining use by specific medication, new users of a medication may include patients who have switched to that medication from a medication in the same class. Exclusion of switchers from study cohorts or censoring the switchers at the time of switching is the simplest solution to overcome potential bias from existing use of the medicine class. In SSA, switching medications might affect the estimation of the background rate from the non-causal sequences. For example, first generation antipsychotics such as haloperidol have a higher risk of extrapyramidal symptoms (EPS) and therefore patients may be switched to second generation antipsychotics such as olanzapine when EPS is suspected [6, 26]. This results in a switch to olanzapine shortly after a diagnosis of EPS has been registered. Analysis of this sequence without the recognition of prior first generation antipsychotic use would lead to an apparent inverse association between use of olanzapine and risk of EPS. A solution to this problem is to include only new users of a medication class. In the example of EPS discussed above, patients would only be selected for inclusion in SSA at the time of initiating their first antipsychotic in the class.
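The class-level restriction can be sketched as follows, assuming dispensing records of the form (patient identifier, drug name, date) and a user-supplied set of drug names that defines the class. The function names and the example records are hypothetical; in practice a run-in period would also be needed to confirm that the first observed dispensing is a true initiation.

```python
from datetime import date


def first_use_in_class(dispensings, drug_class):
    """Map each patient to (drug, date) of their first dispensing in the class."""
    first = {}
    for patient_id, drug, when in dispensings:
        if drug not in drug_class:
            continue
        if patient_id not in first or when < first[patient_id][1]:
            first[patient_id] = (drug, when)
    return first


def new_users_of_drug(dispensings, drug, drug_class):
    """Patients whose first drug in the class is `drug` (switchers excluded)."""
    first = first_use_in_class(dispensings, drug_class)
    return {pid: when for pid, (d, when) in first.items() if d == drug}


# Hypothetical records: patient A switches from haloperidol to olanzapine.
antipsychotics = {"haloperidol", "olanzapine"}
dispensings = [
    ("A", "haloperidol", date(2015, 1, 1)),
    ("A", "olanzapine", date(2015, 6, 1)),
    ("B", "olanzapine", date(2015, 2, 1)),
]
olanzapine_new = new_users_of_drug(dispensings, "olanzapine", antipsychotics)
print(olanzapine_new)  # patient A is excluded as a switcher
```

Here patient A would have been counted as an olanzapine initiator in a drug-level analysis, which is the situation that produces the apparent inverse association described above.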

Time-variant variables and selection of study periods

SSA may be affected by bias due to within-person confounding [27], such as fluctuations in disease severity, dietary or other behavioral changes, which may influence the order of prescribing of the index and outcome medicines. The effect of time-varying confounding on the results of SSA may be influenced by the length of the exposure window. Limiting the study period, for example to 12 months, can help to reduce potential bias due to time-varying covariates; however, the trade-off is the potential to miss adverse reactions that develop only after long-term exposure [6,7,8].

There is no standard exposure time window for SSA, and the best strategy to determine the appropriate time window is to consider the likely time course of the development of the adverse effect under study. For signal detection studies without specific hypotheses, a 1-year time window might be optimal for achieving acceptable sensitivity and positive predictive value. A study by Wahab et al. [19] found that restriction to a shorter exposure window reduced the sensitivity of SSA, but this may be due to small sample sizes. However, Wahab et al. [19] assessed SSA only for acute events and the restricted time window may be insufficient for detecting adverse events that may take longer to manifest.

There is no formal computational method to adjust for known time-variant confounders in the SSA, as opposed to the case-crossover and self-controlled case series analyses that allow for adjustments of time-varying covariates, e.g., by incorporating time-dependent covariates in a conditional logistic or Poisson regression model.

Inverse causality and protopathic bias

One of the assumptions of SSA is that the occurrence of the outcome will not affect the probability of exposure. Violation of this assumption may result in an effect known as inverse causation. For example, if a non-symmetrical distribution of sequences is found with an SR below 1.0, this could be explained by either the index drug reducing the risk of using the outcome drug or the outcome drug increasing the risk of using the index drug. It is not possible from these data alone to know which is correct.

Protopathic bias is also a potential problem when employing SSA. Protopathic bias occurs when the index medication is used to treat the underlying symptoms of an outcome before the outcome is diagnosed [28]. This might lead to a false conclusion that the index medication induces the outcome event. Inverse causality and protopathic bias highlight the importance of including sensitivity analyses to test the robustness of SSA results.

Tradeoff: signal versus noise

SSA may be used for hypothesis generation. Two recurrent discussions in such activity are how to address the problem of multiple testing and the related problem of how to define a signal. A signal generated by SSA may be defined as an SR in which the lower limit of the 95% confidence interval is greater than 1. Published studies have used variations of this; for example, Tsiropoulos considered a result to be a signal if there was sufficient power (highest number of sequence pairs) or the most significant associations (highest SR) [7]. Other studies have calculated 99% confidence intervals to identify the significant signals when investigating outcomes for a medication class, e.g., potential safety signals associated with the use of glaucoma eye drops [29] and antiepileptic drugs [14]. Lowering the significance level may not be the ultimate solution to the signal to noise tradeoff [30]. Using a lower alpha value threshold, e.g., 0.01 rather than 0.05, will reduce the number of non-causal signals generated purely by chance, but it will also reduce the number of causal signals to the same extent, since fewer of the true associations will reach statistical significance. Thereby, the signal/noise ratio is virtually unchanged by a lower p value threshold. The chosen alpha level, therefore, should be determined by a careful consideration of sensitivity and specificity, and the implications of false positive or false negative findings.
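To make the signal definition concrete, the sketch below computes a confidence interval for the SR and flags a signal when the lower limit exceeds 1. The text does not state which interval method is used, so this example assumes a large-sample (Wald) interval on the log scale, with standard error sqrt(1/a + 1/b) for a index → outcome and b outcome → index sequences; an exact binomial interval would be a reasonable alternative. The function names and counts are hypothetical.

```python
import math


def sr_confidence_interval(n_index_first, n_outcome_first, z=1.96):
    """SR with an approximate (Wald, log-scale) confidence interval.

    z: standard normal quantile, e.g. 1.96 for 95% or 2.576 for 99%.
    """
    if n_index_first <= 0 or n_outcome_first <= 0:
        raise ValueError("both sequence counts must be positive")
    sr = n_index_first / n_outcome_first
    se = math.sqrt(1 / n_index_first + 1 / n_outcome_first)
    lower = math.exp(math.log(sr) - z * se)
    upper = math.exp(math.log(sr) + z * se)
    return sr, lower, upper


def is_signal(n_index_first, n_outcome_first, z=1.96):
    """Signal if the lower confidence limit of the SR is greater than 1."""
    _, lower, _ = sr_confidence_interval(n_index_first, n_outcome_first, z)
    return lower > 1.0


# Hypothetical counts: 60 index -> outcome versus 40 outcome -> index.
sr, lower95, upper95 = sr_confidence_interval(60, 40)
print(sr)                           # 1.5
print(is_signal(60, 40, z=1.96))    # True at the 95% level
print(is_signal(60, 40, z=2.576))   # False at the 99% level
```

The same pair of counts is a signal at the 95% level but not at the 99% level, which illustrates how the choice of alpha trades false positives against missed associations.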

Detection bias and confounding by indication

As with many other observational study designs, detection bias may play a role in SSA because patients might be more likely to receive a particular treatment after they start another because they are now in the health system [10, 14]. For example, developing a health condition such as diabetes may trigger a patient to seek treatment more actively for other health conditions. A study investigating the risk of antipsychotic-induced hyperglycemia identified a seemingly protective association between antipsychotics and initiating insulin (the indicator of hyperglycemia); however, the authors noted that it was possible that entry into the health system through diagnosis of diabetes may have prompted the diagnosis of a psychiatric condition [10]. Hallas pointed out several possible causes of asymmetrical patterns, such as confounding by indication [4]. For example, the relationship between a cardiovascular medication and depression may be confounded when cardiovascular disease itself may lead to depression [4].

Application of sequence symmetry analysis in practice

Examples of the use of SSA are listed in Table 1. We classified all the SSA studies into three groups by purpose of the studies, i.e., signal evaluation with a specific study hypothesis, signal generation studies where the aim was to detect new signals associated with treatment or methodological studies which evaluated the validity of SSA. Although SSA was originally considered as a signal detection tool for drug safety, most of the SSA studies undertaken to date have aimed to test known hypotheses or to prove clinical phenomena and evaluate the risk signal, with only a limited number focused on generating hypotheses or detecting unknown, unsuspected risks [6, 7, 29]. Further research is required to determine the utility of SSA as a potential tool for large scale data-mining in claims data.

Execution of sequence symmetry analysis

The execution of SSA requires a dataset that includes (1) a unique patient identifier; (2) a variable to identify the medication dispensed and (3) a variable to identify the date of medication supply. An analytical SAS or R program is available from the authors upon request. It contains the following sections:

  1. Selection of new use of index and outcome medication.

  2. Identifying patients with new use of either medication.

  3. Ordering sequences according to which came first.

  4. Crude SR calculation.

  5. Null-effect SR calculation.

  6. Adjusted SR calculated as the ratio of the crude SR and the null-effect SR.

  7. Confidence interval calculation.
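The first steps of such a program can be sketched in Python, starting from the three-variable dataset described above. This is not the authors' SAS or R program; the function names, the example records and the 365-day window are assumptions for illustration, and the sketch keeps only patients with first dispensings of both medications, since only those contribute an ordered sequence.

```python
from datetime import date


def first_dispensing(records, medication):
    """Map each patient to the date of their first dispensing of `medication`."""
    first = {}
    for patient_id, med, when in records:
        if med == medication and (patient_id not in first or when < first[patient_id]):
            first[patient_id] = when
    return first


def ordered_sequences(records, index_med, outcome_med, window_days=365):
    """Count patients by the order of their first index and outcome dispensings."""
    index_first = first_dispensing(records, index_med)
    outcome_first = first_dispensing(records, outcome_med)
    counts = {"index_then_outcome": 0, "outcome_then_index": 0}
    # Only patients who started both medications contribute a sequence.
    for patient_id in index_first.keys() & outcome_first.keys():
        gap = (outcome_first[patient_id] - index_first[patient_id]).days
        if gap == 0 or abs(gap) > window_days:
            continue  # assumption: same-day and distant pairs are dropped
        key = "index_then_outcome" if gap > 0 else "outcome_then_index"
        counts[key] += 1
    return counts


# Hypothetical records: (patient identifier, medication code, dispensing date).
records = [
    ("p1", "amiodarone", date(2015, 1, 5)),
    ("p1", "thyroxin", date(2015, 4, 1)),
    ("p1", "thyroxin", date(2015, 5, 1)),
    ("p2", "amiodarone", date(2015, 2, 1)),
    ("p2", "thyroxin", date(2015, 9, 1)),
    ("p3", "thyroxin", date(2015, 1, 1)),
    ("p3", "amiodarone", date(2015, 3, 1)),
    ("p4", "amiodarone", date(2015, 6, 1)),  # never starts thyroxin
]
counts = ordered_sequences(records, "amiodarone", "thyroxin")
crude_sr = counts["index_then_outcome"] / counts["outcome_then_index"]
print(counts, crude_sr)
```

A complete analysis would precede this with a waiting period to exclude prevalent users and follow it with the null-effect adjustment and a confidence interval.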

Conclusion

Sequence symmetry analysis has been increasingly applied in pharmacoepidemiology studies. Its advantages are that it is computationally efficient, has moderate sensitivity but high specificity, and is robust to time-constant confounding factors. Its minimal data requirement means that it is suitable when only dispensing data are available. However, there is a potential for false positive or false negative results, and careful consideration should be given to potential sources of bias when interpreting the results of SSA studies.