
1 Introduction

At the time of their marketing, the effects of drugs, and especially their efficacy, have been studied mostly in randomized controlled clinical trials (RCT), comparing them to placebo or to existing drugs. However, these RCT are by nature limited in their extent. Stringent inclusion and exclusion criteria are intended to provide homogeneous study populations and reduce response variability. These features reduce the representativeness of RCT for the future user population (Steg et al. 2007; Blin et al. 2017). Once the drugs have proven efficacy and a measure of safety, and are on the market, they will be prescribed to patients with concomitant diseases and medication or other risk factors that have usually been excluded from RCT (Blin et al. 2017). When several new drugs are marketed within a short time frame, as is often the case with new drug classes (e.g. direct-acting anticoagulants), there is no comparative RCT. It is very unlikely that any pharmaceutical company will devise at great cost a directly comparative RCT, comparing their drug to direct competitors. In addition, the introduction of new drugs or therapeutic options to the market may shift user populations of previously marketed drugs and modify their benefit–risk balance.

There is therefore a need to study the interactions of drugs with their target populations, within a real-life environment. This includes describing how a drug is used (drug utilization studies), how it compares to similar drugs within the same disease environment (comparative effectiveness), and whether any new safety concerns arise, or quantifying previously identified concerns (post-authorization safety studies). In addition, even before a drug is marketed, its future environment and place on the market can be anticipated and modelled, as can its real impact on health economics once it has effectively been marketed (health technology assessment).

By definition, pharmacoepidemiology studies are non-interventional, i.e., there is no influence or there should be no influence on the choice of therapeutic options studied, in contrast with interventional studies (RCT) where treatment is assigned to each patient.

Pharmacoepidemiology has long been limited to field studies such as case–control studies (Pierfitte et al. 2001) or simple cohort studies describing drug utilization, though some early databases such as Saskatchewan in Canada, VAMP research (now CPRD) in the UK, Medicare or Health maintenance organizations (HMO) such as Kaiser-Permanente in the USA paved the way for the large population resources now available.

Data resources such as countrywide healthcare systems databases have become readily available, and hospital-based data repositories or electronic health records are opening new possibilities, including multi-database and multi-country efforts involving very large populations of dozens to hundreds of millions of patients.

Pharmacoepidemiological studies require knowledge of epidemiological and statistical methods as well as the resources available to study the drugs and their specificities, but also knowledge of the pharmacology of the drugs being studied, and of the diseases involved as indications, efficacy or safety outcomes.

Finally, pharmacoepidemiology, as the study of drug effects in large populations, might be seen as just another method in experimental pharmacology, on a larger scale, much like the observation of drug effects in individual animals or persons in traditional experimental pharmacology.

2 Data Sources in Pharmacoepidemiology

Pharmacoepidemiological studies can involve primary collection of data (field studies) or secondary use of data previously collected for other ends (claims databases, electronic health records (EHR)).

2.1 Primary Data Collection

In primary data collection, studies are devised ad hoc, much as clinical trials, and specific information, such as quality of life, lifestyle data not present in medical records, blood or DNA samples can be acquired.

Field studies may be obligatory when the data needed is not readily found in the claims or EHR databases, such as the site of an ocular injection, or the presence of lifestyle characteristics, or again the reasons for which a drug may have been prescribed or stopped. Because these studies involve contact with patients and the generation of primary data, they are subject to patient safety requirements and informed consent. The rules for reporting adverse events will also not be the same as for secondary data (Guideline on good pharmacovigilance practices (GVP) Module VI – Collection, management and submission of reports of suspected adverse reactions to medicinal products (Rev 2); https://www.ema.europa.eu/en/human-regulatory/post-authorisation/pharmacovigilance/good-pharmacovigilance-practices#final-gvp-modules-section).

Studies may also combine data from an ad-hoc field study and claims databases, either directly where patients are identified, characterized and recruited by prescribers, but then followed in claims databases including after patient randomization (Mackenzie et al. 2016; MacDonald et al. 2013, 2014; Flynn et al. 2014) or indirectly, by verifying in a field study potential associations of confounders with prescribing. In the absence of an association (e.g. a drug is not preferentially prescribed in smokers), then that potential confounder is just a risk modifier and can be neglected in database studies.

An alternative is to identify patients in a database, then return to the patient and/or prescriber to complete the data. Since this may infringe on patient confidentiality protection laws, this design, which could be thought optimal to identify and enrol patients in highly targeted field studies, may not be easily feasible (Depont et al. 2007a, b).

The benefits of primary data collection (field studies) are their great flexibility, since the data acquisition is tailored to the needs of the study. Their main drawback is cost: since pharmacoepidemiological studies generally require large numbers of patients, identifying, recruiting and following large numbers of patients is usually difficult and expensive.

This is true whatever the study design. In some specific cases, there is no option, such as for rare genetic diseases, where patients are often included in registries and more easily available. Patients may also be recruited through disease-based associations, with a clear risk of recruitment bias: patients who participate in disease associations may not be representative of the whole patient population.

In some cases, especially when expensive drugs are studied, these may be on specific dispensing registries, and serve to identify patients (e.g. for targeted cancer therapies), which will allow characterization of users, and specific follow-up including, for instance, reasons for drug discontinuation (if recorded) and/or progression-free survival (Fourrier-Reglat et al. 2014a, b; Noize et al. 2017; Rouyer et al. 2018). If medical records are complete enough, it may not even be necessary to interact with the patient.

A basic principle is to include patients only after the drug has been prescribed, without interfering with the prescription process. There may be some interventions in the study, such as blood or DNA sampling, or recording of QOL variables, but these should not interfere with the free choice by the prescriber of the therapeutic options, and do not alter the observational status of the study (Guiard et al. 2019).

2.2 Secondary Sources of Data

In secondary data sources, the data is usually already present at the time of the study, and it would generally not be possible to enrich the dataset, though new developments in clinical data repositories might change this in the near future. These data sources might be medical records (electronic health records (EHR)) or data derived from healthcare insurance systems (claims data).

Electronic Health Records

These are databases or repositories of medical records, from general practitioners (GP) or hospitals. They are based on the voluntary recording by participating physicians of the clinical details of the patients they follow, often within patient management software. This will include outpatient diagnoses and prescriptions (not dispensing), results of lab tests and other exams (if entered) and results of specialist visits, or hospital discharge summaries, as well as lifestyle characteristics. The quality and completeness of the data depend on the healthcare professional’s input. Ideally this will be done through healthcare records management software, the data being transmitted to the database after anonymization. In this case the data will actually be used for patient management. Quality and completeness of the data need to be regularly verified, and missing data may be an issue. Completeness may also depend on the healthcare system. If the GP is the overall curator of all the patient’s healthcare, the data may be presumed complete, though hospital data may not always be fully transcribed, nor might some specialist visits (Jick et al. 2003). Though these data are in principle anonymized, it is possible under certain circumstances to return to the originating GP to obtain clarification on specific points, or for quality control.

Claims Databases

Claims databases contain recordings of all healthcare encounters that are covered by the healthcare system or insurance company. This may include outpatient medical consultations, drugs or devices dispensed, lab test or imaging, paramedical interventions, but also hospital admissions including diagnoses and procedures. Often there is the recording that this lab test or exam has been done, but not always the results of such tests. In some countries there are outpatient diagnoses, or records of chronic conditions, and linkage to national lab test or pathology repositories. This is especially true in Nordic countries. These data might also be linked to specific registries such as cancer, diabetes or rare disease registries, or death registries including or not its cause.

Depending on the data sources, such databases may contain much information on medical expenses that can provide direct or indirect information on various potential confounders. For instance, if they do not contain such lifestyle information as smoking or BMI, they do contain information on their medical consequences, such as chronic bronchitis, sinus infections, including use of antibiotics, peripheral arterial disease, tobacco cessation aids or devices, specialist consultations, etc. Increased BMI may be related to diabetes (identified directly or through its treatment), osteoarthritis and procedures such as knee or hip replacement, and the use of drugs for these indications, but also bariatric surgery and other procedures related to obesity, or the use of assistance such as walkers or canes, and use of spas and weight-reducing programs. All these variables can be included in modern statistical analyses such as high-dimensional propensity scores and disease risk scores (Schneeweiss et al. 2009; Neugebauer et al. 2015).

The data in claims databases are collected systematically and prospectively. They concern all the information for all the patients covered by the healthcare system. This might be lifelong for the whole population as in France or in Nordic Countries, or limited to specific areas (in Germany, Italy or Canada), or to specific ages, social status and resources (as in the USA), which may limit the usefulness or representativeness of such claims databases (Trifiro et al. 2009; Coloma et al. 2011; Bezin et al. 2017).

Chart Reviews

A third approach to secondary use of data is the concept of chart reviews, where patient files are examined for the presence of specific events such as indicators of cancer progression, which will usually not be found in the claims data (Fourrier-Reglat et al. 2014a, b; Rouyer et al. 2018). Chart reviews may concern patients treated with specific drugs, or recorded drug exposures before events such as liver transplantation (Gulmez et al. 2013a, 2015).

Finally, different data sources may be combined in data hubs or data repositories that aggregate information from claims databases and from clinical records, in-hospital or outpatient, data from registries, including results of lab tests or description of DNA sequencing or target information, as well as information from the emerging wearable devices (Dhainaut et al. 2018). These multisource data linkages mutually enrich all the data sources.

3 Methods and Designs in Pharmacoepidemiology

Pharmacoepidemiology is based on the study of the conjunction of subjects, exposures and events. One may consider several approaches in pharmacoepidemiology: event-based or exposure-driven methods.

Exposure-driven methods describe the use of a drug in the population, the drivers for that use, and the consequences of such use. These can be cross-sectional studies, simply describing the user population, possibly including past or concomitant events, prescriptions or diseases. If the subject of interest is what happens in drug users, then these studies are based on cohort methods, where patients are usually included at the time of the first prescription or dispensing of a drug and followed forward in time. Cohorts are by definition prospective.

One might be interested in the occurrence of a specific event and what happened before it that might be causally involved. These case-based studies are by definition retrospective studies, since subjects are included at the time of the event, and what is studied is what happened before.

Finally, the description of disease healthcare burden or management profiles, where patients are identified by a disease or event, and followed thereafter to describe management and costs, and the consequences of management, represents what is called health technology assessment (HTA). In this field, the subject of interest is not so much the health consequences of drug utilization as its economic consequences within a healthcare system. The main methods used here would be meta-analyses exploring the magnitude of effects compared to the cost of the drug, using all available information from clinical trials and from observational studies, which are the input to mathematical models. A very well-known source of HTA information is the National Institute for Health and Care Excellence in the UK, also known as NICE. Most countries and/or health care and health insurance systems have HTA activities, to optimize the use of resources tailored to their specific healthcare systems.

3.1 Ascertainment of Exposure

Ascertainment of exposure is fundamental to any pharmacoepidemiological study, by definition. Exposure might be ascertained from the prescriber or the patient in ad-hoc field studies, or from medical records of prescription of the drug (EHR), indicating an intent of exposure, or from claims databases indicating the dispensing of the drug, i.e. that the subject was actually in possession of the drug.

Asking patients about exposures is often uncertain, and specific methodologies are needed. Knowing what drugs were actually prescribed, or better yet bought, will focus on suspect drugs. In some cases, such as over-the-counter medicines that can be freely bought, querying the patient may be the only source of information, unless there are exhaustive pharmacy records (and drug sales are restricted to pharmacies only, notwithstanding internet sales) (Moore et al. 1993; Noize et al. 2009, 2012). In some rare cases, exposure can be ascertained by biomarkers such as plasma drug concentrations (Moore et al. 2001).

Knowledge of exposure may also vary depending on the source: medical records may cite only the drugs prescribed by the keeper of the records: GP prescriptions, or in-hospital prescriptions. Claims databases include only reimbursed dispensing of drugs covered by the insurance scheme, usually not including OTC drugs. Drugs dispensed in the hospital may be included in the hospital cost, and not itemized, except perhaps very expensive drugs that are covered separately.

3.2 Ascertainment of Events and Diseases

Events can be considered as outcomes, as patient selection criteria, or as background variables. Events as outcomes in cohorts usually consider the first occurrence of the event during follow-up (e.g. bleeding in a cohort of patients on anticoagulants, myocardial infarction (MI) in coronary prevention studies, or cancer progression), and of course death. These will most often be identified from hospital diagnoses and ICD10 coding (Duong et al. 2018). Events may also be the initiator for the inclusion of patients in a study, for instance in disease-based follow-up studies such as current practice for the management of a disease (for instance secondary prevention post MI (Blin et al. 2017; Bezin et al. 2018)). Events are also the point of entry in case-based studies, studying causes of such events, e.g., drugs associated with the onset of hepatic injury or heart failure (Gulmez et al. 2013a, 2015; Moore et al. 2019). Finally diagnoses serve as indicators of diseases that may be potential confounders or indicators of risks, used to devise risk or prognostic scores, existing at the time of or prior to inclusion, or arising during the course of follow-up.

The identification of events may also include more information than just an ICD-10 code, such as a stay in intensive care for myocardial infarction (Blin et al. 2019a). These diagnoses can be the object of external validation, comparing codes to the actual patient files (Bezin et al. 2015; Bosco-Levy et al. 2019), or of internal validation, using adjudication committees that compare the complete healthcare utilization history of the patient to the code itself, when it is not possible to link claims data to medical records (Pladevall-Vila et al. 2019; Wentzell et al. 2018; Czwikla et al. 2017).

Identification of diseases as previous history will rely on patient or physician interrogation in field studies as in clinical trials, with uncertainties (Fourrier-Reglat et al. 2010a, b), or on previously registered diseases, procedures or treatments indicative of such diseases in EHR or claims databases. One issue may be the depth (duration) of the database or the previous history one may wish to explore. Quite often only a few years are available, especially for commercial claims databases, when patients may change their healthcare provider.

3.3 Selection of Participants: Exposure-Based

Subjects can be selected on the exposure of interest. Such exposure-based studies may have several types of main objectives: drug utilization, non-comparative or comparative outcomes studies.

Drug Utilization Studies

Generally, new users of a drug will be selected, and described in a cross-sectional study of drug utilization, or included in a cohort, and followed for subsequent events. This would traditionally be done in field studies using so-called registries or phase-IV (post-marketing) studies. In such studies with primary data collection, new users of a new drug are identified by the prescriber and described for factors associated with the prescription, including lifestyle factors, and followed for common events, such as drug cessation and the reasons thereof, or common adverse reactions. Except in very high-risk patients such as in oncology, the event rates for serious events would be too low to allow quantification in these studies of limited size. They may however provide valuable information on less-serious common events that would not warrant hospitalization, and would not be captured in claims databases, and on potential confounders not included in population databases.

Using large population databases will provide for the detection, description and follow-up of the very first users of the drug, and how this use might change over time.

Outcomes Cohorts

These will provide event rates for events resulting in hospital admissions (serious adverse reactions), or that may have therapeutic markers (such as the use of antidepressant drugs to identify depression) using prescription symmetry analysis (Hallas 1996; Petri et al. 1988; Idema et al. 2018). These cohort studies can be very large and will provide unbiased whole-population event rates (Miranda et al. 2017). These event rates may be considered in the absolute, for instance in the absence of non-drug related occurrences of the event, or compared to those observed in pivotal clinical trials (Blin et al. 2017) especially in cancer (Fourrier-Reglat et al. 2014a, b; Noize et al. 2017; Rouyer et al. 2018).
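The prescription symmetry analysis cited above can be illustrated by its simplest quantity, a crude sequence ratio. The following is a minimal sketch and not the procedure of the cited authors: the function name, the input layout (first dispensing date of an index drug and of a marker drug for each patient) and the 365-day window are all assumptions, and the adjustment for prescribing trends used in published analyses is left out.

```python
from datetime import date


def crude_sequence_ratio(first_dates, window_days=365):
    """Crude sequence ratio for a prescription symmetry analysis.

    first_dates maps a (hypothetical) patient id to a pair:
    (first date of the index drug, first date of the marker drug).
    """
    index_first = 0   # index drug started before the marker drug
    marker_first = 0  # marker drug started before the index drug
    for index_date, marker_date in first_dates.values():
        gap = (marker_date - index_date).days
        if gap == 0 or abs(gap) > window_days:
            continue  # same-day starts and distant pairs are not counted
        if gap > 0:
            index_first += 1
        else:
            marker_first += 1
    if marker_first == 0:
        return None  # ratio undefined without marker-first patients
    return index_first / marker_first


# Illustrative, made-up dates (not real data).
example = {
    "p1": (date(2020, 1, 1), date(2020, 3, 1)),
    "p2": (date(2020, 6, 1), date(2020, 2, 1)),
    "p3": (date(2020, 1, 10), date(2020, 5, 10)),
    "p4": (date(2018, 1, 1), date(2020, 1, 1)),  # outside the window
}
print(crude_sequence_ratio(example))
```

A ratio well above 1 would suggest that the marker drug tends to be started after the index drug, which is the asymmetry such analyses look for.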

Among the benefits of these new-user cohort studies are the possible comparisons with pivotal clinical trials for outcomes that were identified in these trials, including efficacy outcomes so that the applicability and representativeness of these trials can be appreciated (Garbe et al. 2013).

Another benefit is that these studies will allow the evaluation of the risks associated with the drugs, and their possible benefits, especially with the final arbiter, all-cause death. This will inform the assessment of benefit–risk for drugs used in serious conditions, or to prevent serious outcomes. Outcomes can also be compared to those historically observed with other drugs in the same indication in different studies, or to these other drugs in comparative effectiveness studies.

3.4 Comparative Effectiveness or Safety Studies

Comparative effectiveness or safety studies compare event rates in similar populations treated with different drugs. These will be conducted according to a “new user” cohort design, comparing two or more marketed drugs, reducing comparison biases with adapted methods. The inclusion only of new users is necessary to avoid selection biases, such as depletion of susceptibles, whereby patients who remain on a drug (prevalent users) are those who tolerate the drug or benefit from it (Moride and Abenhaim 1994). It might also be that the use of a new drug indicates failure of a previous drug, or poor tolerability of older drugs, or again that the new drug is indicated in a specific subset of the population with a different baseline risk. Only new users in the same indication and same patient groups are at equal risk of positive or negative events. It might be difficult to find new users in chronic diseases with a low incidence of new cases, where most patients have a long history of previous treatments. Because no randomization is possible in these purely observational studies, much care is taken to ensure comparability of patient populations and avoid confounding. Confounding can be reduced by adjustment methods, taking advantage of these large populations, or by matching. When more than two drugs are compared, matching may be more complex, and adjustment preferable, with or without weighting.
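The selection of new users described above is often implemented with a washout window before the first dispensing in the study period. The sketch below is one possible way to do this, under stated assumptions: the function name, the record layout (patient identifier and dispensing date) and the 365-day washout are hypothetical choices, not a prescribed standard.

```python
from collections import defaultdict
from datetime import date, timedelta


def select_new_users(dispensings, study_start, washout_days=365):
    """Return {patient_id: index_date} for new users only.

    dispensings: iterable of (patient_id, dispensing_date) for the drug
    (or drug class) of interest. The index date is the first dispensing
    on or after study_start. A patient is a new user if no dispensing
    occurred during the washout window preceding the index date.
    """
    by_patient = defaultdict(list)
    for patient_id, when in dispensings:
        by_patient[patient_id].append(when)

    new_users = {}
    for patient_id, dates in by_patient.items():
        in_study = [d for d in dates if d >= study_start]
        if not in_study:
            continue  # never dispensed during the study period
        index_date = min(in_study)
        window_start = index_date - timedelta(days=washout_days)
        prior_use = any(window_start <= d < index_date for d in dates)
        if not prior_use:
            new_users[patient_id] = index_date
    return new_users


# Illustrative, made-up records (not real data).
records = [
    ("A", date(2020, 3, 1)), ("A", date(2020, 4, 1)),   # new user
    ("B", date(2019, 11, 1)), ("B", date(2020, 2, 1)),  # prevalent user
    ("C", date(2018, 1, 1)), ("C", date(2020, 5, 1)),   # use before washout
]
print(select_new_users(records, study_start=date(2020, 1, 1)))
```

In a comparison of two drugs, the same check would in practice be run on the dispensings of both drugs, so that each patient is naive to the comparator as well.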

3.5 Matching

The comparability of cohort groups may be enhanced by matching on variables known to be associated with the outcome, and with exposure. These variables will be most likely to be confounding variables, or to be associated with confounding variables. A confounding variable is one that is associated both with the exposure and the outcome, and which can explain some or all of the association of the outcome with the exposure. Presently the most extreme matching methods use high-dimensional propensity scores (hdPS), which are determined from several hundred variables among the thousands present in the datasets. They result in groups that are identical or very close on a large number of variables, including variables that are not part of the hdPS itself (Schneeweiss et al. 2009; Neugebauer et al. 2015; Rassen et al. 2011; Wang et al. 2017; Schneeweiss 2018). Some call these highly matched cohort studies virtual-clinical trials or pseudo-randomized studies, but the absence of real randomization for exposure allocation cannot exclude residual confounding, and makes these studies simply indicative, within a wider context of many studies with different methods and biases.
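Once a propensity score has been estimated for each patient, matching itself can be quite simple. The following is a minimal sketch of greedy 1:1 nearest-neighbour matching with a caliper, assuming the scores are already available (the estimation step, for instance a logistic regression on the hdPS covariates, is not shown). The function name, the dictionary inputs and the caliper of 0.05 are illustrative assumptions; real studies often use more elaborate algorithms.

```python
def greedy_match(treated, controls, caliper=0.05):
    """Greedy 1:1 nearest-neighbour matching on a propensity score.

    treated, controls: dicts mapping a patient id to a propensity score
    (assumed to have been estimated beforehand).
    Returns a list of (treated_id, control_id) pairs; each control is
    used at most once, and pairs further apart than the caliper are
    not formed.
    """
    available = dict(controls)  # controls not yet matched
    pairs = []
    # Process treated patients in order of increasing score.
    for t_id, t_score in sorted(treated.items(), key=lambda kv: kv[1]):
        if not available:
            break
        # Closest remaining control on the score scale.
        c_id = min(available, key=lambda c: abs(available[c] - t_score))
        if abs(available[c_id] - t_score) <= caliper:
            pairs.append((t_id, c_id))
            del available[c_id]
    return pairs


# Illustrative, made-up scores (not real data).
treated = {"t1": 0.30, "t2": 0.62, "t3": 0.95}
controls = {"c1": 0.31, "c2": 0.60, "c3": 0.10, "c4": 0.33}
print(greedy_match(treated, controls))
```

Treated patients with no control within the caliper (here the patient with a score of 0.95) remain unmatched and drop out of the matched analysis, which is one reason why matched cohorts may no longer represent all users.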

3.6 Analysis

Typically, analysis of cohort studies is the determination of the relative risk, comparing event rates in exposed and comparator groups:

 

            Events    No event    All
Treated     a         b           a + b
Control     c         d           c + d

Relative risk is [a/(a + b)] / [c/(c + d)]
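As a worked illustration of this calculation, the sketch below computes the relative risk from the four cell counts of the table (a and b for treated patients with and without the event, c and d for controls), together with the risk difference. The 95% confidence interval uses the usual log-scale approximation; the function name and the example counts are assumptions made for illustration only.

```python
import math


def relative_risk(a, b, c, d, z=1.96):
    """Relative risk from a 2x2 cohort table.

    a, b: treated patients with / without the event
    c, d: control patients with / without the event
    Requires a > 0 and c > 0 (the log-scale interval is undefined otherwise).
    """
    risk_treated = a / (a + b)
    risk_control = c / (c + d)
    rr = risk_treated / risk_control
    # Standard error of log(RR), large-sample approximation.
    se_log_rr = math.sqrt(1 / a - 1 / (a + b) + 1 / c - 1 / (c + d))
    lower = math.exp(math.log(rr) - z * se_log_rr)
    upper = math.exp(math.log(rr) + z * se_log_rr)
    return {
        "rr": rr,
        "ci": (lower, upper),
        "risk_difference": risk_treated - risk_control,
    }


# Illustrative, made-up counts (not real data).
result = relative_risk(a=30, b=970, c=15, d=985)
print(round(result["rr"], 2), [round(x, 2) for x in result["ci"]])
```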

Variants of this very simplistic approach use time-dependent or survival models that define hazard ratios, based on person-time exposed or followed rather than the absolute per-person event rates above.

In these studies, any comparative analysis would be on treatment, as exposed. Intent to treat (ITT) analyses are justified in clinical trials, where the majority of treatments will be continued until the end of the follow-up, so that any untreated period will be only a small part of overall study time. In observational studies, the duration of treatment is not imposed, and the observation time might be very long, so that most of the observation time may be off treatment, resulting in major unexposed time bias. ITT is therefore essentially meaningless when the initial treatment period is short compared to potentially unlimited observation time. It would also be meaningless when stopping treatment materially alters the outcomes. For example, anticoagulants decrease clotting (thrombotic events) but increase bleeding, whereas stopping them increases the risk of clotting but the risk of bleeding disappears. ITT is also difficult to interpret if the diseases are spontaneously reversible, or if the main objective is the timing of outcomes, such as death in cancer patients.

ITT can however be used when the duration of follow-up is limited to the expected duration of treatment. Most analyses of observational data will be on treatment, comparing event rates while on treatment in exposed patients.
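The difference between the two approaches can be made concrete by counting events and person-time under each. The sketch below is a simplified illustration under stated assumptions: each patient is described by a hypothetical triple (days on treatment, days of follow-up, day of first event or None), treatment starts on day 0 and is not resumed once stopped, and no grace period is applied after the end of treatment.

```python
def person_time_summary(patients):
    """Events and person-days under ITT and on-treatment approaches.

    patients: iterable of (treatment_days, followup_days, event_day),
    where event_day is None if no event occurred during follow-up.
    """
    itt_events = itt_days = 0
    ot_events = ot_days = 0
    for treatment_days, followup_days, event_day in patients:
        had_event = event_day is not None
        # ITT: all follow-up counts, up to the first event.
        end_itt = event_day if had_event else followup_days
        itt_days += end_itt
        itt_events += int(had_event)
        # On treatment: follow-up is censored at the end of treatment.
        end_ot = min(end_itt, treatment_days)
        ot_days += end_ot
        ot_events += int(had_event and event_day <= treatment_days)
    return {
        "itt": {"events": itt_events, "person_days": itt_days},
        "on_treatment": {"events": ot_events, "person_days": ot_days},
    }


# Illustrative, made-up patients: 90 days of treatment, long follow-up.
cohort = [(90, 1000, None), (90, 1000, 60), (90, 1000, 800)]
summary = person_time_summary(cohort)
for name, s in summary.items():
    rate = 1000 * s["events"] / s["person_days"]
    print(name, s, "rate per 1000 person-days:", round(rate, 2))
```

With short treatments and long follow-up, most of the ITT person-time is unexposed, which is the unexposed time bias described above.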

Analysis will commonly use time-dependent variables in survival analysis methods with Kaplan–Meier curves and Cox proportional hazards analyses. When death is a common occurrence that may act as a competing risk (patients who die are no longer at risk of another event), specific analyses such as Fine and Gray competing risk model should be used for the other, non-fatal outcomes (Fine and Gray 1999). Cohort studies can provide absolute risks and added risks, which may be important for regulatory decisions.
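To show what a Kaplan–Meier curve computes, the following minimal sketch implements the product-limit estimator in plain Python. It is meant as an illustration only: the function name and input layout are assumptions, and it does not handle competing risks, confidence intervals or covariates, for which dedicated statistical software would normally be used.

```python
def kaplan_meier(times, events):
    """Product-limit (Kaplan-Meier) estimate of the survival function.

    times:  follow-up time of each patient
    events: 1 if the event occurred at that time, 0 if censored
    Returns a list of (time, survival probability) at each event time.
    """
    data = sorted(zip(times, events))
    at_risk = len(data)
    survival = 1.0
    curve = []
    i = 0
    while i < len(data):
        t = data[i][0]
        n_events = 0
        n_leaving = 0
        # Group all patients whose follow-up ends at time t.
        while i < len(data) and data[i][0] == t:
            n_events += data[i][1]
            n_leaving += 1
            i += 1
        if n_events:
            # Patients censored at t are still counted as at risk at t.
            survival *= 1 - n_events / at_risk
            curve.append((t, survival))
        at_risk -= n_leaving
    return curve


# Illustrative, made-up follow-up data (not real data).
for t, s in kaplan_meier([2, 3, 3, 5, 8], [1, 1, 0, 1, 0]):
    print(t, round(s, 3))
```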

There are many other methodological approaches or considerations that are amply discussed each year during the annual International Conference on PharmacoEpidemiology (ICPE) meeting (www.pharmacoepi.org), such as methods for using big data, or the enrichment of claims data with data from patient data warehouses or new data sources that include not only health expenditures, but also clinical, pathology, societal or genetic information.

3.7 Selection of Participants: Disease-Based

These studies are typically used to describe disease management, and often to prepare for other designs, by specifying the expected event rates in a given disease population that may be the indication for future drugs of interest. Patients are selected on the disease of interest (e.g. diabetes, myocardial infarction or metastatic cancer) to describe disease management and prepare for HTA studies, to model the impact of an as yet unmarketed drug. Disease-based studies are also used in the post-marketing arena to test the impact a new drug or intervention has had on the disease management and cost. The analysis of these studies will be the same as the cohort studies above. One major use of such disease-based cohorts is as a source for nested case–control studies: in contrast with exposure-based studies, where only one or two exposures are studied, in these disease-based cohorts, all health interventions will be included and can be used as potential exposures in nested case–control studies, which will in addition provide information on potential interactions between exposures.

3.8 Selection of Participants: Event-Based

In this approach the event is the main driver, and case-based methods will be applied. These studies are always retrospective, in that the patients are included once the event has occurred and previous exposures are identified. In some variants, the cases and controls are identified within a cohort, in a nested case–control design. A control at a given moment might later become a case if an event occurs. Cases are usually excluded from further studies and are not used as controls.

The general approach is that cases of a given event of interest are identified, and exposures prior to that event are compared to exposures in patients or periods without an event. The comparators might be the patients themselves in case-crossover methods or self-controlled case series; cases may be matched to selected controls in the classical case–control methods, with matching that may be more or less complex including (high-dimensional) disease risk scores, or at the simplest in case-population approaches, which consider the whole source population as the control population. Case-population studies however require identification of all the cases in that population. This might entail events that are easily recognized and circumscribed to a very specific treatment environment, such as transplantation centres or intensive care units, so that exhaustive identification of all cases in the population can be obtained (Gulmez et al. 2013a). If only a sample is studied, this sample must be representative of the complete case population. An alternative is the use of whole-population databases where all the cases of a given event may be identified, provided that the case specifications are consistent with such identification: for instance, all cases of liver injury or of MI admitted to hospital can be identified in a national healthcare system (Moore et al. 2019). Less severe or serious events that do not lead to hospital admission might not be identified.

Such a method might be useful for surveillance of exposures associated with a given event, for instance hospital admissions for acute liver injury or liver transplantation. Using national claims databases, or national transplantation networks, these events are easy to identify. Because all events in a given territory are captured, one can compare drug exposure in cases to the countrywide exposure to the same drugs (sometimes limited to the age group that might be transplanted), using either person-time or persons (Moore et al. 2013).
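A very simple form of this comparison relates the number of exposed cases to the population exposure to each drug, expressed in person-time. The sketch below assumes that both quantities are already known per drug; the function name, the drug labels and all the numbers are hypothetical and serve only to show the arithmetic.

```python
def cases_per_million_person_years(exposed_cases, exposed_person_years):
    """Exposed cases per million person-years of exposure, per drug.

    exposed_cases:        {drug: number of cases exposed to the drug}
    exposed_person_years: {drug: countrywide person-years of exposure}
    """
    return {
        drug: 1_000_000 * exposed_cases[drug] / exposed_person_years[drug]
        for drug in exposed_cases
    }


# Illustrative, made-up figures for two hypothetical drugs.
cases = {"drug_x": 12, "drug_y": 5}
person_years = {"drug_x": 4_000_000, "drug_y": 500_000}

rates = cases_per_million_person_years(cases, person_years)
ratio = rates["drug_y"] / rates["drug_x"]  # drug_y relative to drug_x
print(rates, round(ratio, 2))
```

The ratio between two drugs with the same indication is then a first indication of whether one of them is over-represented among cases, to be interpreted with the limitations of the case-population approach in mind.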

The important issue in case-based studies is that the cases and the controls should come from the same cohort (population). This is easy for self-controlled and population controls, or in the nested case–control design; it might be more difficult for traditional field-based case–control studies (Pierfitte et al. 2001).

In most case–control methods, exposures in cases are compared to exposures in controls.

 

              Cases     Controls
Exposed       a         c
Unexposed     b         d
              a + b     c + d

The usual measure of association here is the odds ratio (ad/bc)
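A worked example may help with the notation, in which a and b are the exposed and unexposed cases, and c and d the exposed and unexposed controls. The sketch below computes the odds ratio ad/bc with an approximate 95% confidence interval on the log scale. The function name and the counts are illustrative assumptions, and the calculation applies to unmatched data only.

```python
import math


def odds_ratio(a, b, c, d, z=1.96):
    """Odds ratio from an unmatched case-control table.

    a: exposed cases      c: exposed controls
    b: unexposed cases    d: unexposed controls
    All four counts must be greater than zero.
    """
    estimate = (a * d) / (b * c)
    # Standard error of log(OR), large-sample approximation.
    se_log_or = math.sqrt(1 / a + 1 / b + 1 / c + 1 / d)
    lower = math.exp(math.log(estimate) - z * se_log_or)
    upper = math.exp(math.log(estimate) + z * se_log_or)
    return estimate, (lower, upper)


# Illustrative, made-up counts (not real data).
estimate, (lower, upper) = odds_ratio(a=40, b=60, c=20, d=80)
print(round(estimate, 2), round(lower, 2), round(upper, 2))
```

When cases are individually matched to controls, a matched analysis (for instance conditional logistic regression) would be needed instead of this crude calculation.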

Most commonly, exposure in cases and controls is compared to non-exposure (i.e. users to non-users). However, this presupposes that exposure is random and has no link with the event, whereas exposure to drugs is not random, but determined by a disease that causes the prescription and might be associated with the event. For instance, one might expect that patients using NSAIDs have pain or inflammation, more than persons not using these drugs. Pain and inflammation may indicate an underlying disease. One would therefore expect patients using NSAIDs to be sicker and therefore to die more often than non-users, which indeed is the case (Fosbol et al. 2009). This bias will be common to all drugs that are given to sick patients to treat diseases, especially if these diseases may be associated with the event under consideration (confounding by indication). A related bias is confounding by contraindication, where a drug is avoided in patients at risk of the event of interest (e.g. late at night, passengers will often be more at risk of being drunk, and less at risk of causing an accident, than the designated driver who did not drink). When drugs are given to a healthy patient for disease prevention, the comparison of users with non-users may be valid. In other cases, the use of active controls, i.e. drugs with the same indications, would certainly be preferable (extreme restriction) (Secrest et al. 2019).

Going from self-controlled case series to case-population is just changing the nature of the controls, from full matching in self-controlled methods to more or less tight matching in the case–control methods, to little or no matching in case-population approaches.

Case-based approaches would be useful in a pharmacovigilance setting when there is a suspicion or signal of the association of a drug with an event, where the first step would be to verify how the association compares to that of similar drugs with similar indications. Case-based studies, and especially the very simple case-population approach, may also be used for systematic surveillance of known indicators of drug-related risks, such as the WHO critical terms lists or the more common reasons for removing drugs from the market. As a first approximation, these might be liver injury, renal failure, myocardial infarction, sudden death, cytopenia, and gastro-intestinal bleeding. Such systematic surveillance might be automated in the future.

3.9 The Comparator

Comparing users of a drug to non-users is irrelevant in real life. If a drug is used there is a reason, and that reason may be associated with adverse outcomes that will be found only in treated (sick) persons, and not in untreated (healthy) persons. Most drugs, when used in sick persons, will be associated with a higher risk of disease-related events. For instance, persons identified by the use of low-dose aspirin will have a much higher rate of cardiovascular events than non-users of low-dose aspirin (Duong et al. 2018). Users of NSAIDs will have a higher death rate than non-users, because NSAIDs are not used randomly, but because of pain or inflammation, which are associated with possibly fatal diseases. This is a typical indication bias. Clinical trials will typically use a placebo to negate the indication bias, since all patients have the same initial disease state. In real-life studies, to find patients with a similar disease-related risk of events, one must choose comparator drugs with the same indications, i.e. active comparators. This is obviously true for cohort studies, but also for case-based analyses. A comparison with no treatment or with untreated periods may simply measure the effect of the indication, not of the exposure. It is therefore imperative to use active comparators in pharmacoepidemiological studies. Ideally, a comparator would be another new drug marketed within the same timeframe, and sharing similar pharmacological characteristics (mode of action, target) and indications. Using standard of care as comparator may lead to a biased comparison, where patients are put on a new drug because of poor tolerance or lack of efficacy of the standard treatment. At the very least, only new users of each drug, not previously exposed to the other, should be selected.

3.10 Biases

Biases in pharmacoepidemiology are for the most part common with traditional epidemiology, and can be divided into a few major categories:

  1. Selection biases, where the wrong subjects are chosen: e.g. in a case–control study the controls are not from the same population as the cases (e.g. cases are detected in the emergency room and compared to hospitalized controls, or to controls hospitalized for other diseases); in a comparative cohort study one group may consist of incident users of a drug, whereas the other group may consist of prevalent users of the comparator; or again, new users of a recently marketed drug may be compared to a historical cohort of patients followed at a time when disease management and outcomes might have been quite different.

  2. Ascertainment biases, where the data are not collected in the same way in the different study groups. This might be the case for historical cohorts, where the data available may vary over time.

  3. Analysis biases, including confounding biases, where the association between exposure and outcomes is in fact related to a third factor that is associated with both the exposure and the outcome.

In pharmacoepidemiology, where the exposure is not random as in clinical trials or externally determined (e.g. place of living) but related to patient conditions and history, there are some very specific biases, such as:

  1. The protopathic bias (reverse causality), where the exposure is related to early symptoms of the outcome, rather than the other way around. For instance, antibiotics may be prescribed for fever related to undiagnosed agranulocytosis. When agranulocytosis is later diagnosed, it may be mistakenly attributed to the antibiotics. This bias is identified by knowledge of early symptoms and causes of events, and by careful determination of the time of onset of the event (index date) as that of the very first symptoms rather than its diagnostic date (Feinstein and Horwitz 1981; Horwitz and Feinstein 1980; Gulmez et al. 2013b).

  2. Depletion of susceptibles (or healthy survivor bias), where patients remaining on long-term treatment (prevalent users) have a lower risk of having an event related to the use of the drug than patients initiating the drug. This is one of the reasons that new-user (incident user) designs are preferred over designs including prevalent users, who have been self-selected for good tolerability or effectiveness of the drug (Moride and Abenhaim 1994).

  3. Immortal time bias, where patients are included in a study arm only after a period of observation (for instance, the time between cohort entry and the first prescription). Only those patients who have not died or had an event during that period can be included. If observation time is counted from the initial entry rather than from the actual start of exposure, that first period is immortal time. There are many variations on immortal time bias (Suissa 2008; Levesque et al. 2010).
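The mechanism of immortal time bias can be sketched with a toy cohort. In the hypothetical data below, treated patients start the drug some time after cohort entry. A naive analysis counts all their follow-up as exposed, including the period before the first prescription during which, by construction, they could not have had the event. The alternative shown here attributes that period to unexposed person-time; this time-varying classification is one common correction, used for illustration and not the only possible one. All names and numbers are assumptions.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Patient:
    followup_days: int                   # cohort entry to event or censoring
    event: bool                          # True if follow-up ended with the event
    treatment_start_day: Optional[int]   # None if never treated


def rate_ratio(patients: list[Patient], time_varying: bool) -> float:
    """Event rate in exposed person-time divided by rate in unexposed person-time.

    time_varying=False reproduces the biased analysis: treated patients are
    counted as exposed from cohort entry. time_varying=True moves the
    pre-treatment (immortal) period to unexposed person-time.
    """
    exposed_days = unexposed_days = 0
    exposed_events = unexposed_events = 0
    for p in patients:
        if p.treatment_start_day is None:
            unexposed_days += p.followup_days
            unexposed_events += int(p.event)
            continue
        immortal_days = p.treatment_start_day if time_varying else 0
        unexposed_days += immortal_days
        exposed_days += p.followup_days - immortal_days
        exposed_events += int(p.event)   # events in treated occur after start
    return (exposed_events * unexposed_days) / (unexposed_events * exposed_days)


# Hypothetical cohort: three never-treated and two treated patients.
cohort = [
    Patient(100, True, None),
    Patient(150, True, None),
    Patient(250, False, None),
    Patient(400, True, 200),
    Patient(600, False, 300),
]

print("naive rate ratio:", rate_ratio(cohort, time_varying=False))
print("time-varying rate ratio:", rate_ratio(cohort, time_varying=True))
```

In this constructed example the naive analysis yields a rate ratio of 0.25, an apparent protective effect, whereas correctly classifying the immortal period yields a rate ratio of 1.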

4 Conclusion

Over the last 30 years or so, pharmacoepidemiology has changed considerably, from a field dominated mostly by case–control studies of severe adverse events such as upper gastro-intestinal bleeding with NSAIDs (Henry et al. 1996) or hip fractures with benzodiazepines (Pierfitte et al. 2001) to routine post-authorization surveillance and assessment of new drugs, made possible by the development of large population databases of electronic health records or claims data, which can cover many millions of patient-lives. These databases allow the description of usage patterns of new drugs as they are marketed, or of older drugs (Duong et al. 2014, 2016).

Pharmacoepidemiology has also contributed to better understanding and quantification of risks initially suspected from clinical trials, such as the cardiovascular risk related to rofecoxib (Graham et al. 2015), or from experimental data, such as bladder cancer with pioglitazone (Neumann et al. 2012). It has confirmed in real life the results of clinical trials in many studies, for instance the superior safety of direct-acting anticoagulants compared to warfarin (Graham et al. 2015), or the superior effects of ticagrelor compared to clopidogrel in similar patients (Blin et al. 2017, 2019a). Pharmacoepidemiological studies can also provide data on the comparative effectiveness of drugs when no clinical trial exists (Blin et al. 2019b, c, d, e). They can also be used for systematic and comparative risk detection or quantification for selected adverse events such as hepatotoxicity (Gulmez et al. 2013a, 2015; Moore et al. 2019) or myocardial infarction (Duong et al. 2018).

Pharmacoepidemiology studies allow more precise evaluation of the association of exposure to drugs of interest with specific events of interest, using scientifically validated methods. The development of these resources provides considerably more leeway in the exploration of drug-related information, and studies can be tailored exactly to the question raised, and to the precise specifications of each data source. On the other hand, this requires knowledge of each database’s specificities, in addition to knowledge of pharmacology and therapeutics. Future developments will include the enrichment of claims or EHR databases with other information such as genetic or imaging data, and the use of artificial intelligence or machine learning methods.

However powerful these tools may become, pharmacoepidemiology still requires an understanding of drug characteristics such as drug targets, mechanisms of actions and pharmacokinetics, in addition to drug safety and drug efficacy, and of the underlying diseases and disease characteristics and, of course, the statistical methods needed to explore these data sources, so as to avoid or obviate biases as much as possible.