Introduction

A properly designed nested case-control study with incidence density sampling is expected to yield effect estimates that are equivalent to results from a cohort study using the same data source [1, 2]. Nested case-control studies with incidence density sampling can be used to estimate causal effects, just like cohort studies, by emulating the approach that would be used to conduct a randomized controlled trial [2, 3]. Hernán and others developed the target trial framework as a heuristic tool for estimating causal effects from observational data [4,5,6]. By emulating a target trial, effect estimates from an analysis of observational data (under standard assumptions) should be identical to the effect estimates that would have been obtained from a randomized controlled trial (RCT) answering the exact same causal question, except for random variability [7, 8]. The target trial framework involves specifying the protocol of a hypothetical pragmatic randomized controlled trial (i.e., the target trial) [4]. The target trial protocol is then implemented using observational data, most often using a cohort study design [5, 7, 9, 10]. The target trial approach has been infrequently discussed in the context of observational data from case-control studies [2]. However, it is possible to extend the target trial heuristic to make causal inferences from case-control data.

There are clear conceptual links from randomized trials to cohort studies and from cohort studies to case-control studies [11]. A nested case-control study with incidence density sampling can be conceptualized as a cohort study that uses an efficient sampling approach to form a comparison group [3]. In a cohort study, denominators of incidence rates are calculated by counting person time contribution for individuals in the exposed and unexposed groups. In a nested case-control study with incidence density sampling, cases and controls are matched on follow-up time, thus each set of cases and controls contribute the same amount of person-time [3, 12,13,14]. For readers interested in a brief primer on case-control studies please see the Appendix. In a cohort study design, incident cases of disease are compared to all non-cases in the cohort [15]. In a nested case-control study with incidence density sampling, incident cases of disease are compared to a control group comprised of a sample of non-cases drawn from the study cohort (i.e., the risk set) at the time each case occurred [16]. Both study designs can be used to estimate the average causal effect of an exposure on an outcome, by examining a causal contrast of exposed cases and non-cases (controls) with unexposed cases and non-cases (controls). In a case-control study, the comparison group is called the control group, comprised of a sample of non-cases from the study base, whereas in a cohort study, the comparison group is comprised of unexposed participants in the cohort [2, 17]. In a cohort study, the unexposed individuals act as a stand-in for the unobserved counterfactual group, representing the experience of the exposed participants had they not been exposed [18,19,20]. In contrast, in a case-control study, the non-cases act as a stand-in for an unobserved counterfactual group that represent the experience of exposed and unexposed cases from the study base had they not become cases. 
Thus, conducting a valid case-control study is predicated, in part, on identifying and sampling an appropriate group of controls from the study base [16, 21, 22]. Ultimately, the validity of causal effect estimates from both cohort and case-control studies depends on how well each comparison group approximates the unobserved counterfactual group [23,24,25].

In this manuscript, we present a new framework for estimating causal effects from case-control studies called the target cohort approach. The target cohort approach extends the existing target trial framework and uses the same criteria as the target trial approach. In the target cohort framework, we describe how to emulate a cohort study using a nested case-control design with incidence density sampling. Importantly, the cohort study being emulated in the target cohort approach has been developed using the target trial framework and therefore emulates a hypothetical randomized controlled trial. The target cohort framework is a heuristic tool for researchers seeking to make causal inferences from observational data when the study setting is suited to a nested case-control study with incidence density sampling. For example, in a study that utilizes outcome measures in an existing cohort, but involves collecting an expensive new exposure measure, the efficient sampling approach of a nested case-control study with incidence density sampling is appealing because it limits the exposure measurement to only a necessary subset of the full cohort. We demonstrate the target cohort approach using an example from a recently published randomized controlled trial examining the impacts of semaglutide, a type of GLP-1 medication sold under the brand name Ozempic, on adverse gastrointestinal events [26].

The Target Cohort Framework

The target cohort framework extends the target trial approach to emulate a cohort study using data from a nested case-control study with incidence density sampling [4,5,6]. The target cohort approach emulates a hypothetical cohort study that was developed using the target trial approach. See Fig. 1 for a conceptual diagram linking the target cohort approach and the target trial approach. The target trial framework outlines the necessary components for making causal inferences from observational data [4]. The components of the target cohort framework are the same as the target trial, but tailored to considerations that arise specifically for the case-control study design with incidence density sampling. Both target trial and target cohort frameworks require clear specification of eligibility criteria, treatment strategies, treatment assignment (randomization), follow-up period, outcomes of interest, causal contrast, and analysis plan [6]. Ultimately, the goal of the target cohort approach is the same as that of the target trial framework: to make causal inferences using observational data. In the next section we introduce the example that will be used to demonstrate how the target cohort approach can be used to emulate a prospective cohort study and corresponding target trial.

Fig. 1. Conceptual diagram linking the target trial approach with the target cohort approach

Example: GLP-1 Medication and Adverse Gastrointestinal Events

To illustrate the use of the target cohort approach in a case-control study using incidence density sampling, we describe a case-control study to estimate the causal effect of GLP-1 receptor agonist medications (e.g., semaglutide, brand name: “Ozempic”) on the risk of adverse gastrointestinal events. The use of GLP-1 medications has increased substantially in recent years owing to strong evidence from randomized controlled trials demonstrating weight loss, improved management of diabetes, and lower risk of cardiovascular disease [26,27,28,29,30]. A recently published RCT by Lincoff and colleagues reported a hazard ratio of 0.80 (95% CI: 0.72, 0.90), indicating a lower risk of cardiovascular events among individuals with pre-existing cardiovascular disease taking semaglutide compared to placebo.

Evidence from semaglutide trials has also demonstrated increased risk of adverse gastrointestinal events, including pancreatitis and bowel obstruction potentially related to the use of this medication [31]. Rapid weight loss using GLP-1 medications can also result in cholelithiasis (gallstones) [32]. Randomized trials are often not powered to determine rates of adverse events because the number of participants included in a trial is often limited and follow-up duration is often relatively short [33]. Adverse GI events are a relatively rare outcome among GLP-1 users and can take some time to develop [31]. Case-control studies are particularly useful in this setting—a rare outcome requiring long follow-up duration [22]. In this example, we will emulate Lincoff et al.’s RCT but examine a secondary safety outcome, adverse GI events, rather than the original primary outcome in their trial, cardiovascular events [26]. Given the increasing availability of GLP-1 medications in the population, there are now large secondary data sources (e.g., electronic health records databases) that can be used to examine the causal effect of GLP-1 medication on adverse GI events using observational data.

To begin a target cohort analysis, we must specify an appropriate research question [6, 34]. Our case-control study is designed to estimate the causal effect of initiating GLP-1 medication on risk of adverse gastrointestinal events among individuals with body mass index (BMI) greater than 27 kg/m2 and pre-existing cardiovascular disease. We will describe a protocol for an RCT similar to Lincoff et al. [26], and describe how the elements of the trial can be emulated using a target trial and target cohort. Appendix Table S1 provides a worksheet that researchers can use to operationalize the target cohort approach, linking the randomized trial, prospective cohort, and case-control study with incidence density sampling.

Target Cohort Eligibility Criteria

The first step in the target cohort analysis is to describe the eligibility criteria to create the study cohort. Recall, a case-control study using incidence density sampling is conceptually the same as a cohort study but uses a sampling approach to assemble a group of non-cases. The protocol for describing eligibility criteria for the target cohort is the same as the protocol for describing the target trial. In Table 1, we describe the eligibility criteria for the RCT described by Lincoff and colleagues examining the use of semaglutide to prevent cardiovascular disease outcomes [26]. We also present the criteria that we would use to establish a target trial and target cohort emulating the RCT study population using a hypothetical EHR database. For the RCT, participants were at least 45 years of age, with body mass index (BMI) ≥ 27 kg/m2, and pre-existing cardiovascular disease, defined as previous myocardial infarction, stroke, or symptomatic peripheral artery disease [26]. Using data from an EHR database, we are able to emulate these inclusion criteria using data collected from routine clinical care encounters for the target trial and target cohort approaches. We can also emulate the exclusion criteria from the RCT using variables available in an EHR [26]. The exclusion criteria for the target trial and target cohort include observational analogs of self-report characteristics (e.g., planned coronary, carotid, or peripheral artery revascularization procedure vs. revascularization procedure documented in medical chart). There is a defined 5-year look-back window from the date eligibility is assessed for relevant inclusion and exclusion criteria related to medical history.

Table 1 Eligibility criteria for a randomized controlled trial, target trial, and target cohort

For the target cohort approach framed using a case-control design, there are additional eligibility criteria to consider related to who is eligible to be a case, who is eligible to be a control, and how many controls will be sampled per case. All individuals meeting the inclusion criteria to be in the study cohort are eligible to be a case and/or a control. Cases are any individuals in the study cohort who experience an incident outcome event (an adverse gastrointestinal event) during the study period. The outcome of interest is described in greater detail in Step 5, below. Controls are individuals who are alive and sampled from the risk set (i.e., the eligible study cohort in the EHR) at the time the case event occurs. The same individual can serve as a control multiple times and remains eligible to become a case. There are differing opinions in the literature about how many controls to sample per case. Wacholder et al. recommended recruiting 4 controls per case based on the small marginal gain in precision beyond a 1:4 case-to-control ratio, but others have demonstrated the benefits of including between 10 and 50 controls [22, 35]. In our illustrative target cohort analysis, we will sample 4 controls per case.
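The sampling rule described above can be sketched in a few lines of Python. This is a minimal illustration, not the authors' implementation: the `Person` record, its field names, the `sample_risk_sets` function, and the tie-handling rule (individuals who exit exactly at the case time are still treated as at risk) are all assumptions introduced here for illustration.

```python
import random
from dataclasses import dataclass
from typing import Optional


@dataclass
class Person:
    pid: str                     # hypothetical participant identifier
    exposed: int                 # treatment at time zero (1 = semaglutide, 0 = usual care)
    event_time: Optional[float]  # months from time zero to the outcome; None if no event
    exit_time: float             # months from time zero to event, censoring, or study end


def sample_risk_sets(cohort, controls_per_case=4, seed=0):
    """Incidence density (risk-set) sampling: one matched set per case event."""
    rng = random.Random(seed)
    cases = sorted((p for p in cohort if p.event_time is not None),
                   key=lambda p: p.event_time)
    matched_sets = []
    for case in cases:
        t = case.event_time
        # Risk set at time t: everyone else still under follow-up and event-free.
        # Future cases are included, and earlier controls can be drawn again.
        risk_set = [p for p in cohort
                    if p.pid != case.pid
                    and p.exit_time >= t
                    and (p.event_time is None or p.event_time > t)]
        k = min(controls_per_case, len(risk_set))
        matched_sets.append({"index_time": t,
                             "case": case,
                             "controls": rng.sample(risk_set, k)})
    return matched_sets


# Small illustrative cohort (all values are made up).
cohort = [Person("p1", 1, 5.5, 5.5), Person("p2", 0, 9.0, 9.0)]
cohort += [Person(f"p{i}", i % 2, None, 12.0) for i in range(3, 9)]
for s in sample_risk_sets(cohort):
    print(s["index_time"], s["case"].pid, [c.pid for c in s["controls"]])
```

Because controls are drawn from everyone still at risk at each case time, a person can appear in several matched sets and can later become a case, which matches the eligibility rules stated above.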

Target Cohort Treatment Strategies

In the RCT, participants were randomly assigned to one of two treatment groups, subcutaneous semaglutide 2.4 mg per week or placebo injection [26]. An analogous treatment strategy must be identified for the exposure of interest in an observational context, either for the target trial or target cohort. As described in Table 2, in the observational context, medication initiation is indicated by first prescription date (2.4 mg semaglutide injection) recorded in the EHR. It is not possible to emulate a placebo-controlled trial using a target trial or target cohort approach. Non-placebo-controlled randomized trials may compare the treatment of interest to an active comparator, such as an alternative clinical treatment, or an inactive comparator, such as usual care [36, 37]. Huitfeldt et al. have discussed conditions for using an active comparator group in the context of a target trial [38]. The choice of active versus inactive comparators may have important implications for exchangeability. In this analysis, the comparison group is a group of individuals who are receiving usual care for weight management, most often diet or exercise advice. This approach emulates an unblinded pragmatic randomized controlled trial with treatment beginning at a defined time zero date (defined below, Step 4) for both the semaglutide and usual care group. In the target trial approach, treatment is measured for all participants meeting inclusion criteria. Participants are assigned to the treated group (exposed) only if their electronic health record indicates a prescription for semaglutide at time zero, and otherwise they are considered unexposed. In the target trial, untreated participants are all individuals meeting the eligibility criteria but without a prescription for semaglutide. In the target cohort approach, treatment is assessed only for cases and the controls sampled from the study cohort. 
Cases and controls from the eligible study cohort are assigned to a treatment group based on their EHR-recorded treatment at time zero (i.e., if they match the defined treatment strategies under comparison). In this context, we define semaglutide initiation as any prescription recorded for the medication. Imposing a length-based exposure requirement, such as medication use for one year, or a fixed number of prescriptions, such as 6 continuous medication refills, could introduce immortal time bias [39, 40].

Table 2 Strategies for defining and assigning treatment groups

Target Cohort Treatment Assignment

In an RCT, randomization is used to assign participants to a treatment group. If randomization is implemented correctly, treatment groups are assumed (on average) to be exchangeable and balanced with respect to both measured and unmeasured confounders. In the target trial and target cohort framework, analytic tools are used to emulate randomization and achieve conditional exchangeability [4]. These analytic approaches use measured covariates to achieve conditional exchangeability in observational data, ensuring balance between treatment groups and mimicking randomization [41,42,43,44]. At time zero, individuals are ‘assigned’ to a treatment group conditional on measured covariates. In this example, using electronic health records, time zero is an arbitrary calendar date chosen by the investigators. Individuals are considered to have been assigned to semaglutide if they have a prescription recorded on that date; otherwise, they are considered to have been assigned to the usual care group at time zero. As a fictitious example, in an existing health record database, time zero could be January 1, 2022. This would be the date at which exposure status (treatment assignment) is assessed and the start of the accrual of person-time for cases and controls.

In the target cohort approach, there are additional considerations related to achieving conditional exchangeability because of the case-control design and incidence density sampling approach [41, 45]. Specifically, confounder adjustment approaches must take into account the sampling approach: controls are matched to cases on follow-up time and must be analyzed as matched data, for example using conditional logistic regression [1]. Alternatively, if the sampling probabilities are known (i.e., the probabilities of being selected as a case or control from the eligible cohort), statistical analyses can account for the incidence density sampling by incorporating weights equal to the inverse of the sampling probabilities. In addition to matching on follow-up time, matching on confounders is also common in nested case-control studies with incidence density sampling to improve statistical efficiency, facilitate conditional exchangeability, and reduce the risk of positivity violations [46,47,48,49]. Modern approaches to achieve conditional exchangeability that are compatible with case-control incidence density sampling include propensity score methods, inverse probability of treatment weighting, parametric g-formula (standardization), and g-estimation [42,43,44, 50,51,52]. For example, Matthay and colleagues used the parametric g-formula to adjust for confounders within a nested case-control study with incidence density sampling by incorporating sampling weights [45].
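For readers who prefer a formula, the conditional likelihood that underlies conditional logistic regression for time-matched sets can be written as follows. The notation is an assumption introduced here for illustration: K matched sets indexed by i, with R_i denoting the case (j = 0) and its sampled controls, A the treatment indicator at time zero, and L a vector of measured confounders.

```latex
\mathcal{L}(\beta, \gamma)
  = \prod_{i=1}^{K}
    \frac{\exp\!\left(\beta A_{i0} + \gamma^{\top} L_{i0}\right)}
         {\sum_{j \in R_i} \exp\!\left(\beta A_{ij} + \gamma^{\top} L_{ij}\right)}
```

Under this sketch, exp(β) is the confounder-adjusted odds ratio, which approximates the incidence density ratio under incidence density sampling. Any factor that is constant within a matched set, including follow-up time, cancels from the numerator and denominator, which is why the matching must be respected in the analysis.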

Target Cohort Follow-Up Period

The follow-up period for a typical RCT spans from randomization (treatment assignment) until the occurrence of the outcome of interest, censoring, withdrawal, or end of study. In a target trial, the start of follow-up is defined at cohort entry, often called study baseline, when the eligibility criteria are satisfied and treatment assignment occurs (Table 2). The target trial framework explicitly requires that treatment assignment occur at the same time as cohort entry to ensure time zero is well-defined and to avoid immortal time bias [5]. Time zero is the start of study follow-up, the starting point for accrual of study outcomes. In a cohort study with primary data collection, time zero is typically the point at which individuals begin their participation in the study. This is usually when the baseline measure of exposure is obtained and is the starting point of follow-up for the outcome of interest. In a cohort study using a secondary data source, like an electronic health record, time zero is often defined by the investigators as a specific date at which eligibility is assessed, baseline exposure is measured, and study follow-up for the outcome begins. In a case-control study nested within a cohort, regardless of whether it is a primary data collection cohort or a secondary analysis of a large database, time zero must be clearly specified by the investigators. In the context of a case-control study, time zero can refer to a specific calendar date that is the same for all study participants (e.g., January 1, 2022) or to a time that is indexed to a defined entry event for each participant (e.g., becoming 65 years old).

In a target trial, follow-up time usually spans from time zero until the occurrence of the outcome, loss to follow-up, or administrative censoring at the end of study follow-up [4, 53]. In the target cohort approach, follow-up still must begin at a well-defined time zero to ensure alignment of eligibility status, treatment assignment, and matching cases and controls on follow-up time. Although a case-control study is an outcome-dependent sampling design, time zero does not correspond to the time the case event occurs. Treating the case event as time zero would be analogous to an illogical scenario in which investigators in a trial randomly assigned individuals to a treatment group once they experienced the outcome of interest. The crux of the nested case-control design with incidence density sampling is that each time a case occurs, a control, or set of controls, is sampled from the risk set at that point in time. The length of follow-up for that grouping of cases and controls is identical by design: from time zero (set to time zero of the target cohort) to the date the case event occurred (for cases) and the date the controls were sampled from the risk set (for controls) [54]. For controls, the date on which the case event occurred is sometimes referred to as the index date, so the follow-up time can be defined as the time period from time zero to the index date.

In the context of a nested case-control study with incidence density sampling, a helpful heuristic is to consider each case-control grouping as emulating its own hypothetical RCT, a miniature version of a target trial. The target cohort approach can be conceptualized as emulating a series of hypothetical mini-trials occurring within the study cohort. There are as many mini-trials as there are case events. Figure 2 illustrates four so-called mini-trials, corresponding to four outcome events in the study cohort. The person-time contribution of cases is indicated by a solid line and the person-time contribution of controls is indicated by a dashed line. In the first hypothetical mini-trial, a case occurred at 5.5 months. Four controls were sampled from the risk set at the time the case occurred. Follow-up time for the case and all controls in mini-trial 1 spans from time zero (t0) to 5.5 months. Treatment assignment, defined as a prescription for semaglutide in the EHR (A = 1; blue) or not (A = 0; red), occurs at t0. A similar structure applies for mini-trials 2–4. At the bottom of Fig. 2, there are three participants who remain in the study cohort from time zero to administrative censoring at the end of study follow-up. They do not experience the outcome of interest, nor are they sampled as controls, so their person-time is not included in the analysis of a nested case-control study with incidence density sampling.

Fig. 2. Illustration of the target cohort framework as emulating a series of hypothetical mini-trials. This is a visual representation of the target cohort approach. The study cohort follow-up spans from baseline, t0, to the end of study follow-up (administrative censoring) at 12 mo. Consider each ‘grouping’ of cases and controls as an individual mini-trial. This illustration describes four case events (at 5.5 mo, 6 mo, 8.5 mo, and 11.5 mo), each with four controls sampled from the risk set at the time the case occurred. This results in four mini-trials spanning from treatment assignment, emulating randomization, to the end of follow-up when the case event occurred. In the figure, person-time contribution by cases is indicated by a solid line and person-time contribution by controls is indicated by a dashed line. Blue represents semaglutide use (A = 1) and red represents usual care (A = 0). The person-time contribution of the three individuals in the study cohort who did not experience a case event and were not selected as a control is represented by the black dashed line.

A nuance of the nested case-control design with incidence density sampling is that controls may go on to serve as a case at a later point in time or even as a control for another case. If a control does go on to become a case, they are matched with their own series of four controls at the time their case-event occurs. Extending the analogy of emulating hypothetical mini-trials, individuals can contribute person-time as a control to a mini-trial and also serve as a case in another mini-trial or contribute person-time as a control to more than one mini-trial (Fig. 3).

Fig. 3. Additional illustration of the target cohort approach as a series of mini-trials. This figure describes three additional mini-trials (15, 16, and 17), including cases and controls sampled from the study cohort. For ease of explanation, the participant ID number for each case or control is included in parentheses in the figure (e.g., Case (010)). In a case-control study with incidence density sampling, a control may go on to be a case at a later point in time, or a control may be selected as a control again at a later point in time. As illustrated, the participant with study ID 909 serves as a control in mini-trial 15 at 5.5 months but then goes on to experience the outcome of interest at 9 months. They contribute 5.5 mo of person-time on semaglutide (A = 1) to mini-trial 15 and 9 mo of person-time on semaglutide (A = 1) to mini-trial 16. Further, the participant with study ID 140 was selected to be a control in mini-trial 16, at 9 mo, and then also selected to be a control in mini-trial 17, at 11 mo. They contribute 9 mo of unexposed person-time (A = 0) to mini-trial 16 and 11 mo of unexposed person-time (A = 0) to mini-trial 17.

Outcome of Interest

In RCTs, outcomes must be specified a priori. The CONSORT (Consolidated Standards of Reporting Trials) statement requires trial investigators to pre-specify primary and secondary outcome measures, whether physician adjudicated or patient-reported, and describe how they are assessed [55, 56]. Outcomes should be described in a similar level of detail in an observational study protocol. In our example, the outcome of interest in an RCT is incidence of any adverse GI outcomes, specifically biliary disease (including cholecystitis, cholelithiasis, and choledocholithiasis), pancreatitis (including gallstone pancreatitis), and bowel obstruction [31] (Table 3). In a target trial, these outcomes would be ascertained for exposed and unexposed members of the eligible cohort using ICD-10 diagnostic codes from the EHR. For our case-control study to emulate a target cohort, eligible cases are all individuals who have a diagnosis of the outcome of interest in the EHR, identical to the target trial. Cases are sampled from the study cohort during the follow-up period as described in steps 1–4 [57].

Table 3 Description of follow-up period and measured outcomes of interest

Causal Contrast

RCTs commonly estimate intention-to-treat effects, comparing individuals randomized to semaglutide treatment with those randomized to placebo or usual care. In our target cohort analysis, we are interested in the observational analog of an intention-to-treat effect, examining a causal contrast between initiators and non-initiators of semaglutide at time zero [4, 26] (Table 3). However, there are other causal contrasts of potential interest in the context of target trial or target cohort approaches, including per-protocol or as-treated effects [4, 53, 58]. Additionally, with a time-varying exposure, it is possible to examine contrasts that consider changes in treatment throughout follow-up, such as dynamic treatments using sequential nested trials [5, 59] and adaptive designs, which involve a change in treatment strategy in response to time-varying participant characteristics [53]. These extensions to the target cohort framework are possible so long as there are multiple measures of the exposure (treatment) of interest in the study cohort and treatment strategies are explicitly defined.

Analysis plan

In an RCT, assuming randomization was conducted appropriately, treatment groups are considered balanced on known and unknown confounders; thus, it is not necessary to control for confounding at baseline. In the observational analog, whether target trial or target cohort, regression adjustment can be used to adjust for confounding at baseline. In the target cohort and target trial approach, the analysis plan is therefore inherently linked to step 3, in which treatment assignment and the method of achieving conditional exchangeability are defined. In our target cohort, we use conditional logistic regression adjusted for measured confounders and conditional on the time-matching of each mini-trial group to estimate the odds ratio, which approximates the incidence density ratio, to examine the causal contrast of interest [26]. This analysis compares the frequency of adverse GI events in initiators and non-initiators of semaglutide. We must control for a sufficient set of confounding variables to achieve conditional exchangeability and attempt to emulate randomization [18]. The effect estimate from an analysis using the target trial or target cohort approach has a causal interpretation if the formal identifiability assumptions for causal inference are satisfied, including conditional exchangeability, positivity, consistency, no interference, and no model misspecification [18]. Of note, a conditional logistic regression model is appropriate in the present target cohort analysis, but many other analytic approaches from the causal inference literature can be applied to case-control studies with incidence density sampling [60, 61]. As discussed above, when the sampling fractions for the cases and controls from the study base are known, the inverse of these fractions can be incorporated as weights in most statistical analyses to obtain population-representative parameters [62].
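To make the conditional analysis concrete, the following Python sketch estimates a conditional odds ratio for a single binary exposure from time-matched sets by solving the conditional score equation with bisection. It is a deliberately simplified illustration under stated assumptions: it is unadjusted (the analysis described above also conditions on measured confounders), the function name `conditional_log_odds_ratio` and the input layout are hypothetical, and in practice an established statistical package would be used instead.

```python
import math


def conditional_log_odds_ratio(matched_sets, lo=-10.0, hi=10.0, iters=200):
    """Conditional maximum likelihood estimate of the log odds ratio.

    matched_sets: list of (case_exposed, [control_exposed, ...]) with 0/1 values,
    one tuple per time-matched set (i.e., per mini-trial).
    """
    def score(beta):
        # Derivative of the conditional log-likelihood with respect to beta.
        w = math.exp(beta)
        total = 0.0
        for case_x, control_xs in matched_sets:
            n = 1 + len(control_xs)       # set size
            m = case_x + sum(control_xs)  # exposed members in the set
            total += case_x - m * w / (m * w + (n - m))
        return total

    # The score decreases in beta; a finite root needs a sign change.
    if not (score(lo) > 0 > score(hi)):
        raise ValueError("no finite estimate in the search interval")
    for _ in range(iters):
        mid = (lo + hi) / 2
        if score(mid) > 0:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2


# Illustrative 1:4 matched sets (made-up exposure values).
example_sets = [
    (1, [0, 0, 1, 0]),
    (1, [0, 1, 0, 0]),
    (0, [0, 0, 1, 0]),
    (1, [0, 0, 0, 0]),
    (0, [1, 0, 0, 1]),
]
beta_hat = conditional_log_odds_ratio(example_sets)
print("conditional odds ratio:", math.exp(beta_hat))
```

Matched sets in which every member has the same exposure contribute nothing to the score, so only exposure-discordant sets inform the estimate. With 1:1 matching, the estimate reduces to the familiar ratio of discordant pairs.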

Discussion

In this manuscript, we present a novel framework for estimating causal effects from case-control studies using an adaptation of the target trial framework [4, 5]. We conceptualize nested case-control studies with incidence density sampling as a series of mini-trials within a study cohort. The goal of the target cohort approach is to help investigators make explicit decisions about the eligibility criteria, treatment and sampling strategies, treatment assignment (randomization), follow-up period, outcomes of interest, causal contrast, and analysis plan when attempting to draw causal inferences from a nested case-control study with incidence density sampling. Theoretically, if using the same eligible study cohort, the target cohort approach will produce a causal effect estimate that is equivalent to the effect estimate from a target trial, apart from sampling and estimation error.

Case-control studies have sometimes been described in the epidemiologic literature as an inferior, or less valid, type of study design. Within the hierarchy of the evidence-based medicine pyramid, case-control studies are considered to provide lower quality evidence than cohort studies [63]. In the 1970s, Feinstein [64] described case-control studies as “trohoc” studies (cohort spelled backwards), reflecting an apparently backward approach to conducting research. However, over the past 50 years, there has been considerable methodological work demonstrating the advantages of case-control studies [1, 3, 65]. In part, the methodological evolution of case-control studies has been driven by the understanding that case-control studies are not simply ‘backwards’ cohort studies. Poole (1999) emphasized the concept of the “trohoc” fallacy, which manifests in two ways: 1) a misplaced concern about the comparability of cases and controls, when the real concern is the comparability of controls and the study base, and 2) the mistaken assumption that the control group must be healthy and free of disease [17]. Case-control studies are better conceptualized as an efficient study design involving sampling from a cohort, whether it is a clearly defined cohort or a hypothetical cohort. Others have described the case-control design as one in which information is missing at random for individuals in the population who are not included as cases or controls [66]. Many of the criticisms of case-control studies apply to traditional, cumulative case-control studies but not to nested case-control studies using incidence density sampling [16]. For instance, in the cumulative case-control design, controls are sampled from individuals who remain non-cases when the study ends. This design makes it very difficult, if not impossible, to disentangle the effect of exposure on outcome from other factors such as loss to follow-up or selective survival [1]. 
However, using an incidence density sampling approach, the sampling probability for each control is proportional to the amount of time spent at risk for the outcome of interest and the sampling probability is independent of exposure [1]. Case-control studies with incidence density sampling can therefore be used to estimate causal effects with comparable validity to cohort studies.

Case-control studies with incidence density sampling are also an exceptionally efficient study design. Efficiency is defined by the amount of information a study produces relative to its size or cost [1]. A case-control study is more efficient than a cohort design because it includes all of the cases and a probability sample of controls that is representative of the study base that gave rise to the cases. This design over-represents cases relative to the total size of the cohort [1]. For a fixed number of cases, the precision of a case-control study can be improved by recruiting a greater number of controls per case. With the increasing availability of secondary data sources, including ‘big data’, there are more opportunities to conduct case-control studies than ever before [65, 67, 68]. Case-control studies make particularly good use of big data resources in terms of cost, timing, and statistical efficiency. Although large secondary data sources may also provide increased opportunity for cohort studies, sampling non-cases from the study base can be a more efficient design.
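The diminishing return from adding controls can be illustrated with a widely used rule of thumb: under the null hypothesis, the statistical efficiency of a design with m controls per case, relative to one with an unlimited number of controls, is approximately m / (m + 1). The short Python sketch below tabulates this approximation. The function name is hypothetical, and the rule is only an approximation that can understate the value of additional controls when the exposure is rare or the effect is strong.

```python
def relative_efficiency(controls_per_case: int) -> float:
    """Approximate efficiency of 1:m matching relative to unlimited controls.

    Uses the m / (m + 1) rule of thumb, which holds approximately under the null.
    """
    if controls_per_case < 1:
        raise ValueError("controls_per_case must be at least 1")
    return controls_per_case / (controls_per_case + 1)


for m in (1, 2, 4, 10, 50):
    print(f"1:{m} matching -> relative efficiency {relative_efficiency(m):.3f}")
```

Under this approximation, four controls per case already recover about 80% of the information available from an unlimited control series, which is consistent with the common 1:4 recommendation, while larger ratios continue to add smaller gains.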

Leveraging conceptual links between randomized trials, cohort studies, and case-control studies is key to estimating causal effects from observational data sources. By integrating foundational case-control concepts with modern approaches to causal inference, we aim to promote the use of case-control studies with incidence density sampling to estimate causal effects. Our goal in developing the target cohort approach is to improve the rigor and reproducibility of causal analyses from case-control study designs.