Key Points

Observational research plays an important role in evaluating the clinical effects of diabetes treatment on all-cause mortality, but many observational studies are based on designs or analyses that inadequately address the methodological challenges involved.

Most of these methodological challenges may be addressed through the application of appropriate design and analytical methods.

Greater attention to the principles described in this review paper may serve to address many recurrent methodological issues and reduce the potential for spurious findings in the observational literature on the association between diabetes treatment and mortality.

1 Introduction

Recent years have witnessed a growing body of observational literature evaluating the effect of several glucose-lowering therapies on all-cause mortality, often with inconsistent results. In some observational studies the use of specific diabetes treatments has been associated with an increased mortality risk, whereas others have found no association and still others have found beneficial effects (see electronic supplementary material [ESM], Tables 1–3). The availability of large databases, often derived from routine healthcare transactions, has promoted a proliferation of observational studies worldwide investigating the effect of therapeutics on health outcomes [1, 2]. However, if not rigorously conducted, investigations based on observational data can be affected by methodological problems that compromise their validity and produce misleading results.

The objective of this review was to evaluate the study design and analytical methods employed in observational studies of the association between diabetes therapy and all-cause mortality, with a focus on whether specific methodological challenges were addressed, their potential impact on observed associations, and the possible approaches to manage these challenges.

2 Context and Scope

This review was conducted as one of several related reviews on methodological challenges in the observational evaluation of the effect of diabetes therapy on different clinical outcomes. The other reviews address cardiovascular disease [3], malignancies, and kidney disease. Each review employed an independent search strategy and used a different set of observational studies as its evidence base. Because the same methodological challenges recur, and the same epidemiological principles apply to addressing them, there may be some degree of overlap across these reviews.

3 Identification of Relevant Literature, Data Abstraction, and Data Review

We conducted a systematic search with a non-systematic extension to identify observational literature that evaluated the effects of diabetes therapy on all-cause mortality. This search used PubMed to identify articles published from January 2000 through December 2012, combining search terms for glucose-lowering medications, all-cause mortality, and features of study design and analysis, including methodological issues, likely to occur in observational research (see ESM, search strategy). We did not restrict the search to any specific type of primary diabetes, and we considered all observational studies regardless of whether they used primary or secondary data. We restricted the search to observational investigations of diabetes therapy with all-cause mortality as a primary or secondary outcome. We excluded non-English language studies, studies not conducted in humans, and studies focusing on secondary or gestational diabetes (see ESM, search strategy).

The titles and abstracts of the identified articles (N = 1366) were screened for eligibility by three team members (EG, VG, EP); where the abstract was unavailable or provided insufficient detail, the full text was evaluated. Any discordance among reviewers was resolved by consensus. This systematic search yielded 65 articles. We extended the systematic search by reviewing the bibliographies of original articles, reviews, and meta-analyses to identify additional articles, and by selectively searching other data sources, including the Cochrane database, Web of Science, and websites of medical societies, for the period January 2000 to December 2012. This non-systematic step led to the inclusion of two additional articles, for a total of 67 studies (Fig. 1): 11 studies evaluating the association between glucose-lowering medications and all-cause mortality as the primary and only outcome in a general diabetic population; 43 studies evaluating the same association with all-cause mortality as the primary or secondary outcome, alone or in combination with cardiovascular outcomes, in a general population or in populations with pre-existing cardiovascular conditions (i.e. using all-cause mortality as a proxy for cardiovascular mortality); and 13 studies evaluating the same association with all-cause mortality as the primary or secondary outcome, alone or in combination with cancer outcomes, in a general population or in populations with cancer (i.e. using all-cause mortality as a proxy for cancer mortality).

Fig. 1

Flowchart of included studies

Information on study type, exposure, outcome, effect on mortality, and relevant methodological aspects was abstracted from the 67 selected articles and summarized in table format (see ESM, Tables 1–3) by three members of the team (AP, EG, VG). Subsequently, the articles were independently reviewed by three methodologically trained epidemiologists (EP, AP, JS) for a critical evaluation of the methodology, and then discussed at consensus meetings. During these meetings, the results of the review were discussed, and the list of methodological issues and their possible management compiled up to that point was reassessed and updated in light of new elements arising from the ongoing critical review of the literature. Any difference in opinion was discussed among a broader set of authors (EP, JS, SS, AP, EG, VG) to achieve consensus.

4 Major Methodological Issues

From the critical evaluation of the methodology of 67 observational investigations assessing the association between diabetes therapy and all-cause mortality, we identified the following recurrent methodological issues that may lead to biases: (1) trade-offs associated with the outcome of all-cause mortality; (2) incorrect temporal sequencing; (3) inadequate treatment of time-varying hazards and treatment duration effects; (4) unclear exposure risk window definition; (5) improper handling of exposures that change over time; and (6) incomplete accounting for confounding by indication.

4.1 Trade-Offs Associated with the Outcome of All-Cause Mortality

Disagreements over the attribution of cause of death can distract from the objectives of a research study; therefore, all-cause mortality may be preferred over cause-specific mortality because of its relatively unambiguous nature. Furthermore, information on the association between a therapeutic regimen and all-cause mortality may be of great interest to patients and healthcare professionals.

However, two main issues bear on whether the assessment of mortality is clinically relevant and methodologically accurate [4]. First, the value of all-cause mortality as an outcome varies with the stage of diabetes: it may be more useful in advanced stages of diabetes, where it more accurately approximates the overall benefit and risk of treatment [4], whereas in less advanced stages specific causes of death may be preferable. Second, all-cause mortality may obscure potential insight into the biological mechanism of a drug effect and into mediators of the effect (e.g. clinical events such as heart failure, myocardial infarction, and kidney disease) that might themselves be targets for intervention. Therefore, while all-cause mortality may represent a suitable proxy for cardiovascular disease overall, which accounts for more than 70 % of deaths in patients with diabetes [5], it will be less suitable for identifying effects on specific cardiovascular outcomes such as myocardial infarction or stroke, which are clinically relevant outcomes in their own right. Following a similar logic, all-cause mortality may not adequately approximate non-cardiovascular outcomes, e.g. cancer risk or cancer mortality (see Sect. 5.2). If specific causes of death are not available, overall mortality will provide a closer approximation of cancer mortality in studies restricted to diabetic populations with pre-existing cancer [6, 7]. However, even in such a setting, the contribution of cancer mortality to overall mortality might decrease with increasing diabetes severity. Dehal et al. [7] described how cardiovascular mortality accounts for a greater percentage of deaths in patients receiving insulin than in patients not receiving insulin therapy; therefore, when the outcome of interest in a population with diabetes is a cause-specific mortality other than cardiovascular mortality, specific strategies to deal with competing risks should be considered to avoid overestimating risks [8, 9].
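The overestimation mentioned above can be illustrated with a minimal sketch, assuming a small hypothetical cohort in which deaths from the cause of interest (code 1) compete with deaths from other causes (code 2). Treating competing deaths as censored and taking the complement of the Kaplan–Meier estimator overstates the cumulative incidence, whereas the Aalen–Johansen estimator of the cumulative incidence function accounts for patients removed by the competing event. The function name, event coding, and data are illustrative assumptions, not taken from the reviewed studies.

```python
def cumulative_incidence(times, events, cause=1):
    """Return (naive 1 - Kaplan-Meier, Aalen-Johansen CIF) at the end of follow-up.

    events: 0 = censored, `cause` = cause of interest, any other code = competing cause.
    """
    data = sorted(zip(times, events))
    at_risk = len(data)
    surv_all = 1.0   # all-cause event-free survival just before each time, S(t-)
    km_cause = 1.0   # Kaplan-Meier that (wrongly) treats competing deaths as censored
    cif = 0.0        # Aalen-Johansen cumulative incidence for `cause`
    i = 0
    while i < len(data):
        t = data[i][0]
        d_cause = d_other = n_t = 0
        # Tally every record sharing this distinct time
        while i < len(data) and data[i][0] == t:
            e = data[i][1]
            d_cause += e == cause
            d_other += e not in (0, cause)
            n_t += 1
            i += 1
        cif += surv_all * d_cause / at_risk
        surv_all *= 1 - (d_cause + d_other) / at_risk
        km_cause *= 1 - d_cause / at_risk
        at_risk -= n_t
    return 1 - km_cause, cif


# Hypothetical data: follow-up times (years) and event codes for six patients
naive, cif = cumulative_incidence([1, 2, 3, 4, 5, 6], [2, 1, 2, 1, 0, 1])
print(round(naive, 3), round(cif, 3))  # 1.0 0.667
```

In this toy example the naive estimate reaches 1.0 even though two of the six patients died of the competing cause, while the cumulative incidence function stays at about 0.667.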

In summary, the evaluation of all-cause mortality across different glucose-lowering medications may be preferred over cause-specific mortality due to its relatively unambiguous nature and clinical relevance. When the emphasis is on cause-specific outcomes, specific causes of death, when available, may provide useful insight into the mechanisms underlying a drug effect and also contribute to the elucidation of effect mediators.

4.2 Incorrect Consideration of Temporal Sequencing in Healthcare Databases

A temporal sequence that properly positions exposure relative to when covariates and outcomes are assessed is crucial for causal inference (Fig. 2) [10]. Baseline patient characteristics should be evaluated during the time preceding the first exposure to the drug of interest, and follow-up for outcome occurrence should start after criteria for cohort entry are met and exposure status determined. Only about 10 % of the reviewed studies presented an adequate temporal sequencing [11–17].
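The sequence in Fig. 2 can be enforced mechanically when an analytic record is built from dated healthcare transactions. The minimal sketch below assumes a hypothetical input layout (a list of dispensing dates for the study drug and a list of dated diagnosis codes), a 365-day baseline window, and a follow-up that starts the day after the index date; all of these are illustrative choices rather than recommendations from the reviewed literature.

```python
from datetime import date, timedelta

BASELINE_DAYS = 365  # assumed length of the covariate assessment window


def build_analytic_record(dispensing_dates, diagnoses, death_date=None):
    """Anchor covariates, exposure, and follow-up on the first dispensing (index date).

    dispensing_dates: dates of the study drug; diagnoses: (date, code) tuples (hypothetical layout).
    """
    index_date = min(dispensing_dates)                 # cohort entry = first exposure
    baseline_start = index_date - timedelta(days=BASELINE_DAYS)
    # Covariates use only information recorded strictly before the index date,
    # so post-treatment events (potential intermediates) cannot enter the adjustment set
    covariates = {code for day, code in diagnoses if baseline_start <= day < index_date}
    follow_up_start = index_date + timedelta(days=1)   # outcomes counted only after entry
    died = death_date is not None and death_date >= follow_up_start
    return {"index_date": index_date, "covariates": covariates,
            "follow_up_start": follow_up_start, "death_in_follow_up": died}


record = build_analytic_record(
    [date(2010, 3, 1), date(2010, 4, 1)],
    [(date(2009, 6, 1), "MI"), (date(2010, 5, 1), "HF")],
    death_date=date(2011, 1, 1),
)
print(sorted(record["covariates"]))  # ['MI']
```

The heart failure diagnosis recorded after initiation is deliberately left out of the covariate set, because it may lie on the causal path between treatment and death.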

Fig. 2

Study temporal sequence. A temporal sequence that properly positions exposure relative to when covariates and outcomes are assessed is crucial for causal inference. Baseline covariates are assessed during the time preceding the first exposure to the drug of interest, and follow-up for outcome occurrence starts after criteria for cohort entry are met and exposure status determined

A clear chronology with appropriate sequencing of these factors reduces opportunities for including follow-up time during which events cannot occur, i.e. immortal time [18]. This event-free person-time may be incorrectly attributed to the drug exposure and lead to an apparent reduction in risk by diluting the treated person-time with time that, by definition, has no risk for study outcomes. Immortal time bias can occur when exposure status depends on information that is not yet known at the time of cohort entry but becomes known during study follow-up (Fig. 3). In many of the reviewed studies, exposure status is defined on the basis of future use of a specific diabetes treatment [19–30], combination of treatments [20, 30–33] or treatment dose [19, 34–36]. This invites immortal time bias, particularly when coupled with a follow-up that begins before the exposure status with a specific diabetes treatment is defined. Examples in this regard come from studies that used a specific calendar date (time-defined cohort entry) [22, 23] or diagnostic definition (event-defined cohort entry) [20, 21, 25–27, 29, 33, 37] to identify cohort entry and start of follow-up, but defined exposure status on the basis of future drug use ‘at any time during follow-up’ (Fig. 4) [31, 32, 34, 38]. Defining cohort entry by exposure to a drug of interest (exposure-defined cohort, especially a new-user cohort; see Sect. 4.3) substantially reduces the opportunity for immortal time bias to affect a study but does not remove it entirely. For example, if a study follows a cohort of drug initiators but then compares monotherapy with combination therapy, where these subgroups are identified by the use of particular drugs ‘at any time during follow-up’ [20, 30–33], then immortal person-time has been created by this exposure definition.
Alternatively, if cohort entry is defined by more than one drug prescription/dispensing or by more than a certain duration of treatment, but follow-up starts at the first prescription/dispensing [24, 32, 39], then immortal person-time may be a source of bias.
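The mechanism can be reproduced with a short simulation. The sketch below assumes a hypothetical cohort in which the drug has no effect on death (a constant mortality of 0.1 per person-year for everyone), drug initiation occurs at a random time after cohort entry, and follow-up is capped at 5 years; all parameters are illustrative. Classifying patients as users if they start the drug ‘at any time during follow-up’ credits the pre-initiation person-time, which they necessarily survived, to the exposed group, whereas splitting each patient's person-time at initiation (a time-varying exposure) does not.

```python
import random


def simulate_cohort(n=20_000, death_rate=0.1, start_rate=0.3, max_years=5.0, seed=1):
    """Hypothetical cohort where the drug has NO effect: death times ignore drug use."""
    rng = random.Random(seed)
    cohort = []
    for _ in range(n):
        t_death = rng.expovariate(death_rate)   # years from cohort entry to death
        t_start = rng.expovariate(start_rate)   # years from cohort entry to first dispensing
        end = min(t_death, max_years)           # end of follow-up
        cohort.append((t_start, end, t_death <= max_years))
    return cohort


def ever_user_rate_ratio(cohort):
    """Biased: all follow-up of anyone who ever starts the drug is counted as exposed."""
    deaths = {True: 0, False: 0}
    years = {True: 0.0, False: 0.0}
    for t_start, end, died in cohort:
        ever_user = t_start < end
        deaths[ever_user] += died
        years[ever_user] += end
    return (deaths[True] / years[True]) / (deaths[False] / years[False])


def time_varying_rate_ratio(cohort):
    """Person-time before initiation is unexposed; person-time after it is exposed."""
    d_exp = d_unexp = 0
    y_exp = y_unexp = 0.0
    for t_start, end, died in cohort:
        if t_start < end:
            y_unexp += t_start
            y_exp += end - t_start
            d_exp += died
        else:
            y_unexp += end
            d_unexp += died
    return (d_exp / y_exp) / (d_unexp / y_unexp)


cohort = simulate_cohort()
print(f"ever-user RR: {ever_user_rate_ratio(cohort):.2f}")        # well below 1
print(f"time-varying RR: {time_varying_rate_ratio(cohort):.2f}")  # close to 1
```

Under these assumptions the ‘ever user’ rate ratio is expected to be around 0.3, an entirely spurious protective effect, while the time-varying classification recovers a ratio near 1.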

Fig. 3

Biased exposure status based on future information. An exposure status that depends on information that becomes known during study follow-up invites immortal time bias

Fig. 4

Immortal time bias. In the example, the period of time between the start of the follow-up and the exposure, i.e. receipt of a drug of interest, is immortal as patients must survive until the first exposure to a drug of interest in order to be classified as users of that particular medication

Similarly, information on future exposures and particular events occurring during follow-up should not inform cohort inclusion or exclusion criteria. Several studies included or excluded patients on the basis of information collected during follow-up, such as the use of insulin after cohort entry [6, 24, 32, 34], use of glucose-lowering combinations [34, 40] or switching to other glucose-lowering therapy [24, 41]. By conditioning inclusion or exclusion criteria on the basis of future information, these studies have the potential for bias arising from immortal time acting through selection.

A clear chronology with appropriate sequencing of covariate assessment, cohort entry and exposure status definition, and follow-up initiation will also reduce the potential for bias that might arise through adjustment for covariates that have been affected by exposure to treatment, i.e. intermediate variables. When a covariate has been influenced by exposure to the drug being studied, adjusting for the covariate in the analysis has the effect of adjusting away some of the drug effect, typically resulting in a bias toward the null [42, 43] (Fig. 5). However, this bias is unpredictable and can lead to larger effects in particular situations, such as the scenario of collider bias [44]. Case–control studies in which patient characteristics that are used for covariate adjustment are measured during the time leading up to the date of case or control identification can be particularly subject to this bias [45–47]. However, cohort studies are not exempt from this methodological concern [20–22, 24, 27, 32, 34, 48–52]. Examples of variables identified during follow-up that are likely to be intermediate between glucose-lowering therapy and overall mortality include comorbidities identified post-treatment, such as myocardial infarction and cancer [46, 53], drug utilization during follow-up [20, 22, 24, 27, 32, 34, 45, 46], and metabolic information such as mean body mass index (BMI) and HbA1c that are ascertained after cohort entry (and initiation of treatment) [36, 52].
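A deterministic toy calculation shows how stratifying on an intermediate variable removes a drug effect that acts through it. The sketch assumes hypothetical probabilities in which the drug raises the prevalence of an intermediate condition M (for example, a post-treatment comorbidity) and M raises mortality, with no direct drug effect on death; it compares the crude risk ratio, which here equals the total drug effect, with a Mantel–Haenszel risk ratio stratified on M.

```python
def mantel_haenszel_rr(strata):
    """Mantel-Haenszel risk ratio; strata are (a, n1, c, n0) = deaths/total exposed, deaths/total unexposed."""
    num = sum(a * n0 / (n1 + n0) for a, n1, c, n0 in strata)
    den = sum(c * n1 / (n1 + n0) for a, n1, c, n0 in strata)
    return num / den


# Hypothetical model: drug -> M -> death, with no direct drug effect on death
P_M = {1: 0.6, 0: 0.2}      # P(M = 1 | exposed), P(M = 1 | unexposed)
P_DEATH = {1: 0.3, 0: 0.1}  # P(death | M = 1), P(death | M = 0)
N = 10_000                  # patients per exposure group


def expected_stratum(exposed, m):
    """Expected (deaths, patients) for one exposure group within stratum M = m."""
    n = N * (P_M[exposed] if m == 1 else 1 - P_M[exposed])
    return P_DEATH[m] * n, n


strata = []
for m in (1, 0):
    a, n1 = expected_stratum(1, m)
    c, n0 = expected_stratum(0, m)
    strata.append((a, n1, c, n0))

crude_rr = (sum(s[0] for s in strata) / sum(s[1] for s in strata)) / (
    sum(s[2] for s in strata) / sum(s[3] for s in strata))
adjusted_rr = mantel_haenszel_rr(strata)
print(round(crude_rr, 2), round(adjusted_rr, 2))  # 1.57 1.0
```

Under these assumptions the crude ratio of about 1.57 is the total drug effect, yet stratifying on M returns 1.0, even though every excess death in the exposed group is caused by the drug.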

Fig. 5

Overadjustment bias (directed acyclic graph). Overadjustment bias occurs as a result of adjusting for an intermediate variable in the causal path between exposure and outcome, i.e. an outcome risk factor that has been influenced by the exposure

In summary, a study that defines cohort entry on the basis of the first exposure to the drugs being studied, and that follows a well-defined temporal sequence of covariate assessment, exposure definition, and start of follow-up, facilitates inferences regarding drug effects and reduces the potential for bias arising from immortal time and overadjustment.

4.3 Time-Varying Hazards and Treatment Duration Effects

Diabetes is associated with the development of cardiovascular, renal, and neurological outcomes, and may be associated with an increased risk of cancer [54, 55]. These disease-specific outcomes develop on different time scales, including both overall age and duration of diabetes, and their contribution to overall mortality also varies according to these time scales. Similarly, the risks and benefits that result from the use of medications may vary over time. Early susceptibility or intolerance to drugs may lead to rapid occurrence of medication-related adverse effects and discontinuation of the medication, potentially resulting in a ‘survivor cohort’ of prevalent drug users composed of people who are less susceptible to a range of outcomes by virtue of having passed through this early high-risk period. Furthermore, prevalent drug users who continue to take the medication may be demonstrating an adherence to treatment that makes them more likely to do well on any therapy. Therefore, these individuals may spuriously appear to receive more benefit from the use of a drug when compared with patients who have just initiated a glucose-lowering agent. Similarly, when a new medication is initiated in response to diabetes progression, comparing prevalent drug users with new users may lead to findings of spurious drug benefit, as patients escalating therapy may have a higher baseline mortality risk than a survivor cohort of prevalent users with adequate glucose control. An existing data source is likely to contain some users of glucose-lowering medications who represent such a survivor cohort; whether they are included will therefore depend on the study design used.

An incident user design (or new user design), which identifies cohorts of patients at the time they start a new drug, is particularly well suited to evaluating drug effects that vary over time [10, 56, 57]. The well-defined start of follow-up in these cohorts of new users has the effect of ‘synchronizing’ patients on the same time scale, one that is relevant to the drug effect, making it possible to assess whether and by how much the risk of an outcome changes with duration of use. Although a new user design is better suited to detect or assess time-dependent drug effects, less than 20 % of the reviewed investigations employed it [11–13, 15, 17, 24, 32, 35, 39, 41, 58]. A new user design should be applied to cohorts of patients initiating both the medication being studied and the medication to which it is compared, and the design is strengthened by choosing a comparison medication (or class of medications) that is frequently used in patients at a similar stage of diabetes, as this better distinguishes drug effects from disease effects (see Sect. 4.6). For example, if the study drug is typically used at more advanced diabetes stages, i.e. as a second- or third-line therapy, then a comparator medication that is used similarly is likely to be most appropriate. Conversely, comparing a second- or third-line glucose-lowering medication with a first-line glucose-lowering medication such as metformin leads to extensive methodological challenges, even in a new user cohort study, since only a subset of the cohort is appropriate for comparison (this subset may be explicitly identified by propensity score (PS) matching; see Sect. 4.6). A variant of the new user design suitable for studies involving patients at more advanced diabetes stages compares patients who switch from a first-line diabetes treatment to the study drug, or add the study drug to it, with similar patients who switch to, or add, a comparison drug [59].
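One way to operationalize an active-comparator new user cohort from dispensing records is sketched below, assuming a hypothetical record layout, a 365-day washout, and complete capture of dispensings from enrollment onwards. A patient enters on the first dispensing of either the study drug or the comparator, provided there is at least one washout period of observable history without either agent; use of other agents (e.g. a first-line drug such as metformin) is allowed, which corresponds to the switcher/augmenter variant described above. The drug-class labels are illustrative placeholders.

```python
from datetime import date, timedelta

WASHOUT = timedelta(days=365)  # assumed washout: drug-free, observable history before entry


def assign_new_user(dispensings, enrollment_start, study_cls, comparator_cls):
    """Return (index_date, group) for an active-comparator new user cohort, or None.

    dispensings: list of (date, drug_class) tuples (hypothetical layout).
    """
    relevant = sorted((d, cls) for d, cls in dispensings if cls in (study_cls, comparator_cls))
    if not relevant:
        return None
    index_date, group = relevant[0]           # first use of either agent
    if index_date - enrollment_start < WASHOUT:
        return None                           # cannot rule out prior (prevalent) use
    if any(d == index_date and cls != group for d, cls in relevant):
        return None                           # same-day start of both agents: unassignable
    return index_date, group


# Hypothetical patient: metformin first, then a sulfonylurea ("SU"), later a DPP-4 inhibitor
history = [(date(2009, 1, 5), "metformin"), (date(2010, 2, 1), "SU"), (date(2010, 3, 1), "DPP4")]
print(assign_new_user(history, date(2008, 1, 1), "DPP4", "SU"))
# (datetime.date(2010, 2, 1), 'SU')
```

The later DPP-4 inhibitor dispensing does not change the assignment; how such switches are handled belongs to the exposure definition discussed in Sect. 4.5.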

The risk of developing clinical conditions that increase mortality may also vary over time according to the duration or cumulative dose of diabetes therapy; thus, to account for these time-varying hazards, an investigation should include analyses that estimate mortality risk according to duration of, or cumulative exposure to, glucose-lowering agents. For this purpose, the number of drug dispensings and the dispensed dose (e.g. mg or IU) during the follow-up period can be used to estimate cumulative exposure. Specifically, person-time in similar exposure categories (defined as total dispensed dose or treatment duration) can be pooled, and the observed outcomes can be assigned to categories of cumulative exposure; e.g. subjects with 1 year of exposure to a glucose-lowering medication will be compared with subjects with 1 year of exposure to a comparator agent [60]. In this analysis, subjects are balanced with regard to all baseline covariates, and follow-up starts after the accumulation of pre-specified levels of dispensed dose or time on therapy. Analyses stratified by duration of follow-up could also help establish whether the effect of the drug interacts with time of exposure. Less than 15 % of the reviewed studies included analyses assessing the effect of duration of, or cumulative exposure to, glucose-lowering agents [13, 19, 24, 34–36, 41, 61].
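The entry dates for cumulative-exposure categories can be derived directly from dispensing records. The sketch assumes a hypothetical (date, dispensed dose) layout and, as a simplification, counts a dose as accumulated on the day it is dispensed rather than when it is consumed; follow-up for each category would then start on the returned date.

```python
from datetime import date


def date_reaching_dose(dispensings, threshold):
    """First dispensing date at which the cumulative dispensed dose reaches threshold, else None.

    dispensings: list of (date, dose) tuples, dose in consistent units such as mg (hypothetical).
    """
    total = 0.0
    for day, dose in sorted(dispensings):
        total += dose
        if total >= threshold:
            return day
    return None


def category_entry_dates(dispensings, thresholds):
    """Map each pre-specified cumulative-dose level to the date follow-up in that category begins."""
    return {level: date_reaching_dose(dispensings, level) for level in thresholds}


# Hypothetical patient: three dispensings of 30,000 mg each
disp = [(date(2010, 1, 1), 30_000), (date(2010, 2, 1), 30_000), (date(2010, 3, 5), 30_000)]
print(category_entry_dates(disp, [30_000, 60_000, 120_000]))
# {30000: datetime.date(2010, 1, 1), 60000: datetime.date(2010, 2, 1), 120000: None}
```

Patients who reach a given level in the study-drug group would then be compared with comparator patients who reached the same level, mirroring the 1-year-exposure comparison described above.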

In summary, an incident user design increases the chances of identifying more comparable patients with respect to underlying risk of mortality, and is well suited to detect or assess time-dependent drug effects. To account for the time-varying mortality risk that might be associated with duration or cumulative dose of diabetes therapy, an investigation should include analyses that estimate mortality risk according to duration or cumulative exposure to specific glucose-lowering agents.

4.4 Definition of Exposure Risk Window

Glucose-lowering agents may contribute to overall mortality through different biological mechanisms, which can lead to increased or decreased risk of specific comorbidities. The underlying mechanism should be reflected in the exposure risk window, i.e. the time period during which any harmful or beneficial effect with regard to a specific outcome can be attributed to the drug of interest. If no plausible biological mechanism could produce an increased or decreased mortality risk as a causal consequence of short-term drug exposure, it is reasonable to introduce a lag time between drug initiation and the start of outcome follow-up. Such a lag time is highly dependent on background knowledge and the specific research hypothesis; therefore, it must be applied cautiously and tailored carefully to a particular study. The use of a lag period reduces the chances of protopathic bias (also termed reverse causation), which may occur when conditions with a preclinical phase, e.g. cancer, influence treatment choice. If a glucose-lowering medication is selectively chosen for patients with early signs of an underlying disease, e.g. worsening glycemic control caused by an undiagnosed malignancy, then the medication may appear to cause the disease or the disease-related mortality because it was initiated closer to the occurrence of the outcome than a comparator medication would have been (Fig. 6). Less than 10 % of the reviewed studies considered lag periods before the start of outcome follow-up [24, 62–64].
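The effect of a lag period on the analysis data can be sketched as follows, assuming a hypothetical per-patient layout (initiation date, end of follow-up, outcome date or None) and the common convention that patients whose outcome or follow-up end falls inside the lag contribute no person-time. Re-running the totals over several lag lengths gives a simple form of sensitivity analysis.

```python
from datetime import date, timedelta


def apply_lag(index_date, end_date, event_date, lag_days):
    """Return (person_days, event_counted) when follow-up starts lag_days after initiation."""
    start = index_date + timedelta(days=lag_days)
    stop = end_date if event_date is None else min(end_date, event_date)
    if stop < start:
        return 0, False  # outcome or censoring during the lag: excluded from the lagged analysis
    counted = event_date is not None and event_date <= end_date
    return (stop - start).days, counted


def lagged_totals(patients, lag_days):
    """Total person-days and outcomes across a cohort for one lag length."""
    days = events = 0
    for index_date, end_date, event_date in patients:
        d, e = apply_lag(index_date, end_date, event_date, lag_days)
        days += d
        events += e
    return days, events


# Hypothetical patients: (initiation, end of follow-up, outcome date or None)
patients = [
    (date(2010, 1, 1), date(2012, 1, 1), date(2010, 3, 1)),  # outcome 59 days after initiation
    (date(2010, 1, 1), date(2012, 1, 1), None),
    (date(2010, 1, 1), date(2011, 6, 1), date(2011, 6, 1)),
]
for lag in (0, 180):
    print(lag, lagged_totals(patients, lag))
# 0 (1305, 2)
# 180 (886, 1)
```

The outcome occurring 59 days after initiation, a candidate for protopathic bias, no longer counts once a 180-day lag is applied.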

Fig. 6

Protopathic bias (sometimes termed reverse causation) can occur when conditions with some preclinical phase, e.g. cancer, influence medication selection. In the example, a glucose-lowering therapy is selectively chosen for patients with early signs of an undiagnosed malignancy. In such a context, a medication of interest may appear to cause cancer-related mortality because it is initiated closer to the occurrence of the outcome than a comparator medication

The appropriate duration of the exposure risk window may also be quite relevant in the assessment of the effects of glucose-lowering medications on mortality. It is plausible, for example, that shortly before death, patients stop or change treatment. Thus, the exposure risk window is often extended for some time (latency or grace period), both to reflect an exposure latency period and to address the potential for treatment discontinuation or change close to an outcome (Fig. 7; see Sect. 4.5). Less than 10 % of the reviewed studies have considered latency periods after drug discontinuation [12, 13, 15, 17, 65, 66]. Sensitivity analyses evaluating the change in point effect estimates driven by the inclusion or exclusion of latency or grace periods can facilitate the interpretation of the findings and help test the robustness of the results.
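Exposure risk windows that include a grace period can be built from dispensing dates and days' supply. The sketch assumes a hypothetical (dispense date, days' supply) layout and uses the same number of days both to bridge gaps between refills and to extend exposure after the last supply; real studies may choose these two quantities separately, and the sketch ignores stockpiling from early refills. Varying `grace_days` is one way to run the sensitivity analyses described above.

```python
from datetime import date, timedelta


def exposure_windows(dispensings, grace_days):
    """Merge dispensings into continuous exposure windows extended by a grace/latency period.

    dispensings: list of (dispense_date, days_supply) tuples (hypothetical layout).
    A refill within grace_days of the previous supply end continues the same episode.
    Simplification: early refills do not carry over leftover supply (no stockpiling).
    """
    episodes = []
    for start, supply in sorted(dispensings):
        supply_end = start + timedelta(days=supply)
        if episodes and start <= episodes[-1][1] + timedelta(days=grace_days):
            episodes[-1][1] = max(episodes[-1][1], supply_end)   # bridge the gap
        else:
            episodes.append([start, supply_end])                  # new treatment episode
    # The risk window runs until the last supply ends plus the grace period
    return [(s, e + timedelta(days=grace_days)) for s, e in episodes]


# Hypothetical patient: 30-day supplies with a 10-day gap, then a long interruption
disp = [(date(2010, 1, 1), 30), (date(2010, 2, 10), 30), (date(2010, 6, 1), 30)]
print(len(exposure_windows(disp, 0)), len(exposure_windows(disp, 15)))  # 3 2
```

With no grace period the 10-day gap splits the first months into two episodes; a 15-day grace period merges them and extends exposure to 27 March.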

Fig. 7

Informative censoring. In the example, diabetes therapy is discontinued following early symptoms of a future outcome, and the observation is censored without any exposure latency period

In summary, in assessing the effects of glucose-lowering agents on mortality, it is recommended to clearly identify the biological hypothesis to be tested, to choose the exposure risk window accordingly, considering lag or latency periods, and to conduct sensitivity analyses to address uncertainty.

4.5 Exposures that Change Over Time

Diabetes therapy is often characterized by high levels of treatment discontinuation, switching, or augmentation with a new agent. Diabetes treatment can change because of the progression of the underlying diabetic condition or because patients experience adverse effects associated with specific glucose-lowering agents. Both scenarios can lead to an observed increase in the risk of mortality shortly after therapy change or discontinuation, and may cause bias. An as-treated (AT) analysis, which terminates exposure to a medication at treatment discontinuation, is often the approach of choice for assessing the safety of therapeutics in observational studies. However, it may be prone to bias in contexts characterized by high rates of treatment non-adherence and a strong association between non-adherence to specific diabetes therapies and mortality risk (Fig. 7). In this setting, discontinuation represents a form of informative censoring: because it can be an early marker of the study outcome, terminating exposure immediately upon discontinuation removes that outcome from the exposure category to which it belongs. Accordingly, an assessment of the timing of deaths relative to exposure can identify the potential for informative censoring. Sensitivity analyses with varying grace periods can be used to address potential informative censoring [12, 13, 15, 17, 65, 66]. An intention-to-treat (ITT) approach, which, analogously to an ITT analysis in a randomized controlled trial (RCT), carries forward the initial exposure status and disregards changes in treatment during follow-up, is not affected by informative censoring in the same way. However, it may be subject to exposure misclassification, which increases with longer follow-up and shorter time on treatment before discontinuation, and it remains open to differential loss to follow-up [10, 67].
In most cases, such misclassification tends to attenuate the effect of a medication and produce conservative results [68, 69], which is an important motivation for choosing an ITT approach as the primary analysis in RCTs evaluating drug efficacy. Although ITT analysis is by far the most commonly used approach in the recent observational literature on diabetes therapy and all-cause mortality, it may be worth considering the results of both AT and ITT analyses, in light of the strengths and limitations inherent in each approach. Only a few of the reviewed studies used both approaches [15, 17].
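The practical difference between the two approaches lies in when follow-up stops. The sketch below assumes a hypothetical patient whose exposure end (last supply plus any grace period) has already been derived, and shows how an AT definition can drop a death that occurs shortly after discontinuation, while ITT retains it at the cost of possible exposure misclassification. The function name and dates are illustrative.

```python
from datetime import date


def follow_up_end(exposure_end, death_date, censor_date, approach):
    """Return (end_of_follow_up, death_counted) under an 'ITT' or 'AT' definition.

    ITT: the initial exposure is carried forward until death or censoring.
    AT:  follow-up also stops at exposure_end (e.g. last supply plus any grace period).
    """
    end = censor_date if death_date is None else min(death_date, censor_date)
    if approach == "AT":
        end = min(end, exposure_end)
    elif approach != "ITT":
        raise ValueError("approach must be 'ITT' or 'AT'")
    return end, death_date is not None and death_date <= end


# Hypothetical patient: exposure ends 2010-03-27 and the patient dies on 2010-04-10
death, censor = date(2010, 4, 10), date(2012, 12, 31)
print(follow_up_end(date(2010, 3, 27), death, censor, "ITT"))  # death counted
print(follow_up_end(date(2010, 3, 27), death, censor, "AT"))   # death lost: informative censoring
print(follow_up_end(date(2010, 4, 26), death, censor, "AT"))   # longer grace period recovers it
```

Comparing the AT result across grace-period lengths is the kind of sensitivity analysis that can reveal deaths clustering just after discontinuation.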

To account for time-varying exposures, several studies used Cox models with drug exposure as a time-dependent variable [11, 14, 35, 65, 66, 70]. These models assume that treatment changes are independent of the outcome, and they may yield biased results in the presence of patient characteristics that vary over time (time-varying confounders) and affect both diabetes treatment choice and mortality risk [71], a likely scenario for a chronic disease that requires therapy adjustments commensurate with its natural progression. Few studies have accounted for time-dependent confounders [62]. Suitable strategies exist to address time-dependent confounding, such as marginal structural models or G-computation. However, these methods require extensive programming and, as with any observational method, rest on the assumption that the important predictors of treatment can be identified and are available in the data source (the assumption of no unmeasured confounding). We refer interested readers to specialized papers dedicated to these topics [72–75].
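Marginal structural models are typically fitted by weighting each patient's person-time with stabilized inverse probability of treatment weights. As a minimal sketch, assuming a single binary time-varying confounder L measured in each interval, a binary treatment A, and treatment probabilities estimated by simple counting within strata (in practice they are usually estimated with pooled logistic regression), the weights are the cumulative product of P(A_t | A_{t-1}) / P(A_t | A_{t-1}, L_t). Conditioning only on the previous treatment rather than the full history is a simplifying assumption of this sketch, and the weighted outcome model itself is not shown.

```python
from collections import defaultdict


def stabilized_weights(histories):
    """Cumulative stabilized IPT weights, one per patient, at the patient's last interval.

    histories: list of per-patient lists of (L_t, A_t) pairs, both coded 0/1;
    shorter lists represent patients whose follow-up ended earlier (hypothetical layout).
    """
    numer = defaultdict(lambda: [0, 0])  # (t, A_prev)      -> [n, n treated]
    denom = defaultdict(lambda: [0, 0])  # (t, A_prev, L_t) -> [n, n treated]
    for history in histories:
        a_prev = 0
        for t, (l, a) in enumerate(history):
            for table, key in ((numer, (t, a_prev)), (denom, (t, a_prev, l))):
                table[key][0] += 1
                table[key][1] += a
            a_prev = a

    def prob(table, key, a):
        # Empirical probability of the treatment actually received in this stratum
        n, treated = table[key]
        return treated / n if a == 1 else 1 - treated / n

    weights = []
    for history in histories:
        w, a_prev = 1.0, 0
        for t, (l, a) in enumerate(history):
            w *= prob(numer, (t, a_prev), a) / prob(denom, (t, a_prev, l), a)
            a_prev = a
        weights.append(w)
    return weights


# Hypothetical single-interval data: treatment is more common when L = 1
data = [[(0, 0)], [(0, 1)], [(0, 0)], [(1, 1)], [(1, 1)], [(1, 0)]]
print([round(w, 3) for w in stabilized_weights(data)])  # [0.75, 1.5, 0.75, 0.75, 0.75, 1.5]
```

Patients who received the treatment that was less likely given their confounder value are up-weighted, which is how the weighted pseudo-population breaks the link between L and treatment.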

In summary, in light of the strengths and limitations inherent in each approach, both AT and ITT analyses should be considered in evaluating the effects of glucose-lowering agents on overall mortality. Sensitivity analyses assessing when most deaths occur can assist in identifying informative censoring. When selecting the strategy to account for time-varying exposures in the analysis, the researcher may be faced with a trade-off between methodological transparency and the ability to address confounding.

4.6 Confounding

In pharmacoepidemiological studies, confounding by indication [76], or drug channeling bias [77], is one of the most important threats to validity. Physicians prescribe drug treatments in light of the diagnostic and prognostic information available at the time of prescribing. If predictors of patient outcomes are unevenly distributed among treatment groups, then failing to control for such factors will result in confounded estimates of the differences between them [2, 78].

Diabetes mellitus is an established risk factor for all-cause and cardiovascular mortality [79–82], and, although the nature of the association remains unclear [83], it has also been shown to be associated with decreased survival after cancer diagnosis [84]. Thus, patients treated with glucose-lowering medications used later in the course of diabetes, such as second- or third-line agents, might experience greater mortality risk than patients managed with diet alone or with first-line therapy, which can lead to confounding (Fig. 8) if not taken into account in the study design. Confounding will be minimized by the choice of a proper comparator group, i.e. one with a similar stage of diabetes and medical surveillance. Comparison groups that have been used in the recent literature include non-diabetic individuals [7, 25, 26, 85–87] (less than 10 % of the reviewed studies), untreated diabetic patients [9, 22, 46, 63, 88–90] (less than 10 % of the studies), diabetic patients receiving a comparator drug [11–13, 15–17, 19, 20, 23, 24, 30–33, 39–41, 45, 48–52, 64–66, 70, 91–94] (about 50 % of the studies), and combinations of the above, often in the form of any identifiable patient who did not use a specific agent of interest [6, 14, 21, 25–27, 36–38, 47, 61, 62, 95–103] (approximately one-third of the reviewed studies).

Fig. 8

Confounding by duration or severity of diabetes (directed acyclic graph). In the example, a glucose-lowering drug is selectively chosen for patients with higher diabetes severity who are therefore more likely to die than patients receiving a comparator medication, so that the observed drug effect is a combination of drug effect and selection effect

Non-diabetic individuals are not likely to be an appropriate comparison group because unmeasured lifestyle factors such as diet, exercise, socioeconomic status, and BMI, as well as unmeasured severity of underlying comorbidities, are more likely to be unbalanced between patients with and without diabetes. Untreated diabetic patients are not optimal comparators either. Patients with diet-controlled diabetes typically have less severe disease than drug users, and thus are at lower risk of mortality; or, conversely, diabetic individuals not receiving medications might have differential barriers to treatment and surveillance for comorbidities compared with medication users, and thus be at higher mortality risk. Both these scenarios will increase the plausibility of unmeasured confounding as an alternate explanation for an observed association between diabetes therapy and all-cause mortality.

Comparisons restricted to treated patients with diabetes offer improved confounding control, particularly when they are based on new users of the study drugs (or, alternatively, new switchers/augmenters) whose treatments are prescribed in clinical practice to patients with overlapping baseline characteristics [104]. Patients initiating therapies more commonly used at advanced diabetes stages, i.e. second- or third-line therapies or combinations of two or more agents, might have more severe diabetes than patients initiating a first-line agent such as metformin or an agent used as monotherapy [11, 20, 30–33, 45, 52, 65, 90]. Thus, when the exposure of interest is a second- or third-line therapy or a specific combination of two or more glucose-lowering agents, choosing initiators of alternative second- or third-line agents, or combinations, as comparators can reduce confounding. This design strategy yields a comparator group with similar healthcare utilization, medical conditions, and diabetes duration or severity. Nevertheless, less than 10 % of the reviewed studies employed this strategy [12, 13, 17, 40, 49, 94].
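As a rough illustration of this design, the sketch below identifies new users of two hypothetical second-line agents (`drug_A` and `drug_B`) with prior metformin use. It requires a minimum period of observable history (washout) before the index date and excludes patients who start both agents on the same day. The record layout, drug labels, and 365-day washout are assumptions for illustration, not the specification of any reviewed study.

```python
from datetime import date, timedelta


def new_user_cohort(dispensings, enrollment_start, study_drugs,
                    prior_required=None, washout_days=365):
    """Active-comparator new-user cohort (illustrative sketch).

    dispensings: iterable of (patient_id, dispensing_date, drug) tuples.
    enrollment_start: {patient_id: first date of observable history}.
    Returns {patient_id: (index_date, initiated_drug)}.
    """
    by_patient = {}
    for pid, day, drug in sorted(dispensings, key=lambda r: (r[0], r[1])):
        by_patient.setdefault(pid, []).append((day, drug))

    cohort = {}
    for pid, records in by_patient.items():
        study = [(day, drug) for day, drug in records if drug in study_drugs]
        if not study:
            continue
        index_date = study[0][0]  # first observed dispensing of either study drug
        # Without enough observable history, a "first" dispensing may not be a true initiation.
        if index_date - enrollment_start[pid] < timedelta(days=washout_days):
            continue
        started = {drug for day, drug in study if day == index_date}
        if len(started) != 1:  # both comparators started together: exposure ambiguous
            continue
        # Optional restriction, e.g. to second-line initiators with prior first-line therapy.
        if prior_required and not any(drug == prior_required and day < index_date
                                      for day, drug in records):
            continue
        cohort[pid] = (index_date, started.pop())
    return cohort


# Hypothetical records.
dispensings = [
    (1, date(2015, 1, 10), "metformin"), (1, date(2016, 3, 1), "drug_A"),
    (2, date(2015, 6, 1), "metformin"), (2, date(2016, 2, 1), "drug_B"),
    (3, date(2016, 1, 5), "drug_A"),  # only 218 days of history: excluded
    (4, date(2015, 2, 1), "metformin"),
    (4, date(2016, 5, 1), "drug_A"), (4, date(2016, 5, 1), "drug_B"),
]
enrollment_start = {1: date(2014, 1, 1), 2: date(2014, 1, 1),
                    3: date(2015, 6, 1), 4: date(2014, 1, 1)}
cohort = new_user_cohort(dispensings, enrollment_start, {"drug_A", "drug_B"},
                         prior_required="metformin")
print(cohort)
```

Patients 1 and 2 enter the cohort as initiators of `drug_A` and `drug_B`, respectively. Patient 3 lacks sufficient history, and patient 4 starts both agents on the same day.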

Once a comparison group has been selected, several strategies at the design or analysis stage can further control for confounding, including restriction of the study population, matching, stratification, weighting, and regression models. Many adjustment methods are constrained by the number of covariates that can be accommodated per outcome event [105]. When there are many potential confounders, as in research with large healthcare utilization databases, PS methodology is an effective strategy for confounding adjustment, especially when the study outcomes are relatively uncommon [106]. A PS is the estimated probability of receiving one treatment versus another, conditional on a set of predefined covariates [107]. In the context of diabetes therapy, these covariates should emphasize healthcare utilization measures and variables capturing diabetes severity and duration, e.g. cardiovascular comorbidities and diabetes complications such as neuropathy, nephropathy, and retinopathy. The score can then be used to reduce confounding via matching, stratification, regression adjustment, or a combination of these strategies [108]. PS matching in particular allows investigators to balance treatment groups across all measured potential confounders. It also allows them to inspect the achieved balance by comparing covariate distributions before and after matching, much as randomized treatment groups are compared in an RCT [109, 110]. Covariate balance inspection and metrics [111] and complementary strategies such as stratification by PS levels [112] have been recommended to assess whether PS matching has successfully reduced confounding.
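The matching-and-balance workflow described above can be sketched in plain Python. Everything here is an assumption for illustration:

- the simulated covariates (a continuous severity score, nephropathy, and cardiovascular disease) and the prescribing model;
- the gradient-ascent logistic fit;
- greedy 1:1 nearest-neighbour matching on the logit of the PS;
- a caliper of 0.2 standard deviations of the logit PS, which is a commonly used choice.

Balance is summarized with standardized mean differences (SMDs) before and after matching. In practice, established statistical software would be used for each step.

```python
import math
import random


def fit_logistic(X, y, lr=0.5, iters=300):
    """Logistic regression by plain gradient ascent; returns [intercept, *coefs]."""
    k = len(X[0]) + 1
    beta = [0.0] * k
    for _ in range(iters):
        grad = [0.0] * k
        for xi, yi in zip(X, y):
            row = (1.0, *xi)
            p = 1 / (1 + math.exp(-sum(b * v for b, v in zip(beta, row))))
            for j, v in enumerate(row):
                grad[j] += (yi - p) * v
        beta = [b + lr * g / len(y) for b, g in zip(beta, grad)]
    return beta


def smd(a, b):
    """Standardized mean difference between two samples of one covariate."""
    ma, mb = sum(a) / len(a), sum(b) / len(b)
    va = sum((v - ma) ** 2 for v in a) / (len(a) - 1)
    vb = sum((v - mb) ** 2 for v in b) / (len(b) - 1)
    pooled = math.sqrt((va + vb) / 2)
    return 0.0 if pooled == 0 else (ma - mb) / pooled


def greedy_match(logit_ps, treat, caliper):
    """1:1 nearest-neighbour matching on the logit PS, without replacement."""
    available = {i for i, t in enumerate(treat) if t == 0}
    pairs = []
    for i in (i for i, t in enumerate(treat) if t == 1):
        if not available:
            break
        j = min(available, key=lambda c: abs(logit_ps[c] - logit_ps[i]))
        if abs(logit_ps[j] - logit_ps[i]) <= caliper:
            pairs.append((i, j))
            available.remove(j)
    return pairs


# Hypothetical cohort: sicker patients are more likely to receive the drug.
random.seed(7)
names = ["severity_score", "nephropathy", "cardiovascular_disease"]
X, treat = [], []
for _ in range(1000):
    severity = random.gauss(0, 1)
    nephropathy = int(random.random() < (0.35 if severity > 0 else 0.15))
    cvd = int(random.random() < 0.3)
    z = -0.5 + 1.0 * severity + 0.8 * nephropathy + 0.4 * cvd
    X.append([severity, nephropathy, cvd])
    treat.append(int(random.random() < 1 / (1 + math.exp(-z))))

beta = fit_logistic(X, treat)
logit_ps = [beta[0] + sum(b * v for b, v in zip(beta[1:], xi)) for xi in X]
mean_logit = sum(logit_ps) / len(logit_ps)
sd_logit = math.sqrt(sum((v - mean_logit) ** 2 for v in logit_ps) / (len(logit_ps) - 1))
pairs = greedy_match(logit_ps, treat, caliper=0.2 * sd_logit)


def column(rows, k):
    return [row[k] for row in rows]


treated = [x for x, t in zip(X, treat) if t == 1]
control = [x for x, t in zip(X, treat) if t == 0]
matched_t = [X[i] for i, _ in pairs]
matched_c = [X[j] for _, j in pairs]
before = {n: smd(column(treated, k), column(control, k)) for k, n in enumerate(names)}
after = {n: smd(column(matched_t, k), column(matched_c, k)) for k, n in enumerate(names)}
for n in names:
    print(f"{n:24s} SMD before {before[n]:+.3f}  after {after[n]:+.3f}")
print(f"matched pairs: {len(pairs)} of {sum(treat)} treated")
```

A common rule of thumb treats absolute SMDs below about 0.1 as acceptable balance. Treated patients left unmatched by the caliper are reported so that any loss of generalizability is visible.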
The high-dimensional PS (hdPS) algorithm is an automated extension of PS methodology that empirically selects covariates from thousands of diagnostic, procedural, and drug treatment codes. It may help researchers identify proxies for confounders that are not measured in healthcare claims data, thereby partly addressing unmeasured confounding [113–115].
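One step of the hdPS algorithm, the prioritization of candidate covariates, can be sketched as follows. The sketch assumes the Bross formula for the bias introduced by leaving a binary covariate unadjusted, which hdPS implementations commonly use to rank candidate codes. Covariate-outcome relative risks below 1 are inverted before use. The earlier steps of the algorithm are not shown, namely identifying codes across data dimensions, assessing their recurrence, and selecting the top-ranked covariates for the PS model. The codes and numbers below are hypothetical.

```python
import math


def bross_bias(p_c1, p_c0, rr_cd):
    """Multiplicative bias from leaving one binary covariate unadjusted (Bross formula).

    p_c1, p_c0: covariate prevalence among exposed and comparator patients.
    rr_cd: covariate-outcome relative risk; values below 1 are inverted.
    """
    rr = rr_cd if rr_cd >= 1 else 1 / rr_cd
    return (p_c1 * (rr - 1) + 1) / (p_c0 * (rr - 1) + 1)


def prioritize(candidates):
    """Rank candidate codes by |log(bias multiplier)|, largest potential bias first."""
    return sorted(candidates,
                  key=lambda c: abs(math.log(bross_bias(c["p_c1"], c["p_c0"], c["rr_cd"]))),
                  reverse=True)


# Hypothetical candidate codes from claims data (all values illustrative).
candidates = [
    {"code": "code_Z", "p_c1": 0.50, "p_c0": 0.50, "rr_cd": 3.0},
    {"code": "code_Y", "p_c1": 0.05, "p_c0": 0.04, "rr_cd": 5.0},
    {"code": "code_W", "p_c1": 0.10, "p_c0": 0.25, "rr_cd": 0.5},
    {"code": "code_X", "p_c1": 0.30, "p_c0": 0.10, "rr_cd": 2.0},
]
for c in prioritize(candidates):
    bias = bross_bias(c["p_c1"], c["p_c0"], c["rr_cd"])
    print(f'{c["code"]}: bias multiplier {bias:.3f}')
```

`code_Z` has a strong association with the outcome but ranks last, because it is equally prevalent in both treatment groups and so cannot confound the comparison.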

In summary, the choice of a proper comparator group, i.e. one that is characterized by similar healthcare utilization, medical conditions, and diabetes duration or severity, combined with PS methodology, is an effective strategy to balance important baseline risk factors across treatment groups and reduce confounding. Covariate balance inspection and complementary strategies such as stratification by PS levels are useful tools to assess whether PS matching has properly worked to reduce confounding.

5 Other Issues

5.1 Medical Surveillance or Detection Bias

Differential surveillance for outcomes could lead to more outcomes being detected in some patients than in others, thus leading to bias, particularly for outcomes with a long detectable pre-clinical phase (e.g. many cancers). This concern largely does not apply to an outcome such as all-cause mortality. However, differential surveillance can still affect overall mortality through differential identification of an intermediate condition (e.g. cancer). When such a condition is diagnosed at an earlier stage, treatment is more successful and time to death is extended.

In this context, if patients are subject to differential medical surveillance as a consequence of their diabetic condition or their diabetes treatment, cancer may be diagnosed at earlier or more advanced stages, leading to differential mortality risk. This scenario is particularly problematic for comparisons with groups likely to differ in medical surveillance, e.g. non-users [9] or non-diabetic patients [7, 25, 26, 86]. Patients with pre-existing diabetes may use cancer screening differently from non-diabetic populations and may have more advanced cancer stages at diagnosis [116–119]. This may confer increased mortality risk and could bias studies evaluating mortality risk in patients with pre-existing cancer with or without diabetes [7, 25, 26, 86].

Furthermore, certain diabetes treatments have the potential to increase medical surveillance and lead to earlier cancer detection following treatment initiation. Patients receiving metformin or glucagon-like peptide-1 receptor agonists might experience gastrointestinal effects that prompt diagnostic work-ups such as colonoscopy. These work-ups could lead to cancers being detected earlier, at less advanced stages, than in users of other diabetes treatments or in non-diabetic patients. This scenario may be at play in several studies evaluating mortality risk in diabetic patients with pre-existing cancer receiving different diabetes treatments [6, 29, 38, 103].

When assessing the effect of diabetes therapy on overall mortality in populations with pre-existing cancer, two measures can reduce the consequences of differential medical surveillance: choosing a comparator group with a similar intensity of medical surveillance, and accounting for cancer stage [6, 26, 29].

5.2 Misclassification of Exposure, Outcome, and Covariates

Claims data accurately record the date and quantity of dispensed medications and are generally considered to have better data quality than self-reports and physician notes [120–123]. Nonetheless, many of the reviewed studies relied on self-reported information [7, 22, 29] or on inpatient or outpatient medical records [11, 16, 31, 39, 45, 46, 48–51, 94, 96, 98] to investigate the effect of glucose-lowering agents on mortality, which may lead to exposure misclassification. Even with claims data, however, chronic therapies with multiple refills, such as glucose-lowering medications, are subject to some misclassification. The extent depends on how the days’ supply is calculated and how long a latency period is assumed after drug discontinuation. These choices, informed by the pharmacokinetics and pharmacodynamics of the drug of interest, should be considered carefully to limit exposure misclassification.
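How days' supply, a grace period, and a latency period translate dispensings into exposure episodes can be sketched as follows. The rules are assumptions, not a standard:

- overlapping refills are carried forward (stockpiling);
- a refill within `grace_days` of the end of the current supply continues the episode;
- an optional `latency_days` extends each episode after the supply runs out.

Rerunning with different grace and latency values is one simple way to perform the sensitivity analyses these choices call for.

```python
from datetime import date, timedelta


def exposure_episodes(dispensings, grace_days=30, latency_days=0):
    """Merge (dispensing_date, days_supply) records into continuous exposure episodes.

    Returns a list of (episode_start, episode_end) dates. Assumed rules:
    leftover supply from overlapping refills is carried forward; a refill
    within grace_days of the end of supply extends the episode; latency_days
    is appended after the last supply runs out.
    """
    episodes = []
    start = supply_end = None
    for day, days_supply in sorted(dispensings):
        if start is None:
            start, supply_end = day, day + timedelta(days=days_supply)
        elif day <= supply_end + timedelta(days=grace_days):
            # Early refill: new supply starts when the old one ends (stockpiling).
            supply_end = max(supply_end, day) + timedelta(days=days_supply)
        else:
            episodes.append((start, supply_end + timedelta(days=latency_days)))
            start, supply_end = day, day + timedelta(days=days_supply)
    if start is not None:
        episodes.append((start, supply_end + timedelta(days=latency_days)))
    return episodes


# Hypothetical dispensing history for one patient: (date, days' supply).
fills = [(date(2020, 1, 1), 30), (date(2020, 1, 25), 30),
         (date(2020, 3, 20), 30), (date(2020, 7, 1), 90)]
for grace in (30, 90):
    print(f"grace {grace} days:", exposure_episodes(fills, grace_days=grace))
```

With a 30-day grace period, the July refill starts a second episode. With a 90-day grace period, the same history becomes a single continuous exposure, which can shift deaths between exposed and unexposed person-time.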

Misclassification of the outcome is generally not a significant concern in observational studies evaluating all-cause mortality. However, misclassification may arise when overall mortality is used as a proxy for specific causes of death. In a diabetic population, it may be reasonable to assume that all-cause mortality approximates cardiovascular mortality. This assumption might not hold, however, for all-cause mortality as a proxy for cancer-related mortality [41, 64], as shown by two observational studies performed in diabetic populations without pre-existing cancer [9, 61]. Thus, whenever a specific cause of death is the endpoint of interest, cause-specific mortality, when available, is preferable to overall mortality as the outcome measure.
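Why the proxy works better for common causes of death than for rarer ones can be shown with simple arithmetic, under stated assumptions. Let r_c be the comparator-group risk of death from the affected cause and r_o the risk from all other causes. Assume the drug changes only r_c, by a factor RR_c. The treated all-cause risk is then RR_c·r_c + r_o, so the all-cause risk ratio is p·RR_c + (1 − p), where p = r_c/(r_c + r_o) is the share of deaths due to the affected cause. This ignores competing-risk interplay, and the numbers below are hypothetical.

```python
def all_cause_rr(rr_cause, share_of_deaths):
    """All-cause risk ratio implied by an effect confined to one cause of death.

    share_of_deaths: fraction of comparator-group deaths due to the affected cause.
    Assumes no effect on other causes and ignores competing-risk interplay.
    """
    return share_of_deaths * rr_cause + (1 - share_of_deaths) * 1.0


# Hypothetical: a drug that lowers deaths from one cause by 20% (RR 0.80).
for share in (0.50, 0.15):
    print(f"share {share:.2f}: all-cause RR = {all_cause_rr(0.80, share):.2f}")
```

When the affected cause accounts for half of all deaths, the all-cause risk ratio is 0.90. When it accounts for 15 %, the same cause-specific effect is diluted to 0.97, which a study may be unable to distinguish from no effect.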

Misclassification of confounding variables can lead to incomplete control of these variables and, ultimately, residual confounding. Prioritizing sensitivity over specificity when defining covariates may therefore help minimize confounder misclassification.
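The first point, that an imperfectly recorded confounder leaves residual confounding, can be illustrated by reusing a hypothetical severity scenario in which the drug has no effect. The sketch assumes non-differential misclassification of severity with a given sensitivity and specificity, applied to expected counts. It compares a Mantel-Haenszel risk ratio adjusted for true severity with one adjusted for recorded severity. The sensitivity and specificity values are illustrative only. How much each property matters depends on the confounder's prevalence and its associations with treatment and outcome, so readers may vary them under their own assumptions.

```python
def mh_rr(strata):
    """Mantel-Haenszel risk ratio; each stratum is (n1, d1, n0, d0)."""
    num = sum(d1 * n0 / (n1 + n0) for n1, d1, n0, d0 in strata)
    den = sum(d0 * n1 / (n1 + n0) for n1, d1, n0, d0 in strata)
    return num / den


def misclassify(severe, mild, sensitivity, specificity):
    """Expected stratum counts after non-differential misclassification of severity."""
    recorded_severe = tuple(sensitivity * s + (1 - specificity) * m
                            for s, m in zip(severe, mild))
    recorded_mild = tuple((1 - sensitivity) * s + specificity * m
                          for s, m in zip(severe, mild))
    return [recorded_severe, recorded_mild]


# Hypothetical expected counts (n_drug, deaths_drug, n_comparator, deaths_comparator).
# Within each true stratum, mortality is identical in both arms: the true RR is 1.
severe = (21_000, 2_100, 9_000, 900)
mild = (14_000, 280, 56_000, 1_120)

print(f"adjusted for true severity:     {mh_rr([severe, mild]):.2f}")
print(f"adjusted for recorded severity: {mh_rr(misclassify(severe, mild, 0.6, 0.95)):.2f}")
```

Adjusting for the misclassified variable removes only part of the bias, leaving a spurious risk ratio of roughly 1.7 for a drug with no effect.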

6 Discussion

In a review of observational studies evaluating the association between glucose-lowering medications and all-cause mortality, we identified several methodological issues that can affect inferences (Table 1):

- limitations inherent to the outcome of all-cause mortality;
- incorrect temporal sequencing in administrative databases;
- inadequate handling of time-varying hazards and treatment duration effects;
- unclear definition of the exposure risk window;
- improper handling of exposures that change over time;
- incomplete accounting for confounding by indication.

The following considerations could help reduce the impact of these methodological issues:

(1) Carefully consider the trade-offs between all-cause and cause-specific mortality as outcomes.
(2) Define cohort entry by the first exposure to the drugs of interest (exposure-defined cohort), and follow a well-defined temporal sequence of covariate assessment, exposure definition, and start of follow-up. This facilitates quantification of the effects of glucose-lowering medications and reduces the potential for immortal time bias.
(3) Employ a new-user design based on drug initiators. This increases the chances of identifying treatment groups that are comparable with respect to the underlying risk of mortality, and it accounts for medication effects that vary over time. Consider analyses that estimate mortality risk according to duration of, or cumulative exposure to, specific glucose-lowering agents.
(4) Clearly identify the biological hypothesis to be tested and choose the exposure risk window accordingly, using sensitivity analyses on the lag time and latency period.
(5) In light of the strengths and limitations of each approach, consider both as-treated (AT) and intention-to-treat (ITT) approaches, and use sensitivity analyses to assess informative censoring. Recognize the trade-offs in selecting a strategy for dealing with time-varying exposures.
(6) Choose an appropriate comparator group, i.e. one characterized by similar healthcare utilization, medical conditions, and diabetes duration or severity, and use appropriate methods to account for confounding. Check covariate balance, and use complementary strategies such as stratification by PS levels to assess whether PS matching has successfully reduced confounding.
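The practical difference between AT and ITT follow-up in consideration (5) can be illustrated with a sketch that computes the end of follow-up for one patient. The censoring rules and all dates are hypothetical assumptions. Under these rules, AT follow-up is censored at discontinuation plus a latency period, or at switching/augmentation. ITT follow-up ignores later treatment changes.

```python
from datetime import date, timedelta


def follow_up(index_date, study_end, death=None, discontinuation=None,
              switch=None, approach="ITT", latency_days=0):
    """Return (follow-up days, died during follow-up) under ITT or AT rules."""
    if approach not in ("ITT", "AT"):
        raise ValueError("approach must be 'ITT' or 'AT'")
    candidates = [study_end]
    if death is not None:
        candidates.append(death)
    if approach == "AT":
        # AT: stop counting once the patient is no longer on the initial treatment.
        if discontinuation is not None:
            candidates.append(discontinuation + timedelta(days=latency_days))
        if switch is not None:
            candidates.append(switch)
    end = min(candidates)
    return (end - index_date).days, death is not None and end == death


# Hypothetical patient who stops treatment and dies about 15 months later.
patient = dict(index_date=date(2016, 1, 1), study_end=date(2019, 12, 31),
               death=date(2018, 6, 1), discontinuation=date(2017, 3, 1))
print("ITT:", follow_up(**patient, approach="ITT"))
print("AT: ", follow_up(**patient, approach="AT", latency_days=90))
```

Under ITT, this death is counted against the initial treatment. Under AT, the patient is censored beforehand. If discontinuation is related to declining health, this is exactly the informative censoring that sensitivity analyses should assess.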

Table 1 Recurrent methodological challenges identified in the reviewed literature, suggested strategies to address the challenge, and specific rationale

7 Conclusions

The methodological issues identified in this review may be adequately addressed through application of appropriate methods; therefore, greater attention to the principles described in this paper is warranted. The implementation of suitable research methods can reduce the potential for spurious findings and thus the risk of misleading the medical community about benefits and harms of diabetes therapy.