Exact-matching algorithms using administrative health claims database equivalence factors for real-world data analysis based on the target trial emulation framework

Nishioka, Yuichi; Morita, Emiri; Takeshita, Saki; Tamamoto, Sakura; Myojin, Tomoya; Noda, Tatsuya; Imamura, Tomoaki

doi:10.1007/s10742-024-00322-9

Published: 02 February 2024

Health Services and Outcomes Research Methodology

Abstract

Real-world data have become increasingly important in medical science and healthcare. A new, effective, and practically feasible statistical design is needed to unlock the potential of real-world data so that decision-makers and practitioners can use it to meet people’s healthcare needs. In the first half of the study, we validated our proposed new method by simulation, and in the second half, we conducted a clinical study using actual real-world data. We propose the “Exact Matching Algorithm Using Administrative Health Claims Database Equivalence Factors (AHCDEFs)” based on the target trial emulation framework. The simulation trials were conducted 500 times independently, considering the misclassification and chance errors of all variables and competing events for the outcome. The new method was compared with two conventional methods, multivariate analysis and propensity score analysis. Next, we estimated the effect of specific health guidance provided in Japan on the prevention of diabetes onset and on medical expenditures. Our proposed method returned improved estimates and fewer type I errors (the probability of erroneously concluding that there is a difference when, in fact, there is none) than the conventional methods. We quantitatively demonstrated the effectiveness of specific health guidance in Japan in preventing the onset of diabetes and reducing medical expenditures over five years. We propose a new method for analyzing real-world data: an exact-matching algorithm using AHCDEFs. The larger the number of patients available for analysis, the more AHCDEFs can be matched, thereby removing the influence of confounding factors. This method is expected to generate significant evidence when applied to real-world data.

1 Background

Real-world data have become increasingly important in medical science and healthcare (Mellinghoff et al. 2022; Sun et al. 2018). In many countries, administrative claims databases contain large volumes of data and have great potential for new scientific discoveries, problem-solving, and decision-making that would otherwise be unfeasible (Hernán and Robins 2016). However, new, effective, and practically feasible statistical designs are needed to unlock the potential of real-world data so that decision-makers and practitioners can apply the results and conclusions to better meet the medical and healthcare needs of society (Frieden 2017; Baumfeld Andre et al. 2020).

Real-world data are increasingly being analyzed within the target trial emulation (TTE) framework (Matthews et al. 2022; Xie et al. 2020; Madenci et al. 2021; Takeuchi et al. 2021). However, TTE is not always properly understood. The idea of TTE is not to bring observational studies closer to randomized controlled trials (RCTs), but to bring clinical studies, including both observational studies and RCTs, closer to the “ideal study” that can settle a research question (Hernán and Robins 2016). For example, an ideal comparison would give each of the interventions being compared to the same people and then compare their outcomes; such a study is clearly ideal but impossible. RCTs therefore use randomization, which propensity score (PS) analysis attempts to mimic (Liang et al. 2014). Randomization is a means of approaching the ideal, not the goal itself. From the TTE perspective, a study achieves greater comparability when all factors are matched within every pair drawn from the two groups being compared.

One study suggests that obtaining correct estimates requires designing the study and analyzing the data according to sound principles (Hoffman et al. 2022). The purpose of this study was to verify a new method for real-world data analysis based on the TTE framework.

2 Methods

In the first half of the study, we validated the new method by simulation, and in the second half, we used this method to conduct a clinical study on actual real-world data. As the subject of the latter half of the study, we estimated the effect of the specific health guidance provided in Japan on the prevention of diabetes onset and on medical expenditures. This subject was chosen because the effect of lifestyle guidance in preventing the onset of diabetes is almost self-evident (Muramoto et al. 2014), which makes an RCT difficult to conduct from an ethical perspective. Therefore, real-world evidence is required to estimate this effect. This observational cohort study used administrative claims data and was approved by the Ethics Committee of Nara Medical University (approval no. 1123–7, October 8th, 2015).

2.1 Outline for a new method: exact-matching algorithm using administrative health claims database equivalence factors

This novel method is based on a simple idea: according to the TTE concept, an ideal study in terms of comparability (the target trial) would have all covariates except exposure matched between the two groups. The key to this approach is to match each covariate individually, rather than only representative values of the covariates, as in an RCT or in an observational study using the PS. This differs from traditional statistical methods in that all measurable factors, including interactions of any order, are controlled. Moreover, the differences in the covariates between the groups are always exactly zero rather than distributed around zero, regardless of how the analysis is stratified.

Administrative health claims database equivalence factors (AHCDEFs) are factors collected from administrative claims databases that can confound the association between exposure and outcome. Among individuals whose AHCDEFs matched perfectly, each member of the smaller of the two groups (without or with exposure) was given a weight of 1, and the members of the larger group were weighted so that the sums of the weights in the two groups were equal. For example, consider a simple case in which the AHCDEFs are age, sex, and systolic and diastolic blood pressure, and the case is a 42-year-old man with a blood pressure of 130/80 mmHg. The control for that case is also a 42-year-old man with a blood pressure of 130/80 mmHg. Because all four items match between the case and the control, the 11 interaction terms formed by any combination of two or more of the four items (6 two-way, 4 three-way, and 1 four-way) are also controlled. Unlike conventional methods, which assume agreement only in representative values across the population, this approach therefore controls for interaction terms of any order.
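
The weighting rule above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the authors' implementation (they used SQL Server and SPSS): each individual's AHCDEFs are assumed to be available as a hashable tuple, the helper names `exact_match_weights` and `n_interaction_terms` are hypothetical, and individuals in strata that lack either an exposed or an unexposed member are assumed to be dropped (weight 0), since no exact match exists for them.

```python
from __future__ import annotations

from collections import defaultdict
from typing import Hashable, Sequence


def exact_match_weights(
    factors: Sequence[tuple[Hashable, ...]],
    exposed: Sequence[bool],
) -> list[float]:
    """Weights for exact matching on AHCDEFs (hypothetical helper).

    Individuals with identical AHCDEF tuples form one stratum. Within a
    stratum, members of the smaller arm get weight 1 and members of the
    larger arm get n_small / n_large, so both arms have equal weight sums.
    Strata lacking either arm have no exact match and get weight 0.
    """
    strata: dict[tuple, list[int]] = defaultdict(list)
    for idx, key in enumerate(factors):
        strata[tuple(key)].append(idx)

    weights = [0.0] * len(factors)
    for members in strata.values():
        treated = [i for i in members if exposed[i]]
        control = [i for i in members if not exposed[i]]
        if not treated or not control:
            continue  # unmatched stratum: leave weights at 0
        n_small = min(len(treated), len(control))
        for arm in (treated, control):
            w = n_small / len(arm)  # equals 1.0 for the smaller arm
            for i in arm:
                weights[i] = w
    return weights


def n_interaction_terms(k: int) -> int:
    """Interaction terms of order 2..k among k matched factors: 2**k - 1 - k."""
    return 2 ** k - 1 - k


people = [(42, "M", 130, 80), (42, "M", 130, 80), (42, "M", 130, 80), (55, "F", 120, 70)]
exposure = [True, False, False, True]
print(exact_match_weights(people, exposure))  # [1.0, 0.5, 0.5, 0.0]
print(n_interaction_terms(4))  # 11
```

In the example, one exposed 42-year-old man has two exact controls, so each control receives weight 0.5 and both arms sum to 1; the 55-year-old woman has no exact match and is excluded. With four AHCDEFs, 2^4 - 1 - 4 = 11 interaction terms are controlled without being modeled explicitly.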

2.2 Simulation procedure

The trials were conducted 500 times independently ($t = 1, 2, \ldots, 500$), considering the misclassification and chance errors of all variables and competing events for the outcome. The simulation process followed a sufficient-cause model (Liang et al. 2014); thus, an event occurs only when at least one sufficient cause is present. For each patient ($n = 1, 2, \ldots, 50{,}000$), we constructed a confounding model of the exposure ($X_{t1n}$) and the outcome of interest ($Y_{tn}$) in which there was no difference in the true proportion of outcome occurrence between the groups with and without exposure; that is, the true odds ratio of outcome occurrence for the exposed versus the unexposed group was 1. The variables $X_{tin}$ are correlated with one another to a low-to-moderate degree. $X_{t1n}$ is observable, whereas $X_{t2n}, \ldots, X_{t100n}$ include both observable and unobservable variables, some of which cause $Y$. The aim is to estimate the effect of $X_{t1n}$ on $Y_{tn}$ when only some component causes are observable and known, and when some noncausal factors may be mistaken for causal factors. Thus, we have

$$P\left(Y_{tn}=1 \mid X_{tin}, K_{ti}\right)=\frac{\exp\left(\beta_{t1}X_{t1n}+\sum_{i=2}^{100}\beta_{ti}X_{tin}K_{ti}\right)}{1+\exp\left(\beta_{t1}X_{t1n}+\sum_{i=2}^{100}\beta_{ti}X_{tin}K_{ti}\right)}$$

If $X_{tin}$ is an observable and known risk factor, $K_{ti} = 1$; otherwise, $K_{ti} = 0$ ($i = 2, \ldots, 100$). For some $i$, $K_{ti} = 1$ and $X_{tin}$ is a sufficient cause of $Y$. $\beta_{ti}$ is the estimated effect of $X_{tin}$ on $Y_{tn}$ if $K_{ti} = 1$. In summary, $X_{t1n}$ is observed and is not a sufficient cause of $Y$. Among the remaining 99 variables $X_{tin}$, at least one with $K_{ti} = 1$ is a sufficient cause of $Y$. When $K_{ti} = 0$, $X_{tin}$ is either unobservable or observable but not regarded as a risk factor.

2.3 Details of the simulation

To generate the low-to-moderately correlated variables $X_{t1n}, X_{t2n}, \ldots, X_{t100n}$, $V_{tin}$ and $U_{tn}$ were sampled independently from uniform $[0, 1)$ distributions: $V_{tin}$ comprises 500 × 100 × 50,000 independent samples, and $U_{tn}$ comprises 500 × 50,000 independent samples. Because $U_{tn}$ is shared by all variables of the same patient, the variables are correlated. $P_{ti}$ was sampled independently from a uniform $[0.3, 0.6)$ distribution (500 × 100 independent samples). For $j = 1, \ldots, 9$, $A_{tji}$ takes a random value drawn from a uniform $[0.5, 0.9)$ distribution (500 × 9 × 100 independent samples). $G_{tjin}$ is an unobserved value determined by the exact threshold, without misclassification: $G_{tjin} = 1$ if the linear combination $V_{tin}P_{ti} + U_{tn}(1 - P_{ti}) > A_{tji}$, and $G_{tjin} = 0$ if $V_{tin}P_{ti} + U_{tn}(1 - P_{ti}) \le A_{tji}$. $X_{tin}$ is the observed value, determined by the expected value of the uniform $[0.5, 0.9)$ threshold distribution, that is, 0.7, and therefore includes some misclassification: $X_{tin} = 1$ if $V_{tin}P_{ti} + U_{tn}(1 - P_{ti}) > 0.7$, and $X_{tin} = 0$ if $V_{tin}P_{ti} + U_{tn}(1 - P_{ti}) \le 0.7$.
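
The thresholding scheme above can be sketched for a single trial $t$ using only the standard library. This is a minimal sketch under assumptions: it uses Python's `random` module with a fixed seed, far fewer people than the 50,000 in the text, and the hypothetical function name `simulate_covariates`; it is not the authors' simulation code.

```python
import random


def simulate_covariates(n_people: int, n_vars: int = 100, n_causes: int = 9, seed: int = 0):
    """One simulated trial: return (G, X, A) following the thresholding scheme.

    latent = V * P + U * (1 - P); the per-person U is shared by all variables,
    which makes the variables of one person positively correlated.
    G[n][j][i] uses the exact thresholds A[j][i] (no misclassification);
    X[n][i] uses the fixed threshold 0.7, so it misclassifies some values.
    """
    rng = random.Random(seed)
    P = [0.3 + 0.3 * rng.random() for _ in range(n_vars)]  # U[0.3, 0.6)
    A = [[0.5 + 0.4 * rng.random() for _ in range(n_vars)] for _ in range(n_causes)]  # U[0.5, 0.9)
    G, X = [], []
    for _ in range(n_people):
        u = rng.random()  # U_tn ~ U[0, 1)
        latent = [rng.random() * p + u * (1 - p) for p in P]  # V_tin ~ U[0, 1)
        G.append([[int(s > A[j][i]) for i, s in enumerate(latent)] for j in range(n_causes)])
        X.append([int(s > 0.7) for s in latent])
    return G, X, A


G, X, A = simulate_covariates(n_people=1000, n_vars=5)
mismatch = sum(G[n][0][i] != X[n][i] for n in range(1000) for i in range(5))
print(f"{mismatch} of 5000 observed values differ from the j=1 truth")
```

Comparing `X` with `G` for any $j$ shows how often the observed indicator misclassifies the true one; the misclassification arises only because 0.7 replaces the exact threshold $A_{tji}$.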

$C_{tji}$ takes a random value drawn from a Bernoulli distribution with a success probability of 0.05. $C_{tji}$ comprises 500 × 9 × 100 independent samples, which indicate whether the linear combination $V_{tin}P_{ti} + U_{tn}(1 - P_{ti})$ is a component of the $j$th possible sufficient cause. Let $O_{tjn} = 1$ when all components of the $j$th possible sufficient cause are active in the $n$th observation of the $t$th trial, that is, when $\sum_{i=2}^{100} G_{tjin}C_{tji} = \sum_{i=2}^{100} C_{tji}$. If $\sum_{i=2}^{100} G_{tjin}C_{tji} < 2$ for some $t$ and $j$, all variables are redetermined through the same $j$th random process in the $t$th trial. $F_{tj}$ takes a random value drawn from a Bernoulli distribution with a success probability of 0.5; $F_{tj}$ comprises 500 × 9 independent samples, which indicate whether each of the nine possible sufficient causes is a real sufficient cause. If there is no real sufficient cause in a trial, that is, if $\sum_{j=1}^{9} F_{tj} = 0$ for some $t$, all variables are redetermined through the same random process in the $t$th trial.
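
The component and real-cause indicators, and the rule for $O_{tjn}$, can be sketched as follows. The sketch is a simplification under stated assumptions: the function names are hypothetical, index 0 stands for the exposure $X_{t1n}$ (never a component), only $F$ is redrawn when no cause is real, and the component-count redraw rule in the text is not reproduced. Note that under the equality above, a possible cause with no components would be active for every person; this sketch does not guard against that case.

```python
import random


def draw_causes(n_vars: int, rng: random.Random, n_causes: int = 9):
    """Draw C (component indicators) and F (real-cause indicators).

    C[j][i] ~ Bernoulli(0.05): variable i is a component of possible cause j.
    Index 0 is the exposure X_t1n, which is never a component.
    F[j] ~ Bernoulli(0.5): possible cause j is a real sufficient cause.
    Simplification: only F is redrawn when no cause is real.
    """
    C = [[0] + [int(rng.random() < 0.05) for _ in range(n_vars - 1)] for _ in range(n_causes)]
    F = [0] * n_causes
    while sum(F) == 0:
        F = [int(rng.random() < 0.5) for _ in range(n_causes)]
    return C, F


def active_causes(G_n, C):
    """O[j] = 1 iff all components of cause j are present for one person,
    i.e. sum_i G_n[j][i] * C[j][i] == sum_i C[j][i]."""
    return [
        int(sum(g * c for g, c in zip(G_n[j], C[j])) == sum(C[j]))
        for j in range(len(C))
    ]


print(active_causes([[1, 1, 0], [1, 0, 1]], [[0, 1, 1], [0, 0, 1]]))  # [0, 1]
```

In the example, cause 1 needs variables 1 and 2 but only variable 1 is present, so it is inactive; cause 2 needs only variable 2, which is present.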

$E_{tn}$ and $Q_{tn}$ take random values drawn from Bernoulli distributions with a success probability of 0.001. $E_{tn}$ comprises 500 × 50,000 independent samples, which indicate competing events for the outcome, and $Q_{tn}$ comprises 500 × 50,000 independent samples, which indicate a small error in the outcome. $Y_{tn} = 1$ if $Q_{tn} = 1$, or if $\sum_{j=1}^{9} O_{tjn}F_{tj} \ge 1$ and $E_{tn} = 0$; otherwise, $Y_{tn} = 0$. $K_{ti}$ takes a random value drawn from a Bernoulli distribution with a success probability of $0.1 + \sum_{j=1}^{9} C_{tji}F_{tj}$, which indicates whether $X_{ti}$ is an observable and known risk factor for the outcome.
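
The outcome rule and the draw of $K_{ti}$ can be written as two small functions. This is a sketch under assumptions: the function names are hypothetical, and the success probability for $K_{ti}$ is capped at 1 because $0.1 + \sum_j C_{tji}F_{tj}$ exceeds 1 whenever variable $i$ is a component of a real sufficient cause (the text does not say how this case is handled).

```python
import random


def outcome(O_n, F, e: int, q: int) -> int:
    """Y_tn = 1 if the error flag Q_tn is set, or if at least one real
    sufficient cause is active (sum_j O*F >= 1) and no competing event
    E_tn occurred; otherwise 0."""
    real_cause_active = sum(o * f for o, f in zip(O_n, F)) >= 1
    return int(q == 1 or (real_cause_active and e == 0))


def draw_known_flag(C, F, i: int, rng: random.Random) -> int:
    """K_ti ~ Bernoulli(0.1 + sum_j C[j][i] * F[j]).

    Assumption: the probability is capped at 1, because the sum exceeds 1
    whenever variable i is a component of a real sufficient cause.
    """
    p = min(1.0, 0.1 + sum(C[j][i] * F[j] for j in range(len(F))))
    return int(rng.random() < p)


rng = random.Random(0)
e, q = int(rng.random() < 0.001), int(rng.random() < 0.001)  # E_tn, Q_tn
print(outcome(O_n=[1, 0], F=[1, 1], e=e, q=q))
```

With the cap, every component of a real sufficient cause is always a known risk factor, and every other variable is (mistakenly) treated as one with probability 0.1.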

2.4 Adjustment by the new method

In the above model, there was no difference in the true proportion of outcome occurrence between the groups with and without exposure. However, the bias built into the model produced an apparent difference in the proportion of outcomes. To visualize the extent to which our proposed method adjusted for this bias, we simulated $P(Y_{tn}=1 \mid X_{t1n}=1) - P(Y_{tn}=1 \mid X_{t1n}=0)$ with and without adjustment.
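
The text does not give a formula for the adjusted difference. The sketch below assumes it is the difference in weighted outcome proportions, with the exact-matching weights as per-person weights and unit weights for the unadjusted estimate; the function name `risk_difference` is hypothetical.

```python
def risk_difference(y, x, w=None) -> float:
    """Weighted P(Y=1 | X=1) - P(Y=1 | X=0).

    y, x: 0/1 outcome and exposure per person; w: per-person weights
    (e.g. exact-matching weights). w=None gives the unadjusted estimate.
    """
    if w is None:
        w = [1.0] * len(y)
    num = {0: 0.0, 1: 0.0}
    den = {0: 0.0, 1: 0.0}
    for yi, xi, wi in zip(y, x, w):
        num[xi] += wi * yi
        den[xi] += wi
    if den[0] == 0 or den[1] == 0:
        raise ValueError("both exposure groups need a positive total weight")
    return num[1] / den[1] - num[0] / den[0]


y = [1, 1, 0]  # outcome
x = [1, 0, 0]  # exposure
print(risk_difference(y, x))                   # 0.5 (unadjusted)
print(risk_difference(y, x, [1.0, 1.0, 0.0]))  # 0.0 (third person unmatched)
```

Giving the unmatched person a weight of 0 removes them from the comparison, which moves the estimate to the true value of 0 in this toy example.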

2.5 Comparison with conventional methods

Two conventional methods, multivariate analysis and PS analysis, were compared with this method (Seeger et al. 2005; Schneeweiss et al. 2009). In the model, the true odds ratio of outcome occurrence for the exposed versus the unexposed group was 1. Specifically, the estimated odds ratios for the exposure, $\exp(\beta_{t1})$, were compared among the three methods. A type I error is the erroneous conclusion that there is a difference when, in fact, there is none; the probability of a type I error was likewise determined for each of the three methods. In the PS analysis, the PS itself was used as a covariate in the regression model because the variances in the two groups were not very different (Rubin 1979).
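
Because the true odds ratio is 1, every rejection of the null hypothesis in a simulated trial is a type I error, so the error probability can be estimated as the share of trials that reject. The sketch assumes a two-sided Wald test on the log odds ratio at the 0.05 level (the text does not state which test was used); the function name is hypothetical.

```python
from statistics import NormalDist


def type_i_error_rate(log_ors, ses, alpha: float = 0.05) -> float:
    """Share of simulated trials whose two-sided Wald test rejects OR = 1.

    log_ors: estimated log odds ratios for the exposure, one per trial;
    ses: their standard errors. Because the true OR is 1, every rejection
    is a type I error.
    """
    if len(log_ors) != len(ses) or not log_ors:
        raise ValueError("need equal, non-empty lists")
    z_crit = NormalDist().inv_cdf(1 - alpha / 2)  # about 1.96
    rejections = sum(abs(b / s) > z_crit for b, s in zip(log_ors, ses))
    return rejections / len(log_ors)


print(type_i_error_rate([0.0, 0.5, -0.5, 0.1], [0.1, 0.1, 0.1, 0.1]))  # 0.5
```

Over 500 trials, a rate of 6.6% corresponds to 33 rejections.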

2.6 Application to real-world data: data source

To demonstrate the methodology, the study cohort comprised anonymized data on individuals in the Kokuho database (KDB) of Nara Prefecture, Japan. These data included personal identifiers, dates, age groups, sex, descriptions of the procedures performed, World Health Organization International Classification of Diseases, 10th Revision (ICD-10) diagnosis codes, medical care received, medical examinations conducted (without their results), prescribed drugs, and specific health check-ups (including their results) from 2013 to 2021.

2.7 Study population

We included data on individuals who underwent specific health check-ups between April 2014 and March 2021. Individuals who had not been followed up during the preceding year and those who had been prescribed diabetes medications during the preceding year were excluded.

2.8 Definition of diabetes

Diabetes was defined using an algorithm validated for claims data in Japan (Nishioka et al. 2022). This algorithm for detecting people with diabetes (sensitivity 74.6%, positive predictive value 88.4%) has three elements: diagnosis-related codes for diabetes without the “suspected” flag, medication codes for diabetes, and the presence of these two codes on the same record. The algorithm cannot detect people with diabetes who have not consulted a doctor or who receive diet and exercise therapy only, but it can identify most of those on medication.
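
The three elements of the claims-based definition can be sketched as a filter over claim records. Everything concrete here is an assumption for illustration: the `ClaimRecord` structure is hypothetical, ICD-10 categories E10 to E14 stand in for the diagnosis-related codes, and `"DM_DRUG"` is a placeholder medication code; the validated code lists are those in Nishioka et al. (2022).

```python
from __future__ import annotations

from dataclasses import dataclass, field

# Assumption: ICD-10 categories E10-E14 stand in for "diagnosis-related codes
# for diabetes"; the validated code lists are in Nishioka et al. (2022).
DM_ICD10_PREFIXES = ("E10", "E11", "E12", "E13", "E14")


@dataclass
class ClaimRecord:  # hypothetical structure of one claim record
    patient_id: str
    diagnoses: list[tuple[str, bool]] = field(default_factory=list)  # (ICD-10 code, suspected?)
    drug_codes: set[str] = field(default_factory=set)


def patients_with_diabetes(records, dm_drug_codes: set[str]) -> set[str]:
    """Flag a patient when one record holds both a non-suspected diabetes
    diagnosis and a diabetes medication code."""
    flagged = set()
    for r in records:
        confirmed_dx = any(
            code.startswith(DM_ICD10_PREFIXES) and not suspected
            for code, suspected in r.diagnoses
        )
        if confirmed_dx and r.drug_codes & dm_drug_codes:
            flagged.add(r.patient_id)
    return flagged


records = [
    ClaimRecord("p1", [("E11.9", False)], {"DM_DRUG"}),  # flagged
    ClaimRecord("p2", [("E11.9", True)], {"DM_DRUG"}),   # suspected only
    ClaimRecord("p3", [("E11.9", False)]),               # no drug on this record
    ClaimRecord("p3", [("I10", False)], {"DM_DRUG"}),    # drug, but no DM code
]
print(patients_with_diabetes(records, {"DM_DRUG"}))  # {'p1'}
```

Patient p3 has both a diagnosis and a medication, but on different records, so the same-record requirement excludes them.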

2.9 Effect of the specific health guidance

Among those who received the specific health check-up during the six-month period, those who received health guidance were identified and designated as the health guidance group. Individuals whose age, sex, body mass index (BMI), abdominal circumference, medical expenses in the past year, number of days of outpatient visits in the past year, and number of days of hospitalization in the past year matched those of the health guidance group were identified and designated as the control group. Odds ratios for new-onset diabetes were calculated for the health guidance group relative to the control group using generalized estimating equations, assuming a binomial distribution for the diabetes outcome and a logit link function. In an additional analysis, average medical expenditures per person per month were tracked over time in both groups. All statistical tests were two-tailed, and P-values < 0.05 were considered statistically significant. All statistical analyses were performed using Microsoft SQL Server 2016 Standard (Microsoft Corp., Redmond, WA, USA) and IBM SPSS Statistics for Windows, version 27.0 (IBM, Armonk, NY, USA).
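
The study estimated odds ratios with generalized estimating equations, which can account for correlation among clustered observations. As a simpler cross-check that ignores any such clustering (a swapped-in technique, not the method used in the study), the odds ratio and its Woolf (log-scale Wald) confidence interval can be computed directly from a 2 × 2 table of counts. The function name and the example counts are hypothetical.

```python
import math
from statistics import NormalDist


def odds_ratio_woolf(a: float, b: float, c: float, d: float, level: float = 0.95):
    """Odds ratio (a*d)/(b*c) with a Woolf confidence interval.

    a: guidance, diabetes    b: guidance, no diabetes
    c: control, diabetes     d: control, no diabetes
    """
    if min(a, b, c, d) <= 0:
        raise ValueError("all four cells must be positive")
    log_or = math.log((a * d) / (b * c))
    se = math.sqrt(1 / a + 1 / b + 1 / c + 1 / d)
    z = NormalDist().inv_cdf(0.5 + level / 2)
    return math.exp(log_or), math.exp(log_or - z * se), math.exp(log_or + z * se)


or_, lo, hi = odds_ratio_woolf(a=10, b=90, c=20, d=80)  # hypothetical counts
print(f"OR {or_:.2f} (95% CI {lo:.2f}-{hi:.2f})")
```

If a GEE fit on the same matched data gives a noticeably wider interval than this calculation, the clustering is contributing to the uncertainty.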

3 Results

3.1 Adjustment by the new method

Figure 1 shows the estimated $P(Y_{tn}=1 \mid X_{t1n}=1) - P(Y_{tn}=1 \mid X_{t1n}=0)$. The new method tended to bring the estimates closer to the true value of 0 and reduced their variation.

3.2 Comparison with conventional methods

Figure 2 shows the estimated odds ratios of outcome occurrence for the exposed versus the unexposed group under the three methods. Compared with the multivariate model and PS, the estimates from the new method were closer to the true odds ratio of 1 and less scattered. Table 1 shows the probability of a type I error when the odds ratios were estimated using the three methods and the univariate model. The probability of a type I error was 6.6% for the new method, whereas it was 97% or higher for the univariate, multivariate, and PS models.

Table 1 Number of type I errors of each adjustment method

3.3 Effects of specific health guidance in preventing the onset of diabetes mellitus

In total, 6332 individuals (4964 duplicate pairs) were enrolled. Of these, 240 were diagnosed with diabetes. Table 2 shows the characteristics of the study cohort. The odds ratio for type 2 diabetes among participants who received specific health guidance, compared with those who did not, was 0.75 (95% CI 0.58–0.97).
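
As a consistency check on the reported estimate, the standard error and an approximate p-value can be recovered from the odds ratio and its 95% CI. The sketch assumes the interval is a symmetric Wald interval on the log-odds scale (the text does not state how it was constructed); because the published bounds are rounded, the result is approximate. The function name is hypothetical.

```python
import math
from statistics import NormalDist


def wald_from_ci(odds_ratio: float, lower: float, upper: float, level: float = 0.95):
    """Approximate SE of log(OR), z statistic, and two-sided p-value from a
    reported OR and its confidence interval, assuming a symmetric Wald CI on
    the log scale."""
    z_crit = NormalDist().inv_cdf(0.5 + level / 2)
    se = (math.log(upper) - math.log(lower)) / (2 * z_crit)
    z = math.log(odds_ratio) / se
    p = 2 * (1 - NormalDist().cdf(abs(z)))
    return se, z, p


se, z, p = wald_from_ci(0.75, 0.58, 0.97)
print(f"SE(log OR) = {se:.3f}, z = {z:.2f}, p = {p:.3f}")
```

For OR 0.75 (0.58–0.97), this gives SE(log OR) of roughly 0.13, z of about -2.2, and p of about 0.03, consistent with an interval that excludes 1.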

Table 2 Background after exact-matching algorithm using AHCDEFs

3.4 Effects of specific health guidance in reducing medical expenditures

Figure 3 presents medical expenditures per person per month with and without specific health guidance. After adjustment for background factors, medical expenditures were lower in the group that received health guidance throughout the study period. For comparison with conventional methods, the results of simple aggregation and PS matching are shown in the Supplementary Figure. These results differ from those obtained with the new method, and their baseline adjustment was considered inadequate.

4 Discussion

In this study, we showed that our proposed novel method for real-world data returned improved estimates and fewer type I errors than conventional methods. Using this new method, we also quantitatively demonstrated the effectiveness of specific health guidance in Japan in preventing the onset of diabetes and reducing medical expenditures over five years.

In contrast, most previous studies have not shown the effectiveness of specific health guidance in Japan. Creating reliable evidence from complex longitudinal data is not easy, and many studies may have design flaws (Hoffman et al. 2022; Groenwold 2021). The most important reason we were able to demonstrate the effectiveness of specific health guidance for the first time is that we did not restrict the target trial to a feasible RCT. Background factors cannot be perfectly matched in an RCT, yet only by matching them can we make the best use of the information in the measured factors. Specifically, even interactions of arbitrary order between background factors can be incorporated into the model and adjusted for. By continually refining the design toward an ideal study within the TTE framework, various types of evidence can be generated that take advantage of the strengths of observational studies.

All studies have unmeasured confounding factors. A high rate of type I errors has been reported for observational studies (Liang et al. 2014), and the same result was obtained in this study. However, with our proposed exact-matching algorithm using AHCDEFs, the probability of a type I error was 6.6%, much lower than that of the conventional methods. Although this is still slightly higher than the nominal 5% level, we confirmed that the type I error can be substantially reduced by refining the design using our method.

This study is the first to demonstrate the effectiveness of specific health guidance in reducing the incidence of diabetes and medical expenditures in Japan. In the United States, evidence suggests that multicomponent behavioral interventions in adults with obesity can lead to clinically significant reductions in the incidence of type 2 diabetes among such adults and among those with elevated plasma glucose levels (US Preventive Services Task Force et al. 2018). In 2008, Japan introduced a screening program to identify individuals with obesity and metabolic syndrome (Tsushita et al. 2018). All adults aged 40–74 years were required by law to participate every year (Fukuma et al. 2020). Because an RCT of this national health guidance intervention cannot be assembled, this study used an appropriate observational design to estimate its impact.

RCTs are designed to answer a single question and are expensive, time-consuming, and resource-intensive. In principle, it is not possible to randomize a large population that includes people with various comorbidities and other confounding factors, and owing to resource constraints, the patients included are generally younger and have fewer comorbidities. Therefore, the results are not immediately generalizable, and the resulting selection bias compromises representativeness. Despite these major limitations, “evidence” is currently being mass-produced worldwide from sub-analyses and stratified analyses of past RCTs, although each RCT should answer only one question, and this evidence is used in daily practice. However, observational studies, when properly designed, have multiple strengths that make them a suitable complement to RCTs for decision-making.

Despite the notable findings of this study, several limitations must be acknowledged. First, time-dependent confounding factors were not controlled for. However, this method has great potential in this respect: based on the TTE concept, within groups perfectly matched on AHCDEFs, A × B comparisons can be performed between the A individuals with exposure and the B individuals without exposure, each weighted 1/(A × B). For each comparison, the end of observation of the comparison partner is added to the criteria for ending observation (usually the occurrence of an outcome, the end of the study period, or withdrawal from the study), allowing control for time-dependent confounding. Second, this study did not model the administrative claims databases; therefore, compatibility with high-dimensional propensity score adjustment could not be examined (Schneeweiss et al. 2009). In principle, the two methods are expected to be highly compatible, and future studies should apply this method to perfectly match the variables selected by the high-dimensional propensity score algorithm. Finally, regarding generalizability, this method broadly sacrifices generalizability to increase comparability. However, it is considered less susceptible to random errors than conventional methods, even when stratified analyses are performed, which makes it possible to make comparisons based on the circumstances of individual patients. Observational studies have significant limitations in dealing with unmeasured confounding and cannot replace RCTs. Although this method is effective for research questions for which an RCT would be difficult to conduct, an RCT should, of course, be conducted whenever it is feasible.
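
The time-dependent extension described above is proposed rather than implemented in this study. A minimal sketch of its two ingredients follows, under the assumption that adding the partner's end of observation to the end-of-observation criteria means each member of a comparison is censored when its partner's observation ends; the function names are hypothetical.

```python
from itertools import product


def matched_pair_comparisons(exposed_ids, unexposed_ids):
    """All A*B exposed-unexposed comparisons within one stratum of
    perfectly matched AHCDEFs, each weighted 1/(A*B)."""
    a, b = len(exposed_ids), len(unexposed_ids)
    if a == 0 or b == 0:
        return []
    w = 1.0 / (a * b)
    return [(e, u, w) for e, u in product(exposed_ids, unexposed_ids)]


def pair_end_of_observation(own_end: float, partner_end: float) -> float:
    """Observation of either member stops at its own end point (outcome,
    study end, or withdrawal) or when its comparison partner's observation
    ends, whichever comes first."""
    return min(own_end, partner_end)


pairs = matched_pair_comparisons(["e1", "e2"], ["u1", "u2", "u3"])
print(len(pairs), sum(w for _, _, w in pairs))  # 6 comparisons; weights sum to about 1
print(pair_end_of_observation(own_end=24.0, partner_end=9.5))  # 9.5
```

Because both members of a comparison stop contributing at the same time, each comparison covers a common follow-up window, and each matched stratum contributes a total weight of 1.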

5 Conclusions

We propose a new method for analyzing real-world data: an exact-matching algorithm using AHCDEFs. The larger the number of patients available for analysis, the more AHCDEFs can be matched, thereby removing the influence of confounding factors. This method is expected to generate significant evidence when applied to real-world data. In doing so, it will be important to clarify in detail the problems that may arise when applying this method.

Data availability

The large datasets used and analyzed during the simulation study are available from the corresponding author on reasonable request. The KDB datasets generated and analyzed during the Japanese health guidance study are available from Nara Prefecture, Japan, but restrictions apply to their availability; they were used under license for the current study and are therefore not publicly available. The data are provided only to applicants in Japan who have received permission in advance.

Abbreviations

AHCDEFs: Administrative Health Claims Database Equivalence Factors
BMI: Body Mass Index
EMA: Exact-Matching Algorithm using AHCDEFs
HbA1c: Hemoglobin A1c
HSLRG: Health Science and Labour Research Grants
RCT: Randomized Controlled Trial
SD: Standard Deviation
TTE: Target Trial Emulation

References

Baumfeld Andre, E., Reynolds, R., Caubel, P., Azoulay, L., Dreyer, N.A.: Trial designs using real-world data: The changing landscape of the regulatory approval process. Pharmacoepidemiol. Drug Saf. 29, 1201–1212 (2020)
Frieden, T.R.: Evidence for health decision making - Beyond randomized, controlled trials. N. Engl. J. Med. 377, 465–475 (2017)
Fukuma, S., Iizuka, T., Ikenoue, T., Tsugawa, Y.: Association of the national health guidance intervention for obesity and cardiovascular risks with health outcomes among Japanese men. JAMA Intern. Med. 180, 1630–1637 (2020)
Groenwold, R.H.H.: Trial emulation and real-world evidence. JAMA Netw. Open 4, e213845 (2021)
Hernán, M.A., Robins, J.M.: Using big data to emulate a target trial when a randomized trial is not available. Am. J. Epidemiol. 183, 758–764 (2016)
Hoffman, K.L., Schenck, E.J., Satlin, M.J., Whalen, W., Pan, D., Williams, N., Díaz, I.: Comparison of a target trial emulation framework vs cox regression to estimate the association of corticosteroids with COVID-19 mortality. JAMA Netw. Open 5, e2234425 (2022)
Liang, W., Zhao, Y., Lee, A.H.: An investigation of the significance of residual confounding effect. BioMed Res. Int. 2014, 658056 (2014)
Madenci, A.L., Wanis, K.N., Cooper, Z., Haneuse, S., Subramanian, S.V., Hofman, A., Hernán, M.A.: Strengthening health services research using target trial emulation: An application to volume-outcomes studies. Am. J. Epidemiol. 190, 2453–2460 (2021)
Matthews, A.A., Danaei, G., Islam, N., Kurth, T.: Target trial emulation: applying principles of randomised trials to observational studies. BMJ 378, e071108 (2022)
Mellinghoff, S.C., Bruns, C., Al-Monajjed, R., Cornely, F.B., Grosheva, M., Hampl, J.A., et al.: Harmonized procedure coding system for surgical procedures and analysis of surgical site infections (SSI) of five European countries. BMC Med. Res. Methodol. 22, 225 (2022)
Muramoto, A., Matsushita, M., Kato, A., Yamamoto, N., Koike, G., Nakamura, M., et al.: Three percent weight reduction is the minimum requirement to improve health hazards in obese and overweight people in Japan. Obes. Res. Clin. Pract. 8, e466–e475 (2014)
Nishioka, Y., Takeshita, S., Kubo, S., Myojin, T., Noda, T., Okada, S., et al.: Appropriate definition of diabetes using an administrative database: a cross-sectional cohort validation study. J. Diabetes Investig. 13, 249–255 (2022)
Rubin, D.B.: Using multivariate matched sampling and regression adjustment to control bias in observational studies. J. Am. Stat. Assoc. 74, 318–328 (1979)
Schneeweiss, S., Rassen, J.A., Glynn, R.J., Avorn, J., Mogun, H., Brookhart, M.A.: High-dimensional propensity score adjustment in studies of treatment effects using health care claims data. Epidemiology 20, 512–522 (2009)
Seeger, J.D., Williams, P.L., Walker, A.M.: An application of propensity score matching using claims data. Pharmacoepidemiol. Drug Saf. 14, 465–476 (2005)
Sun, X., Tan, J., Tang, L., Guo, J.J., Li, X.: Real world evidence: Experience and lessons from China. BMJ 360, j5262 (2018)
Takeuchi, Y., Kumamaru, H., Hagiwara, Y., Matsui, H., Yasunaga, H., Miyata, H., Matsuyama, Y.: Sodium-glucose cotransporter-2 inhibitors and the risk of urinary tract infection among diabetic patients in Japan: Target trial emulation using a nationwide administrative claims database. Diabetes Obes. Metab. 23, 1379–1388 (2021)
Tsushita, K., Hosler, A.S., Miura, K., Ito, Y., Fukuda, T., Kitamura, A., et al.: Rationale and descriptive analysis of specific health guidance: the nationwide lifestyle intervention program targeting metabolic syndrome in Japan. J. Atheroscler. Thromb. 25, 308–322 (2018)
US Preventive Services Task Force, Curry, S.J., Krist, A.H., Owens, D.K., Barry, M.J., Caughey, A.B., et al.: Behavioral weight loss interventions to prevent obesity-related morbidity and mortality in adults: US preventive services task force recommendation statement. JAMA 320, 1163–1171 (2018)
Xie, Y., Bowe, B., Gibson, A.K., McGill, J.B., Maddukuri, G., Yan, Y., Al-Aly, Z.: Comparative effectiveness of SGLT2 inhibitors, GLP-1 receptor agonists, DPP-4 inhibitors, and sulfonylureas on risk of kidney outcomes: Emulation of a target trial using health care databases. Diabetes Care 43, 2859–2869 (2020)

Acknowledgements

We would like to thank Editage [http://www.editage.com] for editing and reviewing this manuscript.

Funding

This study was supported by Health Science and Labour Research Grants (HSLRG) [Grant Number: 21IA1006] of the Ministry of Health, Japan Diabetes Society Junior Scientist Development Grant supported by Novo Nordisk Pharma Ltd. (2021–2022), and the Japan Society for the Promotion of Science KAKENHI (grant numbers: JP18K17390, JP18H04126, and JP22H03355).

Author information

Authors and Affiliations

Department of Public Health, Health Management and Policy, Nara Medical University, 840 Shijyo-cho, Kashihara City, Nara, 634-8521, Japan
Yuichi Nishioka, Emiri Morita, Saki Takeshita, Sakura Tamamoto, Tomoya Myojin, Tatsuya Noda & Tomoaki Imamura
Department of Diabetes and Endocrinology, Nara Medical University, Kashihara, Nara, Japan
Yuichi Nishioka

Contributions

YN led the simulation and application of the method to real-world data and wrote the first draft of the paper. EM was involved in the construction and analysis of the simulation database. ST1 analyzed the simulation data and revised the manuscript. ST2 performed the analysis using real-world data. TM and SK, as database experts, created the environment for this study and provided advice on the analysis. TN, as an epidemiological expert, provided advice on the analysis of this study and assisted in the interpretation of the results. TI was responsible for the laboratory work, reviewed all analyses, interpreted the results, and revised the manuscript. All authors have reviewed the final manuscript and agree to the submission.

Corresponding author

Correspondence to Yuichi Nishioka.

Ethics declarations

Conflict of interest

YN received consultation fees from Novo Nordisk. The other authors declare that they have no conflicts of interest.

Ethical approval

All methods were conducted in accordance with relevant guidelines and regulations. All experimental protocols were approved by the Ethics Committee of Nara Medical University (approval no. 1123–7, October 8th, 2015). The need for informed consent was waived in view of the study design; all patient data were anonymized before analysis.

Consent to participate

Not applicable.

Additional information

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Supplementary Information

Below is the link to the electronic supplementary material.

Supplementary file1 (DOCX 50 kb)

Rights and permissions

Open Access This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made. The images or other third party material in this article are included in the article's Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article's Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by/4.0/.

About this article

Cite this article

Nishioka, Y., Morita, E., Takeshita, S. et al. Exact-matching algorithms using administrative health claims database equivalence factors for real-world data analysis based on the target trial emulation framework. Health Serv Outcomes Res Method (2024). https://doi.org/10.1007/s10742-024-00322-9

Received: 22 December 2022
Accepted: 11 January 2024
Published: 02 February 2024
DOI: https://doi.org/10.1007/s10742-024-00322-9
