Introduction

Comorbidity is coexistence of multiple diseases within an individual [1], and is widely acknowledged as a significant role when analyzing the patients’ health status. With the ongoing demographic shift towards an aging population, the prevalence of comorbidities among patients has become increasingly commonplace [2, 3]. Individuals suffering from comorbidity often experience a diminished quality of life and encounter various hurdles when it comes to accessing healthcare services [4, 5]. The treatment efforts and strategies required for managing multiple diseases may differ from those for treating them independently, potentially influencing the patient’s treatment plan, prognosis, and management [6], making it crucial to fully consider comorbidity-related information during the treatment process for effective patient management [7]. Comorbidities are often non-randomly clustered, and the co-occurrence of comorbid conditions may stem from shared risk factors [8]. One disease can be a consequence of another disease or its treatment [9] Thus, identifying patterns in the progression of comorbidity among patients offers valuable insights to guide prevention, treatment, and prognosis improvement.

Over the past decade, the impact of comorbidity has been an emerging research field [2]. A wide range of resources recorded in temporal electronic health records (tEHRs), are utilized for conducting patient comorbidity analysis [10, 11]. This underscores the importance of leveraging the comprehensive data from tEHRs to better understand and address the complexities of comorbid conditions. The characteristics of tEHRs, including longitudinal nature, temporality, and multi-granularity [12], along with patient heterogeneity and intricate interrelationships among diseases, pose challenges to the analysis of conducting comorbidity analysis at both individual and population levels. Specifically, clinical practitioners face tEHR-related information overload [12], impeding comorbidity progression analysis due to data diversity and dynamic patient disease statuses. Complex comorbidity interconnections and reliance on current data add to the challenge, necessitating improved approaches for understanding and treatment. Therefore, the development of technology capable of effectively integrating patients’ longitudinal, irregular, and temporally dynamic clinical data, accurately presenting the progression of comorbidities, and providing decision support for subsequent clinical tasks, is of paramount importance.

To elucidate the intricate dynamics of comorbidities and to underpin the significance of such technological advancements, it is pivotal to delve into the insights offered by network analysis. Network analysis, by unraveling the interconnectedness of various health conditions, provides a robust framework for comprehending comorbidity interrelationships [13], thereby reinforcing the necessity for advanced technological solutions that can meet these analytical demands. Various networks have been constructed by connecting diseases with shared occurrence frequencies and other characteristics including the degree of coexistence [14], the number of outflow patients [15], distance or correlation of disease [3, 16, 17], and treatment results [18], et al. Khan et al. [10] first introduced the analysis of chronic disease comorbidity networks to understand disease progression. Building upon this foundational work, they extended their research to the realm of comorbidity prediction [11, 19]. In doing so, they have underscored the significant potential of comorbidity network analysis with machine learning (ML) techniques to enhance risk prediction models. Building upon this momentum, Xu et al. [20] introduced a novel methodology, embedding disease knowledge for improved self-inflicted harm forecasting. Lu et al. [21] proposed transforming patient-disease bipartite graphs into disease networks and predicting comorbidities, exhibiting an exceptional performance. Continuing this innovative trend, Hu et al. [22] applied comorbidity network and ML techniques to predict patient stay length effectively. Yang et al. [17] constructed a phenotype comorbidity network and a cost network, generating three new network features to capture the association between comorbidity and high medical expenses. This work highlights the growing interest in understanding not just the medical aspects of comorbidities, but also their economic implications. Choudhary and Fr\(\ddot{a}\)nt [23] established a disease co-occurrence network, determining temporal disease order and associations to predict comorbidities. This advancement not only refines the predictive capabilities of comorbidity analysis but also exemplifies the ongoing evolution of research in this domain. Another type of research into comorbidities involves understanding them at the molecular level by constructing gene networks or protein networks. Yan et al. [24] utilized protein-protein networks, key host factor-transcription factor networks, and a series of techniques to screen for core host factors and pathways associated with the comorbidity of asthma/COVID-19. Lakshmi et al. [25] utilized a weighted association rule mining approach to identify comorbidities with significant support and confidence levels by correlating protein interaction data, disease pathway information, gene ontology, and clinical disease profiles. Nevertheless, their methodology did not incorporate longitudinal patient data or account for the temporal relationships between diseases.

Despite the progress made in comorbidity network research, there are still several unresolved issues. First, some studies concentrate exclusively on the co-occurrence of diseases [26], thereby overlooking disease temporal relationships or neglect statistical significance. For instance, the work [16] and [26], only uses the comorbid diseases to construct a network, without considering the time dependency between diseases and whether the association between diseases is statistically significant. Second, most comorbidity studies highlight the methodologies employed or the goals pursued. Interpreting healthcare patterns from the perspective of technology enhanced health records as a unified goal [27], population health research [27, 28] utilizes tEHRs to analyze health outcomes, patterns, or care modalities at the population level, in order to address issues in impactful healthcare policy and interventions, thereby enhancing the understanding of population health [27]. Lastly, existing approaches, by employing single network [16, 26], neglecting the complexities of patient heterogeneity and individual disease progressions in their disease risk assessment [29]. Therefore, this study aims to provide countermeasures for the aforementioned three problems and offer clinical decision support.

To address the limitations inherent in current methodologies, it is imperative that we delve deeper into the more specific challenges inherent in the task of disease risk assessment, especially the critical issue of comorbidity prediction, which involves a myriad of complexities. This consideration brings us to the third and complex challenge - comorbidity prediction - which necessitates a comprehensive approach that can account for the inherent variability and non-linearity in the manifestation and evolution of comorbid conditions. The intricacies of comorbidity prediction demand a sophisticated and adaptive predictive framework to effectively address the complexities involved.

Currently, research in comorbidity prediction mainly relies on supervised learning methods [30]. While these methods have achieved notable improvements in prediction accuracy, they also have significant limitations. Firstly, supervised learning requires precisely labeled datasets, and the labeling process demands annotators with extensive medical expertise, making it cumbersome, costly, and time-consuming [31]. This limitation restricts the scale of datasets, thereby impacting the efficiency and quality of model training. Secondly, existing models often rely on data from a single patient visit [32], failing to fully utilize information from multiple visits. This results in an inability to comprehensively capture the dynamic changes in a patient’s health status and the continuity of disease progression [12], thus limiting prediction accuracy. Additionally, current methods overlook the temporal dependency of a patient’s health condition, which continuously changes over time, and the occurrence of comorbidities is closely related to long-term health changes [33]. A critical flaw in existing comorbidity prediction models is the neglect of potential interactions between different diseases. Most methods simplify comorbidity prediction to binary or multiple classification problem [30, 34], without fully leveraging the potential relationships between comorbidities. Although some studies attempt to use information from similar patients for auxiliary diagnosis [3] - where similar patients share key medical characteristics such as disease type, symptoms, genetic background, treatment response, shared environmental risk factors, and similar physiological and biochemical indicators - there is currently a lack of a unified and comprehensive similarity measurement method. Existing measurement techniques are often limited to a single dimension and fail to comprehensively reflect a patient’s overall health status and its dynamic changes over time [3].

In addressing the limitations of extant research, this study is poised to resolve the following core issues: (1) In what way does the investigation delve into constructing a Temporal Comorbidity Network (TCN) with longitudinal tEHRs from various patient encounters? This endeavor seeks to elucidate the underlying interconnections between comorbidities. This network is intended to reflect personalized health conditions at the individual level and demonstrate broader trends in disease progression at the population level. (2) From a patient population perspective, what advancements does this study make in exploring the application of the TCN for patient stratification and the discovery of comorbidity patterns within the cohorts? This approach is designed to provide a more nuanced understanding of how different diseases co-occur and interact over time within patient populations. The TCN framework allows for the visualization of complex health trajectories and the identification of at-risk groups that may benefit from targeted interventions. (3) From an individual patient perspective, how does this study undertake an individual-centric examination of the TCN for the purpose of comorbidity prediction? This work leverages the quantification of patient similarity as a pivotal step within the predictive process, with the goal of pinpointing similar cases that can significantly bolster the precision of clinical diagnostics. In a pioneering approach, this research introduces a novel comorbidity prediction method (TCN-DM) based on the Distance-Matched Temporal Comorbidity Network, achieving improved predictions for future patient visit comorbidities. This innovative method demonstrates a marked enhancement in the predictive accuracy of comorbidities for forthcoming patient visits. These endeavors are directed towards the enhancement of prognostic fidelity and the fortification of clinical decision-making with empirically grounded insights.

The study’s methodological approach holds promise for practical applications in clinical settings. Suppose we have a patient, Patient A, who was hospitalized due to heart disease. By constructing a TCN, we can compare Patient A’s medical records with those of other patients in the hospital database to identify common comorbidities among heart disease patients, such as diabetes or hypertension. This allows physicians to assess these potential risks in advance and provide corresponding preventive measures before Patient A is discharged. Physicians can utilize the TCN to analyze patient groups similar to Patient A and understand how these groups respond to different treatment regimens. For instance, if the TCN reveals that patients similar to Patient A respond well to a specific antihypertensive medication, the physician can recommend this medication for Patient A. Using clustering methods, patients can be categorized into different risk levels. For example, if Patient A is classified into a high-risk group, he may require more frequent follow-ups. Physicians can tailor a detailed follow-up plan for Patient A based on the analysis results of the TCN, including the timing of follow-ups and the necessary examination items. Through typical prescription analysis, physicians can understand the drug combinations and dosages used by other patients in similar situations. This can help physicians formulate a more precise drug treatment plan for Patient A and adjust the medication during follow-ups based on Patient A’s response. Utilizing survival analysis, physicians can predict the progression of Patient A’s disease and the probability of survival. This helps physicians determine the frequency and urgency of follow-ups. The TCN can assist physicians in identifying situations that require interdisciplinary collaboration. For example, if Patient A’s heart disease is related to kidney function, the TCN can prompt the physician to collaborate with a nephrologist to jointly develop a treatment plan. This example illustrates the diverse applications of TCN in a clinical setting, which helps improve the quality and efficiency of patient care while providing physicians with deeper insights to optimize treatment and follow-up strategies. Here are the contributions of this study:

  1. 1.

    A temporal comorbidity network construction method is proposed for constructing TCNs from longitudinal patient diagnosis data, ensuring statistically significant disease connections and reflecting temporal order, allowing for both individual and population-level networks.

  2. 2.

    Utilizing TCNs for patient stratification revealed distinct subtypes with significant differences, offering valuable insights into comorbidity progression and aiding clinical practitioners’ understanding.

  3. 3.

    A novel comorbidity prediction metho d, TCN-DM, utilizes distance-matched temporal comorbidity networks to enhance accuracy, integrating similar patients and leveraging comorbidity relationships, outperforming existing methods.

While applicable to various diseases, this study uses heart failure (HF) as an illustrative case due to its clinical complexity [35] and its global prominence as a major public health concern [36]. The interplay between HF treatment and comorbidities is particularly significant, as it can reciprocally affect each other and influence treatment outcomes [37]. While traditional personalized electrocardiogram (ECG) signal detection is valuable [38], the scope of information provided by tEHRs is much broader, including comprehensive medical histories and demographic details [39]. These data are crucial for a thorough evaluation of comorbidity conditions in HF patients. The study addresses the challenges of analyzing comorbidity progression in HF patients, highlighting the importance of well-informed decision-making for both patients and clinicians [39]. By leveraging the extensive information available in tEHRs, this research aims to enhance our understanding of comorbidities and contribute to more effective clinical management strategies.

The remaining sections will be arranged as follows. Section 2 covers preliminary information on notations, definitions, problem formalization, as well as dataset and cohort selection. Section 3 details the methodology. Section 4 showcases the experimental results and discusses the outcomes. Section 5 delves into the implication of this work. Section 6 summarize the work and highlight the future works.

Preliminary

Table 1 Notations used in this work

Problem formalization

With the notations in Table 1, given a dataset of patient \(P=\{P_{i}|i=1,2,\cdots ,n\}\), we obtain individual’s TCN \(\mathcal {G}_i=f_{1}(\mathcal {D}_{i})\) of patient \(P_{i}\), where \(f_{1}(\bullet )\) is individual’s TCN construction function. This study obtains patient feature vector \(h_{i}=f_{2}(\mathcal {G}_{i},S_{i})\)(\(h_{i}\in H\)) of patient \(P_{i}\) by constructing the features of patient using \(\mathcal {G}_{i}\) and demographics \(S_{i}\), where \(f_{2}(\bullet )\) is feature construction function, H is the patient feature vectors of P. Moreover, this study obtains population’s TCN \(\mathcal {G}=f_{3}(\{\mathcal {G}_{i}|i\in n\})\) by fusion the TCNs, \(f_{3}(\bullet )\) is a population’s TCN construction function. Finally, we aim to stratify patients into several sub-phenotypes \(f_{4}(H)=\{H_{X_{1}},H_{X_{2}},\cdots ,H_{X_{K}}\}\), \(X_{i}\cap X_{j}={\text{\O }}\), \(X=X_1\cup X_2\cup \cdots \cup X_K(i\ne j; i=1,2,\cdots ,K;j=1,2,\cdots ,K)\), \(f_{4}(\bullet )\) denoted clustering algorithm, K is distinct sub-phenotypes number, and \(H_{X_{i}}\) consisted of patient feature vectors from k-th sub-phenotype. In addition, this study also aims to predict comorbidities of a patient \(P_{i}\) during their upcoming visit, denoted as \(f_{5}(P_{i},\mathcal {G})=\{v_{j}|v_{j}\in ICD,j\in N\}\), where \(f_{5}(\bullet )\) is prediction function.

Dataset and cohort selection

We constructed our dataset by retrieving relevant data from the publicly available MIMIC-III dataset [40], spanning the period from 2001 to 2012. The dataset includes more than 40,000 patients admitted to the ICUs at the Beth Israel Deaconess Medical Center. Specifically, we focused on patients with two or more admissions and those with an ICD-9 code (428) indicating heart failure (HF) disease [41]. Our cohort consists of a total of 2,362 HF patients. These patients had a total of 6,747 recorded visits, with an average of 2.85 visits per patient, ranging from 2 to 23. Among these patients, 1,442 had exactly 2 visits, while 920 had more than 2 visits. In terms of patient demographics, we included age, gender, race, and insurance type. To account for changes over time, we used the final age recorded, calculated as the average of the last five visits for each patient.

Methodology

Figure 1 illustrates the study’s flowchart, comprising three critical modules: (1) TCN construction with tEHRs. (2) Patient stratification and sub-phenotypes analysis based on TCN. (3) Comorbidity prediction using TCN. The purpose of TCN component was to model the patient’s time-dependent diagnosis, enabling to reflect the patient’s comorbidity progression over all visits and generating a feature vector that characterizing the patient. This component includes three sub-modules: creating TCP, constructing individual’s TCN and extracting patient feature vector. The second component is patient stratification, which mainly includes patient clustering and stratification results analysis. The analysis tasks can be divided into preliminary analysis and typical prescriptions analysis. The third component consists of the fusion of individual’s TCN to obtain population’s TCN, and the prediction of patient comorbidity.

Fig. 1
figure 1

The framework of this study

Temporal comorbidity network construction

In order to convert longitudinal and temporal patient diagnosis data into TCN, this study presents a TCN construction method. We first extract TCP between comorbidity and form temporal comorbidity pair set TCPs in adjacent visits. We performed pairwise statistical analysis to detect comorbidity pairs with statistical significance in the cohort using Chi-square test. Only TCPs in the cohort with statistically significant correlations were retained. We used binomial test to determine the sequential directionality of TCPs. A preferred directionality (\(<v_{i}\rightarrow v_{j}>\) or \(<v_{j}\rightarrow v_{i}>\)) was established for those TCPs exhibiting a p-value<0.05. TCPs demonstrating statistical significance and having a preferred directionality were included, whereas pairs without a preferred directionality (p-value\(\ge \)0.05) were considered bi-direction. If both diseases in a TCP lack significant association with HF, the TCP was removed, ensuring HF’s statistical comorbidity. The TCPs of patient \(P_i\), denoted by \(TCPs_{i}\), is a subset of the population’s TCPs (denoted by \(TCPs^{P}\)). With regards to TCN construction, we used \(TCPs_{i} \) to construct \(\mathcal {G}_{i}\) of patient \(P_{i}\) and used \(TCPs^{P}\) to construct \(\mathcal {G}\) of population P. According to \(TCPs_{i} \) or \(TCPs^{P}\), the node attribute and edge attribute are extracted. Ultimately, this study has constructed TCNs at both individual and population levels. These TCNs preserve notable associations between comorbidities, connections between comorbidities and the target disease, while also capturing the temporal transitions between comorbidities. Figure 2 shows the workflow diagram of TCN construction.

Fig. 2
figure 2

Temporal comorbidity network construction workflow diagram

Appropriate features are essential for the patient representation and for subsequent analytical tasks. In this study, ten types of features were derived from the TCN, list in Table 2. While degree centrality is a feature in our study, we enhanced traditional calculation due to its limitation in accounting for relationships between unconnected nodes.

Table 2 Features used in this work

Traditional degree centrality measures connectivity of a node in a network, and is computed as:

$$\begin{aligned} k(u)=\sum _{v\in Nb(u)}e_{<u\rightarrow v>}\ (u\in \mathcal {G},v\in \mathcal {G}) \end{aligned}$$
(1)

where Nb(u) represents the adjacency set of node u, \(e_{<u\rightarrow v>}\) represents edge between u and v. Notably, \(e_{<u\rightarrow v>}\) only represents the existence of edge, other than indicate the frequency of edges.

We propose a novel TCN degree centrality method, emphasizing central nodes’ higher local density and distance from other centers. We normalize the degree centrality as local density of node using the following formula:

$$\begin{aligned} \rho _{u}=\frac{k(u)}{l-1} \end{aligned}$$
(2)

where l denotes the count of nodes in \(\mathcal {G}\). Then, minimum distance between u and any other nodes with higher local density is calculated by:

$$\begin{aligned} \delta _u=\left\{ \begin{array}{ll} \mathop {\min }\limits _{v\in Nb(u):\rho _v>\rho _u}(d_{uv}), & if\ \exists j\ s.t.\ \rho _v>\rho _u\\ \mathop {\max }\limits _{v}(d_{uv}), & otherwise. \end{array} \right. \end{aligned}$$
(3)

where \( d_{uv}\) denotes the distance between node u and v, and calculated by:

$$\begin{aligned} d_{uv}=1-\frac{|Nb(u)\cap Nb(v)|}{|Nb(u)\cup Nb(v)|} \end{aligned}$$
(4)

The degree centrality \(\gamma _{u}\) of node u is calculated by:

$$\begin{aligned} \gamma _{u}=\rho _u\delta _u \end{aligned}$$
(5)

In Fig. 3, we provide an example of degree centrality, which reveals that using the traditional approach for degree centrality calculation yields a degree centrality of 5 for both node 5 and node 11. However, this study factors in the adjacent connections’ information, leading to revised degree centrality values of 0.3 and 0.5 for the two nodes respectively.

Fig. 3
figure 3

An example of network and its degree centrality for each node

Based on relevant studies, age, gender, and race background have been identified as influential variables in predicting occurrences of HF events within the population [35]. Regarding demographic information about patients, we use age, gender, race, and insurance type. Moreover, we applied multi-hot encoding to discrete variables. We calculated the average values of degree centrality, eigenvector centrality, betweenness centrality, and clustering coefficient for network \(\mathcal {G}_{i}\), respectively, and concatenate those average centrality features, average clustering coefficient, network density, and demographic information to construct the \(P_{i}\)’s feature vector \(h_{i}\).

Patient stratification

Data mining techniques can be employed to unveil connections among comorbidities, thereby revealing the interrelation within the contextual information present in the dataset [42]. We will explore following main topics: patient clustering, preliminary analysis, and typical prescriptions analysis.

With feature vectors of patients as input, we use K-Means for patient clustering because K-Means is commonly used in patient stratification, and an appropriate number of clustering should be set [43]. we use Silhouette Coefficient (SC) [44] and Davis-Bouldin Index (DBI) [45] to identify appropriate the number of clusters. The higher SC, and lower DBI indicate better clustering separation.

After patient clustering, we conducted a preliminary analysis of the stratified outcomes, including performing a Chi-square test for discrete variables and a One-way ANOVA test for continuous variables, to evaluate and compare the differences in the stratified results. Additionally, we employed Kaplan-Meier survival analysis [46] to estimate patients’ survival probability.

We calculated prescription support within sub-phenotypes to identify common and unique prescriptions, considering variations. Prescriptions with support exceeding 0.8 were extracted. We classify prescriptions as common (in two or more sub-phenotypes) or unique (in one sub-phenotype). Unique prescriptions, when combined with TCN, help identify patient clusters and analyze comorbidity progression.

Comorbidity prediction

The proposed prediction method

TCN generates population’s TCN for comorbidity prediction, but excess unrelated data hampers accuracy. We propose a method for comorbidity prediction based on distance-matched temporal comorbidity network, named TCN-DM, which identifies nearest neighbors using a novel patient distance metric in this study. Given \(S_{i}\), \(\mathcal {G}_{i}\) of \(P_{i}\) and \(S_{j}\), \(\mathcal {G}_{j}\) of \(P_{j}\), the distance \(dist(P_{i},P_{j})\) between \(P_{i}\) and \(P_{j}\) is calculated by

$$dist({P_i},{P_j}) = dist({S_i},{S_j}) + dist({\mathcal{G}_i},{\mathcal{G}_j})$$
(6)
$$dist({\mathcal{G}_i},{\mathcal{G}_j}) = M{S_{node}} + M{S_{edge}}$$
(7)
$$\begin{aligned} M{S_{node}} = & \sqrt {\sum\limits_{u \in {\mathcal{G}_i}} {({\mathcal{A}_i}(} u) - \alpha {)^2}} \\ \alpha = & \left\{ {\begin{array}{*{20}{l}} {{\mathcal{A}_j}(u),}&{if\;u \in {\mathcal{G}_j}} \\ {0,}&{otherwise.} \end{array}} \right. \\ \end{aligned} $$
(8)
$$\begin{aligned} M{S_{edge}} = & \sqrt {\sum\limits_{u \in {\mathcal{G}_i},k \in {\mathcal{G}_i},u \ne k} {({\mathcal{W}_i}(} u \to k) - \beta {)^2}} \\ \beta = & \left\{ {\begin{array}{*{20}{l}} {{\mathcal{W}_j}(u \to k),}&{if\;u \in {\mathcal{G}_i},k \in {\mathcal{G}_i},u \ne k} \\ {0,}&{otherwise.} \end{array}} \right. \\ \end{aligned} $$
(9)

where \(dist(S_{i},S_{j})\) is demographics Euclidean distance of between patient \(P_{i}\) and \(P_{j}\), \(dist(\mathcal {G}_{i}, \mathcal {G}_{j})\) denotes the distance between \(\mathcal {G}_{i}\) and \(\mathcal {G}_{j}\). \(\mathcal {A}_{i}(u)\) and \(\mathcal {A}_{j}(u)\) present the frequency of node u in \(\mathcal {G}_i\) and \(\mathcal {G}_j\), respectively. \(\mathcal {W}_{i}(u\rightarrow k)\) and \(\mathcal {W}_{j}(u\rightarrow k)\) present the frequency of edge \(<u\rightarrow v>\) in \(\mathcal {G}_i\) and \(\mathcal {G}_j\), respectively. Figure 4 shows an example of node matching and edge matching diagrams.

Fig. 4
figure 4

A schematic diagram of node and edge matching

\(MS_{node}\) is match score of nodes, measuring the distance between \(\mathcal {G}_{i}\) and \(\mathcal {G}_{j}\) in terms of disease prevalence. \(MS_{edge}\) takes into account the frequency of nodes to calculate distance. \(\alpha \) means that if the node u that appears in \(\mathcal {G}_{i}\) also appears in \(\mathcal {G}_{j}\), then \(\alpha \) is the node frequency of u in \(\mathcal {G}_{j}\), and vice versa is 0. \(MS_{node}\) does not consider the disparity in the number of nodes between the two networks. It uses \(\mathcal {G}_{i}\) as a benchmark for node matching in order to identify the network with the closest disease prevalence to \(\mathcal {G}_{i}\), even if \(\mathcal {G}_{j}\) may have different additional diseases. If the nodes in the network are different, or the difference in node frequencies is larger, the value of \(MS_{node}\) is larger, indicating a larger network distance in terms of disease prevalence. Similar to \(MS_{node}\), \(MS_{edge}\) is match score of edges and quantifies the disparity in transition between diseases, taking into account both matched edges and their respective frequencies. \(\beta \) means that if the edge of u and k that appears in \(\mathcal {G}_{i}\) also appears in \(\mathcal {G}_{j}\), then \(\beta \) is the edge frequency in \(\mathcal {G}_{j}\), otherwise is 0. Similarly, \(MS_{edge}\) only considers the frequency differences of matched edges. Its aim is to identify a network that closely resembles the disease transition pattern observed in \(\mathcal {G}_{i}\), even if \(\mathcal {G}_{j}\) may exhibit distinct patterns of disease transition involving other diseases. A higher \(MS_{edge}\) value indicates greater distance in matched edges within the network or larger discrepancies in edge frequencies, signifying a larger network distance in terms of disease transition. Finally, we find the nearest neighbor patient of \(P_i\) through the following formula:

$$\begin{aligned} P_k=\arg \min (\{dist(P_{i},P_{j})|P_{j}\in P\}) \end{aligned}$$
(10)

After identifying the nearest neighbor patients \(P_k\), the population’s TCN \(\mathcal {G}\) is constructed using the historical diagnostic data of the current patient \(P_{i}\) and \(P_{k}\). Based on the constructed \(\mathcal {G}\), this study used the diagnosis data in last visit of the \(P_i\) to predict the possible comorbidity at the next visit. Specifically, we use the diagnostic data in last visit of \(P_i\) as source nodes to find the diagnosis set D consisting of their one-hop and two-hop diagnoses as target nodes. Then, the diagnosis in D were ranked in descending order according to degree centrality \(\gamma \), and the diagnosis with degree centrality in the top L was selected as the comorbidity prediction results. Figure 5 showcase a schematic diagram of comorbidity prediction.

Fig. 5
figure 5

A schematic diagram of comorbidity prediction

Baselines methods

We assessed the capability of TCN-DM in comparisom to two distinct categories of methods: one is TCN-based methods, while the other is non-TCN-based methods.

(1) TCN-based methods

Common Neighbor (TCN-CN): Common Neighbor (CN) [47] is employed for link prediction. In \(\mathcal {G}\), \(CN(v_i,v_j)\) represent the association strength of \(v_i\) and \(v_j\), and is calculated using the following formula:

$$\begin{aligned} CN(v_i,v_j)=|Nb(v_i)\cap Nb(v_j)|, \ (v_i\in D_H, v_j\in Nb(v_i)) \end{aligned}$$
(11)

where \(D_H\) is all historical diagnosis of candidate patient, \(Nb(\bullet )\) is the adjacency set of a node.

Jaccard Index (TCN-JI): We construct a population’s TCN \(\mathcal {G}\) using the historical visits data form all patients, and used all historical diagnosis \(D_H\) of candidate patient to predict comorbidities. Index JI is calculated by:

$$\begin{aligned} JI(v_i,v_j)=\frac{|Nb(v_i)\cap Nb(v_j)|}{|Nb(v_i)\cup Nb(v_j)|}, \ (v_i\in D_H, v_j\in Nb(v_i)) \end{aligned}$$
(12)

TCN-CCPA: CCPA [48] utilizes CN and centrality to detect the probable link among nodes:

$$\begin{aligned} CCPA(v_i,v_j,\omega )=\omega \bullet (|Nb(v_i)\cap Nb(v_j)|)+(1-\omega )\bullet \frac{N}{d_{ij}}, \ (v_i\in D_H, v_j\in Nb(v_i)) \end{aligned}$$
(13)

where \(D_H\) is all historical diagnosis of candidate patient, N is the count of nodes, \(d_{ij}\) denotes the shortest distance between \(v_i\) and \(v_j\). Parameter \(\omega \in [0,1]\) preset by the user, regulates the influence of common neighbors and centrality.

For each patient’s historical diagnosis, we select the highest associated diagnosis as potential predictive diagnoses. All the potential predictive diagnoses were ranked in descending order of \(\gamma \), and the top L diagnoses were selected as the final prediction.

TCN-ALL: To assess the effectiveness of utilizing adjacent nodes in the network for prediction purposes, we have devised a comparative method called TCN-ALL. We construct a population’s TCN using all patients. Subsequently, for a given patient, we consider their last visit data as the source node and identify the adjacent nodes in the network as potential predictive diagnoses. These potential diagnoses are then ranked in descending order of \(\gamma \). Finally, we select the top L diagnoses as the final predictions.

TCN-KNN: To demonstrate that utilizing patients’ own historical data is more advantageous for prediction, this study has devised a method of comparison, which is referred to TCN-KNN. We use current patients and its’ K-nearest neighbor patients to construct population’s TCN. The potential predictive diagnoses were derived from the last visit of the K-nearest neighbor patient’s one-hop and two-hop diagnoses. These diagnoses were ranked in descending order of \(\gamma \) and the top L diagnoses were selected as the final prediction. For the convenience of subsequent comparisons, this work set \(K=1\) in TCN-KNN.

TCN-DM-1:To assess our node degree centrality, we included TCN-DM-1 for comparison. It uses TCN-DM for potential predictions and traditional degree centrality to select top L nodes as predictions.

(2) Non-TCN-based methods

We utilized three non-TCN-based alternative methods for comorbidity prediction: Collaborative Filtering (CF) [15, 49], Decision Trees (DT) [50], and Random Forests (RF) [50]. These approaches are commonly employed in comorbidity prediction modeling research and serve as standard comparative methods.

Collaborative Filtering (CF) [15, 49]: We use \(x^i=[0,\cdots ,x_k^i,\cdots ,0]^{\top }\in \mathbb {R}^{N}\) to denote patient \(P_i\)’s daignosis multi-hot vector, where \(x_k^i=1\) if disease k appears, otherwise, \(x_k^i=0\). Meanwhile, we use \(s^i=[score_1^i,\cdots ,score_k^i,\cdots ,score_N^i]^{\top }\in \mathbb {R}^{N}\) to denote a socre vector to represent all diagnosis of \(P_i\) across all \(P_i\)’s visits, where \(score_k^i\) is the risk score of disease k in the future. The risk score is calulated based on the average prevalence of disease k and the similarity with other patients, and the formular is as follows:

$$\begin{aligned} score_k^i = {\bar{r}_k} + \rho (1 - {\bar{r}_k})\sum \nolimits _{j \in Set_k} {Si{m_{i,j}}} \end{aligned}$$
(14)

where \(\bar{r}_k\) is the prevalance of disease k in the training set, \(\rho = 1/\sum \nolimits _{j \in Se{t_{train}}} {Si{m_{i,j}}} \), \(Set_k\) is the set of patient who diagnosed disease k in the training set, \(Set_{train}\) is the set of all patient in the training set. The similarity \(Sim_{i,j}\) between patient \(P_i\) and any other patient \(P_j\) using cosine similarity is calculate by:

$$\begin{aligned} Si{m_{i,j}} = \frac{{{x^i}*{x^j}}}{{||{x^i}|| \times ||{x^j}||}} = \frac{\sum \nolimits _k {x_k^ix_k^j} }{\sqrt{\sum \nolimits _k {{{(x_k^i)}^2}} } \sqrt{\sum \nolimits _k {{{(x_k^j)}^2}} } } \end{aligned}$$
(15)

Finally, we select the top L diseases as predicted results.

Decision Trees (DT) [50], and Random Forests (RF) [50]: We use \(x^i=[0,\cdots ,x_k^i,\cdots ,0]^{\top }\in \mathbb {R}^{N}\) to denote patient \(P_i\)’s daignosis multi-hot vector, where \(x_k^i=1\) if disease k appears, otherwise, \(x_k^i=0\). With \(x_i\) as input, this study can be seen as a multi-label classification with a list probabilities of comorbidities, and we select the top L disease as predicted results.

Validation scheme and evaluation metrics

We use 10-fold cross validation (CV) to validate predictions. Traditional disease risk prediction primarily targets individual conditions and simplifies the task into a binary classification of disease presence versus absence. However, this study addresses the prediction of comorbidities, where the number of comorbidities to predict is artificially specified. Given that the actual number of comorbidities in patients varies from patient to patient, it is not appropriate to apply the conventional metrics of accuracy to measure the precision of our predictive models. Thus, we employ Precision, Recall, and \(F1-score\) as evaluation metrics.

$$\begin{aligned} Precision=\frac{\sum _{p\in P}|R(p)\cap T(p)|}{\sum _{p\in P}|R(p)|} \end{aligned}$$
(16)
$$\begin{aligned} Recall=\frac{\sum _{p\in P}|R(p)\cap T(p)|}{\sum _{p\in P}|T(p)|} \end{aligned}$$
(17)
$$\begin{aligned} F1-Score=\frac{2\times Precision \times Recall}{Precision + Recall} \end{aligned}$$
(18)

where R(p) represents the predicted set of diagnoses made for patient p based on their progression in the TCN, and T(p) represents the true set of diagnoses for the patient p at their next visit, P is all patients to be predicted.

Notably, when evaluating CF, DT, and RF, we use 90% of patients (about 2126 individuals) for model training, while the remaining 10% of patients (about 236 individuals) for testing. Then, we performed 10-fold CV to validate the predictive reuslts.

Results and discussion

Temporal comorbidity network construction results

We obtained both individual’s TCN and population’s TCN, with the latter built from data of 2362 HF patients. To manage the network’s size, we retained edges occurring more than 30 times. We chose to showcase the individual’s TCN of patient_193213 and the population’s TCN comprising all patients in the cohort, as depicted in 6. Node size indicates disease prevalence, and edge thickness shows comorbidity frequency with temporal transitions depicted by arrows.

Fig. 6
figure 6

Temporal comorbidity networks construction results. a and b are the individual’s and population’s TCN for heart failure patients, respectively, illustrating the relationships between various comorbidities over time

In Fig. 6a, patient_193213’s TCN reveals time-dependent relationships among various conditions, including gastroparesis, asthma, hypertension, neuropathy, convulsion, diabetic complications, anemia, heart disease, and HF. Figure 6 (b) depicts the population’s TCN with 77 nodes and 319 edges. It can be found that the relationship between AF and HF is particularly obvious, because AF and HF frequently coexist [51]. In Fig. 6b, we found gastroparesis, CKD(28521), and anemia are associated with heart failure. Some studies have shown that gastroparesis is linked to cardiovascular events in diabetic patients [52]. Anemia evaluation is an essential part of routine baseline assessment in HF patients, given its independent link to disease severity and mortality [53]. In particular, patients with concurrent anemia, CKD, and HF have a diminished life satisfaction and prognosis. Research findings have indicated that ID in HF patients correlates with unfavorable clinical outcomes irrespective of the presence of anemia and/or CKD [54].

Figure 6b illustrates a clear association between AF and HF in the population’s TCN. AF is a prevalent cardiovascular comorbidity among HF patients. The relationship of AF and HF is intricate, with AF exacerbating HF and HF increasing the risk of AF [35]. Ablation of AF can be considered a reasonable intervention for patients with HF experiencing symptoms and impaired quality of life (QOL). Figure 6b illustrates complex heart-kidney association, often deranged in HF, with adverse prognosis [55]. Impaired renal function is among the most influential factors in predicting HF prognosis [56], and regularly monitoring renal function within a suitable clinical context is crucial to manage HF effectively. Hypertension, IHD, hyperlipidemia, arthritis, COPD, depression, diabetes, and gout are also common comorbidities in HF patients [57], and the findings exhibit consistency in Fig. 6b. Referring to Fig. 6 (b), we can uncover additional insights into comorbidity relationships within the population. For example, we observe a connection between kidney failure and acidosis, where acidosis frequently manifests as a common consequence in late-stage kidney failure patients [58]. TCN can offer insights into the relationships between comorbidities within patient populations, providing decision support for clinical practitioners to gain a comprehensive understanding of disease progression in patients.

Patient stratification results

Since K-Means requires finding an optimal value of K, we divided the patients into 3 to 10 groups by calculating the SC and DBI to determine the K value. Higher SC and lower DBI indicate better separation of clustering results, with K=4 being the optimal value in this study, as illustrated in Fig. 7. Next, we performed patient stratification analysis using the patient clustering results with K=4.

Fig. 7
figure 7

Patient clustering results for different values of K

Preliminary analysis

For the analysis of demographic information among patient groups and the variations in TCN network structures created for these different groups, we performed inter-group statistical analysis, and the corresponding statistical results can be found in Table 3. We conducted preliminary patient clustering analysis, with average ages of four sub-phenotypes: 63.05, 75.16, 46.34, and 85.96. Sub-phenotype II had the most patients (32.29%), and Sub-phenotype III had the fewest (12.22%).

Table 3 Summary of patient stratification results

Table 3 reveals significant differences in age, gender, race, insurance type, survival state, average degree, betweenness centrlity, and weight fo edges among these sub-phenotypes. Differences in the number of nodes and edges in TCNs across distinct groups reflect variations in comorbidity numbers and disease transition frequencies. Furthermore, significant discrepancies are evident in the average degree, average betweenness centrality, and average edge weight of TCNs among different groups, indicating marked distinctions in the complexity of comorbidity relationships and the frequency of transitions within these groups. We created population’s TCNs for sub-phenotypes and used Kaplan-Meier analysis for patient survival. Figure 8 showcased the population’s TCN for each sub-phenotype. In Fig. 8, we visualize sub-phenotype population’s TCNs as chord diagrams. Outer circles represent diseases with diagnosis description. The TCNs for the subtypes have varying comorbidity counts: 45, 50, 10, and 48. Temporal comorbidity pairs are 162, 185, 22, and 149 in each subtype’s population-level TCN, respectively. Outer circle size denotes disease prevalence, inner connection thickness indicates comorbidity frequency, and arrows show temporal disease transitions. Figure 8 reveals that cardiovascular disease and kidney disease are the most prevalent conditions in all groups. This underscores the significance of prioritizing attention to cardiovascular and kidney diseases, as well as comprehending their progression and connections with other related ailments in hospitalized HF patients.

Fig. 8
figure 8

Population’s temporal comorbidity network of each sub-phenotype

We also presented the Kaplan-Meier curves for the four patient subtypes in Fig. 9, which showed some distinguishable patterns. Sub-phenotype IV had the steepest decline in the Kaplan-Meier curves, indicating shorter survival times compared to the other groups. Due to the advanced age and severe diseases of this patient population, along with a lower probability of survival, it is necessary to strengthen end-of-life care services for this subgroup. Sub-phenotype II had an initial rapid decline followed by a slower one. the segment with slower decline had low survival probability, requiring end-of-life care interventions. The Kaplan-Meier curve of Sub-phenotype III showed a slower decline compared to the other curves, which can be attributed to the younger average age and relatively fewer comorbidities within this patient group. These four Kaplan–Meier curves offer an intuitive representation of the survival outcomes for HF patients across various age groups, delivering essential insights for clinical practitioners seeking to understand disease progression in their patients.

Fig. 9
figure 9

Kaplan–Meier estimation of the survival probability of four sub-phenotypes

Typical prescriptions analysis

Prescriptions can reflect clinical interventions and to some extent influence the progression of patients’ diseases. Analyzing the relationships between frequent prescriptions sets in these four sub-phenotypes is a crucial task for understanding their interrelationships. We intersected the frequent prescriptions set between the four sub-phenotypes and plot the results in UpSet plot, as shown in Fig. 10. UpSet plots represents intersections in a matrix, with each column denoting a set and bar charts indicating their size. The shaded cells within a row signify the sets involved in a particular intersection.

Fig. 10
figure 10

The UpSet plot of frequent prescriptions (a) and population’s TCN of patients who use glucagon (b)

In Fig. 10 (a), the 34 commonly prescribed drugs in the four sub-phenotypes have various therapeutic effects, including bowel movement facilitation, diuresis, fluid replenishment, glucose and calcium supplementation, electrolyte balance modulation, sedation, pain relief, blood sugar and lipid reduction, treatment of reflux, broncho and vasodilation, bacterial infection treatment, hypertension, angina, myocardial infarction, and prevention of chemotherapy/radiation-induced emesis. Sub-phenotype III unique prescriptions include hydrocodone/acetaminophen, metoclopramide, propofol, and calcium carbonate. Sub-phenotype I characterize by frequent glucagon prescriptions. In Fig. 10b, we extracted patients who use glucagon in sub-phenotype I and constructed their TCN based on their diagnostic data. There are a total of 624 patients in sub-phenotype I, out of which 205 patients use glucagon. The use of glucagon to improve the harmful proarrhythmic effects of insulin may be a factor in lowering the likelihood of fatality [59]. We observed the presence of polyneuropathy in diabetes (DPN) in this patient population, which is a common complication of diabetes with currently no effective treatment methods available. Glucagon plays a pivotal role in maintaining balanced glucose levels and, in addition to its involvement in gluconeogenesis, it also exhibits neuroprotective properties within the central nervous system. The study [60] revealed for the first time the positive influence of glucagon on peripheral neuronal cells, making it a potential target for treating DPN. However, prolonged use of glucagon for DPN treatment may worsen blood glucose levels due to its ability to increase them[61].

Comorbidity prediction results

To predict comorbidities, patients needed at least two visits for similarity measurement and a minimum of three for prediction and evaluation. For instance, using a patient with three visits, we constructed a TCN with the first two visits’ data to identify similar patients. The diagnosis in the second visit served as head nodes for comorbidity retrieval, and the last visit’s diagnosis was used for prediction evaluation. All 2362 patients were used for similarity measurement and TCN construction, while 920 with at least 3 visits were selected for comorbidity prediction, evaluating the top L=5, 10, and 20 comorbidities.

Table 4 Comorbidity prediction results comparison of different methods

Table 4 showcased the 10-fold CV comorbidity prediction results of 920 patients for comparison methods. With increasing L, precision decreases, while recall and F1-score improve for all methods. We first discuss the TCN-based comparative results in detail. First, when predicting on the TCN constructed from entire patients, TCN-ALL outperforms TCN-CN, TCN-JI, and TCN-CCPA, indicating that using direct connections is more effective than measuring strength of associations between nodes in terms of the relationship of nodes. This because TCN captures temporal disease relationships, while TCN-CN, TCN-JI, and TCN-CCPA introduce redundant information by relying on common adjacent nodes. Second, when predicting on the TCN constructed from specific patients, TCN-KNN and TTCN-DM outperform TCN-CN, TCN-JI, and TCN-CCPA in comorbidity prediction by leveraging information from similar patients and incorporating patient-specific data. This is advantageous compared to relying on a TCN constructed using the entire patient population, which can introduce interference from other patients’ comorbidity associations and affect predictions for the current patient. The use of data closer to the current patient and the inclusion of patient-specific information in TCN-KNN and TCN-DM contribute to their improved performance. Third, when predicting comorbidity using TCN constructed based on similar patients, TCN-DM outperforms TCN-KNN due to its incorporation of the patient’s most recent diagnostic information, as opposed to TCN-KNN’s utilization of diagnostic information solely from similar patients. This highlights the added value of integrating comprehensive patient-specific data for improved prediction accuracy. Fourth, despite TCN-DM-1 having higher recall than TCN-DM at L=10 and 20, TCN-DM outperforms TCN-DM-1 in terms of F1-Score with improvements of 0.002, 0.002, and 0.008 at L=5, 10, and 20, respectively. This indicates that, overall, the proposed degree centrality in this work has effectively enhanced the comorbidity prediction performance. Fifth, the TCN constructed by TCN-DM provides a transparent view of disease relationships and temporal dependencies to infer potential future comorbidities. In contrast to the current “black box" deep learning techniques, TCN-DM operates as a “white box" approach, enabling clinical practitioners to infer and understand patient conditions more easily.

Regarding to non-TCN-based comorbidity prediction methods, such as CF, DT, and RF, our analysis unfolds as follows. Firstly, TCN-DM outperforms CF across different L values, with significant F1-Score improvements of 0.111, 0.173, and 0.186 for L=5, 10, and 20, respectively. This superiority is rooted in CF’s dependence on patient similarity calculation for comorbidity risk assessment, overlooking patient-specific disease information and comorbidity interrelationships. TCN-DM consistently excels, regardless of L value. Secondly, DT, renowned for its interpretability, falls short in capturing comorbidity relationships and patient comorbidity progression. TCN-DM outperforms DT across all metrics, with F1-Score improvements of 0.058, 0.071, and 0.093 for L=5, 10, and 20, respectively. Thirdly, despite RF benefitting from ensemble learning, its predictions based on multi-hot vectors neglect comorbidity relationships and patient-specific disease progression. While RF slightly outperforms TCN-DM in Recall for L=10, TCN-DM consistently outperforms RF across other metrics, with F1-Score enhancements of 0.02, 0.01, and 0.013 for L=5, 10, and 20, respectively.

Novelty discussion of temporal comorbidity network

In this work, we introduce a novel TCN specifically tailored for patients with specific diseases (in this paper, heart failure). Utilizing longitudinal diagnostic data, we capture the temporal progression and statistical significance of comorbidities, creating a patient-centric TCN that reflects the true complexity of disease interactions over time. This method not only enables patient stratification and comorbidity pattern analysis but also enhances our predictive capabilities for future comorbidities, providing a powerful tool to support clinical decision-making.

Our TCN-based patient stratification and comorbidity pattern analysis yield insights that directly support the clinical decision-making process. The analysis outcomes contribute to survival analysis and the formulation of typical prescriptions, offering healthcare providers a data-driven approach to patient care informed by the complex interplay of comorbidities over time.

Furthermore, the TCN-DM method we propose demonstrates superiority in the accuracy of comorbidity prediction. By constructing a population-level TCN through matching similar patients and retrieving the most likely comorbidities within the TCN, it offers significant advancements over existing methods.

The novelty of our TCN-based approach lies in its ability to capture the longitudinal aspect of patient health data, closely centered around the target disease, which has not been emphasized to this extent in previous research. We clearly demonstrate the advantages of our method, highlighting its superior performance in accuracy and adaptability for clinical analysis and prediction tasks compared to existing models. The superiority of our approach lies in its higher predictive accuracy, better interpretability, enhanced flexibility, and adaptability, making it a valuable asset in the clinical environment.

We emphasize the importance of the TCN-based method, highlighting its ability to address existing challenges in the field with new solutions and significant improvements over current technologies. It not only contributes to the field by offering a fresh perspective on patient care but also opens up new avenues for further research. The TCN-based method is expected to advance the field by introducing a new research paradigm that guides future investigations and improves clinical practices. With its potential to transform the understanding and management of comorbidities, our framework provides a certain direction for comorbidity research and patient care.

Implication

The relevant implications of this study in the field of healthcare are divided into three parts: Methodology of tEHRs data processing, comorbidity progression modeling and analysis, patient health management based on tEHRs data, and management enlightenment in practical application of medical information system (MIS).

From the methodology perspective, this study highlights the significance of utilizing the longitudinal and temporal features of tEHRs for modeling and analyzing the progression of comorbidities, thereby offering methodological and technical support for assisted medical decision-making. Particularly, we have devised an approach to construct temporal comorbidity networks using patients’ longitudinal and time-stamped diagnosis data. This method can handle diagnostic data with temporal order in tEHRs and can focus on comorbidities specific to certain diseases, providing new insights into representing patients. This study introduces a method for patient stratification using a temporal comorbidity network, effectively partitioning the patient population into meaningful subgroups and highlighting clinically significant patterns. Furthermore, this study analyzes patients’ survival probabilities, prescription sets, and comorbidity pairs, offering clinically supported explanations and analyses of the stratification results. Additionally, a predictive method based on distance-matched temporal comorbidity network is developed, offering the ability to forecast potential comorbidities during future medical encounters. This study provides a comprehensive solution for tHERs processing, modeling, and analyzing, encompassing the construction of temporal comorbidity network, patient stratification, interpretation of stratification results, and comorbidity prediction.

From the patient health management perspective, we divide patient health management into two levels, namely, individual-level and population-level. Information system researchers focus on issues like the adoption of tEHRs, patient privacy, and information integration at the individual level. In contrast, population-level tEHRs offer opportunities to set standards for data exchange between stakeholders, evaluate the effectiveness of treatment methods at a population level, and promote best care practices. At the individual level, the findings of this study effectively integrate patients’ longitudinal and temporal diagnosis data, enabling the description of comorbidity progression within a defined time range and prediction of potential future health conditions. At the population-level, the findings of this study further subdivide the patient group and describe the commonalities and characteristics of different subgroups, providing decision support for the formulation of treatment plans for patients in subgroups. Both these two level tEHRs aim to achieve efficient and effective healthcare delivery to promote individual well-being. Establishing comprehensive and usable patient records, aggregating them into population-level medical histories, and analyzing individual and population comorbidity progression from vast datasets pose formidable technical challenges for both healthcare providers and governments.

From the perspective of practical application of MIS, it can be elaborated from the perspective of different stakeholders. For patients, it is hoped that MIS can provide information on healthy lifestyles, monitoring diseases and some practical advice. This study provides insights into the comorbidity progression of patients and offers predictions for potential future comorbidity development. For doctors and nurses, comprehensively and accurately understanding patients’ tEHRs data, gaining insights into patients’ comorbidity progression, implementing effective intervention measures, and preventing adverse drug events are of paramount importance. Our work integrates patient tEHRs data with characterization of comorbidity progression, patient stratification and interpretable outcomes, as well as comorbidity prediction, to provide support for healthcare professionals in formulating treatment and care plans. For custodians of tEHRs data, such as hospitals, governments or insurance companies, they concerned with creating data standards, ensuring the appropriate storage and use of tEHRs data, and establishing sustainable revenue models. However, lack of data exchange standards, incompatible interfaces, data delay, unauthorized access are the key problems in the application of MIS. Therefore, technology updating, ensuring data quality, developing revenue models, and rapid operation roll-out provide potential support for the application of MIS. Our study’s primary findings highlight the significance of preparing clinical data effectively.

Conclusion

This work presents a framework for analyzing comorbidity progression by constructing TCNs for patients. The framework consists of three main steps: temporal comorbidity network construction, patient stratification, and comorbidity prediction. Firstly, we present a method to construct TCNs by identifying statistically significant temporal comorbidity pairs from longitudinal diagnostic data. This allows us to convert longitudinal and temporal diagnosis data into individual’s TCN. By aggregating these individual’s TCN, we obtain a population’s TCN. Secondly, we perform patient stratification analysis using the patient-specific features extracted from the TCNs. This analysis includes preliminary exploration and analysis of typical prescriptions. These steps help uncover valuable insights regarding patient survival status, medication characteristics, and unique temporal comorbidity patterns, which can provide information on comorbidity progression within different patient groups. Lastly, we propose TCN-DM, a comorbidity prediction based on distance-matched temporal comorbidity network. By leveraging the temporal relationships captured in the network, TCN-DM enables accurate prediction of comorbidity for a patient’s next visit. Our experimental results demonstrate the superiority of TCN-DM compared to other methods in terms of prediction performance.

The study has the following limitations. Firstly, it requires patients to have a minimum of two visit records to construct the TCN and participate in the prediction, which excludes newly admitted patients. Secondly, the comorbidity information is extracted from the dataset used, limiting it to the comorbidities present in that specific dataset. Additionally, if a patient has a new comorbidity that is not significantly correlated with existing comorbidities, it may not be accurately predicted. Moreover, while external clinical validation was not conducted, this study employed 10-fold CV to strengthen the reliability of the results. Lastly, the study focuses solely on the temporal order of visits and does not adequately account for the time intervals between visits. Furthermore, it does not consider physiological indicators or laboratory tests that could provide more comprehensive insights into comorbidity progression. Future directions include incorporating time intervals into the TCN to construct a more robust TCN model, considering additional patient-specific information, such as physiological indicators and laboratory test data, and validating the proposed research methodology using additional datasets.