Background

Almost 700 million cases of COVID-19 had been recorded worldwide by the end of 2023 [1]. In the long term, around 10% of survivors of the acute phase of the disease develop symptoms such as fatigue, shortness of breath, cognitive dysfunction and other features covered by the umbrella definition of “long COVID” [2]. According to the World Health Organization, long COVID encompasses the persistence or new development of a cohort of symptoms 3 months after the initial SARS-CoV-2 infection, with these symptoms lasting for at least 2 months, and with no other explanation [3]. Long COVID poses a significant healthcare challenge, since 20 to 30% of affected individuals report significant limitations in undertaking their habitual daily activities [4]. The symptoms of long COVID are highly heterogeneous in terms of both prevalence and severity. This reflects the impairment in structure and function of different target organs and apparatuses [5]. Moreover, a unifying pathophysiological mechanism is, at present, far from being identified [6].

Many predisposing factors for long COVID are mutually interconnected. The severity of the acute disease, which is a known risk factor for the development of long COVID, is predicted by a cluster of inter-related clinical features, such as older age, male sex, non-white ethnicity, obesity, hypertension, and cumulative exposure to chronic diseases [7,8,9,10]. On the other hand, several of these factors, such as male sex, obesity and previous cardiovascular disease, although associated with risk of severe COVID-19, do not seem to predict the occurrence of long COVID [11]. In order to investigate this complex relationship between predisposing factors, features related to the acute manifestations of disease, and several outcome variables of potential interest, we selected graphical chain models (GCMs, also known as chain graph models [12]) as an appropriate statistical modelling technique.

A GCM is a statistical approach that structures a model around a natural partition of the study variables into a sequence of blocks, such that variables in each block are potential explanatory variables of those in subsequent ones; variables in the same block are assumed to be concurrent, i.e., their association structure is assumed to be symmetric [12]. GCMs, which are an extension of directed acyclic graphs [12], are useful for addressing situations where a complete ordering of the variables is not available but an ordering between blocks of variables is nonetheless reasonable.

In the present study, we applied the GCM strategy to shed light on the relationship between a set of variables associated with the long COVID condition, possibly overlapping and interrelated with each other in an unclear manner [13]. We examined the features of a cohort of patients surviving hospitalization for COVID-19-related consequences in a non-intensive care unit of the “Santa Maria” Hospital in Terni, central Italy, during the second COVID-19 wave in 2020–2021, and followed them up from 3 to 6 months later, searching both for objective measures of organ dysfunction (e.g. reduced peak oxygen consumption (VO2), fibrotic signs at the radiological lung examination) and subjective symptoms (e.g. dyspnea and fatigue). The main aim of the present study was to evaluate whether factors related to patient history and clinical course during hospitalization could independently predict the onset of long COVID features. The findings may also help to identify those individuals who will receive most benefit from targeted preventive and therapeutic strategies.

Materials and methods

All consecutive survivors of COVID-19-related pneumonia and associated respiratory failure, who had been admitted to the Internal Medicine Ward of the Santa Maria University Hospital in Terni, Italy, were approached to participate in an observational prospective registry named “CArdioPulmonary exercise Testing for a global fitness Assessment in patients with recent COVID-19 Interstitial Pneumonia (CAPTAIN) study”. The registry aimed to evaluate and quantify the long-term sequelae of COVID-19 related pneumonia and respiratory failure [14]. The first patient was recorded on October 11, 2020, and the last on May 2, 2021. All patients were unvaccinated for SARS-CoV-2.

Participation was proposed to all subjects aged between 18 and 80 years able to sign written informed consent. The Local Ethics Committee (protocol number 50970) approved the study protocol. The research was carried out according to The Code of Ethics of the World Medical Association (Declaration of Helsinki). The diagnosis of acute COVID-19 was based on positivity of viral RNA in RT-PCR of the nasopharyngeal swab performed at hospital admission, according to standardized procedures [15]. COVID-19-associated interstitial pneumonia was based on high resolution computed tomography (HRCT) findings and according to clinical presentation at hospital admission. After hospital discharge, follow-up visits were planned over the period between 3 and 6 months. Exceptions to this time interval due to personal reasons were tolerated.

The follow-up visit consisted of a clinical and instrumental evaluation including blood sample testing, evaluation of resting pulmonary and cardiac function through pulmonary function test and echocardiography, an HRCT scan, and the evaluation of the cardiorespiratory function under dynamic conditions through cardiopulmonary exercise testing (CPET). Participants were excluded from the CPET evaluation if they had a history of heart failure with ejection fraction less than 50%, symptomatic coronary heart disease, history of moderate or severe valvulopathy, atrial fibrillation, asthma or severe chronic obstructive pulmonary disease, or musculoskeletal disease affecting physical performance.

Clinical evaluation

The main dataset included demographic data, anthropometric measurements (height, weight, body mass index [BMI], body surface area), and smoking habits. Blood pressure and heart rate were measured in a sitting position, after 5 min of resting, using a validated oscillometric device (Omron M3 HEM-7155, Omron, Japan) according to current guidelines [16]. Pre-COVID-19 medical history and concomitant treatment were collected. Hospitalization data were acquired from each patient’s hospital admission up to discharge, including duration of the clinical course and need for non-invasive or invasive mechanical ventilation.

All patients were evaluated by a team of medical doctors specialized in internal medicine, infectious diseases and clinical psychology. General symptoms presented after discharge (e.g. fatigue, sleep disorders), along with respiratory, cardiovascular (dyspnea, palpitations, chest pain), dermatologic (hair loss) and gastrointestinal (diarrhea) symptoms were collected. Depression, stress and anxiety were quantified using the 42-item Depression Anxiety Stress Scales (DASS-42) [17]. Due to low frequencies in some categories, the score was recoded from the initial five ordered levels to three levels, formed as follows: moderate: score ≤ 9, severe: 10 ≤ score ≤ 13, extremely severe: score ≥ 14. The symptom “fatigue” was also obtained from the questionnaire, and was reported as a binary variable (No, Yes). Symptoms related to post-traumatic stress disorders triggered by illness severity or hospitalization experiences were collected by the Impact of Event Scale-Revised [18]. All findings were collected by medical specialists through face-to-face interviews.
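The three-level recoding described above can be sketched as follows. This is a minimal illustration rather than the code used in the study: the function name `recode_dass` is hypothetical, and only the cut-offs (≤ 9, 10 to 13, ≥ 14) come from the text.

```python
def recode_dass(score: int) -> str:
    """Map a DASS-42 subscale score to the three ordered levels given in the text.

    Cut-offs are taken from the text; the function name is illustrative only.
    """
    if score < 0:
        raise ValueError("score must be non-negative")
    if score <= 9:
        return "moderate"
    if score <= 13:
        return "severe"
    return "extremely severe"


if __name__ == "__main__":
    for s in (4, 9, 10, 13, 14, 30):
        print(s, "->", recode_dass(s))
```

Keeping the levels as ordered labels is what later allows a model for ordered responses to be used for this variable.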

Instrumental evaluation

Blood samples were drawn after 13-h overnight fasting and delivered to the same centralized laboratory of the Terni University Hospital that processed blood samples during the in-hospital acute phase.

Pulmonary function tests were performed using Master Screen Body (Jaeger, Wurzburg, Germany) by dedicated staff. Forced expiratory volume in one second (FEV1), measured vs. predicted (FEV1%), forced vital capacity (FVC), and measured vs. predicted (FVC%) were included in the analysis. For each patient, parameters were expressed as percentages of a theoretical value calculated from Global Lung Function 2012 equations [19].

Standard transthoracic echocardiography was performed with a commercially available device (Esaote MyLab60, Esaote, Italy) by an expert echocardiographer according to the American Society of Echocardiography recommendations [20]. The M-mode echocardiographic study of the left ventricle was performed under two-dimensional control and confirmed using the parasternal long-axis two-dimensional approach. Ejection fraction, end-diastolic left ventricular diameter, presence or absence of left and right atrial dilation, moderate/severe valvulopathy, and tricuspid annular plane systolic excursion were collected. Left ventricular mass (LVM) was normalized by height^2.7 [21]. Results of examinations of patients with low-quality transthoracic acoustic window were not collected.
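As a worked illustration of the height-based normalization of LVM, the sketch below divides LVM by height raised to the power 2.7. The units (grams and metres) and the function name `lvm_index` are assumptions made for the example; the text specifies only the exponent.

```python
def lvm_index(lvm_g: float, height_m: float, exponent: float = 2.7) -> float:
    """Left ventricular mass divided by height**exponent.

    Assumes LVM in grams and height in metres (units are not stated in the text).
    """
    if height_m <= 0:
        raise ValueError("height must be positive")
    return lvm_g / height_m ** exponent


if __name__ == "__main__":
    # Same mass, two different heights: the taller subject gets the lower index.
    print(round(lvm_index(180.0, 1.60), 1))
    print(round(lvm_index(180.0, 1.85), 1))
```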

HRCT and image analysis were performed by two expert radiologists working independently, both during the acute disease and at follow-up. Lung sequelae were classified according to quantitative scores based on the degree of each lobe involvement (0 = none, 1 = < 5%, 2 = 5–25%, 3 = 26–50%, 4 = 51–75%, 5 = > 75%) and qualitative scores based on radiological features of pulmonary opacities, including ground-glass opacities, linear opacities, crazy-paving and consolidations, and fibrosis-like lesions including reticulation, traction bronchiectasis and honeycombing [22]. Agreement was reached by consultation in case of discrepancies.
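The per-lobe quantitative score can be written as a simple lookup on the percentage of involvement. In this sketch the cut-offs follow the text, while two points are assumptions: non-integer percentages between bands (e.g. 25.5%) are assigned to the higher band, and the helper `total_score`, which sums the lobe scores, is a hypothetical convenience that the text does not describe.

```python
def lobe_score(percent: float) -> int:
    """Score one lobe from its percentage of involvement (bands from the text)."""
    if not 0 <= percent <= 100:
        raise ValueError("percent must be between 0 and 100")
    if percent == 0:
        return 0
    if percent < 5:
        return 1
    if percent <= 25:
        return 2
    if percent <= 50:
        return 3
    if percent <= 75:
        return 4
    return 5


def total_score(lobe_percentages) -> int:
    """Sum of lobe scores (assumption: the text does not state how lobes are combined)."""
    return sum(lobe_score(p) for p in lobe_percentages)


if __name__ == "__main__":
    print(total_score([0, 3, 10, 40, 80]))
```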

CPET was performed using a commercially available system equipped with a cycle ergometer and an expired-gas analyser (Cosmed Bike, Cosmed s.r.l., Florence, Italy). Before each procedure, the equipment was calibrated using reference gases. Each patient underwent a familiarization test, useful to set up ramp protocols according to clinical and training level characteristics. The test was performed according to current recommendations [23] and was halted if muscular exhaustion or cardiac symptoms appeared. The Hansen-Wasserman equation [24] was used to calculate the normal predicted values of the main parameters evaluated. VO2 was expressed as the highest 10 s averaged sample obtained during the last 30 s of testing. The V-slope method and respiratory equivalents methods were used to measure the anaerobic threshold (AT). Minute ventilation (VE) and carbon dioxide production (VCO2) values were acquired from the initiation of exercise to AT. The VE/VCO2 slope was calculated after exclusion of the first part of the test, which is potentially influenced by emotional hyperventilation. Oxygen pulse was defined as the ratio between VO2 and heart rate; the slope VO2/work rate (VO2/WR) was calculated by dividing VO2 by power expressed in watts.
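Two of the CPET quantities defined above lend themselves to a short worked example. The sketch below assumes that VO2 has been resampled to one value per second and interprets "highest 10 s averaged sample" as the maximum rolling 10 s mean within the final 30 s; both are assumptions, since the text does not specify the sampling or windowing scheme. Function names are illustrative.

```python
def peak_vo2(vo2_per_second, window: int = 10, tail: int = 30) -> float:
    """Highest rolling `window`-second mean of VO2 within the final `tail` seconds.

    Assumes one VO2 value per second (an assumption, not stated in the text).
    """
    last = list(vo2_per_second)[-tail:]
    if len(last) < window:
        raise ValueError("need at least `window` samples")
    means = [
        sum(last[i:i + window]) / window
        for i in range(len(last) - window + 1)
    ]
    return max(means)


def oxygen_pulse(vo2_ml_min: float, heart_rate_bpm: float) -> float:
    """Oxygen pulse in mL/beat: VO2 (mL/min) divided by heart rate (beats/min)."""
    if heart_rate_bpm <= 0:
        raise ValueError("heart rate must be positive")
    return vo2_ml_min / heart_rate_bpm


if __name__ == "__main__":
    ramp = list(range(1, 41))          # synthetic, steadily rising VO2 trace
    print(peak_vo2(ramp))              # mean of the last ten values
    print(oxygen_pulse(2000.0, 100.0))
```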

Statistical analysis

Descriptive statistics are presented according to the nature (qualitative or quantitative) of the variables. For quantitative variables, mean ± standard deviation were reported, together with median and quartiles.

A GCM seems particularly suited to enhancing our understanding of this complex systemic condition, which is possibly due to the overlapping contribution of interrelated factors, as opposed to a modelling strategy that takes one response variable at a time. For instance, a clear ordering between the variables in block 3 (namely: VO2, FEV1, D-dimer levels, depression score and fatigue) cannot be postulated, as they may influence each other in an interrelated manner. However, they are all possible responses to the variables in previous blocks.

The GCM was represented by means of a graph. This visual aid allowed a comprehensive understanding of phenomena characterized by multiple related variables. In a GCM, variables are typically represented by nodes and nodes are partitioned into blocks, with a natural ordering between blocks, as follows: block 1 - explanatory variables; block 2 - variables that are responses of the variables in block 1 and explanatory for the variables in the subsequent blocks, and so on. GCMs are fitted via a series of univariate regressions [25], thus standard inferential procedures for generalized linear models, based on maximum likelihood, can be implemented. Here, we used the gRchain routine developed in the software R. Missing values were filled in with the k-nn method [26]. To check whether the underlying assumption of missing at random was reasonable, analyses were also performed on complete data only, showing no major differences.
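To make the imputation step concrete, the following is a minimal sketch of k-nearest-neighbour imputation for numeric data. It is not the routine used in the study: the text does not report the value of k, the distance metric, or any variable scaling, so k = 3, a root-mean-square distance over jointly observed columns, and unscaled inputs are all assumptions, as is the function name `knn_impute`.

```python
import math


def knn_impute(rows, k: int = 3):
    """Replace None entries with the mean of that column over the k nearest rows.

    Distance is the root-mean-square difference over columns observed in both
    rows (an assumed metric). Rows with no usable neighbour keep their None.
    """
    completed = [list(r) for r in rows]
    for i, row in enumerate(rows):
        for j, value in enumerate(row):
            if value is not None:
                continue
            candidates = []
            for m, other in enumerate(rows):
                if m == i or other[j] is None:
                    continue
                shared = [
                    c for c in range(len(row))
                    if row[c] is not None and other[c] is not None
                ]
                if not shared:
                    continue
                dist = math.sqrt(
                    sum((row[c] - other[c]) ** 2 for c in shared) / len(shared)
                )
                candidates.append((dist, other[j]))
            if candidates:
                candidates.sort(key=lambda pair: pair[0])
                nearest = candidates[:k]
                completed[i][j] = sum(v for _, v in nearest) / len(nearest)
    return completed


if __name__ == "__main__":
    data = [[1.0, 10.0], [2.0, 20.0], [1.1, None], [9.0, 90.0]]
    print(knn_impute(data, k=2))
```

In practice, variables measured on different scales would usually be standardized before distances are computed, and categorical variables would need a different dissimilarity measure.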

In the graphical representation of the GCM, two nodes are joined by an edge whenever there is an association between the corresponding variables. Two kinds of edges are allowed: directed (←), also called an arrow, or undirected (-). An arrow is always used to join two variables in two different blocks, in line with the ordering between the blocks, implying a covariate-response relationship. An undirected edge is always used to join variables in the same block, reflecting the fact that a residual association between the variables remains, also after conditioning on all the variables in the preceding blocks. Further details on GCMs are given in the Supplementary Material and in the section Modelling.

Results

Variable selection and preliminary univariate and bivariate analyses

108 patients were included in the present study. Twelve patients were removed because of missing data for a large proportion of variables (Supplementary Fig. 1). The remaining 96 patients were included in the main analysis. Of these, 29 patients withdrew their consent to undergo HRCT examination at the follow-up visit and 11 patients did not perform CPET because of the exclusion criteria or because they withdrew informed consent to the examination.

The full dataset included 99 variables (Supplementary Table 1). Preliminary analyses were conducted mainly to investigate bivariate associations between potential outcome (objective findings and subjective symptoms) and background variables such as age, gender, BMI, smoking status, as well as intermediate variables such as admission to the intensive care unit (ICU) during the acute phase of the disease and length of follow-up. Based on these findings, and also on the current focus of the literature on long COVID [27,28,29], 12 clinically relevant variables were selected for further investigations (Table 1).

Table 1 List of key variables selected for the implementation of the graphical chain model

These variables were partitioned into blocks, as follows: block 1 - explanatory variables: this set consisted of four variables: gender, smoking, age and BMI; block 2 - intermediate variables: ICU admission during the acute phase of the disease, days between hospital discharge and follow-up visit; block 3 - outcome variables: VO2, FEV1, D-dimer serum levels, depression score and presence of fatigue; block 4 - HRCT findings of lung damage, such as persistence of ground-glass opacities and/or fibrotic signs. The analysis of the outcome variables such as VO2 and HRCT findings was restricted to the subset of participants for whom these data were available.
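The block partition translates directly into a plan for the analysis: each variable is a potential response to every variable in the preceding blocks, and the kind of edge between two variables depends only on whether they share a block. The sketch below encodes this with shortened variable names of our own choosing; it illustrates the bookkeeping only and is not the gRchain routine.

```python
# Blocks as listed in the text; the short variable names are illustrative.
BLOCKS = [
    ["gender", "smoking", "age", "bmi"],
    ["icu", "followup_days"],
    ["vo2", "fev1", "d_dimer", "depression", "fatigue"],
    ["hrct"],
]


def candidate_covariates(blocks):
    """For each variable, list the variables in all preceding blocks."""
    plan = {}
    earlier = []
    for block in blocks:
        for response in block:
            plan[response] = list(earlier)
        earlier.extend(block)
    return plan


def edge_label(blocks, a, b):
    """Return 'x - y' for same-block pairs, 'response <- explanatory' otherwise."""
    position = {v: i for i, block in enumerate(blocks) for v in block}
    if position[a] == position[b]:
        return f"{a} - {b}"
    response, explanatory = (a, b) if position[a] > position[b] else (b, a)
    return f"{response} <- {explanatory}"


if __name__ == "__main__":
    plan = candidate_covariates(BLOCKS)
    print(plan["icu"])
    print(edge_label(BLOCKS, "bmi", "vo2"))
    print(edge_label(BLOCKS, "fatigue", "depression"))
```

Associations within a block (the undirected edges) are assessed after conditioning on the preceding blocks, so the lists returned here are only the between-block candidates.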

Descriptive statistics of key variables are shown in Table 2. The median age of the study population was 60 [IQR 52–65] years. The population pyramid according to age and gender is reported in Supplementary Fig. 2. Histograms and bar plots are reported in Supplementary Fig. 3. Men showed a higher variability in terms of age than women. 45% of patients were active or past smokers, and 41% had a BMI > 30 kg/m^2. During the acute phase of the disease, 18% of patients were admitted to ICU. The median time between discharge and follow-up visit was 131 days with large variations [IQR 96–176 days]. 44% of patients reported the presence of the symptom fatigue, while 30% and 36% reported a DASS-42 score > 9 compatible with the presence of severe anxious and depressive findings, respectively.

Table 2 Descriptive statistics of the quantitative and qualitative variables

Scatter plots of the continuous outcome variables against the continuous variables in the same block and in the preceding blocks are shown in Supplementary Fig. 4. The scatter plot of VO2 against FEV1 showed a possible nonlinear relationship. The normal Q-Q plots of the continuous outcome variables, grouped according to the qualitative variables in the same block and in the preceding blocks, are reported in Supplementary Fig. 5. For the case of VO2, using gender as a grouping variable, the two conditional distributions had different shapes, with males exhibiting higher values and higher variability. Patients who had been admitted to ICU tended to have lower values than the others; however, the upper quantiles of the two distributions tended to overlap. No major differences emerged from the other Q-Q plots of VO2. Similar behaviour was found for the Q-Q plots of D-dimer and FEV1 against gender. The Q-Q plot of D-dimer against ICU showed that patients who had been admitted to ICU tended to have upper quantiles lower than the others. Patients complaining of fatigue tended to have uniformly lower values of FEV1 than others (p-value 0.004). The Shapiro-Wilk test for normality was also performed on the marginal distributions (p-values: 0.478 for VO2; 0.035 for FEV1; <0.001 for D-dimer). This led us to log transform the variable D-dimer (after adding 0.01 to avoid taking the log of zero) to bring its distribution closer to normality (p-value after transformation 0.007).
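The offset log transformation applied to D-dimer amounts to the following. The natural logarithm is assumed here, as the text does not state the base, and the function name `log_with_offset` is illustrative.

```python
import math


def log_with_offset(values, offset: float = 0.01):
    """Natural log of (value + offset); the offset keeps zero values finite."""
    return [math.log(v + offset) for v in values]


if __name__ == "__main__":
    # A zero D-dimer value maps to log(0.01) instead of raising an error.
    print(log_with_offset([0.0, 0.5, 2.0]))
```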

Modelling

The GCM is depicted in Fig. 1. Due to the properties of conditional independence [12], a GCM is fitted as a series of univariate regressions (see the Steps in the Supplementary Material). For quantitative response variables, linear regression models are used. For binary response variables, logistic regressions are used. For qualitative ordered variables, cumulative logit models are used. The summary of each univariate regression, together with the normal Q-Q plots of the residuals for continuous outcome measures, is reported in Supplementary Figs. 5 and 6. As the results may be sensitive to the order of the univariate regressions performed in Step 1, the stability of the analysis was confirmed after checking for robustness against this choice.
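For reference, the three model families can be written in a common notation. The symbols are introduced here for illustration and do not appear in the original: x denotes the vector of explanatory variables retained for a given response Y, and logit(p) = log(p / (1 - p)). The sign convention shown for the cumulative logit model is one common choice and may differ from that used by a particular software implementation.

```latex
\begin{align*}
\text{quantitative } Y:\quad
  & \mathrm{E}(Y \mid x) = \beta_0 + x^{\top}\beta \\
\text{binary } Y \in \{0, 1\}:\quad
  & \operatorname{logit} P(Y = 1 \mid x) = \beta_0 + x^{\top}\beta \\
\text{ordered } Y \in \{1, \dots, J\}:\quad
  & \operatorname{logit} P(Y \le j \mid x) = \alpha_j + x^{\top}\beta,
    \qquad j = 1, \dots, J - 1
\end{align*}
```

In the cumulative logit model the thresholds satisfy \alpha_1 \le \dots \le \alpha_{J-1}, and a single coefficient vector \beta is shared across the J - 1 cumulative splits.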

Fig. 1

The Graphical Chain Model (GCM). In the GCM, two variables are joined by an edge whenever there is an association between them. Two kinds of edges are allowed: directed (←), also called an arrow, or undirected (-). An arrow is always used to join two variables in two different blocks. Let A be in block 3 and B be in block 1. Then A←B implies that B is an explanatory variable of A. An undirected edge is always used to join variables in the same block. Let A be in block 3 and C be also in block 3. Then A-C means that the two variables are associated after conditioning on all the variables in blocks 1 and 2. BMI: body mass index; VO2: peak oxygen consumption; log-DDIMER: serum D-dimer level after logarithmic transformation; ICU: admission to intensive care unit; HRCT: high resolution computed tomography; FEV1: forced expiratory volume in one second

The inspection of the first block revealed an association between BMI and gender. Considering the variables in the second block, BMI and smoking status had a significant impact on the probability of being admitted to ICU. VO2 showed associations with length of follow-up, age, BMI and gender. After conditioning on these covariates, VO2 was no longer related to previous ICU access. D-dimer levels and FEV1 both showed dependency on age, gender and BMI. After taking the relevant covariates into account, FEV1 was related to complaining of fatigue, thereby confirming what emerged from the preliminary bivariate analysis. In turn, fatigue was significantly associated with the depression score. Due to the high correlation between the anxiety and depression scores, results did not change if anxiety scores were considered instead of depression scores.

Notably, neither fatigue nor depression was a response to any of the variables of block 2, including length of follow-up. VO2 was not influenced by smoking status after conditioning on length of follow-up, age, BMI and gender, and it was also found to be independent of D-dimer levels after conditioning on the same covariates. Further investigations involving the possible role of comorbidities (hypertension, dyslipidaemia, type II diabetes mellitus, atrial fibrillation, previous coronary artery events, chronic obstructive pulmonary disease, and thyroid disease) and concomitant medications (beta-blockers, antiplatelet drugs, anti-hypertensive drugs, hypoglycemic drugs and statins) did not show any significant effect on the outcome variables in block 3 (Supplementary Table 6), with the sole exception of significant effects of type II diabetes mellitus (+), hypertension (+), and statins (-) on D-dimer levels.

Concerning the fourth block, HRCT findings were related to VO2, age, gender, ICU access and follow-up duration. Finally, the follow-up duration did not have an impact on FEV1.

Discussion

In our study, we attempted to describe the complex relationship between baseline individual characteristics, manifestations of the acute phase of COVID-19, and long-COVID-related signs and symptoms observed in a cohort of 96 patients previously hospitalized for COVID-19-associated pneumonia. We applied the GCM strategy to identify interconnections and independent associations between variables after partitioning into blocks with a natural order. Since post-acute sequelae of COVID-19 can involve multiple organs and present with a variety of clinical features, it is helpful to adopt a modelling strategy that allows investigation of the interdependency of multiple variables. Overall, our analysis showed that GCMs are effective in combining prior knowledge of the order between blocks of variables with standard statistical methods to investigate the structure of associations or dependencies.

In our study cohort, we observed that both obesity and active smoking had a role in predicting ICU access during hospitalization, which is a hallmark of COVID-19 severity. Obesity, together with age and female gender, negatively affected FEV1 and peak oxygen consumption at follow-up. In turn, FEV1 at follow-up demonstrated a negative correlation with the occurrence of fatigue symptoms, while this association was not observed with peak oxygen consumption. Additionally, fatigue symptoms were also influenced by depressive symptoms. Finally, we showed that persistence of ground-glass opacities and fibrotic signs at HRCT during follow-up were independently predicted by age, gender, ICU admission and follow-up duration. An inverse association between HRCT findings and peak oxygen consumption was independent of the effect of the other outcome variables. These findings overall indicated that some features of long COVID, such as fatigue symptoms, being linked to both depressive disorders and objective measures of organ dysfunction, could persist in the long term and negatively impact on the quality of life. Our results also suggest the necessity for a personalized, multi-level, process-based intervention that adequately addresses the complexity of the biopsychosocial network of subjective symptoms and objective findings in the treatment of individuals suffering from long COVID.

Our results should be interpreted in the context of previous literature. Monteiro et al. first demonstrated in an inpatient population of 112 individuals diagnosed with COVID-19 during the first wave, that obesity and active smoking, along with increased inflammatory markers of acute phase such as procalcitonin, IL-6 and ferritin, independently predicted the need for mechanical ventilation [30]. These findings have been corroborated by subsequent studies, including a large meta-analysis, which found that obesity is associated with an increased risk of ICU admission and, in a dose-dependent manner, with the need for mechanical ventilation [31]. Although a unifying pathophysiological theory is still lacking, several hypotheses suggest that excess adiposity may increase the risk of severe COVID-19. These include increased expression of Angiotensin Converting Enzyme-2 (ACE2) receptor in adipocytes [32] and specific immunological signatures that predispose obese individuals to an enhanced cytokine storm [33].

Obese patients also showed impaired aerobic exercise capacity six months after acute SARS-CoV-2 infection [34]. Compared to non-obese individuals, they also showed increased odds of having FEV1 lower than 80% of predicted value one year after hospitalization for severe COVID-19 [35]. Whereas the restrictive lung pattern related to fibrosis is recognized as a classical long-term radiological feature of severe COVID-19, emphysematous abnormalities, which can negatively impact on FEV1, and impaired cardiorespiratory response to exercise have often been observed in obese patients with chronic post-COVID-19 symptoms [3]. These symptoms are typical features of an exaggerated hyperventilatory response and impaired gas exchange at peak exercise.

Our results align with these observations, outlining a phenotype of obese patients who are prone to increased odds of ICU admission and, over the long term, may exhibit reduced FEV1 and diminished oxygen consumption during exercise.

In this context, it is important to focus also on the predictors of chronic fatigue, a distinct hallmark of long COVID that has been extensively investigated. Several studies have hypothesized a link between lung lesions, low oxygen saturation, the immune-inflammatory response to viral invasion and the onset of psycho-affective symptoms observed 3–4 months post-acute infection [36, 37]. Although the precise mechanisms underlying chronic fatigue symptoms remain incompletely elucidated, our findings align with existing evidence suggesting that fatigue may be the expression of both psycho-affective impairment and lung functional decline. However, in contrast to other studies, our research reveals that fatigue symptoms in our cohort were not directly correlated with lung features observed at HRCT.

It is well acknowledged that COVID-19 survivors experience a range of neuropsychological disturbances, including anxiety, depression, cognitive impairment, sleep problems, ageusia, anosmia, and brain fog [38]. Whereas some of these symptoms may be biologically linked to SARS-CoV-2 neurotropism, the pathophysiology of others remains a subject of debate. In accordance with our results, depression after COVID-19 was found to be the only predictor of persistent fatigue in a cohort of 495 patients who had recovered from COVID-19 and whose clinical and psychopathological characteristics, including fatigue presence and severity, were collected at one, three, six and twelve months after infection [39].

In previous research there is a notable indication that invasive and non-invasive mechanical ventilation, as well as the length of stay in ICU for patients with severe COVID-19, may predispose individuals to a higher frequency of fibrotic lesions seen at HRCT. Sturgill et al. conducted a comprehensive examination of patient outcomes, specifically focusing on the incidence of lung fibrotic changes following COVID-19-related acute respiratory distress syndrome (ARDS) in comparison to non-COVID-related ARDS [40]. They found that fibrotic changes at HRCT imaging occurred more frequently in COVID-19 survivors (70%) than in the non-COVID group (43%, p-value < 0.001). Interestingly, fibrotic lesions were associated with ICU length of stay, and patients surviving pneumonia-ARDS frequently showed impairments in physical, emotional, and cognitive health [41]. Our data revealed an inverse relationship between the presence of lung lesions and the duration of the follow-up, suggesting a progressive nature of the healing process. Intriguingly, Wu et al. systematically gathered clinical and radiological data from 11 patients diagnosed with severe acute respiratory syndrome (SARS) during the 2003 outbreak, conducting serial follow-up thin-section CT scans at 3, 6, and 84 months. Notably, the extent of the lesions observed in the CT scans demonstrated a reduction at both 6 and 84 months when compared to the initial assessment at 3 months [41].

The findings from our study should be interpreted in the light of several limitations. Firstly, the relatively small sample size prevents us from drawing definitive conclusions regarding the observed relationships between variables. Additionally, despite the longitudinal nature of our observations, our research is constrained by the inherent challenge of establishing cause-and-effect relationships outside experimental contexts. In this respect, it should be noted that GCMs inherit all limitations of statistical models when applied to observational studies and are not tools for causal inference. Nevertheless, the biological plausibility of the relationships between variables, as described by the GCM, underscores the effectiveness of this approach as a valuable statistical tool for unravelling structural features, such as conditional dependencies and associations. This promising method holds potential for investigating the long-term health implications of COVID-19 by identifying predictive factors and informing suitable therapeutic strategies.

Our findings, together with the limitations noted above, suggest several directions for future research. Generalizability would be enhanced by extending the investigation to include a wider selection of patients, for example in a multicentre study covering different regions. Longer follow up, tracking patients across several time points, would increase the ability to infer causal relationships. Furthermore, the observed link between depressive symptoms and fatigue suggests the desirability of investigating additional psychosocial factors.

Conclusions

We demonstrated the suitability of employing GCMs to elucidate the predictors of long COVID symptoms and signs of organ dysfunction in a cohort of previously hospitalized COVID-19 patients. In this model, variables are ordered in blocks based on the temporal sequence of events. Within our database, the application of GCM revealed a distinct cluster of patient characteristics - such as age, BMI, gender and smoking status - that exerted a significant impact on the severity of acute COVID-19, as indicated by ICU admission. These variables also played a crucial role in the occurrence of long-term signs and symptoms linked to both functional and structural signs of lung dysfunction, including fatigue. The latter, however, was predicted not solely by features of organ dysfunction, but also by depressive symptoms. A GCM offers valuable insight into defining appropriate preventive strategies and informed therapeutic decisions to mitigate the health impact of long COVID.