Take-home message

Phenome-wide association studies and subsequent multi-omics investigations identified novel biomarkers for acute respiratory distress syndrome (ARDS). These findings yield a promising prediction model with potential clinical utility, which may enhance early clinical management and intervention strategies for ARDS.

Introduction

Acute respiratory distress syndrome (ARDS) is a severe and often fatal manifestation of respiratory failure, characterized by diminished lung compliance, tachypnea, and profound hypoxemia [1, 2]. The overall incidence of ARDS in intensive care units (ICUs) has been documented to be approximately 10%, with mortality as high as 46%, reaching 70% during the coronavirus disease 2019 (COVID-19) pandemic [3, 4]. Therefore, it is crucial to identify individual causal factors underlying the pathogenesis of ARDS, which could improve early clinical management and increase the survival of patients with ARDS.

Currently, several risk factors have been implicated in ARDS development, including sepsis [5], pneumonia [6], aspiration of gastric contents [7], and severe trauma [8]. However, clinical risk factors alone cannot accurately predict which patients will develop ARDS or which patients with ARDS will survive. Intriguingly, emerging evidence shows that certain individuals may develop ARDS when exposed to COVID-19 but not to influenza, suggesting potential differences in disease susceptibility or host response to specific pathogens [9, 10]. Numerous studies have elucidated the significant contribution of genetic susceptibility to ARDS development, such as functional variants in genes encoding angiotensin-converting enzyme and surfactant protein B [11, 12]. In total, over 40 susceptibility genes have been associated with the development or outcome of ARDS, mainly through genome-wide association studies (GWAS) [13]. In addition, phenome-wide association studies (pheWAS) have emerged as a valuable tool to investigate associations between phenotypes or traits and specific single nucleotide polymorphisms (SNPs) within a comprehensive database [14]; pheWAS can also leverage GWAS summary statistics to decipher susceptibility genes implicated in a particular trait of interest [15]. These two approaches, GWAS and pheWAS, are complementary: each can corroborate and validate the other's findings, enhancing the robustness of genetic discoveries [16]. Moreover, Mendelian randomization (MR), usually based on GWAS summary statistics, is a powerful approach to assessing the causal relationship between an exposure and an outcome [17].

In this study, we conducted an extensive Mendelian randomization phenome-wide association study (MR-pheWAS) to comprehensively profile potential individual causal factors associated with ARDS development. Subsequently, we employed peripheral blood transcriptome analysis and used a preclinical mouse model to validate and identify biomarkers implicated in the pathogenesis of ARDS. Ultimately, we developed a risk prediction tool for enhancing clinical management, patient care, and informed decision-making in ARDS.

Methods

Data available

Publicly available GWAS summary statistics of 33,150 traits, accessible through the R package TwoSampleMR, were obtained from the Medical Research Council Integrative Epidemiology Unit (MRC-IEU) open GWAS database up to 03/20/2022 [18]. GWAS summary statistics of ARDS in European populations, consisting of 1250 ARDS cases and 1583 non-ARDS controls from the Identification of SNPs Predisposing to Altered Acute Lung Injury Risk (iSPAAR) consortium and the Molecular Epidemiology of Sepsis in the ICU (MESSI) cohort, were derived from our previous study [19]. Details of enrolled participants are described in the supplementary Methods.

Causal inference via MR analysis

Two-sample MR analysis was performed to estimate the causal effects of traits on ARDS using the R package TwoSampleMR. SNPs that reached genome-wide significance (P < 5e-8) were selected for each exposure trait. Clumping was then performed to obtain independent genetic variants (r² < 0.001, within 10 Mb windows) as instrumental variables (IVs).
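The instrument selection described above can be illustrated with a minimal Python sketch of greedy clumping. It is not the authors' code (they used the R package TwoSampleMR); the Snp record, the select_instruments function, and the r2 lookup callable are hypothetical names introduced here, and the sketch assumes that pairwise linkage disequilibrium values are available from a reference panel.

```python
from typing import Callable, Iterable, List, NamedTuple


class Snp(NamedTuple):
    """Hypothetical record for one SNP-exposure association."""
    rsid: str
    chrom: str
    pos: int      # base-pair position
    pval: float   # P value for association with the exposure trait


def select_instruments(
    snps: Iterable[Snp],
    r2: Callable[[str, str], float],   # assumed LD lookup: r2(rsid_a, rsid_b)
    p_threshold: float = 5e-8,
    r2_threshold: float = 0.001,
    window_bp: int = 10_000_000,       # 10 Mb
) -> List[Snp]:
    """Keep genome-wide significant SNPs, then clump greedily.

    SNPs are visited from most to least significant; a SNP is dropped if it
    lies within the window of an already kept SNP on the same chromosome and
    is in LD with it (r2 at or above the threshold).
    """
    candidates = sorted(
        (s for s in snps if s.pval < p_threshold), key=lambda s: s.pval
    )
    kept: List[Snp] = []
    for snp in candidates:
        in_ld = any(
            snp.chrom == lead.chrom
            and abs(snp.pos - lead.pos) <= window_bp
            and r2(snp.rsid, lead.rsid) >= r2_threshold
            for lead in kept
        )
        if not in_ld:
            kept.append(snp)
    return kept
```

Visiting SNPs in order of significance means that each retained instrument is the strongest signal in its region, which is the usual rationale for greedy clumping.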

Five common methods, namely inverse variance weighted (IVW), weighted median, MR-Egger, simple mode, and weighted mode, were applied to calculate the causal effect of each trait on ARDS development. Considering the minimal number of IVs required by each method, a significant association was determined based on the following criteria: (a) For traits with three or more IVs: P < 0.05 in at least one of the five methods, P > 0.05 in both the MR-Egger and IVW heterogeneity tests, and P > 0.05 in the MR-Egger pleiotropy test; (b) For traits with two IVs: P < 0.05 in the IVW test, and P > 0.05 in the IVW heterogeneity test; (c) For traits with only one IV: P < 0.05 in the Wald ratio test. The results of the MR analysis, heterogeneity test, and pleiotropy test for traits with more than three IVs are listed in Supplementary Table E1. All MR results were deposited and visualized using the R package shiny.
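To make the estimators and the decision rule above concrete, the following Python sketch implements the Wald ratio, a fixed-effect IVW estimate, and criteria (a) to (c). It is an illustration under stated assumptions, not the authors' pipeline: the function names and dictionary keys are hypothetical, the standard errors use the common first-order approximation, and the IVW implementation in the TwoSampleMR package may differ in detail (for example, in how it handles overdispersion).

```python
import math
from typing import Dict, Optional, Sequence, Tuple


def wald_ratio(beta_exp: float, beta_out: float, se_out: float) -> Tuple[float, float]:
    """Single-instrument causal estimate with a first-order standard error."""
    return beta_out / beta_exp, se_out / abs(beta_exp)


def ivw(beta_exp: Sequence[float], beta_out: Sequence[float],
        se_out: Sequence[float]) -> Tuple[float, float]:
    """Fixed-effect IVW: weighted regression of outcome effects on exposure
    effects through the origin, with weights 1 / se_out**2."""
    weights = [1.0 / s ** 2 for s in se_out]
    num = sum(w * bx * by for w, bx, by in zip(weights, beta_exp, beta_out))
    den = sum(w * bx * bx for w, bx in zip(weights, beta_exp))
    return num / den, math.sqrt(1.0 / den)


def two_sided_p(estimate: float, se: float) -> float:
    """Two-sided P value from a normal approximation."""
    return math.erfc(abs(estimate / se) / math.sqrt(2.0))


def is_significant(n_iv: int,
                   method_p: Dict[str, float],
                   het_p: Optional[Dict[str, float]] = None,
                   pleio_p: Optional[float] = None,
                   alpha: float = 0.05) -> bool:
    """Apply criteria (a)-(c); the dictionary keys are hypothetical labels."""
    if n_iv >= 3:   # (a) any method significant, no heterogeneity or pleiotropy
        return (min(method_p.values()) < alpha
                and het_p["IVW"] > alpha
                and het_p["MR-Egger"] > alpha
                and pleio_p > alpha)
    if n_iv == 2:   # (b) IVW significant, no IVW heterogeneity
        return method_p["IVW"] < alpha and het_p["IVW"] > alpha
    if n_iv == 1:   # (c) Wald ratio significant
        return method_p["Wald ratio"] < alpha
    return False
```

With a single instrument the IVW estimate reduces to the Wald ratio, which is why criterion (c) relies on the Wald ratio test alone.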

Transcriptomic profile of blood of ARDS and non-ARDS patients

Blood samples from 160 ARDS cases and 142 non-ARDS controls were collected for RNA sequencing (RNA-Seq) analysis. Participants were recruited from the Molecular Epidemiology of ARDS (MEARDS) prospective cohort study (ClinicalTrials.gov Identifier: NCT00006496) [20], part of the iSPAAR consortium. Data processing, quality control, and data analysis followed our previous study [19]. Briefly, 19,898 protein-coding genes were identified for transcriptome analyses, including differential expression analysis and immune cell deconvolution via CIBERSORTx [21], which estimated the abundances of 22 immune cell types. Spearman rank correlations between candidate genes and immune cell abundances were computed, and the resulting correlation matrix was visualized with the R package corrplot.
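As a brief illustration of the Spearman rank correlation used here, the Python sketch below ranks both variables (averaging ranks across ties) and computes the Pearson correlation of the ranks. The function names are introduced for this example only; the authors' analysis was carried out in R.

```python
import math
from typing import List, Sequence


def average_ranks(values: Sequence[float]) -> List[float]:
    """Rank values from 1..n, assigning tied values their mean rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j to the end of the current run of tied values.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2.0 + 1.0
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks


def pearson(x: Sequence[float], y: Sequence[float]) -> float:
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / math.sqrt(var_x * var_y)


def spearman(x: Sequence[float], y: Sequence[float]) -> float:
    """Spearman rank correlation: Pearson correlation of the ranks."""
    return pearson(average_ranks(x), average_ranks(y))
```

Because only ranks enter the calculation, the coefficient captures monotonic rather than strictly linear relationships, which suits skewed expression values and cell fractions.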

Transcriptomic profile of lung tissues of a preclinical mouse model

A preclinical lipopolysaccharide (LPS)-induced lung injury model was used to investigate the potential biological effects of candidate genes over the course of ARDS. Lung tissues were collected from mice exposed to LPS, and their transcriptomes were profiled using the Affymetrix Mouse Genome 430 2.0 Array. The data are deposited in Gene Expression Omnibus under the accession number GSE9314. We followed the previously published protocol to clean and analyze the data [22]. Specifically, we examined changes in the expression of candidate genes at different time points (i.e., 1, 2, 4, and 18 h). Alterations in expression over time were taken to suggest a biological role of the candidate genes in the development of ARDS.

Construction of prediction model and nomogram

For the prediction model, a logistic regression model incorporating the Acute Physiology and Chronic Health Evaluation (APACHE) III score, sex, body mass index (BMI), bacteremia, and sepsis status served as the basic model. Subsequently, candidate gene expression and immune cell proportions were added to the basic model. The performance of each prediction model was evaluated using receiver operating characteristic (ROC) curves and the area under the curve (AUC) with the R package pROC. The AUC assesses the model's ability to discriminate between individuals who developed ARDS and those who did not. Furthermore, 95% confidence intervals (CI) of the AUC were estimated using the ci.auc function. The DeLong test was used to calculate the P value for each candidate prediction model by comparing its AUC with that of the basic model [23]. In addition, a nomogram was constructed to facilitate risk prediction of ARDS development based on the multivariate logistic regression model with the best prediction performance. The nomogram was created using the R package rms and serves as a graphical tool to estimate an individual's risk of developing ARDS. Calibration was assessed via the Hosmer–Lemeshow test.
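The two evaluation measures above can be sketched in a few lines of Python. The auc function uses the rank interpretation of the AUC (the probability that a randomly chosen case receives a higher predicted risk than a randomly chosen control), and hosmer_lemeshow computes the Hosmer–Lemeshow statistic over risk-ordered groups. Both function names are hypothetical, the grouping scheme is one common choice, and this is not the pROC or rms implementation; the P value (from a chi-square distribution with groups minus 2 degrees of freedom) is left out to keep the sketch dependency-free.

```python
from typing import Sequence


def auc(case_scores: Sequence[float], control_scores: Sequence[float]) -> float:
    """AUC as the fraction of case-control pairs ranked correctly.

    Ties between a case and a control count as half a correct pair.
    """
    wins = 0.0
    for case in case_scores:
        for control in control_scores:
            if case > control:
                wins += 1.0
            elif case == control:
                wins += 0.5
    return wins / (len(case_scores) * len(control_scores))


def hosmer_lemeshow(predicted: Sequence[float], outcome: Sequence[int],
                    groups: int = 10) -> float:
    """Hosmer-Lemeshow statistic over equal-sized groups of predicted risk."""
    pairs = sorted(zip(predicted, outcome))
    n = len(pairs)
    statistic = 0.0
    for g in range(groups):
        chunk = pairs[g * n // groups:(g + 1) * n // groups]
        if not chunk:
            continue
        expected = sum(p for p, _ in chunk)   # expected number of events
        observed = sum(y for _, y in chunk)   # observed number of events
        mean_risk = expected / len(chunk)
        variance = len(chunk) * mean_risk * (1.0 - mean_risk)
        if variance > 0:
            statistic += (observed - expected) ** 2 / variance
    return statistic
```

An AUC of 0.5 corresponds to chance-level discrimination and 1.0 to perfect separation, while a small Hosmer–Lemeshow statistic indicates that predicted and observed event counts agree within risk groups.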

Statistical analysis

The Wilcoxon signed-rank test, t test, or χ² test was used, as appropriate, to determine statistical differences in clinical features or candidate gene expression between ARDS cases and non-ARDS controls. ANOVA was performed to determine statistical differences in gene expression among mouse lung tissue samples at different time points. All statistical analyses were performed using R (version 4.2.1).

Results

Causal factors profiles of ARDS via MR-pheWAS

The overall workflow of the study is depicted in Fig. 1. Initially, a total of 33,150 traits were evaluated for their causal effects on ARDS development. Following a rigorous filtering process, 1736 traits were found to be significantly associated with ARDS and were retained for further investigation. These traits were categorized into three groups: 1223 blood genes, 159 plasma proteins, and 354 non-gene phenotypes. The 354 non-gene traits were further classified into six subgroups, namely Biochemistry, Anthropometry, Disease, Nutrition and Habit, Immunology, and Treatment (Fig. 2A and supplementary Table E2).

Fig. 1

Flowchart of the study design. This study contains three main stages: identification, validation, and application. In the identification stage, an MR-pheWAS was conducted to profile potential individual causal factors and biomarkers associated with ARDS development. In the validation stage, we used peripheral blood transcriptome analysis and a preclinical mouse model to validate biomarkers implicated in the pathogenesis of ARDS. In the application stage, we developed a CARDS platform including individual causal factors and biomarkers, as well as a risk prediction tool to enhance clinical management and informed decision-making for ARDS

Fig. 2

Causal factors for ARDS identified via MR-pheWAS and corresponding visualization. A Bar chart of the 354 non-gene phenotypes, manually categorized into six subgroups. B Overview of CARDS (https://mulongdu.shinyapps.io/cards/). CARDS includes three modules: “MR-pheWAS”, “ARDS RNA-Seq”, and “Citation & Contact”. The usage is illustrated using “transmembrane protein” as an example. C Correlation of MR effect sizes between blood genes and their encoded plasma proteins

To facilitate access to our findings, we developed a user-friendly online tool: Causal traits for Acute Respiratory Distress Syndrome (CARDS; https://mulongdu.shinyapps.io/cards/). This visualization tool comprises three modules: “MR-pheWAS”, “ARDS RNA-Seq”, and “Citation & Contact”. For instance, the user can enter “transmembrane protein” in the “Search” box of the “MR-pheWAS” module to obtain its causal effect on ARDS development (Fig. 2B).

Further, to integrate the genetic information, we compared the MR-derived causal estimates of blood genes with those of their corresponding plasma proteins. Interestingly, we observed seven circulating biomarkers with a consistent direction of effect at the gene and protein levels (Fig. 2C): CPXM1, IL7R, P13, CTSS, SIGLEC7, ENG, and HBZ.

Validation of gene biomarkers in ARDS human blood and mouse lung tissues

To validate the potential blood gene biomarkers, we conducted RNA-Seq analysis of the blood transcriptome of 160 ARDS cases and 142 non-ARDS controls. Of the putative blood genes identified through MR analysis, 988 were available in the RNA-Seq data, and five of these (TMEM176B, SLC2A5, CDC45, HTRA3, and VSIG8) exhibited differential expression between cases and controls (four up-regulated and one down-regulated; Fig. 3A–D and supplementary Fig. 1). Because HTRA3 showed inconsistent directions of effect between the MR analysis and the RNA-Seq results, we retained the remaining four genes for subsequent analysis. An interactive version of supplementary Fig. 1 is also available on the CARDS platform, in which users can select the dots of interest to view the results of the differential expression analyses.

Fig. 3

Expression patterns of four candidate causal genes. A–D Expression levels of TMEM176B, SLC2A5, CDC45, and VSIG8 in 160 ARDS cases and 142 non-ARDS controls. P values were calculated by Wilcoxon signed-rank test. E–H Expression of the four genes in mice after exposure to LPS for 1, 2, 4, and 18 h, from GSE9314. P values were calculated via ANOVA test

In addition, we assessed the potential biological effects of the four candidate genes in ARDS pathobiology using a preclinical model of LPS-induced lung injury and observed dynamic expression patterns (Fig. 3E–H). Both SLC2A5 and VSIG8 showed increased expression in response to LPS exposure. TMEM176B exhibited decreased expression in the first 4 h after LPS exposure, followed by a marked increase. Conversely, CDC45 presented the opposite pattern, with an initial increase in expression followed by a decrease.

Correlation between candidate gene expression and immune cell proportion

Considering the potential role of these four genes in specific immune pathways [24–27], we explored their relationships with immune cell proportions deconvolved from the blood transcriptome. Five of the 22 immune cell fractions were significantly higher in the 160 ARDS cases than in the 142 non-ARDS controls: naïve B cells, activated CD4+ memory T cells, regulatory T cells (Tregs), M0 macrophages, and M2 macrophages (supplementary Fig. 2). The correlations between candidate genes and the abundances of these five immune cell types were then analyzed separately in ARDS cases and non-ARDS controls, and we observed distinct correlation patterns between the two groups (Fig. 4). Specifically, TMEM176B showed a significant positive correlation with Tregs in ARDS cases but with M2 macrophages in non-ARDS controls; SLC2A5 was negatively correlated with activated CD4+ memory T cells in cases, whereas the correlation was negligible in controls; CDC45 exhibited positive correlations with both activated CD4+ memory T cells and Tregs in cases, but with M0 macrophages in controls; and VSIG8 showed no significant correlation with any of the immune cell types.

Fig. 4

Correlation of gene expression and immune cell fraction in ARDS cases and non-ARDS controls. Correlation matrix plots show pairwise Spearman correlations between four blood genes and abundances of five immune cell types across 142 non-ARDS controls (A) and 160 ARDS cases (B). The size and color shade of the squares in each cell represent the strength of the correlation, with a star (*) indicating statistical significance (P < 0.05)

Construction of ARDS risk prediction model via blood gene expression and immune cell proportion

With the basic model (APACHE III score, sex, BMI, bacteremia, and sepsis) as a reference, we observed an increased AUC value with the inclusion of candidate biomarkers (supplementary Table E3). Adding the four blood genes increased the AUC relative to the basic model (AUC = 0.725, 95% CI = 0.646–0.804; Fig. 5A), although the increase did not reach statistical significance at the 0.05 level (AUC = 0.756, 95% confidence interval (CI) = 0.683–0.828, PDeLong = 0.059; Fig. 5A). Further adding the five blood immune cell proportions yielded a significantly higher AUC than the basic model (AUC = 0.791, 95% CI = 0.722–0.860, PDeLong = 0.004; Fig. 5A). Moreover, we developed a risk prediction nomogram based on the optimal model, which included APACHE III score, sex, BMI, bacteremia, sepsis, four blood genes, and five blood immune cell proportions (Fig. 5B). Each patient characteristic was assigned “Points” based on its position on the respective axis, and the points were summed on the “Total Points” line to determine an individual's probability of developing ARDS, as indicated on the “Risk of ARDS” line. The calibration plot demonstrated good agreement between the probabilities predicted by the nomogram and the actual observations (Fig. 5C). Furthermore, the performance of the optimal prediction model differed across ARDS subtypes (supplementary Table E4).
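The point-assignment logic of a nomogram can be illustrated with a short Python sketch. It assumes the usual construction in which the predictor with the largest effect over its range spans 0 to 100 points and all other predictors are scaled relative to it; the function names, argument layout, and any coefficients passed in are hypothetical and do not correspond to the fitted model or to the internals of the rms package.

```python
import math
from typing import Callable, Dict, Tuple


def nomogram_points(coefs: Dict[str, float],
                    ranges: Dict[str, Tuple[float, float]]
                    ) -> Callable[[str, float], float]:
    """Build a function mapping (predictor, value) to nomogram points.

    coefs:  predictor name -> logistic regression coefficient
    ranges: predictor name -> (lowest, highest) value shown on its axis
    """
    # Effect of each predictor across its full range, on the log-odds scale.
    spans = {k: abs(coefs[k]) * (ranges[k][1] - ranges[k][0]) for k in coefs}
    max_span = max(spans.values())

    def points(name: str, value: float) -> float:
        low, high = ranges[name]
        beta = coefs[name]
        # Zero points at the end of the axis with the lowest risk.
        start = low if beta >= 0 else high
        return 100.0 * beta * (value - start) / max_span

    return points


def predicted_risk(intercept: float, coefs: Dict[str, float],
                   patient: Dict[str, float]) -> float:
    """Logistic model probability for one patient."""
    linear = intercept + sum(coefs[k] * patient[k] for k in coefs)
    return 1.0 / (1.0 + math.exp(-linear))
```

Because points are a linear rescaling of each predictor's contribution to the log-odds, the total points map monotonically onto the predicted probability, which is what the “Risk of ARDS” axis displays.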

Fig. 5

Risk prediction models for ARDS. A ROC curves depicting the performance of the basic model, the basic model plus four blood genes, the basic model plus five immune cell proportions, and the basic model plus both. B Nomogram for prediction of ARDS occurrence. C Calibration curve of the nomogram for predicting ARDS occurrence. The black columns and red dots show predicted values, while the gray columns and blue dots show actual values. The corresponding calibration was performed via Hosmer–Lemeshow test

Discussion

In this study, we used the MR-pheWAS framework to profile 1736 potential individual causal factors for ARDS development, among which we identified four blood genes and five blood immune cell proportions as biomarkers through analyses of human blood and a preclinical mouse lung model. Ultimately, we constructed the CARDS platform to present all candidate causal factors and a nomogram to predict ARDS risk.

The use of MR analysis has become widespread for assessing causal relationships between exposures and outcomes of interest [28]. In this study, we employed an MR-pheWAS approach, which represents a powerful and novel design in causal inference [29]. This allowed us to expand the scope of our investigation by conducting thousands of MR analyses encompassing the entire phenome, enabling us to uncover the potential causal factors contributing to ARDS development. Moreover, we confirmed and validated findings through transcriptomic data of both human blood and mouse lung tissues, ultimately leading to the identification of TMEM176B, SLC2A5, CDC45, and VSIG8 as promising biomarkers for ARDS.

A recent study demonstrated that knockout of TMEM176B enhanced inflammasome activation and interleukin-1β release, leading to an augmentation of CD8+ T cell-mediated inhibition of tumor growth [30]. Additionally, TMEM176B acts as a negative regulator of NLRP3 inflammasome activation and downstream macrophage stimulation [31]. Mechanistic studies have revealed that GLUT5 (encoded by SLC2A5)-mediated fructose utilization is required to suppress AMPK activity and subsequently activate mTORC1 [32]. Moreover, an enrichment analysis showed that high SLC2A5 expression was positively correlated with gene sets related to the inhibition of macrophages and T cells [33]. CDC45 has been shown to interact with minichromosome maintenance proteins and DNA polymerase alpha [25]. Elevated CDC45 could facilitate the transition from G1 phase to S phase by modulating the expression of cell cycle-related genes [34]. VSIG8, a member of the V-set and immunoglobulin domain family, has been reported to interact with V-region immunoglobulin-containing suppressor of T cell activation, leading to the inhibition of T cell function [35]. Collectively, these studies point to a potential role of immunoregulation in the development of ARDS.

T cells appear to be important modulators in resolving lung injury, as evidenced by mouse models and human biospecimens [36, 37]. Our study used transcriptome analyses to dissect the functions of T cells in ARDS and found that TMEM176B expression was correlated with Treg fractions in the blood of ARDS cases. In addition, TMEM176B showed a dynamic expression pattern over the course of LPS-induced lung injury. These results suggest a role of TMEM176B in lung immunity, possibly through the regulation of Tregs, during the dynamic development of ARDS.

To facilitate early intervention and treatment for ARDS, it is important to establish a diagnosis before respiratory failure occurs. Clinically, researchers have emphasized the importance of lung-protective ventilation strategies, exemplified by low tidal volume ventilation and positive end-expiratory pressure, in preventing ventilator-induced lung injury [38, 39]. Similarly, adhering to restrictive transfusion strategies based on established hemoglobin thresholds and clinical indications is pivotal in minimizing the risk of transfusion-related complications [40]. In addition, an effective prediction model would improve the clinical management of ARDS. In this study, the baseline prediction model based on the clinical variables of APACHE III score, sex, BMI, bacteremia, and sepsis showed moderate performance (AUC = 0.725), and its performance improved significantly when four blood gene biomarkers and five blood immune cell proportions were incorporated (AUC = 0.791). These findings highlight the possibility of improving the precision of ARDS prediction by introducing molecular biomarkers.

We acknowledge several limitations of the present study. First, MR methods have inherent limitations when inferring causality for a large number of traits, as was the case with our analysis of 33,150 traits. Therefore, further investigations should aim to refine our findings using more robust MR methodologies and validate them in randomized controlled trials. Second, the biological functions of the candidate biomarkers have rarely been studied, particularly from an immunological perspective. Further studies are needed to explore the mechanisms by which the four genes contribute to the occurrence and development of ARDS, focusing on their specific roles within the immune system. Third, while our ARDS risk prediction model showed promise, its performance could potentially be improved by incorporating additional clinical variables, such as the Sequential Organ Failure Assessment score and measures of frailty. Additionally, a performance comparison against other well-established models is warranted.

In conclusion, this comprehensive study highlights the importance of utilizing the MR-pheWAS framework in unraveling the complex etiology and pathogenesis of ARDS, and further underlines the value of integrating blood-based biomarkers into the basic model in enhancing the precision of ARDS risk prediction. These findings have significant implications for risk assessment, early detection, and potential guidance of therapeutic strategies in the management of ARDS.