Introduction

Idiopathic pulmonary fibrosis (IPF) is a chronic, progressive, fibrosing interstitial lung disease with a high unmet medical need. It is characterized by scarring of the lungs, which disrupts gas exchange and may ultimately result in irreversible organ failure (Jia et al. 2023). The course of the disease is unpredictable, with a median survival of 3–5 years after diagnosis (Tomoto et al. 2024). IPF mainly affects middle-aged and older individuals, with a higher prevalence in males than in females (Jia et al. 2023; Zhu et al. 2024). The incidence of IPF was recently reported to be 2.23 cases per 100,000 people, with a prevalence of 10.0 per 100,000 people (Tomoto et al. 2024).

The etiology of IPF remains incompletely elucidated. However, it is postulated that a combination of environmental and genetic factors is responsible for the progression of the disease (Glass et al. 2020). The histological characteristics of IPF include the constant occurrence of micro-injuries to alveolar epithelial cells. These injuries result in aberrant communication between epithelial and fibroblast cells, which in turn leads to the excessive production of extracellular matrix (ECM), collagen deposition, the deterioration of the alveolar structure, and a subsequent loss of lung function (Glass et al. 2022).

Cigarette smoking, pollution, and the inhalation of particulate matter such as metal and wood dusts are the major environmental factors that contribute to the development of IPF. Environmental factors have been demonstrated to induce epigenetic modifications, including alterations in DNA methylation, histone modifications, and microRNA (miRNA) expression, as observed in IPF (Wolters et al. 2014; Glass et al. 2020).

Several genetic variants that affect telomere length, epithelial integrity, adhesion, migration, and apoptosis have been identified as common contributors to genetic susceptibility (Richeldi et al. 2017). Genome-wide association studies have demonstrated that a single nucleotide polymorphism in the promoter region of the MUC5B gene (rs35705950) is the most significant risk factor for familial IPF, accounting for at least 30% of the overall risk of developing IPF (Schwartz 2018). In addition, mutations in toll-interacting protein, which regulates the innate immune response, have been linked to increased susceptibility to pulmonary infections (Spagnolo et al. 2021).

Despite intensified research and increased efforts on treatment options for IPF, a cure for the disease remains elusive. Two drugs, pirfenidone and nintedanib, are currently employed to retard the progression of the disease. The administration of pirfenidone has been shown to confer benefits to patients with advanced pulmonary fibrosis, reducing the risk of mortality and hospitalization (Nathan et al. 2019). The precise mechanism through which pirfenidone treats IPF is not yet fully understood. However, it is postulated that it may act by inhibiting the proliferation of fibroblasts and the differentiation of the cells into myofibroblasts (Aimo et al. 2022). Nintedanib, a triple tyrosine kinase inhibitor, has been demonstrated to exhibit anti-fibrotic activity through multiple mechanisms, including the inhibition of fibroblast to myofibroblast differentiation, the suppression of inflammation and angiogenesis, and the inhibition of epithelial to mesenchymal transition (EMT) (Makino 2021).

Diagnosis of IPF can be challenging because the initial symptoms, such as breathlessness and cough, may also be attributed to advancing age, heart disease, or chronic obstructive pulmonary disease (Glass et al. 2020). Pulmonary function tests offer a non-invasive and quantitative assessment of the extent of IPF. Additionally, high-resolution computed tomography of the chest is of significant value in the diagnosis of IPF (Alsomali et al. 2023). In cases where the results are inconclusive, a lung biopsy is performed to provide further clarification (Glass et al. 2020).

The lack of reliable diagnostic and prognostic biomarkers highlights the necessity for innovative methods in biomarker discovery. A multi-omics approach to comprehending the pathogenesis of IPF has the potential to reveal molecular signatures, leading to significant advancements in diagnosis, prognosis, and therapy. Previous studies have identified thousands of differentially expressed genes (DEGs) in IPF through gene expression profiling (Konigsberg et al. 2021; Borie et al. 2022; Li and Niu 2022). However, to unlock their full potential as systems diagnostics, existing data need to be integrated with comprehensive information at different levels.

This study presents an original investigation and meta-analysis of genome-wide expression data from patients with IPF and controls. Specifically, we identified DEGs and associated molecular mechanisms. A systems science approach was employed to integrate these data with comprehensive human biological networks to identify key molecular signatures of IPF. Integrative analysis of gene expression data with protein-protein interaction (PPI), genome-scale metabolic, and transcriptional and post-transcriptional regulatory networks revealed hub proteins and reporter molecules (metabolites, transcription factors (TFs) and miRNAs), respectively. Furthermore, a drug repurposing approach was applied to determine potential drugs for IPF.

Materials and methods

Gene expression data acquisition

To collect transcriptome datasets associated with IPF, we extensively searched the publicly available database NCBI Gene Expression Omnibus (GEO). We selected datasets based on the following criteria: (i) samples should include both IPF and control phenotypes, (ii) each classified phenotype (i.e., IPF and controls) should have at least three samples, and (iii) samples should be obtained from human tissues. Four datasets (GSE173355 (Konigsberg et al. 2021), GSE150910 (Furusawa et al. 2020), GSE134692 (Sivakumar et al. 2019), and GSE124685 (McDonough et al. 2019)) meeting the inclusion criteria were selected for meta-analysis (Table 1). The demographic characteristics of the participants from whom the lung tissues were obtained are presented in Table S1.

Table 1 Gene expression datasets used in this study

Differential expression analysis

DEGs were identified by analyzing each dataset individually. The DESeq2 package (Alessandrì et al. 2019) in R/Bioconductor (v4.1.3) (McDermaid et al. 2019) was used to determine DEGs between phenotypes in GSE173355 and GSE150910. DESeq2 fits a generalized linear model of the negative binomial family, uses the Wald test for statistical significance, and normalizes the data with the median of ratios method. The Linear Models for Microarray Data (LIMMA) method (Ritchie et al. 2015) in R/Bioconductor (v4.1.3) (McDermaid et al. 2019) was used to determine DEGs between phenotypes in the GSE134692 and GSE124685 datasets, since normalized data were provided for these datasets. The Benjamini-Hochberg method was used to control the false discovery rate (FDR). Statistical significance was determined using an adjusted p-value cut-off of 0.05 (adj. p-value < 0.05), and the expression patterns (i.e., upregulation and downregulation) were determined using a fold change cut-off of 1.5. The DEGs common to all four datasets were considered the core DEGs of IPF and were used for further analysis.
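The filtering logic described above can be illustrated with a minimal Python sketch. It is not the R/Bioconductor pipeline used in the study: the function names, the symmetric treatment of the 1.5 fold change cut-off, and the requirement that a core DEG has the same direction in every dataset are assumptions made for illustration only.

```python
import math


def benjamini_hochberg(pvalues):
    """Return Benjamini-Hochberg adjusted p-values in the original order."""
    m = len(pvalues)
    order = sorted(range(m), key=lambda i: pvalues[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down so the adjusted values stay monotone.
    for rank in range(m, 0, -1):
        idx = order[rank - 1]
        running_min = min(running_min, pvalues[idx] * m / rank)
        adjusted[idx] = running_min
    return adjusted


def call_degs(results, alpha=0.05, fold_change=1.5):
    """results maps gene -> (log2 fold change, raw p-value).

    Assumption: the fold change cut-off is applied symmetrically, i.e.
    |log2FC| >= log2(1.5) in either direction.
    """
    genes = list(results)
    adjusted = benjamini_hochberg([results[g][1] for g in genes])
    threshold = math.log2(fold_change)
    calls = {}
    for gene, q in zip(genes, adjusted):
        log2fc = results[gene][0]
        if q < alpha and abs(log2fc) >= threshold:
            calls[gene] = "up" if log2fc > 0 else "down"
    return calls


def core_degs(per_dataset_calls):
    """Genes called in every dataset with a consistent direction (assumption)."""
    shared = set.intersection(*(set(calls) for calls in per_dataset_calls))
    return {
        gene: per_dataset_calls[0][gene]
        for gene in shared
        if len({calls[gene] for calls in per_dataset_calls}) == 1
    }
```

Calling `call_degs` once per dataset and passing the resulting dictionaries to `core_degs` yields the genes that pass both cut-offs everywhere, which is the role the core DEGs play in the downstream analyses.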

Overrepresentation analyses

To elucidate the underlying molecular mechanisms of IPF, overrepresentation analyses of core genes were conducted using ConsensusPathDB database (Kamburov et al. 2013). Gene Ontology (GO) terms (Ashburner et al. 2000) and the Kyoto Encyclopedia of Genes and Genomes (KEGG) pathways (Kanehisa and Goto 2000) were used as annotation sources. Significantly enriched terms of upregulated and downregulated core DEGs were identified using an adjusted p-value cut-off of 0.05.
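As an illustration of what an overrepresentation test computes, the sketch below evaluates a one-sided hypergeometric p-value for a single annotation term. This is an assumption about the underlying statistic, chosen because it is the standard test for this kind of analysis; the function name and arguments are hypothetical and do not correspond to any ConsensusPathDB interface.

```python
from math import comb


def hypergeom_enrichment_p(universe_size, term_size, query_size, overlap):
    """P(X >= overlap) when query_size genes are drawn without replacement
    from a universe containing term_size genes annotated to the term."""
    upper = min(term_size, query_size)
    total = comb(universe_size, query_size)
    # Sum the upper tail of the hypergeometric distribution.
    tail = sum(
        comb(term_size, k) * comb(universe_size - term_size, query_size - k)
        for k in range(overlap, upper + 1)
    )
    return tail / total
```

The raw p-values obtained for all tested terms would then be corrected for multiple testing before applying the 0.05 cut-off.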

Reconstruction and analysis of protein-protein interaction networks

The BioGRID database (release 4.4.219) (Chatr-Aryamontri et al. 2017), which contains 278,389 experimentally detected physical PPIs among 11,134 proteins, was used to extract physical PPIs in humans. Two PPI subnetworks were reconstructed around upregulated and downregulated core DEGs. The resulting subnetworks were visualized using Cytoscape (v3.9.1) (Doncheva et al. 2019). Duplicated edges, self-loops, and very small connected components were eliminated. The topological network properties were analyzed using CytoHubba (Wang et al. 2021), a Cytoscape plug-in. Hub proteins were identified based on degree and betweenness centrality metrics. The top 10 proteins with the highest degree and betweenness centrality measures were selected as hub proteins.
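The two centrality metrics can be sketched in plain Python as follows. This is not CytoHubba's implementation: the function names are hypothetical, betweenness is computed with Brandes' algorithm for unweighted graphs, and because the text does not state how the two rankings are combined, the sketch simply returns both.

```python
from collections import defaultdict, deque


def build_graph(edges):
    """Undirected simple graph; self-loops and duplicate edges are dropped."""
    adj = defaultdict(set)
    for a, b in edges:
        if a != b:
            adj[a].add(b)
            adj[b].add(a)
    return adj


def betweenness(adj):
    """Unnormalized betweenness centrality (Brandes, unweighted graph)."""
    score = dict.fromkeys(adj, 0.0)
    for source in adj:
        stack = []
        preds = defaultdict(list)
        sigma = dict.fromkeys(adj, 0)  # number of shortest paths from source
        sigma[source] = 1
        dist = {source: 0}
        queue = deque([source])
        while queue:  # breadth-first search from the source
            v = queue.popleft()
            stack.append(v)
            for w in adj[v]:
                if w not in dist:
                    dist[w] = dist[v] + 1
                    queue.append(w)
                if dist[w] == dist[v] + 1:
                    sigma[w] += sigma[v]
                    preds[w].append(v)
        delta = dict.fromkeys(adj, 0.0)
        while stack:  # accumulate dependencies in reverse BFS order
            w = stack.pop()
            for v in preds[w]:
                delta[v] += sigma[v] / sigma[w] * (1 + delta[w])
            if w != source:
                score[w] += delta[w]
    # Every undirected pair was counted once from each endpoint.
    return {v: s / 2 for v, s in score.items()}


def hubs(adj, top_n=10):
    """Return the top_n nodes by degree and by betweenness as two lists."""
    by_degree = sorted(adj, key=lambda v: len(adj[v]), reverse=True)[:top_n]
    bc = betweenness(adj)
    by_betweenness = sorted(adj, key=bc.get, reverse=True)[:top_n]
    return by_degree, by_betweenness
```

Keeping the two rankings separate leaves the choice of taking their union or intersection to the analyst.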

Identification of reporter metabolites

Reporter metabolite analysis was performed using a consensus genome-scale model of human metabolism, Human1, which contains 13,417 reactions, 10,138 metabolites and 3625 genes (Robinson et al. 2020). We conducted an integrative analysis of the metabolic model with IPF gene expression data using the RAVEN toolbox (Wang et al. 2018a) and COBRA toolbox (Sarathy et al. 2020) in MATLAB. The human metabolic model, the core DEGs, and their merged p-values were used as inputs. The merged p-value of a DEG was calculated by combining its adjusted p-values from the four datasets using Brown’s method via ActivePathways (Paczkowska et al. 2020) implemented in the R platform. The metabolic model was optimized using the Gurobi solver, and a p-value threshold of 0.05 was used to determine reporter metabolites. Enrichment analysis of reporter metabolites based on KEGG pathways was performed using Metabolites Biological Role (MBRole v2.0) (López-Ibáñez et al. 2016). Statistically significant KEGG pathways were identified using an FDR-corrected p-value cut-off of 0.05.
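To make the idea of a merged p-value concrete, the sketch below implements Fisher's combined probability test, the independent-test special case that Brown's method extends. It is a simplified stand-in and not Brown's method itself: Brown's method additionally rescales the statistic and its degrees of freedom using the covariance between datasets, which is omitted here. The function name is hypothetical.

```python
import math


def fisher_combined_p(pvalues, floor=1e-300):
    """Combine p-values for one gene across datasets with Fisher's method.

    Assumes the tests are independent (Brown's method relaxes this).
    The statistic -2 * sum(ln p) follows a chi-square distribution with
    2k degrees of freedom, whose survival function has a closed form.
    """
    k = len(pvalues)
    # Floor the inputs so that log(0) cannot occur.
    half = -sum(math.log(max(p, floor)) for p in pvalues)  # statistic / 2
    term, total = 1.0, 1.0
    for i in range(1, k):
        term *= half / i
        total += term
    return math.exp(-half) * total
```

With a single p-value the function returns that p-value unchanged, which is a quick sanity check on the closed-form survival function.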

Identification of reporter regulatory elements

Reporter regulatory elements (i.e., TFs and miRNAs) around which the most significant transcriptional changes occur were identified by integrating gene expression data with the human transcriptional regulatory network using the Reporter Features algorithm (Oliveira et al. 2008). The transcriptional regulatory network was constructed by extracting experimentally verified TF-gene and miRNA-gene interactions from the Human Transcriptional Regulation Interactions database (HTRIdb) (Bovolenta et al. 2012) and the miRTarBase database (v9.0) (Huang et al. 2022), respectively. The human transcriptional regulatory network includes 25,560 interactions between 828 TFs and 12,678 genes, and 387,836 interactions between 2867 miRNAs and 15,424 genes. The human transcriptional regulatory network, the core DEGs, and their merged p-values were used as inputs for Reporter Features analysis. A p-value cut-off of 0.05 was applied to identify reporter TFs and miRNAs.
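The scoring idea behind reporter analyses can be sketched as follows: gene p-values are converted to z-scores, the z-scores of a feature's target genes are aggregated, and the aggregate is corrected against randomly drawn gene sets of the same size. This is a minimal sketch of that general scheme, not the published implementation; the function names, the number of random draws, and the clipping constant are assumptions.

```python
import random
from statistics import NormalDist, mean, pstdev

_NORMAL = NormalDist()


def z_from_p(p, eps=1e-12):
    """Map a p-value to a z-score via the inverse normal CDF."""
    p = min(max(p, eps), 1 - eps)  # keep inv_cdf away from 0 and 1
    return _NORMAL.inv_cdf(1 - p)


def reporter_score(feature_genes, gene_p, n_random=1000, seed=0):
    """Return (corrected z, p-value) for one regulator, or None if unscorable.

    feature_genes: target genes of a TF or miRNA (hypothetical input).
    gene_p: gene -> merged p-value for all genes with expression evidence.
    """
    z = {gene: z_from_p(p) for gene, p in gene_p.items()}
    members = [gene for gene in feature_genes if gene in z]
    k = len(members)
    if k == 0:
        return None
    aggregate = sum(z[gene] for gene in members) / k ** 0.5
    # Background: aggregated z of random gene sets of the same size k.
    rng = random.Random(seed)
    all_z = list(z.values())
    background = [sum(rng.sample(all_z, k)) / k ** 0.5 for _ in range(n_random)]
    sd = pstdev(background)
    if sd == 0:
        return None
    corrected = (aggregate - mean(background)) / sd
    return corrected, 1 - _NORMAL.cdf(corrected)
```

A regulator whose targets carry unusually small p-values obtains a large corrected z-score and therefore a small reporter p-value, which is then compared with the 0.05 cut-off.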

Drug repurposing analysis

Signature-based drug repurposing analysis was performed using Drug Gene Budger (DGB) (Wang et al. 2019b). Key genes were identified as those encoding hub proteins regulated by reporter regulatory elements, and potential therapeutic compounds targeting these key genes were identified. A list of small molecules that were predicted to reverse the expression patterns of key genes was obtained via DGB. The LIMMA method was used to determine the expression of the gene of interest in response to the predicted molecule. Statistically significant drugs that target key genes were selected based on experimental data extracted from the original Connectivity Map (CMap) (Lim and Pavlidis 2021) using a q-value threshold of 0.05. The identified drugs were investigated using the DrugBank database (Wishart et al. 2018) and L1000 Fireworks Display (Wang et al. 2018b), and unknown drugs were eliminated.
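The selection rule described above (keep significant drug-gene effects whose direction opposes the disease signature) can be sketched as a simple filter. The record layout, function name, and placeholder drug labels are hypothetical; they do not reflect the DGB output format or any result of this study.

```python
def reversing_drugs(disease_direction, records, q_cutoff=0.05):
    """Select drugs that significantly reverse the disease expression pattern.

    disease_direction: gene -> "up" or "down" in the disease state.
    records: iterable of (drug, gene, log2 fold change, q-value) tuples
             describing the drug-induced change (hypothetical layout).
    """
    hits = {}
    for drug, gene, log2fc, q in records:
        direction = disease_direction.get(gene)
        if direction is None or q >= q_cutoff:
            continue
        # An upregulated gene needs a drug that lowers it, and vice versa.
        reverses = (direction == "up" and log2fc < 0) or (
            direction == "down" and log2fc > 0
        )
        if reverses:
            hits.setdefault(gene, set()).add(drug)  # the set removes duplicates
    return {gene: sorted(drugs) for gene, drugs in hits.items()}
```

A gene for which no record passes the filter is simply absent from the returned dictionary.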

Results

Transcriptomic response in IPF

A comparative analysis was performed on transcriptome datasets from lung tissue of IPF patients and healthy controls to identify DEGs and their expression patterns. In GSE173355, 2952 DEGs were upregulated and 2644 DEGs were downregulated when comparing disease and control states. A total of 3500 genes, with 2241 upregulated and 1259 downregulated genes, were identified as DEGs in GSE150910. In GSE134692, a total of 3595 DEGs were identified, with 2193 upregulated and 1402 downregulated genes. In GSE124685, the significant upregulation of 2314 genes, and significant downregulation of 1379 genes were observed. A total of 473 mutual DEGs were identified across four datasets, and these DEGs were designated as core DEGs in IPF (Table S2). Upon considering the expression patterns, core DEGs included 279 upregulated genes (Fig. 1a) and 194 downregulated genes (Fig. 1b).

Fig. 1

Core DEGs of IPF and their overrepresentation analysis. (a) The distribution of upregulated core DEGs among four datasets, (b) The distribution of downregulated core DEGs among four datasets, (c) Significantly enriched KEGG pathways of upregulated core DEGs, (d) Significantly enriched KEGG pathways of downregulated core DEGs, (e) Top-10 significantly enriched GO biological processes of upregulated core DEGs, and (f) Top-10 significantly enriched GO biological processes of downregulated core DEGs

Overrepresentation analysis of core DEGs revealed the upregulation of cell adhesion-related processes, such as cell-matrix adhesion, leukocyte cell-cell adhesion, regulation of cell-cell adhesion, and ECM organization. Morphogenesis-related processes, such as morphogenesis of an epithelium, epithelial tube morphogenesis, and regulation of anatomical structure morphogenesis, were also upregulated. In addition, analysis showed that neurodevelopmental processes including axon guidance, neurogenesis, and nervous system development were upregulated. On the other hand, downregulated core DEGs were significantly enriched in processes associated with cardiovascular and circulatory systems, such as vasculature development, blood circulation, cardiovascular system development, heart morphogenesis, circulatory system development, and blood vessel morphogenesis. Furthermore, processes related to cell motility, and cell migration were observed to be downregulated (Fig. 1).

Pathway enrichment analysis indicated upregulation of axon guidance, protein digestion and absorption, focal adhesion, ECM-receptor interaction, complement and coagulation cascades, Staphylococcus aureus infection, and ferroptosis. On the other hand, hematopoietic cell lineage, neuroactive ligand-receptor interaction, calcium signaling pathway, some cancer-related pathways, including pathways in cancer, and Rap1 signaling pathway were downregulated. Additionally, several pathways associated with cardiovascular diseases, including vascular smooth muscle contraction, dilated cardiomyopathy, fluid shear stress and atherosclerosis, and adrenergic signaling in cardiomyocytes, were also downregulated. Two pathways associated with the immune system, cytokine-cytokine receptor interaction, and viral protein interaction with cytokine and cytokine receptor, were found to be enriched in both upregulated and downregulated core DEGs (Fig. 1).

Proteomic signatures of IPF

To identify hub proteins that may play important roles in disease progression, we reconstructed PPI networks around proteins encoded by core DEGs and their first neighbors. The PPI subnetwork constructed around upregulated DEGs contained 624 proteins (i.e., 345 proteins that are the first neighbors of 279 proteins encoded by upregulated DEGs) and 710 interactions, while the PPI subnetwork constructed around downregulated DEGs included 583 proteins (i.e., 389 proteins that are the first neighbors of 194 proteins encoded by downregulated DEGs) and 672 interactions. The ten proteins with the highest degree and betweenness centrality measures were identified as hub proteins, and those that were encoded by core DEGs were further analyzed.

Seven hub proteins, namely TMPRSS4, HSPA4L, ESR2, PFKP, FHL2, RUNX1, and NGFR, were encoded by upregulated genes. Transmembrane serine protease 4 (TMPRSS4) is a member of the serine protease family, and its malfunction is often associated with human diseases or disorders. Heat shock protein family A member 4 like (HSPA4L) can protect the heat-shocked cells from the harmful effects of aggregated proteins. It is highly expressed in leukemia cells and has a potential as a therapeutic target. Estrogen receptor 2 (ESR2) belongs to the family of estrogen receptors and the superfamily of nuclear receptor TFs. Platelet phosphofructokinase (PFKP) plays a crucial role in regulating glycolysis. Four and a half LIM domains 2 (FHL2) is believed to have a role in assembling extracellular membranes and functions in cell growth. RUNX family TF 1 (RUNX1) is a TF that is involved in the development of normal hematopoiesis, and in the lineage commitment of immature T cell precursors. It is also associated with several types of leukemia. Nerve growth factor receptor (NGFR) plays an important role in the differentiation and survival of specific neuronal populations during development.

Five hub proteins, namely ARRB1, SPTBN1, PAK4, EPAS1, and CLEC4E, were encoded by downregulated genes. Arrestin beta 1 (ARRB1) is a member of the arrestin/beta-arrestin protein family and is thought to regulate agonist-mediated GPCR signaling and receptor-mediated immune functions. Spectrin beta non-erythrocytic 1 (SPTBN1) is a member of the beta-spectrin family, and plays a critical role in the development and function of the central nervous system. P21 activated kinase 4 (PAK4) is a member of the serine/threonine p21-activated kinase family, which is involved in the actin cytoskeleton reorganization. Endothelial PAS domain protein 1 (EPAS1) is a TF involved in the induction of oxygen regulated genes. C-type lectin domain family 4 member E (CLEC4E) is a member of the C-type lectin/C-type lectin-like domain superfamily that has diverse functions, including cell adhesion, cell-cell signaling, glycoprotein turnover, and roles in inflammation and immune response.

Metabolic signatures of IPF

Reporter metabolite analysis revealed that the presence of IPF resulted in significant transcriptional changes around 31 metabolites (Fig. 2a). To clarify the differences and alterations in the metabolic activities in response to IPF, functional enrichment analyses of reporter metabolites were conducted. Our results highlighted several pathways, including oxidative phosphorylation, methane metabolism, collecting duct acid secretion, taste transduction, cardiac muscle contraction, proximal tubule bicarbonate reclamation, Vibrio cholerae infection, and epithelial cell signaling in Helicobacter pylori infection (Fig. 2b).

Fig. 2

Reporter metabolites of IPF. (a) Significant reporter metabolites, (b) Overrepresentation analysis results of reporter metabolites

Regulatory signatures of IPF

The reporter regulatory elements (i.e., TFs and miRNAs) were identified via the integrative analysis of transcriptome and human transcriptional regulatory network. Two TFs and 85 miRNAs were identified as reporter regulatory elements that have key roles in transcriptional and post-transcriptional control of core genes of IPF.

Signal transducer and activator of transcription 5B (STAT5B) and SRY-Box TF 10 (SOX10) were identified as reporter TFs. STAT5B is involved in various biological processes, including TCR signaling, apoptosis, adult mammary gland development, and sexual dimorphism of liver gene expression. Dysregulation of the signaling pathways mediated by this protein is believed to be the cause of acute promyelocytic leukemia. STAT5B has been implicated in various medical conditions, including allergic diseases, immunodeficiency, autoimmunity, cancer, hematological diseases, growth disorders, and lung diseases (Kanai et al. 2012).

SOX10 plays a key role in the development of neural crest, the specification and differentiation of melanocytes, enteric progenitors, oligodendrocytes, and glial cells in the peripheral nervous system (Hogan et al. 2019). Additionally, SOX10 has been identified as a regulator of vascular inflammation and a potential checkpoint in inflammation-related vascular disease (Xu et al. 2023). Furthermore, SOX10 is thought to be a tumor suppressor gene in various carcinomas due to its ability to inhibit cell proliferation, metastasis, and EMT (Cui et al. 2019).

Fig. 3

Functional enrichment results of reporter miRNAs. Significantly enriched GO BP terms and KEGG pathways of core DEGs regulated by reporter miRNAs

To elucidate the roles of reporter miRNAs in IPF, we constructed a miRNA-gene regulatory network using the interactions between reporter miRNAs and core DEGs. The constructed network consisted of 533 interactions between 139 core DEGs and 85 reporter miRNAs. We then performed functional overrepresentation analysis of reporter miRNAs via core DEGs regulated by these miRNAs to elucidate the molecular pathways and biological processes underlying IPF. Pathway enrichment analysis showed significant changes in axon guidance, signaling pathways that regulate pluripotency of stem cells, and proteoglycans in cancer. Additionally, processes associated with cardiovascular and circulatory systems, such as vasculature development, blood vessel morphogenesis, cardiovascular system development, and circulatory system development, were prominent. Furthermore, significantly enriched GO biological processes included developmental processes, anatomical structure morphogenesis, and multicellular organismal process (Fig. 3).

From a holistic perspective, regulatory associations in the presence of IPF were investigated by analyzing the interactions between reporter regulatory elements (i.e., miRNAs and TFs) and hub proteins. A total of 26 interactions were identified between 10 reporter miRNAs and 7 hub proteins, along with one interaction between a reporter TF and a hub protein (Fig. 4). STAT5B, let-7e-5p, miR-145-5p, miR-195-5p, miR-3613-3p, miR-383-3p, miR-3914, miR-4284, miR-4649-3p, miR-6507-3p, and miR-7977 came into prominence as regulatory elements based on these associations. Moreover, seven genes (EPAS1, PAK4, SPTBN1, ESR2, HSPA4L, NGFR, TMPRSS4) encoding hub proteins that were regulated by these reporter regulatory elements were considered key genes.

Fig. 4

Regulatory associations in IPF. Circle, triangle, and hexagon represent hub proteins, miRNAs, and TFs, respectively

Potential drugs identified by signature-based drug repurposing

To identify potential drugs that target key genes, we performed drug repurposing analysis using CMap data. Among the key genes, EPAS1, PAK4, and SPTBN1 were downregulated, while ESR2, HSPA4L, NGFR, and TMPRSS4 were upregulated in response to the presence of IPF. Therefore, drugs that could reverse their expression patterns were investigated. After removing duplicate and unknown drugs, we identified a total of 21 potential drugs that met the statistical significance criterion of q-value < 0.05 (Table 2). No drugs could be found to downregulate ESR2.

Table 2 Potential drug candidates for IPF

Discussion

IPF is a progressive interstitial lung disease that is characterized by the formation of scar tissue, leading to the impairment of alveolar structure and decreased lung capacity (Jia et al. 2023). Despite substantial progress in understanding its impact, the molecular mechanisms of its pathogenesis have not been fully elucidated (Tomoto et al. 2024). Identifying the molecular substrates of IPF and discovering potential systemic biomarkers could provide valuable information for developing novel strategies for its diagnosis and treatment. In this study, we conducted a meta-analysis of four publicly available transcriptome datasets associated with IPF, identified the core DEGs of IPF, and combined this information with comprehensive biological networks, including PPI, metabolic, transcriptional, and post-transcriptional networks, to identify reporter molecules that could serve as potential prognostic biomarkers and putative therapeutic targets. To the best of our knowledge, reporter molecules of IPF were determined for the first time in this study.

A total of 473 core DEGs, consisting of 279 upregulated and 194 downregulated genes, were identified. Overrepresentation analysis of core DEGs revealed the upregulation of three major mechanisms that accompany IPF: changes in cell adhesion properties, morphogenesis, and neurodevelopment. Mohammadi-Nejad et al. (2022) reported an association between the genetic risk of IPF and cortical changes in the brain, which could be due to neurodevelopmental differences. As anticipated, several processes and pathways related to the circulatory system, as well as cancer, were found to be downregulated in the presence of IPF. It is worth noting that IPF shares a number of common pathogenic events with cancer (Vancheri 2013). In addition, IPF has been found to increase the risk of developing lung cancer (Ballester et al. 2019). Therefore, genes related to these pathways could potentially serve as markers to reveal the mechanisms underlying the relationship between IPF and lung cancer.

Patients with IPF often have comorbidities related to cardiovascular disease (CVD). The cardiovascular effects of IPF may include pulmonary hypertension, heart failure, coronary artery disease, cardiac arrhythmias, and cardiac manifestations of drugs used to treat IPF (Agrawal et al. 2016; van Cleemput et al. 2019). Furthermore, IPF shares several symptoms with CVD, particularly with heart failure (van Cleemput et al. 2019; Mosher and Mentz 2020). We observed a significant association of downregulated DEGs with pathways associated with the cardiovascular system. Additionally, we found evidence of CVD affecting the occurrence and progression of IPF at the metabolite and transcriptional regulation levels.

At the regulatory level, we observed reporter miRNAs that regulate core DEGs associated with cardiovascular system and vasculature development. At the metabolic level, the integration of transcriptomic response and the human metabolic model revealed glycolipid and heparan sulphate derivatives as reporter metabolites. Additionally, the enrichment analysis of reporter metabolites revealed cardiac muscle contraction as a significant pathway. Heparan sulphate is an essential regulator of cardiac inflammation and fibrosis, and its upregulation has been observed in dilated cardiomyopathy (Song et al. 2022). Heparan sulphate proteoglycans also contribute to lipid homeostasis and atherogenesis, and their induction has been linked to reduced CVD (Gordts and Esko 2018). Lipid and lipoprotein metabolism play a crucial role in the development of CVD. Although lipids and lipoproteins have cardioprotective functions, disorders in their metabolism or transport can lead to plaque formation and increased inflammation (Bhargava et al. 2022). Additionally, the dysregulation of glycolipid metabolism is considered a significant contributing factor to vascular complications in diabetes (Chen et al. 2022b).

Although IPF represents the most prevalent form of interstitial lung disease (ILD), other ILDs, such as hypersensitivity pneumonitis and connective tissue disease-associated ILD, can also progress to pulmonary fibrosis (Yoon et al. 2023). Heparan sulphate, which plays a role in lung physiology and pathophysiology, acting as both a structural component of the lung parenchyma and a regulator of cellular signaling (Haeger et al. 2016), may also contribute to other ILDs. Given that the classification of IPF with fibrosing ILD has been proposed for clinical and treatment purposes (Cottin et al. 2019), the molecular signatures of IPF could also be assessed for the other forms of ILD that exhibit a progressive fibrosing phenotype.

The integration of information at different levels highlighted EPAS1, ESR2, HSPA4L, NGFR, PAK4, SPTBN1, TMPRSS4, STAT5B, let-7e-5p, miR-145-5p, miR-195-5p, miR-3613-3p, miR-383-3p, miR-3914, miR-4284, miR-4649-3p, miR-6507-3p, and miR-7977 as prominent molecular signatures. In addition to novel candidates, we identified genes that have already been shown to be associated with various pulmonary diseases through in vitro and/or in vivo studies. TMPRSS4 plays a role in tissue remodeling and in the degradation of basement membrane and ECM. It was found to be upregulated in the lungs of IPF patients. Additionally, bleomycin-induced lung fibrosis was attenuated in a TMPRSS4-deficient mouse model, providing supportive evidence for a possible profibrotic role of TMPRSS4 (Valero-Jiménez et al. 2018). EPAS1 is exclusively expressed in endothelial cells (Romero et al. 2022), and its dysregulation has been linked to the development of fibrosis (Torres-Soria et al. 2022). In addition, EPAS1 activates genes that enable cell survival under hypoxic conditions in the presence of chronic obstructive pulmonary disease (COPD) (Li et al. 2016), which was supported by a reduced EPAS1 level observed in the lungs of COPD patients (Yoo et al. 2015). NGFR plays a crucial role in nerve cell growth, differentiation, and hypoxic response. Circulating NGFR-positive cells promoted vascular remodeling and were found to be highly correlated with the development and severity of pulmonary arterial hypertension (PAH) (Goten et al. 2021).

ESR2 is a negative regulator of airway remodeling and airway hyperresponsiveness and it was found to be upregulated in asthmatic airway smooth muscle (Ambhore et al. 2019). In addition, its overexpression was linked to improved survival rates in non-small cell lung cancer (NSCLC) patients (Luo et al. 2015). HSPA4L is especially expressed in epithelial cells of the conducting airways and cooperates with HSPA4 during embryonic lung development. In mice, the deletion of HSPA4L and HSPA4 resulted in increased cell proliferation and was associated with respiratory failure at birth (Mohamed et al. 2014). Additionally, the upregulation of HSPA4L was reported in lung cancer (Dai et al. 2020). PAK4 plays a role in cytoskeleton reorganization, and its activation has been proposed to protect cells from apoptosis (Sheu et al. 2019). The cytoskeletal protein SPTBN1 has functions in cell adhesion, cell cycle, angiogenesis, apoptosis and EMT. SPTBN1 repression has been observed during the progression of various cancers, such as hepatocellular carcinoma, gastrointestinal cancer, pancreatic cancer, and lung cancer (Velázquez-Enríquez et al. 2022). STAT5B activation is associated with the expression of genes related to cell development, proliferation and survival (Smith et al. 2023). Mutations in STAT5B have been observed in pulmonary alveolar proteinosis (Nadeau et al. 2011). Furthermore, it has been reported that STAT5B is associated with the prognosis of NSCLC and lung adenocarcinoma (LUAD), and has been proposed as a potential prognostic marker for NSCLC (Yang et al. 2019).

The dysregulation of miRNAs has been linked to the biogenesis and progression of IPF due to their regulatory roles in apoptosis, proliferation, and differentiation (Cadena-Suárez et al. 2022). We identified miRNAs that have previously been associated with IPF, as well as new candidates. Let-7e-5p plays a crucial role in regulating endothelial function, specifically in relation to cell adhesion and angiogenesis (Mompeón et al. 2022). Let-7e-5p was proposed as a potential biomarker for PAH (Cai et al. 2021) and heart failure (Shen et al. 2022).

MiR-145-5p, which is a tumor-suppressive miRNA in various cancers (Kadkhoda and Ghafouri-Fard 2022), regulates EMT by increasing E-cadherin expression and suppressing vimentin expression, contributing to the inhibition of metastasis in NSCLC (Chang et al. 2017). Moreover, it has been shown to influence the pathogenesis of several non-malignant conditions, such as idiopathic and hereditary PAH (Deng et al. 2015), COPD, and asthma (Tiwari et al. 2021). Aberrant expression of miR-195-5p has been linked to various cancers, and its lower expression was reported to contribute to the proliferation, apoptosis, migration, and invasion in lung cancer (Long and Wang 2020). MiR-195-5p had a pivotal role in the immune response in LUAD patients (Duan et al. 2022). Additionally, its overexpression was found to alleviate lung inflammation and reduce lung damage in COPD rats (Li et al. 2020). MiR-3613-3p regulates tumorigenesis and progression in various cancers, and it was suggested as a specific plasma biomarker of LUAD (Liu et al. 2017). The bioinformatics analyses linked its upregulation to severe asthma (Chen et al. 2019). Furthermore, miR-3613-3p plays an important role in the hypertrophic scarring that is characterized by excessive ECM proteins and dysregulated fibroblast function (Li et al. 2021).

Downregulation of miR-383-3p was observed in mice with myocardial ischemia-reperfusion-induced injury, and its overexpression was reported to attenuate hypoxia/reoxygenation-induced injury in rat heart tissue cells and to protect cardiomyocytes against such injury. These findings suggested miR-383-3p as a potential target for cardiovascular diseases (Zeng et al. 2022). MiR-4284, which exhibited increased expression in NSCLC tissues and cell lines, was correlated with accelerated cell proliferation, migration, and invasion, as well as with poor differentiation, positive lymph node metastasis, and advanced tumor, node, and metastasis (TNM) stage (Yang et al. 2021). Additionally, increased plasma levels and expression levels of miR-4284 were observed in arteriosclerosis obliterans patients and in COVID-19 patients with acute respiratory distress syndrome, respectively (Najafipour et al. 2022).

MiR-4649-3p was predicted to regulate cyclooxygenase-2 (COX2), which is a key element in heart failure (Yan et al. 2020). In addition to its suppressive role in nasopharyngeal carcinoma (Liu et al. 2021), alterations in serum miR-4649-3p expression in malignant melanoma were reported (Ayvaz et al. 2022). Previous reports revealed the involvement of miR-6507-3p in paclitaxel resistance in NSCLC (Cai et al. 2019; Alnuqaydan 2020). Exosomal miR-7977 was found to be significantly associated with node and TNM stages in LUAD patients. In vitro experiments demonstrated that the inhibition of miR-7977 increased cell proliferation and invasion while suppressing apoptosis (Chen et al. 2020). MiR-7977 has been identified as a key post-transcriptional regulator of SIRT3, which is linked to several severe diseases, such as hepatocellular carcinoma, heart failure, diabetes, and IPF (Chen et al. 2021).

The molecular signatures identified in this study were found to be associated with various cancers and with pulmonary and cardiovascular diseases. These associations suggest that the resulting regulatory interactions may reflect alterations in the cardiovascular and circulatory systems, as well as the proliferative nature of fibrosis development. Therefore, the identified molecules could be assessed as potential markers for diagnostic and therapeutic purposes in IPF.

Among the drug candidates identified for IPF by key gene-based drug repurposing, alvespimycin, geldanamycin, tanespimycin, and monorden are inhibitors of heat shock protein 90 (HSP90). Tanespimycin has been shown to reduce the migratory capacity, proliferation, and ECM production in fibroblasts. HSP90 inhibitors have potential as therapeutics for IPF by targeting profibrotic signaling pathways involved in fibroblast activation and pulmonary fibrosis (Sontake et al. 2019). Lanatoside C, digoxigenin, and helveticoside are cardiac glycosides that have demonstrated tumor-suppressing effects in lung cancer (Nie et al. 2019; An et al. 2020). Lanatoside C has been shown to provide protection against lung fibrosis and reduce fibroblast proliferation and migration in a pulmonary fibrosis mouse model (Nie et al. 2019). In addition, helveticoside treatment resulted in changes in the expression level of genes involved in the regulation of cell proliferation and apoptosis signaling in lung cancer (Kim et al. 2015).

Trichostatin A was a common drug candidate for PAK4, HSPA4L, and NGFR. It functions as an inhibitor of histone deacetylase, which has been reported to contribute to the bronchiolization process in IPF. Another histone deacetylase inhibitor, LBH589, has been shown to significantly reduce the expression levels of genes associated with ECM synthesis, proliferation, and cell survival in primary IPF fibroblasts (Korfei et al. 2015). Cephaeline functions as a translation inhibitor (Lee et al. 2021). It has been identified as a potential drug candidate for COPD treatment through bioinformatics analysis (Zhang et al. 2021). Sirolimus, an mTOR inhibitor, has been reported to reduce the concentration of circulating fibrocytes, which contribute to fibrogenesis and lung injury. Sirolimus treatment has also been shown to inhibit lung fibrosis (Gomez-Manjarres et al. 2023) and reduce lung collagen deposition in a mouse model of lung fibrosis (Tulek et al. 2011). Similarly, in vitro analysis demonstrated that gossypol, an inhibitor of lactate dehydrogenase, hindered fibrotic lesions, collagen accumulation, and ECM production in radiation-induced lung fibrosis (Judge et al. 2017).

Tretinoin has been found to possess anti-fibrotic and anti-inflammatory properties, while also promoting cell differentiation (Segel et al. 2001). Methotrexate is used to treat inflammatory diseases and various types of tumors, but it has been associated with pulmonary side effects, including non-productive cough, shortness of breath, fever, interstitial pneumonia, asthma, and lung fibrosis (Kim et al. 2009).

Prochlorperazine and fluphenazine, two antipsychotic drugs, have demonstrated anti-cancer activity in several types of cancer (Otręba and Kośmider 2021; Chen et al. 2022a). These drugs have been shown to reduce cancer cell viability, affect the cell cycle, promote apoptosis, and inhibit migration and invasion (Otręba and Kośmider 2021). Wortmannin, a phosphoinositide 3-kinase inhibitor, inhibited proliferation in NSCLC (Boehle et al. 2002). Additionally, it decreased lung injury in severe acute pancreatitis (Wei et al. 2015). Monastrol, a potent inhibitor of the kinesin-like protein KIF11, hindered migration and decreased proliferation in lung cancer (Wang et al. 2019a). LY294002, another phosphoinositide 3-kinase inhibitor, reduced the expression levels of two tight junction proteins, CLDN1 and CLDN11, which were highly expressed in human lung squamous cell carcinoma (Goncharova et al. 2002). Thioridazine also exhibits anti-cancer activity and has been shown to have beneficial effects in lung cancer (Yue et al. 2016). 15-Delta prostaglandin J2 functions as a ligand for peroxisome proliferator-activated receptor gamma. Xiong et al. (2018) demonstrated that 15-Delta prostaglandin J2 induces apoptosis and inhibits migration in cancer cells. Additionally, phenoxybenzamine, an adrenergic receptor antagonist, has been reported to have an antiproliferative effect (Inchiosa 2018).

We identified 21 drug candidates for repurposing, comprising agents previously associated with IPF through in vitro studies as well as novel candidates. These drugs include HSP90 inhibitors, cardiac glycosides, antipsychotic agents, and drugs with anti-fibrotic, anti-inflammatory, and anti-cancer activities. The drug candidates identified in this study should undergo further investigation through experimental and clinical validation studies for future clinical development as therapeutics for IPF.

Conclusions

IPF is a progressive interstitial lung disease resulting in decreased lung capacity and marked morbidity and mortality. Identifying the molecular substrates of IPF and discovering potential systemic biomarkers could provide valuable information for developing novel strategies for its diagnosis and treatment. Given the importance of systemic biomarkers in IPF management, this study aimed to identify molecular signatures and potential therapeutics in IPF using transcriptome data. To achieve this, we conducted a meta-analysis of transcriptome datasets associated with IPF and revealed the transcriptional reorganization that occurs in the presence of IPF. Additionally, integrative analysis of gene expression data and comprehensive human biological networks deciphered disease-specific proteomic, metabolic, and regulatory signatures that could serve as potential biomarkers. We identified EPAS1, ESR2, HSPA4L, NGFR, PAK4, SPTBN1, and TMPRSS4 as gene signatures of IPF. The transcriptional reprogramming of these genes was found to be significantly regulated by STAT5B, let-7e-5p, miR-145-5p, miR-195-5p, miR-3613-3p, miR-383-3p, miR-3914, miR-4284, miR-4649-3p, miR-6507-3p, and miR-7977, which warrant future development as potential biomarkers. Signature-based drug repurposing analysis identified several agents with anti-fibrotic, anti-inflammatory, and anti-cancer activities. These agents are noteworthy as potential drug candidates for IPF. Although these results are promising, future experimental and clinical studies are necessary to validate their clinical value.