Introduction

The use of highly active antiretroviral therapy (ART) has significantly reduced morbidity and mortality among those living with HIV. However, as life expectancy has increased, there has been a rise in co-morbidities, including obstructive lung disease (OLD) [1,2,3,4,5,6,7,8]. Persons living with HIV (PLWH) are at increased risk of accelerated lung function decline and of developing OLD, even after adjusting for smoking [9, 10]. The Global Initiative for Chronic Obstructive Lung Disease (GOLD) 2024 report now recognizes HIV as a risk factor for COPD [11].

In non-HIV-associated OLD, chronic inflammation and the activation of proteases play a crucial role in pathogenesis. The severity of airflow obstruction often corresponds to the degree of inflammation in the lung and airways. In PLWH, various factors have been implicated in the development of OLD, such as epigenetic aging, chronic systemic inflammation, innate immune activation, and abnormal immune function related to HIV [12, 13]. Furthermore, in PLWH who smoke and have emphysema, there is an upregulation of matrix metalloproteinases (MMP−1, −7, −9 and −12) compared to HIV-negative individuals, underscoring the potentially significant role of proteases in OLD pathogenesis in PLWH [14].

Numerous proteases contribute to lung disease, falling into three general categories that include serine proteases, cysteine proteases, and matrix metalloproteinases (MMP). The cellular sources of these proteases include inflammatory cells such as neutrophils and macrophages, as well as bronchial epithelial cells. While the impact of individual proteases on the lung extracellular matrix (ECM) has been well-documented in numerous studies, the extent to which other proteins are subject to proteolytic degradation and the physiological effects of this destruction remain relatively unknown. Previous studies have predominantly focused on individual proteases or their families in OLD, mainly limited to ECM targets. To better understand the role of proteases in HIV-associated OLD, we used complementary proteomic techniques combined with peptidomics to identify active proteases associated with OLD, comparing their activity in disease versus health and elucidating their specific targets.

Methods

Study population

PLWH who had undergone bronchoscopy were selected from the Pittsburgh and Vancouver Lung HIV Cohorts [15, 16]. The OLD group consisted of individuals (n = 25) with OLD, defined as a ratio of forced expiratory volume in 1 s to forced vital capacity (FEV1/FVC) below the lower limit of normal. The group without OLD consisted of 26 individuals with HIV and normal lung function (defined as FEV1/FVC > lower limit of normal and FEV1 > 80% of predicted normal) matched on age (± 5 years), antiretroviral treatment use, and smoking status (current vs. non-smoker). Participants in the parent cohort studies provided informed consent for BALF collection and storage with approval by their respective Institutional Review Boards at Pittsburgh and Vancouver. At study enrollment, BALF was collected from fasting participants as previously described [15, 16]. Pulmonary function tests were performed within 3 months of collecting the samples. All data and samples sent to the University of Minnesota were de-identified. The current study was reviewed in adherence to the Declaration of Helsinki and accepted by the University of Minnesota Institutional Review Board (Number 00003486).
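The group definitions above can be written as a small decision rule. The following is a minimal sketch only; the function name, argument names, and the handling of participants who meet neither definition are assumptions made for illustration and are not taken from the study protocol.

```python
from typing import Optional


def classify_lung_function(fev1_fvc: float, fev1_fvc_lln: float,
                           fev1_pp: float) -> Optional[str]:
    """Apply the spirometric group definitions described in the text.

    fev1_fvc      measured FEV1/FVC ratio
    fev1_fvc_lln  lower limit of normal for FEV1/FVC for this participant
    fev1_pp       FEV1 as percent of predicted normal

    Returns "OLD", "normal", or None when neither definition is met
    (how such participants were handled is not stated; None is an assumption).
    """
    if fev1_fvc < fev1_fvc_lln:
        return "OLD"
    if fev1_fvc > fev1_fvc_lln and fev1_pp > 80.0:
        return "normal"
    return None


print(classify_lung_function(0.60, 0.70, 55.0))
print(classify_lung_function(0.78, 0.70, 95.0))
```

Under these assumed inputs, a participant with a ratio above the lower limit of normal but an FEV1 of 80% predicted or less falls into neither group.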

Protein processing and protease identification

BALF samples underwent centrifugation at the local collection sites to remove cells, and cell-free BALF samples were stored at −80 °C prior to processing. The BALF was processed as previously described [17]. Briefly, the cell-free BALF samples were centrifuged twice to separate the insoluble component of BALF from the soluble fraction. Endogenously produced peptides were collected from the soluble component of the supernatant via a 3 kDa MW cutoff filter. The soluble component of the supernatant was sent for SomaScan analysis, as previously reported [18], and for MS analysis. BALF samples from 21/25 participants with OLD and 24/26 with normal lung function had adequate protein amounts for tandem mass tagging (TMT, Thermo Fisher Scientific) and MS analysis. The insoluble BALF component was also processed for TMT labelling and liquid chromatography (LC) tandem mass spectrometry (MS/MS) analysis. Proteins were matched to UniProt IDs using FragPipe. The combined proteins from SomaScan and MS were filtered to identify proteases and peptidases with known substrates based on the MEROPS database [19]. We utilized the STRING database to visualize protein networks [20].

Peptide analysis and protease assignment

The endogenous peptides isolated from the BALF underwent label-free identification and quantification by LC–MS that included delayed normalization and maximal peptide ratio extraction (MaxLFQ). A FASTA database was downloaded containing protein sequences of the entire human proteome (UniProt proteome sequence 2021-12-10, 101,014 protein sequences). The peptide tandem mass spectra (MS/MS) files were matched to the FASTA database using the FragPipe software, and peptides were assigned to their native protein substrates [21,22,23,24,25,26,27,28]. Matched peptides were quantified using the MaxLFQ method and assigned cleavage sites. The cleavage sites were categorized by type of cleavage based on cleavage location and whether other similar peptides were detected, indicative of multiple cleavage events. The assigned cleavages resulted from exopeptidase or endopeptidase activity and were mapped back to the original FASTA protein sequence with 4 residues before and after each cut, depending on location of the cut, based on starting residue and peptide length (Fig. 1S). The MEROPS catalog of preferred substrate cleavage patterns was compared to our assigned cleavages from detected peptides. For each protease, a z-score was calculated for each cleavage using z = (x − μ)/σ, where x was the number of substrates in the MEROPS database with a given amino acid at a specific position, μ was the average number of substrates with any data for that cleavage position, and σ was the standard deviation of the substrates for that cleavage position. We treated the z-scores as a quantitative indicator of whether the peptide matches the protease’s target cleavage sequence. A higher z-score implied a higher likelihood that the protease cleaved a protein and yielded the corresponding peptide. We assigned cleaved proteins to proteases if the associated z-score was deemed an “outlier.” To define an outlier, we computed the z-score quartiles and interquartile range (IQR) within each protease. We defined an outlier as a peptide’s z-score exceeding the third quartile plus 1.5 times the IQR for that protease.
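The z-score and outlier rule described above can be illustrated with a short sketch. It assumes a hypothetical table of substrate counts per amino acid at one cleavage position, uses the population standard deviation, and uses the "inclusive" quartile method; none of these choices is specified in the text, and the function names are illustrative. How position-level scores are combined into a peptide-level score is also not specified, so the sketch stops at the two stated formulas.

```python
import statistics
from typing import Dict, List


def position_z_scores(counts: Dict[str, int]) -> Dict[str, float]:
    """z = (x - mu) / sigma for each amino acid at one cleavage position.

    counts maps an amino acid to the number of known substrates carrying
    that residue at this position (hypothetical MEROPS-style counts).
    """
    values = list(counts.values())
    mu = statistics.mean(values)
    sigma = statistics.pstdev(values)  # population SD: an assumption
    if sigma == 0:
        # No variation at this position, so no residue is preferred.
        return {aa: 0.0 for aa in counts}
    return {aa: (x - mu) / sigma for aa, x in counts.items()}


def upper_outliers(z_scores: List[float]) -> List[bool]:
    """Flag z-scores above Q3 + 1.5 * IQR, computed within one protease."""
    q1, _median, q3 = statistics.quantiles(z_scores, n=4, method="inclusive")
    fence = q3 + 1.5 * (q3 - q1)
    return [z > fence for z in z_scores]


# Illustrative counts only, not taken from MEROPS.
p1_counts = {"A": 10, "V": 30, "L": 5, "G": 3}
print(position_z_scores(p1_counts))
print(upper_outliers([0.1, 0.2, 0.3, 0.4, 0.5, 5.0]))
```

Only the upper fence is used here because the text assigns a protease when the z-score is unusually high, not unusually low.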

Statistical analysis

All data underwent cleaning prior to performing statistical analysis (see Supplement). We sought to describe associations between the detected proteases along with the degraded proteins mapped from the endogenous peptides with measures of lung disease, defined as percent predicted forced expiratory volume in 1 s (FEV1pp).

Proteases associated with lung function

We examined the overall association between protease abundance and FEV1pp using the combined SomaScan and two untargeted MS datasets from the soluble and insoluble components of BALF. For each identified protease, we calculated the correlation between the measured abundance and FEV1pp in the SomaScan dataset and in each of the two untargeted datasets. For the SomaScan dataset, we averaged the correlations across aptamers when multiple aptamers were present for a protein. We used the p-values from a Pearson correlation test to assess the strength of association between protease abundance and FEV1pp. We obtained an overall p-value for each protease by aggregating the individual p-values using Fisher’s combination method [29]. We controlled the false discovery rate (FDR) using the Benjamini–Hochberg correction [30]. We report associations that were significant at the FDR < 0.05 level.
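The aggregation and multiple-testing steps described above can be sketched as follows. This is an illustrative implementation of Fisher's combination method and the Benjamini–Hochberg adjustment using only the Python standard library; the function names and example p-values are hypothetical, and the per-dataset Pearson p-values are taken as given inputs rather than computed.

```python
import math
from typing import List


def fisher_combine(p_values: List[float]) -> float:
    """Fisher's method: X = -2 * sum(ln p) is chi-square with 2k df under H0.

    For even degrees of freedom the chi-square upper tail has the closed form
    exp(-X/2) * sum_{i=0}^{k-1} (X/2)^i / i!. Each p must be > 0.
    """
    k = len(p_values)
    half = -sum(math.log(p) for p in p_values)  # this is X / 2
    term, total = 1.0, 1.0
    for i in range(1, k):
        term *= half / i
        total += term
    return math.exp(-half) * total


def benjamini_hochberg(p_values: List[float]) -> List[float]:
    """Return BH-adjusted p-values in the same order as the input."""
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down, enforcing monotonicity.
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, p_values[i] * m / rank)
        adjusted[i] = running_min
    return adjusted


# Hypothetical per-dataset p-values for three proteases.
per_protease = {"protease_a": [0.01, 0.03, 0.20],
                "protease_b": [0.40, 0.55],
                "protease_c": [0.002]}
combined = {name: fisher_combine(ps) for name, ps in per_protease.items()}
adjusted = benjamini_hochberg(list(combined.values()))
significant = [n for n, q in zip(combined, adjusted) if q < 0.05]
print(combined, adjusted, significant)
```

A protease measured in only one dataset keeps its original p-value, since Fisher's method with a single input returns that input unchanged.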

Association between protein degradation and disease

For each protein assigned to an endogenous peptide, we dichotomized patients into two groups: one in which the degraded protein was detected and one in which it was not. A protein was “detected” if its corresponding MaxLFQ intensity was non-zero. Due to heavy missingness, we only considered proteins detected in at least five samples. We compared the average percent predicted FEV1 (FEV1pp) between these two groups for each protein using a two-sample t-test. We controlled the FDR using the Benjamini–Hochberg correction [31]. For pathway analysis, we used a less stringent FDR threshold of 0.1 and used IMPaLA software to examine pathways reflected among the degraded proteins [32].
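The dichotomization and group comparison described above can be sketched as follows. The text does not state whether the t-test assumed equal variances, so the Welch (unequal-variance) form is used here as an assumption; the function names and data are hypothetical, and missing intensities are treated as not detected. Only the test statistic and degrees of freedom are computed, since a p-value requires the t-distribution (available, for example, through `scipy.stats.ttest_ind` with `equal_var=False`).

```python
import math
import statistics
from typing import List, Optional, Tuple


def split_by_detection(intensities: List[Optional[float]],
                       fev1pp: List[float]) -> Tuple[List[float], List[float]]:
    """Split FEV1pp values by whether the protein's MaxLFQ intensity is non-zero.

    A missing intensity (None) is treated as not detected: an assumption.
    """
    detected, not_detected = [], []
    for x, f in zip(intensities, fev1pp):
        (detected if x is not None and x > 0 else not_detected).append(f)
    return detected, not_detected


def welch_t(a: List[float], b: List[float]) -> Tuple[float, float]:
    """Welch two-sample t statistic and Welch-Satterthwaite degrees of freedom."""
    va, vb = statistics.variance(a) / len(a), statistics.variance(b) / len(b)
    t = (statistics.mean(a) - statistics.mean(b)) / math.sqrt(va + vb)
    df = (va + vb) ** 2 / (va ** 2 / (len(a) - 1) + vb ** 2 / (len(b) - 1))
    return t, df


MIN_DETECTED = 5  # proteins detected in fewer samples are skipped

# Hypothetical intensities for one protein across ten participants.
intensities = [0, 12.5, 0, 3.1, 8.0, 0, 4.4, 0, 9.9, 0]
fev1pp = [104, 61, 98, 70, 66, 110, 58, 95, 72, 101]
detected, not_detected = split_by_detection(intensities, fev1pp)
if len(detected) >= MIN_DETECTED and len(not_detected) >= 2:
    print(welch_t(detected, not_detected))
```

A negative statistic here means the participants in whom the degraded protein was detected had lower average FEV1pp, which matches the direction reported for most substrate proteins.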

Results

Study participant demographics

Table 1 summarizes the demographics of participants whose samples were used in the endogenous peptides analysis. The soluble and insoluble components of BALF TMT datasets differed by two samples from individuals with OLD and the SomaScan dataset differed by one sample from an individual with OLD, but overall showed similar demographic distributions across those with and without OLD. Most of the participants were male (72.5%), with a mean age of 56.8 years; 54.9% identified as black, non-Hispanic, 43.1% as white or Hispanic/Latino, and 2.0% as Asian or Pacific Islander. Most participants were receiving antiretroviral treatment (ART) (92.2%) at the time of study. Smoking status was similar between those with and without OLD, with 52.9% actively smoking at the time of enrollment; however, average pack years were greater in those with OLD (31.1) than in those without OLD (15.2). Lung function ranged from 21 to 128% of predicted normal. Among those with OLD, the average FEV1pp was 67.5%, and among those without OLD the average was 104%.

Table 1 Demographics of study participant cohort.

Proteases associated with lung function

To enhance our proteomic coverage to identify proteases in BALF, we leveraged the previously reported SomaScan proteomic data from the BALF soluble component, along with proteins measured by TMT with MS of both the soluble and insoluble BALF components [18]. A total of 101 proteases were identified, many of which overlapped between the three different methods of measuring proteins (Fig. 1, Table 1S). Of these proteases, 40 were unique to SomaScan, 9 were unique to the insoluble component of BALF and 3 were unique to the soluble BALF component measured by TMT. Most of these proteases make up a network that is functionally associated or linked (Fig. 2S). We identified 22 proteases that were associated with FEV1pp, four positively correlated and 18 negatively correlated (Table 2). The four proteases that correlated with higher lung function were carboxypeptidase M, prothrombin, urokinase-type plasminogen activator and gastricsin. Many of the 18 proteases associated with lower lung function have been previously described in OLD, including cathepsins, metalloproteinases (MMP), caspases and neutrophil elastase. All but six of the 22 proteases have functional associations with each other (Fig. 2a).

Fig. 1

Venn diagram of proteases that correlate with FEV1pp from the various BALF proteomic analyses

Table 2 Proteases significantly correlated with FEV1pp across the SomaScan and two tandem mass tagging datasets.
Fig. 2

a) STRING diagram demonstrating protein–protein associations of the 26 proteases identified in BALF by LC–MS/MS and SomaScan that associate with FEV1pp. b) STRING diagram demonstrating protein–protein associations of the 31 substrate proteins mapped to endogenous peptides that associate with FEV1pp

Protein substrates subject to proteolytic cleavage

We identified 31 proteins, mapped from endogenous peptides, that were substrates for proteolytic cleavage and associated with FEV1pp (Table 3). Table 3 depicts the top 15 proteins and the mean FEV1pp among participants whose samples contained these substrate proteins. Among the top five proteins were alpha-enolase (an enzyme involved in glycolysis), histones, and tubulin. Among these 31 proteins, 28 showed inverse relationships with FEV1pp, i.e. increased degradation was associated with lower average FEV1pp, indicating these proteins were more likely to be subject to proteolysis in the presence of OLD. Figure 2b depicts the protein–protein interactions of these 31 proteins; all but three have functional associations. The top ten pathways reflected among these 31 proteins are shown in Table 4. There were 39 pathways with FDR below 0.05, including histone methylation and pathways involving small ubiquitin-like modifier mediated modification (SUMOylation), a post-translational process that controls protein quality [33].

Table 3 Top 15 proteins identified from endogenous peptides whose degradation was associated with FEV1pp
Fig. 3

Proteases targeting proteins whose degradation was associated with FEV1pp and the total number of substrate proteins mapped to the endogenous peptides

Table 4 Pathways reflected in proteins whose degradation was associated with FEV1pp

Proteases participating in substrate cleavage

To identify the proteases linked to the generation of the endogenous peptides, we analyzed the 101 proteases identified across the SomaScan and TMT datasets against the 31 identified substrate proteins that associated with FEV1pp. After linking candidate endogenous peptides to the proteases responsible for their cleavage by examining the z-scores, we counted how many proteins each protease cleaved. The number of proteins assigned to each protease ranged from one to 23 (Fig. 3; Table 2S), with the top 10 proteases including neutrophil elastase, granzyme, and cathepsin D (Table 5).

Table 5 Top 10 most active proteases mapped to cleaved proteins. P-value and FDR describe the significance of the correlation between protease and FEV1pp.

Discussion

Proteases are a diverse group of proteins comprising over 500 members, whose genes make up almost 2% of the human genome. There are five major classes of proteases in mammals, with serine, cysteine and metalloproteases being the most prevalent in human lung disease. Traditionally these proteases have been viewed as substrate-specific protein-degrading enzymes and not as participants in signaling or regulatory pathways. In the last decade, advances in degradomics and the study of protease substrates have revealed that protease targets and their substrates are complex [34, 35]. It is now evident that proteases are key components of regulatory mechanisms via cleavage of specific substrates with concomitant activation, silencing or modulation of regulatory functions through a mechanism called proteolytic processing [34]. Most studies related to the role of proteases in OLD, both HIV and non-HIV associated, have been limited to individual proteases or protease families, yet it is highly unlikely that single proteases or even single protease families are solely responsible for OLD pathogenesis. More likely, there are complex interactions among proteases and their substrates that participate in multiplexed regulatory systems. In this study, we characterized the complex protease proteome in HIV-associated OLD via a combination of proteome profiling and identified protease activity and substrates through peptidomic analysis.

Utilizing a comprehensive proteomic approach that included a combination of targeted aptamer-based proteomics and untargeted mass spectrometry with TMT labeling, we identified 101 proteases within the BALF in PLWH, 22 of which were significantly associated with lung function as measured by FEV1pp. Proteases are key regulatory proteins in both homeostasis and disease, and several of the proteases we identified are associated with normal lung function. One protease, gastricsin, is a gastric protease and likely represents micro-aspiration, which is common in individuals with OLD [36]. Interestingly, gastricsin was observed in individuals with preserved FEV1. Aspiration is likely to be equally, if not more, common in those with severely impaired lung function, although it is probably less prevalent than the proteases that are upregulated in disease. We found both prothrombin and urokinase-type plasminogen activator to be associated with normal lung function, and these proteases have roles in fibrin homeostasis in the healthy lung [37, 38]. Many of the proteases associated with lower lung function have been described in OLD, such as the metalloproteinases, cathepsins, caspases and neutrophil elastase. Caspases are proteases involved in apoptosis and associated with the generation of emphysema [39]. Unfortunately, we were not able to correlate specific proteases with emphysema in this cohort as CT imaging was limited.

What is most striking is that no single protease or protease family predominates. Rather, there is upregulation of many proteases across divergent protease families. Proteases can interact either directly or indirectly with other proteases and become interconnected in what has been termed a ‘protease web’ [40]. We found that all but six of the proteases that associated with lung function were part of such a functionally associated network. The interconnection and redundancy of proteases in OLD create challenges in identifying individual therapeutic targets for anti-protease therapy. Most of the proteases belong to a common network, and it remains unknown whether individual or multiple proteases must be targeted to effectively block proteolytic activity. Proteases are also involved in normal physiological functions; therefore, broad proteolytic blockade could have untoward effects on homeostasis.

Proteases initiate and modulate many important cellular functions by highly specific substrate cleavage. In the inflammatory state, upregulated proteases have a wide range of substrate targets that are not limited to extracellular matrix proteins. Not only do proteases cleave multiple substrates, but substrates can be cleaved by multiple proteases. Utilizing peptidomic analysis by mass spectrometry, we were able to map endogenously produced peptides to their cognate proteins. All but three of these proteins had functional associations, suggesting the targeting or susceptibility of specific biological pathways. In addition, most of these proteins were associated with lower lung function and mapped to pathways vital for cellular function, including SUMOylation. SUMOylation is critical in broad biological functions including cell cycle and protein stability. Cigarette smoke upregulates SUMOylation in human bronchial epithelial cells, providing a potential link to OLD [41].

Although our complementary proteomics analysis identified proteases that had upregulated abundance with disease, increased abundance does not guarantee increased protease activity. To address this, we utilized endogenous peptides to characterize protease activity. To identify the proteases responsible for protein degradation, we assigned cleavage sites to endogenously produced peptides and matched these sites to their cognate proteases. Neutrophil elastase, granzyme M and cathepsins D and E were among the most active proteases linked to substrate degradation. These proteases are commonly associated with OLD [42]. Although metalloproteinase and caspase proteases were upregulated in disease, they were less active.

Limitations of this study include the relatively small sample size, lack of non-HIV controls and lack of more detailed lung structure and function metrics (e.g. CT quantitative imaging). In addition, there was a large percentage of smokers, current or past, among both those with normal lung function and those with disease. Cigarette smoking is associated with elevated protease activity in both humans and animal models, and additional studies are needed to differentiate the roles of tobacco smoke and HIV infection in protease activation [43]. Overall, this study brings to light the large repertoire of proteases that are upregulated and actively involved in proteolysis in HIV-associated OLD. In addition, we identified specific proteins that were subject to proteolysis and that are linked to specific pathways vital to cellular and organ function, suggesting a possible role in pathogenesis. Future studies are needed to validate these findings, especially comparing HIV to non-HIV controls. It is important for future mechanistic studies to be aware of the multitude of proteases and their substrates that are active in HIV-associated OLD.