Introduction

Coronavirus disease 2019 (COVID-19), characterized as a severe acute respiratory syndrome caused by SARS-CoV-2, has rapidly spread, posing a significant global public health challenge [1]. In the first 3 months of 2020 alone, over 2 million individuals were infected globally, resulting in 150,000 fatalities [2]. While the majority of research has concentrated on the epidemiology and clinical diagnostics of COVID-19, there are reports of SARS-CoV-2 PCR relapse in patients following two consecutive negative PCR tests [3]. Concurrently, concerns regarding the sequelae following acute COVID-19 recovery have intensified. A prospective follow-up study revealed that nearly half of the patients recovering from SARS-CoV-2 infection continued to exhibit persistent symptoms and decreased lung function 2 months post-infection [4]. Furthermore, a single-center longitudinal study indicated that clinical sequelae, encompassing cardiovascular, respiratory, and systemic symptoms, are prevalent among COVID-19 survivors [5]. Hence, research to determine the rehabilitation status of COVID-19 patients and to identify biomarkers is crucial.

This study utilized metabolomics to analyze patients recovering from COVID-19. Metabolomics is a powerful tool for the qualitative and quantitative study of small-molecule metabolites in biological samples, used to understand cellular physiological and biochemical responses to exogenous stimulation. It is employed in various research fields, including life science, disease diagnosis, and drug research and development [6].

Mass spectrometry detection enables the analysis of qualitative and quantitative information on thousands of molecules with high sensitivity, resolution, selectivity, specificity, and accuracy [7]. Recent studies have applied metabolomics to identify COVID-19 biomarkers and search for therapeutic drug targets [8,9,10]. A cross-sectional study of serum metabolomics using UPLC-MS/MS showed differences in amino acids, carbohydrates, fatty acids, and glycerophospholipids among COVID-19 patients with different severity levels [11]. Bruzzone et al. observed abnormally elevated levels of ketone bodies (acetoacetate, 3-hydroxybutyrate, and acetone) and 2-hydroxybutyric acid in response to SARS-CoV-2 infection [12]. Previous studies have shown that, even after recovery from COVID-19, a considerable proportion of survivors exhibit diffuse lung abnormalities, and 13% of patients display decreased eGFR during follow-up after discharge [4, 13, 14]. Additional studies have also suggested that survivors may be at risk of developing fibrosis [13, 15, 16]. Therefore, identifying differential metabolites between COVID-19 convalescent patients and healthy individuals is crucial for early intervention and accurate rehabilitation prognosis.

To this end, this study applied non-targeted metabolomics, specifically ultra-performance liquid chromatography–tandem mass spectrometry (UPLC-MS/MS), to characterize the metabolic profiles of convalescent serum and urine from COVID-19 patients. Additionally, the study explored altered metabolic pathways to elucidate the underlying pathophysiology.

Methods

Study participants

A total of 32 participants were included in this prospective study. Specifically, serum samples from 16 COVID-19 recovery patients were collected within 1 month post-discharge from the Changchun Infectious Disease Hospital, along with samples from 16 healthy controls at the physical examination center of the First Hospital of Jilin University. Statistical analysis showed no significant difference between the two groups. Urine samples were obtained from the same subjects at the same time. Upon recruitment, all participants tested negative for SARS-CoV-2 nucleic acid via real-time polymerase chain reaction (RT-PCR). COVID-19 recovery patients (Case) were diagnosed and stratified at admission according to the New Coronavirus Pneumonia Prevention and Control Program (7th edition) issued by the National Health Commission of China. Participants with underlying lung diseases were excluded. Serum and urine samples, along with laboratory findings from COVID-19 recovery patients, were collected from the Changchun Infectious Disease Hospital. Patients met the mandatory discharge criteria: normal body temperature for over 3 days, significantly improved respiratory symptoms, and negative results from two consecutive SARS-CoV-2 RNA tests at least 24 h apart. Metabolomic profiling of all 64 samples (serum and urine) was conducted using ultra-performance liquid chromatography-tandem mass spectrometry (UPLC-MS/MS) to quantify identifiable metabolites. The study was reviewed and approved by the Ethics Committee of the First Hospital of Jilin University (AF-IRB-032-05). The requirement for written informed consent from the subjects was waived.

Non-targeted UPLC–MS/MS analysis

Non-targeted metabolomic analysis was conducted by Calibra Lab at DIAN Diagnostics (Hangzhou, Zhejiang, China) on their CalOmics metabolomics platform. Samples were extracted with methanol at a ratio of 1:4. The mixtures were shaken for 3 min and precipitated by centrifugation at 4000 × g for 10 min at 20 °C. Four aliquots of 100 μL supernatant were transferred to sample plates and dried under blowing nitrogen, then re-dissolved in reconstitution solutions for injection into the UPLC-MS/MS systems. All four UPLC-MS/MS methods used an ACQUITY 2D UPLC (Waters, Milford, MA, USA) coupled to a Q Exactive (QE) hybrid Quadrupole-Orbitrap mass spectrometer (Thermo Fisher Scientific, San Jose, USA). The QE mass spectrometer was operated at a mass resolution of 35,000 with a scan range of 70–1000 m/z. In the first UPLC-MS/MS method, the QE was operated in positive ESI mode and the UPLC column was C18 reverse-phase (UPLC BEH C18, 2.1 × 100 mm, 1.7 μm; Waters); the mobile solutions used in the gradient elution were water (A) and methanol (B) containing 0.05% PFPA and 0.1% FA. In the second UPLC-MS/MS method, the QE was operated in negative ESI mode and the UPLC column was C18 reverse-phase (UPLC BEH C18, 2.1 × 100 mm, 1.7 μm; Waters); the mobile solutions used in the gradient elution were water (A) and methanol (B) containing 6.5 mM ammonium bicarbonate at pH 8. In the third UPLC-MS/MS method, the QE was operated in positive ESI mode and the UPLC column was C18 reverse-phase (UPLC BEH C18, 2.1 × 100 mm, 1.7 μm; Waters); the mobile solutions were water (A) and methanol/acetonitrile/water (B) containing 0.05% PFPA and 0.01% FA. In the fourth method, the QE was operated in negative ESI mode, the UPLC column was HILIC (UPLC BEH Amide, 2.1 × 150 mm, 1.7 μm; Waters), and the mobile solutions were water (A) and acetonitrile (B) with 10 mM ammonium formate.

Compound identification and quantification

After pre-processing of raw data and data quality control inspection, ion peaks were extracted using proprietary in-house IT hardware and software. Metabolites were identified by searching an in-house library generated by running reference standards that were commercially purchased or obtained from other sources. Identification of a metabolite in a sample required strict matching of three criteria between the experimental data and the library entry: retention index (RI) within a narrow window, accurate mass with a variation of less than 10 ppm, and MS/MS spectra with high forward and reverse search scores. A single asterisk (*) after a metabolite name indicates that the identification has not been validated by library entries generated from running purified compound standards on our experimental platforms; instead, it was obtained from literature reports and searches of other databases and is still considered very reliable. A double asterisk (**) indicates that the identification has not been validated by corresponding standard samples and was obtained from literature reports and searches of other databases; it is considered relatively reliable. The peak area of each metabolite was calculated as the area under the curve.
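The accurate-mass criterion can be illustrated with a short calculation. The following is a minimal sketch, assuming the mass variation is expressed as the usual parts-per-million error relative to the library value; the function names and the m/z values are hypothetical and are not taken from the study's in-house software.

```python
def ppm_error(observed_mz: float, reference_mz: float) -> float:
    """Signed mass error in parts per million relative to the library value."""
    return (observed_mz - reference_mz) / reference_mz * 1e6


def within_mass_tolerance(observed_mz: float, reference_mz: float,
                          tolerance_ppm: float = 10.0) -> bool:
    """True when the absolute mass error is below the tolerance (10 ppm in the text)."""
    return abs(ppm_error(observed_mz, reference_mz)) < tolerance_ppm


# Hypothetical m/z values for illustration only.
reference = 175.1190
for observed in (175.1201, 175.1225):
    error = ppm_error(observed, reference)
    print(f"{observed}: {error:.2f} ppm, match={within_mass_tolerance(observed, reference)}")
```

In practice this check would be only one of the three criteria; the retention index and MS/MS spectral scores must also match before an identification is accepted.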

Data normalization

Before statistical analysis, raw peak areas were normalized to adjust for system fluctuation among different run days. The normalized peak areas were then log-transformed (log2) to reduce the skewness of the data distribution and bring it closer to a normal (Gaussian) distribution. Missing values in the peak area matrix were imputed using the minimum detected value of the metabolite among all samples. All these analyses were conducted using MetaboAnalyst (version 5.0) [17].
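The three preprocessing steps can be sketched for a single metabolite as follows. This is an illustrative stand-in rather than the MetaboAnalyst workflow itself: the run-day normalization is assumed here to be division by the run-day median, the helper names and peak areas are hypothetical, and imputation is applied before the log transform, which gives the same result as imputing afterwards because log2 is monotonic.

```python
import math
from statistics import median


def normalize_by_run_day(peak_areas, run_days):
    """Divide each raw peak area by the median of its run-day batch (assumed scheme)."""
    medians = {}
    for day in set(run_days):
        detected = [a for a, d in zip(peak_areas, run_days) if d == day and a is not None]
        medians[day] = median(detected)
    return [None if a is None else a / medians[d] for a, d in zip(peak_areas, run_days)]


def impute_minimum(values):
    """Replace missing values with the smallest detected value of this metabolite."""
    floor = min(v for v in values if v is not None)
    return [floor if v is None else v for v in values]


def log2_transform(values):
    """Log2-transform strictly positive values."""
    return [math.log2(v) for v in values]


# Hypothetical raw peak areas of one metabolite; None marks a missing value.
raw = [1200.0, 1500.0, None, 2400.0, 3000.0, 2700.0]
days = [1, 1, 1, 2, 2, 2]

processed = log2_transform(impute_minimum(normalize_by_run_day(raw, days)))
print([round(v, 3) for v in processed])
```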

Quality control of metabolome analysis

A blend of internal standards was added to each sample to assist with chromatographic peak alignment and to monitor instrument stability. Instrument variability was assessed by calculating the median relative standard deviation (RSD) of all internal standards in each sample. The median RSD for this study was ≤ 5%, meeting our quality control criteria. Additionally, extracted water samples were used as blanks, and extracted commercial plasma samples were employed to monitor instrument variation.
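The QC metric can be written out explicitly. The sketch below assumes that the RSD of each internal standard is its sample standard deviation divided by its mean across injections, expressed in percent, and that the reported figure is the median of these values; the standard names and peak areas are hypothetical.

```python
from statistics import mean, median, stdev


def rsd_percent(values):
    """Relative standard deviation: sample standard deviation / mean, in percent."""
    return stdev(values) / mean(values) * 100.0


def median_rsd(internal_standards):
    """Median RSD across internal standards (name -> peak areas over injections)."""
    return median(rsd_percent(areas) for areas in internal_standards.values())


# Hypothetical peak areas of three internal standards over four injections.
standards = {
    "IS_1": [1000.0, 1020.0, 980.0, 1010.0],
    "IS_2": [5000.0, 5150.0, 4900.0, 5050.0],
    "IS_3": [250.0, 262.0, 245.0, 255.0],
}

value = median_rsd(standards)
print(f"median RSD = {value:.2f}% (meets the 5% criterion: {value <= 5.0})")
```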

Pathway analysis

The pathway enrichment analysis was conducted using MetPA [18] based on KEGG database and Pathview [19]. Only significantly different metabolites with associated KEGG ID were included in this analysis. Significance analysis of pathway enrichment was completed by hypergeometric test.
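For readers unfamiliar with the hypergeometric test, the enrichment p-value for one pathway is the probability of observing at least as many differential metabolites in that pathway as were found, if the differential metabolites had been drawn at random from the annotated background. The sketch below is a generic standard-library implementation of that tail probability, not the MetPA code; the function name and the counts in the example are hypothetical.

```python
from math import comb


def hypergeom_enrichment_p(total, in_pathway, selected, hits):
    """P(X >= hits) when `selected` metabolites are drawn without replacement
    from `total` annotated metabolites, `in_pathway` of which belong to the pathway."""
    upper = min(in_pathway, selected)
    tail = sum(
        comb(in_pathway, k) * comb(total - in_pathway, selected - k)
        for k in range(hits, upper + 1)
    )
    return tail / comb(total, selected)


# Hypothetical counts: 500 annotated metabolites, 14 in the pathway,
# 40 differential metabolites, 5 of which fall in the pathway.
p = hypergeom_enrichment_p(total=500, in_pathway=14, selected=40, hits=5)
print(f"enrichment p-value = {p:.4g}")
```

When many pathways are tested at once, the resulting p-values would normally be corrected for multiple testing before a threshold is applied.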

Statistical analysis

All statistical analyses were performed with R software (version 3.4.1). Metabolites that changed significantly between the case and control groups were identified by parametric (Student's t-test, ANOVA) or non-parametric (Wilcoxon rank-sum test, Kruskal–Wallis test, etc.) statistical methods. The multivariate analyses, orthogonal partial least squares discriminant analysis (OPLS-DA) and principal component analysis (PCA), were conducted using mixOmics (version 6.10.9) [20]. The random forest (RF) method was implemented with randomForest (version 4.6-14) [21].

Visualizations were produced for each statistical analysis, including volcano plots for the differential metabolite tests, scatter plots with confidence ellipses for PCA, scatter plots with confidence ellipses and variable importance dot plots for OPLS-DA, and mean decrease accuracy dot plots for the random forest model.
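As an illustration of the non-parametric comparison, the following sketch computes a two-sided Wilcoxon rank-sum (Mann–Whitney U) p-value for one metabolite. It is a simplified Python stand-in for the R routines used in the study: it assumes the normal approximation without tie or continuity correction, and the function names and values are hypothetical.

```python
from math import sqrt
from statistics import NormalDist


def average_ranks(values):
    """1-based ranks, with tied values sharing their average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        shared = (i + j) / 2 + 1  # mean of ranks i+1 .. j+1
        for k in range(i, j + 1):
            ranks[order[k]] = shared
        i = j + 1
    return ranks


def rank_sum_test(case, control):
    """Return (U statistic of the case group, two-sided p-value) using the
    normal approximation without tie or continuity correction."""
    n1, n2 = len(case), len(control)
    ranks = average_ranks(list(case) + list(control))
    u = sum(ranks[:n1]) - n1 * (n1 + 1) / 2
    mu = n1 * n2 / 2
    sigma = sqrt(n1 * n2 * (n1 + n2 + 1) / 12)
    z = (u - mu) / sigma
    return u, 2 * (1 - NormalDist().cdf(abs(z)))


# Hypothetical log2 peak areas of one metabolite.
case = [10.2, 10.8, 11.1, 10.5, 11.4, 10.9, 11.0, 10.7]
control = [9.1, 9.6, 9.3, 9.9, 9.4, 9.8, 9.2, 9.7]
u_stat, p_value = rank_sum_test(case, control)
print(f"U = {u_stat}, p = {p_value:.4g}")
```

For small groups an exact test is generally preferable to the normal approximation; the approximation is used here only to keep the example dependency-free.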

Results

Non-targeted metabolomic analysis of serum and urine samples using UPLC-MS/MS

Non-targeted analyses of metabolites in serum and urine samples from the two participant groups (COVID-19 survivors and healthy controls) were conducted using a UPLC-MS/MS system to identify metabolites that are altered in COVID-19 survivors.

Variables were selected based on the median RSD of internal standard signal fluctuations in QC samples, and metabolites with a median RSD < 5% underwent subsequent multivariate statistical analysis. In the UPLC-MS/MS dataset, all identified metabolites, both in positive and negative ion modes, were combined and classified based on their chemical taxonomic features, as illustrated in Fig. 1a (serum) and Fig. 1b (urine). A total of 1187 metabolites were detected in serum samples, and 960 metabolites in urine samples, with the three most abundant classes of metabolites in both types of samples being lipids (43.3%, 19.27%), amino acids (21.15%, 29.58%), and xenobiotics (16.09%, 23.13%), respectively.

Fig. 1 Proportion of identified metabolites in each chemical class. a Serum b Urine

Prior to detailed analysis of specific metabolic changes, PCA and OPLS-DA models were used to assess whether the metabolic profiles differed between COVID-19 survivors (Case) and healthy individuals (Control). In both the PCA and OPLS-DA models, the two groups of serum samples did not exhibit a clear separation trend (Fig. 2a and b). For urine, in contrast to the PCA model (Fig. 2c), the OPLS-DA model showed clear differences between the urinary metabolomic profiles of Case and Control, with good reproducibility within each group (Fig. 2d). Furthermore, the Q2 and R2 values from the OPLS-DA permutation test exceeded 0.5, indicating high explanatory and predictive power for the categorical variables. These results indicate differences in the urinary metabolomic profiles, although they do not preclude differences in the serum metabolomic profiles between the two groups. Overall, differences between the groups were more pronounced in urine than in serum samples.

Fig. 2 PCA score plots: a Serum c Urine. OPLS-DA score plots: b Serum d Urine

Following these initial observations, both univariate and multivariate statistical methods were used to identify metabolites that differed between the two groups. Metabolites were deemed significantly different if they had an adjusted P-value < 0.05 with log2FC > 1 (red) or log2FC < −1 (blue), resulting in 874 serum metabolites (457 upregulated, 417 downregulated) and 39 urine metabolites (12 upregulated, 27 downregulated), as shown in the volcano plots (Fig. 3a and b). Variable importance in projection (VIP) scores were calculated for serum and urine metabolites using the OPLS-DA model, and the top 30 metabolites were ranked. The top five metabolites in serum (Fig. 3c) were phenylacetylglycine; cis-4-decenoate (10:1n6); methylsuccinate; branched-chain, straight-chain, or cyclopropyl 12:1 fatty acid**; and allantoic acid. In urine (Fig. 3d), they were cis-urocanate, carnitine of C10H14O2 (4)**, acetylhydroquinone sulfate, pseudoephedrine, and resveratrol sulfate (1).
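The selection rule behind the volcano plots can be sketched as follows. Two points are assumptions rather than details given in the text: the P-value adjustment is taken to be the Benjamini–Hochberg procedure, and the fold-change cutoff is taken to be symmetric (log2FC > 1 for upregulated, log2FC < -1 for downregulated). The function names and input values are hypothetical.

```python
def benjamini_hochberg(p_values):
    """Benjamini-Hochberg adjusted p-values, returned in the input order."""
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest to the smallest p-value to enforce monotonicity.
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, p_values[i] * m / rank)
        adjusted[i] = running_min
    return adjusted


def classify(log2_fc, adj_p, fc_cutoff=1.0, alpha=0.05):
    """Label a metabolite as 'up', 'down', or 'ns' (not significant)."""
    if adj_p < alpha and log2_fc > fc_cutoff:
        return "up"
    if adj_p < alpha and log2_fc < -fc_cutoff:
        return "down"
    return "ns"


# Hypothetical raw p-values and log2 fold changes for four metabolites.
raw_p = [0.001, 0.004, 0.03, 0.5]
log2_fc = [1.8, -1.5, 0.4, 2.2]

adj_p = benjamini_hochberg(raw_p)
labels = [classify(fc, p) for fc, p in zip(log2_fc, adj_p)]
print(labels)
```

The third metabolite passes the significance threshold but not the fold-change cutoff, and the fourth has a large fold change but is not significant, so neither would be colored in the plot.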

Fig. 3 Fold-change (volcano) plots of metabolomic data between Case and Control: a Serum b Urine. OPLS-DA VIP score charts: c Serum d Urine

Random Forest models were built for serum (out-of-bag, OOB, error rate of 3.12%) and urine (OOB error rate of 6.25%) samples, and the top 50 metabolites by importance were taken as the strongest drivers of the overall metabolic differences between healthy individuals and COVID-19 survivors. Based on the literature and the KEGG/HMDB databases, each metabolite was annotated to one super pathway corresponding to its general metabolic process. The most distinctive metabolites primarily originated from the following super pathways: Amino acids, Carbohydrates, Energy, Lipids, Nucleotides, Partially characterized molecules, Peptides, Secondary metabolism, and Xenobiotics (Fig. 4a and b). Based on VIP scores greater than 2 and adjusted P-values less than 0.05, serum and urine metabolites were analyzed together, further identifying 16 metabolites with significant differences (Table 1). A heatmap was used to display these significantly different metabolites, showing that in the Case group compared to the Control group, 11 metabolites were upregulated and 5 were downregulated in urine samples (Fig. 4c). Combining the Random Forest and cluster analysis results, eight metabolites, namely 1-ribosyl-imidazoleacetate*, carboxyethyl-GABA, cis-urocanate, glucuronide of C10H18O2 (2)**, N,N-dimethyl-5-aminovalerate, N1-methyladenosine, pseudoephedrine, and resveratrol sulfate (1)*, were found to perfectly distinguish between healthy individuals and COVID-19 survivors and were therefore considered potential biomarkers.
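The reported OOB error rates can be put in concrete terms with a small calculation. The sketch below assumes that the OOB error rate is the fraction of the 32 participants (16 Case, 16 Control) misclassified when each is predicted only by trees that did not see it during training; under that assumption the reported percentages correspond to whole numbers of misclassified samples. The function name is hypothetical.

```python
def oob_error_rate(misclassified: int, n_samples: int) -> float:
    """Out-of-bag error rate in percent."""
    return misclassified / n_samples * 100.0


n_samples = 32  # 16 Case + 16 Control per sample type
implied_counts = {}
for sample_type, reported_percent in [("serum", 3.12), ("urine", 6.25)]:
    implied = round(reported_percent / 100.0 * n_samples)
    implied_counts[sample_type] = implied
    print(f"{sample_type}: {implied} of {n_samples} misclassified "
          f"({oob_error_rate(implied, n_samples):.3f}%)")
```

With cohorts this small, a change of a single misclassified sample shifts the error rate by about three percentage points, which is worth keeping in mind when comparing the two models.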

Fig. 4 Random forest model: a Serum b Urine. c Clustering heatmap of significantly different metabolites

Table 1 Differential metabolites between Case and Control

Metabolic pathway analysis

To explore metabolic pathways potentially implicated in COVID-19 survivors, metabolites with significant differences between the two groups were subjected to pathway enrichment analysis, and the top 25 metabolic pathways in serum (Fig. 5a) and urine (Fig. 5b) are shown. The results (Table 2) indicated that 11 metabolic pathways exhibited significant changes (FDR < 0.05) between the two groups, namely Alanine, aspartate and glutamate metabolism; Arginine and proline metabolism; Arginine biosynthesis; beta-Alanine metabolism; Biosynthesis of unsaturated fatty acids; Butanoate metabolism; Glycine, serine and threonine metabolism; Histidine metabolism; Nicotinate and nicotinamide metabolism; Phenylalanine, tyrosine and tryptophan biosynthesis; and Valine, leucine and isoleucine biosynthesis.

Fig. 5 Enriched KEGG terms. a Serum b Urine

Table 2 Metabolic pathways significantly altered between Case and Control

Discussion

Metabolomics methodologies are relatively straightforward, and UPLC-MS, the most commonly used technique in metabolomics, is widely applied in screening for diagnostic biomarkers of various diseases. This study combined UPLC-MS detection with multivariate statistical analysis to investigate the serum and urine metabolomes of COVID-19 survivors and healthy individuals. The findings demonstrate differences in the serum and urine metabolomic profiles between the two groups, with 874 differential metabolites identified in serum and 39 in urine. Subsequently, a combined analysis of the top-ranked serum and urine metabolites was conducted using a random forest model and cluster analysis to control for confounding factors and enhance the reliability of the results. These results indicate that, despite recovery and discharge, COVID-19 survivors still exhibit differences in endogenous substances compared to healthy individuals, in line with the majority of research findings [22, 23].

Among the metabolites that can clearly distinguish COVID-19 survivors from healthy individuals in this study, 1-ribosyl-imidazoleacetate* is an intermediate in the synthesis of zoledronic acid, a drug for treating malignant hypercalcemia. One study found that 1-ribosyl-imidazoleacetate* is positively correlated with ischemic stroke [24]. However, studies specifically targeting 1-ribosyl-imidazoleacetate* in relation to COVID-19 are limited. Similarly, research on the glucuronide of C10H18O2 (2)** is also limited. Carboxyethyl-GABA, although lacking genetic or cytotoxic effects, was found in one study to induce time-dependent proliferation and migration of mouse fibroblasts [25]. Fibroblasts maintain the structural integrity of connective tissue and secrete large amounts of collagen fibers, thereby playing a role in wound healing. The increase in carboxyethyl-GABA concentration over time leads us to hypothesize that carboxyethyl-GABA may be a potential marker for interstitial lung fibrosis, which is related to lung injury. One of the complications following COVID-19 infection is the development of fibrosis. It has been reported that lung fibrosis can be detected early in the infection, regardless of pre-existing lung conditions and disease severity [26]. The decline in lung function of COVID-19 survivors can last up to 12 months and may even become permanent, especially in the case of fibrosis [27, 28].

N,N-dimethyl-5-aminovalerate is related to the catabolism of microbial corpse alkaloids [29]. One study showed that plasma levels of N,N-dimethyl-5-aminovalerate differ significantly before and after long-term antiretroviral therapy and can clearly distinguish HIV-infected patients from healthy controls [30]. Therefore, this study speculates that N,N-dimethyl-5-aminovalerate may also be a potential marker for distinguishing COVID-19 survivors from healthy controls, but further confirmation is needed in future research. Many studies have shown that N1-methyladenosine is closely related to tumor response [31,32,33]. However, research on N1-methyladenosine in the context of COVID-19 is limited. Pseudoephedrine is a long-established drug used to treat symptoms of the common cold and flu, sinusitis, asthma, and bronchitis. Since this study did not completely exclude drug-related variables, the significant differences in this metabolite in COVID-19 survivors might be due to drug residues. Resveratrol sulfate (1)* is a polyphenolic compound, and resveratrol has been shown to improve inflammatory diseases involving the intestinal mucosa [34, 35]. About half of acute COVID-19 patients experience gastrointestinal symptoms, which persist for up to 6 months in approximately 10%–25% of patients [36, 37]. Due to the potential interaction between the immune response associated with SARS-CoV-2 infection and the immune dysregulation associated with inflammatory bowel diseases (IBD), resveratrol might offer a new therapeutic approach for COVID-19 survivors. Although research on these substances in the context of the COVID-19 pandemic remains limited, the results of this study can provide new research directions.

Enrichment analysis revealed significant enrichment of the arginine biosynthesis metabolic pathway in the serum of COVID-19 survivors. Arginine not only serves as a crucial substrate for protein synthesis but also as a precursor for the synthesis of substances such as creatine, polyamines, and nitric oxide (NO) in the body, playing a significant role in human nutritional metabolism and regulation [38]. The physiologically active form of arginine in the body is L-arginine. Recent research on COVID-19 has found that serum levels of L-arginine in adults and children affected by COVID-19 are significantly lower than in control groups [39]. Another study demonstrated that serum levels of L-arginine are inversely correlated with the severity of COVID-19 [40]. In vitro assays have shown that T cell proliferative capacity is significantly reduced in COVID-19 patients and can be restored by supplementing with arginine [41]. Recent metabolomics data indicate changes in the L-arginine pathway in COVID-19 patients [42], and an increase in arginase mRNA expression was also found in peripheral blood mononuclear cells (PBMCs) of COVID-19 patients [43]. Reports suggest a close relationship between the expression of arginase or nitric oxide synthase (enzymes essential for arginine catabolism) and airway remodeling in chronic obstructive pulmonary disease (COPD) patients [44]. Data indicate that arginine levels are reduced in the serum of COVID-19 survivors with pulmonary function abnormalities. The results of this study show that L-arginine levels in the serum of COVID-19 survivors are lower than in healthy individuals, suggesting that pulmonary function changes may still persist in COVID-19 survivors, necessitating timely re-examination and monitoring. Furthermore, we speculate that monitoring changes in L-arginine could also be beneficial in managing long COVID-19, as the persistence of chronic inflammation and endothelial dysfunction has been demonstrated to underlie COVID-19 sequelae [45, 46].

Despite these findings, the study has limitations: the sample size is small, and because non-targeted metabolomics is a broad screening approach, the identified metabolites may be subject to certain biases. Future research should increase the sample size and validate the findings with targeted metabolomics.

Conclusions

In this study, UPLC-MS/MS metabolomics was applied to screen for differential metabolites in COVID-19 survivors. Co-analysis of the top-ranked metabolites in serum and urine identified 16 metabolites with significant differences. Among them were 1-ribosyl-imidazoleacetate*, carboxyethyl-GABA, cis-urocanate, glucuronide of C10H18O2 (2)**, N,N-dimethyl-5-aminovalerate, N1-methyladenosine, pseudoephedrine, and resveratrol sulfate (1)*. These eight metabolites are considered potential biomarkers in COVID-19 survivors. Our research provides new insights into the metabolomics of the COVID-19 recovery phase and may offer potential new therapeutic targets for preventing COVID-19 relapse. Future research is needed to confirm our preliminary data and identify effective diagnostic biomarkers for the COVID-19 recovery phase.