Introduction

The goal of clinical metabolomics can be defined as a search for causal links between time-dependent changes in the metabolic composition of body fluids and alterations in human physiology or pathophysiology. Numerous analytical, computational and logistic challenges must be addressed to reach this goal; maintaining sample integrity before analysis is one of them. Indeed, preserving sample integrity and minimizing pre-analytical variability during sample collection and storage is an essential element of any metabolomics workflow. Sampling factors such as storage delay and storage conditions (temperature) affect the metabolic composition of samples through residual enzymatic activity, chemical transformation of metabolites or bacterial contamination [1]. Several reviews offer guidelines and recommendations for sample handling; the main rules are simple: minimize the handling time and freeze samples as soon as possible [2–4].

However, with the expansion of metabolomics into the clinical domain, it is easy to imagine situations in which the sample handling requirements of a metabolomic study cannot be met in practice. For example, biobanks contain thousands of highly valuable clinical samples collected prior to the “omics era”, and the value of such archived samples for metabolomics studies is discussed regularly [5]. Another example is field studies or point-of-care (POC) sample collection, e.g., when patients have limited mobility and samples have to be collected outside of a hospital environment. Either way, the samples will be exposed to uncontrolled conditions. Consequently, a question arises: should such samples be discarded completely? Or do they still retain some physiologically relevant information, and can they be of use when no alternative is available? To address this question, we took advantage of a facioscapulohumeral muscular dystrophy (FSHD) study in which one set of samples was collected in the hospital and another set was collected by the patients themselves at home and sent by mail to the hospital. Using two analytical platforms, liquid chromatography–mass spectrometry (LC–MS) and proton nuclear magnetic resonance (1H NMR) spectroscopy, we demonstrate that the metabolic profiles of samples collected in the hospital setting differ significantly from those of samples collected under uncontrolled conditions. Nevertheless, basic physiological parameters such as age and BMI remain dominant traits in the data.

Materials and Methods

We analysed 117 urine samples from 47 patients included in the FACTS-2-FSHD trial, a single-centre, randomized controlled study which aimed to evaluate the impact of two different treatment strategies on muscle function and experienced fatigue in patients with FSHD [6]. The effect of the therapy itself, however, is outside the scope of this manuscript. The protocol was approved by the Medical Ethics Committee of Radboud University Nijmegen Medical Centre (Nijmegen, The Netherlands). Urine samples were collected at five consecutive time points, namely: just before therapy (n = 38), during therapy at months 1 (n = 20), 2 (n = 16) and 3 (n = 13), and 1 month after therapy (n = 30). Urine samples before and after therapy were collected at the hospital under controlled conditions. Urine samples during therapy were collected by patients at home and sent by post to the hospital. All samples were stored at −80 °C from the moment they were received at the hospital until measurement.

Urine samples were analyzed by UPLC-ESI-UHR-ToF (UPLC Ultimate 3000 RS tandem LC system, Dionex, Amsterdam, The Netherlands; ESI-UHR-ToF maXis, Bruker Daltonics, Bremen, Germany) and 1H NMR (600 MHz Bruker Avance II spectrometer, Bruker BioSpin, Karlsruhe, Germany), details of which have been reported elsewhere [7, 8]. Sample preparation for the 1H NMR experiments was slightly modified as follows. Using the Bruker Sample Track LIMS system in combination with a Gilson 215 robotic system, 540 µL of each urine sample was added to 60 µL of pH 7.4 sodium phosphate buffer (1.5 M) in D2O containing 4.0 mM sodium 3-trimethylsilyltetradeuteriopropionate (TSP) and 2.0 mM NaN3. After centrifugation, a modified Gilson 215 robot was used to transfer 560 µL of sample from the plate into 5 mm SampleJet NMR tubes.

LC–MS data files were aligned using the in-house developed alignment tool msalign2 [9] (http://www.ms-utils.org/msalign2/); peak picking and peak matching were performed using the XCMS package (The Scripps Research Institute, La Jolla, USA) [10]. NMR spectra were manually phase and baseline corrected and automatically referenced to the internal standard (TSP = 0.0 ppm). Phase offset artefacts of the residual water resonance were manually corrected by filtering the free induction decay (FID) with a least-squares fit of a polynomial of degree 5. A bucket table with a bucket size of 0.04 ppm was generated for the regions 10.0–6.0 and 4.5–0.2 ppm [11] using AMIX (version 3.5; Bruker Biospin, Germany). All buckets were normalized to total area prior to multivariate analysis.
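The bucketing and total-area normalization described above can be illustrated with a minimal Python sketch. It is not the AMIX implementation: the function names `bucket_spectrum` and `normalize_total_area` are hypothetical, and the handling of bucket edges (half-open intervals, a shorter last bucket) is an assumption made for this illustration.

```python
import math

def bucket_spectrum(ppm, intensity, regions=((6.0, 10.0), (0.2, 4.5)), width=0.04):
    """Sum intensities into fixed-width buckets inside the given ppm regions.

    Points outside the regions (e.g. the residual water signal) are ignored.
    Edge handling is an assumption of this sketch, not taken from AMIX.
    """
    buckets = []
    for lo, hi in regions:
        # Number of buckets in the region; the small offset guards against
        # floating-point noise when (hi - lo) is an exact multiple of width.
        n = math.ceil((hi - lo) / width - 1e-9)
        sums = [0.0] * n
        for p, v in zip(ppm, intensity):
            if lo <= p < hi:
                idx = min(int((p - lo) / width), n - 1)
                sums[idx] += v
        buckets.extend(sums)
    return buckets

def normalize_total_area(buckets):
    """Scale the buckets of one spectrum so that they sum to 1."""
    total = sum(buckets)
    if total == 0:
        raise ValueError("spectrum has zero total area")
    return [b / total for b in buckets]

# Hypothetical toy spectrum: two aromatic points, one aliphatic point and
# one point in the excluded water region.
ppm_axis = [7.01, 7.02, 3.01, 5.0]
signal = [1.0, 1.0, 2.0, 100.0]
normalized = normalize_total_area(bucket_spectrum(ppm_axis, signal))
```

In this toy case the point at 5.0 ppm falls between the two regions and therefore does not contribute to the total area used for normalization.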

The data matrices generated by LC–MS and 1H NMR were imported separately into the SIMCA-P 12.0 software package (Umetrics, Umeå, Sweden). The data were mean-centered, unit variance-scaled and logarithmically transformed prior to statistical analysis. Unsupervised multivariate statistical analysis using principal component analysis (PCA) and supervised analysis using projections to latent structures (PLS) and PLS-discriminant analysis (PLS-DA) were performed. PLS and PLS-DA models were validated by random permutation of the response variable and comparison of the goodness of fit (R2Y and Q2Y) of 200 such models with the original model in a validation plot. R2Y and Q2Y of the original PLS models, as well as the intercepts of the R2Y and Q2Y regression lines of the validation plots with the vertical axis, were calculated as quality parameters.
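The idea behind the permutation validation can be sketched in a few lines of Python. This is only an illustration under simplifying assumptions: it uses a single-component PLS1 model written from scratch, reports R2Y only (Q2Y would additionally require cross-validation), and does not reproduce the SIMCA-P validation plot or its regression-line intercepts. The function names and the seed are hypothetical.

```python
import random

def center(values):
    """Subtract the mean from a list of numbers."""
    m = sum(values) / len(values)
    return [v - m for v in values]

def pls1_r2y(X, y):
    """R2Y of a one-component PLS1 model; X is a list of sample rows."""
    n, p = len(X), len(X[0])
    cols = [center([row[j] for row in X]) for j in range(p)]
    yc = center(y)
    # The weight vector is proportional to X'y (covariance with the response).
    w = [sum(c[i] * yc[i] for i in range(n)) for c in cols]
    norm = sum(v * v for v in w) ** 0.5
    if norm == 0:
        return 0.0
    w = [v / norm for v in w]
    # Scores t = Xw, then regress y on t.
    t = [sum(cols[j][i] * w[j] for j in range(p)) for i in range(n)]
    q = sum(t[i] * yc[i] for i in range(n)) / sum(v * v for v in t)
    ss_res = sum((yc[i] - q * t[i]) ** 2 for i in range(n))
    ss_tot = sum(v * v for v in yc)
    return 1.0 - ss_res / ss_tot

def permutation_r2y(X, y, n_perm=200, seed=0):
    """Return R2Y of the original model and of n_perm models with shuffled y."""
    rng = random.Random(seed)
    observed = pls1_r2y(X, y)
    permuted = []
    for _ in range(n_perm):
        y_perm = list(y)
        rng.shuffle(y_perm)
        permuted.append(pls1_r2y(X, y_perm))
    return observed, permuted
```

A model is considered informative when the R2Y of the original response lies clearly above the distribution obtained from the permuted responses.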

Identification of metabolites from the 1H NMR experiments was performed by matching chemical shift-dependent patterns of interest against reference spectra from the Bruker Biorefbase database (Bruker BioSpin, Karlsruhe, Germany). For LC–MS metabolite identification, MS2 experiments were performed using UPLC-ESI-UHR-ToF. MS/MS data were collected for each precursor ion of interest (those found relevant in the statistical analysis). Collision energies were set to 25 for m/z <200 and 30−35 for m/z >200. Compound assignment was based on accurate mass (mass error <1 ppm), MS2 pattern and isotopic distribution (sigma value <20). This information was matched against online metabolomic databases (METLIN, Human Metabolome Database, MassBank, Mascot). Molecular formulas were also generated using the Bruker data analysis tools, which are based on accurate masses and isotopic patterns (Bruker Compass Data Analysis, Bremen, Germany).
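The mass-error criterion used for compound assignment is a relative deviation expressed in parts per million. A minimal sketch is given below; the function names are hypothetical, and the observed value in the example is invented for illustration only.

```python
def ppm_error(observed_mz, theoretical_mz):
    """Relative mass error in parts per million."""
    return (observed_mz - theoretical_mz) / theoretical_mz * 1e6

def within_tolerance(observed_mz, theoretical_mz, tol_ppm=1.0):
    """True if the absolute mass error is below the tolerance (here 1 ppm)."""
    return abs(ppm_error(observed_mz, theoretical_mz)) < tol_ppm

# Theoretical [M+H]+ of hippuric acid (C9H9NO3) is approximately 180.06552;
# the observed value below is a hypothetical measurement.
error = ppm_error(180.06560, 180.06552)
accepted = within_tolerance(180.06560, 180.06552)
```

At m/z 180 a tolerance of 1 ppm corresponds to an absolute deviation of less than about 0.0002, which illustrates how strict the criterion is.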

Results

A PCA performed on the entire dataset showed that the differences in sampling conditions (hospital setting versus patient collection with mail delivery) are already visible in the first two principal components for both LC–MS and 1H NMR (Fig. 1a, b). This factor appeared to have a stronger influence on the data than traditional confounders such as gender or age (Supplementary Figs. 1, 2 and 3). As Fig. 1 shows, there is a certain degree of similarity between the PCA score plots built on LC–MS and NMR data. To express this similarity numerically, we applied a multivariate extension of the correlation coefficient, the RV coefficient, which measures the degree of matching between complex data matrices [7, 12]. The RV coefficient levelled off at 0.46 after the fourth component (Table 1). Thus, despite the visual similarity, there is no strong correlation between the PCA models based on LC–MS and 1H NMR.
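For readers unfamiliar with the RV coefficient, the following Python sketch computes the classical form for two matrices that share the same samples (rows), for example two PCA score matrices. It is an illustration only: the function names are hypothetical, and the modified variant, which discards the diagonal of the sample configuration matrices, is not shown.

```python
def center_columns(M):
    """Subtract the column means from a matrix given as a list of rows."""
    n, p = len(M), len(M[0])
    means = [sum(row[j] for row in M) / n for j in range(p)]
    return [[row[j] - means[j] for j in range(p)] for row in M]

def cross_product(A, B):
    """Return A'B for two matrices with the same number of rows."""
    return [[sum(a[i] * b[k] for a, b in zip(A, B)) for k in range(len(B[0]))]
            for i in range(len(A[0]))]

def frobenius_sq(M):
    """Sum of squared elements of a matrix."""
    return sum(v * v for row in M for v in row)

def rv_coefficient(X, Y):
    """Classical RV coefficient: tr(XX'YY') / sqrt(tr((XX')^2) tr((YY')^2))."""
    Xc, Yc = center_columns(X), center_columns(Y)
    # tr(XX'YY') equals the squared Frobenius norm of X'Y.
    numerator = frobenius_sq(cross_product(Xc, Yc))
    denominator = (frobenius_sq(cross_product(Xc, Xc))
                   * frobenius_sq(cross_product(Yc, Yc))) ** 0.5
    return numerator / denominator
```

The coefficient ranges from 0 to 1 and is invariant to rotation and overall scaling of either matrix, which is why it suits the comparison of score matrices obtained from different platforms.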

Fig. 1

PCA scores plots of LC–MS and 1H NMR data. Samples are coloured according to sample collection (dark red hexagons: samples sent by postal services; blue hexagons: samples collected at the hospital). First two components of the PCA cover 10.2 and 7.8 % of variation of LC–MS data (a) and 9.3 and 7.6 % of variation of 1H NMR data (b), respectively

Table 1 Summary of principal component analysis performed on 1H NMR and LC–MS data and multivariate correlation (RV coefficient), calculated on the resulting score matrices

Subsequently, two-class PLS-DA models were built using the type of sample collection as the response variable (hospital setting versus collection outside of the hospital). The statistical parameters of the models were as follows: for LC–MS data, R2Y = 0.875, Q2 = 0.718, F = 74.11, p value = 6.2372e−031; for NMR data, R2Y = 0.806, Q2 = 0.609, F = 30.33, p value = 1.8498e−021. An overview of the variable importance in the projection (VIP), using a threshold of VIP ≥1.5, showed that 95 (of 700) and 26 (of 256) variables were responsible for the separation of the samples for LC–MS and 1H NMR data, respectively.

Analysis of the 1H NMR variables showed that, for example, citric acid (2.54 and 2.68 ppm) and hippuric acid (3.98, 7.62, 7.66 and 7.82 ppm) were decreased, while acetic acid (1.94 ppm), benzoic acid (7.5 and 7.9 ppm) and formic acid (8.46 ppm) were increased in samples collected outside of the hospital setting, a pattern which could be interpreted as an indication of bacterial contamination of the samples [1, 13–17].

For LC–MS identification, the 95 LC–MS features were sorted according to their retention time. Examination of isotopic patterns and in-source fragmentation then reduced the list to 77 features. Subsequently, the samples with the highest intensity for each feature were identified and their intensity values examined. A total of 17 features with intensity >10^4 were selected for MS2 experiments. MS and MS2 data were matched against the available databases and molecular formulas were generated for each ion. Four compounds that were decreased in samples collected outside the hospital setting were identified: hippuric acid (m/z 180.06), uric acid (m/z 169.04), methyladenosine (1-methyladenosine or N6-methyladenosine; m/z 282.12) and a triply charged peptide derived from uromodulin (SGSVIDQSRVLNLGPITR; m/z 638.03). One compound, identified as N,N-dicyclohexylurea (m/z 225.19), was increased in samples collected outside the hospital setting. MS2 spectra are displayed in Supplementary Fig. 4.
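As a plausibility check for the peptide assignment, the expected m/z of a multiply protonated peptide can be calculated from its sequence. The sketch below is not part of the original workflow; it assumes an unmodified peptide, uses standard monoisotopic residue masses, and the function name `peptide_mz` is hypothetical.

```python
# Standard monoisotopic residue masses (Da) of the 20 common amino acids.
RESIDUE_MASS = {
    "G": 57.02146, "A": 71.03711, "S": 87.03203, "P": 97.05276,
    "V": 99.06841, "T": 101.04768, "C": 103.00919, "L": 113.08406,
    "I": 113.08406, "N": 114.04293, "D": 115.02694, "Q": 128.05858,
    "K": 128.09496, "E": 129.04259, "M": 131.04049, "H": 137.05891,
    "F": 147.06841, "R": 156.10111, "Y": 163.06333, "W": 186.07931,
}
WATER = 18.01056   # one H2O is added for the free N- and C-termini
PROTON = 1.007276  # mass of a proton

def peptide_mz(sequence, charge):
    """Monoisotopic m/z of an unmodified peptide carrying `charge` protons."""
    neutral_mass = sum(RESIDUE_MASS[aa] for aa in sequence) + WATER
    return (neutral_mass + charge * PROTON) / charge

# Uromodulin-derived peptide from the text, triply protonated.
mz = peptide_mz("SGSVIDQSRVLNLGPITR", 3)
```

For this sequence the calculation gives a value of about 638.03, in agreement with the m/z reported above for the triply charged ion.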

To test the congruency between the MS and 1H NMR data and to assist with the annotation of the MS data, statistical heterospectroscopy was applied [18, 19]. Of the 17 LC–MS features of interest, only m/z 180.06, identified as hippuric acid, correlated with the chemical shifts of hippuric acid. The remaining LC–MS features were positively and/or negatively correlated with several 1H NMR signals (e.g. acetic acid, formic acid or citric acid), but none of these correlations provided useful information about the identity of the features.
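At its core, statistical heterospectroscopy correlates, across the same set of samples, the intensity of a signal measured on one platform with the intensities measured on the other. The Python sketch below shows this core step only, using Pearson correlation; the function names are hypothetical, and significance filtering and visualisation, which a full analysis would need, are omitted.

```python
def pearson(a, b):
    """Pearson correlation between two equally long lists of numbers."""
    n = len(a)
    mean_a, mean_b = sum(a) / n, sum(b) / n
    cov = sum((x - mean_a) * (y - mean_b) for x, y in zip(a, b))
    var_a = sum((x - mean_a) ** 2 for x in a)
    var_b = sum((y - mean_b) ** 2 for y in b)
    if var_a == 0 or var_b == 0:
        return 0.0  # a constant signal carries no correlation information
    return cov / (var_a * var_b) ** 0.5

def heterospectroscopy_row(ms_feature, nmr_buckets):
    """Correlate one LC-MS feature with every NMR bucket.

    ms_feature: intensities of one LC-MS feature, one value per sample.
    nmr_buckets: list of buckets, each a list with one value per sample,
    in the same sample order as ms_feature.
    """
    return [pearson(ms_feature, bucket) for bucket in nmr_buckets]
```

A feature and a bucket that originate from the same metabolite are expected to show a strong positive correlation, as observed here for hippuric acid.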

The identification of bacterial contamination as the main source of the metabolomic differences between samples collected under controlled and uncontrolled conditions is not surprising [1, 13, 15–17]. The next question is whether the contaminated, suboptimal samples still retain any physiologically relevant information. To answer it, we focused the analysis on the samples collected by patients at home and sent by mail (n = 49). PCA was performed on the LC–MS and 1H NMR data separately. Therapy group, time point of the study, season of sample collection and gender did not have any influence on the clustering of the data. However, PLS models revealed that age and BMI correlated with both the LC–MS and the 1H NMR data, although these correlations were stronger for LC–MS than for 1H NMR (Figs. 2, 3). To understand the differences observed between the LC–MS and 1H NMR models, the modified RV coefficient was again calculated for the PCA models of the LC–MS and 1H NMR data (Table 2). Interestingly, the RV coefficient for the first component was very low (0.004), which can explain the differences between LC–MS and 1H NMR.

Fig. 2

Observed vs. predicted plots of PLS models built on LC–MS and 1H NMR data with age as a response variable. a A model built on LC–MS data: R2Xcum = 0.299, R2Ycum = 0.993, Q2cum = 0.743; CV-ANOVA: F = 6.7, p value = 1.3e−5. b A model built on 1H NMR data: R2Xcum = 0.113, R2Ycum = 0.878, Q2cum = 0.234; CV-ANOVA: F = 3.4, p value = 0.02

Fig. 3

Observed vs. predicted plots of PLS models built on LC–MS and 1H NMR data with BMI as a response variable. Samples are coloured according to patient ID. a A model built on LC–MS data: R2Xcum = 0.191, R2Ycum = 0.951, Q2cum = 0.644; CV-ANOVA: F = 9.8, p value = 9.17e−7. b A model built on 1H NMR data: R2Xcum = 0.088, R2Ycum = 0.554, Q2cum = 0.261, CV-ANOVA: F = 6.8, p value = 0.02

Table 2 Summary of principal component analysis performed on 1H NMR and LC–MS data of samples sent by post and multivariate correlation (RV coefficient), calculated on the resulting score matrices

Discussion

Standardisation of sample collection and storage is an essential component of metabolomics workflows [1, 13, 15–17]. In an ideal world, samples would be collected according to a standard protocol with a carefully controlled “collection-storage” delay. In practice, however, it is easy to imagine situations in which sampling and storage conditions deviate from the optimum. One could encounter such a situation during field studies (e.g., collection of samples in rural areas of Africa or Southeast Asia) or when samples are collected outside of hospital settings. The differences in the metabolic profiles between samples collected in a hospital setting and under “suboptimal” conditions are not surprising. We, however, made an attempt to explore those differences systematically using two different analytical platforms, namely LC–MS and 1H NMR. Both platforms showed clear differences between the sampling conditions, though the RV coefficient, applied as a measure of the degree of agreement between the datasets, revealed little correlation between the PCA models built on LC–MS and 1H NMR. A possible interpretation of this observation is that each technique covers a complementary fraction of the metabolites responsible for the differences. Indeed, while the main discriminative metabolites in 1H NMR were only organic acids and products of bacterial metabolism, the LC–MS data included organic acids, peptides, nucleosides and urea derivatives.

Considering the effect that sampling imposed on the samples (it was present in the first two principal components), a strong correlation between the metabolic composition of “suboptimal” urine samples and physiological traits such as age and BMI might appear surprising. It has been shown, however, that age is a very strong trait in human metabolic profiles. For example, two independent field studies on the metabolomics of helminthic infections reported strong age-related effects in the first principal components of the PCA. The age-related trait dominated both data sets despite differences in geographical region, co-infections and morbidity [20, 21]. Here we show that the metabolic representation of the patient’s age is not only a dominant but also a robust trait, which is retained in the data even though the sample collection was clearly suboptimal. There is no simple mechanistic explanation for the correlation between age and urinary metabolic profiles; the observed effect is most probably the result of a complex interplay of age-dependent changes in protein turnover, lipid synthesis, kidney function and microbiota. The complexity of the phenomenon allows us to assume that our samples, though not collected in the clinical setting, retain a fraction of biologically and clinically relevant information and as such can be used for metabolomics studies if no alternative is available. The influence of BMI on the chemical composition of urine is an established fact as well. Moreover, it has been shown that the metabolic representation of BMI is gender biased [22]. The authors explain this bias as an effect of differences in the rates of protein and lipid turnover between males and females. A gender bias is difficult to address using our data set, as the majority of patients are male (36 males vs. 13 females), but using BMI as a response variable we obtained a relatively strong model for the LC–MS data, while the correlation between BMI and the 1H NMR data set was poor. This brings us to the conclusion that a fraction of the “robust” metabolome is better represented in the LC–MS data, and consequently LC–MS rather than 1H NMR should be considered for the analysis of samples that have been exposed to suboptimal conditions.

Future Perspectives

In conclusion, we would like to emphasize once more that optimal sample collection and storage conditions are crucial for metabolomic studies, and our report is by no means a call to loosen our grip on sampling routines. Our only intention here is to show that metabolic profiles altered by suboptimal sampling conditions still retain some important physiological information, as demonstrated by the correlation with traits such as age and BMI. One might apply here a motto of the forensic sciences, “every contact leaves a trace”: our physiology indeed leaves traces on the metabolic profiles of body fluids, and those traces are robust enough to survive suboptimal sampling conditions. The physiological information retained in such metabolic profiles may still have practical value if care is taken in the interpretation of the results. Finally, one might consider introducing a scale of sample quality, similar to the scale that the European Confederation of Laboratory Medicine (ECLM) has proposed for the classification of procedures for urine measurements [23].