1 Introduction

Metabolomics is concerned with the study of naturally occurring, low molecular weight organic metabolites within a cell or tissue (Griffiths 2008; Harrigan and Goodacre 2003; Lindon et al. 2007). An important application of metabolomics is the study of the interactions of living organisms with their environment, which is known as environmental metabolomics (Bundy et al. 2009; Miller 2007). A definition proposed for this field is “the application of metabolomics to the investigation of both free-living organisms obtained directly from the natural environment and of organisms reared under laboratory conditions” (Morrison et al. 2007). Metabolomics presents several advantages for the study of organism–environment interactions and for assessing the health status of organisms in the field (Bundy et al. 2009; Krastanov 2010). Metabolomic measurements report on the actual functional status of an organism at a given level of biological organization (e.g. cell or tissue), which, in principle, can be mechanistically anchored to effects occurring at lower (e.g. genomic) or higher (e.g. phenotypic) levels of biological organization (Fiehn 2001). Consequently, the number of metabolomic studies applied to environmental problems is growing considerably, with the aim of improving the understanding of organism responses to abiotic stressors (Arbona et al. 2013; Sardans et al. 2011). Metabolomic studies are based on the simultaneous measurement of multiple metabolites, using inherently parallel analytical techniques such as nuclear magnetic resonance spectroscopy or mass spectrometry, followed by appropriate statistical analysis (Lindon and Nicholson 2008a, b; Dunn et al. 2011). The application of metabolomic techniques, however, is still limited by the inherent problems of separating and identifying individual metabolites and, hence, metabolic changes (Viant 2008; Bundy et al. 2009; Samuelsson and Larsson 2008; Miller 2007).

To evaluate the effects of stressors on the metabolome, gas chromatography coupled to mass spectrometry (GCMS) with electron ionization (EI) is a widely used analytical technique, as it provides high sensitivity and reproducibility and permits compound identification using spectral libraries (Steinhauser and Kopka 2007; Vandenbrouck et al. 2010b). The major limitation of GCMS is its restriction to semi-volatile analytes, and hence the requirement to convert many polar metabolites to volatile derivatives through chemical derivatization. Oximation and trimethylsilylation (TMS) of organic compounds are the classical and most widely used derivatization procedures for metabolome analysis by GCMS (Villas-Boas et al. 2011; Kanani et al. 2008; Yi et al. 2014; Gullberg et al. 2004; De Souza 2013; Little 1999). Derivatization, however, increases the number of adduct derivatives and hence the complexity of metabolomics data. Interpreting the large and complex data sets generated by GCMS remains a bottleneck (Vandenbrouck et al. 2010b; Dunn et al. 2013; Wehrens et al. 2014).

In order to maximize the extraction of information, a pipeline for GCMS metabolomic data analysis was designed, and principal component analysis (PCA), partial least squares-discriminant analysis (PLS-DA) and multivariate curve resolution-alternating least squares (MCR-ALS) (De Juan et al. 2014; van Stokkum et al. 2009; Tauler 1995) were proposed for the resolution of complex chemical mixtures. The last of these chemometric tools is a powerful data analysis method that addresses mixture analysis and co-elution interference problems in complex natural samples, as well as problems derived from GCMS systems, such as baseline drift, spectral background, noise contributions or low signal-to-noise (S/N) ratios (Parastar et al. 2012). MCR-ALS has also been used in previous work on ecological and environmental problems (Navarro-Reig et al. 2015; Malik and Tauler 2015; Alier et al. 2010; Terrado et al. 2009).

Daphnia magna, a freshwater crustacean, is used extensively as an aquatic test species (Ikenaka et al. 2006; Martins et al. 2007; Barata and Baird 2000a) being the object of standardized testing guidelines from the Organization for Economic Co-operation and Development (OECD 2012). Acute and chronic tests of D. magna are among the most frequently performed studies in aquatic toxicology because these animals are relatively easy to culture, have a short lifecycle, and can be maintained at high population densities in relatively small volumes and thus are cost-effective (Barata et al. 2005; De Meester and Vanoverbeke 1999). Daphnia magna is also a keystone species in freshwater food webs, thus it is also an ecological model species to study effects of global climate change. Water temperature, salinity and oxygen levels of freshwater bodies are likely to be affected by climate change (IPCC 2014; Adrian et al. 2009; Coutant 1990).

Here an untargeted GCMS metabolomics study of D. magna exposed to mild stress provoked by high salinity, increased temperature and low dissolved oxygen is presented.

2 Materials and methods

2.1 Chemicals and standard solutions

The pure metabolites used as test standards (pimelic acid, malic acid, l-phenylalanine, l-tyrosine, putrescine, dopamine hydrochloride, l-histidine, l-methionine, d-maltose monohydrate, myo-inositol, d-ribose, d-galactose and cytidine) were purchased from Sigma-Aldrich (St. Louis, MO, USA). d-glucose (U-13C6, 99 %), used as the internal standard (IS), was supplied by Cambridge Isotope Laboratories, Inc. (Andover, MA, USA). Pyridine (anhydrous, 99.8 %), chlorotrimethylsilane (TMCS), methoxyamine hydrochloride (98 %) (MeOX) and N-methyl-N-trimethylsilyl trifluoroacetamide (>98.5 %) (MSTFA), used as derivatizing agents, and the saturated alkane standard mixture (C7 to C30) for the performance test of GC systems were also obtained from Sigma-Aldrich (St. Louis, MO, USA). Hexane, methanol and chloroform were of analytical reagent grade, and sodium chloride (NaCl) was supplied by Merck (Darmstadt, Germany).

2.2 Sample preparation

2.2.1 Daphnia magna growth conditions

A single laboratory clone (clone F), which has been the subject of many investigations (Barata and Baird 2000b), was used. Bulk cultures of 10 animals L−1 were maintained in ASTM hard synthetic water (ASTM 1995) as described by Barata and Baird (2000b). Individual or bulk cultures were fed daily with Chlorella vulgaris Beijerinck (5 × 10^5 cells mL−1, corresponding to 1.8 μg C mL−1) (Barata and Baird 1998). The culture medium was changed every other day, and neonates were removed within 24 h. The photoperiod was set to a 14 h light:10 h dark cycle and the temperature to 20 ± 1 °C.

2.2.2 Experimental design

Four-day-old D. magna juveniles were used for the experiment. Experimental animals were obtained from bulk cultures of 100 individuals reared in 10 L of ASTM hard water as described above (“Daphnia magna growth conditions”). Randomly collected animals were exposed to high salinity, elevated temperature and hypoxia treatments in groups of ten individuals in 1.2 L jars filled with 1 L of ASTM hard water. A control treatment of individuals exposed to ASTM hard water maintained at 20 °C under saturating oxygen conditions (≈9 mg L−1) without added salt (0 g L−1 NaCl) was also included. Each treatment was replicated five times. Salinity-exposed samples were prepared with 5 g L−1 NaCl in ASTM hard water, temperature samples were prepared in a water bath at 25 °C, and hypoxia samples at ≤2 mg O2 L−1 were prepared by bubbling liquid nitrogen into the ASTM hard water, thus displacing oxygen from it. Oxygen levels were monitored with an oxygen sensor so that they remained at approximately ≤2 mg L−1 in every jar. After 24 h of exposure, daphnids were collected from each jar and pooled in an Eppendorf tube; the water was removed, and the samples were then frozen in liquid N2 and stored at −80 °C until metabolite extraction.

2.2.3 Extraction and derivatization of metabolites

In the first part of this work, the GCMS method and the derivatization step were optimized to obtain high sensitivity and to maximize the number of detected compounds. Several standard mixtures (i.e. organic acids, sugars, nucleotides, nucleosides and amino acids) were tested to optimize the experimental conditions and the derivatization step in order to ensure comprehensive and reliable metabolic profiling. A stock solution (1000 µg mL−1) of each standard was prepared in methanol and stored in the freezer at −20 °C until use. Working standard solutions of 50 µg mL−1 were obtained by diluting the stock solutions with water. Diluted standard solutions of 5 µg mL−1 were used to optimize the experimental conditions and to test several derivatization conditions. Metabolite extraction was performed with methanol, water and chloroform, and chemical derivatization was carried out with two agents that are very commonly used in GCMS metabolomics studies: methoxyamine (MeOX) in pyridine and N-methyl-N-(trimethylsilyl) trifluoroacetamide (MSTFA) with 1 % trimethylchlorosilane (TMCS) as a catalyst of the reaction.

Polar metabolites of the whole organism were extracted with 400 µL methanol and sonicated for 15 min; then 200 µL water and 400 µL chloroform were added before centrifugation at 10,000×g for 15 min in order to separate the aqueous and the lipid phases. The aqueous phase of every sample was transferred to a new Eppendorf tube, and 10 µL of the internal standard (IS) [d-glucose (U-13C6, 99 %)] at a concentration of 50 µg mL−1 were added. Afterwards, the extract was evaporated to dryness with a Speedvac (Thermo Scientific) at 40 °C for 3 h. Once completely dry, samples were stored at −80 °C until analysis.

Prior to GCMS analysis, samples were derivatized to create thermally stable derivatives suitable for the analysis (Orata 2012). The initial conditions were taken from Vandenbrouck et al. (2010a) and adapted to the needs of this study in order to detect the maximum number of metabolites. The first step was methoximation with methoxyamine hydrochloride (MeOX) in pyridine, which was used to stabilize the carbonyl groups of metabolites and to stop charge transfer. Then, trimethylsilylation was carried out with N-methyl-N-(trimethylsilyl) trifluoroacetamide (MSTFA) with 1 % trimethylchlorosilane (TMCS) as catalyst. Trimethylsilylation replaces active hydrogens on a wide range of polar compounds with a trimethylsilyl [–Si(CH3)3] group. Because it increases the volatility of the metabolites, this derivatization allows the silylated molecules to be detected by gas chromatography. Different MeOX concentrations (20, 30 and 40 mg mL−1), MeOX volumes (30, 65 and 100 µL), MeOX temperatures (20, 30 and 40 °C), MeOX times (90, 525 and 960 min), MSTFA volumes (30, 65 and 100 µL), MSTFA temperatures (20, 30 and 40 °C) and MSTFA times (30, 60 and 90 min) were tested to improve the derivatization reaction and to determine the most suitable conditions for the analysis of D. magna metabolites (Pacchiarotta et al. 2010; Kanani et al. 2008; Azizan et al. 2012; Kanani and Klapa 2007; Strehmel et al. 2008).

After the extraction of polar metabolites, the final derivatization protocol was applied: 65 µL of 20 µg µL−1 MeOX in pyridine were added to the dry residue obtained after extraction from daphnids. After mixing for 1 min, the mixture was incubated for 90 min at 30 °C. Thereafter, 30 µL of MSTFA (1 % TMCS) were added, and the mixture was vortexed for 1 min and incubated for another 30 min at room temperature. Prior to injection, the derivatized daphnid extracts were filtered through 0.22 µm filters (Ultrafree®-MC, Millipore) and transferred to a chromatographic vial, and 5 µL of hexane were added to obtain a final volume of 100 µL for injection into the GC system.

2.3 GC–MS analysis

A Trace GCMS Thermo Fisher system operated in EI mode at 70 eV and equipped with a ZB-5MS column (Phenomenex, 30 m × 0.25 mm ID × 0.25 µm) (Vandenbrouck et al. 2010a) was used. The oven temperature program started at 70 °C, increased at 10 °C min−1 to 250 °C and then at 5 °C min−1 to 310 °C, which was held for 5 min; the delay time was 5.3 min and the analysis time was 49 min. Other operating conditions were as follows: carrier gas, He at a flow rate of 0.6 mL min−1; source temperature, 200 °C; interface temperature, 280 °C; split ratio, 1:20; detector voltage, 440 V; injection volume, 2 µL in split mode. Mass spectra were recorded in full scan mode in the range m/z 60–650. The mass spectrometer was interfaced to a computer workstation running Xcalibur software (version 2.2, Thermo Scientific) for data acquisition and processing. For each detected peak, a linear retention index (RI) (Strehmel et al. 2008) was calculated using GC RI standards (n-alkanes from C7 to C30). For this purpose, an n-alkane standard mixture was prepared in hexane at a concentration of 10 µg µL−1 and injected into the GCMS instrument under the same program conditions as the samples.
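As an illustration of the linear RI calculation, the following minimal Python sketch interpolates the retention time of a peak between the two bracketing n-alkanes (the van den Dool and Kratz form commonly used for temperature-programmed GC). The function name and the alkane retention times are hypothetical and are not values measured in this study.

```python
from bisect import bisect_right

def linear_retention_index(t_x, alkane_times):
    """Linear retention index of a peak eluting at t_x (min).

    alkane_times maps carbon number -> retention time (min) of each n-alkane;
    retention times are assumed to increase with carbon number.
    """
    carbons = sorted(alkane_times)
    times = [alkane_times[c] for c in carbons]
    if not times[0] <= t_x <= times[-1]:
        raise ValueError("retention time outside the alkane ladder")
    # Index of the last alkane eluting at or before t_x
    i = min(bisect_right(times, t_x) - 1, len(times) - 2)
    n, n_next = carbons[i], carbons[i + 1]
    t_n, t_next = times[i], times[i + 1]
    return 100 * (n + (n_next - n) * (t_x - t_n) / (t_next - t_n))

# Hypothetical alkane ladder (illustrative numbers only)
ladder = {10: 8.0, 11: 9.5, 12: 11.0}
ri = linear_retention_index(8.75, ladder)  # halfway between C10 and C11
```

A peak eluting exactly halfway between two consecutive alkanes receives an index halfway between their nominal values (100 × carbon number).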

2.4 Data analysis

Original full scan GCMS data sets (“.raw”) from Xcalibur software (version 2.2, Thermo Scientific) were converted to “.cdf” data files with the File Converter tool of Xcalibur for further processing in MATLAB (The Mathworks, Inc., Natick, MA, USA). Every data matrix corresponds to one of the samples analyzed by GCMS (control or exposed), with 2581 rows (retention times, 49 min run) and 590 columns (m/z values, from 60 to 650 at unit mass resolution). Data values in every data matrix were normalized to the peak area of the internal standard (d-glucose U-13C6) to correct for instrumental intensity drifts among injections and to scale the data internally. A total of 21 individual data matrices [1 blank, 5 controls and 15 exposed (3 treatments × 5 exposed samples)] with dimensions of 2581 × 590 (retention times × m/z values) were obtained, and these pre-processed data matrices were analyzed by MCR-ALS.
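The internal standard normalization can be sketched as follows. This is only an illustration with synthetic data: the retention time window and m/z column used to locate the internal standard peak are hypothetical, and the peak area is approximated here by summing one ion trace, which may differ from how the area was integrated in this work.

```python
import numpy as np

def normalize_by_internal_standard(D, rt_window, mz_index):
    """Divide a (retention times x m/z) matrix by the internal standard peak area."""
    start, stop = rt_window
    # Peak area approximated as the summed intensity of one IS ion trace (assumption)
    is_area = D[start:stop, mz_index].sum()
    if is_area <= 0:
        raise ValueError("internal standard peak not found")
    return D / is_area

rng = np.random.default_rng(0)
D = rng.random((2581, 590))  # synthetic matrix with the dimensions given in the text
D_norm = normalize_by_internal_standard(D, rt_window=(1000, 1020), mz_index=100)
```

After normalization, the internal standard trace integrates to one in every sample, so intensities become comparable across injections.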

Additionally, for the initial exploration of the GCMS data, a new data set was arranged in which the total ion current (TIC) chromatogram of every control and exposed sample forms one row of a single data matrix. Three such TIC data matrices were obtained, one for each of the three investigated factors (salinity, temperature and hypoxia), with dimensions of 10 × 2581 [number of samples (5 controls and 5 exposed) × retention times]. These matrices were then submitted to PCA and PLS-DA using PLS Toolbox 7.3.1 (Eigenvector Research Inc., Wenatchee, WA, USA).

2.4.1 Initial exploration of GCMS data

PCA (Wold et al. 1987; Esbensen and Geladi 2010) is an unsupervised technique that assumes no prior knowledge of class information and permits an unbiased comparison of the samples. It allows an initial exploration of the GCMS TIC data, comparing the behavior of control and exposed samples under changes in salinity, temperature and oxygen levels, finding patterns and detecting possible outliers. Prior to the chemometric analysis, several pretreatment steps were applied using PLS Toolbox 7.3.1 (Eigenvector Research Inc., Wenatchee, WA, USA). First, the chromatograms were aligned with the correlation optimized warping method (Tomasi et al. 2004; Nielsen et al. 1998) to compensate for possible retention time shifts between chromatographic runs. After alignment, automatic weighted least squares was applied to remove baseline offsets from the TIC data matrices, and finally the variables were mean-centered.
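The construction of the TIC matrix and the PCA step can be sketched in Python as below. This is a simplified illustration on small synthetic matrices, not the PLS Toolbox implementation; alignment and baseline correction are omitted, and the helper names are hypothetical.

```python
import numpy as np

def tic_matrix(samples):
    """Stack the total ion current (sum over m/z) of each sample as one row."""
    return np.vstack([D.sum(axis=1) for D in samples])

def pca_scores(X, n_components=2):
    """PCA by SVD of the column mean-centered matrix.

    Returns the sample scores and the explained variance (%) per component.
    """
    Xc = X - X.mean(axis=0)
    U, s, Vt = np.linalg.svd(Xc, full_matrices=False)
    scores = U[:, :n_components] * s[:n_components]
    explained = 100 * s**2 / np.sum(s**2)
    return scores, explained[:n_components]

rng = np.random.default_rng(1)
# Synthetic stand-ins for 5 control + 5 exposed samples (200 retention times x 30 m/z)
samples = [rng.random((200, 30)) for _ in range(10)]
X = tic_matrix(samples)            # 10 samples x 200 retention times
scores, explained = pca_scores(X)  # PC1-PC2 scores for a scores plot
```

With the real data, X would have the 10 × 2581 dimensions described above, and the first two columns of the scores would be plotted against each other.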

Partial least squares discriminant analysis (PLS-DA) (Kalivodová et al. 2015; Wold et al. 2001) is a supervised linear regression technique in which the class of each group of samples is assigned prior to the construction of the sample scores plot, so that maximum separation of the classes (control and exposed samples) is achieved. Variable importance in projection (VIP) scores (Wold et al. 2001) were calculated to recognize the chromatographic regions of interest and to reveal the most influential variables (peak retention times related to metabolite concentration changes) discriminating control from exposed samples.
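To make the VIP calculation concrete, the sketch below fits a single-response PLS model with the NIPALS algorithm and computes VIP scores from its weights, scores and y-loadings. It is a minimal illustration under the assumption of a binary class vector (0 = control, 1 = exposed) and synthetic data; the function names are hypothetical and the code is not the PLS Toolbox routine used in this work.

```python
import numpy as np

def pls1_nipals(X, y, n_components):
    """Single-response PLS (NIPALS) on mean-centered X and y."""
    Xr = X - X.mean(axis=0)
    yr = np.asarray(y, dtype=float) - np.mean(y)
    n, p = Xr.shape
    W = np.zeros((p, n_components))   # X weights
    T = np.zeros((n, n_components))   # X scores
    q = np.zeros(n_components)        # y loadings
    for a in range(n_components):
        w = Xr.T @ yr
        w /= np.linalg.norm(w)
        t = Xr @ w
        tt = t @ t
        p_load = Xr.T @ t / tt
        q[a] = yr @ t / tt
        Xr = Xr - np.outer(t, p_load)  # deflate X
        yr = yr - q[a] * t             # deflate y
        W[:, a], T[:, a] = w, t
    return W, T, q

def vip_scores(W, T, q):
    """VIP_j = sqrt(p * sum_a(SS_a * w_ja^2 / ||w_a||^2) / sum_a(SS_a))."""
    p = W.shape[0]
    ss = q**2 * np.sum(T**2, axis=0)   # y variance explained per component
    w2 = (W / np.linalg.norm(W, axis=0))**2
    return np.sqrt(p * (w2 @ ss) / ss.sum())

rng = np.random.default_rng(3)
X = rng.random((10, 30))               # synthetic: 10 samples x 30 variables
y = np.array([0] * 5 + [1] * 5)        # class labels (assumed coding)
W, T, q = pls1_nipals(X, y, n_components=2)
vip = vip_scores(W, T, q)
```

By construction the squared VIP scores average to one over all variables, which is why a threshold of 1 is a common rule of thumb for selecting influential variables.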

2.4.2 MCR-ALS analysis

The MCR-ALS method allows the resolution of the pure elution and mass spectra profiles of all sample constituents in one or multiple samples analyzed by GCMS, in a similar way to that previously shown for LC-MS analysis (Farrés et al. 2014) and for other hyphenated chromatographic techniques (Ortiz-Villanueva et al. 2015). From the resolved elution profiles of the constituents, peak heights or areas can be easily evaluated and compared between runs. Because of the extensive fragmentation of molecules in GCMS and the presence of a large number of derivatization by-products, the proposed MCR-ALS analysis is especially helpful: it distinguishes the relevant metabolite profiles of sample constituents from the irrelevant contributions of the derivatizing agent signal and therefore allows the relevant D. magna elution profiles (metabolites) to be discriminated from the large number of interfering peaks (undesired derivatized compounds).

MCR analysis can be mathematically expressed with the bilinear model shown in the Eq. 1:

$$\mathbf{D} = \mathbf{C}\mathbf{S^{T}} + \mathbf{E}$$
(1)

where C contains the chromatographic profiles of all the resolved components, S^T contains their resolved (pure) mass spectra, and E is the matrix expressing the error or the variance not explained by the model, related to background and other unknown noise contributions. Every full scan GCMS run gives a data matrix, D, which has the mass spectra at all retention times in its rows and the chromatograms at all m/z values in its columns.

As shown previously, the solution of Eq. 1 for C and S^T is ambiguous if no additional information is available; in other words, the solution has rotational and scale freedom. This is usually referred to as the factor analysis ambiguity problem (Tauler 2001), and it can affect the resolution of the component profiles. Thus, exploratory information should be used as constraints during the resolution process, and it can also be used to build good initial estimates of the concentration and spectra profiles. Constraints such as non-negativity are frequently used in MCR studies (De Juan and Tauler 2007). In the case of GCMS data, owing to the very high spectral selectivity, possible ambiguities in the resolution of elution and spectral profiles are greatly diminished or totally eliminated in most cases.
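The origin of this ambiguity can be seen by inserting an arbitrary invertible matrix into the bilinear model of Eq. 1. In the expression below, R is a generic invertible matrix introduced here only for illustration, with as many rows and columns as resolved components; it is not a symbol used elsewhere in this work.

$$\mathbf{D} = \mathbf{C}\mathbf{S^{T}} + \mathbf{E} = (\mathbf{C}\mathbf{R})(\mathbf{R}^{-1}\mathbf{S^{T}}) + \mathbf{E} = \mathbf{C_{new}}\mathbf{S_{new}^{T}} + \mathbf{E}$$

Both pairs of profiles reproduce D with the same residuals E. Scale ambiguity corresponds to the special case in which R is diagonal, and constraints such as non-negativity restrict the set of matrices R that give physically meaningful profiles.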

Apart from the derivatizing agent blank sample, five control and five exposed D. magna samples were obtained for every investigated treatment: salinity, temperature and hypoxia. This gives a total number of 21 samples analyzed by GCMS [1 blank, 5 controls and 15 exposed (3 treatments × 5 exposed samples)], giving 21 individual data matrices, each of them of dimensions 2581 retention times (between 5.3 min and 49 min) and 590 m/z values (between 60 and 650 m/z values). These individual GCMS full scan large sized data matrices were subdivided into eleven different chromatographic time sub-windows to reduce computer storage and execution time requirements. The size limits of these time subwindows were initially obtained considering a sufficient number of retention times (approximately 200 retention times) to cover complete peak clusters and to avoid peak halving. See Table 1 for the details of these eleven time windows and their retention times.

Table 1 MCR-ALS results in the analysis of the augmented data matrices for the three treatments (salinity, temperature and hypoxia)

In order to analyze all these data sets, eleven new column-wise augmented data matrices, D_aug, one for every time window and each containing the information of one blank derivatizing agent sample, five replicate control samples and five replicate exposed daphnid samples, were built for every type of treatment (salinity, temperature and hypoxia). A total of thirty-three (3 treatments × 11 time windows) column-wise augmented data matrices were obtained by arranging the corresponding data matrices one on top of the other, all with the same number of columns (590 m/z values). In this way, all the measured mass spectra were linked across all data (with common m/z values). GCMS data matrices of a blank sample containing only the derivatizing agent were included in all analyses to determine the influence of derivatization and to describe the components influenced by it. These column-wise matrix arrangements are displayed in Fig. 1.

Fig. 1 GCMS data sets for blank (1), control (5) and exposed (5) samples. Selection of the time window and column-wise matrix augmentation for the first time window (w1), giving the matrix D_aug to be analyzed by MCR-ALS. Processing steps involve resolution of the elution profiles (C_aug) of the components present in this time window in the eleven samples [blank (1), control (5) and exposed (5)] and of their pure mass spectra (S^T). Statistical evaluation of peak area changes between control and exposed samples via Student’s t test and PLS-DA. Tentative identification of potential biomarkers via NIST mass spectra matching and proposal of possible metabolic pathways involved

Every column-wise augmented data matrix, D_aug, corresponding to a time subwindow and containing 11 submatrices (one derivatizing agent blank, D_b; five control samples, D_1–D_5; and five exposed samples, D_6–D_10, according to a specific treatment), can be decomposed according to the bilinear model of Eq. 2 (see also this matrix decomposition in Fig. 1):

$$\mathbf{D_{aug}} = \begin{bmatrix} \mathbf{D_{b}} \\ \mathbf{D_{1}} \\ \mathbf{D_{2}} \\ \vdots \\ \mathbf{D_{10}} \end{bmatrix} = \begin{bmatrix} \mathbf{C_{b}} \\ \mathbf{C_{1}} \\ \mathbf{C_{2}} \\ \vdots \\ \mathbf{C_{10}} \end{bmatrix} \mathbf{S^{T}} + \begin{bmatrix} \mathbf{E_{b}} \\ \mathbf{E_{1}} \\ \mathbf{E_{2}} \\ \vdots \\ \mathbf{E_{10}} \end{bmatrix} = \mathbf{C_{aug}}\mathbf{S^{T}} + \mathbf{E_{aug}}$$
(2)

MCR-ALS applied to this augmented data matrix gave the augmented elution profiles of the resolved components (C_aug) for every sample and a single matrix of pure mass spectra of the resolved components (S^T) common to all samples.
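The column-wise augmentation of Eq. 2 amounts to cutting the same time window out of every sample matrix and stacking the pieces vertically. The Python sketch below illustrates this with synthetic data; the window limits and the helper name are hypothetical (the actual window limits are those listed in Table 1).

```python
import numpy as np

def augment_column_wise(matrices, window):
    """Stack the same retention time window of each sample on top of each other."""
    blocks = [D[window, :] for D in matrices]
    if len({b.shape[1] for b in blocks}) != 1:
        raise ValueError("all matrices must share the same m/z axis")
    return np.vstack(blocks)

rng = np.random.default_rng(2)
n_mz = 590
# Order follows Eq. 2: blank, five controls, five exposed (synthetic data, 400 scans each)
matrices = [rng.random((400, n_mz)) for _ in range(11)]
window = slice(0, 200)  # hypothetical time window of about 200 retention times
D_aug = augment_column_wise(matrices, window)
```

Because every block keeps the same 590 m/z columns, a single spectra matrix S^T can describe all eleven samples, while each sample keeps its own block of elution profiles in C_aug.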

MCR-ALS analysis requires an initial estimate of the number of components, which can be obtained from the sizes of the singular values (Golub et al. 2000) of the investigated data matrix. Larger singular values are associated with systematic changes and smaller ones with noise and minor contributions. This initial estimate can afterwards be revised according to the explained variances and the interpretability of the recovered profiles. The number of pre-selected components should describe a sufficient amount of data variance and include minor contributions such as background and derivatizing agent effects. Only components with reliable chromatographic peak shapes and meaningful metabolite mass spectra are finally considered. MCR-ALS also requires the use of constraints to give meaningful solutions (for instance, positive elution and spectra profiles) (Tauler and Barceló 1993; Tauler 1995; De Juan et al. 2010). In this work, non-negativity constraints on the chromatographic and mass spectra profiles and normalization of the pure mass spectra profiles (spectra of equal height) were applied. The quality of the MCR-ALS models was measured by the lack of fit, which is the difference between the input data D_aug and the data reproduced by MCR-ALS (C_aug S^T), and by the percentage of explained data variance (R^2).
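The alternating least squares idea and the two quality measures can be illustrated with the toy Python sketch below. It is not the MCR-ALS toolbox: non-negativity is imposed here by simply clipping unconstrained least squares solutions (an assumption made for brevity), the data are synthetic, and the function names are hypothetical. Lack of fit and R^2 are computed with the definitions usually given for MCR-ALS, as percentages relative to the sum of squares of the data.

```python
import numpy as np

def mcr_als(D, S0, n_iter=50):
    """Toy ALS for D ~ C @ S.T with clipped non-negativity and equal-height spectra."""
    S = S0.copy()
    for _ in range(n_iter):
        C = np.clip(D @ np.linalg.pinv(S.T), 0, None)    # elution profiles given spectra
        S = np.clip((np.linalg.pinv(C) @ D).T, 0, None)  # spectra given elution profiles
        S = S / np.maximum(S.max(axis=0), 1e-12)         # normalize spectra to equal height
    C = np.clip(D @ np.linalg.pinv(S.T), 0, None)        # final profiles on the same scale
    return C, S

def fit_quality(D, C, S):
    """Lack of fit (%) and explained variance R^2 (%) of the bilinear model."""
    ratio = np.sum((D - C @ S.T)**2) / np.sum(D**2)
    return 100 * np.sqrt(ratio), 100 * (1 - ratio)

# Synthetic two-component example: overlapping Gaussian peaks with random spectra
rng = np.random.default_rng(4)
t = np.arange(100)
C_true = np.column_stack([np.exp(-0.5 * ((t - 40) / 5)**2),
                          np.exp(-0.5 * ((t - 55) / 6)**2)])
S_true = rng.random((20, 2))
S_true /= S_true.max(axis=0)
D = C_true @ S_true.T

singular_values = np.linalg.svd(D, compute_uv=False)  # two dominate for two components
C, S = mcr_als(D, S_true)  # true spectra used as initial estimates in this toy case
lof, r2 = fit_quality(D, C, S)
```

In practice the initial estimates are not the true spectra, and dedicated non-negative least squares solvers are generally preferred over clipping; the sketch only shows how the constraints and the fit measures fit together.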

2.4.3 Biomarkers detection and NIST identification

Resolved elution and spectra profiles can then be used to investigate potential biomarkers of the exposure effects (see Fig. 1). Two different procedures were applied to detect discriminant metabolites. First, a Student’s t test was applied to the data matrix C_aug; resolved components showing significant concentration differences between groups (p value lower than 0.05) were retained and identified by their corresponding mass spectrum in S^T. Second, resolved elution profiles were autoscaled to give equal relevance to all metabolites, whatever their total concentration, thereby focusing on their relative concentration changes between control and exposed samples. PLS-DA was then applied to the autoscaled MCR-ALS resolved peak areas to select candidate metabolites. Variable importance in projection (VIP) scores (Eriksson et al. 2006) were chosen as the selection criterion: peak areas or elution profiles with VIP scores higher than 1 were considered discriminant metabolites between control and exposed samples. Metabolites from the selected peak areas or elution profiles were identified by comparing RIs (from MCR-ALS resolved elution profiles) and mass fragmentation patterns (from MCR-ALS resolved mass spectra profiles) with the standard mass spectral database of the National Institute of Standards and Technology (USA) (NIST 2014) (www.nist.gov/srd/nist1a.htm) and with the Golm Metabolome Database (GMD) of derivatized compounds (Kopka et al. 2005; Schauer et al. 2005). The NIST Mass Spectral Search 2.2 program distributed with the NIST 2014 library provides two matching factors between experimental and library mass spectra: the match factor (MF), a direct match between the unknown and the library spectrum, and the reverse match factor (RMF), a match that ignores any peaks in the unknown that are not in the library spectrum. For each mass spectrum, 100 hits were retrieved. The reported matching factors range between 0 (no match) and 1000 (perfect match). As a general guide, a value of 900 or greater was considered a very good match; between 800 and 900, a good match; between 700 and 800, a fair match; and less than 600, a poor or very poor match. In the calculation of MF, the experimental spectrum is used as the template, whereas for RMF the template is the library spectrum. To increase the reliability of the identification, linear RI markers were included in the evaluation of the library hits by injecting a saturated alkane standard mixture from C7 to C30. An advantage of this MCR-ALS strategy is that it allows a better comparison of the mass spectra of the resolved peaks with those from the NIST 2014 library, because the peak retention times and the GCMS fragment ions in the resolved mass spectrum assigned to each peak are used simultaneously.
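The first selection procedure can be illustrated with the short Python sketch below, which computes a pooled-variance Student's t statistic for the peak areas of one resolved component and compares it with the two-sided 5 % critical value for 8 degrees of freedom (5 control + 5 exposed samples). The peak areas are invented for illustration, a two-sided test is assumed, and the autoscaling helper mirrors the scaling applied before PLS-DA.

```python
from math import sqrt
from statistics import mean, variance

T_CRIT_DF8 = 2.306  # two-sided 5 % critical value of Student's t for 8 degrees of freedom

def student_t(a, b):
    """Pooled-variance two-sample t statistic and its degrees of freedom."""
    na, nb = len(a), len(b)
    df = na + nb - 2
    pooled = ((na - 1) * variance(a) + (nb - 1) * variance(b)) / df
    return (mean(a) - mean(b)) / sqrt(pooled * (1 / na + 1 / nb)), df

def autoscale(values):
    """Center to zero mean and scale to unit standard deviation."""
    m, s = mean(values), sqrt(variance(values))
    return [(v - m) / s for v in values]

# Hypothetical peak areas of one resolved component (not data from this study)
control = [10.0, 11.0, 9.0, 10.0, 10.0]
exposed = [5.0, 6.0, 4.0, 5.0, 5.0]
t_stat, df = student_t(control, exposed)
significant = abs(t_stat) > T_CRIT_DF8  # equivalent to p < 0.05 for a two-sided test
scaled = autoscale(control + exposed)
```

For these invented areas the t statistic is about 11.2, far above the critical value, so the component would be retained as a candidate.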

Finally, as shown in the workflow of Fig. 1, metabolites tentatively identified as potential biomarkers of the salinity, temperature and hypoxia treatments were further characterized with the Kyoto Encyclopedia of Genes and Genomes (KEGG) database (www.genome.jp/kegg/ligand.html) (Kanehisa and Goto 2000) to investigate the possible metabolic pathways involved.

2.5 Software

Data analyses were performed using MATLAB 2013a (The Mathworks Inc., Natick, MA, USA), and PLS Toolbox 7.3.1 (Eigenvector Research Inc., Wenatchee, WA, USA) was used for PCA, PLS-DA and variable importance in projection (VIP) score calculations. The MCR-ALS toolbox (Jaumot et al. 2015) was used in MATLAB 2012b (The Mathworks Inc., Natick, MA, USA) for the resolution of metabolite profiles from full MS scan augmented data matrices. The tentative identification of the metabolites was carried out with the NIST Mass Spectral Search program (version 2.2) (http://chemdata.nist.gov/mass-spc/ms-search/) distributed with the NIST 2014 Mass Spectral Library.

3 Results and discussion

3.1 PCA and PLS-DA exploratory and discriminatory analysis of GCMS TIC of D. magna control and exposed samples

Preliminary analysis of GCMS TIC data matrices (see Fig. 2) was performed for the three treatments (salinity, Fig. 2a, d, temperature, Fig. 2b, e, and hypoxia, Fig. 2c, f) using PCA and PLS-DA, respectively. In Fig. 2a, b, PC1-PC2 scores plot separates control (red) from exposed (blue) samples. In Fig. 2c (hypoxia treatment) PCA did not distinguish between control and exposed D. magna samples.

Fig. 2 PCA scores plots of the GCMS TICs of D. magna extracts for the salinity (a, 5 samples), temperature (b, 5 samples) and hypoxia (c, 5 samples) treatments. Red are control samples (control conditions) and blue are exposed samples (salinity, temperature or hypoxia). PLS-DA VIP (variable importance in projection) scores plots for GCMS TIC retention time variables for the salinity (d), temperature (e) and hypoxia (f) treatments (Color figure online)

PLS-DA was applied to the same three GCMS TIC data matrices, and VIP scores were calculated for every factor: salinity (Fig. 2d), temperature (Fig. 2e) and hypoxia (Fig. 2f). For salinity, a single latent variable was selected, explaining 83.5 and 98.31 % of the X- and y-variances, respectively. For temperature, two latent variables were selected, explaining 56.19 and 97.19 % of the X- and y-variances, respectively, and for hypoxia, three latent variables were selected, explaining 46.24 and 96.18 % of the X- and y-variances, respectively. In all cases, the number of latent variables used in the PLS-DA model was much lower than the number of samples used in the analyses (10 samples). According to these plots, for salinity and temperature a small group of retention times explained most of the variance between control and exposed samples, whereas many retention times were needed to explain the hypoxia treatment.

These two exploratory analyses (PCA and PLS-DA) showed that exposure to salinity and temperature produced larger distinctive effects on the metabolite profile of D. magna samples, whereas in the case of low oxygen the effects were less pronounced and discrimination of samples was difficult. Although GCMS retention times with larger VIP scores could already be investigated in a preliminary way, their very large number and the presence of co-eluting chromatographic peaks prevented a precise determination of which metabolite profiles of exposed D. magna samples had changed. A deeper analysis was therefore carried out by applying MCR-ALS to the set of individual full scan GCMS data matrices for all analyzed samples (control and exposed).

3.2 MCR-ALS analysis of GCMS full scan data matrices of D. magna control and exposed samples

MCR-ALS analysis of the full scan data matrices from the GCMS analysis of control and exposed D. magna samples allowed the simultaneous resolution of the elution and mass spectra profiles of a large number of components. The number of finally resolved peaks, the number of resolved peaks in each time window augmented data matrix for each treatment, and the explained variances of the MCR-ALS models are summarized in Table 1. The dimensions of the 11 time window augmented data matrices for the three treatments are also given in Table 1.

For all treatments, explained variance (R^2) percentages were higher than 94.8 %, and, in all cases, the number of estimated components was higher than the number of resolved peaks with reliable chromatographic shapes. These additional components were nevertheless useful to explain other possible sources of experimental data variance, such as background and derivatization signals. Examples of the results obtained in the MCR-ALS analysis of the GCMS full scan data are given in Fig. 3, which shows resolved concentration and mass spectra profiles for different components.

Fig. 3

Examples of elution profiles (left) and of the corresponding mass spectra profiles (right) resolved by MCR-ALS analysis: a applied to one single control sample in the 5th time window for the salinity treatment, showing in different colors the 8 resolved elution profiles in this time window (left) and the corresponding mass spectra profiles (right); b applied to the salinity treatment column-wise augmented data matrix, showing the resolution of one of the components in the 11 simultaneously analyzed matrices, in the order 1st blank sample, 2nd–6th control samples and 7th–11th exposed samples; c applied to the same salinity treatment column-wise augmented data matrix as in b, but showing the resolution of the derivatizing species (Color figure online)

Fig. 3a shows the results of applying MCR-ALS analysis to time window 5 of the salinity treatment. In this particular case, optimal results were obtained using eight components (in different colors), some of them with highly overlapped elution profiles (embedded peaks, on the left) and with multiple MS spectral signals (on the right). On the left of Fig. 3b, the 11 elution profiles of one of the resolved components in the different samples (1 blank, 5 control and 5 exposed samples) are given. Control samples clearly showed considerably higher peaks than exposed samples. On the right of Fig. 3b, the mass spectrum of this resolved component is given. It shows the fragmentation pattern of a metabolite assigned to the Arginine [–NH3] (2TMS) derivative according to the NIST 2014 MS Spectral Search program (www.nist.gov/srd/nist1a.htm). In Fig. 3c, the concentration and mass spectra profiles of one component related to the derivatizing agent, MSTFA (m/z = 73), are shown. Additional profiles related to derivatization by-products were also resolved, giving peaks with irregular chromatographic shapes. All components of this type were discarded from further metabolite identification. Only those MCR-ALS resolved components showing reasonable chromatographic peak shapes (like the one in Fig. 3b) were considered for identification and linked to the D. magna metabolome.

3.3 Tentative metabolite identification

Two strategies were implemented to investigate which component profiles changed with the treatments. Prior to these analyses, the peak areas of each MCR-ALS resolved component in every sample were arranged in one data table per treatment. PLS-DA was then applied to these three peak area tables of MCR-ALS resolved components in control and exposed samples (10 salinity samples × 81 peak areas, 10 temperature samples × 77 peak areas, and 10 hypoxia samples × 70 peak areas). In the case of salinity, a single latent variable was selected, explaining 67.53 and 96.45 % of the X- and y-variances, respectively. In the case of temperature, four latent variables were selected, explaining 82.28 and 98.10 % of the X- and y-variances, respectively, and in the case of hypoxia, four latent variables were selected, explaining 73.07 and 98.33 % of the X- and y-variances, respectively. In all cases, the number of latent variables used for the PLS-DA model was lower than the number of samples used in the analyses.
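The arrangement of peak areas into a samples × components table can be sketched as follows. This is a minimal illustration only: it assumes that every sample contributes the same number of scans to the column-wise augmented elution profile and that the peak area is approximated by the sum of the resolved intensities in each sample segment. The function names and the toy profiles are hypothetical.

```python
def peak_areas(augmented_profile, n_samples):
    """Split one component's column-wise augmented elution profile into
    equal-length per-sample segments and sum each segment as its peak area."""
    seg_len, remainder = divmod(len(augmented_profile), n_samples)
    if remainder:
        raise ValueError("profile length must be a multiple of n_samples")
    return [
        sum(augmented_profile[s * seg_len:(s + 1) * seg_len])
        for s in range(n_samples)
    ]


def area_table(component_profiles, n_samples):
    """Build a table with one row per sample and one column per component."""
    columns = [peak_areas(p, n_samples) for p in component_profiles]
    return [list(row) for row in zip(*columns)]


# Toy example (hypothetical numbers): 2 components, 3 samples, 4 scans per sample.
component_1 = [0, 1, 2, 0, 0, 3, 4, 1, 0, 0, 1, 0]
component_2 = [1, 1, 0, 0, 2, 2, 0, 0, 0, 5, 5, 0]
table = area_table([component_1, component_2], n_samples=3)
```

Each row of `table` then plays the role of one sample in the PLS-DA and t test comparisons, and each column the role of one resolved peak.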

PLS-DA VIP scores identified the most important variables (resolved peaks) for discriminating between control and exposed samples (see Online Resource 1), using a VIP threshold of 1. For salinity, temperature and hypoxia, 50 of the 81 resolved peaks, 27 of the 77 resolved peaks and 24 of the 70 resolved peaks, respectively (Online Resource 2), were determined as potential biomarkers (metabolites with significant concentration changes). Peak areas of MCR-ALS resolved profiles of control and exposed samples were also compared using Student’s t test, which indicated that 66, 26 and 16 peaks changed significantly (p < 0.05) for salinity, temperature and hypoxia, respectively (Online Resource 2). Only the peaks chosen by both approaches were finally selected, in order to discard false positives. The numbers of biomarkers selected for each treatment agreed well with the sample discrimination already observed in PCA: there was a good separation of control vs exposed samples for the salinity and temperature treatments and a poor separation for the hypoxia treatment.
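The combined selection rule (VIP above 1 and a significant t test) can be illustrated with the short sketch below. It is not the authors' implementation: it assumes a two-sided, equal-variance Student's t test with five control and five exposed samples (8 degrees of freedom, critical value 2.306 at p = 0.05), takes the VIP scores as already computed, and uses hypothetical peak names and values.

```python
from math import sqrt
from statistics import mean, variance

# Two-sided critical value of Student's t for alpha = 0.05 and 8 degrees of
# freedom (5 control + 5 exposed samples); an assumption of this sketch.
T_CRIT_DF8 = 2.306


def t_statistic(a, b):
    """Equal-variance (pooled) two-sample Student's t statistic."""
    na, nb = len(a), len(b)
    pooled = ((na - 1) * variance(a) + (nb - 1) * variance(b)) / (na + nb - 2)
    return (mean(a) - mean(b)) / sqrt(pooled * (1 / na + 1 / nb))


def select_biomarkers(vip, control, exposed, vip_threshold=1.0, t_crit=T_CRIT_DF8):
    """Keep only the peaks flagged by both the VIP criterion and the t test."""
    by_vip = {p for p, score in vip.items() if score > vip_threshold}
    by_t = {p for p in control
            if abs(t_statistic(control[p], exposed[p])) > t_crit}
    return sorted(by_vip & by_t)


# Hypothetical peak areas and VIP scores for three resolved peaks.
control = {"p1": [10, 11, 9, 10, 10], "p2": [10, 11, 9, 10, 10], "p3": [10, 11, 9, 10, 10]}
exposed = {"p1": [5, 6, 4, 5, 5], "p2": [10, 9, 11, 10, 11], "p3": [7, 8, 6, 7, 7]}
vip = {"p1": 1.8, "p2": 1.3, "p3": 0.6}
selected = select_biomarkers(vip, control, exposed)
```

In this toy case p2 passes only the VIP criterion and p3 passes only the t test, so only p1 is retained, which is the intended effect of requiring agreement between the two approaches.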

The tentative identification of metabolites for the salinity, temperature and hypoxia treatments, based on matching the resolved mass spectra against NIST library spectra, is shown in Online Resources 3, 4 and 5, respectively. Most of the metabolites were identified with high RMF values among the NIST library hits. The peak lists for the salinity, temperature and hypoxia treatments do not take into account that derivatization with MSTFA produced different products from the same metabolite. For instance, Lysine (3TMS) and Lysine (4TMS) (Online Resource 3) were found as biomarkers for the salinity treatment, both corresponding to the same metabolite (Lysine). Finally, a total of 74 potential metabolites whose concentrations (peak areas) changed with the treatment were identified: 40, 21 and 13 metabolites for the salinity, temperature and hypoxia treatments, respectively.

For the salinity treatment (Online Resource 3), most of the 50 metabolites detected as potential biomarkers were identified with RMF values higher than 715, which can be considered a relatively fair matching factor. For most metabolites the difference between the theoretical and the experimental RI was lower than 148, except for resolved peak 52. This peak was identified as Galactose, and its experimental and theoretical RI differed by 205. However, it had a very good RMF value (927), indicating a very good agreement with the library mass spectrum. For the salinity treatment, most of the identified metabolites were amino acids, carbohydrates, organic acids and nucleosides. Most of these metabolites were down-regulated, except for the six metabolites corresponding to resolved peaks 14, 55, 65, 73, 75 and 76, which were up-regulated.

For the temperature treatment (Online Resource 4), the 25 metabolites were identified with RMF values higher than 747, considered as fair matches, and with differences between theoretical and experimental RI lower than 52. The theoretical RI value of resolved peak 76 could not be calculated because the n-alkane standard only allowed RI values to be adjusted up to 3000, which corresponds to the alkane C30. Carbohydrates, amino acids and organic acids were the metabolites most commonly affected by the change in temperature. Only two metabolites were up-regulated: 1,3-diaminopropane and putrescine.
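The limitation imposed by the n-alkane ladder can be illustrated with a short sketch of a retention index calculation. The sketch assumes the linear (temperature-programmed) retention index, RI = 100·[n + (t − t_n)/(t_{n+1} − t_n)], which is not necessarily the exact formula used in this work; the function name and the retention times are hypothetical.

```python
def retention_index(rt, alkanes):
    """Linear retention index of a peak eluting at retention time rt.

    alkanes: list of (carbon_number, retention_time) pairs for the n-alkane
    standard, sorted by retention time. Returns None when rt is not bracketed
    by two alkanes, e.g. for a peak eluting after the last alkane.
    """
    for (n_lo, t_lo), (n_hi, t_hi) in zip(alkanes, alkanes[1:]):
        if t_lo <= rt <= t_hi:
            return 100.0 * (n_lo + (n_hi - n_lo) * (rt - t_lo) / (t_hi - t_lo))
    return None


# Hypothetical end of an n-alkane ladder that stops at C30 (times in minutes).
ladder = [(28, 30.0), (29, 31.0), (30, 32.0)]
ri_inside = retention_index(30.5, ladder)   # bracketed by C28 and C29
ri_last = retention_index(32.0, ladder)     # coincides with C30
ri_beyond = retention_index(33.0, ladder)   # elutes after C30: not computable
```

A peak eluting after the last alkane of the standard has no upper bracket, so no interpolated index can be assigned without extrapolation.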

Twelve of the 13 profiles with peak areas changing with the hypoxia treatment were identified (Online Resource 5). These metabolites were identified with differences between theoretical and experimental RI lower than 34, and with RMF values higher than 826, considered as good matches. Carbohydrates, amino acids and organic acids were the most commonly detected metabolites. All detected biomarkers were down-regulated, except resolved peak number 17, which was up-regulated.

3.4 Metabolic pathways involved

After identifying potential biomarkers of the salinity, temperature and hypoxia treatments, these metabolites were searched in the KEGG database to detect the metabolic pathways possibly affected. The Venn diagram in Fig. 4 shows the identified metabolites that were common to several treatments or unique to one of them. The salinity treatment had 24 unique metabolites, temperature 5, and hypoxia 6. Salinity and temperature had 11 metabolites in common, whereas temperature and hypoxia, as well as salinity and hypoxia, had 2 metabolites in common.

Fig. 4

Venn diagram of the metabolites detected in every treatment. The salinity, temperature and hypoxia treatments are represented by 24, 5 and 6 unique metabolites, respectively. Salinity and temperature have 11 metabolites in common, whereas temperature and hypoxia, and salinity and hypoxia, have only 2 metabolites in common each
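The counts in the Venn diagram can be cross-checked against the per-treatment totals (40, 21 and 13 metabolites) with simple arithmetic. The sketch below assumes that the pairwise counts exclude metabolites shared by all three treatments, which is not stated explicitly in the text; under that assumption, the number of metabolites common to the three treatments follows from each total and should be the same for every treatment.

```python
# Counts reported in the text and in the Fig. 4 caption.
totals = {"salinity": 40, "temperature": 21, "hypoxia": 13}
unique = {"salinity": 24, "temperature": 5, "hypoxia": 6}
pairwise = {
    frozenset(["salinity", "temperature"]): 11,
    frozenset(["temperature", "hypoxia"]): 2,
    frozenset(["salinity", "hypoxia"]): 2,
}


def implied_triple_overlap(treatment):
    """Metabolites shared by all three treatments, as implied by one
    treatment's total (assuming pairwise counts exclude the triple overlap)."""
    shared_pairs = sum(n for pair, n in pairwise.items() if treatment in pair)
    return totals[treatment] - unique[treatment] - shared_pairs


implied = {t: implied_triple_overlap(t) for t in totals}
consistent = len(set(implied.values())) == 1
```

Under the stated assumption the three treatments imply the same triple overlap, so the reported counts are mutually consistent.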

Table 2 lists the enriched KEGG pathways affected by two or more metabolites. Metabolic pathways were sorted by the percentage coverage of the pathway (the ratio between the number of altered metabolites and the total number of unique metabolites in the pathway in the KEGG database). The top three KEGG pathways in Table 2 coincided with three of the top five pathways described in D. magna exposed to Cd (Poynton et al. 2011). In that previous study, metabolomic changes were determined using a different tissue (hemolymph samples) and a different analytical method (an FT-ICR mass spectrometer). This may indicate that many of the altered metabolites were unspecific and probably related to general stress. The three tested treatments decreased the relative amount of two to three metabolites from pyruvate metabolism and the citrate or Krebs cycle, which are considered critical energy metabolic pathways under hypoxia or normoxia. Indeed, in D. magna, exposure to high salinity levels suppresses energy metabolic rates and oxygen demand (Arnér and Koivisto 1993). Increasing temperature raises the metabolic demand for oxygen in Daphnia, which causes hypoxia (Paul et al. 2004a; Paul et al. 2004b). D. magna is well adapted to hypoxia or anoxia, increasing its hemoglobin content and, under severe conditions, switching to anaerobic metabolism (Paul et al. 1998). This means that the three studied environmental stressors had several features related to hypoxia in common and, as a result, shared several metabolites and metabolic pathways. Specific metabolomic effects of salinity included increased levels of glycerol and trehalose, which are related to cell osmoregulation (Diamant et al. 2001).

Table 2 Most disrupted KEGG metabolic pathways following exposure of D. magna to salinity, high temperature and hypoxia. Unique metabolites from Online Resources 3, 4 and 5 were included in the KEGG metabolic pathway analysis. Total: number of unique metabolites in the KEGG pathway database; N: number of altered metabolites; %: percentage of pathway coverage (N/total)
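The coverage percentage used to sort the pathways in Table 2 can be sketched as follows. The pathway labels and counts below are hypothetical placeholders, not the values of Table 2; only the rule (pathways with at least two altered metabolites, sorted by N/total) follows the text.

```python
def rank_pathways(pathways, min_altered=2):
    """Return (name, N, total, coverage %) tuples for pathways with at least
    min_altered altered metabolites, sorted by decreasing coverage."""
    ranked = [
        (name, n_altered, total, 100.0 * n_altered / total)
        for name, (n_altered, total) in pathways.items()
        if n_altered >= min_altered
    ]
    return sorted(ranked, key=lambda entry: entry[3], reverse=True)


# Hypothetical input: pathway label -> (N altered metabolites, total in pathway).
example = {
    "pathway A": (3, 20),
    "pathway B": (2, 31),
    "pathway C": (1, 10),
    "pathway D": (2, 8),
}
ranking = rank_pathways(example)
```

Sorting by coverage rather than by N alone favors small pathways in which a large fraction of the metabolites is altered.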

4 Concluding remarks

Daphnia magna metabolites whose concentrations changed during exposure to the three abiotic factors studied (salinity, temperature and hypoxia) were detected and identified. Changes in the GCMS peak areas of metabolites between control and exposed samples were statistically assessed for their discriminating power. In total, 74 metabolites were identified as possible biomarkers of the studied effects: 40 for the salinity treatment, and 21 and 13 for the temperature and hypoxia treatments, respectively. The three treatments shared effects on several metabolites related to pyruvate metabolism and the citrate or Krebs cycle, which are linked to energy metabolic pathways under hypoxia. Apart from these commonly altered metabolites, the salinity treatment affected glycerol and trehalose, which are related to cell osmoregulation.

The methodology proposed in this work confirmed the usefulness of combining GCMS and multivariate data analysis tools for metabolomic studies of D. magna exposed to mild salinity, temperature and hypoxia stress. MCR-ALS analysis is a powerful tool to resolve directly the chromatographic signals (elution and spectra profiles) of most of the constituents (metabolites) present in complex biological samples, and to differentiate them from the large number of interfering GCMS signals arising from derivatization.