Introduction

Plants provide humans with food, feed, bioenergy, and phytochemicals such as medicinal compounds and biorefinery feedstocks. Since improvements in plant production are increasingly important for sustaining human society, understanding the mechanism of plant production, that is, the plant metabolic system, is an immediate necessity in plant science.

To understand the regulatory network of metabolism, we should view metabolism as a whole rather than focus on a specific metabolic pathway, enzyme, or gene. In applied research, genetic modification of a specific gene without understanding the whole regulatory mechanism often results in an unexpected metabolic change. To achieve an expected result (for example, hyper-accumulation of a useful metabolite), we should take into account the plant’s possible responses to the metabolic perturbation caused by genetic modification. Since the sequencing of the Arabidopsis genome (Arabidopsis Genome Initiative 2000), it has become possible to obtain a bird’s eye view of its metabolism via omics such as transcriptomics and proteomics. In addition to omics data contributed by individual researchers, thousands of microarray datasets that were systematically acquired and deposited in the public domain (Craigon et al. 2004; Goda et al. 2008; Kilian et al. 2007; Schmid et al. 2005) have enabled us to conduct quantitative and statistical analyses in silico and generate data-driven hypotheses without conducting “wet” experiments on a given object. Good examples are found in coexpression analysis-based functional genomics of Arabidopsis. The functions of unknown genes can often be predicted by using genomic sequences (i.e., functional annotation) and coexpression patterns with known genes, even without biological knowledge of a given biological aspect (Saito et al. 2008). Thus, coexpression analysis utilizing a large-scale transcriptome dataset has become a powerful tool for functional genomics and has led to the release of many web-based applications for coexpression analysis in the past 5 years (Akiyama et al. 2008; Goda et al. 2008; Horan et al. 2008; Jen et al. 2006; Manfield et al. 2006; Mutwil et al. 2008; Obayashi et al. 2009; Rawat et al. 2008; Srinivasasainagendra et al. 2008; Steinhauser et al. 2004; Tokimatsu et al. 2005; Toufighi et al. 2005; Zimmermann et al. 2004).

Both metabolome and transcriptome data can be gold mines of information that provide clues to understanding life phenomena. Unfortunately, however, because of technical difficulties, the total throughput of metabolome analysis is far lower than that of transcriptome analysis, for which commercial microarrays and analytical methodologies are almost fully established. Metabolites, the targets of metabolomics, are chemically diverse, so no single analytical technique can detect and quantify all metabolites. The Golm Metabolome Database provides public access to custom mass spectral libraries and metabolite profiling experiments from gas chromatography–mass spectrometry (GC–MS), as well as additional information and tools (Kopka et al. 2005). PlantMetabolomic.org was recently developed to provide a portal for accessing metabolomics data generated by different analytical platforms [capillary electrophoresis–mass spectrometry and different types of GC–MS and liquid chromatography–mass spectrometry (LC–MS)] from multiple laboratories along with key visualization tools (Bais et al. 2010). The AtMetExpress development project, focusing on accumulation patterns of LC–MS-detectable secondary metabolites across 36 Arabidopsis tissues, has just been launched (Matsuda et al. 2010). However, there is no large-scale metabolome dataset currently available in the public domain that is comparable in size to the transcriptome dataset.

Our technical goal is to generate large-scale metabolome datasets comparable in size to the transcriptome dataset and to develop a bioinformatics methodology to mine them for novel findings. As is the case with transcriptome data, metabolome data obtained from various plant organs, at various developmental stages, and under various growth conditions will be informative and provide clues toward an overview of metabolism. We recently established a novel methodology, widely targeted metabolomics, which can generate thousands of metabolic profiles for 734 compounds (as of February 2010) in a high-throughput manner (Sawada et al. 2009a). We have started to acquire metabolic profile data from publicly available large-scale bioresources. Earlier, as a case study, we conducted targeted metabolite analysis of 2,656 Arabidopsis transposon-tagged mutants and 225 Arabidopsis accessions in order to build a smaller dataset of metabolite accumulation and learn how to analyze such a dataset. In this paper, we report the results of the targeted metabolite analyses of these two Arabidopsis bioresources. We also introduce widely targeted metabolomics and its possible applications.

Materials and methods

Plant materials used for large-scale metabolic profiling

Seeds of transposon-tagged mutants were analyzed for metabolic profiling. From the Ds transposon single-copy insertion lines that we previously established (Kuromori et al. 2006), we used 2,656 knockout mutants in which the Ds transposon was homozygously inserted into the coding regions. For each gene mutant, seeds were harvested from an F3 individual plant described in Kuromori et al. (2006). Briefly, F3 seeds were sown on agar-solidified MS medium containing 3% w/v sucrose and 20 μg mL−1 hygromycin. After stratification at 4°C for 1 week, plants were grown for 3 weeks under a 16-h light/8-h dark cycle at 22°C, transferred to soil and grown under the same conditions with application of 2,000-fold-diluted Hyponex (Hyponex Japan Co Ltd., Osaka, Japan) until seed maturity.

Seeds of 225 accessions of Arabidopsis were also analyzed for metabolic profiling. Seeds of each accession were sown on soil and germinated plants were grown in a greenhouse under natural light at 22°C with application of 2,000-fold-diluted Hyponex until flowering finished. Then seeds were desiccated on plants for 1.5–2 months, harvested, and then stored at 4°C and 20% humidity until use.

Transposon-tagged mutants and accessions used in this study are available at the RIKEN BioResource Center (http://www.brc.riken.jp/inf/en/index.shtml) through the National BioResource Project of the MEXT, Japan.

Construction of CAM9-overexpressing plants

The coding region of CAM9 (At3g51920) was amplified by polymerase chain reaction with forward (5′-CACCATGGCGGATGCTTTCACAGA-3′) and reverse (5′-CTAATAAGAGGCAGCAATCATC-3′) primers using a cDNA library prepared from 2-week-old rosette leaves of Arabidopsis Columbia-0 as a template. An entry vector CAM9cds-pENTR was prepared by using a pENTR/D-TOPO Cloning Kit (Life Technologies Japan Ltd., Tokyo, Japan) with the amplified CAM9 coding region (CAM9cds). A binary vector CAM9cds-pB2GW7 harboring the CAM9cds driven by cauliflower mosaic virus (CaMV) 35S RNA promoter was constructed by LR reaction under Gateway® technology with CAM9cds-pENTR and pB2GW7. A transposon-tagged line 13-3943-1 and its parental line Ds13 were transformed with CAM9cds-pB2GW7 by the floral dip method (Clough and Bent 1998) using Agrobacterium tumefaciens strain GV3101. The transformants were selected on agar-solidified 1/2 MS medium containing 50 μM glufosinate. Transgenic plants (T1) were transferred to soil and grown at 23°C under natural and fluorescent light (16-h light/8-h dark cycle) (Hirai et al. 2007). T2 seeds (segregating population) were used for metabolite analysis by means of widely targeted metabolomics (Sawada et al. 2009a).

High-throughput amino acid and glucosinolate analysis

A total of 200 seeds of each independent mutant and accession were homogenized using a mixer mill MM 200 (Retsch) in 80 μL of extraction buffer [40% acetonitrile in H2O with 25 μM hydroxyphenyl–glucosinolate (GSL) and 50 μM norleucine as internal standards]. The extracts were diluted with 500 μL of LC–MS grade H2O and centrifuged (1,000 g) for 5 min. The supernatants were filtered through a CAPTIVA 0.45 μm filter (Varian) and subjected to ultra performance liquid chromatography (UPLC) (Waters)-quadrupole MS analysis. Thirty-six metabolites (17 amino acids, 18 GSLs, and 1 flavonoid; Table S1) were separated on UPLC through a reverse phase column (50 × 2.1 mm, HSS T3 1.8 μm; Waters) (Table S1) and detected using ZQ mass spectrometers (Waters) with an electrospray ionization (ESI) interface (positive mode for amino acid analysis; negative mode for glucosinolate and flavonoid analysis; capillary voltage, +3.0 and −3.0 kV; cone voltage, +25 and −40 V; source temperature, 120°C; desolvation temperature, 350°C; cone gas flow, 50 L/h; desolvation gas flow, 600 L/h). MassLynx software version 4.0 (Waters) was used to control all instruments and calculate peak areas. The analytical error of this analysis using UPLC-quadrupole MS is less than 10% (data not shown); thus, we did not perform analytical replicates in this study.

Data analysis using R

The peak area of each metabolite was divided by that of the corresponding internal standard (norleucine for amino acids; hydroxyphenyl-GSL for glucosinolates and flavonoid) after missing values were replaced with 0. Spearman’s rank-correlation coefficient (SCC) was calculated using R (http://www.r-project.org). Histograms were drawn in R from the log2-transformed data.
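The processing steps described above can be sketched as follows. This is only an illustration written in Python with the standard library, not the R code used in the study; the function names are hypothetical, and skipping zero values before the log2 transformation is an assumption, since the treatment of zeros in the histograms is not stated.

```python
import math

def normalize(peak_areas, internal_standard_areas):
    """Replace missing peak areas (None) with 0, then divide each area
    by the internal-standard area measured in the same sample."""
    return [(0.0 if area is None else area) / standard
            for area, standard in zip(peak_areas, internal_standard_areas)]

def average_ranks(values):
    """Rank values from 1 to n; tied values share the mean of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        for k in range(i, j + 1):
            ranks[order[k]] = (i + j) / 2 + 1
        i = j + 1
    return ranks

def spearman(x, y):
    """Spearman's rank correlation: the Pearson correlation of the ranks.
    Assumes neither input is constant (otherwise the denominator is 0)."""
    rx, ry = average_ranks(x), average_ranks(y)
    mx, my = sum(rx) / len(rx), sum(ry) / len(ry)
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    var_x = sum((a - mx) ** 2 for a in rx)
    var_y = sum((b - my) ** 2 for b in ry)
    return cov / math.sqrt(var_x * var_y)

def log2_for_histogram(values):
    """log2-transform the values; zeros are skipped (an assumption)."""
    return [math.log2(v) for v in values if v > 0]
```

Computing the correlation on ranks rather than on raw ratios makes the coefficient insensitive to the skewed distributions that are typical of metabolite contents.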

Results

Analysis of amino acids and GSLs in seeds of transposon-tagged mutants and accessions

As a case study, we chose to clarify accumulation profiles of amino acids and GSLs as examples of primary and secondary metabolites, respectively. GSLs, a group of sulfur- and nitrogen-containing secondary metabolites found mainly in Brassicaceae plants, are synthesized from 1 of 8 amino acids including Met and Trp (Grubb and Abel 2006; Halkier and Gershenzon 2006). In Arabidopsis, the biosynthetic pathways of Met- and Trp-derived GSLs are well studied. In this study, we analyzed 17 protein amino acids, 15 Met-derived GSLs, 3 Trp-derived GSLs and 1 flavonoid (Table S1) in mature dry seeds of 2,656 transposon-tagged mutants and 225 Arabidopsis accessions by means of LC–MS-based high-throughput targeted metabolite analyses (see “Materials and methods”).

Transposon-tagged mutants have been established as public bioresources and are utilized for generating a phenome database in which visible phenotypic changes are described (Kuromori et al. 2004, 2006). These mutants originated from 1 of 5 parental transgenic lines (Ds11, Ds12, Ds13, Ds15, and Ds16) carrying the Ds element. In the process of establishing each mutant line, Ds has been transposed into a single locus on the genome and fixed homozygously in successive generations (Kuromori et al. 2004).

Arabidopsis accessions were collected from around the world by a coordinated effort of Arabidopsis researchers and are now maintained at the RIKEN BioResource Center (http://sassc.epd.brc.riken.jp/sassc/top.php?mode=general).

Metabolite accumulation data obtained from both bioresources are included in Tables S2 and S3. In this study, we further analyzed the data obtained from transposon-tagged lines.

Global profiles of amino acid and GSL accumulation in transposon-tagged mutants

Transposon-tagged genes covered all 23 main functional categories defined in our classification system, Arabidopsis Gene Classifier vol. 1.4 (http://kanaya.naist.jp/GeneClassifier/top.jsp?fn=arabidopsis) (Takahashi et al. 2009).

To understand the global tendency of metabolite accumulation patterns, we first analyzed the distribution of each metabolite’s content across all 2,656 transposon-tagged mutants (Fig. S1). In the case of amino acids, a single peak was observed, indicating that the amino acid content varied continuously. On the other hand, in the case of several GSLs such as MSc8, two (or three) peaks were observed. However, when the distribution was analyzed across each series of mutants derived from the same parental transgenic line, a single peak was observed (Fig. S1). The seeds of transposon-tagged mutants were collected under almost the same growth conditions but in different growth chambers depending on their parental line; namely, the mutants that originated from some parental lines were grown in one growth chamber, whereas the mutants from the other parental lines were grown in another growth chamber. We consider that the bi- or trimodal distribution of some of the GSL contents resulted from slight differences in the growth conditions under which the seeds were harvested, since GSL content varies easily depending on growth conditions (data not shown). Another possibility is that the mutants have different features originating from their parental lines. Therefore, for further statistical analyses, the mutants that originated from different parental lines were treated separately.
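The contrast between pooled and per-parental-line distributions can be illustrated with a small sketch. It is written in Python with hypothetical function names and invented toy values; the bin width and the crude peak-counting rule are assumptions made for illustration, as the text does not state how peaks were counted.

```python
import math
from collections import Counter

def log2_histogram(contents, bin_width=1.0):
    """Count positive contents per log2 bin and return the counts for
    every bin between the lowest and highest occupied bin."""
    counts = Counter(math.floor(math.log2(v) / bin_width)
                     for v in contents if v > 0)
    lowest, highest = min(counts), max(counts)
    return [counts.get(b, 0) for b in range(lowest, highest + 1)]

def count_peaks(histogram):
    """Crude peak count: a bin is a peak if it is higher than its left
    neighbour and not lower than its right neighbour."""
    padded = [0] + list(histogram) + [0]
    return sum(1 for i in range(1, len(padded) - 1)
               if padded[i - 1] < padded[i] >= padded[i + 1])

# Invented contents for two hypothetical parental-line subgroups.
by_parental_line = {
    "line_A": [1, 2, 2, 2, 4],
    "line_B": [32, 64, 64, 64, 128],
}
pooled = [v for values in by_parental_line.values() for v in values]

peaks_per_line = {line: count_peaks(log2_histogram(values))
                  for line, values in by_parental_line.items()}
peaks_pooled = count_peaks(log2_histogram(pooled))
```

In this toy case each subgroup gives a single peak while the pooled data give two, which is the pattern that motivates treating the parental lines separately.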

To understand the trend of metabolite accumulation patterns, the correlation between metabolite contents was analyzed by calculating Spearman’s correlation coefficient (SCC) (Fig. 1). As mentioned above, since some metabolites exhibited bimodal distribution depending on which parent the mutants were derived from, the mutants were classified into five subgroups according to their parental lines, and the correlation coefficient was calculated separately within each subgroup. The five SCC values were then averaged for each metabolite combination. Among the 36 metabolites analyzed, the contents of BOc4, 1MOI3M, and 4MOI3M (for abbreviations, see Table S1) were very low and often fell below detection limits. These metabolites were therefore omitted from further analyses, and the 528 pairwise combinations (33 × 32/2) among the remaining 33 metabolites were analyzed. All SCC values are summarized in Table S4. The metabolite combinations that exhibited higher positive correlations, ranked in the top 5%, and had relative standard deviation (RSD) values of less than 20% are listed in Table 1.
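The subgroup-wise procedure above can be sketched as follows, in Python with the standard library. This is an illustration rather than the code used in the study: the function names are hypothetical, the RSD is assumed to be the sample standard deviation of the five SCC values divided by the absolute value of their mean, and the top 5% cut-off is assumed to be rounded down.

```python
import math
from itertools import combinations
from statistics import mean, stdev

def ranks(values):
    """Average ranks; tied values share the mean of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    out = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        for k in range(i, j + 1):
            out[order[k]] = (i + j) / 2 + 1
        i = j + 1
    return out

def spearman(x, y):
    """Spearman's coefficient as the Pearson correlation of the ranks."""
    rx, ry = ranks(x), ranks(y)
    mx, my = mean(rx), mean(ry)
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    var_x = sum((a - mx) ** 2 for a in rx)
    var_y = sum((b - my) ** 2 for b in ry)
    return cov / math.sqrt(var_x * var_y)

def averaged_scc(subgroups, met_a, met_b):
    """subgroups maps parental line -> {metabolite: contents per mutant}.
    Returns (mean SCC over subgroups, RSD in percent)."""
    sccs = [spearman(data[met_a], data[met_b]) for data in subgroups.values()]
    avg = mean(sccs)
    rsd = stdev(sccs) / abs(avg) * 100 if avg != 0 else math.inf
    return avg, rsd

def select_pairs(subgroups, metabolites, top_fraction=0.05, max_rsd=20.0):
    """Keep pairs whose mean SCC is in the top fraction and whose RSD
    is below max_rsd; returns [(pair, (mean SCC, RSD)), ...]."""
    results = {pair: averaged_scc(subgroups, *pair)
               for pair in combinations(metabolites, 2)}
    ranked = sorted(results, key=lambda p: results[p][0], reverse=True)
    n_top = max(1, int(len(ranked) * top_fraction))
    return [(p, results[p]) for p in ranked[:n_top] if results[p][1] < max_rsd]
```

Averaging within-subgroup coefficients, instead of correlating the pooled data, prevents a chamber or parental-line offset shared by two metabolites from appearing as a correlation between them.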

Fig. 1

Schematic diagram of the co-accumulation relationship between metabolite pairs. Horizontal and vertical axes indicate metabolite content. Each dot represents a mutant with a given genotype (this study) or an individual plant with the same genotype (Kusano et al. 2007; Weckwerth et al. 2004). The left and right panels indicate positive (SCC > 0) and negative (SCC < 0) correlations, respectively

Table 1 Metabolite combinations co-accumulated in seeds of transposon-tagged mutants

An example of mutants which exhibited hyper-accumulation of amino acids

Each transposon-tagged mutant we analyzed had a transposon in the coding region of a given gene. If a gene involved in metabolism is knocked out or knocked down by insertion of a transposon, the accumulation pattern of amino acids and/or GSLs may be changed by direct or indirect effects. We analyzed 13-3943-1 as a model mutant that exhibited drastic changes in metabolite accumulation. This mutant exhibited hyper-accumulation of branched-chain amino acids (Leu, Ile, and Val) in seeds (Fig. 2). Hyper-accumulation of these amino acids was not observed in rosette leaves (data not shown). In this mutant, the transposon was inserted in At3g51920 (CALMODULIN 9, CAM9). To confirm that the metabolic change was caused by knocking out CAM9, 13-3943-1 (hereafter referred to as cam9) was complemented with the CAM9 gene driven by the CaMV 35S RNA promoter. In the complemented lines, the contents of Leu, Ile, and Val in seeds were restored to levels similar to those in the parental line Ds13 (Fig. 2). In addition, when CAM9 was overexpressed in the parental line Ds13, accumulation of Leu, Ile, and Val slightly decreased (0.2- to 0.8-fold) (Fig. 2). These results indicate that knocking out CAM9 did result in hyper-accumulation of these amino acids, suggesting that CAM9 has an as-yet-unknown function in amino acid accumulation.

Fig. 2

Accumulation of branched-chain amino acids in seeds. The average contents of Leu, Ile, and Val in the parental line Ds13 (n = 4) were set to 1. Vertical axes indicate the relative contents of these amino acids in (a) the knockout mutant cam9 (n = 5), (b) complemented lines (35S::CAM9/cam9), and (c) CAM9-overexpressing lines (35S::CAM9/Ds13). For (b) and (c), amino acid contents in the seeds harvested from three independent T1 plants were separately analyzed and averaged

Discussion

Analysis of metabolic profiles in seeds of transposon-tagged mutants

In this study, we analyzed the accumulation profiles of 36 metabolites in seeds of 2,656 transposon-tagged lines and 225 Arabidopsis accessions. Our main purpose was to get an overview of the accumulation pattern of amino acids and GSLs as the first step toward large-scale metabolome analysis, to be followed by statistical analysis and mathematical modeling to understand metabolism as a system. We selected mature seeds as the first target of metabolite analysis since the seed is the most important sink of metabolites, serving as a nutrient reservoir for the early growth of the next generation. In addition, the seed is a major organ in terms of GSL accumulation.

Although metabolite accumulation is genetically regulated, the absolute content of each metabolite fluctuates among individual plants of the same genotype, even under the same growth conditions. However, for specific metabolite pairs, a positive or negative correlation between their contents can often be found in data obtained from a number of individual plants grown under the same growth conditions (Kusano et al. 2007; Weckwerth et al. 2004) (Fig. 1, each dot represents an individual plant with the same genotype). The correlation between a given metabolite pair can differ among genotypes (Kusano et al. 2007; Weckwerth et al. 2004). For example, a positive correlation between the contents of malate and fumarate in leaves was observed in the wild type and the transparent testa 4 (tt4) mutant of Arabidopsis but not in the methionine over-accumulation 1 (mto1) mutant (Kusano et al. 2007). In this study, we observed correlations between metabolite pairs in seeds that were conserved across genotypes (Fig. 1). In addition, although the maternal plants were grown under almost the same growth conditions, the seeds analyzed were not always harvested side by side at the same time. Therefore, the correlations observed in our analysis were conserved irrespective of changes in the microenvironment.

Higher positive correlation was observed between amino acids and between Met-GSLs (Table S4). Co-accumulation of Met-GSLs could be explained by the fact that most of the genes involved in Met-GSLs biosynthesis are coordinately regulated by specific transcription factors, Myb28, Myb29, and Myb76 (Beekwilder et al. 2008; Gigolashvili et al. 2007a, b; Hirai et al. 2007; Malitsky et al. 2008; Sønderby et al. 2007). On the other hand, co-accumulation of amino acids cannot be simply explained from the viewpoint of metabolic pathway, i.e., metabolite pairs that were not directly connected in the metabolic pathway (e.g., His and Tyr) exhibited a co-accumulation relationship. We then further analyzed our data focusing on amino acids. Figure S3 shows the SCC values for all amino acid pairs. Co-accumulation relationships are schematically indicated in Fig. S4. These data suggest that accumulations of Val, Ile, Leu, Arg, Lys, Tyr, and His are positively correlated to each other while accumulations of Met, Asp, Ala, Trp, and Glu are independent of those of other amino acids in these series of mutants. In our previous integration analysis of transcriptome and metabolome data obtained from nutrient-starved Arabidopsis, we observed that changes in amino acid accumulation were not simply explained by the changes in expression of their biosynthetic genes (Hirai et al. 2004). Co-accumulation of some amino acids observed in this study might be achieved by post-translational regulation such as feedback inhibition of biosynthetic enzymes and/or other mechanisms in which unknown genes are involved. Given that some genes with unknown functions play a role in maintaining metabolic balance, knocking out the gene would result in metabolic imbalance. In this study, we found many outliers in which some metabolites accumulated to unexpected levels judging from the frequency distribution of the metabolite. 
One example is a mutation in CAM9, which resulted in hyper-accumulation of branched-chain amino acids in seeds only. CAM9 encodes a divergent member of the calmodulin family of EF-hand Ca2+-binding proteins. Recently, it was reported that lines carrying mutations in CAM9 showed a hypersensitive response to ABA and enhanced tolerance to salt stress and water deficit (Magnan et al. 2008). Our result suggests an as-yet-unknown relationship between amino acid metabolism and ABA-mediated stress responses. Thus, this type of study can lead to the discovery of genes that play important roles in metabolic homeostasis.
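How outliers were judged from the frequency distributions is not specified above. One common way to screen for them, shown here purely as an assumption and not as the procedure used in this study, is a modified z-score based on the median and the median absolute deviation (MAD) of log2-transformed contents within one parental-line subgroup. The function names, the threshold of 3.5, and the toy values are all illustrative.

```python
import math
from statistics import median

def robust_z_scores(contents):
    """Modified z-scores of log2 contents: 0.6745 * (x - median) / MAD.
    Assumes all contents are positive and the MAD is not zero."""
    logs = [math.log2(v) for v in contents]
    med = median(logs)
    mad = median(abs(x - med) for x in logs)
    return [0.6745 * (x - med) / mad for x in logs]

def flag_outliers(contents_by_mutant, threshold=3.5):
    """Return the mutants whose absolute modified z-score exceeds the
    threshold (3.5 is a conventional choice, assumed here)."""
    names = list(contents_by_mutant)
    scores = robust_z_scores([contents_by_mutant[name] for name in names])
    return [name for name, z in zip(names, scores) if abs(z) > threshold]

# Invented relative contents of one metabolite in hypothetical mutants.
toy_contents = {
    "mutant_1": 1.0, "mutant_2": 1.1, "mutant_3": 0.9,
    "mutant_4": 1.05, "mutant_5": 0.95, "mutant_hyper": 8.0,
}
flagged = flag_outliers(toy_contents)
```

The median and MAD are used instead of the mean and standard deviation because they are barely affected by the outliers that the screen is meant to find.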

Our current technology and future perspective

MS- and NMR-based metabolomics have been developed over the past decade in order to establish non-targeted analysis procedures (Werner et al. 2008a, b). Because metabolites have diverse chemical and physical properties, comprehensive metabolome analysis can only be achieved by integrating data obtained with various techniques. Although non-targeted metabolomics enables us to capture broad metabolite profiles of samples and to find novel metabolites that can be used as biomarkers (Glinski and Weckwerth 2006; Saito and Matsuda 2010), many difficulties remain to be resolved. For example, it is technically difficult and extremely time-consuming to merge data obtained in different formats from different instruments and to identify unknown metabolites (Werner et al. 2008b). Hence, the application of non-targeted metabolomics to thousands of biological samples is not yet practical. This is why we need an alternative methodology for targeted but high-throughput metabolomics.

We recently established a highly selective and sensitive procedure that utilizes the multiple reaction monitoring (MRM) mode of UPLC-quadrupole MS. MRM enables high sensitivity, ease of reproducibility, and a broad dynamic range of analysis (Unwin et al. 2005). Briefly, we first optimized the following analytical conditions for each of the 734 authentic compounds (as of February 2010) (Sawada et al. 2009a): retention time (RT) of UPLC separation and combination of polarity, precursor ion mass, fragment (product) ion mass, cone voltage and collision voltage for MS/MS detection, which we call MRM condition. As more than one compound occasionally gave the same optimal condition, we finally obtained 574 sets of RT and MRM conditions (RT–MRM sets) for 734 compounds (as of February 2010). For metabolic profiling of the plant samples, we measured areas of the peaks observed using the abovementioned 574 RT–MRM sets. In the case of Arabidopsis leaves, we observed that 217 out of 574 RT–MRM sets reproducibly gave detectable peaks (Table S5). This analysis, which we call widely targeted metabolomics, was successfully applied to functional genomics of Arabidopsis by analyzing metabolic profiles in knockout mutants of genes involved in GSL biosynthesis (Sawada et al. 2009b, c).
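The reduction from optimized compounds to distinct RT–MRM sets can be pictured as grouping compounds by their optimized condition. The sketch below is a hypothetical illustration in Python: the field names, the compound names, and all numeric values are invented, and exact matching of the condition values is an assumption (real data would presumably need tolerances for retention time and m/z).

```python
from collections import defaultdict
from typing import NamedTuple

class RtMrmCondition(NamedTuple):
    """One RT-MRM set: retention time plus the MRM condition."""
    retention_time_min: float
    polarity: str            # "positive" or "negative"
    precursor_mz: float
    product_mz: float
    cone_voltage: float
    collision_voltage: float

def group_into_rt_mrm_sets(optimized):
    """optimized maps compound name -> RtMrmCondition.
    Compounds sharing an identical condition fall into the same set."""
    sets = defaultdict(list)
    for compound, condition in optimized.items():
        sets[condition].append(compound)
    return dict(sets)

# Invented conditions: compounds A and B cannot be told apart.
shared = RtMrmCondition(2.5, "positive", 132.1, 86.1, 20.0, 10.0)
optimized = {
    "compound_A": shared,
    "compound_B": shared,
    "compound_C": RtMrmCondition(4.0, "negative", 420.0, 97.0, 35.0, 25.0),
}
rt_mrm_sets = group_into_rt_mrm_sets(optimized)
```

In this toy case three compounds collapse into two RT–MRM sets, in the same way that 734 compounds gave 574 sets in the text; a peak measured with a shared set cannot be assigned to a single compound without further information.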

Owing to its high throughput (e.g., 600 Arabidopsis metabolic profiles per week), this methodology has broadened the field of application of metabolomics. We are analyzing various kinds of bioresources, such as knockout lines, accessions of Arabidopsis, and wild species of crop plants, as well as field-grown plants, from agronomic and ecological viewpoints (to be published elsewhere). The large-scale datasets acquired should be novel gold mines of biological findings. Our next challenge is to develop a novel strategy for analysis of metabolome data, i.e., a novel tool for treasure hunting.