Abstract
Data from functional trait databases have been increasingly used to address questions related to plant diversity and trait-environment relationships. However, such databases provide intraspecific data that combine individual records obtained from distinct populations at different sites and, hence, environmental conditions. This prevents distinguishing sources of variation (e.g., genetic-based variation vs. phenotypic plasticity), a necessary condition to test for adaptive processes and other determinants of plant phenotypic diversity. Consequently, individual traits measured under common growing conditions and encompassing within-species variation across the occupied geographic range have the potential to leverage trait databases with valuable data for functional and evolutionary ecology. Here, we recorded 16 functional traits and leaf hyperspectral reflectance (NIRS) data for 721 widely distributed Arabidopsis thaliana natural accessions grown in a common garden experiment. These data records, together with meteorological variables obtained during the experiment, were assembled to create the AraDiv dataset. AraDiv is a comprehensive dataset of A. thaliana’s intraspecific variability that can be explored to address questions at the interface of genetics and ecology.
Similar content being viewed by others
Background & Summary
Functional ecology has long used comparative approaches across species and environments to identify general patterns of trait variation linked to individual performance (functional traits1). In this quest for general trait patterns, the development of large trait databases, such as the TRY Plant Trait Database2, has become an important ally. Notably, it has allowed the demonstration that plant worldwide phenotypic variation and ecological strategies are better explained by multidimensional spaces summarized by both vegetative (e.g., specific leaf area) and reproductive (e.g., seed mass) traits3. Some lines of evidence, such as trait-environment relationships4, have suggested that the trait syndromes that underlie these multidimensional spaces are driven by adaptive processes. However, such an adaptive hypothesis still needs to be directly tested. For that, intraspecific trait data are valuable, but still lacking in functional trait databases5. Though these databases do present multiple records for a given species, such records come from distinct populations at different sites and, hence, environmental conditions. As a consequence, confounding effects prevent the distinction of sources of trait variation, which is a necessary condition for the study of evolutionary processes.
In order to deal with confounding effects and produce robust intraspecific trait data, experimental approaches like common gardening have been developed6. Common garden experiments consist of growing individuals from different populations under common conditions to assess the genetic basis of traits while controlling for the effects of phenotypic plasticity6,7. Another condition for a robust estimation of intraspecific variation is to assess a representative range of variation through the study of contrasted populations in terms of environmental conditions of origin5,8. Accordingly, more functional trait data both measured under common conditions and covering populations across the species’ geographic range are needed. Though logistically challenging, measuring intraspecific variation as described above within networks of standardized common garden experiments in different geographic zones can be a compelling perspective9.
Additional initiatives to make intraspecific trait data even more powerful to address questions at the interface of ecology and evolution is to link them with whole-genome data. Notably, this has been done with natural populations of Arabidopsis thaliana, Medicago truncatula, and Populus trichocarpa, allowing the study of genomic signatures of adaptation10. To promote more studies of adaptation, compilations of phenotypic data have been developed for genotypes whose sequencing had been previously made available to the scientific community (e.g., AraPheno11, PHENOPSIS DB12). Though such initiatives are valuable, they still largely lack key traits that functional ecology studies endorse as those capturing plant form and function (such as the vegetative and reproductive traits previously mentioned). Therefore, intraspecific information in these databases is likewise limited in its applicability to test hypotheses about the evolutionary determinants of plant trait syndromes and ecological strategies.
Leveraging trait databases with intraspecific trait data faces the major challenge of large-scale cross-population phenotyping. In recent years, the development of high throughput phenotyping methods, such as near-infrared spectroscopy (NIRS), has made phenotyping faster and easier, promoting intraspecific trait analyses13. Notably, commonly investigated functional traits underlying resource economics have been shown to be well predicted from the near-infrared reflectance value of leaves in A. thaliana14. Moreover, reflectance spectra have been applied to predict various chemical traits according to specific interests (e.g., secondary metabolites15 and phytohormones16) and they have also started to be used directly as phenotypic dimensions13. This last option can be interesting when lacking empirical data for trait prediction and when there is a special interest in analyzing as many phenotypic dimensions as possible, even those whose biological meaning remains unknown13. For instance, phenotype-blind approaches in plant biology for breeding selection have recently benefited from hyperspectral data17,18.
Here, we present the AraDiv dataset, which provides phenotypic and leaf hyperspectral reflectance (NIRS) data for 721 widely distributed A. thaliana accessions (Fig. 1). Phenotypic data include vegetative, phenological, and reproductive functional traits, constituting a comprehensive dataset of A. thaliana’s intraspecific phenotypic diversity. Phenotyped accessions were grown in a common garden (Fig. 2a,b), and meteorological data recorded during the experiment were also included in AraDiv. By phenotypically analyzing a subset of the accessions that are genetically and geographically (GPS coordinates) described in the 1001 Genomes Project19 dataset (http://1001genomes.org/), we intend to complement information about A. thaliana’s intraspecific diversity with a large amount of functional trait data obtained from plants grown under common conditions. This ultimately aims at fostering studies at the interface of genetics and ecology that link phenotypic, genetic, and environmental data to understand plant adaptation.
Methods
Experimental setup
A completely randomized common garden experiment was conducted between February and July 2021 under shade cloth in the experimental field of Centre d’Ecologie Fonctionnelle et Evolutive (CEFE), Montpellier, France (Fig. 2a,b). Pots of 0.08 L were filled with a sterilized soil mixture composed of 50% river sand, 37.5% calcareous clay soil from the experimental field at CEFE, and 12.5% blond peat moss. Seeds used in the experiment were distributed by NASC (https://arabidopsis.info/). Before sowing, we selected a total of 730 accessions from the 1001 Genomes Project19 (http://1001genomes.org/), based on Exposito-Alonso’s et al. list of accessions for maximizing the geographic and genetic coverage of A. thaliana20. At the beginning of February, all 730 accessions were sown in triplicates for harvest at flowering, and one replicate per accession was attributed to one of three blocks. At the end of February, a subset of 529 accessions were sown in three other replicates for harvest at fruit maturation, and one replicate per accession was attributed to one of three other blocks. Supplementary Table 1 shows the list of the 713 accessions harvested at flowering and 505 accessions harvested at fruit maturation, in a total of 721 accessions that germinated and reached the phenological stage of harvest. Experimental blocks were placed on growing benches adapted with a subirrigation system, and plants were irrigated three times per week until the end of the experiment.
Phenotypic data
The recorded phenotypic data correspond to 16 functional traits (Table 1). Eight vegetative traits recorded at the beginning of flowering (first flower at anthesis), two phenological traits related to flowering, and six reproductive traits recorded at fruit maturation (first mature fruit). Common phenological stages were chosen for harvest since A. thaliana’s functional traits are known to vary across ontogeny21. The traits that were measured are likely to inform us about plant strategies both at the leaf and the whole-plant level, analogous to cross-species observations on plant strategies3.
After pots had been subirrigated for at least two hours, to promote leaf and whole-plant rehydration22,23, we recorded leaf fresh mass and leaf area for one leaf per plant (leaf selection followed methods described in Pérez-Harguindeguy et al.23). For that, the selected leaf was cut and right after weighed and photographed. One-sided projected leaf area (Fig. 2c) was measured through image analysis using ImageJ 1.53k software24. We recorded leaf and whole-plant dry mass after drying the samples at 60 °C for at least 72 hours. Leaf nitrogen content per leaf dry mass (LNC) was predicted using near infrared spectroscopy (NIRS, see below) and the predictive model developed by Vasseur et al.14 (Fig. 3a,b). To check NIRS prediction accuracy (see Technical validation), LNC was measured on a subset of 403 dried leaf samples (weighing 0.1 to 1 mg) using an elemental analyzer (Vario-PYROcube, Elementar, UK). Only predicted LNC values were incorporated into the phenotypic data record of the AraDiv dataset. Leaf thickness was estimated through the inverse of the product of specific leaf area (leaf area per leaf dry mass, m2 kg−1) and leaf dry matter content (leaf dry mass per leaf fresh mass, mg g−1)25. We calculated days to flowering from the sowing day until the first flower at anthesis. Days to flowering were also expressed in growing degree days (GDD), that is, the daily cumulation of Celsius degrees (°C) from sowing until the flowering date. Daily GDD was computed as \({\rm{GDD}}=\frac{Tmax+Tmin}{2}-{\rm{Tb}}\), where Tmax is maximum temperature, Tmin is minimum temperature, and Tb is base temperature, which was considered as 4 °C, the temperature usually used for cold acclimation and vernalization of A. thaliana26,27. Tmax and Tmin were obtained from meteorological data collected in the experimental field at CEFE (see below). For the measurement of reproductive traits, we cut inflorescence stems at their base and photographed them (Fig. 2c). Fruit (i.e., silique) length and number, inflorescence length, and secondary branch number were subsequently recorded using a macro, adapted from Vasseur et al.28, in ImageJ 1.53k software24. Fruit length was measured as the mean length of three randomly selected mature fruits per plant. Fertility was estimated through the product of mean fruit length and fruit number29,30. For measuring seed dry mass, inflorescence stems were dried at ambient temperature after harvest, and around 30 seeds per plant were weighed for an estimation of individual seed mass.
NIRS spectral data
NIRS screening was performed using the ASD LabSpec 4 Hi-Res Analytical Instrument (Malvern Panalytical Ltd., UK). We screened rehydrated leaves right after they had been cut and leaf fresh mass and leaf area had been measured. The leaf spectrum of light reflectance was recorded for the spectral region 350–2500 nm (Fig. 3a), using a contact probe with a 10 mm spot size. The leaf midrib could not be avoided during NIRS screening due to a generally small leaf size. Only leaves that were big enough to be completely covered by the NIRS probe were screened. Acquired spectra are provided as a data record (see Data records).
Meteorological data
Meteorological data were recorded using a weather station Davis Vantage Pro2 (Davis Instruments Corporation, USA) installed in the experimental field at CEFE. Ten meteorological variables (Table 2) were recorded every 30 minutes to derive daily means for the period of the experiment, i.e. from February to July 2021.
Data Records
The AraDiv dataset is composed of three data records, which have been deposited in the data.InDoRES repository (https://doi.org/10.48579/PRO/SW1OQD)31 and are described below.
Phenotypic data record
Phenotypic data are organized following the ecological trait‐data standard proposed by Schneider et al.32. Accordingly, data were recorded in a tidy format with the following 12 columns:
-
1.
scientificName: a taxonomic reference to A. thaliana that follows the standard developed by the National Inventory of the Natural Heritage (https://inpn.mnhn.fr/accueil/index?lg=en).
-
2.
X1001g_ID: accession number from the 1001 Genomes Project19 dataset (http://1001genomes.org/).
-
3.
verbatimOccurrenceID: user‐specific identifier defining individual plants. Letters “P” and “A” mark individuals harvested at flowering and at fruit maturation, respectively.
-
4.
verbatimBlockID: user‐specific identifier defining experimental blocks. Letters “P” and “A” mark blocks dedicated to plants harvested at flowering and at fruit maturation, respectively.
-
5.
DateSowing: sowing date.
-
6.
DateHarvest: harvest date.
-
7.
HerbivoryIndex: qualitative herbivory index, ranging from zero to five, defined for the technical validation of reproductive traits (see Technical validation).
-
8.
traitName: trait names according to plant ontologies, namely the Thesaurus of Plant characteristics (TOP33), the Plant Trait Ontology (TO34), and the Crop Ontology (CO34,35).
-
9.
traitValue: trait values.
-
10.
traitUnit: trait units.
-
11.
traitID: Uniform Resource Identifier (URI) for applied trait ontology.
-
12.
taxonID: URI for taxon identification.
NIRS spectral data record
NIRS spectral data file comprises 2,152 columns. The first column is equivalent to the “verbatimOccurrenceID” column previously described, and remaining columns comprise leaf hyperspectral reflectance values in spectral regions from 350 to 2500 nm.
Meteorological data record
Meteorological data were recorded in a tidy format with the following four columns:
-
1.
Date: date from 1st February to 31st July 2021.
-
2.
varName: meteorological variable names.
-
3.
varValue: meteorological variable values.
-
4.
varUnit: meteorological variable units.
Technical Validation
A completely randomized design was adopted in the common garden experiment to minimize confounding effects related to the placement of the pots.
Phenotypic data
Functional trait measures followed standardized procedures described in Pérez-Harguindeguy et al.23, helping to reduce measurement variability. For dealing with measurement errors, we verified each trait for unrealistic extreme values by analyzing deviation from the mean. Values outside a range of µ ± 6σ (µ, mean; σ, standard deviation) were used to filter samples out (N = 2). Because outliers for reproductive traits were often associated with intense herbivory damage, we defined a qualitative herbivory index, ranging from zero to five, to each photographed plant according to the number of herbivores and their traces (e.g., damaged inflorescence stem and/or fruits) that could be detected in the images. Herbivory index for individual plants can be found in the phenotypic data record of the AraDiv dataset and may be used for filtering out possibly biased samples with high herbivory index (i.e., 35 samples with 4–5 herbivory index). The intraspecific phenotypic variation assessed in this study is shown in Fig. 4.
NIRS prediction for LNC
NIRS prediction accuracy was verified by comparing observed and predicted LNC values using Vasseur’s et al.14 model (Fig. 3b). Prediction performances were estimated through the coefficient of determination (R²) and the root mean square error (RMSE, %).
Code availability
No custom code has been used during the generation and processing of this dataset.
References
Violle, C. et al. Let the concept of trait be functional! Oikos 116, 882–892 (2007).
Kattge, J. et al. TRY plant trait database - enhanced coverage and open access. Glob. Change Biol. 26, 119–188 (2020).
Díaz, S. et al. The global spectrum of plant form and function. Nature 529, 167–171 (2016).
Joswig, J. S. et al. Climatic and soil factors explain the two-dimensional spectrum of global plant trait variation. Nat. Ecol. Evol. 6, 36–50 (2022).
Westerband, A. C., Funk, J. L. & Barton, K. E. Intraspecific trait variation in plants: a renewed focus on its role in ecological processes. Ann. Bot. 127, 397–410 (2021).
de Villemereuil, P., Gaggiotti, O. E., Mouterde, M. & Till-Bottraud, I. Common garden experiments in the genomic era: new perspectives and opportunities. Heredity 116, 249–254 (2016).
Sultan, S. E. Organism and Environment: Ecological Development, Niche Construction, and Adaptation (Oxford University Press, 2015).
Albert, C. H., Grassein, F., Schurr, F. M., Vieilledent, G. & Violle, C. When and how should intraspecific variability be considered in trait-based plant ecology? Perspect. Plant Ecol. Evol. Syst. 13, 217–225 (2011).
Schwinning, S., Lortie, C. J., Esque, T. C. & DeFalco, L. A. What common-garden experiments tell us about climate responses in plants. J. Ecol. 110, 986–996 (2022).
Bamba, M., Kawaguchi, Y. W. & Tsuchimatsu, T. Plant adaptation and speciation studied by population genomic approaches. Dev. Growth Diff. 61, 12–24 (2019).
Seren, Ü. et al. AraPheno: a public database for Arabidopsis thaliana phenotypes. Nucleic Acids Res. 45, D1054–D1059 (2017).
Fabre, J. et al. PHENOPSIS DB: an information system for Arabidopsis thaliana phenotypic data in an environmental context. BMC Plant Biol. 11, 77 (2011).
Kothari, S. & Schweiger, A. K. Plant spectra as integrative measures of plant phenotypes. J. Ecol. 110, 2536–2554 (2022).
Vasseur, F. et al. A perspective on plant phenomics: coupling deep learning and near-infrared spectroscopy. Front. Plant Sci. 13, 836488 (2022).
Couture, J. J. et al. Spectroscopic determination of ecologically relevant plant secondary metabolites. Methods Ecol. Evol. 7, 1402–1412 (2016).
Burnett, A. C., Serbin, S. P., Davidson, K. J., Ely, K. S. & Rogers, A. Detection of the metabolic response to drought stress using hyperspectral reflectance. J. Exp. Bot. 72, 6474–6489 (2021).
Rincent, R. et al. Phenomic selection is a low-cost and high-throughput method based on indirect predictions: proof of concept on wheat and poplar. G3-Genes Genomes Genet. 8, 3961–3972 (2018).
Robert, P., Brault, C., Rincent, R. & Segura, V. in Genomic Prediction of Complex Traits. Methods in Molecular Biology Vol. 2467 (eds Ahmadi, N. & Bartholomé, J.) 397–420 (Humana Press, 2022).
Alonso-Blanco, C. et al. 1,135 genomes reveal the global pattern of polymorphism in Arabidopsis thaliana. Cell 166, 481–491 (2016).
Exposito-Alonso, M. et al. Genomic basis and evolutionary potential for extreme drought adaptation in Arabidopsis thaliana. Nat. Ecol. Evol. 2, 352–358 (2018).
Vasseur, F. et al. Climate as a driver of adaptive variations in ecological strategies in Arabidopsis thaliana. Ann. Bot. 122, 935–945 (2018).
Garnier, E., Shipley, B., Roumet, C. & Laurent, G. A standardized protocol for the determination of specific leaf area and leaf dry matter content. Funct. Ecol. 15, 688–695 (2001).
Pérez-Harguindeguy, N. et al. Corrigendum to: new handbook for standardised measurement of plant functional traits worldwide. Aust. J. Bot. 64, 715–716 (2016).
Schneider, C. A., Rasband, W. S. & Eliceiri, K. W. NIH image to ImageJ: 25 years of image analysis. Nat. Methods 9, 671–675 (2012).
Vile, D. et al. Specific leaf area and dry matter content estimate thickness in laminar leaves. Ann. Bot. 96, 1129–1136 (2005).
Gilmour, S. J., Hajela, R. K. & Thomashow, M. F. Cold acclimation in Arabidopsis thaliana. Plant Physiol. 87, 745–750 (1988).
Weigel, D. & Glazebrook, J. Arabidopsis: A Laboratory Manual (Cold Spring Harbor Laboratory Press, 2002).
Vasseur, F., Bresson, J., Wang, G., Schwab, R. & Weigel, D. Image-based methods for phenotyping growth dynamics and fitness components in Arabidopsis thaliana. Plant Methods 14, 63 (2018).
Roux, F., Gasquez, J. & Reboud, X. The dominance of the herbicide resistance cost in several Arabidopsis thaliana mutant lines. Genetics 166, 449–460 (2004).
Wilczek, A. M., Cooper, M. D., Korves, T. M. & Schmitt, J. Lagging adaptation to warming climate in Arabidopsis thaliana. Proc. Natl. Acad. Sci. USA 111, 7906–7913 (2014).
Przybylska, M. S. et al. AraDiv: a dataset of functional traits and leaf hyperspectral reflectance of Arabidopsis thaliana. data.InDoRES https://doi.org/10.48579/PRO/SW1OQD (2023).
Schneider, F. D. et al. Towards an ecological trait-data standard. Methods Ecol. Evol. 10, 2006–2019 (2019).
Garnier, E. et al. Towards a thesaurus of plant characteristics: an ecological contribution. J. Ecol. 105, 298–309 (2017).
Cooper, L. et al. The Planteome database: an integrated resource for reference ontologies, plant genomics and phenomics. Nucleic Acids Res. 46, D1168–D1180 (2018).
Shrestha, R. et al. Bridging the phenotypic and genetic data useful for integrated breeding through a data annotation using the crop ontology developed by the crop communities of practice. Front. Physiol. 3, 326 (2012).
Acknowledgements
This work was supported by Agence National de la Recherche (ANR; www.anr.fr) and Deutsche Forschungsgemeinschaft (DFG; www.dfg.de) funding as part of the AraBreed project (Grants ANR-17-CE02-0018-01 and SCHE 1899/2-1). C.V. was supported by the European Research Council (ERC) Starting Grant Project “Ecophysiological and biophysical constraints on domestication in crop plants” (Grant ERC-StG-2014- 639706-CONSTRAINTS). F.V. was supported by the European Research Council (ERC) Starting Grant Project “PHENOVIGOUR” (Grant ERC-StG-2020-949843). We are grateful to the “Terrain d’expériences” platform at CEFE, namely to Thierry Mathieu, David Degueldre, Pauline Durbin, Fabien Lopez, Pierrick Aury, and Jean-Marc Donnay, for supporting the conception and execution of the experiment, and for collecting and providing the meteorological data presented in this study. We are grateful to Thibaut Perez for LNC measurement (AQUI platform, Montpellier), and to Robin Latapie, Annick Lucas, Rahma Kazi-Tani, Maeva Tremblay, Ana Elkaim, Damien de La Faye, Lucie Mahaut, Marie-Charlotte Bopp, Alvaro Delgado Tenllado, Elodie Certenais, Thibault Martino, Benoit Berthet, Léo Delalandre, and Alexis Bediee for help during the experiment.
Author information
Authors and Affiliations
Contributions
M.S.P., C.V., D.V., J.F.S. and F.V. conceived the study in collaboration with X.L.R. M.S.P. and F.V. coordinated, and M.S.P., L.S.-M. and P.M. carried out the common garden experiment. M.S.P., C.V., D.V., L.P., L.S.-M., M.L., M.V., P.M. and F.V. collected phenotypic data. G.B., L.R. and D.C. analyzed NIRS data, and D.C. wrote about NIRS analysis. B.L. performed LNC quantification and wrote about it. M.S.P. wrote the manuscript, produced figures and tables, organized, formatted, and published the AraDiv dataset in data.InDoRES. D.C. produced Fig. 3 and NIRS data record. All authors read and improved the manuscript.
Corresponding authors
Ethics declarations
Competing interests
The authors declare no competing interests.
Additional information
Publisher’s note Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
Supplementary information
Rights and permissions
Open Access This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license, and indicate if changes were made. The images or other third party material in this article are included in the article’s Creative Commons license, unless indicated otherwise in a credit line to the material. If material is not included in the article’s Creative Commons license and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this license, visit http://creativecommons.org/licenses/by/4.0/.
About this article
Cite this article
Przybylska, M.S., Violle, C., Vile, D. et al. AraDiv: a dataset of functional traits and leaf hyperspectral reflectance of Arabidopsis thaliana. Sci Data 10, 314 (2023). https://doi.org/10.1038/s41597-023-02189-w
Received:
Accepted:
Published:
DOI: https://doi.org/10.1038/s41597-023-02189-w
- Springer Nature Limited