Background & Summary

Functional ecology has long used comparative approaches across species and environments to identify general patterns of trait variation linked to individual performance (functional traits1). In this quest for general trait patterns, the development of large trait databases, such as the TRY Plant Trait Database2, has become an important ally. Notably, it has allowed the demonstration that plant worldwide phenotypic variation and ecological strategies are better explained by multidimensional spaces summarized by both vegetative (e.g., specific leaf area) and reproductive (e.g., seed mass) traits3. Some lines of evidence, such as trait-environment relationships4, have suggested that the trait syndromes that underlie these multidimensional spaces are driven by adaptive processes. However, such an adaptive hypothesis still needs to be directly tested. For that, intraspecific trait data are valuable, but still lacking in functional trait databases5. Though these databases do present multiple records for a given species, such records come from distinct populations at different sites and, hence, environmental conditions. As a consequence, confounding effects prevent the distinction of sources of trait variation, which is a necessary condition for the study of evolutionary processes.

In order to deal with confounding effects and produce robust intraspecific trait data, experimental approaches like common gardening have been developed6. Common garden experiments consist of growing individuals from different populations under common conditions to assess the genetic basis of traits while controlling for the effects of phenotypic plasticity6,7. Another condition for a robust estimation of intraspecific variation is to assess a representative range of variation through the study of contrasted populations in terms of environmental conditions of origin5,8. Accordingly, more functional trait data both measured under common conditions and covering populations across the species’ geographic range are needed. Though logistically challenging, measuring intraspecific variation as described above within networks of standardized common garden experiments in different geographic zones can be a compelling perspective9.

Additional initiatives to make intraspecific trait data even more powerful to address questions at the interface of ecology and evolution is to link them with whole-genome data. Notably, this has been done with natural populations of Arabidopsis thaliana, Medicago truncatula, and Populus trichocarpa, allowing the study of genomic signatures of adaptation10. To promote more studies of adaptation, compilations of phenotypic data have been developed for genotypes whose sequencing had been previously made available to the scientific community (e.g., AraPheno11, PHENOPSIS DB12). Though such initiatives are valuable, they still largely lack key traits that functional ecology studies endorse as those capturing plant form and function (such as the vegetative and reproductive traits previously mentioned). Therefore, intraspecific information in these databases is likewise limited in its applicability to test hypotheses about the evolutionary determinants of plant trait syndromes and ecological strategies.

Leveraging trait databases with intraspecific trait data faces the major challenge of large-scale cross-population phenotyping. In recent years, the development of high throughput phenotyping methods, such as near-infrared spectroscopy (NIRS), has made phenotyping faster and easier, promoting intraspecific trait analyses13. Notably, commonly investigated functional traits underlying resource economics have been shown to be well predicted from the near-infrared reflectance value of leaves in A. thaliana14. Moreover, reflectance spectra have been applied to predict various chemical traits according to specific interests (e.g., secondary metabolites15 and phytohormones16) and they have also started to be used directly as phenotypic dimensions13. This last option can be interesting when lacking empirical data for trait prediction and when there is a special interest in analyzing as many phenotypic dimensions as possible, even those whose biological meaning remains unknown13. For instance, phenotype-blind approaches in plant biology for breeding selection have recently benefited from hyperspectral data17,18.

Here, we present the AraDiv dataset, which provides phenotypic and leaf hyperspectral reflectance (NIRS) data for 721 widely distributed A. thaliana accessions (Fig. 1). Phenotypic data include vegetative, phenological, and reproductive functional traits, constituting a comprehensive dataset of A. thaliana’s intraspecific phenotypic diversity. Phenotyped accessions were grown in a common garden (Fig. 2a,b), and meteorological data recorded during the experiment were also included in AraDiv. By phenotypically analyzing a subset of the accessions that are genetically and geographically (GPS coordinates) described in the 1001 Genomes Project19 dataset (http://1001genomes.org/), we intend to complement information about A. thaliana’s intraspecific diversity with a large amount of functional trait data obtained from plants grown under common conditions. This ultimately aims at fostering studies at the interface of genetics and ecology that link phenotypic, genetic, and environmental data to understand plant adaptation.

Fig. 1
figure 1

Origin of the natural accessions (red dots) assessed in this study.

Fig. 2
figure 2

Photographs of the common garden experiment (a,b) and example of images used for trait analyses (c).

Methods

Experimental setup

A completely randomized common garden experiment was conducted between February and July 2021 under shade cloth in the experimental field of Centre d’Ecologie Fonctionnelle et Evolutive (CEFE), Montpellier, France (Fig. 2a,b). Pots of 0.08 L were filled with a sterilized soil mixture composed of 50% river sand, 37.5% calcareous clay soil from the experimental field at CEFE, and 12.5% blond peat moss. Seeds used in the experiment were distributed by NASC (https://arabidopsis.info/). Before sowing, we selected a total of 730 accessions from the 1001 Genomes Project19 (http://1001genomes.org/), based on Exposito-Alonso’s et al. list of accessions for maximizing the geographic and genetic coverage of A. thaliana20. At the beginning of February, all 730 accessions were sown in triplicates for harvest at flowering, and one replicate per accession was attributed to one of three blocks. At the end of February, a subset of 529 accessions were sown in three other replicates for harvest at fruit maturation, and one replicate per accession was attributed to one of three other blocks. Supplementary Table 1 shows the list of the 713 accessions harvested at flowering and 505 accessions harvested at fruit maturation, in a total of 721 accessions that germinated and reached the phenological stage of harvest. Experimental blocks were placed on growing benches adapted with a subirrigation system, and plants were irrigated three times per week until the end of the experiment.

Phenotypic data

The recorded phenotypic data correspond to 16 functional traits (Table 1). Eight vegetative traits recorded at the beginning of flowering (first flower at anthesis), two phenological traits related to flowering, and six reproductive traits recorded at fruit maturation (first mature fruit). Common phenological stages were chosen for harvest since A. thaliana’s functional traits are known to vary across ontogeny21. The traits that were measured are likely to inform us about plant strategies both at the leaf and the whole-plant level, analogous to cross-species observations on plant strategies3.

Table 1 Measured traits and their units.

After pots had been subirrigated for at least two hours, to promote leaf and whole-plant rehydration22,23, we recorded leaf fresh mass and leaf area for one leaf per plant (leaf selection followed methods described in Pérez-Harguindeguy et al.23). For that, the selected leaf was cut and right after weighed and photographed. One-sided projected leaf area (Fig. 2c) was measured through image analysis using ImageJ 1.53k software24. We recorded leaf and whole-plant dry mass after drying the samples at 60 °C for at least 72 hours. Leaf nitrogen content per leaf dry mass (LNC) was predicted using near infrared spectroscopy (NIRS, see below) and the predictive model developed by Vasseur et al.14 (Fig. 3a,b). To check NIRS prediction accuracy (see Technical validation), LNC was measured on a subset of 403 dried leaf samples (weighing 0.1 to 1 mg) using an elemental analyzer (Vario-PYROcube, Elementar, UK). Only predicted LNC values were incorporated into the phenotypic data record of the AraDiv dataset. Leaf thickness was estimated through the inverse of the product of specific leaf area (leaf area per leaf dry mass, m2 kg−1) and leaf dry matter content (leaf dry mass per leaf fresh mass, mg g−1)25. We calculated days to flowering from the sowing day until the first flower at anthesis. Days to flowering were also expressed in growing degree days (GDD), that is, the daily cumulation of Celsius degrees (°C) from sowing until the flowering date. Daily GDD was computed as \({\rm{GDD}}=\frac{Tmax+Tmin}{2}-{\rm{Tb}}\), where Tmax is maximum temperature, Tmin is minimum temperature, and Tb is base temperature, which was considered as 4 °C, the temperature usually used for cold acclimation and vernalization of A. thaliana26,27. Tmax and Tmin were obtained from meteorological data collected in the experimental field at CEFE (see below). For the measurement of reproductive traits, we cut inflorescence stems at their base and photographed them (Fig. 2c). Fruit (i.e., silique) length and number, inflorescence length, and secondary branch number were subsequently recorded using a macro, adapted from Vasseur et al.28, in ImageJ 1.53k software24. Fruit length was measured as the mean length of three randomly selected mature fruits per plant. Fertility was estimated through the product of mean fruit length and fruit number29,30. For measuring seed dry mass, inflorescence stems were dried at ambient temperature after harvest, and around 30 seeds per plant were weighed for an estimation of individual seed mass.

Fig. 3
figure 3

(a) Near infrared spectra used for model development (blue) and produced in this study (red). (b) Leaf nitrogen content per leaf dry mass (LNC) prediction model performances (left plot) and residuals (right plot). The solid blue line represents the fitted linear regression curve (with its formula in the top left corner), while the dashed lines show the equality line. R2: coefficient of determination (N = 403); RMSE: root mean square error.

NIRS spectral data

NIRS screening was performed using the ASD LabSpec 4 Hi-Res Analytical Instrument (Malvern Panalytical Ltd., UK). We screened rehydrated leaves right after they had been cut and leaf fresh mass and leaf area had been measured. The leaf spectrum of light reflectance was recorded for the spectral region 350–2500 nm (Fig. 3a), using a contact probe with a 10 mm spot size. The leaf midrib could not be avoided during NIRS screening due to a generally small leaf size. Only leaves that were big enough to be completely covered by the NIRS probe were screened. Acquired spectra are provided as a data record (see Data records).

Meteorological data

Meteorological data were recorded using a weather station Davis Vantage Pro2 (Davis Instruments Corporation, USA) installed in the experimental field at CEFE. Ten meteorological variables (Table 2) were recorded every 30 minutes to derive daily means for the period of the experiment, i.e. from February to July 2021.

Table 2 Recorded meteorological variables and their units.

Data Records

The AraDiv dataset is composed of three data records, which have been deposited in the data.InDoRES repository (https://doi.org/10.48579/PRO/SW1OQD)31 and are described below.

Phenotypic data record

Phenotypic data are organized following the ecological trait‐data standard proposed by Schneider et al.32. Accordingly, data were recorded in a tidy format with the following 12 columns:

  1. 1.

    scientificName: a taxonomic reference to A. thaliana that follows the standard developed by the National Inventory of the Natural Heritage (https://inpn.mnhn.fr/accueil/index?lg=en).

  2. 2.

    X1001g_ID: accession number from the 1001 Genomes Project19 dataset (http://1001genomes.org/).

  3. 3.

    verbatimOccurrenceID: user‐specific identifier defining individual plants. Letters “P” and “A” mark individuals harvested at flowering and at fruit maturation, respectively.

  4. 4.

    verbatimBlockID: user‐specific identifier defining experimental blocks. Letters “P” and “A” mark blocks dedicated to plants harvested at flowering and at fruit maturation, respectively.

  5. 5.

    DateSowing: sowing date.

  6. 6.

    DateHarvest: harvest date.

  7. 7.

    HerbivoryIndex: qualitative herbivory index, ranging from zero to five, defined for the technical validation of reproductive traits (see Technical validation).

  8. 8.

    traitName: trait names according to plant ontologies, namely the Thesaurus of Plant characteristics (TOP33), the Plant Trait Ontology (TO34), and the Crop Ontology (CO34,35).

  9. 9.

    traitValue: trait values.

  10. 10.

    traitUnit: trait units.

  11. 11.

    traitID: Uniform Resource Identifier (URI) for applied trait ontology.

  12. 12.

    taxonID: URI for taxon identification.

NIRS spectral data record

NIRS spectral data file comprises 2,152 columns. The first column is equivalent to the “verbatimOccurrenceID” column previously described, and remaining columns comprise leaf hyperspectral reflectance values in spectral regions from 350 to 2500 nm.

Meteorological data record

Meteorological data were recorded in a tidy format with the following four columns:

  1. 1.

    Date: date from 1st February to 31st July 2021.

  2. 2.

    varName: meteorological variable names.

  3. 3.

    varValue: meteorological variable values.

  4. 4.

    varUnit: meteorological variable units.

Technical Validation

A completely randomized design was adopted in the common garden experiment to minimize confounding effects related to the placement of the pots.

Phenotypic data

Functional trait measures followed standardized procedures described in Pérez-Harguindeguy et al.23, helping to reduce measurement variability. For dealing with measurement errors, we verified each trait for unrealistic extreme values by analyzing deviation from the mean. Values outside a range of µ ± 6σ (µ, mean; σ, standard deviation) were used to filter samples out (N = 2). Because outliers for reproductive traits were often associated with intense herbivory damage, we defined a qualitative herbivory index, ranging from zero to five, to each photographed plant according to the number of herbivores and their traces (e.g., damaged inflorescence stem and/or fruits) that could be detected in the images. Herbivory index for individual plants can be found in the phenotypic data record of the AraDiv dataset and may be used for filtering out possibly biased samples with high herbivory index (i.e., 35 samples with 4–5 herbivory index). The intraspecific phenotypic variation assessed in this study is shown in Fig. 4.

Fig. 4
figure 4

Histograms showing variation in vegetative (a) phenological (b) and reproductive (c) traits.

NIRS prediction for LNC

NIRS prediction accuracy was verified by comparing observed and predicted LNC values using Vasseur’s et al.14 model (Fig. 3b). Prediction performances were estimated through the coefficient of determination (R²) and the root mean square error (RMSE, %).