Introduction

In the coming decades, a growing human population, changing diets, extreme climate conditions, lower availability of natural resources, higher competition for arable soils with non-food crops, soil degradation, and the need to minimize harmful impacts on ecosystems will pose new challenges to global food security (Godfray et al. 2010; Asseng et al. 2015). To cope with these challenges, both agricultural production and sustainability need to increase (Tilman et al. 2002). The selection of new crop cultivars with favourable traits, such as increased drought and heat tolerance and input use efficiency, will be key to this process (Esquinas-Alcazar 2005; McCouch et al. 2013). To meet this goal, the eroded genepool of modern crop plants needs to be broadened, and the widest genetic diversity needs to be available and exploitable in order to select the improved cultivars of the future (Ford-Lloyd et al. 2011; Guarino and Lobell 2011; Vincent et al. 2013; Warschefsky et al. 2014). This is achievable only by conserving, and keeping available for research and breeding programmes, the widest possible genepool for each crop, especially landraces and crop wild relatives (CWR) (McCouch et al. 2013). In many areas of the world, genetic erosion (the loss of genetic diversity in the form of alleles and genotypes as well as domestic crop accessions) exceeding 70% has been observed in the last few decades (Hammer et al. 1996; Veteläinen et al. 2009; van de Wouw et al. 2010). Given this scenario, it is of key importance to conserve agrobiodiversity in the long term, that is, the diversity of crop species used in different agro-ecosystems as well as the genetic diversity within and among crop and CWR accessions (Last et al. 2014).
Furthermore, the conservation of plant genetic resources for food and agriculture (PGRFA) is fundamental to achieving target 9 of the 2011–2020 Global Strategy for Plant Conservation as well as target 13 of the Aichi biodiversity targets. Thus, the effective conservation of agrobiodiversity and its sustainable use are considered to be of pivotal importance by the Convention on Biological Diversity (CBD) (Convention on Biological Diversity 2011, 2012).

One of the most effective strategies to ensure the conservation and availability of PGRFA is through ex situ conservation in genebanks (McCouch et al. 2013; Davies and Allender 2017). In particular, seed banking shows several advantages as a long-term ex situ conservation strategy for plant species and therefore is used for the maintenance of most of the PGR (plant genetic resources) collections ex situ. Seed material is relatively easy to collect, can be stored in small spaces, can provide a decent sample of the genetic diversity within the species and often remains viable for long periods. Furthermore, seed collections do not require a high level of maintenance and are also economically viable (Li and Pritchard 2009; Riviere and Muller 2017). Worldwide there are more than 1750 genebanks containing over 7.4 million accessions (FAO 2010; Davies and Allender 2017).

All countries are highly dependent upon PGRFA conserved beyond their borders. This global interdependence, and therefore global flows of PGRFA among conservation facilities, are likely to increase in order to cope with future challenges, especially climatic changes (Galluzzi et al. 2016).

Stakeholders of PGRFA have indicated that a major constraint on the use of conserved germplasm is the difficulty of accessing it and of obtaining the associated information (Kell et al. 2017). The accessibility of PGR accessions is strictly linked to the existence and updating of global information databases; this involves gathering data for accessions from many collections into a centralized source, thereby facilitating the transfer of germplasm among institutions. Consequently, the building and improvement of information systems that link collections together in order to create a single searchable database of genetic resources is a high priority in PGR conservation and utilization (Khoury et al. 2010).

The correct and clear taxonomic identification and labelling of accessions is of key importance to making PGR accessions conserved ex situ usable (Dempewolf et al. 2017). Taxonomic issues have indeed been shown to seriously threaten the effectiveness of conservation programmes (Garnett and Christidis 2017). Moreover, the updated and precise taxonomic labelling of PGR accessions in public databases is also fundamental to performing prioritization studies aimed at finding gaps in ex situ conservation (Dempewolf et al. 2014). This is because the correct taxonomic naming of accessions is essential for identifying PGR taxa that are currently underrepresented in ex situ conservation facilities worldwide and therefore have a high priority for future collecting missions and urgent conservation measures (Maxted et al. 2010; Castañeda-Álvarez et al. 2016).

In order to better understand the extent of taxonomic misnaming in databases of PGR accessions conserved ex situ, and whether this issue could prevent their exploitation and conservation, we analysed and quantified the occurrence of this problem in seed accessions belonging to the watermelon genepool (Citrullus). We decided to focus on this genus as a case study because it has a relatively small number of taxa (species and subspecies) and because, after having long been considered a critical taxonomic group, its systematics, taxonomy, and nomenclature have been revised and improved in recent years, also with the aid of genetic investigations (see Nesom 2011; Schaefer and Renner 2011; Chomicki and Renner 2014; Renner et al. 2014; Paris 2015). Moreover, Citrullus is of significant importance as a vegetable crop (Applequist 2016): in 2014, 3.5 million hectares of agricultural land were used to cultivate watermelons and annual production reached 111 million tons, which was 9.5% of global vegetable production, grown on 6% of the area used globally for the cultivation of vegetables (FAOSTAT 2017). The genus is of great importance in terms of food security in desert and semi-arid areas (Mujaju et al. 2011; Modi and Zulu 2012). Citrullus includes eight taxa (seven species, one of which is divided into two subspecies): three are widely cultivated [C. amarus, C. lanatus subsp. lanatus, C. mucosospermus], two are only locally cultivated [C. colocynthis and C. lanatus subsp. kordophanus], and three have only wild populations [C. ecirrhosus, C. naudinianus, C. rehmii] (Chomicki and Renner 2014; Paris 2015).

The aims of the current research, adopting Citrullus as a case study, are: (a) to understand and define the most frequent nomenclature-related issues in databases of PGR accessions, (b) to clarify whether these issues could prevent the use of these accessions, and (c) to identify whether and how the discovered issues may be resolved.

Materials and methods

We checked for Citrullus accessions in the two major databases of plant genetic resources worldwide: Genesys PGR and EURISCO. Genesys is a global portal that lists more than 3.6 million accessions of plant genetic resources from 482 institutes worldwide. It is a gateway from which germplasm accessions from genebanks can be found and ordered (Genesys 2017). EURISCO is a search catalogue providing information about ex situ plant collections maintained by institutions located in Europe, and includes data for 1.9 million accessions (EURISCO 2017). Individual databases contributing to EURISCO simultaneously upload data into Genesys (Genesys 2017), but, not knowing the details of updating between the two databases, we decided to consider both separately in our analysis.

As search keywords, we used the existing nomenclatural combinations gathered from the literature (e.g., Pangalo 1930; Mansfeld 1959; Fursa 1972; Nesom 2011; Chomicki and Renner 2014; Paris 2015). In order to increase the chances of finding accessions, we also searched for the most widespread vernacular names (e.g., colocynth, tsamma, citron, egusi), also taking into account geographic provenance and biological status (wild, landrace, modern cultivar). The resulting accession names were examined on nomenclatural grounds by comparing them with the most updated taxonomic and nomenclatural treatments of Citrullus (see Nesom 2011; Chomicki and Renner 2014; Renner et al. 2014; Paris 2015); names were further checked for compliance with the rules of the International Code of Nomenclature for algae, fungi, and plants (ICN) (McNeill et al. 2012). A complete list of the accepted names and synonyms we came across in the consulted databases is provided in Online Resource 1.

The misnaming issues encountered were classified into “issue types” and then grouped into two main “issue categories” (Table 1). If an accession was indeed wrongly named, we attributed the misnaming to one or more issue types and then to one or more issue categories. Finally, where practicable, we proposed, for each accession, the most updated scientific name (Table 1), based on the updated taxonomic and nomenclatural treatment of Citrullus reported in Online Resource 1. We then provided a numerical estimate of each misnaming issue type and category found in the two databases; this was intended to evaluate the extent of taxonomic misnaming in PGR databases (Online Resource 2, 3).

Table 1 Misnaming issue types (“Issue types”), with definitions and examples (“Issue types definition”) acquired from the results of our database search

Additionally, by analysing the different databases, we found that the name applied to the same taxon changes among institutions; we refer to this as “variability”. When several scientific names are applied to the same taxon within a single institution, we refer to “intra-institution variability”; among different institutions, to “inter-institution variability” (Table 2); within a single database, to “intra-database variability”; and between different databases, to “inter-database variability” (Table 3). We chose C. amarus as a target taxon to illustrate this point. In each database, we searched for the currently accepted name and its synonyms.

Table 2 Intra- and inter-institution variability of taxon names
Table 3 Intra- and inter-database variability of taxon names

Finally, in order to verify our doubts in practical terms, three accessions that appeared to be misnamed in the Genesys and EURISCO databases were propagated at the CREA-ORL institute of Montanaso Lombardo (northern Italy, province of Lodi). The accession names were first revised on nomenclatural grounds and then, after propagation, their taxonomic identity was checked (Table 4; Fig. 1). Cultivation was performed in purity to avoid crossbreeding among the different accessions: female flowers were isolated with paper bags and hand pollination was performed. Herbarium vouchers are stored at the Herbarium of the University of Pavia (PAV).

Table 4 Propagated Citrullus accessions
Fig. 1 Propagated Citrullus accessions (see Table 1): a, c, e fruits, scale bar = 5 cm; b, d, f foliage. a GBR004-83216: C. amarus, b RUS001-4679: C. amarus, c USA016-PI490380: C. mucosospermus. Photos by T. Abeli (a, c, e), V. Ottobrino (b, f), and N. M. G. Ardenghi (d)

Results

Misnaming issues: categorization

We identified six types of misnaming issues (Table 1), along with several misprints regarding both the names of the taxa and their authors. Each issue type falls into one of two “issue categories”:

  1. “Nomenclatural inaccuracy” (N): the scientific name associated with the accession does not correspond to the most updated scientific name, but it represents a synonym (homotypic or heterotypic; see Glossary and Art. 14.4 of the ICN) or a spelling variant of the latter; it does not prevent the establishment of the real taxonomic identity of the accession;

  2. “Taxonomic error” (T): the scientific name associated with the accession prevents the establishment of the real taxonomic identity of the accession.

Misnaming issues: quantification

Querying the two PGR databases returned 8494 single entries of Citrullus accessions: 6631 from Genesys and 1863 from EURISCO. The scientific names of 5864 accessions (69.03% of the total) showed nomenclatural inaccuracies (N), while the scientific names of 2355 accessions (27.72% of the total) showed taxonomic errors (T) (Online Resource 2, 3; Fig. 2).

Fig. 2 Percentages of the nomenclature inaccuracies (green), taxonomic errors (light blue), and correct names (red) pooling together the data of Citrullus accessions extracted from both Genesys and EURISCO databases

In detail, 9.85% of the misnaming issues encountered in both the databases fell within the issue type “authors” (N); 66.79% fell within “synonym” (N); 1.40% fell within “non-existent name” (N); 0.94% fell within “no species” (T); 26.60% fell within “no subspecies” (T); and 0.38% fell within “conflicting scientific and cultivar names” (T) (the sum of the aforementioned percentages exceeds 100% because some of the accessions showed more than one issue type). Misprints regarding the taxon names or their authors affected 12.54% of the accessions (Online Resource 2, 3).
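The assignment of issue types to categories reported above can be sketched as a simple lookup. This is only an illustrative sketch: the issue-type labels and their N/T categories come from the text, whereas the names `Category`, `ISSUE_TYPE_TO_CATEGORY`, `categories_for`, and `needs_redetermination` are hypothetical and do not correspond to any tool used in the study.

```python
from enum import Enum


class Category(Enum):
    NOMENCLATURAL_INACCURACY = "N"  # outdated name, identity still recoverable
    TAXONOMIC_ERROR = "T"           # name prevents establishing the identity


# Issue types and their categories as listed in the text.
ISSUE_TYPE_TO_CATEGORY = {
    "authors": Category.NOMENCLATURAL_INACCURACY,
    "synonym": Category.NOMENCLATURAL_INACCURACY,
    "non-existent name": Category.NOMENCLATURAL_INACCURACY,
    "no species": Category.TAXONOMIC_ERROR,
    "no subspecies": Category.TAXONOMIC_ERROR,
    "conflicting scientific and cultivar names": Category.TAXONOMIC_ERROR,
}


def categories_for(issue_types):
    """Return the set of categories implied by an accession's issue types."""
    return {ISSUE_TYPE_TO_CATEGORY[t] for t in issue_types}


def needs_redetermination(issue_types):
    """True if at least one issue type is a taxonomic error (T)."""
    return Category.TAXONOMIC_ERROR in categories_for(issue_types)


# An accession may carry more than one issue type.
print(categories_for(["synonym", "authors"]))
print(needs_redetermination(["synonym", "no subspecies"]))
```

Returning a set rather than a single value reflects the fact that one accession can show several issue types at once; how overlapping categories are resolved for reporting purposes is not assumed here.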

Only 275 accessions (3.23% of the total) were correctly and unambiguously named according to the most recent and updated taxonomic and nomenclatural treatment of Citrullus (Online Resource 1, 2, 3). Considering each database individually, the percentages are similar: 2.88% (190 accessions) in Genesys and 4.59% (85 accessions) in EURISCO (Online Resource 2, 3).
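As a quick consistency check, the pooled counts reported above can be summed and converted to shares of the total. The sketch below uses only the counts given in the text; the variable and function names (`ENTRIES`, `POOLED`, `share`) are hypothetical, and shares are shown to one decimal place.

```python
# Entries retrieved per database, as reported in the text.
ENTRIES = {"Genesys": 6631, "EURISCO": 1863}

# Pooled accession counts per naming status, as reported in the text.
POOLED = {
    "nomenclatural inaccuracy (N)": 5864,
    "taxonomic error (T)": 2355,
    "correctly named": 275,
}


def share(count, total):
    """Percentage of `total` represented by `count`."""
    return 100.0 * count / total


total = sum(ENTRIES.values())

# The three naming statuses together account for every retrieved entry.
assert sum(POOLED.values()) == total

for label, count in POOLED.items():
    print(f"{label}: {count} accessions, {share(count, total):.1f}%")
```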

Variability

Taking the name Citrullus amarus as an example, we discovered the following variability concerning the accessions’ names:

  1. “Intra- and inter-institution variability”: in Table 2, it can be seen that within two institutions (RUS001 and HUN003) different accessions belonging to the same species are named using different nomenclatural combinations, which are synonyms of the currently accepted name. Moreover, each of the two institutions uses a name not employed by the other institution (C. lanatus var. capensis for RUS001, C. lanatus var. caffer for HUN003). None of the accessions have been updated with the currently accepted scientific name, C. amarus.

  2. “Intra- and inter-database variability”: Table 3 reveals that both databases employ several nomenclatural combinations (four in Genesys and three in EURISCO) to name accessions belonging to the same species. Moreover, one database (Genesys) uses a name (C. lanatus var. capensis) not adopted by the other (EURISCO). Similarly to what was found for “intra- and inter-institution variability”, there are no accessions bearing the name C. amarus in the two databases.
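The two kinds of institution-level variability described above can be illustrated with a short sketch. The synonym table is deliberately partial and contains only the three synonyms of C. amarus named in the text; the records pair the institution codes with the names each is reported to use, without real accession identifiers. The names `SYNONYMS` and `variability` are hypothetical.

```python
from collections import defaultdict

# Partial, illustrative synonym table: stored name -> currently accepted name.
SYNONYMS = {
    "Citrullus lanatus var. citroides": "Citrullus amarus",
    "Citrullus lanatus var. capensis": "Citrullus amarus",
    "Citrullus lanatus var. caffer": "Citrullus amarus",
}


def variability(records, taxon):
    """Summarize name variability for one accepted taxon.

    records: iterable of (institution, stored_name) pairs.
    Returns (intra, exclusive): the institutions using more than one name
    for the taxon, and, per institution, the names used by no other one.
    """
    names_by_institution = defaultdict(set)
    for institution, stored_name in records:
        if SYNONYMS.get(stored_name, stored_name) == taxon:
            names_by_institution[institution].add(stored_name)

    intra = sorted(i for i, names in names_by_institution.items() if len(names) > 1)

    exclusive = {}
    for institution, names in names_by_institution.items():
        others = set().union(
            *(n for i, n in names_by_institution.items() if i != institution)
        )
        exclusive[institution] = names - others
    return intra, exclusive


# Illustrative records based on the names each institution is said to use.
records = [
    ("RUS001", "Citrullus lanatus var. citroides"),
    ("RUS001", "Citrullus lanatus var. capensis"),
    ("HUN003", "Citrullus lanatus var. caffer"),
    ("HUN003", "Citrullus lanatus var. citroides"),
]

intra, exclusive = variability(records, "Citrullus amarus")
print(intra)
print(exclusive)
```

The same grouping could be applied with database names in place of institution codes to obtain the intra- and inter-database view.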

Cultivation

The cultivation in purity confirmed that the names of two out of the three misnamed propagated accessions did not correspond to their actual taxonomic identity (taxonomic error). While the identity of RUS001-4679, although stored with a misprinted heterotypic synonym [“Citrullus lanatus (Thunb.) Matsum. et Nakai var. citroides (Bailley) Mansf.”, thus a nomenclatural inaccuracy; see Online Resource 1 and Table 1], was confirmed (C. amarus; Table 4; Fig. 1), the identity of GBR004-83216 and USA016-PI490380 turned out to be incorrect or unclear. Specifically, the identity of GBR004-83216 already appeared doubtful at the stage of the nomenclatural revision, since the institution did not provide any infraspecific rank for C. lanatus, preventing any safe choice among C. amarus, the subspecies of C. lanatus, and C. mucosospermus (see Table 1, third example). The cultivation allowed us to resolve this issue, revealing that the correct identity of the accession is C. amarus (Table 4; Fig. 1). In USA016-PI490380, the scientific name of the accession (C. lanatus var. lanatus, a homotypic synonym of C. lanatus subsp. lanatus, see Online Resource 1) appears to be in conflict with the vernacular name (Egusi), which applies to another species (C. mucosospermus, see Online Resource 1). In this case, too, propagation proved decisive in resolving the issue: the plants that emerged from the seeds of the accession were C. mucosospermus (Table 4; Fig. 1).

Discussion

The results of our database search and the subsequent revision of the accessions’ scientific names highlight the fact that taxonomic misnaming issues greatly limit the conservation and usage of Citrullus seed material conserved ex situ; only 3% of the material is correctly and unambiguously named, in conformity with the most updated taxonomic and nomenclatural treatment. Nomenclatural inaccuracies of some kind were found for 69% of the material (Fig. 2; Online Resource 2, 3); these could be resolved simply by updating the databases through the application of the current taxonomic and nomenclatural treatment. On the other hand, 28% of the accessions showed taxonomic errors (Fig. 2, Online Resource 2, 3) and therefore cannot be unequivocally attributed to any existing taxon; as a consequence, they are prevented from being employed in any research, breeding, cultivation, reintroduction or conservation project. Their usage can be recovered only by means of re-determination (if a herbarium voucher of the seed accession is available) or re-propagation followed by re-determination, which enables the revision of the accession’s taxonomic identity and its re-accessioning under the correct name. If this re-propagation and re-determination process is not undertaken, research and breeding activities on 28% of the Citrullus accessions currently stored worldwide will be corrupted by an erroneous taxonomic identification, causing their priceless genetic variability and the potential of their useful traits to be improperly exploited. Moreover, considering that species barriers to crossing in Citrullus are weak (Assis et al. 2000), it is possible that introgressants or other intermediates might be found during the re-propagation phase.
Specifically, introduction and cropping of dessert watermelons, Citrullus lanatus, in parts of Africa in which other Citrullus species are indigenous, might have resulted in introgression of dessert watermelon genes into indigenous germplasm, thus complicating taxonomic identification. We suggest classifying these introgressants as Citrullus sp. in the databases, with their possible intermediate origin in their passport data. The results of our propagation experiment (Fig. 1; Table 4) show unequivocally in practical terms the existence of the problem of taxonomic identity, since the revised taxonomic identity of two out of the three propagated accessions did not correspond to the originally adopted accession name. On the other hand, the propagation experiment demonstrates the feasibility and effectiveness of a growth and re-determination phase in recovering the usability of accessions affected by taxonomic issues.

Our results clearly show great variability in the taxonomic and nomenclatural treatments adopted by the different institutions and even by the same institution (Tables 2, 3). As shown by Table 2, different accessions belonging to the same taxon are named using different synonym combinations, leading to an apparent overestimation of the taxa conserved ex situ by each institution. Specifically, in our example, two taxa (C. lanatus var. citroides and C. lanatus var. capensis in RUS001, and C. lanatus var. caffer and C. lanatus var. citroides in HUN003) appear to be conserved rather than one (C. amarus) (Table 2). More seriously, the variability in accession naming, along with the aforementioned misnaming issues, can lead to a great underestimation of the overall number of accessions conserved per taxon. As shown in Table 3, for instance, it appears that no seed accession of C. amarus is conserved worldwide. This would make this species, which is widespread in southern Africa (Paris 2015) and frequently cultivated in all the tropical and sub-tropical areas of the world (Laghetti and Hammer 2007), of extremely high priority for ex situ seed conservation measures, when in fact accessions stored under four different synonyms exist (49 in EURISCO and 251 in Genesys). The apparent and erroneous underestimation of the number of stored accessions of a certain taxon is of particular relevance since it could undermine the value of prioritization studies intended to reveal which taxa are currently underrepresented in long-term conservation facilities and therefore which of them should be the target of collecting missions. Prioritization studies are indeed based on the quantification of the accessions conserved ex situ of a particular target species or group of species (Maxted et al. 2010; Castañeda-Álvarez et al. 2016).

To achieve the goal of having the greatest possible number of accessions correctly named, it is fundamental to standardize the naming of accessions in the PGR databases, using the most updated taxonomic and nomenclatural treatments as a reference. Special attention should be paid to the choice of taxonomic ranks, without neglecting the infraspecific ones (e.g., subspecies), both because they are used to distinguish between closely related wild and domesticated taxa (see e.g., Galasso et al. 2017) and because they may be raised to higher ranks (e.g., species) as taxonomic knowledge advances. This latter case is illustrated by C. lanatus, which until recently was divided into various subspecies and varieties that have subsequently been regarded as independent species (Online Resource 1). Thus, when accessions are stored under names without mention of subspecies or variety, they are almost impossible to interpret on taxonomic grounds. In our case, a simple “C. lanatus” entry may refer to C. amarus, C. mucosospermus or one of the two subspecies of C. lanatus (Tables 1, 2; Online Resource 2, 3).

To avoid the recurrence of the aforementioned problems, a process of taxonomic and nomenclatural peer review is urgently needed before each new accession is made public, in order to guarantee that every accession is usable by the stakeholders. The great number of misnaming issues and the variability in the application of taxon names also compromise the database analyses that are nowadays fundamental to planning and performing ex situ conservation programmes intended to identify taxa and locations underrepresented in current seed collections (Castañeda-Álvarez et al. 2016; Guzzon and Müller 2016).

It is known that the efficiency of species conservation measures greatly depends on the reliability of the taxonomic information available (Bortolus 2008). The conservation and utilization of PGR depend on the building and updating of global searchable databases (Khoury et al. 2010). Our investigation demonstrates that taxonomic misnaming threatens the ex situ conservation efforts of the genus Citrullus. Our results confirm that taxonomic issues are a major problem in aggregator databases and that those issues have serious implications for the uncritical use of specimen data from botanical collections (Goodwin et al. 2015). On the basis of our experience, we here propose a series of actions useful for understanding the extent of the phenomenon and for solving its detrimental effects across the genepools of plant genetic resources for food and agriculture (PGRFA):

  1. Perform studies similar to the present one on further PGRFA genepools to reveal the extent of similar problems in other taxa;

  2. Establish taxonomic authorities in order to provide an updated and standardized taxonomic and nomenclatural treatment that should be followed in the application of the accessions’ scientific names for a certain genepool within and among the PGRFA databases;

  3. Update the nomenclature inaccuracies following the most recent nomenclatural treatments;

  4. Propagate, identify, and correct the scientific names of accessions affected by taxonomic errors;

  5. Always link a herbarium voucher to each seed accession, in order to allow a quick taxon re-determination and to avoid the lengthy and costly process of re-propagation. The collection and preservation of herbarium specimens is often not considered by institutions involved in germplasm conservation;

  6. Perform a peer review of the accessions’ scientific names before their publication in databases.

This series of actions appears to us to be the only way to solve and prevent the occurrence of large numbers of taxonomic and nomenclatural issues with detrimental effects on PGR conservation and usage. Such procedures require investment in personnel and resources, and therefore would require adequate recognition and funding from government-supported sources.

The current study was undertaken on the genus Citrullus, a relatively small genus whose taxonomy has been resolved in recent times. Nevertheless, a remarkable number of issues emerged, with consequences for the interpretation of the data stored in the databases and the practical usage of some accessions. In light of this observation, how serious might the situation be for accessions belonging to critical taxonomic groups of great importance for food security (e.g., Triticum, Musa; see Hammer et al. 2011; Čížková et al. 2015) or for other agriculture-related activities (e.g., Festuca s.l.; see Ardenghi et al. 2017), which are still characterized by controversial and unresolved taxonomy? This highlights once again the key importance for species conservation of taxonomic studies, which are often neglected in modern biology (Garnett and Christidis 2017). Moreover, we suggest that studies similar to the current one should be performed on more crop genepools that contribute significantly to global food security, in order to assess the scale of the issues highlighted in this study and to correct them where possible. Genepools of different sizes will have to be considered. Of particular importance are genepools that have recently undergone taxonomic revision, for instance the pea (Lathyrus oleraceus subsp. oleraceus), lentil (Vicia lens subsp. lens), broad bean (Vicia faba) and wheat, which are considered some of the founder crops of Neolithic agriculture in the Fertile Crescent and are of fundamental importance for food security (van Slageren 1994; Kilian et al. 2011; Schaefer et al. 2012).

While the current paper focuses more on PGRFA for their role in food security, the extent of taxonomic misnaming could also affect the conservation of seed accessions of endangered plants and of wild species of interest for reintroduction and translocation programmes.