Introduction

Leguminosae (or Fabaceae) is the third largest family of flowering plants, with over 750 genera and 20,000 species and a worldwide distribution, from alpine and arctic regions to the equatorial tropics. Legumes range from herbaceous plants, such as pea, vetch and soybean, through large woody lianas to 100-m tall tropical forest trees. The Leguminosae are characterized by the distinct legume fruit, which gives the family its name.

Legumes, together with cereals, have been fundamental to the development of modern agriculture. Legumes are second only to grasses in importance for human and animal dietary needs. The major crop grain legumes include bean, chickpea, cowpea, faba bean, lentil, pea and pigeonpea. The other two major crop legumes, soybean and peanut (not the subject of this conference), are predominantly oil-producing crops; however, both occupy substantial acreage worldwide and provide high-quality protein for food and feed purposes. In addition to these primary grain legumes, several minor or underutilized legumes also contribute to food and nutritional security, for example, grasspea, guar, horsegram, moth bean, mung bean and urd bean, which are primarily grown in the Indian subcontinent, China and South East Asia. Globally, during the 2005–2009 period legume crops were grown on an average of ~66 Mha, with a total production of ~55 Mt and a productivity of 1.0 t ha−1 (data accessed on Feb 4, 2011; http://www.faostat.fao.org). Over the last 45 years (1965–2009), the area, production and productivity of crop legumes remained roughly constant in the 1960s and 1970s, before increasing steadily until the present (Figs. 1 and 2).

Fig. 1

Global area (million ha) and production (million tons) of major grain legumes, shown at 5-year intervals for the period from 1965 to 2009 (data accessed on 4 Feb 2011; http://www.faostat.fao.org)

Fig. 2

Average productivity (kg ha−1) of the major grain legumes, shown at 5-year intervals for the period from 1965 to 2009 (data accessed on 4 Feb 2011; http://www.faostat.fao.org)

Grain legumes, rich in protein, carbohydrate, fiber and minerals, are characterized by a low glycaemic index (GI), and foods with low GI are generally associated with several long-term health benefits (http://www.extension.usu.edu). The isoflavones in legumes play a role in plant defense (Padmavati and Reddy 1999), root nodulation (Subramanian et al. 2007), and also in human health (Jung et al. 2000). The major anti-nutritional components in grain legumes include protease inhibitors, tannins and phytic acid. Excess consumption of grasspea grains frequently leads to the crippling disease neurolathyrism in humans, due to the presence of the neurotoxic amino acid β-N-oxalyl-L-α,β-diaminopropionic acid (β-ODAP) (Getahun et al. 1999; Geda et al. 2005). Faba bean seeds contain the glycosides vicine and convicine, which cause the disease favism in genetically susceptible humans. Low vicine-convicine accessions of Vicia faba give improved performance as poultry feed and are presently being tested for their ability to reduce the risk of favism in humans (Crépon et al. 2010).

The nitrogen-fixing capacity of legumes makes them an important component of cropping systems, where they enrich soil fertility and improve soil texture for other crops (Graham and Vance 2003), and their fodder is a valuable resource as animal feed. In addition, many legumes release soil-bound phosphate through their symbiotic relationships with mycorrhizal fungi (Sanders and Tinker 1973; Hayman 1983). Grain yield and quality in legumes are adversely affected by biotic (e.g., parasitic weeds, insects, weevils, nematodes, fungi, bacteria and viruses) and abiotic (imbalances in water, temperature, or mineral availability) stresses (Dwivedi et al. 2005).

The use of plant genetic resources (PGR) in crop improvement is one of the most sustainable ways to conserve valuable genetic resources for the future while simultaneously increasing agricultural production and food security. Key to successful crop improvement is a continued supply of genetic diversity in breeding programs, including new or improved variability for target traits. Collectively, ~1 M samples of grain legume genetic resources are preserved in ex situ genebanks globally (“Ex situ collection of cultivated and wild genetic resources” section). Managing and utilizing such large diversity in germplasm collections are great challenges for germplasm curators and crop breeders. This paper focuses on preserving and managing grain legume diversity in situ as well as ex situ; assesses the risk of genetic erosion/drift in ex situ collections; compares dataset compatibility and accessibility across genebanks; highlights the effect of climate change on the loss of biodiversity; provides greater insights into population structure and association mapping in germplasm collections; and discusses the role of wild relatives in expanding crop genepools for use in breeding and genomics applications in legumes.

Legumes germplasm holdings in national and international genebanks

Ex situ collection of cultivated and wild genetic resources

Currently, about 7.4 M accessions of PGR are maintained globally, of which only 25–30% are unique (2nd Report on the State of the World Plant Genetic Resources for Food and Agriculture, 2009, referred to hereafter as SWPGRFA 2009). Legumes constitute the second largest group (~15% of all the accessions) after cereals. Collectively, CGIAR (Consultative Group on International Agricultural Research) centers hold 0.741 M accessions of 3,446 species from 612 different genera. The grain legume germplasm in CGIAR genebanks consists of 0.146 M samples, predominantly cultivated types (Table 1). CGIAR centers such as CIAT (Centro Internacional de Agricultura Tropical), ICARDA (International Center for Agricultural Research in the Dry Areas), ICRISAT (International Crops Research Institute for the Semi-Arid Tropics) and IITA (International Institute of Tropical Agriculture) are the custodians of the largest collections of bean, chickpea, cowpea, faba bean, lentil and pigeonpea germplasm, while the Australian genebank (ATFCC, Australian Temperate Field Crops Collection) has the largest collection of pea germplasm (SWPGRFA 2009). Other genebanks with sizable collections of legume germplasm include the Leibniz Institute of Plant Genetics and Crop Plant Research, Germany (bean, faba bean and pea); the National Bureau of Plant Genetic Resources (NBPGR), India (chickpea and pigeonpea); the S9 (Southern Regional Plant Introduction Station, Griffin, Georgia) and W6 (Western Regional Plant Introduction Station, Pullman, Washington) genebanks in the USA (bean, chickpea, lentil and pea); and the N.I. Vavilov Research Institute of Plant Industry, Russia (lentil and pea) (SWPGRFA 2009). The NBPGR genebank also has a substantial collection of cluster bean, cowpea, French bean, grasspea, horsegram, lablab bean, lentil, moth bean, mung bean, pea, rice bean and urd bean.
In addition, 81,985 unique accessions of bean, chickpea, cowpea, faba bean, lentil, pigeonpea and soybean have so far been deposited for safety duplication at the Svalbard Global Seed Vault, Norway, and genebanks have committed to placing further unique accessions, including those of legumes, in this vault in a phased manner (www.croptrust.org).

Table 1 Cultivated, weedy and wild relatives of bean, chickpea, cowpea, faba bean, grasspea, lentil, pea and pigeonpea germplasm collections preserved in CGIAR genebanks (accessed on 27 Jan 2011; http://singer.cgiar.org/)

In spite of the large number of collections maintained ex situ in genebanks globally, there are still important collection gaps that must be addressed in chickpea, common bean, faba bean, grasspea, lentil and pigeonpea (http://gisweb.ciat.cgiar.org/GapAnalysis/; Heywood et al. 2007; Maxted et al. 2008; Zong et al. 2008a, b; Mikic et al. 2009) before these priceless genetic resources are lost forever. With the anticipated climate change-associated increase in the frequency of drought and temperature extremes in agricultural production systems (IPCC 2008), collecting pre-adapted germplasm from areas exposed to stressful climates will become a priority (Nelson et al. 2010).

Crop wild relatives (CWR) are an important source of genes for breeding (Dwivedi et al. 2008; Maxted and Kell 2009). Unlike cultivated germplasm, CWR are difficult to conserve ex situ because of their specific agronomic needs and their tendency for pod dehiscence, seed dormancy, seed shattering, high variability in flowering and seed production, and the rhizomatous nature of some species. Accordingly, there is global interest in in situ conservation of CWR in protected areas, the number of which grew from ~56,000 in 1996 to ~70,000 in 2007, with the associated area increasing from 13 to ~17 M km2. Countries such as India, Iran, Iraq, Israel, Japan, Jordan, Lebanon, Mexico, Slovak Republic, Syria and Turkey have initiated programs to establish in situ conservation of CWR of food crops including legumes (Meilleur and Hodgkin 2004). High priority species for in situ CWR conservation include Pisum abyssinicum and P. sativum in pea, and Vicia faba subsp. paucijuga, V. galilaea, V. hyaeniscyamus and V. kalakhensis in Vicia spp. The suggested in situ reserves for pea include Cyprus, Ethiopia, the Syrian Arab Republic, Turkey and Yemen, with the last two countries also considered important for Vicia spp. (SWPGRFA 2009). Maxted et al. (2008) suggested that in situ genetic reserves be established for the conservation of several African Vigna species at the southern tip of Lake Tanganyika, in the coastal area of Sierra Leone and between Lake Victoria and the other great lakes, and identified priority countries in Africa for targeted collection.

Genetic stocks and mutant collections

The development of reference collections of genetic stocks for single or limited combinations of characters is a relatively recent activity, dating back to the late 1800s when there was strong interest in novel forms in vegetables. The trend was further stimulated in the early 1900s following the rediscovery of Mendel’s work on inheritance. One of the earliest formal collections was developed by the famous French vegetable breeding house of Vilmorin-Andrieu et Cie. The collection lists 21 pairs of cultivated pea (Pisum sativum L.) lines with contrasting characters covering plant form, foliage, flowers, pods and seeds, which were the subject of genetic investigation and were held within a collection of 550 cultivars (de Vilmorin 1913). Numerous other working collections came into existence around the same period, and some coalesced into larger holdings that have unbroken continuity to collections of the present day.

The development of methods of inducing mutants through either chemical or radiation treatment became widespread as a way of accelerating mutation rates to create novel genetic variation for selection. Induced mutagenesis was widely adopted as a breeding approach in many legume crops from the 1940s onwards and is still a primary breeding strategy in many programmes today. A recent study by Kharkwal et al. (2010) highlighted the importance of induced mutants in legume improvement programs and reported the release of more than 450 improved mutant varieties belonging to 29 species. While there is a considerable body of literature on mutants that have been identified and used in crop improvement and scientific studies, a significant proportion of them fail to become formally registered in long-term ex situ collections, and the consequence is that many such lines are now sadly unavailable for use and have ultimately been lost. Where they have become registered, they form an invaluable reference resource of verified new mutation events and support the development of allelic series for future generations. These categories of stocks represent a form of common currency of varying denominations that underpins genetic advancement and exchange between breeders, geneticists, developmental biologists, pathologists and biochemists, and that goes hand in hand with the advancement in basic knowledge of the underlying genetics of a crop.

Genetic (translocations and inversions, deletions, multiple marked stocks, RILs, NILs, doubled haploids) and mutant (spontaneous and induced, transposon tagged populations, TILLING populations) stocks, by their very nature, frequently have a higher maintenance requirement. Type line specimens for mutations are often less vigorous than other types of germplasm, which makes them unsuitable for field regeneration, so that they can only be grown under glasshouse or controlled environment conditions. In addition, they may be of low fertility or even sterile, while others are lethal as recessive mutants and so have to be maintained in a heterozygous state. Some mutations may be genetically unstable and require cytological or marker verification, both of which are more costly and require greater expertise to maintain (Goodman 1990). Some of the major Pisum mutant collections are those of the John Innes Collection, Norwich, UK (575; http://www.jic.ac.uk/GERMPLAS/pisum/index.htm), the Institute of Plant Genetic Resources, Plovdiv, Bulgaria (122; http://www.genebank.hit.bg), and Dijon, France (93 symbiotic mutants, 26 genes). INRA-Dijon, France, also holds a collection of 30 faba bean mutants (http://195.220.91.17/legumbase/index.php?mode=96&doc=1), with mutant phenotypes including male sterility, root nodulation, seed composition, closed-flower and determinate growth (Duc 1997). Two of the model legume mutant resources currently available are the TILLING populations of Lotus japonicus (Perry et al. 2003; http://www.revgenuk.jic.ac.uk) and the DE-TILLING population of Medicago truncatula (Rogers et al. 2009). TILLING resources in grain legume crop species include common bean (Porch et al. 2009), chickpea (Muehlbauer and Rajesh 2008), groundnut (Ramos et al. 2008), and pea (1840 phenotypes) (Le Signor et al. 2009; http://www.urgv.evry.inra.fr/UTILLdb).

Managing legumes germplasm in genebanks

Conservation, characterization, evaluation, regeneration, distribution and documentation

Conserved plant genetic resources are essential to meet the current and future needs of crop improvement programs. The management of genetic resources includes (i) regenerating and conserving already collected genetic resources, (ii) enriching the genetic resources through collection of new germplasm and creation of new genetic variability, (iii) characterizing, evaluating, documenting and assessing the pattern of genetic diversity to identify gaps in the collection, (iv) assessing the impact of plant genetic resources in crop breeding and (v) promoting and raising awareness of these resources.

Ex situ seed storage, in the form of active (medium-term) and base (long-term) collections, is the most convenient, cost-effective and widely used method of conservation. Active collections are kept under conditions that ensure that accession viability remains above 65% for 10–20 years. Different combinations of storage temperature and moisture content can provide this longevity (IPGRI 1996). Base collections are maintained at −20°C to ensure long-term viability of seed materials, often for more than 50 years (FAO/IPGRI 1994). Periodic monitoring of viability and timely regeneration of the materials are essential parts of ex situ conservation, and vary according to the crop species and its reproductive system (Breese 1989).
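The monitoring rule described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the 65% floor is the figure quoted in the text for active collections, whereas the accession identifiers, seed counts and function names are hypothetical and do not correspond to any genebank information system.

```python
from dataclasses import dataclass

# Viability floor for active collections, as quoted in the text (65%).
VIABILITY_THRESHOLD = 0.65


@dataclass
class GerminationTest:
    """Result of one germination test on a stored accession (hypothetical record)."""
    accession_id: str
    seeds_tested: int
    seeds_germinated: int

    def __post_init__(self):
        if self.seeds_tested <= 0:
            raise ValueError("seeds_tested must be positive")
        if not 0 <= self.seeds_germinated <= self.seeds_tested:
            raise ValueError("seeds_germinated must lie between 0 and seeds_tested")

    @property
    def viability(self) -> float:
        """Proportion of tested seeds that germinated."""
        return self.seeds_germinated / self.seeds_tested


def needs_regeneration(test, threshold=VIABILITY_THRESHOLD):
    """True when measured viability has fallen below the threshold."""
    return test.viability < threshold


def flag_for_regeneration(tests, threshold=VIABILITY_THRESHOLD):
    """Return the identifiers of accessions whose viability is below the threshold."""
    return [t.accession_id for t in tests if needs_regeneration(t, threshold)]
```

In practice the threshold and the monitoring interval would be set per crop and per reproductive system, which is why the threshold is a parameter here rather than a fixed rule.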

The conserved germplasm is characterized for distinct morpho-agronomic traits, using a set of crop-specific descriptors. Approximately 78% of the 146,837 grain legume germplasm accessions held in CGIAR centers (Table 1) have been characterized for morphological traits, including resistance to biotic and abiotic stresses; however, only a small percentage of these collections have been characterized for biochemical traits (SWPGRFA 2009). Clearly, more emphasis and funding are needed to generate data on biochemical characteristics and on the response to biotic and abiotic stresses in national and regional collections.

Various systems are in place to retrieve information on many aspects of germplasm collections, such as the Genebank Information Management System (GIMS) at ICRISAT, GRIN-Global used at USDA ARS (Cyr et al. 2009) and SINGER, which serves the CGIAR centers at the system-wide level. The European-based EURISCO system provides information about the ex situ plant collections maintained in Europe (http://eurisco.ecpgr.org/). Information on the pattern of seed distribution, such as that reported for common bean germplasm by CIAT, provides a valuable indicator of the use of plant genetic resources (Gaiji and Debouck 2009).

Developing global conservation strategies

The International Treaty on Plant Genetic Resources for Food and Agriculture (ITPGRFA), which entered into force in 2004 and has so far been ratified by 123 countries (www.planttreaty.org), promotes the conservation and utilization of genetic resources of 64 crops (Annex 1 list) under a multilateral system (MLS). Seven food legumes (bean, chickpea, faba bean, grasspea, lentil, pea and pigeonpea) and 15 genera of forage legumes are listed amongst the 64 selected crops. Within the MLS, access to PGR is facilitated for research and breeding in food and agriculture. The MLS promotes full use of the material and the fair and equitable sharing of the benefits resulting from such use, whether as commercial benefit, or access to scientific information, technology transfer or improved genetic material. The MLS implicitly requires the development of an efficient and sustainable global system which will ensure the long-term conservation and availability of PGRFA (GCDT 2007).

The Global Crop Diversity Trust (GCDT) supports the development of global crop (including some legumes) and regional strategies for ex situ conservation and utilization of crop diversity. These strategies represent a major undertaking in the field of PGR, mobilizing experts to plan collaboratively for the more efficient and effective conservation and use of crop diversity. The themes covered by these strategies include regeneration, crop wild relatives, collecting, crop descriptors, information systems, user priorities, new technologies and research, and challenges to building a strategy for rational conservation (Khoury et al. 2010). During 2005–2007 the GCDT was instrumental in initiating the development of regional strategies for the long-term conservation and use of PGR for Asia and the Caucasus; the Americas; West Asia and North and East Africa; West and Central Africa; South, Southeast and East Asia; Europe; and the Pacific (http://www.croptrust.org). All strategies involved surveys on genetic resources conservation and use, and regional expert consultations. Following these consultations, key ex situ collections of globally important crops were identified and a list of priority crops per continent was established. Depending on the region, several food legumes ranked high in the priority list, among them bean, chickpea, cowpea, faba bean, lentil, pigeonpea and soybean. All these legumes except soybean are part of the Annex 1 list of the ITPGRFA. It should be noted that the situation for pea has yet to be assessed, and pea is also likely to be ranked as high priority.

As a follow-up to the regional strategies, more specific, crop-based conservation strategies were developed. Like the regional strategies, the crop-specific strategies were based on surveys and expert meetings. Although not exhaustive, the surveys provided valuable information on who holds, maintains and distributes what, where and how, and drew together the common issues faced by the PGR holders. These crop conservation strategies represent important baseline assessments which the specific crop communities will need to take forward. Within grain legumes, strategies have already been published for bean, chickpea, cowpea, faba bean, grasspea, lentil and pigeonpea (http://www.croptrust.org). The pea germplasm community is in discussion with the Trust to prepare a comparable crop strategy so that pea does not lag too far behind.

All strategies underlined the urgent need to maintain at least one duplicate of each unique accession in a storage facility of international standard. The crop-based strategies also emphasized the urgent need to regenerate unique accessions and to develop crop-specific regeneration guidelines. This is to prevent the irreversible loss of unique samples and to maintain their genetic integrity during regeneration (avoiding genetic drift, inbreeding depression etc.). The Trust is presently supporting the regeneration of several collections of food legumes worldwide. The sine qua non for grant attribution is the uniqueness of the accessions and the transfer via SMTA (Standard Material Transfer Agreement) of the newly regenerated accessions into a genebank of international standard. Such initiatives allow safe storage of unique samples while placing them in the MLS, thus making them accessible to all. All conservation strategies report on the need to develop an inventory/catalogue of existing crop collections maintained ex situ, including their wild relatives. In addition to facilitating germplasm selection and access, these global portals will help identify global ecogeographical gaps, the degree of coverage of the genepool and the level of duplication amongst collections.

In order to assure the regeneration and duplication of high quality samples (genetic integrity, high germination), regeneration guidelines have been developed jointly by the Trust and the System Wide Genetic Resources program (SGRP) of the CGIAR Centers. CIAT, ICARDA, ICRISAT, IITA and Bioversity International have developed manuals for chickpea, cowpea, faba bean, grasspea, lentil and pigeonpea, which are available online (http://cropgenebank.sgrp.cgiar.org). At the crop level, the development of a registry is a recurrent recommendation of the conservation strategies. ICRISAT is presently developing such a registry for chickpea in association with ICARDA.

Various non-crop-specific regional and global information portals already exist, such as EURISCO, SINGER and GRIN. Another global portal, GIGA, is presently being developed by Bioversity International (Bioversity), in partnership with the Trust and the Secretariat of the ITPGRFA. This new tool will provide comprehensive information on germplasm at the accession level by utilizing innovative functionality. Whatever the size and function of the portal, its quality will rely on the availability of good passport, characterization, evaluation and metadata. The geo-referencing of existing accessions, as well as their characterization and evaluation, remains an important task ahead for many collections. In this regard, the development of a plant ontology, i.e., standardized definitions and names for the vocabulary terms, is much needed.

Genetic erosion in ex situ germplasm collections

As crop genetic resources continue to erode worldwide, the need to maintain germplasm is ongoing and urgent. Since the advent of agriculture ~7000 species have been used as crops. However, today only 150 plant species are under extensive global cultivation, with 12 crop species providing 80% of the world’s food (Motley et al. 2006). Although modern agriculture feeds more people on less land than ever before, it also results in high genetic uniformity by planting large areas of the same species with genetically similar cultivars, making entire crops highly vulnerable to pests, diseases and abiotic stresses (Motley et al. 2006). Thus uniform high-yielding cultivars are displacing traditional local cultivars, a process known as genetic erosion (Breese 1989). There are two approaches to the conservation of plant genetic resources, namely in situ and ex situ. While in situ conservation involves maintenance in natural habitats, ex situ conservation takes place outside them, for example in seed banks, field genebanks and botanical gardens. The danger of landrace diversity vanishing from cultivation was recognized very early in the history of scientific breeding (von Proskowetz 1890; Schindler 1890). To avoid such genotype extinction and enable long-term ex situ conservation, the germplasm collection concept was proposed by Baur (1914) and made a reality by Vavilov in 1920–1940.

To maintain the integrity and functionality of stored seed samples, the long-term conservation of the entire genetic spectrum is required, together with maintenance of sufficient seed for users (“Conservation, characterization, evaluation, regeneration, distribution and documentation” section). Although the periodical regeneration of ex situ collections is performed according to accepted standards (Sackville and Chorlton 1997), there is a risk that small population sizes together with unequal reproduction of genotypes will lead to a decrease or even loss of diversity (Steiner et al. 1997). Moreover, the process of sexual reproduction plays a significant role (Breese 1989), especially in cross-pollinating species. For example, measurements of Vicia faba inter- and intraplot gene flow and pollen dispersion have shown that considerable heterogeneity exists and that gene flow is location-, isolation zone- and genotype-dependent (Suso et al. 2006). The number of generations required for both rejuvenation and multiplication should be kept as small as possible, especially for species with orthodox seed (Spagnoletti-Zeuli et al. 1995; Reedy et al. 1995). In regeneration, the total number of individuals, the nominal population size (N), is less important than the average number of actively breeding individuals, the effective population size (Ne). Thus as many seeds as possible should be collected during sampling or the first regeneration to provide adequate seed for immediate use and long-term storage (Penteado et al. 1996). In the case of already existing samples, effective methods to monitor genetic composition before and after rejuvenation and multiplication should be regularly employed. Furthermore, since the type of breeding system is of paramount importance, germplasm curators must be thoroughly familiar with the breeding systems of their material.
Complete self-pollination, the most extreme form of inbreeding, results within several generations in the rapid fixation of allelic combinations into homozygous genotypes, which are then preserved intact. Landraces and wild populations are usually genetically heterogeneous and therefore have complex genetic structures, even when the degree of self-pollination is virtually complete. Furthermore, especially in the case of wild species, features like seed dormancy, seed shattering, and high variability in flowering time and seed production play an important role in changing the relative frequency of alleles in the population. The most effective way of preserving the gene and genotype constitution of highly self-pollinated populations is to maintain the accessions as subsets of homozygous inbred lines (Hirano et al. 2009), so that the chances of allele or genotype loss during regeneration are minimized.
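How quickly selfing fixes genotypes can be illustrated with the textbook single-locus result that, under complete self-pollination and in the absence of selection, the expected heterozygosity halves in every generation. The short Python sketch below simply evaluates that relationship; the function names and the 1% residual cut-off are illustrative choices, not values taken from the studies cited here.

```python
def heterozygosity_after_selfing(h0, generations):
    """Expected heterozygosity after a number of generations of complete selfing.

    Uses H_t = H_0 * (1/2)**t, which assumes 100% self-pollination,
    no selection and no mutation.
    """
    if not 0.0 <= h0 <= 1.0:
        raise ValueError("h0 must be a proportion between 0 and 1")
    if generations < 0:
        raise ValueError("generations must be non-negative")
    return h0 * 0.5 ** generations


def generations_until_residual(h0, residual=0.01):
    """Smallest number of selfing generations after which H_t <= residual."""
    if residual <= 0:
        raise ValueError("residual must be positive")
    h, t = h0, 0
    while h > residual:
        h *= 0.5
        t += 1
    return t
```

For instance, starting from a heterozygosity of 0.5, six generations of complete selfing bring the expected value below 1%, which is consistent with the statement that fixation occurs within several generations.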

Only a few investigations of genetic integrity have been performed on long-term conserved accessions undergoing periodical regeneration, and these detected changes in allelic frequencies in various species (Enjalbert et al. 1999; Parzies et al. 2000; Le Clerc et al. 2003; van Hintum et al. 2007; Smýkal et al. 2008; Soengas et al. 2009; Cieslarová et al. 2011). These findings imply that regeneration protocols should be improved to accommodate more numerous samples (larger population, Ne) and that the composition of the collection should be continuously monitored to prevent the risk of genetic diversity loss.
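The case for a larger Ne can be made concrete with the standard drift expectation for an idealized, randomly mating diploid population, in which the expected heterozygosity retained after t regeneration cycles is (1 − 1/(2Ne))^t, and in which the effective size over cycles of unequal size is the harmonic mean of the per-cycle sizes. The Python sketch below evaluates these two textbook relationships only; it is not a model of any particular regeneration protocol, and the population sizes used with it are hypothetical.

```python
def heterozygosity_retained(ne, cycles):
    """Expected fraction of heterozygosity retained after regeneration cycles.

    Idealized random-mating expectation: (1 - 1/(2*Ne))**t.
    Selfing species and overlapping generations are not modelled.
    """
    if ne <= 0:
        raise ValueError("ne must be positive")
    if cycles < 0:
        raise ValueError("cycles must be non-negative")
    return (1.0 - 1.0 / (2.0 * ne)) ** cycles


def harmonic_mean_ne(sizes):
    """Long-term effective size when Ne differs between regeneration cycles.

    The harmonic mean is dominated by the smallest cycle, so one
    bottleneck regeneration cannot be compensated by later large ones.
    """
    if not sizes or any(n <= 0 for n in sizes):
        raise ValueError("sizes must be a non-empty sequence of positive numbers")
    return len(sizes) / sum(1.0 / n for n in sizes)
```

Under these assumptions, an accession regenerated once with Ne = 100 and once with Ne = 10 behaves, over the two cycles, like a population of roughly 18 breeding individuals, which illustrates why a single small regeneration can be so damaging.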

Datasets compatibility and comparison across various genebanks

The collections are the repositories of millions of years of natural selection and thousands of years of human selection, domestication and breeding, and can comprise several tens or even hundreds of thousands of accessions, totaling ~1 M samples of grain legume genetic resources (SWPGRFA 2009). Traditionally, classifications are made using morphological, agronomical and phytopathological descriptors, which are still the only legitimate marker type accepted by the International Union for the Protection of New Varieties of Plants (UPOV), together with known pedigree and passport data. This type of data is commonly found in national and international web-based germplasm catalogues.

In recent years the genetic structures of major grain legume germplasm collections have been investigated by various molecular marker approaches, ranging from protein to DNA polymorphism, including both hybridization-based (RFLP) and PCR-based (RAPD, AFLP, SSR, SNP and retrotransposon insertion polymorphism) markers (“Greater insights into the structure of the germplasm diversity and association mapping using genome-wide markers” section). Improvements in marker methods have been accompanied by refinements in computational methods that convert the original raw data into useful representations of diversity and genetic structure. Commonly used distance-based methods (Reif et al. 2005) have been challenged by model-based Bayesian approaches, which are more attractive and powerful because they incorporate probability and measures of support and can accommodate complex models and different variate types (Beaumont and Rannala 2004; Corander et al. 2007). However, after data processing, further use is limited, especially in the absence of cross-comparison between collections. A major challenge is therefore to integrate and analyze these different types of information. Efforts to make unprocessed data available to the research community in the form of open searchable databases are in progress. A key component of this process is the recording and storage of plant genotype and phenotype information in the form of an open database. The driving force is the accessing and sharing of data rather than the provision of analytical and statistical tools. This is because the major bottleneck to data integration and utilization is not statistical software but rather the difficulty of finding, extracting and managing the data, and the quality of the associated metadata. Crucial to this process are web access and long-term curation of the data supplied by users, issues similar to those associated with gene/sequence repositories.
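As an illustration of what a distance-based method computes from marker data, the Python sketch below implements Rogers' distance between two accessions from their allele frequencies: the average over loci of the square root of half the summed squared frequency differences. It is chosen here only as one familiar example of an allele-frequency distance and is not necessarily the measure used in the studies cited; the locus and allele names in the usage note are hypothetical.

```python
import math


def rogers_distance(freqs_a, freqs_b):
    """Rogers' distance between two accessions.

    freqs_a, freqs_b: dicts mapping locus name -> {allele: frequency}.
    Alleles absent from one accession are treated as frequency 0.
    Only loci scored in both accessions can be compared.
    """
    loci = sorted(set(freqs_a) & set(freqs_b))
    if not loci:
        raise ValueError("no loci in common")
    total = 0.0
    for locus in loci:
        pa, pb = freqs_a[locus], freqs_b[locus]
        alleles = set(pa) | set(pb)
        squared = sum((pa.get(al, 0.0) - pb.get(al, 0.0)) ** 2 for al in alleles)
        total += math.sqrt(0.5 * squared)
    return total / len(loci)


def distance_matrix(accessions):
    """Pairwise Rogers' distances for a dict of accession name -> frequencies."""
    names = sorted(accessions)
    return {
        (a, b): rogers_distance(accessions[a], accessions[b])
        for a in names
        for b in names
    }
```

Two accessions fixed for different alleles at every locus have a distance of 1, and identical accessions a distance of 0. A matrix of this kind is what clustering or ordination methods would take as input, and it can only be compared across genebanks if the same loci and allele naming are used, which is the interoperability problem raised above.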

The Global Crop Diversity Trust provides long-term funding for an integrated approach to genetic resources and foresees the genebanks as major players in a rational global system (CGIAR 2009). Although the legume community, in contrast to the world-wide rice and maize projects, is fragmented by species and resources, it can benefit from larger and more advanced projects such as the maize genome, Panzea (Zhao et al. 2006) and the International Rice Information System (IRIS) (McLaren et al. 2005). Among the objectives of all these platforms are shared, public, platform-independent domain models, ontologies and data formats that enable interoperability of data. It is time to establish virtual world-wide collections that combine suitable molecular platforms with robust morphological parameters to address population structure and allow better cross-comparison of results (Smýkal et al. 2009b), so that germplasm can be exploited effectively in crop improvement to meet future demands.

Biodiversity loss, especially crop wild relatives, due to climate change

The Intergovernmental Panel on Climate Change predicts that by 2100 the temperature will rise by 1.1–6.4°C due to global warming, which will have serious consequences for global agricultural and food production (IPCC 2007; Lobell et al. 2008). Associated with global warming is biodiversity loss, as organisms are no longer adapted to their changed environment (McLaughlin et al. 2002; Thomas et al. 2004; Biggs et al. 2008). Prompted by the loss of biodiversity and its impact on human well-being and the sustainability of ecosystem functioning, the Convention on Biological Diversity (CBD) in 2002 adopted a resolution to achieve, by 2010, a significant reduction in the current rate of biodiversity loss at the global, regional and national levels as a contribution to poverty alleviation and to the benefit of all life on earth (UNEP 2002). Globally, only a fraction of the total genetic variability that exists in crop wild relatives (CWR) has been preserved in ex situ genebanks. CWR are under threat from ecosystem instability due to climate change, from natural habitat destruction resulting from the increasing use of land for agriculture, urbanization and other infrastructure, and from the industrialization of agriculture and changes in pest and disease distribution/occurrence.

Jarvis et al. (2008) used current and projected future climate data for ~2055, and a climate envelope species distribution model, to predict the impact of climate change on the wild relatives of cowpea (Vigna), peanut (Arachis) and potato (Solanum). Their study revealed that climate change strongly affected all taxa, with an estimated 16–22% (depending on migration scenario) of these species predicted to go extinct and most species losing over 50% of their range and becoming highly fragmented. Arachis was the most affected group and Vigna the least affected. Likewise, the Western Ghats in southwestern India are very rich in Vigna and Cajanus (pigeonpea) spp. Of late, with changes in temperature and photoperiod coupled with other factors such as habitat destruction, populations of these wild species are declining alarmingly, which calls for a strategy for their immediate collection and conservation. Clearly, these observations suggest that there is an urgent need to identify and effectively conserve CWR that are at risk from climate change. This situation is to be addressed by a major 10-year global initiative to find, collect, catalogue and use grain legume CWR, recently announced by the GCDT in cooperation with the CGIAR centres and the Millennium Seed Bank, Kew. The legume crops that will be specifically covered are bean, faba bean, lentil, pea, chickpea, grasspea, cowpea and pigeonpea.

Establishing in situ conservation of CWR, close to their natural habitats, allows new variation to arise and species to adapt to gradual changes in environmental conditions and biotic interactions. This is in contrast to ex situ genebank conservation, where population evolution has effectively been truncated at the time of collection. In situ conservation will therefore facilitate the ongoing capture of new variation for use in crop improvement programs to develop climate-proof crops. It is in this context that several priority areas have been designated as “gene reserves” for in situ conservation of CWR, including those of legumes, in many countries (Heywood et al. 2007; “Ex situ collection of cultivated and wild genetic resources” section).

Strategic research to enhance the use of genetic resources in crop improvements

Forming core and mini core collections and genotype-based reference sets as resources for identifying new sources of variation

An important reason for the underutilization of germplasm in crop improvement programs is the lack of information on the performance of large numbers of accessions, particularly for traits of economic importance which display a great deal of genotype × environment interaction and require multilocation evaluation. The development of core collections (~10% of the accessions from the entire collection) has been suggested to facilitate the greater use of germplasm in crop improvement programs (Frankel 1984). A core collection is a subset of accessions that represents at least 70% of the genetic variation in the entire collection of a given species (Brown 1989). Where there are large numbers of accessions in the genebank, for example in chickpea and common bean, even a core collection could be unmanageably large, so a further reduction is warranted if the diversity range can be maintained. Upadhyaya and Ortiz (2001) suggested a mini core collection based on further sub-sampling (10% of the core or 1% of the entire collection) of species diversity. Initially, a representative core collection is developed using passport, characterization and evaluation data. In the second stage, the core collection is evaluated for various morpho-agronomic and quality traits to select a subset of 10% of its accessions to form a mini core collection. At both stages, standard clustering procedures are used to separate groups of similar accessions, combined with various statistical tests to identify the best representatives. Core and/or mini core collections have been reported in adzuki bean, chickpea, common bean, cowpea, hyacinth bean, lentil, mungbean, pea and pigeonpea (Bisht et al. 1998; Wang et al. 2001; Pengelli and Maass 2001; Upadhyaya and Ortiz 2001; Upadhyaya et al. 2001, 2006; Dwivedi et al. 2005; Mahalakshmi et al. 2007; Logozzo et al. 2007; Zong et al. 2008a; Pérez-Vega et al. 2009; Redden et al. 2009; Hamwieh et al. 2009). Such core collections are also being formed in different faba bean germplasm collections using passport and molecular data (Zong et al. 2009a; Duc et al. 2009). Accessions not included in core or mini core collections are maintained as reserve collections for deeper study of specific traits and gene variants.
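The two-stage sub-sampling described above can be illustrated with a minimal sketch. It assumes that each accession has already been assigned to a cluster, draws about 10% from every cluster, and keeps at least one accession per cluster. The accession names, cluster sizes and the function `sample_subset` are hypothetical. Random choice within a cluster is only a stand-in for the statistical selection of best representatives, and the same clusters are reused at the second stage, whereas in practice the core would be re-clustered on its evaluation data.

```python
import random
from collections import defaultdict


def sample_subset(accessions, cluster_of, fraction, seed=0):
    """Draw roughly `fraction` of the accessions, proportionally from each cluster.

    Every cluster contributes at least one accession so that no group is lost.
    Random choice within a cluster is a placeholder for the statistical
    selection of best representatives (an assumption of this sketch).
    """
    rng = random.Random(seed)
    groups = defaultdict(list)
    for acc in accessions:
        groups[cluster_of[acc]].append(acc)

    subset = []
    for cluster in sorted(groups):
        members = sorted(groups[cluster])
        n_pick = max(1, round(len(members) * fraction))
        subset.extend(rng.sample(members, n_pick))
    return subset


# Hypothetical collection: 1,000 accessions in 5 clusters of unequal size.
accessions = [f"ACC{i:04d}" for i in range(1000)]
sizes = [400, 300, 150, 100, 50]
cluster_of = {}
start = 0
for cluster_id, size in enumerate(sizes):
    for acc in accessions[start:start + size]:
        cluster_of[acc] = cluster_id
    start += size

core = sample_subset(accessions, cluster_of, 0.10)  # ~10% of the collection
mini_core = sample_subset(core, cluster_of, 0.10)   # ~10% of core, ~1% of total
print(len(core), len(mini_core))
```

Keeping at least one accession per cluster means the mini core can slightly exceed 1% of the collection when some clusters are small; this trades a marginally larger subset for retaining every group.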

Developments in genomic science, especially in the last 10 years, have provided the scientific community with a tremendous opportunity to dissect population structure and diversity, to form genotype-based reference sets and to identify genetically diverse germplasm with beneficial traits for use in crop breeding. To this end, researchers first developed global composite collections (using passport, characterization and evaluation data of the entire collection), which were genotyped to form reference sets (~10% of the composite collection) representing ~80% of the allelic diversity of the composite collection. Such reference sets are available in chickpea (Upadhyaya et al. 2008a), pigeonpea (Upadhyaya et al. 2008b), and lentil (http://S2.generationcp.org). Thus, core and mini core collections and reference sets together provide the basis for association mapping, linking genome-wide next generation molecular markers with agriculturally beneficial traits.
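How much allelic diversity a candidate reference set retains can be checked by counting the alleles it carries at each locus against those in the whole composite collection. The sketch below is illustrative only: the genotypes are invented SSR allele sizes, and the function names are hypothetical rather than taken from the cited studies.

```python
def alleles_by_locus(genotypes, accessions):
    """Collect the set of alleles seen at each locus among the given accessions."""
    seen = {}
    for acc in accessions:
        for locus, alleles in genotypes[acc].items():
            seen.setdefault(locus, set()).update(alleles)
    return seen


def allelic_capture(genotypes, subset):
    """Fraction of all (locus, allele) pairs in the collection present in `subset`."""
    total = alleles_by_locus(genotypes, genotypes.keys())
    kept = alleles_by_locus(genotypes, subset)
    n_total = sum(len(a) for a in total.values())
    n_kept = sum(len(kept.get(locus, set()) & a) for locus, a in total.items())
    return n_kept / n_total


# Hypothetical SSR allele sizes (bp) at two loci for five accessions.
genotypes = {
    "A1": {"ssr1": {150, 152}, "ssr2": {200}},
    "A2": {"ssr1": {150}, "ssr2": {200, 204}},
    "A3": {"ssr1": {156}, "ssr2": {208}},
    "A4": {"ssr1": {152}, "ssr2": {204}},
    "A5": {"ssr1": {158}, "ssr2": {200}},
}
reference_set = ["A1", "A3"]
print(f"{allelic_capture(genotypes, reference_set):.0%} of alleles captured")
```

In this toy collection there are seven distinct alleles across the two loci, and the two-accession reference set carries five of them.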

Core and/or mini core collections have been used to identify new sources of variation for resistance to biotic and abiotic stresses and for agronomic and/or seed quality traits in chickpea, common bean, lentil, pea, and pigeonpea (Dwivedi et al. 2005; Coyne et al. 2005; Brick et al. 2006; Pande et al. 2006; Smýkal et al. 2008; Vargas-Vázquez et al. 2008; Upadhyaya et al. 2009; Porter et al. 2009) for use in breeding programs.

Other strategies based on geographic descriptors (passport data)

In response to environmental selection pressure, plant populations vary phenotypically across their distribution range, forming ecotypes with distinct locally-adapted trait combinations (Allard 1988). Because of their adaptive nature, these trait combinations may be very valuable in plant breeding, especially when elite genepools become too narrow to allow for further productivity gains or to deal with challenges such as new diseases, or the expansion of the crop into new agro-ecosystems (Tanksley and McCouch 1997). The world’s germplasm holdings have great potential as a resource of locally adapted ecotypes because collections are often extensive and well described, and in its evolution the crop is likely to have explored new habitats. The key to unlocking the adaptive potential of our germplasm collections is to accurately describe the principal selection pressures operating at the local level, allowing the user to select germplasm subsets that highlight the stress of interest. This is facilitated by characterizing germplasm collection site habitats, based on the assumption that the habitat at the point of collection is responsible for the evolution of the population. Note that habitat must be defined very widely here, to include both the biophysical environment and the human selection pressure imposed by the demands of the farming system, market or end users.

Description of key selection pressures rests entirely on habitat characterization: the better we know our collection sites, the more accurate our definition of selection pressures is likely to be. Unfortunately, the quality of the passport data in many collections is very variable, and therefore this approach has not been widely applied until recently. However, with the advent of user-friendly, freely-available GIS software and high resolution descriptive data surfaces (Hijmans et al. 2001, 2005; New et al. 2002), it is feasible to extract a diversity of passport data, particularly pertaining to site climate, as long as the collection site coordinates are available or can be estimated from the site description notes. These approaches have been applied to habitat characterization in bread wheat, under the acronym FIGS (Focused Identification of Germplasm Strategy) (Street et al. 2008), chickpea (Berger 2007; Berger and Turner 2007) and lupin collections (Berger et al. 2008a, b), using variations on the following procedure:

  1. Geo-reference/ground-truth collection site coordinates using Google Earth or MS Encarta by comparing site description notes with the screen output. Virtual collection site altitude and distance from the nearest town or geographical feature can be checked against the site description. With well described collections it is possible to follow the route precisely, using road numbers and geographical features, and correct site coordinate estimate errors which are relatively common in collection missions which pre-date the use of GPS systems. For example, Berger et al. (2008b) checked 1,763 collection sites of the Australian Lupin collection using the procedure outlined above and noted that 938 sites were correct, 605 were incorrect, while 220 localities could not be found using the site description notes. While geo-referencing large collections is a slow process, it has recently been simplified with the introduction of a Web-based automated toolkit (http://bg.berkeley.edu/latest/) that converts textual locality descriptions into site coordinates that can be validated by mapping (Guralnick et al. 2006).

  2. Extract site-specific climate data using site coordinates. Data is freely available at different levels of resolution:

     (a) 30 s (~1 km grid): Altitudes, monthly mean minimum and maximum temperature, and precipitation (http://www.worldclim.org/) (Hijmans et al. 2005).

     (b) 10 min (~12 km grid): Monthly mean number of frost days, rain days, precipitation coefficients of variance, relative humidity, sun hours, wind speed (http://www.cru.uea.ac.uk/cru/data/hrg/) (New et al. 2002).

     (c) additional spatial data surfaces are available from DIVA-GIS (http://www.diva-gis.org/) and the CGIAR Consortium for Spatial Information (http://www.csi.cgiar.org/index.asp).

  3. Define when the crop typically is sown, flowers and matures at each collection site using climate data, seasonal rules and breeder feedback (Berger 2007; Berger et al. 2008a).

  4. Calculate crop-specific bioclimatic variables (i.e., seasonal, vegetative or reproductive phase rainfall etc.) using definitions in 3.

  5. Characterize habitats holistically using multivariate techniques visualized graphically and by mapping (Berger 2007; Berger and Turner 2007).

  6. Choose contrasting habitats that highlight the stress of interest to form germplasm subsets for evaluation.
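Steps 3, 4 and 6 of this procedure can be sketched in simplified form as follows. The sketch assumes that monthly climate normals have already been extracted for each site (step 2) and that sowing, flowering and maturity months are supplied directly rather than derived from seasonal rules. The site names, climate values and function names are all hypothetical, and ranking on a single variable stands in for the multivariate characterization of step 5.

```python
def months_from(start, stop):
    """Calendar months (1-12) from `start` up to, but not including, `stop`."""
    months = []
    m = start
    while m != stop:
        months.append(m)
        m = m % 12 + 1  # wrap from December (12) to January (1)
    return months


def bioclim(site):
    """Crop-specific variables from monthly normals and phenology months."""
    veg = months_from(site["sow"], site["flower"])                      # sowing to pre-flowering
    rep = months_from(site["flower"], site["mature"]) + [site["mature"]]  # flowering to maturity
    f = site["flower"] - 1
    return {
        "veg_rain": sum(site["rain"][m - 1] for m in veg),
        "rep_rain": sum(site["rain"][m - 1] for m in rep),
        "flower_tmean": (site["tmin"][f] + site["tmax"][f]) / 2,
    }


def contrasting_subsets(sites, variable, k):
    """Return the k sites with the lowest and the k with the highest value."""
    ranked = sorted(sites, key=lambda name: bioclim(sites[name])[variable])
    return ranked[:k], ranked[-k:]


# Hypothetical monthly normals (Jan..Dec): rainfall in mm, temperatures in deg C.
sites = {
    "S1": {"sow": 11, "flower": 3, "mature": 5,
           "rain": [60, 50, 40, 20, 10, 0, 0, 0, 5, 20, 45, 60],
           "tmin": [3, 4, 6, 9, 13, 17, 20, 20, 16, 12, 7, 4],
           "tmax": [12, 14, 17, 21, 26, 31, 34, 34, 30, 24, 17, 13]},
    "S2": {"sow": 10, "flower": 2, "mature": 4,
           "rain": [10, 10, 5, 5, 20, 100, 250, 220, 150, 60, 10, 5],
           "tmin": [14, 16, 20, 24, 26, 25, 24, 23, 23, 21, 17, 14],
           "tmax": [28, 31, 35, 38, 39, 34, 30, 29, 30, 31, 29, 28]},
    "S3": {"sow": 4, "flower": 6, "mature": 8,
           "rain": [30, 30, 40, 50, 60, 50, 30, 25, 30, 35, 35, 30],
           "tmin": [-6, -5, -1, 4, 8, 12, 15, 15, 11, 5, 0, -4],
           "tmax": [3, 5, 10, 16, 21, 26, 30, 30, 25, 18, 10, 5]},
}

cool, warm = contrasting_subsets(sites, "flower_tmean", k=1)
print("coolest flowering habitat:", cool, "warmest:", warm)
```

Ranking on mean temperature in the flowering month mirrors the kind of contrast used when searching for reproductive chilling tolerance; replacing `"flower_tmean"` with `"rep_rain"` would instead contrast habitats by reproductive-phase rainfall.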

The habitat characterization outlined above is largely based on seasonal climate, which is an excellent filter for germplasm collections because of its dominant role in selecting for specific adaptation. In annual plants, time-course studies of artificial populations grown over rainfall and temperature clines demonstrate very strong selection pressure within 3–10 generations (Goldringer et al. 2006; Nichols et al. 2009). In legumes there is considerable evidence for environmental selection pressure on phenology. Habitats which impose high terminal drought stress select for early flowering and short life cycles as a drought escape mechanism, whereas cool, high rainfall habitats select for delayed phenology, allowing for more biomass production and supporting a higher reproductive effort. This has been demonstrated in a variety of wild and domesticated Mediterranean annuals (Ehrman and Cocks 1996), including yellow lupin (Lupinus luteus L.) (Berger et al. 2008a), Trifolium glomeratum L. (Bennett 1997), T. subterraneum L. (Piano et al. 1996), Cicer judaicum Boiss (Ben-David et al. 2010) and chickpea (C. arietinum L.) (Berger et al. 2004, 2006, 2011). These studies confirm that habitat characterization is a very useful ecophysiological tool for exploring the mechanisms underlying specific adaptation. In chickpea, habitat characterization is being used in the search for reproductive chilling tolerance by contrasting germplasm collected from warm and cool flowering temperature habitats (Berger 2007).

The studies cited above are examples of climatic selection pressure acting directly on plant populations. However, climatic selection pressure can also act indirectly by influencing the likelihood of pests and diseases, which in turn impose selection pressure on plant populations. This approach has been used by the FIGS group (http://www.figstraitmine.com/) to identify resistance in bread wheat to Sunn pest (Eurygaster integriceps Puton) (El Bouhssini et al. 2009), Russian wheat aphid (Diuraphis noxia Kurd) (Street et al. 2008) and powdery mildew (Blumeria graminis (DC.) Speer f. sp. tritici) (Kaur et al. 2008).

Clearly, by characterizing the habitats sampled in our germplasm collections we increase both their value and utility for screening for traits of interest under direct or indirect selection, and undertaking ecophysiological research into plant adaptation. As data surfaces become increasingly precise and diverse in terms of data capture, it behoves germplasm curators and end users to rise to the challenge of fully exploiting the potential of their collections by ensuring the accuracy of their site coordinate data, and using habitat characterization to generate appropriate germplasm subsets for evaluation.

Wild relatives as a source of novel variation to broaden the legume cultigen genepool

Domestication of crops was accompanied by a bottleneck that reduced genetic diversity (Tanksley and McCouch 1997). Wild relatives are an important source for widening the genetic base of cultivated crops (Dwivedi et al. 2008). The development of pre-breeding lines has long been advocated as a means to facilitate the transfer of genes from wild species. Resistance to legume pod borer in pigeonpea (Cajanus cajan) has been introgressed from C. acutifolius and C. scarabaeoides (Mallikarjuna et al. 2007). The interspecific progenies involving C. platycarpus, a species from the tertiary gene pool possessing several desirable agronomic traits, show a range of novel traits such as resistance to phytophthora blight, fusarium wilt, pigeonpea sterility mosaic virus and legume pod borer, in addition to extreme variation for plant type (Mallikarjuna et al. 2006). C. scarabaeoides, C. cajanifolius and C. acutifolius have also been exploited to develop cytoplasmic male sterility (CMS) systems (Tikka et al. 1997; Saxena et al. 2005; Mallikarjuna and Saxena 2005), which have been used to develop commercial hybrids (Saxena et al. 2010). Wild Cicer species have been used to introgress resistance to phytophthora root rot, cyst nematode, root-lesion nematode, pod borer, ascochyta blight and botrytis gray mold, and tolerance to cold, drought and salinity, into chickpea (Cicer arietinum) (Singh et al. 1990; Collard et al. 2001; Malhotra et al. 2002; Singh et al. 2005; Mallikarjuna et al. 2007), while progenies involving C. reticulatum have shown a range of beneficial traits, such as early flowering and maturity, increased seed weight, seed yield and harvest index (Upadhyaya 2008).

Resistance to bruchids has been transferred into common bean; the resulting progenies matured earlier, produced more and larger grains, and in some cases had high seed mineral content (Acosta-Gallego et al. 2007). The arcelin gene from a wild relative has been successfully used to transfer resistance to weevil in common bean (Kornegay et al. 1993). The Pisum gene pool consists of Pisum fulvum and P. sativum, both inaccessible to each other (Smartt 1990). There is no decisive evidence of hybridization between these two species; however, it is believed that P. sativum subsp. abyssinicum probably originated from an ancient cross involving both species (Vershinin et al. 2003). Both species have the same chromosome number but differ in karyotype (Hoey et al. 1996). Smýkal et al. (2009a) recognized five subspecies within P. sativum: abyssinicum, asiaticum, elatius, sativum and transcaucasicum. Resistance to powdery mildew, Fusarium, viruses and bruchid from P. fulvum has been introgressed into pea (P. sativum) (Provvidenti 1990; McPhee et al. 1999; Fondevilla et al. 2008; Byrne et al. 2008). Wider crosses with the closely related genus Lathyrus did not result in fertile, viable plants (Ochatt et al. 2004); thus, introgression is possible only within the genus Pisum.

The synthesis of exotic genetic libraries, such as introgression lines (ILs) (also known as chromosome substitution lines) and near isogenic lines (NILs), containing chromosome segments defined by molecular markers from wild species in a constant genetic background of the related cultivated species, has made the use of alien genomes more precise and efficient (Zamir 2001; Gur and Zamir 2004; McCouch 2004). These lines provide systematic coverage of the entire genome and thus constitute a permanent genetic resource, which can be used to screen for multiple traits to identify alleles of economic importance that can be further introgressed to enhance trait value (Zamir 2001; Gur and Zamir 2004). Establishment of ILs with characterized genomic fragments in a defined genetic background will allow phenotypic characterization of an unlimited number of target traits, which, coupled with molecular tools, will provide a means for final gene identification and for the subsequent incorporation and pyramiding of these genes in desired genotypes, ultimately leading to better performing commercial cultivars.

The value of this approach is well documented in barley, canola, rice and tomato (Dwivedi et al. 2008). Progress in developing such genetic resources in legumes has lagged behind, perhaps due to the greater difficulty in generating interspecific crosses and the lack of DNA marker technology (both markers and high throughput assays) to monitor genomic coverage in progenies. More recently, there has been a surge in the development of large numbers of crop-specific markers and high throughput assays in legumes that can be used to monitor genomic regions in distant crosses to develop such genetic resources. ILs in chickpea and pigeonpea (Upadhyaya et al. unpublished) and pea (Smýkal et al. unpublished) in the cultivated background are being established as a tool for the identification of novel traits. Recombinant inbred lines (RILs) involving P. abyssinicum and P. sativum have been made to determine loci which underwent strong domestication selection (Weeden 2007).

Recent studies in several plant systems have demonstrated that plant allopolyploidization, or interspecific/intergeneric hybridization followed by genome doubling, is often accompanied by unorthodox genetic and epigenetic changes that transgress Mendelian principles (Matzke et al. 1999). It is now known that introgression might lead to activation of otherwise dormant transposable elements, which consequently reshuffle the genome. This effect might ultimately bring more variation and provide further diversity.

Greater insights into the structure of the germplasm diversity and association mapping using genome-wide markers

The size of the ex situ holding of legume germplasm (“Ex situ collection of cultivated and wild genetic resources” section) represents an extraordinary reserve of genetic diversity. Understanding that diversity and how it is structured, and unlocking its potential for crop improvement, is an area of high international activity made possible by the rapid advances in scale, robustness and reliability, and the sharp fall in unit costs, of deploying marker technology to many thousands of accessions. It has become feasible to genotype large proportions of collections, and in some cases whole collections, to provide new baseline descriptions of diversity. These have started to contribute significantly to our ability to probe the structure of these large collections and target germplasm of particular interest in adzuki bean (Xu et al. 2008, 2009; Wang et al. 2009), chickpea (Upadhyaya et al. 2008a), common bean (Pérez-Vega et al. 2009; Kwak and Gepts 2009; Blair et al. 2009, 2010), cowpea (Xu et al. 2007), faba bean (Zong et al. 2009a; Duc et al. 2009), lentil (Liu et al. 2008), and pea (Smýkal et al. 2008, 2009a; Zong et al. 2009b; Jing et al. 2010). These populations and the associated phenotypic data are seen as crucial to the establishment of association mapping resources for future progress in trait analysis in legume crops (Furman 2006; Duc et al. 2009). Valuable new insights into issues such as the frequency and levels of introgression between taxa and genetic distance are already beginning to affect how collections are managed and the strategies for their structuring and sampling. These developments, and the fact that we are now in the genomics era, are having a number of direct consequences for genetic resource collections. Firstly, there is a growing move to diversify into developing and holding genetic resources other than seeds, i.e., DNA or leaf samples (de Vicente 2004). Secondly, new genotyped seed reference collections are being established from single plant progeny lines that were sampled for DNA, rather than from bulks. Thirdly, there is increasing development of associated database management and of computational and analytical processing to manage these very large datasets. All these changes underline the dynamic nature of the field of plant genetic resources and how it has to adapt to the rapid changes in biological sciences and modern breeding methods.

A wide range of marker types is deployed in studies relating to diversity assessments, the details and merits of which depend on the material and the questions being addressed (Ayad et al. 1997; Spooner et al. 2005). Comparisons between different marker types have shown a high degree of comparability in the inferences that can be drawn, as in the study in Pisum comparing SSAP markers with gene-based sequence data (Jing et al. 2007). Further comparisons of genotypic diversity sampled using SSRs and qualitative traits in chickpea demonstrated that both were equally effective (Upadhyaya 2008).

Studies into the structure of germplasm diversity are enabling the exploration and greater resolution of population structure, and the quantification of genetic distances between and among groups of germplasm, to a degree not previously possible. The relative ease of development of molecular markers and the high levels of polymorphism they generate, compared with earlier systems such as isozymes, have made them the method of choice in the great majority of diversity work being undertaken today. Many of these studies have served to provide markers for cultivar identification which may be of use within cultivar registration systems. Frequently they confirm the significance of certain ecogeographic isolation events with respect to the genetic structure of certain groups (Table 2). They have helped resolve and confirm taxonomic relationships between groups or species while at the same time quantifying the genetic distances, and have helped to highlight domestication events and their putative progenitors. Furthermore, Blair et al. (2009) reported significant associations between SSR loci and seed size characteristics in common bean germplasm using an association mapping approach, with some loci located on the same linkage groups as the phaseolin locus, which had previously been associated with seed size, and others in different regions of the genome.
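The idea of testing a marker for association with a trait can be illustrated with a deliberately simple single-marker permutation test. This is not the method of Blair et al. (2009); it is a minimal sketch on invented data that ignores population structure and multiple testing, both of which a real association mapping analysis must account for. The function name, the biallelic 0/1 coding and the seed weights are assumptions made for illustration.

```python
import random
from statistics import mean


def permutation_p_value(trait, allele, n_perm=2000, seed=0):
    """Two-sided permutation test for a difference in trait means between
    accessions carrying allele 0 and allele 1 at a single marker."""
    def diff(labels):
        g0 = [t for t, a in zip(trait, labels) if a == 0]
        g1 = [t for t, a in zip(trait, labels) if a == 1]
        return abs(mean(g1) - mean(g0))

    observed = diff(allele)
    rng = random.Random(seed)
    labels = list(allele)
    hits = 0
    for _ in range(n_perm):
        rng.shuffle(labels)  # break any marker-trait link, keep group sizes
        if diff(labels) >= observed:
            hits += 1
    # Add-one correction keeps the estimated p-value strictly above zero.
    return (hits + 1) / (n_perm + 1)


# Hypothetical 100-seed weights (g) for 12 accessions.
seed_weight = [22, 24, 23, 25, 21, 24, 38, 41, 36, 40, 39, 42]
linked_marker = [0, 0, 0, 0, 0, 0, 1, 1, 1, 1, 1, 1]    # tracks seed size
unlinked_marker = [0, 1, 0, 1, 0, 1, 0, 1, 0, 1, 0, 1]  # unrelated to seed size

print("linked marker p:", permutation_p_value(seed_weight, linked_marker))
print("unlinked marker p:", permutation_p_value(seed_weight, unlinked_marker))
```

A permutation test is used here because it makes no distributional assumptions and needs only the standard library; structured germplasm collections would normally call for a mixed model that includes population structure and kinship terms.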

Table 2 Recent examples of population structure and diversity in chickpea, lentil, common bean, adzuki bean, pea, faba bean and pigeonpea germplasm collections from 2003 to 2010

These studies come at a time of greater international co-operation and collaborative access to germplasm, which is enabling comparisons with other sets of material for the first time. A prime example is the growing collaboration with China which, along with the growth in China's own capacity in this field, is clearly highlighting specific regions or genepools for closer attention.