Definition of the Subject

In the last decade, private seed companies have benefitted immensely from molecular breeding (MB) [1]. A private sector-led “gene revolution” has boosted crop adaptation and productivity in developed countries by combining the latest advances in molecular biology with cutting-edge information and communication technologies and accurate plant phenotyping.

MB allows the stacking of favorable alleles, or genomic regions, for target traits in a desired genetic background thanks to the use of polymorphic molecular markers (MMs) that monitor differences in genomic composition among cultivars, or genotypes, at specific genomic regions, or genes, involved in the expression of those target traits. The use of MMs generally increases the genetic gain per crop cycle compared to selection based on plant phenotyping only, and therefore reduces the number of needed selection cycles, hastening the delivery of improved crop varieties to the farmers.

In contrast to the private sector, MB adoption is still limited in the public sector, and is hardly used at all in developing countries. This is the result of several factors, among which are the following: (1) scientists from the academic world are more interested in discovering new genes or QTLs to be published than in applied biology; (2) until recently access to genomic resources was limited in the public sector, especially for less-studied crops; (3) public access to large-scale genotyping facilities was not easily available; and (4) although a broad set of stand-alone tools are available to conduct the multiple types of analyses necessitated by MB, no single analytical pipeline is available today in the public sector allowing integrated analysis in a user-friendly mode.

The situation is even more critical in developing countries, where additional limitations include a shortage of well-trained personnel, inadequate laboratory and field infrastructure, a lack of information systems (ISs) with applicable and flexible analysis tools, and inadequate funding; simply put, breeding programs there are resource-limited. As a result, the developing world has yet to benefit from the MB revolution, and most of these countries lack the fundamental prerequisites for a move to informatics-powered breeding.

Under those circumstances, developing and deploying a sustainable web-based Molecular Breeding Platform (MBP) as a one-stop shop for information, analytical tools, and related services to help design and conduct marker-assisted breeding experiments in the most efficient way will alleviate many of the bottlenecks mentioned earlier. Such a platform will enable breeding programs in the public and private sectors in developing countries to accelerate variety development using marker technologies for different breeding purposes: major gene or transgene introgression via marker-assisted backcrossing (MABC), gene pyramiding via marker-assisted selection (MAS), marker-assisted recurrent selection (MARS) and, in the not-too-distant future, genome-wide selection (GWS).

Introduction

Since the dawn of agriculture, mankind has sought to improve crops by selecting individual plants with the most desirable characteristics or traits. Agricultural productivity has been progressively enhanced by constant innovation, including improved crop varieties to increase production in specific environments [2]. The major objective of crop improvement is to identify within heterogeneous materials those individuals for which favorable alleles are present at the highest proportion of loci involved in the expression of key traits [3]. The classical plant breeding method is based on increasing the probability of selecting such individuals from populations generated from sexual matings. Selection has traditionally been carried out at the whole-plant level (i.e., phenotype), which represents the net result of genotype and environment (and their interactions). Phenotypic selection has delivered tremendous genetic gains in most cultivated crop species, but is severely limited when faced with traits that are heavily modulated by the environment [4]. In addition, the nature of some traits can make the phenotypic testing procedure itself complex, unreliable, or expensive (or a combination of these).

The recent remarkable development of molecular genetics and associated technologies represents a quantum leap in our understanding of the underlying genetics of important traits for crop improvement. The ongoing revolutions in molecular biology and information technology offer tremendous and unprecedented opportunities for enhancing the effectiveness and efficiency of MB programs. Indirect selection, based on genetic markers, presents an efficient complementary breeding tool to phenotypic selection. Individual genes or QTLs having an impact upon target traits can be identified and linked with one or more markers, and then the marker loci can be used as a surrogate for the trait, resulting in greatly enhanced breeding efficiency [5–8].

Molecular techniques can have an impact upon every stage of the breeding process, from parental selection and cross prediction [9] to introgression of known genes [10] and population enhancement. Selection of beneficial alleles of known genes can be done through marker-assisted selection (MAS), that is, the selection of specific alleles for traits conditioned by a few loci [10], or through marker-assisted backcrossing (MABC), that is, the transfer of specific alleles at a limited number of loci, including transgenes, from one genetic background to another [11, 12]. For marker-assisted population improvement, individuals selected from a segregating population based on their marker genotype are inter-mated at random to produce the following generation, at which point the same process can be repeated a number of times [13]. A second approach aims at direct recombination between selected individuals as part of a breeding scheme, seeking to generate an ideal genotype or ideotype [14]. The ideotype is predefined on the basis of QTL mapping within the segregating population, combined with the use of multi-trait selection indices that can also consider historical QTL data. This variety development approach is commonly referred to as marker-assisted recurrent selection (MARS) [15–17], or genotype construction. An alternative is to infer a predictive function using all available markers jointly, without significance testing and without identifying a priori a subset of markers associated with the traits of interest. This more recent approach, named genome-wide selection (GWS), originated in genomic medicine [18, 19] and was then applied successfully in animal breeding [20]; it also appears to be quite promising in crop improvement [7].
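
The idea behind GWS, fitting one predictive function to all markers jointly rather than testing markers one at a time, can be illustrated with a minimal sketch. The example below uses ridge regression, which is only one of several statistical models that could serve this purpose and is not necessarily the method used in the cited studies. The marker coding (-1, 0, 1), the function names, and the penalty value are assumptions made for illustration.

```python
# Minimal ridge-regression sketch of genome-wide selection (pure Python).
# Marker coding, function names, and the penalty value are illustrative
# assumptions; they are not taken from the cited literature.

def solve(a, b):
    """Solve the linear system a x = b by Gaussian elimination with pivoting."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]  # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for i in range(n - 1, -1, -1):  # back substitution
        s = sum(m[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (m[i][n] - s) / m[i][i]
    return x


def fit_marker_effects(genotypes, phenotypes, penalty=1.0):
    """Estimate all marker effects jointly: (X'X + penalty * I) b = X'y."""
    n, p = len(genotypes), len(genotypes[0])
    mean_y = sum(phenotypes) / n
    col_means = [sum(g[j] for g in genotypes) / n for j in range(p)]
    # Center markers and phenotypes so that no intercept column is needed.
    xc = [[g[j] - col_means[j] for j in range(p)] for g in genotypes]
    yc = [y - mean_y for y in phenotypes]
    xtx = [[sum(xc[i][j] * xc[i][k] for i in range(n))
            + (penalty if j == k else 0.0)
            for k in range(p)] for j in range(p)]
    xty = [sum(xc[i][j] * yc[i] for i in range(n)) for j in range(p)]
    return mean_y, col_means, solve(xtx, xty)


def predict(model, genotype):
    """Predict the breeding value of an unphenotyped individual."""
    mean_y, col_means, effects = model
    return mean_y + sum(e * (g - m)
                        for e, g, m in zip(effects, genotype, col_means))
```

In such a scheme the model would be fitted on a training population that has been both genotyped and phenotyped, and `predict` would then rank selection candidates from their marker data alone. The penalty shrinks all marker effects toward zero, which is what allows more markers than individuals to be fitted.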

Concomitantly with marker technologies becoming increasingly “data rich,” the amount of data produced by plant breeding programs has increased dramatically in recent years. Increasingly, the critical factor determining the rate of progress in plant breeding programs is their capacity to manage large amounts of data efficiently and to extract meaningful information from those data in time for use in selection decisions. Although genotyping itself has become less of a constraint, the efficient management of genotyping data in a broad sense, including sequence information, is becoming a major challenge in modern plant breeding. This was recognized early on in the private sector, where the key to successful adoption of MB has been the establishment of platforms or pipelines that integrate field and laboratory processes with powerful data management systems (DMS). These systems merge and analyze the data collected at every step and guide the process of crop improvement toward the release of improved cultivars.

A few initiatives have taken place in the public sector to establish efficient data management or ISs [21, 22]. One of these has been led by several centers of the Consultative Group on International Agricultural Research (CGIAR) which have worked over the past decade, along with advanced research institutes (ARIs) and national agricultural research systems (NARS) in developing countries, to develop an open-source generic IS, the International Crop Information System (ICIS), to handle pedigree, genetic resource, and crop improvement information [23]. Based on some elements of ICIS, the CGIAR Generation Challenge Programme (GCP, http://www.generationcp.org) has invested in integrating crop information with genomic and genetic information and in using existing or developing new public decision-support tools to access and analyze information resources in an integrated and user-friendly way [24]. Another initiative has been led by Primary Industries and Fisheries (PI&F) of the Queensland Government Department of Employment, Economic Development and Innovation in Australia, which recognized that effective data management is an essential element in obtaining maximum benefit from their investment in plant breeding. In conjunction with the New South Wales Department of Primary Industries (NSW DPI) and more recently DArT Pty Ltd (http://www.diversityarrays.com/) they are in the process of developing a linked IS for plant breeding (Katmandoo) that includes applications for capturing field data using hand-held computers, barcode-based seed management systems, and databases to store and link field trial data, laboratory data, genealogical data, and marker data [25].

Although an IS involves far more than a database, the development and implementation of a suitable database system alone remains a real challenge because of the fast turnover in technologies, the need to manage and integrate increasingly diverse and complex data types, and the exponential increase in data volume. Previous solutions, such as central databases, journal-based publication, and manually intensive data curation, are now being enhanced with new systems for federated databases, database publication, and more automated management of data flows and quality control. Along with emerging technologies that enhance connectivity and data retrieval, these advances should help create a powerful knowledge environment for genotype–phenotype information [26].

In addition to efficient data management, advances in statistical methodology [27–29], graphical visualization tools, and simulation modeling [9, 30–32] have greatly enhanced these ISs. The availability of molecular data linked to computable pedigrees [33] and phenotypic evaluation now makes genotype–phenotype analysis a practical reality [34].

In order to realize the full potential of marker technologies and bioinformatics in plant breeding, tools for molecular characterization, accurate phenotyping, efficient ISs, and effective data analysis must be integrated with breeding workflows managing pedigree, phenotypic, genotypic, and adaptation data. The goals of this integration of technologies are to (1) create genotype–phenotype trait knowledge for breeding objectives, and (2) use that knowledge in product development and deployment [4].

This entry generally explores the pace of innovation in world agriculture and the rise of MB. It particularly illustrates the accelerating application of information and communication technologies to the information management challenges of MB and, as a result, the emergence of virtual molecular breeding platforms (MBPs) as a vital tool for accelerating genetic gains and rapidly developing more resilient and more productive cultivars.

This entry reviews the rationale for access to MB technology and services and the status of existing public analytical pipelines and ISs for MB, and offers a detailed case study for the CGIAR GCP Integrated Breeding Platform (IBP) – the pioneer public sector MBP specifically targeting developing country breeding programs. It explores the gaps between countries and between crops in the application of informatics-powered MB approaches, and the potential for adopting MBPs to close these gaps; and it reviews institutional, governmental, and public support for these approaches. The entry discusses the challenges and opportunities inherent in MBPs, and the potential economic impact of MB. Finally, the entry explores the future directions and perspectives of MBPs.

Marker Technologies and Service Laboratories

Markers are “characters” whose pattern of inheritance can be followed at the morphological (e.g., flower color), biochemical (e.g., proteins and/or isozymes), or molecular (DNA) levels. They are so called because they can be used to elicit, albeit indirectly, information concerning the inheritance of “real” traits. The major advantages of molecular over other classes of markers are that their number is potentially unlimited, their dispersion across the genome is complete, their expression is unaffected by the environment, and their assessment is independent of the stage of plant development [35]. During the past two decades, DNA technology has been exploited to advance the identification, mapping, and isolation of genes in a wide range of crop species. The first generation of DNA markers, restriction fragment length polymorphisms (RFLPs), was used to construct the earliest genome-wide linkage maps [36] and identify the first QTLs [37, 38]. During the 1990s, emphasis switched to assays based on the polymerase chain reaction (PCR), which are much easier to use and potentially automatable [39]. The development of simple sequence repeats (SSRs) [40], amplified fragment length polymorphisms (AFLPs) [41], and single nucleotide polymorphisms (SNPs) [42] opened the door for large-scale deployment of marker technology in genomics and progeny screening.

SNPs are amenable to very high throughput and a wide range of detection techniques has been developed for them, from singleplex systems to high-density arrays. They can be used in fully integrated robotic systems going from automated DNA extraction to automated scoring in high-throughput detection platforms. The combination of increase in throughput and lowering in costs makes SNPs highly suitable to intensive marker applications in plant breeding such as MARS and the emerging approach of GWS. Based on SNP technology, production of molecular marker (MM) data expanded more than 40-fold between 2000 and 2006 at Monsanto, while cost per data point decreased to one sixth of the original cost [43].

With the transition from SSRs to SNPs and the concomitant large increase in the demand for genotyping as markers get more and more widely used in a broad range of applications from medicine to plant breeding, marker genotyping laboratories have evolved from relatively low-tech operations to highly automated, high-throughput laboratories using an array of sophisticated equipment (pipetting robots, high-density PCR, high-throughput SNP detection machines, high-level informatics). Although large private seed companies have had the need and the resources to put in place large-scale genotyping laboratories for their own uses, smaller programs, especially in the public sector, have typically not had the resources or the justification to establish such large operations to respond to their increasing need for SNP genotyping data. In response to this need, a few private marker service laboratories have sprung up over the past few years, which can provide complete genotyping services for their customers, from DNA extraction to generation of large numbers of SNP or other datapoints. Due to their broad customer base (from medical research laboratories to animal and plant breeding operations, both public and private), these laboratories can have a large volume of datapoint production which may lead to low costs for the customer and high throughput. They are able to invest in the most advanced equipment to keep up with the constant evolution of genotyping technologies and are able to pass on the resulting benefits to their customers. Processes have now been put in place for rapid shipment of leaf samples from any location (field or laboratory) around the world without any restrictions. Examples of such companies that can service breeding programs from around the world are DNA LandMarks, Inc. of Saint-Jean-sur-Richelieu, Quebec, Canada (http://www.dnalandmarks.ca/english/) and KBioscience Ltd. of Hoddesdon Herts, UK (http://www.kbioscience.co.uk/). 
For many public breeding programs and small companies, especially in developing countries, it is now more efficient to use those types of contract genotyping services than to try to support their growing MB needs through the establishment of an in-house laboratory. Functional and reliable SNP laboratories are especially difficult to establish in many developing countries due to the unreliability of the power supply, difficulties in shipping and storing and a low level of resources for the purchase and maintenance of sophisticated equipment. The GCP is facilitating the linkage between users and service laboratories through its marker services, a component of the breeding services offered through the GCP’s IBP.

Analytical Tools, Software, and Pipelines

One of the achievements of the plant biotechnology revolution of the last two decades has been the development of molecular genetics and associated technologies, which have led to an improved understanding of the basis of inheritance of agronomic traits. The genomic segments or QTLs involved in the determination of phenotype can be identified from the analysis of phenotypic data in conjunction with allelic segregation at loci distributed throughout the genome. Because of this, the mode of inheritance, as well as the gene action underlying the QTL, can be deduced [44]. As with the improvement in marker technologies, the statistical tools needed for QTL mapping have evolved from a rudimentary to a very sophisticated level [45]. Previous approaches based on multiple regression methods, using least squares or generalized least squares estimation methods [46, 47], have evolved to composite interval mapping [9], mixed model approaches using maximum likelihood or restricted maximum likelihood (REML) [48], and Markov Chain Monte Carlo (MCMC) algorithms [49, 50], which use Bayesian statistics to estimate posterior probabilities by sampling from the data. In parallel with progress in the characterization of genetic effects at QTLs and the refinement of QTL peak position through meta-analysis [51], advances have also been made in understanding the impact of the environment on plant phenotype. The mapping of QTLs for multiple traits has allowed the quantification of QTL by environment interaction (QEI) [52] and, more recently, approaches using factorial regression mixed models have been applied to model both genotype by environment interaction [53] and QEI [48, 54, 55]. More recent approaches have been implemented to evaluate gene networking [56] and epistasis, based on Bayesian approaches [57, 58] or through stepwise regression by considering all marker information simultaneously [59, 60].
Epistasis and balanced polymorphism influence complex trait variation [61, 62], and classical generation means analyses, estimates of variance components, and QTL mapping indicated an important role of digenic and/or higher-order epistatic effects for all biomass-related traits in model plants [63] and in crops [64–66]. It will be critical to implement the most efficient MB strategies in order to evaluate and include these genetic effects in breeding schemes [60].
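
To make the regression-based starting point of this evolution concrete, the sketch below scans markers one at a time with simple least-squares regression of phenotype on marker genotype. It is a deliberately rudimentary illustration and does not represent interval mapping, mixed model, or Bayesian methods. The 0/1 genotype coding (as for two homozygous classes) and the function name are assumptions.

```python
# Single-marker regression scan: a rudimentary least-squares QTL sketch.
# Genotype coding (0/1) and the function name are illustrative assumptions.

def single_marker_scan(genotypes, phenotypes):
    """Return one (slope, r_squared) pair per marker.

    genotypes:  list of individuals, each a list of numeric marker codes
    phenotypes: list of trait values, one per individual
    """
    n = len(phenotypes)
    mean_y = sum(phenotypes) / n
    ss_y = sum((y - mean_y) ** 2 for y in phenotypes)
    results = []
    for j in range(len(genotypes[0])):
        x = [g[j] for g in genotypes]
        mean_x = sum(x) / n
        ss_x = sum((v - mean_x) ** 2 for v in x)
        if ss_x == 0 or ss_y == 0:
            # A monomorphic marker or a constant trait carries no signal.
            results.append((0.0, 0.0))
            continue
        s_xy = sum((v - mean_x) * (y - mean_y)
                   for v, y in zip(x, phenotypes))
        slope = s_xy / ss_x                  # estimated allele substitution effect
        r_squared = s_xy ** 2 / (ss_x * ss_y)  # share of phenotypic variance explained
        results.append((slope, r_squared))
    return results
```

Markers with a high `r_squared` would be candidates for linkage to a QTL. Such a scan confounds QTL effect size with the distance between marker and QTL, which is one of the limitations that interval-based and mixed model methods were developed to address.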

All tools necessary to run MB projects, from the simplest to the most complicated approaches, are available today in the public domain. They are based on different algorithms and statistical approaches, from the very simple to the more complex. One challenge is the diversity of tools available for a given analytical function or along the different steps of an analytical pathway, making the choice of the “right” tool difficult and the move from one analytical step to the next very tedious due to the complete lack of common standards and formatting across tools. The number of applications available for QTL analysis illustrates well the multiplicity and diversity of tools that are available for a given analysis. The following software packages have been developed over the past 20 years:

  • Mapmaker/QTL [67]

  • MapQTL [68, 69]

  • QTL Cartographer [9, 70]

  • PLABQTL [71]

  • QGene [72, 73]

  • Map Manager QT [74]

  • iCIM [59, 60]

For most of these applications, the first versions were already available 15 years ago and the multiplicity and possible duplication generated by the independent development of these tools were already identified at the Gordon Research Conference on Quantitative Genetics and Biotechnology held in February 1997 in Ventura, California. A main objective of that workshop was to survey participants on the attributes of several software packages for QTL mapping and to define their analytical needs which were not presently met by the existing software packages. The workshop covered software for QTL mapping in inbred and outcrossed populations and the conclusions are available at: http://www.stat.wisc.edu/~yandell/statgen/software/biosci/qtl.html. In those conclusions one can read that “[a] consensus was reached that there is considerable overlap in the kinds of matings handled and statistics produced by the various QTL mapping software packages,” clearly identifying the need for better coordinated efforts. Such coordination never took place, as is often the case in public research. As a result, most of those QTL packages are still available today, although in more sophisticated versions. They are all suitable for QTL mapping but use different statistical algorithms, present a different user interface, and necessitate different input and output file formats.

Some specialists in the field realized that the public software packages are usually too specialized and too technical in statistics to permit a thorough understanding by the many experimental geneticists and molecular biologists who would want to use them. In addition, the fast methodological advances, coupled with a range of stand-alone software, make it difficult for expert as well as non-expert users to decide on the best tools when designing and analyzing their genetic studies. Based on this rationale, a few commercial analytical pipelines emerged about a decade ago that include some of the QTL packages mentioned above. Two of them are Kyazma and GenStat®. These applications assist plant scientists by providing easy access to statistical packages for phenotypic and genotypic data. Kyazma was founded in the spring of 2003 (http://www.kyazma.nl/), and offers powerful methods for genetic linkage mapping and QTL analysis. Since 2003 Kyazma has taken over the development of the software packages JoinMap® and MapQTL® from Biometris of Plant Research International. Kyazma handles the distribution and support of JoinMap and MapQTL and, in collaboration with the statistical geneticists of Biometris, Kyazma provides introductory courses on genetic linkage mapping and QTL analysis in order to make the use of the software even more accessible. GenStat encompasses statistical data analysis software for biological and life science markets worldwide. GenStat includes the ASReml algorithm (average information algorithm for REML) to undertake very efficient meta-analyses of data with linear mixed models. The development of GenStat at Rothamsted began in 1968, when John Nelder took over from Frank Yates as Head of Statistics. Roger Payne took over leadership of the GenStat activity when John Nelder retired in 1985 (http://www.vsni.co.uk/). 
An important feature of GenStat is that it has been developed in (and now in collaboration with) a Statistics Department whose members have been responsible for many of the most widely used methods in applied statistics. Examples include analysis of variance, design of experiments, maximum likelihood, generalized linear models, canonical variates analysis, and recent developments in the analysis of mixed models by REML.

These commercial analytical pipelines offer a set of quality tools to researchers in plant science. However, they cover only a part of the configurable workflow system that is required for integrated breeding activities. In addition, there is a need for tools and analytical pipelines that are freely available and, if possible, based on open source code. This avoids dependence on private companies that might discontinue support, and it ensures access to the tools even with limited financial resources, a critical constraint in research for development, in which breeding programs of developing countries are key partners. It is important to underline that a version of GenStat that does not include the most advanced version of the different tools, but allows users to run most basic analyses, is available for breeding programs in developing countries. The web site for the GenStat Discovery Edition is http://www.vsni.co.uk/software/genstat-discovery/, but this version of the pipeline does not include QTL selection based on the mixed model approach, which is available in the commercial version.

The issue of open source code is an important one as, even for freely available tools, the lack of availability of the source code limits the further expansion and customization of the tools. It also reduces the opportunity for researchers in developing countries to participate in methodology development. Over the last decade, R, a programming language and software environment for statistical computing and graphics, has emerged as the reference in open source code for a broad range of biological applications, including genetic analysis (http://www.r-project.org/). Its source code is freely available under the GNU General Public License (http://en.wikipedia.org/wiki/GNU_General_Public_License). The R language has become a de facto standard among statisticians for the development of statistical software. It compiles and runs on a wide variety of UNIX, Windows, and MacOS platforms. R is similar to other programming languages, such as C, Java, and Perl, in that it helps people perform a wide variety of computing tasks by giving them access to various commands. For statisticians, however, R is particularly useful because it contains a number of built-in modules for organizing data, running calculations on the information, and creating graphical representations of the data sets. R provides a wide variety of statistical (linear and nonlinear modeling, classical statistical tests, time-series analysis, classification, clustering, etc.) [29] and graphical techniques, and is highly extensible. Close to 1,600 different packages reside on just one of the many web sites devoted to R, and the number of packages has grown exponentially. However, R is difficult to use directly, and procedures based on R must be wrapped in user-friendly menu systems if field biologists are to use them.

Information Systems

A functional IS involves far more than an analytical pipeline; it is a complete system that should include:

  • A project planning module

  • A germplasm management module

  • A robust relational database

  • Analytical standards

  • Data collection and cleaning tools

  • Analytical and decision support tools

  • Query tools

  • A cyber infrastructure (CI) that links the different tools in a cohesive and user-friendly way

Key elements of an IS are obviously the CI and the DMS as described in the following section. The value of an IS does not only reside in the quality of the individual tools or modules that are part of it, but rather in the CI or middleware that ensures cohesion across tools and efficient communication with databases.

There are not many examples of breeding ISs in the public domain. One example is the ICIS (http://www.icis.cgiar.org, [23]). ICIS is an open source IS for managing genetic resource and breeding information for any crop species. It has been developed over the last 10 years through collaboration between centers of the CGIAR, some NARS, and private companies. The ICIS system is Windows-based, and distributable on CD-ROM or via the Internet. It contains a genealogy management system (GMS, [33]) to capture and process historical genealogies as well as to maintain evolving pedigrees and to provide the basis for unique identification using internationally accepted nomenclature conventions for each crop; a seed inventory management system (IMS); a DMS [75] for genetic, phenotypic, and environmental data generated through evaluation and testing, as well as for providing links to genomic maps; links to geographic ISs that can manipulate all data associated with latitude and longitude (e.g., international, regional, and national testing programs); applications for maintaining, updating, and correcting genealogy records and tracking changes and updates; applications for producing field books and managing sets of breeding material, and for diagnostics such as coefficients of parentage and genetic profiles for planning crosses; tools to add new breeding methods, new data fields, and new traits; and tools for submitting data to crop curators and for distributing data updates via CD-ROM and electronic networks. The community of ICIS collaborators communicates via the ICIS Wiki (http://www.icis.cgiar.org), where all design and development decisions are documented. Feature requests and bug reports are made through the ICIS Communications project and the source code is published through various other ICIS projects on CropForge (http://cropforge.org). A commercial company, Phenome-Networks, has implemented a Web-based IS based on ICIS (http://phnserver.phenome-networks.com/).
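
As an illustration of the kind of diagnostic that a computable pedigree makes possible, the sketch below derives a coefficient of parentage (coancestry) between two entries by recursion through their recorded parents. It is a generic textbook-style calculation that assumes unrelated, non-inbred founders. It is not the ICIS implementation, and the data structure and function names are hypothetical.

```python
# Coefficient of parentage (coancestry) from a recorded pedigree.
# Generic recursion assuming unrelated, non-inbred founders; the data
# structure and names are hypothetical and do not reflect ICIS internals.
from functools import lru_cache


def make_coancestry(pedigree):
    """Build a coancestry function for a pedigree.

    pedigree: dict mapping each entry to a (parent1, parent2) tuple, with
              (None, None) for founders. Parents must be listed before
              their progeny (dicts preserve insertion order).
    """
    order = {name: i for i, name in enumerate(pedigree)}

    @lru_cache(maxsize=None)
    def coancestry(a, b):
        if a is None or b is None:
            return 0.0  # unknown parents are treated as unrelated
        if a == b:
            p1, p2 = pedigree[a]
            return 0.5 * (1.0 + coancestry(p1, p2))
        # Always expand the more recent entry, so that the entry being
        # expanded can never be an ancestor of the other one.
        if order[a] < order[b]:
            a, b = b, a
        p1, p2 = pedigree[a]
        return 0.5 * (coancestry(p1, b) + coancestry(p2, b))

    return coancestry


# Usage with a small hypothetical pedigree: C = A x B, D = A x C.
example_pedigree = {
    "A": (None, None),
    "B": (None, None),
    "C": ("A", "B"),
    "D": ("A", "C"),
}
example_coancestry = make_coancestry(example_pedigree)
```

When planning crosses, a breeder could use such values to avoid pairing closely related parents. Programs working with fully inbred lines commonly adjust the founder assumptions, which changes the numerical values but not the recursive logic.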

Another system available is the Katmandoo Biosciences Data Management System (http://www.katmandoo.org/, [25]), which is a freely available, open source DMS for plant breeders developed by PI&F, NSW DPI, and DArT Pty. Ltd. It comprises linked ISs for plant breeding including applications for capturing field data using hand-held computers, barcode-based seed management systems, and databases to store and link field trial data, laboratory data, genealogical data, and marker data. A particular focus is on the use of whole-genome MM information to create graphical genotypes, track the ancestral origin of chromosomal regions, validate pedigrees, and infer missing data. It includes the applications of the Pedigree-Based Marker-Assisted Selection System (PBMASS) developed by PI&F as well as a seed management system, a digital field book for hand-held computers, and a system for directly recording weights of barcoded samples.

Both ISs struggle with the problem of integrating the different components into a single configurable system which matches the workflows of different breeding projects. Such a workflow should provide the user all tools and analytical means required to run a crop cycle: from germplasm preparation and planting, through the collection of phenotypic and the production of the genotypic data and their analysis, to the identification of genotypes to be crossed or the selection of suitable genotypes to be planted in the next cycle (Fig. 1).

Molecular Breeding Platforms in World Agriculture. Figure 1
Different activities conducted during the crop cycle of an MB experiment presented in a generic way

In order to do this effectively, a CI is required which allows syntactic linkage between different data resources and applications.

Cyberinfrastructure and Data Management

We have referred to the revolution in information and communication technology (ICT) and the opportunities it presents for improving the efficiency of plant breeding. However, plant breeding is not the only area of biology being affected by this revolution and, in fact, the successful deployment of MB depends on other fields of information-intensive biology delivering knowledge (markers and methodology) to plant breeding. Even more is expected of the ICT revolution in the developing world, as it offers an opportunity for scientists there to overcome some of the constraints of isolation, the “brain drain,” and the lack of infrastructure which have prevented them from fully participating in science for development in the past [76].

It is generally recognized that upstream biology is increasingly reliant on networks of integrated information and on applications for analyzing and visualizing that information. Discipline-specific (sequence and protein databases) and model organism ISs such as GrainGenes (http://wheat.pw.usda.gov/GG2/index.shtml), Gramene (http://www.gramene.org/), MaizeGDB (http://www.maizegdb.org/), and SoyBase (http://www.soybase.org/) have been developed to facilitate exchanges in molecular biology and functional genomics. As noted above, plant breeding depends on these upstream sciences of molecular biology, functional genomics, and comparative biology to deliver the knowledge needed to deploy MB. The bottleneck in the overall network has been the technology needed to integrate diverse and distributed information resources, and many information scientists have been working on this problem [24, 26, 77].

One constraint to integration of scientific information is the necessity to have a standard terminology for biological concepts across species and disciplines. A successful example of such standardization is the Gene Ontology (GO) initiative (http://www.geneontology.org, [78]). Another more specialized ontology initiative, especially pertinent to agriculture, is the Plant Ontology Consortium (POC: http://www.plantontology.org, [79–81]). However, these formal descriptions remain somewhat limited to biology of model plants and controlled environments. A key challenge will be to extend such standards to describe characteristics of plants growing in the unique, stress-prone environments found within the developing world to ensure a wider impact of such standards on international agriculture. The GCP has been working with POC to expand these ontologies to economic traits and farming environments so that they can be used in the field of plant breeding [82].

Another constraint to the efficient utilization of genomic information is the sheer volume of sequence data that can now be generated very cheaply across numerous genotypes. ISs to handle this volume of information are struggling to keep up. In plant biology, some examples of systems aiming to handle these torrents of data are the Germinate database ([83], http://bioinf.scri.ac.uk/public/?page_id=159) and the Genomic Diversity and Phenotype Connection (GDPC, http://www.maizegenetics.net/gdpc/). The primary goal of Germinate is to develop a robust database which may be used for the storage and retrieval of a wide variety of data types for a broad range of plant species. Germinate focuses on genotypic, phenotypic, and passport data, but has been designed to potentially handle a much wider range of data including, but not limited to, ecogeographic, genetic diversity, pedigree, and trait data, and will permit users to query across these different types of data. The developers have aimed to provide a versatile database structure that is simple, requires little maintenance, and can be run on a desktop computer, yet has the potential to be scaled to a large, well-curated database running on a server. The design of Germinate provides a generic database framework from which interfaces ranging from simple to complex may be used as a gateway to the data. The data tables are structured so that they can hold information ranging from simple data associated with a single accession or plant, to complex data sets, images, and detailed text information. Features of the Germinate database structure include its ability to access any information associated with a group of accessions and to relate different types of information through their association with an accession.
The GDPC database was designed as a research database to support association genetics applications such as Tassel (http://www.maizegenetics.net/index.php?option=com_content&task=view&id=89&Itemid=119) and is being extended to handle higher and higher densities of genotyping and sequence data. The second version of Germinate seems quite similar to GDPC and if new databases are developed to handle the large data files to be generated soon through high-throughput sequencing, some conversion tools should be easily developed to migrate data from one system to another.
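The accession-centred design described for Germinate can be illustrated with a minimal Python sketch. This is not Germinate's or GDPC's actual schema. The class names, field names, and sample values are hypothetical and only show how passport, genotypic, and phenotypic data can be related through a shared accession identifier.

```python
from dataclasses import dataclass, field


@dataclass
class AccessionRecord:
    """All data linked to one accession (hypothetical structure)."""
    accession_id: str
    passport: dict = field(default_factory=dict)    # e.g. country of origin
    genotypes: dict = field(default_factory=dict)   # marker name -> allele call
    phenotypes: dict = field(default_factory=dict)  # trait name -> value


class AccessionStore:
    """Tiny in-memory store keyed by accession identifier."""

    def __init__(self):
        self._records = {}

    def _get(self, accession_id):
        # Create the record on first use so any data type can arrive first.
        return self._records.setdefault(accession_id, AccessionRecord(accession_id))

    def add_passport(self, accession_id, **info):
        self._get(accession_id).passport.update(info)

    def add_genotype(self, accession_id, marker, call):
        self._get(accession_id).genotypes[marker] = call

    def add_phenotype(self, accession_id, trait, value):
        self._get(accession_id).phenotypes[trait] = value

    def query(self, group):
        """Return every record for a group of accessions, skipping unknown ids."""
        return [self._records[a] for a in group if a in self._records]


store = AccessionStore()
store.add_passport("ACC001", country="KE")
store.add_genotype("ACC001", "M1", "A/A")
store.add_phenotype("ACC001", "plant_height_cm", 152.0)
store.add_genotype("ACC002", "M1", "A/G")

records = store.query(["ACC001", "ACC002", "ACC999"])
```

Because every data type hangs off the accession identifier, a single query over a group of accessions returns all associated information at once. A production database would use relational tables and persistent storage rather than in-memory dictionaries.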

Finally, the problem of integrating all these diverse and widely-distributed information resources is a major informatics challenge, which is being tackled on several fronts at several levels of complexity. The BioMOBY project ([84], http://www.biomoby.org, [85]) and the Semantic Web seek to define standards that will allow computer programs to interpret requests for information or services, find informatics resources capable of fulfilling those requests, and return the results without the authors of the interacting software having specifically collaborated. In the private sector, solutions have been more pragmatic and Enterprise Software solutions have been developed to link data resources and applications with specific services. The iPlant Collaborative (http://www.iplantcollaborative.org/) is a National Science Foundation (NSF)-funded initiative designed to bring these Enterprise Software solutions to the biological sciences in the form of CI which can support any biological data resource and analytical application. iPlant and the GCP are collaborating on integrating plant breeding information resources and applications into the infrastructure. This will automatically link these resources to upstream biological applications using the same infrastructure such as that used by the Systems Biology Knowledgebase initiative (http://genomicscience.energy.gov/compbio/#page=news) of the US Department of Energy which will be producing knowledge needed for crop improvement.

With all the progress achieved in marker technology, software development, analytical pipelines, and DMS, it is time to provide an IS, available through a public platform, that will offer breeding programs in developed and developing countries access to modern breeding technologies, in an integrated and configurable way, to boost crop quality and productivity.

Case Study: GCP’s Integrated Breeding Platform

To fill this gap in the public sector and in particular in the arena of research for development, the GCP has been coordinating the development of the IBP (www.generationcp.org/ibp) in collaboration with scientists from ARIs, CGIAR centers, and national research programs since mid-2009. In a first phase the IBP aims at serving the needs of a set of 14 pioneer “user cases” – MB projects for eight crops in 16 developing countries in Africa and Asia. Leading scientists of those user cases help in testing the prototypes developed for the different tools of the analytical pipeline and contribute to the monitoring and evaluation of the platform development. This ensures that IBP development is driven by real breeding needs and its interface is user-friendly.

Objective of the IBP

The overall objective of the IBP project is to provide access to modern breeding technologies, breeding material, and related information and services in a centralized and functional manner to improve plant breeding efficiency in developing countries and hence facilitate the adoption of MB approaches. The short-term objective of the project (the initial phase) is to establish – through a client-centered approach – a minimum set of tools, data management infrastructure, and services to meet the needs and enhance the efficiency of the 14 user cases.

To achieve the overall objective, GCP is developing and deploying a sustainable IBP as a one-stop shop for information, analytical tools, and related services to design, implement, and analyze MB experiments. This platform should enable breeding programs in the public and private sectors to accelerate variety development for developing countries using marker technologies – from simple gene or transgene introgression to gene pyramiding and complex MARS and GWS projects. Hence IBP aims at bringing cutting-edge breeding technologies to breeding programs that are too resource-restricted to invest in the requisite genotyping and data management infrastructure and capacity on their own.

The IBP Partnerships

The primary stakeholders of the platform are plant scientists – at this time specifically breeders leading the selected MB projects of the 14 pioneer user cases. These pioneer user cases are all recently initiated marker-assisted breeding projects with specific budgets, objectives, and work plans. The needs of the projects are defining the user requirements, and hence the design and development prioritization of the different elements of the platform. In selecting the user cases, crop diversity was a primary consideration, since the platform is supposed to address the needs of a broad variety of crops. The platform’s reciprocal contribution to these breeding projects is in helping them overcome bottlenecks that would compromise final product delivery and in enhancing their overall efficiency and chances of success by providing appropriate tools and support.

The developmental phase of the IBP brings together highly regarded public research teams – institutes and individuals who have been working on the challenges of crop information management and analysis, biometrics, and quantitative genetics. This team of bioinformaticians, statisticians, and developers aims to design and develop the different elements of the platform, based on needs and priorities defined by the user cases.

A continuous dialogue between users, developers, and service providers ensures a healthy balance between a user-driven platform on the one hand and a reasonable degree of “technology push” on the other, so that users are kept abreast of technological solutions they may not be aware of but that would facilitate and accelerate breeding work.

The private sector has led the application of MB approaches and utilization of MBPs. The IBP is the first public sector effort of this magnitude aimed at developing and deploying an MBP. Given that MB for complex polygenic traits, and more so MARS, is still in its infancy in the public sector, it is recognized that efficient partnerships with the major private sector transnational seed companies are a strong prerequisite for the success of the IBP project. Consultations are ongoing with leaders in MB at Limagrain, Monsanto, Pioneer-DuPont, and Syngenta. Partnership with the private sector includes mainly some technology transfer, especially for stand-alone tools, and access to human resources to advise on the development of the platform and contribute to developing new tools or implementing data management. The users, tools and services, and partnership of the platform are presented in Fig. 2.

Molecular Breeding Platforms in World Agriculture. Figure 2

The IBP partnership

The Platform

The IBP has three broad components (see Fig. 3): a Web-based portal and helpdesk, an open-source IS incorporating an adaptable breeding workflow system, and breeding and support services.

Molecular Breeding Platforms in World Agriculture. Figure 3

The IBP and its three main components

The stepwise development of the breeding workflow includes: (1) access to existing tools, (2) development of stand-alone new tools or adapted versions of existing tools to address the needs of the user cases, and (3) the integration of those tools into a CI (in collaboration with the iPlant initiative) or through a thin middleware linking with local databases to form a user-friendly configurable workflow system (CWS). A first version of the CWS, including an adequate set of tools, should be available by mid-2012, with full unfettered access scheduled for 2014.

Component 1: The Integrated Breeding Portal and Helpdesk

Inaugurated by mid-2011, the portal is the online gateway through which users access all the tools and services of the IBP. Through the portal, users will select and download tools and instructions, order materials, and procure laboratory services.

The portal’s helpdesk facilitates its use and ensures access for users who cannot efficiently use the Web interface by providing the elements they need via email, compact disc, and other offline media.

Through their user-friendly networking components, the Portal and Helpdesk will stimulate the development of collaborative crop-based and discipline-based communities of practice (CoPs). The CoPs are expected to promote the application of MB techniques and the utilization of facilitative information management technologies, enhance data and germplasm sharing, and generally advance modern breeding capacity by linking CGIAR Centers and ARIs with developing-country breeding programs and research organizations. There is a strong hope that CoPs will facilitate and accelerate a paradigm shift to a more collaborative, outward-looking, technology-enhanced approach to breeding.

Component 2: The Information System

The IBP IS is structured as a CWS, with access to both local databases and distributed resources, such as central crop databases, molecular databases from GCP partner sites and from public initiatives such as Gramene and GrainGenes.

The Configurable Workflow System

This CWS is the operational representation of the IS and will be implemented by assembling informatics tools into applications configured to match specific breeding workflows (e.g., for MAS, MABC, or MARS; Fig. 4). The tools are organized in a series of functional modules comprising the Integrated Breeding Workbench, which is really the background structure that implements the CWS.

Molecular Breeding Platforms in World Agriculture. Figure 4

The IBP configurable workflow system

The IBP CWS drives the users through the different practical steps or activities of an MB project. The setup of the experiment and the germplasm management are the first steps of any project, to be followed by a set of activities that can be repeated during subsequent crop cycles, depending on the breeding objective of the experiment:

  • Germplasm evaluation

  • Genetic analysis

  • Data management

  • Data analysis, and

  • Breeding decisions

The Integrated Breeding Workbench

The workbench starts as a blank slate and the first task for the user is to open or create a project. A project manages a breeding workflow for a particular crop and a specified user. The initial sets of tools which should be available are grouped in seven modules: Administration Tools, Configuration Tools, Query Tools, and Workflow Initialization Tools (genealogy, data management, analysis, and decision support; Fig. 5).

Molecular Breeding Platforms in World Agriculture. Figure 5

The integrated breeding workbench

The administration module of the workbench specifies the crop, which identifies the central (public) data resources that will be accessible to the project. This includes a central genealogy database, a central phenotype database, a public gene management database, and a central genotype database. Each installation provides access to local (private) data resources. These data resources include a private or local database for the above data types as well as a seed inventory management system. Each installation has at least one user with administrative privileges. Users are identified by authentication codes (username and password) for access to specific private data resources. (“Private” simply means “requiring authentication for access” and several users may have access to the same private data.)

The first functionality of the workbench asks the user to open a project by selecting from a list of available project configuration “files.” Once the configuration is selected, the availability of the public data resources should be checked, the user authentication codes verified, and the local data resources checked. Next, the list of modules should be reviewed and checked for availability and, depending on the state of the workflow, icons or menus should be made available for modules and tools.
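The sequence of start-up checks described above can be sketched in Python. This is a minimal illustration, not the IBP workbench code. The function name, the configuration keys, and the resource and module names are hypothetical, and plain dictionaries stand in for real databases and authentication services.

```python
def open_project(config, public_status, credentials, local_status):
    """Run the start-up checks in the order described in the text.

    config         - the selected project configuration (hypothetical layout)
    public_status  - resource name -> True if the public resource is reachable
    credentials    - username -> password accepted by this installation
    local_status   - resource name -> True if the local resource is reachable
    Returns the modules whose required resources are all available.
    """
    # 1. Check the availability of the public (central) data resources.
    public_ok = {r for r in config["public_resources"] if public_status.get(r, False)}

    # 2. Verify the user's authentication codes.
    user, password = config["user"], config["password"]
    if credentials.get(user) != password:
        raise PermissionError(f"authentication failed for {user!r}")

    # 3. Check the local (private) data resources.
    local_ok = {r for r in config["local_resources"] if local_status.get(r, False)}

    # 4. Make available only the modules whose requirements are satisfied.
    available = public_ok | local_ok
    return [name for name, needs in config["modules"].items() if set(needs) <= available]


config = {
    "user": "breeder1",
    "password": "secret",
    "public_resources": ["central_genealogy", "central_phenotype"],
    "local_resources": ["local_genealogy", "seed_inventory"],
    "modules": {
        "Query Tools": ["central_genealogy"],
        "Data Management": ["local_genealogy", "seed_inventory"],
        "Analysis": ["central_phenotype"],
    },
}

enabled = open_project(
    config,
    public_status={"central_genealogy": True, "central_phenotype": False},
    credentials={"breeder1": "secret"},
    local_status={"local_genealogy": True, "seed_inventory": True},
)
```

Here the hypothetical Analysis module stays hidden because the central phenotype database is unreachable, which mirrors the idea that icons or menus appear only for modules that can actually be used.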

The configuration tools allow users to:

  • Select or specify naming conventions for germplasm, germplasm lists, studies, etc.

  • Use and update ontologies such as germplasm methods and the trait dictionary

  • Update breeding, testing, or collection locations

  • Create and modify study templates

The query tools will depend on the data resources specified in the project configuration, and examples are:

  • A germplasm and pedigree viewer

  • A study browser to view phenotype or genotype data

  • A data miner for identifying data patterns

  • A cross-study query builder for linking different data sets

  • A gene catalog viewer for viewing genetic diversity

  • A genotype and trait viewer for visualizing graphical genotypes and trait markers

The workflow initialization tools comprise a set of modules (genealogy, data management, analysis, and decision support tools) that provide the user with a choice of different tools to achieve precise breeding objectives. Users might construct different breeding workflows to match their project activities. The user will only see the workbench tools and settings for those tools required to execute the steps in a particular breeding workflow, and at the appropriate step in that workflow.
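The idea that a user sees only the tools needed at the current step of a configured workflow can be sketched as a simple data structure. The Python below is an illustrative assumption, not the IBP design: the step names follow the activities listed earlier in this section, but the assignment of tools to steps and the function names are hypothetical.

```python
# Hypothetical workflow: ordered steps, each with the tools it exposes.
EXAMPLE_WORKFLOW = [
    ("Germplasm evaluation", ["study browser"]),
    ("Genetic analysis", ["genotype and trait viewer", "gene catalog viewer"]),
    ("Data management", ["cross-study query builder"]),
    ("Data analysis", ["data miner"]),
    ("Breeding decisions", ["germplasm and pedigree viewer"]),
]


def visible_tools(workflow, step_index):
    """Return the name of the current step and only the tools configured for it."""
    step_name, tools = workflow[step_index]
    return step_name, list(tools)


def advance(workflow, step_index):
    """Move to the next step, wrapping around to model a new crop cycle."""
    return (step_index + 1) % len(workflow)


step = 0
first_name, first_tools = visible_tools(EXAMPLE_WORKFLOW, step)
step = advance(EXAMPLE_WORKFLOW, step)
second_name, second_tools = visible_tools(EXAMPLE_WORKFLOW, step)
```

Different breeding workflows (for example MAS, MABC, or MARS) would then differ only in the configuration list, while the code that drives the user through the steps stays the same.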

The development of each tool is overseen by a team of IBP researchers, developers, and users who design, mock up, and prototype the tools of the breeding application and pass the specifications to a software engineering team. They will then monitor the development and test and support the application. For each application, the team develops a description of the application, functional specifications of all the tools, workflow specifications for the application, and an interface mockup. A workflow for a MARS project is shown in Fig. 6.

Molecular Breeding Platforms in World Agriculture. Figure 6

Breeding workflow for a MARS experiment

Component 3: IBP Services

The Services component comprises two modules. The first module, Breeding Services, provides services to conduct MB projects. The second module, Support Services, deals with training and capacity-building, aiming to provide support and improve capacity of NARS breeders to deliver improved germplasm through marker approaches – essential for the adoption of MB approaches and the MBP.

Breeding Services

These services provide access to specific germplasm, and assist with contracting a service laboratory to conduct the marker work or to quantify specific traits, such as metabolite profiles or grain quality parameters. The module has three elements (Fig. 7):

Molecular Breeding Platforms in World Agriculture. Figure 7

Organogram of the services provided by the IBP

Genetic Resource Support Service: Access to suitable germplasm and related information from the different partners is a critical element of the portal. To address this, a Genetic Resource Support Service (GRSS) plans to tap into the CGIAR System-wide Genetic Resources Program (SGRP), a collaborative effort between GCP and existing gene banks in the CGIAR and NARS. The GRSS should ensure quality control, maintenance, and distribution of genetic resources, including reference sets and segregating populations acquired or generated through projects supported by GCP, and material generated from other sources and deposited with the GRSS (e.g., maize introgression lines from Syngenta).

Marker Service: The portal provides a set of online options for users to access different high-throughput marker service laboratories in the public and private sectors with clear contractual conditions. Service Laboratories have been selected on the basis of competitive cost, compliance with quality control requirements, and expeditious delivery, but are currently accessible by offline processes pending deployment of the IBP portal.

Trait and Metabolite Service: The portal provides a set of options for users to access laboratories specialized in the evaluation and analysis of specific traits, such as quality traits, pathology screening, or metabolite quantification. Analyses of certain secondary traits and metabolites that are indicative of plant stress tolerance can potentially provide valuable information to be used in breeding. Such analyses are generally prohibitively expensive if done locally, as it is difficult to maintain assay quality and devote the necessary resources for expertise, quality control, and specialized facilities.

Capacity Development and Support Services

Capacity development is an integral part of the project, encompassing training and support in using MB techniques and markers, designing breeding strategies, quality data management, information analysis and decision modeling, phenotyping protocols, and protection of intellectual property (IP).

The main objective of this set of services is therefore to provide backstopping and training in a broad set of disciplines, to complement the elements of the breeding services and address specific technical and logistical bottlenecks. Such expert assistance is essential for the adoption and proper use of new technologies. Services that will be available include:

Breeding plan development: It is essential to develop a breeding plan with a cost–benefit analysis before conducting a multi-cycle MB project. Depending on the nature of the experiment, such a plan may be quite simple or very elaborate, from the transfer of a single region (e.g., transgene) to complex selection that can consider the simultaneous transfer of dozens of regions. The critical factor is that the plan must detail all the activities over time, and the costs and benefits of the project to determine if it is worthwhile conducting the experiment. The platform provides templates and associated cost calculation sheets for different breeding schemes.
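A cost–benefit calculation of the kind such a plan requires can be sketched as follows. This is a minimal Python illustration, not the platform's templates or calculation sheets. The function names, the use of a discount rate (a standard net present value calculation, introduced here as an assumption), and all figures are hypothetical.

```python
def net_present_value(cash_flows, discount_rate):
    """Discount yearly net cash flows (year 0 first) to their present value."""
    return sum(cf / (1.0 + discount_rate) ** year for year, cf in enumerate(cash_flows))


def plan_summary(activities, benefits, discount_rate=0.05):
    """Summarize a breeding plan.

    activities - list of (year, cost) pairs, one per planned activity
    benefits   - list of (year, value) pairs for the expected returns
    """
    horizon = max(year for year, _ in activities + benefits) + 1
    flows = [0.0] * horizon
    for year, cost in activities:
        flows[year] -= cost       # costs reduce the net flow of that year
    for year, value in benefits:
        flows[year] += value      # benefits increase it
    npv = net_present_value(flows, discount_rate)
    return {
        "total_cost": sum(cost for _, cost in activities),
        "npv": npv,
        "worthwhile": npv > 0,
    }


# Hypothetical plan: genotyping and field costs in years 0 and 1, benefit in year 2.
activities = [(0, 1000.0), (1, 500.0)]
benefits = [(2, 2000.0)]

summary = plan_summary(activities, benefits)
undiscounted = plan_summary(activities, benefits, discount_rate=0.0)
```

Listing each activity with its year and cost forces the plan to be explicit about timing, and the sign of the net present value gives a first indication of whether the experiment is worth conducting under the stated assumptions.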

Information management: Under this service, assistance is provided in installing and parameterizing the platform IS for use by specific breeding projects.

Data curation: This service assists with capturing and curating current data for particular breeding projects, and in entering them into the integrated IS. This step is absolutely critical for quality control and further sharing of the information, and a contact person for each of the pioneer user cases has been identified to ensure good communication between the platform and the users.

Design and analysis: This service provides support on statistics, bioinformatics, quantitative genetics, and molecular biology. It includes training in data generation, handling, processing, and interpretation, as well as experimental design from field planting to MAS and MABC schemes. It provides assistance with the “translation” of the molecular context to the breeding context, and it will ensure that the methodology developed for the design and analysis of breeding trials is rapidly available to the users.

Phenotyping sites and screening protocols: Through this service, users can access information on phenotyping sites, protocols, and potential collaborators to ensure that selection is carried out under appropriate biotic and abiotic stresses and that the adaptation of germplasm is well characterized. Characterization of phenotypic sites includes geographical information, meteorological historical data, soil composition, and field infrastructure.

Genotyping Support Service (GSS): The GSS aims to facilitate access by developing country national agricultural research institutes to genotyping technologies, and bridge the gap between lab and field research. This service provides financial and technical support for NARS breeders to access cost-efficient genotyping services worldwide and supports training activities in experimental design and data analysis for MB projects.

Intellectual property (IP) and policy: This service provides support on IP rights and freedom to operate in the arena of biotechnology and germplasm use. The service is currently being provided on an experimental basis through a virtual IP Helpdesk hosted by the GCP web site at http://www.generationcp.org/iphelpdesk.php.

Integrated Breeding Hubs

While few today question the usefulness of basic local laboratories, it is also generally accepted that large-scale genotyping activities are best outsourced to cost-effective, high-throughput service laboratories, irrespective of location. Following that rationale, the IBP provides access to marker service laboratories as the main avenue to generate the large amount of genotyping data that will be necessary to support the extensive MABC programs of the future, starting with the user cases. However, the GCP also recognizes the need to provide breeders in developing countries with access to some regional hubs. At the beginning of the project four regional hubs are envisioned, covering the needs of the Americas – Centro Internacional de Agricultura Tropical (CIAT, www.ciat.cigiar.org); Africa – BioSciences eastern and central Africa (BecA, http://hub.africabiosciences.org); South Asia – International Crops Research Institute for the Semi-Arid Tropics (ICRISAT, www.icrisat.org); and South East Asia – International Rice Research Institute (IRRI, www.irri.org).

These regional hubs are expected to provide the following services:

  • In-house hands-on training (different formats are possible from short- to medium-length periods), with the objective of exposing scientists to new technologies and their applications to breeding.

  • Training courses for selected groups of researchers, targeting basic knowledge of marker technologies and their applications, as well as data analysis. These courses can be used for the testing and validation of learning materials, which will then be continuously upgraded.

  • Facilitation of small genomic and genotyping projects led by national programs, academia, and small and medium enterprises (SMEs).

  • Marker services for “small” and “orphan” crops that do not have mass demand from breeding programs and would therefore not benefit from large service providers, due to the lack of availability of SNP markers and the need to use lower-throughput SSR or other markers that can more easily be handled in lower-tech laboratories.

The Genomics and Molecular Breeding Hubs should help raise the visibility of the IBP and thus help promote the adoption of MB. Collaboration between the IBP and the regional hubs is anticipated to occur through sharing information, guiding users to apply for the appropriate service, organizing training events, and planning other developments of common interest.

Scope and Potential for Molecular Breeding Platforms

Gaps Across Countries and Crops

The application of MB approaches is now routine in developed countries, as is the integration of facilitative information and communication technologies, which are critical given the immense volumes of data necessary for, and generated by, these breeding processes. However, the situation is very different in developing countries, where MB is still far from routine in its application in breeding programs, particularly in Africa. This is especially critical due to the monumental and urgent imperative to rapidly achieve food security and improve livelihoods for a rapidly growing population through breeding for biotic stresses (including weeds, pests, and diseases) and abiotic stresses (including physical soil degradation, nitrogen deficiency, drought, heat, cold, and salinity) – conditions that make accurate phenotyping challenging. Fortunately, the history of modern breeding in developing countries is comparatively short, allowing a larger potential for crop improvement relative to the genetic gains that can be obtained at this time in developed countries, in which extensive breeding has been applied to crops for a longer time.

The capacity of national research institutions to address these issues, in terms of funds, infrastructure, and expertise, is directly related to the strength of their national economies [86]. This is reflected in the sharp differences in the capacity to conduct and apply biotechnology research as observed across developing countries (FAOBioDeC, http://www.fao.org/biotech/inventory_admin/dep/default.asp), and by the same token in their capacity to establish and/or utilize MBPs. The result is a three-tier typology of developing countries, directly attributable to the level of each country’s investment in agricultural R&D [87].

Tier-1 countries, comprising newly industrialized countries (NICs) such as Brazil, China, India, Mexico, South Africa, and Thailand, substantially invest in technology and R&D and are self-reliant in most aspects of marker technologies [88, 89]. These countries have the simultaneous potential to effectively adopt, adapt, and apply information and communication technologies to enhance research efficiency and outputs. They are therefore naturally at the vanguard in adopting MBPs.

Mid-level developing world economies (tier-2) such as Colombia, Indonesia, Kenya, Morocco, Uruguay, and Vietnam are well aware of MB’s importance, and some effectively apply marker technologies for germplasm characterization [90–93] and selection of major genes [94–99]. These countries have a matching potential for a limited utilization of MBPs, a potential that can be enhanced fairly rapidly in the medium to long term.

Low-level developing world economies (tier-3 countries) are struggling to sustain even basic conventional breeding. They have very limited or no application of MB approaches and are unlikely to adopt MBPs except in the long term.

Resource-limited breeding programs in many developing countries, and especially in tier-3 countries, are severely hampered by a shortage of well-trained personnel, low levels of research funding, inadequate access to high-throughput genotyping capacity, poor and inadequate phenotyping infrastructure, lack of ISs and appropriate analysis tools, and the logistical difficulty of integrating new approaches with traditional breeding methodologies, including problems of scale when moving from small to large breeding programs.

Until recently, the scarcity of available genomic resources for clonally propagated crops, for some neglected cereals such as millet, and for less-studied crops such as most tropical legumes, which are all very important crops in developing countries, represented a further constraint to agricultural research for development [100], thereby limiting the application of molecular approaches and hence the potential for MBPs. However, the recent emergence of affordable large-scale marker technologies (e.g., DArT [101]), the sharp decline of sequencing costs boosting marker development based on sequence information [102], and the explicit efforts of national agricultural research programs (e.g., India [103]) and international initiatives (e.g., GCP [104]) have all resulted in a significant increase in the number of genomic resources available for less-studied crops. As a result, most key crops in developing countries now have adequate genomic resources for meaningful genetic studies and most MB applications.

Similarly, international efforts such as GCP’s IBP are designed to help overcome the challenges of developing-country breeders – exploiting economies of scale by making available convenient and cost-effective collective access to cutting-edge breeding technologies and informatics hitherto unavailable to them, including genomic resources, advanced laboratory services, and robust analytical and data management tools. Together, this increasing availability of genomic resources and tools for previously neglected but important crops and the access to initiatives targeting the resource-challenged NARS of the developing world will hasten the adoption of MBPs for these countries.

Institutional, Governmental, and Public Support

While corporate and other proprietary MBPs need only meet the specific requirements of a particular corporation or of specific paying clients, the development of platforms targeted at breeding programs in the developing world requires a broad consensus among the parties that would use them, as well as support from multiple overseeing organizations. This is because these platforms are built on the premise of minimizing costs and maximizing benefits through economies of scale generated through collective access by multiple partners.

The public-access MBPs would therefore be critically dependent on well-structured MB programs, which may not be a reality in many developing countries. A good structure would entail the following elements, which should be common or compatible across participating programs:

  • Good field infrastructure, including meteo station

  • Good agronomical practices at experimental stations

  • Crop ontology information system

  • Data collection, management, and analysis protocols

  • Breeding plan design

  • Information and communication technology infrastructure

  • Informatics tools for analysis, decision support purposes, and eventually modeling and simulation

Traditionally, developing world breeding programs have largely been poorly funded and poorly supported, and have been primarily driven by donor organizations [105, 106]. The lack of in-country support has often limited the dependent breeding activities to no more than a basic level. Under such circumstances, it was unrealistic to anticipate the adoption of new biotechnologies – including the utilization of MBPs. Fortunately, this scenario is changing. In 2003, through the Comprehensive Africa Agriculture Development Programme (CAADP, http://www.caadp.net/implementingcaadp-agenda.php), African governments committed to invest more in food security and in agriculture-led growth. Since then, many countries in Africa and elsewhere have developed comprehensive agricultural development strategies.

There is also a growing participation by foundations and nongovernmental organizations, and more recently the emergence of public–private sector partnerships (e.g., US Global Food Security Plan, http://www.state.gov/s/globalfoodsecurity/129952.htm). This governmental and institutional commitment is critical for the adoption of biotechnologies in general [8, 107] and for MB adoption in tier-2 countries in particular, with the attendant establishment and utilization of MBPs.

Challenges, Risks, and Opportunities

Challenges hampering the potential of MBPs in developing countries include both factors applicable generally to MB and those specific to MBPs. These factors encompass infrastructure capacity, human resource, and operational and policy issues. But amidst the challenges there are also actual and potential opportunities.

Human Capacity

Human capacity for MB technologies in developing countries is a challenge, and limitations include substandard agriculture programs at universities; difficulties in keeping up to date with relevant developments, including failures by others; poor technical skills in core disciplines; isolation as a result of insufficient peer critical mass in the workplace; and poor incentives to attract and retain scientists, resulting in brain drain and staff turnover [108].

To partially offset the undesirable trend of losing the “champions” and to “generate” more “champions,” novel international initiatives like Alliance for a Green Revolution in Africa (AGRA) support high-quality education in the South. Examples include the African Centre for Crop Improvement (ACCI, http://www.acci.org.za/) based at the University of KwaZulu–Natal in South Africa and the University of Ghana-based West African Centre for Crop Improvement (WACCI, http://www.wacci.edu.gh/). Both institutes offer doctorate degrees in modern breeding to African students, with the fieldwork component being carried out in the students’ home countries.

While obtaining their Ph.D. in plant breeding, these scientists study the principles of marker technologies, equipping them to undertake MB activities. To retain this much-needed expertise in Africa, the WACCI and ACCI programs also provide post-Ph.D. funds for these scientists to conduct research in their home countries and, in some cases, provide matching funds for their career advancement.

Precise Phenotyping

There can be no successful MB program without precise phenotyping of the target traits. Reliable phenotypic data are a must for good genetic studies [109], yet most developing countries lack suitable field infrastructure for good trials and collection of accurate phenotypic data. As part of the services of a good MBP, guidelines on best practice must be provided on how to design and run a trial and conduct precise phenotyping for genetic studies under different target environments. Improving access to homogeneous field areas and paying attention to good soil preparation and homogeneous sowing are critical. The development of new geographic IS tools [102, 110], experimental designs, phenotyping methodologies [111, 112], and advanced statistical methods [113] will facilitate the understanding of the genetic basis of complex traits [114] and of genotype-by-environment (G×E) interactions [48, 115]. Improving phenotyping infrastructure in developing countries must thus be a top priority to promote modern breeding and utilization of MBPs [106].

Laboratories for Marker Services

Genotyping can be expensive when it is performed in small laboratories using labor-intensive and low-throughput markers such as SSRs. This has traditionally limited the use of MMs in developing countries beyond the fingerprinting of germplasm with a small number of markers or the use of MAS for a few key traits. Operational efficiency is also vital, because fundamental timelines must be respected to ensure that no crop cycle is lost. Indeed, at every selection cycle, a service laboratory may have only a few weeks (time between DNA being extracted from leaves harvested on plantlets and the flowering time) to conduct the analysis and return the data to the breeders to enable them to conduct appropriate crosses among selected genotypes.
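The timing constraint described above can be sketched as a simple calculation. The following is a minimal illustration only: the function name `genotyping_window`, the shipping and decision allowances, and the example dates are assumptions introduced here, not figures from any particular service laboratory.

```python
from datetime import date


def genotyping_window(sampling: date, flowering: date,
                      shipping_days: int = 5, decision_days: int = 3) -> int:
    """Return the number of days left for laboratory work.

    The total window runs from leaf sampling to flowering. Days needed to
    ship samples to the lab and days the breeder needs to choose crosses
    (both hypothetical defaults) are subtracted from it.
    """
    total_days = (flowering - sampling).days
    if total_days <= 0:
        raise ValueError("flowering must come after sampling")
    return total_days - shipping_days - decision_days


# Hypothetical example: leaves sampled on 10 January, flowering expected
# on 29 February of the same year.
days_for_lab = genotyping_window(date(2024, 1, 10), date(2024, 2, 29))
print(f"Days available to the service laboratory: {days_for_lab}")
```

A negative or very small result would signal that the crop cycle is at risk and that sampling must start earlier or a faster service be used.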

There is general agreement today that basic local laboratories at national and regional levels can be useful at least to service small local needs such as fingerprinting of a limited number of accessions, GMO detection or MAS for specific traits, or for teaching and training purposes. It is also generally accepted that large-scale genotyping activities are best outsourced to advanced, modern, cost-effective high-throughput service laboratories, irrespective of the original location of the needs. This outsourcing is driven by the evolution in marker technologies. The advent of SNP genotyping led the shift from the low-throughput, primarily manual world of SSRs to high-throughput platforms powered by robotics and automated scoring, better handled by dedicated service laboratories [102, 116, 117]. As a result, genotyping costs have decreased by up to tenfold while data throughput has increased by the same magnitude. An example for MARS is provided in Fig. 6. SNP markers are increasingly available for most mainstream crops and for several less-studied crops [118, 119], which are important in developing countries.

A particular effort will be needed to ensure an easy and reliable way to track samples from the field to the laboratory, and back to the field – it will hence be vital to carefully identify DNA samples from material collected in the field. Such documentation should optimally be through bar-coding, and all information pertaining to management of field trials or experiments should be recorded in electronic field books. Marker work would of necessity be subcontracted to a service lab with a good and preferably platform-compatible laboratory information management system (LIMS).
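One possible way to link field material to laboratory positions is sketched below. This is an assumption-laden illustration, not the scheme of any specific LIMS or field book: the barcode format, the `FieldSample` class, and the `assign_wells` helper are hypothetical, and a standard 96-well plate filled row by row is assumed.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass(frozen=True)
class FieldSample:
    """A leaf sample identified by trial, plot, and plant (hypothetical scheme)."""
    trial: str
    plot: int
    plant: int

    @property
    def barcode(self) -> str:
        # Text to encode in the barcode label attached in the field.
        return f"{self.trial}-P{self.plot:04d}-{self.plant:02d}"


def assign_wells(samples: List[FieldSample], plate_id: str) -> Dict[str, FieldSample]:
    """Map samples to the wells of one 96-well plate, row by row (A01..H12)."""
    if len(samples) > 96:
        raise ValueError("a 96-well plate holds at most 96 samples")
    wells = [f"{row}{col:02d}" for row in "ABCDEFGH" for col in range(1, 13)]
    return {f"{plate_id}:{well}": sample for well, sample in zip(wells, samples)}


# Hypothetical trial with 13 plots, one plant sampled per plot.
samples = [FieldSample("TRL1", plot, 1) for plot in range(1, 14)]
layout = assign_wells(samples, "PL01")

# Genotype data returned for a well can be traced back to the field plot.
origin = layout["PL01:B01"]
print(origin.barcode)
```

Because the plate layout keeps a reference to the original sample, marker data reported per well can be joined back to the plot in the electronic field book without manual transcription.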

Data Management

For breeders to efficiently access relevant information generated by themselves and by other researchers, reliable data management (including sample tracking, data collection and storage, and modern analytical methodologies and tools for accurate decision making, among others) is critical both within a given MB program and across programs. In view of this, it is essential that breeders manage pedigree, phenotypic, and genotypic information through common or mutually compatible crop databases, in keeping with the collective access principle of a public MBP. The format of databases would need to be user-friendly and compatible with field data collection devices and applications to encourage both adoption and compliance. Ultimately, data collection and management processes would need to link seamlessly with a platform-resident analysis, modeling, simulation, and decision support workbench for full utility of the breeding platform.
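The value of mutually compatible databases can be illustrated with a small sketch in which pedigree, phenotypic, and genotypic records share a common germplasm identifier. Everything here is hypothetical: the identifiers, the record layouts, and the `link_records` function are assumptions made for illustration and do not describe the schema of any existing crop database.

```python
from typing import Dict, List, Tuple

Pedigree = Dict[str, Tuple[str, str]]    # germplasm ID -> (female, male parent)
Phenotypes = Dict[str, Dict[str, float]]  # germplasm ID -> trait -> value
Genotypes = Dict[str, Dict[str, str]]     # germplasm ID -> marker -> allele call


def link_records(pedigree: Pedigree, phenotypes: Phenotypes,
                 genotypes: Genotypes) -> Tuple[List[dict], List[str]]:
    """Join the three record types on their shared germplasm ID.

    Returns the complete records and the IDs that are missing phenotypic
    or genotypic data, so that gaps are reported instead of silently dropped.
    """
    complete: List[dict] = []
    incomplete: List[str] = []
    for gid in sorted(pedigree):
        if gid in phenotypes and gid in genotypes:
            complete.append({
                "gid": gid,
                "parents": pedigree[gid],
                "phenotype": phenotypes[gid],
                "genotype": genotypes[gid],
            })
        else:
            incomplete.append(gid)
    return complete, incomplete


# Hypothetical data for three breeding lines.
pedigree = {"G001": ("P1", "P2"), "G002": ("P1", "P3"), "G003": ("P2", "P3")}
phenotypes = {"G001": {"yield_t_ha": 4.2}, "G002": {"yield_t_ha": 3.8}}
genotypes = {"G001": {"M1": "A/A"}, "G003": {"M1": "A/G"}}

complete, incomplete = link_records(pedigree, phenotypes, genotypes)
print(f"{len(complete)} complete record(s); incomplete: {incomplete}")
```

The join only works because all three sources use the same identifier, which is the practical meaning of "common or mutually compatible" in this context.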

Paradigm Shift: Collaborative Work and Data Sharing

Access to information and products generated by fellow users is a potentially critical incentive for breeders to use the platform and share their own data with other users. However, this would require a fundamental paradigm shift from the present data-hoarding, inward-looking approach to research common to breeders. Such a shift may only be achievable if it is a clear requirement in the terms of engagement for membership of a “platform community,” or if distinct financial and other incentives are offered for such sharing.

Technology-Push Versus Demand-Driven

An MBP is by nature a high-level technological solution. It carries with it the inherent risk of failing to address fundamental practical problems of developing-world breeding programs, which will often by nature be technology-deficient. Such platforms therefore face the challenge of ensuring that they meet targeted user objectives and address practical constraints.

However, with this challenge comes an opportunity to introduce advanced MB methodologies to developing world breeders, by encouraging change that will enable them to take advantage of the efficiencies and economies of scale offered by the MBP. This opportunity would be particularly reachable with bottom-up platform design and development that actively engages and involves the breeders – including elements of human resource capacity development and support in usage.

Adoption and Use by Breeders

An MBP would only make a difference if it is adopted and widely used by the breeders. The most important element influencing this would be credibility – a function of the quality of the technology, the awareness of potential users, the ease of access, and initial incentives. There is a need for successful public sector developing-country examples to demonstrate that the platform can effectively enhance the efficiency of breeders through the use of modern approaches – a clear demonstration of the added value of using the platform.

Sustainability of the Platform

Sustainability would be a challenge for MBPs targeting developing world breeding programs, given their resource limitations. These programs may not be able to meet the full cost of platform usage, and the cost of maintaining and updating the different elements of the platform on a regular basis – particularly tools and facilities that must keep abreast with evolving information and communication technologies.

Of course, platform sustainability is directly linked to its adoption by breeders, and sustainability strategies must be adapted to the diversity and financial resources of the potential clients, from developing-world national agricultural research institutes with limited resources to SMEs. Service costs might also be adjusted if clients are willing to share data and release germplasm through the platform.

Platform managers may also have to consider other innovative options like on-platform advertising by agriculture-related commercial enterprises. However, ongoing donor support would most likely still be required in the medium to long term.

Communities of Practice

The development of platform-based MB communities of practice is an additional potential avenue for platform adoption and sustainability. Such communities connect groups of crop researchers, mainly breeders, who are willing to share experiences and information on modern breeding methods, best field practices, and development of improved varieties, and to practice peer-to-peer mentoring. They also provide a means to quickly and efficiently resolve recurring breeding problems. Partnerships between developed and developing-country institutions, and between the private and public sectors, are also an opportunity for realizing the full potential of MB [87, 108].

Many other hurdles limit successful public sector utilization of MB opportunities [120, 121]. However, the potential of virtual MBPs made possible by the revolution in information and communication technologies provides opportunities to counter and overcome many of those shortcomings.

Potential Economic Impact of Molecular Breeding Platforms

By its nature, MB improves the efficiency of crop breeding – progressively increasing genetic gains by selecting and stacking favorable alleles at target loci. The utilization of MBPs accelerates and amplifies the advantages of MB by introducing significant efficiencies in resource and time usage. Predictive or designer breeding, which would be the ultimate result of information-rich MB, attainable through the use of MBPs by numerous different breeding programs that freely share data and germplasm, would particularly bring about these savings in resources and time.

However, a direct comparison of the cost-effectiveness of MB with phenotypic selection is not straightforward. Firstly, factors other than cost – such as trade-offs between time and money – play an important role in determining the selection method. Secondly, this choice is further complicated by the fact that the two methods are rarely mutually exclusive or direct substitutes for each other [122]. On the contrary, under most breeding schemes, they are in fact complementary. Where operating capital is not a limitation, MB maximizes the net present value, especially when strengthened through MBPs [123]. With the increasing ease of accessing marker service laboratories and the declining cost per marker data point, MB costs are shrinking, making it extremely attractive from a purely economic perspective.
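The trade-off between time and money can be made concrete with a net present value calculation. The sketch below is illustrative only: the yearly costs, benefits, release times, planning horizon, and discount rate are hypothetical figures, not values from the studies cited here, and `npv` and `breeding_cash_flows` are names introduced for this example.

```python
from typing import List


def npv(cash_flows: List[float], rate: float) -> float:
    """Discount yearly cash flows (year 0 first) back to their present value."""
    return sum(cf / (1.0 + rate) ** year for year, cf in enumerate(cash_flows))


def breeding_cash_flows(annual_cost: float, years_to_release: int,
                        annual_benefit: float, horizon: int) -> List[float]:
    """Costs each year until variety release, then benefits until the horizon."""
    if years_to_release > horizon:
        raise ValueError("release must fall within the planning horizon")
    costs = [-annual_cost] * years_to_release
    benefits = [annual_benefit] * (horizon - years_to_release)
    return costs + benefits


# Hypothetical scenario: MB costs more per year but releases a variety
# three years earlier than phenotypic selection alone.
HORIZON = 15  # planning horizon in years (assumed)
RATE = 0.05   # discount rate (assumed)
phenotypic = breeding_cash_flows(100.0, 8, 500.0, HORIZON)
molecular = breeding_cash_flows(130.0, 5, 500.0, HORIZON)

print(f"NPV, phenotypic selection: {npv(phenotypic, RATE):.1f}")
print(f"NPV, molecular breeding:   {npv(molecular, RATE):.1f}")
```

Under these assumed figures the earlier release outweighs the higher yearly cost, but the ranking depends entirely on the inputs, which is why the comparison is not straightforward when operating capital is limited.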

Even once the technological hurdles are overcome, the ultimate impact of new technologies (such as MBPs) is often limited by absent or ineffective seed distribution systems, or by distant markets. SMEs are critical in promoting access to, and distribution of, improved seeds, thus helping alleviate a major bottleneck to the impact of improved breeding on smallholder farmers [124, 125].

Few economic analyses have been conducted to objectively assess the potential impacts of MB in the public sector, and none for MBPs that are just now emerging as a tool for breeding in the public sector.

Of the few analyses done to date, one evaluates the economic benefits of MABC using preexisting MMs in developing rice varieties tolerant to salinity and P-deficiency [126] in Bangladesh, India, Indonesia, and the Philippines. Encompassing a broad set of economic parameters, the study concluded that MABC saves an estimated minimum of 2–3 years, resulting in significant incremental benefits in the range of USD 300–800 million depending on the country, the extent of abiotic stress encountered, and the lag for conventional breeding [127].

Future studies are likely to confirm the positive economic benefits of MB and, given that MBPs amplify the benefits of MB, it can be reasonably inferred that the emerging platforms would indeed further enhance those economic benefits.

Future Directions

MBPs will inevitably have a significant impact on crop breeding in developing countries in the medium to long term because of:

  • The needs-driven demand for improved crop varieties to counter the global food crisis

  • The exponential development of genomic resources

  • The ever-declining cost of marker technologies

  • The increasing occurrence of public–private partnerships, where the public sector can learn from private companies about best practices for integrating MB into their breeding programs

  • The need for innovative solutions to the challenges of resource and operational limitations

The first challenge of MBPs will be to meet the immediate needs of the breeders in developing-country public and private programs. The first step will be to provide them with the tools for enhancement of their current breeding programs, through the implementation of field books, pedigree management, and basic statistical analytical tools necessary to optimally conduct their current breeding efforts. In close succession with these first applications, tools will need to be made available to facilitate the integration of MB into their breeding programs. Databases will need to be developed for storing genotypic and phenotypic data, integrated analytical tools will need to be made available to breeders for analysis of this accumulated data and for the identification of important simple trait loci or QTLs to monitor and recombine in their breeding programs, and decision support tools will need to be developed to help breeders decide on the next steps to engage in based on the data they generated from their MB activities.

In the near future, more complex tools will need to be developed for the storage and analysis of the large amounts of genotypic data that will be generated by new next-generation sequencing technologies and for their application in GWS. A tight linkage will also have to be established with the wealth of information that is being generated and will continue to be generated even faster in the genomics area, leading to the dissection of the genome and to the discovery of the location and function of major genes having an impact upon the performance of crops in environments relevant to developing-country programs.

Eventually, the accumulation of large amounts of genetic information linked to specific haplotypes will lead to the increasing use of predictive breeding in combination with traditional MB usage and appropriate tools will also need to be developed to support those efforts.

Although it is critical for a platform to anticipate all the possible new features of MB, ensuring that new technologies and ISs will find their way into a flexible infrastructure, it is also quite probable that most breeding programs in developing countries will work mainly with simple MB approaches in the short and medium term, as they will never reach the critical number of crosses and germplasm evaluations required to make the most of complex approaches.

Conclusion and Prospective Scenarios

Through international initiatives like the ones coordinated by the CGIAR centers and programs, several notable developing-world MB successes have already been reported.

A well-known example is the development of submergence-tolerant rice cultivars through MABC led by IRRI [128]. The introgression of the Sub1 gene from FR13A (the world’s most flood-tolerant variety) into widely grown varieties like Swarna improved yields in more than 15 million hectares of rain-fed lowland rice in South and Southeast Asia.

MB in general and the use of MBPs in particular have definitely been shown to be an efficient approach for reducing the number of required selection cycles and for increasing the genetic gain per crop cycle to a point where the required human and operational resources can be kept to a minimum.

However, for sustainable adoption, the use of modern breeding strategies requires a breeder-led bottom-up approach. As a start, simple MB approaches adapted to local environments should be tested by individual breeders to evaluate their success and impact under those breeders’ conditions. Once proven, these approaches can then be implemented more widely or integrated into an MBP for enhanced efficiency. Where individual breeders succeed, their adoption of MB should be quite straightforward.

It is clear that the extent, speed, and scope of adoption of MB approaches and of utilization of MBPs will vary across tier-1, tier-2, and tier-3 countries, depending on the local priorities and on the resources available in given breeding programs. It is unrealistic to expect that large-scale MB activities, including utilization of MBPs, will be widely implemented across the board in developing countries in the near term. However, the prospects are bright for individual breeders in these countries (particularly in tiers 1 and 2) to access germplasm, data, tools, and methodology that will allow them to conduct efficient MB projects by taking advantage of large international initiatives specifically targeting developing-country breeding programs. This will, however, happen in different ways and on different timelines for each tier.

For tier-1 countries, the impact would be evident in the shorter term – say in 3–6 years. These countries will benefit from new tools and platforms by increasing the rate of MB adoption. The biggest change is likely to occur in tier-2 countries, as these countries would be starting MB from scratch, but the impact would realistically be measurable only in the medium term, meaning in about a decade from now. For countries currently in tier-3 to advance to tier-2, basic breeding programs must first be established, which is highly dependent on governmental priorities and on subsequent resource allocation.

All in all, implementing MB (and catalyzing and accelerating its impact through MBPs) will boost crop production, which will translate into higher farm productivity per unit of land, better nutrition, higher incomes, poverty alleviation, and ultimately improved livelihoods in developing countries (Fig. 8). These gains will be amplified by sustained use, by continuously improving expertise, and by growth and development of homegrown capacity for the application of advanced breeding approaches.

Molecular Breeding Platforms in World Agriculture. Figure 8

IBP as a key component to boost NARS breeding capacities and therefore crop productivity in developing countries