Introduction

Cassava (Manihot esculenta Crantz.) is a staple for more than 800 million of the world’s poorest people and is produced primarily by smallholder farmers. Cassava has potential to reduce hunger and malnutrition in millions of people in sub-Saharan Africa (Burns et al. 2010). Despite achievements in cassava improvement, many challenges remain, including the low protein content of the storage roots, the presence of toxic cyanogenic glucosides, susceptibility to biotic stresses, and the need to tailor the cassava crop to the myriad agro-ecosystems under which it is produced (Fregene et al. 2007).

Low protein content in the roots has been a major factor in the unfavorable image of cassava when compared with other staples like potato, rice, soybean and cowpea. Cassava root protein content ranges between 2% and 3% (dry weight basis) (Ceballos et al. 2006). Despite the low content, the quality of this protein is fairly good: it has a balanced proportion of amino acids, except that methionine and lysine are present in small amounts (Ceballos et al. 2006). The concentration of total free amino acids (free protein amino acids plus free non-protein amino acids) was higher in cassava leaves than in the roots (Diasolua Ngudi et al. 2002). If varieties with higher quantities of protein and free amino acids can be developed, the nutritional value of cassava as a food would be enhanced (Diasolua Ngudi et al. 2003).

Genetic mapping in allogamous crops is complicated by the absence of complete homozygosity in the parents and the subsequent mixture of genotypic classes in the progeny (Williams 1998). The heterogeneity of mating types makes QTL detection more complex and less efficient in outcrossing heterozygous plants than in inbreds (Śliwka et al. 2008). Several efforts have been made in other crops to identify the QTL responsible for protein content, for example in maize (Holding et al. 2008), wheat (Chee et al. 2001), and soybean (Panthee et al. 2005).

The use of molecular markers to introgress a target region of the genome saves time compared to traditional backcrossing strategies (Frisch et al. 1999). It has been shown that the tremendous genetic potential locked up in wild relatives can be made available in several crops through advanced backcross quantitative trait loci mapping schemes (Blair et al. 2006). Wassom et al. (2008) identified genomic regions controlling kernel oil, protein, and starch concentration in BC1-derived S1 lines of maize. These authors detected QTL that explained 46.9%, 45.2%, 44.3%, and 17.7% of phenotypic variance for oil, protein, starch, and mass, respectively, by composite interval mapping, and 17.5%, 22.9%, 40.1%, and 28.7% for oil, protein, starch, and yield, respectively, in top cross hybrids.

Wild Manihot species were examined for storage root formation and root protein content. A wild accession of M. esculenta ssp flabellifolia collected in Brazil was found to have a protein content of between 10% and 18% (dry weight basis) in storage roots (Fregene et al. 2007). This high protein content made M. esculenta ssp flabellifolia a suitable parent for an interspecific hybrid to be used in developing a backcross population. The objectives of this study were to identify QTL controlling protein content in cassava roots and to determine the effects of each QTL in a backcross population.

Materials and Methods

Plant Material

Choice of Parental Lines

A number of expeditions have been carried out by the Centro Internacional de Agricultura Tropical (CIAT) to collect seeds of M. esculenta ssp flabellifolia, M. esculenta ssp peruviana, and M. tristis (Bonierbale et al. 1995; Roa et al. 1997). These collections were from the same populations used by Olsen and Schaal (1999) in their phylogeographic study of the origin of cassava. In May 2000, more than 2000 of these seeds were germinated in seedling nurseries and evaluated for crude protein, dry matter content, crude fiber, ash, amylose and storage root production (data not shown). Controlled pollinations were performed, following the standard procedure described by Kawano (1980), between selected accessions of M. esculenta ssp flabellifolia and four of the current elite parents of the cassava gene pools to develop F1 populations (data not shown) adapted to the three major agro-ecologies where cassava is grown.

Development of a Mapping Population

The mapping population for this study was a BC1 (B1P2) family consisting of 225 genotypes as described by Akinbo et al. (2011). The male parent (MTAI-8) is a successful elite Thai cultivar with high dry matter content, good tuber formation, and cream colored roots. The female parent used to generate the B1P2 mapping population was an inter-specific F1 hybrid, CW 198-11. It was selected for its high root protein content, high dry matter content, good tuber formation, and resistance to whitefly at the trial location at CIAT headquarters, Palmira (Fig. 1). Embryo culture of mature seed was used to enhance seed germination and to produce in vitro plantlets. The population and the parents were maintained by in vitro propagation (Akinbo et al. 2010).

Fig. 1

Pedigree of the planting materials used for the B1P2 family. a = introgression of high protein content and disease resistance to F1 progenies; b = backcrossing of the F1 into cultivated cassava; SG 427 – 87 X MESCFLAX – 80 = parental pedigree of CW 30 – 65 (male parent); OW 230 – 3 = M. esculenta ssp flabellifolia high protein gene source; MCOL 1684 X Rayong 1 = parental pedigree of Rayong 60; CW 198 – 11 = interspecific F1 high protein source; MTAI – 8 = cultivated cassava

Field Experiments

The in vitro plants were transferred to the screen house in 2005 and, after 60 days, planted in the field at Corporacion Colombiana de Investigacion Agropecuaria (CORPOICA), Palmira, Colombia. At 10 months after planting (MAP), mature stem cuttings from the harvested plants were used to establish a preliminary yield trial experiment made up of 225 genotypes, in a randomized complete block design with three replicates of 12 blocks, eight plants per row. Field trials were conducted in Palmira in Valle del Cauca (elevation 965 m, 3°49′N, 76°36′W) in 2006, and repeated in 2007 at Palmira and at Santander de Quilichao in Cauca (elevation 990 m, 3°30′N, 76°36′W), both located in the mid altitude tropics of Colombia. Both sites have bimodal rainfall, with peaks usually in March–June and October–December, although there are yearly variations. The soil in Palmira is a fertile alluvial clay loam, distinct from the soil in Quilichao, which is a sandy loam, less fertile and acidic. In the first year, planting was done in July 2006 in Palmira and the trial was harvested in May 2007. In the second year, planting was done in May 2007 in Palmira and June 2007 in Quilichao, and the trials were harvested in March 2008. The plants were not fertilized, but were weeded when necessary. Meteorological data at the locations during experimentation are presented in Table 1.

Table 1 Meteorological data at Palmira and Quilichao in 2006 and 2007

Root Protein Evaluation

To avoid border effects, only plants of the inside rows within each plot were harvested for data collection and root analysis. The samples from each replication were analyzed separately. Root protein analyses were done using two roots per plant. The selected roots were peeled and washed. A slice was taken from the proximal, central, and distal sections of each root. Samples from roots of all plants of each genotype were mixed and chopped into small pieces. The resulting chips were mixed well to obtain a uniform sample. A 100 g sample was taken and dried in an oven with forced ventilation at 60°C for 24 h. Dried samples were ground in a mill with a stainless steel grinding tool. All samples were analyzed at the plant tissue analytical laboratory at CIAT. Nitrogen determination was based on a modification of the Kjeldahl method (Skalar 1995). Root samples were digested with a mixture of sulphuric acid, selenium and salicylic acid. The salicylic acid forms a compound with the nitrates present to prevent loss of nitrate nitrogen. Digestion of the samples was initiated with hydrogen peroxide which is oxidized. After decomposition of the excess H2O2, the digestion was completed with concentrated sulphuric acid at elevated temperature (330°C) using selenium as a catalyst (Novozamsky et al. 1983; Walinga et al. 1989). Nitrogen was quantified colorimetrically on a segmented flow analyzer. In the coloring process, salicylate, nitroprusside (catalyst) and active chlorine were added to form a green colored complex with the ammonium ion. Light absorption was measured at 660 nm (Krom 1980; Searle 1984). Hock-Hin and Van-Den (1996) reported that the conversion factor used to estimate protein content from N concentration in cassava roots should range between 4.75 and 5.87. The average of this range, 5.31, is the standard established and used for cassava roots at CIAT.
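The nitrogen-to-protein conversion described above can be sketched as follows. This is a minimal illustration: the function and constant names are hypothetical, and only the 4.75 to 5.87 range and the 5.31 factor come from the text.

```python
# Hypothetical helper names; the numeric factors are those quoted in the text.
N_TO_PROTEIN_LOW = 4.75    # lower bound reported by Hock-Hin and Van-Den (1996)
N_TO_PROTEIN_HIGH = 5.87   # upper bound reported by Hock-Hin and Van-Den (1996)
N_TO_PROTEIN_CIAT = 5.31   # midpoint of the range, used for cassava roots at CIAT


def crude_protein_percent(nitrogen_percent, factor=N_TO_PROTEIN_CIAT):
    """Convert Kjeldahl nitrogen (% dry weight) to crude protein (% dry weight)."""
    if nitrogen_percent < 0:
        raise ValueError("nitrogen_percent must be non-negative")
    return nitrogen_percent * factor


def crude_protein_range(nitrogen_percent):
    """Return the (low, high) protein estimates implied by the reported factor range."""
    return (
        crude_protein_percent(nitrogen_percent, N_TO_PROTEIN_LOW),
        crude_protein_percent(nitrogen_percent, N_TO_PROTEIN_HIGH),
    )
```

For a sample with 1% N, `crude_protein_percent(1.0)` gives a crude protein estimate of 5.31% under the CIAT factor, and `crude_protein_range(1.0)` brackets it with the two reported bounds.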

Dry matter content (DMC) was assessed by peeling the fresh tubers from seven plants, mixing them thoroughly for homogeneity, and taking a sub-sample from the combined tubers. The samples were oven dried at 60°C for 48 h, after which the fresh and dry weights were compared and the percentage dry matter was calculated using the formula:

$$ \% {\text{DMC }} = \frac{\text{Weight of the oven dried sample}}{{\text{Weight of the fresh sample}}} \times 100 $$

The dry root yield was calculated as: %DMC × fresh root yield.
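The two calculations above can be written out as a short sketch. The function names are hypothetical, and dividing %DMC by 100 before multiplying by fresh root yield is an assumption made here so that dry root yield stays in the same units as fresh root yield.

```python
def percent_dmc(dry_weight, fresh_weight):
    """Percentage dry matter content from oven-dried and fresh sample weights."""
    if fresh_weight <= 0:
        raise ValueError("fresh_weight must be positive")
    return dry_weight / fresh_weight * 100.0


def dry_root_yield(dmc_percent, fresh_root_yield):
    """Dry root yield in the units of fresh_root_yield.

    Assumption: %DMC is converted to a proportion (divided by 100) first.
    """
    return dmc_percent / 100.0 * fresh_root_yield
```

For example, a 100 g fresh sample that weighs 35 g after drying has a DMC of 35%, and a fresh root yield of 20 units then corresponds to a dry root yield of about 7 units.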

Data Analysis

The frequency distribution of the mean protein content of the backcross population was examined with Sigmaplot (10.0). Analyses of variance (ANOVA) of yield and protein data were performed using the general linear model procedure in the SAS software (SAS Institute Inc. 2003). A box plot of the residuals was generated using the SAS univariate procedure. Broad sense heritability was estimated using Agrobase (2000).
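The calculation performed by Agrobase is not described here. As an illustration only, one common entry-mean estimator of broad sense heritability from a combined ANOVA is sketched below. The estimator, the function names, and the mean squares in the usage note are assumptions, not the procedure or data of this study.

```python
def variance_components(ms_genotype, ms_gxe, ms_error, n_reps, n_envs):
    """Method-of-moments variance components from combined-ANOVA mean squares.

    Assumes a balanced trial with random genotype and genotype-by-environment
    effects. Negative estimates are truncated to zero.
    """
    var_error = ms_error
    var_gxe = max((ms_gxe - ms_error) / n_reps, 0.0)
    var_genotype = max((ms_genotype - ms_gxe) / (n_reps * n_envs), 0.0)
    return var_genotype, var_gxe, var_error


def broad_sense_heritability(ms_genotype, ms_gxe, ms_error, n_reps, n_envs):
    """Entry-mean heritability: var_g / (var_g + var_gxe/e + var_error/(r*e))."""
    var_g, var_gxe, var_e = variance_components(
        ms_genotype, ms_gxe, ms_error, n_reps, n_envs
    )
    phenotypic = var_g + var_gxe / n_envs + var_e / (n_reps * n_envs)
    if phenotypic <= 0:
        raise ValueError("phenotypic variance must be positive")
    return var_g / phenotypic
```

With hypothetical mean squares of 10, 4, and 2 for genotype, genotype by environment, and error, three replicates, and three environments, the estimate is 0.6. When no component is truncated, the expression reduces to 1 minus the ratio of the interaction mean square to the genotype mean square.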

Simple Sequence Repeat (SSR) Analysis

Total genomic DNA was isolated from the young leaves of the parents and of four randomly picked genotypes (B1P2-1, B1P2-111, B1P2-220, B1P2-322) of the B1P2 family using a DNA miniprep extraction protocol based on a modified Dellaporta extraction procedure (Dellaporta et al. 1983).

A total of 817 SSR markers available for cassava at CIAT were the source of markers for screening the parents and the four selected backcross individuals. These markers include those developed for cassava by Mba et al. (2001). Only polymorphic markers were screened on the entire backcross population of 225 genotypes together with the parents CW 198-11 and MTAI-8. The female parent (CW 198-11) was loaded first onto the polyacrylamide gel electrophoresis (PAGE) gel, followed by the male parent (MTAI-8) and then the progenies (B1P2-1 to B1P2-327). Amplification reactions were carried out as described in Akinbo (2008). After PCR amplification, 15 μl of PCR product was mixed with 10 μl of 50% loading dye (formamide: 98% (v/v)). The mixture was denatured at 94°C for 4 min, and 10 μl was loaded on a 4% PAGE gel. The samples were resolved by electrophoresis in 1X TBE at 70 W for 30–60 min (depending on the weight of the marker), using a BioRad sequencing gel rig (BIORAD, California, USA). DNA was visualized by silver staining according to the manufacturer’s guide (Promega, Madison, USA).

Map Construction and Data Analysis

SSR marker data were used to construct a BC1 framework map using the single point genetic linkage analysis computer package MapDisto 1.7.0 (http://mapdisto.free.fr). The group command, with a log10 of odds ratio (LOD) threshold of 4.0 and a recombination fraction of 0.30, was used to assign SSR markers to linkage groups. Map units in centiMorgans (cM) were derived using the Kosambi function (Kosambi 1944). Maximum likelihood orders of markers were verified by the “ripple” function, and markers were said to belong to the same framework map if the LOD value calculated by the ripple command was greater than or equal to 2.5.
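The grouping criteria and the Kosambi conversion can be illustrated with a short sketch. It is not MapDisto's implementation: the function names are hypothetical, and the two-point LOD shown is the standard backcross formula for a pair of markers scored as recombinant or non-recombinant.

```python
import math


def _xlog10(count, prob):
    """count * log10(prob), taking 0 * log10(0) as 0."""
    return 0.0 if count == 0 else count * math.log10(prob)


def two_point_lod(n_recombinant, n_total):
    """Return (r_hat, LOD) for two backcross markers.

    r_hat is the recombinant fraction, capped at 0.5 (no linkage).
    The LOD compares the likelihood at r_hat with the likelihood at r = 0.5.
    """
    if n_total <= 0 or not 0 <= n_recombinant <= n_total:
        raise ValueError("need 0 <= n_recombinant <= n_total and n_total > 0")
    r_hat = min(n_recombinant / n_total, 0.5)
    lod = (
        _xlog10(n_recombinant, r_hat)
        + _xlog10(n_total - n_recombinant, 1.0 - r_hat)
        + n_total * math.log10(2.0)
    )
    return r_hat, lod


def kosambi_cm(r):
    """Kosambi map distance in centiMorgans for recombination fraction r."""
    if not 0 <= r < 0.5:
        raise ValueError("r must satisfy 0 <= r < 0.5")
    return 25.0 * math.log((1.0 + 2.0 * r) / (1.0 - 2.0 * r))


def is_linked(n_recombinant, n_total, lod_threshold=4.0, max_r=0.30):
    """Apply the grouping criteria quoted in the text (LOD 4.0, r 0.30)."""
    r_hat, lod = two_point_lod(n_recombinant, n_total)
    return lod >= lod_threshold and r_hat <= max_r
```

Under the Kosambi function, the grouping limit of r = 0.30 corresponds to a distance of about 34.7 cM.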

QTL Mapping

QTL mapping was based on 105 markers from the genetic map of the B1P2 population. Each of the markers was tested for normal Mendelian segregation using chi-square tests with a significance level of 0.05. Adjusted means of protein content were employed in single marker analysis for QTL mapping of the B1P2 segregating population using the software package MapDisto (Lorieux 2007). QTL analysis was done for each environment using the average of the three replications. Simple interval mapping and composite interval mapping were also conducted using the computer package Windows QTL Cartographer version 2.5 (Wang et al. 2010). The cassava genome was scanned for the presence of a QTL at 2.0 cM intervals. A LOD score of 2.5 was chosen as the minimum to declare the presence of a QTL (Rector et al. 1998). Threshold values for significance at p = 0.05 were estimated from 1,000 permutations of the data for the trait (Churchill and Doerge 1994). The resulting R² represents the proportion of the phenotypic variance explained.
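The permutation approach to a genome-wide threshold can be sketched as follows. This is a simplified illustration under stated assumptions, not the Windows QTL Cartographer procedure: it uses a single-marker regression LOD in place of interval mapping, genotypes are coded 0/1 for the two backcross classes, and all function names are hypothetical.

```python
import math
import random


def single_marker_lod(genotypes, phenotypes):
    """LOD for one backcross marker (genotypes coded 0/1).

    Compares a two-class means model with a single-mean model:
    LOD = (n / 2) * log10(RSS_null / RSS_full).
    """
    n = len(phenotypes)
    grand_mean = sum(phenotypes) / n
    rss_null = sum((y - grand_mean) ** 2 for y in phenotypes)
    rss_full = 0.0
    for cls in (0, 1):
        ys = [y for g, y in zip(genotypes, phenotypes) if g == cls]
        if ys:
            class_mean = sum(ys) / len(ys)
            rss_full += sum((y - class_mean) ** 2 for y in ys)
    return (n / 2.0) * math.log10(rss_null / rss_full)


def permutation_threshold(markers, phenotypes, n_perm=1000, alpha=0.05, seed=0):
    """Empirical genome-wide LOD threshold by permutation.

    Phenotypes are shuffled relative to genotypes n_perm times; the largest
    LOD over all markers is kept each time, and the (1 - alpha) quantile of
    those maxima is returned.
    """
    rng = random.Random(seed)
    shuffled = list(phenotypes)
    max_lods = []
    for _ in range(n_perm):
        rng.shuffle(shuffled)
        max_lods.append(max(single_marker_lod(m, shuffled) for m in markers))
    max_lods.sort()
    index = min(n_perm - 1, int(math.ceil((1.0 - alpha) * n_perm)) - 1)
    return max_lods[index]
```

Because each permutation records the maximum over all markers, the resulting threshold controls the experiment-wise rather than the per-marker error rate, which is the purpose of the permutation test cited in the text.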

Results

Microsatellite Marker Analysis

Of the 817 microsatellite markers screened, 43.45% were polymorphic, 15.42% were monomorphic and the remaining 41.12% showed no amplification in the tested material (Table 2).

Table 2 Percentage polymorphism found with respect to CW 198-11, MTAI-8 and four selected B1P2 progenies with the 817 microsatellite markers at the CIAT cassava genetics laboratory

A total of 105 polymorphic microsatellite markers segregating as single-dose markers were chosen for the linkage analysis based on consistency of allelic pattern. Markers that are heterozygous in both parents and share an allele were used to identify analogous linkage groups drawn based on independent segregation of markers in female and male gametes.

Genetic Linkage Map Construction

Table 3 provides a summary of microsatellite marker distribution on different linkage groups, showing the size, number of markers and the average marker interval of each linkage group. The number of linkage groups in this map (29) exceeded the haploid number of chromosomes for cassava (n = 18), indicating that the map is far from being saturated.

Table 3 Linkage group size, number of markers, and the average marker interval per linkage group of a cassava backcross (B1P2) linkage map

One hundred and five SSR markers were employed in the linkage analysis. These markers could be assigned to 29 linkage groups, each containing two to seven markers. Linkage group length varied from 7.39 cM to 119.45 cM (Fig. 2). The linkage map of the B1P2 population spanned a total genetic distance of 1249.55 cM, with 46 markers remaining unlinked. The average marker distance was 16.69 cM, with intervals between loci ranging from 4.43 cM to 33.46 cM (Table 3).

Fig. 2

A genetic linkage map of cassava backcross based on a B1P2 family and SSR markers

Of the 105 SSR markers used for the detection of QTL in this mapping population, 21 markers showed varying degrees of segregation distortion on linkage group 14. Five of the markers (rNS82, EST105, rSSRY226, rSSRY31 and SSRY149) had chi square values of 23.68, 13.02, 33.64, 41.82 and 15.61, respectively (Table 4).
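A chi-square test of this kind can be sketched for a marker expected to segregate 1:1 in a backcross. The function names and the counts in the usage note are hypothetical. Markers heterozygous in both parents would need different expected ratios and degrees of freedom, which this sketch does not cover.

```python
import math

# Chi-square critical value for 1 degree of freedom at alpha = 0.05.
CHI2_CRITICAL_DF1_P05 = 3.841


def chi_square_1to1(count_a, count_b):
    """Chi-square statistic for departure from a 1:1 segregation ratio."""
    total = count_a + count_b
    if total <= 0:
        raise ValueError("at least one scored individual is required")
    expected = total / 2.0
    return ((count_a - expected) ** 2 + (count_b - expected) ** 2) / expected


def p_value_df1(chi2):
    """Upper-tail p-value of a chi-square statistic with 1 degree of freedom."""
    return math.erfc(math.sqrt(chi2 / 2.0))


def is_distorted(count_a, count_b, alpha=0.05):
    """True if the marker departs from 1:1 at the given significance level."""
    return p_value_df1(chi_square_1to1(count_a, count_b)) < alpha
```

With hypothetical counts of 140 and 85 individuals in the two genotype classes, the statistic is about 13.44, well above the 3.841 critical value, so such a marker would be flagged as distorted.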

Table 4 Chi square values and chromosome location of microsatellite markers showing segregation distortion among 225 cassava backcross lines derived from the cross CW 198-11 X MTAI-8

Phenotypic Segregation in the Mapping Population

Protein content in the three trials ranged from 0.77% to 9.61%, from 0.17% to 9.11%, and from 0.69% to 7.75%, with skewness values of 0.87, 0.88, and 1.28, respectively. The frequency distributions showed a normal distribution, indicating that the trait is quantitatively inherited (Fig. 3). The majority of the protein values in the three environments were high. Only the CIAT 2008 trial had values below the box in the box plot, while the others were normal outliers (Fig. 4). Low to high broad sense heritability was recorded for all traits for the three trials. Broad sense heritability was estimated for fresh root yield, dry root yield, harvest index, dry matter content, and protein content (Table 5). The analysis of variance combined over the three trials revealed highly significant differences (p < 0.0001) among clones for protein content in the roots and provided evidence for genetic variation for root protein (Table 6).

Fig. 3

Frequency distribution of the mean protein content of a cassava backcross population at CIAT station, Colombia, during the 2007 and 2008 cropping seasons

Fig. 4

Boxplots of the residuals for root protein content of 225 progenies in three replicates evaluated in CIAT and Quilichao, Colombia in 2007 and 2008. Prot (%) = Protein content (%); Env = Environments; C07 = CIAT 2007; C08 = CIAT 2008; Q08 = Quilichao 2008

Table 5 Mean squares from the ANOVA, combined across years for the backcross family evaluated in CIAT, Palmira, 2006 to 2008
Table 6 Analysis of variance for protein content in roots from a cassava backcross population in three environments in Colombia between 2007 and 2008

Quantitative Trait Loci Analysis

A total of six putative QTL were identified on linkage groups 6, 7, and 23 using MapDisto (Fig. 5). Genotype by environment interaction was significant for yield-related traits. Composite interval mapping using the Windows QTL Cartographer program was consistent with the results from MapDisto: it revealed QTL on linkage groups 7 and 23 and also detected another QTL on linkage group 13 (Fig. 6). All the QTL for protein content in the root showed a LOD score above 2.5. QTL protg.7, found on linkage group 7 in the interval EST47 - rSSRY110, accounted for 15% of phenotypic variance for protein content in the trial at Quilichao 2008 with a LOD of 33.4, whereas QTL protg.13, found on linkage group 13 in the interval NS59 - SSRY230, explained 15% of phenotypic variance in the Quilichao 2008 experiment with a LOD of 81.9. QTL protg.23, located on linkage group 23 between markers NS664 and rSSRY88, accounted for 24% and 25% of phenotypic variance in CIAT 2008 and Quilichao 2008, respectively, with a LOD score of 25.2. The three QTL (protg.7, protg.13 and protg.23) all showed additive gene action, with values of 6.20, 3.21, and 4.52, respectively. QTL protg.23 was detected across locations and years, which suggests that no environmental effect on protein content was observed for this QTL. The additive effects at these three QTL all came from the female parent, which is also the protein donor parent, suggesting that these QTL are strongly associated with protein content. The individual effects of alleles at these QTL explained from 15% to 25% of the phenotypic variance.

Fig. 5

The likelihood plots of QTL associated with root protein content of cassava in B1P2 population

Fig. 6

a Quantitative trait locus scans for linkage group 7 associated with protein content in a backcross population of cassava. b Quantitative trait locus scans for linkage group 13 associated with protein content in a backcross population of cassava. c Quantitative trait locus scans for linkage group 23 associated with protein content in a backcross population of cassava

Discussion

Low protein content in the roots of cassava cultivars is mainly due to insufficient breeder attention to screening and selection of germplasm for high protein content (Ceballos et al. 2006). This situation arose at least partially from breeders selecting primarily for higher fresh weight yield, plant architecture, and disease resistance. The screening of cassava germplasm at CIAT for protein content reported by Ceballos et al. (2006) was done under the same laboratory conditions and with the same protocol as the current study. Although different protocols have been reported for the determination of protein in the roots of cassava, the protocol used here is considered effective (Fregene and Okogbenin personal communication).

Wild Manihot germplasm offers a wealth of useful genes for cultivated M. esculenta, but its use in regular breeding programs is restricted by linkage drag and a long reproductive breeding cycle (Nassar 2000; Hajjar and Hodgkin 2007). New technologies in the area of molecular biology have made positive contributions to crop improvement and more efficient breeding (Fregene et al. 1997; Devries and Toenniessen 2001; Okogbenin et al. 2008).

A joint genetic map for cassava would have been helpful for mapping of the protein markers (Okogbenin personal communication). The B1P2 map produced in the present work covers 1249.55 cM of the cassava genome with 105 SSR markers. The QTL approach used in this study identified three loci that determine protein content in cassava roots. Different QTL were identified on linkage groups 7 (one environment), 13 (one environment), and 23 (two environments). The detection of the QTL on linkage group 23 in more than one environment supports its stability across environments.

Okogbenin and Fregene (2002), in their earlier studies of early root bulking of cassava, reported a total of 18 and 27 QTL controlling dry foliage weight, harvest index and root diameter in two experiments. A year later, the same authors (Okogbenin and Fregene 2003) reported a number of QTL detected in more than one trial: 29 QTL in two trials and 11 QTL in three trials for plant height, branching height, branching level, branching index, stem portion with leaves, leaf area index, fresh root yield, fresh shoot yield, harvest index and number of storage roots. Balyejusa Kizito et al. (2007) also reported two QTL for cyanogenic glucoside potential and six QTL for dry matter content from an S1 population of cassava in a single environment. Okogbenin et al. (2008) also identified three QTL each for dry root yield, fresh foliage and harvest index, which explained phenotypic variances of 33.0%, 43.5%, and 36.0%, respectively, in a single environment.

Jorge et al. (2001) identified eight QTL that were involved in resistance of cassava to Xanthomonas axonopodis pv manihotis from an intra-specific population cross between two non-inbred cassava lines where one QTL, located on linkage group D of their map was observed over two crop cycles. In another backcross population, Wydra et al. (2004) reported eleven markers, which explained between 16.0% and 33.3% of phenotypic variance of area under a disease progress curve for cassava bacterial blight from different locations in Africa.

None of the earlier studies considered the protein content of cassava roots because it was not part of the breeding objectives. The QTL for protein content in cassava found in this study are comparable with those reported for durum wheat (Chee et al. 2001), soybean (Panthee et al. 2005), and maize (Holding et al. 2008; Wassom et al. 2008), showing stability of QTL under different environmental conditions. The magnitude of the QTL is also similar to that identified for other traits, such as early yield and root quality traits in cassava (Okogbenin and Fregene 2003; Balyejusa Kizito et al. 2007).

The broad sense heritability estimate for protein of 0.61 indicates that protein content has higher heritability than yield (Dixon et al. 1994; Benesi et al. 2004; Balyejusa Kizito et al. 2007). The three QTL detected in this study showed additive effects suggesting this gene action plays a more important role for protein content in this cassava backcross population. Cach et al. (2005) reported that additive effects play an important role in determination of dry matter content. Likewise Balyejusa Kizito et al. (2007) identified QTL for cyanogenic glucosides in their population with additive effects.

The three QTL found for high protein content can be the basis for marker-assisted recurrent introgression of high root protein content into adapted gene pools that have low protein content (Chávez et al. 2005; Ceballos et al. 2006). Use of high protein alleles for cultivar improvement would allow selection of progenies with protein concentrations above those of currently available commercial cassava cultivars.