Introduction

Maize is one of the staple crops in most of sub-Saharan Africa (Byerlee et al. 1994; Smale 1995). In parts of West, Central, East and southern Africa, where the fallow periods have been shortened and cultivation has been extended for more than 2 years, maize yields generally decline rapidly (Drechsel et al. 1996; Hauser et al. 2006; Mafongoya et al. 2006). Although use of inorganic fertilizer can overcome the problem, most smallholder farms use little or no mineral fertilizer (Mwangi 1999). This is partly because fertilizer prices have been pushed upwards directly by the increase in crude oil costs (Hammeed 1976) and fertilizer has become less available following structural adjustment, the removal of government subsidies (Gladwin 1991), and the collapse of para-state agencies that were involved in fertilizer distribution and input markets. African farmers pay the highest fertilizer prices in the world (Mwangi 1999; Sanchez 2002). Moreover, inorganic fertilizers alone cannot sustain crop yield on the acidic and poorly buffered Alfisols; instead, they accelerate the decline in soil pH and exchangeable cations (Juo et al. 1995; Kang and Balasubramanian 1990). Therefore, use of organic matter technologies has become an important option for increasing soil fertility and maize yields in sub-Saharan Africa (Juo et al. 1995; Sanchez 2002; Snapp et al. 1998).

Promising alternatives include the use of nitrogen-fixing and weed-suppressing legumes planted as improved fallows, cover crops or green manure (Cherr et al. 2006; Hauser et al. 2006; Mafongoya et al. 2006; Styger and Fernandes 2006). Since colonial times green manure legumes have been widely tested in many parts of Africa. In the last two decades, research has focused on the introduction of fast-growing woody legumes into farming systems. Both woody and herbaceous legume fallows are based on the principle of harnessing biological nitrogen fixation (Cherr et al. 2006; Giller et al. 1997; Sanchez 1999) by the legumes during the fallow period.

Several attempts have been made to review and synthesize the knowledge on the functions, processes and capabilities of planted fallows and green manure legumes in Africa (Drechsel et al. 1996; Hauser et al. 2006; Rao et al. 1998; Sanchez 1999; Szott et al. 1999). Though positive effects on soil fertility have been widely reported (Rao et al. 1998; Sanchez 1999; Styger and Fernandes 2006), the effects on crop productivity are much debated. Results of individual studies are highly varied, with legumes in some cases increasing crop yield but in others having no effect or decreasing yields (Hauser et al. 2006; Rao et al. 1998). Therefore, it has been difficult to discern patterns from narrative reviews that relied on mental integration. The few syntheses that attempted to compare the options have been overly ‘data hungry’ and often methodologically flawed. For example, Hauser et al. (2006) summarized data from published studies in West and Central Africa by classifying crop responses into “significant increase”, “neutral” and “significant decrease.” Based on this summary, they concluded that 60% of experiments with planted tree fallows in West and Central Africa had a neutral response.

Such analyses are problematic as they ignore the preoccupation of researchers with null hypothesis tests, which leads to confusion between biological and statistical significance (Lortie and Dyer 1999; Osenberg et al. 1999). Researchers often erroneously equate a small P value (P < 0.05) with a “large effect”, and large P values with the “absence of an effect” (Gurevitch and Hedges 1999; Lortie and Dyer 1999; Osenberg et al. 1999). A single study often cannot detect or exclude with certainty a modest, albeit biologically relevant, difference in the effects of two treatments. A trial may thus show no significant treatment effect when in reality such an effect exists; that is, it may produce a false negative result. In single studies there is a prevalence of small true differences, small type I errors (=false positives) and few replications, which generate experiments with low statistical power or large type II errors (=false negatives; Arnqvist and Wooter 1995).

The diversity of results and the lack of clarity on maize yield responses have led to debates among researchers over the effect of legumes on maize yield, and to confusion among extension and development workers. The lack of quantitative synthesis of the nature and magnitude of response, together with the contrasting results reported on the potential utility of legume fallows and green manure, highlights the need for a comprehensive and quantitative analysis. Therefore, the primary goal of this paper is to provide a synthesis that gives a more complete representation of maize yield response across different locations, types of soils and weather conditions. This will aid formulation of evidence-based practical guidelines and policies on the role of organic sources for soil fertility management in sub-Saharan Africa.

We conducted a meta-analysis with the overall aim of assessing whether or not there is consistent evidence for yield benefits from green manure from herbaceous and woody legumes in sub-Saharan Africa. Meta-analysis refers to the statistical analysis of a large collection of results from individual studies for the purpose of integrating the findings in order to address a common question or to test a common hypothesis (Arnqvist and Wooter 1995). The basic assumption underlying a meta-analysis is that each study result is an observation that can be thought of as one data point in a larger dataset containing all possible observations, given the true relationship under study. If many trials exist in different geographic areas, with similar results in the various studies, then it can be concluded that the effect of the intervention under study has some generality. Compared to the traditional narrative reviews, meta-analysis has the advantage of objectivity and better control of type II error (Arnqvist and Wooter 1995) and thus has the potential to resolve longstanding scientific debates (Gurevitch and Hedges 1999).

The specific objectives of this analysis were to (1) provide a comprehensive, quantitative synthesis of published reports on the effect of woody and herbaceous green manure legumes on maize yield, (2) conduct parametric estimation of the magnitude of yield response and (3) determine the factors that moderate the response.

Methods

Choice of crop and the response variate

Maize (Zea mays L.) was chosen for this analysis because it is the staple food crop in most of sub-Saharan Africa (accounting for about 50% of the calories consumed in some countries) and it is grown from sea level (the coastal zones) to elevations above 2,400 m. It is also grown under widely varying rainfall and edaphic conditions. Maize accounts for 60% or more of the cropped area in some countries such as Malawi, Zimbabwe and Zambia and is almost as dominant in other countries including Kenya and Tanzania. The last 25 years have seen farmers in many parts of Africa switch from traditional crops to improved maize germplasm. Improved varieties and hybrids were estimated to have covered 33–50% of the maize area in Africa (Byerlee et al. 1994). Grain yield was used as the response variate because it is often the only true measure of productivity as the plant itself integrates across all factors, including soil, climate, pests and diseases, which affect productivity.

The treatments and management practices

Table 1 gives the treatments included in this analysis and the number of peer-reviewed publications for each treatment. The treatments were maize grown after (1) herbaceous green manure legumes, (2) non-coppicing woody legumes, (3) coppicing woody legumes, (4) natural fallows, (5) continuously cropped fully fertilized monoculture maize, and (6) continuously cropped unfertilized monoculture maize. Maize rotation with food legumes and alley cropping were not considered here.

Table 1 Summary statistics of maize yield differences (D, t ha−1) in the different treatments

Green manure legumes are those that are grown to be turned under as soil amendment and nutrient sources for subsequent crops (Cherr et al. 2006). Data for this came from 54 publications. The legume genera reported in the studies reviewed here included Aeschynomene, Canavalia, Calopogonium, Centrosema, Chamaecrista, Clitoria, Crotalaria, Desmodium, Glycine, Lablab, Macroptilium, Mucuna, and Stylosanthes. In this analysis distinction was made between management of green manure legumes as rotational fallows and relay intercrops. In the rotational fallows, the legumes are left to grow for 1 year and then biomass is incorporated during land preparation in the following season. Then a monoculture maize crop is planted. In the case of relay intercropping, the legumes are planted within a few weeks to a month after planting maize. After the maize harvest, the legumes are left to grow as short fallows until land preparation for the following maize crop.

Non-coppicing species are woody shrubs or trees that do not re-grow when cut at the end of a 2- to 3-year fallow period (Sileshi et al. 2005). Data for this came from 48 publications. Non-coppicing species belonged to the genera Cajanus, Sesbania and Tephrosia. As in green manure legumes, distinction was made between the management of non-coppicing species as fallows and relay intercrops. In the literature, fallows of non-coppicing species have been variously referred to as ‘improved fallows’, ‘sequential fallows’ or ‘rotational fallows’. The trees may be left to grow as 1-, 2- or 3-year fallows. In the analysis this was defined as “fallow length”. After clearing non-coppicing fallows, maize is cropped for one, two or three consecutive seasons. This was defined as length of “post-fallow cropping” in the analysis. In some studies, 25, 50 or 100% of the recommended dose of fertilizer was applied to the maize cropped after the fallow or in relay intercrops. This variable was defined as “fertilizer amendment” in the analysis.

Coppicing species are leguminous woody trees that are able to re-sprout when cut back (Chintu et al. 2004). Data on these species came from 10 peer-reviewed publications. Coppicing legumes are left to grow for 2 years as fallows. Then they are cut back and maize is planted every year between the stumps. In the long run, this essentially becomes an intercropping system (Akinnifesi et al. 2007). As the stumps re-sprout, the biomass is cut back two to three times during the maize cropping season and incorporated into the soil. Members of the genera Acacia, Calliandra, Flemingia, Gliricidia, and Leucaena were the commonly used coppicing legumes (Sileshi et al. 2005).

The natural fallow involved leaving plots to vegetate naturally with native legume and grass species for one to several years (Hauser et al. 2006). At the end of the fallow period, the biomass is incorporated into the soil. Maize is cropped for one to several seasons before the land is left fallow. Data on maize grown after natural fallows came from 29 publications.

Continuously cropped fully fertilized monoculture maize data came from 52 publications. In all cases maize has received the fertilizer recommended for the specific site. All 94 publications had continuously cropped unfertilized monoculture maize, which was used as the control.

Data retrieval criteria

Meta-analysis requires an explicit definition of the population of studies of interest. It also requires an explicit definition of the criteria that determine the eligibility of the studies to be included, how their quality will be assessed, what data will be extracted and what comparisons will be made (Gates 2002). This is because, if not carefully considered, the selection criteria can exclude compelling studies or alternatively include comprehensive sets of studies that only tangentially address a hypothesis (Lortie and Callaway 2006). For data to be included in this analysis, the study had to fulfil all of the following criteria: It must (1) have been published in a refereed journal, book chapter or peer-reviewed proceeding, (2) have originated from sub-Saharan Africa, (3) have reported maize yield from at least one legume species used for green manure or improved fallow (treatment) and a corresponding maize yield from an unfertilized plot (control), (4) be a well designed, randomized and replicated experiment either on a research station or on farmers’ fields, and (5) have reported the mean (and if possible the standard deviation or variance) as numerical or graphical data, or have had these made available by personal communication.

The studies included were located by searching through computer library databases. However, this alone does not provide a comprehensive search (Gates 2002). Therefore, it was supplemented by checking the references of published studies and by manual searching through conference abstracts, published proceedings, book chapters, monographs and direct contacts based on our extensive knowledge of studies conducted in sub-Saharan Africa. The study information was then coded and a database was created in EXCEL. A total of 160 publications that reported maize yield from improved fallow and green manure legumes were found. However, only 93 peer-reviewed publications fulfilled all the criteria listed above (Appendix).

These publications covered a wide variety of agro-ecological conditions including humid tropical, savanna, semi-humid and semi-arid zones of West, Central, East and southern Africa (Appendix). Altitudes of study sites ranged from low-lying (15 m a.s.l.) coastal areas of West and East Africa (Benin, Togo, Kenya) to high altitudes of up to 2,100 m a.s.l. in East Africa (Ethiopia). Average annual rainfall of the study sites ranged from 642 to 2,400 mm. Over 40% of the study sites came from areas that receive bimodal rainfall, while 60% came from areas of unimodal rainfall. Herbaceous green manure legumes and non-coppicing species were recorded from almost all the countries, while data on coppicing legumes were available only from Tanzania, Malawi, Zambia and Zimbabwe (Appendix Table 1).

Further screening was also done on the data in the publications selected for analysis. In cases where the same data have been presented by the same author in two or more different publications, only one was included in this analysis. Meta-analysis assumes independence of the data being analyzed. For example, including multiple results from a single study may alter the structure of the data, inflate sample size and significance levels and increase the probability of type I error. However, the loss of information caused by the omission of multiple results in each study may become a more serious problem than that caused by violating the assumption of independence (Gurevitch and Hedges 1999). In this analysis, when more than one treatment was available in the same publication or when data from different seasons and sites were reported, all were included. This yielded a total of 1681 separate pairs of means (k = treatment and control).

A large proportion (63%) of the studies were from trials on research stations, while the rest (37%) were from on-farm trials. Most (>90%) of the on-station trials were laid out as randomized complete blocks and a few were split-plot and other designs with replications ranging from three to six. On-farm experiments mainly used farms as replicates. The management of maize was assumed to be similar in the treatment and control plots, and hence the control plots are subject to the same level of variation as the rest of the experiment. It is further assumed that maize variety and treatment effects are not confounded, that is, in each study the same variety was used in the treatment and control groups. It is also assumed that the designs and methods were homogeneous across studies and that they produce similar sampling errors (Gurevitch and Hedges 1999).

Choice of the effect size

In meta-analysis, choice of an effect size metric involves conceptual issues that link the metric to the hypothesis, as well as statistical ones that require some knowledge of the properties of possible estimators of the desired quantity (Gates 2002; Gurevitch and Hedges 1999; Hedges et al. 1999). Meta-analysis can only provide meaningful summaries if the effect size index used is a meaningful summary of any one experiment (Gurevitch and Hedges 1999; Hedges et al. 1999). In this analysis we used the response ratio (RR) considering its application in ecology (Gurevitch and Hedges 1999; Hedges et al. 1999; Osenberg et al. 1999) and agriculture where yields from treatment and control were compared (Miguez and Bollero 2005; Tonitto et al. 2006). The RR, the ratio of some measured quantity in experimental (Me) and control (Mc) groups, quantifies the proportionate change that results from an experimental manipulation (Hedges et al. 1999). In addition to RR, we used the mean difference in yield between the treatment and control (D = Me − Mc) because of its ease of interpretation in terms of absolute yield increase (t ha−1). The yield difference is also more relevant when comparing potential gains to required investment and input costs. Therefore, the bulk of the discussion is based on yield differences. RR was log-transformed to ensure normality (Hedges et al. 1999).
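The two effect sizes can be illustrated with a minimal sketch. The function name `effect_sizes` and the example yields are hypothetical and are not taken from the dataset; the sketch only restates the definitions RR = Me/Mc and D = Me − Mc, with the natural logarithm assumed for the log transformation.

```python
import math


def effect_sizes(mean_treatment, mean_control):
    """Return (RR, ln RR, D) for one treatment-control pair of mean yields."""
    if mean_treatment <= 0 or mean_control <= 0:
        raise ValueError("mean yields must be positive to form a log ratio")
    rr = mean_treatment / mean_control      # proportionate change
    d = mean_treatment - mean_control       # absolute change (t/ha)
    return rr, math.log(rr), d


# Hypothetical pair: 3.0 t/ha after a legume fallow, 1.5 t/ha in the control
rr, log_rr, d = effect_sizes(3.0, 1.5)
print(rr, round(log_rr, 3), d)
```

In this made-up pair the yield doubles (RR = 2) while the absolute gain is 1.5 t ha−1, which shows why the two metrics answer different questions.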

Assessing publication bias

Publication bias and normality in the data were assessed using descriptive statistics and normal quantile–quantile (Q–Q) plots. The normal Q–Q plot is an effective diagnostic tool for checking normality of the data and publication bias (Wang and Bushman 1998). It was constructed by plotting the empirical quantiles of the yield differences (D) and log-transformed response ratios (RR) against the corresponding quantiles of the normal distribution. If the empirical distribution of the data is approximately normal, the points on the plot will fall on a straight line defined by Y = X with the slope equal to unity, where Y is the ordinate and X is the abscissa. If there is just natural variability, the points will remain reasonably close to the straight line. If the data are skewed to the right, a U-shape will emerge. In the presence of publication bias when the true effect size is positive, the plot of the effect sizes will also have a longer right tail (Wang and Bushman 1998).
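The construction of such a plot can be sketched as follows. This is an illustration under stated assumptions, not the procedure used for Fig. 1: the data are standardized with the sample mean and standard deviation so that normal data would fall along Y = X, the plotting positions (i − 0.5)/n are one common convention chosen here, and the function name `qq_points` and the example values are hypothetical.

```python
from statistics import NormalDist, mean, stdev


def qq_points(values):
    """Return (theoretical, empirical) quantile pairs for a normal Q-Q plot."""
    n = len(values)
    centre, spread = mean(values), stdev(values)
    # Standardize so that normally distributed data would fall along Y = X
    empirical = sorted((v - centre) / spread for v in values)
    standard_normal = NormalDist()
    # Plotting positions (i - 0.5) / n are an assumed convention
    theoretical = [standard_normal.inv_cdf((i - 0.5) / n) for i in range(1, n + 1)]
    return list(zip(theoretical, empirical))


# Hypothetical right-skewed yield differences (t/ha), for illustration only
d_values = [0.1, 0.2, 0.3, 0.4, 0.5, 0.6, 0.8, 1.0, 1.5, 4.0]
for x, y in qq_points(d_values):
    position = "above" if y > x else "below"
    print(f"{x:6.2f} {y:6.2f}  {position} Y = X")
```

With these made-up values the lowest and highest points lie above the line and the middle points below it, which is the U-shape described for right-skewed data.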

The statistical model

Special analytic methods are needed because the log response ratios (\(L_i\)) and yield differences (D) are not expected to be identically distributed, i.e. the variances of the observations (\(v_i\)) are assumed to be unequal (Hedges et al. 1999). There are two components of variation in the \(L_i\): within studies (\(v_i\)) and between studies (\(\overline{\sigma}_\lambda^2\)). The within-studies variance is due to sampling variation in the estimates for each experiment, i.e. variation of \(L_i\) about the parameter value. Computation of the variance (\(v_i\)) for each ith study was done following Miguez and Bollero (2005). The between-studies variance represents the variation between experimental results that would remain even if the estimates from all of the experiments had negligible internal standard errors. This between-studies variance is often of scientific interest because it quantifies the degree of true (non-sampling) variation in results across experiments (Hedges et al. 1999). In summarizing results from k independent studies (pairs of means), effect sizes were weighted by the reciprocal of their variances, as this gives greater weight to experiments whose estimates have greater precision and hence increases the precision of the combined estimate (Miguez and Bollero 2005). Therefore, the weighted mean log response ratio (\(L_w\); Hedges et al. 1999) was used for analysis (Miguez and Bollero 2005).
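The weighting by reciprocal variances can be sketched as below. This is a simplified illustration that uses only the within-study variances \(v_i\) (a fixed-effect weighting); the standard error formula is the usual one for that simplified case and is an assumption here, not a description of the mixed model fitted in this analysis. The function name and the input values are hypothetical.

```python
import math


def weighted_mean_effect(effects, variances):
    """Inverse-variance weighted mean of effect sizes and its standard error."""
    weights = [1.0 / v for v in variances]   # more precise pairs weigh more
    total_weight = sum(weights)
    mean_w = sum(w * e for w, e in zip(weights, effects)) / total_weight
    se = math.sqrt(1.0 / total_weight)       # fixed-effect standard error (assumed)
    return mean_w, se


# Two hypothetical log response ratios with unequal variances
mean_w, se = weighted_mean_effect([0.2, 0.6], [0.01, 0.04])
print(round(mean_w, 3), round(se, 3))
```

The first pair has four times the weight of the second, so the combined estimate sits closer to 0.2 than the unweighted mean of 0.4 would.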

A mixed modelling approach was adopted in this analysis because this enables one to make inferences about treatments that apply to a population of studies (Miguez and Bollero 2005). The mixed modelling procedure was also appropriate as the data gathered across studies were unbalanced with respect to predictor variables. The general form of mixed-effects linear models is:

$$L_i = X\beta + Zb + \varepsilon $$

where \(L_i\) is the (n × 1) vector of summary statistics (log RR or D) from k related but independent studies, X (n × p) is the design matrix describing study characteristics that influence fixed effects, β (p × 1) is the vector of fixed-effects parameters, Z (n × q) is another design matrix describing the covariates for the random effects, b (q × 1) is the vector of random effects or the residuals on the between-study level, and ε (n × 1) is the vector of residuals on the within-study level.
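As an illustration only, the simplest special case of this general form, with a single overall mean and one random effect per pair of means, can be written out explicitly. This corresponds to the basic random-effects formulation for log response ratios (Hedges et al. 1999) and is not the full model fitted here. The symbols \(\lambda\) (overall mean effect) and \(w_i^*\) (weight), and the normality of both error terms, are assumptions introduced for this sketch.

$$L_i = \lambda + b_i + \varepsilon_i, \qquad b_i \sim N(0, \overline{\sigma}_\lambda^2), \qquad \varepsilon_i \sim N(0, v_i)$$

$$w_i^* = \frac{1}{v_i + \overline{\sigma}_\lambda^2}, \qquad \hat{\lambda} = \frac{\sum_{i=1}^{k} w_i^* L_i}{\sum_{i=1}^{k} w_i^*}$$

In this special case X reduces to a column of ones and Z to an identity matrix, so each observation is weighted by the reciprocal of its total (within- plus between-study) variance.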

To make the model more realistic the following assumptions were made: (1) Observations from the same study will be correlated. This was allowed for by including a random term with variance \(\sigma _s^2 \). (2) Many of the studies in the database contained observations from different seasons and/or locations. This imposes further structure on the within-study correlation, which can be represented by further nested or crossed random effects. (3) The treatment effect is assumed to vary between studies. This is not just due to sampling errors but because the environment of the study modifies the true effect in that study. This can be modelled with a random study × treatment interaction term. (4) The variation in treatment effects across studies may not be the same for each treatment. Hence, the random effect in (3) should be heterogeneous between treatments. (5) The within-study residual could also be heterogeneous between studies. This is allowed for by letting the residual variance be \(\sigma _j^2 \) for study j. (6) The treatment effects may be modified by measured environmental covariates. Most of these modifications were needed to estimate the 95% CI correctly. The Akaike information criterion was used as a measure of parsimony in deciding on the linear mixed model that gives the correct estimate of the 95% CI.
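The use of the Akaike information criterion to choose among candidate random-effects structures can be sketched as follows. The criterion itself, AIC = 2p − 2 ln(likelihood), is standard; the candidate model labels, log-likelihoods and parameter counts below are invented for illustration and do not come from this analysis.

```python
def aic(log_likelihood, n_params):
    """Akaike information criterion: smaller values indicate a more parsimonious fit."""
    return 2 * n_params - 2 * log_likelihood


# Hypothetical candidates: label -> (maximized log-likelihood, number of parameters)
candidates = {
    "random study intercept": (-120.4, 3),
    "study x treatment interaction": (-112.9, 6),
    "interaction + heterogeneous residuals": (-111.8, 12),
}

scores = {name: aic(ll, p) for name, (ll, p) in candidates.items()}
best = min(scores, key=scores.get)
for name, score in scores.items():
    print(f"{name}: AIC = {score:.1f}")
print("selected:", best)
```

In this made-up comparison the richest model fits slightly better but pays for its extra parameters, so the intermediate structure is selected.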

A meta-analysis would have to be based on studies that had specifically and correctly investigated a research question. Therefore, comparison of treatments was restricted to those studies that satisfied specific criteria, and parameter estimation proceeded in two steps. In the first step, RR and D were estimated after excluding data where legume fallows were amended with inorganic fertilizer to allow a reasonable comparison between legume fallows, natural fallows and fully fertilized maize. In the second step, analysis of coppicing and non-coppicing legume data was conducted separately to allow a comparison between legume fallows amended with fertilizer and those not amended.

We were specifically interested in how covariates describing biological characteristics of the study species or aspects of the experimental design and management influenced the magnitude of yield response. The covariates were soil type, altitude, rainfall, legume management (fallow, relay), length of fallow and length of post-fallow cropping. Since individual studies reported the soil types of the respective sites differently, the USDA and other soil groups were assigned the equivalent FAO soil group name through pro-parte matching. About 13% of the data points were excluded from the analysis either because the soil type was not reported or because, where reported, it was generalized to cover a large area, for example several farms. Some soil types (e.g. Andosols) were excluded as the data points for some treatments were very few. Altitudes were classified as high (>1,400 m a.s.l.), mid (700–1,400 m a.s.l.) and low (<700 m a.s.l.). Rainfall (long-term average annual) of sites was also classified as low (<700 mm), medium (700–1,400 mm) and high (>1,400 mm). A site productivity score was derived from the control maize yield as 1 = <0.5 t ha−1, 2 = 0.5–1.0 t ha−1, 3 = 1.0–1.5 t ha−1, 4 = 1.5–2.0 t ha−1, 5 = 2.0–3.0 t ha−1 and 6 = >3.0 t ha−1. This is based on the logic that the control maize yield can serve as a proxy for site productivity as it represents the potential yield (integrating the effect of soil, climate, pests, etc.) at a particular site and management conditions. For convenience, scores 1 and 2 defined low potential, 3 and 4 medium potential and scores 5 and 6 high potential sites.
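The categorical moderators described above can be expressed as simple classification rules. The sketch below is illustrative: the function names are hypothetical, the assignment of yields that fall exactly on a shared class boundary (for example 1.0 t ha−1) to the lower score is an assumption, and scores 5 and 6 are both treated as high potential.

```python
def altitude_class(metres_asl):
    """Classify site altitude (m a.s.l.) as low, mid or high."""
    if metres_asl > 1400:
        return "high"
    return "mid" if metres_asl >= 700 else "low"


def rainfall_class(annual_mm):
    """Classify long-term average annual rainfall (mm) as low, medium or high."""
    if annual_mm > 1400:
        return "high"
    return "medium" if annual_mm >= 700 else "low"


def productivity_score(control_yield):
    """Score 1-6 from the control maize yield (t/ha)."""
    if control_yield < 0.5:
        return 1
    # Shared boundaries go to the lower score (an assumption of this sketch)
    for score, upper in zip((2, 3, 4, 5), (1.0, 1.5, 2.0, 3.0)):
        if control_yield <= upper:
            return score
    return 6


def potential_class(score):
    """Group productivity scores into low, medium and high potential sites."""
    return {1: "low", 2: "low", 3: "medium", 4: "medium"}.get(score, "high")


# Hypothetical site: 1,150 m a.s.l., 900 mm rainfall, control yield 1.2 t/ha
score = productivity_score(1.2)
print(altitude_class(1150), rainfall_class(900), score, potential_class(score))
```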

In all cases, mean values and 95% confidence intervals of the back-transformed response ratios and yield differences are presented. Statistical inference was based on the 95% CI because it functions as a very conservative test of hypothesis and it also attaches a measure of accuracy to the sample statistic (Sim and Reid 1999). Therefore, it allowed us to estimate the degree to which the observed value is likely to be the “true” (population) value. Means were considered to be significantly different from one another if their 95% CI were non-overlapping. Response ratios (RR) and mean yield differences (D) were considered significantly different from 1 and 0, respectively, if their 95% CI did not include 1 and 0, respectively.
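These decision rules can be sketched in a few lines. The helper names are hypothetical, and the back-transformation uses a normal approximation (z = 1.96) as an assumption; the intervals reported in this paper come from the fitted mixed model. The two comparisons at the end use the 95% CIs of RR reported in the Results for rotational fallows versus relay intercrops.

```python
import math


def back_transform_ci(log_mean, se, z=1.96):
    """Back-transform a mean log RR and its CI to the original ratio scale."""
    return (math.exp(log_mean),
            math.exp(log_mean - z * se),
            math.exp(log_mean + z * se))


def excludes(ci, null_value):
    """True if the interval does not contain the null value (1 for RR, 0 for D)."""
    lower, upper = ci
    return not (lower <= null_value <= upper)


def non_overlapping(ci_a, ci_b):
    """True if two confidence intervals share no values."""
    return ci_a[1] < ci_b[0] or ci_b[1] < ci_a[0]


# Green manure legumes: rotational fallows vs relay intercrops (RR)
print(non_overlapping((1.49, 1.90), (1.12, 1.43)))
# Non-coppicing legumes: rotational fallows vs relay intercrops (RR)
print(non_overlapping((1.27, 1.43), (1.11, 1.30)))
```

Requiring non-overlap of two 95% intervals is stricter than a direct test of the difference at the 5% level, which is why the text describes the rule as very conservative.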

Results

Variability in yield response

Summary statistics of mean yield differences (D) are presented in Table 1. The variability in D was highest in natural fallows (CV = 229%) and lowest in continuously cropped and fertilized monoculture maize (CV = 70%). There were substantial differences between the mode and mean, indicating distinct asymmetry in the effect size distribution (Table 1). The normal Q–Q plots also indicate the presence of asymmetry and publication bias. In the Q–Q plot of the yield difference, the curve is slightly U-shaped, indicating that the data are skewed to the right (Fig. 1). The plot of the response ratios (RR) is S-shaped and has one “bump” below and another “bump” above the straight line (Fig. 1), suggesting that the studies come from different populations. In both cases, most of the points do not fall on the straight line defined by Y = X.
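For readers less familiar with the coefficient of variation, the quantity can be sketched as below. The use of the sample standard deviation is an assumption, and the values are hypothetical rather than taken from Table 1.

```python
from statistics import mean, stdev


def coefficient_of_variation(values):
    """Return the CV (%) as 100 * sample standard deviation / mean."""
    return 100.0 * stdev(values) / mean(values)


# Hypothetical yield differences (t/ha), for illustration only
d_values = [0.2, 0.5, 1.0, 1.1, 3.2]
print(round(coefficient_of_variation(d_values), 1))
```

A CV above 100%, as reported for natural fallows, means that the standard deviation of D exceeds its mean.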

Fig. 1 Normal quantile–quantile plots of the yield differences (D) and log-transformed response ratios (RR) for exploring the normality assumption and publication bias. The circles represent individual observations, while the solid line (Y = X) represents a standard normal distribution

Figure 2 presents the scatter plots of the relationship between the observed yield in the treatment (Y-axis) and the yield of the respective control plot (X-axis) for each study. The majority of the data points from the fertilized monoculture maize and from the coppicing and non-coppicing woody legumes are above the Y = 2X line, which represents doubling of yield relative to the control. In the case of natural vegetation fallows, most of the data points fell below the Y = 2X line (Fig. 2). In all treatments, doubling of yields over the control was achieved where the control plots yielded less than 4 t ha−1 (Fig. 2). Tripling of yield relative to the control (i.e. Y = 3X) occurred only where the control plots yielded less than 2 t ha−1 (Fig. 2).

Fig. 2 Scatter plots of treatment mean yields against control mean yields (t ha−1). The solid line represents situations where the treatment and control yield are the same (i.e. Y = X, RR = 1 and D = 0). The broken (Y = 2X) and faint lines (Y = 3X) represent situations where the treatment plots yield twice (RR = 2) and three times (RR = 3) as much as the control plots

Figure 3a presents the cumulative proportion of cases in each yield difference (D) category. D was highest (2.3 t ha−1) in fully fertilized maize, while it was lowest (0.3 t ha−1) in natural fallows. The probability of achieving D > 1.0 t ha−1 in fertilized monoculture maize was 0.77, while it was only 0.14 in the natural fallow (Fig. 3a). The natural fallow had yields equal to the control in 27% of the cases (Table 1). Herbaceous green manure legumes, non-coppicing and coppicing woody legumes had predominantly (>75% of the cases) positive effects on maize yield (Fig. 3a; Table 1). D was 1.6 t ha−1 in coppicing legumes, 1.3 t ha−1 in non-coppicing legumes and 0.8 t ha−1 in green manure legumes (Table 1).

Fig. 3 Plots of cumulative proportion of pairs against change (D) in yield (a), D against site productivity classes (b), means and 95% confidence intervals of response ratios (c) and yield differences (d) in the different treatments excluding legume fallows amended with fertilizer. Means (open circles) are not significantly different from one another if their 95% confidence intervals (error bars) were overlapping. The mean and 95% confidence intervals are in the original (back-transformed) scale. The dashed horizontal lines in (c) and (d) represent situations where yields are the same in the treatment and control plots (RR = 1 and D = 0)

Maize yield was more than double or triple that of the control (RR > 2) in 67% of the observations in coppicing fallows. In non-coppicing legumes, herbaceous green manure legumes and natural fallows, doubling or tripling was recorded in 45, 16, and 19% of the observations, respectively. Yield increase was higher on sites where the control plot achieved less than 2 t ha−1 (low to medium potential sites) than on high potential sites (Fig. 3b). In the case of natural fallows, the yield difference from the control became narrower as site productivity increased (Fig. 3b).
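Proportions of this kind, and the probabilities of exceeding a given yield difference reported above, are simple empirical frequencies over the pairs of means. A minimal sketch, using hypothetical (treatment, control) yield pairs rather than the data analysed here:

```python
def proportion_exceeding(values, threshold):
    """Share of values strictly greater than the threshold."""
    return sum(v > threshold for v in values) / len(values)


# Hypothetical (treatment, control) mean yields in t/ha
pairs = [(3.0, 1.0), (2.6, 1.2), (1.8, 1.5), (4.0, 2.5), (0.9, 1.0)]

response_ratios = [t / c for t, c in pairs]
differences = [t - c for t, c in pairs]

share_doubled = proportion_exceeding(response_ratios, 2.0)   # RR > 2
share_gain_1t = proportion_exceeding(differences, 1.0)       # D > 1.0 t/ha
print(share_doubled, share_gain_1t)
```

In this made-up set, two of the five pairs more than double the control yield, while three of the five gain more than 1.0 t ha−1, so the two summaries need not agree.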

The 95% confidence intervals of RR (Fig. 3c) and D (Fig. 3d) show similar patterns. Except for RR in the natural fallow, RR and D were significantly higher than 1 and 0, respectively. The 95% confidence interval of RR from the natural fallow included 1 (Fig. 3c), indicating lack of difference between the unfertilized maize and maize grown in natural fallows. However, the yield increase (measured by D) in natural fallows was significantly greater than 0, as the 95% confidence interval did not include 0 (Fig. 3d). Response was higher in maize grown in planted fallow or green manure legumes than in natural fallows (Fig. 3c and d).

Moderators of yield response

Legume management

Response in green manure legumes managed as pure fallows was higher than in those managed as relay intercrops. The 95% confidence interval of mean RR in rotational fallows (1.49–1.90) did not overlap with those in relay intercrops (1.12–1.43). The 95% confidence interval of mean D in rotational fallows (0.8–1.2 t ha−1) also did not overlap with those in relay intercropping (0.4–0.6 t ha−1).

When comparing relay intercropping with rotational fallows of non-coppicing species, 48 publications with a total of 391 pairs of observations were used. Improved fallows constituted 70.6% of the cases and relay intercrops the remaining 29.4%. Although the 95% confidence intervals overlapped, response was higher in rotational fallows than in relay intercrops. The 95% confidence interval for RR was 1.27–1.43 in rotational fallows, whereas in relay intercrops it was 1.11–1.30. The 95% confidence interval of D in rotational fallows (0.88–1.41) overlapped with that of the relay intercrops (0.34–1.01).

Rotational fallows of non-coppicing species were managed as 1-year fallows in 21.6%, 2-year fallows in 44.4% and 3-year fallows in 34.0% of the cases. The 3-year rotation gave higher RR compared to the 1- and 2-year fallows (Fig. 4a). However, RR was ≥2 in 56 and 51% of the cases in 1-year and 2-year fallow-crop rotations, respectively, indicating compensation for the yield forgone during the fallow period. In 3-year rotations, compensation for the forgone yield was noted in only 45% of the cases, which had RR ≥ 3. The 95% confidence intervals of D in 3-year fallows (1.1–1.8 t ha−1) overlapped with those of 1-year fallows (0.9–1.7 t ha−1) and 2-year fallows (0.8–1.3 t ha−1).

Fig. 4 Changes in response ratios (RR) with fallow length (a), post-fallow cropping (b), and fertilizer amendment in rotational fallows (c), relay intercrops (d) and changes in yield differences with post-fallow cropping and fertilizer amendment (e) in non-coppicing legumes. The 0%, 50% and 100% amendments represent application of 0%, 50% or 100% of the recommended fertilizer in the maize grown during the post-fallow cropping phase. Means (open circles) are not significantly different from one another if their 95% confidence intervals (error bars) were overlapping. The means and 95% confidence intervals are in the original (back-transformed) scales

After clearing non-coppicing legume fallows, maize was cropped (post-fallow) for one season in 65.0%, two seasons in 24.5% and three seasons in 10.5% of the cases. RR did not differ between one- and two-season or between one- and three-season post-fallow cropping (Fig. 4b). Variability in response increased with post-fallow cropping (Fig. 4b). However, the 95% confidence intervals of D indicate that response is higher in the first post-fallow crop (1.3–1.9 t ha−1) than in the third (1.0–1.2 t ha−1).

Fertilizer amendment

Where maize cropped after non-coppicing species was amended with fertilizer, the data for rotational fallows and relay intercrops were analyzed separately. A total of 48 peer-reviewed publications with 456 pairs of means were included in this analysis. In analysing the effect of fertilizer amendment in rotational fallows, fallow length and post-fallow cropping were used as covariates. However, neither their main nor their interaction effects were significant. Response in rotational fallows of non-coppicing legumes was 28% higher when post-fallow plots were amended with 50% of the recommended dose of fertilizer than in similar plots that were not amended (Fig. 4c). Although amendment with 100% of the recommended dose of fertilizer increased yields by 56%, response was highly variable (Fig. 4c). In post-fallow plots not amended with fertilizer, yield declined with the length of post-fallow cropping until it approached yields from the control in the third season (Fig. 4d).

In relay intercropping with non-coppicing legumes, amending the soil with 50 and 100% of the recommended fertilizer increased yield by 27 and 42%, respectively, over similar plots not amended with fertilizer (Fig. 4d). Response was also higher, by 38 and 32%, where 50 and 100% of the recommended dose of fertilizer, respectively, was applied to maize intercropped with coppicing species than in plots without fertilizer amendment. In all cases, amendment with 100% of the recommended dose of fertilizer did not significantly differ from the 50% amendment.

Altitude, rainfall and soil type

Overall yield response was higher at mid altitudes (700–1,400 m a.s.l.) than at high (>1,400 m a.s.l.) and low (<700 m a.s.l.) altitudes (Table 2), and in areas with high rainfall (>1,400 mm) than in those that receive medium to low rainfall (<1,400 mm). Response was also higher on Lixisols than on Ferralsols and Nitisols. In fully fertilized maize, response was higher on Acrisols than on Nitisols (Table 3). The effect of soil type on response ratios was not clear in the coppicing, non-coppicing and green manure legumes, although response was generally higher on Lixisols (Table 3).

Table 2 Summary table of effect of altitude, rainfall and soil type on maize yield response across all treatments
Table 3 The mean response ratios (RR) and their lower (LCL) and upper (UCL) 95% confidence limits for the treatments on different soil types (FAO classification)

Discussion

We believe the studies included in this analysis adequately capture the diversity of environments, legume species and maize genotypes under smallholder agriculture. However, publication bias cannot be ruled out. The mode gives some indication of the publication bias. If the mode is indeed a better estimate of average effects than the mean, then the benefits from legumes are more modest than those indicated by the mean in most cases. The asymmetry in Fig. 1 probably arises because studies with non-significant results were not published at all. This distinct asymmetry in the effect size distribution suggests the type of publication bias that exists when the population effect size differs from zero (Wang and Bushman 1998). The bias may not be simply due to unpublished negative results. Some studies could have been deemed to be failures because the legumes did not establish properly. For example, out of 93 sites where improved legume fallow trials were established in southern Africa, maize was harvested from only 72 sites as a result of poor establishment of the legumes (R. Coe, personal communication). The difficulty of capturing such studies is one of the weaknesses of this analysis.
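The argument about the mode and the mean can be illustrated with a small numerical sketch. The response ratios below are invented to mimic a right-skewed distribution and are not data from this analysis; rounding to one decimal place before taking the mode is likewise an assumption made only so that a mode can be computed for continuous values.

```python
import statistics

# Invented, right-skewed response ratios (illustration only).
hypothetical_rr = [1.1, 1.2, 1.2, 1.3, 1.3, 1.3, 1.4, 1.5, 1.8, 2.4, 3.6]

# Round to one decimal place so that repeated values can be counted.
binned = [round(rr, 1) for rr in hypothetical_rr]

mean_rr = statistics.fmean(hypothetical_rr)
median_rr = statistics.median(hypothetical_rr)
mode_rr = statistics.mode(binned)

print(f"mean = {mean_rr:.2f}, median = {median_rr:.2f}, mode = {mode_rr:.1f}")
```

In this invented sample a few large ratios pull the mean well above the mode, which is the pattern the paragraph above describes: where the distribution of effect sizes is skewed to the right, the mode points to a more modest typical benefit than the mean.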

Publication selection bias due to exclusion of studies for reasons outlined under data retrieval criteria is expected to have a minor effect. Most of the studies excluded from the analysis compared maize yields from treatments with yields from natural fallows, and no continuous unfertilized maize (our control) was available. Some studies compared legume fallows with natural fallows that were previously cropped during the growth of the managed fallow. Our decision to exclude those studies was based on the following logic. Firstly, use of natural fallows as the control in cross-regional syntheses will not be valid because the species composition of natural fallows varies not only from region to region but also from site to site. Secondly, the use of a natural fallow as the control is valid only in areas where continuous cropping without fertilizer is not the norm, as in the humid tropics of West Africa (Hauser et al. 2006). In parts of East and southern Africa where continuous cropping is the norm, the use of a natural fallow as the control might bias the results.

The first part of this analysis focused on legumes without fertilizer amendment. The analysis of data on organic inputs and fertilizer amendment was restricted only to those studies that specifically assessed the interaction. Our analyses, based on the dataset that satisfied these minimum requirements, clearly show that fertilizer gives the best response followed by coppicing species. Response did not differ among the coppicing, non-coppicing woody legumes and herbaceous green manure legumes. However, yield response in the legumes was significantly higher than in natural fallows and unfertilized maize (Figs. 3 and 4).

Maize yield response varied with legume establishment and management practices that affect primary productivity of the legumes, as well as with site productivity moderated by edaphic (e.g. soil type) and climatic (e.g. altitude, rainfall) variables. Clearly, yield response was higher when herbaceous and woody legumes were managed as rotational fallows than as relay intercrops. Although response was highest in maize grown after 3-year fallows of non-coppicing legumes, the 3-year fallow has no clear advantage over a 2- or 1-year fallow in terms of yield increase.

Amending the post-fallow plots with 50% of the recommended fertilizer dose further increased yields by more than 25% over similar plots that were not amended. However, amendment with 100% of the recommended fertilizer did not significantly increase yields. This indicates that legumes can play an important role in increasing fertilizer use efficiency (Vanlauwe et al. 2001) and reducing fertilizer requirements. Positive interactions between nutrients from legumes and inorganic fertilizer have been demonstrated. However, the interaction is complex (Vanlauwe et al. 2001) and little is known about the mechanisms. Future research needs to focus on analyzing the impact of legumes on fertilizer use efficiency and reducing fertilizer requirements for a given yield target.

Inherent site productivity appeared to influence the performance of maize, in addition to the legumes. Tripling of yields over the control is not achievable on high-potential sites (where the control plots yield more than 2 t ha−1). Response was low on sites that receive low and moderate rainfall and on fertile soils. Overall response was highest on Lixisols, which have low levels of plant nutrients (making agriculture possible only with frequent fertilizer applications). In fully fertilized maize, response was generally higher on Acrisols. These soils are inherently infertile and become degraded very quickly when utilized (Stocking and Murnaghan 2001). Response to fertilizer was poorest on Nitisols, which are among the most fertile soils of the tropics. Maize cropped after non-coppicing and green manure legume species also responded poorly on Ferralsols, which have a low supply of plant nutrients (especially low levels of available phosphorus) and strong acidity (Stocking and Murnaghan 2001). Legumes and biological nitrogen fixation are particularly sensitive to these constraints, and poor legume growth and nitrogen fixation would be expected (Giller et al. 1997).

The analysis above has investigated the aggregate effect of factors that contribute to variability in response at the macro level. Despite the huge variation, the mean effects of legumes on maize yield are significantly more positive than those of natural fallows. The studies reviewed here have attributed this to various factors. The most common explanation was improvement in nutrient availability as a result of (1) N input by biological N2 fixation (BNF) (Adu-Gyamfi et al. 2007; Chikowo et al. 2004; Kaizzi et al. 2004; Ojiem et al. 2007; Wortmann and Kaizzi 2000), (2) retrieval of nutrients from below the rooting zone of maize crops (Chintu et al. 2004; Mekonnen et al. 1997), (3) reduction of nutrient losses from leaching, runoff and erosion (Hartemink et al. 1996; Phiri et al. 2003), and (4) improved soil water conditions (Vanlauwe et al. 2001).

The legumes accumulate large amounts of N, up to 99% of which is N derived from the atmosphere (Adu-Gyamfi et al. 2007; Kaizzi et al. 2004). For example, the amount of N fixed by pigeon pea in maize intercrops was estimated at 37.5–117.2 kg N ha−1 year−1 in Malawi and 6.3–71.5 kg N ha−1 year−1 in Tanzania (Adu-Gyamfi et al. 2007). In Uganda, Mucuna accumulated 170–350 kg N ha−1 year−1, up to 97% of which is released over a period of 25 weeks (Kaizzi et al. 2004). Some 7.5–19% of the N released is taken up by the subsequent maize crop, resulting in a 25 to 68% increase in yield (Kaizzi et al. 2004). Similarly, the fertilizer value of total N was estimated to exceed 50 and 69 kg N ha−1, enough to replace the current need for mineral N, at Tanga in Tanzania and Jimma in Ethiopia, respectively (Bogale et al. 2001). Some legumes were more effective in improving soil productivity and maize yield than others, probably due to differences in biomass production, N2 fixation and recovery of leached nutrients. In Uganda, S. sesban and T. vogelii contributed to the soil N balance more than Mucuna and C. cajan fallows, deriving about 50% of plant N from the atmosphere (Wortmann and Kaizzi 2000).
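The Mucuna figures quoted above can be chained to give a rough idea of how much N might reach the subsequent maize crop. The sketch below is a back-of-envelope calculation under stated assumptions, not a result reported by Kaizzi et al. (2004): it treats the 97% release as if it applied across the whole range (the source gives it as an upper bound) and pairs the low ends and the high ends of each range. The function name `n_uptake_range` is hypothetical.

```python
def n_uptake_range(accumulated, released_fraction, uptake_fractions):
    """Rough range of N taken up by the following maize crop (kg N per ha).

    accumulated       -- (low, high) N accumulated by the legume
    released_fraction -- fraction of accumulated N that is released
    uptake_fractions  -- (low, high) fraction of released N taken up
    """
    low = accumulated[0] * released_fraction * uptake_fractions[0]
    high = accumulated[1] * released_fraction * uptake_fractions[1]
    return low, high


# Values quoted in the text for Mucuna in Uganda; 0.97 is an upper bound.
low, high = n_uptake_range(
    accumulated=(170.0, 350.0),
    released_fraction=0.97,
    uptake_fractions=(0.075, 0.19),
)
print(f"N taken up by maize: about {low:.1f} to {high:.1f} kg N per ha")
```

Under these assumptions the estimate spans roughly 12 to 65 kg N ha−1, which should be read as an upper-end figure because less than 97% of the accumulated N may be released.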

Rotation of maize with legume fallows can result in more effective subsoil nitrate and water utilization than maize monoculture (Hartemink et al. 1996; Chirwa et al. 2007; Nyamadzawo et al. 2008; Phiri et al. 2003). Legumes can also have other beneficial effects on crop yield as they can improve availability and uptake of nutrients such as phosphorus (Akinnifesi et al. 2007; LeMare et al. 1987; Randhawa et al. 2005).

Increased maize yield response has also been attributed to pest suppression by the legumes (Sileshi et al. 2008). The studies included in this analysis reveal that legumes reduce (1) infestation by arable and parasitic weeds (Akobundu et al. 2000; Gacheru and Rao 2005; Khan et al. 2006; Mureithi et al. 2003; Sileshi and Mafongoya 2003; Sileshi et al. 2006), (2) damage to maize by soil insects (Sileshi and Mafongoya 2003; Sileshi et al. 2005) and (3) plant parasitic nematodes (Arim et al. 2006). Rotational fallows of S. sesban have consistently reduced Striga infestation of maize in Kenya (Gacheru and Rao 2005) and Zambia (Sileshi et al. 2006). Intercropping of maize with Desmodium spp. has also reduced Striga and stem borer problems (Khan et al. 2006). When intercropped with maize, Canavalia, Crotalaria and Mucuna reduced damage to maize by the lesion nematode Pratylenchus zeae compared with a sole maize crop in Kenya (Arim et al. 2006). Intercrops may favour build-up of nematode antagonists and enhance plant resistance to nematodes through improved nutrient status and plant vigour (Wang et al. 2003), thus increasing nutrients available for plant uptake.

The discussion above indicates that the positive effect of legumes on maize yield is due to a number of interrelated factors. Legume fallow and green manure technologies also have a higher benefit-cost ratio than mineral fertilizer, implying a higher return per unit of investment (Ajayi et al. 2007).

Conclusion and recommendations

The key conclusion from these analyses is that the overall effect of herbaceous and woody legumes on maize is positive and significant, although considerable residual variation existed. The study has established that maize yield could be doubled or tripled relative to unfertilized maize (control) with coppicing woody legumes (67% of the cases) and non-coppicing woody legumes (45%), whereas doubling could be achieved in only 16% of the cases with herbaceous green manure legumes. Response was also higher in rotational fallows than in relay intercropping. Three-year fallows of non-coppicing woody legumes had no advantage over 2- or 1-year fallows of non-coppicing species. While the choice of legume species and management may have major effects on maize yield, this analysis could not confirm the superiority of a particular species across all geographical locations. The strong point of this analysis is its ability to generalize across many published studies. The analysis clearly shows that legumes had a high impact on yield in medium-potential areas, while they did not have the desired effect in high-potential areas. Therefore, projects promoting legumes for soil fertility improvement need to encourage farmer experimentation with several options rather than rely on wholesale promotion of a limited number of species.

The analysis also suggests that amending legume fallows with inorganic fertilizer may be important to sustain productivity over several years, as yields normally decrease with the length of the post-fallow cropping period. Amending post-fallow plots with 50% of the recommended fertilizer dose could increase yields by over 25%, indicating that legume rotations may reduce fertilizer requirements by half. It also indicated that synergistic effects could be expected between organic and inorganic fertilizer sources. Where both soil organic matter and P contents are very poor, legumes may not accumulate significant amounts of biomass and will fix N poorly. To maintain positive nutrient balances for N and P in these environments, organic resources need to be combined with low rates of mineral fertilizer amendment. The legumes and inorganic fertilizer do different things, and often have complementary effects on maize yield. Therefore, one should not be promoted as a replacement for the other; the two should be seen as complementing each other. To achieve impact, practices need to be developed that improve legume establishment and growth on degraded soils while ensuring a more efficient recovery of applied mineral fertilizers. In conclusion, this study has provided evidence on the potential contribution of woody and non-woody legumes to maize productivity that could now be harnessed for sustainable smallholder agriculture in sub-Saharan Africa. Therefore, we recommend that the promising woody and non-woody green manure options be evaluated under local conditions and promoted appropriately.