Introduction

Plasmodium falciparum, the most virulent malaria parasite infecting humans, is considered an important force in human evolution. Because malaria is a common and not infrequently fatal disease that takes its greatest toll on children of pre-reproductive age, its ability to apply selective pressure on human evolution is clear. In fact, many hereditary variations and disorders of the human red blood cell, including hemoglobin S and other hemoglobinopathies, alpha and beta thalassemia, glucose-6-phosphate dehydrogenase and pyruvate kinase deficiency, disorders of the red cell cytoskeleton, and blood group polymorphisms, are well recognized as escape mechanisms from malaria (Carter and Mendis 2002; Cserti and Dzik 2007; Kwiatkowski 2005; Verra et al. 2009; Weatherall 2008; Weatherall et al. 2002). P. falciparum is also associated with a wide variety of non-erythrocyte polymorphisms that may also have arisen as escape mechanisms from malaria. These include polymorphisms in cytokines, receptor molecules, complement pathway proteins, vascular adhesion molecules, nitric oxide promoters, and some HLA allotypes (Cserti-Gazdewich et al. 2009, 2010; Hobbs et al. 2002; Kwiatkowski 2005; Verra et al. 2009). Evidence also suggests that the genome of P. falciparum has experienced selective pressure to adapt to the human host (Bongfen et al. 2009; Mackinnon and Marsh 2010; Templeton 2009). Understanding the dynamics of human–Plasmodium co-evolution depends, in part, on a reliable estimate of the date when P. falciparum arose as a human pathogen.

A recent report by Liu et al. demonstrated that P. falciparum evolved from ancestors of certain recently identified gorilla parasites via host switching (Liu et al. 2010). This finding was particularly remarkable in that it provided the most definitive evidence to date for a gorilla origin of P. falciparum and effectively replaced the previously prevailing theory that P. falciparum evolved in humans via a direct host switch of the related chimpanzee parasite, P. reichenowi (Holmes 2010; Rich et al. 2009). The data of Liu et al. show that certain parasites endemic to gorillas are the closest known living relatives of human P. falciparum and that human P. falciparum likely descended from a host switch of these gorilla parasites to humans. The finding of Liu et al. is partially supported by other recent work (Duval et al. 2010; Krief et al. 2010; Ollomo et al. 2009), including that of Prugnolle et al., who describe gorilla parasites closely related to P. falciparum (Prugnolle et al. 2010). With the data available to them, however, Liu et al. were unable to assign a date to this host switch from gorillas to humans (Liu et al. 2010).

Prior attempts to date the dawn of P. falciparum in humans have been technically limited by a lack of data describing the origins of P. falciparum or, more often, by a lack of reliable mutation rates (Carter and Mendis 2002; Escalante et al. 1998; Hayakawa et al. 2008; Krief et al. 2010; Ollomo et al. 2009; Rich et al. 2009; Su et al. 2003). Without direct fossil dates with which to calibrate the parasite DNA mutation rate, most prior research was forced to rely on the dates when primate host lineages split (for which fossil evidence is available) as a surrogate for the divergence times of the hosts’ respective parasites. However, this approach, while the best previously available, risks grossly overestimating the date when P. falciparum first infected humans, because it assumes that malarial parasites radiated directly within the lineages of their respective hosts rather than allowing for the very probable scenario of parasites switching hosts at some time after their host lineages split. For example, in assigning a date to the origin of P. falciparum in humans, Ollomo et al. assumed that P. falciparum and P. reichenowi diverged when their respective host lineages (human and chimpanzee) split, and based their calculation of the parasite DNA mutation rate accordingly (Ollomo et al. 2009). Likewise, among the calibration points used by Krief et al. was the radiation of Plasmodium species endemic to macaques relative to the divergence of primate species within the host genus Macaca (Krief et al. 2010). Hayakawa et al. estimated the parasite DNA mutation rate using the divergence of Asian Old World monkeys from African Old World monkeys (Hayakawa et al. 2008).

A major advance over the above approaches to estimating malaria’s DNA mutation rate was recently made by Ricklefs and Outlaw (2010). They used sequence data from avian malaria-like parasites to calibrate an overall cytochrome b mutation rate for haemosporidian parasites, including P. falciparum. The strength of the Ricklefs approach is that it considers not only parasite divergence due to host divergence but also parasite divergence via host switching. Ricklefs and Outlaw accounted for the range of possible switch times by assuming that the probability of a parasite switching hosts at a given time was uniformly distributed over the period between the split of the host lineages and the present. By examining avian parasite divergence, they obtained a sample of host–parasite pairs large enough to estimate malaria’s cytochrome b mutation rate reliably. Specifically, they calculated a cytochrome b mutation rate of 0.012 (±0.002 SEM) mutations per base pair per million years (by F84 distance). While this calibration remains extremely valuable, Ricklefs and Outlaw concluded that P. falciparum emerged as a human pathogen 2.5 million years ago, on the assumption that human P. falciparum evolved from P. reichenowi via a direct host switch, an assumption that Liu et al. subsequently demonstrated to be incorrect (Liu et al. 2010). In this report, we build on the recent advances of Ricklefs and Outlaw and of Liu et al., and add new data on cytochrome b sequence variation to arrive at a new estimate and confidence interval for the dawn of P. falciparum in humans.

Methods

Sequences

Cytochrome b gene sequences of 14 gorilla parasites published by Liu et al. (2010) were obtained from GenBank (Benson et al. 2004) (GenBank accession numbers HM235041, HM235287, HM235116, HM235277, HM235061, HM235037, HM235078, HM234982, HM235035, HM235068, HM235040, HM235034, HM234990, and HM235036) and are referred to here as gorilla parasite sequences 1–14, respectively. These gorilla parasite sequences were selected as a representative sample from among the gorilla parasite sequences highly similar to human P. falciparum. Thirteen human P. falciparum sequences were also used (accession numbers AF069609, AY282924, AF069608, AF069607, AY282929, AJ276847, AY282947, AY588279, AY588280, AY910012, AY910013, AY282975, and AY283003); these are referred to here as P. falciparum sequences 1–13, respectively.

Alignments

Sequences were aligned using ClustalW, version 2.1 (http://www.clustal.org/download/current), with default settings for “slow and accurate” alignment (Larkin et al. 2007). Because some downloaded sequences contained regions of the mitochondrial genome outside the cytochrome b gene, the aligned sequences were truncated to the 916 bp region of cytochrome b over which all 27 sequences align. The final alignment of the truncated versions of all 27 sequences is shown in Supplementary Fig. 1. All subsequent references to the sequences refer to these 916 bp.

Calculation of Genetic Distances

Genetic distances under the F84 model of evolution were calculated between every pair of the 27 sequences using PHYLIP, version 3.69 (Felsenstein 1989) (http://evolution.genetics.washington.edu/phylip/getme.html), with default settings. The p-distance between each pair of sequences was calculated as the number of nucleotides at which the sequences differ divided by the total length (916 bp).
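The p-distance definition is simple enough to state directly in code. The following is a minimal Python sketch, assuming the two sequences are already aligned, ungapped, and of equal length; the function name is ours and is not taken from the original analysis.

```python
def p_distance(seq_a, seq_b):
    """Return (number of differing sites, p-distance) for two aligned sequences."""
    if len(seq_a) != len(seq_b) or not seq_a:
        raise ValueError("sequences must be non-empty and aligned to the same length")
    # Count positions where the nucleotides differ (case-insensitive).
    diffs = sum(1 for a, b in zip(seq_a.upper(), seq_b.upper()) if a != b)
    return diffs, diffs / len(seq_a)
```

On the 916 bp alignment, a pair differing at 4 sites gives p = 4/916, or about 0.44%, which matches the median difference reported in the Results.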

Calculation of Point Estimates for Divergence Time

Point estimates for divergence times between the various Plasmodium parasites were calculated by dividing F84 distances by the cytochrome b mutation rate reported by Ricklefs and Outlaw (0.012 per nucleotide per million years) (Ricklefs and Outlaw 2010).
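As a worked illustration, consider the median difference of 4 bp out of 916 reported in the Results. Here the p-distance stands in for the F84 distance, which the Results describe as nearly identical at this level of divergence. The point estimate \( \hat{t} = d / r \) is then approximately:

\[
\hat{t} = \frac{d}{r} \approx \frac{4/916}{0.012\ \text{per site per Myr}} \approx \frac{0.00437}{0.012} \approx 0.364\ \text{Myr} \approx 364{,}000\ \text{years}
\]

This is close to the median point estimate of 365,000 years that the Results report from the F84 distances.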

Calculation of Confidence Intervals for Divergence Time

For each paired comparison of a gorilla parasite sequence with a human P. falciparum sequence, a Monte Carlo simulation was performed to calculate a confidence interval for the parasite divergence time (t). Given an observed cytochrome b distance, a cytochrome b mutation rate (r) was drawn from a normal distribution with a mean of 1.2% per million years (as reported by Ricklefs and Outlaw) and a standard deviation of 0.2% per million years (the standard error of the mean reported by Ricklefs and Outlaw). The observed number of mutations, \( k = p \cdot L \), between two nucleotide sequences of length L (916 bp) separated by a p-distance p was assumed to follow a Poisson distribution with mean \( \lambda = r \cdot t \cdot L \). That is, if two parasite sequences of length L diverged at time t in the past with a mutation rate r (the fraction of sites mutated per unit time), we would expect them to differ by λ nucleotides; the actual number of differences, k, would vary around this expectation according to a Poisson distribution. Here, we hold k fixed and ask how probable it is that different divergence times (t) gave rise to the observed k. The resulting distribution for t is therefore a likelihood function based on the Poisson and normal distributions, which we use to calculate a 95% confidence interval for t.
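The sampling scheme can be written out explicitly. What follows is our reading of the description above rather than a formula given in the original. It adds one assumption the text does not spell out: for each draw of r, t is sampled from the Poisson likelihood normalized over t, which amounts to a flat prior on t. Here \( \phi \) denotes the normal density.

\[
r \sim \mathcal{N}(\mu, \sigma^2), \qquad \mu = 0.012,\ \sigma = 0.002\ \text{(per site per Myr)}, \qquad k = p \cdot L
\]
\[
f(t \mid k, r) = \frac{\Pr(k \mid \lambda = r t L)}{\int_0^\infty \Pr(k \mid \lambda = r t' L)\, dt'} = rL\,\frac{(r t L)^{k} e^{-r t L}}{k!}
\]
\[
f(t \mid k) = \int_0^\infty f(t \mid k, r)\, \phi(r; \mu, \sigma)\, dr
\]

Equivalently, for each draw \( \lambda = r t L \sim \mathrm{Gamma}(k+1, 1) \), so \( t = \lambda / (rL) \).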

In particular, for each of the 182 paired comparisons of human with gorilla parasite sequences (13 human P. falciparum sequences compared with 14 gorilla parasite sequences), 50,000 simulated divergence times were generated according to the above likelihood function, producing a total of 9,100,000 values on which to base our final results. Because aggregate results largely converged within 50,000 iterations of the simulation, the same set of 50,000 simulated divergence times was used for all sequence comparisons differing by the same number of mutations. Confidence intervals for each sequence comparison were taken as the 2.5th to 97.5th percentile values of the 50,000 simulations for that comparison. The overall confidence interval was taken as the 2.5th to 97.5th percentile values across all 9.1 million simulated divergence times, and the histogram was based on all 9.1 million simulated values.
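The simulation and percentile steps can be sketched in Python. The original calculations used MATLAB, Excel, and VBA, so this is a minimal sketch under stated assumptions, not the authors' code. For each draw, r is sampled from the normal distribution and redrawn in the rare case it is not positive. λ is sampled from Gamma(k + 1, 1), which is the Poisson likelihood in λ normalized to a density, and t = λ / (rL). Percentiles use the nearest-rank rule. The function names and seeding scheme are hypothetical.

```python
import math
import random

MU_RATE = 0.012  # mean cytochrome b rate, substitutions per site per Myr (Ricklefs and Outlaw 2010)
SD_RATE = 0.002  # reported SEM, used here as the SD of the sampled rate
SEQ_LEN = 916    # aligned cytochrome b length in bp


def simulate_divergence_times(k, n_iter=50_000, seed=None):
    """Return n_iter simulated divergence times (Myr) for k observed differences.

    Assumed scheme: r ~ Normal(MU_RATE, SD_RATE), redrawn if not positive;
    lambda ~ Gamma(k + 1, 1), the normalized Poisson likelihood in lambda;
    t = lambda / (r * SEQ_LEN).
    """
    rng = random.Random(seed)
    times = []
    for _ in range(n_iter):
        r = rng.gauss(MU_RATE, SD_RATE)
        while r <= 0:
            r = rng.gauss(MU_RATE, SD_RATE)
        lam = rng.gammavariate(k + 1, 1.0)
        times.append(lam / (r * SEQ_LEN))
    return times


def percentile(sorted_vals, q):
    """Nearest-rank percentile (0 < q <= 100) of an already sorted list."""
    rank = math.ceil(q / 100 * len(sorted_vals))
    return sorted_vals[max(rank, 1) - 1]


def divergence_intervals(diff_counts, n_iter=50_000, seed=1):
    """Return per-comparison and pooled 95% intervals (Myr).

    One simulated set is generated per distinct difference count k and reused
    for every comparison with that k; the pooled interval is computed over the
    simulated values of all comparisons together.
    """
    cache = {}
    per_comparison = []
    pooled = []
    for k in diff_counts:
        if k not in cache:
            cache[k] = sorted(simulate_divergence_times(k, n_iter, seed=seed + k))
        sims = cache[k]
        per_comparison.append((percentile(sims, 2.5), percentile(sims, 97.5)))
        pooled.extend(sims)
    pooled.sort()
    return per_comparison, (percentile(pooled, 2.5), percentile(pooled, 97.5))
```

For example, with the hypothetical difference counts `[2, 4, 4, 9]`, `divergence_intervals([2, 4, 4, 9])` returns four per-comparison intervals and one pooled interval, all in millions of years. The two comparisons with k = 4 share a single simulated set, as described in the text. Pooling the full 9.1 million values in plain Python lists is slow; an array library would be more practical at that scale.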

Calculations and Analysis

Except where otherwise stated, all calculations and algorithms were performed or implemented using MATLAB, Microsoft Excel, and Microsoft Visual Basic for Applications.

Results

Table 1 shows the results of 182 pairwise comparisons of the 13 P. falciparum cytochrome b sequences with the 14 gorilla parasite cytochrome b sequences, all 916 bp long. The table includes distances (p-distance and F84 distance) and divergence times (point estimates and confidence intervals). Across all comparisons, the P. falciparum and gorilla parasite sequences differed by a median of 4 bp (range 2–9 bp) out of 916, corresponding to a median p-distance and F84 distance of 0.44%. Based on the 95% confidence interval of all simulated divergence times, P. falciparum first infected ancestors of modern humans between 112,000 and 1,136,000 years ago (median point estimate across all 182 comparisons = 365,000 years ago). Even the upper bound of the 95% confidence interval is much more recent than the prior best point estimate of approximately 2.5 million years (Ricklefs and Outlaw 2010). The distribution of simulated switch times is shown in Fig. 1.

Table 1 Distances and divergence times between P. falciparum and closely related gorilla parasites
Fig. 1

Frequency of simulated host switch dates. The histogram shows the distribution of simulated times at which P. falciparum may have first infected ancestors of modern humans through a host switch from gorillas. Each bar represents the number of simulated outcomes (out of a total of 9.1 million) falling within a 50,000-year interval; the upper bound of each interval is shown on the x-axis.

Discussion

Here, we present a median estimate of 365,000 years ago and a 95% confidence interval of 112,000–1,136,000 years ago for the introduction of P. falciparum as a human pathogen. Our revised estimate is much more recent than prior estimates. However, it is subject to several limitations and considerations. Because it is currently unclear how the various gorilla parasites will eventually be grouped into species, our analysis makes no correction for intraspecies variation. In addition, because it remains unknown which specific gorilla parasite is the closest identified living relative of human P. falciparum, we base our overall timeline on an aggregate of closely related parasites. Future research identifying the specific gorilla parasites most closely related to human P. falciparum may refine our estimate, and may also identify a parasite even more closely related than the cohort described by Liu et al.

Another consideration is that we calculate divergence time point estimates under the F84 model of evolution, for internal consistency with Ricklefs and Outlaw (2010), who used this model in calibrating their clock. However, given the high degree of similarity between sequences, the choice of evolutionary distance model is unlikely to affect overall results substantively. As noted in Table 1, F84 distances and p-distances are nearly identical for comparisons of P. falciparum and gorilla parasite sequences. Given the minimal impact of the precise model choice, and in the interest of simplicity, we use p-distances in our Monte Carlo simulations.

Our analysis relies on the assumption that cytochrome b mutation in malaria parasites follows a molecular clock. Aspects of this assumption have been applied previously in the peer-reviewed literature, in particular by Ricklefs and Outlaw (2010). In addition, we tested this assumption more formally with Tajima’s relative rate test (Tajima 1993), implemented in an automated script that was validated against the Tajima test implementation in the MEGA 5 software package (Tamura et al. 2011). We applied the test 182 times. Each application used one of the 182 combinations of P. falciparum and gorilla parasite sequences (truncated to the 916 bp) as the two test sequences. The corresponding 916 bp region of a P. vivax mitochondrial genome (accession number NC_007243.1) served as the outgroup in all comparisons. The test failed to reject the molecular clock hypothesis (P > 0.05) in all 182 tests. Thus, the molecular clock assumption is not statistically excluded, and slight deviations not detectable by the Tajima test would be unlikely to alter the qualitative conclusions of our study.
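The relative rate test described above can be sketched as follows. This is a minimal Python rendering of the single-nucleotide (1D) form of Tajima's relative rate test as we understand it, not the authors' script. The function name is hypothetical, and the sketch assumes ungapped sequences aligned to equal length.

```python
import math


def tajima_relative_rate(seq1, seq2, outgroup):
    """Tajima's (1993) relative rate test on three aligned sequences (1D form).

    m1 counts sites where seq1 alone differs (seq2 matches the outgroup);
    m2 counts sites where seq2 alone differs (seq1 matches the outgroup).
    Under a molecular clock E(m1) = E(m2); chi2 = (m1 - m2)^2 / (m1 + m2), 1 df.
    Returns (m1, m2, chi2, p_value).
    """
    if not (len(seq1) == len(seq2) == len(outgroup)):
        raise ValueError("sequences must be aligned to equal length")
    m1 = m2 = 0
    for a, b, o in zip(seq1.upper(), seq2.upper(), outgroup.upper()):
        if a != b and b == o:
            m1 += 1
        elif a != b and a == o:
            m2 += 1
        # Sites where all three differ, or all agree, are not counted.
    if m1 + m2 == 0:
        return m1, m2, 0.0, 1.0
    chi2 = (m1 - m2) ** 2 / (m1 + m2)
    # Upper-tail probability of a chi-square variable with 1 df: erfc(sqrt(x / 2)).
    p_value = math.erfc(math.sqrt(chi2 / 2))
    return m1, m2, chi2, p_value
```

In this framing, each of the 182 tests pairs one P. falciparum sequence with one gorilla parasite sequence and uses the P. vivax region as the outgroup. A `p_value` above 0.05 corresponds to the outcome reported for all 182 comparisons.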

Nevertheless, the dates derived here depend on the parasite DNA mutation rate provided by Ricklefs and Outlaw, which is subject to its own assumptions. For example, Ricklefs and Outlaw assume that the time of avian haemosporidian host switching follows a uniform probability distribution over the time span of avian host divergence. Their calibration is also based on an established rate of avian host cytochrome b mutation (2.1% per nucleotide per million years). The standard error that Ricklefs and Outlaw report for their mutation rate does not incorporate uncertainty in this avian host rate, and accordingly neither does our confidence interval. However, at least one analysis estimated very narrow confidence limits around this avian mutation rate (±0.1% per bp per million years) (Weir and Schluter 2008). Thus, imprecision in the avian mutation rate should not substantially widen the uncertainty in our results.

Our findings agree with those of others who have suggested a more recent origin of P. falciparum as a human pathogen. In particular, Krief et al. suggested that P. falciparum first infected human ancestors between 78,000 and 330,000 years ago (Krief et al. 2010). However, their result was based on the assumption that P. falciparum arose in humans through a host switch from bonobos. In contrast, our study builds on the more recent findings of Liu et al., which demonstrate that P. falciparum most likely arose from gorilla parasites.

Many of the malaria escape mechanisms arising within the human red blood cell, such as sickle hemoglobin, are thought to date from approximately 10,000 years ago (Carter and Mendis 2002). Although our revised timeline substantially shortens the lag between the entry of P. falciparum as a human pathogen and the appearance of these adaptive mutations, a very substantial lag remains. We see three conceivable explanations for this discrepancy. The first is that the adaptive mutations actually occurred earlier than 10,000 years ago. The second is that at least one of the assumptions on which our results are based is incorrect, so that our results overstate how long ago P. falciparum first infected human ancestors. However, we favor a third explanation: the lag may simply reflect the difference between the entry of a pathogen and the later development of evolutionary adaptations to the disease it causes. Human adaptive mutations would be expected to rise in response to conditions that increase the prevalence and severity of an infectious disease. For example, Yersinia pestis is now recognized to have originated in China (Haensch et al. 2010) as long as 40,000 years ago (Keim and Wagner 2009). Yet it was not until the silk trade introduced Y. pestis to a Europe characterized by crowding, rodents, fleas, and human travel that waves of plague resulted in genetic selection. In the case of malaria, Solomon and Bodmer have estimated that hemoglobin S mutations may date from as early as 150,000 years ago (Solomon and Bodmer 1979). Despite this, it is generally agreed that red cell mutations that mitigate malaria’s lethality may not have become common until as recently as 10,000 years ago, when the prevalence and severity of P. falciparum increased with the development of farming, the domestication of animals, and the growth of more densely populated human communities. Nevertheless, our results suggest that P. falciparum is much older than many other human pathogens that date to the rise of agriculture (Wolfe et al. 2007).

Finally, our confidence interval for the origin of P. falciparum lies within the time frame proposed for the dawn of Homo sapiens (Endicott et al. 2010). The previous estimate that malaria arose as a human pathogen 2.5 million years ago required assumptions about infection of hominids that predate Homo sapiens. In contrast, our revised estimate suggests that P. falciparum may have arisen as a human pathogen through a host switch by Plasmodia with an adaptation specific to our species. We hope this revised estimate of the date of origin of P. falciparum in humans may also serve as a foundation for identifying new genetic polymorphisms that might offer insights into malarial pathogenesis, host resistance, and therapy.