Introduction

African population history

Africa is a region of considerable genetic, linguistic, and cultural diversity. There are over 2,000 distinct ethno-linguistic groups in Africa, speaking languages that constitute nearly a third of the world’s languages (http://www.ethnologue.com/). These languages have been classified into four major language families: Niger-Kordofanian (spoken predominantly by agriculturalist populations across a broad geographic distribution in Africa), Afro-Asiatic (spoken predominantly by northern and eastern African pastoralists and agro-pastoralists), Nilo-Saharan (spoken predominantly by eastern and central African pastoralists), and Khoisan (a family of languages containing click consonants, spoken by southern and eastern African hunter–gatherer populations) (Fig. 1). These populations live in a diverse set of environments and climates, including tropical forests, savannah, desert, and coastal regions, and have diverse subsistence patterns and exposure to infectious disease. They also have high levels of genetic and phenotypic diversity (e.g. high HLA diversity, Supplemental Table).

Fig. 1 A map of selected migrations and language family distributions in Africa (adapted from Reed and Tishkoff 2006). More recent migrations in historical times are represented by thin arrows and inferred prehistoric migrations are represented by medium arrows

The pattern of genetic variation in modern African populations is influenced by their demographic history, which affects all regions of the genome, as well as by natural selection, which affects specific loci that play a role in adaptation. African populations have a complex demographic history, consisting of ancient and recent population expansion and contraction events, short and long range migrations (e.g. the migration of agricultural Bantu-speakers from West Africa throughout sub-Saharan Africa within the past ~4,000 years and the migration of Khoisan-speakers from eastern to southern Africa within the past ~20,000–40,000 years), and population admixture (Lahr and Foley 1998; Reed and Tishkoff 2006; Tishkoff et al. 2007a). To date the earliest claimed fossil evidence for anatomically modern humans has been identified in Ethiopia (Omo 1) and dated to about 195,000 years ago (McDougall et al. 2005); however, the “modern” characteristics of this specimen have been debated (as it displays important differences in the brow and chin regions) and other records indicate a slightly more recent age of 190,000 years or less (White et al. 2003; Haile-Selassie et al. 2004). Following a period of population differentiation in Africa, one (or more) populations migrated out of East Africa, probably towards Western India, within the past 50,000–100,000 years, resulting in a worldwide range expansion of modern humans (Quintana-Murci et al. 1999, reviewed in Tishkoff and Verrelli 2003a). Because African populations are older than non-Africans, there has been more time for genetic diversity to accumulate. Phylogenetic analyses of mtDNA, Y chromosome, and autosomal haplotypes indicate that the deepest lineages are in Africa (reviewed in Tishkoff and Verrelli 2003a). Within Africa, the oldest mtDNA and Y chromosome lineages are found amongst the click-speaking hunter–gatherers in southern Africa (i.e. the Khoisan or !Kung San), although there is some evidence for shared ancestry between the southern African and East African click-speaking groups, suggesting a possible East African origin of these populations (Chen et al. 2000; Gonder et al. 2007; Hammer et al. 2001; Passarino et al. 1998; Scozzari et al. 1999; Tishkoff et al. 2003, 2007a).

The migration of modern humans out of Africa resulted in a population bottleneck and concomitant loss of genetic diversity (Liu et al. 2006). Numerous studies have observed higher levels of nucleotide and haplotype diversity in Africans compared to non-Africans, reflecting a larger effective population size in African populations (Tishkoff et al. 2003). Non-African populations appear to have a subset of the genetic diversity present in sub-Saharan Africa, higher levels of linkage disequilibrium, and larger and more uniform haplotype blocks relative to Africans (Tishkoff et al. 2003; Tishkoff and Kidd 2004; Tishkoff and Verrelli 2003b; Tishkoff and Williams 2002). African populations also have a more subdivided population structure relative to non-Africans, with high levels of genetic diversity amongst populations (Tishkoff et al. 1996, 1998, 2000). A recent study of 800 STRPs and 400 In/Dels genotyped in over 3,000 geographically and ethnically diverse African populations indicates at least 12–14 genetically distinct ancestral populations in Africa and high levels of population admixture in many regions (Reed and Tishkoff, unpublished).

Patterns of genetic diversity in humans in general and in African populations in particular are also influenced by natural selection (Barreiro et al. 2008). Because of differences in diet, climate, and exposure to pathogens, ethnically and geographically distinct populations are likely to have experienced distinct selection pressures, resulting in local genetic adaptations (Sabeti et al. 2007). For example, mutations causing G6PD enzyme deficiency that are associated with malarial resistance have risen to high frequency in populations exposed to malarial infection, despite the negative consequences associated with this deficiency (Sabeti et al. 2002b; Saunders et al. 2002; Tishkoff et al. 2001; Verrelli et al. 2002). Another example of local adaptation is the origin and rapid spread of mutations associated with lactase persistence in East African pastoralist populations (Tishkoff et al. 2007b). Lactase persistence (LP) is a classic example of genetic adaptation in humans. The ability to digest milk as adults is a cis-regulated genetic trait. Although a mutation associated with LP was previously identified in Europeans (Enattah et al. 2002), the genetic basis of LP in African populations remained unknown. A study of genotype and phenotype association in a sample of 43 populations from Tanzania, Kenya, and the Sudan identified three novel SNPs located ~14 kbp upstream of the lactase gene (LCT; this and all other gene names are from the HUGO Gene Nomenclature Committee (HGNC) database: http://www.genenames.org/index.html) that are significantly associated with the LP trait in African populations, and enhance transcription from the LCT promoter in vitro (Tishkoff et al. 2007b). These SNPs are located within 100 bp of the European LP-associated variant (C/T −13910). One LP-associated SNP (G/C −14010) is common in Tanzanian and Kenyan pastoralist populations, whereas the other two (T/G −13915, and C/G −13907) are common in northern Sudanese and Kenyans. 
Genotyping of 123 SNPs across a 3 Mbp region in these populations demonstrated that these African LP-associated mutations exist on haplotype backgrounds that are distinct from the European LP-associated mutation and from each other. In addition, haplotype homozygosity extends >2 Mbp on chromosomes with the LP-associated C −14010 SNP, consistent with an ongoing selective sweep over the past 3,000–7,000 years. These data indicate a striking example of convergent evolution and local adaptation due to strong selective pressure resulting from shared cultural traits (e.g. cattle domestication and adult milk consumption) in Europeans and Africans. This study also demonstrates the effect of local adaptation on patterns of genetic variation and the importance of resequencing across geographically and ethnically diverse African populations for studies of disease susceptibility. Another recent study has provided evidence that selective pressure can act on genes in related pathways: two separate gene regions, containing the LARGE and DMD genes, both of which have a demonstrated role in Lassa fever infection, have been subjected to positive selection in West African Yoruba (Sabeti et al. 2007). In conclusion, the amount and patterns of genetic diversity in Africa can make studying African populations particularly informative about how genes impact human disease and health.

Susceptibility to infectious disease

In his paper entitled “Observations made during the epidemic of measles on the Faroe Islands in the year 1846” the Danish physician P.L. Panum described an outbreak in which the great majority of the Faroe population developed measles (6,000 of 7,800 people) and more than 200 people died. In this epidemic none of the individuals who had previous exposure during the epidemic of 1781 (many of whom were still alive in 1846) developed disease (Poland 1998). As pointed out by Poland, key questions on disease susceptibility and immunization arise from this epidemic: why did some people survive while others died; why was mortality so high; and why was the protective effect from a past epidemic so high? These are important issues that involve both innate susceptibility and resistance as well as acquired immunity conferred by previous exposure. In essence, these observations integrate clinical, epidemiological and evolutionary approaches that allow an understanding of the diverse functions of immunity genes, which affect survival and are therefore subject to natural selection (reviewed by Quintana-Murci et al. 2007). The concept of differential susceptibility to infectious disease motivates much of the genetic work in Africa, as infectious disease still plays an important role in the health of most African populations. In these studies different approaches have been and are being employed to assess how genetic variation affects infection status in sub-Saharan Africa. This research involves multiple protocols, including twin studies, where concordance rates are compared between monozygotic (MZ) and dizygotic (DZ) twins, case–control association studies, and family-based association and linkage studies where the co-segregation of a marker with the disease phenotype is tested in families (Hill 2006). Because the prevalence of DZ twinning in some parts of West Africa is unusually high, it is feasible to collect large twin cohorts to investigate the effect of host genetics on response to both infection and vaccines (Creinin and Keith 1989). For example, a large cohort of 267 twin pairs (of which 60 were determined to be MZ by microsatellite typing) has provided unequivocal data that cell proliferation responses to a range of malarial antigens were more highly correlated in MZ than in DZ pairs (Jepson et al. 1997). Another study has shown higher concordance for cellular immune responses to mycobacterial and other antigens in MZ compared to DZ twins, suggesting that genetic factors are important regulators of this immune response (Newport et al. 2004a). Of potential interest is the fact that DZ twinning itself may be under genetic control and possibly under selection (Sirugo et al., in preparation).
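
The logic of comparing MZ and DZ pairs can be illustrated with a minimal sketch. It assumes a continuous immune-response measure, uses a plain Pearson correlation as a simple stand-in for the intraclass correlation normally used in twin analyses, and applies Falconer's classical approximation, heritability ≈ 2(r_MZ − r_DZ). The response values are invented for illustration and are not data from the cited studies; the function names are hypothetical.

```python
from math import sqrt


def pearson(xs, ys):
    """Pearson correlation between co-twin measurements xs[i] and ys[i]."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / sqrt(var_x * var_y)


def falconer_h2(r_mz, r_dz):
    """Falconer's approximation: heritability ~ 2 * (r_MZ - r_DZ)."""
    return 2.0 * (r_mz - r_dz)


# Hypothetical immune-response values (twin 1, twin 2) for each pair.
mz_pairs = [(1.0, 1.1), (2.0, 1.9), (3.0, 3.2), (4.0, 3.9), (5.0, 5.1)]
dz_pairs = [(1.0, 2.0), (2.0, 1.0), (3.0, 4.5), (4.0, 3.0), (5.0, 4.0)]

r_mz = pearson([a for a, _ in mz_pairs], [b for _, b in mz_pairs])
r_dz = pearson([a for a, _ in dz_pairs], [b for _, b in dz_pairs])
h2 = falconer_h2(r_mz, r_dz)
print(f"r_MZ={r_mz:.2f} r_DZ={r_dz:.2f} h2~{h2:.2f}")
```

A higher correlation in MZ than in DZ pairs, as in this toy data set, is the pattern that the cited twin studies interpret as evidence for a genetic contribution to the immune response.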

As with all such studies, a major challenge is the precise and consistent definition of the clinical phenotypes. Therefore, over the past decade a very large effort has been made to ascertain samples using carefully defined, standardized criteria; as a result of this effort thousands of African samples have been archived, providing a unique resource for studying the complex genetics of susceptibility and resistance to infection. These bio-banking initiatives, conducted within appropriate ethical frameworks and supported by dedicated databases, will help to identify the genetic risk factors for infectious disease (Sgaier et al. 2007; Sirugo et al. 2004).

Malaria

Malaria is a serious health issue in Africa, accounting for one in every five childhood deaths. In 2006 the WHO estimated that almost 74% of the African population lives in areas endemic for malaria, about 19% in epidemic-prone and only 7% in malaria-free areas (http://www.afro.who.int/malaria/publications/annual_reports/africa_malaria_report_2006.pdf). Studies of large populations have begun to elucidate the complex genetics of malaria susceptibility and several genes have already been associated with “malaria” susceptibility (Table 1).

Table 1 Genetic associations with malaria in Africans

As with all genetic traits there is a real need for accurate and precise definition of the phenotype. “Malaria illness” is a complex phenotype that can be clinically either uncomplicated or severe; the latter, in particular, affects various organs and tissues and takes diverse forms with variable mortality. In addition, malaria can be easily over-diagnosed or under-diagnosed and, other than parasitemia, no single specific factor exists that is predictive of the malaria etiology of an illness. Attempts to associate specific acute phase proteins with malaria have not provided conclusive evidence; for example, the evidence for haptoglobin, an acute phase protein whose absence (ahaptoglobinaemia) has been proposed as an index of malaria endemicity, is equivocal, and reports on the effect of haptoglobin genotypes on severe malaria have been contradictory (reviewed by Koram and Molyneux 2007).

Despite this complexity, certain phenomena are critical to disease development, and genetic variants that disrupt these processes can protect against disease. The invasion of erythrocytes by malaria parasites is central to the disease process, and the Duffy blood group antigen, a chemokine receptor expressed in many cell types and encoded by the FY gene, is important because Plasmodium vivax cannot infect individuals who do not express the Duffy antigen, resulting in full protection of Duffy (−) individuals. The lack of Duffy expression is due to a promoter SNP that alters a binding site for the GATA-1 transcription factor (Tournamille et al. 1995), resulting in the parasite being unable to invade red blood cells. Over 97% of individuals in West and Central Africa are Duffy (−). The emergence of Duffy negativity has been broadly dated, from more than 90,000 to about 6,500 years ago (Webb 2005). There has been considerable debate whether the spread of Duffy (−) (FY*0) was due to selection in response to P. vivax or if it evolved independently and probably earlier. The latter hypothesis is consistent with a Southeast Asian origin of P. vivax, and the independent evolution of the Duffy null phenotype in Africa. Under this scenario the spread of P. vivax across large areas in Africa would have been prevented. In support of the independent evolution hypothesis is the observation that current P. vivax induced malaria is mild, although the historical strength of selection cannot be known and may have been different from that indicated by the present severity of disease (Carter and Mendis 2002; Livingstone 1984). A complicating factor is the suggestion that P. vivax infection could be “beneficial” to humans by conferring some cross-immunity to the more severe P. falciparum-related malaria (Williams et al. 1996). In contrast, evidence of strong recent positive selection on FY*0 is provided by the observation that this variant has a high Fst, the highest ever detected in humans (Hamblin and Di Rienzo 2000; Hamblin et al. 2002). However, no compelling model has yet been developed to explain how such a high Fst could be produced under a model of such seemingly mild selection.
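
To make the Fst argument concrete, the following minimal sketch computes Fst as (H_T − H_S)/H_T for a single biallelic locus. It assumes equally sized subpopulations in Hardy-Weinberg proportions, and the allele frequencies are illustrative values chosen for this example, not estimates from the cited studies; the function name `fst` is hypothetical.

```python
def fst(freqs):
    """Fst = (H_T - H_S) / H_T for one biallelic locus.

    freqs: frequency of one allele in each subpopulation.
    Assumes equally sized subpopulations in Hardy-Weinberg proportions.
    """
    p_bar = sum(freqs) / len(freqs)
    # Expected heterozygosity of the pooled (total) population.
    h_total = 2.0 * p_bar * (1.0 - p_bar)
    # Mean expected heterozygosity within subpopulations.
    h_sub = sum(2.0 * p * (1.0 - p) for p in freqs) / len(freqs)
    if h_total == 0.0:
        return 0.0  # locus is monomorphic everywhere
    return (h_total - h_sub) / h_total


# Illustrative frequencies: an allele near fixation in one population
# and nearly absent in another gives Fst close to 1.
print(fst([0.99, 0.01]))
# Similar frequencies in both populations give Fst close to 0.
print(fst([0.55, 0.45]))
```

An allele that is close to fixation in one region and nearly absent elsewhere, which is the pattern described for FY*0, drives within-population heterozygosity towards zero and Fst towards its maximum of 1.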

Other notable examples of genetic protection from malaria in sub-Saharan Africa include G6PD deficiency (Ruwende et al. 1995), HbB sickle-cell trait or HbA/HbS heterozygosity that is associated with a tenfold reduction in malaria risk, HbC (beta6Glu → Lys) (Kwiatkowski 2005; Ntoumi et al. 2007) and alpha thalassemia (HbA gene) that confers protection against severe malaria and malarial anemia (Haldane 1949; Williams et al. 2005a; Wambua et al. 2006; Pasvol 2006). In addition, beta thalassemia, which confers some protection against malaria, occurs only in limited parts of West Africa (Willcox et al. 1983). Sickle cell disease is the classic example of a human balanced polymorphism [a concept introduced by Neel (1953)] and has been studied extensively. In contrast to the heterozygote advantage of HbS, HbC is associated with a very strong reduction in risk of clinical malaria of 93% in homozygotes versus 29% in heterozygotes in a large case–control study on more than 4,000 Mossi subjects from Burkina Faso (Modiano et al. 2001b). The conundrum is that despite a very modest pathological load and a very strong protective effect of HbC, the distribution of this allele is limited to central West Africa, while HbS, a quasi-lethal mutation that confers a severe clinical phenotype in homozygotes, has a significantly more cosmopolitan distribution across sub-Saharan Africa. This distribution is quite peculiar given that evidence demonstrates that the “harmless” C is older than S (Modiano et al. 2008). However, while the C allele confers only mild protection in heterozygotes (a recessive selection model), the S mutation spread much faster by providing strong protection to heterozygotes (i.e. through over-dominance or heterosis for fitness), despite the cost of causing sickle cell anemia in homozygotes. Experimental data suggest that both HbC and HbS might protect against severe malaria by abnormal cell-surface display of P. falciparum erythrocyte membrane protein-1 (PfEMP-1), which would reduce the effects of parasitized erythrocyte sequestration in post-capillary microvessels, a process that can result in cerebral malaria (Cholera et al. 2008; Fairhurst et al. 2005).
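
The balanced polymorphism described for HbS can be sketched with the standard one-locus model of heterozygote advantage. The sketch assumes relative fitnesses of 1 − s for HbA/HbA (malaria susceptibility), 1 for HbA/HbS, and 1 − t for HbS/HbS (sickle cell anemia), under which the S allele is expected to settle at a frequency of s/(s + t). The selection coefficients used here are hypothetical and chosen only to illustrate the dynamics; they are not estimates from the cited studies.

```python
def next_freq(q, s, t):
    """S allele frequency after one generation of selection.

    Relative fitnesses (hypothetical): AA = 1 - s, AS = 1, SS = 1 - t.
    """
    p = 1.0 - q
    mean_fitness = p * p * (1.0 - s) + 2.0 * p * q + q * q * (1.0 - t)
    return (p * q + q * q * (1.0 - t)) / mean_fitness


def equilibrium(s, t):
    """Expected stable S allele frequency under heterozygote advantage."""
    return s / (s + t)


def simulate(q0, s, t, generations):
    """Iterate the deterministic recursion from starting frequency q0."""
    q = q0
    for _ in range(generations):
        q = next_freq(q, s, t)
    return q


# Hypothetical coefficients: mild cost to AA, severe cost to SS.
s, t = 0.1, 0.8
print(simulate(0.01, s, t, 2000), equilibrium(s, t))
```

Under these assumptions the S allele rises from a low starting frequency but cannot fix, because its advantage in heterozygotes is offset by the cost in homozygotes; this is the sense in which the polymorphism is balanced.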

Clearly, single gene variants affect malaria risk, but the currently known genes do not explain all of the risk. Additionally, it has been shown that interactions among genes can impact malaria. In an elegant study, Williams et al. (2005b) showed that the combination of α (+)-thalassemia homozygosity with HbS trait in the same Kenyan subjects causes loss of protection from severe malaria, a negative epistatic effect that could explain why α (+)-thalassemia did not fix anywhere in sub-Saharan Africa. HLA has been suggested to have an important role as well; a study in The Gambia has shown that a class I antigen (HLA-Bw53) and a class II haplotype (DRB1*1302-DQB1*0501) independently associate with protection from severe malaria (Hill et al. 1991). In a subsequent study, malaria morbidity was associated with an overall distribution of Class II haplotypes, but no signals were seen from specific DR-DQ alleles (Bennett et al. 1993). In the same Gambian population, individuals homozygous for a specific TNF promoter SNP (−308) were found to have an increased risk for cerebral malaria, independent of their HLA alleles (McGuire et al. 1994). In Gabonese children this TNF SNP was associated with the rate of symptomatic P. falciparum re-infections (Meyer et al. 2002). Other SNPs in TNF (including −376 and −238) have been associated with susceptibility to severe malaria, severe malaria anemia and control of parasite density (Flori et al. 2005; Kwiatkowski 2005) and TNF variation has also been suggested to explain linkage of malaria fever with MHC Class III (Flori et al. 2003b; Jepson et al. 1997). A linkage study from the holoendemic village of Dielmo (Senegal) has provided evidence that the “asymptomatic parasite density” trait maps to chromosome 5q31, along with suggestive evidence for loci at 5p15 and 13q13 for the “number of clinical malaria attacks” phenotype. Additionally, a signal for “maximum parasite density during asymptomatic infection” was detected at 12q21 in families from the mesoendemic area of Ndiop (about 5 km southeast of Dielmo) (Sakuntabhai et al. 2008). Interestingly, the four chromosomal regions detected in this study overlap with asthma or atopy related loci.

The Th1 pathway seems to have an effect on protection from severe malaria, as suggested by reported associations of IFNG and IL12B regulatory SNPs with protection (IFNG up-regulation) and increased susceptibility (IL12B down-regulation) (Cabantous et al. 2005; Marquet et al. 2008). However, these findings, from hemoglobin defects to polymorphisms of HLA and/or SNPs in the TNF promoter (the latter possibly tagging a neighboring Class III causal gene), or IFNG and IL12B do not explain most inter-individual variation in response to P. falciparum infection, and the distribution of various forms and manifestations of malaria. The current estimate that host genetics accounts for approximately 25% of the risk of infection and contracting malaria implies that there is ample room for gene variant discoveries, explaining differences in disease susceptibility and resistance (Mackinnon et al. 2005).

One way to approach the issue of genetic resistance/susceptibility to malaria is to study individuals stratified by previously known genes. A linkage study from Ghana, using only families with HB and G6PD normal genotypes, has recently identified a locus on chromosome 10p15 that affects malaria fever episodes (Timmann et al. 2007). In a population sample from Burkina Faso, a locus controlling the levels of parasitemia/immune responses to P. falciparum was mapped on 5q31-33 (Rihet et al. 1998), and recent data have shown that SNPs in interferon regulatory factor 1 (IRF1) on 5q31 associate with malaria infection control and with severe disease in Burkinabès (Mangano et al. 2008).

A further approach to study possible host genetic effects in malaria is to compare the susceptibility to the infection and disease among sympatric populations in endemic areas with different genetic backgrounds. This approach revealed the existence of important inter-ethnic differences in the susceptibility to P. falciparum malaria between West African ethnic groups (Modiano et al. 1996). It was clearly shown that such differential resistance was not associated with the classic malaria resistance genes (Modiano et al. 2001a), but could rather be explained by variant genes controlling the immune responses to the parasite (Torcia et al. 2008).

Tuberculosis

Every year more than 8 million people develop tuberculosis (TB) disease and 3 million patients die. The total number of people infected with Mycobacterium tuberculosis is much larger (approximately 2 billion), but the vast majority of those infected never develop clinical disease. In 2005 in Africa there were approximately 3.8 million TB cases, of which more than 2.5 million were new cases, accounting for 29% of the worldwide incidence, and almost 550,000 TB patients died (http://www.who.int/mediacentre/factsheets/fs104/en/). The analysis of TB in Africa is complicated by the parallel epidemic of HIV because co-morbidity is common, making it necessary in studies of the genetics of TB to consider HIV infection, especially in high HIV prevalence areas. Twin studies in Africa, comparing MZ to DZ twins, have provided evidence of a significant role for heritable factors in TB susceptibility (Jepson et al. 2001). In the first genome-wide linkage scan for a major infectious disease in Africans, evidence of linkage was found on chromosomes Xq27 and 15q11 (Bellamy et al. 2000). This report also identified association in these regions of linkage, supporting the conclusion that TB susceptibility loci reside at these chromosomal locations (Bellamy et al. 2000). At 15q a promoter variant of ubiquitin-protein ligase E3A (UBE3A) associates (although not very strongly) with susceptibility (Cervino et al. 2002), but at Xq27 no positional candidate has yet been identified. Using the complementary approach of candidate gene analysis, case–control studies of West African samples have identified associations between TB and variants in several genes, including SLC11A1 (NRAMP1) (Awomoyi et al. 2002; Bellamy et al. 1998c), IL1B (Awomoyi et al. 2005), the vitamin D receptor (Bornman et al. 2004; Lombard et al. 2006; Olesen et al. 2007), CD209 (DC-SIGN), PTX3 (Olesen et al. 2007) and P2X7 (Li et al. 2002), to mention a few. In East Africa, a combined linkage and association study of Ugandans has shown that IL10, interferon gamma receptor 1 (IFNGR1), and TNF alpha receptor 1 (TNFR1) variants are linked and associated with TB, but not with susceptibility to latent infection (Stein et al. 2007). Another recent analysis of affected sibling pairs from South Africa (of mixed ancestry) and from Malawi, along with a case–control study in West Africans, has identified two putative loci for susceptibility, one at 6p21-q23 and one at 20q13.31-33. At the latter locus, variation in the melanocortin 3 receptor (MC3R) and cathepsin Z (CTSZ) genes was implicated in the pathogenesis of tuberculosis (Cooke et al. 2008).

Importantly, these studies provide additional clues to relevant pathways involved in disease susceptibility. Studies of HLA Class II variation detected an association with increased susceptibility, but these findings await replication (Lombard et al. 2006). As with all candidate gene studies, other reports of association have failed to replicate the original findings (Table 2). In summary, candidate gene studies have indicated the existence of susceptibility loci in specific populations, but without providing evidence of strong effects in Africans. Although some evidence for major TB susceptibility loci has been provided (Baghdadi et al. 2006; Cooke et al. 2008), it is apparent that several loci determine or modulate susceptibility to tuberculosis (Table 2).

Table 2 Genetic associations with tuberculosis in Africans

Malaria and tuberculosis genome wide association studies

As malaria and tuberculosis are both leading causes of morbidity and mortality in Africa, a considerable effort is underway to understand their complex genetic etiology. Genome wide association studies (GWAS) have been launched recently through consortia of investigators specifically created for this purpose, including the MalariaGen initiative (funded by the Wellcome Trust and the Gates Foundation; http://www.malariagen.net) and the Tuberculosis Gambia–Oxford/African Tuberculosis Genetics Groups, which are part of the Wellcome Trust Case Control Consortium (WTCCC; http://www.wtccc.org.uk). During 2006–2008 the MalariaGen and WTCCC consortia have generated data for up to 500,000 SNPs typed with the Affymetrix GeneChip 500K mapping array set in thousands of African samples ascertained for both malaria and tuberculosis. The TB study group has been focusing on samples collected in four African countries: The Gambia, Guinea Conakry, Guinea Bissau, and Malawi. From just one West African country (The Gambia) the consortium was granted access to ~1,500 cases and ~2,500 controls. Analysis of the genome-wide association data is in progress and should lead to the identification of SNPs and genes associating with disease susceptibility, thereby providing a wealth of information. These initiatives will also provide important insights into the technical, analytical, methodological and biological aspects of genome-wide association analysis, although it should be noted that the current GWAS platforms may not provide a level of coverage that is complete enough for African samples to detect all important signals. This is of particular interest given the diversity of patterns of LD that probably exist across African populations and that we do not yet fully understand.

HIV/AIDS

Although only 10% of the world’s population lives in Sub-Saharan Africa, 68% of adults and nearly 90% of children infected with HIV-1 live in this region. Overall 22.5 million Africans are estimated to be infected and there are 12 million AIDS orphans, making Africa the worst affected region in the AIDS pandemic. Prevalence in the adult population (age 15 and over) is more than 16% in South Africa and 23% in Botswana, rising to a dramatic 34.5% in Swaziland (http://www.who.int/whosis/database/core/core_select.cfm).

In the absence of antiretroviral treatment, the great majority of subjects progress to AIDS and death, following HIV infection. However, although the asymptomatic period averages 10 years, and ranges from a few to 20 years, there are rare instances of infected individuals who do not progress to AIDS at all. Also, some subjects at high risk are apparently resistant and never become infected, e.g. the well-described female sex worker cohort from Nairobi, Kenya (the so called “Majengo” slum women). Some of these women (~5% of the 3,000 sex workers in Nairobi) are persistently sero-negative despite exposure to the virus, although discontinuous exposure seems to lower protection (Kimani et al. 2008). These cases indicate that genetics and the environment interact in determining the resistant phenotype, and several factors, genetic and immune mediated, have been associated with altered susceptibility to HIV (Martin and Carrington 2005; Lama and Planelles 2007) (Table 3). It has recently been reported that in Kenyan sex workers interferon regulatory factor 1 (IRF1) variation and its low gene expression are associated with some resistance to HIV-1 infection. However, the same IRF1 variation does not seem to be linked with differential disease progression (Ball et al. 2007). Given the potential role of IRF1 in supporting HIV-1 transcription and amplifying replication, these results would suggest that the key to protection from infection lies in gene variants that do not support viral transcription and replication.

Table 3 Genetic associations with susceptibility, viral load and/or progression to HIV/AIDS in Africans

Variation in human genes that modulate HIV pathogenesis by influencing post-entry steps of the viral life cycle is likely to offer new insights into both protection from infection and modulation of disease course. SNPs in genes affecting viral replication, such as those encoding the cytidine deaminases APOBEC3F and APOBEC3G, as well as CUL5 and TRIM5α, have been shown to confer protection in African Americans against disease phenotypes, including infection, accelerated CD4 loss, and faster progression to AIDS (Lama and Planelles 2007). Other studies have indicated significant epistasis between HLA-B and Killer Immunoglobulin-like Receptors (KIRs) in eliciting protection from infection or progression to AIDS (Jennes et al. 2006; Lopez-Vazquez et al. 2005) (Table 4). These results await replication in additional cohorts.

Table 4 KIR allele frequencies and disease associations (allelic frequencies from Single et al. 2007)

Substantially different from HIV-1 and more benign (less transmissible, slower progression to AIDS), HIV-2 is limited to West Africa and exceeds 5% only in the adult population of Guinea-Bissau. Even in the case of HIV-2, some people die rapidly (within 3 years of infection), but others seem to be able to live with it for decades without immunological or clinical deterioration (Schim van der Loeff 2007). Only a few small studies in the literature suggest possible associations between gene variants and HIV-2 infection, e.g. fucosyltransferase 2 (FUT2, “secretor” blood group; Ali et al. 2000), or accelerated disease progression (with HLA B35; Diouf et al. 2002); nevertheless, research on HIV-2 host genetics might contribute to the understanding of the role genes play in influencing HIV post-entry restriction and disease progression.

Other infections: leishmaniasis, leprosy, schistosomiasis, trachoma

Several other infectious diseases are relatively common in Africa and present serious public health issues. Among those that have been analyzed from a genetic perspective are leishmaniasis, schistosomiasis, leprosy, and trachoma. Although not as much work has been done on these diseases as on malaria, TB and HIV, recent studies have begun to shed light on the genetic susceptibility to them in African populations.

Leishmaniasis

Visceral leishmaniasis or kala-azar is a common disease caused by protozoa of the genus Leishmania carried by sand flies. It is characterized by high fever, dramatic weight loss, swelling of the spleen and liver, and anemia. If untreated, the disease has a fatality rate of nearly 100% within 2 years. In Sudan an epidemic of visceral leishmaniasis caused by Leishmania donovani occurred from 1984 to 1994. The Sudanese populations were highly susceptible, as this was the first such epidemic in the area. It was estimated that the disease caused 100,000 deaths in a population of around 300,000 in the Western Upper Nile area of Sudan; in some villages, more than 50% of the population succumbed to the disease (http://www.who.int/leishmaniasis/en/). The incidence varied among game wardens, working in a reserve in eastern Sudan, who were of different ethnic backgrounds (Ibrahim et al. 1999). Another outbreak occurred in 1996–1997 in a village at the border with Ethiopia, and while most villagers were infected only 30% developed kala-azar (El-Safi et al. 2002).

Candidate gene studies of this population detected linkage of kala-azar with polymorphisms at the SLC11A1 (formerly NRAMP1) locus, on 2q35 (Bucheton et al. 2003a, 2003b). A separate investigation of Sudanese multiplex families, living in the same geographical region, confirmed the linkage of SLC11A1 with kala-azar, mainly detected with a SNP in the fourth intron of the gene. However, a mutation screening of the coding and 3′ UTR regions of SLC11A1 in selected patients failed to identify any functionally relevant sequence change. It was therefore hypothesized that the intron 4 SNP could be in disequilibrium with causative variation in the upstream promoter region (Mohamed et al. 2004). An investigation of the same families showed linkage and association of kala-azar with IL4 (Mohamed et al. 2003), while genomic variation of IFNGR1 was associated with post-kala-azar dermal leishmaniasis (Salih et al. 2007). Remarkably, sequence variation in IFNGR1 (particularly the promoter region) has been found to modulate susceptibility to other parasitic diseases, including cerebral malaria (Koch et al. 2002) and schistosomal hepatic fibrosis (Blanton et al. 2005).

Linkage studies have reported peaks for kala-azar on 2q22-q23 and 22q12 (Bucheton et al. 2003b; El-Safi et al. 2006) and more recently on 1p22 and 6q27 (Miller et al. 2007), although this latter study did not replicate the earlier linkage peaks on 2q and 22q. However, inconsistency of linkage reports can be explained in part by the fact that both design (and power) and the ethnic groups differed across the two studies. To date, SLC11A1 remains the most impressive of the susceptibility loci. This is reinforced by the known role of SLC11A1 in mouse models infected with L. donovani (Foote and Handman 2005).

Leprosy

In 2005 Mycobacterium leprae caused about 295,000 new cases of leprosy worldwide (http://www.who.int/lep/situation/NCDetection2006.pdf). Of these over 40,000 were in Africa, and more than 10,000 were in the Democratic Republic of Congo alone. However, only ~5% of people exposed to M. leprae develop disease. Leprosy affects the skin, the peripheral nerves, the mucosa of the upper respiratory tract, the eyes, and several other organs. Clinically, leprosy can be differentiated into two forms: (1) a tuberculoid, paucibacillary form, characterized by a low bacterial count, strong cell-mediated immunity, and localized disease and (2) a lepromatous, multibacillary form, characterized by high bacterial count, poor cell-mediated immunity and strong humoral immunity with progressive, disseminated disease. The different forms of the disease do not appear to be complicated by variation in the M. leprae genome since it is surprisingly invariant (Monot et al. 2005). However, family studies, twin studies, and segregation analyses have provided evidence that, in addition to environmental and exposure components, host genetics plays an important role in the disease. Loci/genes affecting differential susceptibility can be subdivided into those influencing infection after exposure, the disease per se, and those related to the paucibacillary or multibacillary type of disease. The risk of developing the lepromatous, multibacillary form can be measured by the extent of skin reactivity to lepromin (Mitsuda reaction; see Ranque et al. 2007 for implicated loci). To date, several genetic studies have identified genes putatively important in leprosy susceptibility, including the PARK2/PACRG and LTA genes (6q25) in influencing susceptibility to leprosy per se in Indian, Vietnamese and Brazilian subjects, as well as a locus on 10p13 linked to the tuberculoid, paucibacillary form in population samples from India and Vietnam (Alter et al. 2008; Ranque et al. 2008 for a review).
Few studies have been carried out on African populations, with exceptions in Nigerians, describing the role of HLA associations (Class II DRB1 leprogenic motifs modulating the clinical outcome of infection, Uko et al. 1999), and in Malians for non-HLA genes (SLC11A1 3' allele associated with lepromatous type, Meisner et al. 2001). More recently, linkage and large-scale candidate gene studies with samples from the Karonga district of northern Malawi have been performed (Wallace et al. 2004; Fitness et al. 2004). These studies have found suggestive evidence for a susceptibility locus on 21q22 influencing leprosy type, as well as associations with variants of the VDR gene (increased risk of leprosy per se) and the Complement Receptor 1 (CR1) gene (protection from disease). These findings, however, require replication in additional African populations.

Schistosomiasis

Schistosomiasis (bilharziosis) is a chronic disease caused by parasites of the genus Schistosoma (trematode flatworms) and, if we exclude the broad category of soil transmitted helminths, the second most frequent parasitic disease in Africa after malaria. Larval forms of the parasites are released by freshwater snails, the parasite’s natural reservoir. As the parasite can penetrate the human skin in the water, the main route of infection is contact with infested water. The larvae migrate into the peripheral vasculature, traverse the lung and settle in the portal or pelvic venous system where they develop into adult parasites. In sub-Saharan Africa S. mansoni causes intestinal schistosomiasis, which causes hepatic granulomas and fibrosis, portal hypertension, splenomegaly, bleeding from esophageal varices, and eventually terminal hepatic failure. Another species, S. haematobium, causes the urinary form of the disease, associated with progressive granulomatosis of the bladder, resulting in obstructive uropathy (http://www.who.int/schistosomiasis/en/). There is a significant association between the urinary infection and squamous cell carcinoma of the bladder, and possibly of the liver infection with hepatocarcinoma, an enlightening example of a pathogenetic link between infections and cancer in developing countries (Mostafa et al. 1999). In 2000 it was estimated that approximately 200 million people were affected in the developing world and nearly 85% (170 million) of these were in sub-Saharan Africa (Chitsulo et al. 2000). Schistosomiasis is therefore a very important public health problem in Africa, causing approximately 280,000 deaths per year (150,000 from kidney failure and 130,000 from hematemesis) (van der Werf et al. 2003).
During the prepatent period of infection, the first 4–5 weeks following exposure to cercariae (the parasitic larvae), the immune response is primarily of the Th1 type but it becomes progressively polarised towards Th2 about 8 weeks after infection (Pearce et al. 2004); parasite egg antigens seem to inhibit IL12 production and induce IL4 production, promoting a general amplification of the Th2 response.

Host genetic studies have shown that a few gene variants/loci are important in both controlling the infection and in modulating the susceptibility to hepatic and urinary diseases (reviewed in Campino et al. 2006). In 1997 a study of the intestinal form in Senegalese detected a locus conferring susceptibility to S. mansoni at 5q31-33 (Müller-Myhsok et al. 1997) that had been previously mapped in Brazilian families (Marquet et al. 1996). Immune response genes of the cytokine cluster in the 5q31 region (including Th2 cytokines IL4, IL5, IL13) were assayed in two Dogon population samples (Mali) from a region endemic for S. haematobium, using family-based association analyses. No association was found with IL4 and IL5 SNPs, but two IL13 5′ variants, IL13-1055C and IL13-591A, were preferentially transmitted to children with the highest infection levels. In contrast, subjects with the IL13-1055T/T genotype appeared to be relatively protected from infection (Kouriba et al. 2005). This “protective” genotype had previously been associated with increased expression of IL13, as well as with elevated IgE levels. Another locus was identified by linkage analysis at 6q23 in Sudanese families from an endemic, irrigated area in the Gezira region (Dessein et al. 1999). This locus is near the gene encoding the α-chain of the interferon gamma receptor 1 (IFNGR1) that seems to control severe hepatic peri-portal fibrosis in S. mansoni infection, a condition affecting 2–10% of subjects infected in the Sudan. A subsequent study in North African families from Egypt confirmed linkage of severe hepatic disease with IFNGR1 and possibly a region on 5q31, encompassing IL4 and IL13, as well as the TGFB1 locus on 19q (Blanton et al. 2005).
In a Sudanese population sample two SNPs in the third intron of IFNG were found to produce opposite effects with respect to fibrotic phenotypes: +2109 A/G SNP was associated with a higher risk for fibrosis while +3810 G/A was associated with less severe disease (Chevillard et al. 2003). Associations of aggravation and protection from hepatic fibrosis have also been reported with TNF (Henri et al. 2002).

Other reported associations include HLA Class I alleles with hepatosplenomegaly in Egypt (Abdel-Salam et al. 1986). The urinary form of the disease (by S. haematobium) has been associated with SNPs in the STAT6 gene (on chromosome 12q13) in a Dogon (Mali) population sample. The STAT6 gene is key in Th2 cell differentiation (Shimoda et al. 1996), providing further evidence for an important role of the Th2 cytokine pathway in modulating resistance to schistosomiasis (He et al. 2008). However, recent studies addressing the complex interaction between the immune system and the parasite indicate contrasting, age-dependent cytokine responses, suggesting that a simple Th1/Th2 (or pro-inflammatory/anti-inflammatory) dichotomy is not sufficient to explain susceptibility or resistance to S. haematobium (Mutapi et al. 2007).

Trachoma

Trachoma is caused by Chlamydia trachomatis, a bacterium that infects the epithelial cells of the conjunctiva. It is transmitted through contact with eye discharge from an infected person or by eye-seeking fly vectors. Repeated infection can result in scarring, distortion and in-turning of the eyelids, with the eyelashes rubbing on the globe (trichiasis), ultimately leading to corneal opacity and irreversible blindness. It has been estimated that more than 2 million people are blind because of trachoma in sub-Saharan Africa (Lewallen and Courtright 2001). The blinding complications of trachoma are thought to be immuno-pathological. Both innate and adaptive immune responses are involved, with cell-mediated immunity playing a dual role in both the resolution of the infection as well as in scarring. In the early stages of infection, pro-inflammatory cytokines (IL1, TNF) are released by the epithelium, and these cytokines attract an initial wave of macrophage and neutrophil infiltration to the site of infection. These cells are soon replaced by lymphocytes that become organized into lymphoid follicles. Repeated or severe inflammatory episodes along with persistent formation and resolution of lymphoid follicles result in tissue remodeling, scar formation and eventual blindness from the mechanical abrasion of the cornea by the eye lashes and rim of the upper eye-lid. The adaptive cellular responses that follow the initial innate response appear greater in individuals who rapidly resolve infection compared to those with persistent clinical disease, implicating an innate or genetic component. Furthermore patients with conjunctival scarring have lower peripheral blood lymphocyte proliferation responses with respect to controls (Burton et al. 2007). IFNG and FOXP3 (and possibly IL10) appear to play an important role in the resolution of the infection (Faal et al. 2006) and genetic polymorphisms in class I HLA, IFNG, TNF, IL10 and MMP9 have been associated with variation in scarring in Gambians (Natividad et al. 2005, 2006, 2007, 2008).

Non-communicable diseases

Although infectious diseases are the most important public health concern in Africa at present, the health landscape is rapidly changing with economic development and urbanization. For example, in South Africa recent studies have suggested that more than 75% of Black Africans have at least one major risk factor for heart disease (Tibazarwa et al. 2008). This scenario is beginning to be the rule for other common, complex diseases of the West, including obesity, diabetes and hypertension that are sensitive to the transition from rural to urban lifestyles (Abubakari et al. 2008; Cooper et al. 1997; Opie and Seedat 2005). In addition, cancers are not uncommon in sub-Saharan Africa (Parkin et al. 2003). Many of these phenotypes have been studied for genetic risk factors over the last few years and a small but rapidly increasing body of literature exists.

Genetics of diabetes and obesity in Africa

Type 2 diabetes (T2D) is currently the most common metabolic disorder in the world. However, high-quality data based on standardized criteria are extremely limited for most countries in sub-Saharan Africa. The available data indicate great variation in prevalence from 0% in Togo to 4.8–8.0% in South Africa and 10% in Northern Sudan (Motala 2002). To date, there has been only one major systematic effort to study the genetics of diabetes in Africa: this is the Africa America Diabetes Mellitus (AADM) study, a multi-institutional, multi-country collaboration designed primarily to map T2D susceptibility genes in the ancestral populations of African Americans (Rotimi et al. 2001). One obvious rationale for studying T2D in West Africa, where diabetes is less common than in the US, is that in an environment where caloric intake is lower, cases of T2D might carry a proportionately greater genetic component. As described below the AADM study has been extremely active in linkage analyses of not only diabetes, but multiple related traits.

Genome-wide linkage analysis was performed, using a sample of 343 affected sibships (691 individuals). Although multipoint non parametric linkage analysis showed suggestive linkage on chromosomes 12 and 19 (Rotimi et al. 2004), the strongest evidence of linkage was observed on chromosome 20. Putative linkage to chromosome 20 has been reported by at least ten other studies in multiple ethnic groups (Ghosh et al. 1999; Ji et al. 1997; Mori et al. 2002; Zouali et al. 1997). The linkage peak at 20q in AADM was within 1 cM of the peak reported in Caucasian families (Klupa et al. 2000). The AADM study is noteworthy because it was the first genome scan study to search for susceptibility genes for T2D in sub-Saharan Africa. Secondly, it showed that the same genomic regions are implicated in T2D in both Ghana and Nigeria where environmental risk is low.

Genome wide linkage analysis for T2D related traits

Obesity related traits

Given the central role of obesity as a risk factor for T2D, genome wide linkage analysis was done in AADM (Chen et al. 2005a) to identify linkage signals to three obesity-related traits: body mass index (BMI), fat mass (FM) and percent body fat (PBF). In West Africa, obesity is still relatively uncommon, with a prevalence of approximately 5%, reflecting the high physical activity levels and low caloric intake. A survey in The Gambia showed significant differences between rural and urban areas, with the prevalence of obesity (body mass index ≥ 30 kg/m²) at 4.0% in the rural areas but about 33% in urban women 35 years or older (van der Sande et al. 2001). PBF showed the strongest evidence of linkage with a signal on chromosome 2 (location 72.6 cM). Additional signals were found on chromosomes 4 and 5. FM showed suggestive evidence for linkage to chromosome 2 within 10 cM of the signal for PBF. The strongest evidence for linkage to BMI was observed on chromosomes 1 and 4, although in both cases the highest LOD score was below 2. The areas of linkage for the three phenotypes showed significant clustering as all three phenotypes (BMI, FM and PBF) had linkage peaks in the same regions in 2p13, 4q23 and 5q14; however, not all of the peaks reached the thresholds for significant or even suggestive linkage. This study also provided substantial evidence for linkage to QTLs previously reported to be linked to serum leptin and plasma adiponectin levels on chromosome 2 (Comuzzie et al. 1997).

Serum lipids

The AADM study also conducted a genome wide linkage analysis to five serum lipid fractions: total cholesterol, triglycerides, HDL-cholesterol, LDL-cholesterol and VLDL-cholesterol (Adeyemo et al. 2005). Significant linkage of HDL-C to a QTL on chromosome 7 at 7q31 was observed. Other QTL met the criteria for suggestive linkage with three of them (chromosome 7 for TG, chromosome 5 for LDL-C and another locus on chromosome 7 for HDL-C) reaching LOD scores of at least 3.0. Significant or suggestive linkage was found for two of the five traits at the same locus for a QTL on chromosomes 5 and 7.
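To give a sense of scale for the LOD scores quoted in these linkage analyses, the following minimal sketch converts a LOD score into an approximate pointwise p-value. It assumes the standard large-sample approximation in which 2 ln(10) × LOD follows a chi-square distribution with one degree of freedom and the test is one-sided; the function name is illustrative, and the text does not state the exact significance criteria applied in the cited studies.

```python
import math


def lod_to_pointwise_p(lod: float) -> float:
    """Approximate one-sided pointwise p-value for a LOD score.

    Assumption (not taken from the cited studies): the likelihood-ratio
    statistic 2 * ln(10) * LOD is chi-square distributed with 1 df.
    """
    if lod < 0:
        raise ValueError("LOD score must be non-negative for this approximation")
    chi2 = 2.0 * math.log(10.0) * lod
    # For 1 df, P(X > chi2) = erfc(sqrt(chi2 / 2)); halve it for a one-sided test.
    return 0.5 * math.erfc(math.sqrt(chi2 / 2.0))


if __name__ == "__main__":
    for lod in (2.0, 3.0):
        print(f"LOD {lod:.1f} -> approximate pointwise p = {lod_to_pointwise_p(lod):.1e}")
```

Under this approximation a LOD score of 3.0 corresponds to a pointwise p-value on the order of 10⁻⁴, which helps explain why peaks with LOD scores below 2 are treated cautiously in a genome-wide scan.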

Several of the linkage signals for these lipid traits overlap linkage regions found in other studies. For example, 7q31 has also been found for lipid levels in Mexican Americans and Pima Indians (Arya et al. 2002). Thus, AADM has found linkage signals very close to those reported for multiple lipid phenotypes in several other major studies. However, two of the linkage regions in AADM are novel: 5q33 for LDL-C and 7p21 for HDL-C (Adeyemo et al. 2005).

Other diabetes related phenotypes

Other linkage analyses, using AADM, have been performed on phenotypes related to diabetes, including intraocular pressure, renal function and C-peptide concentrations (Chen et al. 2007a, 2007b; Rotimi et al. 2006). For intraocular pressure in diabetics, multipoint linkage analyses showed significant linkage on 5q22 and suggestive evidence of linkage to chromosome 14q22 (Rotimi et al. 2006). The strong signal on chromosome 5 lies in the region implicated in glaucoma susceptibility in previous studies (Monemi et al. 2005). For renal function, linkage to creatinine clearance was observed on chromosomes 7, 16, and 17. Maximum LOD scores for serum creatinine were observed on chromosomes 3 and 10, and for glomerular filtration rate (GFR) on chromosomes 6 and 8 (Chen et al. 2007b). Several of these results are replications of significant findings from other genome scans. In AADM a linkage analysis for C-peptide identified potentially important QTLs on chromosomes 4, 15, and 18 (Chen et al. 2007b). Two positional candidate genes for diabetes, the pituitary adenylate cyclase activating polypeptide (PACAP) on 18p11 and the peroxisome proliferator-activated receptor gamma coactivator 1 (PPARGC1) on 4p15, are located in the genomic regions showing suggestive linkage evidence.

Candidate gene studies in the AADM study

Based on previous findings of association between T2D and three calpain 10 (CAPN10) gene polymorphisms (SNP-43, SNP-56 and SNP-63), these SNPs were investigated in the AADM study (Chen et al. 2005b; Horikawa et al. 2000). Calpain 10 is a nonlysosomal, neutral cysteine protease expressed in skeletal muscle, liver and pancreatic islets reported to be associated with T2D (Horikawa et al. 2000). No association was found between T2D and any individual allele or genotype of the three SNPs. However, in the Nigerian ethnic groups, one haplotype was significantly associated with type 2 diabetes (OR 3.765 and 95% CI 1.577–8.989). Also, no association was found between the CAPN10 gene polymorphisms and several diabetes-related quantitative traits, including waist–hip ratio, body mass index (BMI), fasting insulin level, fasting C-peptide level, leptin level, glucose level, systolic blood pressure, and diastolic blood pressure. This preliminary observation suggests that the three CAPN10 SNPs tested may play a limited role, if any, in the risk of T2D in the AADM study.
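The haplotype odds ratio and confidence interval reported above can be sanity-checked with a few lines of arithmetic. The sketch below assumes, without confirmation from the source, that the interval is a Wald interval computed on the log-odds scale; under that assumption the odds ratio should equal the geometric mean of the two bounds, and the standard error of the log odds ratio can be recovered from the interval width. Function names are illustrative.

```python
import math

Z_95 = 1.959964  # standard normal quantile for a two-sided 95% interval


def or_implied_by_ci(lower: float, upper: float) -> float:
    """Point estimate implied by a log-scale-symmetric (Wald) interval."""
    return math.sqrt(lower * upper)


def se_log_or_from_ci(lower: float, upper: float, z: float = Z_95) -> float:
    """Standard error of ln(OR) implied by the interval width."""
    return (math.log(upper) - math.log(lower)) / (2.0 * z)


# Values quoted in the text for the CAPN10 haplotype in the Nigerian groups.
reported_or, ci_lower, ci_upper = 3.765, 1.577, 8.989

implied_or = or_implied_by_ci(ci_lower, ci_upper)
se_log_or = se_log_or_from_ci(ci_lower, ci_upper)
z_statistic = math.log(reported_or) / se_log_or

if __name__ == "__main__":
    print(f"OR implied by CI bounds: {implied_or:.3f}")
    print(f"SE of ln(OR):            {se_log_or:.3f}")
    print(f"Wald z statistic:        {z_statistic:.2f}")
```

The implied point estimate agrees with the reported odds ratio to about three decimal places, which is consistent with a log-scale Wald interval. The width of the interval also shows how imprecise the estimate is, in keeping with the authors' description of the finding as preliminary.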

The AADM study also evaluated the association between the functional agouti-related protein (AGRP) promoter SNP −38C/T and weight-related traits, namely BMI, FM and fat-free mass (FFM), as well as diabetes status (Bonilla et al. 2006). Women homozygous for the variant T allele had significantly lower BMI. Also, men with at least one copy of the variant T allele were over two times less likely to be diabetic than subjects without the protective allele. These results replicate previous findings and implicate the AGRP SNP −38C/T in the regulation of body weight in West Africans.

Finally, potential association between polymorphisms of the eNOS gene and diabetes-related phenotypes was investigated in the AADM study (Chen et al. 2007c). The insertion/deletion (4a/b) and the G894T polymorphisms of the eNOS gene were genotyped in cases and controls and the b/b genotype was associated with a 2.4-fold increased risk of diabetic retinopathy. In contrast, no association was observed between the genotypes or alleles of the G894T polymorphism and diabetic retinopathy, hypertension, or nephropathy.

A major contribution of the AADM study to knowledge about T2D is in the area of replication of associations found in other populations and refinement of such associations. A clear instance of this was in the association between risk of type 2 diabetes and variants in the transcription factor 7-like 2 gene (TCF7L2) first reported in populations of European ancestry. The AADM study sample aided in refining the definition of the TCF7L2 type 2 diabetes risk variant, HapB (T2D), to the ancestral T allele of a SNP, rs7903146 (Helgason et al. 2007). This study is a powerful demonstration that populations with shorter LD blocks, such as those of West Africa, provide the means to refine association signals detected in more recent populations (Tishkoff and Williams 2002). It is noteworthy that, to date, the TCF7L2 association has provided the strongest evidence of association of any gene with T2D risk from multiple GWAS and replication studies in multiple population groups. The AADM study also provided replication evidence of a genetic variant in the TCF2 gene that confers protection against type 2 diabetes (Gudmundsson et al. 2007).
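The fine-mapping argument rests on linkage disequilibrium (LD) between neighbouring variants: where LD is weaker, fewer variants remain correlated with the causal one, so an association signal can be narrowed further. The following minimal sketch computes the usual pairwise LD measures (D, D′ and r²) for two biallelic SNPs from haplotype and allele frequencies. All frequencies in the example are hypothetical illustrations and are not taken from the AADM or TCF7L2 data.

```python
def ld_stats(p_ab: float, p_a: float, p_b: float) -> tuple[float, float, float]:
    """Return (D, D', r^2) for two biallelic loci.

    p_ab: frequency of the haplotype carrying allele A and allele B
    p_a:  frequency of allele A at the first locus
    p_b:  frequency of allele B at the second locus
    """
    if not (0.0 < p_a < 1.0 and 0.0 < p_b < 1.0):
        raise ValueError("both loci must be polymorphic")
    d = p_ab - p_a * p_b
    # D' rescales D by its maximum possible magnitude given the allele frequencies.
    if d >= 0:
        d_max = min(p_a * (1.0 - p_b), (1.0 - p_a) * p_b)
    else:
        d_max = min(p_a * p_b, (1.0 - p_a) * (1.0 - p_b))
    d_prime = d / d_max if d_max > 0 else 0.0
    r2 = d * d / (p_a * (1.0 - p_a) * p_b * (1.0 - p_b))
    return d, d_prime, r2


if __name__ == "__main__":
    # Hypothetical frequencies: same allele frequencies, different haplotype structure.
    strong = ld_stats(p_ab=0.30, p_a=0.30, p_b=0.30)  # alleles always co-occur
    weak = ld_stats(p_ab=0.15, p_a=0.30, p_b=0.30)    # alleles only partly co-occur
    print("strong LD: D=%.3f D'=%.3f r2=%.3f" % strong)
    print("weak LD:   D=%.3f D'=%.3f r2=%.3f" % weak)
```

In the first hypothetical case the two SNPs are perfect proxies for each other and cannot be distinguished by association testing; in the second, r² is low enough that the two variants would give clearly different association signals.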

In summary, the genetic epidemiology of T2D in Africa is still in its infancy. There have been few genome wide linkage studies and only a handful of association studies. To date, there have been no GWAS conducted in an African population, despite the great utility of using such an approach as has been demonstrated with multiple complex diseases over the last 2 years.

Hypertension

Over the past decade numerous studies have been undertaken in an attempt to identify genetic risk factors for hypertension and blood pressure regulation in Africans. These studies are justified by the observation that blood pressure regulation and the control of several plasma proteins thought to affect blood pressure, such as angiotensinogen (AGT) and angiotensin converting enzyme (ACE), are highly heritable (Adeyemo et al. 2002; Cooper et al. 2000; Rotimi et al. 1999). Of note, the heritability of ACE and AGT was considerably higher in Nigerians (~70–80%) than in African Americans (~20%) most likely reflecting the differential role of environment in these two geographic populations. It is also important to note that the vast majority of work on the genetic basis of hypertension in sub-Saharan Africa is in West Africa.

The genetic studies have included a few linkage studies (Cooper et al. 2002) and many candidate gene studies. As with all linkage studies of hypertension and related phenotypes the results have been uncertain. However, a few regions of the genome do provide evidence for linkage to blood pressure; notably 2p, 3p, 5q, 7p, 7q and 10q provided evidence of linkage to diastolic blood pressure, and 19p and 19q to systolic blood pressure in Nigerians (Cooper et al. 2002). Studies of candidate genes include the renin–angiotensinogen genes (Bouzekri et al. 2004; Fejerman et al. 2006; Nkeh et al. 2003; Robinson and Williams 2004; Tiago et al. 2003; Williams et al. 2004, 2000), barttin (BSND) (Sile et al. 2007), the beta subunit of the epithelial sodium channel (Dong et al. 2001; Nkeh et al. 2003; Rayner et al. 2003), alpha adducin (Barlassina et al. 2000) and G-protein coupled receptor kinase (GRK4) (Williams et al. 2004, 2000). Although several of these studies report positive associations with either hypertension or blood pressure, the data are still not conclusive. One approach that has tried to address the failure to identify replicable results has been to test multilocus genotypes that predispose to hypertension (Williams et al. 2004, 2000). This approach has identified a two locus model with ACE and GRK4 in a Ghanaian population; however, substantial retesting will be required to assess validity of both the results and the approach (Williams et al. 2004).

Cardiovascular disease

As with hypertension there are compelling epidemiological data indicating an increase in prevalence of cardiovascular disease (CVD) in African populations as individuals acquire CVD risk factors (Unwin et al. 2001). Despite the prevalence of CVD risk factors in some African populations (Alberts et al. 2005; Steyn et al. 2005), little research has directly addressed the role of genetic variation in susceptibility to disease. However, recent work has shown that in South Africa, for example, family history confers an odds ratio of ~17 (Loock et al. 2006). In addition, studies of African Americans have shown that variation in the leukotriene A4 hydrolase gene increases risk of myocardial infarction more than threefold while the relative risk is only 1.16 in Europeans (Helgadottir et al. 2006). Such studies reinforce the need for significantly more research on this topic in Africa.

A recent set of studies in a Ghanaian population has begun to assess the genetic control of two CVD risk factors, the plasma levels of serpin peptidase inhibitor (plasminogen activator inhibitor type 1, PAI1) and tissue plasminogen activator (PLAT), which affect the risk of thrombosis, a precursor to CVD (Williams et al. 2007). This study recruited more than 2,000 participants to assess the role that genetic variation plays in regulating plasma levels of these proteins. Preliminary findings indicate that not only does genetic variation in the PAI1 and PLAT genes affect plasma levels of both proteins, but that variants in at least one other gene, renin, do as well. Of note, the effects of the genetic variants differ significantly between males and females, suggesting a complex pattern of genetic regulation via gene–environment interaction (Schoenhard et al., submitted).

Cancer

Cancer is not rare in Africa and based on lifestyle changes its prevalence is expected to increase. However, due to severe deficiencies in health care systems and disease registration, epidemiological data in sub-Saharan Africa are limited. Underdiagnosis and underreporting differentially affect cancer types, gender and age classes, so that it is difficult to assess disease patterns. In general, however, cancer in Africa is characterized by younger age and advanced stage at diagnosis and correspondingly poor prognosis. Demographic and socio-economic factors, including access to medical care, contribute to these features, obscuring intrinsic biological factors (Parkin et al. 2003). Overall, there are few studies on the genetic basis of cancer in Africa (Table 5). Below, we briefly highlight features relevant to the genetics of the most common or characteristic cancers in African populations.

Table 5 Genetic associations with cancer in Africans

Prostate cancer

It has been well documented that men of African ancestry have higher rates of prostate cancer incidence and mortality compared to men of other ancestries, particularly in the younger age groups (Brawley 1998; Bunker et al. 2002; Delongchamps et al. 2007). This is supported by data from sub-Saharan Africa, where prostate cancer is estimated to be the third most common cancer of males, with rapidly increasing incidence (Magoha 2007; Parkin et al. 2003). Multiple candidates have been suggested, including genes that affect susceptibility to oxidative DNA damage, growth-related pathways, androgen receptor signaling, chronic inflammatory responses, and RNA processing (Rennert et al. 2005; Shand and Gelmann 2006; Sarma et al. 2008; Zabaleta et al. 2008). Several of these candidate genes have been investigated for differential distribution of allelic variants that may affect risk, and cohorts from Africa and those of African descent have the highest frequencies of the putative risk alleles (Esteban et al. 2006; Kittles et al. 2001; Zeigler-Johnson et al. 2002). Of interest is that one study found an association with a CYP3A4 promoter variant in both African Americans and European Americans, but not in Nigerians, indicating the complexity of analyzing stratified data (Kittles et al. 2002; Hainaut and Boyle 2008; Lessells and Cooke 2008) (Table 5). Gene expression profiles of tumors obtained by microarray technology from African–American and European–American patients point to prominent differences in primary prostate cancer immunobiology between African–American and European–American men (Wallace et al. 2008).

Recent genome-wide and linkage scans that have investigated associations between polymorphisms and prostate cancer in multiple ethnic groups provide support for at least five risk-associated chromosomal regions, three of which are at 8q24 (Freedman et al. 2006; Duggan et al. 2007; Haiman et al. 2007; Robbins et al. 2007; Schumacher et al. 2007; Yeager et al. 2007; Zheng et al. 2008). The available evidence compellingly indicates that the reasons for the disparity in prostate cancer risk between African Americans and other ethnic groups involve the differential distribution of 8q24 markers on African chromosomes, although the responsible genes in the chromosomal region remain to be identified. These results have not yet been extended to studies of African populations.

Colorectal cancer

Preliminary data indicate that colorectal cancer exhibits a multimodal distribution, reflecting heterogeneity, with different contributions from genetic and environmental factors. However, the effects of urbanization seem to be increasing the incidence of disease in previously largely rural African populations. Evidence that heritable factors are stronger in any one population is missing; however, it is known that when cancer occurs at a younger age in African populations, it is likely to be more aggressive.

Up to 20% of colorectal cancers in individuals under the age of 50 years appear to be hereditary (a combination of Hereditary Nonpolyposis Colorectal Cancer, i.e. HNPCC, and Familial Adenomatous Polyposis, or FAP). A similar proportion of HNPCC or Lynch syndrome was reported in a small sampling of colorectal cancers in Nigeria (Adebamowo et al. 2000).

A wide range of predisposing mutations has been identified in individuals of various ethnic groups. In one study, a single founder mutation (g.1528C > T in the hMLH1 gene) has been shown to underlie a major burden of disease in the Nama group of the far Northern Cape Province of South Africa (Anderson et al. 2007). In another study null mutations in GSTM1 and GSTT1 were studied in a cohort where all subjects carried an hMLH1 mutation. It was shown that individuals with both null mutations had a threefold increased risk of cancer at an earlier age (Felix et al. 2006) (Table 5). These studies indicate the potential not only for identifying important variants, but also for studying how their effects modify each other.

Breast cancer

Breast cancer (BC) is the most common cancer of women worldwide, and is a major malignancy in African women. The estimated age-standardized rates (ASR) for breast cancer incidence in sub-Saharan Africa range from 15 to 53 per 100,000 women (Ferlay et al. 2004). Even though diagnosed breast cancer is less prevalent in Africa than in Europe, mortality rates estimated for Africa are not lower than those registered in Europe, owing to late diagnosis and poor survival.

Studies that compared extensive series of African–American and European–American breast cancer patients found associations between aggressive estrogen receptor (ER)-negative BC and both young age at diagnosis and black ethnicity (Carey et al. 2006; Porter 2008). These data raise the possibility that genetic factors could contribute to a higher burden of aggressive ER-negative breast cancer in indigenous African populations. However, recent data from Sudan and Nigeria do not support this hypothesis, but suggest that the differences between African and European women reflect stage at diagnosis rather than intrinsic biological characteristics (Adebamowo et al. 2008; Awadelkarim et al. 2008).

As in industrialized countries, strong genetic factors contribute to a subset of breast cancer cases in Africa. Pilot studies in Nigeria, Sudan and Tunisia show that mutations in the two major susceptibility genes, BRCA1 and BRCA2, account for variable but significant fractions of the premenopausal cases (Awadelkarim et al. 2007; Fackenthal et al. 2005; Gao et al. 2000; Troudi et al. 2007) (Table 5). From Nigeria the data suggest that both truncating and non-truncating mutations of BRCA1 or BRCA2 occur more frequently in young women with breast cancer (Fackenthal et al. 2005; Gao et al. 2000). The observations of higher levels of genetic variation in the BRCA genes are supported by other studies (Wagner et al. 1999). Data from Sudan also suggest that BRCA1/2 mutations could represent an important etiological factor in male patients and in young female patients less exposed to pregnancy and lactation (Awadelkarim et al. 2007).

Other cancers

Hepatocellular carcinoma is the second most common cancer of men in sub-Saharan Africa (Parkin et al. 2003). As with other major cancers of Africa, it is associated with environmental factors such as early infection with hepatitis virus types B and C, which interact with dietary exposure to aflatoxins from Aspergillus molds and with specific genetic variants in Africa (Kirk et al. 2000, 2005a, 2006; Montesano et al. 1997; Hainaut and Boyle 2008; Lessells and Cooke 2008) (Table 5).

Cancer of the bladder occurs with particularly high frequency in North Africa, where the main histotype is transitional cell carcinoma (as in industrialized countries). With regard to transitional cell carcinoma, studies conducted in Tunisia and Egypt support the view that individual susceptibility is modulated by genetic variation in pathways that control metabolic detoxification, redox cycling, free radical injury, and metabolism of folate and methionine, critical for DNA synthesis/repair and methylation (Ouerhani et al. 2007, 2006; Saad et al. 2005). In Egypt GSTT1, GSTM1 and GSTP1 genotypes all associate with bladder cancer, but the sample sizes in this study were small (Saad et al. 2005). Similar results were found in Tunisia for GSTM1, but not GSTT1 (Ouerhani et al. 2006). Additional work in Tunisia indicates that methylenetetrahydrofolate reductase and methionine synthase genes associate with bladder cancer (Ouerhani et al. 2007). Unfortunately, almost nothing is known about the influence of genetic factors on bladder cancer in sub-Saharan Africa.

Nasopharyngeal carcinoma is an undifferentiated neoplasm with marked lymphocytic infiltration that arises in the squamous epithelium overlying the nasopharyngeal lymphoid tissue. There is evidence suggesting that certain genotypes in HLA class I, TP53, and antigen processing genes influence susceptibility in North Africa (Hadhri-Guiga et al. 2007; Hassen et al. 2007; Li et al. 2007b).

Podoconiosis: a paradigm of genetics and environment interactions

Podoconiosis (non-infectious geochemical elephantiasis) is a chronic tropical disease that phenotypically resembles filariasis (Davey et al. 2007b). Although not widely recognised, prevalence rates of over 5% have been reported in endemic areas, where it is more common than HIV/AIDS or tuberculosis (Destas et al. 2003). Exposure to red alkalic clay soil, in individuals who cannot afford footwear, leads to the absorption of silicate particles. These induce an inflammatory response in some, but not all, individuals, even though silicate particles have been identified in inguinal lymph nodes in unaffected individuals. Without intervention, chronic inflammation leads to lymphatic obstruction and the clinical phenotype of progressive asymmetrical bilateral swelling of the lower leg (Price 1990). Podoconiosis can be differentiated from its phenocopy, infectious filariasis (caused by various nematode species such as Wuchereria bancrofti), on clinical grounds, since in filariasis the disease is often symmetrical and extends above the knee. Furthermore, podoconiosis occurs in high altitude settings that preclude transmission of filariasis by its mosquito vector.

Since only a proportion of exposed individuals develop disease and the disease clusters in families, the hypothesis that genetic factors determine whether an individual is susceptible to disease was tested in the Wolaitta region of Ethiopia. Multiplex family analysis estimated the heritability of podoconiosis to be 0.62 with a single major dominant gene as the most parsimonious model (Davey et al. 2007a). Genetic studies towards gene identification are planned, and compared to other chronic diseases, the genetic basis of podoconiosis appears to be relatively simple.

Genetics and disease prevention/treatment

Vaccine-induced immunity

Given that infectious diseases play such an important role in the health of African populations, it is important to understand how genetic variation affects the efficacy of vaccines. The Expanded Programme on Immunization (EPI) introduced by WHO and organizations such as the Global Alliance for Vaccines and Immunisation (GAVI) work towards the prevention of predominantly childhood diseases through vaccination. Vaccines currently delivered on a routine basis across the African continent are: BCG for TB and leprosy, individual or combination vaccination against diphtheria, tetanus and pertussis (DTP), oral poliovirus vaccine (OPV), and vaccinations against measles and yellow fever. Additionally, Haemophilus influenzae type b (Hib) and hepatitis B virus (HBV) vaccines are recommended by the WHO but do not form part of the routine program in most countries, with exceptions such as The Gambia. More recent immunizations are pneumococcal and meningococcal vaccines (in some instances targeted at high-risk groups), and others, such as malaria and HIV vaccines, are still in a more or less promising experimental phase. Many factors that are known to influence immune responses to vaccines will not be discussed here, including the vaccine itself, adjuvants, age, gender, UV light exposure, smoking, infectious diseases, nutritional factors, etc. (reviewed by van Loveren et al. 2001).

Immune responses induced by vaccination are in part under genetic control, and the degree of heritability varies by vaccine between 35 and 90%, as shown by family and twin studies both within African settings (Lee et al. 2006; Marchant et al. 2006; Newport et al. 2004b, 2005) and across the rest of the world (Alper 1995; De et al. 2001; Hohler et al. 2002; Konradsen et al. 1993, 1994; Kruger et al. 2005; Lin et al. 1989; Musher et al. 1997, 2000; reviewed by Kimman et al. 2007). Additionally, we know that differences in vaccine efficacy exist between different ethnic groups, also indicating a putative role for genetic factors (Kimman et al. 2007). Vaccine efficacy in a given population can be affected by the frequency of protective alleles, emphasizing the importance of ethnic comparisons for a thorough understanding of the role of genetics in determining or modulating immune responses. However, the heterogeneity in vaccine-induced immunity (the study of which has been termed vaccinomics) is not well understood, and few data are available from genetic studies worldwide, let alone Africa (Kimman et al. 2007; Ovsyannikova et al. 2004a; Poland et al. 2007; Poland and Jacobson 1998). Host genetic variation may affect multiple processes such as antigen presentation and recognition, the magnitude or kinetics of vaccine-induced antibody response, lymphocyte proliferation, and long-term immune memory.
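
As an illustration of how a twin study can yield a heritability estimate, a classical textbook approximation is Falconer's formula, which compares trait correlations in monozygotic (MZ) and dizygotic (DZ) twin pairs. The sketch below is only that approximation; the studies cited above may well have used other methods (for example variance-components models), and the correlation values are hypothetical.

```python
def falconer_estimates(r_mz, r_dz):
    """Falconer's approximation from twin-pair correlations.

    h2 = 2 * (r_mz - r_dz)  : share of variance attributed to additive genetics
    c2 = 2 * r_dz - r_mz    : share attributed to shared environment
    Both values are clipped to the interval [0, 1].
    """
    def clip(x):
        return min(max(x, 0.0), 1.0)

    h2 = clip(2.0 * (r_mz - r_dz))
    c2 = clip(2.0 * r_dz - r_mz)
    return h2, c2


# Hypothetical correlations of post-vaccination antibody levels within twin pairs.
h2, c2 = falconer_estimates(r_mz=0.8, r_dz=0.5)
print(f"heritability ~ {h2:.2f}, shared environment ~ {c2:.2f}")
```

The formula assumes purely additive genetic effects and equal shared environments for MZ and DZ pairs, so its output is best read as a rough guide rather than a precise estimate.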

The most exhaustively studied region of the human genome with respect to the correlation of immuno-phenotypic and genotypic data is the HLA region. As noted above, HLA variation has also been studied extensively in terms of susceptibility to the diseases themselves (Tables 6, 7, Supplemental Tables). Relatively consistent findings have been reported for HLA associations with immunity induced by HBV vaccination (Kimman et al. 2007; Milich and Leroux-Roels 2003; Thursz 2001); the HLA data on responses to other vaccinations, except measles (Ovsyannikova et al. 2004c, 2006a), are sparse. Even less has been published on variation in non-HLA genes. Table 7 lists existing publications on both HLA and other candidate loci, concentrating on English language reports and vaccines administered routinely across Africa. Most of these studies are hampered by small sample sizes, a limited number of markers/genes screened, poor information on covariates and environmental factors, and differences in study design and analysis; therefore we only have a snapshot of what genetic factors are implicated in the control of vaccine-induced immunity. Furthermore, genetically distinct populations have been studied, making comparisons difficult. However, even if good data on host genetic variability as well as comprehensive information on clinical, serological, demographic and environmental factors were available, several issues would remain unclear. For instance, how accurate is the measurement of currently used correlates of protection, such as vaccine-induced antibody level? What effect does natural boosting through infection have on the evaluation of vaccine efficacy? How relevant is the genetic variability of the infectious agent, and will long-term vaccination programs lead to the rise of vaccine-escape mutants? Will functionally relevant variants be identified from family, twin, cohort and case–control studies?

Table 6 HLA allele frequencies and disease associations (allelic frequencies from dbMHC, http://www.ncbi.nlm.nih.gov/gv/mhc/main.cgi?cmd=init)
Table 7 Genetic studies on vaccine-induced immunity (as direct or indirect outcome measure)

Pharmacogenetics

It has been well documented that genetic differences among individuals affect the efficacy of specific drug treatments. In some cases this reflects differences in the ability to process or metabolize drugs, arising from variation in ADMET genes, which determine drug Absorption and Distribution (transporters and plasma proteins), drug Metabolism and Excretion (metabolising enzymes and transporters), and Toxicity. In other situations, such as non-communicable diseases, the efficacy may be more closely tied to the actual etiological risk. For example, it has been shown in Black South Africans that AGT genotype can affect response to ACE inhibitor therapy for hypertension (Woodiwiss et al. 2006). Pharmacogenetics aims to understand the genetic diversity underlying the pharmacokinetic and pharmacodynamic variability in drug response among patients, enabling personalized treatment, optimal dosing and minimal adverse effects.

With respect to drug metabolism, the genes most intensively studied so far encode drug transporters and drug metabolising enzymes such as cytochrome P450s (CYPs), glucuronyl transferases (UGTs), N-acetyl transferases (NATs), epoxide hydrolases (EHs), glutathione S-transferases (GSTs), flavin monooxygenases (FMOs) and multidrug transporters (MDR). These genes are highly polymorphic and their variation results in proteins or enzymes with enhanced, normal or reduced capacity, thereby dividing populations into groups of extensive, intermediate or poor metabolisers. Of these, the CYP genes show the highest level of variation. For example, 70 alleles of CYP2D6 have been reported so far, yet only a few haplotypes were found to be of functional importance (http://www.cypalleles.ki.se). Whereas most of this diversity is caused by single nucleotide polymorphisms (SNPs), gene copy number variation (CNV) has also been found in CYP2D6, GSTM1 and GSTT1 (Gaedigk et al. 2007; Ingelman-Sundberg et al. 2007; Ouahchi et al. 2006; Rotger et al. 2007). It has been shown that in some cases CYP genotyping is less predictive of metabolizer status in African Americans than in European Americans (Gaedigk et al. 2005). Such findings are suggestive of the role that total genetic variation may play in predicting drug efficacy and underscore the need to perform analyses in a diversity of populations.

The distribution of drug response alleles shows distinct clusters among the world populations (Aklillu et al. 2007; Sistonen et al. 2007). In addition, there is inter-individual variation that in some cases supersedes population diversity (Sistonen et al. 2007). Due to limited data on African genetics/polymorphisms, no comprehensive patterns of variation are known for African populations so far. African-specific SNPs have been reported for the CYP, NAT and FMO genes (Allabi et al. 2005, 2004; Yasar et al. 2002), but their population and inter-individual variation is not well understood. In addition, phenotypic expression of polymorphisms may differ in individuals of different ethnicities and environments (Aklillu et al. 2002). This suggests an urgent need for population-based as well as location-based genotype–phenotype correlation studies.

The high frequency of the reduced-function alleles CYP2D6*17 and *29 in Africans predicts a considerably higher frequency of intermediate-to-poor metabolizer status than in people of European descent (Bertilsson et al. 2002; Masimirembwa et al. 1996; Wennerholm et al. 2001). The CYP2C19*2 allele currently accounts for most (60%) of the poor metaboliser phenotype for substrates of the CYP2C19 enzyme in Africans and Europeans (Bathum et al. 1999; Masimirembwa et al. 1995). The commonly known polymorphisms of NAT2 include alleles *5, *6, *7 and the African-specific *14 allele, and all affect acetylator status in carrier individuals (Dandara et al. 2003). Such polymorphisms may translate into ultra-rapid metabolism or complete absence of metabolism of some substrates, with important implications for dosage adjustments in patients carrying these alleles.
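
The link between allele frequency and the expected proportion of carriers can be sketched with the Hardy-Weinberg expectation. This is a deliberate simplification: it assumes a single biallelic locus, random mating and no selection, whereas real metabolizer status depends on many alleles and on copy number. The allele frequencies used below are hypothetical and are not estimates for any population or for any of the alleles named above.

```python
def hardy_weinberg(q):
    """Expected genotype frequencies for a variant allele with frequency q."""
    if not 0.0 <= q <= 1.0:
        raise ValueError("allele frequency must lie between 0 and 1")
    p = 1.0 - q
    return {
        "no_variant": p * p,          # two copies of the reference allele
        "heterozygous": 2.0 * p * q,  # one variant copy
        "homozygous_variant": q * q,  # two variant copies
    }


# Hypothetical frequencies of a reduced-function allele in two populations.
for label, q in [("population A", 0.20), ("population B", 0.02)]:
    freqs = hardy_weinberg(q)
    print(label, {genotype: round(value, 4) for genotype, value in freqs.items()})
```

Because the homozygous class scales with the square of the allele frequency, a tenfold difference in allele frequency between two populations corresponds to a hundredfold difference in the expected proportion of homozygous carriers.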

Since few African populations have been studied so far and the documentation of pharmacogenetic information is scarce, attempts to establish biobanking initiatives and pharmacogenetic databases are underway (Matimba et al. 2008; http://www.aibst.com/biobank). Such databases could be a very helpful tool in promoting drug discovery and development in the public and private sectors.

The data collected for both vaccine and drug responses might eventually lead to the implementation of screening protocols, although this is presently feasible in only a few African settings. Pre-prescription genotyping has been recommended for CYP2D6 and CYP2C19 in antipsychotic therapy (Kirchheiner et al. 2001; Masimirembwa and Hasler 1997). In anticoagulant therapy, CYP2C9 and Vitamin K epoxide reductase complex subunit 1 (VKORC1) genotyping can help to predict the starting dose of the drug warfarin (Wadelius and Pirmohamed 2006). Host genetic factors may also influence HIV treatment efficacy and safety; for example, the human leukocyte antigen HLA-B*5701 allele has been associated with abacavir hypersensitivity (Lucas et al. 2007), so patient screening for this allele should reduce the incidence of adverse reactions. “African” polymorphisms need to be incorporated into the development of these applications, as individuals may be at higher risk of dose-related adverse drug reactions or less efficacious treatment when taking doses recommended for Europeans. For example, the CYP2B6 516G > T polymorphism is highly prevalent in Africans and results in reduced enzyme function (Klein et al. 2005). This has implications for drug toxicity due to high plasma concentrations (Rotger et al. 2005). A recent study showed that individuals carrying this mutation can be treated with reduced dosage and still achieve therapeutic outcomes (Nyakutira et al. 2008). Pre-prescription genotyping of patients should, therefore, result in minimal side effects and lower cost of treatment. Such an outcome is particularly relevant in Africa, where healthcare costs usually exceed what individuals or governments can afford.

Conclusions

Nearly 2000 years ago the Roman scholar and natural philosopher Pliny the Elder wrote in his Natural History: “Ex Africa surgit semper aliquid novi” (from Africa there is always something new). This quote applies well to genetic studies of African populations, which provide a critical resource for the study of genetic risk factors of human disease and for new discoveries. By doing studies throughout Africa it will be possible to capture most of the extant genetic risk factors in all human populations. It may also be possible to use simple and relatively inexpensive genetic tests to reduce overall healthcare costs. Finally, as pointed out above, many diseases that are endemic to Africa carry significant genetic risk, and studying these could improve health in Africa. However, despite the advantages and importance of these studies, there are substantial impediments to performing genetic research in an African setting, most notably lack of resources and infrastructure. In recognition of these factors it has been argued that bio-banks need to be developed to expedite research (Sgaier et al. 2007; Sirugo et al. 2004). There is an increasing awareness that it is not only important to coordinate research efforts, overcome “territorial issues”, and share resources between research teams, but also that there is an essential need for training African scientists who can lead and promote genetic research in Africa. Such efforts are ongoing and form the basis for many of the objectives of the African Society of Human Genetics, which was formed in 2003 (Rotimi 2004). Many such efforts are still in their infancy, as evidenced by the lack of research discussed in this review for some diseases and for vaccines and treatment. Some progress has been made but, as the African proverb states, “thunder is not yet rain”: this is just the beginning and a lot more needs to be done.