1 Introduction

More than half of all humans are infected with Helicobacter pylori, a gram-negative spiral bacterium whose ecological niche is the human stomach and which is linked to severe gastritis-associated diseases, including peptic ulcer and gastric cancer [1]. H. pylori strains from different geographical areas show clear phylogeographic features, which make it possible to infer the migration of human populations from phylogeographic analyses of H. pylori. In addition, the genetic diversity within H. pylori is greater than within most other bacteria [2] and about 50-fold greater than that of the human population [3]. Moreover, frequent recombination between different H. pylori strains [4] leads to only partial linkage disequilibrium between polymorphic loci, which provides additional information for population genetic analysis [5].

Several virulence factors of H. pylori have been demonstrated to be predictors of gastric atrophy, intestinal metaplasia, and severe clinical outcomes [6–11]. The two most extensively studied virulence factors of H. pylori, cagA and vacA, are currently used as markers of genomic diversity within distinct populations [12]. In addition, multilocus sequence typing (MLST), which was proposed in 1998 as a tool for the epidemiological study of bacteria [16], is useful for inferring the history of human migrations [2, 5, 13–15]. The genomic diversity within H. pylori populations has been examined by MLST using seven housekeeping genes (atpA, efp, mutY, ppa, trpC, ureI, and yphC) [5, 13, 14]. MLST analysis is reported to give more detailed information about human population structure than methods based on human microsatellites or mitochondrial DNA [15]. More recently, whole-genome sequencing has become another powerful tool for studying the evolution and pathogenicity of H. pylori. In this chapter we describe the current knowledge about the usefulness of virulence factors and housekeeping genes for elucidating the history of human migration and give an overview of the use of genome-wide information for advanced studies.
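As a rough illustration of how MLST data are typically prepared for comparison, the sketch below concatenates the seven housekeeping-gene fragments of each strain in a fixed order and computes the proportion of differing sites between two strains. The concatenation order, the function names, and the toy sequences are assumptions made for this example; real analyses use aligned fragments of several hundred base pairs per locus and dedicated population-genetic software.

```python
# The seven housekeeping loci named in the text; the order is an assumption.
MLST_LOCI = ["atpA", "efp", "mutY", "ppa", "trpC", "ureI", "yphC"]


def concatenate_loci(fragments):
    """Join one strain's aligned fragments in the fixed locus order."""
    missing = [locus for locus in MLST_LOCI if locus not in fragments]
    if missing:
        raise ValueError(f"missing loci: {missing}")
    return "".join(fragments[locus] for locus in MLST_LOCI)


def p_distance(seq_a, seq_b):
    """Proportion of aligned sites that differ, skipping sites with a gap."""
    if len(seq_a) != len(seq_b):
        raise ValueError("sequences must be aligned to the same length")
    compared = 0
    differing = 0
    for a, b in zip(seq_a, seq_b):
        if a == "-" or b == "-":
            continue
        compared += 1
        if a != b:
            differing += 1
    return differing / compared if compared else 0.0


# Toy fragments (hypothetical, far shorter than real MLST fragments).
strain_x = {locus: "ACGTACGT" for locus in MLST_LOCI}
strain_y = dict(strain_x, efp="ACGTACGA", ureI="ACCTACGT")

distance = p_distance(concatenate_loci(strain_x), concatenate_loci(strain_y))
```

In this toy case the two strains differ at 2 of 56 concatenated sites. A matrix of such pairwise distances is one common starting point for tree building, although the studies cited here may have used different models.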

2 Migration out of Africa to the Pacific

H. pylori is believed to have been established in human stomachs at least 100,000 years ago [17], having been acquired from an unknown source. It was most likely transmitted from large felines carrying H. acinonychis to the San people (hpAfrica2, which is very distinct and has only been isolated in South Africa) and then spread throughout Africa (hpAfrica1 and hpNEAfrica) [18]. hpAfrica1 is divided into two subpopulations, hspWAfrica (West Africans, South Africans, and Afro-Americans) and hspSAfrica (South Africans), whereas hpNEAfrica is predominant in isolates from Northeast Africa [19]. H. pylori is predicted to have spread from East Africa over the same time period as anatomically modern humans (~58,000 years ago), and it mirrors the human pattern of increased genetic distance and decreased diversity with distance from Africa (Fig. 2.1) [13, 14]. MLST analyses indicate that modern populations derive from six ancestral populations (Table 2.1), designated ancestral European 1 (AE1), ancestral European 2 (AE2), ancestral East Asia, ancestral Africa 1, ancestral Africa 2 [5], and ancestral Sahul [14]. These ancestral populations gave rise to seven modern population types defined by geographical association: hpEurope, hpEastAsia, hpAfrica1, hpAfrica2, hpAsia2, hpNEAfrica, and hpSahul (Fig. 2.2) [5, 13, 14].

Fig. 2.1 Modern human migration out of Africa. Black arrows and numbers represent predicted paths and times of migration

Table 2.1 Multilocus sequence types of Helicobacter pylori according to geographical area

Fig. 2.2 Seven population types based on geographical associations. Colored circles illustrate the putative distribution of H. pylori before “the age of exploration.” Six ancestral populations gave rise to seven populations: hpEurope, hpEastAsia, hpAfrica1, hpAfrica2, hpAsia2, hpNEAfrica, and hpSahul

During their first “out of Africa” migration, the ancestors of modern humans followed a southern coastal route from India to Southeast Asia and Australasia [20], which subsequently gave rise to the Asian lineage (hpAsia2). hpAsia2 strains have been isolated in South, Southeast, and Central Asia [19]. Most strains in India were initially assigned to hpAsia2 [13], whereas some belonged to hpEurope [21]. H. pylori in the Indian population thus appears heterogeneous in origin, perhaps reflecting both earlier common ancestry and recent imports. Notably, hpAsia2 strains from Ladakh Indians and Malaysian Indians can be divided into two subpopulations, hspLadakh and hspIndia [22]. From mainland Asia the route extended along Sundaland (i.e., the Malay Peninsula, Sumatra, Java, Borneo, and Bali), a Pleistocene landmass that was joined to the Asian mainland as a result of low sea levels during the last ice age (12,000–43,000 years ago). Low sea levels also meant that Australia, New Guinea, and Tasmania were connected in a continent called Sahul, separated from Sundaland by a few narrow deep-sea channels [14]. hpSahul strains have been isolated from Aboriginal Australians and highlanders of New Guinea [19].

Subsequent migrations of ancestors of the African hpNEAfrica and/or the Asian hpAsia2 populations resulted in the admixed hpEurope population, which then became the predominant population of extant H. pylori in Europe, the Middle East, and Western Asia. Modern humans settled in Europe about 30,000–40,000 years ago, probably entering via two routes: from Turkey along the Danube corridor into Eastern Europe and along the Mediterranean coast [20]. hpEurope includes almost all H. pylori strains isolated from ethnic Europeans, including people from countries colonized by Europeans. hpEurope can be divided into AE1 and AE2 ancestries. AE1 is thought to have originated in Central Asia, because it shares phylogenetic signals with isolates from Estonia, Finland, and Ladakh in India. It is not clear which population arrived first, but AE1 has a higher frequency in Northern Europe, while AE2 is more common in Southern Europe. MLST analyses also provided evidence that H. pylori strains from Iran are similar to other isolates from Western Eurasia and can be placed in the previously described hpEurope population [23].

Human migrations in Southeast Asia have also been clarified on the basis of MLST analyses of strains from Cambodia [24]. Cambodian strains have been classified into two groups, hpEurope and hspEAsia, which have resulted from three ancient human migrations: (1) from India, introducing hpEurope into Southeast Asia; (2) from China, carrying hspEAsia; and (3) from Southern China into Thailand, carrying hpAsia2 [20, 24]. These findings also support two recent migrations within the last 200 years: (1) of Chinese to Thailand and Malaysia, spreading hspEAsia strains, and (2) of Indians to Malaysia, carrying hpAsia2 and hpEurope [20, 24]. In concordance with this study, H. pylori isolates from Malaysia are classified as hpEastAsia, hpAsia2, or hpEurope. A new subpopulation within hpAsia2, hspIndia, may reflect the fact that Malaysian Indians mainly came from South India.

hpEastAsia is common in H. pylori isolates from East Asia. hpEastAsia includes the subpopulations hspMaori (Polynesians, Melanesians, and native Taiwanese), hspAmerind (Amerindians), and hspEAsia (East Asians). Approximately 12,000 years ago, H. pylori (hspAmerind) accompanied humans when they crossed the Bering Strait from Asia to the Americas [12]. Our previous data showed that four strains isolated from the Ainu ethnic group, who live in Hokkaido, a northern island of Japan, belong to the hspAmerind population [25]. The aboriginal people of Japan, known as the Jomon people, are thought to have moved to northern and southern areas such as Hokkaido and Okinawa because of the immigration of the Yayoi people from the Korean Peninsula [26]. Finally, around 5000 years ago, H. pylori (hspMaori) accompanied several subgroups of the Austronesian language family as they spread from Taiwan through the Pacific [14], including several islands in eastern Indonesia [27], into Melanesia and Polynesia.

3 Virulence Factors for Tracking Human Migration

The relationships between MLST and virulence factors have been reported [28, 29]. The cag pathogenicity island (PAI) is an approximately 40-kilobase pair region that is thought to have been incorporated into the H. pylori genome by horizontal transfer from an unknown source [30]. The phylogeny of most cag PAI genes was similar to that of MLST, indicating that the cag PAI was probably acquired only once by H. pylori and that its genetic diversity reflects the isolation by distance which has shaped this bacterial species since modern humans migrated out of Africa [29]. The cagA gene, which encodes a highly immunogenic protein (CagA), is located at one end of the cag PAI. The cag PAI encodes a type IV secretion system, through which CagA is delivered into host cells [31–33]. After delivery into gastric epithelial cells, CagA is tyrosine phosphorylated at Glu-Pro-Ile-Tyr-Ala (EPIYA) motifs encoded in the 3′ region of the cagA gene [34]. Consistent with the observation that H. pylori mirrors the human pattern of increased genetic distance and decreased diversity, our group has reported that the structure of the 3′ region of the cagA gene varies between strains from East Asian and Western countries [9, 12, 35, 36]. In East Asian strains, two types of repeats are found: 57 bp repeats followed by 162 bp repeats (East Asian-type cagA). Western strains have similar 57 bp repeats; however, they are followed by a repeat region consisting of 102 bp repeats, which are completely different from those of East Asian strains (Western-type cagA). Previous reports also show that the structure of the 5′ region of the cagA gene varies between strains from East Asian and Western countries [12, 37]. East Asian-type cagA is only observed in H. pylori isolates from the East Asian population, whereas Western-type cagA is widely distributed among isolates from European, South and Central Asian, North and South American, and African populations [12]. Almost all H. pylori isolates from East Asia possess the cagA gene, whereas approximately 20–40 % of isolates from Europe and Africa are cagA negative. Thailand is at the cultural crossroads between East and South Asia and, indeed, approximately half of the strains in Thailand possess East Asian-type cagA, whereas the others possess Western-type cagA [38]. Interestingly, Western-type cagA detected in strains from Okinawa (J-Western-type cagA) formed a cluster distinct from that of the original Western-type cagA [39]. The pre-EPIYA region of cagA also shows geographic divergence [40].

Most strains isolated from East Asia have a 39-bp deletion in the pre-EPIYA region, whereas this deletion is absent in most strains from Western countries. On the other hand, an 18-bp deletion is common in Vietnamese strains. In addition, we found that the frequencies of the EPIYT and ESIYT motifs are relatively high among the sequences of the Okinawa strains [41]. Amerindian-type cagA from some Machiguenga-speaking residents of Shimaa village in the remote Peruvian Amazon (AM-I) also contained ESIYT motifs, which supports the possibility that these populations share the same origin [42]. A recent study revealed the recombination processes of cagA [43]. Interestingly, the left half of the EPIYA-D segment of East Asian-type cagA was derived from the Western-type EPIYA, with the Amerindian-type EPIYA as an intermediate, through rearrangement of specific sequences within the gene. The J-Western-type EPIYA is phylogenetically located between the Western-type EPIYA and the Amerindian-type EPIYA. This finding suggests that the original H. pylori strain had a Western-type cagA sequence, which subsequently evolved into the J-Western-type cagA, then into the Amerindian-type cagA, and finally into the East Asian-type cagA.
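The motif variants discussed above can be located in a translated CagA sequence with a simple pattern search. The sketch below is illustrative only: the function names and the toy peptide are hypothetical, and the pattern covers just the three variants named in the text (EPIYA, EPIYT, and ESIYT). Assigning EPIYA segments to types such as EPIYA-D additionally requires the flanking sequences, which this example does not attempt.

```python
import re

# Only the three variants named in the text are searched for (assumption:
# other variants, if present, are out of scope for this illustration).
MOTIF_PATTERN = re.compile(r"EPIYA|EPIYT|ESIYT")


def find_epiya_like_motifs(protein):
    """Return (0-based start position, motif) for each match, left to right."""
    return [(m.start(), m.group()) for m in MOTIF_PATTERN.finditer(protein)]


def count_motifs(protein):
    """Count how often each motif variant occurs in the sequence."""
    counts = {}
    for _, motif in find_epiya_like_motifs(protein):
        counts[motif] = counts.get(motif, 0) + 1
    return counts


# Toy peptide (hypothetical, not a real CagA fragment).
toy_peptide = "MKNEPIYAQVAKKESIYTGGEPIYTLL"
hits = find_epiya_like_motifs(toy_peptide)
# hits == [(3, "EPIYA"), (13, "ESIYT"), (20, "EPIYT")]
```

Comparing such per-strain motif counts across populations is one simple way to express the frequency differences described for the Okinawa and Amerindian strains.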

The right end of the cag PAI has been divided into five subtypes according to deletion, insertion, and substitution motifs [44]. Type I is most common in isolates from ethnic European groups and from Africa, type II is predominant in those from East Asia, and type III is predominant in isolates from South Asia [12, 44]. Type IV is very rare and, therefore, has not been assigned to a specific geographical area. Type V is found in a few strains from Calcutta, India [12, 44]. Interestingly, our report showed that type V was present in 10 % of isolates from patients in Thailand, and the proportion was especially high in strains obtained from ethnic Thai (21 %) [38]. The presence of this genotype in Thailand suggests that it spread eastward from Calcutta. Overall, these data suggest that specific genotypes remain conserved within ethnic groups for at least several generations.

On the other hand, the overall topology of the vacA tree did not always match that of MLST [28]. Furthermore, rooting the vacA tree with out-group sequences from the closely related H. acinonychis revealed that the ancestry of vacA differs from the African origin. VacA is a vacuolating cytotoxin that induces cytoplasmic vacuoles in various eukaryotic cells. Unlike the case of the cagA gene, all H. pylori strains carry a functional vacA gene. However, there is considerable variation in vacuolating activity among strains [45, 46], primarily as a result of differences in the vacA gene structure at the signal region (s1 and s2) and the middle region (m1 and m2) [47]. Interestingly, the vacA s1 genotype is closely correlated with the presence of the cagA gene [8, 48, 49]. The vacA gene may comprise any combination of signal and middle-region types, although the s2/m1 combination is rare [47, 50]. All East Asian H. pylori strains are of the vacA s1 type [8, 12]. Within East Asian countries, the m1 type is predominant in Japan and Korea, whereas the prevalence of m2 types gradually increases in southern parts of East Asia (Vietnam, Hong Kong) [12]. The vacA s1 type is subdivided into s1a, s1b, and s1c [37, 47], and the m1 type is subdivided into m1a, m1b, and m1c [51]. The vacA s1c and m1b types are typical of H. pylori from East Asia (i.e., more than 95 % of s1 and m1), and the s1a and m1c types are common in South Asia (i.e., approximately 85 % of s1 and nearly 100 % of m1) [12, 52]. The vacA m1c genotype is also found in strains from Central Asia (ethnic Kazakhs) [12]. The m1a type is typical of Africans and ethnic Europeans (i.e., nearly 100 % of m1) [12, 37]. Both the s1a and s1b types are common in ethnic European strains, and s1b types are especially common in strains from the Iberian Peninsula and Latin America (i.e., approximately 85 % of s1) [37, 53]. The s1b type is also predominant in Africa (i.e., approximately 90 % of s1) [50, 53]. Our group has examined the H. pylori genotypes circulating among ethnic groups (Blacks, White Hispanics, Whites, and Vietnamese) living in the same region (Houston, Texas, USA) [54]. By ethnicity, the most common genotypes were as follows: Blacks, s1b-m1; Hispanics, s1b-m1; Whites, s1a-m2; and Vietnamese, s1c-m2. These completely overlap with the predominant genotypes of Africa, the Iberian Peninsula, Northern and Eastern Europe, and Vietnam, respectively. In Thailand, the predominant vacA genotypes among s1-m1 strains are s1a-m1c in ethnic Thai people and s1b-m1b in ethnic Chinese people, which are the same as the predominant genotypes of South Asia and East Asia, respectively [38].

By combining the cagA, cag right-end junction, and vacA genotypes of more than 1000 H. pylori strains collected from East Asia, Southeast Asia, South Asia, Central Asia, Europe, Africa, North America, and South America [12, 38], four major groups (East Asia type, South/Central Asia type, Iberian/Africa type, and Europe type) can be defined according to geographical associations (Table 2.2). In these groups, cagA-negative and/or vacA m2 genotypes are not taken into account, but the geographical origin of a strain can still be predicted from the genotypes that are available (e.g., a cagA-negative strain with vacA s1a-m1a is predicted to be of the Europe type). Overall, the genotype of the virulence genes is important not only as a tool for tracking human migration but also for epidemiological studies of H. pylori-related gastroduodenal diseases, especially in areas where multiple genotypes coexist (e.g., the virulent East Asian type and the less virulent South/Central Asian type in Thailand).

Table 2.2 Predominant virulence genotypes on cagA and vacA genes according to geographic area
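The grouping logic described above can be sketched as a small lookup that scores a strain against each group's predominant genotypes and tolerates missing markers. The profiles below are reconstructed from the prose of this section rather than copied from Table 2.2, and the marker keys, function name, and tie-handling rule are assumptions made for illustration.

```python
# Predominant genotypes per group, reconstructed from the text (assumption:
# these values are illustrative and are not a copy of Table 2.2).
PROFILES = {
    "East Asia": {"cagA": "East Asian", "right_end": "II",
                  "vacA_s": "s1c", "vacA_m": "m1b"},
    "South/Central Asia": {"cagA": "Western", "right_end": "III",
                           "vacA_s": "s1a", "vacA_m": "m1c"},
    "Iberian/Africa": {"cagA": "Western", "right_end": "I",
                       "vacA_s": "s1b", "vacA_m": "m1a"},
    "Europe": {"cagA": "Western", "right_end": "I",
               "vacA_s": "s1a", "vacA_m": "m1a"},
}


def predict_group(**observed):
    """Return the single best-matching group, or None if the call is ambiguous.

    Markers that are unavailable (for example cagA in a cagA-negative strain)
    are omitted or passed as None and contribute nothing to the score.
    """
    scores = {
        group: sum(1 for marker, value in profile.items()
                   if observed.get(marker) == value)
        for group, profile in PROFILES.items()
    }
    best = max(scores.values())
    winners = [group for group, score in scores.items() if score == best]
    return winners[0] if best > 0 and len(winners) == 1 else None


# A cagA-negative strain with vacA s1a-m1a, the example given in the text.
cagA_negative_call = predict_group(vacA_s="s1a", vacA_m="m1a")
# cagA_negative_call == "Europe"
```

Returning None for ties is a deliberate choice here: a strain typed only as Western-type cagA matches three groups equally well, so no single origin should be reported for it.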

The genotypes of the virulence genes have provided important information about human migration to the Americas. The Americas were populated by humans of East Asian ancestry approximately 15,000 years ago. Over the last 500 years, Europeans and Africans have come to the Americas, leading to an increasing Mestizo (mixture of Amerindian and European ancestry) population. Our group has discovered that approximately 25 % of the H. pylori isolates from Native Colombians and Native Alaskans possess novel vacA and/or cagA structures that are similar, but not identical, to structures from East Asia (i.e., vacA s1c-m1b-like, East Asian-like cagA) [12]. Native Venezuelans are also reported to have a high frequency of the vacA s1c genotype [55]. These data support the view that H. pylori accompanied humans when they crossed the Bering Strait from Asia to the New World. Importantly, none of the H. pylori strains from Mestizo populations possess East Asian-like genotypes. Sequence analysis of H. pylori genomes has shown that East Asian-like Amerindian strains are the least genetically diverse, probably because of a genetic bottleneck, whereas European strains are the most diverse among Amerindian, European, African, and East Asian strains [56]. If diversity is important for the success of H. pylori colonization, the East Asian-like Amerindian strains may have lacked the diversity needed to compete with the diverse H. pylori population brought to the New World by non-Amerindian hosts and may therefore have disappeared.

4 Genome-Wide Analyses for Evolutionary Study

Analysis of MLST data and virulence factors has revealed much information about the pathogenicity and genealogy of H. pylori; however, these approaches focus on a small number of genes and may miss information conveyed by the rest of the genome. Genome-wide analyses using DNA microarrays or whole-genome sequencing give a broad view of the H. pylori genome.

Microarray analysis provides comprehensive information about the gene content of different strains and helps identify strain-specific genes as well as core genes shared by multiple strains. Salama et al. examined the genomic content of 15 clinical isolates using a whole-genome DNA microarray and defined 1281 genes as functional core genes [57]. They identified candidate virulence genes on the basis of coinheritance with the cag PAI. A similar approach was used to elucidate the genomic diversity of isolates obtained from patients in China [58]. Whole-genome sequencing is another powerful tool for studying the evolution and pathogenicity of H. pylori.
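The distinction between core and strain-specific genes can be expressed with basic set operations once each strain's gene content is known. The sketch below assumes a strict definition (core genes are present in every strain, strain-specific genes in exactly one), which may differ from the criteria used in the cited microarray studies; the strain and gene names are hypothetical.

```python
def core_and_strain_specific(gene_content):
    """Split gene content into a core set and per-strain unique sets.

    gene_content maps a strain name to the set of genes detected in it.
    Core genes are those present in every strain; strain-specific genes
    are those found in one strain and in no other.
    """
    if not gene_content:
        raise ValueError("at least one strain is required")
    strains = list(gene_content)
    core = set.intersection(*(set(gene_content[s]) for s in strains))
    specific = {}
    for strain in strains:
        others = set().union(
            *(gene_content[o] for o in strains if o != strain)
        )
        specific[strain] = set(gene_content[strain]) - others
    return core, specific


# Toy presence data (hypothetical strains and gene labels).
toy_content = {
    "strainA": {"g1", "g2", "g3", "cagA"},
    "strainB": {"g1", "g2", "g4", "cagA"},
    "strainC": {"g1", "g2", "g5"},
}
core_genes, unique_genes = core_and_strain_specific(toy_content)
# core_genes == {"g1", "g2"}; cagA is neither core nor strain-specific here.
```

Genes such as cagA in this toy data set, present in some strains but not all, fall into neither category, and it is this variable fraction that coinheritance analyses examine.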

Since the first release of the whole genome of strain 26695 [59], the sequences of more than 20 genomes have been determined by Sanger sequencing or massively parallel sequencing. The accumulation of whole-genome data enables extensive sequence analyses of H. pylori strains. About 1200 core genes were identified by comparison of the peptic ulcer strain P12 with six other H. pylori genomes, which was in agreement with preceding studies [60]. The authors found that the P12 genome contains three plasticity zones and that one of them is capable of self-excision and horizontal transfer by conjugation. Their result suggests that conjugative transfer of genomic islands may contribute to the genetic diversity of H. pylori. McClain et al. compared the genome sequences of an isolate obtained from a patient with gastric cancer (strain 98-10) and an isolate from a patient with gastric ulcer (strain B128) [61]. Strain 98-10 was found to be closely related to East Asian strains, while strain B128 was related to European strains. They identified the strain-specific genes of strain 98-10 as candidate genes associated with gastric cancer. East Asian strains are known for their stronger carcinogenicity compared with strains from other areas. Kawai et al. investigated the evolution of East Asian strains using 20 whole genomes of Japanese, Korean, Amerindian, European, and West African strains [62]. Phylogenetic analysis revealed a greater divergence between the East Asian strains and the European strains in genes related to virulence factors, outer membrane proteins, and lipopolysaccharide synthesis enzymes. They examined positively selected amino acid changes and mapped the identified residues on the CagA, VacA, HomC, SotB, and MiaA proteins.

Recently, we took advantage of next-generation sequencers to read the genomic sequences of more than 40 H. pylori strains, mainly from Asian populations, and attempted de novo assembly (unpublished observation). Although we have not yet determined the whole genomes, we were able to construct contigs of substantial size and predicted 1200–1500 genes for each strain. Using these data, we determined orthologous genes among our samples and strains whose whole genomes have been released into public databases. A phylogenetic tree constructed from concatenated sequences of the orthologous genes was more reliable than a tree constructed from MLST data: it showed better branching with higher bootstrap values between hpEurope and hpAsia2, as well as between hspEAsia and hspAmerind. Data obtained by massively parallel sequencing provide valuable information on the genealogy of H. pylori strains, as well as on candidate drug resistance genes and new virulence factors.

5 Conclusion

H. pylori typing is very useful as a tool for tracking human migrations. Serial studies of large numbers of H. pylori strains from all over the world, including strains isolated from aboriginal populations, have shown that MLST analysis of H. pylori sequences provides more detailed information on human migrations than does human genetic analysis. However, many populations remain unstudied, including a number of isolated aboriginal populations in Siberia, Mongolia, China, Indonesia, and Japan (the Ainu), and it will be interesting to study H. pylori strains isolated from these groups. To date, subcategorization of East Asian strains (hspEAsia) has not been possible because of high homology among East Asian strains. However, the genotyping of virulence genes has shown that the vacA middle region can be useful for distinguishing strains from the northern parts of East Asia from those of the south. Genome-wide analyses using DNA microarrays or whole-genome sequencing give a broad view of the H. pylori genome, and these methods may compensate for the weaknesses of MLST and virulence factor typing. In particular, next-generation sequencers enable efficient investigation not only of the evolution of H. pylori but also of novel virulence factors and genomic changes related to drug resistance.