Introduction

Gastric cancer is the third leading cause of cancer-related mortality globally and the leading cause of infection-associated cancer mortality, responsible for approximately 723,000 deaths annually, or 10% of cancer-related mortality worldwide [1, 2]. It is most prevalent in low-income countries, where 5-year survival rates are often less than 15%, as highlighted in the recent CONCORD-3 global cancer survival analysis [3]. Helicobacter pylori is the principal cause of, and the strongest known risk factor for, gastric cancer [4]. It was the first bacterium linked to cancer and is classified by the World Health Organization as a class I carcinogen [5]. However, the inflammation H. pylori causes is necessary, but not sufficient, to cause gastric cancer, as only a small percentage of infections advance to severe disease [4, 6]. Gastric cancer incidence varies markedly by region, as illustrated by the “altitude enigma” in Latin America [7•]. In the USA, it represents a major health disparity, as non-whites have twice the incidence of whites [1].

H. pylori is a gram-negative spiral/rod-shaped bacterium that has evolved to survive a challenging gastric microenvironment. It has adapted to tolerate the low oxygen levels of the stomach, to raise gastric pH through the expression of urease, and to use flagella to colonize the gastric mucosa [8]. Similar to some other pathogens, including the agents of tuberculosis and malaria, H. pylori has co-evolved with humans. H. pylori is also a commensal bacterium that can be used to trace human migration [9••,10,11,12,13,14,15].

H. pylori infection is usually acquired in childhood, and about half of the global population is chronically infected [4]. Some regions in Central and South America have infection rates as high as 90% [16], but most of these infections result in only mild inflammation (“gastritis”) [4]. A minority progress to peptic ulcers (10–20%) or gastric adenocarcinoma (1–3%) [17, 18].

Symptoms of gastric disease do not manifest until the disease has advanced significantly, leading to late diagnoses and high mortality rates [6]. In high incidence/high resource settings such as eastern Asia, screening programs have been shown to decrease gastric cancer mortality, with endoscopy coupled with endoscopic resection for early gastric cancer [19]. H. pylori and host genetic factors, acting in concert with dietary, microbiome, and environmental factors, account for progression of a subset of patients along the so-called Correa cascade, from benign non-atrophic gastritis, to multifocal atrophic gastritis, intestinal metaplasia, dysplasia, and lastly adenocarcinoma [17, 20••]. Atrophy, intestinal metaplasia, and dysplasia are considered pre-malignant lesions. The severity of the atrophy and metaplasia is quantified by the OLGA/OLGIM system, and in the research setting by the Correa histopathology score [21].

As the majority of infected individuals do not develop advanced gastric disease, it has been hypothesized that genetic factors of either the host or pathogen mediate disease risk [22]. Studies have supported both genomes as playing a role in disease progression, but thus far they have failed to explain most of the risk. Knowledge of who is at risk of progressing to gastric cancer may help discover biomarkers and design more cost-effective interventions, especially in resource-limited settings where the majority of mortality occurs. H. pylori eradication in the gastritis stage may prevent gastric cancer [23], but universal population eradication is impractical and potentially harmful for a number of reasons [24]. For example, most infected individuals do not develop disease, so treating them may contribute to antibiotic resistance and disruption of the microbiome. Additionally, as a commensal with effects on the immune system, H. pylori may protect against gastro-esophageal reflux disease (GERD), esophageal carcinoma, atopic dermatitis, and childhood asthma [25,26,27,28,29]. Of note, for patients with pre-malignant lesions, H. pylori eradication may be helpful as an adjunctive intervention, but in general, endoscopic surveillance is needed, particularly in the absence of biomarkers for the risk of progression.

Whether H. pylori transitions from commensal bacterium to pathogen strongly depends on host-bacterial genetic interactions [30••]. Accurate and low-cost tools to detect how the two genomes interact and synergize are needed to predict when infection is likely to lead to gastric cancer. Detailed characterization of study subjects is required for multiple, parallel investigations of host/pathogen genetics, host microbiome, and other molecular determinants that together affect gastric cancer risk.

Altitude and Geography as Determinants of Gastric Disease

Gastric cancer has marked geographic variability at the regional, country, and in-country levels [1]. High incidence regions include eastern Asia, mountainous Latin America along the Pacific Ocean, and Eastern Europe [2]. Three geographic paradoxes have been observed: the “Altitude enigma”, the “African enigma”, and the “Asian enigma” [31•]. In Latin America, differences in gastric cancer risk have been observed between human populations living in low versus high altitude regions along the Pacific littoral, the “Altitude enigma” [7•, 30••, 32]. In South and Central America, gastric cancer mortality rates in high altitude regions may be as much as 6 times higher than in regions at sea level [7•]. It has been suggested that risk factors may cluster in the mountain villages. The “African enigma” refers to the phenomenon of relatively low gastric cancer incidence on most of the African continent, despite almost universal H. pylori infection [31•, 33,34,35]. The “Asian enigma” refers to the high incidence in the nations of eastern Asia, with a decreasing incidence in the westward transition to India [31•, 36]. These patterns have not been explained by studies of either human or H. pylori genetics in isolation.

These “enigmas” are challenged by studies that suggest that factors beyond mere geographical location, such as the genetics of the pathogen or host, are the principal explanatory determinants of gastric cancer risk. However, when analyzed in isolation, these factors have not adequately explained variation in gastric disease outcomes among those infected with H. pylori. The prevalence of H. pylori infection, or even of the known H. pylori risk gene cagA, in a population generally may not predict gastric disease or gastric cancer incidence [4, 37, 38], which indicates that other factors affect gastric cancer risk within a population. Analyses that incorporate the interaction of both host and pathogen genetics may be better at predicting gastric disease risk than any single factor. Observing gene networks and the impact of the discordance between the genetic ancestries of H. pylori and the infected human host has helped determine the likelihood of developing severe gastric disease [30••]. Investigation of these complex interactions may be important for the development of biomarkers and focused prevention programs, as well as gene discovery.

Genetic Determinants of Gastric Disease Risk

In H. pylori

H. pylori uses several genes and gene networks to thrive within the gastric microbiome. The cag pathogenicity island (PAI) is a series of H. pylori genes that encode the type IV secretion system (T4SS), which gram-negative bacteria use to transport virulence factors into host cells, where they disrupt key cellular processes [38, 39]. The T4SS is a key enabler of H. pylori pathology. The PAI also encodes CagA, the product of cagA (cytotoxin-associated gene A), a secreted, oncoprotein-like virulence factor that can cause a severe inflammatory response [40]. Presence of cagA is associated with increased gastric cancer risk and is prevalent in high incidence areas, but is not sufficient to predict who will develop disease [40,41,42].

The H. pylori gene vacA also contributes to gastric cancer risk. Located outside of the PAI, this gene encodes a pore-forming cytotoxin that causes vacuole formation in host cells, triggering epithelial cell apoptosis [38, 43]. It also suppresses the host T cell response to H. pylori [44]. The vacA s1/m1 genotype is associated with gastric disease [45]. Both cagA and vacA status and genotype are important carcinogenic risk factors, perhaps independently of host genotype, but they are not definitive determinants of risk for premalignant and malignant gastric lesions [4, 46, 47]. These genotypes alone are insufficient to explain differences in gastric cancer incidence among regions, or to serve as clinical biomarkers.

Other H. pylori virulence factors also contribute to a basal inflammatory state necessary for gastric disease progression. These include urease, which creates a microenvironment by neutralizing the low pH of the stomach; periplasmic nitrate reductase, which facilitates iron uptake; and arginase, which aids in H. pylori subversion of host macrophages [8, 38, 44]. The maintenance of basal chronic inflammation and subversion of the immune response also partly depends on host genetic factors and responses, discussed below. Disturbance of the host-pathogen balance results in overt clinical disease. Additionally, the H. pylori genome is highly plastic [48], with high recombination rates that enable rapid gain and loss of genetic elements that may be harmful to the host. Although each of the factors in the H. pylori genome described above is associated with gastric disease, together they explain only a very small proportion of attributable risk [49]. Even where they do increase risk, they are not always present, highlighting the plasticity of the H. pylori genome and the potential complexity of genomic causes of gastric disease.

Human Factors: Germline and Somatic

Several host genetic factors appear to play a role in influencing gastric disease risk. Both candidate gene studies and genome-wide association studies (GWAS) have been used to assess host genetic risk factors. Analyzed independently of the colonizing H. pylori, these explain a small proportion of the estimated heritability of gastric disease [49]. Based on candidate gene studies, host risk factors for gastric cancer include genes encoding cytokines and cytokine receptors, such as the interleukin-1 (IL-1) family of cytokines (including IL-1β, IL1RN), IL-8, IL-10, and TNF-α; stromal remodeling proteins such as matrix metalloproteinases; the MUC1, PRKAA1, and PTGER4 genes; and prostate stem cell antigen (PSCA), which acts as a tumor suppressor gene in gastric pathology [50,51,52,53,54,55]. In a seminal study by El-Omar et al., specific pro-inflammatory cytokine genotypes were associated with gastric cancer risk in the setting of H. pylori infection [56•]. In response to H. pylori infection, some IL-1 SNPs increase inflammation, decrease gastric acid production, and increase gastric cancer risk [57]. In the body of gastric cancer risk SNP literature, divergent risk SNPs have been observed in Asian versus non-Asian populations [58•].

GWAS have provided further insight into potential germline genetic factors. However, the generalizability of these results is not clear for at least two reasons. First, the majority of gastric cancer GWAS-based analyses have studied East Asian populations. Second, phenotyping in most of these studies has not been as precise as possible. Of the genetic loci listed above, only a few have been corroborated in non-Asian populations. One study that included non-Asians corroborated only the role of an IL1RN2 variant in gastric disease [58•]. A meta-analysis including non-Asians associated zinc finger domain transcription factors and PSCA with gastric disease [59]. This evidence was linked to biologically relevant and hypothesis-driven studies. The diverse phenotypes of gastric disease require detailed clinical characterization and sample stratification to yield generalizable results that include disease-associated genes that are both universal and population specific. Only one study has been performed using highly stratified samples [58•]; in it, gastric cancers were stratified by histologic subtype (intestinal or diffuse), anatomic sub-site, H. pylori infection status, geographic location (Asian vs. non-Asian), and a quantitative index of study quality.

As with other infectious diseases, GWAS of gastric cancer are less informative than those done in non-communicable diseases [60,61,62]. Human polymorphisms and polymorphism networks can confer variable effects in the context of different pathogen strains within the same study cohort. Many infectious disease phenotypes depend on complex interactions between host and pathogen genomes. Not surprisingly, the most informative GWAS of infectious disease susceptibility have been done for more genetically homogeneous organisms, like Mycobacterium leprae [63].

Two GWAS that assessed susceptibility to gastric cancer and H. pylori infection identified SNPs with odds ratios up to 1.4, but most were of uncertain biological function, and the results indicate that human genetic variation, considered in isolation, accounts for a small proportion of estimated gastric cancer risk [64,65,66] (Table 1). Larger meta-analyses are needed to confirm these associations and assess their importance. Inconsistent data collection affects results and may explain some of the variability in currently published findings.
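To give a sense of scale for an odds ratio of about 1.4, the following minimal sketch computes an odds ratio and an approximate 95% confidence interval from a 2 × 2 table of carriers and non-carriers among cases and controls. The counts are invented purely for illustration and are not taken from any of the cited studies; the function name and the Woolf (log-based) interval are assumptions of this sketch, not the method used in those GWAS.

```python
import math

def odds_ratio(case_exposed, case_unexposed, ctrl_exposed, ctrl_unexposed):
    """Return (OR, lower, upper) using a Woolf (log-based) 95% interval."""
    or_ = (case_exposed * ctrl_unexposed) / (case_unexposed * ctrl_exposed)
    # Standard error of log(OR) from the four cell counts.
    se = math.sqrt(1 / case_exposed + 1 / case_unexposed
                   + 1 / ctrl_exposed + 1 / ctrl_unexposed)
    lower = math.exp(math.log(or_) - 1.96 * se)
    upper = math.exp(math.log(or_) + 1.96 * se)
    return or_, lower, upper

# Hypothetical counts: risk-allele carriers vs. non-carriers
# among 1000 cases and 1000 controls (illustrative only).
or_, lower, upper = odds_ratio(350, 650, 278, 722)
print(f"OR = {or_:.2f} (95% CI {lower:.2f} to {upper:.2f})")
```

Even when such an association is statistically distinguishable from 1, an effect of this size shifts individual risk only modestly, which is consistent with the small proportion of risk explained by single host variants.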

Table 1 Genetic variants identified by GWAS for phenotypes related to infection by H. pylori

Human genetic and epigenetic factors may also interact to influence gastric disease risk. A study using ~ 200,000 SNPs, focused on immune response genes in Colombian populations, reported an association between severity of gastric lesions and both exonic SNPs and DNA promoter methylation of the gene GATA5 [67]. Promoter methylation of GATA5 and the exonic SNPs in GATA5 were independently associated with advanced gastric disease. The study also showed an interaction between GATA5 variants and promoter methylation of the gene, demonstrating that the association of either factor with gastric disease is modified by the presence of the other. These variants may have complex, pleiotropic effects on the development of gastric disease via modification of tumor suppressors, the inflammatory response, transcript stability, and editing and binding affinity of protein to DNA, as well as cell and tissue differentiation and development of the gastric mucosa. Changes in DNA methylation often occur non-specifically, as a function of aging or environmental factors, and some changes may occur in premalignant lesions, making methylation changes potentially useful biomarkers for gastric cancer risk [68,69,70].

Familial clustering of gastric cancers occurs in 5–10% of cases, and established and emerging germline mutations for the intestinal and diffuse subtypes have been recently summarized [71]. Familial clustering may be partially explained by shared environmental risk factors. Hereditary diffuse gastric cancer (HDGC), the first familial form to be described [72], is related to mutations in the E-cadherin gene (CDH1). More recent work has implicated germline mutations in PALB2, BRCA1, and RAD51C in both diffuse and non-diffuse gastric cancer [71]. Additional proof of segregation in families and molecular evidence from tumors are needed to demonstrate an underlying, common genetic basis. Family-specific mutations, if found, could be critical for stratification of gastric cancer risk and treatment as well as for further defining important risk pathways.

Comprehensive genetic analyses of gastric cancer tumors, led by the Cancer Genome Atlas Research Network (TCGA), have helped to elucidate various tumor somatic mutations and yielded a novel classification system based on molecular characterization [73]. TCGA used a series of platforms, including whole-exome sequencing, array-based DNA methylation profiling, mRNA sequencing, microRNA sequencing, reverse-phase protein array, and microsatellite instability testing, to demonstrate that gastric cancers tend to cluster into one of four groups: (1) EBV (9%), with extensive DNA promoter hypermethylation, (2) microsatellite instability (MSI, 21%), with high levels of mutation and hypermethylation of several genes, (3) chromosomal instability (CIN, 50%), with intestinal histology and extensive somatic copy number aberrations, or (4) genomically stable (GS, 20%), with diffuse histology. The TCGA initiative may also help identify specific molecular targets for precision gastric cancer treatment in the future.

Exogenous environmental factors also increase gastric cancer risk, but incompletely explain the distribution of disease risk. These include dietary salt and nitrosamine intake, H. pylori virulence factors, presence of non-Helicobacter gastric microbiota, medications, alcohol, tobacco, wood smoke exposure in cooking, and availability of micronutrients [74,75,76]. However, neither exogenous factors, nor host or pathogen genetic variation in isolation has sufficiently explained the distribution of gastric disease. This raises the possibility that context may affect disease risk. For example, the pathogenicity of an H. pylori strain varies, depending on genetic variation of the human host and some individuals are better adapted to specific strains than others.

Host genetics may mediate gastric cancer through the microbiome. In a multi-species model, other resident bacteria, as well as viruses and fungi, may act in concert with human and H. pylori genetic variants to influence gastric disease risk, but this has not yet been studied across all possible interactions. Some studies have investigated one premise of this hypothesis: whether human genetic variation helps shape overall microbiome composition and if so, how [77]. Human genes associated with metabolism, innate immunity, and vitamin D receptors may modestly influence microbiome composition [77,78,79,80,81]. An especially strong association appears to exist between microbiome composition and C-type lectins [77, 78], which are pattern recognition receptor molecules that recognize microbes and activate inflammation via the immune system. A certain microbiome milieu may create a more inflammatory environment that facilitates gastritis progression. Importantly, advanced atrophy and metaplasia result in hypochlorhydria with a decreased H. pylori burden and reconstitution of the non-H. pylori microbiota—which may influence further progression to dysplasia and adenocarcinoma [82]. Notably, these individuals, similar to H. pylori negative individuals, will have a more diverse gastric microbiome, compared to those who are H. pylori positive [82,83,84,85]. However, no host genetic determinants have linked microbiome characteristics with gastric disease risk. The role of human genetics in shaping the microbiome, which may in turn impact gastric disease risk, is difficult to explore [86]. Cohorts followed from birth to late adulthood will help clarify the interactions of individual genetics with exogenous factors in shaping the gut microbiome and its potential role in gastric disease.

Co-Evolution as a Determinant of Gastric Disease?

Recent work hypothesized that humans and H. pylori reciprocally impact each other’s evolution and that gastric cancer risk is higher in host-pathogen pairs that have not co-evolved [30••]. There is strong evidence that H. pylori has co-evolved with geographically defined human populations, moving out of Africa as a host-pathogen complex [87••]. H. pylori is usually transmitted vertically from parent to child, enabling host and pathogen genetic factors to be “co-inherited,” thus enabling commensalism [88]. Through co-evolution, chronic pathogens with vertical or familial transmission become less virulent over time [89,90,91,92,93], and H. pylori infections are well tolerated by humans, causing low-grade inflammation and possibly protecting against allergic and related disorders [25,26,27,28,29, 94, 95]. The H. pylori gene babA2 demonstrates adaptive microevolution with humans [96]. BabA binds blood group antigens and triggers pro-inflammatory cytokine release. Amerindians, who almost all carry the O blood group, harbor strains with a babA variant that has up to a 1500-fold greater O blood group binding affinity. Disrupted co-evolution within a human-H. pylori pair may be partly responsible for triggering more severe gastric disease [97•].

An ideal locale to explore putative H. pylori-human co-evolution is Latin America, particularly the Caribbean basin and surrounding nations. As a result of colonization, Latin America exhibits particularly diverse human and H. pylori ancestries [98,99,100]. A key motivation for exploring this concept is the observation that Amerindian people living at high altitude suffer disproportionately from gastric cancer, relative to other local populations [30••, 98]. In the mountains, gastric cancer incidence can be up to 25 times that in low altitude communities [100]. A study of the Colombian coast and mountains demonstrated almost universal H. pylori prevalence (~ 90%), with highly distinct gastric cancer rates [30••]. The low-risk coastal population was of mostly African (58%), European, and Amerindian ancestry, while the high-risk Andean population (10 × greater risk) was mainly of Amerindian ancestry (67%), with some European ancestry. H. pylori strains in Colombia also show evidence of substantial admixture, with the historical Amerindian strains being almost universally displaced by European strains [30••, 98]. The high degree of admixture of both humans and H. pylori in some areas of Latin America adds to the complexity of the co-evolution model and may further exacerbate gastric disease, but it has also provided an excellent natural experiment.

When the two Colombian sites were compared, gastric disease was less severe in H. pylori-human matched samples with similar ancestries [30••]. The proportion of African H. pylori ancestry in patients with primarily Amerindian host ancestry correlated with more severe disease. Patients with a primarily African ancestry, infected with African H. pylori, had less severe disease. In individuals with high levels of Amerindian ancestry, a high percentage of African H. pylori ancestry was associated with intestinal metaplasia, while a low percentage was associated with gastritis.

The difference in disease prevalence between the mountain and coast populations was accounted for by the interaction effect between African H. pylori and Amerindian host ancestry. When the data were modeled with this interaction, the altitude effect disappeared, and the interaction effect was approximately five times larger than the effect of cagA [30••]. Geographic location, cagA, and H. pylori or human ancestry, considered separately, were poor predictors of risk. The co-evolution model proposes that when H. pylori strains co-evolved with humans in Africa for millennia, the result was less severe gastric disease and reduced cancer risk [31•, 33,34,35, 101]. Since Amerindian people evolved separately from African H. pylori, colonization with these “novel” strains in Latin America caused a clash of ancestries, favoring gastric disease and mortality in Amerindian hosts. This suggests that considering ancestry from human samples and their H. pylori isolates, together, will identify individuals at greatest risk. It also implies that colonization at two levels has an impact on human disease.
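The logic of an altitude effect that disappears once a host-by-pathogen interaction is modeled can be illustrated with a minimal sketch. The data below are synthetic and invented for this example, the variable names are hypothetical, and ordinary least squares is used only as a simple stand-in; none of this reproduces the data or the statistical model of the cited study [30••]. In the simulation, disease severity is driven solely by the product of host Amerindian ancestry and H. pylori African ancestry, while altitude is merely correlated with host ancestry.

```python
import numpy as np

rng = np.random.default_rng(0)  # fixed seed so the sketch is reproducible
n = 500

# Synthetic, invented data (not from the cited study).
host_amerindian = rng.uniform(0, 1, n)  # fraction of host Amerindian ancestry
hp_african = rng.uniform(0, 1, n)       # fraction of H. pylori African ancestry
# Altitude is only a correlate of host ancestry here, not a cause of disease.
high_altitude = (host_amerindian + rng.normal(0, 0.2, n) > 0.5).astype(float)
# Severity is driven by the host-by-pathogen interaction alone, plus noise.
severity = 1.0 + 3.0 * host_amerindian * hp_african + rng.normal(0, 0.1, n)

def fit(columns, y):
    """Ordinary least squares with an intercept; returns the coefficients."""
    X = np.column_stack([np.ones(len(y))] + columns)
    return np.linalg.lstsq(X, y, rcond=None)[0]

# Model 1: altitude only.
altitude_only = fit([high_altitude], severity)
# Model 2: altitude plus both ancestries and their product (interaction) term.
full = fit([high_altitude, host_amerindian, hp_african,
            host_amerindian * hp_african], severity)

print("altitude coefficient, altitude-only model:", round(altitude_only[1], 2))
print("altitude coefficient, interaction model:  ", round(full[1], 2))
print("interaction coefficient:                  ", round(full[4], 2))
```

In the altitude-only model, altitude appears to carry a sizeable effect because it tracks host ancestry. Once the ancestry terms and their product are included, the altitude coefficient shrinks toward zero and the interaction term absorbs the signal, which is the qualitative pattern described above.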

Genome-by-genome interactions that take into account both human host and pathogen genomes should be considered in genetic models of complex, infectious disease, where there is evidence of long-term co-evolution. Examples beyond H. pylori include tuberculosis and human papillomavirus [97•]. Hence a bacterium’s pathogenicity may depend on human and pathogen genetic factors that are not independent of each other. Such interactions may play an important role in determining the etiology of infectious disease. One genetic variant in either host or pathogen may not be harmful except in the context of the other organism. An individual host may inherit alleles that evolved in a different environment than that of their infecting H. pylori; thus, gastric disease may be influenced by a large number of significant human and H. pylori genetic interactions, and the effect size of any single two-locus interaction may be small, but this is yet to be determined. Similarity in ancestry between host and H. pylori may be an excellent proxy for the paired genetic variation that affects gastric disease risk, while specific loci that confer risk are still being mapped. This creates a new “genetic architecture” to explore, which consists of polygenic susceptibility to infectious disease, influenced by host-pathogen ancestry. Exploring gene-by-gene interactions will be a crucial next step, enabling the discovery of specific genetic loci that are risk determinants, but only in the context of host and pathogen interactions.
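As a purely illustrative sketch of how similarity in ancestry might be summarized as a single proxy value, the following computes a discordance score between host and H. pylori ancestry proportions. The function name, the shared ancestry labels, and the choice of total variation distance are all assumptions made for this example; the cited studies do not necessarily define discordance this way, and the proportions shown are invented.

```python
def ancestry_discordance(host, pathogen):
    """Total variation distance between two ancestry-proportion mappings.

    Returns 0.0 for identical ancestry profiles and 1.0 for profiles
    with no shared ancestry. Labels missing from one mapping count as 0.
    """
    labels = set(host) | set(pathogen)
    return 0.5 * sum(abs(host.get(k, 0.0) - pathogen.get(k, 0.0))
                     for k in labels)

# Hypothetical, invented ancestry proportions (each mapping sums to 1).
matched = ancestry_discordance(
    {"African": 0.6, "European": 0.3, "Amerindian": 0.1},
    {"African": 0.7, "European": 0.3},
)
mismatched = ancestry_discordance(
    {"Amerindian": 0.7, "European": 0.3},
    {"African": 0.6, "European": 0.4},
)
print(f"matched pair:    {matched:.2f}")
print(f"mismatched pair: {mismatched:.2f}")
```

A score of this kind treats ancestry as a coarse stand-in for the many paired loci that are not yet mapped; it would need to be validated against disease outcomes before being used for risk prediction.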

H. pylori studies provide some of the best evidence in favor of human-pathogen co-evolution, based on its vertical transmission, long-term colonization of individual hosts, and its approximately 50,000-year association with humans [9••,10,11, 87••, 95, 101]. H. pylori-mediated gastric disease disproportionately occurs in men, while H. pylori is usually transmitted by the mother, which may also indicate the influence of co-evolution, where female fitness may have been more strongly constrained against H. pylori virulence [97•, 102, 103]. Although onset of H. pylori disease typically occurs during reproductive years, disease usually advances to clinical stages only in late adulthood [97•]. H. pylori also recombines often among multiple strains, potentially rapidly acquiring, through horizontal gene transfer, segments that have not co-evolved with local hosts [104,105,106,107]. This could disrupt co-evolution and select for increased virulence, especially in regions where humans and H. pylori are highly admixed, such as South America. Therefore, H. pylori and human ancestral groups must be considered in the context of each other in order to predict gastric disease accurately. As shown with the Latin America altitude effect described above, geographic location is not a significant factor when host-pathogen genetic ancestry is included in the model [30••].

As a highly recombinogenic bacterium, H. pylori facilitates the study of co-evolution. Introduction of new microbial competitors and human milieus pressured H. pylori to evolve, which may harm the host and in turn influence disease and evolution within the host population. This perpetual host-pathogen evolutionary response, involving different genes and pathways, is likely regionally unique. Humans and pathogens migrating to new environments, or admixing, disrupt co-evolutionary equilibrium and any complementarity developed between host and pathogen. Co-evolutionary interactions tend to promote geographic and spatial variation in disease outcomes, influenced by the local genetic and environmental dynamics that progress towards a unique co-evolutionary equilibrium. Though the role of co-evolution is difficult to definitively prove, all major criteria are met in the case of H. pylori and gastric cancer, where patterns of parallel host-pathogen genetic variation have correlated with functional, molecular changes.

Conclusion

A survey of the current literature on the complex etiology and genetic epidemiology of H. pylori and gastric disease points strongly to a model where both human and H. pylori genetics, including ancestry, influence gastric disease susceptibility, pathogenicity, and progression, depending on each other. Additional studies, in a range of diverse geographic locations, are needed to establish this hypothesis.

If true, an understanding of local co-evolutionary history, combined with basic tests of individual human and H. pylori ancestry, may serve as a biomarker to facilitate targeted eradication of H. pylori only in individuals at greatest risk, i.e., those with non-co-evolved H. pylori. On a macro level, discordance of ancestry between host and pathogen may serve as a proxy for predicting gastric disease. Because local intrinsic and extrinsic factors influence host-pathogen co-evolution, genetic loci involved in predicting disease may be specific to a geographic region. Studies in different regions will be needed to characterize local co-evolutionary trends. For example, an ancestry-specific co-evolutionary model that applies in Latin America may not apply in other regions with high gastric cancer incidence, such as eastern Asia. Further elucidation of the co-evolutionary model of disease may contribute a significant paradigm, useful in the study of gastric disease and beyond.