1 Introduction

Helicobacter pylori is a Gram-negative, microaerophilic, spiral bacterium that is capable of colonizing the inhospitable environment of the human stomach. Though infection rates vary greatly by region, the bacterium infects half of the human population across the globe, imparting a huge medical burden (Dunn et al. 1997). Despite its profound importance, the recognition and acceptance of this bacterium as medically important required many years of research and observation. Indeed, as far back as 1892, Giulio Bizzozero observed and noted the presence of spirilli in the stomachs of healthy dogs (as reviewed in Figura and Oderda 1996; Marshall 2001). Throughout the twentieth century, researchers reported observations of spiral organisms in the stomachs of dogs, cats, and macaques (Doenges 1938, 1939; Kasai and Kobayashi 1919), and in humans presenting with gastric disturbances (Freedberg and Barron 1940). These reports led to the production of electron micrographs of Helicobacter inside a human parietal cell (Ito 1967) and in the stomachs of dogs (Lockard and Boler 1970). Finally, the groundbreaking advances of Marshall and Warren enabled the isolation and culture of this organism from human biopsies (Marshall et al. 1984; Marshall and Warren 1984; Warren and Marshall 1983), and ultimately, their research defined a link between H. pylori infection and development of gastritis and peptic ulcer disease (Marshall et al. 1985a, b).

The unique niche of H. pylori within the acidic environment of the stomach is a testament to its co-evolution with and adaptation to its human host. The publication of the first H. pylori chromosome sequence in 1997 uncovered a small 1.6 Mb genome characterized by a paucity of regulatory factors, limited metabolic abilities, and minimal biosynthetic capabilities compared to similar organisms (Tomb et al. 1997). This compact, restricted genome resulted from countless years of adaptations to its host and the accompanying acidic mucosal environment. Comparative genomic analyses have revealed that H. pylori strains display a great deal of genome-wide diversity (de Reuse and Bereswill 2007; Salama et al. 2000). Indeed, as detailed by Yamaoka and co-workers in Chapter 1, phylogenetic analyses have revealed several genetically distinct groups of H. pylori that can be linked to human migration (de Reuse and Bereswill 2007; Yamaoka et al. 2008). Furthermore, these various groups of H. pylori exhibit different polymorphisms within multiple genes that have been further linked to severity of the gastric diseases within the host (Chen et al. 2016; Yamaoka et al. 2008). Additionally, the plasticity of the H. pylori genome is further highlighted through the micro-evolution and genetic variability of strains identified within a single human host (de Reuse and Bereswill 2007), as well as through the passage and isolation of strains using established animal models (Asim et al. 2015; Draper et al. 2017; Franco et al. 2005).

As genomic analyses of bacterial species have become more accessible, studies that investigate genomic variations of H. pylori strains have abounded, and these studies have identified associations between particular genetic polymorphisms, geographic location of the host, and ultimate disease outcomes. Indeed, H. pylori can cause a range of gastric maladies within the human host, and occurrence rates of these various diseases vary widely throughout the world (Kusters et al. 2006). Though a large portion of infected individuals remain asymptomatic, the primary presentation of H. pylori infection is through chronic gastritis and peptic ulcer disease (Atherton 2006). More severe manifestations of disease include gastric adenocarcinoma and MALT lymphoma (Atherton 2006; Kusters et al. 2006). More detailed information on H. pylori-associated diseases is presented in Chapters 7 and 8 by Vieth and colleagues and Hatakeyama and co-authors, respectively. Interestingly, multiple studies have recently, yet controversially, indicated a protective effect conferred by H. pylori infection against several diseases (Arnold et al. 2012; Eross et al. 2018; Kyburz and Muller 2017; Robinson 2015; White et al. 2015). These studies have led to a larger debate about whether a bacterium that colonizes 50% of the world’s population should be considered a part of the normal flora and whether infection should be left untreated in the young and in asymptomatic patients (Cover and Blaser 2009; Kyburz and Muller 2017; Robinson 2015). However, on the other side of this debate lies the fact that gastric cancer is the third most common cause of cancer-related death in the world (International Agency for Research on Cancer 2012). Thus, the risk of disease may outweigh the potential benefits of this organism. Overall, the seemingly contradictory findings underscore the need to investigate and understand the differences between H. pylori strains that may contribute to various disease outcomes; such an understanding may reveal those patients that are most at risk and help to define those that should receive treatment (Fig. 1).

Fig. 1

Factors influencing H. pylori’s disease potential in the gastric mucosa

Geography plays an important role in the ultimate outcome of H. pylori infection; this is due to geographical influences on H. pylori polymorphisms, host polymorphisms, and environmental factors that contribute to pathogenicity. Possible gene polymorphisms of H. pylori are represented by two different colored bacteria in the diagram, and host polymorphisms are represented by two different colored host cells. Environmental factors to which a host may be exposed are identified by two separate colored shapes. An individual has the potential to be infected with different H. pylori variants and to be exposed to various environmental factors. As such, the complex interplay between the polymorphisms within H. pylori, the polymorphisms of the host, and the environmental factors will determine the ultimate disease potential of H. pylori within the host. As illustrated, H. pylori can remain a member of the normal gastric flora and cause no symptoms in the infected individual, or H. pylori can become pathogenic and induce clinical symptoms within the host.

The persistent colonization of the harsh environment of the stomach is accomplished through a delicate interplay between the host and the bacterium. Indeed, polymorphic genetic factors in both the bacterium and human host influence the course of infection and contribute to the ultimate disease pathology experienced by the host. An increasing number of epidemiological studies, in conjunction with molecular studies, have revealed roles for specific polymorphisms in both the host and bacterium that directly contribute to the various gastric pathologies caused by H. pylori. While Clyne and colleagues discuss polymorphic factors within the host that influence disease outcome in Chapter 9, the current chapter will focus on specific genetic polymorphisms within H. pylori that impact disease pathology in some populations.

2 Virulence Factors

To survive the gastric milieu and to persistently colonize the human stomach, H. pylori possesses multiple virulence factors that contribute to its infectious strategy. First, following ingestion, H. pylori must transit through the acidic gastric lumen, during which it employs multiple sheathed flagella for motility and the urease enzyme to combat acid stress (Kao et al. 2016) and follows a urea gradient to locate the epithelium (Huang et al. 2015). Next, it must colonize the gastric mucosa and adhere to the gastric epithelial cells, which it expertly accomplishes with an arsenal of adhesins and outer membrane proteins (OMPs) (Kao et al. 2016; Wroblewski et al. 2010). Lastly, it directly affects the gastric mucosal cells through the injection of CagA into the host cells and the secretion of VacA into the mucosal layer (Kao et al. 2016; Tegtmeyer et al. 2009; Wroblewski et al. 2010). These virulence factors can cause extensive damage to gastric cells, leading to the inflammation that characterizes the clinical diseases many patients experience.

Initial epidemiological studies of H. pylori have unveiled a correlation between disease severity and the presence of specific virulence factors. Excitingly, the expansion of genomic sequencing tools has led to the identification of polymorphisms within several virulence factors, and these polymorphisms also appear to be linked to the varying disease outcomes within infected individuals. However, large differences exist in the prevalence of various polymorphisms among strains from different geographic regions across the globe. Consequently, associations between the many polymorphisms exhibited by H. pylori and the various clinical disease manifestations vary tremendously depending on the geographic location of the research study and associated strains. For example, East Asian strains tend to exhibit less genotypic heterogeneity while harboring more virulent genotypes (Kim et al. 2015); this region of the world also has the highest global incidence of gastric cancer (Yamaoka et al. 2008). Conversely, Western strains exhibit a greater diversity in genotype distributions (Kim et al. 2015), contributing to a greater variation in the manifestation of clinical disease (Yamaoka et al. 2008). Thus, the geographic origin of strains used in studies must be taken into account when analyzing data from the numerous available epidemiological studies.

As described above, disease development within the human host results from a complex interaction between host polymorphisms, H. pylori polymorphisms, and environmental factors. The prevalence of these polymorphisms and environmental factors varies throughout the world, which adds another geographic layer of complexity to the study of H. pylori infection. Our approach in this chapter is to highlight findings for certain genes that have been linked to disease outcome. In particular, we focus on the polymorphisms identified in the OMPs, VacA and CagA, since these factors substantially contribute to the course of disease progression. Though certainly not exhaustive, we compare and contrast a selection of studies for each factor, since results from different studies are often not in agreement. The lack of unity among published studies underscores the complexity of disease development during the course of H. pylori infection.

2.1 Adhesins and OMPs

Within the compact genome of H. pylori, an astonishing 4% of the predicted genes are believed to encode OMPs (Alm et al. 2000; Chmiela et al. 2017; Oleastro and Menard 2013). This large number of OMPs suggests that these factors are a critical component of the H. pylori lifecycle. As such, efforts to define the expression and function of the many OMPs encoded by the H. pylori genome are a major focus of many ongoing research studies. Interestingly, unlike many other Gram-negative microorganisms, H. pylori does not possess a predominant OMP; instead, it employs its repertoire of diverse OMPs as needed (Alm et al. 2000; Oleastro and Menard 2013). Broadly speaking, the H. pylori OMPs are classified into several families of proteins, which include the Helicobacter outer membrane protein (Hop) family, the Hop-related (Hor) proteins, the Helicobacter OMP family (Hof), and the Helicobacter outer membrane (Hom) family (Matsuo et al. 2017). Sequencing of the first H. pylori genome revealed approximately 21 Hop proteins, some of which are involved in adherence to the gastric epithelium. In particular, subsequent studies have identified the Bab proteins (HopS/HopT), the Sab proteins (HopO/HopP), the OipA protein (HopH), and the HopQ protein as important components for interaction with host cells during the course of infection (Matsuo et al. 2017). While BabA is the blood-group-antigen-binding adhesin that binds to Lewis B (Leb) blood group antigens (Ilver et al. 1998), SabA is a sialic acid-binding adhesin that interacts with sialyl-Lewis a, sialyl-Lewis x, and Lewis x receptors on host cells (Mahdavi et al. 2002). OipA is the outer inflammatory protein that is capable of inducing inflammatory cytokine production (Yamaoka et al. 2002). Finally, HopQ interacts with various proteins of the carcinoembryonic antigen-related cell adhesion molecule (CEACAM) family (Matsuo et al. 2017; Moonens et al. 2018). These various proteins and their polymorphisms are described in detail below.

2.1.1 Bab Proteins

The bab genes consist of three paralogues that contain extremely similar sequences: babA, babB, and babC (Alm et al. 2000; Ansari and Yamaoka 2017; Colbeck et al. 2006; Kim et al. 2015) (Fig. 2). Although the three bab genes contain highly similar sequences, BabA is the only protein encoded by these genes that has been shown to bind to Leb antigens (Oleastro and Menard 2013); BabA was the first identified H. pylori adhesin (Ilver et al. 1998). In addition, two primary allelic variants of babA have been identified in clinical isolates: babA1 and babA2 (Fig. 2). The babA2 allele contains the entire coding sequence of the babA gene, which results in the production of the BabA protein. In contrast, the babA1 variant contains a deletion of 10 base pairs, which includes the ATG translation initiation codon; this deletion results in a loss of translation of this protein in H. pylori strains containing this gene variant (Ilver et al. 1998). Additionally, the 5′ end of the babA gene contains a string of cytosine-thymine (CT) repeats, which leads to deletion or insertion of nucleotides during DNA replication (Salaun et al. 2004; Solnick et al. 2004). This variability in CT repeats causes phase variation, whereby subsequent cells carry babA genes that encode truncated, non-functional proteins. The bab status of H. pylori is further complicated by the existence of three distinct genomic loci in which the bab genes can be found (Colbeck et al. 2006; Oleastro and Menard 2013; Pride and Blaser 2002) (Fig. 2).

Fig. 2

OMP polymorphisms in bab and hom genes

bab polymorphisms occur in both the gene sequences and the genomic loci in which the bab genes are located. The three bab genes, babA, babB, and babC, are depicted as three different colors. The babA gene has two alleles. The babA1 allele contains a 10 bp deletion and lacks the ATG translation initiation codon; thus, cells with this allele produce no BabA protein. The babA2 allele contains the ATG codon necessary for translation, which permits cells to produce the BabA protein. As noted, strains with the babA2 allele have been linked to disease in Western countries. Additionally, the bab genes can recombine to form chimeric bab sequences; the babA/B gene is depicted with the combined colors of the babA and babB genes. The presence of the babA/B gene has been linked to disease in Iranian strains. Polymorphisms in the bab genes also appear in the genomic loci occupied by the genes. As depicted, there are three loci that can be occupied by the bab genes. Specific examples of bab genes occurring at one, two, or three loci are shown; numerous unshown genomic combinations of the bab genes are possible. In addition, the diagram displays the babA/B gene in locus A and locus B, which has been linked to disease in Taiwanese strains. Furthermore, the genotype at locus B has been linked to disease in Korean strains; identified combinations shown to occur with the empty locus B are indicated.

hom polymorphisms also occur at the level of both the gene and the genomic loci. Two of the hom genes, homA and homB, are displayed as two separate colors. homA has been linked to non-ulcer disease, and the presence of homB has been correlated with disease in Western and Iranian strains. The two genomic loci that can be occupied by the homA and homB genes are depicted with the various combinations of the genes that can be found at these loci. A double copy of the homB gene has been linked to peptic ulcer disease.

Varying results have been obtained from the multitude of epidemiological studies that have been conducted to investigate the relationship between babA gene status and disease outcome. As noted earlier, genetic variations within H. pylori are frequently observed in conjunction with the geographic origin of the clinical isolate under investigation. As such, studies that investigated the influence of the babA genotype on disease status report differing associations that depend on the origin of the strain. In general, studies that employ isolates from Western countries tend to identify a significant association between the babA2 genotype and disease outcome, whereas studies that utilize strains from East Asian countries frequently fail to detect a significant correlation between the babA2 genotype and disease. Specifically, studies conducted using strains that originate from Western countries identify a correlation between the babA2 genotype and development of gastritis (Homan et al. 2014), duodenal ulcers (Gerhard et al. 1999; Olfat et al. 2005; Torres et al. 2009), intestinal metaplasia (Zambon et al. 2003), and adenocarcinoma (Gerhard et al. 1999) (Fig. 2). Furthermore, a meta-analysis of 25 research articles also reveals a correlation between the babA2 genotype and peptic ulcer disease and duodenal ulcers within Western countries (Chen et al. 2013). Interestingly, a recent genome-wide association study, which compares the entire genomes of 173 European H. pylori clinical isolates, investigates the different frequencies in the occurrence of genes and single nucleotide polymorphisms (SNPs) that occur in strains isolated from gastritis patients versus those obtained from gastric cancer patients. Of note, one of the observed differences is a significant association between the presence of the babA gene and gastric cancer in these European clinical isolates (Berthenet et al. 2018). 
Although multiple studies describe a link between babA and disease, some studies do not confirm a broad link across all Western countries (Oleastro et al. 2009b) and actually report variations in babA association that depend on the Western country from which the isolates originate (Olfat et al. 2005).

Unlike the association between babA and disease outcome frequently observed in strains from Western countries, studies of isolates from East Asian countries have largely been unable to find a correlation between the babA2 genotype and disease status (Chomvarin et al. 2008; Mizushima et al. 2001; Oleastro et al. 2009b). However, one study did show an association of babA2 strains with pre-neoplastic lesions in Chinese patients (Yu et al. 2002). It is worth noting that traditional PCR genotyping of babA in strains may not provide the best information on babA status or the correlation of bab genes and disease (Fujimoto et al. 2007). Indeed, a study in Taiwan highlights the potential role played by the particular genomic loci occupied by the bab genes; the babA/B genotype, which occurs through the recombination of the babA and babB genes, associates with pre-cancerous lesions and gastric cancer when present in both locus A and B (Sheu et al. 2012) (Fig. 2). Similarly, the study of Korean isolates reveals that the bab genotype at locus B associates with disease type (Kim et al. 2015). In particular, strains lacking a bab gene in locus B are more likely to originate from gastric ulcer or gastric cancer patients (Kim et al. 2015).

Additional data that indicate a link between bab gene status and disease outcome arise from the study of Asian strains, specifically from Iraq and Iran. In fact, the study of clinical isolates from Iraq shows an association between the babA2 genotype and peptic ulcer disease, which is similar to the correlations observed in Western countries (Abdullah et al. 2012). Moreover, as chronic infection leads to the formation of chimeric bab genotypes (Matteo et al. 2011), a study using Iranian strains reveals an increased risk for duodenal ulcer development with a babA/B genotype as well as a low Leb binding (Leb) phenotype (Saberi et al. 2016) (Fig. 2). This more complicated association between bab genotype and disease status is similar to that reported in studies from East Asian countries.

To further confound study of the bab genes, phase variation of babA and homologous recombination among the bab genes and their genomic loci allow strains to lose BabA protein production throughout the course of infection in animal models (Hansen et al. 2017; Liu et al. 2015; Ohno et al. 2011; Solnick et al. 2004; Styer et al. 2010). Indeed, a study involving isolates from Western and East Asian countries correlates increased disease severity with low production of BabA (Fujimoto et al. 2007). The varying characteristics of the bab genes are also hypothesized to contribute to H. pylori’s ability to adapt to the gastric niche through the modulation of its adherence profile (Liu et al. 2015; Solnick et al. 2004; Styer et al. 2010). Thus, the complexity of bab polymorphisms perhaps hinders the identification of a clinical link between babA genotype and disease outcome in more populations.

As described above, the multitude of epidemiological studies that attempt to define a link between bab and disease outcome produce varied results predominantly influenced by geography. Collectively, studies of strains from Western countries tend to identify a link between the babA2 genotype and disease status, whereas research conducted on strains from East Asian countries reveals a more complicated relationship between bab and disease outcome.

2.1.2 Hom Proteins

H. pylori carries a small family of four OMPs called the Hom family (Alm et al. 2000). Most published studies focus specifically on homA and homB, both of which can be found at two distinct but overlapping genomic loci (Oleastro et al. 2008, 2009a) (Fig. 2). The two genes are 90% identical (Alm et al. 2000), and each gene has allelic variants (Oleastro et al. 2009a). Functional and epidemiological studies focus on the presence and copy number of homA and homB. In vitro, HomB has been shown to stimulate IL-8 secretion and to promote H. pylori adherence (Oleastro et al. 2008). Furthermore, the presence of homB is associated with peptic ulcer disease (Oleastro et al. 2008, 2010) in strains from Western countries but not in strains from East Asian countries (Oleastro et al. 2009b) (Fig. 2). Comparatively, homA correlates with non-ulcer disease in children and adults (Oleastro et al. 2008, 2009b). In terms of more severe disease, homB is associated with gastric cancer (Talebi Bezmin Abadi et al. 2011), though the association is geographically dependent (Hussein 2011; Kang et al. 2012). This geographic disparity is hypothesized to be a result of the variation in homA and homB gene status as well as the genomic loci at which the genes are carried in various strains (Kang et al. 2012). Indeed, homA and homB gene profiles appear to vary greatly based on the geographic origin of the strain (Servetas et al. 2018). Copy number also appears to be relevant, as there is a correlation between strains carrying two copies of homB and the occurrence of peptic ulcer disease (Oleastro et al. 2009b) (Fig. 2). Thus, the complicated association between the hom genes and disease status depends not only on homA versus homB genotype but also on geographic location, genomic locus, and gene copy number.

2.1.3 HopQ

HopQ is yet another H. pylori adhesin. This protein binds to carcinoembryonic antigen-related cell adhesion molecule (CEACAM) receptors on gastric epithelial cells and aids in the translocation of CagA (Belogolova et al. 2013; Javaheri et al. 2016; Koniger et al. 2016; Moonens et al. 2018; Tegtmeyer et al. 2019). Evaluation of multiple H. pylori strains reveals that two predominant alleles exist for hopQ, named type I and type II (Cao and Cover 2002). Both of these allelic variants are capable of binding CEACAMs (Moonens et al. 2018), and strains can carry both variants within their genomes (Cao and Cover 2002; Chiarini et al. 2009). Strains from East Asian patients primarily carry hopQ type I, while strains from Western countries can carry either type (Cao et al. 2005). Several studies indicate an association between the presence of the hopQ type I allele and peptic ulcer disease (Cao and Cover 2002; Leylabadlo et al. 2016; Oleastro et al. 2010). Additionally, hopQ I, carried as either a single allele or in combination with a type II allele (hopQ I/II), correlates with gastritis and peptic ulcers (Chiarini et al. 2009). While the hopQ type I allele is also linked to gastric cancer (Talebi Bezmin Abadi and Mohabbati Mobarez 2014; Yakoob et al. 2016), an additional study correlates both hopQ I and II with gastric cancer in Iran (Leylabadlo et al. 2016). However, another study fails to find an association between hopQ allele and disease outcome (Ohno et al. 2009). Taken together, the data primarily indicate an association between the hopQ type I allele and gastric disease. The precise role of HopQ in the translocation of CagA, however, remains unclear and requires further study.

2.1.4 Additional OMPs

As previously discussed regarding the babA gene, other OMPs undergo phase variation through the presence of dinucleotide repeats within the 5′ region of their coding sequence (Salaun et al. 2004). This phase variation results in the classification of genes as being either “on” or “off” depending on the insertion of a premature stop codon. This polymorphic characteristic has been utilized in studies to identify associations between the “on” or “off” status of particular genes and the disease status of the patient; such studies have included the OMPs encoded by sabA, sabB, and oipA. As with other studies attempting to link specific polymorphisms to disease state, conflicting and controversial results have been reported, some of which are described below.
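The “on”/“off” logic described above can be illustrated with a minimal sketch. The toy sequence below is entirely hypothetical and is not a real H. pylori gene; the helper names toy_gene and phase_status are likewise illustrative assumptions. The sketch only shows how changing the number of CT dinucleotide repeats shifts the reading frame, so that a stop codon is either reached at the intended end of the coding sequence (“on”) or encountered prematurely (“off”).

```python
# Toy illustration of phase variation through dinucleotide (CT) repeats.
# The sequence is invented for demonstration; it is NOT a real OMP gene.

STOP_CODONS = {"TAA", "TAG", "TGA"}


def toy_gene(ct_repeats: int) -> str:
    """Build a hypothetical coding sequence with a CT repeat tract near the 5' end."""
    start = "ATG"                    # translation initiation codon
    leader = "AAA"                   # one arbitrary in-frame codon
    tract = "CT" * ct_repeats        # the phase-variable repeat tract
    tail = "GCTGACGTAACCGGATAA"      # in frame only when the tract length is a multiple of 3
    return start + leader + tract + tail


def phase_status(cds: str) -> str:
    """Return "on" if the first in-frame stop codon is the final codon, else "off"."""
    codons = [cds[i:i + 3] for i in range(0, len(cds) - 2, 3)]
    for index, codon in enumerate(codons):
        if codon in STOP_CODONS:
            is_final = index == len(codons) - 1 and len(cds) % 3 == 0
            return "on" if is_final else "off"
    return "off"  # no in-frame stop codon at all: treated as not intact


if __name__ == "__main__":
    for n in range(2, 7):
        print(n, "CT repeats ->", phase_status(toy_gene(n)))
```

In this toy example, only repeat counts that keep the tract length a multiple of three (here 3 or 6 repeats) leave the frame intact. As noted for sabA, a sequence-based “on” call of this kind does not guarantee that the protein is actually produced.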

SabA is an adhesin that binds sialyl-Lewis a and sialyl-Lewis x antigens, whereas its homolog, SabB, does not bind these antigens (de Jonge et al. 2004; Mahdavi et al. 2002). Like the bab genes, the sab genes can undergo recombination with one another. However, studies suggest that selective pressure to maintain sabA exists in the host (Talarico et al. 2012). Investigation of clinical isolates shows that the “off” status of sabB is associated with duodenal ulcers (de Jonge et al. 2004). That same study, along with other studies, fails to identify any link between sabA and disease status (Chiarini et al. 2009; de Jonge et al. 2004; Oleastro et al. 2010; Yadegar et al. 2014; Yanai et al. 2007). In contrast, another study reports that sabA status is correlated with intestinal metaplasia and gastric cancer, but negatively associated with duodenal ulcers (Yamaoka et al. 2006). It is worth noting that the identification of the “on” status for sabA, which is often determined by sequencing of the gene, does not always reflect the actual production of SabA protein by individual isolates (Sheu et al. 2006). This finding likely affects the interpretation of all sequencing-based studies that have looked at the “on/off” status of sabA and sabB. Thus, the data linking sabA and sabB to disease outcome remain ambiguous.

Outer inflammatory protein A (OipA/HopH) is encoded by the phase-variable oipA gene (Yamaoka et al. 2000) and is involved in adhesion (Dossumbekova et al. 2006) and stimulation of IL-8 secretion (Yamaoka et al. 2002). While oipA “on” status is linked to peptic ulcers (Markovska et al. 2011; Oleastro et al. 2010), duodenal ulcers (Yamaoka et al. 2002, 2006) and gastric cancer (Yamaoka et al. 2006), other studies do not identify an association between oipA status and gastric pathologies (Torres et al. 2014) or disease outcome (Chiarini et al. 2009; de Jonge et al. 2004; Farzi et al. 2018; Zambon et al. 2003). Interestingly, the oipA “on” status associates with peptic ulcer disease in children but not in adults (Oleastro et al. 2008). Furthermore, a meta-analysis study identifies an association between oipA status and increased risk for peptic ulcer disease and gastric cancer (Liu et al. 2013). As described for other H. pylori factors, a selection of published studies identifies a link between oipA and disease status (Liu et al. 2013; Markovska et al. 2011; Oleastro et al. 2008, 2010; Yamaoka et al. 2002, 2006), while several others do not (Chiarini et al. 2009; de Jonge et al. 2004; Farzi et al. 2018; Zambon et al. 2003). However, the actual host cell receptor for OipA is still unclear and needs to be identified.

2.2 iceA

Although it is not an OMP, an effector, or a toxin, the induced by contact with epithelium (iceA) gene has been identified as a locus that correlates with gastric disease state (Peek Jr. et al. 1998). The iceA gene has two different alleles: iceA1 and iceA2 (Figueiredo et al. 2000; Peek Jr. et al. 1998). The iceA1 allele exhibits sequence homology to an endonuclease found in Neisseria (Figueiredo et al. 2000). The initial identification of iceA reveals an association between iceA1 and peptic ulcers as well as increased IL-8 in the mucosa (Peek Jr. et al. 1998; van Doorn et al. 1998). Subsequently, in a study of South African strains, iceA1 associates with gastric cancer, while iceA2 variants correlate with peptic ulcer disease (iceA2D) and gastritis (iceA2C) (Kidd et al. 2001). In Venezuelan strains, the iceA2 genotype is a marker for atrophic gastritis, especially in combination with particular host genetic polymorphisms (Chiurillo et al. 2010); iceA2 is also correlated with chronic gastritis (Caner et al. 2007). In contrast, a larger analysis of several published studies reveals an association between iceA1 and peptic ulcer disease in China (Huang et al. 2016), while a study of Iranian isolates indicates a correlation between gastric cancer and iceA1 (Dadashzadeh et al. 2017). The iceA1 allele is also associated with duodenal ulcers (Caner et al. 2007). However, several studies fail to identify a definitive link between iceA genotype and clinical disease outcome (Abdullah et al. 2012; Chomvarin et al. 2008; Ito et al. 2000; Miehlke et al. 2001; Ribeiro et al. 2003; Yamaoka et al. 2002). Similar to the epidemiological data published for OMPs, conflicting results have accumulated regarding the role of iceA genotype in gastric disease development.

2.3 Secreted and Injected Proteins

While adhesins and OMPs allow the bacterium to interact with gastric cells, H. pylori also produces two proteins, VacA and CagA, which enter host cells and greatly perturb cellular processes. The vacuolating cytotoxin (VacA) is secreted by H. pylori and first generated interest for its ability to induce vacuole formation in epithelial cells (Cover and Blaser 1992; Leunk et al. 1988). The product of the cytotoxin-associated gene A (CagA) is injected into gastric cells and has been shown to affect many cellular processes through both phosphorylation-dependent and independent mechanisms (Bridge and Merrell 2013; Jones et al. 2010; Nishikawa and Hatakeyama 2017). The intimate interactions of these proteins with host cells contribute to the gastric disturbances caused by H. pylori infection. As such, these two virulence factors have been the subject of a multitude of epidemiological studies to identify allelic variants that affect disease progression.

2.3.1 vacA

VacA is a pore-forming toxin that is unique to H. pylori; it does not show significant homology to the sequence or structure of other known bacterial toxins (Cover and Blaser 1992; Leunk et al. 1988) (see review (Foegeding et al. 2016)). Following secretion by H. pylori, VacA can be internalized by host cells and then cause an accumulation of vesicles and the formation of anion channels. These toxic effects induce swelling, vacuolation, and death of the host cells (Foegeding et al. 2016; Thi Huyen Trang et al. 2016). At a molecular level, the VacA toxin is first produced as a 140 kDa protein that undergoes proteolytic cleavage to form the mature 88 kDa protein, comprised of 33 kDa (p33) and 55 kDa (p55) subunits from the N-terminal and C-terminal domains, respectively (Cover et al. 1994). Experimentally, only the p33 subunit in conjunction with a small portion of the N-terminus of the p55 subunit is required for vacuolation (de Bernard et al. 1998; Ye et al. 1999). More in-depth information on the function of VacA is provided by Sgouras and colleagues in Chapter 3 and by Hatakeyama and co-workers in Chapter 8.

Current analyses show that vacA allelic variation is observed in five distinct regions of the vacA gene, resulting in strains that produce VacA proteins with varying toxicities (Thi Huyen Trang et al. 2016) (Fig. 3). The first region of variability is located within the signal peptide and the N-terminal region of the p33 subunit; this has been designated the s-region and the two primary variants are referred to as s1 and s2 (Atherton et al. 1995). The s1 variant is able to create vacuoles in host cells, whereas the s2 allele encodes a VacA protein that is unable to induce vacuolation (Atherton et al. 1995; McClain et al. 2001). In addition to functional differences in the produced VacA protein, there appears to be a transcriptional difference among the various vacA alleles; transcripts of the s2 forms are found at lower levels than those of the s1 forms (Forsyth et al. 1998). The second region of variability is also observed in the p33 subunit, but lies closer to the C-terminus of the subunit; this region is called the i-region (Rhead et al. 2007). The i-region has three predominant alleles, i1, i2, and i3. Of these, the i1 allele results in the most active form of the toxin (Rhead et al. 2007; Thi Huyen Trang et al. 2016). The third region of vacA diversity is found within the p55 subunit and has two primary variants, m1 and m2. These variants differ in channel-forming and cell-binding abilities (Atherton et al. 1995; Pagliaccia et al. 1998; Tombola et al. 2001; Wang et al. 2001). The fourth region of variability is termed the d-region; this variation lies at the junction of the p33 and p55 subunits, and the resulting d1 and d2 variants differ based on the presence or absence of nucleotides (Ogiwara et al. 2009). Finally, a recently identified fifth region of polymorphism has been termed the c-region and is defined by the inclusion (c2) or exclusion (c1) of a 15 bp nucleotide sequence (Bakhti et al. 2016). Clinical isolates have been found with virtually all possible combinations of these variable regions, though some combinations are observed more frequently than others.
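As an illustrative aid only, the five variable regions and their primary alleles can be written down as a small lookup table. The sketch below is not taken from any cited study; the names VACA_REGIONS, parse_vaca_genotype, and all_vaca_combinations are hypothetical, and the slash-separated genotype notation (for example, s1/i1/m1) is assumed as the input format. It considers only the primary alleles named above and ignores subtypes and hybrid alleles.

```python
from itertools import product

# Primary alleles for each variable region of vacA, as named in the text.
# Hypothetical table for illustration; subtypes and hybrids are not modeled.
VACA_REGIONS = {
    "s": ("s1", "s2"),
    "i": ("i1", "i2", "i3"),
    "d": ("d1", "d2"),
    "m": ("m1", "m2"),
    "c": ("c1", "c2"),
}


def parse_vaca_genotype(genotype):
    """Split a genotype string such as 's1/i1/m1' into {region: allele}."""
    parsed = {}
    for allele in genotype.split("/"):
        allele = allele.strip()
        region = allele[:1]  # first letter names the region
        if allele not in VACA_REGIONS.get(region, ()):
            raise ValueError(f"unknown vacA allele: {allele!r}")
        if region in parsed:
            raise ValueError(f"region {region!r} given more than once")
        parsed[region] = allele
    return parsed


def all_vaca_combinations():
    """Enumerate every combination of one primary allele per region."""
    regions = list(VACA_REGIONS)
    return [dict(zip(regions, combo)) for combo in product(*VACA_REGIONS.values())]
```

Under these simplifying assumptions, the enumeration yields 2 × 3 × 2 × 2 × 2 = 48 theoretical combinations of primary alleles, which gives a sense of the genotype space that epidemiological studies must sample.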

Fig. 3

CagA and VacA polymorphisms

Polymorphisms within the five regions of the vacA gene are represented by different colors. These regions include the s- and i-regions found in the p33 subunit of the protein encoded by vacA and the d-, m-, and c-regions located within the p55 subunit. Additionally, the allelic variations for each region are depicted in varying shades of the color that represents each region. The s1, i1, d1, m1, and c1 alleles have all been linked to disease. Representative combinations of the alleles are depicted; numerous other combinations, not shown, are possible. The combined s1/i1/m1 vacA allele has been linked to gastric cancer.

Polymorphisms within the EPIYA region of the CagA protein are represented by different colors. The Western EPIYA motifs include an A, B, and C motif, with a varying number of C motifs possible. Additionally, the Western EPIYA sequence includes CM motifs found upstream and downstream of the EPIYA-C motif sequence. Disease has been linked to an increasing number of C motifs. The East Asian EPIYA motif includes an A, B, and D motif, which has been linked to disease in Asian populations. The East Asian motif contains a single CM motif downstream of the EPIYA-D motif sequence.

An enormous number of epidemiological studies have been conducted to define the importance of the various vacA alleles to disease development/progression. Of these, the vast majority focus primarily on the s- and m-regions of the vacA gene; more recent investigations include the i- and d-regions. While we will summarize some of the major findings in this area, the reader is encouraged to consider more thorough recent reviews of this topic (McClain et al. 2017; Thi Huyen Trang et al. 2016). Briefly, initial studies of vacA reveal an association between the s1 allele and gastric inflammation (Atherton et al. 1997; Gunn et al. 1998; Zambon et al. 2003). Additional studies correlate the presence of the m1 allele with gastric disease severity. In fact, the s1/m1 genotype is the most virulent combination of these two alleles, and toxicity is not induced with the m2 allele regardless of whether it is found in combination with the s1 or s2 polymorphism (Atherton et al. 1995; Atherton et al. 1997; McClain et al. 2017; Yamaoka et al. 2008). Recent meta-analyses of published studies reveal links between the s1/m1 genotype of vacA and peptic ulcer disease (Matos et al. 2013) and gastric cancer (Matos et al. 2013; Pormohammad et al. 2018). Another meta-analysis of publications involving strains from Southeast Asia identifies an association between the vacA m1 allele and peptic ulcer disease (Sahara et al. 2012), while a study of Korean isolates confirms an association between the m-region, cagA allele, and disease state (Jang et al. 2010). While the role of the s and m alleles seems fairly consistent, the association of the i-region with particular gastric diseases is more varied. Despite this, some studies identify a link between the i-region genotype and disease status (Thi Huyen Trang et al. 2016). Indeed, the i1 genotype is identified as a risk factor for gastric cancer (Liu et al. 2016; Rhead et al. 2007) and is also associated with peptic ulcers (Yordanov et al. 2012). A more in-depth study finds that specific amino acid positions within the i-region are linked to severe disease outcomes, particularly in isolates not carrying an EPIYA-ABD motif in the cagA allele (Jones et al. 2011); the cagA alleles are discussed in more detail in the subsequent section. In terms of the d-region, the d1 allele correlates with the s1, m1, and i1 alleles in Western strains, and strains containing all four of these variants display an increased risk of gastric cancer (Ogiwara et al. 2009). Additionally, the d1 polymorphism associates with peptic ulcers (Basiri et al. 2014) and gastric adenocarcinoma in Iranian strains (Abdi et al. 2017; Bakhti et al. 2016; Basiri et al. 2014; Hussein 2014). In areas of Iran with a higher incidence of gastric cancer, there is an increased prevalence of i1 and d1 polymorphisms (Latifi-Navid et al. 2013), and the i1 and d1 alleles increase the risk for the intestinal-type and diffuse-type of adenocarcinoma, respectively (Abdi et al. 2017). Finally, the presence of the c1 allele is linked to gastric cancer in Iranian strains (Bakhti et al. 2016, 2017). Taken en masse, gastric cancer risk is primarily associated with the s1, m1, and i1 genotypes (McClain et al. 2017; Foegeding et al. 2016).

2.3.2 cagA

The widely-studied CagA effector is a 120–145 kDa protein that is encoded on the cag pathogenicity island (cagPAI) (Censini et al. 1996); initial studies revealed a link between the presence of cagA and gastric diseases, including peptic ulcer disease and gastric cancer (Blaser et al. 1995; Covacci et al. 1993; Parsonnet et al. 1997; Weel et al. 1996). Indeed, CagA is classified as an oncoprotein due to its involvement in cancer development in a transgenic mouse model (Miura et al. 2009; Ohnishi et al. 2008). In contrast to the secreted toxin, VacA, CagA is directly translocated into host cells by a type IV secretion system (T4SS) encoded on the cagPAI (Odenbreit et al. 2000). For a more thorough review of CagA translocation and activity, we encourage the reader to see reference (Nishikawa and Hatakeyama 2017). Briefly, translocated CagA disrupts a variety of host cell signaling pathways through direct interaction with host proteins. More extensive discussion on the activity of CagA is provided by Sgouras and colleagues in Chapter 3 and by Hatakeyama and co-workers in Chapter 8.

The C-terminus of CagA contains two major polymorphic regions that are responsible for CagA’s downstream effects: the EPIYA (Glu-Pro-Ile-Tyr-Ala) motif and the CagA multimerization (CM) motif (Ren et al. 2006; Stein et al. 2002) (Fig. 3). Polymorphisms within both the EPIYA and CM motifs have been extensively reported and further investigated for their in vitro effects on cells as well as their contribution to disease progression (see reviews (Backert et al. 2010; Bridge and Merrell 2013; Jones et al. 2010; Nishikawa and Hatakeyama 2017)). First, the EPIYA motifs vary among clinical isolates based on the number of EPIYA repeats within the C-terminus as well as the amino acid sequence flanking the EPIYA sequence (Covacci et al. 1993; Hatakeyama 2006). The sequence of the polymorphic flanking regions is used to denote the motifs as EPIYA-A, -B, -C, and -D; CagA is further classified into Western (containing EPIYA-A, -B, and -C, where -C can be repeated multiple times) and East Asian (containing EPIYA-A, -B, and -D) types due to the prevalence of the particular EPIYA motifs in these geographic regions (Hatakeyama 2006; Higashi et al. 2002; Miehlke et al. 1996). Similar to the EPIYA types, the polymorphisms in the CM sequences differ between the Western and East Asian strains. Western CagA carries the CM motif upstream of the EPIYA-C motif and also at the distal end of the last EPIYA-C motif; this repeated CM motif allows for the recombination that results in the repeated EPIYA-C motifs that are sometimes observed in Western strains (Furuta et al. 2011; Ren et al. 2006). In contrast, East Asian CagA contains only one CM motif, which is found downstream of the EPIYA-D motif (Ren et al. 2006). Accordingly, only one copy of the EPIYA-D motif is typically observed in East Asian strains.
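The Western/East Asian distinction described above can be expressed as a simple rule on the ordered EPIYA segment pattern. The following is a minimal sketch under stated assumptions: the function name classify_epiya is hypothetical, the input is assumed to be an already-typed segment string such as ABC, ABCC, or ABD, and the label "unclassified" is our placeholder for patterns that fit neither description. In practice, segment typing relies on the flanking amino acid sequences, which this sketch does not attempt.

```python
def classify_epiya(pattern):
    """Classify a CagA EPIYA segment pattern (e.g. 'ABCC' or 'A-B-D').

    Returns (cagA_type, number_of_EPIYA_C_segments). Illustrative only:
    assumes the segments have already been typed as A, B, C, or D.
    """
    segments = pattern.upper().replace("-", "").replace(" ", "")
    if not segments or set(segments) - set("ABCD"):
        raise ValueError(f"not an EPIYA segment pattern: {pattern!r}")

    has_c = "C" in segments
    has_d = "D" in segments
    if has_c and not has_d:
        cag_type = "Western"       # EPIYA-A, -B, and one or more -C
    elif has_d and not has_c:
        cag_type = "East Asian"    # EPIYA-A, -B, and -D
    else:
        cag_type = "unclassified"  # neither C nor D, or both present
    return cag_type, segments.count("C")
```

For example, classify_epiya("ABCC") returns ("Western", 2), and classify_epiya("ABD") returns ("East Asian", 0). Returning the EPIYA-C count alongside the type reflects the fact that the number of C repeats is the variable most often examined in Western strains.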

Epidemiological studies have repeatedly sought to identify a link between the polymorphisms within the cagA gene and gastric disease development; however, as with the other factors discussed in this chapter, conflicting results are reported across the vast amount of literature on this topic. For example, when investigating Western strains, some studies identify an association between increasing numbers of EPIYA-C motifs and the development of intestinal metaplasia or dysplasia (Sicinschi et al. 2010) or gastric cancer (Basso et al. 2008; Batista et al. 2011; Ferreira et al. 2012; Yamaoka et al. 1999). However, other studies that also utilize Western strains find no correlation between increasing numbers of EPIYA-C motifs and disease state (Acosta et al. 2010; Figura et al. 2012; Rizzato et al. 2012; Shokrzadeh et al. 2010). Similarly, studies in Korea fail to establish an association between EPIYA-C motifs and disease (Choi et al. 2007); however, most strains in that region carry the EPIYA-D motif, making it difficult to evaluate the effect of the EPIYA-C polymorphism on disease (Argent et al. 2008a). In fact, a study in Japan describes only strains containing EPIYA-D in their gastric cancer patient population (Azuma et al. 2004), and a larger molecular epidemiological study uncovers a link between gastric carcinoma and strains carrying the EPIYA-D polymorphism (Jones et al. 2009). Furthermore, a meta-analysis of publications that assess H. pylori isolates from Southeast Asia finds that possessing the EPIYA-D motif increases the risk of developing peptic ulcer disease and gastric cancer (Sahara et al. 2012). Another recent meta-analysis evaluates 23 published studies and reports correlations between the EPIYA-D polymorphism and gastric cancer in Asia, multiple EPIYA-C motifs and peptic ulcer disease and duodenal ulcers in Asia, and multiple EPIYA-C motifs and gastric cancer in Europe and the United States (Li et al. 2017). Finally, when investigating the association of the CM motif with gastric disease, a study of New York City hospital strains uncovers a link between strains harboring one or two Western CM motifs and peptic ulcer disease and gastric cancer in comparison to strains with an Eastern CM motif (Ogorodnik and Raffaniello 2013). Thus, both EPIYA type and CM motif appear to contribute to disease progression.

3 Combinations of Genotypes

Even though many studies focus on individual bacterial genes, it is worth noting that there are certainly higher order associations and interactions among the various factors that ultimately influence disease. Indeed, when thinking about combinations of various alleles, many epidemiological studies reveal that certain alleles of different virulence factors are more frequently found in combination and that these combinations are associated with particular gastric diseases. As a few examples, strains possessing cagA, vacA s1, and babA2 associate with duodenal ulcer (Gerhard et al. 1999; Torres et al. 2009) and adenocarcinoma (Gerhard et al. 1999). Also, the cagA, vacA s1 m1, and babA2 genotype creates a higher risk for intestinal metaplasia (Zambon et al. 2003). Chronic active gastritis links to a genotype of cagA, vacA s1 m1, babA2, and hopQ I or hopQ I/II (Chiarini et al. 2009). In addition, a study of strains isolated from German patients experiencing chronic gastritis reveals an association between the oipA (hopH) “on” status and the vacA s1, vacA m1, babA2 and cagA genotypes (Dossumbekova et al. 2006). In a study of South African isolates, strains harboring the iceA1 and vacA s1 genotypes are associated with gastric cancer (Kidd et al. 2001). Furthermore, an association between the presence of the homB gene and cagA, babA2, vacA s1, hopQ type I, and oipA “on” status has been identified (Oleastro et al. 2008). An additional study involving Western strains indicates that homB is associated with the presence of cagA and vacA s1, which together correlate with peptic ulcer disease (Oleastro et al. 2009b). En masse, epidemiological research has primarily linked gastric cancer risk to strains possessing vacA s1, vacA i1, vacA m1, cagA, the T4SS of cag, and OMPs (McClain et al. 2017). In contrast, a recent review by Floch et al. summarizes the epidemiological studies conducted using strains specifically from MALT lymphoma patients; those authors indicate that less virulent polymorphisms are associated with this disease (Floch et al. 2017). We do note that with all of these epidemiological studies, caution must be exercised when drawing firm conclusions concerning the associations between specific genotypes and disease. One reason for this is the finding that independent strains isolated from the stomach of the same patient may have different genotypes (Lopez-Vidal et al. 2008). Due to the cost and labor involved, few studies have independently assessed large numbers of strains from individual patients.

In general, genotypes that are frequently found in association with each other are believed to work synergistically to enhance H. pylori’s persistence within the gastric mucosal layer. However, VacA and CagA exhibit an antagonistic relationship, whereby each protein dampens the effect of the other (Argent et al. 2008b; McClain et al. 2017). Thus, strains with the most virulent genotypes of both proteins have the greatest ability to control the effect of the opposing protein.

4 Conclusions

Many studies have attempted to identify particular polymorphisms that are linked to the most virulent H. pylori clinical isolates. Unfortunately, a single predictive factor has remained elusive; however, combinations of particular polymorphic virulence factors have repeatedly been linked to clinical disease outcome. Moreover, these combinations of virulence genotypes show variation based on the geographic origin of the strains being studied. Additionally, the complex contribution of polymorphisms within both the host and bacterium along with environmental factors influences H. pylori’s ultimate fate within the host (Fig. 1). Indeed, reports of H. pylori as a beneficial gastric organism underscore the importance of identifying factors that determine whether the bacterium survives as a commensal organism or becomes a pathogen within its human host. Despite the plethora of data gathered to date, more research is needed to adequately define the risk of gastric disease development attributable to various H. pylori genotypes. This information will ultimately help to inform patients and physicians as they weigh the benefits and risks of applying antimicrobial therapy to H. pylori infections.