Introduction

Methods for the detection of bacterial and viral foodborne pathogens to assure the safety and cleanliness of human food have been successfully utilized for many decades. While traditional microbiological methods have been used and trusted for years, the acceptance of molecular methods to identify and characterize foodborne pathogens has increased dramatically in the past decade. Most notably, since the publication of the first edition of this chapter [1], whole genome sequencing (WGS) for the identification and characterization of bacterial foodborne pathogens has begun to supplant more traditional methods in public health surveillance. This update will not describe in detail the traditional methods of detection and characterization of foodborne pathogens covered in the previous edition of this chapter; instead, this review will focus on comparing traditional methods to the current state-of-the-art molecular techniques, indicating, where possible, which of these methods have become accepted as standard.

In 2011, the CDC listed the top four foodborne pathogens as norovirus, non-typhoidal Salmonella, Clostridium perfringens, and Campylobacter spp. [2]. The leading cause of foodborne illness resulting in hospitalization and/or death was attributed to Salmonella enterica serotypes. According to the CDC’s latest data from 2013 to 2016, Campylobacter spp. have been identified as the leading cause of foodborne infections followed by Salmonella enterica, Shigella, STEC, Cryptosporidium, Yersinia, Vibrio, Listeria, and Cyclospora spp. [3]. In 2013, the CDC published a report entitled “Antibiotic Resistance Threats in the United States” that ranked the most urgent, serious, and concerning antimicrobial-resistant bacterial infections detrimental to human health [4]. The rankings descend in importance from urgent (the most threatening resistant infections, involving high-consequence antimicrobials and requiring urgent action), to serious (infections that are significant antibiotic resistance threats but do not require urgent action at this time), and finally to concerning (infections that cause serious illness but for which multiple therapeutic options may currently be available). These rankings were based on several factors, including the estimated burden of illness in the USA as well as the number of available antibiotics which could treat these resistant infections. The leading foodborne infections from 2011 and 2016, Salmonella enterica serotypes and Campylobacter spp., are both ranked as serious resistance threats to public health by the CDC [2,3,4]. Any of these zoonotic strains that are also resistant to carbapenems are classified as urgent threats, including Salmonella enterica serotypes, Campylobacter species, and pathogenic Escherichia coli [4]. Therefore, a major focus of this chapter will be the detection and characterization of these leading causes of foodborne infections that threaten human health.

Traditional Methods

Traditional methods, consisting of bacterial and/or viral culture of food samples using microbiological media with biochemical identification of bacterial genera, or cell culture techniques for viruses, continue to be considered the most reliable and successful methods for foodborne pathogen detection and currently remain the gold standard. The Food and Drug Administration’s (FDA) Bacteriological Analytical Manual (BAM) currently describes the officially accepted methodology for detection of bacteria, viruses, yeasts, and molds [5]. These fundamental microbiological assays remain the cornerstones of most pathogen detection schemes, involving standard sample collection, selective agar plating, and characterization via biochemical tests for proper identification. However, these traditional culture methods are slow, labor intensive, and can require specialized skills. In a typical bacterial foodborne disease outbreak, a minimum of 5–7 days is required to culture and identify an isolated colony following BAM recommendations. The time necessary for microbiological and biochemical identification of the bacterial strain may delay the proper diagnosis and subsequent treatment regimen, resulting in a longer hospital stay [2]. Therefore, a significant demand has arisen for more rapid detection of pathogens (minutes, rather than days). Alternate molecular methods, including culture-independent diagnostic techniques (CIDT), have been developed in an attempt to reduce or eliminate rate-limiting steps and thus reduce the time required to provide public health officials with the identity of the cause of a foodborne disease outbreak. A partial list of rapid and alternative molecular methods is provided in Appendix I of the FDA BAM, although these methods are not officially used or endorsed by the FDA [5].

Serotyping

Following the identification of a bacterial foodborne pathogen utilizing selective media and biochemical testing, common and successful methods for further characterizing these strains involve the use of antibodies. For bacteria such as Salmonella enterica strains, serotyping per the Kauffman-White scheme is one of the oldest and most successful subtyping methods available [6]. Serotyping is based on antibody recognition of the somatic O antigens and the flagellar H antigens of S. enterica, and typing is achieved via agglutination testing using antisera specific for each antigenic variant. There are over 2500 recognized serotypes of Salmonella enterica. Although serotyping is a widely used and specific method to characterize S. enterica strains, it is laborious, time consuming, and requires specialized skills, and the logistics of maintaining adequate stocks of antisera can be challenging.

Molecular serotyping methods, such as multiplex polymerase chain reaction (PCR), real-time PCR systems, probe detection, gene sequencing, single-nucleotide polymorphism typing, and whole genome sequencing methods, have been established for some of the most common foodborne serotypes of S. enterica and E. coli found in the USA and Europe [7,8,9,10,11,12,13,14,15,16,17,18,19]. Leader et al. [7] tested over 700 strains of Salmonella serotypes using a multiplex PCR system capable of detecting the 50 most common serotypes in the USA with an accuracy of 89% when compared to traditional serotyping. Taking multiplex PCR of the O and H antigens of S. enterica serotypes one step further, a technology in which multiplex PCR products are detected via a liquid array of fluorescently labeled antigen-specific probes coupled to beads was developed to increase the throughput and specificity of multiplex PCR molecular serotyping of Salmonella serotypes [10, 11]. McQuiston et al. [11] utilized this technology, amplifying the fliC and fljB genes of the H antigens, to characterize 500 serotypes of S. enterica in parallel with traditional serotyping techniques. This method correctly identified 461 (92.2%) isolates, partially serotyped 47 (9.4%) isolates, and characterized 13 (2.6%) isolates as monophasic or nonmotile strains; only 39 (7.8%) strains were not correctly identified. The authors suggest that this methodology is sufficiently high throughput to screen 100 isolates per day and is useful for outbreak detection when used in combination with the similar O-antigen scheme developed by Fitzgerald et al. [10]

Single-nucleotide polymorphism (SNP) typing has also been investigated for molecular serotyping. Highly informative sequence variations in the gnd gene, which is closely linked to the O-antigen gene cluster of E. coli, have been utilized to screen retail beef for E. coli O157 and the “big six” non-O157 E. coli serotypes that are flagged by the US Department of Agriculture (USDA) as contaminants of public health concern [12]. In order to develop SNP types correlating to these E. coli serotypes of concern, the gnd region was sequenced in a “collection of 195 STEC isolates, including isolates belonging to O157:H7 (n = 18), O26 (n = 21), O45 (n = 19), O103 (n = 24), O111 (n = 24), O121 (n = 23), O145 (n = 21), and ten other STEC serogroups (n = 45).” Subsequent to this analysis, additional informative SNPs were identified for molecular serotyping. Twelve informative SNPs were identified and multiplexed into a SNP typing assay using single base pair extension chemistry. Using this technology, SNP types were determined for the seven clinically important STEC serogroups and, although multiple SNP types per serogroup were identified, “there were no overlapping SNP types between serogroups.” [12]
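To illustrate the principle behind such a panel, the following minimal Python sketch maps profiles of base calls at a set of informative positions to serogroups; the positions, alleles, and profile-to-serogroup table shown here are hypothetical placeholders rather than the validated 12-SNP panel described in [12].

```python
# Illustrative sketch of SNP-type assignment for molecular serotyping.
# The base calls and profile-to-serogroup table below are hypothetical
# placeholders, not the validated 12-SNP panel of ref. [12].

# Each SNP type is a tuple of base calls at the panel's informative positions.
SNP_TYPE_TO_SEROGROUP = {
    ("A", "G", "T", "C"): "O157",
    ("G", "G", "T", "C"): "O26",
    ("A", "A", "T", "T"): "O111",
    # ... additional SNP types, possibly several per serogroup,
    # but (per the study design) none shared between serogroups.
}

def call_serogroup(base_calls):
    """Map a tuple of base calls from the SNP panel to a serogroup."""
    return SNP_TYPE_TO_SEROGROUP.get(tuple(base_calls), "untypeable")

if __name__ == "__main__":
    print(call_serogroup(["A", "G", "T", "C"]))   # -> O157
    print(call_serogroup(["C", "C", "C", "C"]))   # -> untypeable
```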

Microarray methods to determine the serotype of Salmonella enterica strains have been developed but to date do not have 100% correlation with the traditional Kauffman-White method. The Salmonella genoserotyping array (SGSA) detects 57 of the most commonly reported serovars through detection of the genes encoding the surface O and H antigens [13]. This microarray was evaluated and validated by testing 1874 isolates from human and nonhuman sources at 4 laboratories in 3 countries, correctly identifying 96.7% of isolates from the target 57 serovars. Test specificity and sensitivity were greater than 98% for S. Enteritidis and 99% for S. Typhimurium. However, the SGSA array has its greatest utility as a rapid screen for the most common serotypes included in the 57 targets and cannot correctly detect other serotypes, including those which may be unusual or on the rise [13]. Patel et al. [14] developed a custom E. coli pan-genome microarray (the FDA E. coli identification or FDA-ECID array) as a “molecular toolbox” for use in bacterial characterization and outbreak tracking. The FDA-ECID array was designed to represent the core genome of all E. coli isolates based on analysis of all publicly available E. coli genome sequences. The FDA-ECID array is capable of molecular serotyping using 25-tiled 11-mer probes per O or H antigen gene target capable of detecting SNPs, including “211 unique probe sets for identifying 152 O types and 54 probe sets for all known H types.” Validation of this array was accomplished by testing 103 E. coli isolates from the E. coli reference collection and diarrheagenic E. coli collection, comparing the molecular serotype determined by the array to WGS data and traditional serotyping. Ninety-nine of the 103 isolates were correctly identified by O-type, and all but 15 were correctly identified by H-type by the FDA-ECID array. The authors state that errors were due to the absence of particular O-type antigen probes, mistyping by serology, and nonmotile strains [14]. While this array is capable of multiple types of molecular characterization of E. coli isolates simultaneously, its limitation relative to whole genome sequencing or traditional serotyping is that it can only detect the O and H antigen types designed into the array and cannot detect unusual types or those not included.

The community-wide adoption and decreasing per-strain cost of whole genome sequencing of foodborne bacterial strains have resulted in a large amount of isolate-level sequencing data which can be analyzed bioinformatically to determine the serotype of foodborne bacterial strains. One such system, SeqSero, is a web-based tool developed to accurately identify Salmonella enterica serotypes by matching sample sequence data to well-curated databases “of Salmonella serotype determinants (rfb gene cluster, fliC and fljB alleles).” The SeqSero tool can “determine serotype rapidly and accurately for nearly the full spectrum of Salmonella serotypes (more than 2,300 serotypes), from both raw sequencing reads and genome assemblies.” [15] These authors tested SeqSero’s capability to accurately determine the serotype of each isolate using three types of sequencing data. The first type of data included the “raw reads from genomes of 308 Salmonella isolates of known serotype” from the Centers for Disease Control and Prevention. The second type of data consisted of raw WGS “reads from genomes of 3,306 Salmonella isolates sequenced and made publicly available by GenomeTrakr, a U.S. national monitoring network operated by the Food and Drug Administration.” These isolates included metadata, provided by the submitting agency, that indicated a serotype. The third type of data consisted of 354 other publicly available draft or complete Salmonella genomes, with metadata describing the serotype. After comparison of the sequence data with the known or metadata-submitted serotypes, the SeqSero tool’s serotype prediction matched the known serotypes in 98.7% of the 308 CDC isolates, 92.6% of the serotypes submitted in the metadata of the GenomeTrakr isolates, and 91.5% of the metadata-submitted serotypes of the publicly available isolates. Two hundred serotypes were successfully correlated to known or metadata-submitted serotypes, including 85 of the top 100 Salmonella serotypes associated with human infections. Errors were attributed to variability in the H antigens and to unknown serotypes not adequately represented in the database [15]. This platform may be considered for official adoption in public health laboratories and national surveillance systems and is undergoing validation (S. Ayers, personal communication).
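As a conceptual illustration of determinant-based genoserotyping, the short Python sketch below screens sequence reads for marker k-mers from curated antigen alleles and combines the detected antigens into a serotype call; the marker sequences, antigen formula, and serotype lookup are invented placeholders and do not represent SeqSero’s actual algorithm or database.

```python
# Conceptual sketch of determinant-based genoserotyping: screen sequence data
# for k-mers from curated antigen alleles (e.g., rfb cluster, fliC, fljB),
# then combine the detected antigens into a serotype call. The marker
# sequences and antigen formulas below are illustrative placeholders only.

K = 21

def kmers(seq, k=K):
    return {seq[i:i + k] for i in range(len(seq) - k + 1)}

# Hypothetical marker sequences for antigen alleles (real databases hold
# full curated allele sequences for every O group and H antigen).
ANTIGEN_MARKERS = {
    ("O", "O4"):   "ATGCGTACGTTAGCCGATCAA",
    ("H1", "i"):   "TTGACCGGTACCATGCGTAAC",
    ("H2", "1,2"): "GGATCCATTGCGTACGAGTTA",
}

ANTIGEN_FORMULA_TO_SEROTYPE = {("O4", "i", "1,2"): "Typhimurium"}

def predict_serotype(reads):
    observed = set()
    for read in reads:
        observed |= kmers(read)
    found = {}
    for (antigen_class, allele), marker in ANTIGEN_MARKERS.items():
        if kmers(marker) & observed:
            found[antigen_class] = allele
    formula = (found.get("O"), found.get("H1"), found.get("H2"))
    return ANTIGEN_FORMULA_TO_SEROTYPE.get(formula, f"unresolved {formula}")

reads = [
    "ATGCGTACGTTAGCCGATCAAGGT",    # carries the hypothetical O4 marker
    "CCTTGACCGGTACCATGCGTAACAA",   # carries the hypothetical H1 "i" marker
    "AGGATCCATTGCGTACGAGTTACC",    # carries the hypothetical H2 "1,2" marker
]
print(predict_serotype(reads))     # -> Typhimurium
```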

In 2015, Public Health England implemented routine whole genome sequencing as a part of their foodborne pathogen surveillance and serotype identification for Salmonella serotypes [16]. Public Health England utilized a multilocus sequence typing (MLST) approach, using whole genome sequence data of housekeeping gene alleles to predict the serotypes of 6887 human gastroenteritis cases of S. enterica subspecies I, applying the MLST scheme and database reported by Achtman et al. [17] That report showed that a majority of sequence types (STs) of S. enterica cluster by serotype due to the evolutionary relatedness of strains sharing the same seven housekeeping gene alleles. Metadata, including serotypes for a majority of the strains, are housed in the database [17]. In this study, MLST sequences for the 6887 isolates were assigned a sequence type, and the associated serotype was predicted using the database reported in Achtman et al. [17] Of the strains tested by Public Health England, 6616 (96%) showed concordance between the MLST-predicted serotype and the phenotypic serotyping information in the metadata. The 4% that did not match were due to process errors, incorrect data entry regarding serotype, and some instances where two serovars belonged to the same sequence type (ST). Seventy isolates belonged to STs that were not associated with a defined serotype in the database, and those serotypes were determined phenotypically. Due to the success and robustness of this method, it was recommended that Public Health England adopt this scheme for serotyping S. enterica isolates [16].
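The logic of MLST-based serovar prediction can be summarized in a few lines: the seven housekeeping allele numbers define a sequence type, which is then looked up against serovar metadata from previously characterized strains. The Python sketch below illustrates this idea; the numeric allele profiles and the small ST-to-serovar table are illustrative examples, not entries taken from the Achtman database.

```python
# Sketch of MLST-based serovar prediction: the seven housekeeping allele
# numbers define a sequence type (ST), and the ST is looked up in a database
# annotated with the serovar of previously typed strains. The profiles and
# ST-to-serovar mappings below are illustrative examples only.

# Achtman seven-gene scheme loci for S. enterica
LOCI = ("aroC", "dnaN", "hemD", "hisD", "purE", "sucA", "thrA")

# allele profile (one allele number per locus) -> ST (illustrative values)
PROFILE_TO_ST = {
    (10, 7, 12, 9, 5, 9, 2): 19,
    (5, 2, 3, 7, 6, 6, 11): 11,
}

# ST -> serovar observed for previously characterized isolates (illustrative)
ST_TO_SEROVAR = {19: "Typhimurium", 11: "Enteritidis"}

def predict_serovar(allele_numbers):
    """Look up the ST for a seven-allele profile and return its serovar."""
    st = PROFILE_TO_ST.get(tuple(allele_numbers))
    if st is None:
        return "novel ST; serotype phenotypically"
    return ST_TO_SEROVAR.get(st, "ST known but serovar undefined")

print(predict_serovar([10, 7, 12, 9, 5, 9, 2]))   # -> Typhimurium
```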

Building on the concept of using genetic determinants for the O and H antigens, as in the SeqSero method, and the allelic diversity of conserved housekeeping genes employed in MLST, Yoshida et al. [18] developed a bioinformatics platform, the Salmonella In Silico Typing Resource (SISTR), to analyze WGS data of Salmonella isolates and determine serotype. This platform rapidly performs simultaneous in silico analyses on draft Salmonella genome assemblies. SISTR predicts serovars utilizing several methods for sequence-based serotyping (genoserotyping) based on the O and H antigens, as in other platforms, while integrating phylogenetic typing schemes including MLST, ribosomal MLST (rMLST), and core genome MLST (cgMLST). Yoshida et al. [18] validated the SISTR platform by analyzing 4129 sets of Salmonella WGS data available in the public domain, comparing the predicted serotype from the SISTR analysis with the indicated serotype from the strains’ metadata. SISTR correctly identified the serotype of 94.6% of the finished genomes and WGS draft assemblies. Errors in serotype prediction were traced to incorrect serotypes in the metadata submitted with strains and to quality issues associated with the sequencing data. Coupling the cgMLST analysis with genoserotyping of the O and H antigen genes in the SISTR platform provided the most accurate serotype prediction [18].

In 2017, a group from PulseNet Canada compared three of the molecular serotyping methods described herein (SeqSero, SISTR, and MLST) with traditional serotyping to ascertain which method was the most concordant with traditional serotyping [19]. Serotype was most accurately predicted for 813 clinical and laboratory S. enterica strains using the SISTR method (94.8%), with the SeqSero and MLST methods resulting in 88.2% and 88.3% concordance with traditional serotyping, respectively. The authors conclude that this validation indicates that each of these methods “would be suitable for maintaining historical records, surveillance systems, and communication structures currently in place;” however, the authors maintain the importance of traditional serotyping for the foreseeable future [19].

Although molecular serotyping methods are much faster and orders of magnitude less labor intensive, they are not 100% accurate and have not been established for all 2500 serotypes of S. enterica, particularly veterinary strains. Moreover, very limited information can be gleaned from establishing a serotype, although this method of detection is considered a first step in the broad characterization of a S. enterica or E. coli strain.

Enzyme-Linked Immunosorbent Assay (ELISA)

In addition to traditional serotyping, which uses specific antibodies to detect and characterize foodborne pathogens such as Salmonella enterica and E. coli, other types of antibody-mediated methods are available for the detection of foodborne pathogens, with varying levels of specificity, detection versus characterization capabilities, and time required for results. Enzyme-linked immunosorbent assays, or ELISAs, are one such method. Nyman et al. [20] evaluated three ELISA platforms for the detection of Salmonella serotype Dublin in bovine bulk milk for potential use in surveillance in the Swedish Salmonella Control program. Samples were randomly “collected within the Swedish bulk milk sampling scheme and analyzed with three ELISAs; a Danish in-house Dublin ELISA, PrioCHECK(®) Salmonella Ab bovine Dublin ELISA and PrioCHECK(®) Salmonella Ab bovine ELISA.” Each ELISA showed high specificity for the detection of S. Dublin in bulk milk, at 99.4%, 99.4%, and 97.9%, respectively. The authors therefore concluded that these ELISA tests were sufficiently specific to be included as a screening step for Swedish Salmonella surveillance; an obvious limitation, however, is the inability to detect other Salmonella serotypes [20].

Another example of a commercially available automated system is the VIDAS system (bioMerieux SA, Marcy l’Etoile, France), which detects Salmonella enterica, E. coli, Campylobacter, Vibrio, and Listeria strains from a mixed culture via an immunoassay strip-based method in which the inner surfaces of the strip are coated with specific antibodies. The VIDAS allows for automated rapid detection of Salmonella in 1–2 days, versus the longer process of identification via traditional culture methods (5–7 days) and serotyping (5–7 days). This system was reviewed in the previous version of this chapter [1] and is still used in some US federal foodborne pathogen detection labs as a rapid first-pass screen for foodborne pathogens, with positives then confirmed by official BAM methods (S. Ayers and K. Blickenstaff, personal communication).

Although traditional ELISA methods can be sensitive and specific for the detection of foodborne pathogens, those conducted using a plate/well scheme are time consuming and can require large volumes of antibody or sample for accurate detection. Therefore, immunoassays that function similarly to ELISA have been developed; these immunoassays are faster, more sensitive, and able to process more samples while further characterizing the strains. A newly developed antibody-based microarray that detects the foodborne pathogens E. coli and Salmonella with sensitivity comparable to ELISA and returns results in 1 h was reported by Karoonuthaisiri et al. [21]. Other technologies based on immunoassays, such as microbead-based immunoassays (discussed in the serotyping section of this review), are replacing traditional ELISA methods. Microbead assays capable of detecting a multiplex of 40–100 or more different targets, including foodborne pathogens and associated virulence genes, are faster, more reproducible, and more sensitive [22]. An immunoassay utilizing gold nanoparticle aggregation linked to a polyclonal antibody specific for Salmonella enterica was described by Hahn et al. [23] for sensitive detection of Salmonella enterica on the surface of tomatoes. These researchers detected Salmonella serovars Typhimurium, Javiana, and Newport down to a detection limit of 10 CFU/g of tomatoes. Cho et al. [24] developed an in situ immuno-gold nanoparticle network-based ELISA biosensor platform to detect S. Typhimurium and E. coli in food matrices with high sensitivity. This sensor system includes a sample concentration step based on immuno-magnetic separation of the pathogenic microorganisms to increase sensitivity to “3 cells/mL of E. coli O157:H7 and Salmonella typhimurium in buffer and 3 CFU/mL of E. coli O157:H7 and 15 CFU/mL of S. typhimurium” in food matrix conditions within 2 h of inoculation.

Bacteriophage

Bacteriophages are viruses which infect bacteria via recognition of strain-specific antigens. Bacteriophages are ubiquitous in nature, and their selective properties make them ideal for the detection of bacteria. Anany et al. [25] utilized the natural specificity and selectivity of bacteriophages (phage) to develop a “dipstick” paper device impregnated with phage to detect foodborne bacteria such as Escherichia coli O157:H7, E. coli O45:H2, and Salmonella Newport in spinach, ground beef, and chicken homogenates. When coupled with quantitative real-time PCR, “a detection limit of 10–50 colony-forming units per ml was demonstrated with a total assay time of 8 h, which was the duration of a typical work shift in an industrial setting.” Junillon et al. [26] developed a multiple foodborne pathogen detection system based on the use of bacteriophage tail fibers affixed to a solid phase surface and an intracellular metabolic marker to visualize the bacterial presence on the device surface. The solid phase support surface was affixed with bacteriophage tail fibers specific for Escherichia coli O157:H7, Listeria spp., and Salmonella spp. and added directly to a stomacher bag of food sample artificially inoculated with the pathogens of interest. Bacterial capture was visualized “in situ as a result of the bacterial reduction of the colorless soluble substrate triphenyltetrazolium chloride (TTC) (present in the primary culture medium) to an intracellular red insoluble formazan product.” The authors state that this system is faster than traditional microbiological methods by eliminating post-stomaching incubation and is practical for use in industrial food environments [26].

While bacteriophages are natural and exquisitely specific, this form of detection simply identifies phage-susceptible strains in food matrices and provides no further information about the strain. Because of this specificity, variants of strain types may not be detected. Further limitations of bacteriophage-based detection include the requirements for microbiological culture to propagate the phage and for a cold chain to maintain testing stock.

Polymerase Chain Reaction (PCR), Real-Time PCR, and Reverse Transcriptase PCR

Polymerase chain reaction is one of the gold standard methods for detecting and characterizing foodborne pathogens. Because PCR can be conducted on impure or mixed samples and can be performed without time-consuming microbiological culture and isolation, it is one of the fastest, most robust, and most reliable methods to date. Methods to detect and characterize the major foodborne pathogens (Salmonella, Campylobacter, E. coli, Listeria, and Vibrio spp., to name a few) have been developed for contamination detection in a variety of food products and were comprehensively reviewed, including commercially available PCR detection systems, by Mangal et al. [27].

Beyond its role as a first step in foodborne pathogen detection, PCR is also a popular and discriminatory tool for inter- and intra-strain characterization. PCR speciation of Campylobacter jejuni and C. coli is a common method for species identification from food production environments and for surveillance of retail meats in the National Antimicrobial Resistance Monitoring System [28, 29]. MLST schemes, which group strains by serotype and evolutionary relatedness based on single or multiple nucleotide changes in well-conserved housekeeping genes, have been developed for foodborne bacterial species such as Salmonella enterica. Sangal et al. [30] used MLST and a database of thousands of sequence types contributed by researchers all over the world to study the relatedness and population structure of five major serotypes of Salmonella, with a focus on Salmonella Newport and its MDR-AmpC phenotype expressing resistance to nine antimicrobials. Achtman et al. [17] proposed MLST as a replacement for traditional serotyping. However, primary identification systems such as bacteriological culture and isolation must be used prior to MLST characterization for strain detection. As useful as MLST or any variant multilocus scheme is for defining strains as a stand-alone method, combinations of PFGE with other methods such as MLST have been shown to be the most discriminatory [31, 32]. MLST schemes have also been incorporated into whole genome sequencing analyses to group related strains, and this combination is the most discriminatory to date [18].

Real-time PCR (qPCR) and reverse transcriptase PCR (RT-PCR) are common and regularly utilized methods to detect and quantify bacterial foodborne contamination events. Due to the popularity of qPCR methods, commercial kits have been developed and validated by AOAC for diagnostic testing of food products; a comprehensive review of the commercial kits available was reported by Mangal et al. [27]. For example, two kits were developed by Roche and/or BIOTECON Diagnostics to individually detect Listeria monocytogenes and Salmonella enterica in a variety of food matrices using a qPCR scheme. The foodproof kit allows for rapid isolation of the DNA from food matrices such as peanut butter, milk, vegetables, retail meats, and many other food products [33, 34]. These foodproof qPCR detection kits have been evaluated as equivalent in performance to the FDA BAM reference method while providing results much more rapidly. The ability to test for more than one pathogen concurrently is essential to the rapid diagnosis of a foodborne illness; qPCR is easily multiplexed and was used by Fukushima et al. [35] to detect the causative agents of 21 foodborne outbreaks in 2 days. Therefore, the benefits of using real-time PCR to detect foodborne pathogen contamination in food products or in an outbreak include the rapidity of the method over traditional microbiological identification, increased sensitivity and specificity, quantification of the pathogen, and the ability to multiplex the reaction. However, neither of these methods can differentiate between live and dead bacterial cells. One method to detect viable bacterial cells in food was reviewed in depth by Zeng et al. [36], “whereby biological dyes such as ethidium monoazide and propidium monoazide (PMA) are used to pretreat samples before DNA extraction to intercalate the DNA of dead cells in food samples, and then proceed with regular DNA preparation and qPCR.” The intercalation of the dyes into DNA interferes with subsequent PCR amplification and thereby excludes dead-cell DNA from being amplified along with DNA from live bacteria in food. These authors reviewed in detail the detection of viable Salmonella serotypes, Campylobacter species, E. coli, and other foodborne pathogens using this method; however, limitations include the incomplete exclusion of dead-cell DNA in complicated food matrices.
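For readers unfamiliar with qPCR quantification, the brief Python sketch below shows how a cycle threshold (Ct) value is converted to an estimated starting quantity using a standard curve, in which Ct is linear in log10 of the initial target copies; the slope and intercept used here are illustrative values, not parameters of any validated assay.

```python
# Sketch of qPCR quantification from a standard curve: Ct is linear in
# log10(starting quantity), Ct = slope * log10(N0) + intercept, with a
# perfectly efficient reaction giving a slope near -3.32 (doubling per cycle).
# The slope and intercept below are illustrative, not from a validated assay.

SLOPE = -3.32       # cycles per log10 of target copies
INTERCEPT = 38.0    # Ct expected for a single starting copy

def ct_to_copies(ct):
    """Estimate starting target copies from an observed Ct value."""
    return 10 ** ((ct - INTERCEPT) / SLOPE)

def amplification_efficiency(slope=SLOPE):
    """Efficiency implied by the standard-curve slope (1.0 = 100%)."""
    return 10 ** (-1 / slope) - 1

if __name__ == "__main__":
    for ct in (18.0, 25.0, 32.0):
        print(f"Ct {ct:4.1f} -> ~{ct_to_copies(ct):,.0f} starting copies")
    print(f"implied efficiency ~ {amplification_efficiency():.1%}")
```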

Reverse transcriptase PCR (RT-PCR) is a method capable of detecting live bacterial cells via the isolation of mRNA, with subsequent conversion by reverse transcription to cDNA for further amplification and quantitation. Miller et al. [37] tested the sensitivity and rapidity of detection of Salmonella Typhimurium from spiked samples of lettuce and tomatoes via RT-PCR of the invA gene. These authors showed that RT-PCR identified S. Typhimurium at 6 log CFU/25 g in lettuce spiked at high inoculum levels without pre-enrichment and at 4 log CFU/25 g at low inoculum levels with a 6-h enrichment. For tomatoes, Salmonella strains were detected at 6–7 log CFU/100 g without enrichment and at 4 log CFU/100 g with a 6-h enrichment at low inoculum levels. Therefore, this method can detect Salmonella enterica contamination in produce within 24 h.

Zhang et al. [38] compared qPCR, RT-PCR, and loop-mediated isothermal amplification (LAMP) to the FDA BAM method for the efficiency of the molecular methods in identifying Salmonella serovars in six high-risk produce commodities: cilantro (coriander leaves), lettuce, parsley, spinach, tomato, and jalapeno pepper. Salmonella serovars were spiked into 25 g samples of each commodity at two different levels, 10⁵ and <10¹ CFU/25 g. All four methods detected as few as two CFU of Salmonella cells per 25 g of produce. Compared to the BAM method, each of the molecular methods (qPCR, RT-PCR, and LAMP) achieved equally sensitive detection more rapidly. RT-PCR additionally has the advantage of detecting live Salmonella serovars, an important feature in food safety screening of these high-risk produce commodities.

Microarray

Microarrays have been used with success to identify and characterize foodborne pathogens such as E. coli, Salmonella, and Campylobacter spp. in purified or mixed samples since their first description in 1995 and were reviewed in the previous edition of this chapter [1, 39,40,41,42]. Microarrays are high-throughput, information-dense tools that are particularly useful when screening multiple pathogen types with multidrug-resistant phenotypes and virulence types in foodborne pathogen surveillance [39, 43,44,45,46]. While an exhaustive review of microarrays will not be undertaken in this chapter, it is of note that custom, high-density microarrays have been developed which provide nearly sequence-level data on a microarray slide. Photolithographic microarrays, such as Affymetrix arrays (Affymetrix Inc., Santa Clara, CA), have been designed for foodborne pathogens and can accommodate millions of probes. These information-dense, high-throughput microarrays contain probes for the entire genomes of foodborne pathogens and can define a single strain. Jackson et al. [47] used this technology to define and describe the genomic content of E. coli isolates from a reference collection and from human illnesses. Patel et al. [14] utilized the FDA E. coli identification (FDA-ECID) custom E. coli microarray, as discussed previously, to identify the molecular serotype of 103 diverse E. coli strains. Additionally, the FDA-ECID array is designed to include probes representing the core E. coli genome, detect virulence genes, and identify SNPs which correlate with phylogeny, thereby providing strain-level characterization of tested isolates. Data generated from screening via the FDA-ECID array were validated against WGS of 103 diverse E. coli strains, including those associated with past foodborne illnesses. “A 99.7% phylogenetic concordance was established between microarray analysis and WGS using SNP-level data for advanced genome typing” [14]. Therefore, the array provides a wealth of genomic information and is best used for in-depth screening when WGS is not available.

Although microarrays remain useful as screening tools for foodborne pathogen detection, characterization of strains, and source tracking, whole genome sequencing has become affordable for almost all public health laboratories, and microarrays may become secondary to the more powerful and informative WGS for food safety surveillance.

PFGE

For the last 20 years, pulsed-field gel electrophoresis (PFGE) has maintained its status as the gold standard for outbreak tracking and molecular subtyping of zoonotic foodborne bacteria such as Salmonella enterica, Campylobacter species, Escherichia coli, Shigella, Vibrio cholerae, and Listeria monocytogenes [48,49,50]. The PulseNet program, a molecular subtyping network consisting of state and local public health laboratories and the CDC, operates by sharing macrorestriction digest gel fingerprints of each strain of foodborne bacteria within a common database and can identify indistinguishable patterns which may be linked in a foodborne outbreak. Surveillance networks that utilize PFGE include the National Antimicrobial Resistance Monitoring System (NARMS), CIPARS in Canada, and many other international surveillance systems in PulseNet International, spanning the USA, Europe, Canada, Asia Pacific, Latin America and the Caribbean, the Middle East, and Africa [51,52,53]. The benefits of the PFGE method include national and international validation and standardized methodology, a full genome “fingerprint” or banding pattern that is stored electronically, and a shared database between state, local, and federal food safety agencies. Although single- or two-enzyme PFGE analysis provides a whole-genome snapshot of the bacterial strain and a high level of discrimination between very similar strains or serotypes, the actual sequence underlying these genomic differences is not identified. Additionally, plasmids, due to their small size, often are not visible in the PFGE fingerprint and may go undetected when only PFGE is used. Finally, microbiological culture and isolation/identification of the bacterial pathogen must be completed before PFGE can be performed, resulting in a wait of about 10 days before results are available.
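Pattern comparison in PFGE databases is typically performed with band-matching similarity coefficients (commonly the Dice coefficient) in dedicated gel-analysis software such as BioNumerics; the simplified Python sketch below illustrates the calculation with hypothetical band sizes and an arbitrary matching tolerance and is not intended to reproduce PulseNet’s normalization or curve-based analysis.

```python
# Simplified sketch of PFGE fingerprint comparison using the Dice coefficient,
# a band-matching similarity commonly used for pulsed-field gel patterns.
# Band sizes (kb) and the matching tolerance are illustrative only; real
# analyses use normalized gel images in dedicated software.

def matched_bands(a, b, tolerance=0.015):
    """Count bands in pattern a that match a band in b within a relative tolerance."""
    matches = 0
    used = set()
    for band in a:
        for j, other in enumerate(b):
            if j not in used and abs(band - other) / other <= tolerance:
                matches += 1
                used.add(j)
                break
    return matches

def dice_similarity(a, b):
    """Dice coefficient: 2 * shared bands / (bands in a + bands in b)."""
    return 2 * matched_bands(a, b) / (len(a) + len(b))

pattern_1 = [668, 452, 398, 310, 244, 180, 120, 76]
pattern_2 = [668, 452, 398, 300, 244, 180, 120, 76]
print(f"Dice similarity: {dice_similarity(pattern_1, pattern_2):.2f}")
```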

Due to the limitations of the method and the community-wide adoption of whole genome sequencing, the PulseNet program has officially committed to transitioning to whole genome sequencing as the primary molecular subtyping method for foodborne outbreak characterization in the USA [48]. Whole genome sequencing delivers the entire genome of the foodborne pathogen, and a single assay yields data that can be analyzed by multiple methods. Molecular serotyping, as discussed in the serotyping section above, as well as relatedness typing via whole genome MLST (wgMLST) or core genome MLST (cgMLST), single-nucleotide polymorphism strain typing, virulence typing, plasmid detection, and the identification of antimicrobial resistance genes can all be accomplished through multiple analyses of the data from a single whole genome sequence of the foodborne pathogen [48, 51]. Therefore, while PFGE remains in wide use, whole genome sequencing will soon completely replace PFGE as the primary method for molecular subtyping of foodborne pathogens in the USA [48, 51].

Whole Genome Sequencing (WGS)

The advancements in sequencing technologies in the last two decades, in addition to the plummeting per-reaction cost of performing these methods, have rendered whole genome sequencing feasible for foodborne pathogen surveillance, and the ability to identify and subtype strains involved in a disease outbreak is now a reality [54]. As mentioned in the previous section describing PFGE, the reigning gold standard for molecular subtyping of foodborne pathogens in the USA, whole genome sequencing is replacing PFGE and is being adopted by the CDC-led PulseNet program for foodborne outbreak surveillance [48, 51]. The benefits of using whole genome sequencing versus PFGE are many. While PFGE provides strain discrimination that can reliably identify clusters of outbreak strains, whole genome sequencing provides data that can be analyzed to identify the serotype, phylogenetic relatedness of strains, antimicrobial resistance and virulence genes, and plasmids or other mobile elements [51]. However, the time required to achieve results, from isolation of the foodborne bacterial strain to the generation of a PFGE profile or WGS dataset, is not markedly different. Generating whole genome sequence from foodborne bacterial strains still relies on the time-consuming microbiological isolation of a pure culture from food or an ill consumer, which can take up to 5 days. However, once the data are obtained, WGS analysis obviates the need for further traditional characterization testing, such as serotyping or additional PCR and sequencing, which is a major benefit. Recognized limitations to the use of whole genome sequencing include the need for complex bioinformatics tools and personnel expertise to analyze the data, the need for standards defining when strains are called related, the need for national and international agreement on the appropriate method for analyzing WGS data [SNP differences, whole genome MLST (wgMLST), or core genome MLST (cgMLST)], the need for databases with comprehensive and defined nomenclature so that genetic elements are identified by the same names, and the need for a common repository to store the immense amount of data generated per strain. Despite these challenges, US public health laboratory surveillance systems such as NARMS and CDC’s PulseNet program are beginning to use WGS as a primary method of identification of foodborne pathogens [51].

A number of studies have provided proof of principle for this emerging technology in the study of food-related disease outbreaks, including the 2013 pilot outbreak detection program for Listeria monocytogenes by CDC, FDA, USDA-FSIS, NCBI, and local, state, and international partners [51, 55]. This pilot project prospectively performed WGS on all available L. monocytogenes isolates collected from food, food processing environments, and patients in the USA to evaluate the usefulness of WGS in real-time foodborne disease surveillance. CDC’s PulseNet program, including state and local health departments, performed WGS on all human cases of L. monocytogenes in 2013, USDA-FSIS performed WGS on isolates from food processing environments, and FDA’s GenomeTrakr network contributed WGS data from food sources of L. monocytogenes in 2013. All L. monocytogenes WGS data from all partners were submitted to NCBI under a single BioProject, which functioned as a single repository for deposition of the WGS data. PFGE was performed in parallel by many of the partners on the L. monocytogenes strains. While two different methods of analysis were employed, core genome MLST (cgMLST) by CDC and high-quality SNP analysis (hqSNP) by the other partners, the authors report that the two “methods equally distinguish isolates belonging to an outbreak from sporadic cases with high epidemiological concordance.” When comparing WGS to PFGE, the authors found that more clusters were distinguished, and in a more rapid time frame, than when using PFGE alone. From September 2012 to August 2013, the year before WGS was piloted, 14 outbreak clusters were identified. After WGS implementation, 19 outbreak clusters were detected in the first year, and 21 clusters were detected in the second year. While two outbreaks were solved using molecular subtyping pre-WGS, five were solved in the first year of utilizing WGS, and nine were solved in the second year, with links to more conclusive food sources. The authors conclude that WGS is preferable to PFGE for L. monocytogenes outbreak detection because WGS analysis could delineate clusters with diverse PFGE patterns, determine the source of cold cases, refine outbreak case definitions, link sporadic illnesses to food sources, and confirm outbreaks following product testing [55]. Subsequently, CDC’s PulseNet and state and public health laboratories began to transition to using WGS for foodborne outbreak detection, recognizing that a standard for the number of SNPs by which strains in a cluster may diverge has not been set and that epidemiological information is necessary to meaningfully group outbreak strains. L. monocytogenes outbreaks involving ice cream from a single manufacturer in three facilities from 2014 to 2015 and Hispanic-style cheese in 2013 were successfully detected and characterized using PFGE and WGS, with WGS emerging as the more discriminatory and meaningful method for outbreak tracking [56, 57]. The L. monocytogenes outbreak traced back to cheese is recognized as the first use of WGS in a US regulatory investigation of an outbreak [58].
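The cluster detection described above can be thought of as grouping isolates whose pairwise core-genome distances fall below some threshold. The Python sketch below illustrates this idea with toy aligned sequences and single-linkage grouping; the sequences and the SNP threshold are illustrative only, since, as noted, no universal threshold for calling isolates related has been standardized.

```python
# Minimal sketch of clustering isolates by pairwise SNP distance, the general
# idea behind hqSNP (or cgMLST allele-difference) cluster detection. The toy
# "alignments" and the distance threshold are illustrative placeholders.

from itertools import combinations

def snp_distance(a, b):
    """Number of differing positions between two aligned core-genome strings."""
    return sum(x != y for x, y in zip(a, b))

def single_linkage_clusters(isolates, threshold):
    """Group isolates whose chain of pairwise distances stays within threshold."""
    parent = {name: name for name in isolates}

    def find(x):
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    for (n1, s1), (n2, s2) in combinations(isolates.items(), 2):
        if snp_distance(s1, s2) <= threshold:
            parent[find(n1)] = find(n2)

    clusters = {}
    for name in isolates:
        clusters.setdefault(find(name), []).append(name)
    return list(clusters.values())

isolates = {
    "patient_A": "ACGTACGTAC",
    "patient_B": "ACGTACGTAT",   # 1 SNP from patient_A
    "food_X":    "ACGTACGGAT",   # 2 SNPs from patient_A
    "sporadic":  "TTTTTTTTTT",   # distant, unrelated
}
print(single_linkage_clusters(isolates, threshold=3))
```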

As mentioned previously, in response to the community-wide interest in using WGS for outbreak tracking, in 2012 the FDA organized a network of participating state and federal public health and FDA field laboratories generating WGS data on outbreak and foodborne disease-related isolates, called GenomeTrakr. This network, currently comprising 28 state health and independent labs and 15 FDA labs in the USA, was initiated to centralize the deposition of WGS data generated in the public health and field labs into a single publicly available repository at NCBI, which syncs data nightly with global DNA databases in Europe and Japan [the European Molecular Biology Laboratory (EMBL) and the DNA Data Bank of Japan (DDBJ)]. As of 2017, GenomeTrakr has added 20 international locations to the network and continues to add participants [1]. Thereby, GenomeTrakr and NCBI provide a platform for global comparison of the rapidly uploaded draft genomes, including critical metadata such as food source and geographical location, for foodborne disease outbreak identification to support timely investigations [58].

Whole genome sequencing has been successfully used to improve discrimination of foodborne outbreak clusters of Salmonella enterica serotypes, including S. Enteritidis, a serotype which is historically difficult to differentiate via PFGE because most strains fall into only 3–5 PFGE profile types. Whole genome sequencing of these isolates in retrospective and prospective studies of outbreak isolates was capable of subclustering strains into discrete outbreak clusters, which was not previously possible using PFGE [59, 60]. Another example of a Salmonella serotype foodborne outbreak solved by WGS involves S. Heidelberg, one of the top serotypes in human infection. An outbreak of 146 Salmonella Heidelberg infections in 24 states in 2014 was retrospectively analyzed by WGS, successfully tracking the food source of the outbreak to chicken served at a catered party. While whole genome sequencing is rapidly being validated as the most useful method for outbreak tracking and surveillance, multiple food sources can be confounding, making the inclusion of epidemiologic information necessary for the most successful foodborne outbreak resolution [61]. Foodborne strains of Campylobacter species have also been analyzed with greater discriminatory power using WGS than PFGE or MLST, as comprehensively reviewed by Llarena et al. [62].

Due to the successes in foodborne outbreak resolution for Salmonella enterica and Campylobacter species, the NARMS program has begun to use WGS as the primary method of foodborne bacterial characterization and discrimination for these two pathogens. As discussed previously, WGS has begun to officially replace gold standard methods such as traditional serotyping and PFGE for foodborne pathogen detection and characterization in US surveillance systems and outbreak tracking programs such as CDC’s PulseNet. Antimicrobial susceptibility testing of foodborne pathogens is important to provide a baseline of resistance and to characterize trends in resistance development that inform the medical treatment of foodborne gastroenteritis. Because WGS data analysis can reveal the presence of antimicrobial resistance genes in foodborne bacterial strains, several proof-of-concept studies have been conducted to assess the value of antimicrobial resistance gene detection for predicting phenotypic antimicrobial resistance in Salmonella enterica serotypes and Campylobacter species [63, 64]. McDermott et al. [63] performed WGS on 640 retail meat and human infection Salmonella isolates from the NARMS program from 2011 to 2012 and assessed the correlation between the detection of antimicrobial resistance genes in those isolates and phenotypic resistance defined by Clinical and Laboratory Standards Institute (CLSI) breakpoints and epidemiological cutoff values. Overall concordance between the methods was 99% across all isolates, whereby a resistance gene was identified that could predict the resistant phenotype assessed by microbroth dilution per CLSI standards. A match was not identified in 20 instances, resulting in an overall sensitivity of 98.8%; these cases involved aminoglycosides, beta-lactams, sulfisoxazole or trimethoprim-sulfamethoxazole, and quinolones. A total of 65 unique resistance genes were identified for antimicrobials that were not tested phenotypically, highlighting the ability of WGS to identify antimicrobial resistance that may be missed by the constraints of phenotypic testing. However, the authors also recognize that unknown genes that confer resistance will not be detected if WGS is the sole means of characterizing decreased antimicrobial susceptibility, and they maintain that phenotypic antimicrobial susceptibility testing will be conducted in the NARMS program in some fashion for the foreseeable future. Examining the ability of WGS to predict reduced susceptibility in Campylobacter species, Zhao et al. [64] compared in vitro antimicrobial susceptibility testing results to WGS of 114 C. jejuni and C. coli isolates from retail meats, cecal samples, and human infections from 2000 to 2013 in the NARMS program. The authors found that “phenotypic and genotypic correlation was 100% for tetracycline, ciprofloxacin/nalidixic acid, and erythromycin, and correlations ranged from 95.4% to 98.7% for gentamicin, azithromycin, clindamycin, and telithromycin” [64]. An overall correlation of 99.2% between the methods was identified, suggesting that WGS is a reliable indicator of resistance for foodborne Campylobacter species in the USA. Limitations identified by the authors of both studies include the fact that short reads from the benchtop sequencers used preclude closing the genomes, whereby some antimicrobial resistance genes can be missed or their locations not accurately identified.
Further, plasmids are difficult to close using short-read sequencers, and comprehensive databases for plasmid gene identification are not yet publicly available, rendering the use of WGS for plasmid identification incomplete. Although the NARMS surveillance program supports the use of WGS to predict phenotypic resistance in foodborne pathogens and forecasts the replacement of antimicrobial susceptibility testing by WGS, the European Committee on Antimicrobial Susceptibility Testing (EUCAST) disagrees [65]. In 2017, EUCAST published a paper exploring the ability of WGS to completely replace phenotypic antimicrobial susceptibility testing for clinical therapy guidance and concluded that there is currently insufficient evidence to support a complete transfer of methodology to WGS. The limitations listed by this group, in addition to those cited in the NARMS studies [63, 64], include the need for international standards and quality control metrics for resistance prediction among all WGS participants, the position that epidemiological cutoff values rather than clinical resistance breakpoints should be used to predict non-susceptibility, and the need for a single, comprehensive database for identifying mutations and resistance-conferring genes. Therefore, while the sole use of WGS to predict decreased antimicrobial susceptibility is gaining support in US foodborne pathogen surveillance systems such as NARMS, the international community has not yet placed the same level of confidence in the replacement [65].
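The genotype-phenotype comparisons performed in the NARMS studies cited above reduce, at their core, to tabulating agreement between resistance-gene detection and phenotypic resistance calls. The Python sketch below shows one way such concordance, sensitivity, and specificity figures might be computed; the record counts are fabricated for illustration and are not NARMS data.

```python
# Sketch of tabulating concordance between WGS-predicted resistance (gene
# detected) and phenotypic susceptibility testing (resistant by breakpoint
# or epidemiological cutoff). The records below are fabricated illustrations.

def concordance_stats(records):
    """records: iterable of (gene_detected: bool, phenotypically_resistant: bool)."""
    tp = sum(g and r for g, r in records)          # gene found, resistant
    tn = sum(not g and not r for g, r in records)  # no gene, susceptible
    fp = sum(g and not r for g, r in records)      # gene found, susceptible
    fn = sum(not g and r for g, r in records)      # resistant, no gene found
    total = tp + tn + fp + fn
    return {
        "concordance": (tp + tn) / total,
        "sensitivity": tp / (tp + fn) if (tp + fn) else None,
        "specificity": tn / (tn + fp) if (tn + fp) else None,
    }

# Example: 640 hypothetical isolate-drug combinations
records = ([(True, True)] * 150 + [(False, False)] * 480 +
           [(False, True)] * 8 + [(True, False)] * 2)
print(concordance_stats(records))
```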

Overall, whole genome sequencing is becoming accepted as the primary method for epidemiological outbreak and source-tracking studies of foodborne pathogens and is being used in real time to identify contaminant point sources in the USA. Real-time outbreak detection, with the capability to simultaneously identify important characteristics of the foodborne pathogen such as serotype, resistance phenotype, and virulence gene presence, is important during high-priority public health events and will become more efficient as standards for quality metrics and bioinformatics pipelines are adopted.

Metagenomics

According to a study from 2011, the number of foodborne illnesses that cannot be attributed to a specific cause is estimated at 38.4 million cases [2]. In order to decrease the number of unattributed cases, researchers have been employing culture-dependent methods such as WGS, along with newer technologies made possible by the affordability of WGS, to identify and characterize more foodborne pathogens than ever. Culture-independent diagnostic techniques (CIDT), such as PCR conducted without microbiological identification and isolation of the pathogen, have been increasingly utilized by medical professionals to decrease the time to treatment and achieve better clinical outcomes. However, PCR and other CIDTs are limited by the number of targets that can be detected simultaneously in a multiplex and by their reliance on known pathogens and known variants of foodborne pathogen strains. Metagenomics, the identification of genetic material by sequencing directly from samples, is a growing field within CIDT. Metagenomics can be conducted without microbiological isolation of strains, and because sequencing is performed on all DNA in the sample, none of the potential pathogens are missed. This method is faster than traditional culture-dependent techniques, and multiple pathogens in the milieu can be identified simultaneously, including the presence of antimicrobial resistance and virulence genes. However, although virulence and antimicrobial resistance genes can be identified, it is difficult to assign these genes to a host pathogen or to determine whether the pathogen was viable in the sample [66].

To test the applicability of metagenomics to foodborne pathogen outbreaks, Huang et al. [66] performed a proof-of-concept study using two outbreaks from 2013 that the CDC and state health labs had determined, using culture-based methods, to be caused by S. Heidelberg. These two outbreaks, occurring in Alabama and Colorado, were indistinguishable via PFGE, occurred in the same month, and were originally suspected to be identical, but WGS resolved them into two distinct outbreak strains. Applying shotgun metagenomics to the original patient stool samples, Huang et al. [66] compared the metagenomic results to those of the culture-dependent methods used to solve the outbreaks. In this comparison, the metagenomic investigations were consistent with the culture-based findings. Additionally, the intrapopulation diversity of S. Heidelberg in the samples was identified, as well as the “possibility of coinfections with Staphylococcus aureus, overgrowth of commensal Escherichia coli, and significant shifts in the gut microbiome during infection relative to reference healthy samples.” A bioinformatics pipeline was designed to address challenges associated with the analysis of clinical samples, including the high frequency of contaminating human DNA sequences. This study described the successful use of metagenomics to detect and characterize foodborne outbreaks while addressing some of the gaps in the validation of these methods [66].
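At a very high level, the classification step in shotgun metagenomics assigns each sequencing read to the reference it most resembles, after discarding reads that match the host. The toy Python sketch below illustrates this with k-mer matching against a few invented reference fragments; it is not the bioinformatics pipeline of Huang et al. [66], which must additionally handle sequencing error, coverage, strain-level diversity, and contaminating human DNA at scale.

```python
# Toy sketch of the classification step in shotgun metagenomics: remove reads
# matching the host (human) reference, then assign the remainder to the taxon
# whose reference shares the most k-mers. References and reads are tiny
# invented placeholders; real pipelines use full genome databases.

K = 8

def kmers(seq, k=K):
    return {seq[i:i + k] for i in range(len(seq) - k + 1)}

REFERENCES = {
    "human":               "ACGTGGCTAAGCTTACGGATCCAGT",
    "S. Heidelberg":       "TTGCAGGATCGTACCGGTTAACGCA",
    "E. coli (commensal)": "GGCATTACGATCCGTAAGGCTTAAC",
}
REFERENCE_KMERS = {name: kmers(seq) for name, seq in REFERENCES.items()}

def classify(read):
    scores = {name: len(kmers(read) & ref) for name, ref in REFERENCE_KMERS.items()}
    best = max(scores, key=scores.get)
    if scores[best] == 0:
        return "unclassified"
    return "host (discarded)" if best == "human" else best

reads = ["TTGCAGGATCGTACCG", "ACGTGGCTAAGCTTAC", "CCCCCCCCCCCC"]
for r in reads:
    print(r, "->", classify(r))
```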

While there are many advantages to the use of metagenomics to reduce the time to treatment in human infections, the loss of the microbiological isolate of the causative bacterial strain for secondary testing has caused problems for foodborne pathogen characterization and surveillance. In November 2015, the CDC sent a letter to US state and territorial epidemiologists and public health labs stating that CIDTs used as the sole method of detection of enteric pathogens “are a serious and current threat to public health surveillance, particularly for Shiga toxin-producing Escherichia coli (STEC) and Salmonella.” [67] Without a cultured isolate, secondary testing such as antimicrobial susceptibility testing cannot be performed, and the presence of a live causative agent may not be confirmed because metagenomics methods detect DNA of both living and dead cells. While a multitude of characteristics of patient or food samples can be determined, antimicrobial resistance genes cannot be attributed to a specific strain in the mixture when solely using CIDTs, making the attribution of antimicrobial resistance to an outbreak strain difficult. The authors feel that the sole use of CIDT may compromise the ability to link ill patients to each other, definitively link ill patients to a causative common food source, and link dispersed cases. A lack of isolates may cause outbreaks to go undetected, allowing contaminated products to remain on the market and reopening gaps in the food safety system. Adding reflexive culturing of CIDT-positive samples may alleviate some of these pitfalls; however, the added cost of performing reflexive culture of patient samples has made some diagnostic labs reluctant to conduct this isolation. Obtaining and storing causative isolates also provides the ability to retest these strains with the next “gold standard” methodology developed in the future, thus maintaining historical information on outbreaks. Considering the ever-advancing technology, this point may be the most critical advantage of reflex culturing of CIDT-positive samples. Future research and validation of methods to conclusively distinguish between viable and nonviable cells, as well as to link antimicrobial resistance genes to their host organisms, will be important for the utility of metagenomics in foodborne pathogen detection and characterization schemes.

Conclusions

With food safety and antimicrobial-resistant foodborne infections drawing national attention due to recent outbreaks involving retail meats, peanut butter, cheese, and fresh vegetables, it is imperative that the programs which protect the US food supply from accidental or intentional contamination remain strong and reliable and incorporate state-of-the-art molecular methods. Traditional methods, while validated and internationally accepted, are often laborious and time consuming and lack the detailed genetic information necessary to adequately detect and characterize a foodborne pathogen outbreak and indicate treatment strategies. New and advanced technologies, such as whole genome sequencing and CIDTs including metagenomics, are becoming regularly used for surveillance of the food supply, with recognition of the limitations associated with these methods. Extensive multi-laboratory validations are being conducted for whole genome sequencing as this method officially becomes the gold standard for foodborne pathogen outbreak detection. New bioinformatics tools are being designed to accurately delineate related strains and to predict antimicrobial resistance, serotype, and evolutionary relatedness. However, epidemiological information remains essential for use with molecular technologies to meaningfully characterize outbreaks and must be maintained in parallel with these exciting and emerging technologies to protect human health and the safety of our food.