
Introduction

Next-generation sequencing (NGS) methods, also referred to as deep, ultra-deep, high-throughput, or massively parallel sequencing, comprise a number of sequencing technologies that have succeeded the traditional dideoxynucleotide chain-termination (i.e., Sanger) method. Various platforms, which differ in their sequencing chemistries, read lengths, and throughput capabilities, are available (Table 37.1; reviewed in [1]). As these platforms have become more accessible, they have become particularly attractive to clinical microbiology laboratories that already rely on molecular methods for pathogen identification and characterization.

Table 37.1 Characteristics of current NGS platforms. The instrument specifications were obtained from the manufacturers’ websites and/or company representatives and reflect information available as of April 2018

NGS studies of microorganisms typically follow one of two general strategies: targeted sequencing or nontargeted sequencing (Fig. 37.1) [2, 3]. Targeted sequencing typically uses target-specific primers for PCR-mediated amplification, so that the genomic regions of interest are enriched and selectively sequenced. This approach is often used to interrogate well-characterized genomic regions (e.g., to identify known drug resistance mutations). Sequencing for de novo assembly of whole genomes, on the other hand, frequently relies on nontargeted library preparation. Whole-genome sequencing (WGS) is often performed on cultured isolates when the microorganism is unknown or when the goal is to define the genomic content and functional potential of the organism under investigation. Nontargeted sequencing may also be applied to primary specimens for culture-independent pathogen identification or characterization of the microbial population; such nontargeted applications using primary specimens are termed metagenomic sequencing.

Fig. 37.1

Illustration of sequencing approaches for diagnosis and monitoring of infectious diseases. Targeted amplicon sequencing (left panel) uses target-specific primers for template enrichment, followed by amplification with primers that are partially complementary to the target-specific primers (black bars) and contain sequencing adaptors and bar codes (blue bars). Nontargeted or metagenomic sequencing (right panel) uses enzymatic or mechanical fragmentation, followed by end repair to allow ligation of primers that contain sequencing adaptors and bar codes (blue bars). Size selection restricts sequencing to fragments of a predefined length. Bioinformatic removal of human sequences is required because the nucleic acids of the organism of interest frequently constitute less than 1% of the nucleic acid pool. Note that fragmentation libraries may also be made from PCR-enriched amplicons

Examples of these approaches in infectious disease testing will be discussed with particular attention paid to the technical and bioinformatics challenges that arise with specific scenarios in virology and bacteriology. The use of NGS in clinical microbiology laboratories remains relatively limited, though its role in the diagnosis and management of infectious diseases continues to grow as standardized operational protocols, automation, and data analysis pipelines emerge.

Specific Applications in Diagnostic Virology

Viral Drug Resistance Mutation Testing

The emergence of drug resistance is an important factor in the management of several clinically significant viral infections. Genotypic drug resistance testing was originally performed using “population” or “bulk” sequencing, which involves amplification of specific viral genes followed by Sanger sequencing. However, Sanger methodology has limited sensitivity for minor variants present at less than 15–20% of the viral population, whereas NGS methods can detect drug resistance mutations (DRMs) present at ~1% [4, 5]. The prototypical virus for NGS-based genotypic resistance testing is HIV-1, and, like Sanger-based methods, emerging NGS assays have used targeted sequencing of viral genomic regions known to develop resistance mutations [4]. Because it has been studied most extensively, HIV-1 is used below as a paradigm for a detailed discussion of concepts related to NGS-based testing for viral drug resistance. Cytomegalovirus (CMV) is also discussed. However, genotypic drug resistance testing is also used in the clinical management of other viral infections, including hepatitis B, hepatitis C, and influenza, and NGS methods are applicable to these viruses as well.

Human Immunodeficiency Virus Type 1 (HIV-1)

Epidemiologic studies in HIV-1-positive patients have shown that the presence of mutations conferring resistance to highly active antiretroviral therapy (HAART) can predict treatment outcomes [6]. Therefore, genotypic testing for DRMs is currently recommended for therapy-naïve patients when they enter into clinical care and for therapy-experienced patients when they show evidence of virologic failure [7]. A number of studies have compared NGS and Sanger sequencing methods for capturing minority resistant variants, demonstrating that at least half of the DRMs identified by NGS are missed by Sanger sequencing [8, 9]. The presence of such variants has been shown to predict an increased risk for therapy failure [10].

A major consideration when assessing minor variants is distinguishing true mutations from artifacts generated during PCR amplification, library preparation, or sequencing. These artifacts include mismatches, insertions/deletions, and PCR-mediated recombination products, known as chimeric sequences [11, 12]. This is particularly problematic for clinical specimens with low viral loads, because few viral genome copies are available as input for library preparation and a mixed viral population may not be accurately represented, even with high-fidelity polymerases. Differential amplification of some variants can skew the final PCR product mixture because of stochastic events in early PCR cycles or differences in the efficiency of primer annealing [4, 13]. One possible solution is to estimate empirical error rates for a given NGS assay at different viral concentrations and then to set thresholds for minor variant detection safely above those error rates. For instance, a plasmid of known genotype can be subjected to both NGS and Sanger sequencing, with the assumption that all NGS calls not confirmed by the Sanger “truth” are due to library preparation and/or NGS errors [14]. Alternatively, the library preparation step can employ primers tagged with a random sequence, so that each template molecule receives a unique identifier. This allows a consensus sequence to be generated for each original template molecule, thereby correcting for random errors introduced during library preparation and sequencing [15, 16]. Another approach to addressing PCR bias has been to perform multiple independent amplifications from the same clinical specimen and pool the products to serve as the template for library preparation [17, 18]. Novel bioinformatics tools have also been used to process NGS data in ways that reduce error rates and call authentic low-abundance viral variants [4].
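The two error-control ideas above, per-molecule consensus from randomly tagged primers and a detection threshold set above the assay's empirical error rate, can be illustrated with a minimal Python sketch. It assumes the random tags (unique molecular identifiers, UMIs) have already been extracted and that all reads are pre-aligned and of equal length. The function names, the simple majority vote, and the toy reads are hypothetical illustrations, not the method of any cited assay.

```python
from collections import Counter, defaultdict


def umi_consensus(tagged_reads):
    """Collapse reads that share a unique molecular identifier (UMI).

    tagged_reads: iterable of (umi, sequence) pairs. Assumption: reads are
    pre-aligned to one amplicon and of equal length (real data need alignment).
    """
    families = defaultdict(list)
    for umi, seq in tagged_reads:
        families[umi].append(seq)
    # Majority vote per position: an error carried by a minority of reads
    # in a family is outvoted by reads that copied the template correctly.
    return {
        umi: "".join(Counter(column).most_common(1)[0][0] for column in zip(*seqs))
        for umi, seqs in families.items()
    }


def call_minor_variants(sequences, reference, threshold):
    """Return (position, base, frequency) for non-reference bases whose
    frequency exceeds an empirically determined error threshold."""
    calls = []
    for pos, ref_base in enumerate(reference):
        counts = Counter(seq[pos] for seq in sequences)
        for base, count in sorted(counts.items()):
            freq = count / len(sequences)
            if base != ref_base and freq > threshold:
                calls.append((pos, base, freq))
    return calls


# Hypothetical toy data: reference "ACGT"; UMI family "AAA" contains one
# PCR/sequencing error (G->T at position 2); family "CCC" is a true variant.
reads = [
    ("AAA", "ACGT"), ("AAA", "ACGT"), ("AAA", "ACTT"),
    ("CCC", "ACGA"), ("CCC", "ACGA"),
    ("GGG", "ACGT"), ("GGG", "ACGT"),
    ("TTT", "ACGT"),
]
raw_calls = call_minor_variants([seq for _, seq in reads], "ACGT", threshold=0.01)
consensus_calls = call_minor_variants(
    list(umi_consensus(reads).values()), "ACGT", threshold=0.01
)
print(raw_calls)        # [(2, 'T', 0.125), (3, 'A', 0.25)]
print(consensus_calls)  # [(3, 'A', 0.25)]
```

Without consensus building, the single erroneous read is reported as a 12.5% "variant"; after collapsing each tagged family, only the true variant remains. Single-read families cannot be corrected in this way, which is one reason real pipelines also weigh base quality and family size.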

A large number of HIV-1 research studies demonstrating the superior performance of NGS methods compared with Sanger sequencing [4] have led to the introduction of several clinical NGS HIV-1 drug resistance assays. The most comprehensive is DEEPGEN™ HIV (developed by University Hospitals Case Medical Center, Cleveland, OH), which assesses resistance mutations in the protease, reverse transcriptase, and integrase genes and also predicts HIV-1 co-receptor tropism, with mean error rates of 0.37–0.39%, a minor-variant detection threshold of 5%, and the capacity to multiplex up to 96 samples in a single run [5]. Though not yet available in the USA, the Sentosa SQ HIV-1 Genotyping Assay from Vela Diagnostics, which runs on the Ion PGM platform, has received CE marking for the automated detection of drug resistance mutations in the protease, reverse transcriptase, and integrase genes at a 5% threshold [19].

Two other clinically available assays use deep sequencing of the HIV-1 env V3 loop to predict co-receptor tropism: the HIV-1 CCR5 tropism test (V3) offered by the British Columbia Centre for Excellence in HIV/AIDS (Vancouver, Canada) [20] and the HIV-1 co-receptor tropism with reflex to ultra-deep sequencing offered by Quest Diagnostics [21]. These assays and DEEPGEN™ HIV have been shown to predict non-CCR5 tropism as accurately as the phenotypic gold standard (Trofile, Monogram Biosciences) and to detect minor CXCR4-tropic variants with higher sensitivity than Sanger sequencing.

Importantly, the clinical significance of low-abundance HIV-1 drug resistance variants detected by NGS remains to be fully characterized. Several studies have retrospectively evaluated the impact of low-abundance resistance variants detected by NGS in treatment-naïve patients [8, 9], as well as in treatment-experienced patients with virologic failure [22, 23]. Although patients with low-abundance DRMs detected by NGS alone appear to have a modestly increased risk of failing therapy, in general, the risk of failure is substantially higher with high-abundance mutants that can be demonstrated both by NGS and Sanger sequencing [8].

Cytomegalovirus (CMV)

CMV is another virus for which genotypic drug resistance testing is clinically useful, particularly in transplant recipients [24, 25]. Rates of CMV drug resistance vary by patient population: 5–12.5% in solid organ transplant (SOT) recipients and 2–5% in hematopoietic stem cell transplant (HSCT) recipients [26]. Timely detection of CMV drug resistance is critical because DRMs can accumulate with continued drug exposure [26, 27], potentially leading to shortened graft survival and increased morbidity [28, 29]. Furthermore, a rational change of therapy following identification of drug resistance has been shown to lead to more rapid clearance of virus [30]. Mutations conferring resistance to the CMV antivirals ganciclovir, foscarnet, and cidofovir have been characterized in two CMV genes, UL54 (DNA polymerase) and UL97 (phosphotransferase), which together represent <6 kb of coding sequence. This makes CMV well suited to an amplicon-based NGS approach analogous to assays targeting HIV protease and reverse transcriptase. Indeed, an NGS assay for CMV UL54 and UL97 has demonstrated a low overall empirical error rate (0.189%) and reliable detection of CMV DRMs in clinical plasma specimens across a wide range of viral loads [18]. Mutations conferring resistance to the terminase inhibitor letermovir, approved by the FDA in 2017, have been identified in several genes encoding members of the terminase complex, primarily UL56 and less commonly UL89 and UL51 [31,32,33]. Future assays for CMV genotypic resistance will likely include UL56 and may include other genes important for the development of letermovir resistance.

The impact of minor-population resistant variants on clinical outcomes in CMV-positive patients has not yet been assessed in large clinical trials. However, there is emerging evidence that NGS can facilitate the detection of impending drug resistance and assist in therapy optimization [27]. NGS studies of viral drug resistance are also expected to identify novel putative DRMs, which, after appropriate phenotypic validation [27], can be incorporated into CMV DRM databases and genotypic interpretation systems, similar to those that exist for HIV-1 [34]. Such automated tools have been shown to improve sequence analysis in addition to expediting and standardizing workflow when compared to manual sequence curation [35].

Virus Identification in Clinical Specimens

Proof-of-concept studies have demonstrated the ability of nontargeted, metagenomic sequencing to identify common, clinically relevant viruses in a variety of specimen types previously shown to be positive by routine molecular testing [36]. Another area of diagnostic virology in which NGS is being successfully applied is the identification of viral pathogens in clinical scenarios where a viral agent is suspected but not detected by conventional diagnostic methods [37]. Many viruses cannot be cultured or identified by traditional molecular techniques, while other methods such as cloning and Sanger sequencing are laborious, time-consuming, and mainly applicable to sterile samples such as cerebrospinal fluid [2]. Microarrays targeting highly conserved regions within viral families can detect known viruses, but they cannot identify novel pathogens that lack sequence similarity to the oligonucleotides on the array [37]. In contrast, NGS offers an efficient, highly sensitive, and unbiased alternative for the detection of viruses in clinical specimens [2, 37]. The general approach in such studies is fundamentally different from that used in targeted sequencing. First, the virus of interest is usually not known and therefore cannot be selectively amplified with target-specific primers. Thus, specialized laboratory and bioinformatics strategies are needed to enrich viral RNA or DNA relative to the predominantly human nucleic acids. Second, a reference sequence may not be available for mapping sequencing reads if the virus is novel or highly divergent from known related viruses. This necessitates de novo assembly of the viral genome.

Because the nucleic acids in clinical specimens are predominantly of host origin, enrichment of viral sequences and/or depletion of host sequences is an important step for sensitive NGS discovery of viruses in clinical specimens. Laboratory methods for viral particle purification and enrichment include viral culture, ultracentrifugation, density gradient centrifugation, and pretreatment of the sample with nucleases to remove host nucleic acids while preserving capsid-protected viral genomes [2, 38]. Nucleic acid amplification methods for enrichment of viral genomes include rolling circle amplification for viruses with a circular genome [39] and the use of restriction enzyme sites that are more frequent in viral than in human nucleic acids, followed by ligation of adaptors and PCR amplification [40, 41]. Other methods have incorporated hybridization approaches that capture viral nucleic acids with antisense oligonucleotide baits, although bait design requires at least some prior knowledge of the pathogen [42, 43]. For example, both ViroCap [44] and VirCapSeq-VERT [45] contain probes for capture of all viruses known to infect vertebrates. Conversely, methods have been designed to deplete human nucleic acids, including CRISPR-based depletion [46]. Furthermore, computational tools have been developed for “subtracting” host sequences from the initial read pool containing mixed human and microbial sequences [47,48,49]. This filtering step is crucial because viral sequences may comprise <1% of the initial reads [37, 49] (Table 37.2).
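The idea behind computational host subtraction can be shown with a deliberately simplified Python sketch: build an index of host k-mers and discard reads whose k-mers are mostly found in it. This is an assumption-laden illustration, not the algorithm of the tools cited above; published pipelines typically align reads against the complete human reference genome. The function names, the tiny k, the 50% cutoff, and the toy sequences are all hypothetical.

```python
def kmer_set(seq, k):
    """All length-k substrings of seq."""
    return {seq[i:i + k] for i in range(len(seq) - k + 1)}


def subtract_host_reads(reads, host_reference, k=5, max_host_fraction=0.5):
    """Keep reads whose k-mers are mostly absent from the host reference.

    Assumptions (illustrative only): exact k-mer matching, a single host
    reference string, and a hypothetical cutoff on the shared-k-mer fraction.
    """
    host_index = kmer_set(host_reference, k)
    kept = []
    for read in reads:
        read_kmers = kmer_set(read, k)
        if not read_kmers:
            continue  # a read shorter than k cannot be evaluated; discard it
        host_fraction = len(read_kmers & host_index) / len(read_kmers)
        if host_fraction <= max_host_fraction:
            kept.append(read)
    return kept


# Hypothetical toy "host genome" and reads; real data would use the full
# human reference and a much larger k (or full read alignment).
host = "ATGCGTACGTTAGCCGATCGATCG"
reads = ["GCGTACGTTAGC", "TTTTTGGGGGCCCCC"]
print(subtract_host_reads(reads, host))  # ['TTTTTGGGGGCCCCC']
```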

Table 37.2 Select studies describing culture-independent NGS pathogen identification from primary human clinical specimens

Additionally, it is frequently unknown whether a putative viral pathogen has a DNA or an RNA genome, which necessitates extraction of total nucleic acid. Amplification with random primers may also be necessary to generate sufficient template for library preparation. One interesting approach to this problem for RNA viruses involves reverse transcription with random primers followed by cDNA amplification using Phi29 bacteriophage polymerase-based multiple displacement amplification [50]. The choice of sequencing platform (Table 37.1) also requires consideration, as read length and sequencing depth may affect virus detection and genome assembly [51].

Perhaps the most critical aspect of successful viral discovery is the choice of bioinformatics tools. When the reference genome is known, as in amplicon sequencing experiments, read alignment software typically applies stringent mismatch rules to minimize errors. In contrast, reads from an unknown pathogen may be impossible to map to publicly available viral databases if the target virus is highly divergent. Instead, bioinformatics tools must assemble reads into contiguous sequences (contigs) by identifying overlaps between reads and then assemble the contigs into genomes [52]. Sequencing methods that produce long reads, and therefore linkage information, facilitate assembly (Table 37.1). Sequences assembled in this way can be compared to public databases using algorithms with relaxed stringency to identify related viruses. Repetitive sequences pose a significant challenge in de novo assembly because they can interfere with PCR amplification as well as with accurate genomic mapping. Computational and experimental strategies are being developed to address such issues [2, 52]. Table 37.2 summarizes several representative studies in which the NGS approaches described above have been used to identify viral pathogens in patients with infectious syndromes of unclear etiology.
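The core idea of overlap-based assembly can be sketched in Python with a greedy strategy: repeatedly merge the pair of sequences with the longest suffix–prefix overlap until no overlap of at least a minimum length remains. This is a teaching simplification, not the algorithm used by the assemblers cited above. Production assemblers generally use de Bruijn graph or overlap–layout–consensus methods and must handle sequencing errors and repeats, both of which this sketch ignores. The function names, `min_len`, and the toy reads are hypothetical.

```python
def overlap(a, b, min_len):
    """Length of the longest suffix of a that matches a prefix of b
    (at least min_len bases long); 0 if there is none."""
    start = 0
    while True:
        start = a.find(b[:min_len], start)
        if start == -1:
            return 0
        if b.startswith(a[start:]):
            return len(a) - start  # leftmost hit gives the longest overlap
        start += 1


def greedy_assemble(reads, min_len=3):
    """Greedily merge reads into contigs by maximal pairwise overlap."""
    contigs = list(dict.fromkeys(reads))  # drop duplicate reads, keep order
    # Remove reads fully contained in another read; they add no information.
    contigs = [r for r in contigs if not any(r != o and r in o for o in contigs)]
    while True:
        best_len, best_i, best_j = 0, None, None
        for i, a in enumerate(contigs):
            for j, b in enumerate(contigs):
                if i != j:
                    olen = overlap(a, b, min_len)
                    if olen > best_len:
                        best_len, best_i, best_j = olen, i, j
        if best_len == 0:
            return contigs  # no remaining overlaps: contigs are final
        merged = contigs[best_i] + contigs[best_j][best_len:]
        contigs = [c for k, c in enumerate(contigs) if k not in (best_i, best_j)]
        contigs.append(merged)


# Hypothetical overlapping reads from an unknown genome
print(greedy_assemble(["ATGGCGT", "GCGTACC", "TACCGGA"]))  # ['ATGGCGTACCGGA']
```

A repeat longer than the reads would create equally good overlaps in several places, and a greedy merge could then join the wrong pieces; this is a small-scale view of why repetitive sequences complicate de novo assembly.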

An important caveat to viral discovery is that demonstrating the presence of a virus in a patient with disease does not automatically imply pathogenicity. Traditionally, proving that a microorganism causes a disease has depended on fulfilling Koch’s postulates: the putative etiologic agent is found in affected hosts but not in healthy controls, it can be isolated and propagated in culture, and it reproduces the disease when inoculated into a healthy host. However, it is increasingly evident that many viruses cannot be cultured, which has prompted revision of the traditional criteria for establishing a causal role for a microorganism in a disease [53]. Such revised guidelines eliminate the requirement for microorganism isolation but increase the rigor with which the association between microorganism and disease must be established. For example, it may be necessary to demonstrate the presence of virus in affected tissues using immunostaining or molecular methods, to establish a correlation between viral copy number and disease severity, or to show seroconversion between acute and convalescent plasma specimens.

Specific Applications in Clinical Bacteriology, Mycobacteriology, and Mycology

Identification by Targeted or Nontargeted Sequencing

Genomic approaches are also likely to assist in the diagnosis and management of bacterial, mycobacterial, and fungal infections, including pathogen identification and the characterization of virulence factors, strain types, and antibiotic resistance markers. Clinical microbiology has traditionally relied on isolation of pathogens by culture followed by biochemical tests and, more recently, matrix-assisted laser desorption/ionization time-of-flight mass spectrometry (MALDI-TOF MS) to identify the genus or species of the infecting organism [54]. These assays are inexpensive, at least on a per-test basis, and have rapid turnaround times; they are therefore appropriate as first-line diagnostics. However, proof-of-concept studies have shown that NGS holds significant potential for microbial identification from primary human specimens, both by targeted amplicon sequencing of ribosomal RNA (rRNA) genes and by nontargeted metagenomic sequencing [3].

Ribosomal RNA Sequencing

rRNA gene sequencing by the Sanger method is routinely used for bacterial and fungal identification in clinical microbiology laboratories [55, 56]. Furthermore, rRNA sequencing by NGS is the basis for studies of the microbiome. For bacteria, these approaches employ primers targeting conserved regions of the 16S rRNA gene that flank variable regions, whose sequence diversity is usually sufficient for taxonomic assignment, often to the species level. The choice of primers is important because certain regions of the 16S rRNA gene may allow amplification of a broader spectrum of bacteria than others [57]. Additionally, in some cases, classification may only be possible to the family or genus level because sequence variation is insufficient for species-level identification [58].
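Whether a primer can amplify a given organism depends partly on how well it matches the conserved binding site, and "universal" 16S rRNA primers often contain degenerate IUPAC positions to broaden coverage. The Python sketch below checks where a degenerate primer matches a template, allowing a set number of mismatches. It considers sequence complementarity only (no melting temperature, 3′-end weighting, or reverse strand), and the primer and template sequences are hypothetical, not a published 16S primer pair.

```python
# IUPAC nucleotide codes and the bases each one accepts
IUPAC = {
    "A": "A", "C": "C", "G": "G", "T": "T",
    "R": "AG", "Y": "CT", "S": "CG", "W": "AT", "K": "GT", "M": "AC",
    "B": "CGT", "D": "AGT", "H": "ACT", "V": "ACG", "N": "ACGT",
}


def primer_matches(primer, template, max_mismatches=0):
    """Start positions where a (possibly degenerate) primer matches the
    template's forward strand with at most max_mismatches mismatches."""
    hits = []
    for start in range(len(template) - len(primer) + 1):
        window = template[start:start + len(primer)]
        mismatches = sum(base not in IUPAC[code] for code, base in zip(primer, window))
        if mismatches <= max_mismatches:
            hits.append(start)
    return hits


# Hypothetical primer with one degenerate position (Y = C or T)
primer = "GTGYCAG"
print(primer_matches(primer, "AAGTGCCAGTT"))  # [2]  (C satisfies Y)
print(primer_matches(primer, "AAGTGTCAGTT"))  # [2]  (T satisfies Y)
print(primer_matches(primer, "AAGTGACAGTT"))  # []   (A does not satisfy Y)
```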

Informatics is critical for interpreting 16S rRNA sequencing data obtained by NGS. The length, number, and quality of sequencing reads, as well as possible bacterial contamination of reagents, are all factors that can affect pathogen identification and introduce bias into microbiome diversity assessments [59]. For example, commonly used DNA extraction kits are frequently contaminated with environmental bacteria, leading to overestimation of bacterial diversity in specimens with low starting bacterial loads, such as cerebrospinal fluid, blood, or tissue biopsies [60]. Data analysis tools include error correction methods [58, 61] and removal of amplification-derived chimeric sequences [62, 63]. Processed data or raw sequences can be analyzed in dedicated pipelines such as QIIME [64], the Ribosomal Database Project [65], and mothur [62], which cluster similar sequences into operational taxonomic units (OTUs), commonly defined by ≥97% sequence identity [3], followed by phylogenetic analyses. In addition, efforts are under way to standardize the use of these pipelines by establishing quality-filtering parameters based on the sequencing platform and the quality of the sequencing data [66]. Importantly, the accuracy of bacterial identification depends largely on the scope and completeness of the reference databases used for analysis. A number of extensive databases have been created for 16S rRNA sequences: for example, SILVA (www.arb-silva.de), containing >3 million small subunit and >250,000 large subunit bacterial rRNA gene sequences [67], or Greengenes (http://www.greengenes.secondgenome.com/downloads), which can calculate taxonomic relationships based on >400,000 16S rRNA sequences [68].
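Clustering at a fixed identity threshold can be sketched in Python with a greedy, centroid-based approach: sequences are processed from most to least abundant, and each joins the first existing centroid it matches at ≥97% identity or else founds a new OTU. This is a generic illustration of the concept, not the exact algorithm of QIIME, mothur, or the Ribosomal Database Project. It assumes the reads are already aligned to equal length so that identity is a simple position-by-position comparison. The function names and toy sequences are hypothetical.

```python
from collections import Counter


def fraction_identity(a, b):
    """Fraction of identical positions between two pre-aligned sequences."""
    if len(a) != len(b):
        raise ValueError("sequences must be pre-aligned to equal length")
    return sum(x == y for x, y in zip(a, b)) / len(a)


def greedy_otu_clusters(sequences, threshold=0.97):
    """Assign each unique sequence to the first OTU centroid it matches at
    >= threshold identity; otherwise it becomes a new centroid."""
    centroids = []
    assignments = {}
    # Most abundant sequences first, so they tend to become centroids.
    for seq, _count in Counter(sequences).most_common():
        for idx, centroid in enumerate(centroids):
            if fraction_identity(seq, centroid) >= threshold:
                assignments[seq] = idx
                break
        else:  # no centroid was close enough
            centroids.append(seq)
            assignments[seq] = len(centroids) - 1
    return centroids, assignments


# Hypothetical 100-bp pre-aligned reads
s1 = "A" * 100
s2 = "A" * 98 + "CC"     # 98% identical to s1 -> same OTU
s3 = "C" * 5 + "A" * 95  # 95% identical to s1 -> new OTU
centroids, assignments = greedy_otu_clusters([s1, s1, s1, s2, s3])
print(len(centroids))                     # 2
print(assignments[s2], assignments[s3])   # 0 1
```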

Targeted 16S rRNA sequencing by NGS may have immediate clinical application for the characterization of mixed infections, particularly those containing uncultivatable or nonviable organisms. This approach has been successfully applied directly to brain abscess material, lymph node biopsy tissue, cystic fibrosis (CF) sputa, and mastoid abscess material [69, 70].

Sequencing of 16S rRNA targets by NGS has also been used to study the diversity of bacterial communities, or the microbiome, in health and in various disease states. For example, sequencing of the 16S rRNA hypervariable region was used to study bacterial vaginosis, revealing increased bacterial heterogeneity compared with the healthy state [71]. On the other hand, studies of the microbiome in the lower airway of cystic fibrosis patients [72] and in the stool of patients with Clostridium difficile infection (CDI) [73] or inflammatory bowel disease (IBD) [74] have shown that disease progression is marked by decreasing bacterial diversity, which may be related to escalating antibiotic exposure. Such results indicate that certain disease states may be driven by disturbances in the normal structure and diversity of a microbial community rather than by the action of individual pathogens. Although genomic approaches may elucidate the mechanisms by which changes in the microbiome contribute to disease, the diagnostic utility of characterizing the microbiome in patient management remains to be established.
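Statements about "increased heterogeneity" or "decreasing diversity" rest on quantitative diversity indices computed from per-taxon read counts. The Python sketch below computes one widely used metric, the Shannon index H′ = −Σ pᵢ ln pᵢ. The source does not specify which indices the cited studies used, and the taxon counts below are hypothetical.

```python
import math


def shannon_index(counts):
    """Shannon diversity H' = -sum(p_i * ln p_i) over taxa with nonzero counts.

    counts: read (or OTU) counts per taxon in one sample.
    """
    total = sum(counts)
    if total == 0:
        raise ValueError("sample contains no reads")
    return -sum((c / total) * math.log(c / total) for c in counts if c > 0)


# Hypothetical samples with four taxa each
even_community = [25, 25, 25, 25]   # evenly distributed taxa
dominated_community = [97, 1, 1, 1]  # one taxon dominates, as after antibiotic pressure
print(round(shannon_index(even_community), 3))       # 1.386 (= ln 4)
print(round(shannon_index(dominated_community), 3))  # 0.168
```

Both samples contain the same four taxa, yet the index is far lower when one taxon dominates, which is why diversity metrics capture community disturbance that a simple list of detected organisms would miss.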

Metagenomic Sequencing

In contrast to rRNA-based NGS approaches, nontargeted approaches allow more detailed functional and taxonomic analyses, whether a cultured isolate is sequenced or an entire microbial community is characterized (the latter is termed metagenomics) [54, 58]. WGS methods can also be helpful for bacterial pathogen discovery in patients with suspected infections in whom culture and other standard diagnostic methods have failed. In such scenarios, a direct patient specimen can be sequenced in a relatively unbiased way, similar to the approach described above for viral discovery. The potential diagnostic utility of this approach was demonstrated in a pediatric case of severe combined immunodeficiency and recurrent meningoencephalitis, in which WGS coupled with a rapid, dedicated bioinformatics pipeline [75] detected Leptospira santarosai sequences in cerebrospinal fluid (CSF) within 48 h of specimen receipt [76]. Table 37.2 shows representative studies in which WGS was used for the culture-independent identification of bacterial pathogens in patients with infectious syndromes of unclear etiology.

Analysis of metagenomic data poses even greater challenges than those discussed for 16S rRNA sequencing [58]. When direct clinical specimens are sequenced, the data must be filtered to remove human sequences and sequencing errors. In addition, the putative bacterial reads must be aligned to reference genomes or subjected to de novo assembly of contigs so that genes can be predicted and biological functions assigned [58, 75]. Examples of pipelines that have been used for clinical pathogen identification include SURPI (sequence-based ultrarapid pathogen identification) [75, 76] and Taxonomer [77], among many others. However, both taxonomic and functional annotations may be limited by the availability of reference genomes. In that respect, large endeavors exploring bacterial metagenomics in the human host, such as the Human Microbiome Project [78] and the Metagenomics of the Human Intestinal Tract (MetaHIT) project [79], are actively expanding bacterial genomic databases.
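A simple way to picture reference-based taxonomic assignment is k-mer voting: index the k-mers of each reference genome, then assign each read to the taxon whose k-mers it shares most often. The Python sketch below is a generic illustration of that idea. It is not the published algorithm of SURPI or Taxonomer. Ambiguous reads are left unassigned here rather than, for example, being placed at a lowest common ancestor. The taxon names, genomes, and k are hypothetical. The sketch also shows why assignment fails when no related reference genome exists: such reads receive no votes.

```python
from collections import Counter


def build_kmer_index(references, k):
    """Map each k-mer to the set of taxa whose reference genome contains it."""
    index = {}
    for taxon, genome in references.items():
        for i in range(len(genome) - k + 1):
            index.setdefault(genome[i:i + k], set()).add(taxon)
    return index


def classify_read(read, index, k):
    """Assign a read to the taxon with the most matching k-mers.

    Returns None when no k-mer matches (e.g., no related reference genome)
    or when the top two taxa are tied.
    """
    votes = Counter()
    for i in range(len(read) - k + 1):
        for taxon in index.get(read[i:i + k], ()):
            votes[taxon] += 1
    if not votes:
        return None
    ranked = votes.most_common(2)
    if len(ranked) == 2 and ranked[0][1] == ranked[1][1]:
        return None  # ambiguous between taxa
    return ranked[0][0]


# Hypothetical miniature reference database
references = {"taxon_A": "ACGTACGGTCAGT", "taxon_B": "TTGACCATGGAAC"}
index = build_kmer_index(references, k=4)
print(classify_read("CGTACGGT", index, k=4))  # taxon_A
print(classify_read("CCATGGAA", index, k=4))  # taxon_B
print(classify_read("GGGGGGGG", index, k=4))  # None
```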

Several groups have assessed the feasibility of nontargeted sequencing for bacterial identification and characterization in a clinical microbiology laboratory. One study tested the feasibility of this approach for routine use by sequencing 130 cultured isolates, including aerobic and anaerobic bacteria, mycobacteria, and fungi [80]. The process from colony harvest to acquisition of analyzable data took ~55 h, most of which was attributable to the sequencing run (39 h). Comparison of the sequencing results with identification by MALDI-TOF MS, conventional culture, and biochemical methods demonstrated good correlation: 115/130 isolates (88.5%) showed concordant results, while 15/130 could not be identified because of insufficient coverage or the absence of applicable reference genomes in publicly available databases (mainly for sterile molds). Thus, nontargeted sequencing identified the majority of organisms identified by conventional methods; however, the turnaround time was substantially longer, and no cost analysis was performed.

Genotypic Pathogen Characterization

In addition to organism identification, NGS methods can be used to identify genotypic markers of drug resistance and virulence and to perform strain typing [54, 81]. Although phenotypic antimicrobial resistance testing is relatively well standardized, it is available for a limited number of organisms and can take up to several weeks for slow-growing organisms such as Mycobacterium tuberculosis [82]. Molecular assays with improved sensitivity and turnaround times already exist for some resistance markers; however, resistance to an antimicrobial class can be mediated by several molecular mechanisms, necessitating multiple individual tests or panel testing [82]. Whole-genome-based genotyping could therefore simplify workflow and eliminate the need for individual PCR-based assays by simultaneously interrogating all known genotypic resistance mechanisms, especially if sequencing is already being performed for other purposes, such as identification, strain typing, or toxin gene detection [83]. Use of this approach is likely to expand as new drug resistance mechanisms are characterized and catalogued in publicly available databases such as ResFinder [84] and ARG-ANNOT [85], which use BLAST to query a user-supplied sequence against a curated list of bacterial antimicrobial resistance genes. A number of proof-of-concept studies have assessed the ability of NGS to predict bacterial drug resistance patterns; these have been reviewed by Köser et al. [83]. Although genotypic resistance prediction appears feasible, susceptibility determination is nuanced and challenging. Notably, genotypic assays do not provide a quantitative measure of antimicrobial susceptibility, and if a resistance gene or mutation confers variable or inducible resistance, phenotypic assays will still be required [82].
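Database queries of this kind return alignment hits, and a gene is typically reported as present only if a hit passes both a percent-identity and a coverage threshold. The Python sketch below illustrates that filtering step. The `Hit` record, the gene labels, and the default thresholds (90% identity, 60% coverage) are hypothetical choices for illustration, not the documented settings of ResFinder or ARG-ANNOT. Detecting a gene this way says nothing about expression level or inducibility, which is the limitation noted above.

```python
from dataclasses import dataclass


@dataclass
class Hit:
    """One alignment of assembled sequence against a reference resistance gene
    (hypothetical record layout, e.g., parsed from BLAST tabular output)."""
    gene: str
    percent_identity: float
    aligned_length: int
    reference_length: int

    @property
    def coverage(self) -> float:
        """Fraction of the reference gene covered by the alignment."""
        return self.aligned_length / self.reference_length


def detect_resistance_genes(hits, min_identity=90.0, min_coverage=0.6):
    """Return the sorted names of genes with at least one hit passing both
    thresholds (threshold values here are illustrative assumptions)."""
    detected = set()
    for hit in hits:
        if hit.percent_identity >= min_identity and hit.coverage >= min_coverage:
            detected.add(hit.gene)
    return sorted(detected)


# Hypothetical hits against placeholder gene names
hits = [
    Hit("geneA", 99.5, 800, 800),  # near-identical, full length -> reported
    Hit("geneB", 85.0, 900, 900),  # full length but too divergent
    Hit("geneC", 98.0, 200, 900),  # identical fragment, only ~22% coverage
]
print(detect_resistance_genes(hits))  # ['geneA']
```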
Similarly, when used alone, sequencing may fail to predict a resistance pattern if it has not been characterized genetically or is absent from a given database. For these reasons, whole-genome sequencing is unlikely to replace existing cost-effective resistance assays (phenotypic or molecular) for fast-growing organisms even as cost and turnaround time for sequencing assays continue to decrease. The greatest utility of whole-genome-based drug resistance testing may be for slow-growing organisms such as M. tuberculosis, where multidrug regimens are used, phenotypic testing is complex and available for a limited number of drugs, and the number of genes and intergenic regions that need to be targeted for a comprehensive molecular assay is prohibitively large for a targeted approach [86, 87].

Another area in which bacterial whole-genome sequencing is being implemented for clinical purposes is strain typing in hospital outbreak investigations. Traditionally, strain typing has been performed either by fragment analysis methods, e.g., pulsed-field gel electrophoresis, or by sequence-based techniques such as multilocus sequence typing [88]. However, typing schemes exist for only a limited number of organisms, and typing is currently performed primarily in reference and public health laboratories, which means that results are frequently not available within a clinically actionable time frame. In contrast, analysis of single nucleotide polymorphisms (SNPs) in bacterial whole-genome data can be performed during ongoing outbreaks, does not depend on the availability of established typing schemes, and has higher resolution than most existing sequence-based typing schemes [54]. In this approach, whole-genome sequencing data are aligned to a reference genome, SNPs are identified and filtered based on preestablished quality metrics, and phylogenetic analysis is then performed to assess the relatedness of bacterial isolates. The feasibility of this approach for reconstructing transmission pathways in hospital outbreak investigations has been demonstrated in a number of studies [54]. However, it remains to be shown whether the use of whole-genome sequencing in the setting of hospital outbreaks will be cost-effective and associated with prevention of transmission events.
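Before phylogenetic analysis, relatedness is often summarized as pairwise SNP distances between isolates. The Python sketch below computes these distances from quality-filtered base calls at variable reference positions and flags pairs within a cutoff. The data layout, isolate names, base calls, and the 2-SNP cutoff are hypothetical. Appropriate cutoffs are organism- and context-specific, and a distance matrix supplements rather than replaces phylogenetic and epidemiologic analysis.

```python
from itertools import combinations


def snp_distance_matrix(snp_profiles):
    """Pairwise SNP distances between isolates.

    snp_profiles maps isolate name -> {reference position: base call}, with
    calls only at positions that passed quality filtering. Positions missing
    from either isolate are skipped rather than counted as differences.
    """
    distances = {}
    for a, b in combinations(sorted(snp_profiles), 2):
        shared = snp_profiles[a].keys() & snp_profiles[b].keys()
        distances[(a, b)] = sum(snp_profiles[a][pos] != snp_profiles[b][pos] for pos in shared)
    return distances


def linked_pairs(distances, max_snps):
    """Isolate pairs within a (hypothetical) SNP cutoff, suggesting possible transmission."""
    return sorted(pair for pair, d in distances.items() if d <= max_snps)


# Hypothetical base calls at four variable positions of a reference genome
profiles = {
    "iso1": {101: "A", 250: "G", 377: "T", 912: "C"},
    "iso2": {101: "A", 250: "G", 377: "T", 912: "T"},
    "iso3": {101: "G", 250: "A", 377: "C", 912: "T"},
}
distances = snp_distance_matrix(profiles)
print(linked_pairs(distances, max_snps=2))  # [('iso1', 'iso2')]
```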

Validation, Quality Control, and Maintenance of Proficiency

The use of any diagnostic test in the clinical laboratory requires analytical and clinical validation, as well as the careful monitoring and documentation of quality control and proficiency testing (Table 37.3). In that regard, NGS performed in the clinical laboratory for patient care differs from NGS performed in the research setting, even though the sequencing methods may be the same. As such, the American College of Medical Genetics and Genomics (ACMG) has published detailed clinical laboratory standards for NGS [89]. Furthermore, the College of American Pathologists (CAP) has developed an NGS checklist for accreditation of molecular pathology laboratories performing clinical NGS testing [90]. The molecular pathology NGS checklist details requirements for documentation, validation, quality control, and quality monitoring for both the wet bench work and bioinformatics and includes guidelines for data storage, as well as the assessment and implementation of new technology and software releases. Though this checklist has been updated to include examples relevant to NGS for infectious diseases, it is anticipated that in the future, the microbiology checklist will contain a separate section for NGS tailored specifically for microbiology. To further assist in the validation of NGS-based assays for infectious diseases, the American Society for Microbiology and CAP published a manuscript describing the challenges and potential solutions for validating metagenomic pathogen detection tests in clinical laboratories [91].

Table 37.3 Assessment of the performance characteristics of NGS-based tests for clinical microbiology

However, the application of NGS to clinical infectious disease testing poses unique challenges that are distinct from those in the diagnostic settings of human inherited diseases or cancer. For example, as NGS is increasingly adopted for clinical microbiology, well-characterized and extensively sequenced reference microorganisms will be required for use as controls and proficiency testing material. To supplement reference strains, mock sequence data may also be needed to verify that bioinformatics pipelines perform adequately. These in silico controls and proficiency challenges will be particularly important for the clinical characterization of the microbial metagenome, low-level DRM detection, and the identification of organisms that are unculturable or difficult to culture.
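As an illustration of an in silico control for low-level DRM detection, the sketch below generates error-free mock reads in which a known fraction carry a spiked-in mutation and then checks whether a minimal pileup recovers that fraction. Everything here is an assumption for illustration: the functions `make_mock_reads` and `allele_frequency` are hypothetical, the reads are treated as already aligned to the reference, and realistic read simulators would also model sequencing errors, quality scores, and coverage bias.

```python
import random


def make_mock_reads(reference, position, alt, n_reads, n_mutant, read_len, seed=0):
    """Simulate error-free reads that all cover `position`.

    Exactly `n_mutant` reads carry `alt` at `position`; the rest match the reference.
    Returns (start, sequence) tuples, i.e. reads already "aligned" to the reference.
    """
    if not 0 <= n_mutant <= n_reads:
        raise ValueError("n_mutant must be between 0 and n_reads")
    if reference[position] == alt:
        raise ValueError("alt must differ from the reference base")
    lo = max(0, position - read_len + 1)            # earliest start that still covers position
    hi = min(position, len(reference) - read_len)   # latest start that fits in the reference
    if hi < lo:
        raise ValueError("reference too short for read_len at this position")
    rng = random.Random(seed)  # fixed seed keeps the control reproducible
    mutant = reference[:position] + alt + reference[position + 1:]
    reads = []
    for i in range(n_reads):
        template = mutant if i < n_mutant else reference
        start = rng.randint(lo, hi)
        reads.append((start, template[start:start + read_len]))
    rng.shuffle(reads)
    return reads


def allele_frequency(reads, position, alt):
    """Fraction of reads covering `position` that show `alt` (a minimal pileup)."""
    bases = [seq[position - start] for start, seq in reads
             if start <= position < start + len(seq)]
    if not bases:
        raise ValueError("no reads cover this position")
    return sum(base == alt for base in bases) / len(bases)


# Spike a hypothetical resistance mutation into 5% of 200 reads.
reference = "ACGT" * 15  # 60-bp toy reference; the base at index 30 is "G"
reads = make_mock_reads(reference, position=30, alt="T",
                        n_reads=200, n_mutant=10, read_len=20)
observed = allele_frequency(reads, position=30, alt="T")
print(observed)  # 0.05
```

Because the expected frequency is known by construction, a pipeline under test that reports a markedly different value, or misses the variant entirely, would flag a loss of sensitivity at that level.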

NGS tests currently used for clinical infectious disease testing are performed as laboratory-developed tests, as no clinical microbiology NGS tests have yet been approved by the United States Food and Drug Administration (FDA). Nevertheless, the FDA is keenly interested in the regulatory oversight of NGS in clinical microbiology, particularly for microbial identification and the detection of antimicrobial resistance markers. To that end, the FDA has published a discussion paper detailing clinical applications and validation approaches for the regulatory approval or clearance of NGS diagnostic devices for clinical microbiology [92]. Of note, this document reports that the FDA is developing a database (FDA MicroDB) consisting of >550 high-quality, “regulatory-grade” sequences from clinically relevant bacteria for use in the regulatory approval pathway. The availability of FDA-approved infectious disease NGS in vitro diagnostics will likely aid in standardizing specimen handling, library preparation and sequencing, and data interpretation, thereby helping to ensure the accuracy and reproducibility of NGS-derived genotypic results. Such standardization and quality assurance may be particularly important because contaminating microbial DNA is ubiquitous in commonly used extraction kits and NGS reagents, as well as in “sterile” specimen transport containers [60].

Conclusions

In this chapter we have reviewed areas of clinical microbiology in which next-generation sequencing approaches have been used to identify and characterize medically important pathogens. While many of these studies have been proof-of-concept experiments or research investigations, NGS-based testing has already been adopted in select diagnostic microbiology laboratories, including academic clinical laboratories, large commercial reference laboratories, and startup companies. Routine applications are likely to increase as cost, turnaround time, and complexity decrease enough for NGS to complement existing methods that are affordable, standardized, and considerably simpler. As these technologies continue to be developed and evaluated, the use of NGS for infectious disease testing may become more widespread.

Targeted NGS assays relying on amplicon sequencing, such as HIV drug resistance testing, were the first to be introduced clinically, given their sensitivity advantage over Sanger sequencing and the accumulating data supporting the clinical relevance of low-abundance resistance mutations. NGS-based amplicon sequencing of ribosomal RNA genes may also become more commonly used to identify pathogenic bacteria and fungi when infection is strongly suspected but culture is negative or unavailable, or when mixed infections are suspected. Metagenomic strategies may also be useful for pathogen identification in specimens from normally sterile sites if testing can be optimized to provide clinically actionable data faster than culture or currently available molecular methods. Importantly, the ability of NGS methods and bioinformatics pipelines to accurately identify and characterize pathogens will need to be rigorously validated and compared with traditional diagnostic techniques [93, 94].
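The sensitivity advantage for low-abundance mutations depends on sequencing depth. One simple way to reason about it, sketched below under assumed values, is to model the reads that show an alternate base purely through sequencing error as a binomial count, and to ask how many supporting reads are needed before a minority variant is unlikely to be an artifact. The per-base error rate, the tolerance `ALPHA`, and the function `min_alt_reads` are hypothetical; clinical pipelines use their own validated error models and calling thresholds.

```python
def min_alt_reads(depth, error_rate, alpha):
    """Smallest k with P(X >= k) < alpha, where X ~ Binomial(depth, error_rate).

    X models reads showing the alternate base purely through sequencing error, so
    k is the fewest supporting reads needed before a minority variant is called.
    The pmf recursion is adequate for modest depths; at very high depths
    (1 - error_rate) ** depth underflows and a log-space version would be needed.
    """
    ratio = error_rate / (1.0 - error_rate)
    pmf = (1.0 - error_rate) ** depth  # P(X = 0)
    cdf = 0.0                          # P(X <= k - 1) at the top of each iteration
    for k in range(depth + 1):
        if 1.0 - cdf < alpha:          # 1 - cdf is the upper tail P(X >= k)
            return k
        cdf += pmf
        pmf *= (depth - k) / (k + 1) * ratio  # P(X = k + 1) from P(X = k)
    return depth + 1                   # no achievable read count clears the threshold


ERROR_RATE = 0.01  # hypothetical per-base error rate
ALPHA = 0.001      # hypothetical tolerance for error-only false calls

for depth in (100, 1000, 10000):
    k = min_alt_reads(depth, ERROR_RATE, ALPHA)
    print(f"depth={depth}: need >= {k} alt reads (~{k / depth:.1%} of reads)")
```

Running the loop shows the minimum detectable fraction (k / depth) shrinking as depth increases, which is the intuition behind the ability of deep sequencing to report low-abundance resistance mutations.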

The greatest attraction of genomic approaches is that metagenomic sequencing could provide all relevant information about a pathogen in a single assay, including species identification, strain typing, virulence determination, and antimicrobial resistance. In practice, however, widespread implementation of NGS in clinical microbiology laboratories will require the acquisition of costly new equipment and, in particular, the training of personnel in methods that rely on bioinformatics. Bioinformatics pipelines will need to provide user-friendly interfaces that allow users to input data directly from the sequencing instrument and receive best-hit matches against comprehensive, well-curated reference genome databases [54].
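The best-hit matching that such interfaces would return can be illustrated with a toy k-mer comparison. This sketch swaps in k-mer containment (the fraction of a read's k-mers that are present in a reference) as a simple similarity score; it is not a description of any particular pipeline, the functions `kmers` and `best_hit` and the two-entry reference dictionary are hypothetical, and production tools rely on alignment or indexed k-mer databases built from curated genomes.

```python
def kmers(seq, k):
    """Set of all overlapping substrings of length k."""
    return {seq[i:i + k] for i in range(len(seq) - k + 1)}


def best_hit(read, references, k=5):
    """Return (reference name, containment score) for the best-matching reference.

    Containment is the fraction of the read's k-mers found in a reference; a toy
    stand-in for the alignment or k-mer classification a production pipeline uses.
    """
    if len(read) < k:
        raise ValueError("read is shorter than k")
    read_kmers = kmers(read, k)
    scores = {name: len(read_kmers & kmers(seq, k)) / len(read_kmers)
              for name, seq in references.items()}
    best = max(scores, key=scores.get)
    return best, scores[best]


# Hypothetical two-entry "database"; real databases hold curated whole genomes.
references = {
    "ref_A": "ATGCGTACGTTAGCCGATCGATCGGATCC",
    "ref_B": "TTTTGGGGCCCCAAAATTTTGGGGCCCCAAAA",
}
print(best_hit("GTACGTTAGCCGA", references))  # ('ref_A', 1.0)
```

Containment rather than symmetric similarity is used because a short read is compared with much longer references; the completeness and curation of the reference set determine how meaningful any best hit is.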

Thus, at present, NGS methods are expected to supplement, rather than replace, conventional diagnostic testing. An important hurdle, even in the most sophisticated clinical laboratories, is that genotype-phenotype correlations for many clinically relevant microorganisms are unknown, although large-scale metagenomic efforts such as the Human Microbiome Project are expected to define numerous new associations between sequence and function. Ultimately, realizing the tremendous promise of NGS methods for diagnostic infectious disease testing will require the successful training of clinical microbiologists capable of interpreting and evaluating NGS data and placing these data in the appropriate clinical context.