Introduction

The detection of antimicrobial resistance and/or the identification of microorganisms by molecular means is nothing new to the field of clinical microbiology and continues to evolve, albeit less rapidly than some might have predicted [1]. As an example of this evolution, a protocol for the detection of methicillin resistance among staphylococci using a polymerase chain reaction (PCR) assay combined with agarose gel electrophoresis of amplified nucleic acid product was reported in 1991 by Murakami et al. [2]. In this study, the absence of an amplifiable mecA gene product among Staphylococcus aureus isolates had 100 % correlation with methicillin susceptibility by phenotypic testing. The clinical utility of this approach soon became obvious, and 10 years later it was transformed into a rapid “real-time” PCR assay for the simultaneous determination of mecA status and S. aureus species identification [3]. A commercial version of this genotypic screening approach was cleared by the U.S. Food and Drug Administration (FDA) shortly thereafter [4] and, within a second 10-year period, eight automated or semi-automated assays were released for the detection of methicillin-resistant S. aureus (MRSA) colonization of at-risk patients and/or direct identification from positive blood cultures ([5], plus NucliSENS EasyQ® MRSA in May of 2011). Despite improvements in processing and speed, the information generated by some of these targeted molecular assays was, at times, misleading. Desjardins et al. [5] noticed a high percentage of false-positive results after the implementation of one MRSA screening assay; these were likely due to the “kick-out” of the mecA gene with retention of the target amplification site at the orfX gene–staphylococcal cassette chromosome mec (SCCmec) element intersection. An additional demonstration of the limitation of such targeted molecular testing was provided by Shore et al. [6], who identified two clonal complex (CC) 130 MRSA isolates that were falsely negative by an MRSA screening assay, owing to a novel SCCmec XI element with a highly divergent mecA gene. These authors utilized whole-genome sequencing (WGS) to elucidate the root cause of the false-negative results.

Next-generation sequencing

Similar to the progression of single-target or multiplex amplification assays described above, the technological wherewithal to sequence large stretches of DNA/RNA and to patch those sequences together into complete genomes or transcriptomes of prokaryotic and eukaryotic organisms has advanced rapidly following the publication of the drafts of the first microbial [7] and human genomes [8]. The term “next-generation sequencing” (NGS) refers to those strategies that have supplemented or supplanted the Sanger dideoxy chain termination sequencing method used nearly exclusively through 2004 [9, 10]. While a detailed description of the technology that comprises NGS methods is beyond the scope of this review, a comparison of the salient characteristics that differ between methods is shown in Table 1. In addition, several excellent reviews of this topic are cited here and are highly recommended [10–13].

Table 1 Macroscopic comparison of next-generation technologies (data extrapolated from [9–12, 68])

The massive sequence output, low cost per base, small size of microbial genomes, and the ability to generate large quantities of microbial DNA/RNA starting material make NGS an attractive alternative to current single-target or multiplex amplification methods for the detection of multiple resistance determinants, virulence factors, or epidemiological markers in a single sequencing run. But is this the proverbial use of an elephant gun to kill a mouse?

Potential for the routine use of NGS technologies in the clinical microbiology laboratory

If we assume that NGS will become increasingly affordable, rapid, and simple to use (i.e., DNA/RNA in; aligned, assembled, annotated, and interpreted genome/transcriptome out), and that technologies and databases will evolve to the extent that WGS will be highly reproducible and reliable, how could this technology be put to its best use in the clinical microbiology laboratory? Before we speculate, let’s try to estimate the cost of WGS in the current market. Bacterial genomes range in size from 0.5 to 10 Mb, but the genomes that most interest clinical microbiologists at the moment (e.g., common human pathogens such as Escherichia coli, Pseudomonas aeruginosa, and S. aureus) reside somewhere in the 2–5-Mb range [14]. It has been estimated that the cost of sequencing using second-generation technology ranges from $1 to $60/Mb [9–13]. Using these figures (2 Mb at $1/Mb to 5 Mb at $60/Mb), one could estimate the cost of sequencing a single bacterial genome to range from $2 to $300, depending on the technology employed. This does not include capital outlay (which, ranging from $100 K to $1,350 K, is not trivial) and overhead costs including labor; service agreements also vary greatly. In practice, using second-generation sequencing instrumentation and 100× coverage of a 4-Mb-sized genome, the output of an eight-lane, two flow cell Illumina HiSeq2000 instrument is around 1,600 genomes over a 10-day period at a cost of around $25.00 per genome; however, that figure covers only the sequencing costs.
When the sample preparation steps are added, e.g., DNA/RNA extraction, fragment preparation, library construction, ligation of bar-coded adapters to allow for the pooling of DNA fragments, multiplex sequencing, repeat runs, sequence quality assessment, deconvoluting, eliminating bad sequences, and construction of high-quality genomes (which might require re-sequencing or complicated gap-closing procedures), the estimate is in the range of $200 to $400 per genome (George Weinstock, Washington University School of Medicine, St. Louis, MO, USA, personal communication). The cost will undoubtedly continue to drop due to improved techniques and competition among sequencer manufacturers, so it is reasonable to assume that, in the very near future, WGS of microorganisms will be affordable and reasonably rapid. Now, back to the original question: how can we put this technology to work for improved patient care and clinical outcomes?
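The back-of-the-envelope figures above can be reproduced with a few lines of arithmetic. The following Python sketch is illustrative only: the function names are ours, and the inputs are simply the genome sizes, per-megabase costs, and coverage quoted in the text, not current market prices.

```python
# Illustrative arithmetic only; all inputs are the figures quoted in the text.


def genome_cost_range(size_mb, cost_per_mb):
    """Return (low, high) sequencing cost in dollars for one genome.

    size_mb and cost_per_mb are (min, max) tuples.
    """
    low = size_mb[0] * cost_per_mb[0]
    high = size_mb[1] * cost_per_mb[1]
    return low, high


def bases_needed(genome_mb, coverage, n_genomes):
    """Total bases that must be sequenced to cover n_genomes at a given depth."""
    return genome_mb * 1_000_000 * coverage * n_genomes


if __name__ == "__main__":
    low, high = genome_cost_range(size_mb=(2, 5), cost_per_mb=(1, 60))
    print(f"Sequencing cost per genome: ${low} to ${high}")

    total = bases_needed(genome_mb=4, coverage=100, n_genomes=1600)
    print(f"Bases for 1,600 genomes at 100x coverage: {total / 1e9:.0f} Gb")
```

Under these assumptions, 1,600 genomes of 4 Mb each at 100× coverage correspond to roughly 640 Gb of raw sequence per run, which gives a sense of the instrument output implied by the estimate.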

Virtual resistance testing

One of the prime uses of WGS would undoubtedly include the virtual resistance testing of bacteria and viruses [15], both on primary patient encounters and throughout the duration of antimicrobial therapy. As output continues to rise and cost continues to drop, fungi and parasites of clinical importance might be added to the list. The clinical interpretation of potential antimicrobial resistance secondary to the identification of fully characterized resistance-associated sequences (point mutations, indels, or putative open reading frames [ORFs]) would need to be extremely conservative at first. For example, an organism would be reported as “potentially resistant” to a specific antimicrobial agent based solely on a “hit” for a verified resistance sequence. The beauty of this lies in the ability to distinguish among hundreds of different antibiotic-modifying enzymes (e.g., β-lactamases or aminoglycoside-modifying enzymes), each of which might have subtle or distinct substrate profiles that could be used to fine-tune appropriate antimicrobial therapy [16]. Obviously, this approach in the absence of any phenotypic support for the functionality of a resistance gene or mutation could lead to major errors (reported as resistant, actually susceptible) and preclude the use of potentially useful therapies, as would be the case with mecA-positive strains of S. aureus that are susceptible to oxacillin secondary to mutations in the fem gene family [17]. Conversely, and perhaps more importantly, as it would lead to the generation of very major errors (reported as susceptible, actually resistant), the absence of an identifiable gene or mutation sequence does not guarantee susceptibility. This potential failure stems from the dependence of NGS on annotation and interpretation using prior knowledge about how genetic sequences translate to resistance phenotypes.
Indeed, novel resistance mechanisms or combinatorial factors that require two or more separate genetic alterations for the expression of resistance could be very difficult to predict a priori by sequence alone without performing comparisons of an isolate’s genome “before” and “after” therapy. Such a comparative WGS approach was used to identify the resistance determinants associated with a strain of Acinetobacter baumannii that had acquired tigecycline resistance during therapy [18]. The authors determined that resistance was associated with 18 single nucleotide polymorphisms (SNPs) and three deletions between susceptible and resistant strains. Therefore, the reliability of WGS as a means of predicting antimicrobial susceptibility is critically dependent upon the availability of a current and curated database of reference sequences, e.g., http://ardb.cbcb.umd.edu/index.html and http://img.jgi.doe.gov (see also [19]). While it would be desirable to have databases such as these openly available, it is likely that such sequence databanks will require a licensing or user fee or perhaps be accompanied by promotional advertisements! Until the odds of accurately predicting antimicrobial resistance based upon the identification of a species-specific sequence are known, confirmation using conventional phenotypic antimicrobial susceptibility testing (AST) would be required. In fact, it is likely that phenotypic methods will continue to be used, at least for the foreseeable future, to screen microbial isolates for unrecognized resistance patterns, and, thus, their mechanisms of resistance, before gene-level inquiry is pursued.
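The conservative reporting logic described above can be sketched in code. The following Python example is a minimal illustration under stated assumptions: the marker table is a toy stand-in for a curated reference database (its two entries are for illustration only), and the function names and report wording are hypothetical rather than taken from any existing system.

```python
# Hypothetical sketch of conservative genotype-based resistance reporting.
# KNOWN_MARKERS is a toy stand-in for a curated database, not real content.

KNOWN_MARKERS = {
    "mecA": ["oxacillin"],          # example discussed in the text
    "blaCTX-M-15": ["cefotaxime"],  # illustrative entry only
}


def virtual_resistance_report(detected_genes, drugs):
    """Map each drug to a conservative genotype-based call."""
    report = {}
    for drug in drugs:
        hits = sorted(
            g for g in detected_genes if drug in KNOWN_MARKERS.get(g, [])
        )
        if hits:
            report[drug] = "potentially resistant (" + ", ".join(hits) + ")"
        else:
            # Absence of a known marker does not guarantee susceptibility.
            report[drug] = "no known determinant detected; confirm by phenotypic AST"
    return report


def classify_discrepancy(predicted_resistant, phenotype_resistant):
    """Compare a genotypic prediction with the phenotypic AST result."""
    if predicted_resistant and not phenotype_resistant:
        return "major error"       # reported resistant, actually susceptible
    if not predicted_resistant and phenotype_resistant:
        return "very major error"  # reported susceptible, actually resistant
    return "agreement"


if __name__ == "__main__":
    print(virtual_resistance_report({"mecA"}, ["oxacillin", "cefotaxime"]))
    print(classify_discrepancy(predicted_resistant=True, phenotype_resistant=False))
```

Note that the sketch never reports an isolate as susceptible from sequence alone; a missing marker only triggers a request for phenotypic confirmation, which mirrors the caution urged in the text.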

Alternatively, the need for phenotypic verification of the genotype might be partly circumvented by transcriptome analysis [15], thus providing direct evidence of functional resistance rather than using gene identification as a proxy. An example of this approach was provided by Feng et al. [20] for linezolid resistance in Streptococcus pneumoniae, a phenotype which required the overexpression of proteins and enzymes involved in sugar metabolism rather than a defined resistance mutation or genetic locus. For second-generation sequencing systems, this would require the conversion of mRNA to cDNA with amplification prior to sequencing, while third-generation systems hold the promise of direct sequencing of isolated mRNA [9–13]. A novel protocol for the isolation and enrichment of bacterial mRNA from total cellular RNA, including the eukaryotic ribosomal fraction, for this very purpose has been described [21]. Transcriptome analysis would also require exposure of an organism to subinhibitory concentrations of an antimicrobial agent and the subsequent sequencing of mRNA transcripts to identify upregulated resistance determinants and/or SOS genes to predict or determine drug susceptibility. One possible pitfall of using SOS transcripts as a surrogate for drug susceptibility would be the presence of small heteroresistant subpopulations that would go undetected by this strategy. However, heteroresistant populations could be selected upon prolonged antibiotic exposure with conventional AST.

Taxonomy and epidemiology

The routine use of WGS would likely open a whole new door in terms of taxonomic classification and identification of novel bacterial species and subspecies. In addition to resistance prediction, the expanding collection of completely sequenced genomes could provide for the epidemiological typing of microorganisms at the ultimate level of resolution—essentially generating whole-genome SNP analysis [15, 22]. The use of WGS for epidemiological purposes has been demonstrated with clinical isolates of multidrug-resistant strains of A. baumannii that could not be differentiated using standard epidemiologic tools, such as pulsed-field gel electrophoresis and variable number tandem repeat analyses [18]. Further, WGS would also permit previously characterized virulence genes to be detected and identified simultaneously. The expanded use of WGS for microorganisms has the potential to produce the equivalent of a HapMap for microbes (http://hapmap.ncbi.nlm.nih.gov/). When combined with data generated by the human HapMap, the risk for serious sequelae when individuals of a particular haplotype become infected by specific strains of an organism carrying characterized virulence factors could begin to be assessed. This is akin to the scenario described for Helicobacter pylori virulence genes cagA, vacA, and babA2 versus human cytokine polymorphisms when evaluating the risk for the development of precancerous gastric lesions [23]. Another promising application of NGS would be in urinary tract infection (UTI), where genetically well-determined bacterial virulence factors define host recognition and response during infection [24], the probability of recurrent disease [25], and progression of uncomplicated UTI to bacteremia [26]. Simultaneous genotyping of host and microbe in this case could potentially define treatment tailored to the patient’s potential for severe and recurrent disease versus the likelihood of experiencing a single episode of uncomplicated cystitis.
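To make the idea of whole-genome SNP typing concrete, the following Python sketch counts pairwise SNP differences between isolates. It is a simplified illustration that assumes the genomes have already been aligned to a common reference and are of equal length; the function names and the toy sequences are ours, and a production pipeline would additionally filter on read quality and recombination.

```python
from itertools import combinations

# Positions carrying an ambiguous base or an alignment gap are skipped.
AMBIGUOUS = {"N", "-"}


def snp_distance(seq_a, seq_b):
    """Count positions at which two aligned sequences differ."""
    if len(seq_a) != len(seq_b):
        raise ValueError("sequences must be aligned to the same length")
    return sum(
        1
        for a, b in zip(seq_a.upper(), seq_b.upper())
        if a != b and a not in AMBIGUOUS and b not in AMBIGUOUS
    )


def pairwise_snp_matrix(aligned):
    """Return {(isolate_x, isolate_y): SNP count} for every pair of isolates."""
    return {
        (x, y): snp_distance(aligned[x], aligned[y])
        for x, y in combinations(sorted(aligned), 2)
    }


if __name__ == "__main__":
    # Toy aligned fragments standing in for whole genomes.
    isolates = {
        "A": "ACGTACGT",
        "B": "ACGTACGA",
        "C": "ACGNACCA",
    }
    for pair, distance in pairwise_snp_matrix(isolates).items():
        print(pair, distance)
```

Isolates separated by few SNPs are candidates for belonging to the same transmission chain, whereas the number of differences that should be considered "few" depends on the organism and must be established empirically.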

Obviously, much of this technology would have broad application in the public health domain, providing that funding is maintained at a level equivalent to that in the private sector.

The availability of WGS, obtained using NGS technologies, as a means of supplanting all existing forms of molecular epidemiology typing methods and, at the same time, providing real-time, longitudinal analysis of outbreaks in progress on any scale has been referred to as “Public Health 2.0” by Pallen and Loman [27]. Prominent examples of such “Public Health 2.0” endeavors include the elucidation of the origin of the Vibrio cholerae isolate that devastated Haiti after the 2010 earthquake [28], where it was revealed that the epidemic isolate was closely related to isolates from Asia rather than circulating South American isolates and likely introduced by human activity. The sequence of the E. coli O104:H4 isolate, the etiologic agent of the 2011 foodborne outbreak in Germany, which saw the release of the organism’s genome before a contaminated food source was appreciated, provides another example [29]. This analysis rapidly revealed, via a reductionist multilocus sequence typing approach that examined only seven common housekeeping genes, that the outbreak strain was a hybrid of enteroaggregative and enterohemorrhagic E. coli that had acquired AAF/I fimbriae and the CTX-M-15 β-lactamase relative to a non-outbreak strain from 2001. A more in-depth analysis resulted from “crowd sourcing” the analysis of the genome to the public, dubbed the “E. coli O104:H4 Genome Analysis Crowd-Sourcing Consortium”, which has now publicly annotated and commented on the genomes of ten outbreak strains [30, 31].

But, as Pallen and Loman wisely point out, universal use of WGS in the public health arena will require a level of standardization and consistency that has not yet been attained, as well as better control of a host of other variables, such as the recognition of sequence differences among heterogeneous sub-populations within outbreak-associated organisms that could generate confounding epidemiological findings (e.g., isoniazid or rifampin heteroresistance among isolates of Mycobacterium tuberculosis) [27]. And while it is clear that WGS will not be a panacea for all of the challenges facing public health microbiologists, NGS may well find an initial home in public health microbiology laboratories, which have more centralized resources and a keener interest in epidemiology, where it will serve as another weapon in the arsenal that may advance the productivity of public health diagnostic laboratories beyond their current means.

Of course, risk evaluation is not limited to bacterial infection. Barzon et al. [32] have developed a method for the direct identification of human papillomavirus (HPV) genotypes in cervical samples using NGS for both single- and multiple-genotype infections. NGS provided a sensitivity of up to 100 genome equivalents/μL of cytology sample for high-risk strains HPV16 and 18, and had a sensitivity of 100 % for the detection of HPV DNA in cervical samples compared to conventional PCR using consensus primers. Further, multiple infections could be identified when present at 1 % of the total genome equivalents. Compared to Sanger sequencing, 454 NGS was 100 % sensitive and specific for the detection of HPV in clinical samples and identified more samples with mixed HPV infection. By contrast, the agreement between NGS and the INNO-LiPA HPV Genotyping Extra kit (Innogenetics) was genotype-dependent [32].

Microbiome analysis

One of the most powerful applications of NGS has been its role in understanding the complex biodiversity of the human microbiome: the plethora of microorganisms colonizing every “nook and cranny” of the human body. While we are only beginning to appreciate what constitutes the normal or abnormal microbial constituents of the gut [33–38], lower and upper respiratory tract [39, 40], vagina [41–44], skin [45–48], urinary tract [49–51], gingiva [52, 53], and wounds [54, 55], it is not difficult to envision that human microbiome analysis as a means of evaluating and stratifying different disease processes could become part of mainstream clinical management. Indeed, gut microbiome analysis already shows promise in classifying disease processes such as inflammatory bowel disease into more specific entities, such as Crohn’s disease or idiopathic bowel syndrome [56–59], and as more detailed information is filtered from so-called “altered microbiome” studies, it should be possible to winnow disease-associated changes down to one or two highly predictive species. Indeed, the “alpha-bug” theory postulates an association between the development of colon cancer in mice, and potentially in humans, and gut over-colonization with and exposure to enterotoxigenic Bacteroides fragilis [60]. A further promise of human microbiome studies is their ability to register the microbial diversity of the human microbiome during and in response to clinical therapy, so that the resultant information can be used both to predict prognosis and to alter unfavorable clinical outcomes. In a seminal study, Ubeda et al. assessed the potential of cataloging the human gut microbiome by NGS in the clinical management of allogeneic hematopoietic stem cell transplant (allo-HSCT) patients.
The authors observed that post-antimicrobial overpopulation of the intestinal microbiome with vancomycin-resistant enterococci (VRE) rendered this patient population at higher risk of developing bacterial sepsis compared to allo-HSCT patients whose intestinal microbiome was not enriched with VRE. Thus, at least in this highly fragile patient population, the definition of the composition of the intestinal microbiome affords some clinical utility, especially as these patients might serve as candidates for the administration of probiotic microbes in an attempt to prevent bacterial sepsis [61].
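As a rough illustration of how such microbiome data might be summarized for clinical use, the Python sketch below converts per-taxon read counts into relative abundances, computes the Shannon diversity index, and flags taxa that dominate a sample. It is a sketch under stated assumptions: the function names, the example counts, and the 30 % domination threshold are hypothetical choices of ours and are not taken from the study discussed above.

```python
import math


def relative_abundance(read_counts):
    """Convert {taxon: read count} into {taxon: proportion of all reads}."""
    total = sum(read_counts.values())
    if total == 0:
        raise ValueError("no reads assigned to any taxon")
    return {taxon: n / total for taxon, n in read_counts.items()}


def shannon_index(read_counts):
    """Shannon diversity H = -sum(p * ln p) over taxa with non-zero reads."""
    proportions = relative_abundance(read_counts).values()
    return -sum(p * math.log(p) for p in proportions if p > 0)


def dominant_taxa(read_counts, threshold=0.30):
    """List taxa at or above the threshold proportion.

    The default threshold is an arbitrary illustrative cut-off,
    not a clinically validated value.
    """
    abundances = relative_abundance(read_counts)
    return [taxon for taxon, p in abundances.items() if p >= threshold]


if __name__ == "__main__":
    # Toy genus-level counts for a single post-antibiotic stool sample.
    sample = {"Enterococcus": 80, "Bacteroides": 10, "Clostridium": 10}
    print(relative_abundance(sample))
    print(f"Shannon index: {shannon_index(sample):.3f}")
    print("Dominant taxa:", dominant_taxa(sample))
```

A falling diversity index together with the emergence of a single dominant genus is the kind of signal that could prompt closer monitoring, although taxonomic profiling of this sort does not by itself establish vancomycin resistance, which would still require a targeted or phenotypic test.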

Challenges

Every new technology and its implementation into a specific environment is accompanied by a plethora of challenges, and this will certainly be true for the inclusion of NGS technologies in the clinical microbiology laboratory. A notable challenge is the development of highly reproducible NGS technology/software platforms that can provide clinically actionable reports to physicians at a pace equal to, if not vastly greater than, the culture-based methods used at present. This, coupled with the construction and maintenance of readily accessible databases of reference microbial genomes up-to-date with contemporary and historical resistance determinants (e.g., discrete genes, indels, SNPs, and promoter/regulator mutations), is critical to the success of NGS technologies in the clinical microbiology laboratory. Also essential to the reproducible identification of the genetic elements and mutations listed above are the resultant sequence “read lengths”. While monolithic “read lengths” are reported by manufacturers, this is typically a rule-of-thumb truncation of longer individual reads that are of low quality at their distal ends but may, nevertheless, be “piled” in non-truncated form to increase the likelihood that a contiguous sequence is mapped correctly. In cases where only short reads are obtained, the sequencing of clinical isolates using reference genomes as templates would be complicated by difficulties calling SNPs, indels, and copy number variation [62].

An additional challenge, and one not to be overlooked, is the ability of laboratory directors [63] and personnel to collect, manage, analyze, and interpret the colossal amounts of bioinformatic data that will be generated by NGS runs. The treatment of such data may require skills that are well beyond those of the current workforce. Thus, as a profession, we as clinical microbiologists must ensure that the emerging workforce is equipped with the skills to face the challenges of NGS in the clinical microbiology laboratory, and that the proficiency of this workforce is maintained.

On the regulatory level, the clinical validation of molecular assays [64] is a relatively common activity; however, the mechanism(s) by which to validate clinical NGS-based assays is as yet undefined, particularly as concepts such as limit of detection and normal ranges are difficult to define for microbiota. Thus, as with training of the workforce, we as a profession must ensure that we play an active role in defining the mechanism(s) by which NGS-based assays may become validated and verified for clinical use. Any working assay must be validated on many levels: sample and library preparation, sequence generation and subsequent automated annotation and interpretation (which are highly dependent on the platform and analysis pipeline used [Table 1]), and distillation of the data into a reportable result that carries enough weight to influence clinical outcomes. Initially, the trading of split samples between institutions with different approaches to similar tests may serve as sufficient proficiency testing but, in the long term, national and international regulatory bodies will need to set clear standards.

Finally, classical indicators of a disordered microbiota such as “clue cells”, gingivitis, and diarrhea may be more familiar to physicians and deemed more worth paying for than reports generated by WGS analyses. This problem may be mitigated by the existence of applications in which a great deal of sequence information can be distilled into a well-established risk analysis from the microbial side (e.g., HPV [32, 65] or hepatitis C virus [66] typing) or from the perspective of the host (e.g., carrier testing for an array of diseases such as cystic fibrosis at birth [67]).

Summary

As the cost and complexity of sequencing platforms decline and access to curated databases containing validated sets of resistance and virulence markers becomes readily available, the routine use of NGS/WGS technology in the diagnostic clinical microbiology laboratory will, in our opinion, become increasingly commonplace. We also strongly believe that sequence-based diagnostics are unlikely to replace conventional microbiological methods, including growth-based assays, any time soon. There will be a continued need to validate novel resistance mechanisms versus standardized in vitro AST, characterize previously unrecognized agents by microscopic and colonial morphology and metabolic profiles, establish animal models of infection, and establish parallel, low-cost methods of identification. All of these alternatives will work in a supplementary fashion to provide a blend of complexity suited to the needs of individual laboratories. Regardless of the mix of technologies used in the future, it is likely that sequence-based diagnostics will be a major part of the blend. Clearly, it is an exciting time for clinical microbiology.