1 Introduction

Staphylococci are Gram-positive cocci that appear as grape-like clusters. The genus comprises more than 40 species, most of which are harmless and exist on the skin and mucous membranes of humans or other animals. Staphylococci are divided into coagulase-negative (CoNS) and coagulase-positive members, based on their ability to produce the free enzyme coagulase, which causes blood clot formation. While the majority of staphylococcal species are CoNS, few CoNS have been implicated in human disease. This, however, has been changing, with an increasing number of CoNS infections identified, boosting their clinical significance [1, 2]. Staphylococcus aureus (SA), the most notable member of the genus, is coagulase positive and has been the primary focus of clinical identification as it is commonly associated with human infection. Methicillin-resistant Staphylococcus aureus (MRSA), in particular, has garnered much of that attention as it is resistant to all penicillins and most other β-lactam drugs and is associated with higher morbidity and mortality rates among hospitalized patients and higher patient care costs [3,4,5].

S. aureus has been shown to asymptomatically colonize 20–30% of the human population [6, 7] but is also responsible for a wide variety of infections, ranging from mild skin and soft tissue infections to life-threatening illnesses such as endocarditis, septicemia, and hemorrhagic pneumonia [8]. MRSA infections were initially associated with hospitals and healthcare settings; however, MRSA has since emerged as a major cause of community-associated infection as well. Adding complexity is the fact that, despite the overwhelming attention given to MRSA, methicillin-sensitive S. aureus (MSSA) infections are increasingly being recognized as presenting a significant threat to public health [9, 10]. With the ever-changing prevalence and epidemiology of S. aureus infections, reliable methods for characterizing strains are essential for outbreak investigations, for tracking clonal spreading, and for the implementation of effective treatment or control measures. At the local level, typing is useful for identifying clones, which aids in disease management and in predicting prognosis. It also helps identify outbreaks and strain spreading within the geographic locale, guiding infection control strategies. At the international level, strain typing aids investigations into the evolution and spread of clonal types, both over large areas and over time. Discussed in this chapter are the various phenotypic and molecular methods used to discriminate S. aureus lineages.

2 Identification of Staphylococcal Species

Differentiation of S. aureus from CoNS is accomplished using standard microbiological methods in clinical diagnostic laboratories. Staphylococci are catalase-positive, facultative anaerobes, capable of growing in the presence of bile salts or 6.5% NaCl solution. Columbia or tryptic soy blood agar, with 5% defibrinated sheep or horse blood, is the primary culture plate used for staphylococcal isolation. On blood, S. aureus presents as large, round, golden-yellow colonies that are most often β-hemolytic. CoNS colonies, on the other hand, are typically smaller in size, non-pigmented, smooth, glistening, and opaque, although some species can be gray-yellow to yellow-orange in pigmentation and can also be β-hemolytic. Coagulase tube test or rapid latex and hemagglutination assays allow presumptive identification of S. aureus, while commercial systems can differentiate the staphylococcal species using biochemical procedures. Systems such as Vitek 2 (bioMérieux), the BBL Crystal Identification System’s Rapid Gram-Positive ID Kit (BD Diagnostic Systems, Sparks, MD), the Pos ID Panel family (Siemens Healthcare Diagnostics, Deerfield, IL), the Phoenix Automated Microbiology System (BD Diagnostic Systems), the Biolog systems (Biolog, Hayward, CA), the RapiDEC Staph (bioMérieux), and the API Staph and ID32 Staph strips (bioMérieux, La Balme-les-Grottes, France) are routinely used in clinical laboratories. Antibiotic susceptibility patterns for the staphylococcal species can be obtained on systems such as Vitek 2 (bioMérieux).

While biochemical identification of S. aureus is relatively straightforward, CoNS have proven to be more problematic. Common species such as S. epidermidis, S. saprophyticus, and S. haemolyticus are generally successfully identified by biochemical means, while identification of less common species such as S. warneri and S. hominis shows more variable rates [11,12,13]. Nucleic acid amplification and sequencing of universally occurring genomic regions offer an effective alternative for speciating staphylococci and can be accomplished quickly with minimal cost. Sequencing of a portion of the rpoB gene has been shown to accurately differentiate staphylococcal species [14]; however, sequencing of the 16S rRNA gene is generally considered the gold standard for identification and taxonomic classification of bacterial species. 16S rRNA is the RNA component of the small (30S) subunit of the prokaryotic ribosome and binds to the Shine-Dalgarno sequence; its gene evolves slowly, making it useful for phylogenetic analysis. The 16S rRNA gene contains highly conserved primer binding regions, as well as nine hypervariable regions (V1–V9), each ranging from 30 to 100 bp in length [15]. Sequencing of the full 16S rRNA gene can be performed; however, more commonly shorter sequences involving the variable regions are targeted. Regions V1–V3, in particular, have been shown to be the most useful in distinguishing among staphylococcal species [16]. Various 16S ribosomal databases exist for analyzing sequencing data, including public databases such as NCBI and secondary ones such as EzBioCloud, Ribosomal Database Project, SILVA, and Greengenes [17,18,19,20]. While the public databases are easily accessible and free, the sequences and taxonomic assignments they hold are often not validated, making secondary databases that collect and validate 16S rRNA sequences superior choices.

As CoNS are not routinely typed beyond species identification and antibiotic susceptibility, the remainder of this chapter will focus on molecular characterization of S. aureus. Discrimination of isolates based on phenotypic and genotypic characteristics is important for determining clonal relationships between strains and furthering our understanding of the epidemiology of infectious diseases. Presently, classification schemes for Staphylococcus aureus are based less on phenotypic methods and more so on molecular ones. While many of these methods were initially used for research purposes, they are now commonly used in clinical labs as well.

3 MRSA Identification and SCCmec Typing

Distinguishing MRSA from MSSA is an important first step in S. aureus classification. MRSA have acquired and integrated into their chromosome a mobile genetic element known as staphylococcal cassette chromosome mec (SCCmec), which carries the methicillin resistance genes mecA or mecC. mecA was the first methicillin resistance gene identified and encodes an alternative penicillin-binding protein (PBP2a or PBP2’), which has low affinity for semisynthetic penicillins and confers resistance to all β-lactam drugs except ceftaroline and ceftobiprole [21]. mecA remained the only methicillin resistance gene identified in S. aureus until 2011, when the mecC gene was described, sharing 70% identity with mecA and encoding a PBP2a/2′ sharing 63% homology at the amino acid level [22]. A third homologue, mecB, was first identified in 2009 in the closely related bacterium Macrococcus caseolyticus [23]; however, in 2018, it was detected for the first time in S. aureus on a plasmid [24]. The mecB gene shares 60% homology with mecA and confers resistance to methicillin. A fourth homologue, mecD, has been reported on a genomic island (McRImecD-1 and McRImecD-2) in M. caseolyticus but to date has not been detected in S. aureus. The Clinical and Laboratory Standards Institute (CLSI) recommends testing for MRSA using broth microdilution, with cefoxitin disk diffusion or Mueller-Hinton agar plates supplemented with 4% NaCl and 6 μg/ml of oxacillin as alternatives [25]. Chromogenic agars, such as CHROMagar™ MRSA, Oxoid Brilliance™ MRSA, MRSASelect, BBL™ CHROMagar™ MRSA, and ChromID MRSA, are also available for MRSA detection, offering highly sensitive and specific detection [26]. The PBP2a latex agglutination test (Oxoid, Hampshire, UK) is also available as an alternate phenotypic test for detecting PBP2a in S. aureus colonies; however, it suffers from a large variability in performance [27, 28]. No optimal phenotypic method exists for MRSA detection, as the available tests generally require specialized conditions and their results are affected by factors such as inoculum size, incubation temperature and time, or pH and salt concentration.

Nucleic acid amplification tests represent a more precise and reliable form of MRSA identification and have become the gold standard for MRSA detection. These assays have traditionally relied on detection of the mecA gene; however, detection of the mecC gene also needs to be considered now. Additionally, while the mecB gene has only been described in one instance, its detection may become important if the gene spreads. Murakami et al. [29] were the first to develop a PCR assay for MRSA detection, targeting the mecA gene, while the first multiplex PCR assay targeting both the mecA and 16S rRNA genes was developed by Geha et al. [30]. Since then, a substantial number of assays have been developed targeting the mecA/mecC genes alone or in conjunction with other targets, such as PVL, fem, nuc, or 16S rRNA, and using both standard and real-time PCR platforms. In 2008, Zhang et al. developed a multiplex PCR assay that could discriminate staphylococci from non-staphylococcal species while simultaneously distinguishing S. aureus from CoNS, identifying MRSA, identifying the Panton-Valentine leukocidin virulence genes, and presumptively identifying USA300 and USA400 epidemic strains [31]. While this assay has been extensively used, it suffers in that it does not detect the mecB or mecC genes. In 2012, Stegger et al. developed a multiplex PCR assay capable of simultaneously detecting both the mecA and mecC genes, along with the PVL genes and the staphylococcal protein A gene (spa) [32]. The assay allows rapid and inexpensive detection of MRSA, with the ability to perform downstream spa typing of isolates, but does not take into account the mecB gene.
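The way a multiplex result is read can be sketched as a small decision function. This is only an illustration of the logic described above, not the interpretation scheme of any published assay: the target labels, the use of nuc as the S. aureus marker, and the order of the checks are assumptions made for the example, and mecB is deliberately not covered, mirroring the limitation noted in the text.

```python
def interpret_multiplex(targets):
    """Turn the set of amplified targets into a presumptive call.

    `targets` holds the names of the targets that produced a band or signal.
    The names used here ("16S_staph", "nuc", "mecA", "mecC", "PVL") are
    illustrative labels, not the primer names of a specific assay.
    """
    detected = set(targets)

    # Assumed genus-level control: no staphylococcal 16S rRNA signal, no call.
    if "16S_staph" not in detected:
        return "not Staphylococcus (or failed reaction)"

    # Assumption: nuc marks S. aureus; staphylococci without it are CoNS.
    is_s_aureus = "nuc" in detected
    # Either mec gene is taken as evidence of methicillin resistance.
    resistant = bool(detected & {"mecA", "mecC"})

    if is_s_aureus:
        call = "MRSA" if resistant else "MSSA"
    else:
        call = ("methicillin-resistant CoNS" if resistant
                else "methicillin-susceptible CoNS")

    if "PVL" in detected:
        call += ", PVL-positive"
    return call


print(interpret_multiplex({"16S_staph", "nuc", "mecA", "PVL"}))
print(interpret_multiplex({"16S_staph", "mecC"}))
```

Treating mecA and mecC as interchangeable evidence of resistance keeps the call simple, but a laboratory would normally report which gene was found.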

As mentioned, the mecA and mecC genes, which confer resistance to β-lactam antibiotics, are carried on a mobile genetic element termed staphylococcal cassette chromosome mec. To date, 13 different SCCmec elements have been described in S. aureus based on the nature of their mec and ccr gene complex and are further divided into subtypes based on differences in their joining (J) regions. These differences provide an important means of classifying MRSA isolates, as even closely related strains can differ in the type of SCCmec element they carry. Initial SCCmec typing schemes involved molecular cloning and sequencing or long-range PCR amplification with multiple sets of primers [33,34,35]. Typing schemes have since improved to include conventional PCR detection of several type-specific loci [36], RFLP analysis [37, 38], multiplex PCR [39], multiplex real-time PCR [40, 41], and targeted DNA microarrays [42]. Multiplex PCR typing is currently the most widely used method of SCCmec typing, with several variations developed. A novel multiplex PCR assay for the characterization and concomitant subtyping of SCCmec I–V was developed by Zhang et al. in 2005 and later updated in 2012 to make it more accurate and reliable [43, 44]. Similarly, in 2007, Milheirico et al. updated a previous multiplex PCR assay to detect SCCmec I–V. These multiplex assays are by far the most commonly used ones for SCCmec typing; however, both are limited to detection of types I–V, requiring other methods for the detection of types VI–XIII. Both are also restricted by their inability to classify newly evolving SCCmec types and subtypes. Unfortunately, to date, no single PCR assay is available to identify all SCCmec types and subtypes. Targeted DNA microarrays offer an alternate option for SCCmec typing, simultaneously detecting multiple genes associated with SCCmec, including mecA and its regulatory genes, and sequences in the J regions [42]. As with PCR, only known SCCmec types can be identified with this technique, and it suffers from the added disadvantage that specialized equipment and highly trained personnel are required. As such, multiplex PCR remains the best option for SCCmec typing at present.

4 Historical Typing Methods

In an attempt to understand and track S. aureus (particularly MRSA) infections, numerous typing methods were developed to classify lineages. While these historical methods are rarely used routinely anymore, they still can be of value when typing S. aureus.

Phage Typing

Phage typing relies on bacterial susceptibility to a defined set of phages, with a set of 23 internationally accepted phages used for typing human strains of S. aureus [45, 46]. Although it was the primary typing method for several years, it often lacked reproducibility, was time-consuming and technically challenging, and left a large percentage of strains untypable [47,48,49,50].

Multilocus Enzyme Electrophoresis (MLEE)

MLEE involves the extraction of constitutively expressed proteins from the bacteria and their separation on gels using electrophoresis, with the rate of migration being dependent on amino acid composition in the proteins. Generally, 12–20 proteins are assessed, each being assigned allelic types based on variation in their charge, with the similarity between isolates determined by the proportion of loci which show differences. While MLEE generally has good reproducibility and typability for S. aureus, it is a labor-intensive procedure, and the results are difficult to compare between laboratories [51, 52].

Random Amplification of Polymorphic DNA (RAPD) and Arbitrarily Primed PCR (AP-PCR)

RAPD and AP-PCR rely on parallel non-stringent amplification of random DNA fragments, resulting in unique gel patterns specific to each bacterial strain [53, 54]. In RAPD, short arbitrary primer sequences and low-temperature, non-stringent annealing conditions allow amplification of multiple PCR products of varying sizes. Amplicons are analyzed either by gel electrophoresis or DNA sequencing, with the number and size of fragments used to define an isolate type [54]. AP-PCR is a variant of RAPD, whereby amplification is done in three parts, each of which has a set stringency and reagent concentrations [53]. While these techniques have been used successfully in outbreaks and are relatively inexpensive and easy, they have lower discriminatory power and lower inter- and intra-laboratory reproducibility [55,56,57].

Repetitive Element PCR (rep-PCR)

Rep-PCR employs primers that bind to noncoding repetitive sequences in the bacterial genome, producing fingerprint patterns unique to each isolate [58]. The repetitive palindromic extragenic elements (Rep) are sequences 35–38 bp long that occur in variable positions and numbers. Amplification of the elements creates amplicons of varying lengths, which are separated by electrophoresis, creating fingerprints unique to the strains. For S. aureus, RepMP3 and inter-IS256 and Tn916 are commonly used targets, with RepMP3 showing greater reproducibility and stability [59]. Rep-PCR has high discriminatory power, with good correlation to PFGE; however, reproducibility can suffer from variations in reagents and electrophoresis systems [60].

Amplified Fragment Length Polymorphism (AFLP)

AFLP relies on differences in the amplification of digested genomic DNA fragments [61]. Genomic DNA is digested with restriction enzymes, and double-stranded adaptors are ligated to the sticky ends, followed by amplification of the fragments using primers complementary to the adaptors. The primers are generally fluorescently labelled; therefore, after separation of the amplicons based on size, they can be detected with an automated DNA sequencer and compared by computer. Analysis of the high-resolution banding patterns is used to determine the relationship between strains [62]. While this technique is portable and highly reproducible and has high discriminatory power, it is time-consuming and expensive [63, 64].

Accessory Gene Regulator (agr) Typing

agr typing is a PCR-based typing method that relies on amplification of hypervariable regions present in the agr locus to classify strains. The accessory gene regulator (agr) is a bacterial regulatory component containing two divergently transcribed units, which has highly conserved and hypervariable regions [65]. Four genes, agrA, agrC, agrD, and agrB, are present in the locus. The C-terminal of agrB and agrD and the N-terminal of agrC are highly divergent and constitute the hypervariable region of the locus, which is used to divide S. aureus into four agr groups (I–IV) [65]. PCR primers for agr group determination were developed by Peacock et al. [66], and a multiplex real-time quantitative PCR assay was developed by Francois et al., targeting the variable region of agrC and offering good specificity [67]. While agr typing is extremely limited in its discriminatory power and would not be useful for defining S. aureus lineages, it does provide additional information about strains that can supplement other typing methods.

5 Current Molecular Typing Methods

Current typing schemes for S. aureus classification rely predominantly on molecular methods based on DNA sequence variations. A proposal was made that MRSA clones should be defined based on a combination of the genomic type of the strain and the SCCmec type, a nomenclature system that was accepted in 2002 by the subcommittee of the International Union of Microbiology Societies in Tokyo [68]. This system, which can be amended to describe both MRSA and MSSA (e.g., ST8-MRSA-IVa or ST8-MSSA), relies solely on multilocus sequence typing and SCCmec typing (discussed below) to define the strains. While these two methods are important parts of S. aureus classification, adding other typing schemes, also discussed below, provides more complete information about S. aureus lineages.
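The naming convention can be made concrete with a few lines of code. This is a minimal sketch that only reproduces the two example formats given above (ST8-MRSA-IVa and ST8-MSSA); the function name and its arguments are hypothetical.

```python
def clone_name(sequence_type, methicillin_resistant, sccmec_type=None):
    """Build a clone name of the form ST<n>-MRSA-<SCCmec type> or ST<n>-MSSA."""
    if methicillin_resistant:
        # An MRSA clone name is incomplete without its SCCmec type.
        if not sccmec_type:
            raise ValueError("an SCCmec type is needed to name an MRSA clone")
        return f"ST{sequence_type}-MRSA-{sccmec_type}"
    # MSSA carry no SCCmec element, so the name stops at the resistance label.
    return f"ST{sequence_type}-MSSA"


print(clone_name(8, True, "IVa"))   # ST8-MRSA-IVa
print(clone_name(8, False))         # ST8-MSSA
```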

5.1 Pulsed-Field Gel Electrophoresis (PFGE) Typing

PFGE was first described in 1984 and is based on the digestion of bacterial genomes into large fragments with a restriction enzyme and their subsequent separation by gel electrophoresis [69, 70]. Because larger fragments of DNA will co-migrate and appear as a large diffuse band with conventional gel electrophoresis, in PFGE, the voltage direction is periodically switched (pulsed), allowing effective separation of larger DNA pieces. Migration of the DNA fragments produces a DNA fingerprint, which can be used to compare the relatedness of strains.

For PFGE, genomic DNA needs to be intact and free from mechanical shearing; therefore, bacterial cells are incorporated into agarose plugs prior to lysis to protect the DNA from damage [71]. DNA, which is immobilized in the agarose plug, is digested with a rare-cutting restriction endonuclease, after which the plugs are loaded onto an agarose gel and subjected to PFGE. PFGE protocols for Staphylococcus aureus have been optimized and, with minor variations, include standard features common to typing this species [72,73,74] (https://www.cdc.gov/mrsa/pdf/ar_mras_PFGE_s_aureus.pdf). A number of restriction endonucleases have been used in PFGE typing of bacterial species; however, SmaI was found to be the most useful for S. aureus, allowing nearly all isolates to be typed, with reproducible results following repeated subcultures [75,76,77]. S. aureus belonging to the ST398 lineage are the exception, not typable using SmaI due to a DNA methyltransferase that modifies the consensus sequence [78]. The restriction enzyme Cfr9I, a neoschizomer of SmaI, is able to cleave these strains within the same recognition sequence as SmaI and is used for PFGE typing of the ST398 lineage. S. aureus gels are generally run with the contour-clamped homogeneous electric field (CHEF) electrophoresis system, where the current is applied in three directions, offset by 120°, using hexagonally arranged electrodes [52, 79].
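The digestion step can be illustrated in silico. The sketch below cuts a sequence at every SmaI recognition site (CCCGGG, cut as a blunt end between CCC and GGG) and reports the fragment lengths that would make up the banding pattern. It is a simplification under stated assumptions: the toy sequence is invented, the molecule is treated as linear although the S. aureus chromosome is circular, and methylation (the reason ST398 resists SmaI) is ignored.

```python
SMAI_SITE = "CCCGGG"  # SmaI recognition sequence
CUT_OFFSET = 3        # blunt cut between CCC and GGG


def digest_linear(sequence, site=SMAI_SITE, cut_offset=CUT_OFFSET):
    """Return the fragment lengths produced by cutting a linear sequence."""
    seq = sequence.upper()
    cuts = []
    pos = seq.find(site)
    while pos != -1:
        cuts.append(pos + cut_offset)
        pos = seq.find(site, pos + 1)
    # Fragment boundaries are the two ends of the molecule plus every cut.
    boundaries = [0] + cuts + [len(seq)]
    return [end - start for start, end in zip(boundaries, boundaries[1:])]


# Invented toy sequence with two SmaI sites.
toy = "A" * 10 + "CCCGGG" + "T" * 20 + "CCCGGG" + "A" * 5
# Same sequence with a point mutation in the second site (CCCGGG -> CCCGAG).
mutant = "A" * 10 + "CCCGGG" + "T" * 20 + "CCCGAG" + "A" * 5

print(digest_linear(toy))     # [13, 26, 8]
print(digest_linear(mutant))  # [13, 34]
```

A single base change in one recognition site merges two fragments into one, which is why small sequence changes can shift a PFGE fingerprint.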

PFGE is a popular technique used by laboratories around the world and is effective for providing local epidemiological information, as well as for identifying epidemics. In experienced hands, the method can provide information related to the presence or absence of some mobile genetic elements such as the SCCmec cassette or phages. The technique has high discriminatory power, and results can be reproducible at both the intra- and inter-laboratory levels when the method is highly standardized [48, 80]. To aid with standardization, the Centers for Disease Control and Prevention in the USA developed PulseNet (https://www.cdc.gov/pulsenet/index.html). It is a national laboratory network that uses bacterial DNA fingerprints (such as PFGE patterns) to detect foodborne illnesses and outbreaks. Standard protocols are available, and data can be shared nationally or internationally. Also helping with standardization is the fact that S. aureus gels are run with the Salmonella serotype Braenderup H9812 control standard and the data normalized and analyzed using BioNumerics software (Applied Maths, Sint-Martens-Latem, Belgium). Data analysis criteria set out by Tenover et al. are useful for comparing strains and determining their relatedness [81], and S. aureus PFGE profiles have been assembled into a national database to assist interpretation [72, 82]. In Fig. 9.1, sample PFGE patterns for Canadian and US epidemic reference strains are shown, along with some other common typing information for each strain. PFGE does suffer from limitations, the main ones being the long turnaround time, the high cost for specialized equipment and software, and the skill level required. Without high standardization, data interpretation can be problematic, as differences in electrophoresis equipment and conditions can affect DNA migration, complicating isolate comparisons within and between laboratories [83, 84]. As well, the technique separates DNA based on size, not sequence, and small changes are enough to affect the fingerprint. For example, the acquisition or loss of mobile genetic elements will alter the banding pattern, as will a point mutation in the SmaI recognition sequence.

Fig. 9.1

Sample typing results for representative Canadian (CMRSA1-10) and US (USA100-800) epidemic reference strains. Different lineages may share the same type when classified using a single typing method but will become distinguishable from each other when multiple typing schemes are used together. Pulsed-field gel electrophoresis (PFGE) profiles, staphylococcal cassette chromosome mec (SCCmec) type, accessory gene regulator (agr) type, staphylococcal protein A (spa) type (including Ridom repeat pattern and Kreiswirth ID), and multilocus sequence type (MLST) (including MLST profile) are shown

Despite the limitations, PFGE remains a powerful technique for S. aureus typing and classification and is still considered the “gold standard.”
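The pattern comparison described by Tenover et al. [81] can be sketched as a count of band differences between an isolate and the outbreak pattern. The thresholds below follow the categories as they are commonly summarized (0 differences indistinguishable, up to 3 closely related, 4 to 6 possibly related, 7 or more different); the band sizes are invented, and matching bands by exact size is an assumption that real gel analysis software replaces with a position tolerance.

```python
def tenover_category(outbreak_pattern, isolate_pattern):
    """Classify an isolate by the number of bands differing from the outbreak pattern.

    Patterns are lists of fragment sizes in kb. Bands are matched by exact
    size here, which is a simplification of real gel analysis.
    """
    differences = len(set(outbreak_pattern) ^ set(isolate_pattern))
    if differences == 0:
        return differences, "indistinguishable"
    if differences <= 3:
        # Consistent with a single genetic event (e.g., one gained or lost site).
        return differences, "closely related"
    if differences <= 6:
        return differences, "possibly related"
    return differences, "different"


# Hypothetical band sizes (kb); not taken from any reference strain.
outbreak = [674, 361, 324, 262, 208, 175, 135, 117, 80, 44]
# Same pattern after gaining one restriction site inside the 361 kb fragment.
variant = [674, 200, 161, 324, 262, 208, 175, 135, 117, 80, 44]

print(tenover_category(outbreak, outbreak))  # (0, 'indistinguishable')
print(tenover_category(outbreak, variant))   # (3, 'closely related')
```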

5.2 Staphylococcal Protein A (spa) Typing

The spa gene, coding for protein A, is conserved among S. aureus and has proven to be an effective target for single-locus sequence typing of this species. The gene is approximately 2 kb in length and contains conserved Fc binding regions, a variable X region, and a conserved C-terminal region. The X region (or repeat region) consists of a polymorphic variable number of tandem repeats (VNTRs), generally 2–18 repetitive sequences of 21–30 bp (most often 24 bp) in length [85]. Each repeat is given an identifier (numerical or letter code), with the number, order, and sequence of these repeats varying between strains, forming the basis for spa typing [86, 87].

Two nomenclature systems, Ridom and Kreiswirth, are used for describing spa types and repeats, with Ridom represented by numerical repeat codes and Kreiswirth represented with alphanumeric repeat codes [86, 88]. Conversion between the two is possible with online tools. The Ridom StaphType software (available for download from www.ridom.de/staphtype/) was developed to ensure uniform assignments of spa repeats and types and is useful for MRSA surveillance. The software synchronizes with the Ridom SpaServer (www.SpaServer.ridom.de), which is a freely accessible server developed to collate and harmonize data from around the world, permitting 100% reproducibility between laboratories and providing public access to typing data. Figure 9.1 shows the spa type, including the Ridom and Kreiswirth profiles, for Canadian and US epidemic reference strains.

spa typing is a reliable way of assigning lineage and has proven to be effective for both short-term and long-term epidemiological studies [80, 86,87,88,89]. The speed and simplicity of targeting a single locus make it favorable for short-term studies, while the stable association of types with lineages over time makes it suitable for long-term studies. Development of the BURP (Based Upon Repeat Pattern) algorithm has provided an automated method to infer clonal relatedness of isolates based on spa repeat patterns and was shown to have high concordance with other typing methods [89, 90]. With a high discriminatory power, spa typing is a cost-effective, easy-to-use method with excellent reproducibility and portability. The major drawback of spa typing is that it relies on a single locus, running the risk that strains can be misclassified due to recombination and/or homoplasy [91]. Strains from different lineages can carry the same spa type (Fig. 9.1), and epidemiologically related strains from a lineage may carry different spa types, differing by as little as a single repeat. spa typing is, consequently, most effective when used in combination with other typing methods.
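How two spa types can differ by a single repeat is easy to show by treating each repeat code as one symbol and counting the repeat-level edits between two successions. The sketch below uses a plain edit (Levenshtein) distance for this; it is not the BURP algorithm, which applies its own cost model, and the Ridom-style repeat successions are invented for the example.

```python
def parse_succession(succession):
    """Split a Ridom-style repeat succession such as '01-02-03' into codes."""
    return succession.split("-")


def repeat_edit_distance(repeats_a, repeats_b):
    """Edit distance between two repeat successions, one repeat per symbol."""
    previous = list(range(len(repeats_b) + 1))
    for i, rep_a in enumerate(repeats_a, start=1):
        current = [i]
        for j, rep_b in enumerate(repeats_b, start=1):
            cost = 0 if rep_a == rep_b else 1
            current.append(min(previous[j] + 1,          # repeat deleted
                               current[j - 1] + 1,       # repeat inserted
                               previous[j - 1] + cost))  # repeat kept or replaced
        previous = current
    return previous[-1]


# Hypothetical repeat successions; not real spa types.
parent = parse_succession("01-02-03-04-05")
one_repeat_lost = parse_succession("01-02-04-05")
unrelated = parse_succession("06-07-08-09")

print(repeat_edit_distance(parent, one_repeat_lost))  # 1
print(repeat_edit_distance(parent, unrelated))        # 5
```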

5.3 Multilocus Sequence Typing (MLST)

MLST is similar in principle to MLEE, but variations are examined directly by DNA sequencing. The method relies on sequencing a 402–516 bp fragment from each of seven essential housekeeping genes, present in all S. aureus isolates. These genes are crucial to cellular function and are, therefore, stable and slow to evolve. Based on point mutations, the sequence at each locus is assigned a numerical allele designation, with the series of seven numbers (one representing each locus) defining the sequence type (ST type) of a strain. For S. aureus, the carbamate kinase (arcC), shikimate dehydrogenase (aroE), glycerol kinase (glpF), guanylate kinase (gmk), phosphate acetyltransferase (pta), triosephosphate isomerase (tpi), and acetyl coenzyme A acetyltransferase (yqiL) genes were selected, as they provided the highest number of alleles with the best resolving power for identifying lineages [92]. The genes are arranged in the abovementioned order (i.e., arcC-aroE-glpF-gmk-pta-tpi-yqiL) to define the ST type (e.g., ST8 has an MLST profile of 3-3-1-1-4-4-3).
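The step from allele numbers to a sequence type is a table lookup on the ordered seven-number profile, which can be sketched as follows. The lookup table here holds only the ST8 example from the text; in practice it would be loaded from a curated profile table, and the function names are hypothetical.

```python
# Locus order used to write an S. aureus MLST profile.
LOCI = ("arcC", "aroE", "glpF", "gmk", "pta", "tpi", "yqiL")

# Minimal illustrative table: profile -> sequence type.
# Only ST8 (3-3-1-1-4-4-3), the example given in the text, is included.
PROFILES = {(3, 3, 1, 1, 4, 4, 3): 8}


def format_profile(alleles):
    """Write allele numbers in the fixed locus order, e.g. '3-3-1-1-4-4-3'."""
    return "-".join(str(alleles[locus]) for locus in LOCI)


def sequence_type(alleles, profiles=PROFILES):
    """Return the ST for a dict of locus -> allele number, or None if unknown."""
    profile = tuple(alleles[locus] for locus in LOCI)
    return profiles.get(profile)


isolate = {"arcC": 3, "aroE": 3, "glpF": 1, "gmk": 1, "pta": 4, "tpi": 4, "yqiL": 3}
print(format_profile(isolate), "-> ST", sequence_type(isolate))

# Changing a single allele gives a profile that is absent from this table.
changed = dict(isolate, arcC=99)
print(format_profile(changed), "-> ST", sequence_type(changed))
```

Because the whole profile is the lookup key, one wrong allele call is enough to change or lose the ST assignment.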

Sequence analysis was initially facilitated by the online server available at MLST.net, a free website which provided the main hub for assigning allele and sequence types, naming new ones, as well as storing other important information related to the clonal types [93]. Now available for analysis is the database at PubMLST (https://pubmlst.org/saureus/), which contains both sequence definition and epidemiological information [94, 95]. To aid with visualizing and analyzing the evolutionary relationship between isolates, the eBURST (Based Upon Related Sequence Type) algorithm was developed [96, 97]. Strains sharing identical allelic profiles are considered as belonging to the same ST type and lineage, while strains differing by one or two loci (single-locus variants or double-locus variants) are considered to be genetically related, belonging to the same clonal complex (CC). The founding genotype for a clonal complex is the one that differs from the highest number of other genotypes by only one locus, assuming strains emerge as dominant clones and then diversify with time. A representative eBURST image showing the relatedness of MLST types from Canadian and US epidemic strains in the global Staphylococcus aureus population is shown in Fig. 9.2.

Fig. 9.2

Demonstration of eBURST analysis showing the relatedness of MLST types identified in the Canadian and US epidemic strains CMRSA1-10 and USA100-800 in the global Staphylococcus aureus population. Clonal complexes are marked in black font for the strains of interest, while ST types are marked in red (Generated on December 1, 2018)
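The core comparisons behind an eBURST-style analysis can be sketched in a few lines: count the loci at which two allelic profiles differ, label the pair as a single-locus or double-locus variant, and take as the predicted founder the ST with the most single-locus variants. This is a simplified illustration, not the eBURST implementation; the five profiles and their names are invented, and ties between candidate founders are not resolved.

```python
from itertools import combinations


def loci_differing(profile_a, profile_b):
    """Number of loci at which two seven-number allelic profiles differ."""
    return sum(a != b for a, b in zip(profile_a, profile_b))


def classify_pair(profile_a, profile_b):
    """Label the relationship between two profiles."""
    labels = {0: "same ST", 1: "single-locus variant", 2: "double-locus variant"}
    return labels.get(loci_differing(profile_a, profile_b), "more distant")


def predicted_founder(profiles):
    """Return the ST with the most single-locus variants, plus all SLV counts."""
    slv_counts = {st: 0 for st in profiles}
    for (st_a, prof_a), (st_b, prof_b) in combinations(profiles.items(), 2):
        if loci_differing(prof_a, prof_b) == 1:
            slv_counts[st_a] += 1
            slv_counts[st_b] += 1
    return max(slv_counts, key=slv_counts.get), slv_counts


# Hypothetical profiles (arcC-aroE-glpF-gmk-pta-tpi-yqiL); not real STs.
profiles = {
    "ST-A": (10, 20, 30, 40, 50, 60, 70),
    "ST-B": (11, 20, 30, 40, 50, 60, 70),
    "ST-C": (10, 21, 30, 40, 50, 60, 70),
    "ST-D": (10, 20, 30, 40, 50, 60, 71),
    "ST-E": (11, 20, 30, 40, 50, 61, 70),
}

print(classify_pair(profiles["ST-A"], profiles["ST-B"]))
print(classify_pair(profiles["ST-B"], profiles["ST-C"]))
print(predicted_founder(profiles))
```

In this toy set, ST-A has three single-locus variants and is picked as the founder, matching the assumption that a dominant clone diversifies outward over time.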

MLST is a useful tool for assigning lineage and has proven to be effective for studying the origin and evolution of S. aureus. The method is unambiguous and portable, making data transfer to, and comparison between, labs around the world simple. The technique is, however, intolerant to sequencing errors, as a single nucleotide change can lead to an incorrect ST assignment. Cost is another drawback to the method, as it needs high-quality sequences for 7 loci, which takes 14 sequencing reads (a forward and a reverse read for each locus). This makes it less appealing as a tool for studying outbreaks or for use in smaller facilities with limited sequencing capability. Caution also has to be taken when relating MLST types to epidemiology, as strains with significantly different epidemiological significance can share a common MLST type. For example, the major epidemic strain in North America, USA300, belongs to MLST type ST8, a type also found in the infrequently encountered Canadian lineages, CMRSA9 and CMRSA5 (USA500) (Fig. 9.1). Despite the drawbacks, MLST is highly reproducible with high discriminatory power and, in conjunction with SCCmec type, remains the gold standard for publishing S. aureus epidemiological data.

5.4 Microarray

DNA microarrays use DNA probes attached in a known order to a solid surface to type bacterial isolates [98]. The probes can be oligonucleotides or gene segments (PCR amplicons) and can occur in low (hundreds) or high (hundreds of thousands) density. Bacterial DNA is labelled and allowed to hybridize to the microarray, such that complementary sequences present in the strain will bind to the probe. The microarray is scanned, and labelled spots are detected and then compared to known strains.

Microarrays are an effective means of typing and, indirectly, assigning lineage for S. aureus, simultaneously targeting a large number of strain-specific markers such as genes for antimicrobial resistance, exotoxins, surface components, regulators, and hsdS variants [42, 99, 100]. They are also well suited to the detection of complex patterns of virulence genes, mobile genetic elements, and extrachromosomal elements [101, 102] and have been used to understand the molecular mechanisms of pathogenesis, studying regulons such as Agr, Sar, SigB, and Mgr [103,104,105]. As such, microarrays permit strains to simultaneously be assigned to a lineage while having their resistance and virulence capabilities investigated at the same time.

Numerous microarrays have been designed specifically for S. aureus typing, and several companies make it possible to design custom arrays to meet specific needs [106,107,108,109,110]. The Alere StaphType DNA microarray is a commercially available system that covers 334 targets, including 170 genes and their allelic variants [42, 111, 112]. Included are species markers, capsule and agr typing markers, toxin and microbial surface components recognizing adhesive matrix molecule (MSCRAMM) genes, resistance gene markers, and SCCmec markers. On a larger scale, the Sam-62 microarray was developed based on 62 S. aureus whole genome sequencing projects and 153 plasmid sequences. The array targets all open reading frames in the sequences and includes over 29,000 probes, representing 6520 genes and 579 gene variants [113]. Sam-62 has shown potential to identify MRSA, to distinguish between extremely similar but non-identical sequences, and to identify MRSA transmission events unrecognized using other methods [101].

While DNA microarrays are highly accurate, they require specialized equipment and software, which adds a significant cost to their use. Microarrays are also unable to assign an MLST group directly; strains can be assigned to a given clonal complex only once the hybridization pattern of a reference strain with known MLST/spa types has been defined.
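
The indirect lineage assignment described above can be illustrated with a minimal sketch. It assumes that each hybridization result has been reduced to a binary profile (1 = signal detected at a probe, 0 = no signal) and that an isolate is assigned to the reference pattern with the fewest differing probes. The function names, the clonal complex labels, and the six-probe patterns are hypothetical and chosen for illustration only; commercial array software may score patterns quite differently.

```python
from typing import Dict, Tuple

Profile = Tuple[int, ...]


def hamming(a: Profile, b: Profile) -> int:
    """Count probe positions where two binary hybridization profiles differ."""
    if len(a) != len(b):
        raise ValueError("profiles must cover the same probes")
    return sum(x != y for x, y in zip(a, b))


def assign_clonal_complex(profile: Profile,
                          references: Dict[str, Profile]) -> Tuple[str, int]:
    """Return the label of the closest reference pattern and its distance."""
    label = min(references, key=lambda name: hamming(profile, references[name]))
    return label, hamming(profile, references[label])


# Hypothetical reference patterns (not real array data).
REFERENCES = {
    "CC_A": (1, 1, 0, 0, 1, 0),
    "CC_B": (0, 1, 1, 0, 0, 1),
}

isolate = (1, 1, 0, 1, 1, 0)
print(assign_clonal_complex(isolate, REFERENCES))  # ('CC_A', 1)
```

The sketch makes the limitation concrete: an isolate can only be labelled with a clonal complex whose reference pattern is already in the table.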

6 Whole Genome Sequencing (WGS) and the Future of MRSA Typing

WGS is a powerful tool for S. aureus typing, as well as for epidemiological and evolutionary studies, and next-generation sequencing (NGS) has provided a cost-effective means of extracting large amounts of information and identifying genome-wide variations. Today, the most commonly used NGS platform is Illumina (Illumina, Inc., San Diego, CA, USA), which can generate reads up to 300 bp in length. Assembly of a genome can be accomplished via de novo assembly, whereby reads are matched based on overlapping regions, or with reference-guided assembly, where reads are assembled against an existing reference genome. De novo assembly in S. aureus is challenging, however, because of the short read lengths and the presence of dispersed or tandemly arrayed repeats in the genome. As such, the resulting genome is not continuous, but rather contains numerous contigs with gaps between assembled regions, due in part to the inability to resolve contig order surrounding these repeat elements. Reference-guided assembly can also be challenging because genomic regions, such as mobile genetic elements (MGEs), that are not present in the reference will be assembled poorly, particularly if they contain repeat elements, such as in SCCmec. Illumina data is still useful for querying genomic traits and variations, as well as for phylogenetic analysis, but for a complete genome assembly, sequencing platforms that generate longer reads are necessary.
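
The idea of matching reads on overlapping regions can be shown with a toy greedy overlap merger. This is a sketch for illustration only, assuming error-free reads from a single strand and a hypothetical minimum overlap of 3 bases; production assemblers use far more elaborate graph-based methods and handle sequencing errors, reverse complements, and coverage.

```python
from typing import List


def overlap(a: str, b: str, min_len: int = 3) -> int:
    """Length of the longest suffix of `a` that equals a prefix of `b`."""
    for k in range(min(len(a), len(b)), min_len - 1, -1):
        if a.endswith(b[:k]):
            return k
    return 0


def greedy_assemble(reads: List[str], min_len: int = 3) -> List[str]:
    """Repeatedly merge the pair of sequences with the largest overlap."""
    contigs = list(reads)
    while True:
        best_k, best_i, best_j = 0, -1, -1
        for i, a in enumerate(contigs):
            for j, b in enumerate(contigs):
                if i != j:
                    k = overlap(a, b, min_len)
                    if k > best_k:
                        best_k, best_i, best_j = k, i, j
        if best_k == 0:
            # No remaining overlaps: whatever is left stays as separate contigs.
            return contigs
        merged = contigs[best_i] + contigs[best_j][best_k:]
        contigs = [c for idx, c in enumerate(contigs)
                   if idx not in (best_i, best_j)] + [merged]


# Three overlapping toy reads collapse into one contig.
print(greedy_assemble(["ATGGCGT", "GCGTACC", "TACCTGA"]))  # ['ATGGCGTACCTGA']
# Reads with no overlap remain as separate contigs, leaving a gap between them.
print(greedy_assemble(["ATGGCGT", "TTTTAAA"]))  # ['ATGGCGT', 'TTTTAAA']
```

The second call shows in miniature why short-read assemblies are fragmented: where no unambiguous overlap exists, the contigs cannot be joined. A repeat longer than the reads has a similar effect, because several different joins then look equally good.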

Read lengths of >10 kb (and up to 60 kb) are possible with the “third-generation” PacBio sequencing platform (Pacific Biosciences, Menlo Park, CA, USA), while read lengths in the Mbp range have been achieved using nanopore sequencing technology (Oxford Nanopore, Oxford, UK). These systems suffer in that they can be more expensive and have lower read accuracy than Illumina; however, with tailored assembly methods (such as HGAP for PacBio reads), assemblies with higher accuracy are achieved. Hybrid assemblies, combining Illumina short reads and PacBio or Nanopore long reads, currently offer the most accurate and complete genomes.

A major drawback of WGS is the requirement for significant computer resources and bioinformatics support in order to extract meaningful information from the data. Software such as Lasergene exists for assembly and analysis of the genomes; however, in most cases, more complex pipelines are employed and require trained bioinformaticians. For WGS technology to become useful for routine typing of S. aureus, tools for data analysis that are simple enough for use in clinical settings are required, and a number of web-based and downloadable programs are available to help in this regard. The Center for Genomic Epidemiology (Lyngby, Denmark, available at https://cge.cbs.dtu.dk/services/), for example, has web-based analysis tools that are useful for S. aureus WGS analysis and able to extract data from raw reads and assembled or draft genomes generated using Illumina, Ion Torrent, Roche 454, SOLiD, PacBio, or Nanopore platforms. Currently available on the site are MLST, for assigning sequence type (ST); spaTyper, for determining spa type; and SCCmecFinder, for classifying SCCmec type. Also available are ResFinder, for identifying acquired antimicrobial resistance genes and/or chromosomal mutations, VirulenceFinder, and Restriction-ModificationFinder. For phylogenetic analysis, CSI Phylogeny will call single-nucleotide polymorphisms (SNPs), filter and validate them, and then infer phylogeny based on the concatenated alignment of the SNPs, generating phylogenetic trees. Also available for phylogenetic analyses are the downloadable programs RAxML (Randomized Axelerated Maximum Likelihood), for sequential and parallel maximum likelihood-based inference of large phylogenetic trees [114], and BEAST (Bayesian Evolutionary Analysis Sampling Trees), for inferring rooted, time-measured phylogenies using molecular clock models [115, 116]. Available from the University of Alberta (at http://phaster.ca/) is a web-based tool for rapid identification and annotation of prophage sequences within a bacterial genome, known as PHASTER (PHAge Search Tool – Enhanced Release). The program is able to work on raw DNA sequences as well as annotated GenBank formatted data, providing detailed tables and graphical displays of the phages, with high sensitivity and positive predictive value [117, 118].
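
The SNP-based workflow described above (call SNPs, filter them, concatenate them, and compare isolates) can be sketched in a few lines. This is a simplified illustration and not the CSI Phylogeny implementation: it assumes the genomes are already aligned to equal length, it filters only on ambiguous bases (real pipelines typically also consider criteria such as read depth and mapping quality), and it stops at a pairwise SNP distance table rather than inferring a tree. The isolate names and sequences are hypothetical.

```python
from itertools import combinations
from typing import Dict, List, Tuple


def variable_sites(alignment: Dict[str, str]) -> List[int]:
    """Return alignment columns where unambiguous bases differ between isolates."""
    seqs = list(alignment.values())
    length = len(seqs[0])
    if any(len(s) != length for s in seqs):
        raise ValueError("sequences must be aligned to the same length")
    sites = []
    for i in range(length):
        column = {s[i] for s in seqs}
        # Keep the column only if every base is A, C, G or T and it varies.
        if column <= set("ACGT") and len(column) > 1:
            sites.append(i)
    return sites


def snp_distances(alignment: Dict[str, str]) -> Dict[Tuple[str, str], int]:
    """Pairwise SNP counts computed on the concatenated variable sites."""
    sites = variable_sites(alignment)
    concat = {name: "".join(seq[i] for i in sites)
              for name, seq in alignment.items()}
    return {(a, b): sum(x != y for x, y in zip(concat[a], concat[b]))
            for a, b in combinations(sorted(concat), 2)}


# Hypothetical aligned sequences; 'N' marks an ambiguous base call.
alignment = {
    "iso1": "ACGTACGTAC",
    "iso2": "ACGTACGTAT",
    "iso3": "ACCTACNTAT",
}
print(variable_sites(alignment))  # [2, 9]
print(snp_distances(alignment))
```

A distance table of this kind is the sort of input from which distance-based trees can be built; likelihood-based programs such as RAxML instead work from the concatenated SNP alignment itself.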

WGS is the ultimate tool for the identification of diversity in an organism. In addition to extracting S. aureus typing information, WGS data can be used to track transmission events and outbreaks [119,120,121] and analyze variations between strains within a lineage by SNP analysis [122]. It has shown that related strains have well-conserved core regions but differ in their accessory genetic elements [123] and, likewise, that geographically dispersed isolates of ST239, ST225, and CC30 are stable in their genetic backgrounds, differing by SNPs and MGEs [119, 124, 125]. In the future, we may see the application of extended MLST (eMLST) to S. aureus typing, extending typing beyond the seven housekeeping genes to include a subset or all of the genes in the genome. Ribosomal MLST (rMLST) (adding the ribosomal genes), core genome MLST (cgMLST) (including all core genes present in the majority of isolates, and not subject to selection pressure), whole genome MLST (wgMLST) (also including genes subject to selective pressure), and a pan-genome approach (including the full complement of genes within the species) would provide the ultimate high-level genomic epidemiology. Available to facilitate eMLST analysis, the Bacterial Isolate Genome Sequence Database (BIGSdb) software stores and analyzes sequence data for bacterial isolates, allowing large numbers of loci to be defined and allelic profiles for each strain to be determined. BIGSdb is available within the PubMLST database at https://pubmlst.org/software/database/bigsdb/.
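
The allelic profiles used in extended MLST schemes can be compared in the same way regardless of whether seven or several thousand loci are included. The sketch below assumes a profile is a mapping from locus name to allele number, with None marking a locus that could not be called, and counts differing alleles over the loci called in both isolates. The locus names and allele numbers are hypothetical, and this is not how BIGSdb itself is implemented.

```python
from typing import Dict, Optional, Tuple

AllelicProfile = Dict[str, Optional[int]]


def allele_distance(p: AllelicProfile, q: AllelicProfile) -> Tuple[int, int]:
    """Return (number of differing alleles, number of loci compared).

    Only loci called in both profiles are compared; missing calls (None)
    are skipped rather than counted as differences.
    """
    shared = [locus for locus in p
              if q.get(locus) is not None and p[locus] is not None]
    differences = sum(p[locus] != q[locus] for locus in shared)
    return differences, len(shared)


# Hypothetical profiles over four loci of an extended scheme.
isolate_a = {"locus_1": 3, "locus_2": 7, "locus_3": 1, "locus_4": None}
isolate_b = {"locus_1": 3, "locus_2": 9, "locus_3": 1, "locus_4": 5}

print(allele_distance(isolate_a, isolate_b))  # (1, 3)
```

Reporting the number of loci compared alongside the number of differences matters in practice, because draft genomes often lack calls at some loci and a small distance over few shared loci is weak evidence of relatedness.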

As sequencing costs are reduced and genome analysis tools improve, WGS will almost certainly become the primary tool for S. aureus typing and evolutionary and epidemiological studies.

7 Conclusions

Each typing scheme for S. aureus has its own strengths and limitations, leaving no single method ideal for all situations. PFGE was once considered the gold standard for MRSA typing and remains an effective tool for characterizing outbreaks and understanding S. aureus epidemiology, particularly at the local level. With standardization, it can be expanded to the international level; however, lineage cannot be inferred directly from the PFGE pattern. spa typing is capable of assigning lineage, is useful for analyzing both outbreaks and long-term molecular evolution, and is rapidly becoming the method of choice for clinical laboratories for epidemiological studies of S. aureus. With highly portable and standardized data, it is useful for investigations at both the local and international levels but is not always accurate when assigning lineages. MLST is also an effective tool for assigning lineage and, in combination with SCCmec typing, is considered the gold standard for publishing S. aureus epidemiological data. Similar to spa typing, the data is highly standardized and portable, making it an effective tool for studies at both the local and international levels. However, the cost makes it less appealing for routine use. Microarrays can provide large amounts of strain information within a short timeframe and are well suited for both outbreak investigations and long-term epidemiological studies, particularly at the local level, but suffer in that they cannot directly assign strains to lineages. WGS is the ultimate tool for strain typing and epidemiological studies and will rapidly increase in use as sequencing costs decrease and as easy-to-use data analysis tools are developed.

Ultimately, the technique of choice will depend heavily on the goals and questions that need answering, with a combination of methods offering more detailed information and greater discrimination between isolates. For outbreak situations where speed is important, PCR-based methods may be the better choice, making spa typing an effective tool. However, for routine strain typing and epidemiological monitoring at the local level, PFGE and spa typing complement each other well, providing better strain and clone discrimination. For international comparisons, spa typing, MLST, and WGS are good for generating highly standardized and portable data, but when detailed strain characterization is desired, a combination of PFGE, agr typing, SCCmec typing, spa typing, and MLST provides a more complete picture. Finally, long-term epidemiological and evolutionary studies benefit from greater detail, making microarrays and WGS attractive options.

8 Summary

Staphylococci are Gram-positive bacteria and are commonly divided into coagulase-negative staphylococci (CoNS) and coagulase-positive members, based on their ability to produce the free enzyme coagulase. The majority of staphylococcal species are CoNS, with an increasing number of CoNS infections identified, boosting their clinical significance. Staphylococcus aureus is coagulase positive and has been the primary focus of clinical identification as it is commonly associated with human infection. Methicillin-resistant S. aureus (MRSA), in particular, has garnered much attention as it is resistant to all penicillins and most β-lactam drugs, is associated with higher morbidity and mortality rates, and is increasingly being recognized as presenting a significant threat to public health. With the ever-changing prevalence and epidemiology of staphylococcal infections, reliable methods for characterizing strains are essential for outbreak investigations, for tracking clonal spreading, and for the implementation of effective treatment or control measures. In this chapter, we discussed various phenotypic and molecular methods used to discriminate staphylococci and S. aureus lineages. We first described the methods to identify staphylococcal species and to discriminate MRSA from methicillin-susceptible S. aureus (MSSA), including how to characterize different types of staphylococcal cassette chromosome mec (SCCmec) in MRSA. We then discussed various typing methods applied to study the molecular epidemiology and evolutionary nature of S. aureus, starting with the historical methods [phage typing, multilocus enzyme electrophoresis (MLEE), random amplification of polymorphic DNA (RAPD) and arbitrarily primed PCR (AP-PCR), repetitive element PCR (rep-PCR), amplified fragment length polymorphism (AFLP), and accessory gene regulator (agr) typing] and continuing to the current commonly used molecular typing methods [pulsed-field gel electrophoresis (PFGE) typing, staphylococcal protein A (spa) typing, multilocus sequence typing (MLST), and microarray] and to the advanced genome approaches (whole genome sequencing). We also discussed the strengths and limitations of each typing scheme and their suitable applications.