Introduction

It has been consistently reported that the incidence of fungal infections is increasing [5]. This applies to a range of animals in the environment, e.g. chytridiomycosis in amphibians [18] and white-nose disease in bats [60], as well as humans. The reasons for this increase are complex and depend on the ecology of each fungal species. The main causal agents of invasive fungal disease (IFD) in humans are opportunistic pathogens, e.g. Aspergillus, Fusarium, Rhizopus, or Scedosporium, that rarely cause disease in a healthy host [21]. The driver for the observed increase in IFD has been immunosuppression of the human host either by infection with a non-fungal primary pathogen, e.g. HIV, or through medical interventions such as solid organ and stem-cell transplantations [3, 61]. Post-tuberculosis infection, chronic obstructive pulmonary disease (COPD), and long-term corticosteroid treatment are additional risk factors [54]. In the 1980s, the incidence of invasive disease caused by Candida increased due to acquired immune deficiency syndrome (AIDS) but it has been replaced by Pneumocystis and Cryptococcus infections in recent times driven by changes in patient care [31]. The majority of cases of candidiasis were previously caused by Candida albicans but a wide variety of Candida species are now involved [48], including the recently emerged Candida auris [57]. Similarly, the spectrum of fungi causing invasive mould diseases has been increasing, which has placed pressure on established diagnostic methodologies.

Although there are several effective treatments for IFD, diagnosis remains a persistent problem; early diagnosis is particularly important for the treatment of opportunistic mould infections [34, 52•]. For clinicians, the early symptoms of IFD are non-specific and empirical treatment for bacterial infections will often be administered. In patients with clearly defined risk factors for IFD, antifungal prophylaxis is administered but it is not totally effective, and the mortality associated with break-through IFD remains high, 50–70% depending on the site of infection [47, 58]. The combination of delayed diagnosis and drug-resistant fungi contributes to these high mortality rates. Since the 1980s, a number of diagnostic tests targeting fungus-derived molecules have been developed, including serum galactomannan [38], β-glucan testing [33], PCR [11••], and matrix-assisted laser desorption/ionisation-time of flight (MALDI-TOF) mass spectrometry [51]. These methods work by detecting fungal targets in blood, serum, or bronchoalveolar lavage (BAL) fluid. However, none of these has completely replaced culture and histology [30]. All methods can contribute valuable information for the clinician and have been gathered into the guidelines for the definition of IFD issued by the European Organization for Research and Treatment of Cancer/Invasive Fungal Infections Cooperative Group (EORTC) and the National Institute of Allergy and Infectious Diseases Mycoses Study Group (MSG), referred to as the EORTC/MSG guidelines; based on patient risk factors, traditional methods, and mycological factors, patients can be defined as having proven, probable, or possible IFD [13•].

The promise of molecular biology in IFD diagnosis has been boosted by advances in nucleic acid sequencing technology. The capacity to genotype, or even to sequence whole genomes and metagenomes, in a short timeframe and at an increasingly low cost, and to identify novel or rare pathogens, has created new possibilities in the development of diagnostic strategies for IFD. This review will examine the applications and potential limitations of sequencing in the diagnosis of IFD.

Genome Sequencing in Medical Mycology

The topic of using whole genome sequencing (WGS) in medical mycology has previously been reviewed, so this review will focus on the applications of whole genome analyses to Candida, Cryptococcus, and Aspergillus [12••]. The first fungi to be sequenced were the yeasts Saccharomyces cerevisiae and then Schizosaccharomyces pombe [66], with Neurospora crassa being the first mould to be sequenced [19]. These were all well-established model organisms used in genomic research for over 70 years, with a robust research community and molecular toolkit. The N. crassa genome had almost double the number of genes annotated compared to any yeast previously sequenced: 41% of its genome lacked homologs to known proteins, indicating yeasts are a poor proxy for all fungi [19]. The existence of reference genomes is an important consideration for the use of WGS in diagnostics. The process of fungal WGS follows a similar workflow irrespective of genus. First, genomicists construct one or more high-quality, chromosome-level reference genomes for widely available strains. Reference genomes can then be used to guide genome assembly and identify variants, allowing for even more cost-effective WGS studies focused on specific biological questions (resequencing of populations or strains). Initially, these reference sequencing projects were time consuming and expensive due to both high consumable and associated labour costs. However, they have become accessible, even for smaller laboratories, since the introduction of the current generation of sequencing technology, frequently referred to as next-generation sequencing (NGS). Genomes of human pathogenic species from the genera Cryptococcus, Aspergillus, Candida, Pneumocystis, Histoplasma, Coccidioides, Mucor, Blastomyces, and Scedosporium have been sequenced, assembled, and published in the NCBI genome database [49], the primary repository for such information. The first C. albicans genome (strain SC5314), published in 2004, was a significant milestone in genomics as it required the development of new computational methods to overcome issues associated with heterozygous diploid species [32]. In 2005, the Aspergillus fumigatus AF293 genome sequence was first published [45]. This was compared with the genomes of Aspergillus nidulans and Aspergillus oryzae, revealing low intra-genus amino acid identity and a genomic capacity for heterothallic sexual reproduction in A. fumigatus [20]. The relatively low similarity within the genus Aspergillus has proven to be an issue in PCR-based diagnosis of invasive aspergillosis (IA) and was examined in previous research [43]. The first genomes of Cryptococcus neoformans became publicly available in 2004–2005 [40]. These early WGS studies, together with complementary DNA (cDNA) sequencing, which surveys the actively expressed genes (the transcriptome) and can help identify the genes in the genome, identified 30 new genes putatively involved in synthesis of the polysaccharide capsule indispensable for C. neoformans virulence [40]. These early genomes provided a foundational resource for the application of resequencing in medical mycology.

Monitoring Intraspecific Variation in Pathogenic Fungi

Using WGS for resequencing has proven to be valuable in the areas of fungal microevolution, resistance and virulence monitoring, and outbreak analysis (discussed in detail below). The primary studies in these areas required whole genome sequencing of target organisms and large-scale sequencing of specific gene targets to complement the initial WGS analyses. It is important to note that the quality of the determined variants depends not only on the quality of the reference genome assembly but also on how representative it is of the resequenced strains. While short-read NGS technologies can provide accurate identification of single nucleotide variants (SNVs), recent single molecule sequencing technologies, such as Oxford Nanopore, allow for improved detection of large structural variants, including copy number variants (CNVs) and pathogenicity islands [53•].
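
To make the idea of reference-guided variant identification concrete, the following Python sketch reports single nucleotide differences between a reference and a resequenced strain. It is a toy illustration only: it assumes the two sequences are already aligned and of equal length, the sequences are invented, and the function name `call_snvs` is hypothetical. Real resequencing studies use dedicated read aligners and variant callers that also model base quality and coverage.

```python
def call_snvs(reference: str, sample: str) -> list:
    """Return (1-based position, reference base, sample base) for each mismatch.

    Assumes both sequences are already aligned and of equal length. Gaps ('-')
    and ambiguous bases ('N') are skipped rather than reported as variants.
    """
    if len(reference) != len(sample):
        raise ValueError("sequences must be aligned to the same length")
    uninformative = {"-", "N"}
    variants = []
    pairs = zip(reference.upper(), sample.upper())
    for position, (ref_base, alt_base) in enumerate(pairs, start=1):
        if ref_base in uninformative or alt_base in uninformative:
            continue
        if ref_base != alt_base:
            variants.append((position, ref_base, alt_base))
    return variants


# Invented toy sequences, not real fungal data.
toy_reference = "ATGGCCTTAGCA"
toy_sample = "ATGGCTTTAGNA"
snvs = call_snvs(toy_reference, toy_sample)
```

The sketch also shows why the choice of reference matters: every reported variant is defined relative to the reference, so a poorly representative reference inflates the variant list.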

Microevolution in the Host

Whole genome sequencing of clinical C. albicans isolates, which had been sub-cultured both in vitro and in a murine model, has been used to characterise novel mutations that arise. One study [15] found microevolution to be driven primarily by amino-acid changing SNPs and short-tract loss-of-heterozygosity (LOH) events that lead to recombination-induced mutagenesis. WGS of C. albicans isolated from oral samples taken from healthy human hosts also found short-tract LOH events to be important in generating within-host variation. The high resolution of WGS revealed intra-sample heterogeneity, highlighting the importance of considering intra-host variability when comparing serial isolates [56].
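
The notion of a short-tract LOH event can be illustrated with a small Python sketch that compares genotypes at known variable sites in a parental isolate and a derived isolate. This is a simplified sketch under stated assumptions: genotypes are given as strings such as "A/G", the positions and genotypes are invented, and the function names are hypothetical. It is not the analysis pipeline used in the cited studies.

```python
def is_heterozygous(genotype: str) -> bool:
    """True when the two alleles of a diploid genotype such as 'A/G' differ."""
    allele_a, allele_b = genotype.split("/")
    return allele_a != allele_b


def find_loh_tracts(parent: dict, derived: dict) -> list:
    """Return (start, end) positions of runs of consecutive sites that are
    heterozygous in the parent but homozygous in the derived isolate.

    Sites that are homozygous in the parent, or missing from the derived
    isolate, are uninformative and neither extend nor break a run.
    """
    tracts = []
    run_start = None
    run_end = None
    for position in sorted(parent):
        if not is_heterozygous(parent[position]) or position not in derived:
            continue
        if not is_heterozygous(derived[position]):
            if run_start is None:
                run_start = position
            run_end = position
        else:
            if run_start is not None:
                tracts.append((run_start, run_end))
            run_start = None
    if run_start is not None:
        tracts.append((run_start, run_end))
    return tracts


# Invented genotypes keyed by genomic position.
toy_parent = {100: "A/G", 250: "C/T", 400: "G/G", 520: "T/C", 900: "A/T"}
toy_derived = {100: "A/G", 250: "C/C", 400: "G/G", 520: "T/T", 900: "A/T"}
loh_tracts = find_loh_tracts(toy_parent, toy_derived)
```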

Recent C. neoformans resequencing projects have characterised the adaptation of this fungus to the host environment. In these studies, isolates were serially sampled from patients over the course of infection, sequenced, and compared to identify microevolution events. Such studies have made significant findings. First, isolates recovered after relapse in cryptococcal meningitis patients are usually clonally related to the original infection [46•, 50•]. Second, aneuploidy of specific chromosomes (chromosome 12 in this case) [46•, 50•] and mutation of an AT-rich interaction domain protein may be important mechanisms of in-host adaptation [46•]. Third, non-sense mutations in DNA mismatch repair proteins can lead to a hypermutator state, accelerating the potential for microevolution [50•]. Sequencing data have also enabled identification of a genome amplification event that facilitates massive tandem gene amplification in response to environmental stimuli and drives microevolution [10]. Similar to C. neoformans, several recent A. fumigatus WGS projects have investigated in-host microevolution albeit with a greater focus on azole resistance [4•, 7, 26].
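
Chromosomal aneuploidy of the kind described above is commonly inferred from sequencing read depth. The following Python sketch shows the underlying arithmetic: the mean depth of each chromosome is divided by the genome-wide median and scaled by the expected ploidy. The depth values are invented, the function name `estimate_copy_number` is hypothetical, and a haploid baseline is assumed here for illustration; published analyses use windowed coverage and additional corrections.

```python
from statistics import median


def estimate_copy_number(mean_depth: dict, baseline_ploidy: int = 1) -> dict:
    """Estimate per-chromosome copy number from mean read depth.

    The genome-wide median depth is taken to represent the baseline ploidy,
    so a chromosome with twice the median depth in a haploid genome is
    reported as having two copies.
    """
    baseline_depth = median(mean_depth.values())
    if baseline_depth <= 0:
        raise ValueError("median depth must be positive")
    return {
        chromosome: round(baseline_ploidy * depth / baseline_depth)
        for chromosome, depth in mean_depth.items()
    }


# Invented mean read depths for a handful of chromosomes.
toy_depth = {"chr1": 50.0, "chr2": 48.0, "chr3": 52.0, "chr12": 101.0, "chr13": 49.0}
copy_number = estimate_copy_number(toy_depth)
```

In this toy input, only chr12 has roughly double the median depth, so it is the only chromosome reported with an extra copy.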

Antifungal Resistance

Resistance to antifungal drugs is the result of an arms-race between the populations of the hosts and the pathogen. Microevolution studies suggest that both cyp51A SNPs conferring azole resistance and tandem repeats (TR120) in the cyp51A promoter are selected during infection [26]. Resequencing of fluconazole-resistant isolates has implicated gain-of-function, Erg11 heterozygous and Erg3 homozygous mutations, and MDR1 promoter allele alterations in azole resistance of C. albicans [8]. Resequencing has also been used to identify mechanisms of resistance by genotyping strains in which genes conferring antifungal resistance have been deleted but resistance has been restored by artificial selection in the laboratory (experimental evolution). In the absence of Rgd1, an azole-resistance conferring gene, exposure to azoles induced amplification of several chromosomal regions as identified by WGS. Overexpression of a transporter gene, NPR2, was found to confer resistance [44]. A similar study exploring the effects of medium-chain fatty acids found susceptibility to be associated with trisomy of chromosome 7 [42].

WGS of C. albicans and S. cerevisiae strains experimentally evolved for resistance to co-treatment with azoles and inhibitors of either Hsp90 or calcineurin revealed diverse resistance mechanisms, including extensive aneuploidies and mutations in genes encoding drug targets; transcriptional regulators of multidrug transporters and ergosterol biosynthesis enzymes; and Lcb1, a regulator of sphingolipid biosynthesis [28]. Resequencing has been used to genotype experimentally evolved azole-resistant A. fumigatus isolates relative to their isogenic parental strains. Both medical and non-medical triazole fungicides have been used in experimental evolution studies. Variants contributing to resistance induced by medical triazoles include mutations in erg11A (cyp51A), multidrug transporters, erg25, and HMG-CoA reductase [41]. Agricultural fungicides induced cross-resistance to medical triazoles, with mutations also seen in cyp51A and HMG-CoA reductase [68•].

Outbreak and Virulence Analysis

Cryptococcus gattii is less common than C. neoformans but can infect immunocompetent individuals. While once considered endemic to tropical and subtropical environments, C. gattii outbreaks in the Pacific north west of the USA precipitated the need for phylogenetic studies to help identify outbreak origin. The clonal nature of C. gattii sublineages impeded the ability of multilocus sequence typing (MLST) to resolve variation [23] but resequencing of 118 genomes managed to identify South America as the probable origin of Pacific north west lineages [16].

Resequencing of 56 C. neoformans strains identified 40 genes as putatively associated with human survival, immunologic response or clinical parameters. Of the 17 available KN99α gene deletion strains for these candidate genes, six (35%) were found to directly influence survival of mouse models: three increased and three decreased survival, four of which had not previously been identified [22••]. In recent years, SNP-based phylogenies have shown that mucosal and bloodstream C. albicans isolates are organised into separate clades [6]. Resequencing and phenotyping of two clinical isolates of variable pathogenic potential indicated that the major differentiating genetic variants are located in genes associated with biofilm production and first-line host barriers, and that the isolates vary genetically in a manner that correlates with isolate-specific phenotypic differences [9].

Sequencing in the Diagnosis of Invasive Fungal Disease

Sequence analysis of fungal DNA from patient samples is either well established or still being established, depending on the technique employed (Fig. 1). Sequencing of PCR products has been common practice in the identification of organisms causing IFD. This has usually involved isolation of DNA from formalin fixed paraffin embedded (FFPE) tissue samples or samples of mycelium, followed by PCR amplification of a variable ribosomal region, usually with ITS-targeting primers, and Sanger sequencing. The resulting sequence is then compared to existing sequences using a sequence similarity algorithm (https://blast.ncbi.nlm.nih.gov/Blast.cgi). In contrast, qPCR assays rely on existing sequence data for the design of oligonucleotide primers and probes for targeted amplification of IFD-causing pathogenic fungi. The quality and diversity of existing sequence data usually dictate the level of specificity achievable with qPCR assays.
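
The comparison of a query sequence against existing reference sequences can be illustrated with a deliberately simplified Python sketch. It is not BLAST: instead of local alignment, it ranks references by the fraction of shared k-mers (Jaccard similarity), which is a much cruder stand-in chosen to keep the example self-contained. The sequences and species labels are invented, and the function names are hypothetical.

```python
def kmers(sequence: str, k: int = 5) -> set:
    """Return the set of all overlapping substrings of length k."""
    sequence = sequence.upper()
    return {sequence[i:i + k] for i in range(len(sequence) - k + 1)}


def rank_references(query: str, references: dict, k: int = 5) -> list:
    """Rank reference sequences by Jaccard similarity of k-mer sets to the query.

    Returns (reference name, score) pairs sorted from most to least similar,
    where the score lies between 0.0 (no shared k-mers) and 1.0 (identical sets).
    """
    query_kmers = kmers(query, k)
    scores = []
    for name, reference_sequence in references.items():
        reference_kmers = kmers(reference_sequence, k)
        union = query_kmers | reference_kmers
        score = len(query_kmers & reference_kmers) / len(union) if union else 0.0
        scores.append((name, score))
    return sorted(scores, key=lambda item: item[1], reverse=True)


# Invented toy sequences standing in for database entries.
toy_references = {
    "toy_species_A": "ACGTTGCATGCCGATAGCTA",
    "toy_species_B": "TTTTGGGGCCCCAAAATTTT",
}
toy_query = "ACGTTGCATGCCGATAGCTT"  # differs from toy_species_A at the last base
ranking = rank_references(toy_query, toy_references)
```

As with a real similarity search, the result can only be as good as the reference collection: a query from a species absent from `toy_references` would still be assigned a best match.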

Fig. 1

Schematic diagram indicating the strategies for isolation of DNA from different samples and the subsequent options for processing the DNA to identify the causes of invasive fungal disease in patient samples. (1) The additional steps include separation of blood samples into cell-free (serum or plasma) or white cell fractions, xylene and ethanol treatment for FFPE, proteinase K treatment for tissue, and lyticase or bead beating to release DNA from fungal spores or hyphae. (2) The ITS 4/5 primer set [62] has been used extensively in the identification of fungi due to the large number of sequences submitted to the DNA databases. (3) High-resolution melt curve (HRM) analysis. (4) Multiplex qPCR can have multiple probes to allow identification of fungal genus or species and to detect the presence of drug resistance alleles.

Recent NGS protocols make it possible to perform either targeted sequencing of pan-fungal PCR products from a sample or non-targeted sequencing of all DNA present in the sample, allowing for the reconstruction of the community of organisms present in the sample. In a study of three cases of individuals at-risk for IA, NGS was used to sequence all the DNA from BAL samples. NGS was able to identify A. fumigatus in all three cases and was the only positive diagnostic method in one case [27••]. A further study examined the use of NGS in serum samples from nine patients with proven IFD [29••]. In this study, NGS identified a fungal pathogen in seven of the nine cases. In another case, pulmonary scedosporiosis was diagnosed with the aid of NGS of DNA from BAL, and the NGS results were confirmed by histology [67••]. An important aspect of using any diagnostic method is sample type, and cell-free DNA in serum/plasma is an attractive target that has performed well in PCR-based studies [39, 64]. A trial of NGS to diagnose IFD in paediatric patients using plasma identified 4 of 6 (66.67%) cases of proven fungal disease. The 2 of 6 (33.33%) cases that were not identified appeared to have non-invasive infections [1••]. These cases highlight some of the advantages of NGS in diagnosis: it can detect any potential pathogen in a sample as well as the surrounding microbial community. It can be performed on any sample type from which DNA or RNA can be isolated, including the usual diagnostic sample types, e.g. blood, serum, BAL, and FFPE.

The use of NGS avoids the need to develop pathogen-specific assays, a major technical issue for qPCR assays. The fungal databases do not contain sufficient information to screen primers and probes to ensure specificity. Even if the databases were complete, the taxonomic issues within genera such as Aspergillus make it difficult to develop a pan-Aspergillus PCR [43]. NGS can provide species-level identification from the raw data generated, which avoids the need for a secondary identification step and allows targeted treatment of the pathogen. The case study by Hong et al. identified Aspergillus lentulus from a clinical sample whereas other methods would identify A. fumigatus [29••]. This is critical since Aspergillus species vary considerably in their responses to antifungal drugs [59]. The turnaround time for NGS has been decreasing as the methodology has been refined, and results can now be expected within 48–72 h, an acceptable timeframe for species-level identification and detection of antifungal resistance.

Potential Limitations of NGS in IFD Diagnosis

Although the workflow is very similar to PCR-based assays with relation to sample type and DNA isolation (Fig. 1), there are some differences and challenges that should be recognised.

The key problem for nucleic acid diagnostic assays is having sufficient nucleic acid in the sample. In the standardisation of qPCR assays, it was found that the key factor affecting performance was the DNA isolation methodology [63••]. This is the input for any PCR or sequencing-based assay and is a more important consideration than the specific amplification target or platform. Fungal DNA occurs at very low levels in blood and serum, near the limit of detection for qPCR, creating a potential problem for NGS-based approaches. At these low levels of target, the DNA may be indistinguishable from contamination, which is an issue in microbiome studies with smaller samples [14] and has even led to conflicting accounts of a human placental microbiome [36]. Single molecule sequencing technologies are meant, ultimately, to overcome this barrier but the laboratory protocols are not yet optimised and inputs in the range of tens to hundreds of nanograms of DNA are still required, while significant sequencing consumables are also ‘wasted’ on non-target host DNA. Indeed, another question is whether NGS is sufficiently sensitive to detect pathogens that are in low abundance, since the method can be influenced by the abundance of competing DNA in the sample (see these reviews for further discussion on these technical limitations [24•, 25•]). A third challenge is that the majority of diagnostic strategies for IFD suggest twice weekly sampling and analysis to ensure that the pathogen is detected [13•]; this may not yet be feasible for NGS given its cost and turnaround time. Another related issue is the use of antifungal prophylaxis, which reduces the amount of diagnostic target in the host and has a pronounced effect on the performance of qPCR assays, and would therefore also affect NGS [11••].
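
The effect of competing host DNA on sensitivity can be made concrete with a simple calculation. The Python sketch below assumes, purely for illustration, that the number of fungal reads follows a Poisson distribution whose mean is the total number of reads multiplied by the fraction of the DNA that is fungal. The read counts and fractions are invented, and the function name `detection_probability` is hypothetical; real samples deviate from this idealised model through contamination, extraction bias, and library preparation effects.

```python
import math


def detection_probability(total_reads: int, fungal_fraction: float, min_reads: int = 1) -> float:
    """Probability of observing at least `min_reads` fungal reads.

    Assumes the fungal read count is Poisson distributed with mean
    total_reads * fungal_fraction (an idealised model of random sampling).
    """
    expected = total_reads * fungal_fraction
    # P(X < min_reads) is the sum of Poisson probabilities for 0..min_reads-1.
    below_threshold = sum(
        math.exp(-expected) * expected ** k / math.factorial(k)
        for k in range(min_reads)
    )
    return 1.0 - below_threshold


# Invented scenario: one fungal read expected per million reads sequenced.
p_single_read = detection_probability(1_000_000, 1e-6, min_reads=1)
p_three_reads_shallow = detection_probability(1_000_000, 1e-6, min_reads=3)
p_three_reads_deep = detection_probability(10_000_000, 1e-6, min_reads=3)
```

Under this model, requiring more supporting reads to guard against contamination sharply lowers the chance of a positive call unless sequencing depth is increased, which in turn raises the cost per sample.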

A qPCR assay gives a defined result allowing immediate interpretation of a positive or negative result. It is generally accepted that the strength of qPCR is in its ability to exclude the presence of IFD rather than to detect the pathogen [11••]. In a diagnostic platform utilising several strands of information, this type of test can be beneficial but there is still insufficient information about the interactions between IFD-causing fungi and the host to enable certainty when interpreting what a positive sample means. By comparison to PCR-based assays, direct NGS from a sample can yield a wealth of microbial information in terms of the number and identity of organisms present in a sample. This wealth of information creates a challenge in the scientific and medical interpretation of the data, e.g. what should the threshold for an IFD be? Which organisms are significant? Indeed, the data from NGS can overwhelm clinicians with less than relevant data [37].

Sample origin and type are another important consideration. From a serum sample, it might be deduced that a positive result for the presence of a pathogen may be significant since the DNA from pathogens is in such low abundance in that sample type. Analysing BAL samples may be more challenging since the lung has a transient microbiome that consists of a variety of fungi, bacteria, parasites, and viruses [17]. The use of qPCR from BAL raised questions about the interpretation of a positive result, leading to the implementation of thresholds and to efforts to distinguish between colonisation and infection [35, 55]. The data from NGS would be more complex and would require the definition of stringent thresholds for classifying cases of IFD. There is still insufficient information about the lung microbiome, especially in individuals at-risk for IFD, to define such thresholds. Increasing data may allow the definition of what a dysbiosis associated with IFD might look like; this would remove the pressure from detecting a specific pathogen and shift the focus to microbial population dynamics.

Conclusions

It has taken almost 20 years for qPCR assays to gain broad acceptance in the diagnosis of IFD and this required significant evidence of comparable performance to other diagnostic assays [65]. This was greatly aided by the use of meta-analyses to identify the variation in assay performance [2, 11••] and led to community efforts to standardise PCR-based methodologies such as the Fungal PCR Initiative (https://www.isham.org/working-groups/european-aspergillus-pcr-initiative-eapcri). Multicentre analysis of standardised samples, optimised DNA isolation protocols for different sample types, standardised pipelines for data analysis, and guidelines for the interpretation of the data would bring NGS into line with other diagnostic methods for IFD. NGS in the diagnosis of IFD could be the future but there should be a community approach to developing standardised protocols to get the best from this technology.