Introduction

Candida species account for the most prevalent fungal infections in the world and are a significant cause of human mortality and morbidity [1]. The majority of these infections are associated with Candida albicans, yet non-albicans Candida species (“NACS”) are also important opportunistic pathogens. Much of the history of Candida spp. is steeped in taxonomic confusion. Early studies of Candida species relied upon identifying differences in morphology or metabolism, an ambiguous approach for species identification and assessment of relatedness, given that many Candida spp. are capable of extensive morphological and metabolic flexibility. This lack of clarity continued into the twentieth century, as the genus Candida was used as an umbrella term for yeasts capable of causing human infection and lacking a conventional sexual cycle. It was not until 1991 that Susan Barns and colleagues compared ribosomal DNA (rDNA) sequences and demonstrated the phylogenetic divergence among Candida spp. For example, Candida glabrata, the second most common human-associated Candida species, is more closely related to Saccharomyces cerevisiae than to Candida albicans [2]. Advances in sequencing technology and broader sampling of isolates have enabled higher phylogenetic resolution, and the ~200 identified Candida spp. belong to over 13 phylogenetically distinct clades [3]. A recent effort formally renamed species based on their phylogenetic relationships, and we include the revised names in brackets at the first use of each affected species [4] (Fig. 1). As the incidence of NACS infections and of co-infections with multiple Candida spp. increases [6, 7], it will become ever more critical to identify how these species are similar and how they differ.

Fig. 1

Phylogenetic relationship among Candida spp. and close relatives. The ploidy level of the majority of sampled isolates is provided in brackets. Adapted with permission from [5]

In this review, we attempt to look outside the box of medical mycology [8] to discuss what is known and what remains unknown about the factors shaping the evolution of Candida spp., and we consider the breadth of the environmental and human-associated conditions these species experience. We discuss understudied topics in genome evolution in Candida spp. across the organizational levels at which variation is observed: between species, between isolates of the same species, and between cells within individual populations.

Broadening the Lens We Use to Look at Candida species Evolution

Much of the study of Candida spp. has been through the lens of their capacity to cause infections in humans [9••, 10, 11••, 12], and our understanding of the factors that drive Candida spp. evolution is likely limited by this narrow viewpoint.

Consistent with their broad phylogenetic relationships, virulence traits appear to have independently evolved in Candida spp. Despite triggering highly similar host responses in a whole-blood infection model, C. albicans, C. glabrata, C. parapsilosis, and C. tropicalis displayed unique transcriptional responses [13]. C. dubliniensis, a close relative of C. albicans that is less virulent [14] and seldom isolated from infections [1], has been compared with C. albicans to identify genetic determinants of virulence. From these comparisons, it became apparent that several gene families that contribute to virulence, stress responses, and filamentation (e.g., adhesins, secreted aspartyl proteinases, subtelomeric genes) are expanded in the C. albicans genome [15, 16]. A critical virulence factor for C. albicans is its ability to transition between yeast and filamentous growth [17], which is differentially regulated in C. dubliniensis [18]. Intriguingly, in C. tropicalis and C. parapsilosis, two emerging pathogens, filamentation is not essential for virulence and may even be detrimental to survival [17, 19]. In C. albicans, genetic screens and experimental evolution revealed that commensal phenotypes are associated with reduced filamentation [20, 21••, 22], yet reduced filamentation evolves only in hosts with reduced microbial diversity [22, 23]. In most clinical cases, the commensal population gives rise to organisms that become pathogenic [24]. The selective drivers of adaptive evolution may thus emerge more from the roles of Candida spp. as commensal members of the healthy human mycobiome (and as inhabitants of other environmental niches, in some species) than from their roles in human infection; population-level genetic variation and the traits under selection during commensalism have nevertheless been understudied.

In host environments, Candida spp. face complex abiotic and biotic selection pressures. Alongside the many stresses imposed by the host, including but not limited to nutrient availability, oxygen levels, and immune defenses (Fig. 2), fungal cell populations exist alongside other microbial species, and both permissive and antagonistic interactions have been documented [25, 26]. As part of the infection process and during transitions between host environments (e.g., translocation from gut to bloodstream or from bloodstream to systemic organs; Fig. 2), significant population bottlenecks occur and limit the efficiency of selection on niche-specific traits. Candida spp. have also been isolated from several other mammalian hosts, such as hedgehogs, opossums, dogs, and sheep [27, 28], and have been collected from moist soils, fallen leaves, oak trees, and coastal wetlands [27, 29, 30, 31••, 32••, 33]. The recent discovery of C. auris isolates in the coastal wetlands of India, including isolates that were multi-drug resistant, points to a potential environmental source for this species and signals that additional environmental surveillance of Candida species is needed [27, 29, 30, 31••, 32••, 33]. A complete understanding of how different niches and environments shape the evolution of these species will require comprehensive surveys of different environments paired with fitness measurements of Candida isolates across the diversity of conditions they encounter.

Fig. 2

Candida cells are subjected to large population bottlenecks as they move through different host anatomical niches, each harboring diverse selective pressures. Within commensal sites such as the oral cavity, GI tract, or skin, Candida cells face additional stresses imposed by the microbiome and competition for nutrients with other species

Adaptation requires genetic variation for selection to act on. Genetic variation in Candida spp. is primarily the result of asexual reproduction, with many members having restricted or absent meiotic recombination programs [34, 35] and sparse evidence for mating [5, 36••, 37]. The prevailing view has been that limited sexual recombination could preserve the integrity of well-adapted genomes [34, 35, 38]. This also provides an opportunity for karyotypic variation (single chromosome changes or whole ploidy shifts) to play a role in adaptation. Accordingly, in addition to point mutations (single nucleotide variants, small insertions and deletions), variation in chromosome copy number is frequently observed [5, 9••, 36••, 39, 40••, 41], and whole ploidy shifts are observed in nearly all human-associated Candida spp. [42]. The outlier is the haploid C. glabrata; whether the absence of ploidy-variant isolates of C. glabrata is due to undersampling or something specific about the genetics of this species remains unknown [40••].

Different types of mutations occur at different rates and impact a different number of genes (Fig. 3). The types of mutations that arise and are selected during evolution are also influenced by baseline ploidy, which varies among Candida spp. (Fig. 1 [43]). For example, in diploid C. albicans, loss of heterozygosity (LOH) events are common in clinical isolates [10, 12, 36••] and have been identified in the C. albicans laboratory reference strain evolved under a multitude of in vitro and in vivo conditions [44, 45••, 46••]. LOH is also likely to be common in other diploid NACS [47, 48]. LOH mutations can reveal recessive alleles, though whether these events on average are neutral or beneficial is not well resolved (discussed below). Copy number variants of small regions or whole chromosomes are both generated and lost at high frequencies [11••], which might allow cells to bet-hedge the potential benefit of increased gene copy number under certain conditions while also being able to revert to the wild type when the selective pressure is removed. Chromosomal aneuploidy has been associated with replicative, metabolic, and proteotoxic stress [49, 50], and aneuploidies are likely to be temporary rather than long-term solutions. Recent genomic analyses have shown that ancient hybridization events occurred between distantly related isolates before the divergence of C. albicans, C. africana, and C. stellatoidea, a process hypothesized to mediate the emergence of new phenotypic traits and to enable adaptation to new environments [51, 52••, 53]. Hence, there are many different mechanisms on which selection can act to result in rapid genetic variation within fungal populations.

Fig. 3

Size of observed genetic variations relative to the time needed to generate them in haploid or diploid fungal cells. Colored arrows indicate the ratio between the genetic size and the time needed to generate these events. For example, aneuploidy can arise within single cell divisions and affects entire chromosomes. At the opposite end of the spectrum, SNPs/indels arising within single divisions affect a small number of single nucleotide positions. Reversible genetic changes are indicated by double-headed arrows

Population Structure Reflects Primarily Neutral Rather Than Adaptive Processes

Population structure captures genetic variation at the species level and reflects evolutionary processes and demographic history. The population structure of Candida spp. and other fungal taxa has been assessed using short tandem repeat markers (e.g., microsatellite typing, variable number tandem repeat), sequence variation in the ITS locus, multilocus sequence typing (MLST), and whole-genome sequencing. Whole-genome sequencing offers the highest ability to differentiate among isolates and to capture genome-wide diversity; the high costs and time required for analysis have restricted its broad use for answering phylogenetic questions, though this is likely to change in the future. In C. albicans, MLST analysis is preferred over ITS or microsatellite typing since the data are unambiguous and reproducible and can be directly compared among research groups and isolate collections [54]. For C. glabrata [55, 56] and the C. parapsilosis species complex [57], microsatellite typing provides higher discriminatory power and has been more widely used. In C. tropicalis, microsatellite typing and MLST analysis provide similar discriminatory power [58].

Currently, only four Candida spp. are represented in the PubMLST database, the public database for molecular typing and microbial genome diversity (https://pubmlst.org/). The distribution of isolates in all four species is skewed by the source of isolation, with bloodstream and oropharynx isolates being the most common (Fig. 4 [59]). We currently lack a complete picture of how population structure maps to environment or geography at the global level since the geographical distribution of deposited isolates is also skewed. The majority of isolates have been deposited from Asia (mainly China, Taiwan, and Iran), Europe (mainly the UK, France, and Spain), and North America (mainly the USA). Nevertheless, MLST studies in all species repeatedly indicate local clustering with frequent gene flow (e.g., C. tropicalis [60, 61, 62, 63, 64, 65], C. glabrata [66, 67, 68, 69], C. albicans [70, 71, 72, 73, 74, 75, 76, 77, 78, 79]). A small number of existing phylogenetic analyses using whole-genome sequences of a global set of isolates were overall consistent with the MLST studies (C. albicans [36••], C. glabrata [80], C. tropicalis [81]). Phylogenies thus appear to primarily reflect neutral processes such as geography and gene flow rather than selection. Despite a small number of interesting associations between phenotypes of interest and sequence types (STs), the majority of studies have failed to detect STs associated with anatomical source, patient health status, or phenotypes of interest such as drug susceptibility (C. albicans [36••, 82, 83, 84], C. glabrata [55, 56, 66, 68, 80, 85], C. tropicalis [60, 62, 63, 65], C. parapsilosis [86, 87, 88]). Furthermore, environmental isolates typically cluster with clinical isolates (C. tropicalis [81, 89], C. krusei (referred to as Pichia kudriavzevii when isolated from the environment) [90], C. albicans [31••, 36••], C. glabrata [91], C. parapsilosis [87]).

Fig. 4

Proportion of Candida isolates deposited in the PubMLST database based on geographic location (A) and source of isolation (B). The total number of isolates for each species as of January 2021 is indicated in brackets below the species name. The number of researchers that have deposited sequences also differs by species: C. albicans, 76; C. glabrata, 14; C. tropicalis, 25; C. krusei, 7

Selection can act on isolates within specific niches to alter virulence or infection-related traits. However, when looking for correlations between phylogenetic clusters and isolate characteristics, it remains challenging to determine whether an apparent association is genuine or a spurious product of biased sampling. For example, in C. albicans isolates from animals, the most common ST is ST172 (6.5% of animal isolates, versus 0.12% of all other isolates), yet all of these animal isolates come from Hungary. In contrast, the five ST172 isolates from the UK, India, Germany, France, and China were collected from oral, vaginal, and systemic infections. Distinguishing whether clustered isolates are genuinely different for specific traits of interest requires significant additional global sampling, such as the recent tour de force to acquire environmental yeast isolates [32••].

The Potential Influence of Heterozygosity

Considerable heterozygosity exists in the diploid species (Fig. 1), but it remains unclear whether this is meaningful in the context of adaptation. Allelic heterozygosity has been quantified as the average frequency of polymorphisms across the genome (i.e., number of SNPs per base [92]) and as nucleotide diversity (i.e., the average number of nucleotide differences per site between two DNA sequences in all possible pairs [36••]). Early C. albicans whole-genome studies noted a high level of heterozygosity compared to sexually reproducing species and speculated that allelic differences might have clinical consequences [84, 92]. Since then, Candida spp. allelic variation has often been interpreted through an adaptive lens. However, heterozygosity levels among diploid, asexual Candida spp. vary widely. C. albicans and C. tropicalis have similar levels of heterozygosity, with ~0.5% of positions in the genome being heterozygous [45••, 81], ~70-fold higher than that observed in C. parapsilosis [93] (with the caveat that fewer C. parapsilosis strains have been examined).
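The two heterozygosity measures defined above can be illustrated with a minimal Python sketch. The function names and the toy genotypes and sequences in the usage note are hypothetical and chosen only for illustration; they are not data or code from the cited studies, which work from genome-wide variant calls.

```python
from itertools import combinations


def heterozygous_fraction(genotypes):
    """Fraction of genotyped positions that are heterozygous in one diploid isolate.

    genotypes: list of (allele_1, allele_2) tuples, one tuple per position.
    """
    if not genotypes:
        raise ValueError("at least one genotyped position is required")
    het_sites = sum(1 for a, b in genotypes if a != b)
    return het_sites / len(genotypes)


def nucleotide_diversity(sequences):
    """Average number of differences per site over all pairs of aligned sequences."""
    if len(sequences) < 2:
        raise ValueError("at least two sequences are required")
    length = len(sequences[0])
    if length == 0 or any(len(s) != length for s in sequences):
        raise ValueError("sequences must be aligned, non-empty, and of equal length")
    per_pair = [
        sum(x != y for x, y in zip(s1, s2)) / length
        for s1, s2 in combinations(sequences, 2)
    ]
    return sum(per_pair) / len(per_pair)
```

As a usage example, `heterozygous_fraction([("A", "A"), ("A", "G"), ("C", "C"), ("T", "T")])` counts one heterozygous site among four positions, and `nucleotide_diversity(["ACGT", "ACGA", "ACGT"])` averages the per-site differences over the three possible sequence pairs.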

Heterozygosity is theoretically advantageous from the perspective of evolvability. The effect size of many beneficial mutations is at least partially masked by a wild-type allele (i.e., the effect size of a beneficial allele as a homozygote is larger than as a heterozygote). If a particular SNP is beneficial in a specific environment, loss of heterozygosity (LOH) can provide a rapid route to homozygosity. LOH has been observed in both C. albicans and C. lusitaniae following microevolution in clinical and laboratory studies as an efficient mechanism by which cells become drug-resistant [9••, 10, 94] or increase their fitness in mammalian hosts [22, 23]. Similarly, in serial isolates recovered from patients with recalcitrant C. albicans infections, LOH events between early and late isolates were associated with increased drug resistance [10]. Although some specific LOH events have been linked directly to adaptive phenotypes, differences in LOH rate in different environments may also reflect different genome-wide mutation rates under different conditions [44, 45••]. For example, de novo LOH events reached higher frequencies in the host and under stressful conditions relative to growth in rich media in C. albicans [44, 45••, 46••], consistent with stress-induced mutagenesis observed in higher eukaryotes [95]. However, in the majority of patient-derived isolates, LOH tracts appear to be neutral [96••]. It remains challenging to evaluate the fitness consequences of specific tracts as LOH can span single polymorphisms to whole chromosomes, and studies often lack matched isogenic strains with these particular configurations.
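The masking argument that opens the paragraph above can be written with a standard one-locus fitness model. The symbols are notation introduced here for illustration only and are not taken from the cited studies: A is the wild-type allele, a is the beneficial allele, s is its selection coefficient, and h is its dominance coefficient.

```latex
w_{AA} = 1, \qquad w_{Aa} = 1 + hs, \qquad w_{aa} = 1 + s, \qquad s > 0,\; 0 \le h < 1

\Delta w_{\mathrm{LOH}} = w_{aa} - w_{Aa} = (1 - h)\,s > 0
```

Under these assumptions, an LOH event that converts the heterozygote Aa to the homozygote aa gains the masked portion (1 - h)s of the benefit, which is largest when the beneficial allele is fully recessive (h = 0).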

There is some evidence for selection to maintain heterozygosity to avoid exposing recessive deleterious alleles that accumulate neutrally in the genome [97, 98, 99]. Heterozygosity at specific genomic regions may have direct consequences on biological processes. For example, heterozygosity at the mating-type locus (MTL) reduces the capacity for mating in C. albicans cells [100, 101]. The majority of C. albicans clinical isolates are MTL heterozygous, and they are more virulent in mice compared to closely related MTL homozygous isolates [102, 103]. However, selection for polymorphism and selection against LOH should not be conflated. Determining whether there is selection to maintain polymorphisms per se requires finding significant correlations between genome-wide polymorphism levels and phenotypes of interest. In a set of 21 C. albicans isolates, a significant correlation was found between genome-wide heterozygosity and growth in nutrient-rich media at 30 °C, but not at 37 °C, nor with growth in other examined conditions [39]. Furthermore, there was no association between heterozygosity and C. albicans virulence in a murine or insect infection model [39], nor with C. orthopsilosis virulence in an insect model [104]. While higher levels of heterozygosity have been observed in clinical isolates compared to environmental isolates of Saccharomyces cerevisiae, high levels of heterozygosity in Candida spp. do not seem to be tied to selection in the context of human infections; C. albicans strains isolated from oak trees displayed higher heterozygosity levels than clinical isolates from the same clades [31••]. While heterozygosity may be theoretically advantageous, there remains little to no empirical data indicating whether selection for broad polymorphism contributes to differences among isolates, especially in the context of clinically relevant phenotypes.

The Black Box of Within-Population Heterogeneity

Relatively little is known about the extent of genetic diversity in fungal populations within an individual host. Sequencing-based methods have identified Candida spp. as frequent members of the mycobiome of several human niches, including the gastrointestinal and vaginal tracts, oral cavity, and lungs [105]. Work from the early 1990s found that strains from the same individual isolated from different body sites are typically highly similar but not genetically identical [106]. This led to the common assumption that hosts are colonized by clonal isolates. However, this work had major limitations, including the small number of individuals and isolates examined and the use of methods that reflected only a small fraction of the genome.

Later work showed that healthy hosts could simultaneously harbor multiple diverse genotypes [84, 107], indicating that colonization is a dynamic process and that microevolution may shape the population structure of the colonizing isolates. C. albicans isolates were found to differ at a single or limited number of MLST loci during commensal growth in the gastrointestinal or genital tracts, with variation often associated with LOH [84, 108]. Host infection microevolution studies demonstrate that colonizing isolates differ through multiple LOH events following infection [109, 110]. Examining C. albicans from 40 secondary infections identified 50 diploid STs, with LOH as the source of variation between isolates [70]. The tract size of LOH events can be used to distinguish between mitotic recombination, resulting in gene conversion, and break-induced replication, resulting in the homozygosis of partial or whole chromosomes. Both were detected within colonizing populations or following murine passage [111, 112] in experimental evolution studies that utilized different mouse models of commensalism or infection. In these studies, LOH and chromosomal copy number variation frequently arose and rapidly resulted in extensive heterogeneity within cell populations, impacting C. albicans fitness. For example, chromosome 7 trisomy was associated with increased fitness during colonization of the mouse gastrointestinal tract [45••], and chromosome 6 trisomy was repeatedly selected during infection of the mouse oral cavity [21••].
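How LOH tracts and their sizes might be extracted from marker data can be sketched in Python, assuming per-marker heterozygosity calls for a parental isolate and a derived isolate on a single chromosome. The function name, input layout, and the `touches_terminal_marker` flag are hypothetical; treating tracts that reach a terminal marker differently from interstitial tracts is only a rough heuristic assumed here, not a method taken from the cited studies.

```python
def find_loh_tracts(positions, parent_het, derived_het):
    """Find runs of markers that were heterozygous in the parent and are
    homozygous in the derived isolate, on one chromosome.

    positions:   sorted marker coordinates (bp)
    parent_het:  booleans, True if the parent is heterozygous at the marker
    derived_het: booleans, True if the derived isolate is heterozygous there
    """
    if not (len(positions) == len(parent_het) == len(derived_het)):
        raise ValueError("all inputs must have the same length")

    # Only markers that were heterozygous in the parent can reveal LOH.
    informative = [
        (pos, still_het)
        for pos, was_het, still_het in zip(positions, parent_het, derived_het)
        if was_het
    ]

    runs, current = [], []
    for pos, still_het in informative:
        if not still_het:
            current.append(pos)      # marker lost heterozygosity: extend the run
        elif current:
            runs.append(current)     # heterozygous marker closes the run
            current = []
    if current:
        runs.append(current)

    if not informative:
        return []
    first_pos, last_pos = informative[0][0], informative[-1][0]
    return [
        {
            "start": run[0],
            "end": run[-1],
            "span_bp": run[-1] - run[0] + 1,
            "markers": len(run),
            # True when the tract includes the first or last informative marker
            "touches_terminal_marker": run[0] == first_pos or run[-1] == last_pos,
        }
        for run in runs
    ]
```

The reported span is a minimum estimate, since the true tract boundaries lie somewhere between the outermost homozygous markers and the flanking heterozygous ones.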

Recent whole-genome sequencing of C. albicans isolates from healthy individuals revealed a high degree of variation [96••]. An oral sample from a single individual contained several isolates differing from each other by multiple, short LOH tracts [96••]. Similarly, widespread genetic variation was detected in C. glabrata isolates from single individuals, with hundreds of nonsynonymous single nucleotide polymorphisms (SNPs) identified between isolate pairs and occasional aneuploidy events [40••]. Interestingly, C. glabrata SNPs were enriched in genes encoding cell wall proteins [40••], and these gene families are also a hotspot for C. albicans genetic variation during in vitro evolution [45••]. Temporal heterogeneity in Candida lusitaniae (Clavispora lusitaniae) isolates from bronchoalveolar lavage samples from a patient with cystic fibrosis has also been documented [9••]. Here, isolates derived from a common ancestor differed by hundreds of SNPs and indels, and a hotspot for mutations was observed in MRR1, a transcription factor regulating the expression of antifungal drug transporters. Mutations in MRR1 were shown to protect fungal cells from particular host and bacterial factors and indirectly selected for subpopulations resistant to antifungal drugs [9••].

The development of azole resistance in serial clinical isolates provides an excellent example of how microevolution can follow diverse routes to the acquisition of drug resistance. A comparison of C. albicans isolates before and after the development of azole resistance found that isolates differed by thousands of SNPs, yet azole resistance was primarily attributed to persistent and recurrent LOH [10]. Aneuploidy, particularly of chromosome 5, was also frequently, yet transiently, observed [10]. Since drug resistance in these clinical isolates often coincides with fitness costs in the absence of antifungals [10], the hypothesis is that aneuploidy extends the window for other beneficial mutations with smaller fitness costs to arise, at which point selection to maintain the aneuploidy would be lost.

Conclusion

In commensal or pathogenic contexts, Candida spp. have many mutational pathways available and exhibit extensive genetic variation, which can promote survival under fluctuating conditions of nutrient availability, drug exposure, microbiome competition, and immune surveillance. Outstanding questions remain regarding the nature of the relationships established between coexisting isolates and how genetic heterogeneity impacts commensalism and pathogenicity. Future studies are necessary to determine to what extent this genetic variation reflects selection or genetic drift in the respective environments.