1.1 Introduction

For more than two centuries biologists have been interested in understanding the distribution of biodiversity. Following the work of Augustin Pyramus de Candolle and Alexander von Humboldt in the eighteenth century, biogeography has changed from being a merely descriptive discipline to a field rooted in ecological and evolutionary principles (Crisci et al. 2003). Biogeography has now diversified into many branches that specialize in different spatial, temporal, and taxonomic scales, but these can be classified into two major categories known as ecological and historical biogeography (Wiens and Donoghue 2004). For historical biogeography (from here on just termed biogeography), the last decades of the twentieth century witnessed paradigm shifts between dispersal and vicariance schools (Zink et al. 2000). Nowadays, it is generally accepted that multiple evolutionary processes such as dispersal, speciation, extinction, and species interactions contribute to biodiversity build-up and distribution (Hubbell 2001; Ricklefs 2004; Wiens and Donoghue 2004; Mittelbach et al. 2007; Ree and Sanmartín 2009; Ronquist and Sanmartín 2011; Birand et al. 2012).

Undoubtedly, the bulk of biogeographic knowledge has been garnered from the study of plant and animal distributions. In contrast, patterns in microorganisms (fungi included) have been more elusive. This has led to considerable debate on how microorganisms disperse and are structured geographically (Finlay 2002; Martiny et al. 2006; Peay et al. 2007, 2010a). For instance, a classic view in microbial biogeography is that “everything is everywhere, but the environment selects” (Baas-Becking 1934). This hypothesis is based on two major assumptions. The first is that many microorganisms have dispersal capabilities (e.g. vegetative reproduction and massive spore production) that allow propagules to be present “virtually” everywhere (Stolp 1988). This perception may be confounded by the fact that most microorganisms have simple morphologies, suggesting they are “cosmopolitan”, when, in fact, there are many different species (Finlay 2002; Peay et al. 2010a). The second is the role of the environment as a selective filter during colonization, which may limit the establishment of propagules in new regions. This last point can relate to the geographical bonds that many microorganisms have with their hosts (Werren et al. 1995; Corby-Harris et al. 2007), in spite of their potential for global propagation (Brown and Hovmøller 2002). While this hypothesis would provide a simple test to assess the mechanisms behind microbial geographical structure, the cryptic nature of microorganisms is a complicating factor.

In the last three decades, the study of fungal ecology and evolution has experienced a revolution after the introduction and advancement of molecular tools (Horton and Bruns 2001; Bruns and Shefferson 2004; Peay et al. 2008). DNA-based analyses provide a means to overcome the “micro” dimension, making relevant biological units quantifiable. For instance, environmental meta-barcoding can reveal diversity that is unobservable to the naked eye. Similarly, molecular phylogenetics can help understand evolutionary relationships between observable and unobservable diversity, enabling the exploration of microbial diversity dynamics in both temporal and spatial scales.

Fungi are among the most diverse organisms on Earth, not only in terms of the thousands of described species and the millions of undescribed ones, but also in the vast complexity of their ecological interactions above- and below-ground (Hawksworth 2001; O’Brien et al. 2005; Mueller et al. 2007; Blackwell 2011; Tedersoo et al. 2014b). The mycorrhizal symbiosis is one of the most common forms of mutualistic relationships in nature. Plant, fungal, and bacterial partners interact in intricate ways in the rhizosphere, contributing to a large extent to nutrient recycling and carbon sequestration (Smith and Read 2010; Bonfante and Genre 2010). Mycorrhizal fungi are scattered across the fungal tree of life, where most can be found in four main fungal groups. The Glomeromycota is a fungal phylum exclusively composed of fungi forming arbuscular mycorrhizae (AM) (Schüßler et al. 2001; Redecker and Raab 2006). Fungi forming ectomycorrhizae (EcM) appeared more recently and are spread across the largest fungal phyla: the Basidiomycota, with about 50 known lineages; the Ascomycota, with about 40 known lineages; and the /endogone1, /endogone2 and /densospora lineages in the Mucoromycotina of Zygomycota (Tedersoo et al. 2010; Tedersoo and Smith 2013; Chap. 6). AM fungi interact with the vast majority of plant biota (ca. 80% of land plants), but are taxonomically species-poor (Bonfante and Genre 2010; Öpik et al. 2013; Pagano et al. 2016), whereas EcM fungi are more diverse, but only interact with a limited number of families of mostly woody plants, including Pinaceae, Fagaceae, Betulaceae, Salicaceae, Myrtaceae, Nothofagaceae, Dipterocarpaceae, and some members of the Rosaceae and Fabaceae, which dominate many tree communities of temperate, tropical, alpine, and boreal ecosystems of the Northern and Southern hemispheres (Malloch et al. 1980; Alexander 2006; Smith and Read 2010; Chaps. 19 and 20). Mycorrhizal fungi are key players in all terrestrial ecosystems except Antarctica. By tracing their evolutionary and ecological history, we can better understand the role of past environmental and biotic events in shaping the distribution and diversity patterns that we observe today. In addition, host association data can provide interesting points of view on the emergence and conservation of mycorrhizal host communities over evolutionary time scales.

In this review, we seek to highlight phylogenetic approaches that may have valuable applications in current mycorrhizal phylo- and biogeographic research. Rather than listing the different available methods (reviewed, for instance, in Ronquist and Sanmartín 2011), we conceptualize and discuss relevant methodological advancements, also recounting major methodological biases. We emphasize some examples from both EcM and AM fungi, and other organismal groups, particularly in the light of increasingly popular phylogenetic methods for species delimitation, divergence time estimation, and analyses involving the inference of historical distribution ranges, diversification rates, and trait evolution.

1.2 Barcoding, Species Delimitation, and the Need for Robust Phylogenies

Species are fundamental units for most biodiversity and evolutionary studies (Sites and Marshall 2004; de Queiroz 2007). Recognizing and defining species is a crucial task, not only for high-level species richness assessments and systematic studies, but also for population-level, intraspecific studies. For fungi, this task is particularly challenging given that more than 1.5 million fungal species are thought to exist (Hawksworth 2001; Blackwell 2011), yet fewer than 10% have a formal taxonomic description. Because fungi live out most of their existence hidden from human eyes, the vast majority of undocumented species will likely remain that way. Before the rise of the molecular, PCR-based era in the 1990s (White et al. 1990), fungal taxonomy and systematics relied heavily on the morphological description of taxa. Many studies have shown this to be insufficient for describing fungal biodiversity (Taylor et al. 2000, 2006). More recently, fungal molecular phylogenies of related taxa have commonly revealed the existence of species complexes composed of multiple cryptic lineages (Geml et al. 2006, 2008; Matute 2006; Jargeat et al. 2010; Leavitt et al. 2011a, b, 2015; Sánchez-Ramírez et al. 2015a, b). The term “cryptic species” is actually broadly applicable in fungi. Besides the common failure to recognize species by morphological means alone, their hidden existence in the environment makes them generally difficult to study.

For more than twenty years, ribosomal DNA (rDNA) applications have been truly revolutionary in fungal research (White et al. 1990; Bruns et al. 1991; Horton and Bruns 2001; Schoch et al. 2012). In part, this is due to the efficiency of PCR primers that consistently amplify rDNA regions across many different fungal groups (Bruns and Gardes 1993; Schoch et al. 2012), and the variability and phylogenetic resolution found in different portions of the rDNA region (Bruns et al. 1991). For instance, the internal transcribed spacer region (ITS1-5.8S-ITS2 or simply ITS) is widely recognized as a species-level marker for fungi (Schoch et al. 2012). Other rDNA genes such as the 28S and 18S large and small subunits (LSU and SSU) usually provide resolution at higher taxonomic ranks because they are conserved, given their role as functional genes in the genome (Bruns et al. 1991; Bruns and Shefferson 2004). Early molecular studies were largely based on PCR and electrophoretic RFLP patterns, which were quickly replaced by DNA sequencing. With the availability of DNA sequence data, new advances were made in the fields of systematics and evolution through phylogenetics, and ecology through DNA barcoding. Early rDNA-based phylogenetic studies were a true turning point in fungal systematics, showing that many morphological characters did not reflect shared ancestry (i.e. homoplasy) (Hibbett et al. 1997a, 2000; Moncalvo et al. 2000, 2002; Hibbett and Binder 2002). In parallel, databasing initiatives (Bruns et al. 1998; Kõljalg et al. 2005; Abarenkov et al. 2010) and the massive production of ITS sequences (Schoch et al. 2012; Hibbett 2016) have enhanced the accuracy and efficiency of fungal identification and classification (Peay et al. 2008).

In spite of their importance and widespread use in fungal biology, rDNA sequence data suffer from several deficiencies. In the case of ITS, which is probably the most popular, levels of intra- and inter-specific variation can be very different within and between species (Nilsson et al. 2008). Such inter-taxon differences may have an effect on sequence identity cut-off-based species delimitations, often used in environmental meta-barcoding studies, leading to an over- or under-estimation of diversity. ITS intra-genomic variability has also been reported, where multi-allelic copies have been found within the same genome (Simon and Weiß 2008; Lindner and Banik 2011). Base-calling errors and missing data in DNA chromatograms can arise in such cases, affecting downstream analyses such as multiple sequence alignments and phylogenetic analyses. While high levels of DNA variation are desirable in barcoding genes, too much variation, particularly at indel positions, can be problematic during alignments, causing misleading phylogenetic inference. In AM fungi, ITS is too variable and does not resolve species boundaries (Stockinger et al. 2010). Instead, the preferred barcoding rDNA gene is the small subunit (SSU), which has a resolution power at the family or order level in other groups (Stockinger et al. 2010; Bruns and Taylor 2016). Protein-coding genes, on the other hand, are generally easier to align because most positions along exons are subject to selection. They also have a wide range of phylogenetic scalability. For instance, amino acid alignments can be used for deep phylogenetics (families, orders, classes), while synonymous codon positions and introns often have enough variation for more recent time scales (species, populations). In principle, this has led initiatives, such as the Assembling the Fungal Tree of Life (AFTOL) project, to explore other genomic loci for resolving relationships among fungi (Blackwell et al. 2006). For some groups, such as Cortinarius, Laccaria, and Amanita, the ITS region has some utility in recognizing species, but protein-coding genes such as rpb2 and tef1 are considered superior when defining intra- and inter-species boundaries (Frøslev et al. 2005; Sheedy et al. 2013; Sánchez-Ramírez et al. 2015a, b; Chap. 13). Nonetheless, protein-coding genes may be more challenging to work with at the production stage, given that primer pairs do not consistently amplify across taxa, or are too unspecific (Schoch et al. 2012). Moreover, in spite of their potential for environmental studies, protein-encoding loci are not widely accepted as barcoding markers among the community of fungal ecologists. In part, this might be because protein-coding sequences, such as rpb2, are taxonomically not that well represented in nucleotide databases, and can be difficult to produce. However, protein-coding genes have promising advantages that might be worth exploring further for fungal environmental studies (Větrovský et al. 2016).

Simple barcoding for species identification usually involves the use of the Basic Local Alignment Search Tool (BLAST; Altschul et al. 1990). BLAST is an algorithm that efficiently compares sequences to pre-existing databases, retrieving the best matching records. If the sequence is unknown, such as those from environmental samples, this method provides a way to define its taxonomic affinity, and potential geographic ties, depending on the availability of meta-data in the database used in the search. One way to determine if a sequence or a group of sequences belongs to a molecular operational taxonomic unit (MOTU) is to establish a sequence identity cut-off (Nilsson et al. 2008; Fig. 1.1). Empirical studies looking at fungal intraspecific ITS variation have shown that a conservative threshold typically averages around 2–3% pairwise differences, with substantial variation between species (Nilsson et al. 2008; Hughes et al. 2009; Schoch et al. 2012). The process of clustering MOTUs can be fully automated given a set of aligned sequences using the Automatic Barcode Gap Discovery method (ABGD; Puillandre et al. 2011; Fig. 1.1), or with sequence clustering algorithms such as UCLUST (Edgar 2010; Fig. 1.1). For instance, Tedersoo et al. (2014b) used ABGD to search for similarity thresholds to distinguish MOTUs in a data set of 757 sequences of Sebacinales. Moreover, considering the rate of molecular substitution in ITS and the rate of speciation, MOTUs may be over- or underestimated depending on species-specific population histories (Ryberg 2015). A more sensitive approach would be, of course, to use data directly from phylogenetic trees to delimit species. This is, in fact, the purpose of models such as the General Mixed Yule Coalescent (GMYC, Pons et al. 2006; bGMYC, Reid and Carstens 2012) and the Poisson Tree Processes (PTP/bPTP/mPTP, Zhang et al. 2013), which use branching patterns in a phylogenetic tree to determine which branching events correspond to coalescence (intraspecific) or speciation (interspecific) (Fig. 1.1). These models, however, rely heavily on the topology of the tree and assume that species are reciprocally monophyletic (Fujisawa and Barraclough 2013; Ryberg 2015). For example, species with large population sizes will generally have longer coalescence times, leading to incomplete lineage sorting (Sánchez-Ramírez et al. 2015b). Simulation data have shown that the accuracy of the GMYC model drops in these situations, where species are not monophyletic (Fujisawa and Barraclough 2013). For these and other reasons it is generally recommended to use multiple approaches and data sources for species delimitation (Camargo et al. 2012; Carstens et al. 2013). Several studies with lichens (Leavitt et al. 2011a, b, 2015) and the basidiomycete Tulasnella (Linde et al. 2014) have shown the discriminatory power of multiple multi-locus approaches for fungal species delimitation.
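
The identity cut-off idea described above can be illustrated with a minimal sketch. It assumes pre-aligned sequences of equal length and a greedy, centroid-based grouping at a 97% identity threshold; the function names and the toy sequences are hypothetical, and the procedure is only loosely inspired by clustering tools such as UCLUST rather than a reimplementation of UCLUST or ABGD.

```python
def pairwise_identity(a, b):
    """Fraction of identical positions between two aligned sequences,
    skipping columns where either sequence has a gap ('-')."""
    if len(a) != len(b):
        raise ValueError("sequences must be aligned to the same length")
    compared = matches = 0
    for x, y in zip(a, b):
        if x == "-" or y == "-":
            continue
        compared += 1
        matches += (x == y)
    return matches / compared if compared else 0.0


def cluster_motus(seqs, threshold=0.97):
    """Greedy clustering: a sequence joins the first cluster whose centroid
    it matches at >= threshold; otherwise it founds a new cluster."""
    clusters = []  # list of (centroid_name, [member_names])
    for name, seq in seqs.items():
        for centroid, members in clusters:
            if pairwise_identity(seq, seqs[centroid]) >= threshold:
                members.append(name)
                break
        else:
            clusters.append((name, [name]))
    return clusters


def mutate(seq, positions):
    """Toy helper (hypothetical): swap the base at each given position."""
    swap = {"A": "C", "C": "A", "G": "T", "T": "G"}
    chars = list(seq)
    for p in positions:
        chars[p] = swap[chars[p]]
    return "".join(chars)


# Invented 100-bp "barcodes": s2 differs from s1 at 1 site, s3 at 5 sites.
base = "ACGT" * 25
toy = {
    "s1": base,
    "s2": mutate(base, [10]),
    "s3": mutate(base, [0, 20, 40, 60, 80]),
}
motus = cluster_motus(toy, threshold=0.97)
```

With these toy data, s1 and s2 fall into one MOTU and s3 into another. Because the result depends on both the threshold and the input order, a sketch like this also shows why fixed cut-offs can over- or underestimate diversity when intraspecific variation differs among lineages.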

Fig. 1.1

Schematic presentation of species delimitation approaches. (a) Environmental sequence clustering based on a predefined similarity threshold. White circles represent species-specific barcodes. Grey circles represent intraspecific variation. (b) Similarity threshold estimation based on the ABGD method. (c, d) identification of population-level coalescent (grey dotted lines) and speciation (black lines left) branching events is the basis for GMYC-type species delimitations. Nodes representing the most recent common ancestor of each species are marked by a black circle; (c) represents the GMYC model, where trees are ultrametric, while (d) represents the PTP model, where branches represent substitutions. (e, f) Multi-locus species delimitation based on the multi-species coalescent model. Gene trees (grey dotted lines) from unlinked loci are used to infer the speciation history (species tree) and determine the most likely species delimitation scheme; (f) is an extension that allows incorporating information from continuous trait data

Most of the approaches mentioned above were developed specifically for single-locus data. However, there have been efforts to introduce the application of multi-locus approaches for the recognition of fungal species [e.g. the Genealogical Concordance Phylogenetic Species Concept (GCPSC); Taylor et al. (2000, 2006)]. Moreover, with the drop in sequencing costs and the availability of technology for massive sequencing, whole-genome approaches will become more common for phylogenetic reconstruction (Philippe et al. 2005; Cutter 2013). Biogeographic and phylogeographic analyses can benefit from large amounts of data in the sense that more robust phylogenies will typically lead to more solid evidence when testing hypotheses. Multi-locus data sets not only increase the number of molecular characters; they can also be used to delimit species more robustly using coalescent methods. Rannala and Yang (2003) introduced a model in which independent gene genealogies are fitted within the speciation history of a group of related species, into what is now called a species tree. Species tree models (e.g. the multi-species coalescent) can take into account sources of gene tree incongruence (e.g. incomplete lineage sorting), while inferring species divergences and demographic histories (Rannala and Yang 2003; Liu et al. 2009; Heled and Drummond 2010). Different implementations of this model are now used to delimit species: Bayesian Phylogenetics and Phylogeography, or BP&P (Fig. 1.1e; Yang and Rannala 2010, 2014; Rannala and Yang 2013; Yang 2015); *BEAST model testing (Grummer et al. 2014); DISSECT and STACEY (Jones 2014) (for a recent review on coalescent-based species delimitation methods, see Mallo and Posada 2016). Novel extensions of BP&P are able to integrate phenotypic or geographic data together with genetic data to delimit species (Fig. 1.1f; iBPP, Solís-Lemus et al. 2015). Such advancements will probably bring systematists closer to the much-desired integrative taxonomy (Dayrat 2005; Will et al. 2005). Up to now, this approach has been used to delimit species in arthropods (Huang and Knowles 2016), reptiles (Pyron et al. 2016), and fish (Dornburg et al. 2016). However, we can envision environmental and geographic data, such as pH, humidity, elevation, latitude and longitude, being used as characters, in addition to genetic data, to delimit fungal species. Initiatives like UNITE that make these kinds of meta-data more easily accessible are therefore very valuable (Tedersoo et al. 2011).

1.3 Reconstructing the Geographic Past: Phylo- and Biogeography

Phylogeography and biogeography are two deeply connected disciplines focusing on the spatial dimension of biodiversity at different temporal scales. As a more recent field in evolutionary biology, phylogeography is concerned with explaining the geographic distribution of genetic diversity within a species (Avise et al. 1987; Avise 2000). This is accomplished by integrating approaches from phylogenetics and population genetics to tackle problems that lie between macro- and micro-evolutionary scales (Avise 2009; Knowles 2009; Hickerson et al. 2010). Biogeography, on the other hand, is largely phylogeny-based and it is primarily concerned with distribution patterns of species or higher taxonomic ranks (Ronquist 1997; Ree and Sanmartín 2009). Both disciplines have phylogenetic roots, and as such, share many methodological approaches to infer geographic patterns.

Ancestral-state reconstruction (ASR) methods are widely used in phylo- and biogeographic research (Ree and Sanmartín 2009; Ronquist and Sanmartín 2011). The basic concept behind ASR involves the projection of character states, which can be discrete or continuous (e.g. a saprotrophic vs. mycorrhizal ecology, latitude, elevation, fruiting morphology, etc.), backwards in time. Character states are usually assigned to sampled biological units (i.e. species or individuals) that occupy the tips of a phylogeny. These character states are then traced back from the tips down through the branches of the tree (for a recent review see Joy et al. 2016). In a geographic context, character states can be either discrete and spatially defined areas (Maddison et al. 1992; Pagel 1994, 1999) or numeric geographical coordinates represented as continuous characters (Lemmon and Lemmon 2008; Lemey et al. 2010; Bloomquist et al. 2010).

ASR of discrete character states can be evaluated in a number of ways. Maximum parsimony optimizes the reconstruction to the minimum number of state transitions (e.g. Swofford and Maddison 1987). On the other hand, statistical methods apply maximum likelihood or Bayesian inference to optimize a stochastic continuous-time Markov-chain (CTMC) matrix (e.g. Pagel 1994, 1999; Pagel et al. 2004), which is used to describe transition probabilities between states or areas (O’Meara 2012; Sanmartín et al. 2008; Fig. 1.2). Ancestral area reconstruction methods often use a parsimony-based approach, such as the dispersal-vicariance analysis (DIVA) (Ronquist 1997). Others employ CTMC models, which are usually more parameter rich, such as the dispersal-extinction-cladogenesis (DEC) analysis (LAGRANGE, Ree and Smith 2008). Other CTMC models have been optimized for situations in which the number of areas is large (BayArea, Landis et al. 2013) or include parameters that account for “jump” dispersal (e.g. founder-events) (BioGeoBEARS, Matzke 2013). At least two different programs, BioGeoBEARS and RASP (Yu et al. 2015), allow running different models within the same computing framework. Other packages allow the co-estimation of discrete CTMC phylogeographic models together with phylogenetic inference and divergence times (BEAST, Lemey et al. 2009; Drummond et al. 2012). These ancestral area reconstruction analyses differ in the processes they model: for example, in whether they allow species to be distributed over more than one area (e.g. LAGRANGE) or not (e.g. Sanmartín et al. 2008), and in whether they include separate processes for the inheritance of ancestral areas at speciation events (e.g. LAGRANGE) or only changes in ancestral areas along branches (e.g. BayArea). It is therefore important to consider what processes may be most important in any particular group to effectively formulate a hypothesis that is testable with these methods. Continuous geographic characters (e.g. geographic coordinates) have been more often used to infer phylogeographic patterns at a shallower temporal scale (Lemmon and Lemmon 2008; Lemey et al. 2010), where dispersal is more closely linked to the movement of individuals rather than rare discrete long-distance events. Empirical studies, for instance, have applied diffusion models to track the evolutionary dynamics of epidemic outbreaks (Lemey et al. 2010), human language (Bouckaert et al. 2012), and Pleistocene refugia (Gavin et al. 2014; Bryson et al. 2014). Some of these trait-evolution models are largely based on Brownian motion (BM), where traits evolve by small random changes that are controlled by a diffusion rate parameter (Felsenstein 1988). Extensions of BM allow traits to evolve constrained by a selection rate termed alpha, and are known as Ornstein–Uhlenbeck (OU) models (Hansen 1997; Butler and King 2004). OU models allow for the identification of “preferred” trait optima, but they have been poorly explored in a geographic context.
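
To make the role of the CTMC matrix more concrete, the following minimal sketch computes transition probabilities along a branch of duration t as P(t) = exp(Qt), where Q is the instantaneous rate matrix. The rates, the three hypothetical areas, and the function names are assumptions chosen for illustration; the matrix exponential is approximated in pure Python (scaling and squaring with a truncated Taylor series), which is not how LAGRANGE, BioGeoBEARS, or diversitree implement it.

```python
import math


def mat_mul(A, B):
    """Multiply two square matrices given as lists of lists."""
    n = len(A)
    return [[sum(A[i][k] * B[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]


def transition_probs(Q, t, terms=20, squarings=10):
    """Approximate P(t) = exp(Q * t) for a CTMC rate matrix Q.

    Q * t is scaled down by 2**squarings, exponentiated with a truncated
    Taylor series, and the result is squared back up.
    """
    n = len(Q)
    scale = t / (2 ** squarings)
    M = [[Q[i][j] * scale for j in range(n)] for i in range(n)]
    P = [[float(i == j) for j in range(n)] for i in range(n)]  # identity
    term = [row[:] for row in P]
    for k in range(1, terms):
        term = mat_mul(term, M)
        term = [[x / k for x in row] for row in term]  # now M**k / k!
        P = [[P[i][j] + term[i][j] for j in range(n)] for i in range(n)]
    for _ in range(squarings):
        P = mat_mul(P, P)
    return P


# Hypothetical dispersal rates (events per Myr) among three areas.
# Each row sums to zero, as required for a rate matrix.
Q_areas = [
    [-0.03, 0.02, 0.01],
    [0.02, -0.04, 0.02],
    [0.01, 0.02, -0.03],
]
P_10 = transition_probs(Q_areas, t=10.0)  # probabilities over 10 Myr

# Symmetric two-state case, for which a closed form is known:
# P_stay(t) = 0.5 + 0.5 * exp(-2 * q * t)
q = 0.5
P_two = transition_probs([[-q, q], [q, -q]], t=2.0)
```

Each row of P(t) gives the probability of ending in each area after time t, conditional on the starting area, so every row sums to one. Likelihood-based ASR combines such branch-wise probabilities across the whole tree; that step is omitted here.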

Fig. 1.2

Molecular dating and biogeographic reconstruction of the Amanitaceae. (a) Continuous-Time Markov Chain (CTMC) matrix depicting the rate of transition/dispersal between six biogeographic states; (b) time-calibrated molecular phylogeny of the Amanitaceae showing reconstructed and extant areas; and (c) lineage-through-time (LTT) plot of the phylogeny, excluding non-mycorrhizal taxa (clade highlighted in brown in the phylogeny) and the saprotrophic outgroup Limacella (highlighted in black). Grey concentric rings in A mark the Pliocene, Oligocene, and Palaeocene; white rings mark the Pleistocene, Miocene, Eocene, and the late Cretaceous; the green ring marks the time of the potential transition from the saprotrophic to mycorrhizal habit in Amanita (ca. 88–99 Myr). Altogether, 789 LSU sequences of Amanitaceae with geographic distribution data, available in NCBI (Sept. 2016), were downloaded and aligned in MAFFT (Katoh and Standley 2013). A maximum-likelihood tree was built with RAxML (Stamatakis 2014) and terminal species were delimited with mPTP (Zhang et al. 2013), keeping those with different species names in each cluster to compensate for the lack of species-level resolution in LSU. A single sequence was randomly selected to construct a time-calibrated tree in BEAST v1.82 (Drummond et al. 2012), using a relaxed clock model with log-normal distribution, and calibrating with a normal distribution the nodes of the section Caesareae and the subgenus Amanita, based on Sánchez-Ramírez et al. (2015a). Terminal biogeographic states were recoded based on meta-data from the sequences and maximum likelihood reconstructions were performed using the functions make.mkn, find.mle, and asr.marginal in the R package diversitree (FitzJohn 2012). The LTT plot is based on 1000 trees from the posterior distribution (in grey) and their mean (dotted line)

Both discrete and continuous biogeographic and phylogeographic inference can also be achieved with standalone programs for ASR such as BayesTraits (Pagel et al. 2004), or through an integrated interface such as R (R Core Team 2015), where packages like ape (Paradis et al. 2004) and diversitree (FitzJohn 2012) include built-in functions for ASR. R implementations are practical because they facilitate the direct manipulation and visualization of phylogenetic data. In addition, visualization tools such as SPREAD (Bielejec et al. 2011) and Phylowood (Landis and Bedford 2014) are important contributions that ease the interpretation of complex historical phylo-/biogeographic processes.

Compared to plants and animals, fungal phylogeography and biogeography are considered to be in their early stages (Lumbsch et al. 2008; Beheregaray 2008; Peay and Matheny 2016). Some of the earliest phylogeny-based biogeographic analyses of fungi have concisely pointed out the importance of geography and molecular data to explain patterns of divergence and speciation, e.g. between intersterile groups in Pleurotus (Vilgalys and Sun 1994) and plant pathogens from the genus Gibberella (O’Donnell et al. 1998). Because many fungi interact with other organisms such as plants and animals, their distribution patterns have often been associated with those of their hosts (Bisby 1943; Horak 1983; Lichtwardt 1995). Nonetheless, mixed results have led to considerable debate on whether fungi exhibit biogeographic structure. Global and regional-scale studies have shown extensive cryptic lineages in EcM groups, some of which exhibit geographic structure, and associations with endemic hosts (e.g. Sato et al. 2012 or specifically in Amanita, Geml et al. 2006, 2008; Cai et al. 2014; Sánchez-Ramírez et al. 2015a, b; Boletus, Feng et al. 2012; Inocybaceae, Matheny et al. 2009; Laccaria, Wilson et al. 2016a; Chap. 13; Pisolithus, Martin et al. 2002; Strobilomyces, Sato et al. 2007; Tuberaceae, Bonito et al. 2013). In contrast, other fungal biogeographic studies have shown more recent distribution patterns, typically explained by episodes of long-distance dispersal (Moyersoen et al. 2003; Kauserud et al. 2006; Moncalvo and Buchanan 2008; Geml et al. 2011) or cosmopolitan distribution (Pringle et al. 2005; Queloz et al. 2011). For EcM fungi, this notion implies that while some species have limited dispersal due to environmental constraints (e.g. Peay et al. 2007, 2010b, 2012; Sato et al. 2012), others are able to successfully establish propagules carried over transoceanic distances to exotic regions, where they might outcompete native fungi (Moyersoen et al. 2003; Vellinga et al. 2009; Pringle et al. 2009; Geml et al. 2011; Wolfe and Pringle 2012; Sato et al. 2012). Furthermore, a seemingly common observation has been a consistent association between continentally disjunct groups of fungi (e.g. between Asia and North America) (Wu et al. 2000; Mueller et al. 2001; Shen et al. 2002; Chapela and Garbelotto 2004; Oda et al. 2004; Geml et al. 2006, 2008; Halling et al. 2008; Cai et al. 2014; Sánchez-Ramírez et al. 2015a; Wilson et al. 2016a; Chap. 13; Fig. 1.2), similar to patterns found in temperate plants from the same regions (Wen 1999; Xiang et al. 2000; Qian and Ricklefs 2000). In several cases, there are hints of Palaeotropical origins and recent temperate diversification in different EcM groups (Matheny et al. 2009; Wilson et al. 2012, 2016a; Feng et al. 2012; Cai et al. 2014; Sánchez-Ramírez et al. 2015a; Fig. 1.2). This observation contrasts with the higher species diversity seen in the Northern Hemisphere, both for EcM hosts and symbionts (Malloch et al. 1980; Halling 2001; Matheny et al. 2009). Alexander (2006) argues that the EcM habit is likely to have evolved in a Palaeotropical environment given that a putative EcM host ancestor in the Dipterocarpaceae is likely to have originated in Gondwana about 135 Ma (Moyersoen 2006), predating the Cretaceous radiation of other EcM angiosperms (Lidgard and Crane 1988; Berendse and Scheffer 2009, but see Chap. 19). In the case of EcM fungi, this kind of evidence would support long-term host co-migration (e.g. Halling 2001; Põlme et al. 2013), followed by allopatric speciation/divergence and/or regional adaptation. In fact, studies in EcM groups commonly suggest patterns consistent with trans-continental dispersal over land masses (Halling et al. 2008; Geml et al. 2006, 2008; Matheny et al. 2009; Wilson et al. 2012; Bonito et al. 2013; Sánchez-Ramírez et al. 2015a; Fig. 1.2a, b), and at least two different studies have explicitly tested biogeographic models which support historical world-wide co-dispersal scenarios with plants (e.g. the Boreotropical hypothesis sensu Wolfe 1978; also see Lavin and Luckow 1993) in the Sclerodermatinae (Wilson et al. 2012) and Amanita sect. Caesareae (Sánchez-Ramírez et al. 2015a). Compared to EcM fungi, it is unfortunate that far fewer biogeographic studies have been conducted in AM fungi given the need to understand their evolutionary history (Chaudhary et al. 2008). While strict historical biogeographical studies are still scarce in AM fungi, macroecological studies have suggested that although geography and local environment explain some of the variance in global community structures, many operational taxa are globally distributed (Chap. 7; Kivlin et al. 2011; Öpik et al. 2013; Davison et al. 2015; see Bruns and Taylor 2016 for a counter-argument). Moreover, phylogeographic analyses based on coalescent approaches have also been applied to test hypotheses about the cosmopolitan distribution of the AM species Glomus mosseae, indicating a recent expansion within the last few hundred years (Rosendahl et al. 2009).

1.4 Molecular Dating and the Fossil Record

Molecular phylogenies are necessary to study patterns and processes at macro- and micro-evolutionary scales (Avise and Wollenberg 1997; Barraclough and Nee 2001). The phylogeny takes the form of a topology or a graph depicting relationships between biological units, which includes basic information such as: (1) branch lengths indicating the amount of evolutionary change, (2) internal nodes or branching points, and (3) terminal nodes or tips, which represent sampled biological units. An important property of phylogenetic trees is that branch lengths can be represented as evolutionary time (Fig. 1.2). This notion comes from the molecular clock concept, introduced by Zuckerkandl and Pauling (1965), which states that the number of molecular substitutions between taxa is proportional to the amount of time elapsed since their last common ancestor (Kumar 2005). Given this principle, branches and nodes in the tree can be scaled to time units and become “ultrametric” (i.e. every tip is equidistant from the root). In ultrametric trees, nodes represent divergence times in species-level trees, and coalescent times in population-level genealogies (Drummond and Bouckaert 2015). There are only a limited number of ways to time-calibrate ultrametric trees: (1) by calibrating terminal nodes (tip-calibration) based on known sampling dates; (2) by applying and assuming a known molecular clock (e.g. a molecular substitution rate, usually in the scale of number of substitutions per site per time unit, e.g. Myr, yr, generations); or (3) by calibrating internal nodes based on evidence from either the fossil record or geotectonic events.
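
The notion of an ultrametric tree and its scaling to time under an assumed clock rate can be sketched as follows. The tuple-based tree representation, the branch lengths, and the rate of 0.001 substitutions per site per Myr are all hypothetical values chosen for illustration, not estimates for any fungal group.

```python
def tip_depths(node, depth=0.0):
    """Return {tip_name: root-to-tip distance} for a tree whose nodes are
    (name, branch_length, children) tuples; tips have no children."""
    name, length, children = node
    depth += length
    if not children:
        return {name: depth}
    depths = {}
    for child in children:
        depths.update(tip_depths(child, depth))
    return depths


def is_ultrametric(root, tol=1e-9):
    """True if every tip is (numerically) equidistant from the root."""
    depths = list(tip_depths(root).values())
    return max(depths) - min(depths) <= tol


def root_age(root, rate):
    """Age of the root in time units, assuming a strict clock with the
    given rate (substitutions per site per time unit)."""
    depths = tip_depths(root)
    return max(depths.values()) / rate


# Toy tree ((A, B), C) with branch lengths in substitutions per site.
clock_like = ("root", 0.0, [
    ("AB", 0.02, [("A", 0.03, []), ("B", 0.03, [])]),
    ("C", 0.05, []),
])
# Same topology, but lineage B has accumulated more change than A and C.
non_clock_like = ("root", 0.0, [
    ("AB", 0.02, [("A", 0.03, []), ("B", 0.06, [])]),
    ("C", 0.05, []),
])

assumed_rate = 0.001  # hypothetical: substitutions per site per Myr
age_myr = root_age(clock_like, assumed_rate)  # about 50 Myr
```

The second tree fails the ultrametricity check, which is the situation that motivates either rejecting a strict clock or modelling rate variation among lineages.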

Tip-calibration is practical for time-stamped samples of rapidly evolving organisms such as viruses, and in some cases where ancient DNA is available (Drummond et al. 2003). Based on prior knowledge of substitution rates, a molecular clock model can be used to scale phylogenetic branches. In fact, the rates of substitution/mutation of some genes, such as animal mitochondrial (Brown et al. 1979) and plant chloroplast genes (Clegg et al. 1994), have been well characterized across taxa and within populations. In contrast, substitution rates for rDNA genes, which are commonly used for fungal phylogenies, are quite variable between and within lineages (Bruns and Szaro 1992; Moncalvo et al. 2000; Berbee and Taylor 2001), rendering their use for time-calibrating fungal phylogenies impractical on their own, unless rates are specifically calculated for particular groups. However, rates need to be estimated from independent evidence in the first place, such as the fossil record or biogeographic events. In this case, internal node calibrations can be used as reference points to infer both molecular clock rates and divergence times (Kumar 2005; Ho and Phillips 2009).
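As a rough illustration of how an internal node calibration can yield both a clock rate and a divergence time, the sketch below assumes a strict clock, under which the pairwise distance d between two taxa accumulates along both branches since their common ancestor (d = 2rt). The function names and all numerical values (a 50 Myr calibration and distances of 0.10 and 0.04 substitutions per site) are hypothetical and chosen only for the example; relaxed-clock analyses are considerably more involved.

```python
def rate_from_calibration(pairwise_distance, node_age):
    """Strict-clock rate (substitutions/site/time unit) from a dated node.

    Assumes d = 2 * rate * age, i.e. both lineages accumulate
    substitutions at the same constant rate since the calibrated node.
    """
    if node_age <= 0:
        raise ValueError("node_age must be positive")
    return pairwise_distance / (2.0 * node_age)


def divergence_time(pairwise_distance, rate):
    """Age of an undated node given a pairwise distance and a clock rate."""
    if rate <= 0:
        raise ValueError("rate must be positive")
    return pairwise_distance / (2.0 * rate)


# Hypothetical values: a calibrated split at 50 Myr whose descendants
# differ by 0.10 substitutions per site.
calibrated_rate = rate_from_calibration(0.10, 50.0)

# The same rate is then applied to an undated pair differing by 0.04.
estimated_age = divergence_time(0.04, calibrated_rate)
print(f"rate = {calibrated_rate:.4f} subs/site/Myr, age = {estimated_age:.1f} Myr")
```

The estimate is only as good as the assumption of rate constancy, which is exactly what the variable rDNA rates mentioned above call into question.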

Clock models for divergence time estimation have progressed over the last couple of decades (Welch and Bromham 2005; Ho 2014). The first clock model was conceptualized and implemented as a strict molecular clock (i.e. an evenly ticking clock), where every substitution happened at a constant rate within any given lineage. A new generation of “relaxed” clock models was later introduced, allowing substitution rates to vary between lineages and accommodating more biologically realistic evolutionary scenarios (Drummond et al. 2006; Drummond and Suchard 2010; Ho and Duchêne 2014). One of the most popular phylogenetics programs (with over 10,000 citations in the last 10 years), and probably the de facto standard for time-tree analysis, is the BEAST package (Drummond and Rambaut 2007; Drummond et al. 2012). Some of the advantages of BEAST over other software are that (1) phylogenies are co-estimated with divergence times, (2) the uncertainty in divergence time estimation can be measured, and (3) it offers flexibility and extensibility for model specification (Drummond and Rambaut 2007; Drummond et al. 2012; Bouckaert et al. 2014). Since the introduction of BEAST, together with the steady growth of DNA sequence data, time-calibration has regained much attention in phylogenetic research (Robinson 2006).

The fossil record can be a valuable source for studying ancestral distributions (Meseguer et al. 2015). Besides helping track the distribution of taxa and their extinct relatives in space and time (Lieberman 2003), ages of fossils can be used as priors for time-calibrating molecular phylogenies (Ho and Phillips 2009). In addition, well-sampled records can also provide information about extinction rates (Jablonski 2008) and data that can be used in newer models for divergence-time estimation. For example, the fossilized birth-death process uses “total evidence” from the fossil record, integrating information from rates of speciation, extinction, and fossilization (Heath et al. 2014; Zhang et al. 2016).

Compared to many plant and animal records, the fungal fossil record is depauperate (Berbee and Taylor 2010). One reason is that most fungal structures are made of soft tissues that decay rather quickly, making fossilization difficult (Pirozynski 1976; Taylor et al. 2014). Another challenge is their correct classification and taxonomic assignment, which is largely based on reproductive structures that rarely fossilize. Given these difficulties, mycologists have often relied on secondary calibrations (e.g. using age constraints based on a previous time-calibration), where they either estimate a “taxonomically broad” phylogenetic tree with external fossil calibrations to generate prior calibration distributions (e.g. Skrede et al. 2011; Wilson et al. 2012, 2016a; Tedersoo et al. 2014b; Cai et al. 2014; Sánchez-Ramírez et al. 2015a; Zhao et al. 2016), use node ages from other studies based on fossil records or molecular clocks (Jeandroz et al. 2008; Matheny et al. 2009; Ryberg and Matheny 2011, 2012), or fix the global substitution/mutation rate of a particular gene (Rosendahl et al. 2009; Bonito et al. 2013; Sánchez-Ramírez et al. 2015b). Using such a diverse array of approaches can lead to inconsistent results and a lack of reproducibility. To aid mycologists in their quest to time-calibrate molecular phylogenies, we provide a condensed overview of potentially useful fossils (e.g. well-identified fossils representing the minimum age of certain groups; Table 1.1).

Table 1.1 List of fungal fossils with potential for phylogenetic time-calibration (based on Taylor et al. 2014)

Time-calibrated phylogenies can be used for testing hypotheses about the evolutionary history of organisms, in particular those with a poor or no fossil record. For instance, some of the oldest putative Glomeromycota fossils from the Ordovician (ca. 460 Ma, Redecker et al. 2000) and Devonian (Dotzler et al. 2006) suggest that AM fungi were already associated with plants during the transition from an aquatic to a terrestrial environment (Malloch et al. 1980; Brundrett 2002). Molecular dating studies endorse this hypothesis, placing the origin of AM fungi between 400 and 600 Ma (Simon et al. 1993; Berbee and Taylor 2001; Lucking et al. 2009). On the other hand, EcM symbiosis has evolved more recently. Based on molecular clock estimates using SSU branch lengths of several EcM lineages and evidence from the fossil record (e.g. permineralized EcM from the Eocene; LePage et al. 1997), Bruns et al. (1998) suggested that EcM symbioses could have radiated independently and simultaneously during the Tertiary (e.g. Eocene-Oligocene). This was when the climate initiated its cooling trend leading to a more temperate environment dominated by members of the Pinaceae and Fagales (Wolfe 1978; Prothero and Berggren 1992; Zachos et al. 2001). In contrast, Halling (2001) proposed that EcM symbiosis evolved together with the Pinaceae (most of which are able to form EcM associations) during the Jurassic (ca. 180 Ma; Gernandt et al. 2008), and subsequently diversified further as a result of angiosperm radiation in the Cretaceous (125–65 Ma; Berendse and Scheffer 2009). Using time-calibrated phylogenies of nine EcM lineages of Agaricales, Ryberg and Matheny (2012) rejected both hypotheses on the basis of discordant clade ages, most of which occurred after the Jurassic, during the Cretaceous and Palaeogene periods (from ca. 100–40 Ma). However, other groups, such as the truffles (Tuberaceae), might have had an older evolutionary history, originating in the late Jurassic (ca. 156 Ma) and later diversifying during the Cretaceous and Palaeogene (Jeandroz et al. 2008; Bonito et al. 2013). Supporting the findings of Ryberg and Matheny (2012), our case analysis indicates that the EcM habit in Amanita could have evolved during the late Cretaceous (ca. 90 Ma; Fig. 1.2b). The genus Amanita is a particularly interesting system to study the evolution of EcM symbiosis. First, its close saprotrophic sister group is known (Wolfe et al. 2012b); second, several Amanita genomes have been sequenced to date, which may facilitate comparative assessments of genomic machineries between mycorrhizal and non-mycorrhizal species (Nagendran et al. 2009; Wolfe et al. 2012a; Hess and Pringle 2014; Hess et al. 2014; van der Nest et al. 2014); and third, a growing number of biogeographic and phylogeographic studies (Oda et al. 2004; Geml et al. 2006, 2008; Cai et al. 2014; Sánchez-Ramírez et al. 2015a, b, c; Zhang et al. 2015) provides ample resources for phylogenetic inference.

1.5 Tracking Species Richness Over Time and Space: Diversification Rates

Speciation and extinction are the ultimate processes responsible for biodiversity build-up (Hubbell 2001; Ricklefs 2004, 2007). One way to look at patterns of variation in species diversity throughout evolutionary time is to count the number of fossil species (Jablonski 2008; Liow 2010). However, not all taxonomic groups have a reliable fossil record. Alternatively, branching patterns in (well-sampled) molecular phylogenies can be interpreted or modeled as diversification processes (Barraclough and Nee 2001; Nee 2006; Ricklefs 2007; Purvis 2008). Yule (1925) developed one of the first models of phylogenetic bifurcation. This model described a process of pure birth (speciation) where lineages split independently from one another at a constant rate, usually termed λ. Later, a model that allowed both birth and death of lineages (the birth-death model) was introduced (Raup 1985; Nee et al. 1992). This incorporated an additional parameter controlling the rate at which lineages went extinct, usually termed μ. From then on, several different macro-evolutionary models have been developed with the intention of better describing plausible diversification scenarios (Moen and Morlon 2014).
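To make the pure-birth and birth-death processes more concrete, the following is a minimal sketch rather than any published implementation. It assumes that, with k lineages alive, the waiting time to the next split under a Yule process is exponentially distributed with rate kλ, and that the expected number of lineages under a constant-rate birth-death process grows as N0·exp((λ − μ)t). The function names, the seed, and the parameter values are hypothetical.

```python
import math
import random


def simulate_yule_waiting_times(n_tips, lam, seed=None):
    """Waiting times between successive splits under a pure-birth process.

    Starts from 2 lineages and stops when n_tips lineages exist, so
    n_tips - 2 waiting times are returned. With k lineages alive, the
    time to the next split is drawn from Exponential(k * lam).
    """
    if n_tips < 2 or lam <= 0:
        raise ValueError("need n_tips >= 2 and lam > 0")
    rng = random.Random(seed)
    return [rng.expovariate(k * lam) for k in range(2, n_tips)]


def expected_lineages(n0, lam, mu, t):
    """Expected lineage count after time t under constant birth and death rates."""
    return n0 * math.exp((lam - mu) * t)


# Hypothetical parameters: speciation rate of 0.1 per Myr, 10 tips.
waits = simulate_yule_waiting_times(n_tips=10, lam=0.1, seed=1)
print(len(waits), "splitting events simulated")
print(expected_lineages(n0=2, lam=0.1, mu=0.05, t=20.0))
```

Because each waiting time shrinks on average as lineages accumulate, a pure-birth process produces an exponential rise in lineage number through time.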

Another way to assess how nodes in the phylogenetic tree are distributed relative to the root or the tips is to plot the cumulative number of lineages as a function of time (Nee 2006). This is known in the literature as a lineage-through-time (LTT) plot (Fig. 1.2c). A different method is the γ-statistic, which measures the branching patterns in molecular phylogenies numerically by quantifying the degree of deviation from a constant-rate expectation (γ = 0) (Pybus and Harvey 2000). Positive values (γ > 0; significant if >1.96 at 95% confidence) indicate that nodes are closer to the tips (“exponential” LTT plot), which reflect recent diversification bursts or background extinction, whereas negative γ values (γ < 0; significant <−1.64 at 95% confidence) indicate that nodes in the tree are closer to the root (“logistic” LTT plot), suggesting a rapid burst of diversification followed by a slowdown (Pybus et al. 2002; Crisp and Cook 2009). In fact, the latter signature is a common pattern observed in phylogenies from different plants and animals (McPeek 2008; Morlon et al. 2010). These slowdown patterns can be attributed to many different scenarios, including diversity-dependence due to niche saturation (Rabosky and Lovette 2008; Phillimore and Price 2008; Etienne et al. 2012), time-dependency (Stadler 2011), and protracted speciation (Etienne and Rosindell 2012).
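The γ-statistic can be computed directly from the node ages of an ultrametric tree. The sketch below follows the formula of Pybus and Harvey (2000) as we understand it, assuming a fully bifurcating tree with n tips and n − 1 distinct internal node ages measured backwards from the present; the function name and the example node ages are hypothetical.

```python
import math


def gamma_statistic(node_ages):
    """Gamma statistic from internal node ages (time before present).

    Assumes a binary ultrametric tree: n - 1 node ages for n tips.
    g[k] is the length of the interval during which k lineages exist.
    """
    ages = sorted(node_ages, reverse=True)  # root first
    n = len(ages) + 1
    if n < 3:
        raise ValueError("at least three tips (two node ages) are required")
    boundaries = ages + [0.0]  # the present closes the last interval
    g = {k: boundaries[k - 2] - boundaries[k - 1] for k in range(2, n + 1)}

    total = sum(k * g[k] for k in range(2, n + 1))
    running = 0.0
    cumulative = 0.0
    for i in range(2, n):  # i = 2 .. n - 1
        running += i * g[i]
        cumulative += running

    numerator = cumulative / (n - 2) - total / 2.0
    denominator = total * math.sqrt(1.0 / (12.0 * (n - 2)))
    return numerator / denominator


# Hypothetical node ages: early branching versus late branching.
print(gamma_statistic([10.0, 9.0, 8.0, 7.0]))  # nodes near the root: negative
print(gamma_statistic([10.0, 3.0, 2.0, 1.0]))  # nodes near the tips: positive
```

The sign of the result matches the interpretation given above: nodes concentrated near the root give negative values and nodes concentrated near the tips give positive ones.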

Other models measure diversification rates as a function of character states, and are particularly useful for biogeographic and trait-evolution analyses. They have ‘blossomed’ into a family of trait-dependent models that range from a basic binary (two discrete states) model (BiSSE, Maddison et al. 2007), to multi-states (MuSSE, FitzJohn 2012), to geographic states (GeoSSE, Goldberg et al. 2011), to continuous traits (QuaSSE, FitzJohn 2010), all of which are implemented in likelihood and Bayesian frameworks in the R package diversitree (FitzJohn 2012). The most recent addition is the hidden-state speciation and extinction (HiSSE) model, which attempts to correct for potentially unaccounted states that could also influence rates of diversification (Beaulieu and O’Meara 2016). Furthermore, complex mixtures of diversification rate-variation can also be detected using reversible-jump MCMC algorithms, such as BAMM (Rabosky 2014).

Unsurprisingly, most empirical analyses have focused on patterns in plants and animals (McPeek 2008; Butlin et al. 2009), leaving microorganisms understudied. Nevertheless, diversification analyses can prove to be powerful approaches to understand diversity dynamics through evolutionary time in groups with a poor fossil record, such as fungi. Likewise, these approaches can help mycologists test evolutionary hypotheses regarding the role of hosts, soil chemistry, geography, and other underlying mechanisms driving fungal diversification.

A long-standing question in EcM evolution has been the high degree of functional convergence and the high relative diversity of different EcM groups (Malloch et al. 1980; Bruns et al. 1998; Hibbett et al. 2000; Halling 2001; Brundrett 2002). Although most EcM fungi converge into a similar ecological niche, they are scattered across the fungal tree of life occurring independently in at least 80 phylogenetic lineages (Tedersoo et al. 2010; Chap. 6). Substantial variation in species diversity can be found among EcM lineages/clades; for instance, the /cortinarius lineage comprises >2000 species, while only 1–4 species are found in the /meliniomyces lineage and other helotialean groups (Tedersoo et al. 2010). If we hypothesize that all EcM lineages/clades originated around the same time (i.e. have the same clade age), and assume that they diversify at a constant rate, then we would expect clades to have similar number of species (same clade size). In contrast, observations of EcM richness pattern suggest otherwise; either that (a) EcM clades originated at different times and have diversified under a constant rate, or that (b) EcM clades originated within a similar time-frame but their diversification rate is variable within and/or among clades, or that (c) both times and rates vary. The relationship between clade age and clade size has been studied and discussed broadly for plant and animal clades, with a more or less generalized conclusion that both variables are decoupled, supporting variable diversification rates among clades (Ricklefs 2006; Rabosky et al. 2012; Scholl and Wiens 2016). Ryberg and Matheny (2012) showed that both ages and rates of diversification vary among several EcM clades of Agaricales. They also tested the hypothesis of a potential initial rapid radiation followed by a diversification slowdown tentatively caused by rapid niche occupation, as shown to occur in other taxa (Rabosky and Lovette 2008; Etienne et al. 2012). 
However, models of rate constancy could not be rejected. If the degree of statistical power was adequate, this observation could imply that diversification in these fungi is not driven primarily by niche specialization, which can happen where there is competition (Ackermann and Doebeli 2004), and probably depends on other sources of speciation, such as allopatry or parapatry (Ryberg and Matheny 2012). Compared to EcM fungi, AM fungi appear to have much lower rates of diversification. While formal diversification rate analyses are still lacking in AM fungi, it is possible to estimate the net diversification rate based on an approximation by Magallón and Sanderson (2001). Based on a clade size of 200–300 spp. (Öpik et al. 2013), a crown age of 460 Ma (Redecker et al. 2000), and the assumption of a constant diversification rate, the Glomeromycota would have a net diversification rate of about 0.01 speciation events per million years. Notably, this is 3–14 times lower than speciation rates in some EcM agarics (Ryberg and Matheny 2012).
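The back-of-the-envelope estimate for the Glomeromycota can be reproduced with a few lines of code. This sketch assumes the crown-group form of the Magallón and Sanderson (2001) estimator with no extinction, r = [ln(n) − ln(2)]/t; whether this exact variant was the one applied is our assumption, and the function name is hypothetical.

```python
import math


def net_diversification_rate(n_species, crown_age):
    """Crown-group net diversification rate assuming no extinction.

    r = (ln(n) - ln(2)) / t, where n is the extant clade size and
    t is the crown age (here in millions of years).
    """
    if n_species < 2 or crown_age <= 0:
        raise ValueError("need n_species >= 2 and crown_age > 0")
    return (math.log(n_species) - math.log(2)) / crown_age


# Values taken from the text: 200-300 species, crown age of 460 Ma.
for n in (200, 300):
    r = net_diversification_rate(n, 460.0)
    print(f"n = {n}: r = {r:.4f} events per Myr")
```

Both clade sizes give a rate close to 0.01 events per million years, in line with the figure quoted above.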

Other diversification studies in fungi focusing on trait or character state evolution have found support for different trends. For instance, a study in the saprotrophic agaric Coprinellus found a correlation between higher rates of lineage accumulation and trait diversification as evidence of an adaptive radiation linked to the appearance of auto-digestion as a key innovation trait (Nagy et al. 2012). Another study on gasteroid fungi showed that net diversification rates (i.e. speciation minus extinction) in several gasteroid lineages are elevated in comparison to non-gasteroid lineages across the Agaricomycetes (Wilson et al. 2011). While this result was not significant, equilibrium frequency calculations that incorporated the one-way (irreversible) transition of gasteromycetation suggested a trend toward increased gasteroid diversity. Furthermore, after finding evidence of multiple independent dispersal events from the New World to the Old World in the Caesar’s mushrooms (Amanita sect. Caesareae), Sánchez-Ramírez et al. (2015a) tested the hypothesis of increased diversification after the colonization of a new environment, finding evidence that supports both higher speciation and extinction in New World compared to Old World lineages. This suggests higher species turnover in the New World, which is probably coupled with recent drivers of diversification such as glacial cycles (Sánchez-Ramírez et al. 2015a, b).

Most of these studies have focused on isolated clades, making broader comparisons difficult. Nevertheless, a recent initiative known as the Agaricales Diversification (aDiv; https://sites.google.com/site/agaridiv2013/home) project seeks to generate an LSU and rpb2 data set for about 3000 species of Agaricales. A primary goal of the project is to explore diversification drivers within key ecological groups in the Agaricales (Szarkándi et al. 2013). An order-level time-calibrated phylogeny can offer a unique opportunity for testing broader hypotheses on EcM evolutionary ecology.

1.6 Evolutionary Ecology

The field of evolutionary ecology is concerned with studying the evolution of species interactions, specifically targeting biological or environmental processes that influence changes in diversity over evolutionary time scales. An obvious step towards understanding the evolution of modern ecological roles is to integrate phylogenetic information with geographic and environmental data (Ricklefs 2004; Wiens and Donoghue 2004; Pinto-Sánchez et al. 2014), as well as community assembly data (Webb et al. 2002; Cavender-Bares et al. 2009; Cadotte and Davies 2016). Having a historical view of biodiversity is crucial to advance our understanding of past and present-day patterns.

A well-recognized spatial pattern across the tree of life is the general latitudinal diversity gradient (LDG), which shows that species richness is highest at tropical latitudes and decreases towards the poles (Hillebrand 2004; Brown 2014). While this latitude-diversity relationship has been observed for many groups of plants and animals over past decades, these patterns in soil fungi have only recently been recognized. Studies have shown that the general LDG holds for soil fungi as a whole (Tedersoo et al. 2014a), but for EcM fungi the diversity peaks at temperate latitudes (Tedersoo and Nara 2010; Tedersoo et al. 2012, 2014a; Chap. 18). This means that EcM species richness is higher in temperate regions than, for instance, in tropical or boreal regions. From a macro-evolutionary perspective, processes such as speciation, extinction, and dispersal are the ultimate contributing factors to the LDG (Mittelbach et al. 2007). Recent studies based on phylogenetic and ecological data have linked higher species richness in the temperate region to higher rates of diversification. For example, Kennedy et al. (2012) found that a single temperate clade in the genus Clavulina had a 2.6 times higher speciation rate than the rest of the group, which was inferred to be mainly tropical. Sánchez-Ramírez et al. (2015c) used the time-calibrated phylogeny of Amanita sect. Caesareae and continuous geographic data to test for the role of latitude as a driver of diversification. Model testing, together with continuous trait evolution, suggests that lineages diversify at a faster rate at temperate latitudes than in tropical climates, supporting the findings of Kennedy et al. (2012). Further support has come from a study in the genus Russula that reported overall higher net diversification rates in extra-tropical lineages with continual transitions between temperate and tropical environments (Looney et al. 2016).
In the light of the growing evidence in favor of higher rates of diversification in the temperate region, it would be interesting to test if these bouts of temperate diversification occurred simultaneously during the Miocene cooling trend that coincided with orogenic activity around the globe and an increase in dominance of EcM plants (Askin and Spicer 1995; Potter and Szatmari 2009; Chap. 20). Until now, these studies have focused on geographic traits, either discrete or continuous, but studies in other groups (e.g. amphibians) have shown how climatic data can be coupled with comparative phylogenetic methods to look at how ecological niches evolve in relation to diversification processes (Pyron and Wiens 2013).

Macro-ecological studies also indicate that other groups of fungi have particular patterns of diversity that vary, not only with respect to latitude, but also with respect to other environmental factors such as temperature or precipitation (Arnold and Lutzoni 2007; Öpik et al. 2013; Tedersoo et al. 2014a; Treseder et al. 2014; Davison et al. 2015). For instance, compared to EcM fungi, AM and endophytic fungi appear to be more diverse in tropical and subtropical regions, and their communities seem to be more differentiated (Arnold 2007; Arnold and Lutzoni 2007; Öpik et al. 2013). A similar pattern emerges for fungal pathogens and saprotrophs (Tedersoo et al. 2014a). Both differences in diversity patterns across fungal taxa, as well as differences in their ecological modes, might reflect a historical relationship with their ancestral ecological niche. In spite of heavy criticism about sampling, Treseder et al. (2014) found support for the hypothesis that tropical environments tend to harbor older taxa, whereas younger taxa tend to reside in more temperate ones.

Another topic of interest regarding evolutionary ecology of mycorrhizal fungi is the co-evolution of host associations. While AM fungi are obligate mutualists with a wide range of hosts (Giovannetti and Sbrana 1998; Bonfante and Genre 2010), EcM fungi can be either generalists or specialists (Molina et al. 1992; Bruns et al. 2002), some of which may be potentially facultative (Baldrian 2009). Examples of high specificity in EcM associations are interactions between certain fungi and myco-heterotrophic plants (Bidartondo and Bruns 2005; Bidartondo 2005), the bolete genus Suillus and members of the Pinaceae (Kretzer et al. 1996; Bidartondo and Bruns 2005; Nguyen et al. 2016), and alder-associated mycobiota (Tedersoo et al. 2009; Kennedy et al. 2011, 2015; Põlme et al. 2013). Studies in some of these systems can provide insights into the co-evolution of plant-fungal interactions. High degree of symbiont affinity in the Monotropoideae (Ericaceae) has been evidenced by unique congeneric associations among different myco-heterotrophic plant lineages (Bidartondo and Bruns 2002). Waterman et al. (2011) studied how pollinators and symbionts affected speciation, coexistence, and distribution in orchid species. They show that shifts in symbiont partners are important for plant coexistence, but not for speciation, as most closely related species tend to have the same EcM partners (Waterman et al. 2011, 2012). Given that specific EcM and bacterial communities can be found in Alnus-dominated forests, several studies have focused on how the natural history of the host has affected the distribution of the symbionts. Kennedy et al. (2011) compared community assemblages in different Alnus-dominated locations in Mexico and other locations in the Americas. They found a striking similarity in the composition of MOTUs between the different locations, giving support to the hypothesis of host-fungal co-dispersal. Similarly, Põlme et al. (2013, 2014) found that the evolutionary history of Alnus species had a strong impact on EcM and bacterial (Frankia) community structure.

Historical biogeographic analyses have also evidenced host co-dispersal based on phylogenetic and ASR data (Matheny et al. 2009; Wilson et al. 2012; Sánchez-Ramírez et al. 2015a). A number of studies have focused on evolutionary transitions between gymnosperm and angiosperm hosts, with the aim of investigating ancestral host preferences in EcM fungi. A period of rapid speciation in Leccinum has been associated with different host changes from an Angiosperm ancestor (den Bakker et al. 2004). Also, Matheny et al. (2009) found that members of the EcM family Inocybaceae were ancestrally associated with Angiosperms and later switched to members of the Pinaceae. Similar patterns have been observed in the Hysterangiales (Hosaka et al. 2008), the truffle family Tuberaceae (Bonito et al. 2013), as well as in gasteroid boletes (Sclerodermatineae) (Wilson et al. 2012). Ryberg and Matheny (2012) showed some support for older EcM agaric clades (e.g. Hygrophorus) being ancestrally associated with Pinaceae hosts, whereas younger clades (e.g. Inocybaceae and Cortinarius) were ancestrally associated with Angiosperms.

1.7 Methodological Biases and Caveats

As a word of caution, we point out a number of biases and caveats, some that can arise through the application of specific methodology, and others that are inherent to mycological fieldwork and fungal biology in general. We emphasize that these points should be considered when making phylo-/biogeographic inferences or interpretations of observed patterns.

Mycorrhizal fungi spend most of their life cycle dwelling in the rhizosphere underground. EcM fungi, for instance, only produce fruiting bodies (on which morpho-taxonomy is based) during a narrow time-frame (e.g. one or two months). Also, fruiting bodies decay rather quickly, which can further narrow the observational window. Other EcM groups such as members of the Sebacinales or Thelephorales are rarely collected in the field, but have been found to be quite abundant underground (Gardes and Bruns 1996; Dahlberg 2001; Tedersoo et al. 2006; Porter et al. 2008). AM fungi are only known to reproduce asexually, which can complicate morphological species delimitations and sampling strategies (Helgason and Fitter 2009). These limitations can have implications for fungal diversity assessments in general, but especially in a geographic context. Probably due to logistic reasons and bureaucracy in certain regions, fungal taxa from different geographic locations have been studied disproportionately. Given that significantly more biodiversity research is conducted in North America and Europe (Wilson et al. 2016b), mycorrhizal fungi from these regions (most often in temperate ecosystems) have been sampled and studied more often than others (Dahlberg 2001; Dickie and Moyersoen 2008). Historically, many more fungal biodiversity surveys (Mueller et al. 2007) and genetic analyses (Douhan et al. 2011) have been conducted in temperate regions than in tropical ones. This systematic sampling bias can thus generate gaps in our understanding of the distribution of fungal taxa, which can have profound effects on the proposition and assessment of biogeographic hypotheses.

Human-mediated dispersal is also well documented in fungi. In particular, AM and EcM fungi can easily travel with soil or roots of trees that have been translocated for reforestation practices, food production, or as ornamental plants (Dunstan et al. 1998; Vellinga et al. 2009). Many of them are able to invade and spread in non-native habitat (Pringle et al. 2009). These events can also introduce noise in biogeographic inference. Nevertheless, long-distance dispersal is also a natural process by which spores travel long distances and establish in distant locations (e.g. Moyersoen et al. 2003; Bonito et al. 2013; Geml et al. 2011).

Reliable data on host association is often unavailable for many mycorrhizal species, which can directly affect studies on host coevolution. Accurately identifying hosts can be tedious if done through inoculation studies, or misleading if done in the field. While the most straightforward way to identify a host is by molecular means (Muir and Schlötterer 1999; Sato et al. 2007; Wilson et al. 2007), this step is not done routinely. This also concerns the correct and detailed annotation of sequences deposited in GenBank, which often lack isolation source data, including geographic location and host (Vilgalys 2003; Bidartondo et al. 2008; Tedersoo et al. 2011). Establishing such connections is critical to effectively investigate how photobionts shape biogeographic patterns in fungi.

Phylogenetic analyses are known to be subject to sampling issues. For instance, the accuracy in dated molecular phylogenies strongly depends on taxonomic sampling (Heath et al. 2008), in particular for clades used for fossil-calibration (Linder et al. 2005). The shape of a phylogenetic tree can change significantly if the sampling is non-random or incomplete, which is often the case in fungal phylogenies (Hibbett et al. 2011; Ryberg and Matheny 2011; Hinchliff et al. 2015), affecting the interpretation of diversification processes (Pybus and Harvey 2000; Pybus et al. 2002; Ryberg and Matheny 2011). Most models for ASR are also susceptible to sampling, as state or location transition probabilities will tend to be more accurate in better sampled phylogenies. It is also unclear how robust models including cladogenetic processes are to missing branching events in the tree. BiSSE-type analyses have also undergone scientific scrutiny for their high false-positive rates due to phylogenetic pseudo-replication (Maddison and FitzJohn 2015; Rabosky and Goldberg 2015), and issues with the size of phylogenetic trees (Davis et al. 2013). Many of these issues can be controlled for by doing simulations (e.g. Rabosky and Goldberg 2015), or by applying models that directly account for the issues (e.g. HiSSE, Beaulieu and O’Meara 2016). Similarly, implementations of other models, such as BAMM have also been critiqued (Moore et al. 2016). Finally, although phylo-community methods are appealing approaches to answer many questions about mycorrhizal (and fungal/microbial) biogeography, most of the species-level data comes from ITS sequences, which are often problematic to align over distantly related taxonomic groups.

1.8 Conclusions and Future Directions

For about a decade, fungal (and microbial) biogeography has been regarded as a young, emerging field (Martiny et al. 2006; Lumbsch et al. 2008; Douhan et al. 2011). Nonetheless, it is clear that a slow but steady body of knowledge is amassing around our understanding of the dimensions of fungal diversity. This includes the notion that the ‘everything-is-everywhere’ paradigm does not hold generally true, and that an historical perspective is necessary to understand the diversity of any given area (Peay et al. 2010a, 2016; Peay and Matheny 2016).

The steady stream of sequence data promises to supply us with information to solve many of the questions on fungal biogeography. However, most sequences come from the ITS region, which is difficult to use in wider taxonomic contexts, and the necessary meta-data for studies of biogeography and host associations are often lacking. Some of the major challenges relate to accurate and biologically meaningful species delimitations, as well as the generation of robust phylogenies for molecular dating and testing biogeographic hypotheses. Genomic initiatives (e.g. Kohler et al. 2015) and cheaper sequencing (i.e. next-generation sequencing) will undoubtedly provide unprecedented molecular resources for phylogenomics, which, together with better models, promise to solve many of the current shortcomings.

Although there are only a handful of studies about diversification and evolutionary ecology of fungi (many of which are focused on EcM symbioses), results seem to be consistent with biogeographic scenarios that point to recent high diversification rates in temperate regions, compared to more ancient and historically conserved tropical patterns (Kennedy et al. 2012; Treseder et al. 2014; Sánchez-Ramírez et al. 2015c; Looney et al. 2016). We envision future phylogeny-based studies incorporating more ecological data (e.g. physiological, climatic, environmental, and geographic traits) and future meta-barcoding-based studies incorporating more phylogenetic data. The first point could be achieved, in part, by making use of geographic information system resources, such as WorldClim (http://www.worldclim.org), while the second could be achieved by implementing supertree approaches (e.g. Beaulieu et al. 2012; Qian and Jin 2016). With regard to EcM phylogeography, there is virtually no study to date (Google searched on Oct. 18, 2016) that has used geographic-coordinate-based diffusion models to infer ancestral distribution ranges in fungi. Similarly, there are very few studies that have applied palaeo-distribution modeling to infer refugial areas during the last glacial maximum (Sánchez-Ramírez et al. 2015b; Feng et al. 2016), in spite of its great potential to understand EcM population dynamics during the last tens of thousands of years.