Introduction

Aspergillus and Penicillium species are widespread and common, and particular species impact human activities either positively or negatively. Aspergillus versicolor is a cosmopolitan species often used as an indicator species for sick building syndrome (Anderson et al. 2011). Additionally, it produces the mycotoxin sterigmatocystin (Veršilovskis and Saeger 2010) in several commodities. Aspergillus fumigatus, Aspergillus flavus, Aspergillus terreus (Walsh et al. 2011), and Talaromyces (syn. = Penicillium) marneffei (Sudjaritruk et al. 2012) are recognized opportunistic pathogens of humans, especially those with weakened immune systems. Aspergillus niger is used for the production of enzymes and citric acid (e.g., Dhillon et al. 2012); Aspergillus oryzae is used to produce soy sauce; Penicillium roqueforti ripens blue cheeses; and penicillin is produced by Penicillium chrysogenum (e.g., Xu et al. 2012). Heat-resistant Byssochlamys species can grow in pasteurized fruit juices (Sant’ana et al. 2010). These genera, along with a few others, comprise the family Trichocomaceae. Because of their impact on our lives, it is often important to have accurate identifications of these species.

Traditional taxonomic studies rely primarily on phenotypic analysis (e.g., Raper and Fennell 1965). Sequence analysis for taxonomic purposes in Aspergillus and Penicillium began in the late 1980s (e.g., Edman et al. 1986; Olsen et al. 1986; Dupont et al. 1990; Logrieco et al. 1990) using rRNA sequences, and advanced to analysis of protein-coding genes soon after PCR methods (Glass and Donaldson 1995), thermal cycler units, dye-based sequence reactions, and automated sequence readers became readily available. The Barcode of Life Consortium (http://connect.barcodeoflife.net/group/fungi) and PubMLST (http://pubmlst.org) provide frameworks for advancing the use of DNA for species identifications. In this communication, I discuss some of the factors affecting the possibility of detecting and identifying the species of the Trichocomaceae using DNA sequences.

What is a species

At one time it was generally accepted that species resulted from a special act of creation, but Darwin argued in the 1860s that individuals were mutable, mutations were heritable, and that natural selection could change the essential character of the species. Since then the nature of species and how to detect the limits of each have been regular topics of debate (e.g., de Queiroz 2007; Doolittle and Bapteste 2007; Ereshefsky 2011). How we define and recognize species is the critical issue in DNA-based identification.

In a colloquial sense, a species is an essentialistic entity, meaning that a set of characteristics or properties exist, all of which any individual of the species must possess, and these properties or characteristics define the species. The set of useful or harmful characteristics is the practical reason for knowing species and the set of characteristics may be simple or complex. For Link in 1809, A. flavus was defined by the possession of the aspergillum and yellow-green color, while in 1869 A. niger Tiegh. was defined by the aspergillum and dark brown to black color. On the basis of a far more sophisticated phenotypic analysis, Raper and Fennell (1965) recognized 148 distinct Aspergillus taxa. Pitt et al. (2000) accepted 185 taxa in Aspergillus 35 years later. Some of the species number increase came from discovery of germplasm not seen before and some came from the reexamination and splitting of previously known but variable species. Raper and Thom (1949) accepted 139 Penicillium species, while 51 years later Pitt et al. (2000) accepted 239 species.

As a matter of formality, fungal species are established under the rules of botanical nomenclature. The requirements are that a type specimen must be permanently preserved in a publicly accessible herbarium, a description of the features that make this new species identifiable must be formulated and the description and the designation of the type must be published in a non-ephemeral medium. Different techniques have been introduced to aid taxonomic decisions such as SEM (Kozakiewicz 1989), secondary metabolite analysis (Frisvad and Filtenborg 1989), or single locus DNA sequence analysis (Peterson 2000; Samson and Frisvad 2004), but the morphology of the type specimen has always held the position of primacy in defining a fungal species.

A significant shift in taxonomic philosophy was published by Taylor et al. (2000). They proposed adoption of Simpson’s (1951) evolutionary species concept, “…a phyletic lineage (ancestral-descendent sequence of interbreeding populations) evolving independently of others, with its own separate and unitary evolutionary role and tendencies…” for fungi. Rather than rely on morphological distinctions seen in a type specimen, they endorsed genealogical concordance phylogenetic species recognition (GCPSR) as the method to identify the species. Recent studies have validated the view of Taylor et al. (2000). Dettman et al. (2003, 2006) have correlated GCPSR with mating potential in the genus Neurospora. In Trichocomaceae, Peterson and Horn (2009) using GCPSR analysis showed that the morphologically defined Penicillium cinnamopurpureum included elements of six phylogenetically distinct species. Perrone et al. (2011) recognized the cryptic species Aspergillus awamori on the basis of phylogenetic analysis of DNA sequence data and referred to the other eight described species in the “A. niger aggregate” as morphologically indistinguishable. Soares et al. (2012) described three new species in the A. flavus clade on the basis of genealogical concordance analysis of DNA sequences, even though those species were barely distinguishable morphologically from A. flavus. Morphological analysis sometimes leads to the grouping together of disparate species due to convergence of characters, and in other cases morphological divergence does not occur as quickly as genotypic divergence and the species are morphologically indistinguishable. Genealogical concordance phylogenetic analysis is a consistent measure of genetic divergence and an objective means of asserting the limits of species.

Nomenclature

Nomenclature rules for fungi are defined in the International Code of Botanical Nomenclature (ICBN). The ICBN is updated periodically, with the most recent changes due to be published in summer 2012. The question of nomenclature arises because Aspergillus and Penicillium are pleomorphic fungi: a species may be known only in its anamorphic (asexual) phase, only in its teleomorphic (sexual) phase, or as the holomorph (both phases). Major monographers of Aspergillus and Penicillium (Raper and Thom 1949; Raper and Fennell 1965) in the middle of the twentieth century believed that asexual morphology revealed phylogenetic relationships and placed all fungi with Penicillium anamorphs in that genus whether they also made a teleomorph or not. They used the same philosophy in their study of Aspergillus. However, following nomenclatural rules that gave precedence to names based on sexual reproductive morphology, Benjamin (1955) segregated certain sexually reproducing Penicillium species into the genus Talaromyces, Stolk and Samson (1971) renamed some Penicillium species into the genus Hamigera, and Stolk and Scott (1967) moved some Penicillium species into Eupenicillium based on sexual reproduction.

Using the same nomenclature, Horn et al. (2009) discovered the teleomorph of A. flavus and introduced the name Petromyces flavus for the fungus. Peterson (1992) gave the names Aspergillus thermomutatus and Neosartorya pseudofischeri to a single species opportunistically causing human disease that displays both Aspergillus and Neosartorya morphologies. In a naming system based solely on phenotypic distinctions and dried herbarium-type specimens, the assignment of closely related species into different genera on the basis of whether they possess a teleomorph or not was a rational way to deal with identification. Consequently, taxonomists of the Trichocomaceae have created a system of dual names for most species in the family.

The most recent changes to the nomenclatural code (Knapp et al. 2011; McNeil and Turland 2011; Norvell 2011) allow phylogenetically related organisms to be treated together under a single generic name even when they display different morphs, and now allow anamorphic and teleomorphic names to compete equally on the basis of priority. Name changes, based solely on compliance with nomenclatural rules, will likely occur in the Trichocomaceae based on the application of DNA sequence-based phylogenetic analysis (e.g., Samson and Frisvad 2004; Houbraken et al. 2007; Peterson 2000; Peterson 2008; Varga et al. 2011). Samson et al. (2011) applied the single name philosophy to one major clade of Trichocomaceae, consolidating Talaromyces and Penicillium subgenus Biverticillium species under the name Talaromyces, regardless of whether the species form the Talaromyces state or not. They rely on phylogenetic rather than phenotypic analysis of relatedness to guide naming. The other primary clade with Penicillium morphology contains Penicillium species from three subgenera and Eupenicillium species. Some authors have started treating this clade under the single name Penicillium (Houbraken et al. 2011; Peterson et al. 2011). The application of phylogenetic analysis and unitary naming will result in stable, reliable, and intuitively understandable taxonomy.

Barcodes

Barcoding initially used cytochrome oxidase 1 (COI) for barcode identification of animals. Seifert et al. (2007) tested the applicability of COI for barcoding species of Penicillium subg. Penicillium (Samson and Frisvad 2004) and found that not all species possessed distinct genotypes at the COI locus. Seifert (2009) and Eberhardt (2010) have discussed some of the candidate loci for a fungal barcode. Schoch et al. (2012) have a detailed proposal to use internal transcribed spacer (ITS) region as the universal fungal barcode locus. While limitations exist for this molecule, the ease of amplification, primer universality, and very good level of sensitivity make the prospects for correct or near-correct identification in the entire fungal kingdom likely. Some of the Trichocomaceae species can be barcode identified using ITS but for others the ITS barcode is problematic.

Peterson (2008), as part of a larger study, examined 13 isolates of four phenotypically defined species placed in Aspergillus section Cervini by Raper and Fennell (1965). That study used multi-locus DNA sequence data and GCPSR analysis to define species. Of six isolates phenotypically identified initially as Aspergillus kanagawaensis (Fig. 1), two are A. kanagawaensis, two belong in Aspergillus parvulus, and two others belong in an undescribed species; of the three isolates morphologically identified as Aspergillus cervinus, two are in A. cervinus and one is in A. parvulus; of the two Aspergillus nutans isolates, one is A. nutans and one represents an undescribed species. Phenotypic analysis fails to correctly identify these isolates and species.

Fig. 1

Maximum parsimony tree of 13 Aspergillus section Cervini isolates using sequence data from four loci. Numbers above tree nodes are bootstrap values. The morphological identification for each isolate is written out in the colored boxes. The colored boxes show the limits of species determined from GCPSR analysis of four-locus DNA data (Peterson 2008). Nuclear ITS, beta tubulin, calmodulin, and RNA polymerase beta genotypes are given arbitrary letter designations and listed by the isolates. ITS genotypes are shared by different species and do not always uniquely identify species. Protein-coding loci are variable in some species, but species do not share genotypes, and each MLST genotype correctly identifies the isolates.

Using the phenotypic taxonomy of Raper and Fennell (1965), the phylogenetic systematics of Peterson (2008), and an arbitrary letter assigned to each unique ITS region genotype (Table 1, Fig. 1), phenotypic A. kanagawaensis isolates display three genotypes, A. cervinus isolates display one genotype, A. parvulus isolates display two genotypes, and A. nutans isolates display two ITS genotypes. However, A. parvulus and A. cervinus share ITS genotype C, and A. parvulus and A. nutans share ITS genotype B, rendering identification in either case uncertain. In terms of phylogenetic species, A. kanagawaensis isolates have one ITS genotype, A. cervinus isolates have one ITS genotype, A. nutans isolates have one ITS genotype, A. parvulus isolates have three ITS genotypes, undescribed sp. 1 has two ITS genotypes, and undescribed sp. 2 has one ITS genotype. Using phylogenetic species concepts, A. cervinus and A. parvulus share ITS genotype C, and A. kanagawaensis and Aspergillus sp. 1 share genotype E, rendering identification uncertain. Using the data from Aspergillus section Cervini (Table 1, Fig. 1) and the ITS, beta tubulin, calmodulin, and DNA-dependent RNA polymerase beta loci, each of the species (either phenotypic or phylogenetic) has MLST genotypes that do not overlap with those of other species, allowing a specific identification for each isolate.
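The comparison above amounts to asking whether any genotype profile occurs in more than one species. The following Python sketch illustrates that check under stated assumptions: the function name `ambiguous_profiles` and the record layout are hypothetical, and the records are placeholders rather than the values of Table 1. Only the sharing of ITS genotypes C and E between the named phylogenetic species is taken from the text; the protein-coding genotype letters and the `Z` genotype are invented for illustration.

```python
from collections import defaultdict


def ambiguous_profiles(records, loci):
    """Return {profile: sorted species names} for every genotype profile
    (the tuple of genotype letters at `loci`) found in more than one species."""
    species_by_profile = defaultdict(set)
    for species, genotypes in records:
        profile = tuple(genotypes[locus] for locus in loci)
        species_by_profile[profile].add(species)
    return {
        profile: sorted(species)
        for profile, species in species_by_profile.items()
        if len(species) > 1
    }


# Placeholder records (NOT Table 1). ITS letters C and E follow the text;
# every BT, CF, and RPB letter, and ITS "Z", is a hypothetical stand-in.
RECORDS = [
    ("A. kanagawaensis", {"ITS": "E", "BT": "a", "CF": "a", "RPB": "a"}),
    ("Aspergillus sp. 1", {"ITS": "E", "BT": "b", "CF": "b", "RPB": "b"}),
    ("A. cervinus", {"ITS": "C", "BT": "c", "CF": "c", "RPB": "c"}),
    ("A. parvulus", {"ITS": "C", "BT": "d", "CF": "d", "RPB": "d"}),
    ("A. parvulus", {"ITS": "Z", "BT": "e", "CF": "d", "RPB": "d"}),
]

# ITS alone: genotypes C and E each map to two species.
print(ambiguous_profiles(RECORDS, ["ITS"]))
# Four-locus profile: no profile is shared between species.
print(ambiguous_profiles(RECORDS, ["ITS", "BT", "CF", "RPB"]))
```

An empty result for a set of loci means that, for the isolates sampled, every profile at those loci points to a single species. It says nothing about isolates or species that have not yet been sequenced.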

Table 1 Phenotypic species identification, phylogenetic species identifications, ITS barcode genotypes and ITS, BT, CF, and RPB MLST genotypes for isolates of Aspergillus section Cervini

The protein-coding region genotypes often display polymorphisms within species, and genotypes are not shared between species, making them a potential target for barcoding. However, the lack of primer sequence universality (Schoch et al. 2012) and the presence of variable-sized introns limit the application of protein-coding regions for barcoding across all fungi.

MLST

Maiden et al. (1998) proposed a DNA sequence-based multi-locus sequence typing (MLST) system for isolates of Neisseria meningitidis. The system is appealing in fungi too because DNA sequences are measurably accurate and unambiguous, easily transmittable via the web, and can be compared at a central worldwide data center. MLST is not confounded by different locus-associated rates of evolution, nor does it rely on a single locus that could be affected by horizontal gene transfer. Balajee (2008) suggested a two-tiered identification system for species of Aspergillus, particularly the medically important species. Sectional or group identification would be based on ITS sequences, and species or infraspecific taxa would be identified on the basis of protein-coding gene sequences. This has the positive value of being able to take advantage of MLST schemes for medically important species that have been worked out using various loci. Short et al. (2011) produced a six-locus MLST scheme for Fusarium species found in hospital environments. Balmas et al. (2010) produced an MLST system for studying soil species of Fusarium.
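The two-tiered idea can be sketched as a pair of table lookups: the ITS genotype selects a section, and the protein-coding profile then selects a species within that section. The Python below is a minimal illustration only, not the scheme of Balajee (2008). The function `identify`, the choice of three protein-coding loci, and all section names, species names, and allele labels are hypothetical placeholders.

```python
# Protein-coding loci used for the second tier (an assumed choice for this sketch).
PROTEIN_LOCI = ("BT", "CF", "RPB")


def identify(query, its_to_section, profiles_by_section):
    """Two-tier lookup: ITS genotype -> section, then the protein-coding
    profile -> species within that section. Returns (section, species);
    either element is None when no match is found at that tier."""
    section = its_to_section.get(query["ITS"])
    if section is None:
        return None, None
    profile = tuple(query[locus] for locus in PROTEIN_LOCI)
    species = profiles_by_section.get(section, {}).get(profile)
    return section, species


# Hypothetical reference tables; none of these labels come from a real database.
ITS_TO_SECTION = {"its-1": "section X", "its-2": "section X", "its-3": "section Y"}
PROFILES_BY_SECTION = {
    "section X": {
        ("bt-1", "cf-1", "rpb-1"): "species X1",
        ("bt-2", "cf-2", "rpb-2"): "species X2",
    },
    "section Y": {
        ("bt-3", "cf-3", "rpb-3"): "species Y1",
    },
}

query = {"ITS": "its-2", "BT": "bt-2", "CF": "cf-2", "RPB": "rpb-2"}
print(identify(query, ITS_TO_SECTION, PROFILES_BY_SECTION))
```

Returning the section even when the species lookup fails mirrors the intent of a tiered system: an isolate with an unfamiliar protein-coding profile is still placed in a group, which flags it for closer study rather than leaving it unidentified.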

At this time, the alpha taxonomy of the Trichocomaceae is well advanced, and current investigations are mostly utilizing GCPSR analysis of DNA sequence data to determine species limits in revisions of the sections. The data largely exist for a three- or four-locus identification system in non-medically important Aspergillus species using beta tubulin, calmodulin, DNA-dependent RNA polymerase, and nuclear ITS loci. MLST will provide an accurate and reliable DNA sequence-based identification system for the Trichocomaceae.