IMA Genome-F 5A

Draft genome sequence of Ceratocystis eucalypticola

Many species of Ceratocystidaceae have been studied extensively due to their significance as pathogens of agricultural and forestry crops (Roux & Wingfield 2009), as well as their impact on natural woody ecosystems (Roux et al. 2007, Lee et al. 2015). The family includes eight genera accommodating more than 80 phylogenetically closely related but often morphologically similar species (van Wyk et al. 2013, de Beer et al. 2014, Mayers et al. 2015). These genera, as defined by de Beer et al. (2014), are clearly delimited based on a combination of phylogenetic inference, morphology, and in some cases distinct ecological partitioning. For example, the genus Huntiella accommodates species that are saprobes, whereas most species of Ceratocystis are pathogens of angiosperms.

Species of Ceratocystis include important pathogens of trees propagated as non-natives in plantations in the tropics and Southern Hemisphere (Roux & Wingfield 2013, Wingfield et al. 2013), including Eucalyptus (Laia et al. 1999, Roux et al. 2000, Roux et al. 2001, Barnes et al. 2003). Isolates of Ceratocystis from Eucalyptus in South Africa, related to those known to kill trees in these plantations, were described as the new species C. eucalypticola (van Wyk et al. 2012). The taxonomy of this species and some of its relatives remains open to debate (Fourie et al. 2015, Oliveira et al. 2015) and there is a clear need to gain a deeper understanding of species boundaries as well as issues relating to its biology and ecology.

The aim of this study was to sequence the genome of C. eucalypticola in order to allow for genomic analysis and comparisons with already available genomes from other Ceratocystidaceae (Wilken et al. 2013, van der Nest et al. 2014a, b). These comparisons, coupled with phylogenomic studies, will be useful in resolving the taxonomic debates ongoing in Ceratocystis. Additionally, these resources will provide a platform to characterise factors associated with pathogenicity and fungal ecological strategy, as well as provide an opportunity to study the evolution of these traits within a family of closely related fungi.

Sequenced Strain

South Africa: Mpumalanga: Sabie, isol. ex artificial wound of Eucalyptus, July 2002, M. van Wyk & J. Roux (CMW 9998, CBS 124017, PREM 60169 — dried culture).

Nucleotide Sequence Accession Number

The draft genome sequence of Ceratocystis eucalypticola (CMW 9998) has been deposited at DDBJ/EMBL/GenBank with the accession number LJOA00000000. Here we describe version LJOA01000000.

Methods

Genomic DNA of Ceratocystis eucalypticola isolate CMW 9998 was sequenced using the Illumina HiSeq 2000 platform at the UC Davis Genome Centre, University of California, Davis (CA). Two libraries with medium insert sizes of 350 bp and 530 bp were used to generate paired-end sequences with read lengths of approximately 100 bases. CLC Genomics Workbench v. 7.5.1 (CLCBio, Aarhus, Denmark) was used to analyse the NGS data, as well as to perform a de novo assembly. Reads of low quality (P error limit of 0.05) and/or terminal nucleotides were trimmed, with the remaining reads being retained for assembly. De novo genome assembly was performed with a word size of 64, and a bubble size of 100 bp. The raw reads were mapped back to the contigs in order to perform scaffolding, with an estimated paired distance ranging from 147 to 654 bp. The completeness of the assembled genome was assessed using the Benchmarking Universal Single-Copy Orthologs tool, BUSCO (Software v. 1.1b1 of May 2015) (Simão et al. 2015). BUSCO was performed on all contigs >1 kb, making use of the fungal lineage dataset. Lastly, the assembly was submitted to AUGUSTUS (Stanke et al. 2004) in order to predict putative open reading frames (ORFs) using the gene models of Fusarium graminearum.
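To put the trimming threshold in more familiar units, the sketch below converts between an error probability and a Phred quality score using the standard relation Q = -10 log10(P). It only illustrates the unit conversion; it does not reproduce the trimming algorithm implemented in CLC Genomics Workbench, and the function names are illustrative.

```python
import math


def error_to_phred(p_error):
    """Convert a base-call error probability to a Phred quality score."""
    if not 0 < p_error <= 1:
        raise ValueError("error probability must be in (0, 1]")
    return -10 * math.log10(p_error)


def phred_to_error(q):
    """Convert a Phred quality score to a base-call error probability."""
    return 10 ** (-q / 10)
```

Under this relation, an error limit of 0.05 corresponds to a Phred score of roughly 13, as returned by `error_to_phred(0.05)`.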

Results and Discussion

The draft genome of Ceratocystis eucalypticola had an estimated size of 31 260 284 bases, with an N50 of 116 489 and an average coverage of 80x. The de novo assembly generated a total of 2129 contigs, of which 961 were longer than 1 kb. The average scaffold length was 14 676 bases, with the largest scaffold being 726 305 bases in size. The GC content of the assembly was 47.9 %. BUSCO analysis defined the genome as 97 % complete with 1408 single-copy orthologs present, while 92 BUSCO orthologs were found to be duplicated. Only 30 BUSCO orthologs were missing or fragmented out of the possible 1438 groups searched from the fungal lineage dataset. Gene prediction resulted in a total of 7353 putative ORFs, at a gene density of approximately 235 ORFs/Mb.
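The BUSCO figures above can be cross-checked with a few lines of arithmetic. The sketch below is a minimal illustration; it assumes that the 1408 orthologs reported as present include the 92 duplicated ones, which the text does not state explicitly, and the function name is hypothetical.

```python
def busco_summary(complete, duplicated, total):
    """Summarise BUSCO counts relative to the number of groups searched.

    Assumption: `complete` already includes the duplicated orthologs.
    """
    if total <= 0 or complete > total or duplicated > complete:
        raise ValueError("inconsistent BUSCO counts")
    return {
        "complete_pct": 100 * complete / total,
        "duplicated_pct": 100 * duplicated / total,
        "missing_or_fragmented": total - complete,
    }
```

Calling `busco_summary(complete=1408, duplicated=92, total=1438)` gives 30 missing or fragmented groups and a completeness between 97 % and 98 %, in line with the values reported.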

The assembled genome of C. eucalypticola, with a size of approximately 31.2 Mb and 7353 ORFs, closely resembled those of other sequenced Ceratocystis spp. (Wilken et al. 2013, van der Nest et al. 2014a, b). The fungus had a genome size most similar to that of C. fimbriata (29.4 Mb, 7266 ORFs) and C. manginecans (31.7 Mb, 7494 ORFs), while the C. albifundus genome is slightly smaller (27.2 Mb) with only 6967 genes predicted. The genome size statistics for Ceratocystis are similar to those found in the genus Huntiella, with H. omanensis and H. moniliformis being 31.5 Mb and 25 Mb, respectively (van der Nest et al. 2014a, b). Gene predictions for Huntiella showed a slightly higher gene density when compared with those of Ceratocystis (243 ORFs/Mb on average), with H. omanensis having a density of 266 ORFs/Mb and H. moniliformis 280 ORFs/Mb.

The availability of these resources will provide opportunities to answer questions regarding the similarities and differences seen in this genus. The genome data for Ceratocystis s. str. is particularly useful for exploring the species boundaries through phylogenomic analysis. This, in combination with genomic comparisons to other species within Ceratocystidaceae, will lead to a better understanding of the evolution of pathogenicity and other life history traits.

Authors: C. Trollip*, T.A. Duong, M.A. van der Nest, I. Barnes, M.J. Wingfield, and B.D. Wingfield

*Contact: Conrad.Trollip@fabi.up.ac.za

IMA Genome-F 5B

Draft genome sequences of Chrysoporthe cubensis and C. deuterocubensis, causal agents of Eucalyptus canker

Fungi in the genus Chrysoporthe are economically important pathogens of plantation grown Eucalyptus spp. and other members of Myrtales (Gryzenhout et al. 2004). These fungi cause serious stem canker disease, referred to as Chrysoporthe canker (Gryzenhout et al. 2004), and are predominantly found in tropical and subtropical parts of the world where conditions favour their growth (Alfenas et al. 1982). Although Chrysoporthe canker has been successfully managed through propagation of disease resistant clones, it is still considered a threat since it can lead to substantial economic losses where resistance breeding is not in place (Wingfield 2003).

There are eight described species of Chrysoporthe, including C. cubensis (Hodges et al. 1976, 1979, Rodas et al. 2005), C. doradensis (Gryzenhout et al. 2005), C. inopina (Gryzenhout et al. 2006), and C. hodgesiana (Gryzenhout et al. 2004) which occur in South and Central America. Chrysoporthe deuterocubensis is primarily found in Southeast Asia, although introductions to Australia, China, Hawaii, and parts of East Africa have been suggested (Myburg et al. 2002, Nakabonge et al. 2006, van der Merwe et al. 2010). Chrysoporthe zambiensis and C. syzygiicola are found in Zambia (Chungu et al. 2010), while C. austroafricana is found only in southern Africa (Wingfield et al. 1989, Gryzenhout et al. 2004).

Chrysoporthe cubensis, C. deuterocubensis, and C. austroafricana have been isolated from native trees, suggesting that these fungi might be native to the regions where the trees are found (Myburg et al. 2003, Rodas et al. 2005, Heath et al. 2006). Interestingly, despite the distinct geographical distribution, these species seem to be closely related (Chungu et al. 2010, van der Merwe et al. 2010). Unfortunately, there is limited available information regarding the evolution of Chrysoporthe species.

The genome of C. austroafricana was recently sequenced and released in the public domain (Wingfield et al. 2015). This is the only whole genome sequence resource available for the genus Chrysoporthe. Additional genomic resources could enhance further understanding of the biology of this assemblage of fungi, through genome-wide comparisons. The aim of this study was thus to sequence the genomes of C. cubensis (isolate CMW 10028) and C. deuterocubensis (isolate CMW 8650).

Sequenced Strains

Chrysoporthe cubensis: Colombia: Timba, 2002, C.A. Rodas (CMW 10028, PREM 58311 — dried culture).

Chrysoporthe deuterocubensis: Indonesia: Sulawesi, 2001, M.J. Wingfield (CMW 8650, CBS 115719, PREM 58018 — dried culture).

Nucleotide Sequence Accession Number

The Chrysoporthe cubensis isolate CMW 10028 and C. deuterocubensis isolate CMW 8650 Whole Genome Shotgun projects were deposited in GenBank with accession numbers LJCY00000000 and LJDD00000000, respectively. The versions described here are LJCY00000000 and LJDD00000000 for C. cubensis and C. deuterocubensis, respectively.

Materials and Methods

Genomic DNA was extracted using a modified protocol (Steenkamp et al. 1999) from mycelium of isolates CMW 10028 (Chrysoporthe cubensis) and CMW 8650 (C. deuterocubensis) obtained from 7-d-old fungal cultures. The Illumina MiSeq paired-end sequencing protocol at the Agricultural Research Council (ARC, South Africa) was used to obtain whole genome sequence data. To assemble the paired-end MiSeq sequences, CLC Genomics Workbench v. 7.5.1 (CLCBio, Aarhus, Denmark) was used. The assemblies were subsequently scaffolded using SSPACE v. 2.0 (Boetzer et al. 2011), which included unused MiSeq reads from the CLC Genomics Workbench assembly. The AUGUSTUS (Stanke & Morgenstern 2005) protein coding gene prediction software was used for de novo annotation of protein coding gene models using Neurospora crassa and Fusarium graminearum as references. Genome completeness was assessed using BUSCO (Benchmarking Universal Single-Copy Orthologs), which utilizes single-copy orthologs to predict genome completeness (Simão et al. 2015).

Results and Discussion

The approximate size of the Chrysoporthe cubensis genome was 42 624 564 base pairs (bp) including gaps, while the C. deuterocubensis assembly was 43 969 123 bp in size. These figures were calculated from 989 and 2 599 scaffolds for C. cubensis and C. deuterocubensis, respectively. From the AUGUSTUS analysis, 12 435 gene models were predicted from the C. cubensis genome, while 13 098 gene models were predicted in the C. deuterocubensis genome. Despite the differences observed in the assembly statistics, the CEGMA analysis for genome completeness in both C. cubensis and C. deuterocubensis was predicted at 95.16 %.

Compared to the closely related C. austroafricana genome, that of C. cubensis was slightly smaller, while the C. deuterocubensis genome was slightly larger. Similarly, C. cubensis had fewer predicted gene models than either C. austroafricana or C. deuterocubensis (Table 1). In terms of gene content, the Chrysoporthe spp. genomes were slightly larger than that of the distantly related Cryphonectria parasitica (43.9 Mb, 11 184 gene models) (http://genome.jgi.doe.gov/Crypa2/Crypa2.info.html), although the genome sizes were relatively close, and than those of the model filamentous fungi Neurospora crassa (39.9 Mb, 10 082 gene models) (Galagan et al. 2003) and Magnaporthe grisea (40.3 Mb, 11 109 gene models) (Dean et al. 2005).

Table 1 Comparison of whole genome sequencing assembly features for Chrysoporthe cubensis, C. deuterocubensis, and C. austroafricana (Wingfield et al. 2015).

The significance of differences observed in genome size and the number of predicted genes among the Chrysoporthe species is not known. However, it might be speculated that their geographical distribution could have played a role in the evolution of these genomes. The availability of these genomes will make it possible to answer such phylogeographic questions and will aid in addressing questions relating to the biology of Chrysoporthe species.

Authors: A.M. Kanzi, B.D. Wingfield, M.J. Wingfield, E.T. Steenkamp, and N.A. van der Merwe*

*Contact: albe.vdmerwe@up.ac.za

IMA Genome-F 5C

Draft nuclear genome sequence for Davidsoniella virescens, the causal agent of sapstreak disease in hardwood trees

The newly recognized genus Davidsoniella (de Beer et al. 2014) includes species previously accommodated in the Ceratocystis coerulescens s. lat. clade. Davidsoniella virescens is a tree pathogen that infects hardwood trees such as Acer saccharum (sugar maple) in eastern North America (Davidson 1944). This fungus is highly pathogenic to sugar maple in plantations, where it feeds on the sugars and other carbohydrates in the wood of the trees (Bal et al. 2013). The disease caused by D. virescens is commonly referred to as sapstreak and the fungus affects the internal wood chemistry where it has been implicated in the production of volatiles that can enhance the growth of other fungi (Wargo & Harrington 1991).

The aim of this study was to assemble a draft nuclear genome sequence of D. virescens, which would ultimately allow for comparative studies with other sequenced genomes in Ceratocystidaceae. The genomes of seven other species of Ceratocystidaceae are publicly available to aid with such comparative analyses. These include Ceratocystis fimbriata, a pathogen affecting sweet potatoes (Wilken et al. 2013); the canker and wilt disease causing C. albifundus, occurring on Acacia mearnsii trees (van der Nest et al. 2014a); the mango wilt pathogen, C. manginecans (van der Nest et al. 2014b); species in the related genus Huntiella, H. moniliformis and H. omanensis, saprobic fungi usually found on freshly cut or wounded logs (van der Nest et al. 2014a); the causal agent of black scorch disease in date palms, Thielaviopsis punctulata (Wingfield et al. 2015); and the plane tree pathogen C. platani (Belbahri 2015). Two additional Ceratocystidaceae genomes, those for C. eucalypticola and Thielaviopsis musarum, are included in this issue and collectively, these will add value to the comparative analysis of the genomes across this family. Understanding the general biology of D. virescens will further assist in developing a deeper understanding of sapstreak and potentially contribute to disease management strategies.

Sequenced Strain

USA: New Hampshire: isol. ex Acer saccharum, Aug. 1987, D. Houston (CMW 17339 = CBS 130772; PREM 61293 — dried culture).

Nucleotide Sequence Accession Number

The Whole Genome Shotgun project of the Davidsoniella virescens genome has been deposited at DDBJ/EMBL/GenBank under the accession no. LJZU00000000. The version described in this paper is version LJZU01000000.

Methods

Davidsoniella virescens isolate CMW 17339 was used in this study. Cultures were grown at 25 °C on 2 % malt extract agar (MEA: 20 % w/v, Biolab, Midrand, South Africa) supplemented with 100 µg/L thiamine. Total genomic DNA was isolated from 2-wk-old cultures using a phenol-chloroform method previously described (Roux et al. 2004). Sequencing was carried out on the Genome Analyzer IIx platform (Illumina) at the Genome Centre (University of California at Davis, CA). Paired-end libraries with insert fragments of 350 and 600 bases were used to generate reads of 100 bases. CLC Genomics Workbench software v. 7.5.1 (CLCBio, Aarhus, Denmark) was used for quality assessment and de novo assembly. Poor-quality reads (limit of 0.05) and/or terminal nucleotides were discarded. The remaining reads were assembled de novo using a word size of 64 with a bubble size of 100 base pairs. Scaffolding with an estimated pair distance of 99 to 562 base pairs was performed by mapping raw reads back to the contigs. Only contigs greater than 1000 bases were retained. Predictions of open reading frames (ORFs) based on the gene model for Fusarium graminearum (http://bioinf.uni-greifswald.de/augustus) were made using AUGUSTUS (Stanke et al. 2006). The completeness of the genome assembly was assessed quantitatively with the Benchmarking Universal Single-Copy Orthologs software program, BUSCO (Simão et al. 2015), using contigs greater than 1000 bases in length.

Results and Discussion

Davidsoniella virescens had an estimated nuclear genome size of 33 645 160 bases. The N50 value was 118 189 bases and the mean GC content was 44.50 %. A total of 563 contigs were produced from the CLCBio assembly, of which 561 were retained after excluding the mitochondrial scaffolds. The AUGUSTUS gene prediction pipeline estimated 6 953 ORFs. This draft genome assembly had a BUSCO completeness score of 97 %, indicating that the core eukaryotic genes were present. From this analysis, 1 404 single-copy genes were observed, of which 73 were duplicated. Of the 1 438 genes searched, only 2.2 % were classified as fragmented or missing. A gene density of 207 ORFs per Mb was observed for the 6 953 genes predicted.
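For readers less familiar with these summary statistics, the following sketch shows how an N50 value and a gene density are conventionally computed. The function names are illustrative, and the N50 would need the individual contig lengths of the assembly, which are not listed here.

```python
def n50(lengths):
    """Return the N50 of a set of contig or scaffold lengths.

    The N50 is the length L such that sequences of length >= L
    together make up at least half of the total assembly size.
    """
    total = sum(lengths)
    running = 0
    for length in sorted(lengths, reverse=True):
        running += length
        if 2 * running >= total:
            return length
    raise ValueError("no lengths supplied")


def gene_density(n_orfs, assembly_bp):
    """Return the number of predicted ORFs per megabase of assembly."""
    return n_orfs / (assembly_bp / 1_000_000)
```

With the values reported above, `gene_density(6953, 33_645_160)` rounds to 207 ORFs per Mb.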

Davidsoniella virescens has the largest estimated genome size (33 Mb) of all Ceratocystidaceae genomes sequenced thus far (Table 2). The dissimilarity in the coverage, N50 values and number of contigs can be attributed to the different sequencing and assembly platforms used to generate the data (Table 2). The retention of the final number of contigs processed in gene prediction tools also differed because some researchers have chosen to retain smaller contigs (greater than 500 nucleotides). Davidsoniella virescens had a similar GC content and genome completeness to the genomes sequenced of the other species in Ceratocystidaceae (Table 2). Adding to the growing number of sequenced and assembled genomes, the D. virescens genome provides a powerful resource to aid in its phylogenetic classification in Ceratocystidaceae. Similar or shared biological features can now be identified due to the availability of these genomes.

Table 2 Summary of whole nuclear genome DNA sequence assembly statistics (Wilken et al. 2013, van der Nest et al. 2014a, b, Belbahri 2015, Wingfield et al. 2015).

Authors: K. Naidoo*, C. Trollip, P.M. Wilken, M.J. Wingfield, and B.D. Wingfield

*Contact: Kershney.Naidoo@fabi.up.ac.za

IMA Genome-F 5D

Nuclear genome assembly for the maize pathogen Fusarium temperatum

Fusarium temperatum (formerly F. subglutinans group 1, de Vos et al. 2014) is an important mycotoxin-producing pathogen of maize (Scauflaire et al. 2011). This fungus is a member of the Fusarium fujikuroi complex which includes numerous pathogens responsible for destructive diseases of many plants (Kvas et al. 2009). Due to the economic importance of the complex, the whole genome sequences for several of its members have been determined and are publicly available. These include F. verticillioides (Fusarium Comparative Sequencing Project, Broad Institute of Harvard and MIT; http://www.broad.mit.edu), F. circinatum (Wingfield et al. 2012), F. fujikuroi and F. mangiferae (Wiemann et al. 2013), as well as F. nygamai (Wingfield et al. 2015).

To complement these genomic resources, genetic linkage maps for some of these fungi are also available (Jurgenson et al. 2002, de Vos et al. 2007). For example, a genetic linkage map available for a hybrid cross between F. circinatum and F. temperatum (de Vos et al. 2007) has been used as a framework in the analyses of certain loci and traits in these fungi (de Vos et al. 2011, 2013). Most recently, an analysis of the genomic architecture of species in this complex allowed the anchoring of this genetic linkage map to the genomic sequence data for F. verticillioides and F. fujikuroi (de Vos et al. 2014). The aim of this study was therefore to determine the whole genome sequence for the other parent (F. temperatum) in this hybrid cross. The availability of genome data for this fungus would allow comparisons to the other sequenced members of the F. fujikuroi complex as well as contribute to improving our knowledge of the genetic processes and properties underlying the biology of these important fungi.

Sequenced Strain

Mexico: Texcoco: isol. Zea mays ssp. mexicana seeds (teosinte), Nov. 1996, A.E. Desjardins, R.D. Plattner & T.R. Gordon (CMW 40964, CBS 138287; PREM 61039 — dried culture).

Nucleotide Sequence Accession Number

The Fusarium temperatum genomic sequence data has been deposited at DDBJ/EMBL/GenBank under the accession LJR00000000. The version described in this paper is version LJR01000000.

Methods

DNA was extracted from Fusarium temperatum grown on ½ PDA (Iturritxa et al. 2011). One mate-pair (2 840 bp average insert size) and two paired-end (average insert sizes of 213 and 476 bp) libraries were prepared and subjected to 100 bp Illumina HiSeq 2000 sequencing at Fasteris (Geneva). After removing poor quality reads using CLC Genomics Workbench v. 6.5 (CLCbio, Aarhus, Denmark), sequences were assembled using ABySS v. 1.3.7 (Simpson et al. 2009). Closing of gapped regions within the scaffolds was done using GapFiller v. 1.11 (Boetzer & Pirovano 2012). The completeness of the genome assembly was evaluated using CEGMA (Parra et al. 2008) and putative open reading frames (ORFs) were predicted using AUGUSTUS (Hoff & Stanke 2013) together with the gene models for F. graminearum and cDNA data from the closely related F. circinatum (Wingfield et al. 2012). By making use of MUMmer v. 3.22 (Kurtz et al. 2004), the F. temperatum scaffold sequences were compared to the chromosomes of two other sequenced members in the F. fujikuroi complex, F. fujikuroi (Wiemann et al. 2013) and F. verticillioides (Fusarium Comparative Sequencing Project) (De Vos et al. 2014).

Results and Discussion

Assembly of 188 294 812 good quality reads yielded a draft genome for Fusarium temperatum that consisted of 45 458 781 bp with 414x coverage. This assembly consists of 43 scaffolds with an N50 of 4 506 647 bp and an average scaffold size of 1 057 181 bp. Based on the CEGMA analysis, this draft genome is 97.38 % complete (Parra et al. 2008). The GC content is 47 %. The assembly contains 14 284 putative ORFs with an average length of 1576 bp and an average density of 314 ORFs/Mb. These genome statistics for F. temperatum are comparable to those of the other sequenced Fusarium members (Fusarium Comparative Sequencing Project, Wingfield et al. 2012, 2015, Wiemann et al. 2013), which highlights the genomic similarities amongst the members in the F. fujikuroi complex.
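The coverage and average scaffold size quoted above follow from simple ratios, sketched below as a quick consistency check. The calculation assumes every retained read is the full 100 bp, which is a simplification, and the function name is illustrative.

```python
def mean_coverage(n_reads, read_length_bp, assembly_bp):
    """Estimate mean sequencing depth as total sequenced bases / assembly size."""
    return n_reads * read_length_bp / assembly_bp


# Values taken from the text; the 100 bp read length is assumed uniform.
coverage = mean_coverage(188_294_812, 100, 45_458_781)
average_scaffold_bp = 45_458_781 / 43
```

Here `coverage` evaluates to approximately 414 and `average_scaffold_bp` to approximately 1 057 181, matching the reported figures.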

Sequence comparisons of the sixteen largest scaffolds (which account for 99.56 % of the total genome size) to the information for the F. verticillioides and F. fujikuroi genomes suggest that these scaffolds likely make up the 12 chromosomes predicted for species in the F. fujikuroi complex (Xu et al. 1995). This was further illustrated by the alignments of the F. temperatum scaffolds to the chromosome sequences for F. verticillioides and F. fujikuroi (Fig. 1). These alignments also confirmed the reciprocal translocation in F. temperatum and F. circinatum observed by de Vos et al. (2014) between chromosomes 8 and 11 (Fig. 1). The subtelomeric regions missing from chromosome 4 in F. fujikuroi (Wiemann et al. 2013) are present in F. temperatum (Fig. 1B), confirming that the shortened chromosome 4 is F. fujikuroi-specific.

Fig. 1 Whole genome comparisons of (A) Fusarium verticillioides to F. temperatum, and (B) F. fujikuroi to F. temperatum. Dotplot alignments of F. verticillioides scaffolds (placed into chromosomes; de Vos et al. 2014) and F. fujikuroi chromosomes (Wiemann et al. 2013) against the 16 largest F. temperatum scaffolds. Forward matches are indicated by red dots, reverse matches by blue dots. The black circles show the reciprocal translocation between chromosomes 8 and 11: the single scaffold representing chromosome 11 has a portion that aligns to chromosome 8 and a portion that aligns to chromosome 11 of F. verticillioides and F. fujikuroi. Solid arrows are indicative of inversions in F. temperatum, while the dotted arrow indicates an inversion in F. fujikuroi, when compared to the two other Fusarium spp.

Like F. verticillioides, F. temperatum also harboured the large inversion previously reported in chromosome 11 between F. verticillioides and F. fujikuroi (Wiemann et al. 2013, de Vos et al. 2014) (Fig. 1B), although F. temperatum appears to have an additional inversion in this chromosome when compared to F. verticillioides and F. fujikuroi (Fig. 1). Sequence comparisons also revealed that chromosome 12 is present in this F. temperatum assembly, albeit 1.42 times larger than its counterpart in F. fujikuroi (Wiemann et al. 2013). Within the F. fujikuroi complex, chromosome 12 has also been shown to be dispensable as well as strain-specific (Xu et al. 1995, Jurgenson et al. 2002, Ma et al. 2010, Wiemann et al. 2013, van der Nest et al. 2014a). Collectively, chromosomes 11 and 12 therefore seem to be the most variable of the chromosomes in this complex. The addition of the whole genome sequence of F. temperatum to those of the other sequenced members of the F. fujikuroi complex would assist phylogenomic studies into the evolution and biology of these important fungi.

Authors: L. De Vos*, Q.C. Santana, B.D. Wingfield, M.A. van der Nest, M.J. Wingfield, and E.T. Steenkamp

*Contact: Lieschen.devos@up.ac.za

IMA Genome-F 5E

Draft genome sequence of Graphilbum fragrans

Graphilbum is one of six currently recognized genera in Ophiostomatales (Ascomycota, Sordariomycetes) (de Beer & Wingfield 2013). The genus includes nine named species and some undescribed taxa (de Beer & Wingfield 2013). As with most other species of Ophiostomatales, species of Graphilbum are commonly found associated with coniferous hosts. Graphilbum fragrans was first described in 1954 from Sweden (Mathiesen-Käärik 1954), where it was initially treated in Graphium (as G. fragrans). This species was later reported from conifers or conifer-infesting beetles from various other countries including Australia, Canada, China, New Zealand, Korea, Poland, South Africa, Spain, and the USA (Harrington et al. 2001, Jacobs et al. 2003, Zhou et al. 2006, Kim et al. 2007, Romon et al. 2007, Paciura et al. 2010, Jankowiak & Bilański 2013).

The availability of whole genome sequences and recent advancements in genome analyses have contributed to a better understanding of the biology, pathogenicity and evolutionary processes in fungi. A number of genomes of species in Ophiostomatales have been sequenced and analysed; however, these include only species in Leptographium, Ophiostoma, and Sporothrix (DiGuistini et al. 2011, Haridas et al. 2013, Khoshraftar et al. 2013, Teixeira et al. 2014, van der Nest et al. 2014, Wingfield et al. 2015). The aim of this study was to generate the genome sequence for G. fragrans, the first genome available for the genus Graphilbum, and thus to provide a basis for comparison between the genera of Ophiostomatales.

Sequenced Strain

South Africa: Mpumalanga: from Hylastes angustatus infesting Pinus patula, 1999, X.D. Zhou (culture CMW 19357 = CBS 138720; PREM 61294 — dried culture).

Nucleotide Sequence Accession Number

The genomic sequence of Graphilbum fragrans (CMW 19357, CBS 138720) has been deposited at DDBJ/EMBL/GenBank under the accession LLKO00000000. The version described in this paper is version LLKO01000000.

Methods

Methods for DNA extraction, genome sequencing, assembly and annotation were similar to those used for Leptographium lundbergii (Wingfield et al. 2015). Total genomic DNA was extracted following the protocol of Duong et al. (2013). Two paired-end libraries (350 bp and 530 bp average insert size) were prepared and sequenced using the Illumina HiSeq 2000 platform. The reads obtained were first subjected to quality filtering and trimming, followed by de novo assembly using CLC Genomics Workbench v. 8.0.1 (CLCBio, Aarhus, Denmark). Genome completeness was estimated using BUSCO (Simão et al. 2015). Gene models were predicted using the MAKER genome annotation pipeline (Cantarel et al. 2008).

Results and Discussion

Over 26.2 million reads were obtained after filtering and trimming. De novo assembly using CLC Genomics Workbench resulted in 80 scaffolds that were over 500 bp in size. The assembly had an N50 value of 973.6 kb and the longest scaffold was 2.66 Mb. The genome of Graphilbum fragrans was estimated to be 34.26 Mb, with a mean GC content of 55.7 %. We assessed the completeness of the obtained genome by running BUSCO on the resulting assembly using the fungal reference dataset and obtained BUSCO values of C: 97 % [D: 5.8 %], F: 1.8 %, M: 0.6 %, n: 1348 (C: complete, [D: duplicated], F: fragmented, M: missed, n: genes), indicating that the obtained genome sequence should cover most of the organism’s gene space. Genome annotation using MAKER resulted in 10 633 gene models filtered based on MAKER max build (8 942 gene models if MAKER standard build was applied) (Campbell et al. 2014). Of the 10 633 gene models predicted using MAKER max build, 8 102 were multi-exonic genes; mean intron length was 121.1 bp and mean exon length was 552.8 bp. The genome of G. fragrans, which is the first genome reported for Graphilbum, represents a useful resource for various comparative genomic and systematic studies in Ophiostomatales.

Authors: T.A. Duong*, M.J. Wingfield, Z.W. de Beer, and B.D. Wingfield

*Contact: Tuan.Duong@fabi.up.ac.za

IMA Genome-F 5F

Draft genome sequence of Penicillium nordicum DAOMC 185683

Penicillium nordicum is classified in the subgenus Penicillium section Fasciculata (Houbraken & Samson 2011) and is commonly isolated from cheese, nuts and other fat and protein rich substrates like salami and ham (Frisvad & Samson 2004). The importance of this fungus relates to its production of the regulated mycotoxin ochratoxin A (OTA), which is hepatotoxic, nephrotoxic, teratogenic and immunotoxic in animals (Pitt et al. 2012), is known to promote oxidative DNA damage by the production of reactive oxygen species and to generate DNA adducts (Hadjeba-Medjdoub et al. 2012), and is classified as a possible human renal carcinogen (group 2B) by the International Agency for Research on Cancer (IARC, Pitt et al. 2012).

OTA is also produced by P. verrucosum, the sister species to P. nordicum (Samson et al. 2004), and by several species of Aspergillus (Visagie et al. 2014a). Despite the importance of OTA in grain, coffee and grape products, its biosynthetic pathway has yet to be fully elucidated. However, there is evidence that a gene cluster including an alkaline serine protease, a polyketide synthase and a non-ribosomal peptide synthase may play a role in OTA production in P. nordicum (Karolewiez & Geisen 2005, Geisen et al. 2006). In this study, we sequenced and annotated a draft genome of P. nordicum DAOMC 185683, as part of our investigation of genes regulating OTA production in Penicillium species.

Sequenced Strain

Canada: Alberta: Brooks, isolated from Lycopersicon esculentum (tomato), collected and isolated 24 Jan. 1983, R.J. Howard GT-78. (DAOMC 185683). Originally identified as Penicillium aurantiogriseum by John D. Bissett; reidentified as P. nordicum by Keith A. Seifert in 2012.

Nucleotide Sequence and Raw Reads Accession Numbers

This Whole Genome Shotgun project was deposited at DDBJ/EMBL/GenBank under accession LHQQ00000000. The version described in this paper is version LHQQ01000000. Raw reads were deposited in NCBI SRA (http://www.ncbi.nlm.nih.gov/sra) under accession number SRR2146067.

DNA Extraction, Whole Genome Sequencing and Assembly

Penicillium nordicum DAOMC 185683 was grown on Blakeslee’s malt extract agar for 7 d at 25 °C (Visagie et al. 2014b). To make a spore suspension, the colonies were flooded with 5 mL of sterile distilled water. One mL of this spore suspension was inoculated in 100 mL of Blakeslee’s malt extract broth and was left shaking at 300 rpm at 25 °C for 6 d. To obtain fungal tissue for DNA extraction, cells were removed from the liquid culture by filtration. DNA was extracted with the OmniPrep kit for fungi (G-Biosciences) following the manufacturer’s protocol. Whole-genome sequencing (paired-end with 101 bp) was performed on an Illumina HiSeq 2500 with TruSeq V3 chemistry at the National Research Council Canada in Saskatoon (Saskatchewan, Canada).

The quality of genomic reads was determined with FastQC v. 0.10.1 (http://www.bioinformatics.babraham.ac.uk/projects/fastqc/). Using fastx_trimmer (part of the FASTX-Toolkit v. 0.0.13; http://hannonlab.cshl.edu/fastx_toolkit/), 10 bases were trimmed from the 5′ end of each read to yield higher quality reads of 91 bp. Adaptor sequences were removed with Trimmomatic v. 0.33 (Bolger et al. 2014). Prior to genome assembly, the optimal k parameter was calculated with KmerGenie v. 1.6950 (Chikhi & Medvedev 2014). Error correction was performed on the trimmed reads with BayesHammer (Nikolenko et al. 2013).
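The 5′ trimming step can be illustrated with a minimal Python sketch. This is not the fastx_trimmer implementation; the function name trim_five_prime and the in-memory record layout (header, sequence, quality) are assumptions made purely for illustration, and the read shown is synthetic.

```python
from typing import Iterable, Iterator, Tuple

# Hypothetical in-memory FASTQ record: (header, sequence, quality string)
FastqRecord = Tuple[str, str, str]


def trim_five_prime(records: Iterable[FastqRecord],
                    n_bases: int = 10) -> Iterator[FastqRecord]:
    """Drop the first n_bases from the sequence and quality of each read."""
    for header, seq, qual in records:
        if len(seq) != len(qual):
            raise ValueError(f"sequence/quality length mismatch in {header}")
        yield header, seq[n_bases:], qual[n_bases:]


# A synthetic 101 bp read; trimming 10 bases leaves 91 bp.
read = ("@read1", "A" * 101, "I" * 101)
trimmed = next(trim_five_prime([read]))
print(len(trimmed[1]))
```

Trimming the quality string together with the sequence keeps the two in register, which downstream tools that read FASTQ generally require.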

De novo genome assembly was performed with SPAdes v. 3.5.0 (Bankevich et al. 2012) with the option to correct mismatches and short indels enabled. Scaffolds shorter than 1000 bp were discarded. Species identification was confirmed by comparing the internal transcribed spacer (KJ834513) and beta-tubulin (KJ834476) barcode sequences of P. nordicum (Visagie et al. 2014b) against the assembled genomic scaffolds using BLASTn. Assembly statistics were generated with QUAST v. 2.3 (Gurevich et al. 2013).
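As a rough illustration of the length filter and of the N50 statistic reported by tools such as QUAST, the following Python sketch applies a 1000 bp cutoff and computes N50 on toy data. The function names and the toy scaffolds are hypothetical, and the N50 definition used (the length of the shortest scaffold in the smallest set of longest scaffolds that together hold at least half of the assembled bases) is the commonly used one rather than code taken from QUAST.

```python
from typing import Dict, List


def filter_scaffolds(scaffolds: Dict[str, str],
                     min_length: int = 1000) -> Dict[str, str]:
    """Keep only scaffolds that are at least min_length bases long."""
    return {name: seq for name, seq in scaffolds.items()
            if len(seq) >= min_length}


def n50(lengths: List[int]) -> int:
    """Length L such that scaffolds of length >= L hold >= 50 % of all bases."""
    total = sum(lengths)
    running = 0
    for length in sorted(lengths, reverse=True):
        running += length
        if 2 * running >= total:
            return length
    return 0  # empty input


# Toy scaffolds (not real data): the 400 bp scaffold is discarded.
toy = {"s1": "A" * 5000, "s2": "C" * 3000, "s3": "G" * 1500, "s4": "T" * 400}
kept = filter_scaffolds(toy)
print(sorted(kept))
print(n50([len(seq) for seq in kept.values()]))
```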

The assembly was assessed by alignment of the corrected reads onto the scaffolds using Bowtie2 v. 2.0.0 (Langmead & Salzberg 2012). Alignments produced by Bowtie2 in SAM format were converted to sorted BAM format by SAMtools v. 0.1.19 (Li et al. 2009) and statistics for nucleotide coverage were generated with Qualimap v. 2.1 (Garcia-Alcalde et al. 2012). To evaluate the completeness of our genome assembly, CEGMA v. 2.5 (Parra et al. 2007) was run on the scaffolds to detect the percentage of conserved eukaryotic genes (CEGs) and BUSCO v. 1.1b1 (http://busco.ezlab.org/) was run on the scaffolds using the fungal profile (Dec. 19, 2014 release) to detect Universal Single-Copy Orthologs.

Genome annotation was carried out using webAugustus (Hoff & Stanke 2013) running Augustus v. 3.0.3 (Stanke et al. 2006). Predicted proteins were compared against the manually curated UniProt/Swiss-Prot fungal protein data set by BLASTp v. 2.2.28+. BLAST hits with e-values less than 1.0E-100 and similarity ≥ 90 % were assumed to be orthologs and were given protein names in the annotation set. Genome Annotation Generator (http://genomeannotation.github.io/GAG/) and tbl2asn (http://www.ncbi.nlm.nih.gov/genbank/tbl2asn2/) were used to validate annotations.
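The hit-filtering rule can be sketched in Python as follows. The sketch assumes BLAST tabular output with the default 12 columns (query, subject, percent identity, ..., e-value, bit score) and uses the percent identity column as a stand-in for the similarity threshold; the output format actually used in this study is not stated, and the gene and protein identifiers below are invented for illustration.

```python
from typing import Iterable, Iterator, Tuple


def filter_hits(lines: Iterable[str],
                max_evalue: float = 1e-100,
                min_percent: float = 90.0) -> Iterator[Tuple[str, str]]:
    """Yield (query, subject) pairs passing both thresholds.

    Assumes 12-column tab-separated BLAST output; column 3 (percent
    identity) is used here as a proxy for similarity.
    """
    for line in lines:
        if not line.strip() or line.startswith("#"):
            continue  # skip blank and comment lines
        fields = line.rstrip("\n").split("\t")
        query, subject = fields[0], fields[1]
        percent = float(fields[2])
        evalue = float(fields[10])
        if evalue < max_evalue and percent >= min_percent:
            yield query, subject


# Invented example rows: only the first passes both thresholds.
rows = [
    "gene1\tPROT_A\t95.2\t400\t19\t0\t1\t400\t1\t400\t1e-150\t800",
    "gene2\tPROT_B\t85.0\t400\t60\t0\t1\t400\t1\t400\t1e-120\t600",
    "gene3\tPROT_C\t99.0\t80\t1\t0\t1\t80\t1\t80\t1e-40\t150",
]
named = list(filter_hits(rows))
print(named)
```

Requiring both a very small e-value and a high percentage is deliberately conservative: the second row fails on percentage despite a strong e-value, and the third fails on e-value because the alignment is short.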

Results and Discussion

Approximately 22 million reads, comprising 2.2 Gbp of data, were assembled into 996 scaffolds, resulting in an assembly of 30.8 Mb with a GC content of 47.8 %. The N50 value was 92.3 kb and the longest scaffold was 391 kb. The median nucleotide coverage across the whole assembly was 57×. The assembled genome had a CEGMA score of 96.8 % when calculated from the complete gene set and 98.4 % when calculated from both partial and complete gene sets. Assessment of the completeness of the genome using BUSCO groups for fungi resulted in values of C: 99 %, [D: 6.8 %], F: 0.7 %, M: 0.1 %, n: 1438 (C: complete, [D: duplicated], F: fragmented, M: missing, n: genes). Therefore, the assembled genome covered most of the organism’s gene content. After annotation and validation, the genome contained 12 959 protein-coding genes. Of all suggested gene models, 12 448 were complete (96.0 %), but 511 gene models lacked a start codon, stop codon or both (4.0 %). Mean gene length was 1388 bp, mean exon length was 437 bp and mean intron length was 85 bp. One other P. nordicum genome is accessioned in NCBI (JNNR), sequenced from a strain isolated from crop fields in Karlsruhe, Germany (UASWS BFE487). A comparative analysis has not yet been published. That genome is similar in size to ours (30.4 Mb in 915 scaffolds), but it has less than half the coverage (20×) and only 46 genes were annotated.

This draft genome of a North American strain of P. nordicum, the first record of this species from Canada, represents a useful resource for biogeographical and comparative genomic studies of OTA (ochratoxin A) producing species of Penicillium, Aspergillus, and other related fungi. It will facilitate future gene knockout studies aiming to uncover the full OTA biosynthetic pathway in P. nordicum.

Authors: H.D.T. Nguyen* and K.A. Seifert

*Contact: hai.nguyen.1984@gmail.com

IMA Genome-F 5G

Draft genome sequence of the banana pathogen Thielaviopsis musarum

Thielaviopsis musarum is a pathogen of banana (Mitchell 1937, Riedl 1962) that typically infects banana fruits during maturation. This is especially true under conditions of high humidity, darkness and moderately steady temperatures (Riedl 1962).

Thielaviopsis musarum was previously treated as Ceratocystis musarum, but was transferred to Thielaviopsis as part of a major revision of the family Ceratocystidaceae by de Beer et al. (2014). The fungus was first reported as a new variety of C. paradoxa causing stem-end rot of banana in Australia (Mitchell 1937). Riedl (1962) isolated a similar fungus from banana stems in Vienna, although the plant material probably originated elsewhere, and described it as a new species distinct from C. paradoxa. Although some authors regarded the species from banana as distinct from C. paradoxa (de Hoog 1974, Nag Raj & Kendrick 1975), others viewed C. musarum as a synonym of C. paradoxa (Upadhyay 1981). These disputes have, however, been settled by DNA-based studies (Harrington 2009, de Beer et al. 2013) and T. musarum is now recognised as a distinct species in Thielaviopsis. The aim of this study was to sequence and assemble the whole genome of an isolate of T. musarum. This was undertaken to provide information allowing for the recognition of fungal genes that are associated with pathogenicity and other important biological properties in members of Ceratocystidaceae.

Sequenced Strain

New Zealand: on Musa sp., T.W. Canter Vischer (PREM 60962 — epitype, dried culture; CMW 1546 — ex-epitype culture).

Nucleotide Sequence Accession Numbers

This Whole Genome Shotgun project has been deposited at DDBJ/EMBL/GenBank under the accession LKBB00000000. The version described in this paper is version LKBB01000000.

Methods

Isolate CMW 1546 (CBS 139399) of Thielaviopsis musarum was grown on malt extract agar (MA). High quality DNA was isolated from harvested mycelium (Raeder & Broda 1985) and sequencing was performed on the Genome Analyzer IIx platform (Illumina) using paired-end libraries with insert sizes of approximately 350 and 600 bases. Reads with an average length of 97 bases were quality-trimmed using the software package CLC Genomics Workbench v. 6.0.1 (CLCBio, Aarhus, Denmark). The quality-filtered reads were assembled using the Velvet de novo assembler (Zerbino & Birney 2008), with an optimized k-mer size of 77. We used SSPACE v. 2.0 (Boetzer et al. 2011) to assemble contigs into scaffolds and gaps were filled using GapFiller v. 2.2.1 (Boetzer & Pirovano 2012). The completeness of the assembled genome was assessed using the Benchmarking Universal Single-Copy Orthologs (BUSCO) tool (software v. 1.1b1 of May 2015; Simão et al. 2015). The BUSCO analysis was performed on all contigs >1 kb, making use of the fungal lineage dataset.

AUGUSTUS (Hoff & Stanke 2013) and the gene models for Fusarium graminearum were used to identify putative open reading frames (ORFs).

Results and Discussion

The Thielaviopsis musarum draft genome had an estimated size of 28 493 324 bases, 95× coverage, an N50 contig size of 103 017 bases and a mean GC content of 49.17 %. The assembly was composed of 672 contigs, of which 541 were larger than 1 kb. Based on the BUSCO analysis, this assembly is 96 % complete. A total of 1392 single-copy BUSCO orthologs were present, of which 78 were duplicated. Out of a possible 1438 BUSCO groups searched, 11 BUSCO groups were missing or fragmented. The final assembly was predicted to encode 6 963 putative ORFs at a density of 244 ORFs/Mb.

The T. musarum genome appears to be relatively small and harbours fewer genes than those of other Sordariomycetes (e.g., Fusarium fujikuroi, 43.8 Mb with 14813 ORFs; Cryphonectria parasitica, 43.9 Mb with 11184 ORFs) (Wiemann et al. 2013; http://genome.jgi.doe.gov/Crypa2/Crypa2.home.html). The genome size of T. musarum was, however, in the same range as those of some species of Ceratocystidaceae, such as Ceratocystis manginecans (31.7 Mb with 7494 ORFs), C. albifundus (27.1 Mb with 6967 ORFs), C. fimbriata (29.4 Mb with 7266 ORFs), Huntiella omanensis (31.5 Mb with 8395 ORFs), and H. moniliformis (25.5 Mb with 6832 ORFs) (Wilken et al. 2013, van der Nest et al. 2014a, b).

The Thielaviopsis musarum genome was only marginally larger than the 28.1 Mb genome of T. punctulata (accession number: LAEV00000000). However, T. punctulata was reported to encode 5480 ORFs (Wingfield et al. 2015) as opposed to the 6963 of T. musarum, suggesting a higher ORF density for the latter (i.e., 244 ORFs/Mb for T. musarum vs. 195 ORFs/Mb for T. punctulata). Future research should, therefore, consider whether these differences in genome size and ORF density could be ascribed to differences in the methodologies used to sequence and annotate the respective genomes. Overall, these genomes will provide interesting perspectives regarding the development and evolution of important biological traits in these fungi.
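The ORF densities quoted above follow directly from the reported ORF counts and genome sizes, as this short Python check shows. The function name orf_density is a hypothetical helper, and the 28.1 Mb genome size is taken as exactly 28 100 000 bases for the purpose of the calculation.

```python
def orf_density(n_orfs: int, genome_size_bp: int) -> float:
    """Return ORFs per megabase (1 Mb = 1 000 000 bp)."""
    return n_orfs / (genome_size_bp / 1_000_000)


# Values as reported in the text; 28.1 Mb is assumed to be 28 100 000 bp.
t_musarum = orf_density(6963, 28_493_324)
t_punctulata = orf_density(5480, 28_100_000)
print(round(t_musarum), round(t_punctulata))
```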

Authors: M.A. Sayari*, C. Trollip, K. Naidoo, B.D. Wingfield, and M.J. Wingfield

*Contact: Mohammad.Sayari@fabi.up.ac.za