Introduction

Auricularia heimuer F.Wu, B.K.Cui & Y.C.Dai, an ear-like shaped macro-fungus, is one of the most widely cultivated edible mushrooms in China due to its nutritional and medicinal properties (Fan et al. 2014). It has potential anti-tumor, anti-inflammatory, anticoagulant and hypoglycemic activities (Misaki et al. 1981; Ukai et al. 1983; Yuan et al. 1998; Yoon et al. 2003). Furthermore, its extracts have been reported to possess antioxidant and nitric oxide synthase activation properties (Acharya et al. 2004).

A.heimuer is a white rot fungus which absorbs its nutrients by degrading lignin, cellulose and hemicellulose. In recent years, with the increasing demand of A.heimuer both at home and abroad, large numbers of trees are utilized to cultivate this fungus. However, the cultivation of A.heimuer has not been reported at a molecular level due to insufficient genomic sequence resources.

Over the past several years, next-generation sequencing (NGS) techniques (Roche 454, Illumina Solexa GA and ABI SOLID) have dramatically improved the efficiency of gene discovery and have provided a platform for advanced research in many areas, including genome sequencing, ChIP-seq, methyl-seq and de novo transcriptome sequencing (MacLean et al. 2009; Metzker 2010). Recently, NGS technology has been widely used with medicinal and edible mushrooms, such as Lentinula edode (Berk.) Pegler, Ganoderma lucidum (Curtis) P. Karst, Agrocybe aegerita (V. Brig.) Singer, Cordyceps militaries (BerK.)Sacc. (Yin et al. 2012; Yu et al. 2012; Wang et al. 2013; Zhong et al. 2013). In spite of its important properties, RNA-Seq has not been applied to A. heimuer research.

The aims of this study were to provide an invaluable resource for functional genomic studies in A. heimuer using RNA-Seq. The genome of A. heimuer has just become available (Yuan et al. 2019) and hence, our results, together with the genome should facilitate future developmental studies of this important fungus.

Materials and methods

A. heimuer DL202 strain

The A. heimuer DL202 strain was isolated from the wild strain LSA007 which had been collected from Quercus mongolica Fisch. ex Ledeb. by tissue isolation in August 2008.

Sample preparation and RNA isolation

The DL202 strain was cultured on a potato dextrose agar (PDA) medium containing 1.5% agar and 2.0% glucose at 25 °C in the dark. After seven days, the mycelium was transferred to a liquid potato dextrose (PD) medium containing 2.0% glucose and cultivated at 25 °C and at 180 rpm using a shaker. After 10 days, the mycelium pellets were collected and immediately frozen in liquid nitrogen and stored at − 80 °C until RNA isolation.

Total RNA from the A. heimuer mycelium was extracted using TRIzol reagent (Invitrogen Life Technologies, USA) following the manufacturers’ protocol, and then treated with TaKaRa RNase-free DNase I for 30 min. The total RNA quantity and quality were checked with a NanoDrop 1000 spectrophotometer.

cDNA library construction and transcriptome sequencing

A total of 20 μg RNA was isolated from the mycelium for cDNA library construction. Beads with oligo (dT) were used to isolate poly(A) mRNA. Fragmentation buffer was added for interrupting mRNA into short fragments (200–700 nt). These short sequences were used as templates to synthesize the first-strand cDNA using random hexamer primers and reverse transcriptase. The second-strand cDNA was synthesized by adding RNase H, DNA polymerase I, buffer and dNTPs. The products were purified with a QIAquick PCR extraction kit and resolved with EB buffer for end reparation poly(A) addition. Subsequently, the cDNA fragments were connected with sequencing adapters. Following agarose gel electrophoresis and extraction of cDNA from gels, suitable fragments were selected as templates for PCR amplification. Finally, the constructed cDNA library was sequenced using Illumina HiSeq™ 2000 at Beijing Genome Institute.

Sequence assembly

Before assembly and mapping, low-quality sequences and empty reads were removed to obtain high-quality clean reads. De novo assembly of the clean reads was carried out using the SOAPdenovo program (Li et al. 2010). The contigs were generated by combining the overlap from the reads, and the reads mapped back to contigs. With paired-end reads, the SOAPdenovo program detected contigs from the same transcript, and scaffolds were made by connecting contigs using ‘N’ to represent unknown sequences. Subsequently, unigenes were obtained by filling the gap of scaffolds using paired-end reads again. The unigenes had the least number of ‘N’s. Finally, the BLASTx algorithm (E-value < 0.00001) was used to search for homologous sequences against protein databases such as COG, Swiss-Prot, Nr and KEGG. The best aligning results were used to predict unigenes direction. When a unigene happened to be unaligned to all the above databases, ESTScan software was used to decide its sequence direction as well as to predict its coding regions (Iseli et al. 1999).

Gene annotation and analysis

The annotation of unigenes was performed by searching for homologous sequences against the NCBI Nr database with BLASTx software (E-value < 0.00001). Based on the results, GO annotation of unigenes of A. heimuer was obtained using the Blast2GO program (Conesa et al. 2005). GO functional classification for unigenes of A. heimuer was analyzed by WEGO software (Ye et al. 2006). Pfam domain annotation was predicted by searching against the Pfam database (Finn et al. 2014) using protein sequences. The metabolic pathways, such as cellular processes, genetic information processing, organismal systems, were analyzed by KEGG database (Ogata et al. 1999). Unigenes were submitted to the KEGG Automatic Annotation Server (KAAS) (Moriya et al. 2007) to obtain the metabolic pathway annotation.

Analysis of CAZymes and FOLymes

Putative carbohydrate-active enzymes and lignin oxidative enzymes were predicted by BLASTx software (E-value < 10–20) using known protein sequences as queries.

Results

RNA sequencing and mapping

After cleaning and quality checks, more than 51.7 million clean reads were obtained in a single sequencing run. The Q20 value was 90.9% more than 80.0% and G + C content was 58.9%. By de novo assembly, 26,857 unigenes with an average length of 985 bp and N50 length of 1333 bp were obtained (Table 1). Among them, 9158 unigenes (34.1%) were > 1000 bp. In addition, 20 unigenes were randomly selected from A.heimuer transcriptome for RT-PCR amplification to test the accuracy of the sequencing results. All the PCR products were sequenced and compared with the original sequence of 20 unigenes using BLASTn (Table 2). The results show that the average accuracy rate of illumina sequencing technology was 99.2%.

Table 1 Overview of the sequencing and assembly
Table 2 The identity between the original sequence of unigenes in the A.heimuer transcriptome and PCR products

Functional annotation

The Nr, Swiss-Prot, KEGG and COG databases were used for functional annotation of assembled unigenes. The number of unigenes with significant similarity to sequences in the above four databases were 17,546 (65.3%), 13,265 (49.4%), 11,343 (42.2%), and 8639 (32.2%), respectively.

Gene ontology (GO) was used to classify the functions of DL202 unigenes. There are three main categories in GO classification: biological processes, cellular components and molecular functions. In the biological processes, ‘‘metabolic process’’ was the largest subcategory (13.6%) followed by ‘‘cellular processes’’ (11.4%). Under the category of cellular component, ‘‘cell’’ (10.3%) and ‘‘cell part’’ (8.1%) represented the majority of the category. In terms of molecular function, ‘‘catalytic activity’’ (20.9%) and ‘‘binding’’ (12.3%) were highly represented (Fig. 1).

Fig. 1
figure 1

Histogram of gene ontology classification. (The results are summarized in three categories: biological processes, cellular components; molecular functions)

COG (clusters of orthologous groups) classifications were used to further evaluate the completeness and effectiveness of A.heimuer transcriptome. In total, 20,096 unigenes had a COG classification. Among the 25 COG categories, ‘‘general function prediction only’’ represented the largest group (14.8%) followed by ‘‘carbohydrate transport and metabolism’’ (8.5%), and the smallest group was ‘‘nuclear structure’’ (6 members) (Fig. 2).

Fig. 2
figure 2

Histogram of clusters of orthologous group (COG) classifications; (20,096 unigenes have a COG classification among the 25 categories)

After searching the Pfam database, it was discovered that 10,760 unigenes contained more than one Pfam protein domain and were categorized into 3585 Pfam domains. Several unigenes were involved in metabolism, transcription and protein synthesis (Supplementary Table S1).

To identify the biological pathways active in A.heimuer DL202, 11,343 unigenes were mapped into 160 KEGG metabolic pathways. The pathways with most representation were “metabolic pathways” (37.1%) and “biosynthesis of secondary metabolites” (14.8%), followed by “starch and sucrose metabolism” (11.9%) and “purine metabolism” (7.9%) which are involved in the regulation of A.heimuer basic metabolic processes (Supplementary Table S2).

Putative wood-degrading genes in Auricularia heimuer

The components of plant cell walls are cellulose, hemicellulose and lignin. A.heimuer is an edible mushroom as well as a white rot fungus that degrades these components because it contains specific enzymes (Leonowicz et al. 1999). Lignin-degrading enzymes consist lignin oxidases (LOs) and lignin-degrading auxiliary enzymes (LDAs) (Levasseur et al. 2008). To obtain the related coding, lignin-degrading enzyme genes from A. heimuer, BLASTx (E-value < 10–20), was used in our study with known protein sequences as queries. The results showed 16 and 22 putative genes in the LO and LDA families, respectively (Table 3). Among the putative 16 genes involved in the LO enzyme, seven coded laccase (LO1), seven peroxidase (LO2), and two coded cellobiose dehydrogenase (LO3). In addition, among the 22 potential genes involved in the LDA family, one gene coded glyoxal oxidase (LDA3), nine coded pyranose oxidase (LDA4), seven coded glucose oxidase (LDA6), and five genes coded alcohol oxidase (LDA8).

Table 3 Putative wood-degrading genes in the A. heimuer transcriptome

Glycoside hydrolases (GH), glycosyl transferases (GT), carbohydrate esterases (CE) and polysaccharide lyases (PL) are four main members of the CAZymes or carbohydrate-active enzyme family (Cantarel et al. 2009). In our results, a total of 251 unigenes from A. heimuer transcriptome belonged to CAZymes. Among them, the GH superfamily contained 206 unigenes, GT five unigenes, CE 31, and PL had nine. Moreover, the largest number of GH superfamily was GH16 (24 members), followed by GH13 (21 members) and GH5 (16 members) (Table 3).

Discussion

Sanger et al. (1977) developed DNA sequencing technology in 1977. After several years of improvement, the next-generation sequencing (NGS) technologies, such as Illumina/Solexa, Roche 454 and ABI SOLiD System sequencing platforms with reduced cost and manpower were released (Liu et al. 2012). Among these technologies, the Illumina HiSeq 2000 has become widely available in different organisms for its vast output and low cost (Yin et al. 2012; Yu et al. 2012; Li et al. 2013; Wang et al. 2013, 2014; Zhong et al. 2013; Yang et al. 2014).

Auricularia heimuer, a traditional edible fungus in China, has very high nutritional value and various pharmacological functions (Zhang et al. 2011; Zeng et al. 2012). In recent years, many countries have energetically advocated environmental protection for this species. Trees have protected from removal and this has resulted in restriction of fungus production. In this situation, molecular biology methods might be applied to resolve this problem. However, little functional genomic information about this valuable fungus is known. To get a better understanding of A. heimuer at the molecular level, insight into transcriptome is required.

In this study, the transcriptome of A. heimuer was performed by RNA-Seq. A total of 26,857 unigenes were obtained by de no assembly. Moreover, 38 and 251 putative unigenes coding FOLymes and CAZymes, respectively, were screened from the transcriptome by the sequence comparison method. Furthermore, candidate genes coding FOLymes and CAZymes were also found in other wood rot fungi such as Ganoderma lucidum (Curtis) P. Karst and Inonotus baumii T. Wagner & M. Fisch. (Yu et al. 2012; Zou et al. 2016). Surprisingly, the number of genes related to lignin and cellulose degrading enzymes from these two fungi is less than A. heimuer and may mean that the ability to degrade wood is stronger in A. heimuer. Among the lignin oxidases (LOs), laccase (EC1.10.3.2) has gained attention due to its contribution in lignin degradation. In wood rot fungi, it has been considered the fourth contributor for lignin degradation (Hatakka and Hamm 2011). However, this has been controversial since Phanerochaete chrysosporium,Burdsall, a white rot fungus, was found to lack the typical laccase sequence (Martinez et al. 2004). Yet there is increasing evidence that supports laccase involvement in lignin degradation (Munk et al. 2015). In our results, seven laccase genes were detected in the A. heimuer transcriptome. Further studies of these genes are required. Additionally, glycoside hydrolases (GH) play an important role in the degradation of cellulose. Cellulases are classified into 14 glycoside hydrolase families: GH 5, 6, 7, 8, 9, 10, 12, 26, 44, 45, 48, 51, 61 and 74 (Henrissat 1991). In our work, GH 5, 7, 12 and 45 were found in A. heimuer. Among these GH superfamilies, GH 5 had the largest number of unigenes (16). Candidate genes, possibly related to cellulose degradation, were identified with homology cloning, and their functions are being analyzed (data not shown).

In summary, de novo characterization of A. heimuer transcriptome provided insight into the genetic background of this valuable fungus. In addition, the feasibility of illumina sequencing technology has been demonstrated by RT-PCR amplification. The putative genes obtained and involved in wood degrading in A. heimuer will be further identified.