Introduction

Long noncoding RNAs (lncRNAs) are RNA molecules longer than 200 nucleotides that carry many signatures of mRNAs, such as 5′ capping, 3′ polyadenylation, and RNA splicing, but have little or no open reading frame (Bhartiya et al. 2012; Liao et al. 2011; Carninci et al. 2005). They have emerged as a new class of regulatory transcripts in recent years (Perkel 2013; Khaitovich et al. 2006). Recent advances in sequencing technologies have opened a new horizon for the identification and annotation of this class of RNAs in many species. The lncRNAs that are transcribed from intergenic regions of genomes are termed large intergenic noncoding RNAs (lincRNAs). Because lincRNAs do not overlap with protein-coding regions, they are easier to analyze computationally. To date, at least 15,512 human lincRNAs and over 10,000 mouse lincRNAs have been identified (Derrien et al. 2012; Luo et al. 2013).

Recent studies have supported the view that lincRNAs play important roles in many biological processes, such as p53 response pathways (Huarte et al. 2010; Loewer et al. 2010; Hung et al. 2011), regulation of epigenetic marks and gene expression (Rinn et al. 2007; Zhao et al. 2008; Khalil et al. 2009; Pandey et al. 2008), maintenance of pluripotency (Guttman et al. 2009), and activation of gene expression as “enhancer RNAs” (Orom et al. 2010; Wang et al. 2011). In addition, lincRNAs have been associated with human diseases and pathophysiological conditions (Gupta et al. 2010; Zhu et al. 2011; Cabianca et al. 2012).

Rainbow trout (Oncorhynchus mykiss) is a species of salmonid native to cold-water tributaries of the Pacific Ocean in Asia and North America. It is one of the most important cold-water fish species in the USA due to its importance for food production, sport fisheries, and as a research model (Thorgaard et al. 2002). To generate genomic resources for genetic studies of this species, we have characterized the rainbow trout mRNA and microRNA transcriptomes (Ma et al. 2012; Salem et al. 2010a, b, 2015). In particular, a complete transcriptome has been generated by RNA sequencing of cDNA libraries from multiple tissues of a single doubled haploid rainbow trout (Salem et al. 2015). With the increasing evidence supporting important roles of lincRNAs in diverse processes, a systematic catalog of these RNA transcripts and their expression across tissues in rainbow trout is warranted. The recent publication of the rainbow trout genome sequence (Berthelot et al. 2014) and of computational methods for transcriptome reconstruction (Guttman et al. 2010; Trapnell et al. 2009; Garber et al. 2011) provides an opportunity to comprehensively annotate and characterize lincRNA transcripts in rainbow trout.

Here we report the systematic identification and characterization of lincRNAs in 15 major tissue types of rainbow trout. We analyzed the known genomic features of the identified lincRNAs, including transcript length, exon number, and spatiotemporal expression specificity. We also used weighted gene co-expression network analysis to assign putative functions to the lincRNAs. This analysis revealed that lincRNAs are expressed in a strongly tissue-specific manner and that many of them are highly associated with biological processes specific to the tissue in which they are expressed (e.g., a brain-specific group is enriched with functional terms such as neural development and axon injury response). This study is the first report of a genome-wide annotation of rainbow trout lincRNAs, which will facilitate future experimental and computational studies to uncover the functions of lincRNAs in rainbow trout.

Materials and Methods

Tissue Sample Collection and RNA Sequencing

Tissue collection and RNA sequencing were described in detail in a previous study (Salem et al. 2015). In brief, 13 different tissues were collected from a single male homozygous rainbow trout, which was euthanized under protocol no. 02456 approved by the Washington State University Institutional Animal Care and Use Committee. These tissues include the brain, fat, gill, head kidney, intestine, kidney, liver, testis, red muscle, skin, spleen, stomach, and white muscle. In addition, oocyte and pineal samples were collected from different fish. Total RNA from each sample was isolated using Trizol (Invitrogen, Carlsbad, CA). Library construction and sequencing were performed at Roy J. Carver Biotechnology Center, University of Illinois at Urbana-Champaign. Each library was loaded onto one lane and paired-end sequencing with 2 × 100 cycles was performed on an Illumina Genome Analyzer IIx (Illumina, San Diego, CA).

RNA-Seq Reads Mapping and Transcriptome Assembly

The spliced read aligner TopHat version 2.0 (Trapnell et al. 2009) was used to map all sequence reads to the rainbow trout genome (Berthelot et al. 2014). A two-step mapping process was performed by TopHat using the following parameters: min-anchor = 5, min-isoform-fraction = 0, and default values for the remaining parameters. Bowtie2 (Langmead and Salzberg 2012) was used first to align reads that map directly to the genome reference sequence without gaps. Gapped alignment was then performed to align the reads that were not aligned in the first step. The aligned reads from each sample were assembled into a transcriptome by Cufflinks version 2.2.1 (Trapnell et al. 2010), which uses spliced read information to determine exon connectivity. Cufflinks reports transcript expression as fragments per kilobase of exon per million fragments mapped (FPKM), a value that is directly proportional to the relative abundance of a transcript in a given sample.
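
As a rough illustration of the FPKM unit, the sketch below computes the plain ratio behind the name. Cufflinks estimates abundances with its own statistical model, so this is not the Cufflinks implementation; the function name and the example numbers are hypothetical.

```python
def naive_fpkm(fragments: int, exonic_length_bp: int, total_mapped_fragments: int) -> float:
    """Fragments per kilobase of exon per million mapped fragments (naive ratio)."""
    if exonic_length_bp <= 0 or total_mapped_fragments <= 0:
        raise ValueError("length and library size must be positive")
    kilobases = exonic_length_bp / 1_000
    millions = total_mapped_fragments / 1_000_000
    return fragments / (kilobases * millions)


# Hypothetical transcript: 500 fragments, 2 kb of exons, 50 million mapped fragments.
example_fpkm = naive_fpkm(fragments=500, exonic_length_bp=2_000,
                          total_mapped_fragments=50_000_000)
print(example_fpkm)
```

Dividing by both transcript length and library size is what makes values comparable across transcripts and across samples of different sequencing depth.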

FPKM Threshold for Classifying Complete and Partial Transcripts

Individual transcript assemblies may contain noise from multiple sources, such as artifacts generated by sequence alignment, unspliced intronic pre-mRNA, or genomic DNA contamination. Sebnif (Sun et al. 2014), an integrative bioinformatics pipeline that identifies high-quality single- and multi-exonic lincRNAs by optimizing an FPKM threshold, was used to minimize the assembly noise and enhance the quality of identified lincRNAs. Because multi- and single-exonic transcripts differ in structure, two separate algorithms were used to identify the optimal FPKM thresholds. (1) For multi-exonic transcripts, Sebnif used a fully reconstruction fraction estimation (FRFE) approach (Guttman et al. 2010). Briefly, multi-exonic transcripts in the reference annotation were first divided into N expression quantiles based on their FPKM values. At each expression quantile, the reference transcript set was then divided into two categories, fully reconstructed transcripts and partially reconstructed transcripts. The assembly quality was evaluated by the proportion of fully reconstructed transcripts, also called the fully reconstruction fraction (FRF), at each expression quantile. The index of the optimum FPKM threshold was obtained by balancing the sensitivity and specificity based on the FRF value with the following formula (Sun et al. 2012):

$$ i*=\underset{i\in I}{ \arg\;\min}\left\{\sqrt{{\left(1-\mathrm{sensitivities}\left[i\right]\right)}^2+{\left(1-\mathrm{specificities}\left[i\right]\right)}^2}\right\} $$

where i* is the index of the optimum FPKM threshold, I = [1, N] is the set of expression quantile indices, and sensitivities[i] and specificities[i] are the sensitivity and specificity at the ith quantile, respectively. The optimum FPKM threshold was generated by pROC (Robin et al. 2011). (2) For single-exonic transcripts, single-exonic transcript Gaussian/gamma estimation (STGE) was implemented to estimate the optimal expression threshold (Sun et al. 2014). In the STGE algorithm, the appropriate model was determined by fitting the expression values of the single-exonic transcripts in the reference annotation. Any transcript whose expression fell into either tail of the fitted distribution was considered unreliable and discarded.
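
The distance criterion in the formula above picks the candidate threshold whose (sensitivity, specificity) pair lies closest to the ideal point (1, 1). A minimal sketch of that selection follows; it is not the Sebnif or pROC code, the function name and the example values are hypothetical, and indices are 0-based here whereas the formula counts from 1.

```python
import math


def closest_to_perfect(sensitivities, specificities):
    """Return (index, distance) of the point nearest to sensitivity = specificity = 1."""
    if not sensitivities or len(sensitivities) != len(specificities):
        raise ValueError("need two non-empty lists of equal length")
    distances = [math.hypot(1 - se, 1 - sp)
                 for se, sp in zip(sensitivities, specificities)]
    best = min(range(len(distances)), key=distances.__getitem__)
    return best, distances[best]


# Hypothetical values for four candidate thresholds (low to high FPKM).
sens = [1.00, 0.90, 0.70, 0.40]
spec = [0.10, 0.60, 0.85, 0.98]
best_index, best_distance = closest_to_perfect(sens, spec)
print(best_index, round(best_distance, 4))
```

Raising the threshold trades sensitivity for specificity, so the minimum distance marks the point where neither is sacrificed disproportionately.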

LincRNA Detection Pipeline

A step-wise filtering pipeline (Fig. 1) was used to identify putative lincRNAs from deep sequencing data. (1) All transcripts shorter than 200 bases were excluded. (2) Assembled transcripts were annotated using Cuffcompare from Cufflinks (Trapnell et al. 2010). Transcripts located in intergenic regions, at least 1 kb from any known protein-coding gene, were selected as putative lincRNAs (Luo et al. 2013). (3) The coding potential of each transcript was calculated using the Coding-Potential Assessment Tool (CPAT) (Wang et al. 2013a) and the Coding Potential Calculator (CPC) (Kong et al. 2007). (4) To determine which of the remaining transcripts contain a known protein-coding domain, each transcript was translated in all six possible frames and HMMER-3 (Finn et al. 2011) was used to search for homology with any of the 31,912 known protein family domains in the Pfam database (release 24; both PfamA and PfamB). All transcripts with a Pfam hit were excluded. (5) Putative protein-coding RNAs were filtered out by applying a maximal open reading frame (ORF) length threshold. Any transcript with a maximal ORF > 100 amino acids was excluded. (6) A sequence homology search was performed to remove transcripts with significant similarity to RNAs in several public RNA databases, including Rfam (Gardner et al. 2009), RNAdb (Pang et al. 2007), and lncRNAdb (Amaral et al. 2011). (7) The remaining transcripts that are at least 1 kb from any known protein-coding gene were selected (Luo et al. 2013).
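
Steps (1) and (5) of the pipeline above can be sketched without external tools. The example below is an illustration only and not the code used in the study: it assumes that an ORF starts at ATG and ends at the first in-frame stop codon, that ORFs running off the end of the transcript are counted, and that ORFs are searched on both strands; all function names are hypothetical.

```python
STOP_CODONS = {"TAA", "TAG", "TGA"}
_COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq: str) -> str:
    return seq.translate(_COMPLEMENT)[::-1]


def longest_orf_codons(seq: str) -> int:
    """Longest ATG-initiated ORF in codons (stop excluded) over six frames."""
    seq = seq.upper().replace("U", "T")
    best = 0
    for strand in (seq, reverse_complement(seq)):
        for frame in range(3):
            in_orf, run = False, 0
            for pos in range(frame, len(strand) - 2, 3):
                codon = strand[pos:pos + 3]
                if in_orf:
                    if codon in STOP_CODONS:
                        best = max(best, run)
                        in_orf = False
                    else:
                        run += 1
                elif codon == "ATG":
                    in_orf, run = True, 1
            if in_orf:  # assumption: count ORFs truncated by the transcript end
                best = max(best, run)
    return best


def passes_length_and_orf_filters(seq: str, min_length: int = 200,
                                  max_orf_codons: int = 100) -> bool:
    """Keep transcripts of at least 200 nt whose longest ORF is at most 100 codons."""
    return len(seq) >= min_length and longest_orf_codons(seq) <= max_orf_codons


short_orf = "CCATGGCTGCTGCTGCTTAACC"          # ATG + 4 codons + stop
long_orf = "ATG" + "GCT" * 150 + "TAA"        # 151-codon ORF
print(longest_orf_codons(short_orf), longest_orf_codons(long_orf))
```

The coding-potential, Pfam, and database homology steps depend on external tools (CPAT, CPC, HMMER) and are deliberately left out of this sketch.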

Fig. 1

Pipeline used to identify rainbow trout lincRNAs. a Raw RNA-Seq data was pre-processed and mapped using TopHat and assembled using Cufflinks in ab initio mode. b Sebnif was used to filter all lowly expressed unreliable transcripts. c Pipeline for lincRNA detection

Tissue Specificity Score and Neighboring Gene Correlation Analysis

To evaluate the tissue specificity of a transcript, an entropy-based metric that relies on Jensen-Shannon (JS) divergence was used to calculate specificity scores (0 to 1). A perfect tissue-specific pattern is scored as JS = 1, which means that a transcript is expressed in only one tissue (Cabili et al. 2011). In the neighboring gene analysis, two genes were defined as neighbors if the minimal distance between them was <10 kb, regardless of their orientation (Zhang et al. 2014; Luo et al. 2013). The expression correlation between two neighbors was estimated by calculating the Pearson correlation coefficient between their density-normalized expression values [log2(FPKM + 1)].
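
A JS-based specificity score of this kind can be sketched as follows. This is an illustrative reading of the approach, not the authors' code: it assumes that expression is transformed as log2(FPKM + 1) and normalized to sum to 1, that each tissue is represented by a profile with all weight on that tissue, and that the score is 1 minus the square root of the JS divergence (base-2 logarithms), maximized over tissues. Function names are hypothetical.

```python
import math


def _entropy(p):
    """Shannon entropy in bits; zero-probability entries contribute nothing."""
    return -sum(x * math.log2(x) for x in p if x > 0)


def js_divergence(p, q):
    m = [(a + b) / 2 for a, b in zip(p, q)]
    return _entropy(m) - (_entropy(p) + _entropy(q)) / 2


def js_specificity(fpkm_values):
    """Maximal specificity score (0 to 1) of one transcript across tissues."""
    logged = [math.log2(v + 1) for v in fpkm_values]
    total = sum(logged)
    if total == 0:
        return 0.0  # not expressed anywhere
    p = [v / total for v in logged]
    best = 0.0
    for t in range(len(p)):
        ideal = [1.0 if i == t else 0.0 for i in range(len(p))]
        divergence = max(js_divergence(p, ideal), 0.0)  # guard against rounding
        best = max(best, 1 - math.sqrt(divergence))
    return best


specific = js_specificity([0, 0, 8, 0])   # expressed in a single tissue
uniform = js_specificity([5, 5, 5, 5])    # expressed equally everywhere
print(round(specific, 3), round(uniform, 3))
```

A transcript detected in a single tissue scores 1, while evenly expressed transcripts score low, which matches the interpretation of JS = 1 given above.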

Weighted Gene Co-expression Network Construction and Gene Module Detection

All genes with expression variance ranked in the top 75 percentile of the data set were retained (Liao et al. 2011). The R package “WGCNA” was then used to construct the weighted gene co-expression network (Langfelder and Horvath 2008). A matrix of signed Pearson correlations between all gene pairs was computed and transformed into a topological overlap matrix (TOM), which was used as input for linkage hierarchical clustering (Langfelder and Horvath 2008). Genes with similar expression patterns were clustered together.
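
The transformation from correlations to topological overlap can be illustrated in a few lines. The sketch below is not the WGCNA package itself (which is written in R); it applies the signed adjacency a_ij = ((1 + r_ij) / 2)^β and the standard topological overlap formula to a toy data set. The soft-thresholding power β = 12 is an assumption, since the text does not state the value used, and all names and data are hypothetical.

```python
import math


def pearson(x, y):
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        return 0.0  # constant profile: treat as uncorrelated
    return sxy / math.sqrt(sxx * syy)


def signed_adjacency(expr, beta=12):
    """Signed adjacency ((1 + r) / 2) ** beta with a zero diagonal."""
    n = len(expr)
    adj = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1, n):
            adj[i][j] = adj[j][i] = ((1 + pearson(expr[i], expr[j])) / 2) ** beta
    return adj


def topological_overlap(adj):
    """TOM_ij = (sum_u a_iu * a_uj + a_ij) / (min(k_i, k_j) + 1 - a_ij)."""
    n = len(adj)
    k = [sum(row) for row in adj]  # connectivity (diagonal is zero)
    tom = [[1.0 if i == j else 0.0 for j in range(n)] for i in range(n)]
    for i in range(n):
        for j in range(i + 1, n):
            shared = sum(adj[i][u] * adj[u][j] for u in range(n) if u not in (i, j))
            tom[i][j] = tom[j][i] = (shared + adj[i][j]) / (min(k[i], k[j]) + 1 - adj[i][j])
    return tom


# Toy expression profiles (rows = genes, columns = tissues).
expr = [[1, 2, 3, 4], [2, 4, 6, 8], [4, 3, 2, 1]]
tom = topological_overlap(signed_adjacency(expr))
print([[round(v, 3) for v in row] for row in tom])
```

Hierarchical clustering is then typically run on the dissimilarity 1 − TOM, so that genes sharing many strong connections fall into the same module.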

Functional Enrichment Analysis

To investigate the potential roles of lincRNAs in rainbow trout, we performed Blast2GO (Conesa and Gotz 2008) analysis to assign gene ontology (GO) terms to all protein-coding genes associated with lincRNAs in each network module. A cutoff value of 1E−10 was used for the BLASTx search. GO term enrichment analysis was performed using Fisher’s exact test (p value <0.01). The interaction networks among lincRNA and protein-coding genes were constructed based on co-expression using Cytoscape (http://www.cytoscape.org/).
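
For a single GO term, an enrichment test of this kind reduces to a hypergeometric tail probability. The sketch below computes a one-sided Fisher's exact p value; whether the original analysis was one- or two-sided is not stated, so the one-sided form is an assumption, and the function name and counts are hypothetical.

```python
from math import comb


def enrichment_p_value(hits_in_module: int, module_size: int,
                       hits_in_background: int, background_size: int) -> float:
    """One-sided Fisher's exact test: P(X >= hits_in_module) under the
    hypergeometric null of drawing module_size genes from the background."""
    upper = min(module_size, hits_in_background)
    total = comb(background_size, module_size)
    tail = sum(
        comb(hits_in_background, x)
        * comb(background_size - hits_in_background, module_size - x)
        for x in range(hits_in_module, upper + 1)
    )
    return tail / total


# Hypothetical term: 5 of 20 background genes carry it, 4 of them fall in a 5-gene module.
p = enrichment_p_value(hits_in_module=4, module_size=5,
                       hits_in_background=5, background_size=20)
print(p, p < 0.01)
```

With many GO terms tested per module, a multiple-testing correction is usually applied on top of the raw p values.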

Validation of Expression Specificity of lincRNAs

Expression specificity of selected lincRNAs was validated by reverse transcription polymerase chain reaction (RT-PCR) analysis as described previously (Wang et al. 2013b). PCR primers are listed in Supplemental file 1. Tissue samples used in the analysis include the brain, fat, gill, head kidney, intestine, kidney, liver, testis, red muscle, skin, spleen, stomach, white muscle, oocyte, and pineal. 18S rRNA was used as a control for RNA quality.

Results and Discussion

Transcriptome Reconstruction and Filtering Low-Quality Assemblies

To comprehensively identify rainbow trout lincRNAs, we collected and deeply sequenced RNA samples from the brain, fat, gill, head kidney, intestine, kidney, liver, testis, red muscle, skin, spleen, stomach, white muscle, oocyte, and pineal. A total of 1.3 billion raw paired-end sequence reads (100-bp read length) were generated from these samples. The number of reads from each tissue ranged from 78.8 to 93.5 million. A total of 1,087,497,866 cleaned reads (81.4 %) were harvested for further analysis. These sequence reads were mapped to the rainbow trout genome using TopHat (Trapnell et al. 2009), and approximately 447 million (82 %) mapped reads were recovered. The mapping ratio ranged from 76.9 to 89.5 % with an average of 82.3 % (Table 1). We then used the ab initio assembly software Cufflinks (Trapnell et al. 2010) to reconstruct the transcriptome for each tissue based on the read-mapping results (Fig. 1a). On average, 79,021 transcripts were obtained for each tissue.

Table 1 Summary of samples and RNA-Seq data

The first challenge in annotating lincRNA gene loci is to distinguish lowly expressed lincRNAs from the tens of thousands of lowly expressed unreliable fragments assembled from RNA-Seq (Guttman et al. 2010). To address this challenge, we removed unreliable lowly expressed transcripts using a learned FPKM threshold, which was calculated using Sebnif (Sun et al. 2014) (Fig. 1b). First, we classified all transcripts that did not overlap the genomic region of known protein-coding genes as novel intergenic transcripts (category “u” assigned by Cuffcompare) and defined an average of 28,012 u transcripts for each tissue (Fig. 1b; Supplemental file 2), among which 6975 and 21,037 are multi- and single-exonic transcripts, respectively. Next, the FRFE and STGE algorithms were used to distinguish partial transcripts from full-length transcripts. For the 6975 multi-exonic transcripts, Sebnif applied an FRFE threshold of 0.5. For the 21,037 single-exonic transcripts, STGE was used to model the transcript expression profiles with the lower and upper probability cutoffs set at 0.05 and 0.95, respectively. Following this filtering, an average of 4628 multi-exonic (FPKM >2.76) and 4071 single-exonic (FPKM >3.14) transcripts were retained for each tissue. Finally, a total of 39,745 intergenic transcripts were obtained by merging all intergenic transcripts from the 15 tissues.

Identification and Characterization of Rainbow Trout lincRNAs

The currently available coding potential prediction methods only work well for protein-coding RNAs. Therefore, the most widely used strategy to annotate potential noncoding RNAs (ncRNAs) is to exclude those that possess protein-coding features (Solda et al. 2009). The filtering pipeline we used to identify novel lincRNAs is shown in Fig. 1c. First, we analyzed the coding potential of unannotated transcripts using CPAT (Wang et al. 2013a) and CPC (Kong et al. 2007), which filtered out 61 % (24,329) of all transcripts. Second, we scanned each transcript in all six frames to exclude transcripts that contain any of the 31,912 protein-coding domains cataloged in the protein family database Pfam (Finn et al. 2008). This filtering retained 10,773 potential lincRNA transcripts. Furthermore, an ORF length criterion was applied to distinguish lincRNAs from mRNAs. A cutoff of 300 nt (100 codons) was used to exclude putative mRNAs (Okazaki et al. 2002). To remove ncRNAs of known classes that are not yet annotated in the rainbow trout genome assembly, a sequence homology search was performed to exclude transcripts with significant similarity to RNAs in Rfam (Gardner et al. 2009), RNAdb (Pang et al. 2007), and lncRNAdb (Amaral et al. 2011). Finally, we identified 9674 lincRNAs after removing those transcripts that are located within 1 kb of any known protein-coding gene (Supplemental file 3).

Previous studies in mammals have shown that lncRNAs are shorter, less conserved, and expressed at significantly lower levels compared with protein-coding genes (Guttman et al. 2010; Cabili et al. 2011). To determine whether rainbow trout lincRNAs have similar features, we characterized the basic features of the identified lincRNAs by comparing them with protein-coding genes. We found that rainbow trout lincRNAs are on average less than half the length of protein-coding transcripts (mean length of 705 nt for lincRNAs vs. 1635 nt for protein-coding transcripts) (Fig. 2a). Moreover, lincRNAs had fewer exons (on average, 1.3 exons for lincRNAs vs. 6.9 exons for protein-coding genes) (Fig. 2b). Notably, the mean length and average exon number of rainbow trout lincRNAs are lower than those of human (∼1000 nt and 2.9 exons) (Cabili et al. 2011) and zebrafish (∼1000 nt and 2.8 exons). This could reflect an underestimation of the length and exon number of rainbow trout lincRNAs caused by incomplete assembly, which results from their lower abundance and lower sequencing depth. Furthermore, the expression levels of lincRNAs are on average about tenfold lower than those of protein-coding genes across the 15 tissues (Fig. 3), which is consistent with the findings in human, mouse, and zebrafish (Cabili et al. 2011; Pauli et al. 2012; Guttman et al. 2010). Thus, the predicted rainbow trout lincRNAs share similar genomic features with lincRNAs from other species, suggesting that they are bona fide rainbow trout lincRNAs.

Fig. 2

Structural characteristics of lincRNAs in comparison to protein-coding genes. a Cumulative distribution of transcript length for lincRNAs (red line) and protein-coding genes (blue line). Protein-coding genes larger than 8 kb were removed in the analysis. b Distribution of exon number for lincRNAs (red bars) and protein-coding genes (blue bars). Protein-coding genes with more than 20 exons were not included in the analysis

Fig. 3

Comparison of expression levels of lincRNAs and protein-coding genes. Maximal expression abundance (log2-normalized FPKM counts estimated by Cufflinks) of each lincRNA (red solid line) and protein-coding gene (green broken line)

Analysis of Tissue-Specific Expression of Rainbow Trout lincRNAs

Recent studies have shown that lincRNAs are expressed in a more tissue-specific manner than protein-coding genes. We analyzed the expression pattern of each lincRNA transcript. Of the 9674 potential lincRNAs, 8545 were expressed in more than one tissue (Fig. 4a, b; Supplemental file 4). The remaining 1129 lincRNAs displayed tissue-specific expression (Fig. 4d). Among the 15 tissues, the brain expressed the largest number of tissue-specific lincRNAs (161), which is consistent with the result from a previous study in zebrafish (Kaushik et al. 2013). The skin, white muscle, and liver had relatively lower numbers of tissue-specific lincRNAs (Fig. 4c). The tissue specificity score for each lincRNA was calculated using an entropy-based metric that relies on Jensen-Shannon (JS) divergence (Cabili et al. 2011). Results showed that 46 % of rainbow trout lincRNAs were tissue-specific, relative to only 18 % of protein-coding genes (p < 1E−16, Fisher exact test) (Fig. 5). Thus, rainbow trout lincRNAs exhibited more tissue specificity than protein-coding genes, which is in agreement with data from other species (Guttman et al. 2010; Cabili et al. 2011; Pauli et al. 2012).

Fig. 4

Tissue-wise distribution of predicted lincRNAs. a Distribution of 9674 potential lincRNAs across 15 tissues. b Venn diagram representing 7783 lincRNAs in the gill (blue), intestine (yellow), kidney (orange), spleen (green), and stomach (pink). c Distribution of tissue-specific lincRNAs across 15 tissues. d Heatmap of 1129 tissue-specific lincRNAs across 15 tissues. Each column represents the expression levels of 1129 lincRNAs in the parent tissue vs. other tissues based on FPKM values

Fig. 5

Tissue specificity of lincRNAs and protein-coding genes. Distribution of maximal tissue specificity scores calculated for each lincRNA (red solid line) or protein-coding transcript (green broken line) across all tissues

Tissue-specific expression of lincRNAs determined by computational analysis was validated by RT-PCR analysis. A total of 10 lincRNAs were selected for validation of their expression in 15 tissues. They include seven lincRNAs specifically expressed in a particular tissue (Linc-OM9284 in the brain, Linc-OM8822 in the red muscle, Linc-OM8901 in the intestine, Linc-OM3900 in the stomach, Linc-OM8614 in the testis, Linc-OM8334 in fat, Linc-OM8318 in the kidney), two lincRNAs expressed in two tissues (Linc-OM8912 in oocyte and the skin, Linc-OM9283 in the skin and the liver), and one lincRNA ubiquitously expressed in all tissues (Linc-OM9274). As shown in Fig. 6, the RT-PCR results match the expression profiles estimated from deep sequencing data.

Fig. 6

Validation of expression specificity of lincRNAs by RT-PCR analysis. Expression of ten selected lincRNAs was analyzed by RT-PCR in rainbow trout tissues including the brain (Br), oocyte (Oo), white muscle (Wm), pineal (Pi), fat (Fa), gill (Gi), skin (Sk), head kidney (Hk), testis (Te), spleen (Sp), stomach (St), liver (Li), red muscle (Rm), intestine (In), and kidney (Ki). 18S rRNA was used as a control for RNA quality

Co-expression of lincRNAs with Neighboring Coding Genes

The occurrence of neighboring lincRNA and protein-coding gene pairs within expression clusters suggests that such organization may be important for the regulatory function of lincRNAs (Cabili et al. 2011). Recent studies indicated that some lincRNAs may act in cis and regulate the expression of genes in their chromosomal neighborhood (Orom et al. 2010; Ponjavic et al. 2007; Luo et al. 2013; Cabili et al. 2011; Zhang et al. 2014). One expectation of the cis hypothesis is that the expression of lincRNAs and their neighboring genes would be correlated across all tissue samples. Therefore, we analyzed the expression patterns of the 1146 (12 %) identified lincRNAs that are located within 10 kb of a coding gene. We observed a more correlated expression pattern between lincRNAs and their neighboring coding genes (mean correlation: 0.211) than between random coding gene pairs (mean correlation: 0.042) [p < 2E−16, Kolmogorov-Smirnov (KS) test] (Fig. 7). Meanwhile, lincRNA:coding gene pairs also exhibited a modestly higher correlation than neighboring coding gene pairs (mean correlation: 0.115) (p < 2.2E−16, KS test). Neighboring coding gene pairs in turn differed significantly from random coding gene pairs (p < 7.9E−13, KS test). These observations suggest that the expression correlation between lincRNAs and their neighboring coding genes is higher than that of both neighboring coding gene pairs and random coding gene pairs.
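
The KS test compares two distributions of correlation coefficients through the largest gap between their empirical cumulative distribution functions. A minimal sketch of that statistic is given below; it omits the p value, which in practice would come from a statistics library, and the function name and sample values are hypothetical rather than taken from the study.

```python
from bisect import bisect_right


def ks_statistic(sample_a, sample_b):
    """Two-sample KS statistic: maximal distance between the two empirical CDFs."""
    if not sample_a or not sample_b:
        raise ValueError("both samples must be non-empty")
    a_sorted, b_sorted = sorted(sample_a), sorted(sample_b)
    d = 0.0
    # The maximum gap between two step functions occurs at an observed value.
    for value in a_sorted + b_sorted:
        cdf_a = bisect_right(a_sorted, value) / len(a_sorted)
        cdf_b = bisect_right(b_sorted, value) / len(b_sorted)
        d = max(d, abs(cdf_a - cdf_b))
    return d


# Hypothetical correlation coefficients for two groups of gene pairs.
random_pairs = [0.1, 0.2, 0.3, 0.4]
neighbor_pairs = [0.3, 0.4, 0.5, 0.6]
d_stat = ks_statistic(random_pairs, neighbor_pairs)
print(d_stat)
```

Because the statistic depends on the whole distribution rather than on the mean alone, two groups with similar means can still differ significantly under this test.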

Fig. 7

Correlation of expression patterns between pairs of neighboring genes. Shown are distributions of Pearson correlation coefficients in expression levels across the tissues between 1146 pairs of lincRNAs and their neighboring coding genes (green solid line), 9363 pairs of coding gene neighbors (blue broken line), and 8000 random pairs of protein-coding genes (red dotted line)

Functional Prediction of lincRNAs Based on Co-expression Network

The comprehensive lincRNA catalog allows us to investigate the potential functions of these novel transcripts in rainbow trout. Here, we built a co-expression network to associate lincRNAs with mRNAs by performing weighted gene co-expression network analysis (WGCNA) (Langfelder and Horvath 2008) and inferred the putative lincRNA functions based on “guilt-by-association” analysis. By clustering correlated genes together, we identified 34 co-expression gene modules containing 2963 lincRNAs and 10,321 protein-coding genes in total (Supplemental files 5 and 6). Notably, 6 of the 34 modules are related to immune response, muscle differentiation, and neural development based on the enriched GO terms associated with them (Fig. 8).

Fig. 8

Functional prediction of rainbow trout lincRNAs. a Upper panel, heatmaps showing expression patterns of all genes in each co-expression gene module across 15 tissues. Middle panel, bar plots showing the corresponding module eigengene expression values. Lower panel, pie charts showing the ratio of mRNAs and lincRNAs in each module. b Functional enrichment in each module. The length of each bar indicates the significance (−log10-transformed FDR)

The functional annotations enriched in four modules (blue, grey60, tan, and green) are related to immune responses (Fig. 8b and Supplemental file 7). In each of these four modules, we observed many lincRNAs that are highly expressed in the spleen, gill, and intestine (Fig. 8a), suggesting that these lincRNAs might be involved in immune-related processes. In the blue module, many genes were enriched in T cell receptor signaling and PI3K/AKT/mTOR signaling pathways (Supplemental file 8). The lincRNAs that are co-expressed with tyrosine-protein kinase (ITK), which phosphorylates PLCγ1 in T cell signaling (Andreotti et al. 2010), may play important roles in T cell signaling and function. PI3K and mTOR signaling pathways are important in regulating immune cell activation in neutrophils and mast cells and type I interferon production (Weichhart and Saemann 2008). Those lincRNAs that are co-expressed with PI3K or mTOR pathway genes are likely involved in these immune processes (Supplemental file 8). In the grey60 module, the lincRNAs that are co-expressed with integrin, which mediates the penetration of immune cells into tissues (Evans et al. 2009), may play critical roles in immune cell migration and in the cell-cell interactions that occur during the course of an immune response. In the tan module, the lincRNAs that are co-expressed with Rab20, a key player in phagosome maturation (Pei et al. 2014), may function in phagocytosis. Likewise, lincRNAs in the green module are co-expressed with MHC class I genes (Neefjes et al. 2011), indicating that they might be involved in processing and presenting antigen to T cells.

The cyan module contains transcripts (165 protein-coding genes and 15 lincRNAs) that are highly expressed in muscle (Fig. 8). Most of the enriched genes in this module are related to the function or development of muscle (Supplemental file 7). Notably, the lincRNAs that are co-expressed with myoblast determination protein 2 (MyoD2) may play roles in regulating muscle differentiation. A previous study has demonstrated the role of a specific lncRNA in controlling muscle differentiation (Cesana et al. 2011).

Recent studies have shown that many lncRNAs are brain-specific, indicating their indispensable roles in brain development (Ng et al. 2012; Clark and Blackshaw 2014). This study also found that the brain has the most tissue-specific lincRNAs (Fig. 4c). The lincRNAs in the light yellow module are co-expressed with genes important for neural differentiation and development, such as dihydropyrimidinase-related protein (DRP) and Draxin precursor, indicating that they may function as important regulators of neurogenesis.

Collectively, the functional prediction analysis revealed that tissue-specific lincRNAs and protein-coding genes are enriched for processes specific to that tissue and essential in maintaining each tissue’s identity and functionality.

Conclusions

In this report, we provided the first comprehensive annotation of rainbow trout lincRNAs based on whole transcriptome sequencing of multiple tissues and identified 9674 novel lincRNA transcripts. These lincRNAs tend to be expressed in a tissue-specific manner and share many characteristics with those in mammalian species. Co-expression network analysis suggested that many rainbow trout lincRNAs are associated with immune response, muscle differentiation, and neural development. The study lays the groundwork for future functional characterization of lincRNAs in rainbow trout.