Introduction

Pollen grains, the male gametophytes of flowering plants, are produced in anthers, the male reproductive organs of flowers. The formation of mature viable pollen is the culmination of a highly specialised and strictly regulated developmental gene expression program (Borg et al. 2009; Haerizadeh et al. 2006). Pollen/microspore mother cells (also known as meiocytes) undergo meiosis to form tetrads of haploid microspores, which then divide mitotically and differentiate, giving rise to the sperm cell-carrying mature pollen. The stages of pollen development are well defined, with stage-specific markers, making it an ideal system for studying plant developmental processes (Brownfield et al. 2009).

Recently, long non-coding RNAs (lncRNAs) have emerged as important, stage-specific regulators of developmental processes in animals and plants (Golicz et al. 2018a, b; Perry and Ulitsky 2016). No lncRNA conservation between plants and animals has been reported. Still, it is postulated that lncRNAs can be universal regulators of developmental processes and that their similar functions and mechanisms of action could be a result of convergent evolution (Golicz et al. 2018a, b). lncRNAs are RNA molecules that are more than 200 nucleotides long, lack open reading frames longer than 100 amino acids, and have no protein-coding potential. The arbitrary length limit defining lncRNAs distinguishes them from small non-coding RNAs, including microRNAs (miRNAs), small nucleolar RNAs (snoRNAs), and small interfering RNAs (siRNAs). lncRNAs, which are primarily intergenic ncRNAs (lincRNAs), intronic ncRNAs (incRNAs), or natural antisense transcripts (NATs), are often polyadenylated and tend to have highly tissue-specific expression (Mattick and Rinn 2015). They act as decoys, molecular scaffolds, or target mimics of miRNAs and siRNA precursors to influence gene expression (Franco-Zorrilla et al. 2007; Wu et al. 2013). When acting as decoys, certain lncRNAs can bind transcription factors, thereby preventing them from interacting with DNA to promote the expression of target genes, while as molecular scaffolds, they can bind DNA or proteins and recruit regulatory components to specific gene loci (Franco-Zorrilla et al. 2007; Wang and Chang 2011; Wu et al. 2013).

Several reports have highlighted the critical role of lncRNAs in plant biological processes such as stress response, developmental regulation and nutrient procurement, which they influence through histone modification, transcription, alternative splicing, chromatin remodelling or target mimicry (Böhmdorfer and Wierzbicki 2015; Di et al. 2014; Li et al. 2016; Mattick and Rinn 2015; Yu et al. 2013; Yuan et al. 2016; Zhang et al. 2014). In Arabidopsis, during cold exposure, an antisense transcript, COOLAIR (Cold Induced Long Antisense Intragenic RNA), and an intronic lncRNA, COLDAIR (COLD ASSISTED INTRONIC NONCODING RNA), restrict the transcriptional activation of the floral repressor FLOWERING LOCUS C (FLC) via histone modification and thereby promote flowering (Csorba et al. 2014; Heo and Sung 2011; Rosa et al. 2016). Similarly, another cold-induced natural antisense lncRNA, MAS (MAF4 antisense RNA), is reported to direct the activation of MADS AFFECTING FLOWERING4 (MAF4) via histone modification, resulting in the suppression of early flowering in Arabidopsis (Zhao et al. 2018). In rice, silencing of an antisense lncRNA, LRK Antisense Intergenic RNA (LAIR), results in reduced plant growth along with reduced expression of the LEUCINE-RICH REPEAT SERINE/THREONINE-PROTEIN KINASE (LRK) gene cluster (Wang et al. 2018). Lines overexpressing LAIR, on the other hand, show a significant increase in overall grain yield and increased expression of some members of the LRK gene cluster. In rice, LAIR was reported to variably activate the promoters of the LRK genes by binding to histone modification proteins enriched in the LRK1 gene region (Wang et al. 2018).

In plants, lncRNAs have also been linked to male reproductive development. In rice, transcription of the long-day-specific male-fertility-associated RNA (LDMAR) under long-day conditions is required for photoperiod-sensitive male sterility (PSMS) activation and proper pollen formation (Ding et al. 2012; Babaei et al. 2022). In young panicles of rice, overexpression of LDMAR impairs fertility under long-day conditions. In maize, high expression of the lncRNA Zm401 was observed in developing male gametophytes and mature pollen, and it was identified as the primary regulator of genes essential for pollen formation, such as ZmC5, ZmMADS2, and MZm3–3 (Ma et al. 2008). Downregulation of Zm401 leads to aberrant tapetum and microspore development, resulting in the production of sterile pollen. Furthermore, in Chinese cabbage (Brassica campestris L.), a novel pollen-specific lncRNA, BcMF11, was identified as a regulator of male reproductive development (Song et al. 2007, 2013). The silencing of BcMF11 resulted in delayed tapetum degradation, abnormal microspore development and pollen abortion. These findings demonstrate that lncRNAs are essential for regulating pollen formation.

Here, we performed a genome-wide identification of lncRNAs during five stages (pollen mother cell, tetrad, microspore, bicellular pollen and mature pollen) of pollen development in field mustard (Brassica rapa) using strand-specific RNA sequencing (ssRNA-Seq). The lncRNAs exhibited stage-specific expression, suggesting potential roles at well-defined developmental points. Next, we analysed the genomic location of lncRNAs and predicted cis- and trans-acting lncRNAs and their potential target protein-coding genes. Differential expression and functional enrichment analysis highlighted the complex transcriptional reprogramming involved in the transition of diploid pollen/microspore mother cells into haploid trinucleate pollen. We further performed a weighted gene co-expression network analysis (WGCNA) coupled with gene expression correlation to identify lncRNA–mRNA pairs with a potential role in regulating the progression of pollen development. Collectively, our findings shed light on the roles of lncRNAs during pollen development and expand our knowledge of the molecular mechanisms underlying male reproductive development.

Results

Identification and characterisation of lncRNAs in B. rapa expressed during pollen development

Strand-specific RNA-Seq reads corresponding to five stages of pollen development (pollen mother cell, ‘PMC’; tetrad, ‘TET’; microspore, ‘MIC’; binucleate pollen, ‘BIN’; and trinucleate pollen, ‘POL’) were used to track changes in gene expression during male gametophyte development in B. rapa (Fig. 1A). Both poly(A) capture and ribosomal RNA (rRNA) depletion libraries were prepared. The reads were aligned to the Brassica rapa genome with a mapping rate between 78.18 and 90.85% (mean: 87.34%) for the poly(A) capture libraries and between 71.02 and 83.22% (mean: 76.74%) for the rRNA depletion libraries (Table S1A). Because pollen development requires the participation of highly specialised tissues and cell types, some of the genes involved may not be found in the existing annotation. A reference-based (Zhang et al. 2022) transcriptome assembly was performed using an in-house pipeline (Figure S1; Golicz 2022) to update the existing genome annotation and to identify novel expressed protein-coding genes and long non-coding RNAs (lncRNAs), including long intergenic non-coding RNAs (referred to as ‘lincRNAs’ hereafter) and lncRNAs overlapping protein-coding genes on the opposite strand (referred to as ‘lncNATs’ hereafter). In total, 49,577 protein-coding genes, 4347 lincRNAs and 2,045 lncNATs were identified. Comparison of the poly(A) capture and rRNA depletion libraries (TPM < 0.1 in all poly(A) libraries and TPM > 0.1 in at least one rRNA depletion library) suggests that 1.3, 4.3 and 1.7% of coding, lincRNA and lncNAT loci, respectively, produce non-polyadenylated transcripts. Principal component analysis (PCA) revealed high relatedness between the replicates of each sample (Figure S2A). Further, the Pearson correlation between the three biological replicates ranged from 0.911 to 0.989 (median: 0.968). The correlation between the coding genes, lincRNAs and lncNATs was also significant across the five pollen developmental stages (Figure S2B).
We tested the concordance between the expression observed in this dataset and the previously reported expression patterns of known male development markers in Arabidopsis thaliana. All the markers, other than AtMGH3 and AtGEX2, for which no confident orthologues were identified, had the expected expression patterns (Fig. 1B and S2C).
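The rule used above to flag loci that may produce non-polyadenylated transcripts can be written out as a short sketch. The function name, locus names and TPM values below are hypothetical and only illustrate the stated thresholds (TPM < 0.1 in every poly(A) library and TPM > 0.1 in at least one rRNA depletion library); this is not the pipeline used in the study.

```python
def is_non_polyadenylated(polya_tpm, ribo_tpm, cutoff=0.1):
    """Flag a locus that is below the cutoff in every poly(A) library
    and above it in at least one rRNA-depletion library."""
    absent_in_polya = all(value < cutoff for value in polya_tpm)
    present_in_ribo = any(value > cutoff for value in ribo_tpm)
    return absent_in_polya and present_in_ribo


# Hypothetical TPM values per library (illustration only).
loci = {
    "locus_a": {"polya": [0.0, 0.05, 0.02], "ribo": [0.0, 1.4, 0.3]},
    "locus_b": {"polya": [3.2, 2.8, 0.0], "ribo": [2.9, 3.1, 0.2]},
    "locus_c": {"polya": [0.0, 0.0, 0.0], "ribo": [0.0, 0.05, 0.0]},
}

candidates = [
    name
    for name, tpm in loci.items()
    if is_non_polyadenylated(tpm["polya"], tpm["ribo"])
]
print(candidates)
```

Only the first locus satisfies both conditions; the second is detected in poly(A) libraries and the third is not detected in either library type.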

Fig. 1

Male gametophyte development and properties of the lncNATs and lincRNAs discovered. A The five stages of pollen development, B heat map of B. rapa homologs of known pollen development marker genes, C distribution of expression values for coding genes and lncRNAs, D expression specificity index for protein-coding genes and lncRNAs, E summary of the number of coding and lncRNA genes showing peak expression at a given stage, and F heat maps and UpSet plots presenting overall expression patterns of coding genes, lncNATs and lincRNAs across the five stages. PMC pollen/microspore mother cell, TET tetrads, MIC microspores to polarised microspores, BIN early to late binucleate pollen, POL trinucleate pollen

Lastly, based on mean–variance trend analysis of the data, genes with low expression were filtered out by imposing a CPM cutoff (> 1.0 CPM in at least three samples), leaving 31,729 coding genes, 1,052 lincRNA loci and 780 lncNAT loci available for the analysis. A comparison of the protein-coding and lncRNA loci confirms that the latter have lower expression levels and more stage-specific expression (Fig. 1C, D), with different expression profiles for coding genes and lncRNAs. Among the samples used in this study, the highest proportion of protein-coding genes (35.22%) had peak expression in PMC, whereas the highest proportions of lincRNAs (36.50%) and lncNATs (38.59%) had peak expression in MIC (Fig. 1E, F). It is important to note that the peak expression stage has been defined as the stage with maximum gene expression measured by TPM (transcripts per million). Therefore, the peak expression stage is the stage where transcript abundance is the highest relative to the abundance of other transcripts at that stage.
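As an illustration of the two definitions in this paragraph, the sketch below applies the expression filter (> 1.0 CPM in at least three samples) and picks the peak expression stage as the stage with the highest TPM. The function names and values are hypothetical, and the per-stage TPM is assumed to be already summarised across replicates (for example as a mean), which the text does not specify.

```python
STAGES = ["PMC", "TET", "MIC", "BIN", "POL"]


def passes_cpm_filter(cpm_values, min_cpm=1.0, min_samples=3):
    """Keep a gene when its CPM exceeds min_cpm in at least min_samples samples."""
    return sum(1 for value in cpm_values if value > min_cpm) >= min_samples


def peak_stage(tpm_by_stage):
    """Return the stage with the highest TPM (assumed summarised per stage)."""
    return max(STAGES, key=lambda stage: tpm_by_stage[stage])


# Hypothetical gene: CPM per sample and summarised TPM per stage.
cpm = [0.2, 1.5, 2.0, 3.1, 0.0]
tpm = {"PMC": 2.0, "TET": 4.0, "MIC": 9.5, "BIN": 3.0, "POL": 0.5}

print(passes_cpm_filter(cpm), peak_stage(tpm))
```

Because TPM is normalised within each sample, the stage returned here is the one where the transcript is most abundant relative to other transcripts at that stage, as noted above.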

The lncRNAs were shorter than coding genes, with ~ 80% of lncRNAs having one transcript and ~ 40% having only one exon (Fig. 2A–C). Compared to lincRNAs, a slightly higher proportion of lncNATs had one transcript (lincRNAs: 78.61%, lncNATs: 82.18%) and multiple exons (lincRNAs: 52.96%, lncNATs: 55.28%). The A/U content of the lincRNAs and lncNATs (particularly the lincRNAs) was also higher than that of the protein-coding sequences (Fig. 2D). Among the lncRNAs with assigned chromosome locations (Fig. 2E), the largest number of expressed lncRNA loci (164 lincRNAs and 107 lncNATs) mapped to chromosome A09, and the smallest to chromosome A10 (48 lincRNAs and 65 lncNATs). The largest number of expressed mRNA loci was located on chromosome A03 (4618) and the smallest on chromosome A04 (2212) (Fig. 2E).

Fig. 2

A Distribution of transcript length of coding genes and lncRNAs, B number of transcripts per gene of coding genes and lncRNAs, C number of exons per transcript of coding genes and lncRNAs, D comparison of A/U content of coding transcripts and lncRNAs, E chromosome distribution of coding genes and lncRNAs and F proportion of collinear loci of different types between B. rapa and three other Brassicaceae species

Conservation analysis of B. rapa lncRNAs

We investigated putative lncRNA conservation between B. rapa and three related Brassicaceae species, namely B. napus, B. oleracea and A. thaliana, by searching for collinear genomic sequences with similarity to annotated B. rapa lncRNA loci. The highest number of lncRNA loci could be matched between B. rapa and the B. napus A sub-genome, followed by B. oleracea and A. thaliana (Fig. 2F, Table S2). The lower number of corresponding non-coding loci compared to protein-coding genes, especially at greater evolutionary distance, is consistent with the lineage-specific nature of lncRNAs. It is important to note that comparisons are based on sequence similarity only, without evidence of expression.

Prediction of cis- and trans-acting lncRNAs

In the next step, the cis and trans interactions of the lncRNAs with the expressed protein-coding genes were predicted. The location of a lncRNA relative to its neighbouring protein-coding gene has been shown to be associated with the effect the lncRNA has on protein-coding gene expression (Rinn and Chang 2012). lncRNAs that act close to the transcription site of neighbouring genes are identified as cis-acting lncRNAs (Figure S3A). In contrast, trans-acting lncRNAs can regulate numerous genes throughout the genome, away from their transcription site (Figure S3A). The cis-acting lncRNAs are divided into several classes (Figure S3B) based on the direction (sense or antisense), type of interaction (intergenic or genic) and relative location (upstream or downstream) with respect to the interacting protein-coding gene (Kornienko et al. 2013). Figure 3A, B summarise the cis lincRNAs and lncNATs present on chromosomes A01–A10, respectively. In this analysis, the lincRNAs are identified as intergenic, and their distribution between sense and antisense is roughly equal (Fig. 3A, Table S3). A slightly higher number of lincRNAs are located upstream (2540) of the protein-coding genes than downstream (1,947). lncNATs are identified as antisense and genic, the majority of which are located in exons of protein-coding genes (Fig. 3B, Table S4).
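The positional classification of intergenic cis-acting lincRNAs can be sketched as below, using the class abbreviations that appear in the figure legends (SU, SD, ADU, ACD). This is a minimal sketch under stated assumptions, not the classifier used in the study: the `Locus` type, the function name and the coordinates are hypothetical, and "upstream" is assumed to be measured along the direction of transcription of the protein-coding gene.

```python
from dataclasses import dataclass


@dataclass
class Locus:
    start: int   # leftmost genomic coordinate
    end: int     # rightmost genomic coordinate
    strand: str  # "+" or "-"


def classify_intergenic(linc, gene):
    """Classify a non-overlapping lincRNA relative to a neighbouring gene.

    SU/SD: same strand, upstream/downstream.
    ADU:   antisense, divergent, upstream.
    ACD:   antisense, convergent, downstream.
    """
    if linc.start <= gene.end and gene.start <= linc.end:
        raise ValueError("loci overlap; the lincRNA is not intergenic")
    linc_is_left = linc.end < gene.start
    # Assumption: upstream means before the gene's transcription start,
    # so the orientation flips for genes on the minus strand.
    upstream = linc_is_left if gene.strand == "+" else not linc_is_left
    if linc.strand == gene.strand:
        return "SU" if upstream else "SD"
    return "ADU" if upstream else "ACD"


gene = Locus(1000, 2000, "+")
print(classify_intergenic(Locus(100, 500, "-"), gene))    # antisense, upstream
print(classify_intergenic(Locus(2500, 3000, "-"), gene))  # antisense, downstream
```

For an intergenic pair on opposite strands, an upstream lincRNA is necessarily transcribed away from the gene (divergent) and a downstream one towards it (convergent), which is why only four classes are needed here.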

Fig. 3

A lincRNA cis interactions classification per chromosome in B. rapa, B top significant non-redundant GO terms associated with expressed protein-coding genes identified as partners of cis-acting lincRNAs, C lncNAT cis interactions classification per chromosome in B. rapa and D top significant non-redundant GO terms associated with expressed protein-coding genes identified as partners of cis-acting lncNATs

Further, these neighbouring cis-acting lncRNA–protein-coding gene pairs were filtered to retain only the pairs in which both the lncRNA and the protein-coding gene were identified as expressed in the samples. The GO enrichment analysis of the protein-coding genes identified as partners of the cis-acting lincRNAs and lncNATs is provided in Fig. 3C, D, respectively. Protein-coding genes neighbouring lincRNAs were associated with biological process categories such as “hormone-mediated signalling pathway”, “regulation of pollen tube development”, “cell communication”, “regulation of cell morphogenesis involved in differentiation” and “transcription, DNA-templated” (Fig. 3C, Table S5). The protein-coding genes neighbouring cis-acting lncNATs were involved in “carbohydrate utilisation”, “transmembrane transport”, “replication fork reversal”, “phosphorylation” and “stamen filament development”, among other biological processes (Fig. 3D, Table S6).

The prediction of trans regulation of protein-coding genes by lncRNAs depends on the formation of complementary hybrids and the associated interaction energy between the lncRNA and the protein-coding genes. Interactions involving unplaced scaffolds were discarded, since one cannot determine whether such trans interactions are bona fide. Initially, the maximum threshold of interaction energy was set at − 20 J to retain significant interactions, and 103,545 interactions were identified for lincRNA transcripts. For lncNAT transcripts, 82,606 total trans interactions were identified. However, a number of significant trans interactions in the order of hundreds of thousands is unlikely. The distribution of the interaction energies (Fig. 4A) shows that most of these interactions are weak, with energies between − 20 and − 100 J. Thus, setting a more stringent arbitrary threshold of − 100 J (red vertical line in Fig. 4A) brings down the number of trans interactions to 1418 for lincRNAs and 1061 for lncNATs. The 1418 identified trans interactions involved 548 lincRNAs (Table S7), of which ~ 43% significantly interacted with only 1 protein-coding gene, whereas 9 lincRNAs interacted with ≥ 10 protein-coding genes. LINC_BRAPST00049411 interacted with the maximum number of protein-coding genes (39) in a trans manner. In contrast, the 1061 trans interactions involved 575 lncNATs (Table S8), with 65% of lncNATs interacting with only 1 protein-coding gene and 9 lncNATs interacting with ≥ 10 protein-coding genes. Among the lncNATs, NAT_BRAPST00007879 interacted with 28 protein-coding genes. Further, these trans-interacting lncRNA–protein-coding gene pairs were filtered to retain only the pairs in which both the lncRNA and the protein-coding gene were identified as expressed in the samples.
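The effect of tightening the interaction energy threshold can be illustrated with a small sketch. The records, identifiers and helper name below are hypothetical, and treating the threshold as inclusive (energy ≤ threshold) is an assumption, since the text does not state whether the boundary value is retained.

```python
from collections import Counter

# Hypothetical (lncRNA, protein-coding gene, interaction energy) records.
interactions = [
    ("LINC_x1", "gene_a", -35.2),
    ("LINC_x1", "gene_b", -140.7),
    ("LINC_x2", "gene_c", -101.3),
    ("LINC_x1", "gene_d", -122.0),
    ("NAT_y1", "gene_a", -20.5),
]


def filter_by_energy(records, max_energy):
    """Keep interactions whose energy is at or below max_energy
    (more negative values indicate stronger predicted hybrids)."""
    return [record for record in records if record[2] <= max_energy]


lenient = filter_by_energy(interactions, -20.0)     # initial threshold
stringent = filter_by_energy(interactions, -100.0)  # stricter threshold

# Number of retained protein-coding partners per lncRNA.
partners = Counter(lncrna for lncrna, _gene, _energy in stringent)
print(len(lenient), len(stringent), dict(partners))
```

Counting partners per lncRNA after filtering is the same kind of summary reported above, for example the share of lncRNAs with a single partner versus those with ten or more.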

Fig. 4

A Comparison of trans interactions free energy distribution for lincRNAs (LINC) and lncNATs (NAT), B top significant non-redundant GO terms associated with expressed protein-coding genes identified as partners of trans-acting lincRNAs and C top significant non-redundant GO terms associated with expressed protein-coding genes identified as partners of trans-acting lncNATs

Functional enrichment of protein-coding genes identified as potentially regulated by lincRNAs in a trans manner revealed their association with “DNA integration”, “cell wall organisation”, “proteolysis”, “cell morphogenesis involved in differentiation” and “regulation of cell growth” among other biological process categories (Fig. 4B). Furthermore, trans-acting lncNATs potentially regulated protein-coding genes involved in biological processes such as “oxylipin biosynthetic process”, “carbohydrate utilisation”, “DNA integration”, “stamen filament development”, and “response to hormone” (Fig. 4C).

lncRNAs as potential miRNA targets and precursors

microRNAs (miRNAs) play an important role in regulating gene expression by influencing mRNA degradation and translational repression (Bartel 2004). lncRNAs, like mRNAs, can be miRNA targets and operate as miRNA decoys, suppressing the interaction between miRNAs and their target genes (Franco-Zorrilla et al. 2007). Out of the 1052 lincRNAs, only 22 were predicted as potential targets of 18 miRNAs, and 21 out of 780 lncNATs were predicted to be targeted by 36 miRNAs (Table S9). The majority of the identified B. rapa lncRNAs targeted by miRNAs were potentially regulated by cleavage, and very few lncRNAs were inhibited at the translational level. The low number of lncRNAs detected as miRNA targets in this analysis is probably due to the lack of male reproductive tissue-specific miRNAs among the published miRNAs.

Some lncRNAs are also considered small RNA (miRNA and siRNA) precursors (Amor et al. 2009; Arikit et al. 2013; Ma et al. 2014; Wei et al. 2022). Plant small RNAs play a crucial regulatory role in gene expression and genome integrity by silencing transposons during plant reproduction (Liu et al. 2020; Pokhrel et al. 2021). Comparison of the lncRNA sequences with the miRBase collection identified only 0.95% of lincRNAs and 0.90% of lncNATs as potential small RNA precursors (with 100% similarity to known mature miRNAs). Furthermore, to identify high-confidence targets of the miRNAs for which lncRNAs served as precursors, psRNATarget with a stringent expectation cutoff of 0 was employed (Table S9). Three lncRNA–miRNA–mRNA modules were identified (Fig. 5A). For one of the modules, the expression profile of the lncRNA was antagonistic to the expression profile of the protein-coding gene. LINC_BRAPST00004757 potentially acts as a precursor of bra-miR162-3p, which then targets BRAPST00013543. Functional annotation identified BRAPST00013543 as a gene encoding thymidine kinase, which salvages DNA precursors. The pyrimidine salvage pathway is crucial for genome replication and for maintaining genome integrity. BRAPST00013543 showed the highest expression in the PMC stage, and its expression gradually decreased, whereas LINC_BRAPST00004757 expression increased as pollen development progressed (Fig. 5A). Thus, it can be postulated that LINC_BRAPST00004757 regulates the expression of BRAPST00013543 during male gametophyte development in B. rapa.
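The precursor criterion used here (100% similarity to a known mature miRNA) can be approximated by an exact substring search, as in the sketch below. This is an illustrative simplification rather than the comparison method used in the study: the sequences and names are invented, only the transcript strand as given is searched, and mature miRNA sequences are assumed to be written with U while transcripts are written with T.

```python
def find_embedded_mirnas(transcript, mature_mirnas):
    """Return (miRNA name, 0-based position) for every mature miRNA whose
    full sequence occurs exactly within the transcript."""
    sequence = transcript.upper().replace("U", "T")
    hits = []
    for name, mirna in mature_mirnas.items():
        position = sequence.find(mirna.upper().replace("U", "T"))
        if position != -1:
            hits.append((name, position))
    return hits


# Invented 21-nt sequences, for illustration only.
mature_mirnas = {
    "miR_demo1": "ACGUACGUUAGCUAGGCAUCA",
    "miR_demo2": "GGGCCCAAAUUUGGGCCCAAA",
}
lncrna = "GGCTTA" + "ACGTACGTTAGCTAGGCATCA" + "TTGACC"

hits = find_embedded_mirnas(lncrna, mature_mirnas)
print(hits)
```

A lncRNA flagged this way is only a candidate precursor; in the text, the predicted targets of the corresponding miRNAs were then screened separately with psRNATarget.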

Fig. 5

A Three identified lncRNA–miRNA–mRNA modules, where the lncRNA acts as a precursor of the miRNA, and the mRNA or the protein-coding gene is the direct target of miRNA. Heat maps represent the expression profiles of lncRNA and protein-coding genes during pollen development, B differential regulation of protein-coding genes during pollen development as depicted by an alluvial plot and C differential regulation of lncRNAs during pollen development as depicted by an alluvial plot

Differential transcriptional reprogramming during pollen development

LncRNAs identified in the datasets used in this study had lower expression levels than protein-coding genes. LncRNAs with low abundance might be filtered out during differential expression analysis (Assefa et al. 2018). The limma R package employed in this study to perform differential expression analysis runs a moderated t test after an empirical Bayes correction (Ritchie et al. 2015), a generic approach suitable for the differential expression analysis of processed lncRNA expression data. In the RNA-Seq libraries, 49,577 protein-coding genes, 4347 lincRNAs and 2045 lncNATs were identified. A CPM cutoff of 1 in at least 3 samples was used to identify 31,729 coding genes, 1,052 lincRNAs and 780 lncNATs expressed during pollen development. To investigate the regulation of protein-coding genes and lncRNAs during pollen development, we performed differential expression analysis (log2 fold change cutoff = 0.585, adjusted p value cutoff < 0.01) across four contrasts (TET-PMC, MIC-TET, BIN-MIC, and POL-BIN) by comparing each pollen developmental stage with the previous one (Table S10, Figure S4A). In total, 92.58, 89.73 and 93.21% of the expressed protein-coding genes, lincRNAs and lncNATs, respectively, were differentially regulated across the four contrasts. When the uninucleate microspore transitions into binucleate pollen, a significantly higher percentage of genes and lncRNAs were differentially regulated, with a higher proportion of downregulated genes (Fig. 5B, C, Figure S4A). These observations align with reported findings that, during male germline development in flowering plants, transcriptome size and complexity decrease throughout microsporogenesis and microgametogenesis (Singh et al. 2008; Wei et al. 2010). Among the protein-coding genes, only 19 and 84 genes were commonly upregulated or downregulated, respectively, across all developmental stage contrasts (Figure S4B). Interestingly, no common lncRNAs were identified to be differentially regulated across all four contrasts, further highlighting their stage-specific regulation (Figure S4B).
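To make the thresholds concrete: a log2 fold change of 0.585 corresponds to roughly a 1.5-fold change, since 2^0.585 ≈ 1.5. The sketch below applies the two cutoffs to hypothetical per-contrast statistics, such as those that might be exported from a limma results table. The function name and values are illustrative, treating the fold-change cutoff as inclusive is an assumption, and counting a gene as differentially regulated when it passes in at least one contrast is also an assumption about how the percentages were derived.

```python
CONTRASTS = ["TET-PMC", "MIC-TET", "BIN-MIC", "POL-BIN"]


def call_regulation(log2fc, adj_p, lfc_cutoff=0.585, p_cutoff=0.01):
    """Label one gene in one contrast as 'up', 'down' or 'ns' (not significant)."""
    if adj_p >= p_cutoff or abs(log2fc) < lfc_cutoff:
        return "ns"
    return "up" if log2fc > 0 else "down"


# Hypothetical (log2 fold change, adjusted p value) per contrast for one gene.
stats = {
    "TET-PMC": (1.20, 0.001),
    "MIC-TET": (0.30, 0.0001),
    "BIN-MIC": (-0.90, 0.005),
    "POL-BIN": (2.00, 0.05),
}

calls = {contrast: call_regulation(*stats[contrast]) for contrast in CONTRASTS}
regulated_in_any = any(label != "ns" for label in calls.values())
print(round(2 ** 0.585, 2), calls, regulated_in_any)
```

Both conditions must hold for a call: the second contrast fails on effect size despite a small p value, and the last fails on significance despite a large fold change.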

Gene ontology (GO) analysis of the differentially expressed protein-coding genes was performed to unravel their role during different pollen developmental stages (Figure S5). When pollen/microspore mother cells transition into tetrads, protein-coding genes associated with biological process categories such as “mRNA processing”, “transcription by RNA polymerase II”, “proteolysis”, “gene expression” and “histone modification” were upregulated. Further, as pollen development progresses, biological process categories including “ribosome biogenesis”, “ncRNA processing”, “translation” and “gene expression” were upregulated. As highlighted earlier, gene expression is significantly downregulated as the uninucleate microspore transitions into binucleate pollen, contributing to the decreasing complexity of transcription in binucleate pollen. This was further supported by the downregulation of biological process categories involved in “regulation of transcription, DNA-templated” and “regulation of gene expression”. In contrast, the upregulated genes were associated with “cellular localization” and “protein transport”, among others. During the final trinucleate pollen stage, the genes involved in transcription and protein synthesis were generally expressed at lower levels, as indicated by the downregulation of the GO terms “ribosome biogenesis”, “translation” and “mRNA processing”. Furthermore, protein-coding genes involved in processes including “localization”, “regulation of pollen tube growth”, “ion transmembrane transport” and “nucleotide-sugar metabolic process” were upregulated in trinucleate pollen. The functional annotation of differentially expressed genes highlighted the stage-specific differential regulation of an array of biological processes during the progression of male gametophyte development.

lncRNA–mRNA co-expression analysis

To predict the regulatory roles of lncRNAs during pollen development, co-expression networks between the protein-coding genes and lncRNAs (expressed genes, > 1 CPM in at least three samples) were identified using the WGCNA tool. The tool identified 24 modules in the dataset (Figure S6). For further analysis, the top three modules associated with each of the five pollen developmental stages were identified (Table S11). The number of lncRNAs varied among the selected modules. Based on the cis and trans regulation of protein-coding genes by lincRNAs and lncNATs, we next investigated the hub genes in the selected modules, identified lncRNA–protein-coding gene interactions and grouped them as cis-lincRNA–protein-coding gene, cis-lncNAT–protein-coding gene, trans-lincRNA–protein-coding gene, and trans-lncNAT–protein-coding gene co-expressed pairs. We also performed correlation analysis to supplement the WGCNA and removed lncRNA–coding gene pairs with a Pearson correlation coefficient between − 0.8 and 0.8, retaining only strongly positively or negatively correlated pairs. Additionally, only those pairs were selected in which the protein-coding gene was differentially expressed and had > 65% similarity with its homolog in A. thaliana. In total, 54 cis-lincRNA–protein-coding gene, 58 cis-lncNAT–protein-coding gene, 8 trans-lincRNA–protein-coding gene and 18 trans-lncNAT–protein-coding gene interacting co-expressed pairs were identified (Table S12).
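The correlation step that supplements the co-expression analysis can be sketched as follows: a lncRNA–protein-coding gene pair is kept only when the absolute Pearson correlation of their expression profiles reaches 0.8. The expression vectors and helper names are hypothetical, and treating the cutoff as inclusive is an assumption.

```python
from math import sqrt


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length numeric sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sd_x = sqrt(sum((a - mean_x) ** 2 for a in x))
    sd_y = sqrt(sum((b - mean_y) ** 2 for b in y))
    if sd_x == 0 or sd_y == 0:
        raise ValueError("correlation is undefined for a constant profile")
    return cov / (sd_x * sd_y)


def is_strongly_correlated(x, y, cutoff=0.8):
    """Keep pairs with strong positive or strong negative correlation."""
    return abs(pearson(x, y)) >= cutoff


# Hypothetical expression across the five stages (PMC, TET, MIC, BIN, POL).
lncrna = [1.0, 2.0, 3.0, 4.0, 5.0]
gene_pos = [2.0, 4.0, 6.0, 8.0, 10.0]  # rises with the lncRNA
gene_neg = [10.0, 8.0, 6.0, 4.0, 2.0]  # falls as the lncRNA rises
gene_weak = [3.0, 1.0, 4.0, 1.0, 5.0]  # weakly related

kept = [
    name
    for name, profile in [("pos", gene_pos), ("neg", gene_neg), ("weak", gene_weak)]
    if is_strongly_correlated(lncrna, profile)
]
print(kept)
```

Using the absolute value keeps antagonistic pairs as well as concordant ones, which matters for cases where a lncRNA and its partner gene show opposite expression trends.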

We further searched for an association between protein-coding genes and lncRNAs expressed during male gametophyte development based on functional annotation of genes. We collected genes annotated with the GO biological process terms associated with male gametophyte and pollen development. We also collected genes predicted to be transcription factors or homologous to pollen-specific genes in A. thaliana. In total, we found 38 cis-lincRNA–protein-coding gene, 31 cis-lncNAT–protein-coding gene, 7 trans-lincRNA–protein-coding gene and 14 trans-lncNAT–protein-coding gene pairs of interest (Figs. 6 and 7). Several key pollen developmental regulators were found among the genes identified, including genes involved in “regulation of cell cycle”, “microtubule-based movement”, “pollen development”, “pollen tube growth”, “cell wall organisation” and “transmembrane transport”, along with genes showing pollen-specific expression (Figs. 6 and 7). Analysis of the function of protein-coding genes in the identified lncRNA–protein-coding gene pairs revealed genes involved in transcription regulation, such as transcription factors belonging to the WRKY, bHLH and NAC TF families (Fig. 7). The proximity of the lncRNAs, together with our results suggesting a possible regulatory relationship between lncRNAs and their co-expressed protein-coding gene partners, suggests that the expression of these developmental regulators and transcription factors could be affected by lncRNAs and that they represent targets for future investigation.

Fig. 6

Heat map representing the cis-lincRNA–protein-coding gene and cis-lncNAT–protein-coding gene pairs in the top modules of pollen development stages as identified by the co-expression analysis. The GO term description of the protein-coding gene highlights the involvement of the co-expressed pairs in biological processes critical to pollen development. The class of cis interaction between the lincRNAs or lncNATs and their respective co-expressed protein-coding genes is also illustrated. cis classification of lincRNAs: ACD antisense convergent downstream, ADU antisense divergent upstream, SU same strand upstream, SD same strand downstream. cis classification of lncNATs: CE contained exonic, OE overlapping exonic, NE nested exonic, NI nested intronic

Fig. 7

A Heat map representing the cis- and trans-lincRNA–protein-coding gene and lncNAT–protein-coding gene pairs in the top modules of pollen development stages as identified by the co-expression analysis. The protein-coding genes in these pairs were identified as transcription factors (light red box), B heat map representing the cis-lincRNA–protein-coding gene and cis-lncNAT–protein-coding gene pairs in the top modules of pollen development stages as identified by the co-expression analysis. The GO term description of the protein-coding gene highlights the involvement of the co-expressed pairs in biological processes critical to pollen development. The class of cis interaction between the lincRNAs or lncNATs and their respective co-expressed protein-coding genes is also illustrated. cis classification of lincRNAs: ACD antisense convergent downstream, ADU antisense divergent upstream, SU same strand upstream, SD same strand downstream. cis classification of lncNATs: CE contained exonic, OE overlapping exonic, NE nested exonic, NI nested intronic

Discussion

Long non-coding RNAs (lncRNAs) play diverse roles in regulating biological processes (Golicz et al. 2018a, b; Perry and Ulitsky 2016; Waseem et al. 2020; Yu et al. 2019). Recent research has identified an increasing number of lncRNAs with significant tissue-specific expression patterns, implying that they play regulatory roles in developmental processes (Gawronski and Kim 2017; Wang et al. 2015; Ward et al. 2015). Plant reproductive development is an essential feature of crop breeding, and identification of lncRNAs that influence reproductive development is becoming increasingly important. The progression of pollen development from diploid pollen mother cells to haploid microspores to highly specialised haploid trinucleate pollen provides an opportunity to dissect the molecular program controlling male lineage development and to identify the transcriptional networks at each stage associated with cell identity (Okada et al. 2005, 2007; Russell et al. 2012), cell cycle and cell fate determination (Haerizadeh et al. 2006; Singh and Bhalla 2007; Sharma et al. 2011), as well as male gametophyte specification processes. Here, we identified sets of lncRNAs and protein-coding genes expressed during pollen development (Fig. 1A) in field mustard (B. rapa). As an important vegetable crop, B. rapa is an attractive option for use as a reference species in Brassica genome investigations (Zhang et al. 2022). We predicted regulatory roles of lncRNAs based on genome location and co-expression analysis. Our results help resolve the distinctive identity of pollen developmental stages and provide a rich data source for further describing the mechanisms underlying male lineage development.

We identified 6,392 lncRNAs (4347 lincRNAs and 2,045 lncNATs) during pollen development in B. rapa (Fig. 1E). A previous study in B. rapa conducted a time series of RNA-seq experiments at five developmental stages during pollen development and three different time points after pollination and identified 12,051 putative lncRNAs (Huang et al. 2018). This difference in results can be attributed to the use of different cell types for sequencing. Huang et al. (2018) used whole buds representing the five pollen developmental stages for sequencing, whereas we isolated pollen mother cells, tetrads, microspores, binucleate pollen and trinucleate pollen for sequencing. Other reasons may be the use of a different variety of B. rapa and the use of different bioinformatic tools for the discovery and identification of lncRNAs.

The lncRNAs (lncNATs and lincRNAs) identified in B. rapa during pollen development (Fig. 2A–C) were shorter and had fewer isoforms and exons per transcript than protein-coding genes; these properties are consistent with previous reports of genome-wide lncRNA discovery (Golicz et al. 2018a, b; Liu et al. 2012; Wang et al. 2015; Zhang et al. 2014). About half of the lncRNAs reported in Arabidopsis and rice have one transcript and include only a single exon (Liu et al. 2012; Zhang et al. 2014). A higher A/U content is another feature of lncRNAs that we observed in B. rapa (Fig. 1D). The A/U composition may simply reflect the composition of the underlying genomic sequence. However, it is attractive to speculate that it is a feature of lncRNAs that facilitates recognition by RNA-binding proteins. A high A/U content may also indicate the flexible nature of lncRNAs in interacting with other transcripts (Smith and Mattick 2017), as A/U-rich transcripts are less stable (Barreau et al. 2005).

lncRNAs are reported to be low in abundance and to show tissue-specific expression in plants and animals (Golicz et al. 2018a, b). In this study, B. rapa lncRNAs likewise showed lower expression than protein-coding genes (Fig. 1B). The expression patterns of lncRNAs (Fig. 1C–F) likely reflect their stage-specific roles during pollen development and are consistent with previous observations of high lncRNA transcription in buds containing the microspore stage (Huang et al. 2018). LncRNAs showed peak expression at the microspore stage, whereas a higher fraction of protein-coding genes had peak expression in pollen/microspore mother cells (Fig. 1E). A few lncRNAs were expressed exclusively at a single pollen developmental stage (Fig. 1F). Furthermore, differential expression analysis revealed no B. rapa lncRNA that was differentially regulated in all of the contrasts between pollen developmental stages. The distinct expression windows of lncRNAs suggest that they are part of a coordinated expression program rather than a result of non-specific pervasive transcription. These results indicate that lncRNA expression profiles may contribute to determining male lineage specification. Further investigation of these stage-specific lncRNAs may lead to a better understanding of the molecular control of pollen development.

Across plant species, lncRNAs are reported to show low conservation and are mostly species specific (Ke et al. 2019; Simopoulos et al. 2019). Because lncRNA sequences evolve rapidly, there is less sequence conservation across plant and animal taxa, resulting in fewer phylogenetic connections (Simopoulos et al. 2019). In the present study, we found that the proportion of collinear non-coding loci between B. rapa and three other Brassicaceae species decreased with increasing evolutionary distance, consistent with the lineage-specific nature of lncRNAs. A relatively high percentage (64%) of B. rapa lncRNA loci showed similarity to collinear genomic loci in the B. napus A sub-genome. B. napus is an amphidiploid species that originated from the hybridisation of B. rapa and B. oleracea and contains complete diploid chromosome sets of both parental genomes. Since the A genome in B. napus represents the diploid B. rapa genome, the high level of conservation of B. rapa lncRNAs in the A genome of B. napus is expected. According to Liu et al. (2012), 2% of all putative lncRNAs identified in A. thaliana are conserved across the plant kingdom. Comparing maize (Zea mays) lncRNAs to A. thaliana lncRNAs yielded a similar conservation level (L. Li et al. 2014a, b). With the increasing availability of whole plant genomes, it will become feasible to address questions of non-coding sequence conservation in a phylogenomic setting using genetic collinearity in addition to sequence similarity. Expanding lncRNA databases and comparative genomics may further the understanding of the functional conservation of lncRNAs and of the underlying mechanisms across different tissues and plant species.

Recent investigations have revealed various functions mediated by lncRNAs in plants (Nejat and Mantri 2018). The different interactions between lncRNAs and protein-coding genes point towards diverse modes of action of lncRNAs (Rinn and Chang 2012). lncRNAs, like mRNAs, can be miRNA targets and operate as miRNA decoys, suppressing the interaction between miRNAs and their target genes (Franco-Zorrilla et al. 2007). Of the 1,052 lincRNAs, only 22 were predicted as potential targets of 18 miRNAs, and 21 of the 780 lncNATs were predicted to be targeted by 36 miRNAs (Table S9). Some lncRNAs are also considered small RNA (miRNA and siRNA) precursors (Amor et al. 2009; Arikit et al. 2013; Ma et al. 2014; Wei et al. 2022). We found that only 10 lincRNAs and 8 lncNATs are potential small RNA precursors (100% similarity to known mature miRNAs). The small fraction of lncRNAs identified as miRNA targets and the lack of similarity between lncRNAs and mature miRNAs suggest that the majority of lncRNAs are unlikely to act as miRNA decoys or small RNA precursors and instead have other, independent modes of regulation. However, the publicly available databases may lack some miRNAs specific to pollen development; therefore, roles of lncRNAs as decoys or precursors of as yet unknown miRNAs cannot be excluded.

LncRNAs can regulate gene expression in a cis or trans manner (Fatica and Bozzoni 2014), and different classes of lncRNAs may play distinct roles in regulating the abundance of protein-coding transcripts. When acting in cis, lncRNAs regulate neighbouring genes close to their own site of transcription (Guil and Esteller 2012). In contrast, when acting in trans, they can regulate multiple genes throughout the genome, away from their site of transcription (Fatica and Bozzoni 2014). In the present study, we predicted cis and trans interactions between all the annotated lncRNAs and protein-coding genes (Figs. 3 and 4, Tables S3–S8). All cis-acting lincRNAs were classified as intergenic, and they were distributed approximately evenly between the sense/antisense and upstream/downstream categories. Cis-acting lncNATs were mostly classified as antisense and genic, and most of them were located in exons of protein-coding genes. More significant trans interactions were identified for lncNATs than for lincRNAs. It is likely that the lncNATs overlapping with coding genes in the B. rapa genome emerged from an ancestral genome triplication event (Cai et al. 2017; Cheng et al. 2014; Zhang et al. 2022). A substantial proportion of B. rapa genes are therefore expected to occur in groups of paralogs with sequence similarity, scattered across different positions in the genome (Mun et al. 2009). Consequently, the chance of finding high similarity between a lncNAT and a distant protein-coding gene is higher than for a lincRNA, which is located outside protein-coding genes.

In plants, only a few studies have reported lncRNAs linked to male reproductive development, for instance, LDMAR in rice, Zm401 in maize and BcMF11 in B. campestris (Ding et al. 2012; Ma et al. 2008; Song et al. 2007, 2013). Huang et al. (2018) also reported lncRNA–mRNA pairs, including 10 genes involved in pollen and pollen development functions. Weighted gene co-expression network and correlation analysis revealed 90 pairs of cis- or trans-acting lncRNAs and protein-coding genes, including several genes involved in transcriptional regulation, regulation of cell cycle, microtubule-based movement, pollen development, pollen tube growth, cell wall organisation and transmembrane transport (Figs. 6 and 7, Table S12). Interestingly, most of the pairs contained cis-acting lncRNAs, and most of these pairs were positively correlated (Table S12). The next obvious step is to determine whether this link between cis-acting lncRNAs and protein-coding genes is causal. The close relationship within lncRNA/protein-coding gene pairs could indicate the presence of sophisticated gene expression regulatory mechanisms during pollen development. Several studies have reported cis-acting lncRNAs that activate the expression of neighbouring genes (Csorba et al. 2014; Krishnan and Mishra 2014; Rosa et al. 2016; Statello et al. 2021; Vance and Ponting 2014; Yap et al. 2010). The lncRNAs in our study that are strongly co-expressed with neighbouring protein-coding genes are thus promising candidates for further research. Although cis-acting lncRNAs and neighbouring protein-coding genes were typically positively correlated, we also found a few lncRNAs that potentially negatively regulate the expression of neighbouring genes. Several trans-acting lncRNAs were also discovered during pollen development. These results indicate that the close relationship between lncRNAs and protein-coding transcripts may be mediated by various molecular interactions.

Material and methods

Plant material, sample collection and sequencing

Brassica rapa accession no. ATC 92,270 Y.S (AND)-168 was used in this study. The plants were grown in a growth cabinet under the following conditions: 21/18 °C day/night, a photoperiod of 16/8 h light/dark, 200 μmol m−2 s−1 light intensity and 60% humidity. The different stages of pollen development were harvested following previously published protocols (Babaei et al. 2021; Golicz et al. 2021). Briefly, the following five groups of buds were identified: < 0.5 mm buds (pollen/microspore mother cells, PMC), 0.8–1 mm buds (tetrads, TET), 1–2.5 mm buds (microspores to polarised microspores, MIC), 3–4.5 mm buds (early to late binucleate pollen, BIN) and 5–6 mm buds (trinucleate pollen, POL). For the first three groups, whole buds were crushed in B5 medium in a 1.5 mL tube. For the last two groups, anthers were carefully dissected from the buds and crushed in the same way. The crushed suspension was then filtered through a 44 μm nylon mesh into 15 mL tubes. The filtrate was centrifuged at 150 g for 3 min at 4 °C. The supernatant was discarded, and the pellet was washed with 0.5 × B5 medium and centrifuged again at 150 g for 3 min at 4 °C. The supernatant was removed, and the pellet was immediately frozen in liquid nitrogen and stored at −80 °C. An aliquot from each isolation was examined to check the developmental stage.

The buds collected from different plants were used as one biological replicate, and three independent biological replicates were prepared for each sample. The total RNA was isolated using the mirVana™ miRNA Isolation Kit (Thermo-Fisher; Part Numbers AM1560, AM1561, Carlsbad, CA, USA) according to the manufacturer’s instructions. The isolated RNA samples were treated with TURBO™ DNase (Ambion, Carlsbad, CA, USA) to remove DNA contamination. The libraries were prepared using the Illumina TruSeq stranded mRNA kit with poly(A) selection (Golicz et al. 2021). Additionally, five libraries were prepared using rRNA depletion (Babaei et al. 2021). The sequencing was performed at the Australian Genome Research Facility (AGRF), Melbourne.

B. rapa genome re-annotation and lncRNA discovery

The reads were mapped to the B. rapa ‘Chiifu’ genome assembly v3.5 (Zhang et al. 2022) using STAR v2.7.9a (Dobin and Gingeras 2015) with two-pass mapping as described by Veeneman et al. (2016). Default settings were used for generating the first-pass genome index. The filters and parameters for the two-pass mapping are provided in Table S1B. The transcripts for each library were assembled separately using Stringtie v2.1.4 (Pertea et al. 2015; Varabyou et al. 2021) with the existing B. rapa ‘Chiifu’ v3.5 genome annotation as a guide. The individual assemblies were then merged using Stringtie v2.1.4, again with the B. rapa ‘Chiifu’ v3.5 genome annotation as a guide.

The coding potential of all genes was evaluated by (1) Coding Potential Calculator 2 (CPC2-beta) (Kang et al. 2017), (2) the PLEK tool (A. Li et al. 2014a, b), (3) DIAMOND blastx v0.9.30 comparison against the RefSeq database (obtained on 23.05.2021) (Buchfink et al. 2015) and (4) DIAMOND blastp comparison of the extracted longest open reading frames (ORFs; TransDecoder v5.5.0) against the RefSeq database. Genes for which none of the transcripts was identified as coding were designated as non-coding. Non-coding loci with at least one transcript ≥ 200 bp in length and no match in the Rfam v14.7 database were identified as lncRNAs. The overlap between lncRNAs and protein-coding loci was identified with bedtools v2.30.0 (Quinlan 2014). lncRNAs that did not overlap any coding locus were designated lincRNAs, while lncRNAs that overlapped at least one protein-coding locus were designated lncNATs. A schematic representation of the pipeline for B. rapa genome re-annotation, lncRNA discovery and alternative splicing analysis is provided in Figure S1.
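
The decision rules in this paragraph can be summarised in a minimal Python sketch. It is illustrative only and is not the pipeline code used in the study: the `Locus` record, its field names and the function names are hypothetical, the coding-potential and Rfam results are taken as precomputed flags, and the way the four coding-potential tools are combined into a single flag is left outside the sketch because the text does not specify it.

```python
from dataclasses import dataclass
from typing import List

MIN_LNCRNA_LENGTH = 200  # bp, the length cutoff stated in the text


@dataclass
class Locus:
    """Hypothetical record for one locus (0-based, half-open coordinates)."""
    locus_id: str
    chrom: str
    start: int
    end: int
    transcript_lengths: List[int]
    any_transcript_coding: bool  # precomputed: any transcript called coding
    has_rfam_hit: bool           # precomputed: any match in Rfam


def is_lncrna(locus: Locus) -> bool:
    """Non-coding locus, no Rfam match, at least one transcript >= 200 bp."""
    if locus.any_transcript_coding or locus.has_rfam_hit:
        return False
    return any(n >= MIN_LNCRNA_LENGTH for n in locus.transcript_lengths)


def overlaps(a: Locus, b: Locus) -> bool:
    """True if the two loci share at least one base on the same chromosome."""
    return a.chrom == b.chrom and a.start < b.end and b.start < a.end


def classify_lncrna(locus: Locus, coding_loci: List[Locus]) -> str:
    """Return 'lincRNA', 'lncNAT' or 'not_lncRNA' following the rules above."""
    if not is_lncrna(locus):
        return "not_lncRNA"
    if any(overlaps(locus, c) for c in coding_loci):
        return "lncNAT"
    return "lincRNA"
```

In practice the overlap step was carried out with bedtools; the pairwise comparison here is only meant to make the classification logic explicit.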

cis and trans regulation of protein-coding genes by lncRNAs

Classification of lncRNAs acting in cis was performed with the FEELnc_classification script from the FEELnc software (Wucher et al. 2017). This software applies a 100 kb sliding window and classifies each lncRNA based on its relationship with the closest mRNA in the window (Figure S3B). A custom Python script was used to generate per-chromosome statistics of the annotated lncRNAs. The results were visualised in a Sankey plot with a custom R script using the packages ggplot2 and ggalluvial (Wickham 2016). For simplicity, only the best-predicted interactions for lncRNAs located on chromosomes were visualised, and lncRNAs located on scaffolds were removed.
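
The idea of pairing each lncRNA with its closest mRNA inside a 100 kb window can be sketched as follows. This is a simplified illustration and not FEELnc's implementation: the `Feature` type and the `closest_partner` function are hypothetical names, coordinates are assumed to be 0-based and half-open, and only the genic/intergenic and sense/antisense labels are derived (the upstream/downstream subtype is omitted).

```python
from typing import Dict, NamedTuple, Optional, Sequence

WINDOW = 100_000  # bp, the window size stated in the text


class Feature(NamedTuple):
    """Hypothetical genomic feature (0-based, half-open coordinates)."""
    name: str
    chrom: str
    start: int
    end: int
    strand: str  # "+" or "-"


def gap(a: Feature, b: Feature) -> int:
    """Distance in bp between two features; 0 if they overlap or touch."""
    return max(0, max(a.start, b.start) - min(a.end, b.end))


def closest_partner(lnc: Feature, mrnas: Sequence[Feature],
                    window: int = WINDOW) -> Optional[Dict[str, object]]:
    """Describe the closest mRNA within `window` bp of the lncRNA, if any."""
    best = None
    for m in mrnas:
        if m.chrom != lnc.chrom:
            continue
        d = gap(lnc, m)
        if d <= window and (best is None or d < best[1]):
            best = (m, d)
    if best is None:
        return None
    m, d = best
    overlapping = lnc.start < m.end and m.start < lnc.end
    return {
        "partner": m.name,
        "distance": d,
        "type": "genic" if overlapping else "intergenic",
        "direction": "sense" if m.strand == lnc.strand else "antisense",
    }
```

A lncRNA with no mRNA inside the window returns `None`, which corresponds to a lncRNA without a predicted cis partner in this simplified scheme.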

Possible RNA–RNA interactions in trans between B. rapa lncRNAs and mRNAs were predicted using the RIblast software (Fukunaga and Hamada 2017). RIblast was chosen for this analysis because it is reported to be over 64 times faster than comparable software at a similar level of precision.

Gene expression quantification, data pre-processing and differential expression analysis

Transcript expression was quantified using Kallisto v0.45.1 (Bray et al. 2016), and the downstream analysis was performed using the limma R package. The tximport R package version 1.10.0 (lengthScaledTPM method) was used to generate read counts and transcripts per million (TPM) values (Soneson et al. 2015). Transcripts and genes with low expression were filtered based on mean–variance trend analysis of the data. A gene was considered expressed if any of its transcripts had CPM ≥ 1 in at least 3 of the 15 samples. Normalisation of the gene and transcript read counts to log2-CPM was performed by the TMM method (Bullard et al. 2010). Principal component analysis (PCA) was undertaken to determine the relatedness between the biological replicates (Figure S2A). Batch effects were corrected using the RUVSeq R package (Risso 2015). Four contrast groups were defined for the differential expression analysis: TET vs PMC, MIC vs TET, BIN vs MIC and POL vs BIN. A gene was considered significantly differentially expressed in a contrast group at an adjusted p value < 0.01 and a log2 fold change ≥ 0.5. p values were adjusted for multiple testing to control the false discovery rate (FDR) using the BH method (Benjamini and Yekutieli 2001).
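
The expression filter and significance criteria described above can be illustrated with a short Python sketch. The analysis itself was run in R, so this is not the authors' code: the function names are hypothetical, the BH adjustment is written out by hand for clarity, and applying the fold-change cutoff to the absolute value (so that both up- and down-regulated genes pass) is an assumption that the text does not state explicitly.

```python
from typing import List, Sequence


def is_expressed(transcript_cpms: Sequence[Sequence[float]],
                 min_cpm: float = 1.0, min_samples: int = 3) -> bool:
    """True if any single transcript has CPM >= min_cpm in >= min_samples samples.

    transcript_cpms holds one list of per-sample CPM values per transcript.
    """
    return any(sum(v >= min_cpm for v in cpms) >= min_samples
               for cpms in transcript_cpms)


def bh_adjust(p_values: Sequence[float]) -> List[float]:
    """Benjamini-Hochberg adjusted p values, returned in the input order."""
    n = len(p_values)
    order = sorted(range(n), key=lambda i: p_values[i])
    adjusted = [0.0] * n
    running_min = 1.0
    # Walk from the largest p value down, enforcing monotonicity.
    for rank in range(n, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, p_values[i] * n / rank)
        adjusted[i] = running_min
    return adjusted


def is_significant(adj_p: float, log2_fc: float,
                   alpha: float = 0.01, min_lfc: float = 0.5) -> bool:
    """Cutoffs from the text; using abs(log2_fc) is an assumption."""
    return adj_p < alpha and abs(log2_fc) >= min_lfc
```

For example, `bh_adjust([0.01, 0.04, 0.03, 0.005])` scales each p value by the number of tests divided by its rank and then enforces monotonicity, so the two smallest p values share one adjusted value and the two largest share another.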

Functional annotation and enrichment analysis

B. rapa coding genes homologous to the Arabidopsis proteome were identified using the BlastP program with an e value ≤ 1e-05. PANNZER2 (Törönen et al. 2018) was employed for the functional annotation of the protein-coding genes. PANNZER2 provides a GO term and a functional description for the query protein sequences. A total of 24,179 of the 31,729 expressed protein-coding genes were assigned a GO term. The GO enrichment analysis was performed using the GOstats R package, with a hypergeometric test (p value < 0.01) to identify overrepresented GO terms (Falcon and Gentleman 2007). The annotated expressed genes (24,179) were used as the background for GO enrichment analysis. Further, REVIGO was used to retain the significant non-redundant GO terms (Supek et al. 2011).
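
The hypergeometric test underlying this kind of over-representation analysis can be written out in a few lines. The sketch below is a plain restatement of the test, not the GOstats implementation (which can additionally condition on the GO graph structure); the function name and argument names are hypothetical.

```python
from math import comb


def hypergeom_upper_tail(k: int, K: int, n: int, N: int) -> float:
    """P(X >= k) for a hypergeometric draw.

    N: background genes (e.g. the 24,179 annotated expressed genes)
    K: background genes annotated with the GO term
    n: genes in the set of interest
    k: genes in the set of interest annotated with the GO term
    """
    total = comb(N, n)
    upper = min(K, n)
    # comb() returns 0 when the second argument exceeds the first,
    # so impossible combinations drop out of the sum automatically.
    hits = sum(comb(K, i) * comb(N - K, n - i) for i in range(k, upper + 1))
    return hits / total
```

A GO term would then be called over-represented in a gene set when this tail probability falls below the chosen cutoff (0.01 in the text).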

lncRNA conservation analysis

Conservation of lncRNA loci was investigated by searching for sequence similarity between B. rapa lncRNA loci and the genome assemblies of B. napus (A genome) (https://academic.oup.com/gigascience/article/9/12/giaa137/6034787), B. oleracea (https://phytozome-next.jgi.doe.gov/info/Boleraceacapitata_v1_0) and A. thaliana (https://phytozome-next.jgi.doe.gov/info/Athaliana_TAIR10) using Liftoff (Shumate and Salzberg 2021). Collinearity between loci with sequence similarity was then confirmed with MCScanX (Wang et al. 2012).

Analysis of lncRNA as miRNA targets or precursors

B. rapa miRNA targets were identified using psRNATarget v2 (2017 update) (Dai et al. 2018). Using the default parameters, the B. rapa lncRNAs and protein-coding genes expressed during pollen development were aligned against the B. rapa miRNAs available in the psRNATarget database. A strict expectation threshold of ≤ 3 was used to filter potential targets.

Mature B. rapa miRNA sequences were downloaded from miRBase release 22.1 (Kozomara et al. 2019). To identify whether lncRNAs acted as potential precursors or targets of small RNAs (miRNA and siRNA), the lncRNAs were compared to the miRNA sequences using BLASTN v2.7 (-task blastn -outfmt '6 qseqid sseqid pident length mismatch gapopen qstart qend sstart send evalue bitscore qcovs qlen slen'). The matches were filtered to retain only those with 100% sequence identity and a match length equal to the miRNA length.
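
The filtering step can be sketched in Python against the tabular output format given above. This is an illustration, not the script used in the study: the function name is hypothetical, and which side of the comparison carries the miRNA is an assumption. The sketch defaults to the miRNA being the BLAST subject (length in `slen`); pass `"qlen"` if the miRNAs were the queries.

```python
import csv
from typing import Dict, Iterable, List

# Column order taken from the -outfmt string quoted in the text.
COLUMNS = ("qseqid sseqid pident length mismatch gapopen qstart qend "
           "sstart send evalue bitscore qcovs qlen slen").split()


def perfect_full_length_hits(lines: Iterable[str],
                             mirna_length_column: str = "slen"
                             ) -> List[Dict[str, str]]:
    """Keep hits with 100% identity whose alignment spans the whole miRNA."""
    reader = csv.DictReader(lines, fieldnames=COLUMNS, delimiter="\t")
    kept = []
    for row in reader:
        full_identity = float(row["pident"]) == 100.0
        full_length = int(row["length"]) == int(row[mirna_length_column])
        if full_identity and full_length:
            kept.append(row)
    return kept
```

The function accepts any iterable of tab-separated lines, so an open file handle on the BLASTN output can be passed directly.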

lncRNA–mRNA co-expression analysis by WGCNA

Weighted gene co-expression network analysis using the WGCNA R package v1.63 (Langfelder and Horvath 2008) was performed to identify co-expression networks of coding genes and lncRNAs. As per the WGCNA guidelines, normalised gene counts were used. A co-expression network with an approximate scale-free topology fit above 0.80 was obtained with a soft-thresholding power of 10. A minimum module size of 30 was chosen, and gene modules with highly similar expression patterns were merged. After evaluating the module dendrogram, an arbitrary threshold of 0.20 (corresponding to a correlation of 0.80) was set as the limit for merging modules with similar expression. The three best-correlating modules for each developmental stage were selected, the transcripts with the highest significance in these modules were identified, and functional enrichment analysis was performed. LncRNA–mRNA co-expressed pairs were identified among the hub genes of the selected modules, and the potential regulatory relationship (cis or trans) within each pair was characterised. Further, the Pearson correlation coefficient between the members of each lncRNA–protein-coding gene co-expressed pair was calculated in R.
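
The quantities named in this paragraph are related in a simple way, which the following Python sketch makes explicit. It is not the WGCNA code: the function names are hypothetical, and the unsigned adjacency |r|^power is assumed because the network type is not stated in the text.

```python
from math import sqrt
from typing import Sequence


def pearson(x: Sequence[float], y: Sequence[float]) -> float:
    """Pearson correlation coefficient of two equally long profiles."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)


def soft_threshold_adjacency(r: float, power: int = 10) -> float:
    """Unsigned adjacency |r| ** power (assumed network type)."""
    return abs(r) ** power


def should_merge(eigengene_correlation: float, cut_height: float = 0.20) -> bool:
    """Merge two modules when 1 - correlation is within the cut height."""
    return (1.0 - eigengene_correlation) <= cut_height
```

With a power of 10, a correlation of 0.9 gives an adjacency of about 0.35 while a correlation of 0.5 gives about 0.001, which is how soft thresholding suppresses weak correlations without a hard cutoff. A cut height of 0.20 on the dissimilarity 1 − r corresponds to merging modules whose eigengenes correlate at 0.80 or higher.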

Conclusion

In summary, we investigated the expression of long non-coding RNAs during pollen development and identified 1,832 lncRNAs (1,052 lincRNAs and 780 lncNATs) and 31,729 protein-coding genes as expressed during pollen development in B. rapa. The lncRNAs have defined stage-specific expression patterns. The lncRNAs were subdivided into classes based on their genomic location and orientation relative to neighbouring protein-coding genes. lncRNAs belonging to different classes have distinct properties, suggesting possible differences in function and/or mode of action. The analysis of the expression patterns of lncRNA–protein-coding gene pairs points to the involvement of lncRNAs in modulating protein-coding genes associated with several biological processes that regulate pollen development. Overall, genome-wide identification, characterisation and functional analysis enabled the identification of lncRNA candidates and their functional associations with protein-coding genes, potentially revealing regulatory and molecular mechanisms underlying male reproductive development in B. rapa.