Introduction

Genomic imprinting is an epigenetic process that leads to monoallelic gene expression in a parent-of-origin-dependent manner (Bartolomei and Ferguson-Smith 2011; Barlow and Bartolomei 2014). Imprinting can be regulated by DNA methylation, histone modifications and long noncoding RNAs (lncRNAs). Currently, there are \(\sim \)100 validated imprinted genes in humans (Morcos et al. 2011; Cuellar Partida et al. 2018), and these genes often occur in clusters (Barlow and Bartolomei 2014).

The DLK1–DIO3 imprinted cluster is \(\sim \)1 Mb long and is located on human chromosome 14q32 (da Rocha et al. 2008). The DLK1–DIO3 domain contains three paternally expressed protein-coding genes (DLK1, RTL1 and DIO3), several maternally expressed lncRNAs (GTL2/MEG3, anti-RTL1, MEG8, and MIRG/MEG9), numerous microRNAs (miRNAs) and small nucleolar RNAs (snoRNAs), and several pseudogenes (Edwards et al. 2008; Benetatos et al. 2014). Expression patterns from the DLK1–DIO3 cluster can be used to identify fully induced pluripotent stem cells (iPSCs) and embryonic stem cells (ESCs); in particular, detecting maternally expressed noncoding RNAs serves as an indicator of the pluripotency of these cells (Liu et al. 2010; Benetatos et al. 2014; Mo et al. 2015).

Advances in sequencing technology have shown that 98% of the transcripts from the human genome are noncoding RNAs (ncRNAs) (Djebali et al. 2012; Iyer et al. 2015). LncRNAs are defined as transcripts longer than 200 nucleotides with no functional protein product (Cao 2014). LncRNAs participate widely in cellular processes such as differentiation, development, cancer and gene imprinting, and often influence the expression of neighbouring genes (Engreitz et al. 2016). An important subgroup of lncRNAs is the long intergenic noncoding RNAs (lincRNAs). In mice, over a thousand lincRNAs have been identified through chromatin signature analysis (Guttman et al. 2009). In cattle, lincRNAs comprised more than half of the total ncRNAs that were predicted from different developmental stages and tissues (Qu and Adelson 2012). Recent studies show that lincRNAs play an important role in regulating gene expression and epigenetic modification (Marchese and Huarte 2014; Deniz and Erman 2017).

The mouse DLK1–DIO3 domain contains two lincRNAs, AK050713 (Hagan et al. 2009) and B830012L14Rik (Zhang et al. 2011), between two maternally expressed imprinted genes (Meg8 and Meg9). Previously, we identified a lincRNA (LINC24061) located between MEG8 and MEG9 in the bovine DLK1–DIO3 imprinted domain (Zhang et al. 2017). By further analysing the EST sequences between bovine MEG8 and MEG9, we found three spliced ESTs (CB463975, DY196469 and CR852162) located upstream of LINC24061. Using reverse transcriptase-polymerase chain reaction (RT-PCR), we confirmed that these three ESTs come from the same transcript, which was designated LINC24065. Here, we aimed to identify the structural features of LINC24065, characterize the expression patterns of its transcript variants and determine its allele-specific expression pattern.

Materials and methods

Collection of tissue samples

Tissue samples (heart, liver, spleen, lung, kidney, skeletal muscle, adipose and brain) from 32 Holstein cows were collected at the time of slaughter from a local abattoir. All tissue samples were immediately frozen in liquid nitrogen and stored at \(-\)80\(^{\circ }\)C for DNA and RNA extraction. This study was conducted with the approval of the Agriculture Research Animal Care Committee of Hebei Agricultural University.

Isolation of total RNAs and cDNA synthesis

Total RNAs were isolated from various bovine tissues using the TransZol Up Plus RNA kit (Transgen, Beijing, China) following the manufacturer’s protocol. RNA was converted into cDNA using EasyScript One-Step gDNA Removal and cDNA Synthesis SuperMix (Transgen, Beijing, China), with a 20 \(\mu \)L reaction volume containing 1 \(\mu \)g total RNA, 1 \(\mu \)L anchored Oligo(dT)\(_{18}\) primer (0.5 \(\mu \)g/\(\mu \)L), 10 \(\mu \)L \(2\times \hbox {ES}\) reaction mix, 1 \(\mu \)L gDNA remover and 1 \(\mu \)L EasyScript RT/RI enzyme mix. The sample was incubated at 42\(^{\circ }\)C for 30 min, and 85\(^{\circ }\)C for 5 min.

5\(^{\prime }\) and 3\(^{\prime }\) rapid amplification of cDNA ends (RACE)

To identify novel lincRNAs in DLK1–DIO3, we searched for ESTs between MEG8 and MEG9. Three spliced ESTs with partially overlapping sequences were selected (CB463975, DY196469 and CR852162), and the partial sequences were confirmed as part of the same transcript, which was termed LINC24065 following the GENCODE annotation bibliography. To clone the full cDNA sequence of LINC24065, one 5\(^{\prime }\) and two 3\(^{\prime }\) gene-specific primers (GSP5\(^{\prime }\)-1 5\(^{\prime }\)-TGGAGGCAACCAGAGAGCGCGTCCGTGG-3\(^{\prime }\), GSP3\(^{\prime }\)-1 5\(^{\prime }\)-GAACGGACCAGTGGGGCTGGCTGGA-3\(^{\prime }\) and GSP3\(^{\prime }\)-2 5\(^{\prime }\)-GCCTCTGTCAGGAGGGCGACAAAGACC-3\(^{\prime }\)) were designed using Oligo7 software. The 5\(^{\prime }\) and 3\(^{\prime }\) RACE were performed with the SMARTer RACE 5\(^{\prime }\)/3\(^{\prime }\) kit (TaKaRa, Dalian, China). The 5\(^{\prime }\) and 3\(^{\prime }\) RACE products were cloned using the pEASY-Blunt Cloning kit (Transgen, Beijing, China) and sequenced (BGI, Beijing, China).

RT-PCR analysis of transcript expression patterns

RT-PCR was used to analyse the expression patterns of alternative splice variants in different tissues. Total RNA was extracted from eight tissues (heart, liver, spleen, lung, kidney, skeletal muscle, adipose and brain) and reverse-transcribed into cDNA. Two forward primers (RT-F1 5\(^{\prime }\)-GCCTTATGGAAATGGTTATTG-3\(^{\prime }\) and RT-F2 5\(^{\prime }\)-TGGCTGTTACCTTTGTGGA-3\(^{\prime }\)) and two reverse primers (RT-R1 5\(^{\prime }\)-AGACCGTGGAAGATGCTT-3\(^{\prime }\) and RT-R2 5\(^{\prime }\)-AGACCGACAATACAACCCT-3\(^{\prime }\)) were designed using the 5\(^{\prime }\) and 3\(^{\prime }\) end sequences of LINC24065, respectively. The expression patterns of LINC24065-v1 and LINC24065-v2 were analysed using the primer pair RT-F1/RT-R1. The primer pairs RT-F2/RT-R1 and RT-F2/RT-R2 were used to determine the expression of LINC24065-v3 and LINC24065-v4, respectively. RT-PCR was performed with KOD-Plus-Neo (TOYOBO, Shanghai, China) for 35 cycles of 94\(^{\circ }\)C for 30 s, 55\(^{\circ }\)C for 30 s and 72\(^{\circ }\)C for 40 s. GAPDH (accession no. BTU85042) was amplified using primers (GAPDH-F 5\(^{\prime }\)-GCACAGTCAAGGCAGAGAAC-3\(^{\prime }\) and GAPDH-R 5\(^{\prime }\)-GGTGGCAGTGATGGCGTGGA-3\(^{\prime }\)) as an internal control.
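As a rough sanity check on the RT-PCR primers listed above, the following Python sketch computes primer length, GC content and an approximate melting temperature. The Wallace rule (2 °C per A/T, 4 °C per G/C) is used here purely for illustration; it is a coarse approximation intended for short oligonucleotides and is not the method used by Oligo7 or by the authors. The function names `gc_content` and `wallace_tm` are hypothetical helpers introduced for this example.

```python
# Primer sequences copied from the text (5' to 3').
PRIMERS = {
    "RT-F1": "GCCTTATGGAAATGGTTATTG",
    "RT-F2": "TGGCTGTTACCTTTGTGGA",
    "RT-R1": "AGACCGTGGAAGATGCTT",
    "RT-R2": "AGACCGACAATACAACCCT",
}


def gc_content(primer):
    """Return the fraction of G and C bases in a primer."""
    s = primer.upper()
    return (s.count("G") + s.count("C")) / len(s)


def wallace_tm(primer):
    """Approximate Tm in degrees C by the Wallace rule: 2*(A+T) + 4*(G+C).

    Assumption: this rule is only a rough guide for short oligos and
    ignores salt and primer concentration.
    """
    s = primer.upper()
    at = s.count("A") + s.count("T")
    gc = s.count("G") + s.count("C")
    return 2 * at + 4 * gc


if __name__ == "__main__":
    for name, seq in PRIMERS.items():
        print(f"{name}: {len(seq)} nt, GC {gc_content(seq):.0%}, "
              f"Wallace Tm ~{wallace_tm(seq)} C")
```

Estimates of this kind only indicate whether the members of a primer pair are roughly matched; they do not replace empirical optimization of the annealing temperature.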

Genomic DNA isolation and SNP identification

Genomic DNA was extracted from the liver of 32 cows using the standard phenol–chloroform extraction method. Using the UCSC Genome Browser (http://genome.ucsc.edu/), candidate SNPs were predicted in exons of LINC24065. Primers (gF 5\(^{\prime }\)-TGAGGCGTTGTCTAACTGTGA-3\(^{\prime }\) and gR 5\(^{\prime }\)-TACTCCACCAACGGTCATCAT-3\(^{\prime }\)), located in introns 5 and 6, were designed to amplify the SNP of interest. The genomic PCR reaction mixture (50 \(\mu \)L) contained 1 \(\mu \)L each of forward and reverse primers (10 mM), 2 \(\mu \)L genomic DNA, 25 \(\mu \)L \(2\times \hbox {ES}\) Taq MasterMix (Dye) (CWBIO, China) and 21 \(\mu \)L double distilled H\(_{2}\)O, and the reaction comprised 30 cycles of 94\(^{\circ }\)C for 30 s, 55\(^{\circ }\)C for 30 s and 72\(^{\circ }\)C for 30 s. The amplified products were detected on 1.5% agarose gels and purified using a Gel Extraction kit (Omega, Monsanto, USA) according to the manufacturer’s protocol.

The SNPs were identified by directly sequencing the amplified products. In heterozygous individuals, double peaks were observed at the SNP site in the sequencing chromatogram.

Allelic expression analysis of LINC24065

Heterozygous individuals were used to determine the allelic expression of LINC24065. A new primer pair (cF 5\(^{\prime }\)-TGGCTGGAAAGCTCTTCC-3\(^{\prime }\) and cR 5\(^{\prime }\)-AGTCTGTAGAGCCCAAGACCA-3\(^{\prime }\)) spanning four introns (introns 5, 6, 7 and 8) was used to amplify the sequence containing the informative SNP site from cDNA samples. The same RT-PCR conditions as above were used to amplify the region containing the SNP. The amplified products were purified using a Gel Extraction kit (Omega, Monsanto, USA) and submitted for Sanger sequencing.

The sequencing chromatograms at the heterozygous SNP site were compared between the cDNA and genomic DNA. Monoallelic expression was inferred when the genomic DNA was heterozygous and the cDNA showed only one allele at the SNP.
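The decision rule described above can be sketched in Python as follows. This is only an illustration of the logic, assuming that the alleles visible in each chromatogram have already been called by eye or by a base-caller; the function name `call_allelic_expression` and the string encoding of alleles (for example "CT" for a double peak) are hypothetical and not part of the original analysis.

```python
def call_allelic_expression(gdna_alleles, cdna_alleles):
    """Classify expression at one SNP from the alleles seen in each trace.

    gdna_alleles, cdna_alleles: strings of called bases, e.g. "CT" or "C".
    """
    gdna = set(gdna_alleles.upper())
    cdna = set(cdna_alleles.upper())
    if not gdna or not cdna:
        return "no call"            # one of the traces could not be read
    if len(gdna) < 2:
        return "uninformative"      # homozygous gDNA cannot separate parental alleles
    if not cdna <= gdna:
        return "inconsistent"       # cDNA shows a base absent from gDNA
    if len(cdna) == 1:
        return "monoallelic"        # heterozygous gDNA, single allele in cDNA
    return "biallelic"              # both alleles detected in cDNA


if __name__ == "__main__":
    # Heterozygous C/T genomic DNA, single peak in cDNA.
    print(call_allelic_expression("CT", "C"))
```

Note that this rule identifies monoallelic expression only; assigning the expressed allele to the maternal or paternal chromosome would additionally require parental genotypes.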

Results

LINC24065 is a lincRNA

Here, we mapped a novel lncRNA, LINC24065, between MEG8 and MEG9 in the cattle DLK1–DIO3 imprinted domain. Using RACE and RT-PCR, we obtained four splice variants of LINC24065 (LINC24065-v1, LINC24065-v2, LINC24065-v3 and LINC24065-v4) from adult brain tissue, and these sequences were submitted to GenBank (accession nos. MG757567, MG757568, MG757569 and MG757570).

The amplified products of the four LINC24065 variants from v1 to v4 were 1687, 1766, 1058 and 1306 bp, respectively. Alignment of sequencing products to the assembled genomic sequence (Bos_taurus_UMD_3.1.1/bosTau8) revealed the presence of 18 exons in LINC24065, including two long exons, exon 1 (644 bp) and exon 12a (741 bp), and 16 shorter exons, the lengths of which were between 9 and 90 bp. We found two alternative promoters at the 5\(^{\prime }\) end of the LINC24065 transcripts. One promoter generated two transcripts (LINC24065-v1 and LINC24065-v2) with a longer exon 1, and the other promoter led to two other transcripts (LINC24065-v3 and LINC24065-v4) initiating at exon 2a. Compared with exon 2, a 27 bp region was missing at the 5\(^{\prime }\) end of exon 2a. The transcripts v1, v2 and v3 had the same 3\(^{\prime }\) end, but the 3\(^{\prime }\) end of transcript v4 had the long exon 12a of 741 bp and did not contain exons 14–17 (figure 1).

Fig. 1

Schematic representation of the DLK1–DIO3 imprinted domain on cattle chromosome 21 and genomic location of LINC24065. Arrows indicate the transcriptional directions. Maternally and paternally expressed genes are depicted in red and blue, respectively. The organization of the LINC24065 gene shows four alternatively spliced isoforms: LINC24065-v1, LINC24065-v2, LINC24065-v3 and LINC24065-v4. The heterozygous C/T SNP (rs135792171) in exon 6 of LINC24065, shown as a yellow bar, was used for allelic expression analysis. The exons of LINC24065 are enlarged and marked 1–18 with black rectangles. The lengths of exons 1 to 18 are, in order: 644 bp, 64 bp / 27 bp (2a), 42 bp, 36 bp / 9 bp (4a), 59 bp, 84 bp, 53 bp, 71 bp, 59 bp, 84 bp, 77 bp, 52 bp / 741 bp (12a), 72 bp, 68 bp, 84 bp, 76 bp, 90 bp and 51 bp.

Multiple small open reading frames (ORFs) were detected in the four LINC24065 transcripts using ORF Finder (https://www.ncbi.nlm.nih.gov/orffinder/), and the longest one encoded 90 amino acids. None of the predicted ORFs had a favourable Kozak context (ACCAUGG) that would initiate translation. When we compared the predicted peptides via BLASTp against the NCBI protein databases, we did not identify any homologous peptides. Taken together, these results suggest that the LINC24065 transcripts are lncRNAs.
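The kind of screen described above can be illustrated with a minimal Python sketch. It is not the algorithm used by NCBI ORF Finder: it scans only the three forward reading frames for ATG-to-stop ORFs, and it rates the Kozak context with a simplified, commonly used rule (a purine at position \(-\)3 and a G at position +4, as in ACCAUGG). The function names `find_orfs` and `kozak_context`, the `min_codons` threshold and the toy sequence are assumptions made for this example.

```python
STOP_CODONS = {"TAA", "TAG", "TGA"}


def find_orfs(seq, min_codons=10):
    """Return (start, end, n_codons) for ATG..stop ORFs in the forward frames.

    start/end are 0-based, end-exclusive and include the stop codon;
    n_codons counts amino acids (stop codon excluded).
    """
    seq = seq.upper().replace("U", "T")
    orfs = []
    for frame in range(3):
        start = None
        for i in range(frame, len(seq) - 2, 3):
            codon = seq[i:i + 3]
            if start is None and codon == "ATG":
                start = i                      # first ATG opens the ORF
            elif start is not None and codon in STOP_CODONS:
                n_codons = (i - start) // 3
                if n_codons >= min_codons:
                    orfs.append((start, i + 3, n_codons))
                start = None                   # look for the next ORF in this frame
    return orfs


def kozak_context(seq, atg_pos):
    """Rate the context of the ATG at atg_pos as strong, adequate or weak."""
    seq = seq.upper().replace("U", "T")
    minus3 = seq[atg_pos - 3] if atg_pos >= 3 else ""
    plus4 = seq[atg_pos + 3] if atg_pos + 3 < len(seq) else ""
    has_purine = minus3 in ("A", "G")
    has_g = plus4 == "G"
    if has_purine and has_g:
        return "strong"
    if has_purine or has_g:
        return "adequate"
    return "weak"


if __name__ == "__main__":
    toy = "ACCATGGCTGCTTAA"                    # toy sequence, not from LINC24065
    for start, end, n in find_orfs(toy, min_codons=1):
        print(start, end, n, kozak_context(toy, start))
```

A transcript whose longest ORF is short, lacks a favourable start context and has no database homologue is consistent with a noncoding classification, although none of these criteria is conclusive on its own.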

Tissue expression profile of LINC24065

RT-PCR was conducted to analyse the tissue-specific expression of LINC24065 transcripts in eight adult cattle tissues: heart, liver, spleen, lung, kidney, skeletal muscle, adipose and brain. Three primer pair combinations were used to evaluate the expression of LINC24065-v1/v2, LINC24065-v3 and LINC24065-v4, and their relative locations are shown in figure 1. LINC24065-v1 and LINC24065-v2 showed similar expression patterns: expression was strongest in the brain and kidney, weaker in heart, liver, spleen, lung and muscle, and undetectable in adipose tissue. LINC24065-v3 and LINC24065-v4 were expressed in all eight tissues that were tested. LINC24065-v3 was highly expressed in the brain, kidney and lung, while LINC24065-v4 was most strongly expressed in the brain and kidney (figure 2).

Fig. 2

Differential expression patterns of LINC24065 transcripts were detected in eight tissues of adult cattle. The lengths of the RT-PCR products are 1687 bp for LINC24065-v1, 1766 bp for LINC24065-v2, 1058 bp for LINC24065-v3, 1306 bp for LINC24065-v4 and 375 bp for GAPDH. RT, negative control; the cDNA template was omitted.

LINC24065 is monoallelically expressed in cattle

Since LINC24065 is located between two maternally expressed imprinted genes, MEG8 and MEG9, the allelic expression of LINC24065 was determined using a SNP-based sequencing method (figure 3). A 712 bp fragment was amplified from genomic DNA, and an informative SNP (rs135792171) was identified (figure 3b). The SNP was located in exon 6 and was present in all four splice variants. Three of the 32 cows that were screened were C/T heterozygotes, and the others were C/C homozygotes. We did not detect any T/T homozygotes among the individuals we analysed. We determined the SNP genotype in various tissues (heart, liver, spleen, lung, kidney, skeletal muscle, adipose and brain) of the three heterozygous individuals by amplifying the target region (248 bp) using RT-PCR (figure 3a). Comparison of the genotype at the SNP site (rs135792171) between genomic DNA and cDNA from the same individual indicated that LINC24065 was expressed in a monoallelic manner in the tissues that were tested (figure 3c).
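The genotype counts reported above (29 C/C, 3 C/T and 0 T/T) imply a low frequency of the T allele, which by itself could account for the absence of T/T homozygotes in a sample of 32 animals. The following Python sketch works through that arithmetic under the assumption of Hardy–Weinberg proportions, which the study did not test; the function name `hwe_expected` is hypothetical.

```python
def hwe_expected(n_cc, n_ct, n_tt):
    """Return allele frequencies and expected genotype counts under HWE."""
    n = n_cc + n_ct + n_tt
    q = (n_ct + 2 * n_tt) / (2 * n)   # frequency of the T allele
    p = 1 - q                         # frequency of the C allele
    return {
        "p": p,
        "q": q,
        "CC": n * p * p,
        "CT": 2 * n * p * q,
        "TT": n * q * q,
    }


if __name__ == "__main__":
    expected = hwe_expected(29, 3, 0)   # counts taken from the text
    for key, value in expected.items():
        print(f"{key}: {value:.4f}")
```

With these counts the T allele frequency is 3/64, so fewer than one T/T animal would be expected among 32 cows; the low number of heterozygotes also limits the number of informative individuals available for allelic expression analysis.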

Fig. 3

Allelic expression analysis of LINC24065. Arrows point to the SNP (C/T) site. (a) Expression of LINC24065 and GAPDH in eight tissues. The LINC24065 RT-PCR product was 248 bp and that of GAPDH was 375 bp. H\(_{2}\)O, negative control. (b) SNP site of LINC24065 in heterozygous cattle. The sequence chromatogram of genomic DNA indicated a heterozygous SNP site (C/T). (c) Sequence chromatograms of cDNAs obtained from eight tissues of heterozygous cattle. Compared with the heterozygous SNP site in genomic DNA, only one parental allele was detected in all eight tissues analysed, indicating that LINC24065 is monoallelically expressed.

Discussion

LncRNAs are abundantly transcribed from the mammalian genome (Djebali et al. 2012; Iyer et al. 2015), and are important regulators of tissue physiology and pathological processes (Meseure et al. 2015; Sanchez et al. 2018). LncRNAs from imprinted regions employ diverse molecular mechanisms to regulate transcription across the imprinted clusters (Kanduri 2016). LincRNAs are a class of lncRNAs transcribed from the intergenic region between two genes. In this study, a lincRNA, LINC24065, was identified in the cattle DLK1–DIO3 domain.

The DLK1–DIO3 domain, one of the largest imprinted clusters, is associated with developmental disorders, respiratory disease and multiple malignancies (Enfield et al. 2016; Enterina et al. 2017). Maternally expressed lncRNAs and miRNAs from this cluster are associated with stem cell pluripotency in mice (Moradi et al. 2017). Meg8 and Meg9 are two maternally expressed lncRNAs in the DLK1–DIO3 domain, and two imprinted lincRNA transcripts, AK050713 and B830012L14Rik, are derived from the intergenic region between mouse Meg8 and Meg9 (Hagan et al. 2009; Zhang et al. 2011). In this study, we identified a novel monoallelically expressed lincRNA, LINC24065, located between cattle MEG8 and MEG9. We did not observe sequence homology between LINC24065 and the two mouse lincRNAs AK050713 and B830012L14Rik. Further sequence analysis of the regions between MEG8 and MEG9 in human and mouse also did not reveal homologues of LINC24065. Although evolutionary conservation often indicates the functionality of a particular region, sequence conservation is dispensable for the function of lncRNAs (Deniz and Erman 2017). Like many lncRNAs (Ulitsky and Bartel 2013; Johnsson et al. 2014), LINC24065 lacked interspecies conservation.

The four isoforms of LINC24065 exhibited tissue-specific expression patterns, similar to LINC24061 (Zhang et al. 2017), another lincRNA between MEG8 and MEG9 that we previously identified. LINC24065 was mapped about 80 kb upstream of LINC24061 (figure 1). Two splice variants of LINC24061 (KU870638 and KU870639) also exhibited monoallelic expression in adult bovine tissues; LINC24061-v1 is expressed in heart, kidney and muscle, whereas LINC24061-v2 is expressed in all eight tissues that were analysed.

In conclusion, we mapped a novel lincRNA, LINC24065, with monoallelic expression to the intergenic region between bovine MEG8 and MEG9. LINC24065 had four splice variants with tissue-specific expression patterns in adult bovine tissues. Although the biological function of LINC24065 has not been elucidated, these results provide a foundation for further study of the role of lincRNAs in regulating genomic imprinting of the DLK1–DIO3 domain.