Introduction

In mammals, nearly 1% of protein-coding genes show parent-of-origin monoallelic expression due to the different epigenetic modifications inherited by the zygote from the sperm and the egg; this parent-dependent epigenetic marking is termed genomic imprinting (Barlow and Bartolomei 2014). To date, more than 300 imprinted genes have been identified in mouse and human (http://igc.otago.ac.nz), and they often occur in clusters. The Dlk1-Dio3 domain is a large imprinted cluster located on mouse chromosome 12q and human chromosome 14q32. This domain is a 1 Mb region flanked by the paternally expressed protein-coding genes Dlk1 (delta-like homolog 1) and Dio3 (the type III iodothyronine deiodinase), with an interior containing maternally expressed long noncoding RNAs (lncRNAs): Meg3/Gtl2, Meg8/Rain, Meg9/Mirg, Peg11as, Irm, and numerous microRNAs and small nucleolar RNAs (snoRNAs) (Charlier et al. 2001) (Fig. 1). The expression of the maternally expressed noncoding RNAs in the Dlk1-Dio3 locus has been associated with the developmental potential of induced pluripotent stem cells (iPSCs) (Liu et al. 2010; Stadtfeld et al. 2010).

Fig. 1

Genomic organization of the cattle Dlk1-Dio3 imprinted domain and genomic location of CB457990 and CB460301. CB457990 and CB460301 are located between Meg8 and Meg9.

In recent years, the international ENCODE (Encyclopedia of DNA Elements) and GENCODE projects have revealed that over 98% of the human genome does not encode protein sequences, whereas at least 80% of genomic DNA is transcribed as noncoding RNAs (Harrow et al. 2012; Pennisi 2012). LncRNAs are non-protein-coding transcripts greater than 200 bp in length, or molecules longer than 2 kb with a coding potential of less than 100 amino acids. LncRNAs can be classified into antisense lncRNAs, intergenic lncRNAs, intronic lncRNAs, and enhancer lncRNAs. By chromatin signature analysis, over a thousand highly conserved large intergenic noncoding RNAs (lincRNAs) have been identified in the mouse (Guttman et al. 2009). Using public expressed sequence tag (EST) data from many developmental stages and tissues of cattle, 23,060 cattle ncRNAs were predicted, with the majority (57%) of these ncRNAs being intergenic transcripts (Qu and Adelson 2012).

Recently, two novel intergenic long noncoding RNAs (AK044800 and B830012L14Rik) were identified between the Meg8 and Meg9 genes in the mouse Dlk1-Dio3 imprinted region (Han et al. 2013; Zhang et al. 2011). Previously, we analyzed the gene structure and alternative splicing patterns of the cattle Meg8 and Meg9 genes (Hou et al. 2011; Zhang et al. 2014). Searching for candidate imprinted genes in the cattle Dlk1-Dio3 domain in the NCBI database and UCSC Genome Browser revealed two EST sequences, CB457990 and CB460301, located 115 bp apart in the region between Meg8 and Meg9 (Fig. 1). The two EST sequences were determined to be part of the same transcript, which was named LINC24061 according to the GENCODE annotated bibliography. The aim of the present work was first to obtain the full-length cDNA sequence of LINC24061, and then to analyze the expression of its alternative splicing variants and determine its imprinting status.

Materials and methods

Animals and tissues

Samples from heart, liver, spleen, lung, kidney, skeletal muscle, subcutaneous fat, and brain from 32 dairy cattle (Holstein) were collected from a local abattoir and frozen at −70 °C for further analysis. Protocols involving the use of animals were approved by the Agriculture Research Animal Care Committee of the Agricultural University of Hebei.

RNA isolation and cDNA synthesis

Total RNA was extracted from tissue samples of heterozygous individuals stored at −70 °C using Trizol Reagent (Invitrogen, Carlsbad, CA, USA) according to the manufacturer’s instructions. First-strand cDNA was synthesized in a 20 μL reaction volume containing 1 μL total RNA, 1 μL oligo (dT)18 (0.5 μg/μL, Sangon, Shanghai, China), 1 μL dNTP (10 mM each), 4 μL 5 × M-MLV buffer, 0.5 μL (200 units) M-MLV reverse transcriptase (Promega, Madison, WI, USA), 0.5 μL RNase inhibitor (40 units/μL) and 12 μL RNase-free H2O, with incubation at 42 °C for 60 min.

Cloning of LINC24061

A 1050-bp LINC24061 cDNA sequence was first obtained by RT-PCR with primers LINC24061-F1 (5′-TCTAAATACTTGCCCGAG-3′) and LINC24061-R1 (5′-AGAGTTACAGAACCCGTG-3′) designed according to the EST sequences CB457990 and CB460301, respectively. RT-PCR was performed in a 25 μL volume containing 1 μL of first-strand cDNA (50 ng/μL), 12.5 μL 2 × Es Taq MasterMix (CWBIO, Beijing, China), 0.5 μL of forward and 0.5 μL of reverse primers (10 mM), and 10.5 μL of ddH2O, with the following temperature cycle: 94 °C for 5 min, 30 cycles of 94 °C for 30 s, 55 °C for 30 s, 72 °C for 45 s, and a final extension at 72 °C for 10 min. The PCR product was separated on a 1.5% agarose gel, purified with an E.Z.N.A.GelR Extraction Kit (Omega, Monsanto, USA), and sequenced (BGI, Beijing, China).

The 5′ RACE (rapid amplification of cDNA ends) and 3′ RACE reactions were performed to obtain the 5′ and 3′ ends of the LINC24061 cDNA sequence, respectively, using the SMARTer™ RACE cDNA Amplification Kit. The RACE reactions were performed by nested PCR according to the manufacturer’s protocol. Gene-specific primers were as follows: 5′ GSP1 (5′-ATGGAGCAGCATCTACAAAGTTCCGAGG-3′) and 5′ GSP2 (5′-GGTAGGACTTCCAGGTCAACAGATACGATG-3′), 3′ GSP1 (5′-TGCTTCTGGGGATTCTGGCTTTTCTAA-3′) and 3′ GSP2 (5′-CATGCTGGCAAAGTCACGGTGGGGGAAC-3′). The amplified products were purified using an E.Z.N.A.GelR Extraction Kit (Omega, USA), cloned into PMD18-T (TaKaRa, Shanghai, China), and sequenced (BGI, China).

RT-PCR

RT-PCR was performed to detect the expression patterns of the two transcripts of LINC24061 in different tissues, including heart, liver, spleen, lung, kidney, skeletal muscle, subcutaneous fat, and brain. The two forward primers and one reverse primer were as follows: LINC24061-F2 (5′-TGGCATCCGTGTCACT-3′) on exon 1, LINC24061-F3 (5′-CTCCAGGGAAGACACACT-3′) on exon 2 and LINC24061-R2 (5′-GCCCTGATGGTTACTTCTGGG-3′). An amplified fragment of the GAPDH gene (GenBank accession no. BTU85042) was used as an internal control with primers GAPDH-F (5′-GCACAGTCAAGGCAGAGAAC-3′) and GAPDH-R (5′-GGTGGCAGTGATGGCGTGGA-3′).

Strand-specific RT-PCR (ssRT-PCR) of LINC24061

To determine whether LINC24061 is transcribed in the sense or the antisense orientation, strand-specific reverse transcription was performed on RNA from eight tissues (heart, liver, spleen, lung, kidney, skeletal muscle, subcutaneous fat, and brain) using primer LINC24061-F4 (5′-AAGTAACCATCAGGGCT-3′) or LINC24061-R1, which detect antisense and sense strand transcripts, respectively. A 721 bp PCR product was then amplified using LINC24061-F4 and LINC24061-R3 (5′-GCATCTACAAAGTCCGA-3′) in a 25 μL volume.
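The primer logic behind strand-specific reverse transcription can be illustrated with a short sketch. A primer can only prime cDNA synthesis on an RNA to which it is complementary, so a primer whose sequence is identical to the sense transcript reports antisense RNA, and a primer that is the reverse complement of the sense transcript reports sense RNA. The template below (`TOY_SENSE`) is a hypothetical construct assembled from the two published primer sequences and an arbitrary filler; it is not the LINC24061 sequence, and the function names are illustrative only.

```python
# Minimal sketch of strand assignment for a strand-specific RT primer.
# TOY_SENSE is a hypothetical template, NOT the real LINC24061 sequence.

COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq):
    """Return the reverse complement of a DNA sequence (A, C, G, T only)."""
    return seq.upper().translate(COMPLEMENT)[::-1]


def primed_strand(primer, sense_seq):
    """Report which transcript strand a primer would reverse transcribe.

    sense_seq is the sense transcript written as DNA (5' to 3').
    A primer identical to part of the sense strand anneals only to
    antisense RNA; a primer whose reverse complement occurs in the
    sense strand anneals only to sense RNA.
    """
    primer = primer.upper()
    sense_seq = sense_seq.upper()
    if primer in sense_seq:
        return "antisense"
    if reverse_complement(primer) in sense_seq:
        return "sense"
    return None  # primer does not match this template


F4 = "AAGTAACCATCAGGGCT"   # LINC24061-F4, from the Methods
R1 = "AGAGTTACAGAACCCGTG"  # LINC24061-R1, from the Methods

# Hypothetical sense template: F4 site, arbitrary filler, R1 binding site.
TOY_SENSE = F4 + "ACGTACGTAC" + reverse_complement(R1)

for name, primer in (("LINC24061-F4", F4), ("LINC24061-R1", R1)):
    print(name, "detects", primed_strand(primer, TOY_SENSE), "transcripts")
```

On this toy template the forward primer is assigned to antisense transcripts and the reverse primer to sense transcripts, which matches the assignment described in the text.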

DNA extraction and SNP identification

Genomic DNA was extracted from liver tissue of 32 adult Holstein cattle with a DNA Extraction kit (Sangon, Shanghai, China) according to the manufacturer’s instructions. The SNP site was identified by sequencing the PCR products directly. A 1050 bp fragment was amplified using the primers LINC24061-F1 and LINC24061-R1, which are located on the exon 2 sequence shared by both transcripts. PCR was performed in a 25 μL volume containing: 1 μL genomic DNA (100 ng/μL), 0.5 μL each of forward and reverse primers (10 μM/μL), 2 μL dNTP (2.5 nm/μL), 2.5 μL 10 × Taq Buffer, 0.3 μL Taq DNA polymerase (5 U/μL) and 18.2 μL ddH2O. The PCR procedure was as follows: 94 °C for 5 min, 35 cycles of 94 °C for 30 s, 55 °C for 30 s, 72 °C for 45 s, and a final extension at 72 °C for 10 min. The amplified products were purified using an E.Z.N.A.GelR Extraction Kit (Omega, Monsanto, USA) and sequenced (BGI, China). The SNP site was identified by observing the double peaks in the sequencing chromatogram.
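As a quick consistency check on the reaction set-ups described in the Methods, the component volumes can be summed and compared with the stated reaction volume. The sketch below transcribes the volumes from the text; the dictionary layout and function name are illustrative, and the genomic PCR entry assumes that 0.5 μL was used for each of the two primers.

```python
import math

# Component volumes in microlitres, transcribed from the Methods.
# Assumption: in the genomic PCR, 0.5 uL was used for EACH primer.
REACTIONS = {
    "first-strand cDNA synthesis": (20.0, {
        "total RNA": 1.0,
        "oligo(dT)18": 1.0,
        "dNTP": 1.0,
        "5x M-MLV buffer": 4.0,
        "M-MLV reverse transcriptase": 0.5,
        "RNase inhibitor": 0.5,
        "RNase-free H2O": 12.0,
    }),
    "RT-PCR": (25.0, {
        "first-strand cDNA": 1.0,
        "2x Es Taq MasterMix": 12.5,
        "forward primer": 0.5,
        "reverse primer": 0.5,
        "ddH2O": 10.5,
    }),
    "genomic PCR": (25.0, {
        "genomic DNA": 1.0,
        "forward primer": 0.5,
        "reverse primer": 0.5,
        "dNTP": 2.0,
        "10x Taq buffer": 2.5,
        "Taq DNA polymerase": 0.3,
        "ddH2O": 18.2,
    }),
}


def check_mix(stated_total, components):
    """Return (summed volume, True if it matches the stated total)."""
    total = sum(components.values())
    return total, math.isclose(total, stated_total, abs_tol=1e-6)


for name, (stated, components) in REACTIONS.items():
    total, ok = check_mix(stated, components)
    status = "consistent" if ok else "MISMATCH"
    print(f"{name}: {total:.1f} uL of {stated:.1f} uL stated ({status})")
```

Under these assumptions all three mixes add up to their stated totals (20, 25 and 25 μL).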

Allele-specific expression of LINC24061

Animals heterozygous at the identified SNP site were used to analyze the allele-specific expression of LINC24061. Total RNA prepared from the tissues of heterozygous animals was reverse transcribed into cDNA. RT-PCR was then performed using the primers LINC24061-F1 and LINC24061-R1, and the reaction volume and cycling conditions were the same as those used for SNP identification. The amplified products were purified and sequenced directly.

Results

Cloning and structure analysis of LINC24061 gene

RT-PCR and RACE were used to determine the full-length cDNA sequences of LINC24061. A 1050 bp product containing partial sequences of both CB457990 and CB460301 was first obtained by RT-PCR using primers LINC24061-F1 and LINC24061-R1, indicating that the two EST sequences belong to the same transcript. The RACE sequencing results indicated that two transcript variants differ at the 5′ end, whereas a single 3′ end was found. The two splice variants of LINC24061 have been submitted to GenBank (KU870638 and KU870639). The LINC24061 molecular structure and two splice variants are shown in Fig. 2a. LINC24061-v1 has 2 exons, whereas LINC24061-v2 lacks exon 1 and has only a shorter exon 2 with 181 bp missing at the 5′ end. Open reading frames (ORFs) were detected using the Open Reading Frame Finder (www.ncbi.nlm.nih.gov/gorf.html), and multiple small ORFs were found in the two transcripts of LINC24061. However, none of the ATG start codons were consistent with the Kozak consensus sequence (ACCAUGG), suggesting that LINC24061 is a noncoding RNA gene (Fig. 2b). The results of the ssRT-PCR assay indicated that LINC24061 is transcribed only in the sense orientation in the cattle tissues (Fig. 2c).
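The Kozak check described above can be sketched as a simple scan of each ATG and its flanking bases. The example below applies a strict reading of the consensus quoted in the text (ACCAUGG, written as DNA), requiring an exact match from position −3 to +4; this is an illustrative simplification rather than the procedure used by ORF Finder, and the test sequence is a made-up string, not LINC24061.

```python
# Minimal sketch: list every ATG with its -3..+4 context and flag exact
# matches to the consensus quoted in the text (ACCAUGG, written as DNA).
# Assumption: an exact 7-nt match is required; looser definitions exist.

KOZAK_DNA = "ACCATGG"


def atg_contexts(seq):
    """Return (index, 7-nt context) for each ATG that has full flanks."""
    seq = seq.upper().replace("U", "T")
    contexts = []
    start = seq.find("ATG")
    while start != -1:
        # Need 3 nt upstream and 1 nt downstream of the ATG.
        if start >= 3 and start + 4 <= len(seq):
            contexts.append((start, seq[start - 3:start + 4]))
        start = seq.find("ATG", start + 1)
    return contexts


def kozak_matches(seq):
    """Return only the ATG contexts that match the consensus exactly."""
    return [(pos, ctx) for pos, ctx in atg_contexts(seq) if ctx == KOZAK_DNA]


# Hypothetical toy sequence with one weak and one consensus start site.
toy = "TTTATGCCCACCATGGAAA"
print(atg_contexts(toy))
print(kozak_matches(toy))
```

A transcript for which `kozak_matches` returns an empty list has no ATG in the quoted consensus context, which is the situation reported for both LINC24061 variants.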

Fig. 2

a Gene structure and splice variants of the cattle LINC24061. Exons are indicated by white boxes and introns by straight lines between exons. Boxes with different patterns represent alternative exons. The red arrow indicates that LINC24061 is transcribed in the sense orientation, and black arrows indicate the orientation of the tissue-specific PCR primers. b Sequence analysis of LINC24061-v1 (a) and LINC24061-v2 (b) by ORF Finder in both sense and antisense orientations showed that they lack long (>100 amino acids) ORFs. c Strand-specific RT-PCR analysis in eight tissues; only sense transcripts were detected. A 721 bp fragment of LINC24061 was amplified from both cDNA and genomic DNA. Marker (M), Heart (H), Liver (Li), Spleen (S), Lung (Lu), Kidney (K), Muscle (Mu), Fat (F), Brain (B), Genomic (G)

Expression profile of LINC24061

The expression patterns of the two splice variants of LINC24061 were determined by RT-PCR in eight tissues: heart, liver, spleen, lung, kidney, skeletal muscle, subcutaneous fat, and brain (Fig. 3). Two forward primers (LINC24061-F2 and LINC24061-F3) and one reverse primer (LINC24061-R2) were designed according to the two splice variants of the cattle LINC24061. The locations of the primers are shown in Fig. 2a. A 1123 bp fragment of LINC24061-v1 and a 598 bp fragment of LINC24061-v2 were obtained using RT-PCR. The LINC24061-v1 splice variant was expressed in only three tissues (heart, kidney, and muscle), whereas LINC24061-v2 was expressed in all eight tissues examined.

Fig. 3

Expression patterns of LINC24061-v1 and LINC24061-v2 in eight tissues analyzed by RT-PCR. The fragments amplified from cDNA for LINC24061-v1, LINC24061-v2 and GAPDH were 1123, 598 and 375 bp, respectively; the corresponding fragments amplified from genomic DNA were 1945, 598 and 873 bp. Marker (M), Heart (H), Liver (Li), Spleen (S), Lung (Lu), Kidney (K), Muscle (Mu), Fat (F), Brain (B), Genomic DNA (G)
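The amplicon sizes in the Fig. 3 legend allow a simple piece of arithmetic: when the same primer pair is used on cDNA and genomic DNA, the size difference corresponds to the intronic sequence lying between the primers. The sketch below performs that subtraction for the three primer pairs; the names are illustrative, and the interpretation assumes that the cDNA and genomic products differ only by spliced-out sequence.

```python
# Amplicon sizes in bp, as reported in the Fig. 3 legend.
AMPLICONS = {
    "LINC24061-v1": {"cdna": 1123, "gdna": 1945},
    "LINC24061-v2": {"cdna": 598, "gdna": 598},
    "GAPDH": {"cdna": 375, "gdna": 873},
}


def spanned_intron_bp(cdna_bp, gdna_bp):
    """Intronic sequence between the primers, inferred from product sizes.

    Assumes both products come from the same primer pair and differ
    only by sequence removed during splicing.
    """
    if gdna_bp < cdna_bp:
        raise ValueError("genomic product cannot be shorter than cDNA product")
    return gdna_bp - cdna_bp


for name, sizes in AMPLICONS.items():
    diff = spanned_intron_bp(sizes["cdna"], sizes["gdna"])
    print(f"{name}: {diff} bp of intronic sequence between primers")
```

With the reported sizes this gives 822 bp for the LINC24061-v1 primer pair, 0 bp for the LINC24061-v2 pair and 498 bp for GAPDH. A difference of 0 bp means that, for the LINC24061-v2 primer pair, products from cDNA and from any carried-over genomic DNA cannot be told apart by size alone.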

Identification of SNP

The SNP site was identified by direct sequencing of the PCR products. A 1050 bp fragment of LINC24061 was amplified from genomic DNA using the primers LINC24061-F1 and LINC24061-R1 (Fig. 4b). An A/C SNP was detected at nucleotide 1512 of LINC24061-v1 (accession number: KU870638). The sequencing results for the three genotypes, heterozygous (A/C) and homozygous (A/A and C/C), are shown in Fig. 4b. Among the 32 cattle, six individuals had the A/C heterozygous genotype, three had the A/A homozygous genotype, and 23 had the C/C homozygous genotype.
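The genotype counts above translate directly into allele frequencies and into the fraction of animals that are informative for the imprinting assay. The following sketch shows the calculation; the function names are illustrative and no population-genetic test is implied.

```python
def allele_frequencies(n_aa, n_ac, n_cc):
    """Allele frequencies at a biallelic A/C SNP from genotype counts."""
    n_alleles = 2 * (n_aa + n_ac + n_cc)
    if n_alleles == 0:
        raise ValueError("no genotyped animals")
    return {
        "A": (2 * n_aa + n_ac) / n_alleles,
        "C": (2 * n_cc + n_ac) / n_alleles,
    }


def heterozygote_fraction(n_aa, n_ac, n_cc):
    """Fraction of animals that are heterozygous (informative)."""
    return n_ac / (n_aa + n_ac + n_cc)


# Genotype counts reported for the 32 Holstein cattle.
counts = {"n_aa": 3, "n_ac": 6, "n_cc": 23}
print(allele_frequencies(**counts))
print(heterozygote_fraction(**counts))
```

For the reported counts, 12 of the 64 alleles are A and 52 are C, and six of the 32 animals are heterozygous and therefore usable for allele-specific expression analysis.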

Fig. 4

Allele-specific expression analysis of LINC24061 by direct sequencing of RT-PCR products. a Relative expression of LINC24061 and GAPDH in eight tissues by RT-PCR. b Sequence chromatograms of gDNA obtained from heterozygous cattle with A/C and from homozygous cattle with C/C and with A/A. The arrow points to the c. 1512 A > C SNP. c Representative sequence chromatograms of cDNA obtained from eight tissues of heterozygous cattle. Arrows point to the c. 1512 A > C SNP. Compared with the sequence chromatograms of gDNA at the c. 1512 A > C SNP, only one parental allele (C) was expressed. Marker (M), Heart (H), Liver (Li), Spleen (S), Lung (Lu), Kidney (K), Muscle (Mu), Fat (F), Brain (B), Genomic (G)

Allele-specific expression analysis of LINC24061 in the cattle

Allele-specific expression of LINC24061 was investigated by comparing the base at the heterozygous site in the sequencing chromatograms of the genomic DNA PCR product and the RT-PCR product from the same heterozygous animal. The sequencing results of RT-PCR products obtained from heart, liver, spleen, lung, kidney, skeletal muscle, subcutaneous fat, and brain tissues revealed that only one parental allele (C) was expressed at the A/C SNP locus (Fig. 4c), suggesting that LINC24061 is imprinted in cattle.
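The comparison between genomic DNA and cDNA chromatograms was made by inspection, but the underlying reasoning can be written down as a small decision rule. The sketch below is a hypothetical formalization: the peak heights are invented example numbers, and the two thresholds (`het_min`, `mono_max`) are arbitrary assumptions rather than values used in this study.

```python
def minor_fraction(peaks):
    """Share of the total signal carried by the weaker allele peak."""
    total = sum(peaks.values())
    if total <= 0:
        raise ValueError("peak heights must sum to a positive value")
    return min(peaks.values()) / total


def call_expression(gdna_peaks, cdna_peaks, het_min=0.25, mono_max=0.10):
    """Classify expression at a SNP from chromatogram peak heights.

    gdna_peaks and cdna_peaks map allele -> peak height at the SNP.
    het_min and mono_max are assumed, illustrative thresholds.
    """
    # The animal must be heterozygous in gDNA to be informative.
    if minor_fraction(gdna_peaks) < het_min:
        return "uninformative"
    # A single dominant cDNA peak indicates monoallelic expression.
    if minor_fraction(cdna_peaks) <= mono_max:
        expressed = max(cdna_peaks, key=cdna_peaks.get)
        return f"monoallelic ({expressed})"
    return "biallelic"


# Invented peak heights for a heterozygous animal.
gdna = {"A": 480, "C": 520}
cdna = {"A": 15, "C": 900}
print(call_expression(gdna, cdna))
```

With these invented values the rule returns a monoallelic call for the C allele, mirroring the pattern reported for all eight tissues. Assigning the expressed allele to the maternal or paternal parent would additionally require parental genotypes, which the rule does not address.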

Discussion

Long noncoding RNAs (lncRNAs) are transcribed across most of the mammalian genome (Carninci et al. 2005), and the field of lncRNA research has advanced rapidly in recent years. Ultra-deep RNA sequencing analysis revealed that the number of expressed lncRNA genes is more than double that of protein-coding genes in human (Iyer et al. 2015). LncRNAs can be classified into intergenic lncRNAs, antisense lncRNAs, intronic lncRNAs, and enhancer lncRNAs based on their location. Most lncRNAs play a functional role in genomic imprinting, with the exception of intronic lncRNAs (Kanduri 2016). In this study, we identified LINC24061 as a novel imprinted lncRNA in the Dlk1-Dio3 domain.

Alternative splicing is a common phenomenon in eukaryotes, and can greatly increase the diversity of gene products without increasing the number of genes encoded by a genome (Black 2003). Approximately 95% of multi-exonic genes are alternatively spliced in humans (Pan et al. 2008). Previously, we identified alternative splice variants of three lncRNAs in the cattle Dlk1-Dio3 domain: six splice variants of the Gtl2 gene (Su et al. 2011), 12 splice variants of Meg8 (Hou et al. 2011), and three splice variants of Meg9 (Zhang et al. 2014). In our analysis of LINC24061, two splice variants (LINC24061-v1 and LINC24061-v2) were obtained using RT-PCR and RACE. Although multiple small ORFs were encoded by the two splice variants, neither contained a Kozak sequence for translation initiation, suggesting that LINC24061 functions as a noncoding RNA.

In this study, LINC24061 was identified as an ncRNA with monoallelic expression in tissues of adult cattle. LncRNAs often show tissue- and cell-specific expression (Dinger et al. 2008; Mercer et al. 2008; Ravasi et al. 2006). In mouse, the intergenic lncRNA B830012L14Rik was primarily expressed in brain, heart, lung, and liver at embryonic day 15.5 (Zhang et al. 2011). In this study, expression of the LINC24061-v1 variant was detected in three tissues (heart, kidney, and muscle), whereas LINC24061-v2 was detected in all eight tissues examined (heart, liver, spleen, lung, kidney, skeletal muscle, subcutaneous fat, and brain). These expression patterns were similar to those of Meg9, whose three splice variants show tissue-specific expression (Zhang et al. 2014).

The function of LINC24061 is unknown, but several lines of evidence provide hints to its potential role. The potential promoter of LINC24061 was predicted using PromoterScan software. Analysis of the potential promoter region revealed two DNA elements, Bov-A2 and L2a, in the 5 kb sequence upstream of LINC24061. Bov-A2 is a retroposon that is one of the most common short interspersed nuclear elements (SINEs) in the genomes of ruminants, and is generally present in the noncoding regions of several genes preferentially expressed during the cellular response to environmental stress or activation signals (Damiani et al. 2008). The DNA element L2a contains binding sites for two MAR (matrix-associated region)-interacting proteins (SATB1 and CDP), and functions as a silencer of the CD8 gene, which encodes an important T cell co-receptor in mouse (Yao et al. 2010). Therefore, LINC24061 may play a role in cell growth and differentiation.