Introduction

Goat milk has been recognized as beneficial for human nutrition (Finocchiaro and van Kaam 2004) and possesses medicinal properties (Beppu et al. 2006; Haenlein 2004; Pellerin 2001). The nutritional value of goat milk is mainly attributable to the fat and protein fractions secreted by the mammary gland. Previous studies have demonstrated that the biochemical pathways in the mammary gland responsible for the biosynthesis and secretion of lipids, lactose, and proteins are regulated by complex gene networks (Andres and Djonov 2010; Rudolph et al. 2007b). Improving goat milk quality by altering milk composition to maximally benefit human health is one of the major goals of goat farming. Despite the nutritional and biological importance of goat milk, breeding and genetic studies have been hindered by the lack of transcriptome-wide information on the mammary gland.

Next-generation sequencing technologies provide a unique opportunity for functional genomic research, e.g., gene expression profiling, genome annotation, and the detection and profiling of aberrant transcription (Mardis 2008; Metzker 2010). RNA sequencing (RNA-seq), one such approach, is a powerful tool for mapping and quantifying transcriptomes and for analyzing global gene expression (Hitzemann et al. 2013; Marston et al. 2013). Using this approach, the transcriptomes of human milk (Lemay et al. 2013b), cow milk (Wickramasinghe et al. 2011, 2012), and mammary glands (Finucane et al. 2008; Swanson et al. 2009) have been profiled, providing global insights into gene expression during lactation.

From a functional standpoint, however, the composition of fat, lactose, and fatty acids differs between cow and goat milk (Wilken-Jensen 1984), suggesting that there may be substantial and important differences in the regulatory gene networks of these species. Thus, data on cow or sheep mammary transcriptomes cannot simply be used as a reference for goat mammary metabolism. We believe that a first step toward improving the yield and composition of goat milk must be a thorough understanding of transcriptome-wide expression in the goat mammary gland. Although some work has been performed to study genetic information in goat mammary glands (Brenaut et al. 2012; Faucon et al. 2009; Genini et al. 2011; LeProvost et al. 1996; Leroux et al. 2003; Ollier et al. 2007, 2008), a comprehensive dataset of gene expression profiles is not available.

We used a de novo RNA-seq approach based on Illumina sequencing (Illumina, San Diego, CA, USA) to better understand the biochemical processes in the goat mammary gland. Mammary tissue from four physiological stages was pooled to maximize the number of genes detected. A total of 98,864 transcripts were assembled, representing 51,361 unique sequences (unigenes). Genes related to fatty acid metabolism and protein synthesis were identified.

Materials and methods

Goat mammary gland tissue collection and RNA extraction

All animal collection and use protocols were approved by the Animal Care and Use Committee of Northwest A&F University. Xinong Saanen dairy goats (Lin et al. 2013) from the experimental farm of Northwest A&F University, Shaanxi, China, were used in the study. Mammary gland tissue was collected from goats (n = 3 biological replicates per stage) at four stages: 270 days of age (non-pregnant), peak lactation (100 days postpartum; 3-year-old animals), late lactation (310 days postpartum; 3-year-old animals), and the non-lactating period (multiparous, non-lactating, and non-pregnant; 3-year-old animals). Tissue was collected by a professional veterinarian after slaughter. All tissue samples were obtained under sterile conditions, harvested within 20 min of slaughter, immediately frozen in liquid nitrogen, and kept frozen until RNA extraction.

Total RNA extraction, messenger RNA (mRNA) purification, and complementary DNA (cDNA) library construction were conducted by LC Sciences (Houston, TX, USA). In brief, total RNA was extracted from mammary tissue using the total RNA purification kit (LC Sciences, Houston, TX, USA) as instructed, treated with RNase-free DNase, and re-purified with the RNeasy kit (Qiagen, Valencia, CA, USA) following the manufacturer's protocol. Total RNA quantity and purity were assessed with a Bioanalyzer 2100 and the RNA 6000 Nano LabChip Kit (Agilent, CA, USA). Total RNA was extracted for use in both RNA-seq and quantitative real-time PCR (qPCR).

cDNA library construction and Illumina sequencing

Total RNA from the four stages was pooled in equal quantities (1 μg mRNA from each stage). mRNA was then isolated with poly-T oligo-attached magnetic beads (Invitrogen, USA). After purification, the mRNA was fragmented using divalent cations at elevated temperature. The cleaved RNA fragments were reverse transcribed to create the final cDNA library, which was purified with AMPure XP beads in accordance with the protocol of the mRNA-seq sample preparation kit (Illumina, San Diego, USA). The average insert size of the paired-end libraries was 200 bp (±20 bp). After library quality assessment, paired-end sequencing was performed on an Illumina HiSeq 2500 following the vendor's recommended protocol.

Data filtering and de novo assembly

Raw reads were first filtered by removing adapter sequences and potential contaminants, i.e., reads with more than 5 % unknown bases and low-quality sequences (<Q30) (Lohse et al. 2012). De novo assembly of the clean reads was carried out using Trinity (Grabherr et al. 2011). The final assembly was generated after removing contigs shorter than 200 bp. To exclude redundancy arising from alternative splicing, we first clustered all transcripts that matched the same reference gene, then removed redundant transcripts and retained only the longest transcript of each cluster to represent a unique gene (unigene). Only unigenes were considered in the subsequent bioinformatics analyses. Partial and complete open reading frames (ORFs) of unigenes were predicted using the TransDecoder script included in the Trinity package.
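The read filter and the longest-transcript selection described above can be sketched in Python. This is a minimal illustration, not the pipeline actually used: the text does not say how the <Q30 criterion was applied, so the sketch assumes a read is discarded when its mean Phred score is below 30, with Phred+33 quality encoding. The helper names (`passes_filter`, `longest_per_cluster`) are hypothetical, and the assignment of transcripts to clusters (shared reference gene) is assumed to be already available.

```python
from typing import Dict


def mean_phred(quality: str, offset: int = 33) -> float:
    """Mean Phred score of a FASTQ quality string (Phred+33 encoding assumed)."""
    return sum(ord(ch) - offset for ch in quality) / len(quality)


def passes_filter(seq: str, quality: str,
                  max_n_fraction: float = 0.05, min_mean_q: float = 30.0) -> bool:
    """Return True if a read survives the unknown-base and quality filters.

    Assumption: "<Q30" is interpreted as a mean Phred score below 30.
    """
    if not seq:
        return False  # empty reads carry no information
    n_fraction = seq.upper().count("N") / len(seq)
    if n_fraction > max_n_fraction:
        return False  # more than 5 % unknown bases
    return mean_phred(quality) >= min_mean_q


def longest_per_cluster(transcripts: Dict[str, str],
                        cluster_of: Dict[str, str],
                        min_length: int = 200) -> Dict[str, str]:
    """Pick the longest transcript (>= min_length bp) in each cluster.

    transcripts: transcript ID -> nucleotide sequence
    cluster_of:  transcript ID -> cluster ID (e.g., the matched reference gene)
    Returns cluster ID -> ID of the representative transcript (the unigene).
    """
    best: Dict[str, str] = {}
    for tid, seq in transcripts.items():
        if len(seq) < min_length:
            continue  # contigs shorter than 200 bp are dropped from the assembly
        cid = cluster_of[tid]
        if cid not in best or len(seq) > len(transcripts[best[cid]]):
            best[cid] = tid
    return best
```

Keeping a single isoform per cluster trades isoform-level detail for a non-redundant gene set, which is what the downstream annotation and RPKM steps operate on.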

Gene annotation and classification

Because no reference genome was used, we first constructed a reference dataset for gene annotation. Assembled unigenes were aligned against this dataset using BLAST with an E-value cutoff of 1E-10. The dataset comprised the annotated and reviewed protein sequence database (SWISS-PROT), the NCBI non-redundant protein sequence database (NR), the Pfam database, and the Kyoto Encyclopedia of Genes and Genomes (KEGG). Gene ontology (GO) categories (Balakrishnan et al. 2013) and KEGG (Kanehisa and Goto 2000) were used for functional annotation. KEGG and euKaryotic Orthologous Groups (KOG) classification (Moon et al. 2008) was performed using the Blast2GO pipeline with default parameters. GO terms were used to classify the functions of unigenes.
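Annotation with an E-value cutoff is usually done by parsing BLAST's tabular output. A minimal sketch, assuming hits were written in BLAST+ `-outfmt 6` (the 12 default columns) and that the best hit per unigene is the one with the highest bit score; the paper does not state its hit-selection rule, and `best_hits` is a hypothetical helper.

```python
import csv
import io
from typing import Dict, Tuple

# Default column order of BLAST+ tabular output (-outfmt 6)
OUTFMT6_COLUMNS = ["qseqid", "sseqid", "pident", "length", "mismatch", "gapopen",
                   "qstart", "qend", "sstart", "send", "evalue", "bitscore"]


def best_hits(tabular_text: str,
              evalue_cutoff: float = 1e-10) -> Dict[str, Tuple[str, float]]:
    """Map each query (unigene) to (subject, bitscore) of its best hit.

    Only hits with E-value <= evalue_cutoff are considered; "best" means the
    highest bit score (an assumption, not necessarily the authors' rule).
    """
    best: Dict[str, Tuple[str, float]] = {}
    for row in csv.reader(io.StringIO(tabular_text), delimiter="\t"):
        if not row:
            continue  # skip blank lines
        rec = dict(zip(OUTFMT6_COLUMNS, row))
        if float(rec["evalue"]) > evalue_cutoff:
            continue  # hit too weak to count as an annotation
        score = float(rec["bitscore"])
        query = rec["qseqid"]
        if query not in best or score > best[query][1]:
            best[query] = (rec["sseqid"], score)
    return best
```

The annotation rate for one database is then `len(best_hits(text)) / n_unigenes`.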

Abundance analysis of gene expression

Bowtie 0.12.8 was used to map reads for the abundance analysis of gene expression. Reads were mapped in single-end mode, and a read was allowed to align to multiple genes. Gene expression was measured as reads per kilobase of exon model per million mapped reads (RPKM).
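RPKM normalizes a gene's read count by both its length and the sequencing depth: RPKM = 10^9 × C / (N × L), where C is the number of reads mapped to the gene, L is the gene length in bp, and N is the total number of mapped reads. A minimal sketch of that formula, with a hypothetical function name. Because reads were allowed to align to multiple genes, summing per-gene counts can overstate N, so the sketch accepts N explicitly and only falls back to the count total when N is not given (an assumption, not necessarily what the authors' pipeline did).

```python
from typing import Dict, Optional


def rpkm(read_counts: Dict[str, int], lengths_bp: Dict[str, int],
         total_mapped_reads: Optional[int] = None) -> Dict[str, float]:
    """Reads per kilobase of transcript per million mapped reads.

    read_counts: gene -> number of reads mapped to it (C)
    lengths_bp:  gene -> transcript length in base pairs (L)
    total_mapped_reads: N; if None, the sum of read_counts is used, which
        overstates N when a multi-mapping read is counted for several genes.
    """
    if total_mapped_reads is not None:
        n = total_mapped_reads
    else:
        n = sum(read_counts.values())
    # 1e9 = 1e3 (per kilobase) * 1e6 (per million mapped reads)
    return {g: 1e9 * c / (lengths_bp[g] * n) for g, c in read_counts.items()}
```

For example, 500 reads on a 1000-bp unigene in a library of 1,000,000 mapped reads gives an RPKM of 500.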

Quantitative real-time PCR

The cDNA was synthesized from 0.5 μg of total RNA using the PrimeScript™ RT kit with gDNA Eraser (Takara, Japan), which removes genomic DNA contamination, according to the manufacturer's instructions. qPCR was performed with SYBR green (SYBR® Premix Ex Taq™ II, Perfect Real Time, Takara, Japan) according to the manufacturer's instructions. Primer sequences for fatty acid synthase (FASN), stearoyl-CoA desaturase (SCD), fatty acid binding protein 3 (FABP3), perilipin 2 (PLIN2), beta-casein (CSN2), alpha-S1-casein (CSN1S1), alpha-S2-casein (CSN1S2), kappa-casein (CSN3), beta-lactoglobulin (BLG), and alpha-lactalbumin (LALBA) were described previously (Faucon et al. 2009; Shi et al. 2013b). Ribosomal protein S15 (RPS15), ribosomal protein S9 (RPS9), and ubiquitously expressed transcript isoform 2 (UXT) were chosen as the three internal control genes, and their primer sequences were reported previously (Bionaz and Loor 2007). Three independent biological replicates per stage were analyzed by qPCR. Expression of the selected unigenes was normalized to the geometric mean of the three internal control genes (Bionaz and Loor 2008a), and normalized data are reported as n-fold change relative to the non-lactating mammary sample. Relative gene expression was calculated using the 2^−ΔΔCt method (Zhou et al. 2012). Differences in mRNA expression among stages were tested by one-way ANOVA, and significance was declared at P < 0.05.
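The 2^−ΔΔCt calculation with three reference genes can be written out explicitly. Because Ct is on a log2 scale, normalizing to the geometric mean of the reference genes' relative quantities is equivalent to subtracting the arithmetic mean of their Ct values. The sketch below assumes roughly 100 % amplification efficiency for all primers (as the 2^−ΔΔCt method does) and uses the mean ΔCt of the calibrator (non-lactating) replicates; the function names and Ct values are hypothetical.

```python
from statistics import mean
from typing import List, Tuple

# (Ct of target gene, [Ct of each reference gene, e.g., RPS15, RPS9, UXT])
Measurement = Tuple[float, List[float]]


def delta_ct(ct_target: float, ct_references: List[float]) -> float:
    """Target Ct minus the mean reference Ct.

    Averaging Ct values corresponds to taking the geometric mean of the
    reference genes' 2^-Ct quantities.
    """
    return ct_target - mean(ct_references)


def fold_changes(samples: List[Measurement],
                 calibrators: List[Measurement]) -> List[float]:
    """2^-ddCt fold change of each sample replicate relative to the calibrator.

    Assumption: the calibrator dCt is the mean over calibrator replicates.
    """
    calibrator_dct = mean(delta_ct(t, refs) for t, refs in calibrators)
    return [2 ** -(delta_ct(t, refs) - calibrator_dct) for t, refs in samples]
```

With a target amplifying ten cycles earlier in a lactating sample than in the calibrator (same reference Cts), ΔΔCt = −10 and the fold change is 2^10 = 1024, the order of magnitude reported for CSN1S1, CSN1S2, and LALBA. The one-way ANOVA on per-replicate values is not shown.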

Results

Illumina sequencing and de novo assembly

Illumina sequencing of dairy goat mammary tissue yielded a total of 251,176,146 clean reads with 100 % valid data. De novo assembly integrated overlapping sequences and eliminated redundancies. The final assembly comprised 98,864 transcripts (N50 length, 2538 bp; mean length, 1438 bp), which were reduced to 51,361 unigenes (N50 length, 2281 bp; mean length, 1219 bp). The sequencing information is presented in Table 1, and the length distribution of unigenes is shown in Fig. 1. All sequence reads have been deposited at NCBI (accession number SPR040710). All final assembled unigene sequences are provided as supporting information (Supporting files 1 and 2).
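N50 is the length L such that sequences of length ≥ L together contain at least half of all assembled bases. A minimal sketch of the calculation; the function name is hypothetical and the test lengths are illustrative, not the study's data.

```python
from typing import Iterable


def n50(lengths: Iterable[int]) -> int:
    """Return the N50 of a set of contig or transcript lengths (bp)."""
    ordered = sorted(lengths, reverse=True)  # longest sequences first
    total = sum(ordered)
    running = 0
    for length in ordered:
        running += length
        if 2 * running >= total:  # at least half of all bases are now covered
            return length
    raise ValueError("n50() requires at least one length")
```

Applied separately to the transcript lengths and to the unigene lengths, this calculation yields the two N50 values reported above.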

Table 1 Summary of assembled transcripts and unigenes of dairy goat mammary glands
Fig. 1

Length distribution of unigenes in base pairs. The numbers of unigenes are shown on top of each bar

Unigene annotation

To study sequence conservation among species, we used BLAST (Zhou et al. 2012) to align unigenes to the NCBI NR database with an E-value threshold of 1E−10. Of all unigenes, 30,853 (60.07 %) had BLAST hits in NR. In the NR comparison, the largest share of annotated dairy goat mammary gland unigenes matched sequences from Ovis aries (31.2 %), followed by Bos taurus (15.7 %), Bos grunniens (11.3 %), Homo sapiens (2.9 %), Orcinus orca (2.6 %), and Ceratotherium simum (2.4 %) (Fig. 2). A total of 29,270 (56.99 %), 28,580 (55.65 %), 27,279 (53.11 %), and 16,846 (32.8 %) unigenes were annotated to KEGG, SWISS-PROT, KOG, and Pfam, respectively (Fig. 3).
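The percentages above are proportions of the 51,361 unigenes, and a Venn diagram such as Fig. 3 is built from set operations on the unigene IDs annotated in each database. A minimal sketch, with a hypothetical function name and toy IDs:

```python
from typing import Dict, Set


def annotation_summary(total_unigenes: int,
                       annotated: Dict[str, Set[str]]) -> Dict[str, float]:
    """Percentage of unigenes annotated per database, in any database, and in all."""
    summary = {db: 100.0 * len(ids) / total_unigenes for db, ids in annotated.items()}
    id_sets = list(annotated.values())
    # The union is everything inside the Venn diagram; the intersection is its center
    summary["any"] = 100.0 * len(set().union(*id_sets)) / total_unigenes
    summary["all"] = 100.0 * len(set.intersection(*id_sets)) / total_unigenes
    return summary
```

For example, 100 × 30,853 / 51,361 ≈ 60.07 %, matching the NR annotation rate reported above.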

Fig. 2

Species distribution of the top BLAST hits

Fig. 3

Venn diagram of unigenes distributed in SWISS-PROT, NCBI non-redundant protein sequence database (NR), Pfam, and Kyoto Encyclopedia of Genes and Genomes (KEGG)

GO annotation

GO is widely used to standardize the representation of genes across species and provides a set of structured, controlled vocabularies for annotating genes, gene products, and sequences (Blake et al. 2013). In total, 28,690 unique transcripts were assigned to 60 level 2 GO terms, which were summarized under three main GO categories: cellular component, molecular function, and biological process (Fig. 4). Within the cellular component category, 20 level 2 terms were identified; extracellular region, mitochondrion, nucleoplasm, nucleolus, and endoplasmic reticulum membrane were the most abundant. Within the molecular function category, 20 level 2 terms were identified, of which metal ion binding was the most abundant. Within the biological process category, 20 level 2 terms were identified, and the number of genes did not differ markedly among terms. The two terms with the most genes were related to apoptosis and protein transport.

Fig. 4

Gene ontology (GO) classifications of goat mammary gland unigenes. Distribution of the GO categories assigned to the goat mammary gland transcriptome. Unigenes were classified into three categories: cellular components, molecular functions, and biological processes

KOG classification

Based on the KOG comparison, 31,657 unigenes of dairy goat mammary glands were annotated to 25 classifications (Fig. 5). General function prediction was the largest category, followed by signal transduction mechanisms, whereas cell motility was the smallest (Fig. 5). Among these classifications, 582 unigenes were annotated to the lipid transport and metabolism category and 1283 unigenes to the intracellular trafficking, secretion, and vesicular transport category (Fig. 5). Tables 2 and 3 list the top 15 genes in these two categories. As shown in Table 2, FABP3, FASN, and SCD were the three most highly expressed genes related to lipid transport and metabolism, with RPKM values of 300.133, 276.273, and 263.212, respectively. Translocon-associated protein subunit delta had the highest expression among genes related to intracellular trafficking, secretion, and vesicular transport in the goat mammary gland (RPKM = 240.674) (Table 3). The lipid droplet formation protein PLIN2, which is involved in lipid trafficking and secretion, had an RPKM of 170.
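The text does not say how the "top 15" genes in Tables 2 and 3 were chosen. Assuming they were ranked by RPKM within each KOG category, the selection reduces to a filter and a sort. A minimal sketch with a hypothetical function name; because a gene may fall into several KOG classes, each gene maps to a set of categories. The test uses the three RPKM values quoted above plus a made-up gene from another category.

```python
from typing import Dict, List, Set, Tuple


def top_genes(rpkm_by_gene: Dict[str, float],
              categories_of: Dict[str, Set[str]],
              category: str, n: int = 15) -> List[Tuple[str, float]]:
    """Return the n most highly expressed genes (by RPKM) in one category.

    Assumption: ranking by RPKM is how the paper's top-15 tables were built.
    """
    members = [(gene, value) for gene, value in rpkm_by_gene.items()
               if category in categories_of.get(gene, set())]
    # Highest RPKM first; slicing keeps at most n genes
    return sorted(members, key=lambda item: item[1], reverse=True)[:n]
```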

Fig. 5

euKaryotic Orthologous Groups (KOG) classification of goat mammary gland unigenes. Unigenes of dairy goat mammary glands were annotated to 25 classifications. The numbers of unigenes are shown on top of each bar

Table 2 Top 15 genes in lipid transport and metabolism category of KOG
Table 3 Top 15 genes in the intracellular trafficking, secretion, and vesicular transport category of KOG

KEGG analysis

Potential biological pathways represented in the goat mammary gland transcriptome were identified using KEGG (Kanehisa and Goto 2000; Ogata et al. 1999). A total of 29,270 unigenes were assigned to 128 KEGG pathways. These annotations provide a valuable resource for investigating specific processes, functions, and pathways in goat mammary gland research. Among these pathways, about 48 and 32 genes were annotated to the fatty acid metabolism and galactose metabolism pathways, respectively. Table 4 lists the top 15 genes in the KEGG fatty acid metabolism pathway. Aldehyde dehydrogenase 2 (ALDH2), acetyl-CoA acetyltransferase (ACAT), and long-chain acyl-CoA synthetase isoform 1 (ACSL1) were the three most highly expressed genes in this pathway.

Table 4 Top 15 genes related to the fatty acid metabolism signal pathway in KEGG

The top 15 genes related to lactose metabolism are presented in Table 5; lactose synthase B protein (LALBA), beta-1,4-galactosyltransferase I (B4GT1), and UTP-glucose-1-phosphate uridylyltransferase (UGPA) were the top three genes in this pathway. LALBA was the most highly expressed gene in this pathway (RPKM = 14,059.643). Unigenes corresponding to the major milk proteins CSN2, CSN3, CSN1S1, CSN1S2, LALBA, and BLG were the most abundant in the goat mammary gland transcriptome, together accounting for 42 % of all mRNA transcripts in the samples. CSN2 was the most highly expressed gene in the entire dataset (RPKM = 84,829), followed by CSN1S1, BLG, CSN3, CSN1S2, and LALBA.
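The text does not specify how the 42 % share was computed. One common approach, assumed here, is to divide the summed RPKM of the genes of interest by the summed RPKM of all unigenes. Because RPKM is proportional to reads per unit transcript length, this ratio estimates the fraction of transcript molecules (it equals the corresponding ratio of TPM values). A minimal sketch with a hypothetical function name and toy values:

```python
from typing import Dict, Iterable


def expression_share(rpkm_by_gene: Dict[str, float], genes: Iterable[str]) -> float:
    """Percentage of total expression (summed RPKM) contributed by `genes`.

    Assumption: share of transcripts is estimated from RPKM proportions.
    """
    total = sum(rpkm_by_gene.values())
    subset = sum(rpkm_by_gene[g] for g in set(genes))  # set() avoids double counting
    return 100.0 * subset / total
```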

Table 5 Top 15 genes related to the lactose metabolism signal pathway in KEGG

Expression of genes encoding enzymes of milk fat metabolism

Compared with the non-lactating period, mRNA expression of FABP3, FASN, and SCD was significantly higher at 100 and 310 days postpartum, and the fold change for FABP3 was far greater than those for FASN and SCD. Relative to the non-lactating period, expression of FASN and FABP3 at 270 days of age was lower, whereas that of SCD was higher. PLIN2 mRNA was almost undetectable at 270 days of age and was significantly higher at 100 and 310 days postpartum than in the non-lactating period (Fig. 6a).

Fig. 6

Expression changes in genes related to lipid metabolism, lipid droplet trafficking, protein biosynthesis, and lactose biosynthesis among different periods. a Genes related to lipid metabolism (FASN, FABP3, and SCD) and lipid droplet trafficking (PLIN2). b Genes related to lactose biosynthesis (LALBA) and protein synthesis (CSN1S1, CSN1S2, CSN2, CSN3, and BLG). Values are means ± SEM. qPCR data are expressed relative to the non-lactating period. Different superscript letters denote significant (P < 0.05) differences in expression among periods

Expression of genes encoding caseins, whey proteins, and enzymes in lactose synthesis

Genes encoding whey proteins (LALBA and BLG) and caseins (CSN1S1, CSN1S2, CSN2, and CSN3) were markedly upregulated during lactation compared with the non-lactating period. Relative to 100 days postpartum, expression of CSN1S1, CSN1S2, and LALBA was lower at 310 days postpartum, whereas BLG, CSN2, and CSN3 showed no significant change. Compared with the non-lactating period, the fold change for CSN1S1, CSN1S2, and LALBA during lactation exceeded 1000 (Fig. 6b).

Discussion

Milk and dairy products from ruminants play an important role in the modern human diet. Goat milk, rich in unsaturated fatty acids and in short- and medium-chain fatty acids, is an important source of these nutrients for humans. Studies have also shown that goat milk has medicinal properties for patients with intestinal disorders (Beppu et al. 2006; Haenlein 2004; Pellerin 2001). Profiling gene expression in the goat mammary gland will help clarify the molecular mechanisms of mammary metabolism and milk secretion.

Dong et al. were the first to publish genome information for the goat, including transcriptomes from 10 tissues (brain, kidney, lung, muscle, spleen, bladder, heart, liver, lymph, and ovary) (Dong et al. 2013). However, global information on the goat mammary gland transcriptome is still lacking, which hinders studies of dairy goat metabolism, particularly during lactation. In this study, we explored the transcriptome profile of the dairy goat mammary gland using de novo RNA sequencing and identified 51,361 unigenes.

Most bovine mammary gland transcriptomic studies sequenced pooled cDNA samples only from the peak lactation period or assembled transcriptomic data using sequencing reads from different lactation periods (Finucane et al. 2008). Although accurate transcriptomic data across different lactation stages can be obtained with this strategy, the entire mammary transcriptome can only be captured by sampling across lactation, involution, and the non-lactating period (Khokha and Werb 2011). We therefore sequenced pooled mRNA from four stages of the goat mammary gland to obtain transcriptome information covering growth and development as well as lactation. N50 length is commonly used to evaluate assemblies, and a higher value suggests a higher-quality assembly (Lander et al. 2001). The N50 length of our assembly is higher than those reported for other ruminant transcriptomes, and our assembly contains a higher overall number of unique transcripts (Wickramasinghe et al. 2011, 2012). The transcriptome in our study will provide a comprehensive reference dataset of gene expression for future goat mammary gland research.

Although most annotated unigenes (58.2 %) matched ruminant sequences (Fig. 2), many unigenes were still annotated to Homo sapiens, Orcinus orca, Ceratotherium simum, and other species. In total, 48 species were represented, including Mus musculus (1.6 %) and Capra hircus (0.6 %) (Supporting file 5). Several factors could account for this. One is a limitation of the assembly approach: we performed de novo sequencing and assembled transcripts without relying on any specific genome, which may have caused some sequences to be assembled less accurately. Annotation of unigenes to other species has been reported in previous work similar to ours (Rismani-Yazdi et al. 2011; Zhou et al. 2012). It is also possible that these are unigenes not previously described in ruminants; some of them are described only in the human transcriptome, which is better characterized. Whether or how these unigenes function in the goat mammary gland is currently unknown and should be addressed in future work.

Among mammalian species, the lactating mammary gland is the most lipogenic organ of the body (Chong et al. 2011). Lipids in milk supply the majority of the calories required for neonatal growth in many species (Oftedal 2000) and are a primary source of the essential fatty acids needed for neonatal membrane synthesis and for the synthesis of eicosanoids and other bioactive lipid signaling molecules (Koletzko and Rodriguez-Palmero 1999). The complex gene networks related to lipid metabolism in mammary glands have been investigated in rats (Rudolph et al. 2007a) and cattle (Bionaz and Loor 2008a; Kadegowda et al. 2009). In our data, 582 unigenes related to lipid metabolism were identified within the KOG categories. Among these genes, FABP3 had the highest expression, which agrees with previous data demonstrating that FABP3 is one of the most abundant FABP isoforms in bovine mammary tissue (Bionaz and Loor 2008b). In addition, fatty acid binding protein 4 (FABP4), another member of the FABP family, was also highly expressed in goat mammary gland tissue (Table 2). FABPs transport fatty acids from the plasma membrane to the sites of triacylglycerol or phospholipid synthesis (Calvo et al. 2004; McArthur et al. 1999). The high expression of FABP3 and FABP4 suggests that they are important for the uptake and intracellular trafficking of fatty acids in mammary gland tissue.

About half of the total fatty acids in bovine milk are synthesized within mammary cells (James 2012). FASN, the rate-limiting enzyme for de novo fatty acid biosynthesis during lactation (Bionaz and Loor 2008a), was also highly expressed in mammary glands (RPKM = 276.273). The high expression of FASN could partly explain why goat milk is rich in short- and medium-chain fatty acids (Morris et al. 2007; Silanikove et al. 2010). SCD is an endoplasmic-reticulum-bound enzyme that catalyzes the desaturation of saturated fatty acyl-CoAs (Miyazaki and Ntambi 2003) and can directly affect the unsaturated fatty acid composition of milk (Macciotta et al. 2008). SCD was the third most highly expressed gene in this category after FABP3 and FASN, which further supports our previous hypothesis that SCD plays an important role in goat mammary glands (Shi et al. 2013a). Beyond their high expression in the pooled sample, qPCR showed that all three genes were most highly expressed in mammary glands during lactation, which agrees with previous data (Faucon et al. 2009).

The high expression of FASN, PLIN2 (ADRP), FABP3, and SCD at 310 days postpartum was unexpected because genes related to lipid metabolism were previously reported to be downregulated in late lactation in cow mammary glands (Bionaz and Loor 2008a). However, these results appear to agree with data from pregnant goat mammary glands (Faucon et al. 2009). This may be due to cell proliferation during pregnancy (Norgaard et al. 2008), which requires the synthesis of many membrane lipids composed of long-chain fatty acids (Mellenberger et al. 2009). Lipid synthesis observed in pregnant goat mammary glands may be dedicated not to milk synthesis and secretion but rather to the cell proliferation associated with mammary gland development (Faucon et al. 2009). Sampling is another potential factor contributing to these results. Although the mammary gland was dissected by a professional veterinarian, the lobulo-alveolar epithelium develops within a stromal compartment composed of multiple cell types, e.g., epithelial cells, adipocytes, and stromal cells (Capuco et al. 2003; Faucon et al. 2009). Molecular cross-talk occurs among these compartments, and in such heterogeneous tissue it is difficult to attribute gene expression measurements to specific cell types. A recent study reported that RNA extracted from macaque milk fat and milk cell fractions represented RNA from mammary epithelial cells (MEC) more accurately than did RNA from whole mammary tissue (Lemay et al. 2013a). In future studies, isolating pure or enriched populations of goat MEC, e.g., by laser capture microdissection (Bevilacqua et al. 2010), would improve the quality of the transcript data and better represent the MEC that contribute to milk secretion.

During lactation, fatty acids are esterified into triacylglycerol within mammary cells. Among the genes related to lipid metabolism, genes involved in triacylglycerol biosynthesis were also highly expressed, e.g., glycerol-3-phosphate acyltransferase 1 (GPAT1), glycerol-3-phosphate acyltransferase 4 (GPAT4), and 1-acyl-sn-glycerol-3-phosphate acyltransferase alpha (AGPAT1).

Lactose, commonly referred to as milk sugar, is the major carbohydrate in milk. Ruminant milk is a rich source of lactose, but goat milk contains less lactose than cow milk (Jelert 1984), which makes goat milk better suited to people with intestinal disorders (Pieniak-Lendzion and Niedziolka 2004). As shown in Table 5, expression of genes related to lactose metabolism was generally far lower than that of genes related to lipid metabolism, with the exception of LALBA. LALBA, a classical marker of lactation, had the highest expression in this pathway, as also reported in bovine (Wickramasinghe et al. 2012) and human (Lemay et al. 2013b) mammary tissue.

In ruminant milk, more than 95 % of the proteins are encoded by six genes: CSN1S1, CSN2, CSN1S2, CSN3, LALBA, and BLG (Martin et al. 2002). Their transcripts often account for 70–80 % of all mRNA transcripts in the lactating ruminant mammary gland (Lemay et al. 2013a). In our data, these six genes related to protein and lactose synthesis were the six most highly expressed genes in the whole dataset and shared similar expression patterns across the physiological stages compared. Together, they accounted for 42 % of all mRNA transcripts, which is lower than previously reported for lactating cow mammary glands (Lemay et al. 2013a). qPCR confirmed that the expression of these six genes at 270 days of age and in the non-lactating period was far lower than that at 100 and 310 days postpartum (Fig. 6b).

The alpha-s1-casein content of goat milk ranges from 0 to 7 g/L, and this variation is associated with polymorphisms within CSN1S1 (Martin et al. 2002). Nevertheless, CSN1S1 mRNA expression was high in our study. The goat CSN1S1 polymorphism has also been reported to have a significant effect on milk protein yield and lipid composition (Ollier et al. 2008). During lactation, the four casein transcripts (CSN1S1, CSN1S2, CSN2, and CSN3) occur at similar abundance in goat, sheep, and cow mammary tissue (Bevilacqua et al. 2006; Colitti and Pulina 2010; Sigl et al. 2012), whereas the amounts of the corresponding proteins differ, suggesting that the four casein mRNAs are not translated with the same efficiency (Bevilacqua et al. 2006). Our qPCR data agree with the reported lower expression of these genes in late pregnancy (Bionaz and Loor 2011). The high expression of casein and whey protein genes at 310 days postpartum partly agrees with previous data demonstrating that casein-encoding genes are highly expressed in goat mammary glands at every stage of lactation (Bevilacqua et al. 2006). The mechanism behind their upregulation during pregnancy is unknown. We speculate that these genes might also play an important role in cell turnover in pregnant mammary glands; further experiments are needed to test this hypothesis.

Many milk components, such as proteins, lactose, and citrate, are secreted via vesicle trafficking (Lu et al. 2014; McManaman and Neville 2003). Thus, analyzing the membrane proteins involved in trafficking, transport, and secretion could further our understanding of how the secretion of milk components is controlled. In mammalian cells, secretory proteins are typically translocated across the endoplasmic reticulum (ER) membrane co-translationally by the ER protein translocon, which comprises the protein-conducting channel Sec61, additional complexes involved in nascent chain processing, and the translocon-associated protein (TRAP) complex (Koji et al. 2007). The mammalian TRAP complex consists of four transmembrane protein subunits in the ER: α, β, γ, and δ (Koji et al. 2007).

Intracellular trafficking, transport, and secretion of proteins, lactose, and lipid droplets are essential processes in mammary epithelial cells (Chong et al. 2011). In this study, we annotated 1283 unigenes related to these processes. Sec61α and all four TRAP subunits ranked among the top 20 genes by mRNA expression in this classification. Although a few papers have reported a potential role for TRAPs in the trafficking and secretion of ER-related vesicles in other mammalian cells (Fons et al. 2003; Mesbah et al. 2006; Yamaguchi et al. 2011), their role in secretory mammary cells is still unclear. The large number of unigenes annotated to the intracellular trafficking, transport, and secretion classification underscores the complexity and importance of these processes in mammary glands, and identifying these genes will help elucidate the mechanism of secretion in mammary glands.

The milk fat fraction contains PLIN2, a protein that coats lipid droplets and is involved in milk fat globule trafficking and secretion (Chong et al. 2011; Lemay et al. 2013a). In this study, PLIN2 expression in mammary glands at 270 days of age was almost undetectable. Relative to the non-lactating period, the fold change of PLIN2 mRNA during lactation was small, which agrees with results in dairy cows (Bionaz and Loor 2008a). PLIN2 transcripts are among the most abundant mRNAs in the lactating mouse mammary gland, at levels comparable to those of other secreted milk proteins such as the caseins (Rudolph et al. 2003). However, our results indicated that PLIN2 expression was far lower than that of the caseins. This difference could reflect inherent species differences, and differences in tissue sampling might also contribute. Indeed, compared with adult ruminant mammary glands, the mouse mammary gland contains a substantial amount of adipose tissue (Rudolph et al. 2003, 2007a). Because PLIN2 is highly induced during adipocyte differentiation (Brasaemle et al. 1997; Jiang and Serrero 1992), high PLIN2 expression in adipocytes within the mouse mammary gland likely raises the PLIN2 levels measured in whole mouse mammary tissue.

Taken together, our findings allowed us to develop an updated model of the regulation of milk fat, protein, and lactose synthesis in goat mammary tissue (Fig. 7). The model incorporates the most recent information available, including our data, on enzymes involved in milk fat, protein, and lactose synthesis. This is the first published study of global gene expression profiling in heterogeneous goat mammary gland tissue using next-generation sequencing. The 51,361 unique sequences represent a major transcriptome-level resource for the goat mammary gland and provide considerable insight into its underlying genetic mechanisms. Besides their active roles in lipid metabolism and protein synthesis during lactation, the higher expression of these genes at 310 days postpartum could also contribute to mammary cell turnover during pregnancy. Further experiments are needed to determine the functions of these genes in mammary gland development and lactation.

Fig. 7

Model of the networks of factors (proteins, hormones, and enzymes) potentially involved in the regulation of milk fat, protein, and lactose biosynthesis in goat mammary tissue. The network includes the transport of fatty acids, amino acids, and glucose; insulin and prolactin signaling pathways; and milk fat, milk protein, and lactose biosynthesis processes. A detailed description of the network, together with the gene symbols used in the model and their descriptions, is provided in Supporting file 6