Genome-wide detection of copy number variation in Chinese indigenous sheep using an ovine high-density 600 K SNP array

Ma, Qing; Liu, Xuexue; Pan, Jianfei; Ma, Lina; Ma, Yuehui; He, Xiaohong; Zhao, Qianjun; Pu, Yabin; Li, Yingkang; Jiang, Lin

doi:10.1038/s41598-017-00847-9

Genome-wide detection of copy number variation in Chinese indigenous sheep using an ovine high-density 600 K SNP array

Article
Open access
Published: 19 April 2017

Volume 7, article number 912, (2017)
Cite this article

Download PDF

You have full access to this open access article

Scientific Reports

Genome-wide detection of copy number variation in Chinese indigenous sheep using an ovine high-density 600 K SNP array

Download PDF

Qing Ma¹^na1,
Xuexue Liu^2,3^na1,
Jianfei Pan^2,3,
Lina Ma¹,
Yuehui Ma^2,3,
Xiaohong He^2,3,
Qianjun Zhao^2,3,
Yabin Pu^2,3,
Yingkang Li¹ &
…
Lin Jiang^2,3

2873 Accesses
45 Citations
1 Altmetric
Explore all metrics

Abstract

Copy number variants (CNVs) represent a form of genomic structural variation underlying phenotypic diversity. In this study, we used the Illumina Ovine SNP 600 K BeadChip array for genome-wide detection of CNVs in 48 Chinese Tan sheep. A total of 1,296 CNV regions (CNVRs), ranging from 1.2 kb to 2.3 Mb in length, were detected, representing approximately 4.7% of the entire ovine genome (Oar_v3.1). We combined our findings with five existing CNVR reports to generate a composite genome-wide dataset of 4,321 CNVRs, which revealed 556 (43%) novel CNVRs. Subsequently, ten novel CNVRs were randomly chosen for further quantitative real-time PCR (qPCR) confirmation, and eight were successfully validated. Gene functional enrichment revealed that these CNVRs cluster into Gene Ontology (GO) categories of homeobox and embryonic skeletal system morphogenesis. One CNVR overlapping with the homeobox transcription factor DLX3 and previously shown to be associated with curly hair in sheep was identified as the candidate CNV for the special curly fleece phenotype in Tan sheep. We constructed a Chinese indigenous sheep genomic CNV map based on the Illumina Ovine SNP 600 K BeadChip array, providing an important addition to published sheep CNVs, which will be helpful for future investigations of the genomic structural variations underlying traits of interest in sheep.

Genome-wide detection of CNVs in Chinese indigenous sheep with different types of tails using ovine high-density 600K SNP arrays

Article Open access 10 June 2016

Copy number variants in the sheep genome detected using multiple approaches

Article Open access 08 June 2016

Genome-wide detection of copy number variation in American mink using whole-genome sequencing

Article Open access 13 September 2022

Introduction

Copy number variations (CNVs), which represent a type of genomic structural variation, are DNA segments ranging in size from 1 kilobase (kb) to several megabases (Mbs) in which duplication or deletion events have occurred¹. Previous studies have shown that CNVs influencing genes or gene regions are associated with important phenotypic traits in livestock. For example, duplications of the KIT gene constitute the Dominant white locus in pigs^{2, 3}. In chicken, the pea-comb phenotype is caused by a CNV in intron 1 of the SOX5 gene⁴, and the late feathering locus comprises a partial duplication of the PRLR and SPEF2 genes⁵. A 1.6-kb deletion in TBX3 disrupts the asymmetric hair pigmentation that underlies Dun camouflage coloring in horses⁶. As one of the first domesticated animals, sheep (Ovis aries) have played an important role in human society⁷. Although increasing attention has been paid at identifying ovine CNVs^{8,9,10,11,12,13}, the total number of CNVs, particularly in Chinese indigenous sheep, has been limited. None of the previously described ovine CNVs except for the agouti duplication, which affects the ASIP locus in sheep and contributes to coat color variability, have been determined to have a direct effect on a sheep trait^{14, 15}. Therefore, more efforts are needed to identify CNVs, one of the most important types of genomic variation, in the sheep genome.

In recent years, advances in high-throughput genome scanning technologies, especially SNP arrays, DNA hybridization on array platforms, and next-generation sequencing methods, have allowed the identification of genome-wide structural variants that, due to their small size, are undetectable using microscopic techniques¹⁶. Compared to the other two technological platforms, SNP array is more cost-effective, allowing users to increase the number of samples on a limited budget and, as a result, achieve a more desirable performance in large-scale CNV detections, particularly at the genome-wide scale¹⁷. SNP arrays are also advantageous due to their high signal-to-noise ratios and the use of B-allele frequency, which facilitates the interpretation of results¹⁸. Furthermore, less sample per experiment is required for SNP arrays compared to that required for array-based comparative genomic hybridization (aCGH)¹⁹. However, the main bias of SNP arrays on CNV detection is the low SNP coverage of the genomic regions that often harbor CNVs²⁰. This bias may be minimized to some extent by increasing the genome coverage using commercial high-density SNP arrays. Therefore, the detection of CNVs by high-density SNP arrays has become in increasingly common and performed successfully in various species.

Tan sheep, one of the most important sheep breeds indigenous to China, are reared in Ningxia Province and are renowned for their production of high-quality pelts and long-term adaptation to the dry, cold and windy climate of northwestern China. The famous lamb pelts from Tan sheep are the result of long-term artificial selection and thus exhibit a lustrous white curly fleece that disappears gradually with age. Earlier studies have focused on various candidate genes to investigate the genetic mechanism of the distinct curly fleece trait (e.g., polymorphisms in the KRT1.2 (keratin 1.2)²¹ and KAP1.3 (keratin associated protein 1.3)²² genes, which are related to wool curvature). The latest skin transcriptome profiling of Tan sheep at ages 1 and 48 months determined that the keratin genes (including KRT25, KRT5, KRT7, and KRT14) and their associated pathways, which were previously shown to be associated with hair/fleece development and function, are expressed differentially between 1 and 48 months of age²³. However, the genetic components behind the white curly fleece phenomenon and the adaptation to harsh environments in Tan sheep are still unclear.

The primary aim of this study was to conduct a genome-wide survey of sheep CNV regions (CNVRs) using the Illumina Ovine SNP 600 K BeadChip array and to explore the roles of the potentially specific CNVs in Chinese Tan sheep. First, a reliable algorithm was used to assay 48 animals to obtain highly convincing CNVs. Second, the identified CNVRs were compared with five existing CNVR reports in sheep and qPCR was conducted to validate a subset of the novel CNVRs. Finally, the potentially breed-specific CNVRs were determined and the functional relevance of the CNVR-harboring genes was further analyzed using Gene Ontology enrichment and the QTL database (http://www.animalgenome.org/cgi-bin/QTLdb/OA/browse).

Results

Genome-wide detection of CNVs and CNVRs in Chinese Tan sheep

After employing a stringent CNV calling pipeline, we identified 5,190 autosomal CNVs (4,985 losses and 205 gains) in 48 Chinese Tan sheep (Additional file: Table S1). In this study, we defined “loss” and “gain” as deletions and insertions relative to the normal di-allele copy number in the ovine genome. The average length of the CNVs was 64.1 kb, and the median length was 47.1 kb. We found that approximately 48.5% of the CNVs range from 10 kb to 50 kb, 21.7% of CNVs range from 50 kb to 100 kb in size, with 12.2% being small fragment CNVs (<10 kb) (Fig. 1a). We found the number of CNVs in each individual to vary from 140 to 560. After aggregating the overlapping CNVs, we obtained 1,296 autosomal CNVRs representing 121.8 Mb (4.7%) of the entire ovine genome (Additional file: Table S2). The average CNVR size is 92.7 kb, ranging from 1.2 kb to 2.3 Mb. CNVRs ranging in size from 10 kb to 500 kb represent the majority (86.2%), whereas CNVRs larger than 1 Mb were rarely observed (0.8%) (Fig. 1b).

We generated a genome-wide map of CNVRs in Tan sheep (Fig. 2a). The chromosomal proportion covered by CNVRs varies between chromosomes, ranging from 1.5% on OAR26 to 18.2% on OAR24 (Table 1). The number of chromosomal CNVRs ranges from 15 to 135 on OAR23 and OAR3, and the chromosome length and the number of CNVRs show a strong positive linear relationship (Fig. 2b, R² = 0.87). The average distance between the CNVRs on each chromosome ranges from 467.5 kb on OAR6 to 544.9 kb on OAR11. The closest CNVRs are 4.5 kb apart on OAR3, whereas the largest inter-CNV distance is 34.4 Mb on OAR23.

Table 1 The distribution of CNVRs in the ovine autosomes.

Full size table

Remarkably, compared with the number of gain events, we observed almost ten times more loss events and a longer loss length, with the number of regions with CNV losses and gains being 1,173 and 119, respectively. Both types were found in only five regions. For each CNVR, the relative frequency of animals with an overlapping CNV ranged from 2.1% to 37.5% (Additional file: Table S2). In addition, 553 CNVRs were present in only one animal, whereas most CNVRs (57.3%) were identified in two or more samples.

Validation of the identified CNVRs

Based on five previous reports using various platforms to detect CNVs in different breeds of sheep, a total of 4,321 CNVRs (Additional file: Table S3) were obtained^{9,10,11,12,13}. When comparing the novel CNVRs to the previously identified CNVRs, we found 740 (57.1%) overlapping regions (Fig. 3a). Interestingly, the majority of the previously identified CNVRs detected in our study were large CNVRs (>100 kb) (Fig. 3a). The total length of the novel CNVRs corresponded to 20.7 Mb of the genomic area. There were 340, 340, 10, 15, and 187 CNVRs that overlapped with those described by Zhu et al.⁹, Jenkins et al.¹³, Ma et al.¹¹, Hou et al.¹², and Liu et al.¹⁰, respectively (Table 2). To investigate the differences in distribution patterns between the five studies, we performed principle component analysis (PCA) based on the composite CNVR dataset (Fig. 3b). PCA showed that PC1 distinguished our study from others and that PC2 distinguished our study from Jenkins’ study; the other four studies clustered together (Fig. 3b). The hierarchical clustering results showed the same tendency (Additional file: Figure S1).

Table 2 Comparison of our study with five recent ovine CNV reports using various platforms.

Full size table

As many as 556 (42.9%) CNVRs identified in our study are novel (Fig. 3a). To verify the accuracy of our prediction of the novel CNVRs, quantitative real-time PCR (qPCR) was used to validate ten randomly selected CNVRs from our study (i.e., CNVRs #16, 147, 169, 602, 642, 660, 808, 1130, 1160, and 1219) (Fig. 3c). These CNVRs represent three predicted statuses (losses, gains and both) for CNVRs, with frequencies ranging from low to high. Eight (80%) of the selected CNVRs were successfully confirmed. As shown in Fig. 3c, a normalized ratio (NR) of approximately 2 indicates a normal status (no CNV); an NR of approximately 1 or 0 indicates one or two copies deleted, and an NR of approximately 3 or above indicates one or more copies gained. Each CNVR had a reference sample, in which a CNV was not detected, and a reference gene, which also did not contain a CNV. The correlation between our CNV prediction and PCR validation was highly significant (P = 8.92E-12) (Additional file: Figure S2). Details of the primers and results are listed in the supporting information (Additional file: Table S4).

The reference genome used in this study, Oar_v3.1, has 21,585 gaps larger than 1 kb¹⁶. As stated earlier, we identified 5,190 CNVs, which accounted for 4.7% of the entire ovine genome. Among these, 463 (8.9%) overlapped with 1,255 gaps, indicating that the majority (90%) of the CNVs that we identified are located within the gap-free genomic region. Among the 463 CNVs that overlap with gaps, 140 (>30%) were confirmed by other studies. Of the eight randomly selected CNVs for qPCR validation, four overlapped with gaps and were verified by qPCR, suggesting that these gap-overlapping CNVs are also high-confidence CNVs.

Functional annotation of the identified CNVRs

The BioMart system in ENSEMBL (http://www.biomart.org/) was used to retrieve the gene content in the 1,296 CNVRs (1173 losses, 119 gains and 5 both). The 81% of our CNVRs overlap with 4,541 genomic genes (Additional file: Table S2), which are mostly located inside CNVRs (Fig. 4a). Among these genes, 89.9% are protein-coding genes, 3.5% microRNAs (miRNAs), 3.2% lncRNAs, and others pseudogenes, processed pseudogenes, small nucleolar (snoRNA) genes, and miscRNAs.

Interestingly, we found that these CNVR-harboring genes are significantly enriched for lipid metabolism (P = 0.001) and GTPase activity (P = 4.63E-07). Kyoto Encyclopedia of Genes and Genomes (KEGG) analysis showed that the Notch signaling pathway, the MAPK signaling pathway and the VEGF signaling pathway to be significantly enriched (Additional file: Table S5). Furthermore, we found genes associated with lipid metabolism, including PPARA (peroxisome proliferator-activated receptor-α), RXRA (retinoic X receptor A), KLF11 (Kruppel-like factor 11), PPP1CA (phosphoprotein phosphatase 1 catalytic subunit A), and PDGFA (platelet-derived growth factor alpha), that were previously reported to overlap with CNVs in fat-tailed sheep⁹. We found a total of 1,094 CNVRs (84.4%) that overlap with at least one base with QTLs from the sheep Animal QTLdb (Fig. 4b and Additional file: Table S5). Most cases proved to be CNVRs residing in a QTL region.

Potentially specific CNVRs in Tan sheep

Using the same 600 K ovine BeadChip platform, Zhu et al.⁹ identified 371, 370 and 66 CNVRs in large-tailed Han sheep, Altay and Tibetan sheep, respectively, which allowed us to identify the potentially breed-specific CNVRs for Tan sheep. According to the method described by Zhu et al.⁹, we filtered out CNVs smaller than 100 kb and obtained 303 CNVRs. The Venn diagram of the four different indigenous Chinese sheep breeds showed 46, 60, 130, and 5 potentially specific CNVRs in Tan sheep, Altay sheep, large-tailed Han sheep, and Tibetan sheep, respectively, which were sampled by both Zhu et al.⁹ and in our studies (Fig. 4c). GO analysis of genes overlapping with the 46 potentially specific CNVRs in Tan sheep showed significant enrichment of the homeobox (P = 9.6E-06), embryonic skeletal system morphogenesis (GO:0048704, P = 2.15E-08) and anterior/posterior pattern specification (GO:0009952, P = 2.5E-07) categories (Table 3). Interestingly, DLX3/DLX4 (homeobox protein DLX-3/DLX-4), two well-known genes associated with hair development, belong to the most significantly enriched GO category of homeobox. Moreover, HOXD12 (homeobox protein Hox-D12) and TBX6 (T-box transcription factor 6), as well as HOXB3 (homeobox protein Hox-B3), which is involved in embryonic skeletal system morphogenesis, overlap with the CNVRs potentially specific to Tan sheep.

Table 3 The enrichment of Go terms for the specific CNVRs in Tan sheep.

Full size table

Discussion

Using the high-density 600 K SNP arrays, this study identified more than 5,000 autosomal CNVs in 48 animals and grouped them into 1,296 CNVRs in the sheep genome, thereby providing a genome-wide view of CNVs in the Chinese Tan sheep genome. The number of identified CNVRs is larger than that reported by two existing ovine 50 K SNP array-based CNV studies, in which 111 and 238 CNVRs were detected (Table 2). This difference is not surprising, as the current study used a high-density SNP array containing more than 600 K probes, whereas Ma et al.¹¹ and Liu et al.¹⁰ used low-density SNP arrays with 50 K probes. Thus, more than 10 times better resolution was achieved by this study than the earlier two SNP array studies, which also resulted in the smaller mean/median length of CNVRs (Table 2). In fact, a significant improvement in CNVR detection in sheep has been previously achieved by increasing the aCGH probe-spacing resolution from 385 K to 2.1 million probes¹³. In addition to probe-spacing resolution in the genome, the applied filtering process can also affect CNV detection. This was reflected by the comparison of the current study to a previous 600 K SNP array-based CNV study, in which Zhu et al.⁹ filtered out the small CNVs (less than 100 kb in size) for CNV detection. As a result, Zhu et al.⁹ identified half as many CNVRs as the present study, and the median size of the CNVRs they detected was twice as large as that of our study. As expected, when we applied the same filtration process used by Zhu et al.⁹, a CNVR overlap of nearly 60% was reached, which is much higher compared to that found in the previous four studies (Table 2).

The 1,296 autosomal CNVRs reported in this study account for approximately 4.7% of the entire ovine genome. This estimate is similar to the range reported in horses¹⁸, pigs¹⁹, cattle²⁴, and humans²⁵ (0.8% to 5.0%). The estimate is still higher, even when compared with a comprehensive CGH array-based CNV study that covered 2.7% of the ovine genome¹³. This difference could be due to the underestimation of sheep CNVs by Jenkins et al.¹³, in which a cattle reference genome was used for the probe design and, thus, sheep CNVs in regions that were deleted in or of low homology with the reference genome were likely ignored. The difference could also be due to the overestimation by the CNV calling algorithm, PennCNV, in our study. The current SNP array-based detection of CNVs remains prone to false positives and shows low concordance between multiple calling algorithms²⁶. Although PennCNV software is widely used for Illumina SNP arrays²⁷, particularly for high-density SNP arrays⁹, a certain proportion of false positives exists in our findings. SNP arrays may also miss CNVs because their SNP chip coverage shows inherent bias against the genomic regions harboring CNVs²⁰. For example, segmental duplications (SDs), one of the catalysts and hotspots for CNV formation, are often affected by low probe density due to the difficulties of array design²⁸. Furthermore, common CNVs may cause SNPs to be rejected when the SNPs deviate from Mendelian inheritance and the Hardy-Weinberg equilibrium²⁰.

Of the 1,296 identified CNVRs, more losses than gains were observed. This imbalance was also observed in reports by Zhu et al.⁹ and Liu et al.¹⁰ (Table 2) and is commonly reported in the literature²⁹. This large disparity between deletions and duplications can be biased by the high sensitivity to loss events of the CNV calling algorithm and the lack of a Chinese sheep population during the SNP array design. If this bias can be avoided and the high ratio between loss and gain in Chinese native sheep breeds is still found, then this high ratio could be due to genomic differences between foreign breeds and Chinese breeds. Therefore, new strategies (for example, sequencing-based CNV detection) and large Chinese sheep populations are needed to investigate the high loss/gain ratio in future studies.

Notably, 57.1% of the CNVRs detected in this study can be confirmed by other published studies^{9,10,11,12,13} and 80% of the randomly selected novel CNVRs were further confirmed by qPCR, indicating the accuracy of our CNV detection using a high-density SNP chip in the sheep genome. Currently, three algorithms for CNV detection based on SNP arrays have been developed, which are available in different programs, including PennCNV³⁰, cnvPartition (http://www.illumina.com/documents/products/technotes/technote_cnv_algorithms.pdf) and QuantiSNP³¹. According to a previous comprehensive assessment of multiple CNV calling algorithms for array-based detection, the concordance of these algorithms is low²⁶. The common set of CNVs detected by multiple algorithms may avoid bias to some extent but may also miss many false negatives, whereas the union set may contain a large number of false positives. This makes it difficult to determine the appropriate number of CNVs, as summarized by Winchester et al.²⁰. According to the number of citations in PubMed, PennCNV software is currently the most widely used for Illumina chips (PennCNV: 955 citations; QuantiSNP: 432 citations)²⁷, particularly for high-density SNP data⁹. Compared to other algorithms such as CNVPartition and QuantiSNP, PennCNV is more reliable for assessing the number of copies when using Illumina high-density arrays because it incorporates the allelic intensity ratio at each SNP marker and the total signal intensity, the allele frequency of SNPs, the distance between neighboring SNPs, and the GC content to overcome biases³². In this study, we used high-density SNP arrays and stricter filtering criteria (SD of LRR < 0.30 and BAF = 0.01) to reduce the rate of false-positive results, resulting in a qPCR confirmation percentage of 80%. We also found a good correlation between our CNV prediction and qPCR validation (P = 9.0E-12; R = 0.7105; Figure S2). The discrepancy between PennCNV and qPCR validation could represent false negatives in qPCR amplification due to the ambiguous boundaries of CNVs. The uncertain boundaries may lead to placing the qPCR primers outside the actual CNVs, whereas the potential impacts of SNPs/small indels may affect the specific binding of primers to the CNV region for some individuals¹⁷ or could be false positives in the CNV detection by PennCNV in our study.

More than 80% of the CNVRs identified in this article span 1,094 QTLs (Table S6) belonging to seven categories: meat and carcass traits, milk traits, production traits, exterior traits, reproduction traits, health traits, and wool traits. Tan sheep is one of the most important sheep breeds in China because the pelts have special curly fleece after birth, but the animal gradually loses this phenotype with age²³. To identify Tan sheep-specific CNVRs underlying this unique phenotype, the current study was compared with that by Zhu et al.⁹ and 46 potentially breed-specific CNVRs in Tan sheep were obtained (Table S7). GO analysis of the CNVR-harboring genes showed the most significant enrichment in homeobox proteins (P = 9.6E-06), which have been previously shown to play key roles during fetal development in humans³³. Among the homeobox transcription factors, DLX3 is essential for hair morphogenesis, differentiation and cycling programs³⁴. The mutation of this gene is associated with tricho-dento-osseous syndrome, which is characterized by curly and kinky hair at infancy that later straightens³⁵. Interestingly, this is consistent with the unique phenotype in Tan sheep (i.e., curly fleece after birth that disappears with age). Previous studies have shown that SNPs in the 3′UTR and promoter regions of DLX3 have a significant effect on wool curvature in Chinese Merino sheep^36,37,38. Therefore, this candidate gene is worthy of validation for its functional relevance to the special curly fleece in Tan sheep. In addition to the unique trait of curly fleece, Tan sheep exhibits the common fat-tail phenotype, as do other sheep indigenous to China, for adaptation to the dry (average annual precipitation <400 mm) and cold (average annual temperature is 4 °C) climate. Thus, by comparison with the previous CNV study of the two typical Chinese fat-tail breed⁹, we found that the same set of CNVR-harboring genes involved in lipid metabolism and the same GO category of lipid metabolism were significantly enriched (P = 0.001), indicating that the identified CNVRs are also likely associated with the fat-tail phenotype in Tan sheep.

Materials and Methods

Sample preparation

Blood samples were randomly collected from 48 unrelated Tan sheep (6 rams and 42 ewes) from multiple flocks in Ningxia province. Each sheep was carefully confirmed to match the phenotypic characteristics of the Tan sheep breed. Genomic DNA was extracted from blood using the Promega Wizard Genomic DNA Purification Kit (Promega, Madison, Wisconsin, USA) according to the standard protocol provided by the manufacturer. A NanoDrop 2000 was used to measure the purity and concentration of the genomic DNA.

Procurement of peripheral blood was performed according to the guidelines for the care and use of experimental animals established by the ethics committees of the Ministry of Agriculture of People’s Republic of China. All of the animal experiments were approved by the Chinese Academy of Agricultural Sciences (CAAS) (Beijing, China).

Genotyping and quality control

According to the manufacturer’s protocols (Illumina, San Diego, California, USA), all genomic DNA samples from 48 sheep were genotyped using the Illumina Ovine SNP600K BeadChip array, which contains oligo probes for 685,734 SNPs, with the majority (80%) equally spanning the ovine genome, 10% reported in the literature as functional variants, 7% overlapping with the Illumina Ovine SNP50K array, and 3% accessible using a genotyping-by-sequencing protocol³⁹. The raw data were extracted using GenomeStudio (Illumina) and strict quality control was used for SNP filtering to increase the accuracy of the CNV detection. First, we removed individuals in which the call rate was <90%. Second, we discarded SNPs with a >10% missing genotype and a minor allele frequency (MAF) <0.05. Third, we removed SNPs that severely deviated from Hardy-Weinberg equilibrium (multiple test-adjusted P < 10⁻⁵) within each population. In addition, identical-by-descent (IBD) was conducted using Plink software. Finally, 495,786 autosomal SNPs were subjected to the subsequent CNV detection and analysis. The X and Y chromosomes were excluded⁴⁰.

Genome-wide detection of CNVs and CNVRs

CNVs were detected using a hidden Markov Model (PennCNV, http://www.openbioinformatics.org/penncnv/), which allows for the detection of CNVs based on Illumina or Affymetrix SNP chip data. Illumina GenomeStudio software can export the signal intensity data of the log R ratio of R (LRR) and B allele frequency (BAF) for each SNP. The population frequency of B allele (PFB) file was calculated based on the average BAF of each marker in the population. The PennCNV algorithm³⁰ was only applied to autosomes (command: -lastchr 26) to identify individual-based CNVs. To increase the confidence of the detected CNVs, quality control was performed by employing standard exclusions of the LRR (standard deviation of LRR) <0.3, a BAF drift <0.01 and a waviness factor <0.05. We classified the status of these CNV into two categories: “loss” (CNV containing a deletion) and “gain” (CNV containing a duplication).

The CNVRs were determined by aggregating the overlapping CNVs (with at least 1-bp of overlap) that were identified across all of the samples, according to previously reported methods^{41, 42}. We removed the CNVRs that were less than 1 kb, as CNVs were defined as fragments ranging from 1 kb to several Mbs and having a variable copy number in comparison to with reference genome¹. To further support the PCA results, a Hierarchical clustering analysis⁴³ for all published studies was performed according to their CNVR distribution.

Functional enrichment analysis of CNVR-harboring genes

BioMart (http://www.biomart.org/) in the Ensembl database was employed to identify genes located within or partially overlapping with the identified CNVRs. CNVRs that overlapped with the gene’s coding region by at least 1 bp were used to calculate the proportion of CNVR overlapping genes. Functional annotation was performed in DAVID (http://david.abcc.ncifcrf.gov/) for GO terms and KEGG pathway analyses. Because the sheep genome annotation is limited, the ovine Ensembl gene IDs were converted into orthologous human Ensembl gene IDs for the functional enrichment analysis. Furthermore, these CNVRs were mapped to the sheep QTLs from the Animal QTL database (http://www.animalgenome.org/cgi-bin/QTLdb/OA/browse).

Comparison with previous studies

To evaluate the reliability of the CNVRs detected, we compared our results to five existing studies of sheep CNVs detected using various platforms and involving different breeds^{9,10,11,12,13}. The CNVRs were compared according to our previous paper¹⁷. To compare the CNVRs among sheep breeds with different types of tails under the same condition (including the same platforms, analysis methods and filtering parameters), we removed all CNVs less than 100 kb according to the technique described by Zhu et al.⁹.

qPCR validation of CNVRs

We performed qPCR analysis on ten random selected genomic regions harboring CNVs identified in this study on an ABI7500 (Applied Biosystems by Life Technologies, Darmstadt, Germany) sequence detection system. The primers (Additional file: Table S4) were designed using Primer Premier 6 software (Premier Company) and were based on NCBI reference sequences. The genomic DNAs of the same individual used in the Illumina Chip genotyping were used for the experimental validation. The two normal copies of DGAT2 in the ovine genome were used as reference genes according to our own study and a previous study¹⁰. Four samples in which a normal copy number was identified in the target regions were used as reference samples. PCR experiments were conducted using Power SYBR Green PCR Reagent Kit (Applied Biosystems). The qPCR conditions were as follows: 95 °C for 3 min, followed by 40 cycles of 95 °C for 15 s and 60 °C for 60 s. Three replications were performed for each sample. Fold changes were determined using a standard 2^−ΔΔCT method that compares the ΔC_T value of a reference sample with the sample of interest for the ΔΔC_T calculation and compares the C_T (cycle threshold) values of a reference gene to the gene of interest for the ΔC_T calculation. Fold changes were normalized to a diploid number for a better comparison of copy number in all qPCR plots.

References

Sebat, J. et al. Large-scale copy number polymorphism in the human genome. Science 305, 525–528, doi:10.1126/science.1098918 (2004).
Article ADS CAS PubMed Google Scholar
Pielberg, G., Olsson, C., Syvänen, A. C. & Andersson, L. Unexpectedly high allelic diversity at the KIT locus causing dominant white color in the domestic pig. Genetics 160, 305–311 (2002).
CAS PubMed PubMed Central Google Scholar
Fontanesi, L. et al. Genetic heterogeneity and selection signature at the KIT gene in pigs showing different coat colours and patterns. Anim. Genet 41, 478–492, doi:10.1111/j.1365-2052.2010.02054.x (2010).
Article CAS PubMed Google Scholar
Wright, D. et al. Copy number variation in intron 1 of SOX5 causes the Pea-comb phenotype in chickens. PLoS Genet. 5, doi:10.1371/journal.pgen.1000512 (2009).
Elferink, M.G., Vallée, A.A., Jungerius, A.P., Crooijmans, R.P. & Groenen, M.A.M. Partial duplication of the PRLR and SPEF2 genes at the late feathering locus in chicken. BMC Genomics 9, doi:10.1186/1471-2164-9-391 (2008).
Imsland, F. et al. Regulatory mutations in TBX3 disrupt asymmetric hair pigmentation that underlies Dun camouflage color in horses. Nat. Genet. 48, 152–158, doi:10.1038/ng.3475 (2016).
Article CAS PubMed Google Scholar
Lv, F. H. et al. Mitogenomic meta-analysis identifies two phases of migration in the history of eastern Eurasian sheep. Mol. Biol. Evol. 32, 2515–2533, doi:10.1093/molbev/msv139 (2015).
Article CAS PubMed PubMed Central Google Scholar
Fontanesi, L. et al. A first comparative map of copy number variations in the sheep genome. Genomics 97, 158–165, doi:10.1016/j.ygeno.2010.11.005 (2011).
Article CAS PubMed Google Scholar
Zhu, C. et al. Genome-wide detection of CNVs in Chinese indigenous sheep with different types of tails using ovine high-density 600 K SNP arrays. Sci. Rep. 6, doi:10.1038/srep27822 (2016).
Liu, J. et al. Analysis of copy number variations in the sheep genome using 50 K SNP BeadChip array. BMC Genomics 14, doi:10.1186/1471-2164-14-229 (2013).
Ma, Y., Zhang, Q., Lu, Z., Zhao, X. & Zhang, Y. Analysis of copy number variations by SNP50 BeadChip array in Chinese sheep. Genomics 106, 295–300, doi:10.1016/j.ygeno.2015.08.001 (2015).
Article CAS PubMed Google Scholar
Hou, C. L. et al. Genome-wide analysis of copy number variations in Chinese sheep using array comparative genomic hybridization. Small Ruminant Res 128, 19–26, doi:10.1016/j.smallrumres.2015.04.014 (2015).
Article Google Scholar
Jenkins, G.M. et al. Copy number variants in the sheep genome detected using multiple approaches. BMC Genomics 17, doi:10.1186/s12864-016-2754-7 (2016).
Norris, B. J. & Whan, V. A. A gene duplication affecting expression of the ovine ASIP gene is responsible for white and black sheep. Genome Res. 18, 1282–1293, doi:10.1101/gr.072090.107 (2008).
Article CAS PubMed PubMed Central Google Scholar
Fontanesi, L., Dall’Olio, S., Beretti, F., Portolano, B. & Russo, V. Coat colours in the Massese sheep breed are associated with mutations in the agouti signalling protein (ASIP) and melanocortin 1 receptor (MC1R) genes. Animal 5, 8–17, doi:10.1017/S1751731110001382 (2011).
Article CAS PubMed Google Scholar
Jiang, Y. et al. The sheep genome illuminates biology of the rumen and lipid metabolism. Science 344, 1168–1173, doi:10.1126/science.1252806 (2014).
Article ADS CAS PubMed PubMed Central Google Scholar
Dong, K. et al. Copy number variation detection using SNP genotyping arrays in three Chinese pig breeds. Anim. Genet. 46, 101–109, doi:10.1111/age.2015.46.issue-2 (2015).
Article CAS PubMed Google Scholar
Kader, A. et al. Identification of copy number variations in three Chinese horse breeds using 70 K single nucleotide polymorphism BeadChip array. Anim. Genet. 47, 560–569, doi:10.1111/age.12451 (2016).
Article CAS PubMed Google Scholar
Wang, J. et al. A genome-wide detection of copy number variations using SNP genotyping arrays in swine. BMC Genomics 13, doi:10.1186/1471-2164-13-273 (2012).
Winchester, L., Yau, C. & Ragoussis, J. Comparing CNV detection methods for SNP arrays. Brief. Funct. Genomic. Proteomic 8, 353–366, doi:10.1093/bfgp/elp017 (2009).
Article CAS PubMed Google Scholar
Rui, Z. et al. Correlation between KRT1. 2 gene and properties of lamb fur qualities of Tan sheep in Ningxia. J. Agric. Sci. 3, 10 (2010).
Google Scholar
Lijuan, Y. et al. Correlation between KAP1. 3 gene and fur quality characteristics in Ningxia Tan sheep. Journal of Ningxia University (Natural Science Edition) 4, 021 (2010).
Google Scholar
Kang, X. et al. Transcriptome profile at different physiological stages reveals potential mode for curly fleece in Chinese tan sheep. PLoS One 8, doi:10.1371/journal.pone.0071763 (2013).
Hou, Y. et al. Genomic characteristics of cattle copy number variations. BMC Genomics 12, doi:10.1186/1471-2164-12-127 (2011).
McCarroll, S. A. et al. Integrated detection and population-genetic analysis of SNPs and copy number variation. Nat. Genet. 40, 1166–1174, doi:10.1038/ng.238 (2008).
Article CAS PubMed Google Scholar
Pinto, D. et al. Comprehensive assessment of array-based platforms and calling algorithms for detection of copy number variants. Nat. Biotechnol. 29, 512–520, doi:10.1038/nbt.1852 (2011).
Article CAS PubMed PubMed Central Google Scholar
Mace, A. et al. New quality measure for SNP array based CNV detection. Bioinformatics 32, 3298–3305, doi:10.1093/bioinformatics/btw477 (2016).
Article CAS PubMed Google Scholar
Xu, L., Hou, Y., Bickhart, D., Song, J. & Liu, G. Comparative analysis of CNV calling algorithms: literature survey and a case study using bovine high-density SNP data. Microarrays 2, 171–185, doi:10.3390/microarrays2030171 (2013).
Article CAS PubMed PubMed Central Google Scholar
Doan, R. et al. Whole-genome sequencing and genetic variant analysis of a quarter horse mare. BMC Genomics 13, doi:10.1186/1471-2164-13-78 (2012).
Wang, K. et al. PennCNV: an integrated hidden Markov model designed for high-resolution copy number variation detection in whole-genome SNP genotyping data. Genome Res. 17, 1665–1674, doi:10.1101/gr.6861907 (2007).
Article CAS PubMed PubMed Central Google Scholar
Colella, S. et al. QuantiSNP: an Objective Bayes Hidden-Markov Model to detect and accurately map copy number variation using SNP genotyping data. Nucleic Acids Res. 35, 2013–2025, doi:10.1093/nar/gkm076 (2007).
Article CAS PubMed PubMed Central Google Scholar
G., M. et al. Assessment of copy number variation using the Illumina Infinium 1 M SNP-array: a comparison of methodological approaches in the Spanish Bladder Cancer/EPICURO study. Human Mutat. 32, 240–248, doi:10.1002/humu.v32.2 (2011).
Article Google Scholar
Mavrogiannis, L. A. et al. Haploinsufficiency of the human homeobox gene ALX4 causes skull ossification defects. Nat. Genet. 27, 17–18, doi:10.1038/83703 (2001).
Article CAS PubMed Google Scholar
Hwang, J., Mehrani, T., Millar, S. E. & Morasso, M. I. Dlx3 is a crucial regulator of hair follicle differentiation and cycling. Development 135, 3149–3159, doi:10.1242/dev.022202 (2008).
Article CAS PubMed PubMed Central Google Scholar
Al-Batayneh, O.B. Tricho-dento-osseous syndrome: diagnosis and dental management. Int. J. Dent. 2012, doi:10.1155/2012/514692 (2012).
Pei, W. et al. Promoter characterization of sheep Dlx3 gene and association of promoter polymorphisms with wool quality traits in Chinese Merino. Scientia Agricultura Sinica 46, 614–622 (2013).
CAS Google Scholar
Rong, E. G. et al. Polymorphism in 3′UTR of DLX3 gene and its association with wool quality traits in Chinese merino sheep. Chinese Journal of Animal and Veterinary Sciences 3, 6 (2012).
Google Scholar
Rong, E. et al. Functional characterization of a single nucleotide polymorphism in the 3′untranslated region of sheep DLX3 gene. PLoS One 10, doi:10.1371/journal.pone.0137135 (2015).
Anderson, R. Development of a high density (600 K) Illumina Ovine SNP Chip and its use to fine map the yellow fat locus. In Plant and Animal Genome XXII Conference (Plant and Animal Genome, 2014).
Raudsepp, T. & Chowdhary, B. P. The horse pseudoautosomal region (PAR): characterization and comparison with the human, chimp and mouse PARs. Cytogenet. Genome Res. 121, 102–109, doi:10.1159/000125835 (2008).
Article CAS PubMed Google Scholar
Feuk, L., Carson, A. R. & Scherer, S. W. Structural variation in the human genome. Nat. Rev. Genet. 7, 85–97, doi:10.1038/nrg1767 (2006).
Article CAS PubMed Google Scholar
Redon, R. et al. Global variation in copy number in the human genome. Nature 444, 444–454, doi:10.1038/nature05329 (2006).
Article ADS CAS PubMed PubMed Central Google Scholar
Xie, J. et al. Identification of copy number variations in Xiang and Kele pigs. PLoS One 11, doi:10.1371/journal.pone.0148565 (2016).

Download references

Acknowledgements

This project was supported by the Ningxia Special Project of Agricultural Breeding (Strain Breeding of Tan sheep, 2013NYYZ04), the Basic R&D Fund for the Central Level Scientific Research Institute (2015ywf-zd-1,2015ZL044), the National Natural Science Foundation of China (31272403, 31472064, 31601910), the Agricultural Science and Technology Innovation Program of China (ASTIP-IAS01), the earmarked fund for Modern Agro-industry Technology Research System (CARS-40-01) and the Special Fund for Agro-scientific Research in the Public Interest (20130305902).

Author information

Qing Ma and Xuexue Liu contributed equally to this work.

Authors and Affiliations

Institute of Animal Science, Ningxia Academy of Agriculture and Forestry Sciences, Yinchuan, Ningxia, 75002, China
Qing Ma, Lina Ma & Yingkang Li
Institute of Animal Science, Chinese Academy of Agricultural Sciences (CAAS), No. 2 Yuanmingyuan West Road, Beijing, 100193, China
Xuexue Liu, Jianfei Pan, Yuehui Ma, Xiaohong He, Qianjun Zhao, Yabin Pu & Lin Jiang
CAAS-ILRI Joint Laboratory on Livestock and Forage Genetic Resources, Institute of Animal Science, Chinese Academy of Agricultural Sciences (CAAS), No. 2 Yuanmingyuan West Road, Beijing, 100193, China
Xuexue Liu, Jianfei Pan, Yuehui Ma, Xiaohong He, Qianjun Zhao, Yabin Pu & Lin Jiang

Authors

Qing Ma
View author publications
You can also search for this author in PubMed Google Scholar
Xuexue Liu
View author publications
You can also search for this author in PubMed Google Scholar
Jianfei Pan
View author publications
You can also search for this author in PubMed Google Scholar
Lina Ma
View author publications
You can also search for this author in PubMed Google Scholar
Yuehui Ma
View author publications
You can also search for this author in PubMed Google Scholar
Xiaohong He
View author publications
You can also search for this author in PubMed Google Scholar
Qianjun Zhao
View author publications
You can also search for this author in PubMed Google Scholar
Yabin Pu
View author publications
You can also search for this author in PubMed Google Scholar
Yingkang Li
View author publications
You can also search for this author in PubMed Google Scholar
Lin Jiang
View author publications
You can also search for this author in PubMed Google Scholar

Contributions

J.L. and L.Y. designed the experiment, M.Q., L.X. and J.L. carried out computational analysis and drafted the manuscript. P.J., M.L., M.Y., H.X., Z.Q. and P.Y. participated in the animal samples collection and statistical analysis. All authors read and approved the final manuscript.

Corresponding authors

Correspondence to Yingkang Li or Lin Jiang.

Ethics declarations

Competing Interests

The authors declare that they have no competing interests.

Additional information

Publisher's note: Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Electronic supplementary material

dataset1

Rights and permissions

Open Access This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license, and indicate if changes were made. The images or other third party material in this article are included in the article’s Creative Commons license, unless indicated otherwise in a credit line to the material. If material is not included in the article’s Creative Commons license and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this license, visit http://creativecommons.org/licenses/by/4.0/.

Reprints and permissions

About this article

Cite this article

Ma, Q., Liu, X., Pan, J. et al. Genome-wide detection of copy number variation in Chinese indigenous sheep using an ovine high-density 600 K SNP array. Sci Rep 7, 912 (2017). https://doi.org/10.1038/s41598-017-00847-9

Download citation

Received: 22 September 2016
Accepted: 16 March 2017
Published: 19 April 2017
DOI: https://doi.org/10.1038/s41598-017-00847-9
Springer Nature Limited

This article is cited by

Structural variant landscapes reveal convergent signatures of evolution in sheep and goats
- Ji Yang
- Dong-Feng Wang
- Meng-Hua Li
Genome Biology (2024)
Genome-wide investigation to assess copy number variants in the Italian local chicken population
- Filippo Cendron
- Martino Cassandro
- Mauro Penasa
Journal of Animal Science and Biotechnology (2024)
Identification of copy number variation in Tibetan sheep using whole genome resequencing reveals evidence of genomic selection
- Huibin Shi
- Taotao Li
- Youji Ma
BMC Genomics (2023)
Genome-wide identification of candidate copy number polymorphism genes associated with complex traits of Tibetan-sheep
- Dehong Tian
- De Sun
- Kai Zhao
Scientific Reports (2023)
Genetics of the phenotypic evolution in sheep: a molecular look at diversity-driving genes
- Peter Kalds
- Shiwei Zhou
- Xiaolong Wang
Genetics Selection Evolution (2022)

Genome-wide detection of copy number variation in Chinese indigenous sheep using an ovine high-density 600 K SNP array

Abstract

Similar content being viewed by others

Introduction

Results

Genome-wide detection of CNVs and CNVRs in Chinese Tan sheep

Validation of the identified CNVRs

Functional annotation of the identified CNVRs

Potentially specific CNVRs in Tan sheep

Discussion

Materials and Methods

Sample preparation

Genotyping and quality control

Genome-wide detection of CNVs and CNVRs

Functional enrichment analysis of CNVR-harboring genes

Comparison with previous studies

qPCR validation of CNVRs

References

Acknowledgements

Author information

Authors and Affiliations

Contributions

Corresponding authors

Ethics declarations

Competing Interests

Additional information

Electronic supplementary material

Rights and permissions

About this article

Cite this article

Share this article

This article is cited by

Search

Navigation