Building a sequence map of the pig pan-genome from multiple de novo assemblies and Hi-C data

Tian, Xiaomeng; Li, Ran; Fu, Weiwei; Li, Yan; Wang, Xihong; Li, Ming; Du, Duo; Tang, Qianzi; Cai, Yudong; Long, Yiming; Zhao, Yue; Li, Mingzhou; Jiang, Yu

doi:10.1007/s11427-019-9551-7

Research Paper
Published: 08 July 2019

Volume 63, pages 750–763, (2020)
Science China Life Sciences

Xiaomeng Tian¹, Ran Li¹, Weiwei Fu¹, Yan Li², Xihong Wang¹, Ming Li¹, Duo Du¹, Qianzi Tang², Yudong Cai¹, Yiming Long¹, Yue Zhao¹, Mingzhou Li² & Yu Jiang¹

Abstract

Pigs were domesticated independently in the Near East and China, suggesting that a single reference genome from one individual cannot represent the full spectrum of divergent sequences in pigs worldwide. Therefore, 12 de novo pig assemblies from Eurasia were compared in this study to identify sequences missing from the reference genome. As a result, 72.5 Mb of non-redundant sequences (∼3% of the genome) were found to be absent from the reference genome (Sscrofa11.1) and were defined as pan-sequences. Of these pan-sequences, 9.0 Mb were dominant in Chinese pigs but occurred at low frequency in European pigs. One sequence dominant in Chinese pigs contained the complete genic region of tazarotene-induced gene 3 (TIG3), which is involved in fatty acid metabolism. Using flanking sequences and Hi-C-based methods, 27.7% of the pan-sequences could be anchored to the reference genome. Supplementing the reference with these sequences could contribute to a more accurate interpretation of 3D chromatin structure. A web-based pan-genome database is also provided as a primary resource for exploring genetic diversity and promoting pig breeding and biomedical research.
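Two population-level steps summarized in the abstract, flagging pan-sequences that are common in Chinese pigs but rare in European pigs, and placing an unplaced pan-sequence by its Hi-C contacts, can be illustrated with a minimal Python sketch. It assumes that presence/absence calls per sample are already available and that Hi-C read pairs linking a pan-sequence to the reference have been counted per fixed-size reference bin. The function names, the frequency cut-offs (`high=0.5`, `low=0.2`) and the Hi-C thresholds (`min_contacts=10`, `min_ratio=2.0`) are hypothetical choices for illustration, not the criteria used in the study.

```python
from collections import Counter
from typing import Dict, Optional, Tuple

# Hypothetical bin identifier: (reference chromosome, bin index).
Bin = Tuple[str, int]


def population_frequency(presence: Dict[str, bool], groups: Dict[str, str]) -> Dict[str, float]:
    """Fraction of samples in each group that carry a given pan-sequence."""
    carriers: Counter = Counter()
    totals: Counter = Counter()
    for sample, present in presence.items():
        group = groups[sample]
        totals[group] += 1
        if present:
            carriers[group] += 1
    return {group: carriers[group] / totals[group] for group in totals}


def is_dominant_in(freqs: Dict[str, float], target: str, other: str,
                   high: float = 0.5, low: float = 0.2) -> bool:
    """Flag a sequence as common in `target` but rare in `other` (hypothetical cut-offs)."""
    return freqs.get(target, 0.0) >= high and freqs.get(other, 0.0) <= low


def anchor_by_hic(contacts: Dict[Bin, int], min_contacts: int = 10,
                  min_ratio: float = 2.0) -> Optional[Bin]:
    """Place a pan-sequence in the reference bin it contacts most, if that bin clearly wins.

    Returns None when there are too few contacts, or when the best bin is not at
    least `min_ratio` times stronger than the runner-up (ambiguous placement).
    """
    if not contacts:
        return None
    ranked = sorted(contacts.items(), key=lambda item: item[1], reverse=True)
    best_bin, best_count = ranked[0]
    runner_up = ranked[1][1] if len(ranked) > 1 else 0
    if best_count < min_contacts:
        return None
    if runner_up > 0 and best_count / runner_up < min_ratio:
        return None
    return best_bin


if __name__ == "__main__":
    # Hypothetical samples; "CN" and "EU" stand for Chinese and European pigs.
    groups = {"CN1": "Chinese", "CN2": "Chinese", "CN3": "Chinese",
              "EU1": "European", "EU2": "European", "EU3": "European"}
    presence = {"CN1": True, "CN2": True, "CN3": False,
                "EU1": False, "EU2": False, "EU3": False}
    freqs = population_frequency(presence, groups)
    print(freqs)
    print(is_dominant_in(freqs, "Chinese", "European"))
    print(anchor_by_hic({("chr1", 120): 45, ("chr1", 121): 12, ("chr7", 3): 4}))
```

In this toy input, a sequence carried by two of three Chinese samples and none of three European samples is flagged as Chinese-dominant, and a pan-sequence with 45 contacts to one bin versus 12 to the next-best bin is placed in that bin. A realistic Hi-C placement would also need normalized contact matrices and a model of contact decay with genomic distance, which this sketch deliberately omits.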

Data availability

The Hi-C sequencing reads from each library have been deposited at NCBI (Project ID: PRJNA482496). The assembly of the pig pan-genome and the results of subsequent analyses are available from our PIGPAN website (http://animal.nwsuaf.edu.cn/code/index.php/pan-Pig). All other data supporting the findings of this study are available in the article and its supplementary information files, or from the corresponding author on request.

Acknowledgements

This work was supported by the National Natural Science Foundation of China (31822052 and 31572381) to Y.J. and the Science & Technology Support Program of Sichuan (2016NYZ0042 and 2017NZDZX0002) to M.Z.L. We thank the High Performance Computing platform of Northwest A&F University for assistance with computing.

Author information

Xiaomeng Tian, Ran Li, Weiwei Fu and Yan Li contributed equally to this work.

Authors and Affiliations

¹ Key Laboratory of Animal Genetics, Breeding and Reproduction of Shaanxi Province, College of Animal Science and Technology, Northwest A&F University, Yangling, 712100, China
Xiaomeng Tian, Ran Li, Weiwei Fu, Xihong Wang, Ming Li, Duo Du, Yudong Cai, Yiming Long, Yue Zhao & Yu Jiang
² Institute of Animal Genetics and Breeding, College of Animal Science and Technology, Sichuan Agricultural University, Chengdu, 611130, China
Yan Li, Qianzi Tang & Mingzhou Li

Corresponding authors

Correspondence to Mingzhou Li or Yu Jiang.

Ethics declarations

Compliance and ethics: The authors declare that they have no conflict of interest.

Supplementary Materials

Supplementary materials for Tian et al., Building a sequence map of the pig pan-genome from multiple de novo assemblies and Hi-C data (approximately 18.9 MB).

About this article

Cite this article

Tian, X., Li, R., Fu, W. et al. Building a sequence map of the pig pan-genome from multiple de novo assemblies and Hi-C data. Sci. China Life Sci. 63, 750–763 (2020). https://doi.org/10.1007/s11427-019-9551-7

Received: 07 January 2019
Accepted: 03 April 2019
Published: 08 July 2019
Issue Date: May 2020
DOI: https://doi.org/10.1007/s11427-019-9551-7
