Abstract
Sorghum is a drought-tolerant staple crop for half a billion people in Africa and Asia, an important source of animal feed throughout the world and a biofuel feedstock of growing importance. Cultivated sorghum and its inter-fertile wild relatives constitute the primary gene pool for sorghum. Understanding and characterizing the diversity within this valuable resource is fundamental for its effective utilization in crop improvement. Here, we report analysis of a sorghum pan-genome to explore genetic diversity within the sorghum primary gene pool. We assembled 13 genomes representing cultivated sorghum and its wild relatives, and integrated them with 3 other published genomes to generate a pan-genome of 44,079 gene families with 222.6 Mb of new sequence identified. The pan-genome displays substantial gene-content variation, with 64% of gene families showing presence/absence variation among genomes. Comparisons between core genes and dispensable genes suggest that dispensable genes are important for sorghum adaptation. Extensive genetic variation was uncovered within the pan-genome, and the distribution of these variations was influenced by variation of recombination rate and transposable element content across the genome. We identified presence/absence variants that were under selection during sorghum domestication and improvement, and demonstrated that such variation had important phenotypic outcomes that could contribute to crop improvement. The constructed sorghum pan-genome represents an important resource for sorghum improvement and gene discovery.
Data availability
The datasets generated and/or analysed during the current study have been deposited in the China National GeneBank database (https://db.cngb.org) under project CNP0001440 and in the Genome Sequence Archive (ref. 64) of the National Genomics Data Center (ref. 65), Beijing Institute of Genomics (China National Center for Bioinformation), Chinese Academy of Sciences, under accession number CRA003806, publicly accessible at https://bigd.big.ac.cn/gsa.
Code availability
The code used in this manuscript is available at the GitHub repository https://github.com/xujiabao507/Sorghum_pangenome.
References
Sakschewski, B., Von Bloh, W., Huber, V., Müller, C. & Bondeau, A. Feeding 10 billion people under climate change: how large is the production gap of current agricultural systems? Ecol. Modell. 288, 103–111 (2014).
Clark, J. D. & Stemler, A. Early domesticated sorghum from Central Sudan. Nature 254, 588–591 (1975).
Wendorf, F. et al. Saharan exploitation of plants 8,000 years bp. Nature 359, 721–724 (1992).
Mace, E. S. et al. Whole-genome sequencing reveals untapped genetic potential in Africa’s indigenous cereal crop sorghum. Nat. Commun. 4, 2320 (2013).
Morris, G. P. et al. Population genomic and genome-wide association studies of agroclimatic traits in sorghum. Proc. Natl Acad. Sci. USA 110, 453–458 (2013).
Zheng, L. Y. et al. Genome-wide patterns of genetic variation in sweet and grain sorghum (Sorghum bicolor). Genome Biol. 12, R114 (2011).
Smith, O. et al. A domestication history of dynamic adaptation and genomic deterioration in Sorghum. Nat. Plants 5, 369–379 (2019).
de Wet, J. M. J. Systematics and evolution of Sorghum sect. Sorghum (Gramineae). Am. J. Bot. 65, 477–484 (1978).
Wiersema, J. H. & Dahlberg, J. The nomenclature of Sorghum bicolor (L.) Moench (Gramineae). Taxon 56, 941–946 (2007).
Waterhouse, R. M. et al. BUSCO applications from quality assessments to gene prediction and phylogenomics. Mol. Biol. Evol. 35, 543–548 (2018).
Paterson, A. H. et al. The Sorghum bicolor genome and the diversification of grasses. Nature 457, 551–556 (2009).
McCormick, R. F. et al. The Sorghum bicolor reference genome: improved assembly, gene annotations, a transcriptome atlas, and signatures of genome organization. Plant J. 93, 338–354 (2018).
Deschamps, S. et al. A chromosome-scale assembly of the sorghum genome using nanopore sequencing and optical mapping. Nat. Commun. 9, 4844 (2018).
Cooper, E. A. et al. A new reference genome for Sorghum bicolor reveals high levels of sequence similarity between sweet and grain genotypes: implications for the genetics of sugar metabolism. BMC Genom. 20, 420 (2019).
Li, H., Feng, X. & Chu, C. The design and construction of reference pangenome graphs with minigraph. Genome Biol. 21, 265 (2020).
Li, Y. H. et al. De novo assembly of soybean wild relatives for pan-genome analysis of diversity and agronomic traits. Nat. Biotechnol. 32, 1045–1052 (2014).
Gordon, S. P. et al. Extensive gene content variation in the Brachypodium distachyon pan-genome correlates with population structure. Nat. Commun. 8, 2184 (2017).
Wang, W. et al. Genomic variation in 3,010 diverse accessions of Asian cultivated rice. Nature 557, 43–49 (2018).
Zhao, Q. et al. Pan-genome analysis highlights the extent of genomic variation in cultivated and wild rice. Nat. Genet. 50, 279–284 (2018).
Gao, L. et al. The tomato pan-genome uncovers new genes and a rare allele regulating fruit flavor. Nat. Genet. 51, 1044–1051 (2019).
Deu, M., Rattunde, F. & Chantereau, J. A global view of genetic diversity in cultivated sorghums using a core collection. Genome 49, 168–180 (2006).
Gobena, D. et al. Mutation in sorghum LOW GERMINATION STIMULANT 1 alters strigolactones and causes Striga resistance. Proc. Natl Acad. Sci. USA 114, 4471–4476 (2017).
Zhang, L.-M. et al. Sweet sorghum originated through selection of Dry, a plant-specific NAC transcription factor gene. Plant Cell 30, 2286–2307 (2018).
Zhang, Z. et al. Genome-wide mapping of structural variations reveals a copy number variant that determines reproductive morphology in cucumber. Plant Cell 27, 1595–1604 (2015).
Jovelin, R. & Cutter, A. D. Fine-scale signatures of molecular evolution reconcile models of indel-associated mutation. Genome Biol. Evol. 5, 978–986 (2013).
Swanson-Wagner, R. A. et al. Pervasive gene content variation and copy number variation in maize and its undomesticated progenitor. Genome Res. 20, 1689–1699 (2010).
Lin, Z. et al. Parallel domestication of the Shattering1 genes in cereals. Nat. Genet. 44, 720–724 (2012).
Liu, S. et al. Overexpression of a CPYC-type glutaredoxin, OsGrxC2.2, causes abnormal embryos and an increased grain weight in rice. Front. Plant Sci. 10, 848 (2019).
Tao, Y. F. et al. Novel grain weight loci revealed in a cross between cultivated and wild sorghum. Plant Genome 11, 170089 (2018).
Yu, Y. C. et al. Independent losses of function in a polyphenol oxidase in rice: differentiation in grain discoloration between subspecies and the role of positive selection under domestication. Plant Cell 20, 2946–2959 (2008).
Tao, Y. et al. Large-scale GWAS in sorghum reveals common genetic control of grain size among cereals. Plant Biotechnol. J. 18, 1093–1105 (2020).
Chopra, S., Brendel, V., Zhang, J., Axtell, J. D. & Peterson, T. Molecular characterization of a mutable pigmentation phenotype and isolation of the first active transposable element from Sorghum bicolor. Proc. Natl Acad. Sci. USA 96, 15330–15335 (1999).
Sweeney, M. T., Thomson, M. J., Pfeil, B. E. & McCouch, S. Caught red-handed: Rc encodes a basic helix–loop–helix protein conditioning red pericarp in rice. Plant Cell 18, 283–294 (2006).
Hufford, M. B. et al. Comparative population genomics of maize domestication and improvement. Nat. Genet. 44, 808–811 (2012).
Xu, X. et al. Resequencing 50 accessions of cultivated and wild rice yields markers for identifying agronomically important genes. Nat. Biotechnol. 30, 105–111 (2012).
Golicz, A. A. et al. The pangenome of an agronomically important crop plant Brassica oleracea. Nat. Commun. 7, 13390 (2016).
Tao, Y., Zhao, X., Mace, E., Henry, R. & Jordan, D. Exploring and exploiting pan-genomics for crop improvement. Mol. Plant 12, 156–169 (2019).
Montenegro, J. D. et al. The pangenome of hexaploid bread wheat. Plant J. 90, 1007–1013 (2017).
Jensen, S. E. et al. A sorghum practical haplotype graph facilitates genome-wide imputation and cost-effective genomic prediction. Plant Genome 13, e20009 (2020).
Wang, B. et al. Pan-genome analysis in sorghum highlights the extent of genomic variation and sugarcane aphid resistance genes. Preprint at bioRxiv https://doi.org/10.1101/2021.01.03.424980 (2021).
Hackl, T., Hedrich, R., Schultz, J. & Forster, F. proovread: large-scale high-accuracy PacBio correction through iterative short read consensus. Bioinformatics 30, 3004–3011 (2014).
Chen, Y. et al. SOAPnuke: a MapReduce acceleration-supported software for integrated quality control and preprocessing of high-throughput sequencing data. GigaScience 7, gix120 (2018).
Marcais, G. & Kingsford, C. A fast, lock-free approach for efficient parallel counting of occurrences of k-mers. Bioinformatics 27, 764–770 (2011).
Koren, S. et al. Canu: scalable and accurate long-read assembly via adaptive k-mer weighting and repeat separation. Genome Res. 27, 722–736 (2017).
Ruan, J. & Li, H. Fast and accurate long-read assembly with wtdbg2. Nat. Methods 17, 155–158 (2020).
Walker, B. J. et al. Pilon: an integrated tool for comprehensive microbial variant detection and genome assembly improvement. PLoS ONE 9, e112963 (2014).
Daccord, N. et al. High-quality de novo assembly of the apple genome and methylome dynamics of early fruit development. Nat. Genet. 49, 1099–1106 (2017).
Kajitani, R. et al. Efficient de novo assembly of highly heterozygous genomes from whole-genome shotgun short reads. Genome Res. 24, 1384–1395 (2014).
Ye, C., Hill, C. M., Wu, S., Ruan, J. & Ma, Z. S. DBG2OLC: efficient assembly of large genomes using long erroneous reads of the third generation sequencing technologies. Sci. Rep. 6, 31900 (2016).
Boetzer, M., Henkel, C. V., Jansen, H. J., Butler, D. & Pirovano, W. Scaffolding pre-assembled contigs using SSPACE. Bioinformatics 27, 578–579 (2011).
Servant, N. et al. HiC-Pro: an optimized and flexible pipeline for Hi-C data processing. Genome Biol. 16, 259 (2015).
Li, H. & Durbin, R. Fast and accurate short read alignment with Burrows–Wheeler transform. Bioinformatics 25, 1754–1760 (2009).
Burton, J. N. et al. Chromosome-scale scaffolding of de novo genome assemblies based on chromatin interactions. Nat. Biotechnol. 31, 1119–1125 (2013).
Stanke, M. & Waack, S. Gene prediction with a hidden Markov model and a new intron submodel. Bioinformatics 19, ii215–ii225 (2003).
Korf, I. Gene finding in novel genomes. BMC Bioinform. 5, 59 (2004).
Cantarel, B. L. et al. MAKER: an easy-to-use annotation pipeline designed for emerging model organism genomes. Genome Res. 18, 188–196 (2008).
Li, L., Stoeckert, C. J. Jr. & Roos, D. S. OrthoMCL: identification of ortholog groups for eukaryotic genomes. Genome Res. 13, 2178–2189 (2003).
Hufford, M. B. et al. De novo assembly, annotation, and comparative analysis of 26 diverse maize genomes. Preprint at bioRxiv https://doi.org/10.1101/2021.01.14.426684 (2021).
Wang, Y. P. et al. MCScanX: a toolkit for detection and evolutionary analysis of gene synteny and collinearity. Nucleic Acids Res. 40, e49 (2012).
Kurtz, S. et al. Versatile and open software for comparing large genomes. Genome Biol. 5, R12 (2004).
Sun, S. et al. Extensive intraspecific gene order and gene structural variations between Mo17 and other maize genomes. Nat. Genet. 50, 1289–1295 (2018).
Butler, D. G., Cullis, B. R., Gilmour, A. R. & Gogel, B. J. Technical Report: ASReml-R Reference Manual (Queensland Department of Primary Industries, 2009); http://www.vsni.co.uk/software/asreml/
Liu, X., Huang, M., Fan, B., Buckler, E. S. & Zhang, Z. Iterative usage of fixed and random effect models for powerful and efficient genome-wide association studies. PLoS Genet. 12, e1005767 (2016).
Wang, Y. Q. et al. GSA: Genome Sequence Archive. Genom. Proteom. Bioinformatics 15, 14–18 (2017).
Zhang, Z. et al. Database resources of the National Genomics Data Center in 2020. Nucleic Acids Res. 48, D24–D33 (2020).
Acknowledgements
This work was undertaken as part of the initiative ‘Adapting Agriculture to Climate Change: Collecting, Protecting and Preparing Crop Wild Relatives’, which is supported by the Government of Norway. The project is managed by the Global Crop Diversity Trust with the Millennium Seed Bank of the Royal Botanic Gardens, Kew and implemented in partnership with national and international gene banks and plant breeding institutes around the world. For further information, see the project website: http://www.cwrdiversity.org/. This work was also supported by funding from the Australian Research Council through the Centre of Excellence for Translational Photosynthesis (CE1401000015), National Key R&D Program of China (2019YFD1002701 and 2018YFD1000701) and Strategic Priority Research Program of Chinese Academy of Sciences (XDA26050101).
Author information
Contributions
E.M., H.J., D.J. and Y.T. designed this study and coordinated the project. Y.T., X.Z., A.C. and A.H. selected samples and conducted field work. T.S., Y.L. and X.W. collected samples. J.X. and F.T. carried out the genome assembly and annotation. Y.T., H.L. and F.T. performed pan-genome analysis. Y.T. and J.X. conducted variation detection, phylogenetic analysis and selection analysis. Y.T., X.Z. and A.H. carried out GWAS analysis. Y.T. wrote the manuscript; E.M. and D.J. edited it. All authors read and approved the final manuscript.
Ethics declarations
Competing interests
The authors declare no competing interests.
Additional information
Peer review information Nature Plants thanks Zhangjun Fei and the other, anonymous, reviewer(s) for their contribution to the peer review of this work.
Publisher’s note Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
Extended data
Extended Data Fig. 1 A snapshot of the graph-based sorghum pan-genome.
This pan-genome graph shows variation within an LGS1 region on chromosome 5. The graph was visualised using Bandage. Yellow highlights the sequence segment containing LGS1. Grey indicates sequence segments from the reference genome BTx623. Green indicates sequence segments from genomes other than BTx623.
Extended Data Fig. 2 Comparison between core, shell and cloud genes.
A, CDS length. Core genes are significantly longer than shell and cloud genes (p-value < 2.2e-16, Wilcoxon signed rank, two-sided). B, Number of exons. Core genes have significantly more exons than shell and cloud genes (p-value < 2.2e-16, Wilcoxon signed rank, two-sided). Sample size: core, n = 15,867; shell, n = 28,026; cloud, n = 186. In the box plots, center lines represent the median, the bottom and top of the boxes represent the first and third quartiles, and whiskers show the data that lie within 1.5 times the interquartile range of the first and third quartiles.
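The box-plot convention described in this caption (median, first and third quartiles, whiskers limited to 1.5 times the interquartile range) can be illustrated with a minimal Python sketch. The function name `box_plot_summary`, the choice of the "inclusive" quantile method and the toy values are assumptions made for illustration only; the plotting software used for the figure may interpolate quartiles differently.

```python
import statistics


def box_plot_summary(values):
    """Summarise values the way a Tukey-style box plot draws them.

    Hypothetical helper for illustration; not taken from the paper's code.
    """
    data = sorted(values)
    # Quartiles: q1 and q3 bound the box, the median is the centre line.
    q1, median, q3 = statistics.quantiles(data, n=4, method="inclusive")
    iqr = q3 - q1
    lower_fence = q1 - 1.5 * iqr
    upper_fence = q3 + 1.5 * iqr
    # Whiskers end at the most extreme observations inside the fences.
    whisker_low = min(v for v in data if v >= lower_fence)
    whisker_high = max(v for v in data if v <= upper_fence)
    # Anything beyond the fences would be drawn as an individual point.
    outliers = [v for v in data if v < lower_fence or v > upper_fence]
    return {
        "q1": q1,
        "median": median,
        "q3": q3,
        "whisker_low": whisker_low,
        "whisker_high": whisker_high,
        "outliers": outliers,
    }


# Toy CDS lengths (arbitrary units), with one extreme value.
summary = box_plot_summary([1, 2, 3, 4, 5, 6, 7, 8, 100])
print(summary)
```

In this toy input the value 100 lies beyond the upper fence, so the upper whisker stops at 8 and 100 is reported separately.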
Extended Data Fig. 3 Comparison of expression level between core, shell and cloud genes.
Expression levels (FPKM, Fragments Per Kilobase of transcript per Million mapped reads) of core, shell and cloud genes were measured in six samples. Core genes consistently showed higher expression than shell and cloud genes across the six genomes (p-value < 2.2e-16, Wilcoxon signed rank, two-sided). Sample sizes in the six genomes: 353 (core, n = 22,522; shell, n = 13,873; cloud, n = 78); IS3614-3 (core, n = 20,786; shell, n = 12,648; cloud, n = 12); IS8525 (core, n = 21,223; shell, n = 13,365; cloud, n = 12); IS929 (core, n = 21,702; shell, n = 12,860; cloud, n = 35); Ji2731 (core, n = 22,251; shell, n = 13,874; cloud, n = 57); PI525695 (core, n = 20,445; shell, n = 11,372; cloud, n = 35). In the box plots, center lines represent the median, the bottom and top of the boxes represent the first and third quartiles, and whiskers show the data that lie within 1.5 times the interquartile range of the first and third quartiles.
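As a reminder of what the FPKM unit in this caption measures, the following minimal Python sketch applies the standard definition: fragments mapped to a transcript, scaled per kilobase of transcript length and per million mapped fragments. The function name `fpkm` and the example numbers are hypothetical; the expression values in the figure came from the authors' own pipeline, which is not reproduced here.

```python
def fpkm(fragments, transcript_length_bp, total_mapped_fragments):
    """Fragments Per Kilobase of transcript per Million mapped fragments.

    Hypothetical helper for illustration; inputs are raw counts and a
    transcript length in base pairs.
    """
    if transcript_length_bp <= 0 or total_mapped_fragments <= 0:
        raise ValueError("length and library size must be positive")
    # 1e9 = 1e3 (bp per kilobase) * 1e6 (fragments per million).
    return fragments * 1e9 / (transcript_length_bp * total_mapped_fragments)


# Toy example: 500 fragments on a 2,000 bp transcript
# in a library of 25 million mapped fragments.
print(fpkm(500, 2_000, 25_000_000))
```

Because the value is normalised by both transcript length and library size, it allows genes of different lengths to be compared within and between samples of different sequencing depth.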
Supplementary information
Supplementary Information
Supplementary notes, Figs. 1–22 and Tables 1–16, 18–20 and 24–27.
Supplementary Tables
Supplementary Tables 17, 21, 22 and 23.
About this article
Cite this article
Tao, Y., Luo, H., Xu, J. et al. Extensive variation within the pan-genome of cultivated and wild sorghum. Nat. Plants 7, 766–773 (2021). https://doi.org/10.1038/s41477-021-00925-x
DOI: https://doi.org/10.1038/s41477-021-00925-x