Plant genetic information is stored in DNA sequences, which ultimately control plant traits, including responses to different environmental stresses, genetically or epigenetically (Han et al. 2022). Decoding genome sequences is therefore the first step and the foundation for understanding this genetic information. Since the genome of Arabidopsis thaliana was sequenced and assembled in 2000 (The Arabidopsis Genome Initiative 2000), more and more plant genomes have been sequenced, assembled, and deposited in public databases. According to a recent survey, a total of 798 plant species have been fully sequenced and assembled (Marks et al. 2021). Of the 137 currently known land plant orders (Freiberg et al. 2020), 62 have at least one species sequenced and assembled (Marks et al. 2021), and these genomes are publicly available in databases such as GenBank, Phytozome, and Ensembl Plants. However, the sequenced genomes are unevenly distributed across plant orders and families: 83, 80, and 67 sequenced and assembled genomes have been reported in Brassicales, Poales, and Lamiales, respectively. The sequenced species belong to 146 families; Brassicaceae has the most sequenced and assembled species (80), followed by Poaceae (76) and Fabaceae (54). Sixty-six of the sequenced families have only one species reported, and only 16 families have more than 10 species sequenced and assembled (Table 1).

Table 1 A total of 798 plant species, belonging to 146 families in 62 orders, with fully sequenced and assembled genomes*

The 798 plant species with sequenced and assembled genomes include the following: (1) agriculturally important crops, such as rice (Oryza sativa), corn (Zea mays), soybean (Glycine max), peanut (Arachis hypogaea), potato (Solanum tuberosum), sesame (Sesamum indicum), and cotton (Gossypium hirsutum); (2) vegetables, such as tomato (Solanum lycopersicum), sweet potato (Ipomoea batatas), radish (Raphanus sativus), and cabbage (Brassica oleracea); (3) fruits, such as apple (Malus domestica), pear (Pyrus communis), banana (Musa acuminata), and strawberry (Fragaria × ananassa); (4) ornamental flowers and trees, such as Eschscholzia californica, Macleaya cordata, Aquilegia coerulea, Rosa chinensis, Callicarpa americana, Rhododendron delavayi, Antirrhinum majus, and Liriodendron chinense; (5) architectural and industrial trees, such as Trema orientale, Mesua ferrea, Hevea brasiliensis (rubber tree), Abies alba, and Sequoia sempervirens; (6) medicinal plants, such as Trichopus zeylanicus, Glycyrrhiza uralensis, Quillaja saponaria, Rehmannia glutinosa, Salvia miltiorrhiza, Andrographis paniculata, Coptis chinensis, Cannabis sativa, and Aquilaria sinensis, which are sources of many traditional medicines for treating various human diseases; and (7) model plant species, such as Arabidopsis, and wild relatives of the species mentioned above (Marks et al. 2021). The genomes of several unusual plant species have also been sequenced, including carnivorous plants such as Dionaea muscipula (Venus flytrap), Aldrovanda vesiculosa, Utricularia reniformis, and Utricularia gibba. Clearly, the majority of currently sequenced plant genomes belong to economically important (human-related) species and their wild relatives, such as domesticated (135), cultivated (127), and natural commodity (120) plant species (Marks et al. 2021).
Although about half of the sequenced species are wild, many of them are relatives of agriculturally and economically important plants (Marks et al. 2021). Given that fewer than 0.5% of plant species are domesticated and more than 99.5% are wild, genome sequencing to date has clearly favored domesticated plants and their wild relatives.

Due to cost and technical difficulty, scientists initially selected species with small and simple genomes for whole genome sequencing; for example, A. thaliana was selected as the first plant species to be sequenced (The Arabidopsis Genome Initiative 2000). Similarly, Gossypium raimondii, a diploid ancestor of cultivated tetraploid cotton, was the first cotton species to be sequenced (Wang et al. 2012; Peng et al. 2021). Both are diploid species with small genomes; sequencing efforts then moved on to polyploid species with larger and more complicated genomes (Peng et al. 2021). Currently, virtually any genome can be quickly sequenced and assembled (Marks et al. 2021). This is evidenced by the rapid expansion in the number of plant genomes sequenced and assembled over the past decade. More than 75% of plant genome assemblies were produced in the last 5 years (Marks et al. 2021), including those of complicated polyploid species such as cotton (Gossypium hirsutum) (Li et al. 2015; Zhang et al. 2015), oilseed rape (Brassica napus) (Bancroft et al. 2011), and wheat (Triticum aestivum) (Maccaferri et al. 2019; Budak et al. 2021). This rapid progress in genome sequencing and assembly is driven by fast-developing sequencing technologies and the associated computational tools for genome assembly. In the past two decades, next-generation deep sequencing (NGS) technologies have advanced dramatically (Van Dijk et al. 2018), particularly long-read platforms such as PacBio (www.pacb.com) and Nanopore (www.nanoporetech.com/), which provide more reliable platforms for sequencing more complicated genomes. The mean contig N50 increased dramatically from 99.5 ± 48.1 kb in 2010 to 3395.2 ± 735.4 kb in 2020 (Marks et al. 2021), largely owing to advanced NGS with long-read capacity. NGS not only decodes genomes but also directly detects different types of DNA base modification (Van Dijk et al. 2018), which provides a powerful tool for studying genome variation, modification, diversity, structure, evolution, and function at both the genetic and epigenetic levels. Whole genome sequencing also facilitates the study not only of coding sequences but also of the genomic dark matter (Kaur and Zhang 2022), that is, noncoding sequences such as microRNAs, which play significant roles in many biological processes in both plants and animals (Li and Zhang 2016; Zhang and Unver 2018; Gebert and Macrae 2019).
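For readers unfamiliar with the contig N50 statistic cited above: it is the contig length L such that contigs of length L or longer together contain at least half of all assembled bases, so a higher N50 indicates a more contiguous assembly. The following is a minimal sketch of that calculation, assuming the assembly is summarized as a plain list of contig lengths in base pairs; the function name n50 and the example lengths are hypothetical and are not taken from Marks et al. (2021).

```python
def n50(contig_lengths):
    """Return the N50 of an assembly given its contig lengths (in bp).

    N50 is the length of the contig at which the running total, taken
    from the longest contig downward, first reaches half of the total
    assembly size.
    """
    if not contig_lengths:
        raise ValueError("need at least one contig length")
    total = sum(contig_lengths)
    running = 0
    for length in sorted(contig_lengths, reverse=True):
        running += length
        # Compare 2 * running with total to avoid fractional halves.
        if 2 * running >= total:
            return length


# Hypothetical toy assembly of five contigs (illustrative values only).
toy_assembly = [100, 200, 300, 400, 500]
print(n50(toy_assembly))
```

In this toy case the total is 1500 bp; the two longest contigs (500 and 400 bp) together reach 900 bp, which passes the halfway mark of 750 bp, so the N50 is 400 bp.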

Plant genome sequencing is associated with economic development and with funding investment in scientific research (Marks et al. 2021). Among the 798 sequenced plant species, 77% were sequenced in China (235), the USA (212), and European nations (168). Only a few species were sequenced and assembled in other countries. Although Africa and South America have many native species, including many important ones, the genomes of these species were mostly sequenced and assembled off-continent, by the aforementioned China, the USA, or European nations (Marks et al. 2021). As discussed by the authors of the Nature Plants paper (Marks et al. 2021), more international collaboration is needed to sequence and assemble more plant genomes for the world’s sustainable development.

Although great progress has been made in plant genome sequencing and assembly, the field is still at an early stage and is quickly moving into its next stage of rapid development and application. In the coming decade, further progress will be made, including but not limited to the following areas:

Genome sequencing techniques and the associated computational tools for analyzing sequencing data and assembling genomes will be further improved. This will include techniques that can precisely read long sequences, such as the current PacBio and Nanopore platforms. Assembly accuracy would also improve significantly if we switched from sequencing the entire genome at once to sequencing individual chromosomes. Chromosome-by-chromosome sequencing has many advantages, including avoiding potential confusion caused by repetitive sequences and enabling more precise assembly of the entire genome. In the next couple of years, chromosome-level sequencing technology will enhance genome sequencing and its applications.

Current genome sequencing is mainly focused on agriculturally and economically important plant species and their wild relatives. As the costs of genome sequencing and assembly fall further, other environmentally important and endangered species will become new targets for whole genome sequencing. Rapid global development, particularly industrialization and its associated global issues, such as global warming and emerging environmental pollutants, is threatening our earth and harming biodiversity and sustainable development. How to rescue endangered species is becoming a big challenge for the entire community. At the very least, from the genome sequencing perspective, we can fully sequence these endangered plant species and store their genetic information, which future artificial intelligence (AI) technology will need in order to recover these species.

More research will focus on the temporal and spatial structures and functions of genomes, as well as on the comparison of multiple genomes within the same species and across different species (pan-genome sequencing). These studies will allow us to understand the fine structures of genomes, including their 3D structures and the relationship between structure and function, and to elucidate how changes in DNA structure, even fine changes such as DNA methylation, affect gene function and responses to different environmental changes (eco-genomics and environmental genomics).

Genome sequencing is opening a new window and a new field for the better use of plants for human and environmentally sustainable development. The decoded genetic information provides new targets for improving crop yield, quality, and responses to various biotic and abiotic environmental stresses, as well as for generating new secondary compounds, such as pharmaceutical drugs. In particular, with newly developed genome editing tools, such as clustered regularly interspaced short palindromic repeats (CRISPR)/CRISPR-associated protein (Cas) (Jinek et al. 2012; Cheng et al. 2013; Cong et al. 2013), it is now easier to manipulate individual genome sequences more precisely for various economic and health purposes. Based on genome sequence information and using various versatile CRISPR tools (Jogam et al. 2022; Li et al. 2022a), scientists have already precisely targeted specific genes for crop improvement (Li et al. 2021, 2022b; Zhang et al. 2021) and gene therapy (Zhang 2021).

With the rapid development of deep sequencing techniques and the associated computational tools, as well as the decreasing cost of these techniques, we believe that plant genome sequencing and assembly are stepping into their golden age. Genome sequencing will further facilitate genome biotechnology and its application in agriculture, the environment, pharmaceuticals, and industry, in support of the world’s food and health safety and security as well as sustainable development.