Chromosome-level genome assembly of Acrossocheilus fasciatus using PacBio sequencing and Hi-C technology

Zheng, Jianbo; Jiang, Jianhu; Rui, Qianlong; Li, Fei; Liu, Shili; Cheng, Shun; Chi, Meili; Jiang, Wenping

doi:10.1038/s41597-024-02999-6

Data Descriptor
Open access
Published: 03 February 2024

Volume 11, article number 166, (2024)

Scientific Data

Jianbo Zheng¹, Jianhu Jiang¹, Qianlong Rui¹,², Fei Li¹, Shili Liu¹ (ORCID: orcid.org/0000-0002-3929-649X), Shun Cheng¹, Meili Chi¹ & Wenping Jiang¹

Abstract

Acrossocheilus fasciatus (Cypriniformes, Cyprinidae) has emerged as a new commercial stream fish in southern China with high economic and ornamental value. In this study, a chromosome-level reference genome of A. fasciatus was assembled using PacBio, Illumina and Hi-C sequencing technologies. The resulting high-quality genome was 879.52 Mb in size (accession number: JAVLVS000000000), with a scaffold N50 of 32.7 Mb and a contig N50 of 32.7 Mb. The largest and smallest scaffolds were 60.57 Mb and 16 kb, respectively. BUSCO analysis showed a completeness score of 98.3%. The assembled sequences were anchored to 25 pseudo-chromosomes with an integration efficiency of 96.95%. Additionally, we identified approximately 390.91 Mb of repetitive sequences, accounting for 44.45% of the assembled genome, and predicted 24,900 protein-coding genes. The genome reported here provides a crucial resource for further investigation of the regulatory mechanisms underlying genetic diversity, sexual dimorphism and evolutionary history.

Background & Summary

The genus Acrossocheilus belongs to Barbinae, Cyprinidae, and comprises approximately 26 species, which are mainly native to Laos, Vietnam, and China¹. These species exhibit diverse morphological characteristics and ecological habits, making them a useful model for investigating the origin and geographical distribution of freshwater fish². In addition, their flesh is tender and delicious and is high in polyunsaturated fatty acids (PUFA), giving them considerable market value. Recently, the freshwater grouper A. fasciatus has become a commercially emerging aquaculture fish owing to its nutritive and ornamental value³. Moreover, as an omnivorous fish, A. fasciatus feeds on moss and other algae as it grows, which can curb the overgrowth of these aquatic plants and thus contributes to ecological balance. Previous studies of A. fasciatus have primarily focused on its embryonic and larval development, gonad histological characteristics, phylogenetic relationships, population structure, and artificial breeding⁴,⁵,⁶. A. fasciatus also shows a significant difference in growth rate and body size between the sexes, with females growing faster than males (Fig. 1a), indicating that all-female breeding is of high commercial value in aquaculture⁷. However, our knowledge of the genetic and evolutionary mechanisms of A. fasciatus has been limited by the lack of genetic resources and genomic information. In this study, we employed an integrated strategy of PacBio, Illumina and Hi-C sequencing technologies to assemble a high-quality genome of 879.52 Mb with a scaffold N50 of 32.7 Mb (Fig. 1b,c). We believe that these high-quality chromosome-level genome data will provide a valuable resource for breeding programs and evolutionary investigation.

Methods

Sample collection and nucleic acid extraction

Mature, healthy A. fasciatus were obtained from the Zhejiang Institute of Freshwater Fisheries in Huzhou, Zhejiang Province, China. Muscle tissue from adult female A. fasciatus was prepared for DNA extraction by the SDS lysis method, while ovary, kidney, brain, testis, skin, and gill tissues were collected for total RNA extraction using a TRIzol kit following the manufacturer’s protocol. The high-quality gDNA was used for genome sequencing, and total RNA isolated from all tissues was used for transcriptome sequencing.

Library construction and genome sequencing

For the Illumina platform, a paired-end library with an insert size of 350 bp was generated using the NEBNext® Ultra™ DNA Library Prep Kit (NEB, USA) following the manufacturer’s recommendations. In total, 41 Gb of Illumina short reads (paired-end 150 bp; 47.56X coverage, Table 1) were generated. In parallel, HiFi SMRTbell libraries with an insert size of 20 kb were prepared using the SMRTbell Express Template Prep Kit 2.0 for long-read sequencing on the PacBio platform. Briefly, gDNA was sheared to 6–20 kb fragments using a g-TUBE, and the ssDNA overhangs were removed with Exo VII. DNA damage was then repaired for blunt-end ligation, and large-insert SMRTbell libraries were constructed after size selection and sequenced using the DNA Sequencing Reagent Kit. For the PacBio platform, approximately 32 Gb of PacBio reads (37.12X coverage, Table 1) were obtained, with a longest read of 47.52 kb and a read N50 of 14.56 kb.
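
The fold-coverage figures above follow from dividing the number of sequenced bases by a genome size. The following Python sketch illustrates that arithmetic. The function name is illustrative, and using the 862.9 Mb k-mer estimate as the denominator is an assumption, since the text does not state which genome size was used. Because the yields below are the rounded values quoted in the text, the results only approximate the values in Table 1.

```python
def sequencing_depth(total_bases: float, genome_size: float) -> float:
    """Return mean fold coverage: sequenced bases divided by genome size."""
    if genome_size <= 0:
        raise ValueError("genome_size must be positive")
    return total_bases / genome_size


# Assumption: the k-mer based genome size estimate (862.9 Mb) is the denominator.
GENOME_SIZE_BP = 862.9e6

# Rounded yields quoted in the text, in bases.
yields = {"Illumina": 41e9, "PacBio": 32e9}

for platform, bases in yields.items():
    print(f"{platform}: {sequencing_depth(bases, GENOME_SIZE_BP):.2f}X")
```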

Table 1 Statistics of the sequencing data for the A. fasciatus genome assembly.

Genome size estimation and assembly

Clean data generated from Illumina sequencing were subjected to k-mer analysis to estimate the genome size, heterozygosity, and proportion of repetitive sequences in A. fasciatus. Based on the 17-mer frequency distribution computed with Jellyfish v2.3.0⁸ and GenomeScope v2.0⁹, the genome size was estimated to be 862.9 Mb, with a heterozygosity ratio of 0.56% and a repeat sequence ratio of 47.09% (Fig. 1d). The 32.66 Gb of raw subreads from the PacBio Sequel platform were filtered, and the remaining clean subreads were error-corrected with Canu (v1.5)¹⁰ and pre-assembled into contigs using FALCON¹¹. The assembled scaffolds were polished with Pilon (v1.22)¹² using default parameters. The final assembled genome was 879.52 Mb in size, with 134 contigs and a contig N50 of 32.70 Mb (Table 2).
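
The idea behind k-mer based genome size estimation can be illustrated with a deliberately naive Python sketch: count k-mers, build the depth histogram, and divide the total number of k-mer occurrences by the depth of the main peak. This is not what Jellyfish or GenomeScope do internally (GenomeScope fits a statistical model to the histogram, and Jellyfish is typically run on canonical k-mers); the function names and the fixed `min_depth` cutoff for discarding likely error k-mers are assumptions made for illustration.

```python
from collections import Counter


def count_kmers(reads, k):
    """Count every k-mer in the reads, skipping k-mers that contain N."""
    counts = Counter()
    for read in reads:
        for i in range(len(read) - k + 1):
            kmer = read[i:i + k]
            if "N" not in kmer:
                counts[kmer] += 1
    return counts


def kmer_histogram(counts):
    """Map each depth to the number of distinct k-mers seen at that depth."""
    return Counter(counts.values())


def estimate_genome_size(histogram, min_depth=2):
    """Naive estimate: total k-mer occurrences divided by the peak depth.

    Depths below min_depth are ignored as likely sequencing errors
    (an assumed, fixed cutoff; real tools choose it from the data).
    """
    usable = {d: n for d, n in histogram.items() if d >= min_depth}
    if not usable:
        raise ValueError("no k-mers at or above min_depth")
    peak_depth = max(usable, key=usable.get)
    total_occurrences = sum(d * n for d, n in usable.items())
    return total_occurrences / peak_depth
```

For real data, the histogram would come from a dedicated k-mer counter rather than from pure Python, which is far too slow and memory-hungry at genome scale.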

Table 2 Summary of the assembled A. fasciatus genome.

Hi-C library preparation and sequencing

The Hi-C libraries were constructed following a standard protocol described previously, with certain modifications. First, female muscle samples were cross-linked with 4% formaldehyde, and the fixed tissues were homogenised and centrifuged to collect the nuclei, which were then digested with the MboI enzyme overnight at 37 °C. The proximal chromatin DNA was re-ligated using T4 ligase, and biotin-labeled Hi-C samples were specifically enriched using magnetic beads. After A-tails were added to the fragment ends, the Hi-C sequencing libraries were amplified by PCR and sequenced on the Illumina HiSeq 2500 platform (PE 150 bp). For chromosome-level assembly, the raw Hi-C sequencing data were first filtered using HiC-Pro v2.8.0¹³, and the high-quality clean reads were aligned to the polished A. fasciatus genome using BWA (v0.7.10)¹⁴ with default parameters; the alignments were then sorted (samtools sort sample.sam --output-fmt BAM -o sample.sort.bam). Finally, 96.95% of the initially assembled sequences were anchored to 25 pseudo-chromosomes ranging in size from 24.09 to 54.14 Mb (Fig. 2a, Table S1), and the total length of the genome assembly was 879.52 Mb, with a contig N50 of 22.57 Mb and a scaffold N50 of 33.13 Mb (Table 2).
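
For readers unfamiliar with the contiguity statistics quoted above, the following Python sketch shows how N50 and an anchoring rate are conventionally computed from sequence lengths. The function names are illustrative, and the example lengths are made up rather than taken from the assembly.

```python
def n50(lengths):
    """Return the N50: the length L such that sequences of length >= L
    together cover at least half of the total assembly length."""
    if not lengths:
        raise ValueError("lengths must not be empty")
    ordered = sorted(lengths, reverse=True)
    half_total = sum(ordered) / 2
    running = 0
    for length in ordered:
        running += length
        if running >= half_total:
            return length


def anchoring_rate(anchored_bp, total_bp):
    """Percentage of assembled bases placed on pseudo-chromosomes."""
    if total_bp <= 0:
        raise ValueError("total_bp must be positive")
    return 100 * anchored_bp / total_bp


# Toy example with made-up scaffold lengths (not from the paper).
toy_lengths = [2, 2, 2, 3, 3, 4, 8, 8]
print("N50:", n50(toy_lengths))
print("Anchored:", anchoring_rate(50, 200), "%")
```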

Repetitive sequence annotation

Repeat elements in the A. fasciatus genome were annotated using a combination of homology alignment and de novo searches. The homology-based search was performed against the RepBase database (http://www.girinst.org/repbase/)¹⁵ using RepeatMasker and RepeatProteinMask to identify known repeat elements. For de novo annotation, we first employed LTR_FINDER¹⁶, RepeatModeler¹⁷ and RepeatScout¹⁸ to build a de novo repeat library, which was then used to predict repeat elements with RepeatMasker under default parameters. Additionally, tandem repeats were identified using Tandem Repeats Finder (TRF, http://tandem.bu.edu/trf/trf.html)¹⁹. In total, we identified 390.91 Mb of repetitive sequences, accounting for 44.45% of the assembled genome (Table 3).
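
When several repeat-finding tools are combined, their annotations overlap, so the total repeat content is usually the length of the merged intervals rather than the sum of the individual hits. The Python sketch below shows one way to do that merge for a single sequence. It is an illustration under assumed conventions (half-open `(start, end)` coordinates, hypothetical function names), not the authors' procedure. The last line only re-checks the percentage reported in the text from the two quoted totals.

```python
def merged_length(intervals):
    """Total bases covered by half-open (start, end) intervals on one
    sequence, counting overlapping or adjacent regions only once."""
    total = 0
    cur_start = cur_end = None
    for start, end in sorted(intervals):
        if cur_end is None or start > cur_end:
            # Gap before this interval: close the previous merged block.
            if cur_end is not None:
                total += cur_end - cur_start
            cur_start, cur_end = start, end
        else:
            # Overlapping or touching: extend the current block.
            cur_end = max(cur_end, end)
    if cur_end is not None:
        total += cur_end - cur_start
    return total


def repeat_percentage(repeat_bp, genome_bp):
    """Repeat content as a percentage of the genome length."""
    return 100 * repeat_bp / genome_bp


# Toy intervals (made up): two overlapping hits and one separate hit.
print(merged_length([(0, 10), (5, 15), (20, 30)]))

# Consistency check using the totals quoted in the text (Mb).
print(round(repeat_percentage(390.91, 879.52), 2))
```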

Table 3 Classification of the predicted repeat sequences in the genome of A. fasciatus.

Gene prediction and functional annotation

Protein-coding genes were annotated by integrating three strategies: homology-based, de novo, and transcriptome-based prediction. For homology-based gene prediction, the published protein sequences of Sinocyclocheilus grahami, Puntius tetrazona and Carassius auratus were aligned to the A. fasciatus genome assembly using BLAST²⁰ and GeneWise²¹ with default parameters. Five de novo programs, Augustus²², GlimmerHMM²³, SNAP²⁴, GeneID²⁵ and GENSCAN²⁶, were used to predict coding regions in the repeat-masked assembly with default parameters. For the transcriptome-based annotation, the RNA-seq data were de novo assembled with Trinity (v2.1.1)²⁷, and splicing variations were identified with PASApipeline (v2.4.1)²⁸. Finally, a non-redundant reference gene set was established by merging the results of the three methods, yielding a total of 24,900 protein-coding genes (Fig. 2b, Table 4). We also compared gene structure parameters between A. fasciatus and three related species (S. grahami, C. auratus, P. tetrazona); the results showed similar distributions of coding DNA sequence (CDS) length, exon length and number, intron length and mRNA length among the sequenced fish genomes (Fig. 2c).

Table 4 Statistical analysis of predicted protein-coding genes in the A. fasciatus genome.

Furthermore, all predicted genes were functionally annotated against the public databases Swiss-Prot²⁹, Nr (http://www.ncbi.nlm.nih.gov/protein), KEGG³⁰, InterPro³¹ and Pfam (http://pfam.xfam.org/). Overall, 24,000 genes (96.40%) were successfully annotated, with an average transcript length of 15,927.24 bp and an average CDS length of 1,627.71 bp (Table 5). Non-coding RNAs (ncRNAs) were also annotated: tRNAscan-SE (v2.0)³² was used to predict tRNAs, and Infernal (v1.1)³³ was used to identify rRNAs, snRNAs, and miRNAs. In total, 43,620 non-coding RNAs were predicted, including 17,604 tRNAs, 9,157 rRNAs, 2,606 miRNAs and 2,548 snRNAs (Table 6).
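
A gene is normally counted as functionally annotated if it has a hit in at least one database, so the overall figure is the size of the union of the per-database hit sets. The Python sketch below illustrates that bookkeeping with toy identifiers. The function name, the input layout, and the gene IDs are all hypothetical and are not taken from the annotation files.

```python
def annotation_summary(gene_ids, hits_by_database):
    """Summarise annotation coverage.

    gene_ids: iterable of all predicted gene IDs.
    hits_by_database: dict mapping a database name to the gene IDs
    that received at least one hit in that database.
    """
    genes = set(gene_ids)
    # Hits for IDs outside the predicted gene set are ignored.
    per_database = {
        db: len(genes & set(hits)) for db, hits in hits_by_database.items()
    }
    annotated = set()
    for hits in hits_by_database.values():
        annotated |= genes & set(hits)
    percent = 100 * len(annotated) / len(genes) if genes else 0.0
    return {
        "per_database": per_database,
        "annotated": len(annotated),
        "percent": percent,
    }


# Toy input with made-up gene IDs.
summary = annotation_summary(
    ["g1", "g2", "g3", "g4"],
    {"Swiss-Prot": ["g1", "g2"], "KEGG": ["g2", "g3", "gX"]},
)
print(summary)
```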

Table 5 Summary of functional annotation in the A. fasciatus genome.

Table 6 Statistics of annotated non-coding RNAs in the A. fasciatus genome assembly.

Gene family construction

First, the protein sequences of 13 other fish species (P. tetrazona, S. grahami, C. auratus, Opsariichthys bidens, Cyprinus carpio, Danio rerio, Ictalurus punctatus, Megalobrama amblycephala, Ctenopharyngodon idellus, Micropterus salmoides, Oreochromis niloticus, Cynoglossus semilaevis, and Larimichthys crocea) were downloaded from public databases. Sequences shorter than 50 amino acids were filtered out as low quality, and only the longest predicted transcript per locus was retained. Next, similarities between the protein sequences of all species were identified with an all-to-all BLAST search using an e-value threshold of 1e-5. Finally, orthologous genes were clustered using OrthoMCL³⁴. In summary, we identified 27,983 gene families in A. fasciatus and the 13 additional species, and 10,524 gene families and 604 single-copy gene families were found in all species (Fig. 3a). Moreover, gene families from A. fasciatus, O. bidens, S. grahami, D. rerio, C. carpio and C. auratus were further clustered; 13,850 gene families were shared by these fish species, and 262 gene families were specific to A. fasciatus (Fig. 3b). In addition, functional annotation of the gene families unique to A. fasciatus revealed enrichment of the phosphatidylinositol signaling system, GABAergic synapse, vitamin digestion and absorption, lysine degradation, and synaptic vesicle cycle pathways.
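
The protein filtering step described above (drop sequences shorter than 50 amino acids, keep the longest transcript per locus) can be sketched in Python as follows. The record layout `(gene_id, transcript_id, protein_sequence)` and the function name are assumptions made for illustration; real input would typically be parsed from FASTA and GFF files.

```python
def select_representative_proteins(records, min_length=50):
    """Keep the longest protein per gene, then drop short proteins.

    records: iterable of (gene_id, transcript_id, protein_sequence).
    Returns a dict mapping gene_id to (transcript_id, protein_sequence).
    """
    longest = {}
    for gene_id, transcript_id, sequence in records:
        sequence = sequence.rstrip("*")  # ignore a trailing stop symbol
        current = longest.get(gene_id)
        if current is None or len(sequence) > len(current[1]):
            longest[gene_id] = (transcript_id, sequence)
    # If the longest isoform is too short, every isoform of that gene is,
    # so filtering after selection gives the same result as filtering first.
    return {
        gene_id: entry
        for gene_id, entry in longest.items()
        if len(entry[1]) >= min_length
    }


# Toy records with made-up identifiers and sequences.
toy = [
    ("g1", "g1.t1", "M" * 60),
    ("g1", "g1.t2", "M" * 80),
    ("g2", "g2.t1", "M" * 30),
]
print(sorted(select_representative_proteins(toy)))
```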

Phylogenetic and evolutionary analysis

All single-copy gene families were subjected to multiple sequence alignment with MUSCLE³⁵ to generate a super-alignment matrix, and a phylogenetic tree was constructed using RAxML³⁶. Subsequently, the MCMCTree package in PAML³⁷ was used to estimate divergence times. As expected, A. fasciatus and P. tetrazona clustered into one clade, with an estimated divergence time of 156.3 million years ago (Fig. 3c). Furthermore, gene family expansions and contractions were analyzed using CAFE (v3.1)³⁸ with default parameters, based on the divergence times and phylogenetic relationships. A total of 38 and 135 gene families were significantly expanded and contracted in A. fasciatus, respectively (Fig. 3d). Finally, chromosome synteny between A. fasciatus and D. rerio was analyzed using MCScanX³⁹ and visualized with Circos. The synteny analysis showed that the chromosomes of A. fasciatus are highly homologous to those of D. rerio (Fig. 3e).
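
A super-alignment matrix is built by concatenating the per-family alignments species by species. The Python sketch below shows that concatenation step only, assuming each family has already been aligned (for example with MUSCLE) and is available as a dict from species name to aligned sequence. The function name and data layout are hypothetical; padding a missing species with gaps is a common convention that should not be needed for true single-copy families present in all species.

```python
def concatenate_alignments(alignments, species):
    """Concatenate per-family alignments into one sequence per species.

    alignments: list of dicts mapping species name to aligned sequence;
    all sequences within one dict must have the same length.
    species: ordered list of species names to include in the matrix.
    """
    parts = {name: [] for name in species}
    for alignment in alignments:
        widths = {len(seq) for seq in alignment.values()}
        if len(widths) != 1:
            raise ValueError("sequences in one alignment differ in length")
        width = widths.pop()
        for name in species:
            # A species absent from this family is padded with gaps.
            parts[name].append(alignment.get(name, "-" * width))
    return {name: "".join(chunks) for name, chunks in parts.items()}


# Toy alignments with made-up species labels and sequences.
families = [
    {"spA": "MK-", "spB": "MKL"},
    {"spA": "GG", "spB": "GA"},
]
print(concatenate_alignments(families, ["spA", "spB"]))
```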

Data Records

All sequencing data have been uploaded to the NCBI database under project PRJNA1012810. The genomic Illumina sequencing data were deposited in the Sequence Read Archive (SRA) at NCBI under accessions SRR25949940⁴⁰ and SRR25949941⁴¹. The genomic PacBio sequencing data were deposited in the SRA under accession SRR25933437⁴². The transcriptomic sequencing data were deposited in the SRA under accessions SRR25949840⁴³, SRR25949841⁴⁴, SRR25949842⁴⁵, SRR25949843⁴⁶, SRR25949844⁴⁷ and SRR25949845⁴⁸. The Hi-C sequencing data were deposited in the SRA under accessions SRR25947115⁴⁹, SRR25947116⁵⁰ and SRR25947117⁵¹. The final chromosome-level assembly was deposited in GenBank at NCBI under accession number JAVLVS000000000⁵². The genome annotation file is available in figshare⁵³, and the data for the gene family construction are also available in the figshare database⁵⁴.

Technical Validation

DNA quantification and qualification

DNA degradation and contamination were monitored on 1.5% agarose gels. DNA purity was checked using a NanoPhotometer® spectrophotometer (IMPLEN, CA, USA). DNA concentration was measured using the Qubit® DNA Assay Kit on a Qubit® 2.0 Fluorometer (Life Technologies, CA, USA).

Quality control of raw sequencing data

To ensure that reads were reliable and free of artificial bias in downstream analyses (low-quality paired reads mainly result from base-calling duplicates and adapter contamination), raw data were first processed through a series of quality control (QC) procedures using in-house C scripts. The QC standards were as follows: (1) reads with ≥10% unidentified nucleotides (N) were removed; (2) reads with >50% of bases having a Phred quality <5 were removed.
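
The two QC criteria above translate directly into a per-read filter. The in-house C scripts are not available, so the Python sketch below is only an illustration of the stated thresholds. The function names are hypothetical, Phred+33 quality encoding is assumed, and discarding a pair when either mate fails is an assumption about how paired reads were handled.

```python
def phred33(quality_string):
    """Decode a FASTQ quality string, assuming Phred+33 encoding."""
    return [ord(char) - 33 for char in quality_string]


def passes_qc(sequence, qualities, max_n_fraction=0.10,
              low_quality=5, max_low_fraction=0.50):
    """Return True if a read survives both QC criteria.

    A read is removed when >= 10% of its bases are N, or when more than
    50% of its bases have a Phred quality below 5.
    """
    if not sequence or len(sequence) != len(qualities):
        return False
    n_fraction = sequence.upper().count("N") / len(sequence)
    low_fraction = sum(q < low_quality for q in qualities) / len(qualities)
    return n_fraction < max_n_fraction and low_fraction <= max_low_fraction


def filter_pairs(pairs):
    """Keep read pairs in which both mates pass QC (assumed pairing rule).

    pairs: iterable of ((seq1, qual1), (seq2, qual2)) with quality strings.
    """
    kept = []
    for (seq1, qual1), (seq2, qual2) in pairs:
        if passes_qc(seq1, phred33(qual1)) and passes_qc(seq2, phred33(qual2)):
            kept.append(((seq1, qual1), (seq2, qual2)))
    return kept
```

A production filter would stream FASTQ records from disk instead of holding pairs in memory; this sketch keeps the logic visible rather than efficient.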

RNA quality evaluation

Before transcriptome sequencing, the quality of total RNA from the six tissues was validated. The concentration was measured with a Qubit Fluorometer, and the integrity was assessed using an Agilent 2100 Bioanalyzer. RNA samples with a total RNA amount ≥ 10 μg, RNA integrity ≥ 8, and an rRNA ratio ≥ 1.5 were used for library construction.

Evaluation of the assembled genome

The completeness and accuracy of the A. fasciatus genome assembly were evaluated by multiple methods. First, Benchmarking Universal Single-Copy Orthologs (BUSCO, v5.4.4)⁵⁵ and the Core Eukaryotic Genes Mapping Approach (CEGMA, v2.5)⁵⁶ were used to assess the completeness of the assembled genome. The BUSCO results showed that, of the 3,640 single-copy orthologs in actinopterygii_odb10, 98.3% were complete, 0.7% were fragmented, and 1.0% were missing. Moreover, CEGMA evaluation showed that 96.77% (240/248) of the core eukaryotic genes (CEGs) were recovered. In addition, Merqury (v1.3)⁵⁷ was run to evaluate the accuracy of the genome assembly, and the high quality value (QV) of 44.81 indicated that the assembly was of good quality. Taken together, these results suggest that the assembled A. fasciatus genome is of high quality at the chromosome level.
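
To put the QV in more intuitive terms, a quality value is a Phred-scaled error probability, QV = -10 log10(E). The short Python sketch below converts in both directions; the function names are illustrative. Under this definition, a QV of 44.81 corresponds to roughly 3.3 × 10⁻⁵ errors per base, or about one error every 30 kb.

```python
import math


def qv_to_error_rate(qv):
    """Convert a Phred-scaled quality value to a per-base error rate."""
    return 10 ** (-qv / 10)


def error_rate_to_qv(error_rate):
    """Convert a per-base error rate to a Phred-scaled quality value."""
    if not 0 < error_rate <= 1:
        raise ValueError("error_rate must be in (0, 1]")
    return -10 * math.log10(error_rate)


reported_qv = 44.81  # Merqury QV quoted in the text
rate = qv_to_error_rate(reported_qv)
print(f"error rate per base: {rate:.2e}")
print(f"about one error every {1 / rate:,.0f} bases")
```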

Code availability

No special code or scripts were used in this work; data processing was carried out following the protocols and manuals of the corresponding bioinformatics software.

References

1. Zhao, D. et al. Natural selection drives the evolution of mitogenomes in Acrossocheilus. PLoS One 17(10), e0276056 (2022).
2. Han, C. et al. Characteristics and phylogenetic studies of Acrossocheilus parallens (Cypriniformes, Barbinae) complete mitochondrial genome. Mitochondrial DNA A DNA Mapp. Seq. Anal. 27(6), 4708–4709 (2016).
3. Fu, S. et al. A histological study of testis development and ultrastructural features of spermatogenesis in cultured Acrossocheilus fasciatus. Tissue Cell. 48(1), 49–62 (2016).
4. Zhang, L. & Yang, J. Genetic diversity and population demography of the endemic species Acrossocheilus longipinnis (Teleostei, Cyprinidae) based on mtDNA COI and cyt b gene sequences. Mitochondrial DNA A DNA Mapp. Seq. Anal. 29(3), 403–408 (2018).
5. Zhang, L. & Yang, J. Genetic diversity and population structure of Acrossocheilus yunnanensis (Teleostei, Cyprinidae) inferred from four mitochondrial gene sequences. Mitochondrial DNA A DNA Mapp. Seq. Anal. 29(4), 606–614 (2018).
6. Wei, W. et al. Integrated mRNA and miRNA expression profile analysis of female and male gonads in Acrossocheilus fasciatus. Biology. 11(9), 1296–1300 (2022).
7. Yan, Y. et al. Age and growth of Acrossocheilus fasciatus (Barbinae, Cyprinidae) from the Puxi Stream in the Huangshan Mountain, China. J. Freshwater Ecol. 25(1), 79–83 (2011).
8. Marçais, G. & Kingsford, C. A fast, lock-free approach for efficient parallel counting of occurrences of k-mers. Bioinformatics. 27(6), 764–770 (2011).
9. Ranallo-Benavidez, T. et al. GenomeScope 2.0 and Smudgeplot for reference-free profiling of polyploid genomes. Nat. Commun. 11(1), 1432 (2020).
10. Koren, S. et al. Canu: scalable and accurate long-read assembly via adaptive k-mer weighting and repeat separation. Genome Res. 27(5), 722–736 (2017).
11. Chen-Shan, C. et al. Phased diploid genome assembly with single-molecule real-time sequencing. Nat. Methods 13(12), 1050–1054 (2016).
12. Bruce, J. et al. Pilon: an integrated tool for comprehensive microbial variant detection and genome assembly improvement. PLoS One. 9(11), e112963 (2017).
13. Burton, J. et al. Chromosome-scale scaffolding of de novo genome assemblies based on chromatin interactions. Nat. Biotechnol. 31(12), 1119–1125 (2013).
14. Langmead, B. & Salzberg, S. Fast gapped-read alignment with Bowtie 2. Nat. Methods 9(4), 357–359 (2012).
15. Jurka, J. et al. Repbase Update, a database of eukaryotic repetitive elements. Cytogenetic Genome Res. 110(1-4), 462–467 (2005).
16. Zhao, X. & Wang, H. LTR_FINDER: an efficient tool for the prediction of full-length LTR retrotransposons. Nucleic Acids Res. 35, 265–268 (2007).
17. Abrusán, G. et al. TEclass–a tool for automated classification of unknown eukaryotic transposable elements. Bioinformatics. 25(10), 1329–1330 (2009).
18. Price, A. et al. De novo identification of repeat families in large genomes. Bioinformatics. 21(1), i351–i358 (2005).
19. Benson, G. Tandem repeats finder: a program to analyze DNA sequences. Nucleic Acids Res. 27(2), 573–580 (1999).
20. Mount, D. W. Using the Basic Local Alignment Search Tool (BLAST). CSH Protoc. 2007, pdb.top17 (2007).
21. Birney, E. et al. GeneWise and Genomewise. Genome Res. 14(5), 988–995 (2004).
22. Stanke, M. et al. AUGUSTUS: ab initio prediction of alternative transcripts. Nucleic Acids Res. 34, W435–439 (2006).
23. Majoros, W. et al. TigrScan and GlimmerHMM: two open source ab initio eukaryotic gene-finders. Bioinformatics. 20(16), 2878–2879 (2004).
24. Stanke, M. et al. Gene prediction in eukaryotes with a generalized hidden Markov model that uses hints from external sources. BMC Bioinformatics. 7, 62 (2006).
25. Korf, I. Gene finding in novel genomes. BMC Bioinformatics. 5, 59 (2004).
26. Burge, C. & Karlin, S. Prediction of complete gene structures in human genomic DNA. J. Mol. Biol. 268(1), 78–94 (1997).
27. Grabherr, M. et al. Full-length transcriptome assembly from RNA-Seq data without a reference genome. Nat. Biotechnol. 29(7), 644–652 (2011).
28. Roberts, A. et al. Identification of novel transcripts in annotated genomes using RNA-Seq. Bioinformatics. 27(17), 2325–2329 (2011).
29. UniProt, C. UniProt: the universal protein knowledgebase. Nucleic Acids Res. 46(5), 2699 (2018).
30. Kanehisa, M. et al. Data, information, knowledge and principle: back to metabolism in KEGG. Nucleic Acids Res. 42, D199–D205 (2014).
31. Finn, R. et al. InterPro in 2017-beyond protein family and domain annotations. Nucleic Acids Res. 45(D1), D190–D199 (2017).
32. Chan, P. et al. tRNAscan-SE 2.0: improved detection and functional classification of transfer RNA genes. Nucleic Acids Res. 49(16), 9077–9096 (2021).
33. Nawrocki, E. & Eddy, S. Infernal 1.1: 100-fold faster RNA homology searches. Bioinformatics. 29(22), 2993–2995 (2013).
34. Li, L., Stoeckert, C. & Roos, D. OrthoMCL: identification of ortholog groups for eukaryotic genomes. Genome Res. 13(9), 2178–2189 (2003).
35. Edgar, R. MUSCLE: multiple sequence alignment with high accuracy and high throughput. Nucleic Acids Res. 32(5), 1792–1797 (2004).
36. Stamatakis, A. RAxML version 8: a tool for phylogenetic analysis and post-analysis of large phylogenies. Bioinformatics. 30(9), 1312–1313 (2014).
37. Yang, Z. & Rannala, B. Bayesian estimation of species divergence times under a molecular clock using multiple fossil calibrations with soft bounds. Mol. Biol. Evol. 23(1), 212–226 (2006).
38. De, B. et al. CAFE: a computational tool for the study of gene family evolution. Bioinformatics. 22(10), 1269–1271 (2006).
39. Wang, Y. et al. MCScanX: a toolkit for detection and evolutionary analysis of gene synteny and collinearity. Nucleic Acids Res. 40(7), e99 (2012).
40. NCBI Sequence Read Archive https://identifiers.org/ncbi/insdc.sra:SRR25949940 (2023).
41. NCBI Sequence Read Archive https://identifiers.org/ncbi/insdc.sra:SRR25949941 (2023).
42. NCBI Sequence Read Archive https://identifiers.org/ncbi/insdc.sra:SRR25933437 (2023).
43. NCBI Sequence Read Archive https://identifiers.org/ncbi/insdc.sra:SRR25949840 (2023).
44. NCBI Sequence Read Archive https://identifiers.org/ncbi/insdc.sra:SRR25949841 (2023).
45. NCBI Sequence Read Archive https://identifiers.org/ncbi/insdc.sra:SRR25949842 (2023).
46. NCBI Sequence Read Archive https://identifiers.org/ncbi/insdc.sra:SRR25949843 (2023).
47. NCBI Sequence Read Archive https://identifiers.org/ncbi/insdc.sra:SRR25949844 (2023).
48. NCBI Sequence Read Archive https://identifiers.org/ncbi/insdc.sra:SRR25949845 (2023).
49. NCBI Sequence Read Archive https://identifiers.org/ncbi/insdc.sra:SRR25947115 (2023).
50. NCBI Sequence Read Archive https://identifiers.org/ncbi/insdc.sra:SRR25947116 (2023).
51. NCBI Sequence Read Archive https://identifiers.org/ncbi/insdc.sra:SRR25947117 (2023).
52. Zheng, J. Chromosome-level genome assembly of Acrossocheilus fasciatus. GenBank https://identifiers.org/ncbi/insdc:JAVLVS000000000 (2023).
53. Zheng, J. Genome annotation of Function annotation ressult of Acrossocheilus fasciatus genome genes. figshare. https://doi.org/10.6084/m9.figshare.24198819.v1 (2023).
54. Zheng, J. Gene family construction. figshare. https://doi.org/10.6084/m9.figshare.24198885.v1 (2023).
55. Simão, F. et al. BUSCO: assessing genome assembly and annotation completeness with single-copy orthologs. Bioinformatics. 31(19), 3210–3212 (2015).
56. Parra, G. et al. CEGMA: a pipeline to accurately annotate core genes in eukaryotic genomes. Bioinformatics. 23(9), 1061–1067 (2007).
57. Rhie, A. et al. Merqury: reference-free quality, completeness, and phasing assessment for genome assemblies. Genome Biol. 21(1), 245 (2020).

Acknowledgements

This work was funded by the Key R&D Program of Zhejiang Province (2023C02050) and the “Agriculture, Farmer and Rural area” cooperation project of Zhejiang Province (2023SNJF070).

Author information

Authors and Affiliations

Key Laboratory of Genetics and Breeding, Zhejiang Institute of Freshwater Fisheries, Huzhou, China
Jianbo Zheng, Jianhu Jiang, Qianlong Rui, Fei Li, Shili Liu, Shun Cheng, Meili Chi & Wenping Jiang
College of Biological and Environmental Sciences, Zhejiang Wanli University, Ningbo, China
Qianlong Rui

Contributions

J.Z. and F.L. conceived this study and wrote the manuscript; J.J. and Q.R. collected the samples; S.L. and S.C. analyzed the data; M.C. and W.J. revised the manuscript. All authors read and approved the final manuscript.

Corresponding author

Correspondence to Fei Li.

Ethics declarations

Competing interests

The authors declare no competing interests.

Additional information

Publisher’s note Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Supplementary information

Distribution of the assembled chromosomes of A. fasciatus.

Rights and permissions

Open Access This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made. The images or other third party material in this article are included in the article’s Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article’s Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by/4.0/.

About this article

Cite this article

Zheng, J., Jiang, J., Rui, Q. et al. Chromosome-level genome assembly of Acrossocheilus fasciatus using PacBio sequencing and Hi-C technology. Sci Data 11, 166 (2024). https://doi.org/10.1038/s41597-024-02999-6

Received: 27 September 2023
Accepted: 25 January 2024
Published: 03 February 2024
DOI: https://doi.org/10.1038/s41597-024-02999-6
Springer Nature Limited
