Long-read de novo genome assembly of Gulf toadfish (Opsanus beta)

Kron, Nicholas S.; Young, Benjamin D.; Drown, Melissa K.; McDonald, M. Danielle

doi:10.1186/s12864-024-10747-8

Research
Open access
Published: 18 September 2024

BMC Genomics, volume 25, article number 871 (2024)

Nicholas S. Kron¹, Benjamin D. Young¹,², Melissa K. Drown¹ & M. Danielle McDonald¹

Abstract

Background

The family Batrachoididae is a group of ecologically important teleost fishes whose unique life histories, behavior, and physiology have made them popular model organisms. Batrachoididae remain understudied in genomics: only four reference genome assemblies are available for the family, three of which are highly fragmented and do not meet current assembly standards. Among these is the Gulf toadfish, Opsanus beta, a model organism for serotonin physiology that has recently been bred in captivity.

Results

Here we present new de novo genome and transcriptome assemblies for the Gulf toadfish generated with PacBio long read technology. The final assembly is 2.1 gigabases in size, which places it among the largest teleost genomes. This new assembly improves significantly upon the currently available reference for Opsanus beta, with a final scaffold count of 62, of which 23 are chromosome scale, an N50 of 98,402,768 bp, and a BUSCO completeness score of 97.3%. Annotation with ab initio and transcriptome-based methods generated 41,076 gene models. The genome is highly repetitive, with ~70% of the genome composed of simple repeats and transposable elements. Satellite DNA analysis identified potential telomeric and centromeric regions.

Conclusions

This improved assembly represents a valuable resource for future research using this important model organism and for teleost genomics more broadly.

Background

Toadfishes are bony fish of the family Batrachoididae, which consists of 78 species including the Amazon toadfish (Thalassophryne amazonica), speckled midshipman (Porichthys myriaster), plainfin midshipman (Porichthys notatus), oyster toadfish (Opsanus tau), and Lusitanian toadfish (Halobatrachus didactylus), among others [1]. In general, toadfish are small, demersal ambush predators with wide mouths (often surrounded by barbels or fleshy projections) and eyes set on top of their broad heads [1]. The family exhibits a suite of distinguishing behavioral and physiological characteristics such as paternal care of the nest, complex acoustic communication, the lack of a pelagic larval phase, and, in some lineages, venom and spines [1]. Furthermore, they tolerate a range of environmental conditions that occur naturally or as a result of anthropogenic impact [2, 3]. These traits make toadfishes interesting on an ecological basis and have made them popular study subjects, both as comparison species to other fish and as potential models for human health and disease.

The genomics of toadfishes are relatively understudied. The bulk of genomic research in batrachoids has focused on karyotype and cytogenetics [4,5,6,7,8]. In particular, the sequencing, localization, and phylogeny of repetitive elements such as rDNA and GATA repeats have been well described in several genera [9,10,11,12]. For the species with available genomic resources, little has been reported in the literature on the unique characteristics of this order. T. amazonica has been noted to be unique among surveyed teleosts for having a large genome with a positive association between chromosome size and GC percentage [13]. Interestingly, two batrachoids, Chatrabus melanurus and Opsanus beta, were described as having the highest genomic percentage of transposable elements among 100 surveyed teleost genomes [14]. These few data points suggest that batrachoids exhibit genomic characteristics unique among teleosts that warrant further study.

The Gulf toadfish, Opsanus beta (Fig. 1), is found inshore within the western Atlantic, from southeastern Florida, USA, through the Bahamas and the Gulf of Mexico. Like other batrachoids, O. beta are resilient to various environmental stressors including hypoxia [15,16,17], ammonia [18,19,20], and various types of waterborne pollution [21,22,23,24,25], making them an intriguing subject for study. Recent work on O. beta has focused on describing the monoaminergic system, with a particular emphasis on serotonin and the role it plays in controlling vascular resistance and blood flow [26, 27]. Multiple labs have successfully bred O. beta in a laboratory setting [28]. This provides the opportunity for siblings to be used in physiological studies, reducing inter-individual variation, and for families to be used for examination of trait heritability, ontogenetic adaptations, and trans-generational effects, expanding their potential as model organisms.

Despite their well-described biology and role as model organisms, few genomic resources are available for O. beta or other batrachoid fishes. To date, only a single representative genome assembly generated with modern long read technology exists for the family Batrachoididae: that of T. amazonica (GCF_902500255.1). Older genome assemblies generated with short read technology exist for C. melanurus and O. beta, but these assemblies are highly fragmentary and do not meet modern assembly standards [29]. To fill this gap in available resources, we present de novo nuclear genome, mitochondrial genome, and transcriptome assemblies and annotations for O. beta generated with modern long read technology.

Methods

A schematic of the computational workflow can be found in Supplemental Fig. 2.

Sample collection, nucleic acid extractions, and sequencing

One adult male Opsanus beta (0.068 kg) was selected from the toadfish stock at the University of Miami Rosenstiel School Toadfish Lab. Toadfish are sourced from shrimper roller trawl bycatch in Biscayne Bay, Florida. For a full description of fish housing and care please see [17]. The specimen was sacrificed with an overdose of pharmaceutical grade buffered tricaine methanesulfonate (MS-222) anesthetic at a dose of 3 g·L⁻¹ (pH = 8.0), as is considered acceptable by the American Veterinary Medical Association Guidelines on Euthanasia [30].

Five hundred microliters of blood was drawn from the caudal vessel via caudal puncture using a 23-gauge needle attached to a 1 ml disposable syringe that was primed with 500 µl Acetate-Citrate-Dextrose (ACD) anticoagulant buffer (480 mg citric acid, 1.32 mg sodium citrate, 1.47 mg glucose (dextrose), QS to 100 ml with distilled water) and added to another 500 µl of ACD. The blood sample was then shipped overnight at 4 °C to the University of California Davis Genome Center DNA Technologies and Expression Analysis Core Laboratory (UC Davis) for High Molecular Weight (HMW) DNA extraction, PacBio library prep, and HiFi long read sequencing.

At UC Davis, 10 µl of settled cells were lysed until homogeneous at room temperature in 2 ml of lysis buffer (100 mM NaCl, 10 mM Tris–HCl pH 8.0, 25 mM EDTA, 0.5% (w/v) SDS, 100 µg/ml Proteinase K). RNA was removed by treating the lysate with 20 µg/ml RNase A for 30 min at 37 °C. HMW DNA was then extracted using equal volumes of phenol/chloroform and phase lock gels (Quantabio Cat. #2302830). Extracted DNA was precipitated with 0.4X volume of 5 M ammonium acetate and 3X volume of ice cold ethanol, washed twice with 70% ethanol, and finally resuspended in 10 mM Tris, pH 8.0. Purity, yield, and integrity of HMW DNA were assessed with a NanoDrop ND-1000 spectrophotometer, Qubit 2.0 Fluorometer (Thermo Fisher Scientific, MA), and Femto Pulse system (Agilent Technologies, CA).

The HiFi SMRTbell library was prepared and sequenced at the UC Davis DNA Technologies Core following standard recommendations from Pacific Biosciences. Briefly, the library was prepared using the SMRTbell prep kit 3.0 (Pacific Biosciences, Menlo Park, CA; Cat. #102-182-700) according to the manufacturer's instructions using sheared (15–18 kb) high molecular weight gDNA. The library was size-selected to remove sequences < 5 kb, with the final library having an average size of 15–18 kb. Sequencing used three 8M SMRT cells (Pacific Biosciences, Menlo Park, CA; Cat. #101-389-001) with Sequel II sequencing chemistry 2.0 and 30-h movies on a PacBio Sequel II sequencer.

To obtain as complete a transcriptomic snapshot of O. beta as possible, the following tissue samples were stored in RNAlater (ThermoFisher Scientific, Cat. #AM7020) at -80 °C: liver, kidney, brain, heart, gill, esophagus, swimbladder, muscle, skin, gonad, gastrointestinal tract, gallbladder, urinary bladder, and spleen. A whole juvenile O. beta was also stored in RNAlater. Total RNA was extracted using the Quick-RNA Miniprep Kit (Zymo Research, Cat. #R1054) following the manufacturer's protocol, before being cleaned and concentrated using the RNA Clean & Concentrator-5 Kit (Zymo Research, Cat. #R1013). Total RNA samples with a 260/230 absorbance ratio ≥ 1.89 and a concentration > 45 ng/μl (RNA HS Qubit) were considered of sufficient quality, pooled, and sent to UC Davis for IsoSeq library preparation and sequencing.

At UC Davis, cDNA was constructed using a NEBNext® Single Cell/Low Input kit (New England Biolabs, Ipswich, MA; Cat. #E6421L) with 500 ng of total RNA as input. The resulting cDNA was amplified for 15 cycles using the cDNA Synthesis Amplification Module. Amplified cDNA was purified using 0.86X SMRTbell cleanup beads. A SMRTbell library was constructed from 480 ng of purified cDNA with the SMRTbell prep kit 3.0 (Pacific Biosciences, Menlo Park, CA; Cat. #102-182-700). The resulting Iso-Seq library was sequenced on a single 8M SMRT cell (Pacific Biosciences, Menlo Park, CA; Cat. #101-389-001) on a PacBio Sequel II sequencer using Sequel II 2.0 chemistry.

Nuclear genome assembly

PacBio HiFi reads were assembled using methods based upon the Vertebrate Genomes Project pipeline [29]. A kmer profile of HiFi reads was generated using meryl v1.3.0 [31] and then fed into GenomeScope v2.0.0 [32] to estimate genome parameters, including genome size and the bounds for detecting haploid and diploid kmers. HiFi reads were then assembled into primary and alternate assemblies using HiFiasm v0.16.1 [33] with the -l parameter set to 1 for gentle purging and the --purge-max parameter set to the upper bound calculated from GenomeScope2 [32] estimates. The primary and alternate assemblies were then purged of duplicate kmers using purge_dups v1.2.6 [34] with the purging parameter -a set to 80. The primary purged assembly was then corrected with inspector v1.0.1 [35] for three rounds using HiFi reads as input. In the absence of 10x linked reads, BioNano optical maps, or Hi-C contact maps, the purged primary assembly was scaffolded using in silico methods. First, the primary purged assembly was scaffolded and gap filled for 5 rounds with ntLink v1.3.9 [36], using HiFi reads as input to generate synthetic linked reads. The scaffolded assembly was then super-scaffolded to a pseudo-chromosome level with RagTag scaffold v2.1.0 [37] using the RefSeq representative genome of Thalassophryne amazonica (GCF_902500255.1), the closest available relative within the family Batrachoididae with a chromosome level assembly. The divergence between T. amazonica and O. beta is estimated to have occurred 38 million years ago (CI 32.8–39.8 MYA) [38]. RagTag scaffold arranges contigs according to their primary mapping to reference chromosomes without altering contig sequence, and then stitches contigs together with gaps of arbitrary length (100 Ns) to represent gaps of unknown length.
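The kmer profile that meryl passes to GenomeScope is, in essence, a histogram of how many distinct kmers occur at each multiplicity. The following is a minimal Python sketch of that idea, not the meryl implementation: the function names are hypothetical, and k = 21 is an assumed default (the kmer size used in this study is not stated above).

```python
from collections import Counter

_COMPLEMENT = str.maketrans("ACGT", "TGCA")


def canonical(kmer: str) -> str:
    """Return the lexicographically smaller of a kmer and its reverse complement."""
    reverse_complement = kmer.translate(_COMPLEMENT)[::-1]
    return min(kmer, reverse_complement)


def kmer_spectrum(reads, k=21):
    """Histogram of canonical kmer multiplicities.

    Returns a Counter mapping multiplicity -> number of distinct kmers.
    k=21 is an assumption for illustration only.
    """
    counts = Counter()
    for read in reads:
        read = read.upper()
        for i in range(len(read) - k + 1):
            kmer = read[i:i + k]
            # Skip kmers containing N or any other ambiguous symbol.
            if set(kmer) <= set("ACGT"):
                counts[canonical(kmer)] += 1
    return Counter(counts.values())
```

In a diploid genome the resulting histogram typically shows a heterozygous peak at roughly half the coverage of the homozygous peak, which is the kind of signal a genome profiling tool fits to estimate size and heterozygosity.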

Primary and alternate assemblies were assessed for kmer completeness, QV, and haplotype purging using Merqury v1.3.0 [31]. Genome length and contiguity metrics were calculated using QUAST [39, 40], Genometools [41], and Gfastats [42]. Genome completeness was measured by calculating the number of Actinopterygii single copy orthologs retained in the assembly via BUSCO [43] with the actinopterygii_odb10 database. Completeness was further assessed with read mapping rate by mapping publicly available paired-end Illumina short read RNA [44] and DNA (PRJNA196921) libraries, as well as input HiFi reads to the super-scaffolded assembly.
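For readers unfamiliar with the contiguity metrics reported by tools such as QUAST, N50 is the length of the shortest sequence in the smallest set of longest sequences that together cover at least half of the assembly, and L50 is the number of sequences in that set. A minimal sketch under that standard definition (the function name is hypothetical and this is not QUAST's code):

```python
def n50_l50(lengths):
    """Return (N50, L50) for an iterable of contig or scaffold lengths."""
    ordered = sorted(lengths, reverse=True)
    half_total = sum(ordered) / 2
    running = 0
    for rank, length in enumerate(ordered, start=1):
        running += length
        if running >= half_total:
            # `length` is the N50; `rank` (1-based) is the L50.
            return length, rank
    raise ValueError("no sequence lengths supplied")
```

For example, lengths of 10, 8, 6, 4, and 2 sum to 30; the two longest sequences cover 18, which exceeds half of the total, so N50 is 8 and L50 is 2.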

The primary assembly was screened for microbial contamination via BLAST+ v2.13.0 [45] with megablast against an NCBI database of common contaminants in eukaryotic genomes (ftp://ftp.ncbi.nlm.nih.gov/pub/kitts/contam_in_euks.fa.gz), parameterized as described in [29]. The primary assembly was also searched with blastn against publicly available databases of representative genome sets for prokaryotes and viruses downloaded with the update_blastdb.pl script (ref_prok_rep_genomes and ref_viruses_rep_genomes). Blastn was parameterized to only report hits with an E-value less than 10E⁻²⁰ and a minimum bit score of 1000, as described in [46]. Finally, the super-scaffolded assembly was screened for off-target contaminants using Kraken2 v2.1.3 [47] and for adaptor contamination with the GenBank Foreign Contamination Screen (FCS) tool [48].
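The E-value and bit score thresholds described above amount to a simple filter over BLAST hits. The sketch below assumes hits were written in the default 12-column tabular format (-outfmt 6), where the E-value is column 11 and the bit score is column 12. The function name is hypothetical, and the default E-value cutoff of 1e-20 is one reading of the threshold quoted in the text, so treat it as an assumption and adjust as needed.

```python
def filter_hits(lines, max_evalue=1e-20, min_bitscore=1000.0):
    """Keep tabular BLAST hits passing both thresholds.

    Assumes default -outfmt 6 column order:
    qseqid sseqid pident length mismatch gapopen
    qstart qend sstart send evalue bitscore
    """
    kept = []
    for line in lines:
        if not line.strip() or line.startswith("#"):
            continue  # skip blank lines and comment lines
        fields = line.rstrip("\n").split("\t")
        evalue, bitscore = float(fields[10]), float(fields[11])
        if evalue < max_evalue and bitscore >= min_bitscore:
            kept.append((fields[0], fields[1], evalue, bitscore))
    return kept
```

Requiring both conditions discards short, locally similar matches (low bit score) as well as weakly significant ones (high E-value), which is the usual intent when screening an assembly for contaminant-derived sequence.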

Mitogenome assembly

The mitochondrial genome was assembled from all HiFi reads using MitoHiFi v2.14.2 [49, 50]. In addition to the annotations of the mitogenome generated by MitoHiFi [49, 50], the primary mitogenome assembly was also annotated with the Mitos2 webtool [51]. All HiFi reads were then mapped back to the mitogenome assembly using minimap2 v2.25 [52] to assess depth of coverage. The mitogenome assembly and annotations were visualized with the Proksee webtool (https://proksee.ca/) [53]. MitoHiFi [49, 50] was run two additional times in contig mode using the initial nuclear genome assembly to compare results and validate the assembly built from raw reads.

Transcriptome assembly

IsoSeq high quality (HQ) transcripts were generated from HiFi reads at UC Davis with the IsoSeq3 pipeline. HQ transcripts were cleaned with seqClean (https://sourceforge.net/projects/seqclean/files/) using the UniVec vector database as reference (https://www.ncbi.nlm.nih.gov/tools/vecscreen/univec/). Cleaned HQ transcripts were then aligned to the super-scaffolded assembly using minimap2 v2.25 [52] and BLAT v35.1 [54] and then assembled into gene models using PASA v2.5.2 [55].

Genome Annotation

Transposable elements and dispersed repeats

A custom library of transposable elements (TE) and dispersed repeats was generated from the super-scaffolded assembly using RepeatModeler v2.0.3 [56] with RMBlast (https://www.repeatmasker.org/rmblast/) as the default search engine. To eliminate false positive repetitive elements that in fact originate from coding regions, the SwissProt database [57] was screened for transposable elements with blastp v2.13.0 [45] against a collection of repetitive elements (RepeatPeps.lib) included with RepeatModeler [56]. The resultant version of the SwissProt database was subsequently used to screen the de novo repetitive elements identified in the O. beta assembly for false positives. High confidence de novo repetitive elements from the O. beta assembly were further processed and categorized using repclassifier v1.1 (https://github.com/darencard/GenomeAnnotation/blob/master/repclassifier) and RepeatMasker v4.1.2.p1 [58] as described in (https://darencard.net/blog/2022-07-09-genome-repeat-annotation/) using Actinopterygii repetitive elements from Dfam [59]. Classified repeats were used to generate hard and soft masked versions of the super-scaffolded assembly with BEDtools v4.1.2 [60]. Hard and soft masked versions of the T. amazonica reference were also generated with RepeatMasker [58], using the T. amazonica repeat database from FishTEDB [61], for subsequent repeat landscape analysis. Repeat landscapes for both species were generated using the createRepeatLandscape.pl utility from RepeatMasker [58].
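The difference between hard and soft masking can be illustrated in a few lines of Python. This is a conceptual sketch only, not how BEDtools is implemented; the function name is hypothetical, and intervals are assumed to be 0-based and half-open, as in the BED format.

```python
def mask_sequence(seq, intervals, hard=False):
    """Mask repeat intervals in a sequence.

    intervals: iterable of (start, end) pairs, 0-based, half-open.
    Soft masking lowercases the bases; hard masking replaces them with N.
    """
    chars = list(seq)
    for start, end in intervals:
        # Clamp to sequence bounds so out-of-range intervals are harmless.
        start, end = max(start, 0), min(end, len(chars))
        for i in range(start, end):
            chars[i] = "N" if hard else chars[i].lower()
    return "".join(chars)
```

Soft masking preserves the underlying sequence, so downstream tools can still align through repeats, whereas hard masking removes that information entirely. This is one reason gene predictors are commonly run on soft masked assemblies.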

Satellite DNA

Satellite DNA sequences were inferred from the super-scaffolded assembly using TRASH [62]. The centromere-like region of scaffold 1 (position 82,000,000–83,000,000) was analyzed for higher order repeats (HORs) with HiCAT v1.1.0 [63] and visualized using StainedGlass v0.5.0 [64].

Telomeres

HiFi reads, initial primary assembly, purged primary assembly, and final super-scaffolded assembly were screened for candidate telomeric repeat monomers with the telomere identification toolkit (tidk) v0.2.31 [65]. Top repeat sequences and canonical telomere marker monomers (TTAGGG and GATA) were then used as input for tidk::search to count putative telomeric repeats at scaffold ends in the super-scaffolded assembly.
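Counting putative telomeric repeats at scaffold ends reduces to counting a monomer and its reverse complement within a window at each end of a sequence. The sketch below illustrates that idea only; it is not the tidk implementation, the function names are hypothetical, and the 10,000-base default window is an arbitrary assumption.

```python
_COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq):
    """Reverse complement of an uppercase DNA string."""
    return seq.translate(_COMPLEMENT)[::-1]


def count_repeat(seq, monomer):
    """Count non-overlapping occurrences of a monomer on either strand."""
    seq = seq.upper()
    rc = reverse_complement(monomer)
    total = seq.count(monomer)
    if rc != monomer:  # avoid double counting palindromic monomers
        total += seq.count(rc)
    return total


def end_counts(seq, monomer, window=10_000):
    """Return (count in first `window` bases, count in last `window` bases)."""
    if window <= 0:
        raise ValueError("window must be positive")
    return count_repeat(seq[:window], monomer), count_repeat(seq[-window:], monomer)
```

A scaffold with intact telomeres at both ends would be expected to show elevated counts in both windows relative to the scaffold interior; a peak at only one end, or none, suggests the telomere was not captured in the assembly.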

Gene Models

Open reading frames (ORFs), coding regions, and protein sequences were then predicted from the PASA transcriptome using transDecoder v5.5.0 (https://github.com/TransDecoder/TransDecoder) and used to build training sets for downstream ab initio gene predictors. Gene models were generated from the repeat masked super-scaffolded assembly and transcript based gene models from PASA [55] with the funannotate pipeline v1.8.15 [66]. Briefly, models trained on output from PASA and transdecoder were used as input into the ab initio prediction software AUGUSTUS v3.5.0 [67], snap v2013_11_29 [68], glimmerHMM v3.0.4 [69], and GeneMark-ES v3.68.0 [70]. The resulting gene predictions, along with PASA [55] predicted transcripts, were then passed into EvidenceModeler v1.1.1 [71] to generate a consensus set of high quality gene models. Transfer RNAs (tRNAs) were predicted using trnascan v1.4.0 [72]. Gene models were then refined and UTRs added with funannotate::update [66] using IsoSeq high quality transcripts and O. beta Illumina short reads (PRJNA313355) [44].

Functional Annotation

High quality gene models were then annotated using funannotate::annotate [66]. Briefly, funannotate::update screened protein models for protein domains (Pfam v35.0 [73]), CAZYmes (dbCAN v11.0 [74]), biosynthetic classes (MiBIG v1.4 [75]), peptidases (MEROPS v12.0 [76]), and homologs (UniprotKB/SwissProt v2023_03 [57]) using HMMER v3.3.2 (http://hmmer.org/) and DIAMOND v2.1.7.161 [77]. Funannotate [66] was run with optional eggNOG emapper v2.1.10 [78, 79] using eggNOG database v5.0.2 and external annotation via InterProScan v5.52–86.0 [80]. Predicted protein models were further annotated for KEGG KO identifiers using ghostKOALA [81].

Comparison with other organisms

To identify regions of collinearity, the super-scaffolded assembly was aligned to the T. amazonica reference using NUCmer v3.1 from MUMmer v3.23 [82] with a minimum alignment length (-l) of 500 bases and visualized using Dot (https://dot.sandbox.bio/).

Results

Assemblies

Nuclear genome

Initial genome estimates from HiFi reads with GenomeScope2 [32] suggested a 2.09 gigabase genome (nearly twice the size of the current O. beta reference), with ~53.5% of the genome composed of repetitive elements and 0.9% heterozygosity (Fig. 2).

Assembly with HiFiasm [33] generated an initial 2.4 gigabase primary assembly comprising 977 contigs, which was refined to a 2.15 gigabase assembly comprising 490 contigs after duplicate purging with purge_dups. Comparison of kmer profiles of the initial and purged assemblies suggested successful deduplication of pseudo-haplotype assemblies (Supplemental Fig. 1). Kmer profiling suggested a high-quality assembly with a combined kmer completeness of 98.8% (primary 90.6% and alternate 83.0%) and a combined QV of 60.9 (primary 61.6, alternate 60.3). A Merqury [31] screen of the purged assembly revealed a minor decrease in kmer completeness (primary 89.5%, alternate 87.1%, combined 98.3%) and a minor improvement in QV (primary 61.6, alternate 61.3, combined 61.4).
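To put these QV values in context, a consensus quality value is a Phred-scaled error rate, QV = -10 log10(E), so a QV of 60 corresponds to roughly one erroneous base per million. The sketch below shows the conversion, plus a kmer-based estimate that follows the published Merqury definition as understood here (the probability that a base is correct is taken as the k-th root of the fraction of assembly kmers also found in the reads). The function names are hypothetical and this is not Merqury's code.

```python
import math


def qv_to_error_rate(qv):
    """Per-base error probability implied by a Phred-scaled QV."""
    return 10 ** (-qv / 10)


def error_rate_to_qv(error_rate):
    """Phred-scaled QV for a per-base error probability."""
    return -10 * math.log10(error_rate)


def kmer_based_qv(assembly_only_kmers, total_assembly_kmers, k):
    """Estimate QV from kmers found in the assembly but absent from the reads.

    Assumes P(base correct) = (shared / total) ** (1 / k).
    """
    if assembly_only_kmers == 0:
        return math.inf  # no unsupported kmers: error rate estimated as zero
    shared_fraction = 1 - assembly_only_kmers / total_assembly_kmers
    error_rate = 1 - shared_fraction ** (1 / k)
    return error_rate_to_qv(error_rate)
```

Because the scale is logarithmic, the shift in combined QV from 60.9 to 61.4 reported above corresponds to only a small absolute change in the estimated error rate.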

An initial screen of the purged assembly with inspector [35] identified a 100% mapping rate of HiFi reads to the purged assembly and 48× coverage. Misassembly correction with three rounds of inspector [35] reduced the number of small-scale errors per megabase from 36 to 0.5 and larger structural errors per megabase from 526 to 168 (Supplemental File 1).

Initial scaffolding with ntLink grouped the 490 contigs into 317 scaffolds. Subsequent mounting to the chromosome scale reference assembly of relative T. amazonica (GCF_902500255.1) with RagTag reduced the total scaffold count to 62 scaffolds, 31 of which were greater than 1 megabase in length (Fig. 3). Of the 62 final scaffolds, 23 had high sequence similarity and comparable length to chromosome-scale scaffolds of T. amazonica (Fig. 4), suggesting a chromosome scale assembly. Pairwise alignment of the final super-scaffolded assembly with that of T. amazonica indicated several inverted segments, most notably on scaffolds 4, 9, and 17 (Fig. 5). Screening of the initial and final assembly for adapters and microbial sequences did not identify any contamination.

The final assembly was highly contiguous, with a total length of 2,151,823,914 bp, a largest contig size of 142,919,290 bp, an N50 of 98,402,768 bp, and an L50 of 10 (Table 1). The final assembly also scored highly in terms of completeness, with 96.1% of Actinopterygii universal single copy orthologs being found as complete and single copy, 1.2% duplicated, 0.9% fragmented, and only 1.8% missing (Fig. 5). This assembly markedly improves upon the current O. beta reference assembly (GCA_900660325) in terms of contiguity and completeness, with metrics similar to those of the T. amazonica chromosome-scale reference (GCF_902500255.1) (Table 1, Fig. 5). Alignment of the 345,629 contigs in the current O. beta reference to the final assembly with minimap2 resulted in 337,383 (97.6%) primary mapped sequences, with 147,794 secondary and 20,546 supplementary alignments, for a final mapping rate of 98.6%. Mapping of the raw Illumina reads used to assemble the current reference (SRR2034069) to the final assembly with bwa-mem2 (v2.2.1) resulted in a mapping rate of 98.4%, whereas mapping of HiFi reads to the final assembly with minimap2 resulted in a mapping rate of 100%.
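The BUSCO categories above relate to the single completeness score quoted in the abstract by simple arithmetic: the complete fraction is the sum of the single copy and duplicated fractions, and the four categories together should account for all orthologs searched. A short check using only the percentages reported in the text (the variable names are illustrative):

```python
# BUSCO category percentages as reported in the text.
busco = {"single": 96.1, "duplicated": 1.2, "fragmented": 0.9, "missing": 1.8}

# Complete = single copy + duplicated; round to one decimal place
# to sidestep floating point noise.
complete = round(busco["single"] + busco["duplicated"], 1)

# All four categories should sum to 100%.
total = round(sum(busco.values()), 1)

print(f"complete: {complete}%  total: {total}%")
```

The complete fraction of 97.3% matches the completeness score given in the abstract.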

Table 1 Summary statistics from Quast of the de-novo assembly presented here as compared to current reference assemblies for O. beta, and T. amazonica. All statistics are generated using only contigs with length greater than 500 bp

Mitochondrial genome

MitoHiFi [49, 50] identified the speckled midshipman (Porichthys myriaster) as the closest relative with an available mitochondrial genome sequence in GenBank (AP006739.1), which was used to identify candidate mitochondrial reads from HiFi reads.

The primary mitogenome assembly measured 19,381 bp in length and included two rRNAs (12S and 16S), 13 protein coding genes, and 24 tRNAs. Beyond the expected complement of mitochondrial genes, the assembly contained two more phenylalanine tRNAs and D-loop-like control regions than is typical of vertebrate mitogenomes (Fig. 6A). The duplication and atypical arrangement of tRNAs and control region resembled the unique mitogenome organization of other toadfishes, namely that of P. myriaster (Fig. 6B) [83].

Running MitoHiFi [49, 50] with the initial primary nuclear genome assembly, as well as the purged alternate assembly, as input generated identical mitogenomes. Comparison with an unverified O. beta mitogenome sequence assembled from Illumina short reads in GenBank (OP056998.1; 19,394 bp) via BLASTN and Clustal Omega showed that the two sequences were 99.85% identical when the unverified sequence was rotated to begin at position 13,771 (data not shown).
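Because mitochondrial genomes are circular, two assemblies of the same molecule can start at different positions and must be rotated to a common origin before they are compared. The sketch below illustrates the rotation step; the function names are hypothetical, and treating the start coordinate as 1-based is an assumption. The identity helper handles only equal-length, ungapped sequences, so sequences of different lengths, such as the two compared above, still require a proper aligner.

```python
def rotate(seq, start):
    """Rotate a circular sequence so 1-based position `start` becomes position 1."""
    if not 1 <= start <= len(seq):
        raise ValueError("start must lie within the sequence")
    i = start - 1
    return seq[i:] + seq[:i]


def percent_identity(a, b):
    """Percent identity of two equal-length, ungapped sequences."""
    if len(a) != len(b) or not a:
        raise ValueError("sequences must be non-empty and of equal length")
    matches = sum(x == y for x, y in zip(a, b))
    return 100 * matches / len(a)
```

Without the rotation, a linear alignment of two differently anchored circular sequences would split into two partial hits rather than one end-to-end match.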

Transcriptome

Initial IsoSeq processing of raw HiFi RNA reads identified 150,842 high quality transcripts. Cleaning with seqClean trimmed 3,262 transcripts and removed 11. Alignment to the super-scaffolded assembly resulted in 143,021 (95%) genome-aligned transcripts, 142,958 of which were longer than 200 bases. PASA [55] assembled transcript alignments into 44,006 gene models. PASA [55] with transdecoder (https://github.com/TransDecoder/TransDecoder) identified 48,306 coding domain sequences, 45,602 of which could be propagated to the final genome assembly to be used as input for ab initio gene predictors. Of the predicted coding domain sequences, 40,806 (84%) were marked as complete (containing 5’ and 3’ UTRs).

Annotation

Transposable Elements

De novo modeling of repetitive elements with RepeatModeler identified 4,615 transposable elements, with 1,699 assigned to known families and 2,916 unknown. Curation with repclassifier using the Dfam repeat database for Actinopterygii and known de novo families further improved repeat annotation to 3,026 assigned to known families and 1,589 remaining unknown. Masking of repeats identified 78,515,112 bases (3.6%) as simple repeats, 1,587,765,143 bases (66.5%) as interspersed repeats, and a total repeat content of approximately 70.1%, roughly 20% more of the genome than initial estimates by GenomeScope2 (Fig. 7B) [32]. DNA repeats (22.7%), LINEs (16.8%), and LTRs (13.4%) represented the major components of the repeat landscape, with a further 9.1% of the genome belonging to unclassified repeats. Major contributing classes of repeats included DNA/TcMar (9.9%), LTR/Gypsy (9.3%), LINE/L2 (6.2%), and LINE/RTE (6%). Kimura distance-based copy divergence analysis suggests progressive expansion of TE families in O. beta, with recent expansions in the DNA/TcMar, LTR/Gypsy, and RC/Helitron classes (Fig. 7B).

To validate the high repeat content of the O. beta genome, the T. amazonica reference assembly was also masked with RepeatMasker (https://github.com/rmhubley/RepeatMasker) using the FishTEDB [61] T. amazonica specific repeat library. The T. amazonica genome exhibited a similarly high repeat content, estimated at 72% (70.5% interspersed). However, the dominant families annotated were distributed differently. While LTRs represented a similar fraction of the genome at 12.08%, DNA elements represented only 12.17%, and LINEs accounted for 44.48% of the genome (Fig. 7A).

Telomeres

Analysis of scaffolds with the telomere identification toolkit (tidk) using monomers previously used to mark telomeres histologically in other batrachoids (TTAGGG and GATA [12]) identified peaks in repeat frequency at the ends of some, but not all, chromosome-scale scaffolds (Fig. 3). A de novo search for canonical telomere repeats in raw HiFi reads, primary and alternate assemblies, purged assemblies, and the final super-scaffolded assembly identified the canonical ‘AACCCT’ as common, but not the most common, among candidate telomere monomers. This suggests sequencing depth was insufficient to adequately penetrate and capture full telomeres.