Erwinia pyrifoliae, the causal agent of black shoot blight, is a Gram-negative bacterium belonging to the family Erwiniaceae (Thompson et al. 2019; Lee et al. 2020). This bacterium predominantly infects plants, particularly pears, causing diseases known as “Bacterial Shoot Blight” or “Asian Pear Blight” (Kim et al. 2001; Lee et al. 2020). The disease was first reported in pear trees in South Korea in 1995, and its symptoms resemble those of fire blight, caused by Erwinia amylovora. However, molecular analysis established E. pyrifoliae as a distinct species (Kim et al. 1999). Subsequent investigations revealed that some bacterial isolates from Japan, previously thought to be E. amylovora (Beer et al. 1996), were in fact E. pyrifoliae (Geider et al. 2009; Thapa et al. 2013). In 2013, the unexpected discovery of the pathogen in strawberries in the Netherlands underscored its lack of host specificity and its geographic versatility (Wenneker and Bergsma-Vlami 2015).

In contrast to E. amylovora, little is known about the genetic basis of virulence and environmental adaptation in E. pyrifoliae. The E. pyrifoliae genome consists of a circular chromosome and plasmids, containing typical bacterial genomic elements such as genes for metabolism, replication, and cell division, alongside pathogenicity-related genes (Smits et al. 2010; Llop et al. 2012; Lee et al. 2020). To date, only nine assembled E. pyrifoliae genome sequences have been released in National Center for Biotechnology Information (NCBI) GenBank (https://www.ncbi.nlm.nih.gov/datasets/genome/?taxon=79967). Additional genome sequences and gene annotations are therefore needed to enrich the genomic data available for E. pyrifoliae. Meanwhile, a recent study of the genetic diversity of this pathogen has made progress that will enable a better understanding of its genetic characteristics and pathogenicity genes (Ham and Park 2024).

Branches of an apple (Malus domestica cv. Fuji) tree showing black shoot blight were collected from Idong-myeon, Pocheon-si, Gyeonggi-do, Republic of Korea (38°05′24″N, 127°38′52″E). The apple branches were surface sterilized using 70% ethanol and dissected into 3 to 5 mm samples. The dissected samples were immersed in 10 mM phosphate-buffered saline (PBS) (pH 7.2) (Biosesang, Yongin, Korea) in a sterile mortar. Single colonies observed after 48 h of incubation on Nutrient Broth (NB) medium at 27 °C were isolated and subsequently purified. The purified strain was presumptively identified as E. pyrifoliae by internal transcribed spacer (ITS) region sequencing with the primer pair ITS-F (5′-AGAGTTTGATCMTGGCTCAG-3′) and ITS-R (5′-TACGGYTACCTTGTTACGACTT-3′).

Total genomic DNA was extracted using a PureLink™ Genomic DNA Mini Kit (Cat. No. K182002, Thermo Fisher Scientific Inc., Waltham, MA, USA) following the manufacturer’s instructions. Genomic DNA integrity was assessed via 1% agarose gel electrophoresis, while DNA purity was evaluated using a NanoDrop UV–Vis Spectrophotometer (Cat. No. ND-2000, Thermo Fisher Scientific). DNA concentrations were quantified using a Qubit dsDNA HS Quantification Assay Kit (Cat. No. Q32854, Thermo Fisher Scientific) and measured using a Qubit 4 Fluorometer (Cat. No. Q33238, Thermo Fisher Scientific).

Libraries for long-read sequencing were prepared through end-repair and dA-tailing, barcode and adapter ligation, and purification of ligated DNA using the NEBNext® Ultra™ II End Repair/dA-Tailing Module [Cat. No. E7546, New England Biolabs Co. (NEB), Ipswich, MA, USA], FFPE Repair Mix NEBNext® Quick Ligation Module (Cat. No. E6056, NEB), and Native Barcoding Kit [Cat. No. SQK-NBD114.24, Oxford Nanopore Technologies Co. (ONT), Oxfordshire, UK], respectively, following the recommendations for MinION. Long-read sequencing of genomic DNA was conducted using the MinION Mk1C device R10.4.1 (Cat. No. MIN-101C, ONT) with a SpotON Flow Cell (Cat. No. FLO-MIN114, ONT) according to the manufacturer’s instructions and managed using MinKNOW software v4.1.23 (ONT). Short-read sequencing was performed to refine the genomic sequences. Paired-end genomic DNA libraries with 350-bp inserts were generated using the TruSeq Nano DNA High Throughput Preparation Kit (Cat. No. 20015965, Illumina Inc., San Diego, CA, USA). These paired-end libraries were sequenced at Macrogen Co. (Seoul, Korea) using Illumina Sequencing by Synthesis (SBS) Technology (Illumina).
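
Both primers above contain one IUPAC degenerate base (M in ITS-F, Y in ITS-R), so each primer is in practice a mixture of two oligonucleotides. The following is a minimal illustrative sketch, not part of the study's workflow, that enumerates the concrete sequences a degenerate primer represents. The function name `expand_degenerate` is hypothetical; the code table is the standard IUPAC nucleotide code.

```python
from itertools import product

# Standard IUPAC nucleotide codes mapped to the bases they represent.
IUPAC = {
    "A": "A", "C": "C", "G": "G", "T": "T",
    "R": "AG", "Y": "CT", "S": "CG", "W": "AT", "K": "GT", "M": "AC",
    "B": "CGT", "D": "AGT", "H": "ACT", "V": "ACG", "N": "ACGT",
}


def expand_degenerate(primer):
    """Return every unambiguous sequence encoded by a degenerate primer."""
    choices = [IUPAC[base] for base in primer.upper()]
    return ["".join(combo) for combo in product(*choices)]


# Primer sequences as given in the text (5' to 3').
ITS_F = "AGAGTTTGATCMTGGCTCAG"
ITS_R = "TACGGYTACCTTGTTACGACTT"

if __name__ == "__main__":
    for name, primer in (("ITS-F", ITS_F), ("ITS-R", ITS_R)):
        print(name, expand_degenerate(primer))
```

Each primer expands to two variants because it carries a single two-fold degenerate position.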

Nanopore long-read sequencing yielded 120,000 raw reads with an N50 length of 14,677 bp and a total sequence length of 754.3 Mb (188.6× coverage), while Illumina short-read sequencing generated 1204 Mb (150.5× coverage) of paired-end sequences (Table 1). We used Trimmomatic v0.38 for quality control of the Nanopore long-read sequences, removing adapters, low-quality reads (defined as those containing “N” in > 10% of nucleotides), and duplicated reads. The resulting clean reads were assembled using the Long Read Support (beta) plugin 23.0 within the Qiagen CLC Genomics Workbench v.23.0.4 software (Qiagen, Hilden, Germany) with default parameters. The raw assembly was subjected to two rounds of polishing with the Illumina short reads. The genome of YKB12327 was assembled into one chromosome of 4,018,953 bp (53.4% G + C content) and three circular plasmids of 34,650 bp (49.7% G + C content), 4942 bp (53.4% G + C content), and 3089 bp (49.0% G + C content) (Fig. 1). The genome assembly was validated using BUSCO v4.1.4 (Simão et al. 2015) with the 1614 BUSCO markers in Embryophyta (odb10). The detection of 124 (100%) complete and single-copy Benchmarking Universal Single-Copy Orthologs (BUSCOs) in the assembled genome indicated a high-quality assembly. Compared with previously assembled genomes, the genomic sequence of YKB12327 exhibited high homology to four strains: EpK1/15, YKB12328, Ep1/96, and DSM 12163 (Supplementary Fig. S1). Whole-genome alignment revealed that the chromosome sequence of YKB12327 closely resembled those of the four neighboring genomes, while the plasmids varied significantly in both length and number.
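
For readers unfamiliar with the summary statistics above, the sketch below shows how a read N50 is conventionally computed and how the replicon sizes and G + C contents reported in this paragraph combine into a whole-assembly figure. It is an illustrative sketch only and not the software used in this study; the function names `n50` and `assembly_summary` and the toy read lengths are hypothetical, while the replicon values are copied from the text.

```python
def n50(lengths):
    """Length L such that reads of length >= L hold at least half of all bases."""
    ordered = sorted(lengths, reverse=True)
    if not ordered:
        raise ValueError("at least one length is required")
    half = sum(ordered) / 2
    running = 0
    for length in ordered:
        running += length
        if running >= half:
            return length


# Replicon sizes (bp) and G + C contents (%) as reported in the text.
REPLICONS = {
    "chromosome": (4_018_953, 53.4),
    "plasmid 1": (34_650, 49.7),
    "plasmid 2": (4_942, 53.4),
    "plasmid 3": (3_089, 49.0),
}


def assembly_summary(replicons):
    """Return total assembly size and size-weighted G + C content (1 decimal)."""
    total = sum(size for size, _ in replicons.values())
    weighted_gc = sum(size * gc for size, gc in replicons.values()) / total
    return total, round(weighted_gc, 1)


if __name__ == "__main__":
    print(n50([2, 3, 4, 5, 6]))         # toy read lengths, not study data
    print(assembly_summary(REPLICONS))  # total bp and weighted G + C (%)
```

Because the chromosome accounts for almost all of the assembly, the size-weighted G + C content is essentially that of the chromosome.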

Table 1 Summary of genome sequencing data for E. pyrifoliae YKB12327

The genome sequences were annotated for coding sequences (CDSs), ribosomal RNA (rRNA) genes, and transfer RNA (tRNA) genes using the NCBI Prokaryotic Genome Annotation Pipeline (PGAP) (https://www.ncbi.nlm.nih.gov/genome/annotation_prok/) (Haft et al. 2018). A total of 3123 protein-coding genes, 22 rRNA genes (5S, 16S, and 23S), and 76 tRNA genes were predicted. Moreover, based on the information reported by Kube et al. (2010), we identified 26 disease-causing genes for subsequent investigation (Supplementary Table 1).

In summary, we sequenced, assembled, and annotated the genome of E. pyrifoliae YKB12327, offering insights into its genetic makeup. We believe that the availability of the complete genome sequence of strain YKB12327 will support further studies of the evolution, diversity, and structural variation of E. pyrifoliae strains, as well as the molecular basis of pathogenesis.

Fig. 1

Circular representation of the E. pyrifoliae YKB12327 genome. The innermost circle is the ideogram of the chromosome and plasmids in Mb scale. The surrounding concentric circles show, from the inside outward: Prokka annotations on the forward and reverse strands (blue); mobile genetic elements predicted by Alien Hunter (orange), MobileOG (purple), and Phigaro (sky blue) on the forward (inner) and reverse (outer) strands; positive (green) and negative (red) GC skew; and GC content (black).
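
The GC content and GC skew tracks in such plots are typically computed in windows along each replicon, with GC skew defined as (G − C)/(G + C). The sketch below illustrates that calculation under stated assumptions: the window size, the non-overlapping step, and the function names are hypothetical choices for illustration and are not taken from the software used to draw Fig. 1.

```python
def gc_content(seq):
    """Fraction of G and C bases in a sequence (0.0 for an empty sequence)."""
    seq = seq.upper()
    if not seq:
        return 0.0
    return (seq.count("G") + seq.count("C")) / len(seq)


def gc_skew(seq):
    """GC skew (G - C) / (G + C); 0.0 when the sequence has no G or C."""
    seq = seq.upper()
    g, c = seq.count("G"), seq.count("C")
    if g + c == 0:
        return 0.0
    return (g - c) / (g + c)


def windowed_gc(seq, window, step=None):
    """Yield (start, GC content, GC skew) for each full window along seq."""
    step = step or window  # assumption: non-overlapping windows by default
    for start in range(0, len(seq) - window + 1, step):
        chunk = seq[start:start + window]
        yield start, gc_content(chunk), gc_skew(chunk)


if __name__ == "__main__":
    toy = "GGGCCCCG"  # toy sequence, not study data
    for start, gc, skew in windowed_gc(toy, window=4):
        print(start, gc, skew)
```

Windows with more G than C give positive skew and windows with more C than G give negative skew, which is what the green and red segments of the skew track distinguish.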