Here, we report the draft genome sequence of Bacillus amyloliquefaciens strain KCP2, which produces a thermostable alkaliphilic α-amylase and protease useful in the bread making, brewing, starch processing, pharmaceutical, and textile industries. Strain KCP2 was isolated from municipal food waste samples collected in Vallabh Vidyanagar, Gujarat, India (Prajapati et al. 2015a). The cells of the strain are aerobic, motile rods arranged singly or in chains, and they stain Gram positive. The strain is oxidase and catalase positive and hydrolyses gelatine and casein, starch, and Tween 40 and Tween, suggesting that it is proteolytic, amylolytic, and lipolytic, respectively. The α-amylase produced by strain KCP2 has been used in the saccharification and fermentation of raw corn starch and food waste for the production of bioethanol (Prajapati et al. 2015b). The strain is also positive in the Voges-Proskauer test and grows, with acid production, on glucose, mannitol, glycerol, glycogen, salicin, cellobiose, fructose, galactose, maltose, ribose, sorbitol, sucrose, and D-xylose (Wang et al. 2008; Borriss et al. 2011).

Bacterial DNA from strain KCP2 was extracted and purified (Prajapati et al. 2015b), and single-end sequencing was performed using a 318 chip and 400 bp chemistry on an Ion Torrent Personal Genome Machine (PGM) housed at the Department of Animal Biotechnology, College of Veterinary Science and Animal Husbandry, Anand Agricultural University, Gujarat, India, using the standard protocols described by the manufacturer (Roche Diagnostics Ltd., United Kingdom). In total, 2,590,427 reads were obtained; after error correction with Pollux (Marinier et al. 2015), 2,554,487 reads remained. The genome was assembled using SPAdes version 3.10.1 (Bankevich et al. 2012) and the MIRA version 4.0.2 assembler (Chevreux et al. 2004). The two assemblies and the reads were then used to re-verify the longest and most reliable contigs using custom Perl scripts. The assembled draft genome of B. amyloliquefaciens strain KCP2 consisted of 34 scaffolds with an N50 of 439,503 bp and a largest scaffold of 979,499 bp. The genome size was estimated to be 3,906,932 bp, with a GC content of 46%. BUSCO2 analysis (Simão et al. 2015) using the bacterial lineage data set, based on the presence of conserved orthologous genes among species of the genus, indicated 98% completeness. Gene annotation was performed using the NCBI Prokaryotic Genome Annotation Pipeline (PGAP) (Tatusova et al. 2016), which identified 3681 protein-coding sequences (Table 1). The closest α-amylase sequence was found in B. amyloliquefaciens strain DSM-7, a soil bacterium characterized by its enormous potential to produce extracellular enzymes of industrial importance, including amylases and proteases. The PGAP annotation confirmed the presence of amylase in the genome. The multiple sequence alignment (MSA) and phylogenetic analysis of α-amylase show wide diversification (Fig. 1). Further in silico study of the α-amylase may reveal novel thermostability signatures.
The genome sequence and its annotation reported here should also be a useful genetic resource for engineering the metabolism of B. amyloliquefaciens and improving its usability. A comparative genomic analysis of this strain with 52 sequenced genomes of B. amyloliquefaciens is in progress. We expect this analysis to yield a better understanding of B. amyloliquefaciens evolution through genome dynamics, population structure, and phylogenies of species groups.
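
For readers who want to recompute summary statistics such as the scaffold N50, the GC content, or the fraction of reads retained after error correction, the following is a minimal Python sketch. It is not the custom Perl code used in this work, and the function names (`n50`, `gc_fraction`, `percent_retained`) are hypothetical. It assumes the common definitions: N50 is the length of the shortest scaffold in the smallest set of longest scaffolds that together cover at least half of the assembly, and GC content is computed over unambiguous A, C, G, and T bases only.

```python
def n50(lengths):
    """Return the N50 of a collection of scaffold lengths (assumed definition:
    the length at which the cumulative sum of scaffolds, sorted longest first,
    reaches at least half of the total assembly length)."""
    lengths = list(lengths)
    if not lengths:
        raise ValueError("at least one scaffold length is required")
    total = sum(lengths)
    running = 0
    for length in sorted(lengths, reverse=True):
        running += length
        if running * 2 >= total:
            return length


def gc_fraction(sequences):
    """Return the G+C fraction over all unambiguous bases (A, C, G, T);
    ambiguous symbols such as N are ignored."""
    gc = 0
    acgt = 0
    for seq in sequences:
        seq = seq.upper()
        gc += seq.count("G") + seq.count("C")
        acgt += sum(seq.count(base) for base in "ACGT")
    if acgt == 0:
        raise ValueError("no unambiguous bases found")
    return gc / acgt


def percent_retained(reads_before, reads_after):
    """Return the percentage of reads kept after a filtering or correction step."""
    return 100.0 * reads_after / reads_before


if __name__ == "__main__":
    # Toy scaffold lengths and sequences, for illustration only.
    print(n50([2, 3, 4, 5, 6]))
    print(gc_fraction(["GGCC", "AATT"]))
    # Read counts before and after correction, as reported in the text.
    print(round(percent_retained(2_590_427, 2_554_487), 2))
```

In practice, the scaffold lengths and sequences would be read from the assembly FASTA file; the parsing step is left out here to keep the sketch self-contained.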

Table 1 Bacillus amyloliquefaciens strain KCP2 genome assembly data statistics from PGAP (v4.2) pipeline
Fig. 1

Phylogenetic analysis of the α-amylase protein sequence from Bacillus amyloliquefaciens strains. The α-amylase protein sequence from B. amyloliquefaciens strain KCP2 (branch highlighted in red) was used in a BLAST search with default parameters against protein databases of B. amyloliquefaciens strains. The phylogenetic tree was generated with the Simple Phylogeny program (http://www.ebi.ac.uk/Tools/phylogeny/simple_phylogeny/) using default parameters and then visualized in FigTree (http://tree.bio.ed.ac.uk/software/figtree/). Each taxon is named after the strain, with the sequence accession number appended (<StrainName>_<SequenceAccessionNumber>). The scale represents the number of differences between sequences (root age 1.0)