Introduction

The performance of biological wastewater treatment relies strongly on various bacterial groups in activated sludge (AS), including heterotrophic organic-degrading bacteria, ammonia-oxidizing bacteria (AOB), nitrite-oxidizing bacteria (NOB), denitrifiers, and polyphosphate-accumulating organisms (Zhang et al. 2012). These bacterial groups play different roles in removing organic pollutants and nutrients from municipal and industrial wastewater. Besides AS, biofilms are often applied in wastewater treatment plants (WWTPs) to enhance nitrogen removal, because biofilm carriers can effectively retain slow-growing nitrifiers (Chung et al. 2007) and the anoxic/anaerobic conditions within biofilms may favor denitrification (Hibiya et al. 2003). Moreover, compared with AS treatment, biofilms offer significant advantages, including larger accumulated biomass, longer sludge retention time, greater reactor stability, and better solid–liquid separation (Nicolella et al. 2000). Thus, they are widely used in various systems (Xia et al. 2008), such as fluidized bed reactors, rotating biological contactors, and suspended carrier biofilm reactors.

A full understanding of the structure and functions of nitrogen removal bacteria in AS and biofilm is of great microbiological interest and could facilitate the development of operating strategies that enhance nitrogen removal performance (Wagner and Loy 2002). Previous studies have applied several cultivation-independent molecular methods to study the structure and functions of nitrogen removal bacteria in WWTPs, including fluorescence in situ hybridization (FISH) (Hibiya et al. 2003), denaturing gradient gel electrophoresis (DGGE) (Fu et al. 2010), quantitative real-time polymerase chain reaction (qRT-PCR) (Wittebolle et al. 2008), and microarrays (Siripong et al. 2006). However, these methods have several disadvantages, including low throughput, low resolution, and potential PCR bias, which may seriously limit their further application. Recently, with the aid of high-throughput sequencing (HTS) techniques, several studies evaluated microbial structures and diversities in AS/biofilm systems using PCR-based 454 pyrosequencing (Biswas and Turner 2012; Kwon et al. 2010), and a recent study identified active denitrifiers in wastewater by combining stable isotope probing (SIP) with microautoradiography (MAR)-FISH (McIlroy et al. 2016); these studies provided deep insights into the bacterial communities of AS and biofilm. Recent HTS-based metagenomic studies identified Nitrospira-like complete ammonia oxidizers (comammox) capable of oxidizing ammonia to nitrate in hypoxic environments (Daims et al. 2015; van Kessel et al. 2015), and metagenomic evidence for comammox was also found in a drinking water system (Pinto et al. 2015). These findings have changed our perspective on the nitrogen cycle (Kuypers 2015; Nunes-Alves 2015; Santoro 2016). Microorganisms in biofilms of aerobic reactors are exposed to oxygen concentration gradients, which may favor the growth of comammox.

Therefore, to compare the microbial structures and nitrogen transformation functions of suspended and attached biomass and to explore novel functional genes in nitrogen metabolism, AS and biofilm samples were collected from an aerobic reactor in a full-scale WWTP in Hong Kong. Metagenomic (>34 Gb) and PCR-based 16S rRNA gene (>15,000 reads per sample) HTS data were used to (1) investigate the occurrence and abundance of nitrogen removal bacteria in the two samples, (2) identify key nitrogen cycle enzymes and evaluate their abundances, (3) explore novel functional genes of nitrogen removal bacteria, and (4) seek metagenomic evidence for novel comammox in the reactor.

Materials and methods

AS and biofilm samples

AS and biofilm samples were collected from an aerobic suspended carrier biofilm reactor at the Stanley WWTP in Hong Kong (22.219° N, 114.21° E). Biofilms were grown on plastic carriers 25 mm in diameter. The WWTP serves about 27,000 inhabitants with a flow of 8000 m³/day and is operated as an anoxic-aerobic process with a sludge retention time of 7 days and a hydraulic retention time of 17 h.

Illumina sequencing and 454 pyrosequencing

Genomic DNA was extracted from the AS and biofilm samples, and 16S rRNA genes were amplified by PCR (see the supplementary information (SI) for details). For Illumina sequencing, more than 10 μg of genomic DNA per sample was sent to BGI (Shenzhen, China) for sequencing on the HiSeq 2000 platform. In total, 34 Gb of 100-bp Illumina paired-end reads were generated, comprising 178,692,578 and 186,892,002 reads for the AS and biofilm samples, respectively.
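As a quick arithmetic check on the sequencing yield, the total number of bases can be recomputed from the read counts above. This minimal sketch assumes that every counted read is 100 bp long; under that assumption the total is about 36.6 Gbp in decimal (10⁹) units and about 34 in binary (2³⁰) units, which may be how the 34 Gb figure was expressed.

```python
# Reported read counts for the two HiSeq 2000 libraries (from the text above)
READS = {"AS": 178_692_578, "biofilm": 186_892_002}
READ_LENGTH_BP = 100  # assumption: every counted read is 100 bp


def total_bases(read_counts, read_length):
    """Total sequenced bases = number of reads x read length."""
    return sum(read_counts.values()) * read_length


bases = total_bases(READS, READ_LENGTH_BP)
print(f"decimal: {bases / 1e9:.2f} Gbp")         # decimal: 36.56 Gbp
print(f"binary:  {bases / 2**30:.2f} Gi-bases")  # binary:  34.05 Gi-bases
```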

For 16S rRNA gene sequencing, PCR products were sequenced on a 454 FLX Titanium instrument. The resulting sequences were analyzed with the QIIME pipeline (v 1.3.0) (Caporaso et al. 2010). Sequences were first de-multiplexed into samples based on their barcodes. The sequences in each sample were then denoised with AmpliconNoise (as implemented in QIIME) using default parameters, except that the Perseus chimera-removal algorithm was disabled; chimera checking was instead performed with ChimeraSlayer (Haas et al. 2011). Finally, 15,379 and 17,991 cleaned 16S rRNA gene sequences with average lengths of 415 ± 26 bp and 417 ± 25 bp were obtained for AS and biofilm, respectively.
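The de-multiplexing step can be illustrated with a minimal sketch of exact-match barcode assignment. It is not QIIME's implementation; production pipelines typically add quality filtering and some tolerance for barcode errors. The barcode sequences and read IDs in the example are hypothetical, and all barcodes are assumed to have the same length so that none is a prefix of another.

```python
def demultiplex(reads, barcodes):
    """Assign each read to a sample by an exact barcode prefix and strip the barcode.

    reads: iterable of (read_id, sequence)
    barcodes: dict mapping sample name -> barcode sequence (equal lengths assumed)
    Returns (dict sample -> list of (read_id, trimmed_sequence), list of unassigned ids).
    """
    by_sample = {sample: [] for sample in barcodes}
    unassigned = []
    for read_id, seq in reads:
        for sample, bc in barcodes.items():
            if seq.startswith(bc):
                by_sample[sample].append((read_id, seq[len(bc):]))
                break
        else:  # no barcode matched this read
            unassigned.append(read_id)
    return by_sample, unassigned


# Hypothetical 10-nt barcodes and reads, for illustration only
BARCODES = {"AS": "ACGAGTGCGT", "biofilm": "ACGCTCGACA"}
READS = [
    ("r1", "ACGAGTGCGTAGAGTTTGATCCTGG"),
    ("r2", "ACGCTCGACAAGAGTTTGATC"),
    ("r3", "TTTTTTTTTTAGAG"),
]
by_sample, unassigned = demultiplex(READS, BARCODES)
```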

Reads merging, assembly, and gene prediction

For more accurate alignment and annotation, the Illumina paired-end reads from shotgun metagenomic sequencing were merged into longer “Illumina tags (iTags)” with a minimum overlap (merging length) of 10 bp, and these iTags were used for the taxonomic analysis. The Illumina reads of each sample were assembled with CLC Genomics Workbench (v 4.9, CLC Bio, Denmark) (Albertsen et al. 2012). Contigs longer than 1000 bp were retained for gene prediction, and open reading frames (ORFs) were predicted from them with MetaGeneMark (Zhu et al. 2010). The predicted ORFs were used for the functional analysis.
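Merging a read pair into an iTag relies on the two reads overlapping in the middle of the sequenced fragment. The sketch below shows the idea under simplifying assumptions: the overlap must match exactly, base qualities are ignored, and the longest qualifying overlap wins. Real merging tools tolerate mismatches and use quality scores; the `min_overlap=10` default mirrors the 10-bp minimum in the text. With 2 × 100-bp reads, an average merged length of 170 bp corresponds to an overlap of about 30 bp.

```python
COMPLEMENT = str.maketrans("ACGTN", "TGCAN")


def reverse_complement(seq):
    """Reverse complement of a DNA sequence (A/C/G/T/N only)."""
    return seq.translate(COMPLEMENT)[::-1]


def merge_pair(read1, read2, min_overlap=10):
    """Merge a read pair into one longer tag if they overlap by >= min_overlap bases.

    read2 is reverse-complemented so that both reads are on the same strand, then
    the longest exact match between a suffix of read1 and a prefix of rc(read2)
    is used. Returns None if no overlap of at least min_overlap exists.
    """
    r2 = reverse_complement(read2)
    max_len = min(len(read1), len(r2))
    for overlap in range(max_len, min_overlap - 1, -1):  # try longest overlap first
        if read1[-overlap:] == r2[:overlap]:
            return read1 + r2[overlap:]
    return None


# Illustration: a 30-bp fragment read 20 bp from each end (10-bp overlap)
FRAGMENT = "GATTACAGCTTCGGATCCATGCAAGTCGTA"
R1 = FRAGMENT[:20]
R2 = reverse_complement(FRAGMENT[-20:])
merged = merge_pair(R1, R2)  # reconstructs FRAGMENT
```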

Taxonomic analysis

Cleaned 16S rRNA gene sequences and merged iTags were taxonomically classified by BLAST searches against the Greengenes 16S rRNA database (DeSantis et al. 2006) and the SILVA SSU rRNA database (Pruesse et al. 2007) with an E-value cutoff of 1 × 10⁻²⁰ (Mackelprang et al. 2011). The 16S rRNA gene sequences and iTags were then assigned to NCBI taxonomies with MEGAN (v 4.70.4) (Huson et al. 2007) using the lowest common ancestor (LCA) algorithm with default parameters (min score = 50; top percent = 10; min support = 1; LCA percent = 100) to identify the nitrogen removal bacteria in AS and biofilm.
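The LCA assignment with the parameters listed above can be sketched as follows. This is a simplified illustration in the spirit of MEGAN's LCA, not MEGAN's code: hits below `min_score` are discarded, only hits within `top_percent` of the best bit score are kept, and the read is assigned to the deepest taxon shared by all retained hits (i.e., LCA percent = 100). The min support filter, which acts on the number of reads per taxon, is not modeled. The lineages in the example are hypothetical.

```python
def lca_assign(hits, min_score=50.0, top_percent=10.0):
    """Lowest-common-ancestor assignment for one read.

    hits: list of (bit_score, lineage), where lineage is a tuple of taxa from the
          root down, e.g. ("Bacteria", "Proteobacteria", "Betaproteobacteria").
    Returns the deepest lineage prefix shared by all retained hits, or None if
    no hit reaches min_score.
    """
    kept = [(s, lin) for s, lin in hits if s >= min_score]
    if not kept:
        return None
    best = max(s for s, _ in kept)
    cutoff = best * (1 - top_percent / 100)  # keep hits within top_percent of best
    lineages = [lin for s, lin in kept if s >= cutoff]
    shared = []
    for ranks in zip(*lineages):  # walk down the ranks in parallel
        if all(taxon == ranks[0] for taxon in ranks):
            shared.append(ranks[0])
        else:
            break
    return tuple(shared)


# Hypothetical hits for one read
HITS = [
    (400, ("Bacteria", "Proteobacteria", "Betaproteobacteria", "Nitrosomonas")),
    (380, ("Bacteria", "Proteobacteria", "Betaproteobacteria", "Nitrosospira")),
    (200, ("Bacteria", "Nitrospirae", "Nitrospira")),  # outside the top 10 %
    (40, ("Archaea",)),                                # below min score
]
assignment = lca_assign(HITS)  # ("Bacteria", "Proteobacteria", "Betaproteobacteria")
```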

Nitrogen metabolism gene analysis

Nitrogen metabolism genes were analyzed by BLAST searches of the predicted ORFs against the NCBI nr database with an E-value cutoff of 1 × 10⁻⁵ (Mackelprang et al. 2011). The BLAST results were imported into MEGAN and annotated using KEGG pathways (Kanehisa et al. 2011). To screen for diagnostic Nitrospira-like comammox amoA genes, ORFs were searched with BLAST against a small set of characteristic amoA genes. Illumina reads were then mapped to all ORFs, used as references, with CLC software. The number of mapped reads, rather than the number of ORFs, was used to quantify the abundance of nitrogen metabolism genes, because comparing ORF counts directly may introduce quantitative bias (an ORF is counted once regardless of how abundant its source population is).
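The difference between counting ORFs and counting mapped reads can be made concrete with a small sketch. It assumes each read maps to at most one ORF and does not normalize by gene length; the ORF IDs, read IDs, and gene labels are hypothetical. With one NXR ORF and one AMO ORF, ORF counting would give 50 % each, whereas read counting reflects the much higher coverage of the NXR ORF.

```python
from collections import Counter


def gene_abundance(read_to_orf, orf_to_gene):
    """Relative abundance (%) of each gene category from mapped read counts.

    read_to_orf: dict read_id -> ORF id the read mapped to (unique mapping assumed)
    orf_to_gene: dict ORF id -> gene/enzyme label, e.g. "AMO" or "NXR"
    Reads on ORFs without a nitrogen-cycle label are ignored.
    """
    counts = Counter(orf_to_gene[orf] for orf in read_to_orf.values() if orf in orf_to_gene)
    total = sum(counts.values())
    return {gene: 100 * n / total for gene, n in counts.items()} if total else {}


# Hypothetical example: 9 reads on an NXR ORF, 1 on an AMO ORF, 1 on an unlabeled ORF
ORF_TO_GENE = {"orf1": "NXR", "orf2": "AMO"}
READ_TO_ORF = {f"r{i}": "orf1" for i in range(9)}
READ_TO_ORF["r9"] = "orf2"
READ_TO_ORF["r10"] = "orf3"
abundance = gene_abundance(READ_TO_ORF, ORF_TO_GENE)  # {"NXR": 90.0, "AMO": 10.0}
```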

A bacterial nitrogen metabolic network was constructed in the present study by modifying the KEGG nitrogen metabolism pathway. In total, 1167 nonredundant species in the domain Bacteria, encoding 51 nitrogen metabolism-related enzymes, were retrieved from the KEGG database. The frequency of each enzyme (i.e., the proportion of the 1167 bacterial species encoding it) was counted and used to identify the major enzymes involved in nitrogen metabolism. A nitrogen cycle network was constructed in the same way. Both networks were visualized with Cytoscape (v 2.8.3) (Shannon et al. 2003).
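The enzyme frequencies used to weight the network can be computed as the percentage of species whose annotation includes each enzyme. The sketch below assumes a simple mapping from species to sets of EC numbers; the four species and their enzyme sets are hypothetical, with EC 6.3.1.2 (glutamine synthetase) and EC 1.14.99.39 (ammonia monooxygenase) used as labels.

```python
def enzyme_frequencies(species_enzymes):
    """Percentage of species whose enzyme set contains each enzyme (EC number).

    species_enzymes: dict species -> set of EC numbers annotated for that species.
    """
    n_species = len(species_enzymes)
    all_ecs = set().union(*species_enzymes.values())
    return {
        ec: 100 * sum(ec in ecs for ecs in species_enzymes.values()) / n_species
        for ec in sorted(all_ecs)
    }


# Hypothetical species-to-enzyme table, for illustration only
SPECIES = {
    "sp_A": {"6.3.1.2", "1.14.99.39"},
    "sp_B": {"6.3.1.2"},
    "sp_C": {"6.3.1.2"},
    "sp_D": {"6.3.1.2"},
}
freq = enzyme_frequencies(SPECIES)  # {"1.14.99.39": 25.0, "6.3.1.2": 100.0}
```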

Phylogenetic analysis

The 454 16S rRNA amplicons identified as NOB in the two samples were extracted and pooled. The pooled sequences were clustered into operational taxonomic units (OTUs) at 97 % similarity with Mothur (v 1.24.0) (Schloss et al. 2009), and one representative amplicon per OTU was extracted with the get.oturep command. The 16S rRNA genes of NOB species were extracted from the Greengenes database (DeSantis et al. 2006) as references for phylogenetic analysis. These reference sequences were trimmed according to the primers used, so that only the V3–V4 region remained. After MUSCLE alignment, the trimmed reference sequences and the representative OTU sequences were used to construct a neighbor-joining phylogenetic tree with the Poisson distance and 1000 bootstrap replicates in MEGA (v 5.05) (Tamura et al. 2011).
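The idea of a 97 % OTU threshold can be illustrated with a simple greedy clustering sketch on pre-aligned sequences. This is a swapped-in technique for illustration: Mothur's cluster command instead performs hierarchical clustering on a distance matrix, so OTU memberships can differ. Here each sequence joins the first representative it matches at ≥97 % identity, or founds a new OTU (becoming its representative, analogous to the role of get.oturep). The sequences are hypothetical.

```python
def identity(a, b):
    """Fraction of identical positions between two aligned, equal-length sequences,
    ignoring columns where both sequences have a gap."""
    pairs = [(x, y) for x, y in zip(a, b) if not (x == "-" and y == "-")]
    same = sum(x == y for x, y in pairs)
    return same / len(pairs) if pairs else 0.0


def greedy_otus(seqs, threshold=0.97):
    """Greedy clustering of (seq_id, aligned_seq) pairs, processed in input order.

    Returns dict representative_id -> list of member ids.
    """
    centroids = []  # list of (representative_id, aligned_sequence)
    members = {}
    for seq_id, seq in seqs:
        for rep_id, rep_seq in centroids:
            if identity(seq, rep_seq) >= threshold:
                members[rep_id].append(seq_id)
                break
        else:  # no representative within the threshold: start a new OTU
            centroids.append((seq_id, seq))
            members[seq_id] = [seq_id]
    return members


# Hypothetical 100-bp aligned sequences: r2 is 98 % and r3 is 95 % identical to r1
SEQS = [("r1", "A" * 100), ("r2", "A" * 98 + "CC"), ("r3", "A" * 95 + "C" * 5)]
otus = greedy_otus(SEQS)  # {"r1": ["r1", "r2"], "r3": ["r3"]}
```

Because the result depends on input order, greedy tools usually sort sequences by abundance first.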

Genome binning

Genome binning was performed with a method combining coverage and tetranucleotide frequencies (TNFs) (Albertsen et al. 2013). Briefly, the two metagenomic data sets were pooled and assembled de novo with CLC software using a k-mer size of 63 and a minimum contig length of 1000 bp (McIlroy et al. 2014). The coverage of each contig in the AS and biofilm samples was estimated by separately mapping the reads of each metagenome to the assembled contigs with CLC software. TNFs and GC content of the assembled contigs were calculated with the scripts of Albertsen et al. (2013). ORFs on the contigs within the bins were predicted with Prodigal (Hyatt et al. 2012), and essential proteins were identified using hidden Markov models (HMMs) and HMMER 3.0 (Dupont et al. 2012). The essential proteins were searched with BLAST against the RefSeq protein database (v 52) using an E-value cutoff of 1 × 10⁻⁵ and then annotated with MEGAN via the LCA algorithm.
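The TNF and GC-content features used for binning can be sketched as below. This assumes the common convention of merging each tetranucleotide with its reverse complement, which yields 136 canonical features (256 tetramers, 16 of which are their own reverse complement); it is an illustration, not the scripts of Albertsen et al. (2013). Windows containing non-ACGT symbols are skipped.

```python
from itertools import product

COMP = str.maketrans("ACGT", "TGCA")


def revcomp(kmer):
    """Reverse complement of an ACGT string."""
    return kmer.translate(COMP)[::-1]


# Canonical tetranucleotides: each k-mer is merged with its reverse complement
ALL_TETRAMERS = ("".join(p) for p in product("ACGT", repeat=4))
CANONICAL = sorted({min(k, revcomp(k)) for k in ALL_TETRAMERS})


def tnf(seq):
    """Tetranucleotide frequency vector (fractions, ordered as CANONICAL) for one contig."""
    seq = seq.upper()
    counts = dict.fromkeys(CANONICAL, 0)
    for i in range(len(seq) - 3):
        kmer = seq[i:i + 4]
        if set(kmer) <= set("ACGT"):  # skip windows containing N or other symbols
            counts[min(kmer, revcomp(kmer))] += 1
    total = sum(counts.values())
    return [counts[k] / total if total else 0.0 for k in CANONICAL]


def gc_content(seq):
    """Fraction of G + C among unambiguous (ACGT) bases."""
    seq = seq.upper()
    acgt = sum(seq.count(b) for b in "ACGT")
    return (seq.count("G") + seq.count("C")) / acgt if acgt else 0.0
```

Each contig thus becomes a 136-dimensional TNF vector, which is the kind of input on which a PCA can separate contigs of different genomes.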

Bin extraction was performed in R (v 3.0.2) with several supporting packages. Bins were first delineated on plots of contig coverage in the AS versus the biofilm metagenome, taking the GC content and taxonomy of contigs into account. To purify these bins, the vegan package (v 2.0-5) was used to perform principal component analysis (PCA) on the contig TNFs, and bins were re-extracted from the PCA plot to separate the target genomic contigs from outliers. The integrity (completeness) and redundancy of the resulting bins were estimated from the essential single-copy genes at the phylum or class level (Albertsen et al. 2013).
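The integrity and redundancy estimates can be sketched from single-copy marker genes: completeness is the fraction of expected markers found at least once, and redundancy counts extra copies beyond the first. This is one common way to define the two quantities and is given as an assumption; the marker names below are hypothetical placeholders for an essential single-copy gene set.

```python
from collections import Counter


def bin_quality(bin_markers, marker_set):
    """Estimate completeness and redundancy (%) of a genome bin.

    bin_markers: marker-gene names found in the bin, one entry per hit, so a
                 duplicated marker appears more than once.
    marker_set: set of essential single-copy genes expected for the lineage.
    Completeness = unique markers found / markers expected.
    Redundancy   = extra copies beyond the first / markers expected.
    """
    counts = Counter(m for m in bin_markers if m in marker_set)
    completeness = len(counts) / len(marker_set)
    redundancy = sum(n - 1 for n in counts.values()) / len(marker_set)
    return 100 * completeness, 100 * redundancy


# Hypothetical marker set and bin content: m2 found twice, "x" is not a marker
MARKERS = {"m1", "m2", "m3", "m4"}
quality = bin_quality(["m1", "m2", "m2", "x"], MARKERS)  # (50.0, 25.0)
```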

Genome annotation and comparison

ORFs in the genome bins were predicted with Prodigal (Hyatt et al. 2012) and searched with BLAST against the NCBI nr database with an E-value cutoff of 1 × 10⁻⁵. The results were imported into MEGAN with default parameters and annotated via SEED subsystems (Overbeek et al. 2005) and KEGG pathways (Kanehisa et al. 2011). To evaluate the genomic similarity between the bins and their closest reference genomes, the average nucleotide identity (ANI) (Goris et al. 2007) was calculated with the JSpecies software tool (IMEDEA, Esporles, Spain) using BLAST with default parameters, as evaluated by Richter and Rosselló-Móra (2009).
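The BLAST-based ANI calculation can be sketched from per-fragment hits, following the fragment-based definition of Goris et al. (2007) as we understand it: the query genome is cut into consecutive fragments, each fragment's best hit in the reference is kept if it exceeds an identity and an alignable-length threshold, and ANI is the mean identity of the kept hits. The thresholds below (1020-nt fragments, >30 % identity, >70 % of the fragment aligned) are stated as assumptions, and the aligned fraction is computed here simply as the share of fragments with a qualifying hit. The hit values are hypothetical.

```python
FRAGMENT_LEN = 1020   # assumption: query genome cut into 1020-nt fragments
MIN_IDENTITY = 30.0   # assumption: keep hits with > 30 % identity ...
MIN_ALIGNED = 0.70    # ... over > 70 % of the fragment length


def ani_from_hits(best_hits, n_fragments):
    """ANI (%) and aligned fraction (%) from per-fragment best BLAST hits.

    best_hits: list of (percent_identity, alignment_length) for query fragments
               that had a hit; fragments without any hit are simply absent.
    n_fragments: total number of query fragments.
    """
    kept = [pid for pid, aln in best_hits
            if pid > MIN_IDENTITY and aln / FRAGMENT_LEN > MIN_ALIGNED]
    if not kept:
        return None, 0.0
    ani = sum(kept) / len(kept)
    aligned_fraction = 100 * len(kept) / n_fragments
    return ani, aligned_fraction


# Hypothetical hits for 10 query fragments (only 4 fragments had any hit)
HITS = [(98.0, 1020), (96.0, 1000), (80.0, 500), (25.0, 1020)]
ani, aligned = ani_from_hits(HITS, n_fragments=10)  # (97.0, 20.0)
```

Under the 95 % species cutoff used in this study, a pair with this ANI would be treated as the same species, although an aligned fraction this low would make the estimate less reliable.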

Statistical analysis

To statistically analyze the HTS datasets, STAMP analysis, experimental repeatability analysis, and diversity analysis were also conducted in the present study (for details, please refer to SI).

Accession numbers of HTS datasets

454 Pyrosequencing and Illumina metagenomic datasets applied in the present study were deposited in the NCBI Sequence Read Archive (SRA) database with accession numbers of SRP053269 and SRS826578, respectively.

Results

Illumina reads processing and experimental repeatability

After merging, more than 67 million iTags with an average length of 170 bp were obtained for the two samples (Table S1). In the de novo assembly with CLC, about 51 % and 45 % of the cleaned Illumina reads in the AS and biofilm samples, respectively, were used, yielding 207,811 and 202,112 contigs with average lengths of 2222 and 2045 bp, from which 580,369 and 548,216 ORFs were predicted, respectively.

To test experimental repeatability, duplicate DNA samples were extracted independently from the AS sample and sequenced on the Illumina platform, generating 3.5 Gb and 2.9 Gb metagenomic datasets. Taxonomic analysis indicated good repeatability of metagenomic sequencing: the correlation coefficients (R²) between these technical duplicates exceeded 0.977 at the genus level and above and were still 0.895 at the species level (Fig. S1).
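The repeatability comparison can be sketched as the squared Pearson correlation between the relative abundances of taxa in the two duplicates. Two assumptions are made here: R² is computed as Pearson r², and taxa missing from one duplicate are treated as having zero abundance. The profiles in the example are hypothetical.

```python
def align_profiles(p1, p2):
    """Put two {taxon: relative abundance} dicts on a common taxon list (missing = 0)."""
    taxa = sorted(set(p1) | set(p2))
    return [p1.get(t, 0.0) for t in taxa], [p2.get(t, 0.0) for t in taxa]


def r_squared(x, y):
    """Squared Pearson correlation between two equal-length, non-constant profiles."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy * sxy / (sxx * syy)


# Hypothetical genus-level profiles (%) of two duplicates
DUP1 = {"Nitrospira": 4.0, "Nitrosomonas": 1.0, "Zoogloea": 2.0}
DUP2 = {"Nitrospira": 4.2, "Nitrosomonas": 0.9, "Zoogloea": 2.1, "Paracoccus": 0.1}
x, y = align_profiles(DUP1, DUP2)
r2 = r_squared(x, y)
```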

Nitrogen removal bacteria in AS and biofilm

Several microbial groups are involved in nitrogen removal in WWTPs, including AOB, ammonia-oxidizing archaea (AOA), NOB, denitrifiers, and anaerobic ammonium-oxidizing (anammox) bacteria (Wagner and Loy 2002). Figure 1 shows the relative abundances of these groups at the genus level in AS and biofilm. For nitrifiers, the metagenomic iTags indicated that the major AOB was Nitrosomonas, which was 3.3-fold more abundant in AS than in biofilm. In contrast, the 16S rRNA amplicon sequences gave a conflicting result, detecting Nitrosomonas only in biofilm and not in AS (Fig. 1). NOB were much more abundant than AOB in both samples, and the major NOB in the aerobic reactor was Nitrospira. Phylogenetic analysis supported this result: most NOB OTUs in the AS (∼82 %) and biofilm (∼70 %) samples clustered with Nitrospira moscoviensis (cluster I), Candidatus Nitrospira defluvii (cluster V), or Nitrospira marina (cluster VI) rather than with other NOB genera (Fig. 2a).

Fig. 1

Relative abundance of 16S metagenomic iTags (a) and amplicon 454 reads (b) of AOB (yellow), NOB (green), and denitrifiers (purple) at the genus level in AS and biofilm, based on LCA assignment against Greengenes. The data were visualized with Circos (v 0.62-1) (color figure online)

Fig. 2

Phylogenetic tree of NOB OTUs (a) and their abundances in different clusters based on similarities between OTUs and references (b). Red and blue dots indicate OTUs found only in AS and only in biofilm, respectively; purple dots indicate OTUs shared by both samples. Black and empty dots at the nodes indicate bootstrap values of >50 % and >90 %, respectively. The scale bar represents the number of nucleotide substitutions per site. The numbers and percentages in a give the abundance and relative abundance of 16S rRNA gene sequences in each cluster. The 16S rRNA genes were aligned with MUSCLE, and the tree was generated by the neighbor-joining method with 1000 bootstrap replicates in MEGA (v 5.05). In b, the reference for clusters I, II, and III was Nitrospira moscoviensis NSP M-1, and the reference for clusters IV and V was Candidatus Nitrospira defluvii (color figure online)

For denitrifiers, AS showed higher diversity and abundance, with several dominant genera such as Dechloromonas, Hyphomicrobium, Paracoccus, Rhodobacter, and Zoogloea, whereas in biofilm only Burkholderia, Hyphomicrobium, and Rhodobacter were relatively abundant, indicating that denitrifiers in biofilm were less abundant and less diverse than in AS (Fig. 1 and Table S3).

Functional genes related to nitrogen cycle in AS and biofilm

The bacterial nitrogen metabolic network containing 51 enzymes was constructed (Fig. 3). Several enzymes, e.g., glutamine synthetase (96 %), glutamate synthase (75 %), and aminomethyltransferase (72 %), are encoded by most bacteria, suggesting that they play essential roles in nitrogen metabolism in the domain Bacteria. The key nitrogen cycle enzymes, e.g., ammonia monooxygenase (AMO), hydroxylamine oxidoreductase (HAO), nitrite oxidoreductase (NXR), nitrate reductase (NAR), nitrite reductase (NIR), nitric oxide reductase (NOR), nitrous oxide reductase (NOS), and nitrogenase (NIF), were also included in the network. However, these enzymes were found in far fewer bacteria, especially the nitrification enzymes AMO (0.69 %), HAO (1.1 %), and NXR (0.17 %) (Fig. 3).

Fig. 3

KEGG network of nitrogen metabolism (a) and the nitrogen cycle (b). The nitrogen metabolism network was constructed from 1167 bacterial species (nonredundant) in the KEGG database. The network contains 80 nodes in total, including 51 enzymes, 24 compounds, and 5 connecting pathways. Statistical analysis was performed by two-tailed G-test in STAMP, and q-values were corrected with the Benjamini-Hochberg FDR approach. *** indicates q < 10⁻¹⁵; ** indicates 10⁻¹⁵ < q < 10⁻¹⁰; * indicates 10⁻¹⁰ < q < 10⁻²

The key nitrogen cycle genes were identified by comparing ORFs with the NCBI nr database (Figs. 3 and 4). For AMO, the abundances of mapped reads were relatively low in both samples (less than 0.5 % of the reads mapped to nitrogen cycle genes), and HAO genes showed a similar pattern. The ORF annotation showed that the major AMO and HAO genes belonged to two closely related genera, Nitrosomonas and Nitrosospira (Fig. 4). For NOB, all detected NXR genes belonged to Candidatus Nitrospira defluvii (Fig. 4), and STAMP analysis showed that NXR genes were significantly more abundant in biofilm than in AS (Fig. 3). For denitrification, several functional genes, including NAR, NIR (NO-forming), and NOR, were also significantly more abundant in biofilm than in AS (Fig. 3).
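The significance tests behind these comparisons (two-tailed G-test with Benjamini-Hochberg correction, per the Fig. 3 caption) can be sketched for a single gene as a 2 × 2 table of reads assigned to that gene versus all other reads in each sample. This minimal sketch uses the uncorrected G statistic with a chi-square (1 df) p-value; STAMP also offers continuity-corrected variants, so its exact values may differ. The counts in the tests are hypothetical.

```python
import math


def g_test_2x2(a, b, c, d):
    """Two-tailed G-test of independence for the 2x2 table [[a, b], [c, d]].

    a/b: reads of one gene / all other reads in sample 1; c/d: the same in sample 2.
    Returns (G, p), with p from a chi-square distribution with 1 degree of freedom.
    """
    table = [[a, b], [c, d]]
    n = a + b + c + d
    rows = [a + b, c + d]
    cols = [a + c, b + d]
    g = 0.0
    for i in range(2):
        for j in range(2):
            obs = table[i][j]
            if obs > 0:  # zero cells contribute nothing to G
                exp = rows[i] * cols[j] / n
                g += obs * math.log(obs / exp)
    g = max(2 * g, 0.0)  # guard against tiny negative values from rounding
    p = math.erfc(math.sqrt(g / 2))  # survival function of chi-square (1 df)
    return g, p


def benjamini_hochberg(pvalues):
    """Benjamini-Hochberg FDR-adjusted q-values, returned in the input order."""
    m = len(pvalues)
    order = sorted(range(m), key=lambda i: pvalues[i])
    q = [0.0] * m
    running_min = 1.0
    for rank in range(m, 0, -1):  # walk from the largest p-value down
        i = order[rank - 1]
        running_min = min(running_min, pvalues[i] * m / rank)
        q[i] = running_min
    return q
```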

Fig. 4

Heat map of functional microorganisms in the nitrification and denitrification processes, based on reads mapped to the annotated ORFs. Fifty-five bacterial genera containing at least one of the relevant proteins are displayed; minor genera with relative abundance below 0.5 % were combined and are also shown. NIR here includes only nitric oxide-forming genes (EC 1.7.2.1). The neighbor-joining phylogenetic tree (1000 bootstraps) on the left was constructed from full-length 16S rRNA gene reference sequences downloaded from the Greengenes database or NCBI and drawn with MEGA (v 5.05)

Further, the 919 assembled ORFs related to nitrogen removal (470 in AS and 449 in biofilm) were compared with all genes deposited in the NCBI nr database via BLAST. About 49 % (453 ORFs) of these ORFs from the AS and biofilm metagenomes were more than 75 % identical to genes in the NCBI nr database, whereas 7.6 % (70 ORFs) had less than 50 % identity to any known protein (Table 1).
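The identity classes reported above can be reproduced by binning each ORF's best-hit identity. In this sketch, the class boundaries (>75 %, 50–75 %, <50 %) follow the text, with values of exactly 50 % or 75 % assumed to fall in the middle class. The test uses the reported counts (453, 70, and the remaining 396 of 919 ORFs) with placeholder identity values.

```python
def identity_classes(identities):
    """Share (%) of ORFs whose best-hit identity is > 75 %, 50-75 %, or < 50 %."""
    n = len(identities)
    high = sum(pid > 75 for pid in identities)
    low = sum(pid < 50 for pid in identities)
    mid = n - high - low  # boundaries 50 and 75 fall in the middle class
    return {">75%": 100 * high / n, "50-75%": 100 * mid / n, "<50%": 100 * low / n}


# Reported counts, with placeholder identities standing in for each class
ORF_IDENTITIES = [80.0] * 453 + [60.0] * 396 + [40.0] * 70  # 919 ORFs in total
shares = identity_classes(ORF_IDENTITIES)  # about 49.3 %, 43.1 %, 7.6 %
```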

Table 1 Identities of the nitrogen removal genes in AS and biofilm samples

Bins extracted from AS and biofilm metagenomes

Based on coverage and TNFs, 12 bins with partial to near-complete integrity and relatively low contamination were extracted from the AS and biofilm metagenomes (Fig. 5). Among them, three pairs of bins (Bin1 and Bin5, Bin2 and Bin4, and Bin9 and Bin11) had similar taxonomic assignments but belonged to different species according to phylogenetic analysis based on ORF annotation (Ishii et al. 2013) (Table 2). The genomic and functional similarity within these pairs was therefore analyzed (Fig. S2). According to the ANI estimates, all paired bins belonged to different species under the 95 % ANI cutoff (Richter and Rosselló-Móra 2009); the comparison between the two Bacteroidetes bins, however, was based on only 10 % of the aligned genome (ANI of 63 %). For functional comparison, although most SEED functions were comparably represented in the paired bins, STAMP analysis showed that several metabolic subsystems differed significantly (Fig. S2), such as “Carbohydrate,” “Protein metabolism,” “Nitrogen metabolism,” and “Metabolism of aromatic compounds,” indicating that these bins may play different roles in removing carbohydrates, nitrogen, and organic pollutants in the WWTP.

Fig. 5

Genomic binning using the metagenome coverage of AS and biofilm samples (a), heat map of nitrification and denitrification genes in the extracted bins (b), and gene context of NAR gene clusters in contigs of Bin2, Bin4, and Bin9 (c). Each circle in a represents an assembled contig, scaled by the square root of its length and colored by the phylum identified using essential single-copy genes. In c, the NAR gene cluster of Gordonia bronchialis DSM 43247 was used as the reference for Bin2 and Bin4, because it was their closest reference genome (Table 2). For Bin9, the closest reference genome, Mycobacterium smegmatis str. MC2 155, does not contain a NAR gene cluster

Table 2 Evaluation and phylogenetic analysis of extracted bins

The abundances of nitrification and denitrification genes in the 12 bins were estimated. No nitrification genes were found in these bins (Fig. 5b), whereas abundant denitrification genes were found in most of them. In addition, several bins contained NAR gene clusters, e.g., contig_453 in Bin2, contig_50 in Bin4, and contig_2798 in Bin9 (Fig. 5c). The NAR operons in Bin2 and Bin4 were organized much like that of Gordonia bronchialis DSM 43247. However, the NAR operon in Bin9 was arranged differently, with narI replaced by narX (Fig. 5c). This result suggests that NAR operon organization may vary among denitrifiers.

Metagenomic evidence for the presence of Nitrospira-like comammox

Nitrospira-like comammox bacteria capable of complete ammonia oxidation were reported recently (Daims et al. 2015; van Kessel et al. 2015). To investigate comammox in the aerobic reactor, we annotated the ORFs from AS and biofilm against the NCBI database and found two putative comammox amoA genes (AS.ORF_162729_2 and BF.ORF_167593_2). Both were 89 % identical to a methane monooxygenase (pmoA) gene, and they were 91 % and 90 % identical, respectively, to the corresponding amoA gene of Candidatus Nitrospira inopinata (Fig. 6). Only 296 and 96 reads mapped to these ORFs in the AS and biofilm metagenomes, respectively, indicating low abundances (<0.1 %) in both samples. Meanwhile, another AS ORF (AS.ORF_163252_2) was affiliated with the amoA genes of ammonia-oxidizing bacteria, with 88 % identity to the Nitrosomonas sp. amoA gene (AF327918).

Fig. 6

Phylogenetic tree of amoA and pmoA gene sequences. Red dots indicate the putative amoA genes in the assemblies from the AS and BF metagenomic data sets; green dots indicate reported Nitrospira-like comammox amoA genes from comammox draft genomes. Black and empty dots at the nodes indicate bootstrap values of >50 % and >90 %, respectively. The scale bar represents the number of nucleotide substitutions per site. The amoA and pmoA ORFs and genes were aligned with MUSCLE, and the tree was generated by the neighbor-joining method with 1000 bootstrap replicates in MEGA (v 5.05) (color figure online)

Discussion

For AOB, the metagenomic iTags indicated that the major AOB was Nitrosomonas and that its relative abundance was higher in AS than in biofilm (Fig. 1), consistent with previous studies that found Nitrosomonas to be the dominant AOB in AS (Koops et al. 1991; Layton et al. 2005; Park and Noguera 2004) and more AOB in suspended than in attached biomass (Kim and Kwon 2011). In contrast, the 16S rRNA amplicons detected Nitrosomonas only in the biofilm sample. We examined the AS Nitrosomonas 16S iTags with the LCA algorithm in MEGAN and found that they belonged to two species, Nitrosomonas sp. JL21 and Nitrosomonas sp. Nm51. For biofilm, the Nitrosomonas-like iTags and amplicons could not be assigned to any known Nitrosomonas species by BLASTN against either the Greengenes or the SILVA SSU database with an E-value cutoff of 1 × 10⁻²⁰, suggesting that the biofilm may contain novel Nitrosomonas species.

For NOB, Nitrospira was identified as the major NOB in the AS and biofilm samples (Fig. 1), in agreement with previous studies (Alawi et al. 2009; Ehrich et al. 1995; Robinson et al. 2003; Watson et al. 1986), and the phylogenetic analysis confirmed this result. Interestingly, the NOB OTUs in clusters II, III, and IV, which accounted for about 18 % and 30 % of the NOB 16S rRNA gene sequences in AS and biofilm, respectively, did not cluster closely with any reference in the tree, as their similarities to the closest references were only 93–96 % (Fig. 2b). This suggests that these OTUs represent new species in the genus Nitrospira, given the widely used cutoff of 98.7–99 % similarity for defining prokaryotic species (Stackebrandt and Ebers 2006). In the clusters containing novel NOB species, besides the 13 OTUs shared by biofilm and AS (purple dots in Fig. 2a), 13 OTUs occurred exclusively in biofilm (blue dots) and 4 exclusively in AS (red dots), indicating that novel NOB species (similarity <97 %) were more diverse in biofilm than in AS.

It is well accepted that applying biofilm can enhance nitrogen removal in WWTPs, partly because biofilm can effectively retain slow-growing nitrifiers (Chung et al. 2007). However, the nitrifiers in biofilm, especially AOB, did not reach significantly higher abundance than in AS (Fig. 1). This might be due to the high oxygen requirement of nitrification, normally 4.6 g O₂ per 1.0 g NH₄⁺-N (Paredes et al. 2007), because nitrifiers in AS can access dissolved oxygen (DO) more easily than those in dense biofilm. For denitrification, the DO concentration gradients in the biofilm matrix may favor denitrifiers, as most denitrifiers prefer anoxic environments (Zhu et al. 2008). However, the results showed that denitrifiers in biofilm were also less abundant and less diverse than in AS (Fig. 1 and Table S3). Similar results were obtained by FISH (Hibiya et al. 2003), which showed that denitrifiers were mainly distributed on the outside of the biofilm and in the AS. Two factors may explain this. First, denitrification requires adequate organic carbon (2.9 g chemical oxygen demand (COD) per 1.0 g NO₃⁻-N) (Zhu et al. 2008), so denitrifiers may accumulate in AS and on the biofilm surface, where more organic carbon is accessible. Second, some denitrifiers can reduce nitrate under aerobic conditions and may prefer the more oxygenated AS, e.g., members of the genera Paracoccus (Baumann et al. 1996) and Thauera (Scholten et al. 1999), consistent with the observation that Paracoccus was abundant only in AS (Fig. 1). Considering nitrifiers and denitrifiers together, the abundance and diversity of nitrogen removal bacteria in biofilm did not surpass those in AS.
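The two stoichiometric ratios quoted in this paragraph can be recovered from mass and electron balances. This worked derivation assumes complete conversion and ignores nitrogen and carbon incorporated into new biomass, which in practice typically lowers the oxygen demand of nitrification and raises the COD requirement of denitrification. Nitrate nitrogen (oxidation state +5) accepts 5 electrons on reduction to N₂, and 1 mol of electrons corresponds to 8 g COD because 32 g O₂ accepts 4 mol of electrons.

```latex
\begin{align*}
\mathrm{NH_4^+} + 2\,\mathrm{O_2} &\rightarrow \mathrm{NO_3^-} + 2\,\mathrm{H^+} + \mathrm{H_2O}\\
\frac{2 \times 32\ \mathrm{g\ O_2}}{14\ \mathrm{g\ N}} &\approx 4.57\ \mathrm{g\ O_2}\ \text{per g}\ \mathrm{NH_4^+}\text{-N}\\[4pt]
\mathrm{NO_3^-} \rightarrow \tfrac{1}{2}\,\mathrm{N_2}:&\quad 5\ \mathrm{mol\ e^-}\ \text{per mol N},\qquad 1\ \mathrm{mol\ e^-} \equiv \tfrac{32}{4} = 8\ \mathrm{g\ COD}\\
\frac{5 \times 8\ \mathrm{g\ COD}}{14\ \mathrm{g\ N}} &\approx 2.86\ \mathrm{g\ COD}\ \text{per g}\ \mathrm{NO_3^-}\text{-N}
\end{align*}
```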

In the present study, a nitrogen metabolic network for the domain Bacteria was constructed, to our knowledge for the first time, from 1167 nonredundant bacterial species and used to evaluate the nitrogen cycle enzymes in the AS/biofilm system. According to this network (Fig. 3), only a small fraction of bacteria encode nitrification enzymes, consistent with the fact that only a few bacterial genera play important roles in ammonia and nitrite oxidation (Purkhold et al. 2000). Several significant differences emerged when the AS/biofilm system was compared with the network. For instance, very few bacteria encode NXR, NAR (EC 1.7.1.1), and NIR (EC 1.7.7.1), whereas the related genes were relatively abundant in the two samples. For NIF (EC 1.19.6.1), although 24 % of bacteria encode this enzyme, its genes were not detected in the two samples. Together, these results suggest that the AS/biofilm system has distinctive features in its nitrogen cycle.

According to the analysis of nitrogen cycle genes, the abundances of AMO and HAO genes were relatively low in the AS and biofilm samples (Figs. 3 and 4). This was consistent with the taxonomic results, which showed that AOB were of limited abundance (Fig. 1), and suggests that ammonia oxidation may not be vigorous even with biofilm present in the reactor. The ORF annotation assigned the AMO and HAO genes to Nitrosomonas and Nitrosospira (Fig. 4), in agreement with previous knowledge that most AOB identified in WWTPs belong to or are closely related to Nitrosomonas (Layton et al. 2005). Ammonia-oxidizing archaea, which are abundant in nature and play significant roles in the global nitrogen cycle (Prosser and Nicol 2008), were not identified in biofilm or AS by either 16S rRNA genes or AMO ORFs, indicating that they were not major nitrification players in this aerobic reactor, consistent with previous reports (Yu and Zhang 2012).

Interestingly, Nitrospira, which is generally considered a NOB (Ehrich et al. 1995; Lücker et al. 2010; Watson et al. 1986), contained abundant denitrification genes (Fig. 4). This suggests that Nitrospira may also play significant roles in denitrification, since the NXR of NOB may work in reverse under anaerobic conditions (Starkenburg et al. 2008), possibly in the deeper layers of the biofilm, and NAR/NIR genes are present in Nitrospira (Lücker et al. 2010). Considering nitrification and denitrification genes together, the abundance and diversity of nitrogen cycle genes were significantly higher in biofilm than in AS (Figs. 3 and 4). This differs from the taxonomic results, which showed that denitrifiers in biofilm were not more abundant than in AS (Fig. 1). The difference indicates that nitrogen removal bacteria in biofilm carried more abundant or diverse nitrification and denitrification genes, suggesting that the improved nitrogen removal achieved by applying biofilm in WWTP reactors may be attributed mainly to this richer functional gene content rather than to greater biomass accumulation of nitrogen removal bacteria.

According to the identities of nitrogen removal genes obtained from BLAST searches of ORFs against the NCBI nr database, 7.6 % of the ORFs had less than 50 % identity to any known protein (Table 1), suggesting that nitrogen removal bacteria in the AS and biofilm samples may carry novel functional genes (Hess et al. 2011). More importantly, for several key genes, e.g., hao, narJ, nirK, and norC, a relatively high proportion (10 to 25 %) of the assembled ORFs were novel (<50 % identity) (Table 1). Moreover, several extracted bins, e.g., Bin1, Bin5, and Bin8, also contained novel functional genes (<50 % identity) related to denitrification (Table S2). In Bin8 in particular, two of the four NAR ORFs were novel and the average identity of the four NAR ORFs was only 52 %, suggesting that this bin may represent a novel denitrifier. Together, these results imply that our understanding of nitrification and denitrification in WWTPs is still limited, even though nitrogen removal in various WWTP reactors has been studied intensively for years.

Based on coverage and TNFs, 12 bins were extracted from the AS and biofilm metagenomes. Most bins had comparable and relatively high coverage in both metagenomes (Table 2 and Fig. 5a), again showing that the dominant bacteria coexisted in the AS and biofilm of this aerobic reactor. Two Bacteroidetes bins (Bin1 and Bin5) had much higher coverage in AS (∼64 for Bin1 and ∼33 for Bin5) than in biofilm (∼0.06 for both), suggesting that these Bacteroidetes populations prefer AS. They may originate mainly from human feces, in which this phylum is dominant (Eckburg et al. 2005). Based on the ANI evaluation, all 12 extracted bins may belong to novel bacterial species, as the ANI values between the bins and their closest references ranged from 63 % to 81 %, well below the prokaryotic species cutoff of 95–96 % ANI (Richter and Rosselló-Móra 2009).

No nitrification genes were found in the 12 extracted bins (Fig. 5b), for two possible reasons. First, very few bacteria carry nitrification genes in either natural or engineered ecosystems (Purkhold et al. 2000); only 0.69 %, 1.1 %, and 0.17 % of the bacteria in the KEGG collection contain genes for AMO, HAO, and NXR, respectively (Fig. 3). Second, although a NOB bin (Bin12) belonging to the phylum Nitrospirae was extracted in the present study, it lacked NXR genes, probably because the extracted genome was incomplete (52 % integrity, Table 2). For denitrification genes, abundant ORFs were found in most bins (Fig. 5b), further suggesting a high denitrification potential in this hybrid biofilm and suspended growth reactor, consistent with the observations in Fig. 4.

Furthermore, we found metagenomic evidence for the presence of complete ammonia oxidizers in the AS and biofilm samples. The two Nitrospira-like amoA genes clustered with that of the comammox bacterium Candidatus Nitrospira nitrosa (van Kessel et al. 2015), but were present at very low abundance. Previously reported comammox were observed or enriched under low substrate concentrations, which was not the case in this study. Therefore, comammox are unlikely to play an important role in nitrogen removal in this WWTP, although their discovery has changed our perspective on the nitrogen cycle. The metagenomes reveal the functional potential of the novel Nitrospira-like amoA genes but cannot verify their activity, which requires further confirmation by RNA-based metatranscriptomics.

In summary, metagenomic and PCR-based amplicon sequencing data were generated from AS and biofilm samples to evaluate the diversity and functions of nitrogen removal bacteria in suspended and attached biomass in a full-scale wastewater treatment reactor. Taxonomic and functional analyses showed that biofilm contained more nitrification and denitrification genes than AS. Notably, further investigation revealed that biofilm harbored higher proportions of novel NOB species (30 % vs 18 %) and novel nitrogen removal-related genes (9.4 % vs 6.0 %) than AS. Moreover, the identification of Nitrospira-like amoA genes provided metagenomic evidence for the presence, at low abundance, of comammox. These findings have important implications for our understanding of biological nitrogen transformation in wastewater treatment.