Introduction

Nitrification is a vital link between aerobic and anaerobic pathways in the biogeochemical cycle of nitrogen (Kraft et al. 2014; Koch et al. 2015). So far, three kinds of functional groups of nitrification microbes have been discovered. Ammonia-oxidizing microorganisms (AOM), including ammonia-oxidizing bacteria (AOB) and ammonia-oxidizing archaea (AOA), catalyze the first step of nitrification, transforming ammonia to nitrite via hydroxylamine. The second step, oxidizing nitrite to nitrate, is executed by the group of nitrite-oxidizing bacteria (NOB). Interestingly, since the pioneering studies by Winogradsky more than a century ago (Winogradsky 1892), the two nitrification steps have been widely thought to be performed by these two distinct groups, until the third kind group was discovered (Daims et al. 2015; van Kessel et al. 2015; Santoro 2016). This group can completely oxidize ammonia to nitrate, named complete ammonia oxidizers (comammox). Detected comammox have been divided into clade A and clade B comammox Nitrospira (COM-A and COM-B). However, with the same substrates and products, the nitrification reaction in different communities is catalyzed by phylogenetically distinct enzymes. The key nitrification-related ammonia monooxygenase genes (amo) and hydroxylamine oxidoreductase genes (hao) of comammox are different from homologs of other groups, which could be reliable markers of comammox (Daims et al. 2015; van Kessel et al. 2015; Daims et al. 2016). Comammox nitrite oxidoreductase genes (nxr) are highly similar to the genes of some strictly nitrite-oxidizing Nitrospira (Daims et al. 2015; van Kessel et al. 2015; Daims et al. 2016).

Recent advances in comammox research not only dramatically changed our perspective on nitrification but also challenged our technology and method to bridge the knowledge gaps (Daims et al. 2016). Comammox, as newly discovered nitrifiers, proved to be widespread in both engineering and natural environments. Previous studies have reported the assembling of complete or near-complete genomes of comammox Nitrospira from a moderately thermophilic biofilm that grew on the metal surface of a pipe (Daims et al. 2015), trickling filter connected to recirculation aquaculture system (van Kessel et al. 2015) and drinking water system (Pinto et al. 2015; Wang et al. 2017b). Key nitrification gene sequences obtained from metagenome data confirmed the presence of comammox in groundwater-fed rapid sand filter (Palomo et al. 2016), full-scale hybrid biofilm, and activated sludge reactor (Chao et al. 2016). Additionally, community analysis based on PCR assays detected comammox in environments like coastal water, river sediments, brackish lake sediments, rice paddy soils, forest soils, and freshwater recirculating aquaculture system (Bartelme et al. 2017; Pjevac et al. 2017; Wang et al. 2017a). Moreover, forest soils with pH ranging from 4.0 to 9.0 all showed the potential of comammox (Hu and He 2017). Even the fine particulate matter in atmosphere could be a habitat for comammox (Gao et al. 2016a). The ubiquitous inhabitation of comammox indicates their potential contribution to nitrification in diverse ecosystems.

Comammox not only have been discovered extensively in various environments but also have been reported to be universally coexisted with AOA, AOB, and strictly NOB. Therefore, community analysis is of great significance for profound understanding of the coexistence in microbial ecology. However, analysis for comammox diversity and abundance was not abundant like other nitrifiers. For example, such information of AOM and NOB in different ecological systems has been widely examined by PCR experiments with corresponding primers targeting 16S rRNA genes or functional genes (Zheng et al. 2013; Kandel et al. 2014; Reeve et al. 2016). Relationships of canonical nitrifying community structure and abundance with environmental factors were also broadly identified (Liu et al. 2017). In contrast, knowledge about the community diversity of comammox and associations with environmental factors is much limited. Study of groundwater-fed rapid sand filter microbial communities showed that comammox amoA genes were the most abundant compared with classical amoA genes of AOA and AOB in the gene catalogs by metagenomics (Palomo et al. 2016). Experiments with highly degenerate primers targeting copper-containing membrane-bound monooxygenase genes showed relatively high content of comammox bacteria compared with AOB among most of the samples (Wang et al. 2017a). However, given the limitation of metagenomics and amplification PCR assays in efficient and accurate quantification, the abundance of comammox bacteria needs to be confirmed by other methods such as comammox-specific quantitative real-time PCR (Gomez-Alvarez et al. 2009; Ushio et al. 2014; Daims et al. 2016; Wang et al. 2017a). Primers targeting amoA of both COM-A and COM-B were designed and quantitative real-time PCR assays were performed in engineering and natural environments; however, overestimation may occur due to the possible non-specific amplification (Pjevac et al. 2017).

Having been ignored for a long time, comammox presently become the research focus. Here, we detected the COM-A in enrichment culture of tidal sediments for the first time. Both metagenome and metatranscriptome technology were applied to identify the metabolic potential and expression patterns of comammox. Given that estuary and coastal areas are a global hotspot for the nitrogen cycle (Baker et al. 2015; Loken et al. 2016; Osburn et al. 2016; Hou et al. 2018) and nitrification is a key step in nitrogen transformation processes (Lam et al. 2009; Koch et al. 2015), comammox discovered in such sites indicated their significance in natural environments. For further study from enrichment culture back to natural environments, a specific primer set was designed using COM-A amoA gene sequences obtained from metagenome and metatranscriptome data in this study and published database, thus providing a good method to re-evaluate the ecological roles of nitrifiers. Overall, this study provides new insight into characteristics of comammox diversity and abundance compared with canonical AOM and strictly NOB in natural environments and potential correlations of environmental variables with comammox diversity.

Materials and methods

Enrichment and cultivation of nitrifying bacteria

In the enrichment and cultivation, a 4.0-L bioreactor, equipped with pH, temperature, dissolved oxygen, and liquid level sensors, was prepared. After sterilization of the reactor tank, silicone tubes, and artificial medium (composition described below), the bioreactor was inoculated with about 50 g fresh sediments from the tidal flat of the Yangtze Estuary (Fig. S1), supplied with the medium at the rate of 1.4 mL/min and operated at the stirring rate of 250 rpm. The system pH was maintained at 7.49 ± 0.09 using a 100 g/L KHCO3 solution, and temperature was kept at 26.9 ± 0.8 °C by a heating pedestal and cooling water circulation system. Membrane filtrated air was flushed into the reactor liquid to ensure the dissolved oxygen concentration at 94.3 ± 3.6% relative to oxygen saturation level. Supernatant was removed through membrane when the volume is beyond 4.0 L. The reactor tank was covered to keep dark all the time. Artificial medium was prepared with pure water and contained 0.147 g/L CaCl2·2H2O, 1.72 g/L NaCl, 0.58 g/L MaSO4·7H2O, 0.054 g/L KH2PO4, and 1 mL/L of trace element solutions 1 and 2, respectively. The concentration of NH4Cl in medium increased gradually from 9 to 36 mg/L. Trace element solution 1 (per liter) contained 5 g EDTA and 5 g FeSO4. Trace element solution 2 (per liter) contained 15 g EDTA, 0.1 g ZnSO4·7H2O, 0.002 g CoCl2·6H2O, 0.2 g MnCl2·4H2O, 0.02 g CuSO4·5H2O, 0.1 g Na2MoO4·2H2O, 0.05 g H3BO4, and 0.1 g NiCl·6H2O. After centrifugation (4000 rpm at room temperature for 5 min) of the liquid which was sampled from the reactor and discard of the supernatant, reactor biomass was prepared for downstream nucleic acid extraction. Concentrations of ammonium, nitrite, and nitrate in the bioreactor were detected by photocolorimetric method (Hou et al. 2007).

DNA extraction, sequencing, and metagenomic analysis

Total genomic DNA of triplicate reactor biomass with about 1-month cultivation was extracted using PowerSoil®DNA Isolation Kit (MO BIO, USA). One mixed DNA sample was prepared for library construction. DNA was sheared with Covaris S220 Focused Ultrasonicator to create 350-bp fragments randomly. The fragments were end-polished, poly-A tailed, and ligated with the full-length adaptor for Illumina sequencing. Then the products were purified using bead-size selection for range ~ 400–500 bp with AMPure XP system, and the libraries were analyzed for size distribution by Agilent 2100 Bioanalyzer and quantified using real-time PCR. The clustering of the index-coded samples was performed on a cBot Cluster Generation System according to the manufacturer’s instructions. After cluster generation, the library preparations were sequenced on an Illumina HiSeq platform, and 150-bp paired-end reads were generated. Raw data were pretreated by removing adapters and trimming the reads (threshold quality = 38; minimum length = 40). After quality control of FastQC (Baraham Bioinformatics), high-quality clean data were assembled to scaftigs with MEGAHIT (Li et al. 2015), and scaftigs with more than 500 bp (Mende et al. 2012; Nielsen et al. 2014) were retained and applied for open reading frame (ORF) prediction with MetaGeneMark (Mende et al. 2012; Karlsson et al. 2012). Sequences with length less than 100 nt were removed (Shinichi et al. 2015). Retained sequences were simplified using CD-HIT with 95% identity and 90% coverage (Nielsen et al. 2014). Finally, SoapAligner was used to filter ORFs with supported read numbers less than 2 (Fu et al. 2012). Retained ORFs, called unigenes, were then used for taxonomic and functional annotation. Gene abundance analysis was based on mapping read number and gene length (Cotillard et al. 2013). Taxonomic analysis was performed by conducting BLASTP against NCBI nr database using software DIAMOND (Buchfink et al. 2014) with a maximum e value cutoff of 10−5, and software MEGAN (Huson et al. 2011) with LCA (lowest common ancestor) algorithm was used for taxonomic annotation. Software DIAMOND was also used for functional annotation by BLASTP with a maximum e value cutoff of 10−5 (Li et al. 2014) against Kyoto Encyclopedia of Genes and Genomes (KEGG) database (Kanehisa et al. 2013). The highest scoring annotated hit(s) (one high-scoring segment pair (HSP) > 60 bits) were selected for annotation (Qin et al. 2012). Compared with the reference of published genomes, the software of checkM was used to estimate the genome completeness (Parks et al. 2015), and the Soap.coverage package was used to calculate genome coverage (Jiang et al. 2015) with nucleotide sequence identity cutoff of 80%, e value < 10−5, and minimum length of 100 bp. With the reference of function annotation, key nitrification-related sequences were aligned manually by BLASTP at the nr database for inspection.

RNA extraction, sequencing, and metatranscriptomic analysis

After 1-month cultivation, triplicate liquid samples were collected from the reactor using clean syringe at the same time. About 45 mL liquid was added in each RNase-free tube. Total RNA of reactor biomass was extracted using EZNA® Soil RNA kit (Omega Bio-tek, Norcross, GA, USA). After removal of rRNA using Ribo-Zero Magnetic Kit (Epicenter, Charlotte, NC, USA), resulting enriched RNA samples were mixed and went through library preparation by TruSeq™ RNA Sample Prep Kit (Illumina, San Diego, CA, USA) and were treated with cBot Truseq PE Cluster Kit v3-cBot-HS (Illumina, USA) for downstream RNA-Seq analysis (Illumina Hiseq, Truseq SBS Kit v3-HS, 200 cycles). Original raw data quality was visualized using FastQC (Andrews 2010) and then cleaned and trimmed using a minimum phred score of 20, a minimum length of 50 bp, allowing no ambiguous nucleotides by Seqprep Sickle (https://github.com/najoshi/sickle) and Sickle (https://github.com/najoshi/sickle). Then rRNA reads were trimmed off by SortMeRNA software after alignment against SILVA SSU (16S/18S) and SILVA LSU (23S/28S) databases (Pruesse et al. 2007). High-quality paired-end reads, named clean data, were assembled using Trinity (Haas et al. 2013) to get transcript sequences (minimum length 300 bp). ORFs were predicted in the assembled sequences using TransGeneScan (Grabherr et al. 2011), and non-redundant gene catalog was constructed using CD-HIT (Cluster Database at High Identity with Tolerance) with 95% identity and 90% coverage (Fu et al. 2012). Gene expression was then evaluated with the value of FPKM (fragments per kilobase of transcript, per million fragments sequenced) (Trapnell et al. 2010) which is computed using RSEM (RNA-Seq by Expectation-Maximization) (Li and Dewey 2011). Taxonomy annotation of the catalog was obtained using BLASTP at NR database with a maximum e value cutoff of 10−5 and at KEGG database for functional annotation. With the reference of function annotation, key nitrification-related transcript sequences were also aligned manually by BLASTP at the nr database for inspection.

Phylogenetic analysis of key amino acid sequences

To analyze the phylogenetic characteristics of comammox enriched from tidal sediments, amino acid sequences of AmoA/B/C, Hao, and NxrA/B from metagenome and metatranscriptome data in this study and reference sequences from NCBI GenBank and Integrated Microbial Genomes databases (IMG-ER and MER) were imported to MEGA software (v 5.03) to construct neighbor-joining phylogenetic tree with a bootstrap value of 1000.

Primer design and evaluation

Due to the close relationships of 16S rRNA, nxrA, and nxrB between comammox and strictly nitrite-oxidizing Nitrospira (Daims et al. 2015; Pinto et al. 2015; van Kessel et al. 2015), functional marker gene of amoA, which is obviously distinguished from homologs of canonical AOM and NOB, is suitable for application in diversity analysis. COM-A, COM-B, and betaproteobacterial ammonia-oxidizing bacteria (β-AOB) amoA sequence databases were constructed using sequences from NCBI Genebank and Integrated Microbial Genomes databases (IMG-ER and IMG-MER) and sequencing data in this study. A BLASTP search was performed in NCBI using one COM-A amoA sequence obtained in this study as a query, word size = 2, BLOSUM 45, E = 10 (Daims et al. 2015), and returned sequences (identity > 80%) were downloaded. Comparable procedures were performed to generate a comprehensive set of COM-B and β-AOB amoA sequences. Sequences in IMG-ER and IMG-MER databases were collected by searching with accession numbers listed in other studies (Pinto et al. 2015; Pjevac et al. 2017). Non-objective gene sequences were filtered by phylogenetic analysis. Then complete sequences of amoA clade A (> 800 bp) and clade B (> 850 bp) were used to search cross-species primers by Primer Premier software (version 5.0) (Lalitha 2000). The parameters for the primer design were 17 to 25 nucleotides in length, GC content of 45 to 65%, and product length of 100 to 600 bp. Some mismatched bases were changed into degenerate bases allowing no 3′ end mismatches (Klindworth et al. 2013). R package BioStrings were used to evaluate the primer coverage in silico (Pages 2014; Berry and Gutierrez 2017). COM-A amoA sequences with the primer binding site were collected, and coverages of the primers allowing 0, 1, and 2 mismatches were calculated, respectively (Gaby and Buckley 2014). To evaluate the primer specificity, coverage of the primer pairs in β-AOB and COM-B amoA database was also calculated.

COM-A amoA gene amplification, sequencing, and diversity analysis of environmental samples

To analyze the comammox diversity, the newly designed primer set was then applied to different natural ecosystems. Sediment samples (0–5 cm) were collected using corers from sites located in lake, river, tidal flat, estuary, and coastal sea in January 2017 (Fig. S1). At each site, five sediment cores were collected and homogenized under a nitrogen atmosphere as a composite sample. Total genomic DNA of these samples was extracted as described above. PCR was performed in a total volume of 25 μL containing 12.5 μL 2 × Taq PCR Master Mix (Sangon, China), 0.2 μL each primer (10 μM; Sangon), 10.1 μL double-distilled water, and 2 μL template DNA. After optimization of the annealing temperature, PCRs were carried out with 5 min at 95 °C; 35 cycles of 94 °C for 30 s, 54 °C for 40 s, and 72 °C for 40 s; and 72 °C for 5 min.

PCR products of COM-A amoA genes were electrophoresed on 1% agarose gels and purified with the Gel Advance gel extraction system (Viogene, China), followed by the ligation to vector to construct the clone libraries using the TOPO-TA cloning kit (Invitrogen, USA). About 50 clones per library were selected randomly for sequencing using an ABI Prism genetic analyzer (Applied Biosystems, Canada) and the Big Dye Terminator kit (Applied Biosystems, Canada).

To analyze COM-A diversity in different ecosystems, retrieved sequences were trimmed with DNAstar software package (DNAstar, USA), then clustered into OTUs (operation taxonomic units) using Mothur software (Schloss et al. 2009) (v 1.33.3) with 95% similarity. One representative amplicon for each OTU was extracted by Mothur and imported to MEGA software and translated into amino acid sequences. Together with other protein sequences from NR and IMG databases, neighbor-joining tree with a bootstrap value of 1000 was constructed.

qPCR assays for COM-A, AOA, β-AOB, and Nitrospira

To analyze the abundance of COM-A, AOA, β-AOB, and Nitrospira in different ecosystems, real-time PCR was performed by an ABI Prism 7500 Real-Time PCR System (Applied Biosystems, Canada). The oligonucleotide sequences and annealing temperature of all primers used in this study are shown in Table S1. Detailed methods for qPCR assays are also given in supplementary files.

Statistical analysis

The Chao 1 species richness and Shannon/Simpson indexes of each clone library were calculated using the Mothur program with 95% identity. Community distribution of COM-A was conducted with principal coordinate analysis (PCoA) using Mothur program and PHYLIP version 3.695 based on the distance matrix (Plotree and Plotgram 1989). To assess the spatial variance of COM-A, AOA, and β-AOB assemblages, one-way analysis of variance (ANOVA) was applied using Statistical Package of Social Sciences (SPSS, version 16.0). Redundancy analysis (RDA) was executed in Canoco 4.5 for Windows to clarify the influence of environmental factors on community composition of COM-A and on abundance of functional genes, including COM-A, AOA, and β-AOB amoA and Nitrospira nxrB (Gao et al. 2016b).

NCBI sequence accession numbers

Metagenomic and metatranscriptomic reads of the reactor biomass are available from the NCBI SRA (SRP122254 and SRP122263). Partial COM-A amoA gene sequences (only the unique ones) from natural environments and bioreactor system have been deposited in GeneBank under accession numbers MG591522 to MG591659 and MG591660 to MG591681, respectively.

Results

Basic information of the reactor system

Nitrifying bacteria of the tidal sediments from the Yangtze Estuary (Fig. S1) were enriched to avoid interference of the complex environmental factors and analyze the nitrification microbes more specifically. The enrichment culture was under the condition that the ammonium concentration of the culture medium gradually increased from 0.5 to 2 mM in the first 3 weeks and then kept at the level of 2 mM. After enrichment cultivation for about 1 month, basic detection values showed that ammonium in the reactor maintained at the level of 0 mM, with contents of nitrate about 2.05–2.31 mM and nitrite below the limit of detection, indicating the enhanced ability of the reactor biomass to transform ammonia to nitrate via nitrite. Detection values of the reactor system are shown in Table S2.

Metagenomic evidence for the presence of comammox

To analyze the community composition in the reactor system, metagenome technology was applied. We sequenced whole-community DNA for the reactor biomass, generating 20G bp of high-quality, paired-end sequence data. Reads were de novo assembled and produced 370,413 scaftigs. Prediction of coding sequences resulted in a non-redundant gene catalog of 762,516 open reading frames (ORFs). Detailed sequencing and assembling information and community abundance are shown in Tables S3 and S4. At the phylum level, Nitrospirae was the most abundant population (39% of all the community), followed by the Proteobacteria (27%). At the species level, 16% of reads were annotated as Candidatus Nitrospira inopinata, Candidatus Nitrospira nitrosa, and Candidatus Nitrospira nitrificans (with genome completeness from 66.8 to 95.9%, genome coverage from 25.1 to 55.9%) which were all published COM-A, and 17% were annotated as Nitrospira defluvii and Nitrospira moscoviensis (with the genome completeness from 56.0 to 100%, genome coverage from 14.2 to 60.8%) which were strictly NOB. As for canonical AOM, about 1% of reads mapped to the family of Nitrosomonadaceae (belonging to β-AOB) and about 0.02% of reads mapped to the order of Nitrososphaerales and Nitrosopumilales (belonging to AOA).

Metatranscriptomic evidence for the expression of comammox

Metatranscriptomic data further validate the results in metagenome. About 14G bp paired-end cDNA sequence data were produced in biomass metatranscriptome, and about 43,417 ORFs were predicted. Detailed sequencing and assembling information is shown in Table S3. At the phylum level, Nitrospirae was dominant in the system, which could represent more than 60% of reads, followed by Proteobacteria (about 17%). Incomplete ammonia-oxidizing Nitrososphaera (about 5%) and Nitrosospira (about 3%) were detected with relative low abundance (Table S4). Detailed expression values of key proteins are shown in Table S5. The expression of comammox nitrification pathways peaked in the nitrogen metabolism, followed by strictly NOB and AOA (Fig. 1). Canonical AOB were also present, but the expression of their ammonia-oxidizing enzymes was extremely low, about two orders of magnitude lower than that of comammox-related enzymes. Interestingly, there was exceptionally high expression of comammox amoC (FPKM = 20,832), near sevenfold higher than that of comammox amoA (FPKM = 2975) and amoB (FPKM = 3356), which might be consistent with the multicopy of amoC in one single genome like Candidatus Nitrospira nitrosa (four copies) and Candidatus Nitrospira inopinata (three copies). The expression of AOA amoA/B/C was at the same level but only half as much as that of comammox amoA/B. Additionally, the nxrA/B gene expression of comammox was about two times greater than that of strictly nitrite-oxidizing Nitrospira.

Fig. 1
figure 1

Stack column of key nitrification gene expression. The height of each little column stands for the expression FPKM (fragments per kilobase of transcript, per million fragments sequenced) value of one transcript sequence in the non-redundant gene catalog. Sequences with same annotation are stacked together. The total height of the columns depicts the expression of corresponding functional genes. Color indicates the bacteria group of transcripts. COM, comammox; NOB, nitrite-oxidizing bacteria; AOA, ammonia-oxidizing archaea; AOB, ammonia-oxidizing bacteria; amoA/B/C, ammonia monooxygenase genes subunit A/B/C; hao, hydroxylamine oxidoreductase genes; nirK, nitrite reductase subunit K; nxrA/B, nitrite oxidoreductase genes subunit A/B

Phylogenies of key comammox functional genes

According to the functional annotation, near-complete comammox nitrification-related sequences were detected from both metagenome and metatranscriptome data. Identity information is shown in Fig. 2a and Tables S5 and S6. In this study, 35 comammox-like AmoA/B/C and Hao sequences were found and their identities to the reported comammox homologs ranged from 79 to 99%, to homologs of AOB and AOA ranged from 55 to 71% and 23 to 40%, respectively. Thirty-one comammox-like NxrA/B sequences shared 92 to 100% amino acid identity (an average of 98%) with known comammox counterparts, and 84 to 100% identities (an average of 94%) with strictly NOB homologs.

Fig. 2
figure 2

Sequence identity and phylogenetic analysis of key functional genes. a Identity information of key amino acid sequences. Numbers are the identity range of the best hit of each comammox sequence in this study to the related homologs which are connected by lines. The microorganism names used in each group are listed below. Taxonomy ID of the microorganisms is described in the bracket. Gene sequences of AOB are from genomes of Nitrosomonas halophila (44576), Nitrosomonas europaea ATCC 19718 (228410), Nitrosomonas communis (44574), and Nitrosospira sp. 56-18 (1895901). Sequences of AOA are from genomes of Candidatus Nitrocosmicus oleophilus (1353260), Candidatus Nitrososphaera sp. 13_1_40CM_48_12 (1805054), Candidatus Nitrososphaera evergladensis SR1 (1459636), and Nitrososphaera viennensis EN76 (926571). Sequences of NOB are from genomes of Nitrospira defluvii (330214), Nitrospira moscoviensis (42253), Nitrospira sp. ND1 (1658518), and Nitrospina gracilis (35801). Comammox sequences in this study are from metagenome and metatranscriptome data. Published comammox sequences are from genomes of Candidatus Nitrospira inopinata (1715989), Candidatus Nitrospira nitrificans (1742973), Candidatus Nitrospira nitrosa (1742972), Nitrospirae sp. genome_bin_8 (Ga0074138), Nitrospira sp. ST-bin4 (1978764), Nitrospira sp. SG-bin1 (1827376), Nitrospira sp. SG-bin2 (1827377), and Nitrospira sp. HN-bin3 (1827378). b Phylogenetic tree showing the affiliation of AmoA superfamily. COM-A, clade A comammox Nitrospira; COM-B, clade B comammox Nitrospira. c Cladogram of AmoA sequence groups of COM-A (blue background), COM-B (green background), and β-AOB (pink background), showing the phylogenetic distance between sequences obtained in this study and published sequences. Purple characters represent sequences from metatranscriptome and green characters represent sequences from metagenome. Genebank or IMG accession numbers of the protein sequences are shown in the bracket

The phylogenies of AmoA/B/C, Hao, and NxrA/B protein sequences from metagenome and metatranscriptome data showed the consistency with the identity information. AmoA, a functional and phylogenetic marker gene of ammonia oxidizers, has been widely used for the detection and characterization of AOA, AOB, and comammox (Purkhold et al. 2000; Daims et al. 2015). Phylogenetic radial cladogram showed that AmoA sequences in this study clustered with their corresponding clades (Fig. 2b), in line with the annotation results. The clade of comammox-like AmoA was phylogenetically different from others in the AmoA superfamily and was closest to clade of β-AOB AmoA (Daims et al. 2015). Intriguingly, many environmental clones classified as PmoA fell into the phylogenetic lineage of comammox AmoA and showed a close relationship with Candidatus Nitrospira nitrificans and Nitrospira sp.genome_bin_8 (Fig. 2c), consistent with the results in other reports (Daims et al. 2015; van Kessel et al. 2015; Chao et al. 2016; Wang et al. 2017b). There was an individual branch formed by comammox amoA-like sequences from both metagenome and metatranscriptome data, and other sequences were close to branches of Candidatus Nitrospira nitrosa and Nitrospira sp. SG-bin2, which were discovered from water system (van Kessel et al. 2015; Wang et al. 2017b). Additionally, there were sequences from metagenome data clustered with metatranscriptome sequences (with 99% amino acid identity). Other protein sequences related to ammonia oxidization of comammox also formed a distinct clade and shared close relationship with β-AOB (Fig. S2). Similar clustering patterns were observed for comammox AmoB, AmoC, and Hao, in which some sequences were separated into individual branches while others were related to branches of reported bacteria like Candidatus Nitrospira nitrosa and Nitrospira sp. HN-bin3.

Nxr, a key enzyme for nitrite oxidation, was possessed by Nitrospira (Kuypers 2015), and the subunits of NxrA and NxrB were widely used for phylogenetic analysis of NOB (Attard et al. 2010; Pester et al. 2014). Different from the phylogenies of proteins related to ammonia oxidization, comammox NxrA/B sequences showed close relationships with strictly NOB of Nitrospira moscoviensis (Fig. S3). However, similar to the phylogeny patterns of comammox AmoA/B/C and Hao, comammox NxrA-like sequences also showed many branches, including an individual branch of two metagenome sequences and two branches near Candidatus Nitrospira nitrosa and Nitrospira sp.genome_bin_8. Given the limitation of the assembly data, only two NxrB sequences in this study were used for phylogenetic analysis and clustered with Candidatus Nitrospira nitrificans and Nitrospira sp. HN-bin3.

Evaluation of the new primer set

To specifically analyze comammox community in the natural environments, primers targeting the atypical amoA gene sequences were designed. In total, 82 sequences of COM-A amoA were collected. The product length of COMA-154F/668R primer set was 514 bp, and the degeneracy of each primer was 2 (Table 1). Primers targeting COM-B amoA failed to amplify the genes in our experiments (data not shown). Allowing less than two mismatches, the forward and reverse primers could cover 89 and 100% of sequences in the COM-A amoA database, respectively (Table 1), with no sequences of β-AOB and COM-B amoA genes covered. During the COM-A amoA gene amplification and qPCR assays of samples from the bioreactor system and natural environments, PCR product gel image showed clear band and qPCR melting curves showed single Tm peaks, indicating the high specificity of the primers (Figs. S4 and S5). Three hundred forty-three clone sequences were retrieved from purified PCR products (Table 2). Forty-one OTUs in total occurred as defined by < 5% divergence in nucleotides. Alignment of OTU representative sequences against NCBI database by BLASTN showed that they all mapped to COM-A amoA-like sequences at the identity range of 80–99% (with amino acid sequence identity from 88 to 100%) (Table S7). These results showed the high specificity and strong applicability of the new primers targeting COM-A amoA genes.

Table 1 Properties of the primers and their coverage in COM-A amoA database
Table 2 Alpha diversity of COM-A in the samples

Diversity analysis of COM-A communities in the environments

Newly designed primer set was then applied to different ecosystems, including sediments of lake, river, tidal flat, estuary, coastal sea (Fig. S1), and the reactor biomass. The diversity analysis of COM-A communities in different ecosystems showed that the OTU numbers, Shannon-Wienner, and Chao1 of comammox communities in the reactor were the lowest. The tidal sediments were the most diverse system based on the highest OTU number, Shannon-Wienner, and Chao1, followed by river and estuary sediments. Chao1 and Shannon-Wienner of different natural samples had the values of 15.00 to 27.63 and 2.00 to 2.63, respectively, indicating the complex structure of COM-A communities (Table 2).

The phylogenetic analysis showed that the amino acid sequences acquired from amplification assays were grouped together with known comammox sequences, and some sequences were clustered with the metagenome and metatranscriptome data (Fig. 3). Groups 1, 2, and 3 were clustered with comammox sequences in drinking water system (Wang et al. 2017b) with amino acid sequence identity from 94 to 98%. Group 1 was the dominant group in the tidal, estuary, and coastal sea sediments, accounting for 60 to 67% of the COM-A communities. Group 2 was typical in the tidal sediments and reactor biomass and was clustered with sequences from metatranscriptome data (0.01% of the total clones, with 99% sequence identity). Group 3 was abundant in the lake sediments, occupied 48% of the COM-A communities. Groups 4, 5, and 6, together with sequences from metagenome and metatranscriptome data, formed an individual branch and shared 95 to 96% amino acid sequence identity with published data. These groups accounted for 94% of the comammox communities in the bioreactor but only 2 to 5% in the natural environments. Groups 7, 8, and 9 existed at all sample sites except the bioreactor, but they were not dominant at each site (accounting for 3 to 18%, 3 to 27%, and 3 to 17% in the samples, respectively). Group 7 was closely related to Candidatus Nitrospira nitrificans and Nitrospira sp. genome_bin_8 (with 94 to 99% sequence identity). Groups 8 and 9 shared close relationships with Candidatus Nitrospira inopinata, Nitrospira sp. HN-bin3, and Candidatus Nitrospira nitrosa, as well as one sequence from metagenome data (with sequence identity from 96 to 100%).

Fig. 3
figure 3

Neighbor-joining tree of amplicon sequences. The phylogeny is based on amino acid AmoA sequences. Sequences obtained from metatranscriptome (purple) and metagenome (green) data are showed in the tree together. The number of sequences recovered from each sampling site is shown following each group. Genebank or IMG accession numbers of the protein sequences are shown in the bracket. COM-A, clade A comammox Nitrospira; COM-B, clade B comammox Nitrospira. Sediment sample names: L, lake; R, river; T, tidal flat; E, estuary; C, coastal sea; Rea, reactor biomass

The principal coordinate analysis (PCoA) results showed the spatial variations of comammox bacteria (Fig. S6). The first two ordination axes explained 74.8% of the variability found in comammox community composition. Two sediment clone libraries of lake and river were clustered together. The COM-A communities retrieved from tidal, estuary, and coastal sea sites were clustered together.

Abundance of COM-A, β-AOB, AOA, and Nitrospira in different environments

The qPCR results showed that three kinds of amoA genes were detected in all samples (Fig. 4). In the reactor system, the COM-A amoA accounted for 58.73% of total amoA genes, in line with the leading position of comammox shown in metagenome and metatranscriptome data. In the natural systems, the copy number of COM-A amoA genes (3.50 × 105 to 1.56 × 106 copies/g dry sediment) was slightly less than gene copies of AOA amoA (7.17 × 106 to 5.70 × 107 copies/g dry sediment), with no significant difference (one-way ANOVA, P > 0.05). However, it was significantly less than gene copies of β-AOB amoA (1.90 × 107 to 3.97 × 107 copies/g dry sediment) in the examined environments (one-way ANOVA, P < 0.01). No significant difference was observed in AOA and β-AOB amoA genes (one-way ANOVA, P > 0.05). The abundance of amoA genes displayed a distinct spatial heterogeneity. In all the sediment samples, the highest gene copies of COM-A amoA were recorded in the estuary sediment, while copies of AOA and β-AOB amoA genes at this site were the lowest. AOA were dominant in the coastal sediment, and oppositely the lowest copy number of COM-A amoA genes was observed at this site. The highest copies of β-AOB amoA were detected in the lake sediment. In all natural samples, COM-A have the potential to take part in the nitrification process, accounting for non-negligible proportion of 0.4 to 5.6% of all the amoA genes (with AOA for 19.0 to 70.1% and β-AOB for 29.3 to 78.4%, respectively).

Fig. 4
figure 4

Abundance of COM-A, AOA, β-AOB, and Nitrospira amoA or nxrB gene copy numbers. The mean values of amoA copy numbers were obtained from triplicate real-time qPCR assays. COM-A, clade A comammox Nitrospira; AOA, ammonia oxidizing archaea; β-AOB, betaproteobacterial ammonia oxidizing bacteria

Gene copies of Nitrospira nxrB ranged from 2.35 × 107 to 1.24 × 108 copies/g dry sediment. The highest and lowest numbers of Nitrospira nxrB were detected in the river and estuary sediments, respectively. Additionally, extremely high gene copies of Nitrospira nxrB in the reactor (accounting for 71.8% in all samples) were in consistent with the high abundance of Nitrospira in metagenome and metatranscriptome data.

Relationships of community structure and abundance with environmental factors

RDA analysis was performed to test the relationships between COM-A communities and environmental parameters (Table S8). The environmental variables in the first two RDA dimensions provided 79.7% of the cumulative variance (Fig. 5a). The results showed that the amoA-based COM-A community structures in the sediments were correlated with ammonium-N at the margin of significance (P = 0.067, F = 3.4, 1000 Monte Carlo permutations), which accounted for 53.3% of the total RDA explanatory power. RDA was also conducted to clarify how abundance of COM-A, AOA, and β-AOB amoA and Nitrospira nxrB genes were constrained by physicochemical characteristics. The first two RDA dimensions accounted for 97.2% of the cumulative variance (Fig. 5b). Results showed that abundance of the functional genes correlated significantly with ammonium-N (P = 0.024, F = 9.3, 1000 Monte Carlo permutations), which accounted for 75.6% of the total RDA explanatory power. In detail, the correlation between abundance of COM-A and ammonium-N concentration was negative, while the abundance of AOA, β-AOB, and Nitrospira was positively correlated with ammonium-N concentration. In both RDA results, the contribution of other investigated environmental variables (including water content, nitrate-N, nitrite-N, total organic carbon, pH, and salinity) were not significant (P > 0.05, 1000 Monte Carlo permutations).

Fig. 5
figure 5

RDA ordination plots for the relationship between COM-A community composition with the environmental parameters (a) and relationship between abundance of COM-A, AOA, and β-AOB amoA and Nitrospira nxrB with environmental variables in the sediments of lake, river, tidal, estuary, and coastal sea sediments (b). The number on each axis means the provided percentage of the total variance. TOC, S, WC, NH4-N, NO2-N, and NO3-N represent total organic carbon, salinity, water content, ammonium-N, nitrite-N, and nitrate-N, respectively. Blue triangles represent the OTUs of COM-A amoA gene sequences from the natural samples. Purple circles mean the sampling sites

Discussion

The microbial communities in tidal sediments are metabolically and taxonomically complicated because of the tidal changes, salinity intrusions, and nutrient pluses resulted from land-sea interaction (Zheng et al. 2013). In this work, we set out to characterize the nitrifying community of tidal sediments present in enrichment culture. By coupling with metagenomics and metatranscriptomics, we first identified the metabolic potential of COM-A in the tidal sediments. Then, we designed a new primer set specifically targeting COM-A amoA to identify the community diversity, distribution, and quantitative contribution of COM-A compared with other ammonia oxidizers in natural environments.

As applied in the drinking water system, activated sludge, and rapid gravity sand filter (Pinto et al. 2015; Chao et al. 2016; Palomo et al. 2016), the technology of metagenome was used here for detection of complete ammonia oxidizers. Meanwhile, the application of metatranscriptomics would help to validate the metabolic potential and elucidate the expression patterns of microbial community (Fortunato and Huber 2016). Both metagenome and metatranscriptome data showed the predominance of Nitrospirae in the community and higher abundance of comammox than that of classical AOB and AOA, similar to the community structure of other nitrifier enrichment cultures (Daims et al. 2015). Additionally, sequences from metagenome and metatranscriptome were clustered together and shared high identity. Coherence of the two databases (metagenome and metatranscriptome) proved the accuracy and reliability of the sequencing and assembling data in this study. Consistency between amino acid sequence identity and phylogenetic affiliation confirmed that the functional genes coding for key nitrification-related enzymes were correctly annotated.

Metagenomic and metatranscriptomic results identified the metabolic potential and expression patterns of comammox. Confirmation of comammox in the reactor microbial aggregates proved that comammox existed in the tidal sediments. Intriguingly, no comammox amoA sequences from the tidal flat were detected by MG-RAST database screening (van Kessel et al. 2015), reflecting that this discovery provided new insights into our perception of comammox habitats. Apart from the engineering systems reported previously like drinking water system and activated sludge (Pinto et al. 2015; Wang et al. 2017b), comammox also appeared in the environment of estuary and coastal ecosystems, where microorganisms of nitrogen cycle were special and complicated (Hou et al. 2013; Bernhard et al. 2016; Gao et al. 2016b; Newell et al. 2016). Comammox enriched and detected in this study all belong to clade A, rather than clade B. However, only one clade also showed complex diversity. The clade of COM-A contained distinct branches in all AmoA/B/C and Hao phylogenetic trees, implying that different comammox species or strains might coexist in the environments. Given the presence of individual branches in the trees and low sequence identities to published sequences, species in the tidal sediments were different from those in other systems (Daims et al. 2015; Pinto et al. 2015). However, whether these microorganisms were novel species was still unknown due to the lack of an appropriate species-level sequence identity cutoff for comammox amoA (Pjevac et al. 2017). Additionally, comammox outcompeted canonical AOM in the enrichment culture, in line with the assumption that comammox would be competitive under substrate-influx-limited conditions which favor competitors that have a high growth yield (comammox) rather than a high growth rate (AOA and AOB) (Costa et al. 2006). Comammox were not only abundant in the bioreactor but also of high expression. However, according to the expression values of amo and nxr genes, exceptional expression patterns of amo were observed. The FPKM values of subunits nxrA and nxrB were at the same level, while the FPKM value of amoC was about sevenfold more abundant than that of amoA/B, which might be attributed to the multiple copies of amoC (Daims et al. 2015; van Kessel et al. 2015) and the similar transcription patterns independent of amoA/B in β-AOB (Sayavedra-Soto et al. 1998; Norton et al. 2002). Transcription mechanism of functional genes in comammox and the effects of high amoC expression were still unknown.

Phylogenetic analysis of representative amplicon sequences of PCR products showed that these sequences shared close relationships with COM-A amoA, including all the published comammox genome sequences and some environmental sample sequences, which were erroneously classified as pmoA previously (van Kessel et al. 2015; Chao et al. 2016; Wang et al. 2017b). Although amoA of β-AOB shared a common ancestor with the amoA lineage of comammox (Daims et al. 2015), the clone sequences in this study did not fall into the cluster of β-AOB amoA. PCR results and in silico primer evaluation illuminated that the primers had the ability for specifically and accurately amplifying comammox amoA genes, as well as the potential for widely covering diverse amoA clusters in various environments, providing a good method to analyze the distribution, diversity, and abundance of comammox in the environments.

All the sampling regions showed the presence of COM-A, in accordance with the characteristic of Nitrospira which was the most widespread among all known NOB (Koch et al. 2015). The number of OTUs in the reactor biomass in this study was within the range in the reported engineered systems (Pjevac et al. 2017), which all showed simple community composition of COM-A. Estimators of COM-A amoA clone libraries in sediment samples in this study were generally higher than that of AOA and β-AOB amoA in other environmental sediment samples (Zheng et al. 2013; Chen et al. 2014; Sheng et al. 2016; Chang et al. 2017). Phylogenetic analysis displayed several distinct branches in the tree of COM-A amoA genes, suggesting that diverse and new comammox might exist in the environment. In addition, there were six groups coexisting in the tidal sediments which were used for enrichment culture, but one group outnumbered others with the percentage from undetected level to final 66%. However, influencing factors and competing mechanism during the community shift within comammox were not clear.

Noticeably, the coexistence of comammox and canonical AOM and strictly NOB were observed in many studies (Chao et al. 2016; Bartelme et al. 2017; Hu and He 2017), implying that comammox contributed to the nitrogen cycle together with AOA and AOB. In this study, the group of comammox accounted for a small scale of total amoA genes at all sites except for the bioreactor. The number of AOB amoA-like sequences was about threefold higher than that of comammox amoA-like sequences in the drinking water system (Wang et al. 2017b). Previous study of full-scale hybrid biofilm and activated sludge reactor also found that comammox had much lower abundance than AOA and AOB (Chao et al. 2016). The low abundance of comammox in the environments was in line with the prediction that advantage of comammox strategy versus AOA or AOB could only be observed in niches of extremely limited substrate (González-Cabaleiro et al. 2017; Kits et al. 2017). However, a research about copper-containing membrane-bound monooxygenase genes reported that mountain and paddy soils were dominated by AOB while relatively high content of comammox was found in samples like activated sludge and coastal water (Wang et al. 2017a). Moreover, up to 52 and 81% of the amoA genes were assigned to comammox in the research of rapid gravity sand filter (Palomo et al. 2016), and the superiority of comammox and AOA in freshwater recirculating aquaculture system was also observed (Bartelme et al. 2017). These results showed that the abundance of comammox in different areas fluctuated strongly, which might be attributed to the complex influences of environmental factors or microorganisms (Gibbs and Federle 2015; Tago et al. 2015; Zhao et al. 2017). In this study, COM-A community and abundance showed correlations with ammonium-N, mirroring that their high affinity for ammonia, their ability to grow on substrates other than ammonia, and their low-energy requirements have important roles in shaping the communities (Kits et al. 2017; Kuypers 2017). Wide distribution, complex diversity, and varied abundance of comammox pointed out that more related research on comammox was essential to get a comprehensive understanding of microbial ecology of nitrifiers and associated nitrogen transformation processes.

To sum up, the discovery of comammox in the tidal sediments broadened the existing knowledge of their habitat range. The enrichment culture of comammox supplied a good method for deep-going research. However, detailed competition mechanisms about why comammox predominated in the bioreactor system and certain comammox amoA group finally outnumbered other groups were urgently needed for further research. Insights into the comammox in different environments with application of the new primers, including abundance, community structures, and phylogeny traits, illustrated an important role played by comammox in nitrifying processes. Future studies, including diversity of comammox in different ecosystems combined with the physicochemical properties of the background, ammonia-oxidizing activity rates coupled with quantitative contribution evaluation of ammonia oxidizers, relationships between complete comammox and strictly NOB, and influencing mechanisms of changing environmental factors to comammox, would give a more profound understanding of the ecology and evolution of these intriguing organisms.