Introduction

The overuse and misuse of antimicrobials have accelerated the naturally occurring phenomenon of antimicrobial resistance (AMR) and resulted in new mechanisms that can bypass the current arsenal of “last-resort” antimicrobials, including colistin (D’Costa et al. 2011; Thi Khanh Nhu et al. 2016; Mataseje et al. 2014; Friedrich 2016). The UN Ad hoc Interagency Coordinating Group (IACG) on Antimicrobial Resistance estimates that drug-resistant diseases will cause roughly 10 million deaths per annum by 2050 (IACG 2018; O’Neill 2014). Thus, the WHO now considers AMR an urgent global public health concern (WHO 2014). Active AMR surveillance, including environmental surveillance, is critical for protecting public health (WHO 2014). A rapid, cost-efficient, but sufficiently comprehensive method of environmental AMR surveillance would make regular monitoring feasible and would allow for in-depth studies of different environments, which is central to understanding the role of the environment in AMR.

AMR and the resistome are broad terms that include antibiotic resistance genes among several others. In this study, the resistome will only refer to antibiotic resistance and metal resistance genes while the mobile genetic elements like plasmids, transposons, and integrons are referred to as the mobilome. As the potential health risk posed by environmental AMR genes is related to their mobility (Stokes and Gillings 2011; Ashbolt et al. 2013; Leonard et al. 2018; Cantón et al. 2012; von Wintersdorff et al. 2016), an effective surveillance strategy would need to include not only microbial resistome and resistome host characterization, but assessment of the mobile resistome as well. Several studies have employed a similar strategy, using culture-based methods together with metagenomics sequencing, employing both short and long-read technologies (Zhao et al. 2015; Xia et al. 2017; Luo et al. 2017; Che et al. 2019), or PCR-based methods (Hultman et al. 2018).

Long-read sequencing technologies such as those offered by Oxford Nanopore Technologies (ONT) have the disadvantage of lower base accuracy due to relatively high sequencing error rates (Jain et al. 2016) compared to short-read sequencing, but offer the advantage of structural accuracy because long reads span extended genomic regions without requiring assembly (Urban et al. 2015). The ONT platform also allows for generation of data in real time, making rapid surveillance methods possible. In addition, the ONT MinION technology is highly portable, avoiding the necessity of access to major sequencing facilities (Mongan et al. 2020). Coupling ONT technologies with a bioinformatics pipeline like NanoARG (Arango-Argoty et al. 2019), which is user-friendly and comprehensive enough to provide information on the (1) microbiome, (2) resistome, and (3) mobilome, creates a rapid and sufficiently thorough surveillance method.

The wastewater system is an ideal environment to initiate AMR surveillance as it brings together pathogenic and non-pathogenic microbes, resistance and mobile genetic elements, antimicrobials, and other chemicals that might further contribute to mobile AMR selection (Pehrsson et al. 2016). Moreover, wastewater effluents have been shown to contribute to the spread of AMR genes (Lamba and Ahammad 2017; Berendonk et al. 2015; An et al. 2018; LaPara et al. 2011; Czekalski et al. 2012), making them even more relevant to public health. Here, we demonstrate the use of a rapid sequencing approach for monitoring and evaluation of the microbial community, as well as the mobile metal and antibiotic resistome profile of semi-rural Amherst wastewater in western Massachusetts. We also investigated the potential effects of the wastewater treatment process on the above-mentioned profiles.

Materials and methods

Wastewater sampling

Primary and secondary wastewater effluents were collected in triplicate in March and April 2019 using sterile 250-mL polypropylene bottles (Nalgene). The triplicate samples were collected on the same day within a one-hour timeframe. The samples were either immediately processed or stored at −20 °C.

Wastewater sample processing: filtration, DNA extraction and library preparation

The samples were mixed thoroughly and 100 mL of each was filtered using sterile 0.22-µm MCE filter membranes (Membrane Solutions Corp) and a Pall vacuum manifold (Pall Corporation). The filters were loosely rolled and placed inside bead tubes, and DNA was extracted following the PowerWater DNA Isolation Kit protocol (Qiagen). The quantity and quality of the isolated DNA were determined using a NanoDrop 2000 UV-vis spectrophotometer (Thermo Fisher Scientific). The initial amount of genomic DNA per sample was 400–500 ng. DNA libraries for multiplex sequencing were prepared without shearing using the Rapid Barcoding Kit SQK-RBK004 (ONT, Oxford, UK) following the supplied protocol. Sample clean-up was performed with AMPure beads (New England Biolabs, MA, USA) prior to addition of the adapter mix (RAP) included in the barcoding kit. The purity (A260/A280 ≥ 1.8) and quantity (≥400 ng) of the DNA samples in this study met the minimum requirements for MinION sequencing (ONT), with the exception of the second set of effluent samples (April secondary samples).

Shotgun metagenomic sequencing and bioinformatics analysis

The pooled library consisting of 12 barcoded samples was loaded onto a flow cell with R9.4.1 pore chemistry (Oxford Nanopore FLO-MIN106). The flow cell had pore counts greater than 1500 prior to sequencing. Shotgun metagenomic DNA sequencing was performed for 48 h with the Oxford Nanopore MinION platform using the NC_48h_Sequencing_Run_FLO-MIN106_SQK-RBK004 protocol, and raw output signals were base called in real time using the MinKNOW software (ONT; version 18.12). The 48-h run generated a total of 1,492,345 1D reads, with an average sequence length of 2271 base pairs (bp), a maximum length of 22,587 bp and a total yield of 3.4 gigabases. The FASTQ files were delivered to CosmosID (www.cosmosid.com; MD, USA) for data processing and bioinformatics analysis to identify the microbial diversity, virulence factors, and antimicrobial resistance genes. For comparison, the same files were uploaded into Epi2Me for cloud-based processing and bioinformatics analysis (ONT’s Metrichor). The cloud-based WIMP (What’s In My Pot) and ARMA (Antimicrobial Resistance Mapping Application) workflows in the Epi2Me program (ONT) were used for taxonomic identification and characterization of antimicrobial resistance genes, respectively. The WIMP workflow aligns MinION reads using minimap2 (Li 2018) against the Centrifuge database (Kim et al. 2016) and ARMA aligns reads using LAST (Kiełbasa et al. 2011) against the Comprehensive Antibiotic Resistance Database (CARD; McArthur et al. 2013).

To identify the resistome and mobilome in plasmid-derived versus chromosome-derived sequence reads, the raw multi-read fast5 files were first converted to single-read files using the Python package ont_fast5_api (ONT; version 1.4.4) and then base called with the command-line basecaller Guppy (ONT; version 3.2.2) using the default parameters for the FLO-MIN106 flow cell/SQK-RBK004 kit. The base called reads were demultiplexed using the realtime default parameters of Deepbinner (version 0.2.0; Wick et al. 2018) for the SQK-RBK004 kit. NanoPlot (version 1.28.2; De Coster et al. 2018) was used to check the sequence read quality. Filtlong (version 0.2.0; https://github.com/rrwick/Filtlong) was used to filter out reads less than 1 kilobase (kb) in length and to remove the lowest 10% of the reads based on the Phred quality scores. The barcodes and adapters were then removed using the default parameters of Porechop (version 0.2.3; https://github.com/rrwick/Porechop). These filtered and trimmed reads were then uploaded to the Galaxy server (Afgan et al. 2016), converted to FASTA format, and sorted into plasmid- and chromosome-derived read sequences using the default settings of PlasFlow (version 1.0; Krawczyk et al. 2018). PlasFlow analysis produced three sets of data: chromosome-derived, plasmid-derived, and unclassified. All three outputs were submitted for further analysis through the NanoARG (Arango-Argoty et al. 2019) workflow to identify the mobile genetic elements (MGEs), antimicrobial resistance genes, and metal resistance genes.
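The length and quality filtering step described above can be illustrated with a minimal Python sketch. It is not Filtlong's algorithm: Filtlong scores reads with its own quality metric and applies its percentage threshold to bases rather than to read counts, whereas this sketch simply ranks reads by arithmetic mean Phred score. The function names, the tuple layout, and the Phred+33 offset are assumptions made for illustration.

```python
from statistics import mean


def mean_phred(quality_string, offset=33):
    """Arithmetic mean Phred score of one read (assumes Phred+33 encoding)."""
    return mean(ord(ch) - offset for ch in quality_string)


def filter_reads(reads, min_length=1000, drop_fraction=0.10):
    """Keep reads of at least min_length bases, then drop the worst fraction.

    `reads` is a list of (name, sequence, quality_string) tuples with
    unique names (a hypothetical in-memory layout, not a real file parser).
    """
    long_enough = [r for r in reads if len(r[1]) >= min_length]
    # Rank the surviving reads from worst to best mean quality.
    ranked = sorted(long_enough, key=lambda r: mean_phred(r[2]))
    n_drop = int(len(ranked) * drop_fraction)
    kept_names = {r[0] for r in ranked[n_drop:]}
    # Preserve the original read order in the output.
    return [r for r in long_enough if r[0] in kept_names]
```

With ten 1-kb reads of increasing quality and one 500-bp read, the short read is removed by the length filter and the single lowest-quality read by the quality filter, leaving nine reads.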

Briefly, the NanoARG workflow initially clusters the sequence reads using the permissive parameters of DIAMOND (Buchfink et al. 2015). To annotate antibiotic resistance genes (ARGs), NanoARG employs the DeepARG-LS (Arango-Argoty et al. 2018) method to query the more extensive and consolidated DeepARG-DB database. To annotate metal resistance genes (MRGs), NanoARG uses the BacMet database (Pal et al. 2014), and for MGEs it uses the MGE database from the NCBI-NR and integron-integrase (I-VIP; Zhang et al. 2018) databases. NanoARG performs taxonomic identification using the default parameters of Centrifuge (Kim et al. 2016) and then queries the NCBI-NR and ESKAPE/WHO databases to identify potential critical pathogens.

Statistical analysis

All data processing, statistical analyses and plotting were done in R (version 4.0.2) using the following R packages: vegan (v2.5-6), phyloseq (v1.32.0), mixOmics (v6.12.2), ggplot2 (v3.3.2), zCompositions (v1.3.4), compositions (v2.0.0), and ALDEx2 (v1.20.0).

Metagenomic data

All metagenomic data (FASTQ files) have been deposited in the NCBI SRA database with BioProject ID: PRJNA684899 and accession number SAMN17054264.

Results

Microbial community composition

Multiplexed, whole-genome shotgun metagenomic sequencing using ONT’s MinION platform was used to study the microbial community of Amherst wastewater samples. CosmosID identified a total of five unique kingdoms, 20 phyla, 43 classes, 227 genera, and 469 species across all wastewater samples. Bacteria accounted for 93% of kingdom-level assignments, followed by viruses (5%) and Archaea (0.5%). Of the Archaea, the genus Methanobrevibacter was mostly found in primary samples and the genus Methanosarcina was found only in secondary samples. Fungi (1%) and Eukaryota (0.7%) followed a similar pattern, in that the majority were detected in primary samples. About 75% of the bacteria were gram-negative across all samples, consistent with results from ONT’s WIMP analysis (Fig. 1). A number of Operational Taxonomic Units (OTUs; about 45% of the bacteria detected and identified in all samples) represented human and animal pathogens, a majority of which were gram-negative (Fig. 1).

Fig. 1

Divergent barplot created using relative abundances shows distribution of Gram-negative and Gram-positive pathogens and non-pathogens in primary (blue) and secondary (orange) wastewater

Proteobacteria was the dominant phylum (~44%) in all samples followed by Bacteroidetes (25–28%), although there was variation with respect to relative abundance and less diversity within secondary effluent samples at the class level (Fig. 2a, b). Both Fig. 2a and b showed a distinct pattern for primary samples compared to the secondary samples. Microbial species belonging to the phylum Actinobacteria were more prominent in secondary samples than primary samples (~12% compared to 2%, respectively). The opposite was observed for Cyanobacteria, i.e., 1.4% in primary samples but non-detectable in secondary samples. Viruses were detected in all primary samples but not in all secondary samples. Four classes of Proteobacteria were detected in the majority of the primary samples whereas only Betaproteobacteria and Gammaproteobacteria were observed in secondary samples (Fig. 2b). A similar pattern was obtained for Bacteroidetes, with three classes detected and identified in primary effluent but only one, Bacteroidia, in secondary effluent.

Fig. 2

a, b Microbial composition of wastewater samples. Stacked barplots show community composition of primary and secondary wastewater at the a phylum level and b class level. The taxa were arranged such that the order of the phyla in a corresponds to those of its constituent microbial classes in b

To assess microbial community structure, the raw microbial read counts and the corresponding taxonomic information obtained from CosmosID were consolidated, initially explored using traditional diversity analysis methods, and then analyzed using the compositional data (CoDa) analysis approach.

Traditional approach

The untransformed microbial counts at the genus level were initially explored to determine whether the overall data showed separation related to wastewater treatment. The alpha diversity metrics Chao1, Shannon and inverse Simpson indices were used to assess richness and evenness within and between primary and secondary wastewater groups (Fig. 3a). Differences between sample groups for all metrics were significant (Mann–Whitney U test, p < 0.01, adjusted using the Bonferroni–Holm method), which indicated that Amherst primary wastewater samples were more diverse and microbially richer than the secondary samples. A similar result was obtained for data filtered using the phyloseq package in R, which removed roughly 55% of the original data. The cluster analysis and principal coordinates analysis (Fig. 3b, c) obtained from the Bray–Curtis dissimilarity values indicated significant separation in microbial taxonomic composition between primary and secondary wastewater. A PERMANOVA analysis showed that 44% of the variation between samples can be accounted for by wastewater treatment. Beta diversity was 0.24 (SD = 0.09) within primary wastewater samples and 0.65 (SD = 0.30) within secondary samples, which increased to 0.76 (SD = 0.20) between primary and secondary samples (PERMANOVA, p < 0.01).
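For readers unfamiliar with the indices named above, the following Python sketch shows how each is computed from a vector of genus-level counts. It is a minimal illustration rather than the phyloseq or vegan implementation used in this study; the function names are hypothetical, and the Chao1 estimator shown is the bias-corrected form, which is assumed here because it stays defined when no doubletons are present.

```python
import math


def shannon(counts):
    """Shannon index H' = -sum(p * ln p) over taxa with non-zero counts."""
    total = sum(counts)
    return -sum((c / total) * math.log(c / total) for c in counts if c > 0)


def inverse_simpson(counts):
    """Inverse Simpson index 1 / sum(p^2)."""
    total = sum(counts)
    return 1.0 / sum((c / total) ** 2 for c in counts if c > 0)


def chao1(counts):
    """Bias-corrected Chao1 richness from singleton and doubleton counts."""
    observed = sum(1 for c in counts if c > 0)
    singletons = sum(1 for c in counts if c == 1)
    doubletons = sum(1 for c in counts if c == 2)
    return observed + singletons * (singletons - 1) / (2 * (doubletons + 1))


def bray_curtis(sample_a, sample_b):
    """Bray-Curtis dissimilarity between two count vectors of equal length."""
    numerator = sum(abs(x - y) for x, y in zip(sample_a, sample_b))
    return numerator / (sum(sample_a) + sum(sample_b))
```

A Bray–Curtis value of 0 means two samples have identical counts for every taxon and a value of 1 means they share no taxa, which is the scale on which the within-group and between-group values reported above should be read.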

Fig. 3

a Alpha diversity indices computed using the phyloseq (v1.32.0) package in R (v4.0.2)/(RStudio v1.3.959) indicate that primary wastewater was significantly richer and more diverse than secondary wastewater (Mann–Whitney U test; p values adjusted using Bonferroni–Holm method). b Cluster analysis (ward.D2 method) and heatmap generated in R using relative abundances show discrimination between primary and secondary wastewater treatment samples with respect to microbial composition at the genus level. c Principal coordinates analysis was performed using the Bray–Curtis metric in the vegan (v2.5-6) package to show the significant dissimilarities in microbial composition across samples (PERMANOVA, p < 0.01)

Compositional data analysis

To account for compositionality of the sequencing data (Gloor et al. 2017), the centered log-ratio (clr) transformation in the mixOmics package in R was applied for compositional data (CoDa) exploration and analysis. The zero values in the raw count data were imputed prior to transformation using the count zero multiplicative (CZM) method in the zCompositions package. The compositional biplot shown in Fig. 4a and the unsupervised cluster analysis (generated with the ward.D2 method) clearly present qualitative microbial structure separation at the genus level between primary and secondary samples (Fig. 4b). Primary samples mainly cluster with one another, as do secondary samples. The AS1 sample appeared to be an outlier in the biplot but was not treated as such in this study because all other analyses involving it were generally consistent with the rest of the secondary samples (Table 1).
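The clr transformation expresses each count as the log of its ratio to the geometric mean of the sample, so the result does not depend on sequencing depth. The Python sketch below illustrates the idea under one stated simplification: zeros are handled by adding a uniform pseudocount, which stands in for (and is not equivalent to) the CZM imputation used in this study. The function name and the default pseudocount of 0.5 are assumptions.

```python
import math


def clr(counts, pseudocount=0.5):
    """Centered log-ratio transform of one sample's count vector.

    A uniform pseudocount replaces the CZM zero imputation as a
    simplification; pass pseudocount=0 only if no count is zero.
    """
    logs = [math.log(c + pseudocount) for c in counts]
    mean_log = sum(logs) / len(logs)  # log of the geometric mean
    return [value - mean_log for value in logs]
```

Two properties are worth noting: the clr values of a sample always sum to zero, and multiplying every count by a constant (for example, sequencing the same sample ten times deeper) leaves the clr values unchanged when no pseudocount is needed.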

Fig. 4

a, b Compositional biplot and unsupervised cluster dendrogram (ward.D2 method) generated in R show qualitative microbial structure separation at the genus level between primary and secondary samples. The scree plot in a was used to determine the number of factors for principal component analysis

Table 1 Description of wastewater samples and total nucleic acid reads

Effect size was estimated using the ALDEx2 package (Fernandes et al. 2014) to determine specific taxa (genus level) that were statistically responsible for the observed discrimination between wastewater treatment samples. Clr transformation of the raw count data was also done using the same package. The calculated parameters for significant taxa are summarized in Table 2. The diff.btw values represent the median difference in clr values between primary and secondary samples, while diff.win values represent the median of the largest difference in clr values within primary and secondary samples. The overlap metric represents the proportion of the effect size distribution that overlaps zero; i.e., the greater the overlap with zero, the weaker the evidence for an effect. Figure 5a shows the distribution of taxa significantly different from the sample mean after the Benjamini–Hochberg correction, which controls the false discovery rate. Taxa with higher abundance values than the mean in secondary samples had positive diff.btw values, while taxa with higher abundance values than the mean in primary samples had negative diff.btw values. A total of 16 genera were significantly different (adjusted p value < 0.05) between wastewater treatment types (Fig. 5b). Nine out of 16 are associated with secondary samples, two of which, Bacteroides and Acinetobacter, are opportunistic pathogens.
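The Benjamini–Hochberg correction mentioned above can be sketched in a few lines of Python. This is a generic illustration of the step-up procedure, not the code path inside ALDEx2; the function name is hypothetical.

```python
def benjamini_hochberg(p_values):
    """Return Benjamini-Hochberg adjusted p values in the input order."""
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p value to the smallest so that each adjusted
    # value is never larger than the one ranked above it.
    for rank in range(m, 0, -1):
        index = order[rank - 1]
        running_min = min(running_min, p_values[index] * m / rank)
        adjusted[index] = running_min
    return adjusted
```

A genus would then be reported as significant when its adjusted value falls below the chosen threshold, which is 0.05 in this study.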

Table 2 Parameters calculated after ALDEx2 effect size estimation of significant taxa at the genus level that are statistically responsible for the observed discrimination between wastewater treatments

Fig. 5

a ALDEx2 plot generated using the ALDEx2 (1.20.0) package in R shows the distribution of taxa significantly different (red dots) from the sample mean after the Benjamini–Hochberg correction for false discovery rate. Taxa with higher abundance than the mean in secondary samples have positive diff.btw values, while taxa with higher abundance than the mean in primary wastewater yielded negative values. b Divergent barplot shows the 16 genera that were significantly different (Mann–Whitney U test, adjusted p value < 0.05) between wastewater treatments

Effect of wastewater treatment on resistome, mobilome and mobile resistome

Prior to resistome and mobilome profiling using the NanoARG pipeline, the metagenomic DNA sequence reads were sorted into plasmid- and chromosome-derived sequences (Table 3). A total of 1041 ARGs, 68 MRGs, and 17 MGEs were detected across all samples in this study. Both chromosome- and plasmid-derived reads of secondary samples contained a higher relative abundance (computed using the method of Ma et al. 2016) of total ARGs than their counterparts in primary samples (Fig. 6a). Despite this, the ARGs in secondary samples are significantly less diverse (Fig. 6a; Table 4) (Mann–Whitney U test, p < 0.05, adjusted using the Bonferroni–Holm method) both in terms of unique ARGs and unique ARG types (grouped according to the antibiotic class a gene confers resistance to). There are no significant differences in ARG richness across samples. The ARG types identified included aminoglycosides, antimicrobial lipids, beta-lactams, diaminopyrimidines, fosfomycins, MLS, multidrug, peptide antibiotics, phenicols, quinolones, sulfonamides and tetracyclines. Multidrug (~34%), beta-lactam (~20%), and peptide (~11%) ARGs predominated in both primary and secondary wastewater samples.

Table 3 Summary of metagenomic DNA sequence reads sorted into plasmid- and chromosome-derived sequences using PlasFlow (version 1.0)

Fig. 6

Stacked barplots generated in R using relative abundances computed according to Ma et al. (2016) compare (a) ARG composition at the ARG type level, (b) MRG composition and (c) MGE composition between plasmid- and chromosome-derived reads of primary and secondary wastewater. (PS (orange): plasmid secondary; CS (orange): chromosome secondary; PP (blue): plasmid primary; CP (blue): chromosome primary)

Table 4 Summary of the p values calculated to determine statistical difference of alpha and beta diversity indices for the different groupings of resistomes and mobilomes along wastewater treatment

The occurrence of ARGs for aminoglycosides, beta-lactams, MLS, multidrug, peptide antibiotics, phenicols, and quinolones, mainly found in chromosome reads, is significantly higher in primary samples (Mann–Whitney U test, p < 0.05, adjusted using the Bonferroni–Holm method). The most frequent ARGs in chromosome-derived reads are those for multidrug and peptide antibiotics, while in plasmid-derived reads, resistance genes for multidrug and beta-lactams have the highest occurrence. The NanoARG pipeline demonstrated some consistency with the CosmosID workflow, which also identified aminoglycosides, beta-lactams, MLS, multidrug (msrE), quinolones, and tetracyclines. The Epi2Me workflow was less consistent, but also identified aminoglycosides, multidrug (msrE, etc.), beta-lactams, quinolones, tetracyclines and a variety of bacteria-specific mutations for elfamycin resistance. Of note was the finding that secondary samples for both March and April were enriched in certain OXA β-lactamases.
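Because many resistance classes are tested at once, the p values quoted above are adjusted with the Bonferroni–Holm method. The Python sketch below illustrates that step-down adjustment in general terms; it is not the R code used in this study, and the function name is hypothetical.

```python
def holm_adjust(p_values):
    """Return Bonferroni-Holm adjusted p values in the input order."""
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])
    adjusted = [0.0] * m
    running_max = 0.0
    # The smallest p value is multiplied by m, the next by m - 1, and so on;
    # the running maximum keeps the adjusted values monotone.
    for rank, index in enumerate(order):
        candidate = min(1.0, (m - rank) * p_values[index])
        running_max = max(running_max, candidate)
        adjusted[index] = running_max
    return adjusted
```

Compared with a plain Bonferroni correction, which multiplies every p value by the number of tests, this procedure is never more conservative while still controlling the family-wise error rate.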

As shown in Fig. 6a (PS versus PP), the occurrence of beta-lactam resistance genes is noticeably higher in the plasmid-derived reads of secondary samples. ARGs for beta-lactams found in chromosome reads, including carO, class A/C β-lactamases, Nmcr, OXA-12, OXA-198, and PBP-1A/2X, are significantly more frequent in primary samples, whereas those found in plasmid reads, like class A β-lactamases, OXA-10, OXA-11, OXA-233 and OXA-246, are significantly more frequent in secondary samples (Mann–Whitney U test, p < 0.05, adjusted using the Bonferroni–Holm method). In both primary and secondary samples, the NanoARG pipeline found more ARGs in chromosome- than plasmid-derived reads (43–54% versus 16–24%; the rest are found in unclassified reads). Resistance genes to antimicrobial lipids, diaminopyrimidines, fosfomycins, and phenicols were mostly absent in the plasmid-derived reads of all samples.

The alpha and beta diversity indices indicated no significant difference between primary and secondary wastewater in terms of overall MRG richness, diversity and composition (Table 4). As observed for ARGs, more MRGs are found in chromosome- than plasmid-derived reads (44–52% versus 18–19%; the rest are found in unclassified reads) (Fig. 6b). The occurrence of MRGs for aluminium, cobalt, cadmium, gold, lead, magnesium, mercury, silver, tellurium, and tungsten is significantly greater in primary samples (Mann–Whitney U test, p < 0.05, adjusted using the Bonferroni–Holm method). Metal resistance genes (Fig. 6b) against arsenic, copper, zinc and molybdenum were the dominant MRGs in the majority of the samples.

For the overall resistome (ARGs and MRGs), primary samples were significantly more diverse according to the Shannon metric (Mann–Whitney U test, p = 0.04, adjusted using the Bonferroni–Holm method) (Table 4). The resistome composition was significantly different between primary and secondary samples (PERMANOVA, p = 0.039). A larger proportion of resistome genes was located in chromosome-derived sequences (Fig. 6a, b), whereas mobilome genes (Fig. 6c) were predominantly located in plasmid-derived sequences. Mobilome diversity was significantly higher in secondary samples, and mobilome composition differed significantly between treatments (Table 4). However, the class 1 integron-integrase gene intI1, which is the least frequent MGE in this study, is exclusively found in the plasmid reads of primary samples. Genetic elements related to transposase were the most prominent MGE in all samples.

About 65% of the total resistome (58% of ARGs and 97% of MRGs) in this study is mobile, meaning that the genes are found in plasmid reads, on the same read as an MGE, or both. The most frequent mobile resistome genes are resistance genes for beta-lactams, multidrug, peptide antibiotics, molybdenum, arsenic and copper. Mobile MRGs, including resistance genes for cadmium, cobalt, copper, gold, iron, lead, manganese, mercury, molybdenum, selenium, tellurium and zinc, and mobile ARGs for aminoglycosides, beta-lactams, MLS, multidrug and peptide antibiotics predominantly occur in primary samples (Mann–Whitney U test, p < 0.05, adjusted using the Bonferroni–Holm method). Overall mobile resistome diversity was significantly higher in primary samples, and mobile resistome composition differed significantly between treatments (Table 4). The NanoARG pipeline was able to identify a few clinically relevant mobile resistome-associated bacterial hosts in all samples using the NCBI/ESKAPE-WHO database. These include pathogenic species in the bacterial orders Clostridiales, Enterobacterales, Vibrionales, Aeromonadales, and Pseudomonadales. Several of the mobile resistome genes carried in these hosts include mcr and OXA variants, which confer resistance to last-resort antimicrobials like colistin and carbapenems.
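The rule used above to call a resistance gene mobile (carried on a plasmid-derived read, on a read that also carries an MGE, or both) can be written down explicitly. The Python sketch below is one possible encoding of that rule; the `ReadAnnotation` record, its field names, and the origin labels are hypothetical and do not correspond to the NanoARG or PlasFlow output formats.

```python
from dataclasses import dataclass


@dataclass
class ReadAnnotation:
    """Hypothetical per-read summary of classifier and annotation results."""
    read_id: str
    origin: str             # "plasmid", "chromosome", or "unclassified"
    resistance_genes: list  # ARG and MRG names annotated on this read
    mges: list              # mobile genetic elements annotated on this read


def mobile_genes(read):
    """Return the read's resistance genes if they count as mobile."""
    if read.origin == "plasmid" or read.mges:
        return list(read.resistance_genes)
    return []


def mobile_fraction(reads):
    """Fraction of all annotated resistance genes that are mobile."""
    total = sum(len(r.resistance_genes) for r in reads)
    mobile = sum(len(mobile_genes(r)) for r in reads)
    return mobile / total if total else 0.0
```

Under this rule a chromosome-derived read still contributes to the mobile resistome when a resistance gene and an MGE co-occur on it, which is why long reads are useful for this kind of assessment.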

Discussion

Effects of wastewater treatment on microbial community composition

The wastewater system is important to public health as it can serve as a major sink and source of pathogens containing mobile AMR, as well as an indication of community public health (Fouz et al. 2020; Kraemer et al. 2019). The issue of AMR is regarded as a top global public health concern (IACG 2018; O’Neill 2014; WHO 2014) with AMR surveillance viewed as an essential step towards addressing this issue (WHO 2014). This makes regular monitoring of wastewater systems more relevant than ever, as it not only ensures the safety of effluents relevant to animal and human contact, but also allows for studies investigating AMR patterns and potential transmission mechanisms in human-impacted environments. Such studies, especially if done over long timeframes, can help define the extent of the AMR problem, which is relevant in resolving this global issue. In this study, we demonstrate a rapid but sufficiently comprehensive metagenomic method that can make long-term routine wastewater monitoring more feasible.

Initial characterization of the wastewater samples in terms of microbial diversity and composition revealed substantial discrimination between primary and secondary samples, which is expected as each sample underwent a different treatment process. Primary treatment mainly involves sedimentation-based removal whereas secondary treatment involves filtration and biological-based processes (Quach-Cu et al. 2018). Acinetobacter was found to be the dominant genus in all samples, present in primary samples at a slightly higher abundance of 23% compared to 14% in secondary samples. The second most abundant genus differed by treatment sample: Bacteroides for primary samples and Polaromonas for secondary samples. Aeromonas and Ruminococcus species were more prominently detected in secondary samples whereas Arcobacter and Onygenales were more abundant in primary samples. Bacteroides and Ruminococcus are typical gut bacteria and Acinetobacter, Polaromonas, Onygenales, Aeromonas, and Arcobacter are environmental microbes. Several Aeromonas, Arcobacter, and Acinetobacter species are known zoonotic pathogens. Thus, the microbial composition of the wastewater samples comprises a complex mixture of gut and environmental microbes, a conclusion also reached for other wastewater systems (Newton et al. 2015; Numberger et al. 2019).

Microbial richness and diversity significantly decreased with the wastewater treatment process, not unexpected given that secondary effluent samples have passed through wastewater treatment steps. However, the observed decrease in microbial richness and diversity in secondary samples did not translate to a relative reduction in pathogenic species (Fig. 1). Although culture-based methods would determine actual viability, the data from this and other studies (Chahal et al. 2016; Osińska et al. 2017; Harnisz and Korzeniewska 2018) show that most wastewater treatment processes are insufficient to remove all potential pathogens from effluents.

Compositional data approach

The compositional data approach supported the initial observations from traditional data analyses, showing microbial structure differences between primary and secondary wastewater. The difference is that the CoDa analysis was mainly done at the genus level for greater resolution of taxa involved in the observed separation. Primary wastewater samples predominantly grouped together, suggesting they contained similar taxa at similar abundances (Gloor and Reid 2016). Further interpretation of the biplot is qualitative at best because only 78% of the variability is explained by the first two principal components. Considering the limited data projection, a few rules can be employed to interpret the biplot data. First, the length of the ray is related to relative taxon variability (Gloor and Reid 2016). The longer the ray, the more variable the represented taxon relative to other taxa. Using this rule, it can be seen that Acidovorax (Betaproteobacteria), Arcobacter (Epsilonproteobacteria), Bacteroides (Bacteroidia), and Ruminococcus (Clostridia) have relatively longer rays, hence greater variability with respect to wastewater treatment than other taxa in the biplot. Acinetobacter (Gammaproteobacteria) had the shortest ray, thus the least variability. A second rule concerns the angle between rays, which reflects the correlation between taxa. A smaller angle indicates correlation of abundance of the involved taxa. Rays of taxa that are orthogonal are uncorrelated. Figure 4a shows a small angle between rays for Acinetobacter-Prevotella and Acinetobacter-Bacteroides, suggesting that the abundances of these taxa were highly correlated, whereas those of Acidovorax and Ruminococcus were not. Coincident rays of Prevotella and Bacteroides are interpreted as a constant ratio of abundance. The unsupervised cluster dendrogram and corresponding barplot (Fig. 4b) support the observed grouping across wastewater treatment, but not several projections from the compositional biplot. This is not surprising given the limits of the data projection; e.g., Aeromonas (Gammaproteobacteria), but not Acidovorax (Betaproteobacteria), was a variable taxon, as clearly shown in Fig. 4b, although Arcobacter (Epsilonproteobacteria), Bacteroides (Bacteroidia), and Ruminococcus (Clostridia) were correctly projected to be highly variable. Regardless of the discrepancies, these interpretations support the conclusion that certain microbial taxa distinguish primary from secondary wastewater.

Sixteen genera were statistically responsible for the observed discrimination during wastewater treatment. Seven genera representing major phyla except Actinobacteria were more abundant in primary wastewater. These included environmental microbes such as Flavobacterium and Oscillatoria (cyanobacteria), as well as bacteriophages belonging to the family Myoviridae. Nine genera were more abundant in secondary wastewater samples, including gut (Prevotella, Bacteroides, and Ruminococcus) and environmental (Acinetobacter, Sediminibacterium, Azonexaceae, and Polaromonas) microbes. All major phyla were represented except Cyanobacteria and viruses, consistent with the differences shown in Fig. 2a. Secondary wastewater contained Nitrospira, Azonexaceae, and Tetrasphaera, which are common in biological treatment of primary wastewater (Cydzik-Kwiatkowska and Zielińska 2016; Herbst et al. 2019). While the data will require actual viability testing, the detection of Acinetobacter and Bacteroides in secondary wastewater effluents is concerning, especially since species in these genera are known pathogens, with some exhibiting carbapenem and phenicol resistance (Higgins et al. 2018; Niestępski et al. 2019).

Effect of wastewater treatment on the resistome and mobilome

Differences in resistome diversity and composition were observed between primary and secondary wastewater, with more classes of AMR genes and MRGs in primary than secondary samples. The overall decreasing trend in diversity along the wastewater treatment process is consistent with observations from similar studies (Ng et al. 2019; Ben et al. 2017). However, this does not necessarily support a decreased probability of AMR dissemination as certain resistance genes like the OXA β-lactamases were significantly enriched in the plasmid reads of secondary samples. The observed enrichment in plasmids is not surprising given that these enzymes, some of which are also carbapenemases, are mostly carried in plasmids (Antunes et al. 2014; Antonelli et al. 2015).

A comparison with published research using the ONT MinION reveals both similarities and differences. Che et al. (2019), examining wastewater samples from Hong Kong’s Shatin wastewater treatment plant, found similar classes of ARGs, dominated by genes conferring resistance to aminoglycosides, beta-lactams, MLS, tetracyclines, sulfonamides, phenicols and quinolones, as well as multidrug resistance genes. In that study, however, with the exception of multidrug resistance genes, a higher percentage of each of these classes of ARGs was located on plasmids rather than on the chromosome. This is in contrast to our work, which shows a larger portion of ARGs associated with chromosome-derived sequences.

What is consistent with the Che et al. (2019) study is the finding that a higher proportion of ARGs are present on plasmids in secondary wastewater samples as opposed to primary samples.
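The plasmid-versus-chromosome comparison above reduces to a per-sample proportion of ARG hits found on plasmid-derived reads. A minimal sketch in Python follows, assuming a hypothetical list of annotated ARG hits in which each record carries a `sample` label and a `location` of either "plasmid" or "chromosome". These field names and the toy records are illustrative only; they are not the output format of NanoARG or of any other pipeline used in this study, and they are not study data.

```python
from collections import Counter

def plasmid_fraction(hits):
    """Return {sample: fraction of ARG hits located on plasmid-derived reads}."""
    totals = Counter()   # all ARG hits per sample
    plasmid = Counter()  # ARG hits on plasmid-derived reads per sample
    for hit in hits:
        totals[hit["sample"]] += 1
        if hit["location"] == "plasmid":
            plasmid[hit["sample"]] += 1
    return {sample: plasmid[sample] / totals[sample] for sample in totals}

# Toy records for illustration only (hypothetical, not study data).
toy_hits = [
    {"sample": "primary", "location": "chromosome"},
    {"sample": "primary", "location": "chromosome"},
    {"sample": "primary", "location": "chromosome"},
    {"sample": "primary", "location": "plasmid"},
    {"sample": "secondary", "location": "chromosome"},
    {"sample": "secondary", "location": "chromosome"},
    {"sample": "secondary", "location": "plasmid"},
    {"sample": "secondary", "location": "plasmid"},
]

fractions = plasmid_fraction(toy_hits)
```

Comparing such fractions between primary and secondary samples, and between studies, is only meaningful when the same rule for assigning reads to plasmids or chromosomes is applied throughout.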

Multidrug, beta-lactam and peptide resistance genes were the dominant classes of ARGs identified by the NanoARG pipeline, and overall more classes of AMR were detected in primary than in secondary samples. However, the total number of ARGs was slightly higher in the March secondary samples, and in terms of relative abundance, secondary samples for both March and April were enriched in OXA β-lactamases. This enrichment was confirmed by both the CosmosID and Epi2Me bioinformatics pipelines. For CosmosID, OXA β-lactamases represented ~12% and ~4% of AMR genes in March and April primary samples, respectively, and ~40% and ~50% of AMR genes in March and April secondary samples, respectively. Epi2Me uses different algorithms and databases and identified a greater number of genes, but the trend was consistent, with OXA β-lactamases representing ~1.5% of AMR genes in both March and April primary samples, and ~11% and ~22% of AMR genes in March and April secondary samples, respectively. This increase in β-lactamases through the wastewater treatment process is consistent with other studies that highlight the risks of release of these ARGs to receiving waters (Amador et al. 2015; Makowska et al. 2020).
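To make the size of this enrichment explicit, the approximate percentages reported above can be converted into secondary-to-primary fold changes. The short Python sketch below does only that arithmetic; the dictionary layout and function name are illustrative choices, and because the inputs are rounded values taken from the text, the resulting ratios are indicative rather than exact.

```python
# Approximate OXA beta-lactamase shares (% of AMR genes) as reported in the text,
# stored as (primary, secondary) per sampling month.
reported_share = {
    "CosmosID": {"March": (12.0, 40.0), "April": (4.0, 50.0)},
    "Epi2Me": {"March": (1.5, 11.0), "April": (1.5, 22.0)},
}

def fold_enrichment(primary_pct, secondary_pct):
    """Ratio of the secondary share to the primary share."""
    return secondary_pct / primary_pct

enrichment = {
    pipeline: {
        month: round(fold_enrichment(primary, secondary), 1)
        for month, (primary, secondary) in months.items()
    }
    for pipeline, months in reported_share.items()
}

for pipeline, months in enrichment.items():
    for month, fold in months.items():
        print(f"{pipeline} {month}: ~{fold}-fold")
```

With these rounded inputs the fold changes range from roughly 3-fold to 15-fold. The two pipelines differ in absolute shares but agree on the direction of the change in both months.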

As with ARGs, a larger proportion of MRGs was associated with chromosome-derived sequences. MRGs also appear to be enriched in secondary samples on the chromosome, but not on plasmids. Li et al. (2015) examined plasmid ARGs and MRGs from two Hong Kong wastewater plants, and found zinc and copper to be the dominant MRGs, followed by cobalt and arsenic. These genes were also found to be dominant in our study, in addition to MRGs for molybdenum and nickel, which may reflect different sources. Li et al. (2015) suggest that high copper and arsenic, used in animal breeding, may be present due to slaughterhouse wastewater in one of the two WWTPs they sampled. Although Amherst, MA, can be considered semi-rural, with a population of only ~40,000, it does include a major research university.

Mobilome genes detected in this study were mostly located in plasmid-derived sequences. Genetic elements related to transposases were the most common MGEs in all samples examined. Transposases are implicated in DNA mobility within and potentially between genomes and are therefore important in horizontal gene transfer and bacterial adaptation to new environments (Vigil-Stenman et al. 2017; Cuecas et al. 2017). The class 1 integron-integrase gene intI1 was detected only in primary wastewater samples, where it was present in all of them; despite its very low occurrence, this observation is consistent with the suggestion to use this gene as a marker of anthropogenic pollution (Gillings et al. 2015). The finding that the secondary effluent samples have higher mobilome diversity does not by itself imply a higher risk of clinically relevant transfer to receiving environments, as such risk depends on associations with AMR elements. Clearly, significant numbers of resistome genes are either in plasmid-derived sequences or associated with other mobile genetic elements, with implications for mobilization of antimicrobial resistance (Slizovskiy et al. 2020).

Several of the MGE- and plasmid-associated resistance genes for peptide antibiotics, beta-lactams, multidrug, aminoglycosides, sulfonamides and tetracyclines were matched by the NanoARG pipeline to sequence reads of well-known human pathogens such as Clostridium botulinum, Klebsiella pneumoniae, Escherichia coli, Acinetobacter sp., Staphylococcus aureus, Aeromonas hydrophila, Pseudomonas aeruginosa, Vibrio cholerae, Clostridioides difficile, and Salmonella enterica. Notably, a number of those reads correspond to organisms on the current US CDC list of urgent and serious antimicrobial resistance threats (CDC AR Threats Report 2019). For example, both primary and secondary samples were found to contain MGE- or plasmid-associated reads corresponding to carbapenem-resistant Acinetobacter, which is at the top of the CDC list of serious threats. Carbapenems belong to the beta-lactam category of antibiotics and are normally used as last-resort drugs (Gupta et al. 2017). The finding that these reads were mobile, and hence potentially transferable, makes this even more concerning. Interestingly, mobile reads corresponding to drug-resistant Clostridioides difficile, another serious threat, were detected only in primary samples. Conversely, mobile reads corresponding to yet another serious threat, carbapenem-resistant Enterobacteriaceae, were mainly detected in secondary samples. A number of reads in both sample types were also found to match mobile mcr-1, mcr-2, mcr-3, mcr-4, and mcr-5 genes associated with pathogenic species from the bacterial orders Aeromonadales, Enterobacterales and Pseudomonadales. The mcr gene variants confer resistance to colistin, which is used to treat severe infections caused by multidrug-resistant bacteria (Liu et al. 2016). Taken together, the data suggest that the effluent samples in this study contain the necessary components for the potential spread of clinically relevant AMR: clinically relevant resistomes with transferable traits carried by serious human pathogens.
Traditional microbiology methods such as culture-based experiments will be needed to verify actual pathogen viability, and other factors such as environmental selective pressure will most likely play an important role in AMR spread (Hall et al. 2015). Nonetheless, the information gathered in this study can provide a baseline for further probing potential AMR dissemination mechanisms in wastewater effluents.
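The screening logic described above (a read is of concern when it carries a resistance gene, is mobile, and is assigned to a pathogen of interest) can be written down compactly. The Python sketch below is one possible formulation under stated assumptions: the `AnnotatedRead` record, its field names, the watchlist and the toy reads are all hypothetical and do not reproduce the NanoARG output format or any data from this study. Here "mobile" is taken to mean that the read is plasmid-derived or carries at least one MGE hit.

```python
from dataclasses import dataclass, field

@dataclass
class AnnotatedRead:
    """Hypothetical per-read annotation summary (not a real pipeline format)."""
    read_id: str
    taxon: str
    arg_classes: list = field(default_factory=list)  # ARG classes found on the read
    mge_hits: list = field(default_factory=list)     # MGE annotations on the read
    on_plasmid: bool = False                         # read classified as plasmid-derived

def is_mobile_arg_read(read):
    """True if the read carries an ARG and is plasmid-derived or MGE-associated."""
    return bool(read.arg_classes) and (read.on_plasmid or bool(read.mge_hits))

def flag_reads(reads, watchlist):
    """Return IDs of mobile ARG reads assigned to a taxon on the watchlist."""
    return [r.read_id for r in reads if is_mobile_arg_read(r) and r.taxon in watchlist]

# Illustrative watchlist and toy reads (hypothetical, not study data).
watchlist = {"Acinetobacter", "Clostridioides difficile"}
toy_reads = [
    AnnotatedRead("r1", "Acinetobacter", arg_classes=["beta-lactam"], on_plasmid=True),
    AnnotatedRead("r2", "Acinetobacter", arg_classes=["beta-lactam"]),
    AnnotatedRead("r3", "Escherichia coli", arg_classes=["peptide"], mge_hits=["transposase"]),
    AnnotatedRead("r4", "Clostridioides difficile", mge_hits=["transposase"], on_plasmid=True),
    AnnotatedRead("r5", "Clostridioides difficile", arg_classes=["multidrug"], mge_hits=["transposase"]),
]

flagged = flag_reads(toy_reads, watchlist)
```

A filter of this kind only identifies candidate reads. As noted above, it says nothing about viability or about whether transfer actually occurs.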

The differences in ARG and MRG locations between the small Amherst wastewater treatment plant and the much larger Hong Kong treatment plants (Che et al. 2019; Li et al. 2015) are not surprising and underscore the importance of studying a wide variety of systems of different sizes, geographic locations and sources of sewage. We are at an early stage in these studies in terms of reliable and reproducible methodologies, but the ONT MinION provides a highly functional WGS platform to conduct metagenomics studies at different locations. The increasingly large number of bioinformatics platforms creates challenges for the reproducibility of data because they use different algorithms and databases. For site-to-site comparisons, it remains important to use the same software packages for analysis.

Conclusions

The Oxford Nanopore Technologies MinION, coupled with a variety of cloud-based bioinformatics workflows, provides a useful, rapid and portable tool for characterizing complex microbial communities and their resistome and mobilome profiles. Once characterized, the microbial communities can be assessed with respect to the influence of treatment on microbial composition. We conclude that wastewater treatment processes can be evaluated using the handheld MinION, with the potential for future modeling of semi-rural WWTPs that lack access to more sophisticated tools. Data from the methods presented here may in turn help in evaluating the risks that specific pathogens and AMR pose to receiving waters, the environment, and downstream users.