Introduction

Human adenoviruses (HAdVs) are considered one of the most abundant enteric virus groups in water. Numerous studies are available on their occurrence and concentration in all types of water matrices all over the world (see review articles by Jiang 2006; Mena and Gerba 2009; Bofill-Mas et al. 2013). The detection of HAdVs in water is commonly carried out by PCR or real-time PCR using either consensus primers, which target all HAdV types, or specific primers that exclusively identify the enteric types (HAdV-40 and HAdV-41). Paradoxically, the overall diversity of HAdV types in environmental samples is still insufficiently documented (Lee and Kim-Sang 2010; Bibby and Peccia 2013b). Better knowledge of their occurrence in water could improve infectious risk assessment for humans exposed to specific HAdV types. Currently, 54 HAdV types are recognised in the latest report of the International Committee on Taxonomy of Viruses (Harrach et al. 2012). They are divided into seven HAdV species named HAdV-A to HAdV-G, which belong to the Mastadenovirus genus of the Adenoviridae family. Some other potential new types have been suggested (Walsh et al. 2011; Robinson et al. 2011; Liu et al. 2011; Matsushima et al. 2012) but are not yet accepted. Until now, sequencing efforts have identified only a small proportion of HAdV types in water samples, despite the fact that all of them can be shed in the stool of infected people and then be discharged into wastewater treatment plants or directly into surface waters (Mena and Gerba 2009). The main reported water-associated types belong to the HAdV-A (HAdV-12, HAdV-31), HAdV-C (HAdV-1, HAdV-2, HAdV-5, HAdV-6) and HAdV-F (HAdV-40, HAdV-41) species, with HAdV-41 and HAdV-2 being the most commonly detected types (Van Heerden et al. 2005; Muscillo et al. 2008; Kuo et al. 2009; Fong et al. 2010; Bofill-Mas et al. 2010; Wyn-Jones et al. 2011; Ogorzaly et al. 2013).
Some types from HAdV-D (HAdV-8, HAdV-19) and HAdV-B (HAdV-3) have also been observed, but with very low frequency (Fong et al. 2010; Kokkinos et al. 2010; Bofill-Mas et al. 2010; Wyn-Jones et al. 2011). These data were mainly obtained by direct sequencing (Sanger sequencing) of PCR products after a purification step. Cloning sequencing (insertion of PCR products into a plasmid vector prior to sequencing) was also used, but rarely. The major drawback of these conventional sequencing approaches is that only a limited number of sequences, mostly the predominant ones, can be obtained, while water samples may contain multiple viral strains. Therefore, these methods are limited in output and inadequate for retrieving the whole HAdV diversity in environmental samples.

Today, high-throughput next-generation sequencing (NGS) technology is an attractive and practical means of studying the diversity of a viral pathogen in clinical as well as environmental samples. Two NGS approaches can be applied in this framework: random metagenomics, which provides information on the full virome of a given sample, and gene-targeted sequencing, which specifically detects viruses of interest. The advantages, relevance, drawbacks and limitations of both approaches were discussed in a comprehensive review (Wong et al. 2012). Recently published metagenomic data have confirmed the presence and the prevalence of HAdVs in wastewater, sewage sludge and biosolid viromes (Bibby et al. 2011; Bibby and Peccia 2013a; Aw et al. 2014). Using such an approach, information on the family, genus and sometimes species of viruses can be gathered. However, more accurate data on types remain, for the moment, difficult to obtain because of insufficient sequencing depth. This limitation is probably a consequence of the high concentrations of untargeted nucleic acids (plant viruses, bacteriophages or bacteria) sequenced together with the target of interest (Hall et al. 2014). For the same reason, some discrepancies have been described between results from direct viral pathogen detection using real-time PCR and metagenomic analysis (Bibby 2013; Bibby and Peccia 2013a). To increase the chance of obtaining detailed data on one or several viral pathogens of interest, together with an estimate of their proportions in the same sample, an amplicon sequencing approach appears to be preferable.

The objectives of this study were (i) the development of a next-generation amplicon sequencing assay for HAdVs and (ii) the detection and identification of HAdV diversity in wastewater and river water matrices. Viral DNA amplification was performed using a degenerate primer pair targeting a non-conserved region of the hexon gene, which allows the discrimination of all 54 human types described until now. An Illumina benchtop MiSeq system was used for its ability to sequence long reads of 2 × 300 base pairs, which seems highly compatible with the suggested amplicon sequencing approach. In addition, because of its relevance to public health issues and microbial risk assessment, typing of infectious particles was performed after a cell culture isolation and multiplication step. A broad variety of cell lines have been proposed, but the efficiency of viral replication in cell culture varies with HAdV type. The impact of the choice of cell line (A549 vs 293A) on the overall HAdV diversity observed in wastewater treatment plants is therefore also evaluated and discussed.

Materials and Methods

Water Samples

Influents and the effluents of four Luxembourgish wastewater treatment plants (WWTPs) located in Schifflange, Beggen, Hespérange and Bonnevoie (capacities of 90,000; 210,000; 26,000 and 60,000 inhabitant equivalents, respectively) were collected in March (Schifflange), April (Beggen and Hespérange) and May (Bonnevoie) 2013 (Fig. 1). Two hundred and four hundred millilitres of water samples from inlet and outlet of the treatment plants, respectively, were concentrated by ultracentrifugation as previously described (Skraber et al. 2011). The pellet was resuspended in 10 mL of Dulbecco’s modified Eagle medium (D-MEM).

Fig. 1

Locations of the four wastewater treatment plants (filled triangle) and the sampling sites on the Moselle River (plus symbol R1–R6), the Meurthe River (open square R7) and the Sûre River (open square R8). WWTP 1 Schifflange, WWTP 2 Beggen, WWTP 3 Hespérange, WWTP 4 Bonnevoie

In addition, 24 samples of surface water were collected in the French and Luxembourgish parts of the Moselle River watershed (Fig. 1). Six sampling sites were distributed along a 150-km-long section of the Moselle River, and two additional sampling points were added on the two major tributaries, i.e. the Meurthe River and the Sûre River. Three successive sampling campaigns were performed in March, April and May 2014. Viral particles were concentrated from 20 L of acidified water (pH 3.5) using a glass wool method followed by organic flocculation, as previously described (Wyn-Jones et al. 2011). The pellet was resuspended in 10 mL of D-MEM.

A flowchart of analyses performed on water concentrates for HAdV detection and characterisation is presented in Fig. 2.

Fig. 2

Experimental flows to evaluate concentration, infective status and diversity of human adenovirus in water samples using culture-dependent and culture-independent methods. Culture-dependent assays were performed using both cell lines (A549 and 293A) for wastewater samples, while only 293A cells were used for river water samples

Cell Line Infection

The human lung carcinoma cell line A549 and the human embryonic kidney cell line 293A, commonly adopted for HAdV detection, were used in this study. Both cell lines were used for wastewater samples (Fig. 2). Infectivity assays were performed according to two previously published ICC–quantitative real-time PCR (qPCR) protocols (Wyn-Jones et al. 2011; Ogorzaly et al. 2013). Briefly, cells were seeded in a 25-cm² tissue culture flask. After a 4-day growing period, cell monolayers were infected with 1 mL of viral concentrate and incubated for 60 min at 37 °C to allow the attachment of infectious viral particles to the cells. A negative process control (ISO 22174:2005), consisting of a cell monolayer inoculated with 1 mL of D-MEM medium, was included in all experiments. Afterwards, the viral inoculum was discarded, and cells were washed three times to remove unbound viral particles. Finally, maintenance medium containing 1 % final concentration of an antibiotic–antimycotic solution (6,000 µg/mL of penicillin, 10,000 µg/mL of streptomycin and 25 µg/mL of amphotericin B, Life Technologies) was added to the cells. A549 and 293A cells were incubated for 5 and 3 days, respectively, prior to viral DNA isolation. The presence of infectious HAdVs in river samples was evaluated using only the 293A cells, and the ICC–qPCR protocol was adapted to a microtitre plate format. Conditions and media were similar to those described in the protocol of Ogorzaly et al. (2013).

DNA Isolation

All viral DNA isolations were carried out using the QIAamp Viral RNA mini kit (Qiagen) to a final volume of 100 µL of elution buffer. The initial sample quantity used for DNA extraction was adapted in accordance with the preceding sample preparation protocol. After cell culture, viral DNA was purified either from 140 µL of A549 cell supernatant after three freeze–thaw cycles or directly from 293A cells. For culture-independent detection, viral DNA was directly extracted from 140 µL and 1 mL of wastewater and surface water concentrates, respectively, using the QIAamp Viral RNA mini kit (Qiagen). A negative extraction control was added in each experiment.

Quantitative Real-Time PCR

Quantification of HAdV genomes in samples was performed using consensus qPCR assays, with the primers, probes and amplification conditions described by Hernroth et al. (2002) and Jothikumar et al. (2005) for wastewater and river water samples, respectively. The qPCR standard curves were generated using plasmid DNA carrying the cloned target regions. The validity of the qPCR results was confirmed using an appropriate set of controls (positive and negative qPCR controls) in each cycler run. In addition, qPCR inhibition was evaluated either by the addition of target plasmid DNA at a known concentration or by the dilution (10- and 100-fold) of DNA extracts prior to amplification.
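The standard-curve quantification and the spike-based inhibition check described above can be sketched as follows. This is a minimal illustration only: the dilution series, the Cq values and all function names are hypothetical and are not data or code from this study.

```python
def fit_standard_curve(log10_copies, cq_values):
    """Least-squares fit of Cq = slope * log10(copies) + intercept."""
    n = len(log10_copies)
    mean_x = sum(log10_copies) / n
    mean_y = sum(cq_values) / n
    sxx = sum((x - mean_x) ** 2 for x in log10_copies)
    sxy = sum((x - mean_x) * (y - mean_y)
              for x, y in zip(log10_copies, cq_values))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


def amplification_efficiency(slope):
    """Efficiency as a fraction (1.0 corresponds to 100 %)."""
    return 10 ** (-1.0 / slope) - 1.0


def copies_from_cq(cq, slope, intercept):
    """Invert the standard curve to estimate genome copies per reaction."""
    return 10 ** ((cq - intercept) / slope)


def inhibition_percent(expected_copies, measured_copies):
    """Signal loss of a plasmid spike of known concentration, in percent."""
    return 100.0 * (1.0 - measured_copies / expected_copies)


# Hypothetical plasmid dilution series (log10 copies per reaction) and Cq values.
standards_log10 = [2, 3, 4, 5, 6]
standards_cq = [33.4, 30.0, 26.7, 23.4, 20.0]
slope, intercept = fit_standard_curve(standards_log10, standards_cq)
```

Converting copies per reaction into genome copies per litre additionally requires the concentration and extraction volumes of each sample, which are left out of this sketch.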

Sanger Sequencing

In order to identify the major HAdV type occurring in each water sample, DNA extracts from all samples were amplified and analysed by Sanger sequencing according to a previously described protocol (Ogorzaly et al. 2013).

NGS Amplicon Production and Sequencing

The first step used PCR to amplify the HAdV DNA template from each DNA sample using region-of-interest-specific primers with attached overhang adapters. This amplification was performed using the degenerate hex1deg–hex2deg primer pair (Allard et al. 2001), which targets a hypervariable region of the hexon gene allowing the discrimination of the 54 HAdV types described until now. A 50-µL PCR mix was prepared for each sample and contained 5 µL of 10× High Fidelity PCR buffer without magnesium salts (Platinum Taq DNA Polymerase High Fidelity kit, Invitrogen), 2 µL of 50 mM MgSO4 (Platinum Taq DNA Polymerase High Fidelity kit, Invitrogen), 2.5 µL of 2.5 mM dNTPs each (Invitrogen), 0.2 µL of Platinum Taq DNA polymerase High Fidelity (Platinum Taq DNA Polymerase High Fidelity kit, Invitrogen), 0.5 µL of each 50 mM modified primer, 29.3 µL of PCR grade water and 10 µL of DNA suspension. The thermal profile was 94 °C for 60 s, followed by 25 cycles at 94 °C for 30 s, 55 °C for 30 s and 68 °C for 60 s. After amplification, PCR products were purified using AMPure XP beads (Beckman Coulter) according to the manufacturer’s protocol with a ratio of 1.8:1 beads to DNA. The concentration of purified amplicons was measured using the Qubit fluorometer (Invitrogen) and the Qubit dsDNA HS assay kit (Invitrogen).
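As a bookkeeping aid, the first-step reaction composition listed above can be checked and scaled to a master mix with a short script. This is a sketch under stated assumptions: the 10 % pipetting overage and all names are hypothetical choices for illustration, while the per-reaction volumes are those given in the text.

```python
# Per-reaction volumes (µL) of the first-step PCR, template excluded.
FIRST_PCR_MIX_UL = {
    "10x High Fidelity PCR buffer": 5.0,
    "50 mM MgSO4": 2.0,
    "dNTPs (2.5 mM each)": 2.5,
    "Platinum Taq DNA polymerase High Fidelity": 0.2,
    "hex1deg primer": 0.5,
    "hex2deg primer": 0.5,
    "PCR grade water": 29.3,
}
TEMPLATE_UL = 10.0
REACTION_UL = 50.0


def total_reaction_volume():
    """Sum of all components plus template; should equal the 50 µL reaction."""
    return sum(FIRST_PCR_MIX_UL.values()) + TEMPLATE_UL


def final_concentration(stock_conc, added_ul, reaction_ul=REACTION_UL):
    """Final concentration of a reagent, in the same unit as the stock."""
    return stock_conc * added_ul / reaction_ul


def master_mix(n_samples, overage=0.1):
    """Volumes (µL) for n reactions; the overage is an assumed safety margin."""
    factor = n_samples * (1.0 + overage)
    return {name: round(vol * factor, 2) for name, vol in FIRST_PCR_MIX_UL.items()}
```

For example, `final_concentration(50, 2.0)` gives the final MgSO4 concentration in mM, and `master_mix(8)` gives the volumes needed for eight samples before 40 µL of mix is dispensed per tube.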

The second step used PCR to fix, at each extremity of the purified amplicons, dual indices and Illumina sequencing primers to obtain barcoded amplicons ready for sequencing. Fixation was carried out using the overhang adapters previously attached to amplicons. The unique index combination enables the subsequent identification of each sample. Amplification was performed in a 50 µL reaction mixture containing 5 µL of PCR products (1 ng/µL), 25 µL of Q5 Hot Start High Fidelity 2× Master Mix (Biolabs), 5 µL of Nextera XT Index 1 (N7−, Illumina), 5 µL of Nextera XT Index 2 (S5−, Illumina) and 10 µL of PCR grade water. Thermal profile was 95 °C for 30 s, followed by 12 cycles at 95 °C for 10 s, 55 °C for 30 s and 72 °C for 30 s, and a finishing step at 72 °C for 5 min. After amplification, barcoded amplicons were cleaned using AMPure XP beads (Beckman Coulter) as described above. The length of purified barcoded amplicons was assessed by capillary electrophoresis (Agilent 2100 Bioanalyser, Agilent Technologies) from 1 µL of the PCR products using the Agilent DNA 1000 kit (Agilent Technologies). Also, purified barcoded amplicons were diluted 1:4000 or 1:10,000 for quantification by qPCR using the KAPA Library Quantification kit (KAPA Biosystems) with dsDNA standard concentrations in a range from 2 × 10⁻⁴ to 20 pM, according to the manufacturer’s instructions.

All quantified amplicons were mixed in equimolar amounts to obtain a final DNA library concentration of 4 nM. The final library concentration was controlled by qPCR using the KAPA Library Quantification Illumina kit. The DNA library was then denatured and diluted as recommended by Illumina (MiSeq Reagent kit v3-reagent preparation guide) and loaded at 15 pM on a MiSeq flow cell with 5 % PhiX spiked in. Paired-end sequencing was performed with 301 cycles using the Generate FASTQ workflow. After sequencing, MiSeq reporter software automatically trimmed Illumina adapters and demultiplexed data to provide two FASTQ files (read 1 and read 2) per sample.
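The dilution arithmetic behind the 4 nM pool and the 15 pM loading concentration follows C1 × V1 = C2 × V2. The sketch below assumes that each library is first brought to 4 nM and that equal volumes are then pooled; the function names and the example stock concentration are hypothetical, and the denaturation steps of the Illumina protocol are not modelled.

```python
def dilute_to_target(stock_nM, target_nM, final_volume_ul):
    """Return (library µL, diluent µL) needed to reach target_nM (C1V1 = C2V2)."""
    if target_nM > stock_nM:
        raise ValueError("library is already below the target concentration")
    library_ul = target_nM * final_volume_ul / stock_nM
    return library_ul, final_volume_ul - library_ul


def fold_dilution(start_nM, loading_pM):
    """Overall fold dilution between the pooled library and the loaded library."""
    return start_nM * 1000.0 / loading_pM


# A hypothetical 20 nM library brought to 4 nM in a 10 µL volume.
library_ul, diluent_ul = dilute_to_target(20.0, 4.0, 10.0)
```

With these figures, `fold_dilution(4.0, 15.0)` shows that the 4 nM pool is diluted roughly 267-fold overall before loading.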

Bioinformatics for HAdV-Type Identification

The sequences dataset was analysed using the program Mothur (Schloss et al. 2009) following the standard operating procedure (SOP) for MiSeq provided on the Mothur platform (http://www.mothur.org/wiki/MiSeq_SOP) (Kozich et al. 2013). Briefly, after contig generation, sequences were removed from consideration if they did not have a length ranging between 290 and 320 base pairs. Unique sequences from all samples were merged and aligned to a reference alignment. The reference alignment file was created using the official sequence for each approved HAdV type and CLC Main Workbench 6 software (CLC Bio). The aligned sequences were then treated to trim the PCR primers and to discard sequencing mistakes and chimaeras. Chimaeras were removed using the Uchime algorithm implemented in Mothur. All remaining sequences were finally classified. A value of 99 % of similarity was fixed for HAdV type identification, except for the distinction of types belonging to the HAdV-C species. In this case, 70 % of similarity was used as an identification criterion.
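The length filter and the similarity threshold described above can be illustrated with a simplified stand-in for the classification step. This is not the Mothur implementation: it assumes reads and references are already aligned to the same length, uses plain percent identity, and all function names are hypothetical.

```python
def keep_contig(seq, min_len=290, max_len=320):
    """Length filter applied after contig generation (290 to 320 bp)."""
    return min_len <= len(seq) <= max_len


def percent_identity(aligned_a, aligned_b):
    """Percent identity of two aligned sequences, ignoring gap-gap columns."""
    if len(aligned_a) != len(aligned_b):
        raise ValueError("sequences must be aligned to the same length")
    compared = matches = 0
    for a, b in zip(aligned_a, aligned_b):
        if a == "-" and b == "-":
            continue
        compared += 1
        if a == b:
            matches += 1
    return 100.0 * matches / compared if compared else 0.0


def assign_type(aligned_read, references, threshold=99.0):
    """Return (best type or None, best identity) for one aligned read.

    `references` maps a type name to its aligned reference sequence. The
    read is assigned only if the best identity reaches the threshold
    (99 % in general; 70 % was used within HAdV-C).
    """
    best_type, best_identity = None, -1.0
    for name, ref in references.items():
        identity = percent_identity(aligned_read, ref)
        if identity > best_identity:
            best_type, best_identity = name, identity
    if best_identity >= threshold:
        return best_type, best_identity
    return None, best_identity
```

In practice the relaxed 70 % criterion would be applied only after a read has been placed in HAdV-C, for example by calling `assign_type` a second time with the HAdV-C references and `threshold=70.0`.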

Results

Abundance and Diversity of HAdV in Wastewater Samples

Eight wastewater samples (4 influents and 4 effluents) from four treatment plants were used in this study. Using a culture-independent assay, genetic material of HAdVs was detected in all tested wastewater samples. The DNA concentrations ranged from 6.3 × 10⁴ to 1.1 × 10⁶ and 3.2 × 10⁴ to 2.8 × 10⁵ genome copies/L for influents and effluents, respectively. Sanger sequencing of amplicons generated by direct PCR showed the presence of both HAdV-31 and HAdV-41 types, while next-generation sequencing results showed the presence of adenoviral DNA belonging to the HAdV-A (HAdV-31, HAdV-12), HAdV-C (HAdV-2) and HAdV-F (HAdV-40, HAdV-41) species (Fig. 3). The most abundant species in the wastewater of the four treatment plants was HAdV-F, in particular HAdV-41, with a relative abundance of more than 80 %.
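Relative abundances such as those reported here are obtained by counting the reads assigned to each type and dividing by the total number of classified reads in the sample. A minimal sketch, with a hypothetical function name and toy input rather than the study's read counts:

```python
from collections import Counter


def relative_abundance(assigned_types):
    """Percentage of classified reads per HAdV type for one sample."""
    counts = Counter(assigned_types)
    total = sum(counts.values())
    if total == 0:
        return {}
    return {hadv_type: 100.0 * n / total for hadv_type, n in counts.items()}


# Toy example: one type label per classified read.
toy_reads = ["HAdV-41"] * 85 + ["HAdV-2"] * 10 + ["HAdV-31"] * 5
toy_profile = relative_abundance(toy_reads)
```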

Fig. 3

Distribution of human adenovirus types identified in wastewater samples isolated directly from water (total viruses) and from A549 and 293A cells (infectious viruses). Data from inlet and outlet of each wastewater treatment plant were pooled

All samples tested were also positive for infectious adenoviral particles. HAdVs isolated from cell culture using A549 and 293A cells were sequenced on the MiSeq platform in order to assess the effect of the cell line on the identified HAdV diversity. Overall, the diversity profiles obtained with the two cell lines differed markedly (Fig. 3). Of the A549 reads, the majority of contigs (more than 95 %, regardless of the sampling site) were assigned with high confidence to HAdV-C, and more particularly to HAdV-2 (Fig. 3). In samples from WWTP 2 (Beggen), a small number of reads were annotated as HAdV-31 (2.3 %), belonging to the HAdV-A species, and as HAdV-3 (1.4 %), belonging to the HAdV-B species. The main difference observed with the 293A cells was the broad detection of infectious HAdV-F, which only this cell line permitted. Both HAdV-41 and HAdV-40 were represented, with HAdV-41 being the major type. In addition, more species and more types were observed after a preliminary isolation and culture step on 293A cells. The three species HAdV-A, HAdV-C and HAdV-F were all well represented using the 293A cells. Across all samples, six different types were observed, i.e. HAdV-1, HAdV-2, HAdV-12, HAdV-31, HAdV-40 and HAdV-41 (Fig. 3). All six were also found simultaneously in the inlet and outlet samples of WWTP 2.

Interestingly, Sanger sequencing results were highly consistent with NGS results for all wastewater samples analysed regarding the predominant HAdV type identified. Indeed, the only HAdV type identified by Sanger sequencing was also at the same time the most abundant type observed by MiSeq sequencing.

Abundance and Diversity of HAdV in River Water Samples

Concerning river water samples, the genome of HAdV was detected in 87.5 % (n = 24) of the samples (Table 1). However, in most of the samples, HAdV quantities were close to the limit of quantification (LOQ) of the qPCR assay, hindering the determination of the genome copy concentration. Among the 21 positive samples, seven were quantifiable (above the LOQ, estimated at 200 genome copies/L) and exhibited an average concentration of 2.9 ± 0.5 × 10² genome copies/L. qPCR inhibition was evaluated by adding plasmid DNA carrying the target region at a known concentration to each sample, and it was estimated to be less than 30 % in all samples (data not shown). Because of the low viral DNA quantities in our samples, no amplicon was obtained by qualitative PCR using the hex1deg–hex2deg primer set, precluding MiSeq amplicon sequencing. To obtain a hexon gene amplicon, a nested PCR was carried out using the nehex3deg–nehex4deg primer set. Only seven amplified DNA fragments were obtained from the 21 samples that were positive by consensus qPCR, including six HAdV-41 and one HAdV-3 (Sanger sequencing).

Table 1 River water results for the detection and characterisation of human adenoviruses by quantitative real-time PCR, nested-PCR followed by Sanger and/or MiSeq amplicon sequencing

About half of the river samples (54.2 %, n = 24) were found positive for infectious particles using the 293A cells. The MiSeq amplicon sequencing workflow was applied to these positive samples. One hundred percent of the reads obtained were annotated as HAdV-41 in 12 out of 13 samples. In the remaining sample, two different types were identified together, HAdV-6 and HAdV-41. Identical results were obtained by Sanger sequencing, because of the moderate quantity of HAdV and the low diversity in these river water samples.
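The detection frequencies reported for the river samples follow directly from the sample counts given in the text (21 of 24 positive by qPCR, 13 of 24 positive by cell culture). A small arithmetic check, with a hypothetical function name:

```python
def percent_positive(n_positive, n_total):
    """Share of positive samples, in percent."""
    if n_total <= 0:
        raise ValueError("n_total must be positive")
    return 100.0 * n_positive / n_total


qpcr_positive = percent_positive(21, 24)      # genome detected by qPCR
culture_positive = percent_positive(13, 24)   # infectious particles on 293A cells
```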

Discussion

Although the metagenomic approach is currently preferred for determining viral diversity in wastewater, the findings of this study establish that the amplicon sequencing approach can also be highly suitable and appealing for the rapid detection and identification of viruses of interest in complex water matrices. With such an approach, detection specificity is favoured over the production of large quantities of data, in order to obtain a maximum of relevant information from the millions of reads produced in a sequencing run (Lange et al. 2014). To the best of our knowledge, this is the first work to provide information on HAdV-type occurrence and prevalence in wastewaters and surface waters using a next-generation amplicon sequencing assay. An analogous study has been reported using another NGS platform, but the results were restricted to HAdV species identification (Bibby and Peccia 2013b). The data treatment developed in the present study allows all reads generated by NGS to be assigned to a single HAdV species and type with a very high level of confidence (99 % similarity selected as the critical value). One difficulty was, however, encountered in distinguishing types belonging to the HAdV-C species. Reads assigned to HAdV-2, HAdV-6 and HAdV-1 strains presented a similarity of approx. 70, 80 and 99 % with the reference-type strain (J01917, HC492785 and AF534906, GenBank accession numbers), respectively. However, all of these amplicons were unequivocally assigned to the HAdV-C species using the critical value of 99 % similarity. Other authors have also mentioned such an observation (Lee et al. 2004). One possible explanation might be the high genomic similarity of the four human types of the HAdV-C species. In addition, reads obtained in this study were only compared with the reference sequence, although many other sequences are available in databases for a given strain.

Using the suggested MiSeq workflow, the distribution and the relative abundance of HAdV types observed in the analysed samples are in agreement with those reported in the literature taken as a whole. Overall, only some of the 54 HAdV types currently described were detected in our wastewater and freshwater samples. HAdV types 2, 12, 31, 40 and 41 were the most commonly represented, with some occurrences of HAdV-1, HAdV-3 and HAdV-6. The high prevalence of HAdV-41 highlighted here, especially in river water, is also in accordance with what has been previously reported (Xagoraraki et al. 2007; Haramoto et al. 2007). Thus, the workflow used here is well suited to evaluating HAdV diversity in wastewater samples and to typing infectious particles. However, some limitations were observed for river samples sparsely contaminated by viruses. No hex1deg–hex2deg amplicon was obtained, despite the genomic signal in real-time PCR and detection by culture assay. The low viral DNA quantities present in the concentrated water samples and the lack of sensitivity of the qualitative PCR assay can be considered responsible for the failure of amplicon generation using the hex1deg–hex2deg primer pair. Consequently, a nested PCR had to be applied to obtain a PCR fragment usable for Sanger sequencing. This latter approach provided information for some additional samples, but unfortunately not for all of them. In this particular context, a preliminary enrichment step can be an interesting alternative (Hall et al. 2014), as demonstrated here. The use of a cell culture step prior to HAdV detection and characterisation by NGS has a double advantage. On the one hand, it provides information on infectious particles only, which can be directly related to a public health risk assessment. On the other hand, multiplication inside host cells drastically increases the viral DNA quantity available for subsequent typing analysis.
However, this preliminary culture step may potentially alter the original species/types distribution according to the experimental procedure applied, especially regarding the cell line type or the culture duration.

Currently, the detection of infectious HAdVs in environmental samples is widely performed using either the A549 or the HEK 293A cells. As expected, the results reported here reveal completely different HAdV diversity profiles depending on the cell line used, which confirms and reinforces previous studies (Jiang et al. 2009; Ogorzaly et al. 2013). Our results clearly demonstrate that the A549 cells promote the multiplication of HAdV types belonging to the HAdV-C species at the expense of others. The presence of some sequences annotated as HAdV-31 and HAdV-3 suggests that the HAdV-A and HAdV-B species can also be detected using the A549 cells, albeit to a lesser extent. Data from the literature indicate that A549 cells support the replication of the HAdV-C, HAdV-B and HAdV-D species but not that of HAdV-F (Jiang et al. 2009). Based on our results, no conclusion can be drawn on the HAdV-D species since it was not detected during the present study. According to our results, the 293A cell line enables the multiplication of the HAdV-A, HAdV-C and HAdV-F species to a similar extent. Other work also reported the replication of HAdV-D (Jiang et al. 2009) on this cell line. In addition, the relative abundance of each HAdV type observed using the 293A cells is closely related to the diversity observed by culture-independent typing; however, the results are not directly comparable (only infectious particles are identified in the former, whereas DNA fragments from both infectious and non-infectious viruses are characterised together in the latter). A high proportion of infectious particles in our samples, as established using the infectivity assay, is a quite probable explanation for this similarity. The 293A cell line appears to be the most suitable for detecting the wide diversity of HAdVs occurring in water samples, thus giving a more realistic type/species distribution than the A549 cell line.
A recently published study described the use of a metagenomic approach on the MiSeq Illumina platform to identify enteric pathogenic viruses isolated from wastewater with or without a cell culture step on A549 cells (Aw et al. 2014). In the latter study, no contig was identified as HAdV-F after a culture step on A549 cells, while in the original sample, about 15 % of the annotated contigs were associated to this enteric species, using a direct assay. Unfortunately, no information was provided on the infective state of the HAdV-F, preventing conclusive interpretation. However, it is important to note that this observation is largely in agreement with the findings presented in the present work, and supports our conclusion on the inability of A549 cells to give a clear overview of the HAdV diversity in a water sample.

Results obtained in this study are very consistent with previous data on HAdV types circulating in water matrices, confirming the high prevalence of enteric types. The accuracy of the sequences generated by this NGS method was demonstrated by comparing them with those generated by Sanger sequencing of PCR amplicons. The HAdV type identified by Sanger sequencing and the most abundant type observed by NGS were the same in all samples, regardless of the water type and the sample preparation protocol. Moreover, alignment of the Sanger and MiSeq generated sequences revealed a very high level of matching. The major asset of MiSeq sequencing over the Sanger approach is the ability to simultaneously retrieve multiple sequences belonging to different HAdV types from a single sample. The overall diversity of HAdV in the aquatic environment is thus accessible, in particular the less represented types. The described NGS workflow appears to be an attractive alternative because it is easier to set up, more informative and more cost-efficient, as a high number of samples can be analysed simultaneously.

In conclusion, a significant output of our study is the expansion of the available information on the HAdV-type identification and diversity in complex environmental water thanks to the next-generation amplicon sequencing approach. This use for water sample analysis has the potential to provide relevant, accurate and specific information about the diversity of pathogens of concern with a unique sequencing protocol. The sample preparation step remains however a critical factor for the effective identification of viruses, as viral nucleic acids need to be present in high enough concentration to guarantee the detection. Similar studies could be conducted to monitor simultaneously different enteric viruses in order to improve the knowledge about the pathogenic viruses circulating in aquatic environments.