Introduction

Human rhinoviruses (HRVs) are a highly prevalent and genetically diverse group of human pathogens that cause a wide range of disease. Although they have been traditionally linked to mild, self-limiting upper respiratory tract infections, there is now evidence for a role of HRV in lower respiratory tract infection [13, 24, 31], exacerbations of chronic lung disease [20, 43, 70], severe systemic infections [66, 69] and gastrointestinal disease [14].

HRVs are single-stranded, positive-sense RNA viruses that belong to the genus Enterovirus, family Picornaviridae. HRVs are classified into three species on the basis of phylogenetic relationships and sequence divergence: Human rhinovirus A, Human rhinovirus B, and Human rhinovirus C [25, 49, 56]. While species A and B viruses can be readily isolated and cultured in vitro (and were among the earliest identified human picornaviruses), the equally prevalent HRV-C was discovered only recently, following the introduction of PCR-based diagnostic and surveillance screening methods [1, 21, 26, 28, 53]. For reasons that are not yet understood, HRV-C replicates poorly, if at all, in conventional cell culture, although a variant of HRV-C was recently cultured successfully in sinus mucosal organ culture [4].

Serotypes of HRV-A and HRV-B were originally defined by serological cross-neutralisation assays [12, 19] and further characterised by receptor specificity and antiviral sensitivity profile. More recently, HRV-A and HRV-B have been separated genetically into two groups, to which the 74 and 25 serologically defined types, respectively, have been assigned [56]. More recent analyses of phylogenetic relationships and genetic divergence among HRV strains have led to the reclassification of HRV-87 as EV-68 [3] and to the recognition of HRV-A21 and the Hanks strain as the same genotype [27]. Three new HRV-A types, HRV-A101 to HRV-A103, have also been defined on the basis of sequence divergence in the VP1 region [52; www.picornaviridae.com]. For the more recently described species C rhinoviruses, difficulties with in vitro culture have precluded the development of serology-based typing, and variants are currently divided into 51 genotypically defined types. As for enteroviruses [46], types are assigned on the basis of VP1 sequence divergence, in this case using a threshold of 13 % nucleotide divergence [59].

The epidemiological profile of HRVs and the frequency of infection with different types vary substantially from one epidemic season to the next [57], asymptomatic infections are frequent [50], and no link has been identified between any strain, type or species of HRV and particular disease presentations. Despite an apparent difference in tissue tropism, HRVs share many genetic features with other members of the genus Enterovirus. For instance, they have a single-stranded, positive-sense RNA genome of around 7100 bases with a 5’ untranslated region (5’UTR) of around 600–650 bases. The coding region contains a single open reading frame, which is translated into a polyprotein that is cleaved post-translationally into four structural (VP1–VP4) and seven non-structural (2A–3D) proteins. The 5’UTR contains a type I internal ribosomal entry site (IRES) and a conserved 5’ cloverleaf structure [48, 49].

Enteroviruses, in particular members of species EV-B, have been documented to undergo frequent recombination [30, 34, 40, 44, 61], and it has been proposed that structural and non-structural genome regions evolve independently [33, 45, 55]. EVs can thus be classified into a series of distinct recombinant forms (RFs), each defined by a phylogenetically distinct grouping in 3Dpol and other non-structural genes. These RFs have been shown to arise periodically, dominate circulation for a few years and then disappear entirely [39, 41]. Recombination breakpoints in EVs generally localise to two hotspot regions: between the 5’UTR and P1 and between P1 and P2 [61].

In contrast, recombination within the coding region of HRV is generally thought to be rare and sporadic [22, 29, 65], and a contemporary set of HRV-C sequences was found to cluster completely congruently in three coding genome regions [38]. To date, there is limited evidence of inter-species recombination within the HRV coding region. It is well established, however, that HRV-A and HRV-C have undergone recombination within the 5’UTR, such that over half of all HRV-C types group within the HRV-A clade in this region [17, 68]. In addition, recent reports have raised the possibility that several contemporary HRV-A strains were formed by recombination [49].

Until recently, sequence data from the non-structural region of the HRV genome were limited, as most HRV types were represented by only one full genome sequence. In addition, sequence data for HRV-C strains spanned a very short time period because of their recent discovery and the tendency of studies to focus on capsid-based typing. In this study, we obtained sequences from isolates of all three HRV species collected over 14 years in order to determine whether recombination had shaped the contemporary diversity of these types. We also performed a comprehensive analysis of recombination between the VP1 and 3Dpol regions of all available HRV full genomes, in an attempt to characterise all detectable recombination events in the history of HRV diversification.

Materials and methods

Sample selection

HRV-positive isolates were selected from distinct geographical locations (UK and Finland) over a period spanning 14 years. These included 28 HRV-positive clinical respiratory specimens from Finland collected between 1995 and 1997. VP4/VP2 sequences for these isolates were previously published with accession numbers in the series EU590043–EU590113 and AY015114–AY015174. Using previously defined pairwise nucleotide p-distance thresholds in the VP4/VP2 region [59, 67], these 28 samples were identified as belonging to 22 HRV types (10 HRV-A, 6 HRV-B and 6 HRV-C).

In addition, a total of 552 respiratory samples from patients previously referred to the Specialist Virology Centre at the Royal Infirmary of Edinburgh were screened using our previously described nested PCR method, specific for the 5’UTR region of HRV/HEV [67]. Positive samples (n = 136) were amplified and sequenced in the VP4/VP2 region as described previously [67]. Samples belonging to the same 22 types described above (n = 41) were identified and selected for further analysis. A further 23 HRV-C-positive samples were also identified, representing 11 of the 28 previously provisionally assigned HRV-C types [59]. In total, 92 HRV isolates (28 from Finland and 64 from Edinburgh) were analysed in this work.

Amplification of the VP4/VP2, VP1 and 3Dpol regions of HRV-A, -B and -C

RNA was extracted from clinical samples using a QIAGEN Virus Spin Kit (QIAGEN, UK) according to the manufacturer’s instructions. Reverse transcription reactions were carried out using random hexamers and a Promega Access Reverse Transcription Kit with modifications as described previously [15]. Amplification of the VP1 and 3Dpol regions was carried out using the primers listed (Supplementary Table 1; Supplementary Data) and under the conditions described below.

HRV-A VP1

The full HRV-A VP1 region was assembled by amplification and sequencing of two overlapping fragments. The same first-round product was used for the second-round reactions for both VP1 fragments. For fragment 2, VP1 amplification was carried out as described previously [67]. For samples that were difficult to amplify, a combined reverse transcription and first-round PCR using the SuperScript III system (Invitrogen, UK) was carried out as described previously [42]. For fragment 1, the second round was a hemi-nested PCR with primers 1943s and 2504as and previously described temperature cycling conditions. For HRV-A28-positive specimens, fragment 1 could not be amplified with the standard primers, and a specific inner antisense primer was designed.

HRV-B VP1

Amplification of HRV-B VP1 was carried out using hemi-nested primers. PCR conditions were as for HRV-A, except for the use of an annealing temperature of 48 °C in the second round. For any samples that could not be amplified by this method, the SuperScript III system was used.

HRV-C VP1

SuperScript III was used to perform a combined reverse transcription and first-round PCR reaction directly on extracted RNA. The second round of the nested PCR reaction was carried out with an annealing temperature of 45 °C in a 50-μl reaction volume. PCR products were separated on a 2 % agarose gel at 150 V for 45 minutes. DNA bands of approximately 1000 bp were excised manually under UV transillumination and purified using an Illustra GFX PCR DNA and Gel Band Purification Kit (GE Healthcare, UK) according to the manufacturer’s instructions.

HRV-A, -B and -C 3Dpol

Amplification of the partial 3Dpol region was carried out under previously described conditions [38], but with an annealing temperature of 45 °C in the second round. The amplified segment covered positions 6414–6896 in HRV-A (numbered according to HRV-A1, FJ445111), 6475–6960 in HRV-B (numbered according to HRV-B14, X01087) and 6361–6835 in HRV-C (numbered according to HRV-C4, EF582385).

All amplicons were sequenced using a BigDye Terminator Kit (Applied Biosystems, Warrington, UK). Sequences generated in this study have been submitted to GenBank and assigned accession numbers in the series KC342054–KC342173.

Sequence alignment and pairwise nucleotide p-distance calculations

All available HRV full genomes were downloaded from GenBank on 6 June 2012. As a number of currently published HRV polyprotein sequences contain large gaps within the coding region, only those that were more than 90 % complete in the VP4/VP2, VP1 and partial 3Dpol regions were included. The regions analysed were 616–1002 for VP4/VP2, 2305–3126 for VP1 and 6361–6835 for partial 3Dpol (all numbered according to reference strain HRV-C4, strain 024 (EF582385)). These comprised 136 HRV-A, 53 HRV-B and 121 HRV-C sequences, which were compiled into separate datasets for each species together with the sequences generated in the current study (28 HRV-A, 17 HRV-B and 24 HRV-C). A second dataset containing all available full genomes that were more than 90 % complete in the P1, P2 and P3 coding regions was also constructed. Sequence alignments were performed in SSE v1.0 [58], using a combination of the CLUSTAL and MUSCLE algorithms implemented in the program and manual inspection of nucleotide and amino acid alignments.
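The 90 % completeness criterion can be illustrated with a minimal sketch. It assumes, as a simplification for illustration only, that each sequence has already been aligned so that alignment column i corresponds to position i + 1 of the HRV-C4 (EF582385) reference, and it treats gaps ('-') and ambiguous bases ('N', '?') as missing. The region boundaries are those given above; the function names are hypothetical and do not correspond to any SSE v1.0 routine.

```python
# Region boundaries (1-based, inclusive) on the HRV-C4 (EF582385) reference, as in the text.
# Assumption: alignment column i maps to reference position i + 1.
REGIONS = {
    "VP4/VP2": (616, 1002),
    "VP1": (2305, 3126),
    "3Dpol": (6361, 6835),
}

MISSING = set("-N?")  # characters counted as missing data


def region_completeness(aligned_seq: str, start: int, end: int) -> float:
    """Fraction of positions in [start, end] that carry an informative base."""
    length = end - start + 1
    segment = aligned_seq[start - 1:end]
    # A sequence that stops before the end of the region is missing its tail.
    segment = segment + "-" * (length - len(segment))
    informative = sum(1 for base in segment.upper() if base not in MISSING)
    return informative / length


def passes_filter(aligned_seq: str, threshold: float = 0.90) -> bool:
    """True if the sequence is more than `threshold` complete in every analysed region."""
    return all(
        region_completeness(aligned_seq, start, end) > threshold
        for start, end in REGIONS.values()
    )
```

For example, a toy alignment row of 7000 informative bases passes, whereas the same row with the whole VP1 region replaced by gaps, or truncated before 3Dpol, is excluded.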

Pairwise nucleotide p-distances for individual regions were calculated using the program SequenceDist, within the SSE v1.0 package. For HRV-C, pairwise nucleotide p-distances in the VP1 region were used to confirm the minimum genetic divergence of the new HRV-C types [59].
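For reference, the uncorrected p-distance is the proportion of compared nucleotide sites at which two aligned sequences differ. The sketch below is not the SequenceDist implementation; it shows the standard definition under pairwise deletion, assuming that any column with a gap or ambiguity code in either sequence is excluded from the comparison and that U is treated as T.

```python
def p_distance(seq_a: str, seq_b: str) -> float:
    """Uncorrected pairwise nucleotide p-distance with pairwise deletion."""
    if len(seq_a) != len(seq_b):
        raise ValueError("sequences must be aligned to the same length")
    valid = set("ACGT")
    a_norm = seq_a.upper().replace("U", "T")
    b_norm = seq_b.upper().replace("U", "T")
    compared = 0
    differences = 0
    for a, b in zip(a_norm, b_norm):
        # Skip columns with a gap or ambiguous base in either sequence.
        if a in valid and b in valid:
            compared += 1
            if a != b:
                differences += 1
    if compared == 0:
        raise ValueError("no comparable sites")
    return differences / compared
```

A p-distance of 0.13 therefore corresponds to the 13 % divergence threshold used for HRV-C type assignment in VP1.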

Analysis of mean substitution rates

Sequences representing six types (three HRV-A and three HRV-C) were selected for analysis of mean substitution rates in the VP1 and partial 3Dpol regions. For VP1 datasets, every available GenBank entry that was >90 % complete across the region and had a specified date of isolation was included. 3Dpol datasets contained only sequences generated in this study and published full genomes. HRV-A and HRV-C datasets were screened with GARD, SBP and RDP for evidence of intra- and inter-type recombination before substitution rates were estimated.

Mean substitution rates were estimated using BEAST (Bayesian Evolutionary Analysis Sampling Trees) [8]. Each analysis used the SRD06 model of nucleotide substitution, as recommended for protein-coding data [8], a relaxed uncorrelated lognormal molecular clock and a constant population size, with auto-optimisation of all other priors during the run. Each MCMC chain was run for 20 million states and sampled every 1000 states. All datasets were analysed in duplicate to check convergence. To confirm that the results were driven by the sequence data, each dataset was also subjected to an “empty” BEAST run with the sequence data excluded, so that the chain sampled from the priors alone. Additionally, all datasets were analysed with the dates of isolation of individual sequences scrambled.
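The date-scrambling control can be sketched as a random permutation of sampling dates among tips, with the sequences themselves left unchanged. The helper below is hypothetical and is not the SSE v1.0 randomisation routine; in practice the permuted dates would be written back into the BEAST input before re-running the analysis. If the rate estimate reflects genuine temporal signal, estimates from scrambled datasets are expected to differ clearly from those obtained with the true dates.

```python
import random
from typing import Dict, Optional


def scramble_dates(tip_dates: Dict[str, float], seed: Optional[int] = None) -> Dict[str, float]:
    """Return a copy of `tip_dates` with the dates randomly permuted among the tips."""
    rng = random.Random(seed)
    names = list(tip_dates)
    dates = [tip_dates[name] for name in names]
    rng.shuffle(dates)  # permute dates; sequence names stay in place
    return dict(zip(names, dates))
```

Because the set of dates is preserved, the overall time span of the dataset is unchanged; only the association between each sequence and its date is broken.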

Output was analysed in Tracer, part of the BEAST package. Phylogenetic trees generated by Bayesian analysis were summarised in TreeAnnotator v1.6 and visualised in FigTree. These were compared directly with bootstrap-resampled neighbour-joining trees based on maximum composite likelihood distances, to check that the phylogenetic relationships inferred by the two methods were consistent.

Phylogenetic analysis

Phylogenetic trees were constructed in the MEGA 5.0 software package [63] by the neighbour-joining method [54] from 100 bootstrap-resampled sequence alignments, using maximum composite likelihood (MCL) distances [11] with pairwise deletion of missing data. Trees were examined for changes in branch length and tree topology between non-contiguous genome fragments (VP4/VP2, VP1 and 3Dpol) and between the coding regions of HRV full genomes (P1, P2 and P3).
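MEGA generates bootstrap replicates internally; the sketch below only illustrates the underlying idea. Each replicate is an alignment of the original length built by sampling columns with replacement, from which a distance matrix and neighbour-joining tree would then be rebuilt (that step is omitted here). Bootstrap support for a clade is the proportion of replicate trees containing it. The function names are hypothetical.

```python
import random
from typing import List


def bootstrap_alignment(alignment: List[str], rng: random.Random) -> List[str]:
    """One bootstrap replicate: resample alignment columns with replacement."""
    n_cols = len(alignment[0])
    if any(len(seq) != n_cols for seq in alignment):
        raise ValueError("all sequences must have the same aligned length")
    cols = [rng.randrange(n_cols) for _ in range(n_cols)]
    # The same sampled columns are applied to every sequence, keeping rows aligned.
    return ["".join(seq[c] for c in cols) for seq in alignment]


def bootstrap_replicates(alignment: List[str], n_replicates: int = 100,
                         seed: int = 0) -> List[List[str]]:
    """Generate `n_replicates` resampled alignments (100 in the analysis described)."""
    rng = random.Random(seed)
    return [bootstrap_alignment(alignment, rng) for _ in range(n_replicates)]
```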

Recombination analysis

RDP v4.0 [36] analysis was undertaken to further investigate observed phylogenetic incongruities. Putative recombinant sequences were analysed with their observed nearest neighbours in each region. Each sequence set was examined using a combination of algorithms, including RDP [37], GeneConv [47], MaxChi [62], Chimaera [51], SiScan [10] and Bootscan [35]. A potential recombination event was accepted for further analysis if it was detected by more than two of the above algorithms, with a p-value of less than 0.05.
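The acceptance criterion amounts to a consensus rule across methods. A minimal sketch, reading "more than two" as at least three of the six methods, each with p < 0.05; the input (a mapping from method name to p-value, with None where a method did not detect the event) is a hypothetical stand-in and not the RDP export format.

```python
from typing import Dict, Optional

METHODS = ("RDP", "GeneConv", "MaxChi", "Chimaera", "SiScan", "Bootscan")


def accept_event(p_values: Dict[str, Optional[float]],
                 alpha: float = 0.05, min_methods: int = 3) -> bool:
    """Accept a putative event if at least `min_methods` methods detect it with p < alpha."""
    supporting = [
        method for method in METHODS
        if p_values.get(method) is not None and p_values[method] < alpha
    ]
    return len(supporting) >= min_methods
```

For example, an event reported by RDP, MaxChi and Chimaera with p-values of 0.01, 0.001 and 0.03 would be accepted, whereas one reported by only two methods would not.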

All datasets were further analysed using GARD and SBP [23], available on the HyPhy Datamonkey webserver (www.datamonkey.org). Before each analysis, the model selection tool provided on the server was used to identify the appropriate model of nucleotide substitution, which was then used for the GARD and SBP analyses.

In addition, as a control, several groups of sequences that displayed congruent phylogenetic relationships between VP1 and 3Dpol were analysed by the same methods. None of these putatively non-recombinant sequence groups showed evidence of recombination.

The combination of recombination detection algorithms described above was used to screen each alignment for evidence of recombination. Where recombination was detected, the event was examined further by phylogenetic analysis of the putative recombinant regions and characterised using the GroupScan program in the SSE v1.0 software package.

Determining putative recombination breakpoints

To verify putative recombination events and, where possible, determine recombination breakpoints, sets of sequences were analysed with GroupScan in the SSE v1.0 software package. GroupScan scores the extent of phylogenetic grouping of a query sequence with predefined sequence groups and can pinpoint potential breakpoints if sequences closely related to both parental groups are included; it relies on detecting bootstrap-supported changes in tree topology along an alignment. For each analysis, the three sequenced regions of each isolate were concatenated into a single sequence, and a fragment size of 300 bases with an increment of 30 bases was used. Each analysis was repeated with a fragment size of 600 bases and an increment of 30 bases to check consistency. In each case, the query was the potential recombinant sequence or group, scanned against its nearest neighbour sequence or group in each of the two regions. The intersection of the two resulting grouping-score curves was taken as an estimate of the recombination breakpoint. Breakpoints estimated with 300-base and 600-base fragments were highly similar.
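The windowing and the "intersection of the two lines" can be sketched as follows. GroupScan's grouping score itself is not reproduced: the sketch assumes that two score curves, one against each parental group, have already been computed at the midpoints of sliding windows (300-base fragments moved in 30-base increments), and estimates the breakpoint as the first point where the curves cross, by linear interpolation between adjacent windows. The functions and example scores are hypothetical; on concatenated sequences the estimate would also need mapping back to genome coordinates.

```python
from typing import List, Optional, Sequence


def window_midpoints(seq_length: int, fragment: int = 300, step: int = 30) -> List[float]:
    """1-based midpoints of sliding windows of `fragment` bases moved by `step` bases."""
    return [start + (fragment - 1) / 2
            for start in range(1, seq_length - fragment + 2, step)]


def crossover_point(positions: Sequence[float], score_a: Sequence[float],
                    score_b: Sequence[float]) -> Optional[float]:
    """First position where score_a - score_b changes sign, by linear interpolation."""
    diffs = [a - b for a, b in zip(score_a, score_b)]
    for i in range(len(diffs) - 1):
        d0, d1 = diffs[i], diffs[i + 1]
        if d0 == 0:
            return positions[i]
        if d0 * d1 < 0:  # sign change between windows i and i + 1
            x0, x1 = positions[i], positions[i + 1]
            return x0 + (x1 - x0) * d0 / (d0 - d1)
    return None  # curves never cross: no breakpoint estimated
```

Repeating the scan with `fragment=600` corresponds to the consistency check described above.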

Results

Genotype assignment for HRV-A, HRV-B and HRV-C sequences

A total of 164 HRV-A, 70 HRV-B and 145 HRV-C variants that were more than 90 % complete across the analysed regions were included in the recombination analysis. A separate recombination analysis was performed with complete genome sequences (201 HRV-A, 71 HRV-B and 59 HRV-C).

Examination of phylogenetic trees constructed for the VP1 region of members of all three HRV species revealed consistent grouping of individual (sero)types into clades with uniformly high bootstrap support (Figs. 1, 2). These type groupings were also apparent in the VP4/VP2 and full P1 capsid coding region. However, the relationships between HRV type groups were generally less well resolved in VP4/VP2 trees as a result of greater sequence conservation and shorter fragment length (data not shown).

Fig. 1

Neighbour-joining phylogenetic trees showing the VP1 and partial 3Dpol regions of HRV-A strains. Where possible, HRV-A type groups have been collapsed for ease of reference. Contiguous clades are marked by coloured boxes. Clades 2, 3 and 4 are marked by purple, green and black boxes respectively, while clade 1 is divided into sub-clusters, which group together in 3Dpol. HRV-A types that undergo a bootstrap-supported change in nearest neighbour between VP1 and 3Dpol are marked with a red line. Changes in tree topology that are bootstrap supported on P1, P2 and P3 trees only are marked with a yellow line. Orange arrows indicate the formation of one 3Dpol clade from three VP1 clades

Fig. 2

A: Neighbour-joining phylogenetic trees of HRV-B VP1 and partial 3Dpol sequences. Putative recombinant sequences and type groups are marked with black lines. HRV-B type groups are collapsed where possible for ease of reference. Contiguous clades are marked with black boxes. B: Neighbour-joining phylogenetic trees of HRV-C VP1 and partial 3Dpol sequences. HRV-C type groups have been collapsed where possible for ease of reference. Contiguous clades are marked with black boxes

Previously proposed pairwise nucleotide p-distance divergence thresholds were used to support phylogenetic genotype groupings and define putative new HRV types [59, 67] (Table 1).

Table 1 Pairwise nucleotide p-distances between newly proposed HRV types and their nearest neighbours

One newly described HRV-A sequence, JN815252, did not group with any currently assigned HRV-A type and shared nearest neighbours, HRV-A29 (FJ445125) and HRV-A44 (DQ473499), in every region studied. Using the previously determined pairwise nucleotide p-distance threshold of 12 % divergence in VP1 for identification of new HRV-A types [67], JN815252 fell within the intra-serotype distance range on comparison with both HRV-A29 and HRV-A44 (0.1051 and 0.1083, respectively). Divergence between HRV-A29 and HRV-A44 has previously been shown to fall within the intra-serotype distance range [67], and these two serotypes have also been demonstrated to be serologically cross-reactive [7]. These three sequences were treated as the same serotype for recombination analysis.
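Applying a fixed divergence threshold to the nearest assigned type reduces to a single comparison, shown below with the 12 % HRV-A VP1 threshold and the two distances reported above. This is a hypothetical helper for illustration only; it assumes that distances at or above the threshold mark a candidate new type, and in this study assignment also relied on phylogenetic grouping and bootstrap support, which the sketch ignores.

```python
from typing import Dict, Tuple


def classify_by_threshold(distances_to_types: Dict[str, float],
                          threshold: float) -> Tuple[str, str]:
    """Compare the VP1 p-distance to the nearest assigned type against a type threshold."""
    nearest_type, nearest_dist = min(distances_to_types.items(), key=lambda kv: kv[1])
    if nearest_dist < threshold:
        return "within type range", nearest_type
    return "candidate new type", nearest_type


# JN815252 against HRV-A29 and HRV-A44, using the 12 % HRV-A VP1 threshold.
result = classify_by_threshold({"HRV-A29": 0.1051, "HRV-A44": 0.1083}, 0.12)
```

Here `result` is `("within type range", "HRV-A29")`, in line with treating the three sequences as a single serotype.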

In species B, JF781500, JF781501 and JX074052 formed a bootstrap-supported clade in VP1 with a pairwise nucleotide p-distance of 0.015 to 0.018 from each other and of consistently greater than 0.19 from the nearest assigned HRV-B type (HRV-B35). JX074053 also displayed a p-distance of 0.2267 from HRV-B35, its nearest relative. By current criteria [67], these two groups represent new HRV-B types and have been formally designated HRV-B101 and HRV-B102 (Table 1) (http://www.picornastudygroup.com/types/enterovirus/hrv-b.htm).

HRV-C VP1 sequence data generated during this study confirmed type assignment of eleven HRV-C types that had been provisionally identified as new types based on VP4/VP2 sequences [59]. These had pairwise distances that fell above the VP1 divergence threshold (Table 1) and have been formally designated as HRV-C36 – HRV-C46 (http://www.picornastudygroup.com/types/enterovirus/hrv-c.htm). The previously proposed nucleotide divergence threshold of 13 % was readily identifiable in the distribution of pairwise VP1 p-distances generated from the expanded VP1 dataset (Fig. S1, Supplementary Data).
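One simple way to see such a threshold in a set of pairwise distances is to look for the widest empty interval separating intra-type from inter-type values. The sketch below does this on a sorted list. It is an illustrative heuristic only, not the procedure used to define the 13 % threshold [59], and the example distances are invented.

```python
from typing import Iterable


def largest_gap_threshold(distances: Iterable[float], low: float = 0.0,
                          high: float = 1.0) -> float:
    """Midpoint of the widest gap between consecutive sorted distances in [low, high]."""
    values = sorted(d for d in distances if low <= d <= high)
    if len(values) < 2:
        raise ValueError("need at least two distances in range")
    best_gap, best_mid = -1.0, values[0]
    for left, right in zip(values, values[1:]):
        gap = right - left
        if gap > best_gap:
            best_gap, best_mid = gap, (left + right) / 2
    return best_mid
```

With real data the intra- and inter-type distributions may touch or overlap slightly, so the `low` and `high` bounds can be used to restrict the search to the region of interest.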

Analysis of mean substitution rates in selected HRV-A and HRV-C types

Three non-recombinant types each of HRV-A (HRV-A9, -A28 and -A78) and HRV-C (HRV-C9, -C12 and -C18), with dates of isolation spanning at least 13 years, were selected for analysis of mean substitution rates. The HRV-A sequences spanned 40 years and the HRV-C sequences 14 years. Phylogenetic trees for each included type showed no bootstrap-supported change in topology within each type or species (data not shown). Too few sequences of individual HRV-B types were available to include this species in the analysis. HRV-A sequences (n = 18) and HRV-C sequences (n = 23) were analysed separately.

For both analysed regions of HRV-A and HRV-C, mean substitution rates and 95 % highest posterior density (HPD) intervals were of the order of 10⁻³ to 10⁻² substitutions per site per year. Mean substitution rates were broadly similar between the two coding regions and the two HRV species (Fig. 3). Although the mean rates for HRV-C (6.9138 × 10⁻³ and 9.8491 × 10⁻³ for VP1 and 3Dpol, respectively) were approximately 1.7- and 2.8-fold higher than those for HRV-A (3.9954 × 10⁻³ and 3.4586 × 10⁻³ for VP1 and 3Dpol, respectively), the 95 % HPD intervals for the two species overlapped.
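The 95 % HPD interval is the shortest interval containing 95 % of the posterior samples. A minimal sketch of that calculation and of an overlap check follows, together with the fold differences implied by the mean rates quoted above. The posterior samples themselves would come from the BEAST log files, which are not reproduced here; the function names are hypothetical and this is not Tracer's code.

```python
import math
from typing import Sequence, Tuple


def hpd_interval(samples: Sequence[float], mass: float = 0.95) -> Tuple[float, float]:
    """Shortest interval containing `mass` of the posterior samples."""
    values = sorted(samples)
    n = len(values)
    k = math.ceil(mass * n)  # number of samples the interval must contain
    if not 0 < k <= n:
        raise ValueError("need non-empty samples and mass in (0, 1]")
    widths = [values[i + k - 1] - values[i] for i in range(n - k + 1)]
    i_best = min(range(len(widths)), key=widths.__getitem__)
    return values[i_best], values[i_best + k - 1]


def intervals_overlap(a: Tuple[float, float], b: Tuple[float, float]) -> bool:
    return a[0] <= b[1] and b[0] <= a[1]


# Mean rates (substitutions per site per year) as reported in the text.
MEAN_RATES = {
    ("HRV-A", "VP1"): 3.9954e-3,
    ("HRV-A", "3Dpol"): 3.4586e-3,
    ("HRV-C", "VP1"): 6.9138e-3,
    ("HRV-C", "3Dpol"): 9.8491e-3,
}
fold_vp1 = MEAN_RATES[("HRV-C", "VP1")] / MEAN_RATES[("HRV-A", "VP1")]
fold_3d = MEAN_RATES[("HRV-C", "3Dpol")] / MEAN_RATES[("HRV-A", "3Dpol")]
print(f"HRV-C/HRV-A: VP1 {fold_vp1:.2f}-fold, 3Dpol {fold_3d:.2f}-fold")
```

This prints `HRV-C/HRV-A: VP1 1.73-fold, 3Dpol 2.85-fold`. A difference in means of this size is not by itself evidence of different rates when, as here, the HPD intervals overlap.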

Fig. 3

Mean substitution rates and 95 % highest posterior density intervals for two coding regions of selected HRV-A and HRV-C sequences

To ensure that the specified priors were not inadvertently determining the posterior distribution of the mean substitution rate, BEAST was run for each dataset with the sequence data excluded. As an additional check, the dates of isolation of individual sequences were randomised using the SSE v1.0 software package and the BEAST analysis was repeated. Datasets with scrambled dates gave extremely low substitution rate estimates, well below the lower bound of the 95 % HPD intervals obtained with the correct dates (data not shown).

Comparison of pairwise nucleotide p-distances in VP1 and 3Dpol

The similarity of mean substitution rates in the VP1 and 3Dpol regions of HRV-A and HRV-C is consistent with the linear relationships observed between pairwise p-distances in these two regions (Fig. 4). Linear relationships were observed for HRV-B and HRV-C in both the intra- and inter-(sero)type distance ranges (Fig. 4). Both distributions contained few, if any, outlying data points, and the lines of best fit had gradients of approximately 1 (0.9042 and 0.9418, respectively). This correlation suggests that the two regions have maintained similar substitution rates throughout the diversification of these species into different types.
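The gradients quoted above come from straight-line fits of 3Dpol against VP1 pairwise p-distances. A minimal ordinary least-squares sketch follows; it assumes a fit with an intercept (the text does not state whether lines were forced through the origin), and the example distance pairs are invented.

```python
from typing import Sequence, Tuple


def least_squares_fit(x: Sequence[float], y: Sequence[float]) -> Tuple[float, float]:
    """Slope and intercept of the ordinary least-squares line y = slope * x + intercept."""
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("need at least two paired observations")
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    if sxx == 0:
        raise ValueError("x values must not all be equal")
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


# Invented example: VP1 distances (x) against 3Dpol distances (y).
slope, intercept = least_squares_fit([0.05, 0.10, 0.20, 0.30], [0.045, 0.09, 0.18, 0.27])
```

A slope close to 1 indicates that the two regions diverge at similar rates. Fitting recombinant and non-recombinant comparisons separately, as done for HRV-A, simply means calling the function on each subset of pairs.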

Fig. 4

Comparison of pairwise nucleotide p-distances between the VP1 and partial 3Dpol regions of HRV-A, HRV-B and HRV-C. The line of best fit is indicated by a dotted red line. For HRV-A, separate lines of best fit are given for putative recombinant and non-recombinant distributions. In the HRV-A graph, pairwise comparisons involving putative recombination events that affect whole HRV-A type groups [as observed extensively within clade 1 (Fig. 1)] are shown in red, and sporadic recombination events involving single HRV-A sequences are shown in yellow. In the HRV-B and HRV-C graphs, all putative recombination events are shown in red

In contrast, HRV-A sequences showed lower divergence in the 3Dpol region compared to VP1 and a gradient of 0.8714 for the line of best fit (Fig. 4). The distribution additionally contained a large number of outlying data points, consistent with the occurrence of several recombination events during HRV-A diversification. HRV-A pairwise distances could be divided into three distinct distributions, consisting of non-recombinant types, potential recombination events involving full HRV-A type groups and sporadic instances of phylogenetic incongruity involving single HRV-A sequences (marked with blue, red and yellow, respectively, in Fig. 4; see phylogenetic analysis below). For the putative non-recombinant types that grouped consistently in VP1 and 3Dpol regions, a gradient of approximately 1 (0.9011) was observed. Meanwhile, consideration of pairwise p-distance comparisons between potentially recombinant HRV-A sequences alone gave a calculated gradient of 0.7216.

Analysis of phylogenetic incongruity in HRV-B and HRV-C sequences

Consistent with the close agreement between HRV-B VP1 and 3Dpol pairwise distances (Fig. 4), most HRV-B sequences clustered congruently in phylogenetic trees of both regions and remained within their genotype groups, as defined by VP1 (Fig. 2A). The only bootstrap-supported putative recombinant sequence observed was HQ123444. This full genome sequence showed a pairwise p-distance of at least 0.19 from HRV-B35 strains in VP1. However, on comparison with all published HRV full genomes, HQ123444 was identical to HRV-B35 from position 6353 in the 3Dpol region. This incongruity was detected using RDP, GARD and SBP.

In addition, in the 3Dpol region, the HRV-B17 and HRV-B70 type strains grouped more closely with each other than the HRV-B70 type strain did with contemporary HRV-B70 sequences. Although this change in topology lacked adequate bootstrap support in 3Dpol (Fig. 2A), it was supported in both the P2 and P3 regions (data not shown).

The majority of HRV-C sequences fell within six bootstrap-supported clades, which were congruent between all regions analysed (Fig. 2B). With the previously reported artefactually recombinant sequence GQ223227 excluded from analysis [17, 38], no evidence of inter-type recombination was observed in HRV-C. Any observed inconsistency in branching order or phylogenetic relationships between HRV-C types could not be confirmed by analysis with RDP, GARD or SBP.

Analysis of phylogenetic incongruity in HRV-A sequences

In contrast to HRV-B and HRV-C, the comparison of VP1 and partial 3Dpol pairwise nucleotide p-distances in HRV-A suggested a number of separate recombination events between VP1 and 3Dpol (Fig. 1) and between the P1, P2 and P3 regions (Fig. S2; Supplementary data). Putative recombination events identified by phylogenetic analysis were confirmed by analysis with RDP, GARD and SBP.

Several HRV-A types showed a pattern similar to that of HRV-B70, with contemporary strains grouping separately from their respective older type strains in the non-structural protein coding region. As reported previously [65], HRV-A76 strains collected between 1999 and 2010 grouped most closely with HRV-A56 in 3Dpol (Fig. 5 A1). The intra-type pairwise nucleotide p-distances of HRV-A76 strains ranged from 0.0839 to 0.0933 in VP1 and from 0.2085 to 0.2292 in 3Dpol. HRV-A68 and HRV-A33 also displayed a similar pattern, whereby the older strains grouped preferentially with HRV-A28 (Fig. 5 A2) and HRV-A76 (Fig. 5 A3) type strains, respectively, in 3Dpol. The HRV-A40 type group includes three recently described full genome sequences that were not included in VP1 and 3Dpol analysis due to gaps present in published sequences, which resulted in sequence completeness below the threshold of 90 %. However, on analysis of the whole P3 region, the three contemporary HRV-A40 strains (JX074051, JQ245067 and JN798579) formed a bootstrap-supported clade separate from the HRV-A40 type strain, which maintained its grouping with HRV-A85.

Fig. 5

Neighbour-joining phylogenetic trees representing distinct patterns of recombination observed within HRV-A sequences. The branch to the tree root has been collapsed, and the outgroup used for each analysis is given below. A: Phylogenetic trees showing examples of incongruent topology between VP1 and 3Dpol involving [1] HRV-A56 and HRV-A76 (outgroup: A7), [2] HRV-A28 and HRV-A68 (outgroup: A95) and [3] HRV-A33 and older HRV-A76 strains (outgroup: A88). B: Phylogenetic tree showing branch-length discrepancies between VP1 and 3Dpol of HRV-A80 (outgroup: A46)

HRV-A80 sequences showed a substantial difference in branch lengths between the two regions, with no accompanying change in tree topology (Fig. 5B). The HRV-A80 type strain had a pairwise nucleotide p-distance of 0.078 from contemporary HRV-A80 strains in VP1, whereas in 3Dpol the contemporary strains were much more divergent from the type strain, with a p-distance of 0.207. All of the above putative recombination events were verified by analysis with GARD, SBP and RDP.

With the exception of HRV-A12 and HRV-A78, all HRV-A sequences grouped into four bootstrap-supported clades (numbered 1–4 in Fig. 1) branching basally in both VP1 and 3Dpol trees. Clades 1–4 were maintained in all regions studied, and although bootstrap-supported changes in tree topology were relatively common within each clade, no recombination was detected between them.

Bootstrap-supported changes in branching order and tree topology were observed within clades 1, 2 and 3 (marked with red lines in Fig. 1). The number of bootstrap-supported nodes that each putative recombinant sequence violated between the VP1 and 3Dpol trees ranged from two to eight (Table 2). Clade 1 formed several sub-clusters of HRV-A types, and phylogenetic relationships between these types broke down clearly between VP1 and 3Dpol. For example, one monophyletic, bootstrap-supported clade in 3Dpol (indicated by orange arrows; Fig. 1) was formed from three distinct bootstrap-supported VP1 clades. Phylogenetic relationships between the HRV-A types within this 3Dpol clade were largely incongruent with the VP1 tree topology. However, bootstrap support for nearest-neighbour relationships between HRV-A types in this clade was poor, whether 3Dpol or full P3 region trees were inspected. Notably, the large cluster of putative recombinant HRV-A pairwise comparisons with VP1 p-distances between 0.25 and 0.32 (coloured red in Fig. 4) corresponded to the formation of this new 3Dpol clade. Putative recombination events involving HRV-A types from all three clades were analysed further (Table 2).

Table 2 Phylogenetic incongruities observed between VP1 and 3Dpol in HRV-A sequences

Sequence groupings in phylogenetic trees constructed for the whole P1, P2 and P3 coding regions were largely consistent with those observed in VP1 and 3Dpol: P1 trees resembled the VP1 tree, and P2/P3 trees resembled the 3Dpol tree (Fig. S2; Supplementary Data). Analysis of these longer regions did, however, identify bootstrap-supported phylogenetic incongruity between regions in a further six HRV-A types (yellow lines in Fig. 1; listed in Table S1; Supplementary Data).

All phylogenetic incongruities listed were additionally confirmed by RDP, GARD and SBP analysis (data not shown).

Analysis of putative recombination breakpoints

To determine putative breakpoints in HRV-A sequences showing evidence of recombination between VP1 and 3Dpol, full genomes from each HRV-A type were analysed with GroupScan and compared directly with their nearest neighbours in P1, P2 and P3. Two representative examples of GroupScan output are given, with putative recombination breakpoints highlighted (Fig. 6A). HRV-A18 has a breakpoint near the P1/P2 boundary, and HRV-A63 has a breakpoint within P3. The genome fragment between positions 2000 and 2500 in HRV-A63, which shows a similarly elevated grouping score with HRV-A59, did not correspond to any bootstrap-supported change in tree topology in this region (data not shown).

Fig. 6

A: GroupScan of two representative recombinant HRV-A types. The proposed recombination breakpoint is indicated by a dotted line. B: Calculated coding-region recombination breakpoints for all recombinant HRV-A types, mapped onto a diagram of the full HRV-A genome (numbered according to FJ445111, HRV-A1). Breakpoints from the large recombinant 3Dpol clade, which lacked bootstrap support for nearest-neighbour groupings in the analysed regions, are excluded.

Most breakpoints occurred near the P1/P2 boundary, specifically within the 2A coding region (Fig. 6B). A second putative hotspot lay around the P2/P3 boundary, where most breakpoints fell within the 3A coding region or at the 3A/3B junction.

Recombination in the 5’ untranslated region of HRV-A, -B and -C

All sequences that were more than 90 % complete across the 5’UTR fragment spanning positions 167–626 (numbered according to FJ445111) were included in the analysis (167 HRV-A, 38 HRV-B and 59 HRV-C sequences). The occurrence of recombination in the 5’UTR was assessed by visual inspection of phylogenetic trees for HRV-A (Fig. 7A), HRV-B (Fig. 7B) and HRV-A/HRV-C (Fig. S3; Supplementary Data).
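The inclusion criterion above (more than 90 % complete across reference positions 167–626) can be expressed as a small filter. This is a sketch under stated assumptions: all sequences, including the reference row for FJ445111, are already in one multiple alignment; reference positions are counted over the reference's non-gap characters; and gaps, `N` and `?` count as missing. The function names are hypothetical.

```python
def reference_columns(ref_aligned, start, end):
    """Alignment columns matching reference positions start..end (1-based, inclusive).

    Positions are counted over the non-gap characters of the reference row.
    """
    cols = []
    pos = 0
    for col, base in enumerate(ref_aligned):
        if base != "-":
            pos += 1
            if start <= pos <= end:
                cols.append(col)
    return cols


def completeness(seq_aligned, cols):
    """Fraction of the selected columns that hold a called base (not a gap, N or ?)."""
    if not cols:
        raise ValueError("window contains no reference positions")
    called = sum(1 for c in cols if seq_aligned[c].upper() not in "-N?")
    return called / len(cols)


def filter_complete(seqs, ref_aligned, start=167, end=626, min_fraction=0.9):
    """Keep sequences that are more than min_fraction complete across the window."""
    cols = reference_columns(ref_aligned, start, end)
    return {name: s for name, s in seqs.items() if completeness(s, cols) > min_fraction}


# Toy alignment: reference positions 2..3 map to alignment columns 1 and 3.
print(filter_complete({"full": "ACTGT", "half": "ACTNT"}, "AC-GT", start=2, end=3))
# {'full': 'ACTGT'}
```

Mapping through the reference row keeps the window tied to FJ445111 numbering even when other sequences introduce insertions into the alignment.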

Fig. 7

A: Neighbour-joining phylogenetic trees constructed using HRV-A 5’UTR sequences that were >90 % complete from 167-626 (numbered according to FJ445111), compared to the corresponding VP1 region. Putative recombinant sequences are marked by a black line. B: Neighbour-joining phylogenetic trees constructed using HRV-B 5’UTR sequences that were >90 % complete from 167-626 (numbered according to FJ445111), compared to the corresponding VP1 region. Putative recombinant sequences are marked by a black line

All 5’UTR sequences from HRV-B clustered together and were distinct from those of HRV-A and HRV-C (data not shown). In contrast, HRV-A sequences and over two-thirds (69 %) of HRV-C sequences were interspersed in the 5’UTR tree, consistent with inter-species recombination, as proposed previously [17, 38, 68].

Inspection of HRV-A sequences alone revealed 17 putative recombinant sequences from nine HRV-A types (black boxes; Fig. 7A), identified by bootstrap-supported changes in the nearest neighbour between phylogenetic trees. Two putative recombinant sequences (JQ837724, HRV-A1; JN621245, HRV-A67) are contemporary isolates, each of which forms a bootstrap-supported grouping distant from other members of its type. Among HRV-B 5’UTR sequences (Fig. 7B), only three putative recombinant types were noted, and in two of these the recombinant was a single sequence that was the sole representative of its type.

Discussion

Identification of novel HRV strains

Novel HRV strains representing new genotypically assigned types have frequently been described in genetic analyses of HRV variants identified during routine clinical screening [6, 32, 52, 67], suggesting that as yet undiscovered genetic variation may remain within all three HRV species. Consistent with these studies, our analysis identified 13 putative new types of HRV-B and HRV-C. These putative new types have been submitted to the Picornavirus Study Group for consideration and have been assigned new type numbers.
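Deciding whether a variant belongs to an existing type or represents a putative new one comes down to a divergence threshold in VP1. The sketch below assumes one aligned VP1 representative per reference type and uses the 13 % nucleotide divergence threshold cited for HRV-C types [59]; the appropriate threshold for HRV-B is not stated in this section and would have to be supplied. Treating divergence at or below the threshold as "same type" is an assumption, and the function and type labels are hypothetical.

```python
import math


def p_distance(a, b):
    """Uncorrected p-distance, ignoring columns where either sequence has a gap or N."""
    pairs = [(x, y) for x, y in zip(a, b) if x not in "-N" and y not in "-N"]
    if not pairs:
        return math.nan
    return sum(x != y for x, y in pairs) / len(pairs)


def assign_type(query_vp1, references, threshold=0.13):
    """Return (nearest_type, distance); nearest_type is None if no reference is within threshold.

    references maps a hypothetical type label to one aligned VP1 sequence per type.
    """
    best_type, best_dist = None, math.inf
    for type_label, ref in references.items():
        d = p_distance(query_vp1, ref)
        if d < best_dist:  # NaN comparisons are False, so pairs with no overlap are skipped
            best_type, best_dist = type_label, d
    if best_dist <= threshold:
        return best_type, best_dist
    return None, best_dist  # more divergent than threshold from every type: putative new type


refs = {"C1": "A" * 10, "C2": "C" * 10}
print(assign_type("AAAAAAAAAC", refs))  # ('C1', 0.1)
print(assign_type("AAAAAAAACC", refs))  # (None, 0.2)
```

A real assignment would compare against all available members of each type and confirm the grouping phylogenetically rather than relying on a single nearest representative.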

Estimation of mean substitution rates and analysis of pairwise p-distances

This study represents the first comparison of mean substitution rates between distant genome regions of HRV-A and HRV-C. The mean substitution rates estimated for both regions are consistent with other published estimates for single-stranded RNA viruses [9, 18]. Most previous substitution rate analyses of picornaviruses have focused on the capsid coding region, particularly VP1 [5, 16]. One comparison of several picornaviruses found that members of the genus Enterovirus (EV) had significantly higher mean substitution rates than other members of the family Picornaviridae (including members of the genera Aphthovirus, Teschovirus, Hepatovirus and Cardiovirus), with very limited overlap of the 95 % HPD intervals between the two groups [16]. Our estimates for both HRV-A and HRV-C had 95 % HPD intervals overlapping those of both the EV and the non-EV examples cited in that study. Two studies of the evolution of EV serotypes have included estimates of substitution rates in the 3Dpol region [16, 41], and these are of the same order of magnitude as our estimates for HRV-A and HRV-C.
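The comparison of 95 % HPD intervals above can be made concrete. This sketch computes the highest posterior density interval from MCMC samples as the shortest interval containing the requested fraction of sorted samples (appropriate for a unimodal posterior) and then tests two intervals for overlap. The sample values in the usage line are synthetic and are not estimates from this study; the function names are hypothetical.

```python
import math


def hpd_interval(samples, mass=0.95):
    """Shortest interval containing `mass` of the samples (assumes a unimodal posterior)."""
    xs = sorted(samples)
    n = len(xs)
    k = math.ceil(mass * n - 1e-9)  # samples the interval must span; epsilon guards float error
    if not 1 <= k <= n:
        raise ValueError("need at least one sample and 0 < mass <= 1")
    widths = [xs[i + k - 1] - xs[i] for i in range(n - k + 1)]
    i = min(range(len(widths)), key=widths.__getitem__)
    return xs[i], xs[i + k - 1]


def intervals_overlap(a, b):
    """True if closed intervals a = (lo, hi) and b = (lo, hi) share any value."""
    return a[0] <= b[1] and b[0] <= a[1]


# Synthetic samples: half the mass is tightly packed, so the 50 % HPD is narrow.
print(hpd_interval([1, 2, 3, 4, 5, 50, 60, 70, 80, 90], mass=0.5))  # (1, 5)
print(intervals_overlap((1, 5), (4, 9)))                            # True
```

In practice, HPD intervals for substitution rates are usually read directly from Bayesian phylogenetic output summaries; the sketch only shows what the interval represents.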

Similar to our observations for HRV-A and HRV-C, a general congruency of substitution rates between VP1 and 3Dpol has previously been observed in the evolution of recombinant groups of EV71 [40] and recombinant groups of three HEV-B serotypes: E9, E11 and E30 [41].

As previously observed in an analysis of EV71 isolates [40], similar mean rates of nucleotide substitution in two distant genome regions should yield pairwise p-distances that fall along a line with a gradient of 1 and a y-intercept of 0 when the two regions are plotted against each other. Discontinuities or outlying comparisons in these distributions may therefore indicate recombination events (coloured red and yellow in Fig. 4). Outlying data points typically resulted from comparisons with variants that displayed bootstrap-supported changes in tree topology between the two regions. The most striking example is a large cluster of HRV-A recombinant sequences with pairwise nucleotide p-distances between 0.23 and 0.34 in VP1 (coloured red in Fig. 4), which corresponds to the HRV-A type groups in clade 1 (Fig. 1) that undergo a bootstrap-supported change in tree topology between VP1 and 3Dpol.
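The expectation that pairwise p-distances in two regions fall on the line y = x can be turned into a simple screen for outlying comparisons. The sketch below assumes two dictionaries of aligned sequences (for example VP1 and 3Dpol) keyed by the same isolate names, and flags pairs whose vertical offset from y = x exceeds a tolerance. The 0.1 tolerance and the function names are hypothetical; the study itself identified outliers by inspection and confirmed them with bootstrap-supported changes in tree topology, not with a fixed cut-off.

```python
import math
from itertools import combinations


def p_distance(a, b):
    """Uncorrected p-distance, ignoring columns where either sequence has a gap or N."""
    pairs = [(x, y) for x, y in zip(a, b) if x not in "-N" and y not in "-N"]
    if not pairs:
        return math.nan
    return sum(x != y for x, y in pairs) / len(pairs)


def outlying_pairs(region1, region2, tolerance=0.1):
    """Flag isolate pairs whose p-distances in two regions depart from the line y = x.

    region1 and region2 map the same isolate names to aligned sequences of each region.
    Returns (name1, name2, d_region1, d_region2) tuples for flagged pairs.
    """
    flagged = []
    for x, y in combinations(sorted(region1), 2):
        d1 = p_distance(region1[x], region1[y])
        d2 = p_distance(region2[x], region2[y])
        residual = d2 - d1  # vertical offset from y = x (gradient 1, intercept 0)
        if abs(residual) > tolerance:  # NaN (no comparable sites) is never flagged
            flagged.append((x, y, d1, d2))
    return flagged


# Toy data: isolate "c" is divergent in region 1 but identical to "a" and "b" in region 2.
vp1 = {"a": "AAAAAAAAAA", "b": "AAAAAAAAAA", "c": "CCCCCAAAAA"}
pol = {"a": "AAAAAAAAAA", "b": "AAAAAAAAAA", "c": "AAAAAAAAAA"}
print(outlying_pairs(vp1, pol))  # [('a', 'c', 0.5, 0.0), ('b', 'c', 0.5, 0.0)]
```

A flagged pair only indicates a candidate; as in the text, it becomes evidence for recombination when it coincides with a bootstrap-supported change in tree topology.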

Analysis of recombination in HRV-A, HRV-B and HRV-C

A small number of recombination events have been proposed to have contributed to the formation of several HRV-A and HRV-B types [49]. More speculatively, it has also been suggested that the entire HRV-B species arose through recombination between HEV and HRV-A [64].

In common with one previous analysis [49], we found that HRV-A had the highest number of recombinant sequences among circulating strains, whereas recombination within the coding regions of HRV-B and HRV-C was detected only very infrequently.

HRV-A sequences are divided into four bootstrap-supported clades, which were consistently observed in the regions analysed in this study (VP4/VP2, VP1 and 3Dpol). Further analysis indicated that these clades were present in every gene region except the relatively short and highly conserved VP4 and 3B regions. Recombination appears to have occurred relatively frequently within clades 1-3 (Fig. 1). However, there was no evidence of recombination between clades in any coding region; we therefore suggest that these HRV-A clades may now be diverging along distinct, evolutionarily separate paths.

Within these four clades, most evidence of recombination involves entire HRV type groups rather than individual sequences. This suggests that these events probably took place during diversification into genetically distinct types.

One previous study, which considered only one example from most HRV types, detected 23 recombinant genomes resulting from 12 different recombination events [49]. Our analysis identified events analogous to most of these. However, because we used bootstrap-supported phylogenetic trees to define nearest neighbours in different coding regions, our estimates of recombination parents occasionally differed (Table 2). As detailed nearest-neighbour relationships are not well resolved, particularly in the non-structural protein-coding regions, we considered the nearest bootstrap-supported clade members to represent nearest-neighbour groups. Additionally, where a recombinant sequence behaves as an outgroup to a new clade, we treated the entire new clade as the single nearest neighbour. This approach was facilitated by the use of GroupScan to confirm recombinant sequences and to determine putative recombination breakpoints. Unlike bootscanning, this method measures the extent to which a query sequence groups within entire pre-defined groups, rather than reducing control groups to single consensus sequences and thereby losing potentially phylogenetically informative sequence data. This avoids the bias that arises when a query sequence is genetically divergent from both control groups [60].

Our results also confirm the previously reported finding that contemporary HRV-A76 sequences have undergone a recombination event with HRV-A56 [65]. Beyond the single recombinant sequence originally reported, we found that all contemporary isolates group together, suggesting that the recombinant group of contemporary HRV-A76 sequences is likely to be the present dominant lineage. This pattern was observed in several other HRV types, indicating that recombination within the time span of our dataset (around 50 years), although infrequent, does occur.

In contrast to the extensive phylogenetic incongruity observed in HRV-A, HRV-B and HRV-C show infrequent or entirely absent recombination within the coding region. One putative recombinant HRV-B sequence, HQ123444, has been reported previously [32]. However, that analysis used different methods to identify recombinant sequences and did not detect the 100 % identity of this sequence with HRV-B35 sequences at the 3’ end of 3Dpol, instead noting an association with HRV-B35 throughout. In fact, this sequence is identical to published strains from position 6353 onwards. Explanations for this observation remain speculative, as no contemporary HRV-B35 full genomes are available for comparison; this putative recombinant may also reflect a sequencing or assembly error.

In accordance with our previous study [38], there is a striking lack of recombination within HRV-C coding regions. However, in both studies, HRV-C analysis is limited by the lack of strains older than 15 years. In HRV-A and HRV-B, every instance of contemporary sequences grouping separately from older strains involves a sequenced example originally collected over 40 years ago. HRV-C sequences may therefore undergo recombination at similar frequencies, but the limited time span of currently available sequences makes such events difficult to detect.

Recombination within the 5’UTR of HRVs

Within HRV-A sequences, the 5’UTR was the only region analysed in which clades 1 to 4 were not maintained. We observed considerably more recombination events within the HRV-A 5’UTR than previous studies [49], including two instances of contemporary sequences grouping distantly from other members of their type. This discrepancy may reflect differences in alignment methods and our reliance on robust groupings within phylogenetic trees to identify recombinant sequences. In addition, several putative recombinant sequences identified previously had inadequate bootstrap support in our phylogenetic trees (Fig. 7A).

Recombination involving the 5’UTR of HRV-B was relatively rare, and several previously reported putative recombinants showed no bootstrap support [49]. However, the 5’UTR region of HRV-B has not been extensively sequenced to date, and only nine HRV-B types were represented by more than one sequence.

As in EV [55], HRV showed evidence of inter-species recombination within the 5’UTR. Most analysed HRV-C 5’UTR sequences had a bootstrap-supported grouping within the HRV-A clade (Fig. S3; Supplementary Data). These HRV-C sequences were not interspersed evenly throughout the HRV-A clade but all fell within three distinct clades. These findings accord with other reports of inter-species recombination in this region [17, 38].

Analysis of recombination breakpoints within the coding region

Recombination breakpoints were determined for all putative recombinant HRV-A sequences with well-supported nearest-neighbour groups (Table 2). As in EV [30, 34] and HPeV [2], most breakpoints lie near the P1/P2 junction. Together with the high within-species divergence of the capsid coding region, this suggests that a biological compatibility barrier may reduce the reproductive fitness of progeny that are recombinant within the capsid region. Therefore, although inter-typic recombination may occur along the full length of the genome during natural co-infection, recombinants within the capsid region may not be represented among clinical samples.

This study demonstrates that, despite the close phylogenetic relationship of HRV to EV, recombination within all three HRV species was substantially less frequent than that observed within EV species. In addition, there was only very limited evidence of recombination within the 40-50 years spanned by the dataset. Building on previous reports that have offered glimpses of recombination in HRV-A [49, 65], our study provides a comprehensive analysis of recombination in all HRV full genome sequences published to date, an additional analysis of HRV sequence fragments from isolates spanning a 14-year period and, to our knowledge, the first attempt to catalogue all currently detectable recombination events and potential breakpoints within HRV-A.