Introduction

Potato virus Y (PVY) is a serious problem in production of solanaceous crops worldwide, affecting potato, pepper, tomato, tobacco, and other agriculturally important plants in the family Solanaceae [23, 25, 29, 33, 50]. PVY is the type member of the genus Potyvirus, family Potyviridae. It has a positive-sense, single-stranded, linear RNA genome coding for a large polyprotein, which is later processed into mature virus proteins by virus-specific proteases, and an out-of-frame protein, P3N-PIPO, which is essential for infectivity [2, 68]. PVY exists as a complex of strains that may be defined based on hypersensitive resistance (HR) reactions towards three known N genes in potato [11, 28, 59], or based on genome sequences and recombination patterns [11, 24, 29, 36]. Currently, fourteen strains of PVY have been defined [24, 29], including five non-recombinant (PVYO, PVYEu-N, PVYNA-N, PVYC, and PVYO-O5) and nine recombinant strains [8, 9, 24]. Fourteen additional recombinants and genome variants have been reported [7, 10, 13, 14, 16, 17, 24, 26, 29, 38, 42, 47, 53, 56, 57, 67]. PVY recombinants are a major concern for potato production, because they are often associated with two different tuber syndromes, potato tuber necrotic ringspot disease (PTNRD) in susceptible potato cultivars [4, 22, 35] and canoe-shaped cracks [5].

Potyviruses are well-known for their propensity to evolve through accumulation of mutations and more rapidly through recombination, where different strains of the virus exchange large (several hundred nt-long) segments of their genome, perhaps to gain selective advantages in a particular host or in different environments [19, 20]. The number of recombinant patterns characteristic of various PVY isolates around the globe was found to be relatively limited, with parents represented mostly by PVYO and PVYEu-N (also called PVYN) sequences [27]. Additional types of PVY recombinants have been found with other genome sequences; for example, the PVY-NE11 strain contains sequences of strains PVYEu-N, PVYNA-N, and PVY-NE11 [38], and strain PVYE contains sequences of strains PVYEu-N, PVYO, and PVY-NE11 [16].

The biological significance of these recombination events between PVY strains is not yet fully understood, although some recombinants, like PVYNTNa and PVYE, were characterized on a set of potato indicators carrying different resistance genes [16, 34, 51]. It was hypothesized that PVY recombination may be driven by viral evolution to overcome various resistance genes conferring strain-specific HR in potato [15, 29, 32, 34]. Substantial natural genetic diversity was revealed among PVY isolates belonging to the strains currently defined as non-recombinant, such as PVYO [24, 30, 31, 47], PVYEu-N [24], or PVYNA-N [47] such that the origin and evolution of PVY recombinants carrying segments of these parental sequences can be addressed.

Recently, 285 whole genomes of PVY from different strains were subjected to an analysis of genome diversity to determine the origins of all common field recombinants of PVY [24]. The study compared phylogenies of whole genome sequences clustered using the unweighted pair group method with arithmetic mean (UPGMA), which is a simple hierarchical clustering algorithm, with separate phylogenies for the five common O- and N-specific segments from recombinant PVY genomes clustered using the more robust maximum likelihood method [24]. Besides establishing phylogenetic relationships between common genome segments and revealing possible parental lineages from non-recombinant PVY genomes, this analysis also identified 13 unusual or “unclassified” PVY sequences with potentially novel recombination structures. Individual trees inferred for each of the five PVY genome sections indicated that these isolates were placed in unusual (or “inconsistent”) lineages for one or more of the sections relative to their placement in the whole genome tree. This finding prompted our interest in new recombinant structures of PVY which may be poorly represented in the public databases, and thus escape identification with routine tools used for phylogenetic and recombination analysis.

To search for rare or poorly sampled recombinant structures, we analyzed an expanded set of 396 complete or nearly complete PVY genomes (> 9,350 nt in length). Ten new recombinant structures were identified that have not been described previously, most of which involved PVYNA-N and PVY-NE11 as parental sequences. The existence of these novel recombinants and strain variants may shed light on patterns of interactions between virus cistrons and plant resistance genes, and perhaps on specifics of virus translocation to tubers and/or vector transmission. It also creates additional challenges for PVY strain detection and differentiation, and demonstrates a need for additional bioinformatics tools to correctly type PVY isolates to strain.

Materials and methods

Sequence dataset and phylogenetic analysis

A total of 396 whole-genome PVY sequences were used for analyses (Supplemental Table 1). The dataset included 386 sequences from the GenBank database [6], comprising 267 genomes submitted by others plus 119 genomes sequenced and submitted by our lab and discussed previously [13, 24]. All complete or nearly complete PVY genomes were selected for the analyses; these represented virus isolates collected in 34 countries across all continents except Antarctica (Supplemental Table 1). A necessary caveat is that the available data are subject to sampling bias, e.g., some countries or regions are more likely than others to sample PVY and deposit sequences in GenBank. Even within a given region, sampling will still be biased towards symptomatic plants in important growing areas or towards countries with well-funded and appropriately equipped laboratories.

Ten additional PVY genomes were newly sequenced in our lab and were also included in the analyses (Supplemental Table 1). These isolates represented PVY circulating in potato in the U.S., and came from winter grow-out seed lot trials conducted in Hawaii in 2012 and 2013 (2 isolates from Montana and 2 from Idaho) and in Western Washington in 2014 (3 isolates from Washington), and also from the Othello, WA, seed lot trials conducted in 2015 and 2016 (1 isolate from Montana and 2 from Idaho). Their GenBank accession numbers (MF624282 to MF624291) and a detailed description are summarized in Supplemental Table 1. These ten PVY isolates were initially typed to strain using serology and RT-PCR as described previously [10, 37, 46] and maintained in tobacco (cv. Burley) in an insect-free growth room. Total nucleic acids were extracted from infected tobacco plants, and the nearly complete genome was amplified using RT-PCR and a set of 48 virus-specific primers, as described in Green et al. [24]. Sanger sequencing of these amplified PCR fragments was performed by Genewiz, Inc. (South Plainfield, NJ), and the reads were assembled with the SeqMan program of the Lasergene 9 Suite (DNASTAR), as also detailed in Green et al. [24].

Sequence alignment was conducted using MUSCLE as implemented in MEGA 6, with some manual adjustment [12, 63]. A whole-genome UPGMA tree was generated in RDP4.61 in order to roughly determine isolate relationships [39]. Model selection and maximum-likelihood phylogenetic analyses of recombinant sections of the genome [41, 52, 61, 62] were conducted as previously described [24].
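
As a rough illustration of the whole-genome UPGMA step, the sketch below clusters a few aligned toy sequences from pairwise p-distances. It is not the RDP4.61 implementation; the function names, the toy sequences, and the choice of p-distance as the distance measure are assumptions made only for this example.

```python
from itertools import combinations


def p_distance(a, b):
    """Proportion of differing sites between two aligned sequences (gaps skipped)."""
    pairs = [(x, y) for x, y in zip(a, b) if x != "-" and y != "-"]
    if not pairs:
        raise ValueError("sequences share no comparable sites")
    return sum(x != y for x, y in pairs) / len(pairs)


def upgma(seqs):
    """Cluster aligned sequences (name -> sequence) into a nested-tuple tree."""
    sizes = {name: 1 for name in seqs}  # cluster -> number of leaves it contains
    dist = {frozenset(pair): p_distance(seqs[pair[0]], seqs[pair[1]])
            for pair in combinations(seqs, 2)}
    while len(sizes) > 1:
        # Join the two clusters separated by the smallest average distance.
        a, b = sorted(min(dist, key=dist.get), key=str)
        merged = (a, b)
        for other in sizes:
            if other in (a, b):
                continue
            # Size-weighted mean equals the arithmetic mean over all leaf pairs.
            dist[frozenset((merged, other))] = (
                sizes[a] * dist[frozenset((a, other))]
                + sizes[b] * dist[frozenset((b, other))]
            ) / (sizes[a] + sizes[b])
        sizes[merged] = sizes.pop(a) + sizes.pop(b)
        # Discard distances that still refer to the two joined clusters.
        dist = {k: v for k, v in dist.items() if a not in k and b not in k}
    return next(iter(sizes))


# Hypothetical 10-nt "genomes": two O-like and two N-like toy sequences.
toy = {
    "O_a": "AAAAAAAAAA",
    "O_b": "AAAAAAAAAT",
    "N_a": "TTTTTAAAAA",
    "N_b": "TTTTTAAAAT",
}
print(upgma(toy))
```

Because UPGMA only averages pairwise distances over the whole alignment, a recombinant that carries a short segment from a second parent will usually still cluster with its majority parent, which is why such a tree gives only a rough picture of isolate relationships.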

Recombination analysis

Recombination analysis was performed on all isolates using RDP4.61 in order to infer structures for each isolate [39]. Isolates were checked one group at a time (each group consisting of roughly two dozen isolates) in order to simplify the RDP4.61 output. At least one representative of each possible parental type was included each time, meaning at least one isolate each of PVYO, PVYO-O5, PVYEu-N, PVYNA-N, PVYC, and PVY-NE11. Six recombination algorithms (RDP, GENECONV, Chimaera, MaxChi, Bootscan, and SiScan) were used to identify potential recombinants and parents [21, 40, 48, 49, 55, 60]. Recombination events were inferred when at least four of the six algorithms detected the event (p < 0.001). Novel recombinant boundaries were inferred if they were more than 80 nt from established boundaries, because this length was considered sufficiently long to place the isolate into a separate phylogenetic clade. Strains were considered rare/unclassified if only one or two isolates for that strain had ever been found.
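
The three decision rules stated above (support from at least four of six methods at p < 0.001, a boundary more than 80 nt from any established one, and one or two isolates per rare structure) can be sketched as simple predicates. This is only an illustration of the criteria, not RDP4.61 code; the function names and the example p-values are hypothetical.

```python
ALPHA = 0.001            # p-value cutoff given in the text
MIN_METHODS = 4          # at least four of the six methods must detect the event
MIN_SHIFT_NT = 80        # a novel boundary lies more than 80 nt from known ones
MAX_RARE_ISOLATES = 2    # "rare/unclassified" means one or two isolates

METHODS = ("RDP", "GENECONV", "Chimaera", "MaxChi", "Bootscan", "SiScan")


def event_supported(p_values):
    """p_values maps method name -> p-value, or None when nothing was detected."""
    hits = [m for m in METHODS
            if p_values.get(m) is not None and p_values[m] < ALPHA]
    return len(hits) >= MIN_METHODS


def is_novel_boundary(position, established):
    """True when the breakpoint is > 80 nt away from every established boundary."""
    return all(abs(position - known) > MIN_SHIFT_NT for known in established)


def is_rare(isolate_count):
    """True when a structure is represented by only one or two isolates."""
    return 1 <= isolate_count <= MAX_RARE_ISOLATES


# Hypothetical p-values for a single candidate event.
example = {"RDP": 1e-12, "GENECONV": 3e-9, "Chimaera": 2e-5,
           "MaxChi": 4e-4, "Bootscan": 0.02, "SiScan": None}
print(event_supported(example))         # four methods fall below the cutoff
print(is_novel_boundary(2307, [2390]))  # 83 nt from the established junction
```

Under this reading, a junction at nt 2,307 counts as shifted relative to an established junction at nt 2,390 because the two lie 83 nt apart.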

Results

Ten new, nearly whole genome PVY sequences were determined in this work, representing strains PVYN-Wi (3 isolates), PVYN:O (3 isolates), and PVY-NE11 (4 isolates) (MF624282 to MF624291; Supplemental Table 1). When these ten new genomes were compared to the sequences available in GenBank and analyzed phylogenetically and using recombination detection programs, several inconsistencies were found in the assignments of PVY genomes deposited in GenBank. Some of these inconsistencies have been noted previously [24].

The inconsistencies involved ten whole genomes for PVY isolates deposited in the GenBank database plus four isolates (MF624288 to MF624291) that were sequenced as part of this study. The unexpected placements for sequences of PVY isolates N-Nysa, E30, FZ10, and 1104 are presented in Fig. 1, as illustrations. Specifically, both N-Nysa and E30 were placed among PVYNA-N sequences when nt 5,850-9,200 (section 4 in [24]) were analyzed, while both FZ10 and 1104 were placed among PVY-NE11 sequences for this same section (Fig. 1 A and B), despite the fact that all these isolates had initially been typed as other strains by whole-genome UPGMA (e.g. N-Nysa and E30 were between the PVYE and PVYNTNb clades, 1104 was closest to the PVYNTNb clade, and FZ10 was closest to the PVY-SYR-III clade) (Fig. 1 C). Other isolates in the collection of 285 had similar disagreements when their UPGMA whole-genome placement was compared to their ML genome-section placements. These mismatched assignments suggested potential novel recombinant structures for these sequences.
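
The kind of mismatch described above, where an isolate groups with one lineage over the whole genome but with another over a single section, can be illustrated with a much simpler nearest-reference comparison. The sketch below is not the UPGMA or maximum-likelihood analysis used in the study; the function names, toy sequences, and 0-based half-open section coordinates are all assumptions for the example.

```python
def p_distance(a, b):
    """Proportion of mismatched sites between two equal-length aligned sequences."""
    if len(a) != len(b) or not a:
        raise ValueError("sequences must be aligned and non-empty")
    return sum(x != y for x, y in zip(a, b)) / len(a)


def nearest_reference(query, references, start=0, end=None):
    """Name of the reference closest to the query in alignment columns [start, end)."""
    return min(
        references,
        key=lambda name: p_distance(query[start:end], references[name][start:end]),
    )


def discordant_sections(query, references, sections):
    """Return the whole-genome match and the sections whose best match differs."""
    whole = nearest_reference(query, references)
    mismatches = {}
    for label, (start, end) in sections.items():
        local = nearest_reference(query, references, start, end)
        if local != whole:
            mismatches[label] = local
    return whole, mismatches


# Toy 12-column alignment with two hypothetical parental types.
references = {"typeA": "A" * 12, "typeB": "C" * 12}
query = "A" * 8 + "C" * 4  # mostly typeA, with a typeB-like 3' section
sections = {"section_1": (0, 8), "section_2": (8, 12)}
print(discordant_sections(query, references, sections))
```

A section flagged this way is only a candidate for closer inspection; confirming a recombinant structure still requires the formal recombination analysis described in Materials and methods.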

Fig. 1

Maximum-likelihood phylogenetic analysis of the PVY genome (section 4 in [24]) for select PVY isolates from the PVYNA-N lineage (A), and from the PVY-NE11/PVYE lineage (B), and whole-genome UPGMA analysis including the same select isolates (C). Yellow highlighting marks the PVY sequences analyzed in this work that were determined to have unclassified structures.

To identify novel PVY recombinants, whole genomes of PVY isolates were subjected to a thorough recombination analysis. Most complete PVY genomes in GenBank were added to the existing collection of sequences from the previous study [24]. Ten PVY genomes were newly sequenced in our lab as an ongoing aspect of this work, resulting in a total combined list of 396 isolates. The isolates were subjected to a detailed examination using programs from the Recombination Detection Program (RDP) version 4.61 package [39] (Supplemental Table 1 lists all 396 whole PVY genomes). Consequently, 28 isolates were determined to either have a noticeably shifted recombinant junction (shifted by > 80 nt) relative to known PVY recombinants, or have at least one novel, long (> 80 nt) recombinant section in their genome (Fig. 2). Detailed lists of inferred breakpoints are given in Table 1 (“common” recombinants) and Table 2 (rare/“unclassified” recombinants).

Fig. 2

A schematic diagram of potato virus Y (PVY) recombinant structures. The ruler on top represents the PVY genome (ca. 9.7 kb), while individual cistrons are presented as rectangles with corresponding protein names. Potential parental sequences are grouped on top, with different parents colored differently; vertical lines delineate the five recombinant genome segments analyzed in [24]. A group of the most common, defined recombinants is designated “Common Recombinants” and fragments originating from different parents are shaded accordingly. All rare/“unclassified” PVY recombinants are presented in the lower portion of the figure, with segments from different parents shaded accordingly. Isolate names are highlighted in orange when sequences were found to be recombinant in this work but not previously reported correctly. Un-highlighted isolates or strains were previously described.

Table 1 Recombinant junction positions in genomes of all “common” (> 2 isolates sequenced) recombinant types
Table 2 Recombinant junction positions in genomes of all rare / “unclassified” (only 1 or 2 sequenced isolates) recombinant types

Overall, 28 PVY sequences were determined to belong to “rare” or “unclassified” strains of PVY, meaning structures that have been sampled only rarely thus far, with only one or two isolates sequenced (Table 2). Of these 28 isolates, eleven were identified as having ten novel, previously unreported PVY recombinant structures, with the remaining seventeen isolates having fifteen unique structures that had been reported for only one or two isolates. These structures are presented in Fig. 2, with the breakpoints listed in Table 2. Two of the isolates, DF and SD1, share the same novel (poorly sampled) structure, which is why the eleven novel recombinant isolates represent only ten novel recombinant structures. Additionally, it was revealed that the strains PVYN:O and PVY-NE11, which had previously been considered single strains, are each actually composed of two types of genome structures: a “short” and a “long” version. In the case of PVYN:O, the “long” version has the typical junction 2 breakpoint at nt 2,390, while the “short” version has this junction shifted towards the 5’ terminal end, at nt 2,307 (Table 1, Fig. 2). The majority of PVYN:O isolates are the “long” type (Table 3). For PVY-NE11, the breakpoint between the PVYEu-N segment and PVYNA-N segment is located at nt 2,009 for the “long” version and at nt 2,220 for the “short” version (Table 1, Fig. 2). The majority of PVY-NE11 isolates are the “short” version (Table 3). Supplemental Table 1 specifies to which version of each strain the isolates belong.
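
The “long” and “short” versions described above differ only in the position of one junction, so an inferred breakpoint can be assigned to a version by proximity to the reported positions. The sketch below uses the four junction coordinates given in the text; the function name and the 40-nt tolerance are hypothetical choices made for illustration, not values from the study.

```python
# Junction positions (nt) reported in the text for each version of each strain.
VARIANT_JUNCTIONS = {
    "PVYN:O": {"long": 2390, "short": 2307},
    "PVY-NE11": {"long": 2009, "short": 2220},
}
MAX_OFFSET_NT = 40  # hypothetical tolerance, not a value from the study


def classify_variant(strain, breakpoint_nt, max_offset=MAX_OFFSET_NT):
    """Return 'long', 'short', or None when the breakpoint fits neither version."""
    junctions = VARIANT_JUNCTIONS[strain]
    # Pick the version whose reported junction lies closest to the breakpoint.
    label = min(junctions, key=lambda name: abs(junctions[name] - breakpoint_nt))
    if abs(junctions[label] - breakpoint_nt) > max_offset:
        return None
    return label


print(classify_variant("PVYN:O", 2310))
print(classify_variant("PVY-NE11", 2015))
```

A breakpoint that matches neither reported junction returns None, which in practice would mark the isolate for the fuller recombination analysis rather than assign it to a version.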

Table 3 Breakdown of the numbers of each strain used in this study

Discussion

Evolution of positive-strand RNA viruses is driven by accumulation of mutations, recombination, and reassortment [19, 44, 45, 54, 58]. PVY presents an interesting model to study evolution of a virus which is under diverse selection pressures in different hosts and environments. Recombination and mutations create a large pool of genomes, allowing the virus to survive and even prosper under various conditions [29, 50, 59]. An impressive genetic diversity of PVY strains and genetic variants provides an opportunity to address the biological role of different cistrons and genetic determinants in the PVY genome responsible for adapting to new hosts, overcoming resistance genes, and leading to better fitness of a particular strain in a particular host and environment [29, 50]. However, the same genetic diversity of PVY leads to difficulties in interpreting biological data and to uncertainties in conclusions that depend on correctly classifying PVY strains and genetic variants.

The diversity of the PVY recombinants appears to be greater than previously suspected [27], and may be even greater still, taking into account the as yet unsampled variants that may exist in the field. The ten novel recombinant structures found in PVY isolates (see Fig. 2), along with the sub-division of the PVYN:O and PVY-NE11 strains, bring the total number of recombinant patterns known for PVY up to 36. These 36 recombinants are composed of “parental” regions from four non-recombinant strains PVYO, PVYEu-N, PVYNA-N, and PVYC, and sequences from the recombinant, PVY-NE11 (Fig. 2). Eight of the recombination patterns contain sequences from three different parents, PVYO/PVYEu-N/PVYNA-N or PVYO/PVYEu-N/PVY-NE11, and therefore represent products of multiple recombination events resulting in complex structures (Fig. 2). Understanding the driving forces behind this extensive PVY evolution through recombination will require biological tests, perhaps utilizing reverse genetics tools.

Here we show that, when typing a whole genome sequence to strain for a particular PVY isolate, it is essential to perform recombination analysis in order to be certain of the true structure of individual isolates. Novel recombination patterns can be missed and wrongly identified as being one of the more common structures if appropriate tools and approaches are not used. In this study, 28 PVY isolates were found to represent 25 “rare” (poorly sampled) recombinant structures or patterns (Fig. 2), ten of which were correctly identified only when subjected to combined phylogenetic and recombination analyses. These same isolates would have been mistyped if only subjected to strain typing using cruder, more typical methods like multiplex RT-PCR (e.g. see Elwan et al. [13] for isolate Egypt24 and Green et al. [24] for isolates AL100001 and NY110001), whole genome UPGMA analysis (as demonstrated in Fig. 1), or through sequencing only the capsid protein of PVY isolates. The new recombination patterns represented by isolates FrKV15 and GBVC_23 [3] may pose a unique challenge for PVY strain detection and differentiation, since both RT-PCR typing methodologies currently in use [10, 37] would identify these isolates as PVYO based on the presence of one O-specific PCR band.

Correct identification of recombinant structures of PVY isolates, and hence correct typing of an isolate to strain, is not merely a scientific exercise but may have long-lasting biological and economic consequences. For example, isolate N_Nysa, studied in this work, was used by a Polish breeding program for selecting potato cultivars resistant to different PVY strains, under the assumption that it represented a typical isolate of the PVYNTNa strain [69]. As was shown here, N_Nysa is actually an unusual recombinant (Fig. 2), with a different genome structure and, hence, likely has different biological characteristics in potato cultivars carrying resistance genes. Consequently, breeding data and conclusions obtained using this PVY isolate cannot be applied to the PVYNTNa strain resistance directly, and will likely need reevaluation.

Nevertheless, besides creating difficulties for correct typing, new recombinants may provide a useful window through which one can examine the selection pressures shaping PVY evolution. For example, recombinant breakpoints in the central section of the genome, between positions 2,600 and 5,850, are rarely seen; only three structures (isolates AL100001, Nicola, and NY110001) have recombinant breakpoints in this area (Fig. 2). Additionally, recombinants generally have O-type parents for this section (Fig. 2). This pattern suggests that there may be some selective advantage to PVY having an unbroken O-section in the middle of the genome. Recently, the P3 and CI cistrons of the PVY genome were demonstrated to control the switch between pepper and potato adaptation [66], and hence perhaps this O-type area of the virus genome is linked with the adaptation of most recombinants to potato. No recombinant breakpoints have been found in or near the region encoding the P3N-PIPO ORF either, consistent with the essential role that this protein plays in infection. Similarly, most recombinants have N-type parents for the 5’ third of the genome (approximately positions 500 to 2,390), and thus an unbroken N-section is likely beneficial there (Fig. 2). This preference for the N-type sequence between positions 500 and 2,400 in the virus genome may be related to the PVY genetic determinants involved in overcoming Ny and Nc resistance genes in potato, which were mapped in the HC-Pro cistron of the virus [43, 64, 65]. Conversely, the fragment of the genome between nt 5,850 and approximately nt 9,300 is easily the area with the most structural diversity between recombinants, and contains the majority of the novel recombinant segments (Fig. 2), suggesting that this is an area of the genome either with no particularly dominant ideal structure or parent, or that responds rapidly to new selection pressures.

Another important observation is that the non-recombinant PVYO-O5 strain has never been found in any recombinant structure, despite the propensity for PVY recombination involving its closest strain relative, PVYO. It is not yet understood why PVYO-O5 has not been found to recombine in nature, but the answer may hold important implications regarding the driving forces of PVY evolution through recombination.