Introduction

Genetic diversity and evolution play an important role in viral pathogenesis [1]. The rapid evolution of RNA viruses can be attributed to high rates of mutation and large population sizes as well as recombination between viral genomes [2,3,4]. While there have been numerous reports of recombination in positive-sense and segmented RNA viruses, natural recombination between negative-sense non-segmented RNA viruses has been reported infrequently and is still controversial [5,6,7,8,9,10,11,12,13,14,15,16,17].

The evolution of Paramyxoviridae family members has been a matter of recent discussion. One member of this family, Newcastle disease virus (NDV), or avian avulavirus 1, is the causative agent of Newcastle disease (ND), which can cause lethal infection in approximately 250 bird species and threatens the poultry industry, causing major economic losses worldwide [17,18,19]. NDV, also known as avian paramyxovirus type 1 (APMV-1), is a member of the genus Avulavirus. It has a 15-kb negative-sense non-segmented RNA genome containing six genes encoding the nucleoprotein (NP), phosphoprotein (P), matrix protein (M), fusion protein (F), hemagglutinin-neuraminidase (HN), and RNA-dependent RNA polymerase (L) [20, 21]. Infection with different APMV-1 isolates results in diverse clinical signs, varying from mild respiratory or enteric symptoms (avirulent viruses) to fatal acute respiratory and neurological signs (virulent isolates) [22,23,24]. Furthermore, several studies have demonstrated that genomic mutations can lead to increased pathogenicity in low-virulence isolates [25]. There is also evidence that natural recombination is a key factor in NDV evolution [26]. Recently, sporadic homologous recombination events have been reported, but in those studies, only partial F and HN sequences were examined [27] or only a few genomic sequences were analyzed [17, 28, 29]. Therefore, sufficient data on genomic changes due to recombination and their effect on virulence are still lacking.

Adaptive evolution occurs when there is a selection pressure on certain amino acid residues. In the case of positive selection, there is an increase in the number of genetic variants, which reflects evolutionary pressure, whereas negative selection results in genetic conservation [30,31,32].

The purpose of the present study was to identify probable homologous recombination events at different genomic positions in all of the NDV genome sequences available in the GenBank database as of 02.02.2018, in order to trace genetic changes and the evolution of NDV isolates in different geographical regions. Positive and negative selection were also observed in each gene of NDV isolates infecting different bird species.

Materials and methods

Data retrieval and sequence analysis

Full-length genome sequences were collected from a large number of isolates. To obtain an accurate recombination analysis, all 462 published genome sequences (>15 kb in length) of NDV isolates available as of 02.02.2018 in the GenBank database were retrieved. The collected viral sequences represented avirulent and virulent isolates as well as isolates from different avian hosts, geographic regions, and years of detection. Furthermore, the class I and class II classifications, comprising genotype 1 and genotypes I-XIX, respectively, were determined for all of the APMV-1 isolates based on the best-known classification methods [33,34,35,36].
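The length criterion described above can be illustrated with a minimal sketch. It assumes the retrieved records are held as FASTA text and that ">15 kb" means strictly more than 15,000 nt; the function names and the threshold constant are hypothetical and are not taken from the study's own workflow.

```python
from typing import Dict, Iterable

# Assumed reading of the ">15 kb" criterion: strictly more than 15,000 nt.
MIN_GENOME_LENGTH = 15_000


def parse_fasta(lines: Iterable[str]) -> Dict[str, str]:
    """Parse FASTA-formatted lines into {accession: sequence}."""
    records: Dict[str, str] = {}
    current = None
    for raw in lines:
        line = raw.strip()
        if not line:
            continue
        if line.startswith(">"):
            # Use the first whitespace-delimited token of the header as the ID.
            current = line[1:].split()[0]
            records[current] = ""
        elif current is not None:
            records[current] += line.upper()
    return records


def keep_full_length(records: Dict[str, str],
                     min_length: int = MIN_GENOME_LENGTH) -> Dict[str, str]:
    """Keep only records whose sequence is longer than min_length."""
    return {acc: seq for acc, seq in records.items() if len(seq) > min_length}


if __name__ == "__main__":
    # Hypothetical records: one near-complete genome and one short fragment.
    demo = [">GENOME_A example", "ACGT" * 3800, ">FRAGMENT_B", "ACGT" * 100]
    kept = keep_full_length(parse_fasta(demo))
    print(sorted(kept))
```

A filter of this kind only checks length; pathotype, host, region, and genotype annotations would still have to be taken from the record metadata.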

Detection of recombination events

Whole genome sequences (15 kb in length) of 462 NDV isolates were used to evaluate their divergence. The EditSeq tool from the DNASTAR package was used to separate each coding gene fragment from the remaining genome sequence of each isolate. Multiple alignments of all six genes were performed separately using the ClustalX v2.0 program [37]. Subsequently, we searched for possible recombination events using RDP4 software with a P-value cutoff of 0.001 and a window size of 200 nt. Only events that were confirmed by six or more of the following seven methods were retained: RDP [38], Gene Conversion (Geneconv) [39], maximum chi-square test (MaxChi) [40], maximum mismatch chi-square (Chimaera) [41], Bootscan [42], sister-scanning (SiScan) [43], and 3Seq [44]. The report statistics indicated a low probability of misidentification, and the major and minor parental isolates were also characterized. Table 1 lists the putative recombinant isolates and their recombination positions identified using RDP4 software as well as their divergence from their parental isolates. Using the information provided with the sequences, we also identified the geographic linkage between recombinant viruses and parental isolates, which is listed in Table 2. A schematic map showing recombination breakpoints in the recombinant isolates is shown in Figure 1.
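The acceptance rule described above (support from at least six of the seven methods at a P-value cutoff of 0.001) can be sketched as follows. This is an illustration only: it does not call RDP4, the `Event` structure is hypothetical, a missing P-value is assumed to mean that the method gave no signal, and the isolate names and P-values in the demo are invented.

```python
from dataclasses import dataclass
from typing import Dict, List

# The seven detection methods named in the text.
METHODS = ("RDP", "Geneconv", "MaxChi", "Chimaera", "Bootscan", "SiScan", "3Seq")
P_CUTOFF = 0.001   # P-value cutoff stated in the text
MIN_METHODS = 6    # an event must be supported by six or more methods


@dataclass
class Event:
    """Hypothetical container for one candidate recombination event."""
    recombinant: str
    start: int
    end: int
    p_values: Dict[str, float]  # method name -> P-value; absent = no signal


def supporting_methods(event: Event, cutoff: float = P_CUTOFF) -> List[str]:
    """Return the methods whose P-value falls below the cutoff."""
    return [m for m in METHODS if event.p_values.get(m, 1.0) < cutoff]


def is_accepted(event: Event, min_methods: int = MIN_METHODS) -> bool:
    """Apply the 'six or more methods' rule to one event."""
    return len(supporting_methods(event)) >= min_methods


if __name__ == "__main__":
    # Invented example values, not results from the study.
    strong = Event("ISOLATE_A", 1000, 2500, {m: 1e-6 for m in METHODS})
    weak = Event("ISOLATE_B", 300, 900, {"RDP": 1e-5, "MaxChi": 2e-4, "3Seq": 0.02})
    for ev in (strong, weak):
        print(ev.recombinant, len(supporting_methods(ev)), is_accepted(ev))
```

Requiring agreement among most methods trades sensitivity for specificity, since each method responds to a somewhat different signal in the alignment.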

Table 1 Putative recombinant isolates and positions of recombination events identified using RDP4 software
Table 2 Probable geographic linkage of the recombinant avian avulavirus 1 isolates and their parental viruses
Fig. 1 Schematic map of recombination breakpoints in avian avulavirus 1 isolates

Analysis of selection pressure and non-synonymous-to-synonymous codon ratios

To detect selection pressure, we first analyzed all six genes of all of the isolates under study to investigate their evolutionary adaptation from 1948 to 2017. The aligned sequences were analyzed using HyPhy v2.2.4 software [45], with a P-value of <0.05 indicating selection. The ratio of non-synonymous (dN) to synonymous (dS) substitution rates was examined to identify positions where positive or negative selection had occurred, i.e., dN/dS > 1 indicates positive selection and dN/dS < 1 indicates negative selection. Table 3 shows the positive and negative selection data for the NP, P, M, F, HN and L genes of all of the isolates under study.
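The decision rule described above can be written out as a short sketch. It assumes that per-site estimates of dN and dS and a P-value have already been obtained from a separate analysis; it does not run HyPhy, and the `SiteEstimate` structure, the function names, and the demo values are hypothetical. Comparing dN with dS directly is equivalent to comparing dN/dS with 1 when dS > 0 and avoids division by zero.

```python
from collections import Counter
from dataclasses import dataclass
from typing import Dict, List

ALPHA = 0.05  # significance threshold stated in the text


@dataclass
class SiteEstimate:
    """Hypothetical per-codon estimate from an upstream selection analysis."""
    codon: int
    dn: float       # non-synonymous substitution rate
    ds: float       # synonymous substitution rate
    p_value: float


def classify_site(site: SiteEstimate, alpha: float = ALPHA) -> str:
    """Label a codon as 'positive', 'negative', or 'neutral'."""
    if site.p_value >= alpha:
        return "neutral"       # no significant evidence of selection
    if site.dn > site.ds:
        return "positive"      # dN/dS > 1
    if site.dn < site.ds:
        return "negative"      # dN/dS < 1
    return "neutral"


def summarize(sites_by_gene: Dict[str, List[SiteEstimate]]) -> Dict[str, Counter]:
    """Count site classes per gene."""
    return {gene: Counter(classify_site(s) for s in sites)
            for gene, sites in sites_by_gene.items()}


if __name__ == "__main__":
    # Invented values for illustration only.
    demo = {
        "F": [SiteEstimate(10, 2.0, 0.5, 0.01), SiteEstimate(11, 0.1, 1.2, 0.001)],
        "M": [SiteEstimate(5, 0.2, 1.0, 0.03), SiteEstimate(6, 1.5, 1.0, 0.40)],
    }
    for gene, counts in summarize(demo).items():
        print(gene, dict(counts))
```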

Table 3 Positive and negative selection in each gene of 462 avian avulavirus 1 genome sequences isolated between 1948 and 2017

Results

Following data mining and preparation of all 462 NDV genome sequences available as of 02.02.2018 in the GenBank database, recombination analysis was performed, demonstrating the presence of multiple recombinant sequences (Table 1). Recombination events were considered likely if they were identified by six or more of the detection methods in the RDP4 program suite (RDP, Geneconv, MaxChi, Chimaera, Bootscan, SiScan, and 3Seq). The analysis resulted in the detection of 18 distinct recombinant NDV isolates. The isolates HM357251, KJ769262, KM056351, JX193083 and JX316216 were predominantly grouped with the genotype II cluster but showed variation in parts of the genome. The NDV isolate with the accession number HM357251 (isolated in India) contained breakpoints at nt 6487 and 8199 (HN), encompassing a region resembling genotype VII isolates from China, while the rest of the genome (i.e., nt 1-6486 and nt 8200-15186) was related to genotype II isolates. The Indian NDV isolate KM056351 had genotype XIII (Indonesia) genetic characteristics from nt 9334 to 9892 (L gene), 9975 to 10474 (L gene), and 11572 to 11854 (L gene). The Chinese isolate JX193083 resembled genotype I isolates from South Korea at nt 4609 to 6278 and nt 78 to 1574, corresponding to the F and NP genes, respectively. The other two Indian avian avulavirus 1 isolates were recombinant with genotype III isolates (China) at positions 1 to 3283 (NP and P) for KJ769262, and at positions 5412 to 7449 (F and HN) and 122 to 3074 (NP and P) for JX316216. The genome sequences of the JX193079 and JX193081 isolates from China were phylogenetically classified into the genotype I group, but both of them contained genotype-II-like sequences at positions 6264 to 6502 (F and HN junction) and 8244 to 8624 (partial L sequence). The dominant cluster for classification of the South Korean isolate JX401404 was also genotype I, with the region from nt 10,227 to nt 11,758 in the L gene resembling representatives of genotype II from China.
An Indian genotype IV isolate with the accession number HQ011508 was found to be closely related to genotype II isolates from China at nt 10,402-11,808 (L gene). Genotype VII sequences were found in isolates JX193076, JX532092, GQ338309, DQ659677, JX854452, and JX193074. Genotype VII isolate JX193076 from China showed a high degree of similarity to a genotype III isolate from China in the region from nt 8246 to nt 8426 (L). The JX532092 virus from Pakistan was found to be a chimera of genotypes VII and II, with a sequence resembling genotype II isolates from India from nt 8923 to nt 9583 (L). Poor signal quality prevented identification of the exact ending breakpoint. The isolate JX854452 from Pakistan was also identified as a putative recombinant of genotypes VII and XIII. Based on six or more detection methods, this isolate was closely related to genotype XIII isolates from Indonesia at nt 1009-nt 1654 (NP) and nt 2313-nt 3500 (P), and the rest of the genome sequence grouped with the genotype VII cluster. The Chinese isolates GQ338309 and DQ659677 also clustered with genotype VII but were similar to genotype III isolates from China at nt 13230 to 13574 (L) and to genotype VI isolates from the USA at nt 9966-10,549 (L), respectively. Three isolates of genotype XIII (KP089979 [India], JN682211 [Pakistan], and KF740478 [India]) showed evidence of recombination and were most similar to genotype II isolates at positions 3827 to 4355 (M), 8858 to 9610 (L) and 2030 to 2318 (P), respectively. Surprisingly, we identified one potential recombinant isolate that appears to have resulted from recombination between members of the class II (genotype I) and class I clusters. This Chinese NDV isolate, JX193078, contained sequences from nt 10,576 to nt 12,805 (L), nt 1840 to nt 2412 (P) and nt 88 to nt 506 (NP) that grouped with the class I cluster (South Korea), while the remainder of the genome appeared to belong to genotype I of class II.

Data from selection pressure analysis of the NP, P, M, F, HN and L gene sequences of 462 NDV isolates including various pathotypes and genotypes recovered from 1948 to 2017 showed that the L, F, and HN genes had the highest rates of positive selection while the M and NP genes were the most conserved through 69 years of avian avulavirus 1 circulation worldwide (Table 3).

Discussion

To survive harsh conditions and evade the host immune system, avian avulavirus 1 isolates have evolved by acquiring mutations and variations in their genomic characteristics over time that have sometimes led to changes in their virulence or survival [17, 25, 28]. Although the occurrence of natural recombination has been reported to be low for negative-sense RNA viruses [5, 46,47,48], recent studies focusing on reassortment and recombination of negative-sense RNA viruses have suggested a role of recombination in the evolution of these viruses [6, 49]. In the case of NDV, a few isolates have been identified as putative recombinants, demonstrating recombination events in different regions of the genome, including the NP, P, M, F, HN, and L genes [17, 28, 29, 50]. As discussed previously, vaccination programs are likely to favor the emergence of recombinants, which could result in changes in viral population dynamics [28]. This has led to the emergence of viruses with genomic features of both avirulent (genotype I or II) and virulent genotypes simultaneously. These variants might be better adapted to evading the host's immune response after vaccination [17, 28, 29, 50].

Here, we have analyzed all 462 published genomic sequences of NDV isolates available in the GenBank database on 02.02.2018, representing strains circulating over a 69-year period. Recombination events were identified in each viral gene (Fig. 1), which may indicate that there are no specific positions in the genome where recombination is favored. Furthermore, since putative recombinants were isolated from various species of birds, we propose that the recombination process may not be host-dependent. Evaluating the geographic linkage of recombination events, we observed that India, China and Pakistan had the highest prevalence of recombinant isolates and that circulation of viruses between India and China has resulted in the emergence of isolates with genetic characteristics of both virulent and avirulent genotypes simultaneously, i.e., II and III, II and IV, II and VII, and II and XIII. This may be due to the flight routes of migratory birds, discussed in our previous work [51], which result in the circulation of certain isolates and their adaptation to a new niche. Further evidence shows that the emergence of a hybrid NDV isolate of genotypes II and VII with Pakistani and Indian origins involved a peacock in Pakistan and might have been related to the peacock trade in these countries. Putative recombination occurring in Chinese and South Korean viruses appears to have involved only ducks (hybrids of genotypes I and II, and of genotype I and class I) and may be related to the live-bird market industry. Evidence of recombination between class I and class II viruses at the genomic level has been presented for the first time in this study, showing the potential for exchange of genetic fragments from different positions in the viral genome. In this case, the NP, P, and L genes of the putative recombinant show class I genetic characteristics, which is of potential importance for viral evolution.
Recombinant NDV isolates from Indonesia appear to be related to isolates from Pakistan and India. Specifically, recombination between genotype II and XIII strains resulted in the emergence of a novel isolate with parents from India and Indonesia. This variant might have been selected to evade the immune response induced by vaccination [28]. However, the identification of Pakistani and Indonesian parents of a hybrid of genotypes VII and XIII infecting a pheasant suggests that factors other than vaccination may lead to adaptation and emergence of recombinants, since both genotypes are associated with virulence. Furthermore, the emergence of isolates with evidence of recombination between genotypes III and VII (virulent) and genotypes VI and VII (virulent) is also consistent with the existence of an adaptation process rather than a biosecurity issue, as discussed above.

The existence of recombinant isolates from the USA with parents originating from China or Pakistan, as well as an Australian hybrid virus with China as its likely country of origin (Table 2), suggests the interaction of viruses from different continents, which may be a major biosecurity concern. However, it should be considered that some of the sequences in the GenBank database may have been influenced by mixed infections or contamination with PCR products, which could have resulted in errors in interpretation [52]. It is also likely that coinfections have occurred due to the widespread application of live vaccines or avirulent viruses in live-bird markets and wildlife [52]. Still, we suggest that the discussed recombinant isolates and the suggested reasons for their emergence should be considered.

To compare the putative recombinant isolates of NDV with previously reported viruses, we have studied their recombination breakpoints and genotypic characteristics. The genomic composition of the avian avulavirus 1 isolates with the accession numbers EU167540, GQ994434, GQ994433, KT355595, AY562985, AY562989, and AY225110 has been investigated recently, and these isolates were found to be recombinants with regions corresponding to avirulent genotypes [17, 28, 29, 50]. Our results are largely consistent with those studies, but in addition, we found evidence that recombination had also occurred between two virulent viruses, which could be of great importance. Moreover, the positions of the recombination breakpoints reported for the recent putative recombinant viruses and the ones in our study suggest that recombination can occur in any of the six genes. In fact, we suggest that the genomic composition of NDV strains has changed over time, e.g., through the occurrence of recombination between two virulent genotypes, which may be attributed to their adaptation.

In addition to recombination, intrinsic errors in the function of the viral polymerase result in the generation of variants, in which mutations are selected naturally [25]. The evolution of avian avulavirus 1 isolates is also influenced by selection pressures, which are categorized based on increased variation (positive) or the tendency to keep the sequences conserved over time (negative or purifying) [25]. It has been suggested that a reliable dataset for investigation of selection should consist of both recombinant and non-recombinant isolates [25], and therefore our dataset included all types of viral genomic characteristics. We analyzed the effects of selection pressure or "adaptive evolution" on the available NDV genomic sequences from isolates of different pathotypes, genotypes, and origins from 1948 to 2017. The most variability was found in the L, F and HN genes (Table 3), thus supporting the hypothesis that these genes have been subjected to stronger positive selection than the NP, P, and M genes, leading to the generation of different genetic variants and adaptation. The L gene, which is the largest gene in the genome, encodes the viral polymerase, which is responsible for viral transcription and replication [53]. Thus, we propose that the high rate of positive selection in the RNA-dependent RNA polymerase gene sequence results in a better chance of survival while generating novel adapted mutants. The fusion protein is responsible for virus-cell fusion and injection of the genome content into the cytoplasm [54, 55] and is important for virulence. Hence, we suggest that the higher rate of positive selection than negative selection in this gene may help viruses with a constant level of pathogenicity to survive under extreme conditions. The HN gene also showed relatively high rates of positive selection, but lower than those of the L and F genes.
The HN protein is important in the evolution of virulent avian avulavirus 1 isolates due to its major role in attachment to sialic acid receptors and in the release of virions from host cells [56, 57]. Previously, Miller et al. showed that the F gene had the highest rate of variability, while the NP sequence was the most conserved [25], whereas Chong et al. reported that the M and L genes showed no sign of positive selection [28]. These conclusions do not agree with our results. However, both of the previous studies used a more limited dataset and were carried out in 2009 and 2010. Since then, many more sequences have been submitted and analyzed. Our data show a lower rate of positive selection in the M gene, while the M and NP gene sequences show higher rates of negative selection, with a tendency for their sequences to be conserved and for poorly adapted variants to be eliminated. The purifying selection in the HN and L genes is also noteworthy.

Conclusions

We have investigated recombination events and selection pressures to gain a better understanding of avian avulavirus 1 evolution and adaptation over time. Based on analysis of 462 complete genome sequences, we have identified 18 novel putative recombinant isolates, some of which provide evidence of recombination between virulent genotypes (VII/XIII, III/VII, and VI/VII). We have also detected a recombinant virus sharing class I and class II genotypic characteristics. Furthermore, recombination of avirulent and virulent genotypes (I/II, II/XIII, II/VII, II/III and II/IV) was identified using recombination detection tools. Although it is important to consider the possibility of coinfection or contamination during the experimental process, which could have resulted in errors, we suggest that the evidence of recombination is nevertheless noteworthy and should be taken into consideration. Based on the geographical linkage of the recombinant isolates, we suggest that recombination events mostly occurred in Asia and the Middle East and were associated with vaccination strategy failures, evasion of the immune response, live-bird markets, and the bird trade. Still, these results are based only on the available data, and the discussed geographical linkage may extend to a wider range of countries. In addition, an adaptive evolution study of NDV isolates with various genomic features revealed that the L, F and HN genes have the highest rates of positive selection, which we propose may have led to the emergence of adapted isolates with unaltered pathogenicity but a higher chance of survival under harsh conditions. Moreover, the M and NP genes showed higher purifying selection rates than the other genes, suggesting the possible elimination of less fit variants, which maintains sequence conservation. Further complementary experimental analysis should be performed to confirm these results.