Introduction

Peste des petits ruminants virus (PPRV), a member of the genus Morbillivirus in the family Paramyxoviridae, is an important pathogen that causes a highly contagious disease in goats and sheep (Banyard et al. 2010; Gibbs et al. 1979). PPRV is an enveloped, single-stranded, negative-sense RNA virus (Chard et al. 2008). Four distinct lineages of PPRV (lineages I, II, III and IV) have been defined on the basis of geographical distribution, but only one serotype has been found (Banyard et al. 2010). The PPRV genome consists of six transcriptional units (N, P, M, F, H and L genes), which encode the nucleoprotein (N), phosphoprotein (P), matrix protein (M), fusion glycoprotein (F), haemagglutinin glycoprotein (H) and large polymerase (L) (Bailey et al. 2007). According to the schematic diagram of the PPRV virion structure (Banyard et al. 2010), the viral genome is tightly encapsidated by the N protein to constitute a helical nucleocapsid, as in other negative-stranded RNA viruses. The PPRV N protein has a molecular weight of about 58 kDa (Ismail et al. 1995). This protein is considered a major viral protein in Morbilliviruses (Diallo et al. 1987), and it plays a key role in PPRV replication (Parida et al. 2015). Interestingly, the PPRV N gene can be expressed in insect, bacterial and mammalian cells, and the product of this expression contributes to the formation of nucleocapsid-like particles (Mitra-Kaushik et al. 2001). Meanwhile, the conserved region of the N protein is an important requirement for self-assembly of measles virus (a member of Morbillivirus) (Bankamp et al. 1996). As with the N protein of measles virus, the conserved nucleotide sequence of the N protein is a key factor in the formation of virus-like particles of PPRV (Liu et al. 2014). These findings imply that the specific nucleotide usage pattern of the N gene may contribute to the formation of the secondary structure of the N protein.
Currently, the analysis of viral protein structure is an important route to understanding biological and immune functions and to improving vaccine design. Unfortunately, no structure has been reported for any viral protein of PPRV. This situation limits the ability to estimate how the genetic information in a gene sequence affects the formation of PPRV protein structure, and it leaves the relationship between gene sequence and protein secondary structure unaddressed in vaccine design. Although the genetic features of the PPRV N gene were significantly different from those of the other viral genes (H, F, L, P and M genes), the N genes of Morbillivirus showed highly conserved genetic features (Baron et al. 2016). Moreover, the secondary structure of the N protein of measles virus has been studied, and this precise structural information can benefit investigations into the structural features and biological functions of the N proteins of other members of Morbillivirus. The nucleotide usage pattern had obvious effects on the synonymous codon usage patterns of PPRV genes, and the synonymous codon usage pattern of the N gene was distinct from that of the other viral genes (Ma et al. 2017). The synonymous codon usage pattern has been identified as an important genetic factor affecting evolutionary dynamics, translation efficiency, tRNA abundance and protein secondary structure, among others (Bahir et al. 2009; Clarke and Clark 2008; dos Reis et al. 2003; Singh et al. 2017; Straube 2017; Welch et al. 2009; Weygand-Durasevic and Ibba 2010). Synonymous codon usage bias was found to profoundly influence the formation of secondary structures and thereby regulate the biological function of the native protein (Chaney and Clark 2015; Rodriguez et al. 2017). Some previous studies pointed out that tRNA abundances could reflect synonymous codon usage patterns in the cell environment (Hanson and Coller 2017; Mioduser et al. 2017; Nossmann et al. 2017; Rafels-Ybern et al. 2017). Subsequently, some reports showed that synonymous codon usage bias was clearly correlated with tRNA abundance in host cells (Kanduc 2017; Pouyet et al. 2017). In addition to the obvious effect of amino acid usage patterns on the formation of the secondary structure of the native protein, synonymous codon usage bias has the potential to regulate the correct folding units in protein secondary structure (Gu et al. 2004; Guisez et al. 1993; Marin 2008). Both theoretical in silico modeling analyses and experimental data indicated that nucleotide and codon usage at the mRNA level carries additional layers of information, beyond amino acid usage, that fine-tune in vivo protein folding during the co-translational process (Komar 2009). Analyses of synonymous codon usage bias in different folding units (α-helix, β-strand and coil) and of changes in translation speed at the transition boundary from one folding unit to another might give new insight into the formation of viral protein products in host cells. Given the effect of synonymous codon usage bias on protein secondary structure, studying this bias in the different folding units of the native protein should provide new insights into the biological functions of the PPRV N protein and into related vaccine design.

Materials and methods

The coding sequences of PPRV N protein

The 45 PPRV genomes were downloaded from the National Center for Biotechnology Information (NCBI) (http://www.ncbi.nlm.nih.gov/Genbank/), and the detailed information is listed in Table S1. To identify the unique genetic features reflected by the overall codon usage bias of the N gene relative to the other viral genes (H, F, L, M and P genes), an effective number of codons (ENC) analysis was applied to quantify the absolute codon usage bias of these PPRV genes (Wright 1990). Meanwhile, to clarify the adaptation of the PPRV N gene to the host, an improved CAI calculation method, implemented on the CAIcal server, was applied to analyze the extent of adaptation of the viral genes (N, H, F, L, M and P genes) to the host (Ovis aries) (Puigbo et al. 2008a; Sharp and Li 1987). The codon usage frequencies of O. aries (the natural host of PPRV) were selected as the reference, and the related data were obtained from the Codon Usage Database (Nakamura et al. 2000). To compare the overall codon usage bias and the adaptation to the host among the viral genes, the ENC values and the CAI values of the six genes were each tested by one-way ANOVA in SPSS 16.0 software for Windows.
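The CAI of Sharp and Li (1987) is the geometric mean of the relative adaptiveness of the codons in a gene, where the relative adaptiveness of a codon is its reference frequency divided by that of the most frequent synonymous codon. The following is a minimal sketch of that definition only; it is not the CAIcal implementation used in this study. The function names, the toy reference counts and the decision to skip codons with zero reference counts are assumptions made for illustration.

```python
import math
from collections import defaultdict

# Standard genetic code in DNA alphabet, built from the canonical TCAG ordering.
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    a + b + c: AMINO[16 * i + 4 * j + k]
    for i, a in enumerate(BASES)
    for j, b in enumerate(BASES)
    for k, c in enumerate(BASES)
}


def relative_adaptiveness(ref_counts):
    """Return w = count / max count within each synonymous family.

    ref_counts maps DNA codons to counts or frequencies in the host
    reference set. Stop codons and single-codon families (Met, Trp)
    are left out, as in the original CAI definition.
    """
    families = defaultdict(list)
    for codon, aa in CODON_TABLE.items():
        families[aa].append(codon)
    w = {}
    for aa, codons in families.items():
        if aa == "*" or len(codons) == 1:
            continue
        top = max(ref_counts.get(c, 0) for c in codons)
        if top == 0:
            continue  # family absent from the reference set
        for c in codons:
            w[c] = ref_counts.get(c, 0) / top
    return w


def cai(cds, w):
    """Geometric mean of w over the codons of a coding sequence.

    Assumption: codons without a weight (Met, Trp, stop) or with a
    zero weight are skipped rather than given a pseudo-count.
    """
    cds = cds.upper().replace("U", "T")
    logs = []
    for pos in range(0, len(cds) - len(cds) % 3, 3):
        weight = w.get(cds[pos:pos + 3])
        if weight:
            logs.append(math.log(weight))
    return math.exp(sum(logs) / len(logs)) if logs else float("nan")


# Toy reference counts for the alanine family only (hypothetical values).
toy_w = relative_adaptiveness({"GCT": 10, "GCC": 20, "GCA": 5, "GCG": 5})
toy_cai = cai("ATGGCTGCC", toy_w)
```

In practice the reference counts would come from the O. aries entry of the Codon Usage Database, and each of the six PPRV genes would be scored separately.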

Alignment of amino acid sequence of PPRV N protein and the model of N protein structure

To identify the genetic diversity of amino acid usage in the PPRV N protein, the N genes of the 45 PPRV strains were analyzed with MEGA 5.0 software. A phylogenetic tree of the N protein was constructed using the Neighbor-Joining method with gamma-distributed rate variation and a bootstrap of 1000 replicates.

Two established findings supported the modeling: (i) the N proteins show highly conserved genetic features in Morbillivirus, and (ii) the precise secondary structure of the N protein of measles virus has been obtained by X-ray diffraction (Guryanov et al. 2015; Gutsche et al. 2015). This information allowed a reliable secondary structure to be modeled for the PPRV N protein. In this study, the specific 3D structure of the product of each PPRV N gene was modeled with the SWISS-MODEL program, which is supported by the Swiss Institute of Bioinformatics and the Center for Molecular Life Sciences (https://swissmodel.expasy.org/). SWISS-MODEL is a fully automated protein structure homology-modeling server, accessible via the ExPASy web server or from the program DeepView (Swiss Pdb Viewer).

Modeling the change of tRNA abundance at the transition boundary from one specific folding unit to another one

Based on the 3D structure of the PPRV N protein derived from the SWISS-MODEL program, each folding unit can be located in the amino acid sequence of each N protein. Following previous reports on the effect of synonymous codon usage bias on the formation of specific folding units in a target protein (Ding et al. 2014; Zhou et al. 2013b), the correlation between synonymous codon usage bias and a specific folding unit in the N protein was calculated by the formulas below:

$$P=\frac{{{f_{obs}}}}{{{f_{\exp }}}}$$
$${f_{obs}}=\frac{{{N_{(i,\sec - k)}}}}{{{N_{(k)}}}}$$
$${f_{\exp }}=\frac{{\sum {{N_{(i,\sec - j)}}} }}{{{N_{total}}}}$$

where N(i,sec-k) represents the number of occurrences of a particular synonymous codon (i) coding for the corresponding amino acid in a specific secondary unit of the protein, and sec-k denotes the secondary unit of interest (α-helix, β-strand or coil); N(k) represents the number of the corresponding amino acid in the secondary unit of interest. In addition, \(\sum {{N_{(i,\sec - j)}}}\) represents the total number of amino acids in the specific secondary unit, where sec-j corresponds to one of the three types of secondary structure unit (α-helix, β-strand or coil), and Ntotal represents the total number of codons in the target protein. Furthermore, we defined that when the P value was higher than 1.5, the synonymous codon had a strong tendency to occur in the secondary unit of interest; conversely, when the P value was lower than 0.5, the synonymous codon had a strong tendency to avoid that unit (Zhou et al. 2013b).
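The P value can be sketched in a few lines of code. The sketch below assumes one common reading of the formulas: N(k) is taken as the total number of codons located in unit k, and the sum over sec-j is taken as the total number of occurrences of codon i across the three unit types. This reading is an assumption, since the definitions above can be interpreted in more than one way. The function name and the one-letter unit labels (H, E, C) are hypothetical.

```python
from collections import Counter


def codon_propensity(codons, units):
    """Return P = f_obs / f_exp for every observed (codon, unit) pair.

    codons: list of codon strings, one per residue of the protein.
    units:  list of secondary-unit labels of the same length,
            e.g. 'H' (alpha-helix), 'E' (beta-strand), 'C' (coil).
    Assumed reading of the formulas:
        f_obs = N(i, sec-k) / N(k)        share of codon i inside unit k
        f_exp = sum_j N(i, sec-j) / N_total   share of codon i overall
    """
    if len(codons) != len(units):
        raise ValueError("codons and units must have the same length")
    n_total = len(codons)
    per_unit = Counter(units)             # N(k)
    per_codon = Counter(codons)           # sum over j of N(i, sec-j)
    joint = Counter(zip(codons, units))   # N(i, sec-k)
    propensity = {}
    for (codon, unit), n_ik in joint.items():
        f_obs = n_ik / per_unit[unit]
        f_exp = per_codon[codon] / n_total
        propensity[(codon, unit)] = f_obs / f_exp
    return propensity


# Toy input: four residues, three in coil and one in helix.
toy = codon_propensity(["CAU", "CAU", "GCU", "GCU"], ["C", "C", "H", "C"])
favoured = {pair for pair, p in toy.items() if p > 1.5}
avoided = {pair for pair, p in toy.items() if p < 0.5}
```

Pairs that never occur together are not returned; under this reading their P value would be 0, which falls below the 0.5 threshold.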

Mapping changing trends of tRNA abundance at the transition boundary

To model the translation speed determined by tRNA abundance at the transition boundary from one folding unit to another, we used the relative synonymous codon usage (RSCU) values of O. aries (the natural host of PPRV) (Zhou et al. 2013a) as a proxy for the corresponding tRNA abundance, and applied the formula below:

$$R=\sqrt[n]{{\prod\limits_{1}^{n} {(Wij/Wj)} }}$$

where the R value represents the overall codon usage bias at a particular codon position in the gene of interest, Wij represents the RSCU value of the specific synonymous codon (i) for the corresponding amino acid (j), Wj represents the highest RSCU value among the synonymous codons for the same amino acid, and n represents the number of the given PPRV N coding sequences. A codon site with an R value close to 1.00 corresponds to high tRNA abundance, while one with an R value close to 0.00 corresponds to low tRNA abundance.
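As an illustration of the R value, the sketch below computes, for each codon position, the geometric mean of Wij/Wj over a set of aligned coding sequences. It assumes that the sequences are gap-free and of equal length, that n is the number of sequences, and that host RSCU values are supplied by the caller. The function names and the toy RSCU values are hypothetical.

```python
import math


def relative_weights(rscu, families):
    """Return Wij / Wj for every codon.

    rscu:     dict mapping codon -> host RSCU value.
    families: dict mapping amino acid -> list of its synonymous codons.
    Single-codon families (Met, Trp) get a weight of 1.0 automatically.
    """
    weights = {}
    for codons in families.values():
        top = max(rscu[c] for c in codons)
        for c in codons:
            weights[c] = rscu[c] / top
    return weights


def r_profile(sequences, weights):
    """Return one R value per codon position.

    R is the geometric mean of the relative weights observed at that
    position across all sequences (assumed aligned and gap-free).
    """
    length = len(sequences[0])
    if length % 3 or any(len(s) != length for s in sequences):
        raise ValueError("sequences must share a length divisible by 3")
    n = len(sequences)
    profile = []
    for pos in range(0, length, 3):
        ratios = [weights[s[pos:pos + 3]] for s in sequences]
        if min(ratios) <= 0:
            profile.append(0.0)  # a zero weight forces the product to zero
        else:
            # Sum of logs avoids underflow when n is large.
            profile.append(math.exp(sum(math.log(r) for r in ratios) / n))
    return profile


# Toy data: hypothetical RSCU values for Ala, plus Met fixed at 1.0.
toy_rscu = {"GCU": 1.2, "GCC": 1.6, "GCA": 0.8, "GCG": 0.4, "AUG": 1.0}
toy_families = {"Ala": ["GCU", "GCC", "GCA", "GCG"], "Met": ["AUG"]}
toy_weights = relative_weights(toy_rscu, toy_families)
toy_profile = r_profile(["AUGGCC", "AUGGCA"], toy_weights)
```

Working in log space is a design choice rather than part of the formula; it gives the same result as taking the n-th root of the product.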

Results

Unique genetic feature of codon and amino acid usages for N gene

To estimate the magnitude of the codon usage bias of the viral genes, the ENC values were calculated for all PPRV strains. Although the overall codon usage bias of the N gene was significantly different from those of the F, H, L and P genes, the ENC values of all six PPRV genes were higher than 50 (Fig. 1a). This result suggested that the relatively weak synonymous codon usage bias of PPRV genes was strongly influenced by mutation pressure from the viral nucleotide composition. Furthermore, when the adaptation of the viral genes to the host was quantified by CAI, the value for the N gene (0.66 ± 0.007) was the highest and differed significantly from those of the remaining genes (p value < 0.001) (Fig. 1b). Because CAI analysis can reflect the adaptation of viral genes to the host cellular machinery (Butt et al. 2016), this result implied that the PPRV N gene had better fitness to the host and a higher level of gene expression than the other viral genes.

Fig. 1 The codon usage bias for the six genes of PPRV analyzed by ENC and CAI methods. a ENC data of the six genes, b CAI data of the six genes. CAI analysis of viral genes in relation to its host (O. aries). The p value was calculated by One-way ANOVA method of SPSS software. When p value < 0.05, the two groups have significant differences. *** means p value < 0.001, ** means p value < 0.01, * means p value < 0.05

Turning to the amino acid usage of the N protein among different PPRV strains, the sequences were similar to each other (more than 95% similarity in amino acid sequence) (Fig. S1) and contained no insertions or deletions, suggesting that the PPRV N protein was highly conserved and suitable for 3D structure modeling. According to the 3D structure of the PPRV N protein modeled by the SWISS-MODEL program (Fig. S2), the PPRV N protein was rich in α-helix units (AA46–AA59, AA66–AA76, AA84–AA91, AA124–AA130, AA160–AA182, AA190–AA200, AA213–AA223, AA227–AA242, AA249–AA246, AA268–AA278, AA291–AA305, AA319–AA324, AA330–AA344, AA360–AA383, AA388–AA403), a structure that could facilitate binding of the viral RNA sequence to this protein.

Codon usage bias in different folding units of PPRV N protein

Based on the P values of the 61 codons involved in the formation of the different folding units in the N protein, we estimated the effect of the synonymous codon usage pattern on each folding unit. In detail, some synonymous codons had a propensity to occur in the α-helix unit, including AUU for Ile, GUC for Val, UCU/AGC for Ser, UAU for Tyr, UGU/UGC for Cys, CGG for Arg, AUG for Met and UGG for Trp. In contrast, some synonymous codons tended to avoid this folding unit, namely CCU/CCA/CCG for Pro, ACA/ACG for Thr, GAU for Asp, CGC for Arg and GGC for Gly (Table 1). As for the β-strand unit, the codons that tended to be selected were CUU for Leu, AUU/AUA for Ile, GUU/GUC/GUA for Val, AGU for Ser, CCC/CCG for Pro, ACC for Thr, AAU for Asn, CGC for Arg, GGG for Gly and UGG for Trp, while other codons had a strong tendency to avoid this unit, including UUC for Phe, CUC/CUA for Leu, GUG for Val, UCU/UCC/UCA/UCG/AGC for Ser, CCU/CCA for Pro, ACU/ACA/ACG for Thr, GCU/GCC/GCG for Ala, UAU/UAC for Tyr, CAU/CAC for His, CAG for Gln, GAU for Asp, GAG for Glu, UGU/UGC for Cys and CGU/CGA/CGG/AGG for Arg (Table 1). Turning to the coil unit, CCU/CCA for Pro, ACA/ACG for Thr, CAU/CAC for His, GAU for Asp and GGC for Gly were strongly selected by this unit, whereas CUU for Leu, AUU for Ile, GUA/GUC for Val, UGU/UGC for Cys, CGC for Arg and UGG for Trp were only slightly selected by it (Table 1). Comparison of the P values of the same codon across folding units showed that AUU for Ile, GUC for Val and UGG for Trp had an obvious tendency toward both the α-helix and the β-strand, but these codons did not favor all three units at the same time (Table 1).
Of note, the two synonymous codons coding for Cys (UGU/UGC) had a significant relation with the formation of the α-helix; two codons (CGC for Arg and UGG for Trp) had an obvious tendency toward the β-strand unit; and the two synonymous codons coding for His (CAU/CAC) had a strong tendency toward the coil unit. These results indicated that the selection of some synonymous codons was, to some degree, specifically involved in the formation of the different secondary units of the PPRV N protein.

Table 1 The effect of codon usage pattern on the different secondary structure units in PPRV N protein

The fluctuations of translation speed caused by tRNA abundances from one type of folding unit to another

According to the 3D structure of the N protein (Fig. S2), there were four types of transition boundary from one folding unit to another: from coil to β-strand, from β-strand to coil, from coil to α-helix and from α-helix to coil. To model the changing trends of translation speed caused by varying tRNA abundance, we used the usage values of the 59 synonymous codons, together with a usage value of 1.0 for the two single codons (AUG for Met and UGG for Trp), to reflect the tRNA abundances corresponding to the 61 codons, and thereby modeled the changing trends of translation speed at the four types of transition boundary (Fig. 2). Compared with the trends at the two transition boundaries between the coil unit and the α-helix unit (Fig. 2c, d), significant fluctuations occurred at the two transition boundaries between the coil unit and the β-strand unit (Fig. 2a, b). These results suggested that synonymous codon usage might be affected by tRNA abundance, and that the different translation speeds caused by varying tRNA abundance likely took part in the formation of the specific downstream folding unit.
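The boundary analysis can be illustrated with a short sketch that locates each transition in a per-residue secondary-structure string and collects the three positions on either side of it, matching the window shown on the x-axes of Fig. 2. The function names, the one-letter unit labels and the rule that a window is kept only when each flank lies entirely within a single unit are assumptions; averaging R values over all windows of the same type is likewise an assumed way of summarizing the data, not necessarily the procedure used here.

```python
def boundary_windows(units, flank=3):
    """Return (from_unit, to_unit, indices) for each usable transition.

    units: string of per-residue labels, e.g. 'CCCCEEEECCC'
           ('C' coil, 'E' beta-strand, 'H' alpha-helix).
    flank: number of positions kept on each side of the boundary.
    Assumption: a window is kept only if both flanks fit inside the
    sequence and each flank consists of a single unit type.
    """
    windows = []
    for i in range(1, len(units)):
        before, after = units[i - 1], units[i]
        if before == after:
            continue
        left, right = i - flank, i + flank
        if left < 0 or right > len(units):
            continue
        if set(units[left:i]) == {before} and set(units[i:right]) == {after}:
            windows.append((before, after, list(range(left, right))))
    return windows


def mean_profiles(units, r_values, flank=3):
    """Average per-position R values over all windows of each type."""
    sums, counts = {}, {}
    for before, after, indices in boundary_windows(units, flank):
        key = (before, after)
        acc = sums.setdefault(key, [0.0] * (2 * flank))
        for slot, idx in enumerate(indices):
            acc[slot] += r_values[idx]
        counts[key] = counts.get(key, 0) + 1
    return {key: [v / counts[key] for v in acc] for key, acc in sums.items()}


# Toy structure string with one coil-to-strand and one strand-to-coil boundary.
toy_units = "CCCCEEEECCC"
toy_windows = boundary_windows(toy_units)
toy_means = mean_profiles(toy_units, [1.0] * len(toy_units))
```

This sketch would also report direct helix-to-strand transitions if any existed; only the four coil-related boundary types are discussed in the text.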

Fig. 2 The changes of R value from one type of folding unit to another in PPRV N protein. a Transition boundary from coil to β-strand, x-axis showing the three positions at the C-terminus of coil and the three positions at the N-terminus of β-strand, b transition boundary from β-strand to coil, x-axis showing the three positions at the C-terminus of β-strand and the three positions at the N-terminus of coil, c transition boundary from coil to α-helix, x-axis showing the three positions at the C-terminus of coil and the three positions at the N-terminus of α-helix, d transition boundary from α-helix to coil, x-axis showing the three positions at the C-terminus of α-helix and the three positions at the N-terminus of coil

Discussion

In this study, we evaluated the effects of synonymous codon usage on the formation of the different folding units of the PPRV N protein. There was a significant correlation between the selection of some codons and a specific folding unit (α-helix, β-strand or coil). Although PPRV is considered an RNA virus with a high mutation rate (Bao et al. 2017), the selection of synonymous codons for PPRV gene products was influenced by translation selection deriving from the host (Ma et al. 2017). The overall codon usage bias of the N gene, as reflected by the ENC data, indicated a balance between mutation pressure and natural selection acting on nucleotide changes at the third codon position. To further confirm the influence of natural selection, we applied CAI analysis, which is frequently used as a measure of gene expression and of the adaptation of genes to their hosts, and which reflects the influence of natural selection (Hickey et al. 1995). The higher the CAI value, the stronger the influence of translation selection on the synonymous codon usage bias of the given gene (Carbone et al. 2003; Puigbo et al. 2008b). The better fitness of the N gene to the host in terms of codon usage suggested that translation selection acted more strongly on the codon usage of the N gene, maintaining its normal biological function during adaptation to the host. The conserved amino acid sequence of the N protein among different PPRV strains allowed the modeled secondary structure to represent the relationship between codon usage patterns and specific folding units with good precision. As for translation of viral mRNA, fine-tuning of translation kinetics has obvious effects on the folding of the viral product (Aragones et al. 2008, 2010). In this study, the synonymous codon usage pattern of the PPRV N gene was well adapted to the natural host, and a simple statistical analysis indicated that the synonymous codon usage pattern was strongly related to the formation of folding units.
For example, a strong selection of CCU/CCA for Pro existed in the coil unit rather than in the α-helix and β-strand units, and a biased usage of CGC for Arg existed in the β-strand rather than in the other two folding units of the N protein (Table 1). Our findings were in agreement with previous reports (Gu et al. 2003; Gupta et al. 2000; Ma et al. 2013; Xie and Ding 1998; Zhou et al. 2013b), suggesting that selection of specific synonymous codons was maintained on the PPRV N gene for the sake of the proper protein structure. Thus, the strategy of improving the expression of an exogenous gene in host cells solely by replacing rare codons with optimal codons from the same synonymous codon family needs to be reconsidered. Although there is still no consensus on the basic principles governing the self-folding of proteins into complex secondary structures in vivo or in vitro, translation kinetics might control the co-translational folding pathway, and translational pausing at rare codons might create a time gap that delays the emergence of defined portions of the nascent polypeptide from the ribosome (Komar 2009; Kramer et al. 2009). Our study showed that the transition boundaries between the coil unit and the β-strand unit in the PPRV N protein had obvious tendencies to change translation speed (Fig. 2). Translation is physically and functionally coupled to the folding and targeting of newly synthesized proteins. Synonymous codon usage and the variation of isoaccepting tRNAs exert a powerful selective force on translation fidelity, and stretches of codons pairing with minor tRNAs form putative sites that locally attenuate translation (Zhang and Ignatova 2009). The adjustment of translation speed that depends on the presence of rare codons in the coding sequence may affect the folding efficiency of newly synthesized proteins (Tsai et al. 2008).

In conclusion, the relationship established here between synonymous codon selection and specific folding units has a biological basis. Synonymous codons that pair with tRNAs of different abundance can adjust the translation speed of the nascent peptide to shape the proper secondary structure of the target protein.