Introduction

SARS-CoV-2 is a betacoronavirus that emerged in 2019 and spread worldwide, leading to an ongoing global pandemic [1, 2]. By May 27, 2022, the number of human infections had reached 527 million, resulting in 6.28 million deaths (Johns Hopkins University statistics; https://coronavirus.jhu.edu/map.html). SARS-CoV-2 has a single-stranded positive-sense RNA genome with 29,903 nucleotides that contains 14 open reading frames (ORFs) encoding 29 proteins [3]. One of these viral proteins, the spike glycoprotein (S protein), is an envelope protein that is responsible for the recognition of the host receptor ACE2 and fusion of the viral and host cell membranes [4].

SARS-CoV-2 has a broad host spectrum, probably because ACE2, the receptor the virus uses for cell entry, is relatively conserved among mammals [5]. Many animal species, including cats, dogs, tigers, lions, ferrets, minks, and white-tailed deer, have been found to be susceptible to SARS-CoV-2 infection [6,7,8,9,10,11]. The earliest reported animal infections during the COVID-19 pandemic were in pet cats and dogs [6, 7, 12]. Later, in a report of SARS-CoV-2 infection in tigers, lions, and their human keepers at a New York zoo, epidemiological and genomic data indicated human-to-animal transmission [8]. Other nondomestic animals, including snow leopards and gorillas, have also tested positive for SARS-CoV-2 after showing signs of illness [13, 14]. Notably, a study from the Netherlands reported transmission of SARS-CoV-2 from humans to minks and back from minks to humans on mink farms [15]. Eighty-eight minks and 18 staff members on 16 mink farms were confirmed to be infected with SARS-CoV-2 by high-throughput sequencing. The adaptation of SARS-CoV-2 to the mink receptor and viral evolution in the mink host therefore merit further study.

Codon usage bias refers to the unequal use of synonymous codons, that is, different codons that encode the same amino acid, during protein translation. For a given virus, the codon usage pattern may change as the virus adapts to a different host cell [16]. Codon usage bias in some viruses is driven mainly by natural selection [17], whereas in others, such as Ebola virus, mutational bias is the major force shaping codon usage [18]. Viruses differ significantly in their host specificity, and analysis of viral genome structure and composition can contribute to our understanding of virus evolution and adaptation to their hosts [15, 19].

For SARS-CoV-2, exploration of its codon usage pattern in different hosts, especially that of the gene coding for the spike protein, will help to reveal adaptations related to cross-species transmission. Surveillance of nucleotide substitutions and selection in SARS-CoV-2 genomes is important for studying viral evolution and tracking viral transmission. In particular, studying the S gene is important for predicting the efficacy of vaccines and adjusting vaccine design in a timely manner. To investigate the natural selection of SARS-CoV-2 that might play a role in virus evolution, fitness, and transmission, we analyzed the base composition and codon usage of viral genomes isolated from human and animal hosts.

Materials and methods

SARS-CoV-2 sequences and data collection

A total of 258 SARS-CoV-2 genome sequences from humans, cats, dogs, tigers, lions, hamsters, minks, and white-tailed deer were used for genetic analysis. Information about these isolates is given in Supplementary Table S1. All genome sequences were obtained from the GISAID database (https://www.gisaid.org/). Isolate Wuhan/WIV04 was used as the reference strain.

Evolutionary analysis

All 258 SARS-CoV-2 genome sequences were used for phylogenetic analysis. The evolutionary history was inferred using the maximum-likelihood method and the Tamura-Nei model [20]. The tree with the highest log likelihood (−1,129,076.99) is shown (Fig. 1B). The percentage of trees in which the associated taxa clustered together (bootstrap support) is shown next to the branches. Initial trees for the heuristic search were obtained by applying the neighbor-joining method to a matrix of pairwise distances estimated using the maximum composite likelihood (MCL) approach. A discrete gamma distribution with five categories (+G, parameter = 1.7203) was used to model evolutionary rate differences among sites. The tree is drawn to scale, with branch lengths measured in the number of substitutions per site. There were 29,899 positions in the final dataset. Evolutionary analyses were performed using MEGA X [21].
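
To illustrate the substitution model named above, the sketch below computes the closed-form Tamura-Nei (1993) distance between two aligned sequences, which separates A-G transitions, C-T transitions, and transversions and weights them by base composition. It is not the MEGA X implementation: it omits the gamma rate correction and the maximum-likelihood tree search used here, the function name `tamura_nei_distance` is hypothetical, and it assumes all four bases occur in the compared sequences.

```python
import math


def tamura_nei_distance(seq1: str, seq2: str) -> float:
    """Tamura-Nei (1993) distance between two aligned nucleotide sequences.

    Sites with gaps or ambiguous bases in either sequence are skipped, and
    U is read as T. Assumes all four bases occur in the pooled sequences.
    """
    s1 = seq1.upper().replace("U", "T")
    s2 = seq2.upper().replace("U", "T")
    pairs = [(a, b) for a, b in zip(s1, s2) if a in "ACGT" and b in "ACGT"]
    n = len(pairs)
    if n == 0:
        raise ValueError("no comparable sites")

    # Base frequencies pooled over both sequences.
    pi = {base: 0.0 for base in "ACGT"}
    for a, b in pairs:
        pi[a] += 1 / (2 * n)
        pi[b] += 1 / (2 * n)
    pi_r = pi["A"] + pi["G"]  # purine frequency
    pi_y = pi["C"] + pi["T"]  # pyrimidine frequency

    purines = {"A", "G"}
    p1 = sum(a != b and {a, b} == purines for a, b in pairs) / n      # A<->G transitions
    p2 = sum(a != b and {a, b} == {"C", "T"} for a, b in pairs) / n   # C<->T transitions
    q = sum((a in purines) != (b in purines) for a, b in pairs) / n   # transversions

    ag = pi["A"] * pi["G"]
    tc = pi["T"] * pi["C"]
    term_r = -(2 * ag / pi_r) * math.log(1 - pi_r * p1 / (2 * ag) - q / (2 * pi_r))
    term_y = -(2 * tc / pi_y) * math.log(1 - pi_y * p2 / (2 * tc) - q / (2 * pi_y))
    coef_v = pi_r * pi_y - ag * pi_y / pi_r - tc * pi_r / pi_y
    term_v = -2 * coef_v * math.log(1 - q / (2 * pi_r * pi_y))
    return term_r + term_y + term_v
```

At low divergence the distance stays close to the raw proportion of differing sites; the log terms correct for multiple hits as divergence grows.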

Identification of mutations

Sequences were aligned using MEGA X, and single-nucleotide polymorphisms (SNPs) were identified with the SNiPlay pipeline by uploading the alignment as a FASTA file (https://sniplay.southgreen.fr/cgi-bin/analysis_v3.cgi) [22]. Complete sequences, including the coding regions and the 5' and 3' UTRs, were used for the analysis.
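
The core of SNP identification from a reference-anchored alignment can be sketched as follows. This is a hypothetical stand-in for illustration, not the SNiPlay pipeline: `read_fasta` and `call_snps` are names introduced here, the toy alignment is invented, and the sketch reports only single-base mismatches in reference coordinates, ignoring insertions relative to the reference, deletions, and ambiguous bases.

```python
def read_fasta(text: str) -> dict:
    """Parse FASTA text into {record name: sequence}."""
    records, name, chunks = {}, None, []
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            if name is not None:
                records[name] = "".join(chunks)
            name, chunks = line[1:].strip(), []
        else:
            chunks.append(line)
    if name is not None:
        records[name] = "".join(chunks)
    return records


def call_snps(reference: str, sample: str, gap: str = "-") -> list:
    """Return (reference position, ref base, alt base) for each single-base mismatch.

    Positions are 1-based coordinates on the ungapped reference.
    """
    if len(reference) != len(sample):
        raise ValueError("sequences must come from the same alignment")
    snps, ref_pos = [], 0
    for ref_base, alt_base in zip(reference.upper(), sample.upper()):
        if ref_base == gap:
            continue  # insertion relative to the reference: no reference coordinate
        ref_pos += 1
        if alt_base not in "ACGTU":
            continue  # deletion or ambiguous base (e.g. N): not called as a SNP
        if alt_base.replace("U", "T") != ref_base.replace("U", "T"):
            snps.append((ref_pos, ref_base, alt_base))
    return snps


# Hypothetical toy alignment, not real data.
alignment = read_fasta(">WIV04\nACGT-ACGT\n>mink_1\nACTTAAC-T\n")
print(call_snps(alignment["WIV04"], alignment["mink_1"]))  # [(3, 'G', 'T')]
```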

Calculation of nonsynonymous and synonymous substitution rates

The number of nonsynonymous substitutions per nonsynonymous site (dN) and the number of synonymous substitutions per synonymous site (dS) were calculated for each codon site using the Nei–Gojobori method with the Jukes–Cantor correction in MEGA X. The Datamonkey adaptive evolution server (http://www.datamonkey.org) was used to identify sites at which selection had acted on only a subset of branches. The mixed-effects model of evolution (MEME) and fixed-effects likelihood (FEL) methods were used to estimate site-wise nonsynonymous and synonymous substitution rates.
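
The Nei–Gojobori counting scheme with the Jukes–Cantor correction can be sketched for a pair of aligned, in-frame coding sequences as below. This is an illustrative reimplementation, not MEGA X's code and not the MEME/FEL likelihood methods. The function names are hypothetical, the standard genetic code is assumed, codons containing gaps, ambiguous bases, or stop codons are skipped, and point mutations to stop codons are counted as nonsynonymous sites, which is only one of several conventions.

```python
import itertools
import math

BASES = "TCAG"
CODONS = [a + b + c for a in BASES for b in BASES for c in BASES]
# Standard genetic code in TCAG order; '*' marks stop codons.
CODE = dict(zip(CODONS, "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"))


def synonymous_sites(codon: str) -> float:
    """Fraction of point mutations at each codon position that are silent, summed."""
    sites = 0.0
    for i in range(3):
        for base in BASES:
            if base != codon[i] and CODE[codon[:i] + base + codon[i + 1:]] == CODE[codon]:
                sites += 1 / 3
    return sites


def codon_differences(c1: str, c2: str) -> tuple:
    """(synonymous, nonsynonymous) differences averaged over mutational pathways."""
    positions = [i for i in range(3) if c1[i] != c2[i]]
    if not positions:
        return 0.0, 0.0
    syn_total = non_total = 0.0
    n_paths = 0
    for order in itertools.permutations(positions):
        current, syn, non = c1, 0, 0
        for i in order:
            nxt = current[:i] + c2[i] + current[i + 1:]
            if CODE[nxt] == "*":
                break  # pathways through a stop codon are discarded
            if CODE[nxt] == CODE[current]:
                syn += 1
            else:
                non += 1
            current = nxt
        else:  # runs only if the pathway was not discarded
            syn_total += syn
            non_total += non
            n_paths += 1
    if n_paths == 0:
        return 0.0, float(len(positions))  # fallback: count every change as nonsynonymous
    return syn_total / n_paths, non_total / n_paths


def jukes_cantor(p: float) -> float:
    """Jukes-Cantor correction for multiple hits."""
    if p >= 0.75:
        raise ValueError("p-distance too large for the Jukes-Cantor correction")
    return -0.75 * math.log(1 - 4 * p / 3)


def ng86_dn_ds(seq1: str, seq2: str) -> tuple:
    """Return (dN, dS) for two aligned, in-frame coding sequences."""
    s1 = seq1.upper().replace("U", "T")
    s2 = seq2.upper().replace("U", "T")
    syn_sites = non_sites = syn_diffs = non_diffs = 0.0
    for k in range(0, min(len(s1), len(s2)) - 2, 3):
        c1, c2 = s1[k:k + 3], s2[k:k + 3]
        if c1 not in CODE or c2 not in CODE or "*" in (CODE[c1], CODE[c2]):
            continue  # skip gapped/ambiguous codons and stop codons
        s_avg = (synonymous_sites(c1) + synonymous_sites(c2)) / 2
        syn_sites += s_avg
        non_sites += 3 - s_avg
        sd, nd = codon_differences(c1, c2)
        syn_diffs += sd
        non_diffs += nd
    return jukes_cantor(non_diffs / non_sites), jukes_cantor(syn_diffs / syn_sites)
```

The sign of dN − dS computed from these two values is what distinguishes positive from purifying selection in the Results.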

Codon usage analysis

The codon adaptation index (CAI) of coding sequences was calculated using an R script [23]. CAI analysis of the coding sequences from different hosts was performed using DAMBE 5.0 and CAIcal [24, 25]. The codon usage patterns of the different hosts were obtained from the Codon Usage Database (http://www.kazusa.or.jp/codon/), and relative synonymous codon usage (RSCU) values were determined using MEGA X. The accession numbers for the mink, cat, dog, tiger, and lion SARS-CoV-2 isolates are MT457401.1, MT747438.1, MT215193.1, MT704316.1, and MT704312.1, respectively. Bat-CoV refers to isolate RaTG13 (GenBank no. MN996532.2), obtained from a bat, and Pangolin-CoV refers to a pangolin coronavirus (GenBank no. QLR06867.1).
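
RSCU and CAI follow standard definitions: RSCU is a codon's observed count divided by the count expected if all synonymous codons were used equally, and CAI is the geometric mean, over a gene's codons, of each codon's relative adaptiveness w (its count relative to the most-used synonymous codon in a reference set, such as host highly expressed genes). The sketch below is not the R script, DAMBE, or CAIcal code. The function names are hypothetical, the standard genetic code is assumed, Met, Trp, and stop codons are excluded from CAI, and a pseudocount of 0.5 (an assumption) keeps unused reference codons from producing w = 0.

```python
import math
from collections import Counter

BASES = "TCAG"
CODONS = [a + b + c for a in BASES for b in BASES for c in BASES]
# Standard genetic code in TCAG order; '*' marks stop codons.
CODE = dict(zip(CODONS, "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"))
SYNONYMS = {}
for codon, amino_acid in CODE.items():
    SYNONYMS.setdefault(amino_acid, []).append(codon)


def codon_counts(seq: str) -> Counter:
    """Count in-frame codons, ignoring codons with gaps or ambiguous bases."""
    seq = seq.upper().replace("U", "T")
    return Counter(seq[i:i + 3] for i in range(0, len(seq) - 2, 3) if seq[i:i + 3] in CODE)


def rscu(counts: Counter) -> dict:
    """RSCU = observed count / mean count of the synonymous codon family."""
    values = {}
    for codons in SYNONYMS.values():
        total = sum(counts[c] for c in codons)
        if total == 0:
            continue  # amino acid absent: RSCU undefined
        expected = total / len(codons)
        for c in codons:
            values[c] = counts[c] / expected
    return values


def relative_adaptiveness(reference: Counter, pseudocount: float = 0.5) -> dict:
    """w = codon count / count of the most used synonymous codon in the reference set."""
    w = {}
    for amino_acid, codons in SYNONYMS.items():
        if amino_acid == "*" or len(codons) == 1:
            continue  # stop codons, Met and Trp carry no information on codon choice
        adjusted = {c: reference[c] + pseudocount for c in codons}
        best = max(adjusted.values())
        for c in codons:
            w[c] = adjusted[c] / best
    return w


def cai(seq: str, w: dict) -> float:
    """Geometric mean of w over all scorable codons in the sequence."""
    log_sum, length = 0.0, 0
    for codon, n in codon_counts(seq).items():
        if codon in w:
            log_sum += n * math.log(w[codon])
            length += n
    if length == 0:
        raise ValueError("no scorable codons")
    return math.exp(log_sum / length)
```

In use, `relative_adaptiveness` would be built from host codon counts (for example, a host table from the Codon Usage Database), and `cai` would then score each viral coding sequence against that host.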

Neutral evolution analysis

Neutrality plot analysis was performed to assess the relative influence of natural selection and mutation pressure on codon usage bias [26]. GC12 values (the mean GC content at the first and second codon positions) were plotted against GC3 values (the GC content at the third codon position), and a regression line was fitted. The slope of the regression line reflects the relative contributions of mutation pressure and natural selection pressure to evolution. Points lying close to the regression line indicate no significant differences among the three codon positions. Points located above the regression line indicate that mutation pressure dominates evolution, whereas for points below the line, natural selection plays a more important role.
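
A minimal sketch of the neutrality-plot computation: per-sequence GC12 and GC3 values followed by an ordinary least-squares fit of GC12 on GC3. The function names are hypothetical, and for brevity the sketch uses all complete codons rather than restricting GC3 to synonymous third positions, as some neutrality analyses do.

```python
def gc12_gc3(seq: str) -> tuple:
    """Return (GC12, GC3) for an in-frame coding sequence."""
    seq = seq.upper().replace("U", "T")
    codons = [seq[i:i + 3] for i in range(0, len(seq) - 2, 3)]
    codons = [c for c in codons if set(c) <= set("ACGT")]  # drop gapped/ambiguous codons
    if not codons:
        raise ValueError("no complete codons")
    gc = [sum(c[pos] in "GC" for c in codons) / len(codons) for pos in range(3)]
    return (gc[0] + gc[1]) / 2, gc[2]


def neutrality_regression(sequences: list) -> tuple:
    """Least-squares fit GC12 = slope * GC3 + intercept across sequences."""
    points = [gc12_gc3(s) for s in sequences]
    xs = [gc3 for _, gc3 in points]
    ys = [gc12 for gc12, _ in points]
    n = len(points)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    if sxx == 0:
        raise ValueError("GC3 does not vary across sequences")
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x
```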

Spike protein structures and their docking with ACE2

The crystal structure of the receptor-binding domain (RBD) of the SARS-CoV-2 S protein in complex with human ACE2 (PDB ID: 6M0J) was used for structural analysis. Structures of ACE2 and the mink SARS-CoV-2 S protein were predicted using the SWISS-MODEL server (https://swissmodel.expasy.org/). The stability of RBD-ACE2 complexes was calculated using mCSM-PPI2 (http://biosig.unimelb.edu.au/mcsm_ppi2/). The predicted protein structures were visualized and compared pairwise using PyMOL software.

Molecular dynamics simulation

The binding free energy (E) and the minimized annealing energy were predicted by molecular dynamics (MD) simulation using YASARA [27]. We performed three iterations for the energy minimization of each complex structure of wild-type or mutant mink SARS-CoV-2 RBDs bound to human or mink ACE2. The relative binding energy (∆E) is reported as the mean and standard deviation values from three replicates.

Selection coefficient index

The selection coefficient index (S) of all SARS-CoV-2 codons was calculated using the FMutSel0 model in the program CODEML (PAML package) [28]. The fitness parameter of the most common residue at each position was fixed at 0, while the other fitness parameters were limited to the range of −20 < F < 20.
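
In mutation-selection codon models such as the FMutSel family in CODEML, the scaled selection coefficient for a change from state I to state J is the fitness difference S = F_J − F_I, and, assuming the standard form of these models, the substitution rate is multiplied by the fixation factor S/(1 − e^(−S)) relative to a neutral change. The sketch below only illustrates that factor and the fitness constraints described above. It is not CODEML code, the function names are hypothetical, and the full rate matrix also involves mutation-bias parameters, codon frequencies, and ω, which are omitted.

```python
import math


def selection_coefficients(fitness: dict, bound: float = 20.0) -> dict:
    """Return S_IJ = F_J - F_I for every ordered pair of states.

    Mirrors the constraints in the text: the reference (most common) state has
    F = 0 and every fitness lies strictly between -bound and bound.
    """
    if 0.0 not in fitness.values():
        raise ValueError("the reference state must have fitness 0")
    for state, f in fitness.items():
        if not -bound < f < bound:
            raise ValueError(f"fitness of {state} outside (-{bound}, {bound})")
    return {(i, j): fitness[j] - fitness[i] for i in fitness for j in fitness if i != j}


def fixation_factor(f_from: float, f_to: float) -> float:
    """Relative fixation rate S / (1 - exp(-S)) with S = F_to - F_from; 1 when S = 0."""
    s = f_to - f_from
    if abs(s) < 1e-12:
        return 1.0  # neutral limit
    return s / -math.expm1(-s)  # -expm1(-s) == 1 - exp(-s), computed accurately
```

A change to a fitter state (S > 0) is fixed faster than a neutral one, and the ratio `fixation_factor(a, b) / fixation_factor(b, a)` equals e^S, so forward and reverse rates are balanced by fitness.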

Plasmid construction and cell culture

Gene fragments encoding the RBD and N-terminal domain (NTD) of the SARS-CoV-2 S protein (NCBI ID no. MN996528.1) were inserted into the plasmid pCAGGS (donated by Prof. Jianguo Wu). Gene fragments encoding the mink RBD and mink ACE2 (NCBI ID no. MW269526.1) were synthesized by GenScript Inc. Mutations were introduced into the RBD gene using a Mut Express II Fast Mutagenesis Kit (Vazyme, C214). All wild-type and mutant gene fragments were cloned into the pCAGGS vector with a 6× histidine tag using the EcoRI and XhoI restriction sites. A gene fragment encoding mink ACE2 was inserted into the vector pcDNA3.1-eGFP to generate pcDNA3.1-mACE2-eGFP. The ligation products were transformed into competent E. coli Top10 cells, and recombinant plasmids were verified by DNA sequencing. Plasmid pcDNA3.1-hACE2-eGFP, expressing hACE2 fused to eGFP, was purchased from Fubio Ltd. (MC_0101086). HEK293T and BHK-21 cells were cultured in high-glucose DMEM at 37 °C in a 5% CO2 atmosphere.

Protein expression and purification

Recombinant pCAGGS plasmids for expression of the RBD or NTD variants were introduced by transfection into CHO cells in 245-mm dishes using Lipofectamine 6000 according to the manufacturer's recommendations. The supernatants were collected 5 days after transfection and centrifuged. The soluble proteins were purified using HisPur™ Ni-NTA Resin (Thermo Scientific, 88221) and eluted in 20 mM Tris-HCl (pH 8.0) buffer containing 150 mM NaCl.

Flow cytometry

BHK-21 cells were separately transfected with pcDNA3.1-hACE2-eGFP and pcDNA3.1-mACE2-eGFP using Lipofectamine 6000 (Beyotime, C0526) according to the manufacturer's instructions. Cells expressing hACE2-eGFP or mACE2-eGFP were collected 24 h post-transfection, resuspended in phosphate-buffered saline (PBS), and incubated with purified His-tagged RBDs at a final concentration of 30 μg/mL at 37 °C for 30 min. The NTD was used as a negative control. After two washes with PBS, the cells were incubated with anti-His/APC antibodies (1:5000) and analyzed using a Beckman CytoFLEX flow cytometer. The data were analyzed using FlowJo V10 software.

Statistical analysis and mapping

Statistical analysis was performed using ANOVA followed by Tukey's post-hoc test (Fig. 2C and F) or Student's t-test (Fig. 3B). Differences were considered significant at P < 0.05. SPSS 20.0 software was used for regression curve fitting. ***, P < 0.001; **, P < 0.01; *, P < 0.05; ns, not significant. Figures were prepared using GraphPad Prism 5.0.
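
For reference, the one-way ANOVA F statistic that precedes Tukey's post-hoc test is the ratio of the between-group to the within-group mean square. The sketch below is a hypothetical helper, not the SPSS or GraphPad Prism procedure, and the example values are invented. It returns only F and its degrees of freedom; the P value and Tukey's comparisons would come from a statistics package (SciPy, for example, provides `scipy.stats.f_oneway` and, from version 1.8, `scipy.stats.tukey_hsd`).

```python
def one_way_anova_f(groups: list) -> tuple:
    """Return (F, df_between, df_within) for a one-way ANOVA on lists of values."""
    if len(groups) < 2 or any(len(g) == 0 for g in groups):
        raise ValueError("need at least two non-empty groups")
    values = [x for g in groups for x in g]
    grand_mean = sum(values) / len(values)
    means = [sum(g) / len(g) for g in groups]
    # Between-group sum of squares: spread of group means around the grand mean.
    ss_between = sum(len(g) * (m - grand_mean) ** 2 for g, m in zip(groups, means))
    # Within-group sum of squares: spread of values around their own group mean.
    ss_within = sum((x - m) ** 2 for g, m in zip(groups, means) for x in g)
    df_between = len(groups) - 1
    df_within = len(values) - len(groups)
    return (ss_between / df_between) / (ss_within / df_within), df_between, df_within


# Hypothetical triplicate measurements for three groups.
f_stat, df1, df2 = one_way_anova_f([[1.0, 2.0, 3.0], [4.0, 5.0, 6.0], [7.0, 8.0, 9.0]])
```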

Results

Sequence and analysis of SARS-CoV-2 isolated from animals

As of June 20, 2021, more than 2.53 million SARS-CoV-2 genome sequences had been uploaded to the GISAID database. It is important to study the mutation rates and selective pressures acting on the SARS-CoV-2 genome as the pandemic spreads. In addition to humans, SARS-CoV-2 infects other animals (Fig. 1A) and evolves in these hosts. A phylogenetic tree was constructed based on animal-derived whole-genome consensus sequences, using the SARS-CoV-2 human isolate WIV04 as the outgroup (Fig. 1B). Most SARS-CoV-2 sequences isolated from animals in the same geographic region clustered together, so that a single clade could contain sequences from different animal species (see Supplementary Figure S1 for details). Previous investigations found evidence of human-to-animal spillover and further transmission of SARS-CoV-2 in minks and white-tailed deer [12, 15], and the available sequences suggesting mink-to-human transmission came mainly from regions where mink infection had been reported (the Netherlands and Denmark).

Fig. 1

Composition and substitution analysis of SARS-CoV-2 isolated from animals. (A) Animals reported to be infected with SARS-CoV-2 through a defined human-to-animal transmission route. (B) Phylogenetic tree constructed by the maximum-likelihood method with the Tamura-Nei model in MEGA X with 500 bootstrap replicates. Red dots represent sequences from humans associated with infected animals, and blue dots represent sequences from infected white-tailed deer. (C) Proportions of uracil, guanine, adenine, and cytidine substitutions (nonsynonymous) in SARS-CoV-2 isolated from humans or animals. (D) Nucleotide changes observed in the SARS-CoV-2 genomes. All transitions and transversions are listed in Supplementary Table S2. (E) Synonymous and nonsynonymous substitutions in mink SARS-CoV-2. (F) Relative proportion of each nucleotide substitution in the mink SARS-CoV-2 genome.

Compared with the reference sequence WIV04, sequences in the mink SARS-CoV-2 cluster carried more substitutions than human isolates transmitted from infected animals (Supplementary Table S2). Cytidine substitutions accounted for nearly 50% of all substitutions in mink SARS-CoV-2, whereas replacement of nucleotides with cytidine accounted for only 30% of substitutions in isolates from animals other than minks and deer (Fig. 1C). Adenine substitutions were threefold more frequent in SARS-CoV-2 from other animals than in mink SARS-CoV-2 (Fig. 1C). To track how substitutions arose in the mink SARS-CoV-2 genome, we analyzed all mutations in the mink SARS-CoV-2 genome using the WIV04 genome as the reference sequence. As shown in Fig. 1D and F, cytidine-to-uracil transitions accounted for more than 40% of substitutions, eightfold more than uracil-to-cytidine transitions. Notably, guanine and adenine substitutions were more than threefold more frequent among nonsynonymous than among synonymous mutations (Fig. 1E).
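
The substitution spectra summarized above (the proportion of each ref>alt change and fold differences such as C-to-U versus U-to-C) are tallies over a list of called substitutions. A minimal sketch, assuming substitutions are given as hypothetical (position, ref, alt) tuples relative to WIV04, with T rewritten as U for RNA; the function names and example data are illustrative only.

```python
from collections import Counter

TRANSITIONS = {("A", "G"), ("G", "A"), ("C", "U"), ("U", "C")}


def substitution_spectrum(snps: list) -> tuple:
    """Return ({'C>U': fraction, ...}, fraction of transitions) for (pos, ref, alt) tuples."""
    changes = [(ref.upper().replace("T", "U"), alt.upper().replace("T", "U"))
               for _, ref, alt in snps]
    if not changes:
        raise ValueError("no substitutions")
    counts = Counter(f"{ref}>{alt}" for ref, alt in changes)
    fractions = {change: n / len(changes) for change, n in counts.items()}
    transition_share = sum(change in TRANSITIONS for change in changes) / len(changes)
    return fractions, transition_share


def fold_difference(fractions: dict, numerator: str, denominator: str) -> float:
    """Ratio of two substitution types, e.g. 'C>U' over 'U>C'."""
    return fractions.get(numerator, 0.0) / fractions[denominator]


# Hypothetical substitutions: (position, reference base, alternative base).
example = [(1, "C", "T"), (2, "C", "U"), (3, "U", "C"), (4, "G", "A"), (5, "G", "U")]
fractions, transition_share = substitution_spectrum(example)
```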

Mutational spectra of spike proteins in human and animal samples

Comparison of S gene sequences from humans and animals revealed considerable variation and allowed the identification of several highly variable residues. C-to-U substitutions were scattered throughout the SARS-CoV-2 genome and accounted for 24.06% of the substitutions in the S gene across all epidemic strains analyzed in this study (Fig. 2A). The dN-dS values indicated that natural selection had acted on most of the mutated sites in the S gene (dN-dS > 0 indicates positive selection, and dN-dS < 0 indicates purifying selection). The data also suggested that positions 222, 262, 439, and 614 were under strong positive selection, whereas positions 294, 413, 1018, and 1100 had undergone purifying selection (Fig. 2B).

Fig. 2

The mutation spectrum of the spike protein and selection pressure analysis. (A) Substitutions in the animal-derived SARS-CoV-2 S gene. (B) dN-dS values for S gene sequences; dN = nonsynonymous changes per nonsynonymous site, dS = synonymous changes per synonymous site. (C) CAI values for SARS-CoV-2 sequences from humans and animals. "Bat-CoV" refers to RaTG13 from a bat, "Pangolin-CoV" refers to pangolin coronavirus (GenBank no. QLR06867.1), "other animal-SARS2" refers to SARS-CoV-2 isolated from the indicated animals, and "the first SARS-CoV-2" refers to the virus isolated from a human host. (D) Effective number of codons (ENC) plot analysis of coronaviruses from animals and humans. (E) Neutrality plot analysis of coronaviruses from animals and humans. (F) CAI values of the S sequences of SARS-CoV-2 from humans and animals.

CAI analysis was used to quantify codon usage similarity between coding sequences, based on a reference set of highly expressed genes [29]. To evaluate the adaptation of SARS-CoV-2 to different hosts, we calculated the average CAI for each viral genome (Fig. 2C). Interestingly, the CAI value for the bat coronavirus RaTG13 (Bat-CoV) was significantly higher than that for human SARS-CoV-2, whereas dog-SARS2 had a much lower CAI value than human SARS-CoV-2 (Fig. 2C). Effective number of codons (ENC) plot analysis was used to investigate the factors influencing SARS-CoV-2 codon usage bias. Most SARS-CoV-2 isolates had ENC values slightly below the standard curve (R² = 0.7562, P = 0.0023; Fig. 2D), indicating that codon usage bias was affected by both mutation pressure and natural selection, consistent with previous reports [30, 31]. Neutrality plot analysis showed a linear regression coefficient of −0.1156 for the SARS-CoV-2 sequences (Fig. 2E), indicating that the first and second codon positions were mainly affected by mutation pressure and the third positions mainly by selection pressure. Codon usage analysis showed that the S genes of the pangolin coronavirus and of SARS-CoV-2 from cats, dogs, tigers, lions, and deer had significantly lower CAI values than those from humans (Fig. 2F), implying that the viruses were less well adapted to these animal hosts. All nucleotide substitutions in the codons of the S genes are listed in Supplementary Table S3.
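
The ENC plot compares each sequence's observed effective number of codons with the value expected under mutation pressure alone, which for Wright's standard curve is ENC = 2 + s + 29/(s² + (1 − s)²), where s is GC3s. A sketch of Wright's calculation follows. The function names are hypothetical, the standard genetic code is assumed, and, unlike some implementations, it raises an error rather than interpolating when a degeneracy class (such as the single threefold-degenerate amino acid, Ile) has no usable data.

```python
from collections import Counter

BASES = "TCAG"
CODONS = [a + b + c for a in BASES for b in BASES for c in BASES]
# Standard genetic code in TCAG order; '*' marks stop codons.
CODE = dict(zip(CODONS, "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"))
SYNONYMS = {}
for codon, amino_acid in CODE.items():
    if amino_acid != "*":
        SYNONYMS.setdefault(amino_acid, []).append(codon)
# Number of amino acids per degeneracy class in the standard code (Met/Trp excluded).
CLASS_SIZES = {2: 9, 3: 1, 4: 5, 6: 3}


def expected_enc(gc3s: float) -> float:
    """Wright's standard curve: ENC expected from GC3s under mutation pressure alone."""
    return 2 + gc3s + 29 / (gc3s ** 2 + (1 - gc3s) ** 2)


def gc3s(seq: str) -> float:
    """GC content at synonymous third positions (Met, Trp and stop codons excluded)."""
    seq = seq.upper().replace("U", "T")
    codons = [seq[i:i + 3] for i in range(0, len(seq) - 2, 3)]
    codons = [c for c in codons if CODE.get(c) not in (None, "*", "M", "W")]
    if not codons:
        raise ValueError("no synonymous codons")
    return sum(c[2] in "GC" for c in codons) / len(codons)


def observed_enc(seq: str) -> float:
    """Wright's effective number of codons from codon homozygosities F."""
    seq = seq.upper().replace("U", "T")
    counts = Counter(seq[i:i + 3] for i in range(0, len(seq) - 2, 3))
    homozygosity = {}  # degeneracy class -> list of F values
    for codons in SYNONYMS.values():
        n = sum(counts[c] for c in codons)
        if len(codons) == 1 or n < 2:
            continue  # Met/Trp add a fixed 2; F needs at least two codons
        sum_p2 = sum((counts[c] / n) ** 2 for c in codons)
        homozygosity.setdefault(len(codons), []).append((n * sum_p2 - 1) / (n - 1))
    enc = 2.0
    for k, size in CLASS_SIZES.items():
        values = homozygosity.get(k)
        if not values or sum(values) <= 0:
            raise ValueError(f"no usable data for the {k}-fold degenerate class")
        enc += size / (sum(values) / len(values))
    return min(enc, 61.0)  # ENC is conventionally capped at 61
```

A sequence whose `observed_enc(seq)` falls below `expected_enc(gc3s(seq))` lies under the standard curve, as most isolates did here.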

Stronger binding to the mink receptor by SARS-CoV-2 spike mutants

Structural analysis of the complex between the SARS-CoV-2 spike protein and ACE2, together with sequence alignment of the spike proteins, showed that the spike protein interacts with amino acids 34, 41, 79, 82, and 354 of human and mink ACE2 (Supplementary Fig. S2A and B), which form electrostatic and hydrophobic interactions with spike residues Asn439, Tyr453, Phe486, and Asn501. An alignment of the ACE2 amino acid sequences of humans, minks, ferrets, tigers, cats, and dogs showed that the critical changes H34Y, L79H, and G354R are present in mink and ferret ACE2 (Supplementary Fig. S2B). Virus mutation is another important factor to consider in transmission between animals and humans. Sequence alignment of the spike RBDs revealed that most receptor-binding residues were conserved among these virus strains (Supplementary Fig. S2C). However, residue 453 of the mink SARS-CoV-2 spike protein, which was predicted to interact with ACE2 residue 34 (Fig. 3A), had changed from Y to F. MD simulation suggested that the F453-Y34 interaction in minks was stronger than the Y453-H34 interaction in humans (Fig. 3B). The N501T substitution in the spike protein of mink isolates also resulted in stronger binding to the mink receptor (Fig. 3B). Overall, the changes in receptor-binding affinity were mainly due to mutations that altered hydrophobic interactions (Y453-H34, F486-M82) and a polar H-bond (N501-Y41) at the binding interface (Fig. 3C). Flow cytometry showed that the mutations Y453F, F486L, and N501T all enhanced binding of the spike RBD to mink ACE2, whereas only F486L and the double mutant Y453F&F486L showed increased binding to the human receptor (Fig. 3D). These data suggest that key point mutations in the spike protein contribute to the adaptation of SARS-CoV-2 to minks.

Fig. 3

Analysis of binding of the SARS-CoV-2 spike protein with human and mink receptors. (A) Comparison of the spike structure of mink SARS-CoV-2 with that of reference strain WIV04. The changed residues within mink SARS-CoV-2 are highlighted as yellow balls. (B) The free energy of binding of the wild-type RBD or mutants from mink SARS-CoV-2 to the human and mink receptor. (C) Amino acid changes involved in the stability of the RBD-ACE2 complex. Detailed structures for Y453, F486, and N501 are arranged from top to bottom. The green lines represent hydrophobic interactions, the orange lines indicate polar H-bonds, the red lines represent hydrogen bonds, and the pink-purple lines represent clashes. (D) Measurement of the binding of RBD mutants to human ACE2 (hACE2, upper panel) and mink ACE2 (mACE2, lower panel) by FACS. His-tagged wild-type RBD, RBD mutants, and NTD were incubated with cells expressing eGFP-fused ACE2. NTD was used as a negative control.

Codon usage and fitness analysis of SARS-CoV-2-encoded proteins

Amino acid substitutions within the SARS-CoV-2 spike receptor-binding motif (RBM) may contribute to host adaptation and cross-species transmission. N439K, S477N, and N501Y were the most common variations in the RBM of the SARS-CoV-2 spike protein (Fig. 4A). Spike residue 439 does not bind directly to ACE2 but stabilizes the 498–505 loop [32], and the N439K substitution was not found in the animal CoVs analyzed in this study (Supplementary Fig. S2C). A previous computational analysis combined with entropy analysis of the spike suggested that the S477N mutant may be less stable than the wild-type protein [33]. For the mutations N501T (AAU>ACU) in minks and N501Y (AAU>UAU) in humans, the nonsynonymous substitutions were at the second and first codon positions, respectively; interestingly, our substitution statistics showed that both A>C and A>U substitutions occurred at very low frequency in the SARS-CoV-2 genome (Fig. 1F and 2A). Comparison of synonymous codon usage in mink SARS-CoV-2 and SARS-CoV revealed similar codon usage patterns for these two virus strains (Fig. 4B), consistent with their adaptability in ferrets, which serve as hosts for both viruses. The selective coefficient indices in Fig. 4C show that the relative fitness of SARS-CoV-2 codons differs, with CGA and CGG having higher fitness scores than the others. The codons for the N501 mutants, ACU for T and UAU for Y, had low fitness scores, indicating that receptor-binding ability rather than codon usage bias was the main determinant of selection for these two mutations. In addition, a comparison of the 12 ORFs in the SARS-CoV-2 genome yielded diverse RSCU values for the different ORFs, with UCA-encoded Ser in ORF7b and AGG-encoded Arg in ORF6 notably preferred (Fig. 4D and Supplementary Table S4).

Fig. 4

Codon usage and fitness analysis of the SARS-CoV-2-encoded proteins. (A) Probability of mutations in the receptor-binding motif (RBM). The data were analyzed using MEGA X software. The frequency was calculated using the Datamonkey server, and the figure was produced using WebLogo (https://weblogo.berkeley.edu/logo.cgi). (B) The synonymous codon usage of SARS-CoV-2. The figure was produced using WebLogo. The mink SARS-CoV-2 sequence (GenBank no. MT396266) was compared with that of SARS-CoV strain Tor2 (GenBank no. NC_004718.3). (C) Selective coefficient index for SARS-CoV-2 codons. The codons for the N501 mutants are shown in blue and the codons with the highest fitness are highlighted in red. (D) Analysis of relative synonymous codon usage in SARS-CoV-2-encoded proteins.

Discussion

Tracking viral variants transmitted among animal hosts, or transmitted to animals through human contact, could help us understand the evolution and host adaptation of SARS-CoV-2. A correlation between codon optimization of viral genomes and host adaptation has been observed in some viruses, such as rotaviruses, cyprinid herpesvirus 3, and Marburg virus [26, 34, 35]. Mink was the first extensively farmed species affected by COVID-19, and epidemiological investigations have suggested that mustelids, including minks and ferrets, are more susceptible to SARS-CoV-2 than other animals [36]. Mink-to-mink and mink-to-human transmission of SARS-CoV-2 has been reported on several mink farms in the Netherlands, Denmark, the USA, and Spain [37,38,39]. Some mutations have accumulated in the viral genomes during transmission of the virus between humans and minks. However, it is challenging to determine whether a given mutation arose before or after spillover to minks, because it is difficult to distinguish sequences circulating within the human and mink populations from those involved in cross-species transmission.

In this study, we compared the sequences of mink-derived SARS-CoV-2 isolates with those from humans who had contact with infected minks, using human SARS-CoV-2 as the reference sequence. The substitutions are listed in Supplementary Table S2. Some nonsynonymous substitutions, such as C1380U in ORF1a and C14408U in ORF1b, were found both in mink isolates and in isolates from humans who had contact with infected minks. Other mutations were unique to the mink isolates: for example, the nonsynonymous substitutions G520A, G1599U, and A2280C in ORF1a and U23018C in the S gene (F486L) were present only in mink isolates and not in the human isolates, consistent with a previous study [38]. Given their high numbers and frequencies, these mink-specific substitutions are more likely to have arisen after spillover from humans to minks and to have accumulated as the virus spread through the large animal populations on mink farms.

Other animals, including tigers, lions, and white-tailed deer, have also been found to be susceptible to SARS-CoV-2 infection [12, 40]. Adaptive mutations have also been reported in isolates from deer, and it appears that minimal adaptation is required for onward transmission in minks and deer following human-to-animal spillover [41]. Here, we compared the sequences from deer with those from humans who had contact with infected minks and identified several new nonsynonymous substitutions in the deer sequences, e.g., G8083A and C10319U in ORF1a and G25563U and G25907U in ORF3a. No unique mutations were found in the S gene of the deer sequences.

Given that the usage of synonymous codons in viral genomes varies with the host [26], adaptation to different hosts may affect the codon usage bias of a virus. A previous study revealed that synonymous mutations in SARS-CoV-2 may boost the adaptation of the virus to human codon usage and positively affect viral evolution [42]. Here, we compared the codon bias of SARS-CoV-2 in minks with that of SARS-CoV in ferrets; both viruses can infect ferrets and minks. Threonine (T) and tyrosine (Y) codons showed comparable biases in SARS-CoV-2 and SARS-CoV (Fig. 4B). Notably, the N501Y mutation occurred exclusively in humans, whereas N501T was found more frequently in minks, suggesting that amino acid 501 played a key role in virus adaptation to humans and minks that was unrelated to codon bias. Our data also show that Y453F enhanced the binding of the spike to mink ACE2 (Fig. 3B and D), and we speculate that this mutation was beneficial for virus adaptation and transmission in minks and thus resulted in the extensive spread of SARS-CoV-2 among minks.

The WebLogo diagram in Fig. 4B shows that SARS coronaviruses preferentially use U- or A-ending codons, as has been shown previously [43]. The biased use of A and U at the third codon position could lead to an imbalance in the tRNA pool and a decrease in host protein synthesis in infected cells. C-to-U substitution is the most frequent mutation in most of the reported SARS-CoV-2 sequences isolated from animals, and C-to-U substitutions were 8-fold more frequent than U-to-C substitutions in the mink sequences (Fig. 1C and F). This ratio is higher than the 3.5-fold difference previously reported in minks [15], suggesting that SARS-CoV-2 has continued to evolve in minks over time. Our CAI data suggest that the virus may be better adapted, in terms of codon optimization, to bats than to dogs and other animals, which is consistent with a previous finding that humans are more favored hosts than dogs for SARS-CoV-2 adaptation [30]. A similar substitution level was found across the whole genome of mink SARS-CoV-2 when compared to human SARS-CoV-2, indicating that the virus could still be adapting to its new host species. Further studies are needed to investigate whether CAI values increase as the virus has more time to adapt.

The spike protein is essential for both host adaptability and virus infection. We found that three nonsynonymous changes in the RBM (Y453F, F486L, and N501T) emerged independently in the mink lineage. These residues are directly involved in receptor binding at the interface of the S-ACE2 complex and are thus important for adaptation of the virus to new hosts. In addition to mutations in the S protein, amino acid substitutions in ORF1a, ORF9b, E, N, and M have been identified as significantly associated with increased fitness [44], suggesting that the spike RBD is not the only region that affects SARS-CoV-2 fitness.

Other mutations within the RBM should also be monitored during surveillance of viral transmission. Some residues in this region have been reported to be involved in evasion of host humoral immunity. For example, the B.1.351 (Beta) variant carrying the E484K and N501Y mutations, the B.1.617.2 (Delta) variant carrying the L452R mutation, and the BA.2 (Omicron) variant containing the E484A mutation show an improved ability to enter cells and can reinfect recovered or vaccinated individuals [45]. Currently available vaccines are less protective against the Delta and Omicron variants but can still prevent severe disease [46]. The fitness of these SARS-CoV-2 mutants should be evaluated further to optimize vaccine design and block virus transmission.

In conclusion, we have shown in this study that spike proteins with the mutations Y453F and N501T in mink SARS-CoV-2 recognize the mink receptor better than the human receptor. Our findings may provide a new perspective for the understanding of natural selection and viral fitness of SARS-CoV-2.