Introduction

Marine mammals, which transitioned from terrestrial to aquatic environments, are fascinating subjects in the study of evolutionary adaptations to aquatic life. They are classified into several groups under three orders: Cetartiodactyla (cetaceans), Carnivora (pinnipeds and marine fissipeds), and Afrotheria (sirenians) (Berta et al. 2006). Cetaceans, pinnipeds, and sirenians evolved independently from different ancestors, yet they share several physiological and anatomical features that facilitated their adaptation to aquatic life. Marine mammals also have unique adaptations in their hemostatic systems that enable them to survive in aquatic environments. For instance, the loss of certain blood coagulation factor genes (F12 and KLKB1) (Yim et al. 2014; Huelsmann et al. 2019) and an increase in activated partial thromboplastin time (aPTT) have been reported in cetaceans (Robinson et al. 1969). This delay in clotting is thought to prevent thrombosis and the hypertension that can follow when a clot blocks the pulmonary artery. In contrast, the manatee possesses all blood coagulation factors (Barratclough et al. 2016) and tends toward hypercoagulation compared with terrestrial mammals (Gerlach et al. 2015).

Positive selection analysis of genes can be performed to study the evolutionary changes resulting from adaptation. Several studies on the selective pressure of genes in marine mammals (Foote et al. 2015; Zhou et al. 2015; Chikina et al. 2016; Yuan et al. 2021) have used terrestrial mammals as background groups and performed positive selection studies on marine mammal groups. For instance, Chikina et al. (2016) found that marine-accelerated genes are involved in functions related to marine adaptation, including muscle physiology, lipid metabolism, and sensory systems. Yuan et al. (2021) reported analytical results for cetaceans, pinnipeds, and sirenians using 16 terrestrial mammalian species from diverse taxonomic orders. This study identified positively selected genes (PSGs) involved in thermoregulation and vision in marine mammals.

To determine the genetic distinctions between marine mammals and the terrestrial mammals within their respective orders, we conducted a comprehensive analysis of positive selection across all available genomes and genes in the three marine mammal groups. We also aimed to identify characteristics that evolved in common across the diverse groups of marine mammals. A total of 56 high-quality genomes from three orders were used. We performed a branch-site model analysis, which revealed 460, 614, and 359 PSGs in cetaceans, pinnipeds, and sirenians, respectively. Through gene enrichment, network, and comparative analyses of the predicted PSGs, common genes and biological functions were identified as candidate molecular bases for aquatic adaptations. Our primary objective was to broaden our understanding of aquatic adaptations in marine mammals and to identify the genes involved.

Materials and methods

Data collection

From the NCBI RefSeq database, we retrieved 24 genomes of Cetartiodactyla (including 12 cetaceans), 24 genomes of Carnivora (including 8 pinnipeds), and 6 genomes of Afrotheria (including 1 sirenian). In addition, two genomes of Xenarthra, the closest relatives of Afrotheria, were added to the Afrotheria dataset to improve statistical reliability. Based on data from the gene ortholog database (ftp://ftp.ncbi.nlm.nih.gov/gene/DATA/gene_orthologs.gz), we extracted 17,898, 17,883, and 17,759 orthologous gene groups for Cetartiodactyla, Carnivora, and Afrotheria + Xenarthra, respectively, from the 56 genomes (Supplementary Fig. S1, Table S1).
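The grouping of orthologs described above could be sketched roughly as follows. This is an illustrative sketch only, not the pipeline used in the study. It assumes the tab-separated column layout of the NCBI gene_orthologs file (#tax_id, GeneID, relationship, Other_tax_id, Other_GeneID) and a relationship value of "Ortholog"; the function name and the idea of keying groups on the reference gene are hypothetical choices.

```python
import csv
import gzip
from collections import defaultdict


def load_ortholog_groups(path, wanted_tax_ids):
    """Collect ortholog GeneIDs per reference gene, keeping only wanted species.

    Assumed columns: #tax_id, GeneID, relationship, Other_tax_id, Other_GeneID.
    Returns {(ref_tax_id, ref_gene_id): {other_tax_id: other_gene_id}}.
    """
    groups = defaultdict(dict)
    # Accept either the gzipped download or an already decompressed copy.
    opener = gzip.open if str(path).endswith(".gz") else open
    with opener(path, "rt", newline="") as handle:
        for row in csv.reader(handle, delimiter="\t"):
            # Skip the header line and any malformed rows.
            if len(row) < 5 or row[0].startswith("#"):
                continue
            ref_tax, ref_gene, relationship, other_tax, other_gene = row[:5]
            if relationship != "Ortholog" or other_tax not in wanted_tax_ids:
                continue
            groups[(ref_tax, ref_gene)][other_tax] = other_gene
    return groups
```

Taxonomy IDs are compared as strings here, so a caller would pass the IDs of the species in one dataset (for example, all Carnivora genomes) as a set of strings.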

Quality control pipeline for multiple sequence alignment

A multiple sequence alignment (MSA) with adequate accuracy is required for positive selection analysis. Because MSAs built directly from the extracted coding sequences (CDSs) of the orthologous genes had relatively low accuracy, several processing steps were applied (Supplementary Fig. S2). CDSs containing frameshifts produce poor-quality MSAs. To reduce frameshift errors, CDSs were translated into protein sequences and aligned using Clustal Omega. The protein alignment was then converted back into a CDS alignment using the original translation information, and CDSs with premature stop codons were excluded. The Clustal Omega guide tree option was used for the alignment, with taxonomic information obtained from the NCBI taxonomy and TimeTree databases (Kumar et al. 2017). A guide tree for each MSA was generated in advance, replacing the pairwise alignment step that is otherwise used to cluster the sequences and making the alignment faster and more accurate. In-house programs then scanned each CDS MSA, and sequences with low alignment quality were detected and excluded. In addition, the leading and trailing regions of the MSA with low alignment accuracy were trimmed to produce a high-quality multi-alignment region, which we termed the Clean Segment Block (CSB). During manual inspection, low-quality sequences or alignment sites that the scan programs had missed were identified and adjusted for realignment. The final MSA obtained after these steps was used as the input for positive selection analysis (Supplementary Fig. S2).
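The step of converting a protein alignment back into a CDS alignment can be illustrated with a minimal sketch. It is not the in-house program used here; it assumes the standard genetic code stop codons, gaps written as "-", and a hypothetical function name. A CDS that fails the checks raises an error so that the caller can exclude it, mirroring the exclusion of sequences with premature stop codons.

```python
STOP_CODONS = {"TAA", "TAG", "TGA"}  # standard genetic code (assumed)


def backtranslate(aligned_protein, cds):
    """Thread an unaligned CDS onto its aligned protein, codon by codon."""
    cds = cds.upper()
    if len(cds) % 3 != 0:
        raise ValueError("CDS length is not a multiple of three (possible frameshift)")
    codons = [cds[i:i + 3] for i in range(0, len(cds), 3)]
    # A stop codon at the very end is expected and is dropped.
    if codons and codons[-1] in STOP_CODONS:
        codons.pop()
    # Any remaining stop codon is premature.
    if any(codon in STOP_CODONS for codon in codons):
        raise ValueError("premature stop codon")
    residues = sum(1 for aa in aligned_protein if aa != "-")
    if residues != len(codons):
        raise ValueError("protein and CDS lengths disagree")
    remaining = iter(codons)
    # Each protein gap becomes a three-base gap; each residue takes the next codon.
    return "".join("---" if aa == "-" else next(remaining) for aa in aligned_protein)
```

Because gaps are inserted in whole codons, the reading frame of every sequence is preserved in the resulting CDS alignment, which is what codon-based models require.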

Positive selection analysis

The branch-site model of the CODEML program in the PAML 4.9j package (Yang 2007) was used to predict PSGs. The ancestral branch of cetaceans, pinnipeds, and sirenians was set as the foreground in the Cetartiodactyla, Carnivora, and Afrotheria + Xenarthra datasets, respectively, and each dataset was analyzed independently. To ensure statistical reliability, the analysis was performed only when seven or more species were included in the CSB for each orthologous gene dataset.
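As a rough illustration of how such a run might be set up, the sketch below writes a CODEML control file for the null and alternative versions of the branch-site test and applies the seven-species threshold. The option names are standard CODEML control-file keys, but the exact settings used in this study are not given in the text, so the CodonFreq value, the file names, and the helper names are assumptions.

```python
MIN_SPECIES = 7  # minimum number of species per CSB, as stated in the text


def codeml_ctl(seqfile, treefile, outfile, null):
    """Return control-file text for branch-site model A (null or alternative)."""
    settings = {
        "seqfile": seqfile,
        "treefile": treefile,   # foreground branch is marked with "#1" in the Newick tree
        "outfile": outfile,
        "runmode": 0,           # evaluate the user-supplied tree
        "seqtype": 1,           # codon sequences
        "CodonFreq": 2,         # F3x4 codon frequencies (assumed, not stated in the text)
        "model": 2,             # different omega on foreground and background branches
        "NSsites": 2,           # combined with model = 2 this gives branch-site model A
        "fix_omega": 1 if null else 0,  # null model fixes the foreground omega
        "omega": 1,             # fixed value under the null, starting value otherwise
    }
    return "\n".join(f"{key:>10} = {value}" for key, value in settings.items()) + "\n"


def enough_species(alignment):
    """alignment maps species name to aligned CDS; apply the seven-species rule."""
    return len(alignment) >= MIN_SPECIES
```

Each gene would be run twice, once with `null=True` and once with `null=False`, and the two log-likelihoods compared in a likelihood ratio test.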

The significance of the likelihood ratio statistic (2ΔL, twice the difference in log-likelihood) between the null hypothesis model (ω = 1) and the alternative hypothesis model was evaluated using a chi-square distribution with np2 − np1 degrees of freedom, where np1 and np2 are the numbers of parameters in the null and alternative models, respectively. For genes that rejected the null hypothesis in favor of the alternative, positively selected codons in the foreground branch were identified using empirical Bayes analysis. After false discovery rate (FDR) correction for multiple testing, PSGs were retained for further analysis at a cutoff of FDR < 0.05.
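The likelihood ratio test and multiple-testing correction can be sketched as follows. Two assumptions are made that the text does not state explicitly: the test has one degree of freedom (the alternative branch-site model has one more free parameter than the null), and the FDR correction is the Benjamini-Hochberg procedure. The function names are hypothetical.

```python
import math


def lrt_pvalue(lnl_null, lnl_alt):
    """P-value of 2*(lnL_alt - lnL_null) under a chi-square with 1 degree of freedom."""
    stat = max(0.0, 2.0 * (lnl_alt - lnl_null))
    # For one degree of freedom the chi-square survival function is erfc(sqrt(x / 2)).
    return math.erfc(math.sqrt(stat / 2.0))


def benjamini_hochberg(pvalues):
    """Return Benjamini-Hochberg adjusted p-values in the input order."""
    m = len(pvalues)
    order = sorted(range(m), key=lambda i: pvalues[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down, enforcing monotone adjusted values.
    for rank in range(m, 0, -1):
        index = order[rank - 1]
        running_min = min(running_min, pvalues[index] * m / rank)
        adjusted[index] = running_min
    return adjusted


def select_psgs(lnl_by_gene, cutoff=0.05):
    """lnl_by_gene maps gene -> (lnL_null, lnL_alt); keep genes with FDR < cutoff."""
    genes = list(lnl_by_gene)
    pvals = [lrt_pvalue(*lnl_by_gene[gene]) for gene in genes]
    fdr = benjamini_hochberg(pvals)
    return [gene for gene, q in zip(genes, fdr) if q < cutoff]
```

Adjusting across all tested genes in a dataset, rather than per gene, is what keeps the expected proportion of false positives among the reported PSGs below the cutoff.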

Prediction of biological functions of PSGs

To investigate the functional distribution of PSGs in each group of marine mammals, we used clusterProfiler (Yu et al. 2012) to perform gene enrichment analysis.
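For readers unfamiliar with over-representation analysis, the statistic behind this kind of enrichment test can be sketched in a few lines. clusterProfiler's over-representation functions are based on the hypergeometric distribution; the text does not say which clusterProfiler function or background set was used, so the sketch below is a generic illustration with hypothetical argument names, not a reproduction of the analysis.

```python
from math import comb


def enrichment_pvalue(overlap, n_psgs, n_annotated, n_background):
    """One-sided hypergeometric p-value, P(X >= overlap).

    overlap      -- PSGs annotated with the term
    n_psgs       -- number of PSGs tested
    n_annotated  -- background genes annotated with the term
    n_background -- total number of background genes
    """
    upper = min(n_psgs, n_annotated)
    total = comb(n_background, n_psgs)
    favorable = sum(
        comb(n_annotated, i) * comb(n_background - n_annotated, n_psgs - i)
        for i in range(overlap, upper + 1)
    )
    # Exact integer arithmetic until the final division avoids overflow and rounding.
    return favorable / total
```

A small p-value means that the term is carried by more PSGs than would be expected if the PSGs were a random draw from the background genes.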

For a more integrated view of the functions related to PSGs, hub genes were predicted for each group by the protein–protein interaction network using the STRING V11 database and 11 topological analysis methods (Degree, Edge Percolated Component [EPC], Maximum Neighborhood Component [MNC], Density of Maximum Neighborhood Component [DMNC], Maximal Clique Centrality [MCC], Bottleneck, EcCentricity, Closeness, Radiality, Betweenness, and Stress) of CytoHubba 0.1 (Chin et al. 2014), provided in Cytoscape 3.8.2 (Shannon et al. 2003). The top 10 genes ranked by each method were selected, and the genes shared by at least five methods were identified as “hub genes.”
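The voting rule described above (top 10 genes per method, retained when shared by at least five methods) can be expressed as a short sketch. It assumes that the per-gene scores have already been exported from CytoHubba and that a higher score means a better rank for every method; the data structure and function name are hypothetical.

```python
from collections import Counter


def hub_genes(scores_by_method, top_n=10, min_methods=5):
    """Return genes ranked in the top_n by at least min_methods methods.

    scores_by_method maps method name -> {gene: score}.
    """
    votes = Counter()
    for scores in scores_by_method.values():
        # Rank genes by descending score and keep the top_n for this method.
        ranked = sorted(scores, key=scores.get, reverse=True)[:top_n]
        votes.update(ranked)
    return sorted(gene for gene, count in votes.items() if count >= min_methods)
```

Requiring agreement among several topological measures makes the hub list less sensitive to the particular definition of centrality used by any single method.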

In addition, common PSGs from each group pair (cetaceans + pinnipeds, pinnipeds + sirenians, and sirenians + cetaceans) were investigated to identify genes that were subjected to selection pressure during adaptation to marine environments.
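The pairwise comparison amounts to set intersections over the PSG lists, as in the following minimal sketch. The gene identifiers in the usage note are placeholders rather than results, and the function name is hypothetical.

```python
from itertools import combinations


def pairwise_common(psgs_by_group):
    """Return {(group_a, group_b): sorted shared genes} for every pair of groups."""
    return {
        (a, b): sorted(set(psgs_by_group[a]) & set(psgs_by_group[b]))
        for a, b in combinations(psgs_by_group, 2)
    }
```

For example, calling `pairwise_common({"cetaceans": {"G1", "G2"}, "pinnipeds": {"G2", "G3"}, "sirenians": {"G4"}})` yields one entry per pair of groups, with an empty list where no gene is shared. Gene symbols would need to be harmonized across datasets beforehand so that the same ortholog carries the same name in each list.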

Results

Identification of PSGs and enrichment analysis

We selected 11,449 orthologous genes from Cetartiodactyla, 14,838 from Carnivora, and 5,270 from Afrotheria + Xenarthra for positive selection calculation using CODEML after MSA. Using the branch-site model, 460, 614, and 359 PSGs (adjusted p-value < 0.05) were detected in cetaceans, pinnipeds, and sirenians, respectively (Supplementary Tables S2–S5).

Functional enrichment analysis of the PSGs from each group of marine mammals identified gene ontology (GO) terms related to hemostasis across all groups at a significance threshold of p < 0.05. These terms included “coagulation,” “regulation of hemostasis,” and “negative regulation of hemostasis” (Fig. 1A). For the hemostatic system, we identified six PSGs (PLAU, PRKG1, TSPAN8, COL3A1, MMRN1, and SYK) in cetaceans, one (TPH1) in pinnipeds, and four (GGCX, PLA2G4A, DGKE, and PEAR1) in sirenians (Table 1). Specifically, four genes related to various aspects of hemostasis and coagulation (PLAU, MMRN1, GGCX, and PEAR1) carried amino acid substitutions that at least one prediction program classified as damaging to the protein (Table 1).

Fig. 1

Association between PSGs in marine mammals and the hemostatic system. A Gene enrichment results using clusterProfiler (p < 0.05). This study included 460 genes from cetaceans, 614 genes from pinnipeds, and 359 genes from sirenians. B The major components of hemostasis. C The number of PSGs in marine mammalian groups related to each function was obtained from the GeneCards database (https://www.genecards.org/)

Table 1 Prediction of the potential effects of amino acid substitutions in positive selection sites in genes related to hemostasis

Hemostasis is a complex physiological process involving three major steps: vasoconstriction, platelet activation, and the coagulation cascade (Fig. 1B). Blood clots formed during this process are typically removed through fibrinolysis, the mechanism that dissolves them. We therefore identified genes related to hemostasis or coagulation using the GeneCards database based on five keywords: “hemostasis,” “coagulation,” “thrombosis,” “embolism,” and “fibrinolysis.” The results of matching the PSGs of each group of marine mammals to these genes are shown in Fig. 1C. Our findings suggest that, within the hemostatic system, aquatic adaptations in marine mammals are more strongly associated with blood coagulation than with clot dissolution.

Investigation of common PSGs and hub genes

Common genes were investigated for each pair of groups: cetaceans + pinnipeds, pinnipeds + sirenians, and sirenians + cetaceans (Fig. 2A, Supplementary Table S6). Seven genes (C2orf73, PAX5, SCN9A, ITPRID2, PMFBP1, MYCBPAP, and SRRM4) were identified as common PSGs in cetaceans and pinnipeds. Both groups had the same amino acid substitution (I990T) in SCN9A, which encodes the voltage-gated sodium channel Nav1.7 (Fig. 2B). Previous studies analyzing the SCN9A gene in sperm whale and baleen whale lineages also found positive selection sites located in critical functional regions of the gene. Mutations in SCN9A cause congenital insensitivity to pain or a diminished pain response (Raouf et al. 2010; Ding et al. 2022). To investigate the effect of this amino acid change on protein structure and function, we performed an analysis using PolyPhen (Adzhubei et al. 2013), which predicted that the substitution would have a detrimental effect on protein function.

Fig. 2

Common genes for each group of marine mammals. A Venn diagram illustrating the shared genes in cetaceans, pinnipeds, and sirenians. Seven PSGs were shared by cetaceans and pinnipeds. B Identical amino acid substitution in the SCN9A gene of cetaceans and pinnipeds. Evolutionary changes led to an isoleucine (I) to threonine (T) substitution at site 990 of the SCN9A protein in cetaceans and pinnipeds, numbered according to the human protein sequence. The silhouette images were collected from PhyloPic (http://www.phylopic.org/). C Schematic diagram of how the KRAP gene affects calcium ion regulation. The positive selection sites for KRAP and the predicted effects of amino acid substitutions on protein function can be found in Supplementary Table S7. CBP, CREB binding protein; EP300, E1A binding protein P300; ER, endoplasmic reticulum

SRRM4 encodes serine/arginine repetitive matrix protein 4 (Safran et al. 2010), and its mutations are associated with deafness (Nakano et al. 2012). Marcovitz et al. (2019) studied the G310E mutation of SRRM4 in toothed whales, dolphins, and bats that underwent convergent evolution of echolocation, and a clear protein-coding signal was observed in echolocating bats and whales. Pinnipeds do not echolocate (Schusterman et al. 2000) but require auditory adaptations in aquatic and terrestrial environments for an amphibious lifestyle (Hanke and Dehnhardt 2013). Consequently, amino acid changes in SRRM4 may help pinnipeds adapt to aquatic environments (Supplementary Table S7).

ITPR Interacting Domain Containing 2 (ITPRID2), also known as KRAP or SSFA2, plays a crucial role in the localization and function of inositol trisphosphate receptors (IP3Rs) that mediate the release of Ca2+ from the endoplasmic reticulum (Arige and Yule 2021; Vorontsova et al. 2022). The effects of amino acid changes at positively selected positions on protein function and structure were predicted in each group of aquatic mammals using PROVEAN (Choi and Chan 2015), SIFT (Ng and Henikoff 2001), and PolyPhen-2 (Adzhubei et al. 2013). At least one selection site for each gene was identified as potentially damaging (Supplementary Table S7).

The MAT2B gene, identified as a PSG in both pinnipeds and sirenians, encodes a member of the methionine adenosyltransferase (MAT) family. Studies have linked this gene to tumorigenesis in hepatocellular carcinoma, colon cancer, and malignant melanomas (Peng et al. 2013; Tang et al. 2017; Yuan et al. 2019). Additionally, SPA17 was identified as a PSG in both sirenians and cetaceans. Positive selection for this gene has been observed in high-altitude frogs (Sun et al. 2018); however, it has not previously been reported in marine mammals. SPA17 encodes a cell surface transmembrane protein that plays a role in male reproductive functions, such as spermatogenesis and sperm-egg interactions (Chiriva-Internati et al. 2009). To our knowledge, this is the first report of this gene as a positive selection candidate for aquatic reproduction in the fully aquatic mammals, cetaceans and sirenians.

Using hub gene analysis, we identified five genes (DKC1, CCT4, PA2G4, CCT2, and RRM1) in cetaceans, four genes (APOA1, CD4, EP300, and SOX9) in pinnipeds, and seven genes (TMED1, COX15, DERL2, STYK1, MAP2K5, RB1CC1, and RBX1) in sirenians (Table 2).

Table 2 List of hub genes for each group of marine mammals

Discussion

Previous studies identified PSGs by comparing marine mammal branches across the entire mammalian tree. In contrast, our study compared each marine mammal group with terrestrial mammals of the same order. Through an extensive analysis using the branch-site model in CODEML, we detected 460, 614, and 359 PSGs in cetaceans, pinnipeds, and sirenians, respectively.

We performed a functional enrichment analysis of PSGs from three groups of marine mammals and found that genes related to hemostasis were positively selected across all lineages. Hemostasis is the process by which bleeding from the blood vessels is stopped (LaPelusa and Dave 2023). This process includes three main stages: vasoconstriction, platelet activation, and the coagulation cascade (Robinson et al. 1969; LaPelusa and Dave 2023). Previous studies have reported the loss or selection of blood coagulation factors in Cetacea (Yim et al. 2014; Huelsmann et al. 2019). Under hypoxia, extended pulmonary arteries have been observed in sea lions (Olson et al. 2010).

We identified four genes (PLAU, MMRN1, GGCX, and PEAR1) in which amino acid substitutions at positively selected sites were predicted to damage protein function (Table 1). To our knowledge, this is the first study to identify MMRN1, GGCX, and PEAR1 as PSGs in marine mammals. PLAU, also known as urokinase-type plasminogen activator, encodes a secreted serine protease that converts plasminogen to plasmin. This gene is associated with Quebec platelet disorder, a bleeding disorder (Paterson et al. 2010) characterized by the overexpression of PLAU (Liang et al. 2020). MMRN1, also known as multimerin-1, encodes a protein found in platelets and blood vessel walls that may have platelet adhesive functions (Tasneem et al. 2009). MMRN1 deficiency is associated with Quebec platelet disorder, and the protein plays a role in thrombus formation (Leatherdale et al. 2021). GGCX, also known as gamma-glutamyl carboxylase, encodes an enzyme required for the activation of vitamin K-dependent clotting factors. Mutations in this gene are associated with bleeding disorders (Hao et al. 2021; Rishavy et al. 2022). PEAR1 (platelet endothelial aggregation receptor 1) encodes a platelet protein involved in platelet activation (Nanda et al. 2005; Johnson 2016). Therefore, the genes identified in cetaceans may be linked to the delayed blood coagulation previously reported in this group (Lohman et al. 1998). In contrast to other marine mammals, manatees tend to exhibit higher coagulation (hypercoagulation) than terrestrial mammals (Gerlach et al. 2015; Barratclough et al. 2016). An herbivorous diet comprising vitamin K-rich seaweed and seagrass may reduce blood clotting time (Shiau and Liu 1994; Wei et al. 2023). The exact mechanisms require further research; however, the dietary habits of manatees and genes such as GGCX might be involved.

Although no PSG related to blood coagulation factors or platelets was found in the pinnipeds, we identified TPH1 as a PSG. The TPH genes, including TPH1 (expressed in the periphery) and TPH2 (expressed in the brain), are responsible for the production of serotonin (5-HT), a potent vasoconstrictor. TPH1 is involved in blood vessel regulation. In a TPH1 null mouse experiment, symptoms such as reduced cardiac output, increased heart rate, and abnormal blood circulation were observed (Côté et al. 2003). This gene is also associated with pulmonary hypertension through serotonin regulation. Although pulmonary hypertension typically occurs in oxygen-deficient mammals, it is absent in diving mammals such as pinnipeds (MacLean 2007; Olson et al. 2010). Therefore, TPH1 may have enabled pinnipeds to adapt to land and sea habitats by affecting their blood circulation-related functions.

To our knowledge, we report here for the first time positive selection in marine mammals for KRAP, which encodes a protein involved in the function of IP3Rs in Ca2+ release (Vorontsova et al. 2022). Calcium ions in the cytosol control various cellular activities, such as cell death, survival, secretion, gene transcription, and metabolism. Moreover, calcium regulation is associated with anti-obesity effects (Berridge et al. 2000; Zemel 2002; La Rovere et al. 2016; Kania et al. 2017; Zhang et al. 2019). IP3Rs regulate calcium ion levels by releasing them from the endoplasmic reticulum (Arige and Yule 2021; Thillaiappan et al. 2021; Vorontsova et al. 2022). KRAP is necessary for the correct localization and function of IP3Rs (Fig. 2C). KRAP-deficient (-/-) mice have been reported to exhibit a higher basal metabolic rate and glucose tolerance, along with hypoinsulinemia, compared with wild-type mice (Fujimoto et al. 2007; Chou et al. 2010; Hägg et al. 2015). A recent study reported that cAMP response element-binding protein (CREB) regulates KRAP expression (Arige et al. 2021). Positive selection for the CREB-binding protein (CBP) and EP300, which are known coactivators of CREB, has previously been reported in cetaceans and pinnipeds, respectively (Noh et al. 2022). Our study identified EP300 as a PSG and hub gene in pinnipeds, and there is evidence linking CBP and EP300 to fat metabolism and obesity (Nishimura et al. 2015; Namwanje et al. 2019). These results suggest that the regulation of calcium signaling may also affect metabolism and fat accumulation in marine mammals.

In summary, we identified PSGs that have undergone selection pressure for adaptation to the aquatic environment in marine mammals, including genes involved in the hemostatic system. These selected genes provide insight into the physiological characteristics of marine mammals and help address questions regarding their adaptation to aquatic life.