Introduction

The family Unionidae, commonly known as freshwater mussels or pearly mussels, is one of the speciose groups of the order Unionida, with more than 700 species [1,2,3,4]. It comprises six subfamilies, viz., Ambleminae (329 species), Gonideinae (157 species), Modellnaiinae (1 species), Parreysiinae (116 species), Unioninae (173 species) and Qiyangiinae (only fossil species) [5]. These mussels inhabit rivers, lakes and ponds and are widely distributed across all continents except Antarctica. They play an important role in freshwater ecosystems through nutrient recycling [6], water purification [7] and bioturbation [8]. Freshwater mussels have been used as model organisms for evolutionary studies because of their unique pattern of mitochondrial inheritance, called doubly uniparental inheritance (DUI) [9,10,11]. Some freshwater mussels produce quality pearls and contribute to the global economy [12].

However, populations of freshwater mussels are declining due to anthropogenic activities such as river pollution, alteration of river banks, and climate change [13]. The conservation status of freshwater mussels is assessed under the different IUCN categories, and several species are on the verge of extinction [14]. Further, conservation efforts have been impeded by a lack of accurately identified conservation units and limited knowledge of molecular systematics [15]. Phylogenetic relationships among the extant species would provide additional information on the evolutionarily significant units.

Lamellidens marginalis was originally described by Lamarck in 1819 from West Bengal, India (erstwhile Bengal). He placed the species under the genus Unio Philipsson and the family Naiida (Lamarck: Les Nayades). Later, the species was grouped in the family Unionidae Rafinesque, 1820. Based on anatomical variations (number of demibranchs), Simpson [16] established the genus “Lamellidens” and placed the species ‘marginalis’ under this genus. Within the family Unionidae, L. marginalis is classified in the subfamily Parreysiinae and the tribe Lamellidentini. It has a wide distribution across India, Nepal, Bangladesh and Sri Lanka [17]. It is a good dietary source of minerals and contains considerable amounts of calcium, phosphorus, iron, sodium, potassium, magnesium, manganese, zinc and selenium [18]. This species has been reported as an alternative food source due to the presence of essential amino acids and fatty acids [18]. Like other Unionidae species, L. marginalis can secrete a nacre layer around a foreign particle and produce quality pearls [19]. In India, it is the most widely used species for freshwater pearl production [20]. Because the seed for culture is collected from wild stocks, pearl culture often puts pressure on natural stocks. In the absence of conservation measures, this could lead to depletion of the stocks or extirpation of the species. Further, the diversity of the Unionidae species from India has not been studied using molecular markers. Thus, knowledge of phylogenetic diversity is essential for identifying the management units. The evolutionary relationships of the Unionidae species would also provide information on the evolution of traits such as pearl production in this group. Mitochondrial DNA has been successfully used to study the phylogenetic diversity and evolutionary relationships of Unionidae species [21].

Accurate and robust phylogenies are possible with a wide coverage of species, genomes and geographical locations [3, 22]. In this context, genomic resources for the Parreysiinae subfamily (Lamellidentini tribe) are lacking. Thus, the present study was carried out to decipher the complete female mitochondrial genome of L. marginalis along with its phylogenetic relationship within the Unionidae family.

Materials and methods

Sampling and DNA sequencing

One female individual of Lamellidens marginalis was collected from the Damring River (25° 30′ N 90° 30′ E), Garo Hills, Meghalaya. The sex of the mussel was identified by histological methods. Total genomic DNA was isolated from mantle tissue using the Exgene™ DNA purification kit (GeneAll Biotechnology Co. Ltd., Seoul, Korea). An Illumina paired-end library was prepared using the TruSeq DNA Sample Prep Kit™ following the manufacturer’s instructions (Illumina, San Diego, California, USA). The library was sequenced on the MiSeq Benchtop sequencer™ with a paired-end read length of 250 bp (Illumina, San Diego, California, USA). PRINSEQ v0.20.4 was used to check the quality of the sequences and to trim low-quality data (Phred scores < 20) [23].
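The quality-trimming step can be illustrated with a minimal Python sketch. It is not PRINSEQ's implementation: the function names are hypothetical, and it assumes Phred+33 encoded quality strings and simple base-by-base trimming from both read ends, which may differ from the options actually used.

```python
def phred_scores(quality, offset=33):
    """Convert a FASTQ quality string to Phred scores (Phred+33 assumed)."""
    return [ord(ch) - offset for ch in quality]


def trim_low_quality_ends(seq, quality, threshold=20, offset=33):
    """Trim bases with Phred score < threshold from both ends of a read.

    Illustrative only: bases are removed one at a time from each end
    until a base at or above the threshold is reached.
    """
    scores = phred_scores(quality, offset)
    start, end = 0, len(seq)
    while start < end and scores[start] < threshold:
        start += 1
    while end > start and scores[end - 1] < threshold:
        end -= 1
    return seq[start:end], quality[start:end]


# '#' encodes Phred 2 and 'I' encodes Phred 40 under Phred+33.
trimmed_seq, trimmed_qual = trim_low_quality_ends("ACGTACGT", "#IIII###")
print(trimmed_seq, trimmed_qual)
```

In this toy read, the first base and the last three bases fall below the threshold of 20 and are removed.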

Mitogenome analysis

Geneious Prime™ software was used for de novo assembly of ~ 2.5 million reads (average length 400 bp; range: 300–500 bp) to produce a single circular complete mitogenome. The assembled mitogenome was annotated using the MITOS web server [24] and confirmed by NCBI ORF Finder and BLASTn (nucleotide BLAST) analysis. The sequence was submitted to NCBI GenBank under the accession number MT230549. The structure of the transfer RNAs (tRNAs) was predicted using the ‘tRNAscan’ web server [25] with the search mode ‘tRNAscan only’ and the invertebrate mitochondrial genetic code. The frequency of bases, codon usage and genetic distance values were estimated using MEGA X [26].

Phylogenetic analysis

For phylogenetic analysis, a dataset was prepared by downloading the reported mitochondrial genomes of the family Unionidae (n = 35) from NCBI GenBank (Supplementary Table S1). Margaritifera dahurica and M. falcata were assigned as outgroups in the phylogenetic and biogeographic analyses. The jModelTest 2 software was used to estimate the evolutionary models [27], and the General Time Reversible model with invariant sites and a gamma distribution of rates (GTR + I + G) [28] was found to be the best model to describe the evolutionary relationships among the Unionidae. Phylogenetic trees were reconstructed using PAUP ver. 4.0 [29] by applying the Maximum Parsimony (MP), Maximum Likelihood (ML) and Neighbor-Joining (NJ) methods. MrBayes 3.2 was used to reconstruct the tree using the Bayesian inference (BI) method [30]. The Bayesian analysis was performed with the following conditions: 10 million iterations with sampling every 1000 generations, and two parallel runs, each with one cold chain and three heated chains. Stationarity of the posterior probabilities was assessed from the standard deviation of split frequencies between the runs.

Divergence time estimation

BEAST v. 2.5 was used to estimate the divergence time based on fossil calibration data. A lognormal relaxed clock with a Yule speciation process was used for the time calibration [31]. The phylogeny was estimated by Markov chain Monte Carlo (MCMC) analysis under the Hasegawa–Kishino–Yano (HKY) nucleotide substitution model [32]. Five replicate BEAST runs were performed to ensure adequate effective sample sizes (ESS), and log files with ESS < 300 were excluded from further analysis. The log files from the remaining runs were then combined using LogCombiner v. 1.8.4. TreeAnnotator v. 1.8.4 was used to produce a maximum clade credibility tree.

Results

Mitogenome organization and nucleotide composition

The length of the mitochondrial genome was 15,732 bp, consisting of 23 tRNAs, 2 rRNAs and 13 protein coding genes (PCGs). Of the 38 genes, 11 (i.e., trnH, trnD, nad3, nad4l, nad4, nad5, cox1, cox2, cox3, atp8 and atp6) were located on the heavy strand (G + T content: 61.7%), whereas the remaining 27 genes were encoded on the light strand (G + T content: 38.3%) (Fig. 1). The nucleotide frequencies were A: 25.8%, T: 36.9%, G: 24.8% and C: 12.5%, with an AT content of 62.7%. Intergenic spacer regions of 1 to 272 bp were found at 31 locations across the genome. Among these, four major non-coding regions (> 100 bp) were observed between trnQ-nad5 (272 bp), nad5-trnF (204 bp), trnH-nad3 (141 bp) and trnA-trnH (177 bp). In total, 1096 bp of non-coding sequence, i.e. 6.9% of the total mitogenome, was observed (Table 1). The unassigned region between trnH-nad3 showed an ORF with a length of 135 bp (44 aa). The predicted ORF showed a putative transmembrane domain (TM) with low probability values (Supplementary Fig. S1). The region between nad5 and trnF showed hairpin-loop structures with an A + T content of 66.5% (Supplementary Fig. S2). Overlapping regions of 2 to 61 bp were found between trnK-rrnS (2 bp), trnG-nad1 (61 bp), cytb-trnP (9 bp), trnL2-rrnL (9 bp) and rrnL-trnY2 (55 bp).
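The summary values in this paragraph follow from simple arithmetic on the reported figures. The short Python sketch below recomputes them from the numbers given in the text; the variable and function names are illustrative and are not part of any tool used in the study.

```python
# Values as reported in the text (percentages, and lengths in bp).
base_percent = {"A": 25.8, "T": 36.9, "G": 24.8, "C": 12.5}
genome_length = 15732
noncoding_length = 1096
gene_counts = {"tRNA": 23, "rRNA": 2, "PCG": 13}


def combined_percent(percentages, bases):
    """Sum the percentages of the given bases, rounded to one decimal."""
    return round(sum(percentages[b] for b in bases), 1)


at_content = combined_percent(base_percent, "AT")   # A + T
gt_content = combined_percent(base_percent, "GT")   # G + T
noncoding_percent = 100 * noncoding_length / genome_length
total_genes = sum(gene_counts.values())

print(f"A+T = {at_content}%, G+T = {gt_content}%")
print(f"non-coding share = {noncoding_percent:.2f}%")
print(f"total genes = {total_genes}")
```

With these inputs the A + T and G + T contents come out at 62.7% and 61.7%, the gene total is 38, and the non-coding share is about 6.97% of the genome, which the text gives as 6.9%.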

Fig. 1

Gene map of the Lamellidens marginalis mitogenome. Genes encoded on the heavy strand are mapped outside the outer circle and are transcribed counter clockwise. Genes encoded on the light strand are mapped inside the outer circle and are transcribed clockwise. The inner graph is the GC content of mitochondrial sequences, and the circle inside the GC content graph marks the 50% threshold. Gene map was generated with the OrganellarGenomeDRAW (OGDRAW) 1.3.1. (https://chlorobox.mpimp-golm.mpg.de/OGDraw.html)

Table 1 Mitochondrial genome organization of Lamellidens marginalis

Protein coding genes and codon bias

The total length of the 13 PCGs was 11,049 bp, encompassing 70.23% of the total mitochondrial genome. The base composition was A: 26.3%, T: 30.8%, G: 27.7% and C: 15.3%, with an AT content of 57.1%. The longest and shortest genes were nad5 (1617 bp) and atp8 (177 bp), respectively. Five types of start codon, i.e. ATA, ATT, ATG, GTG and TTG, were observed. Of the 13 PCGs, six genes (atp6, cox2, cox3, nad2, nad4l and nad6) showed the typical ‘ATG’ start codon, three genes (nad1, nad5 and cytb) had ‘ATT’ as the initiation codon, and two genes (cox1 and nad4) showed ‘TTG’ as the start codon. The genes atp8 and nad3 possessed ‘GTG’ and ‘ATA’ as initiation codons, respectively. Typical stop codons (TAA/TAG) were found in all the genes. A total of 3,683 codons were predicted from the 13 protein coding genes. The largest numbers of synonymous codons were observed for leucine, arginine and serine (Table 2). Codon bias was observed for TTT (F) followed by TTA (L) (Table 2). The relative synonymous codon usage analysis showed a high value for the codon “AGG (R)” (Fig. 2). The “AT” and “GC” skewness values were − 0.240 and 0.216, respectively, indicating a bias toward T over A and toward G over C.
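For readers unfamiliar with the two statistics used here, the sketch below shows the standard definitions: relative synonymous codon usage (RSCU) is the observed count of a codon divided by the mean count of all codons in its synonymous family, and skew is (X − Y)/(X + Y) for a base pair X, Y. This is a minimal Python illustration, not the MEGA X procedure used in the study. The function names and the two-codon toy table are hypothetical; a real analysis would pass a full codon-to-amino-acid table for the chosen genetic code, and the sequence region over which skew is calculated is left to the caller.

```python
from collections import Counter, defaultdict


def count_codons(cds):
    """Count in-frame codons of a coding sequence (trailing bases ignored)."""
    cds = cds.upper()
    usable = len(cds) - len(cds) % 3
    return Counter(cds[i:i + 3] for i in range(0, usable, 3))


def rscu(codon_counts, codon_to_aa):
    """RSCU = observed count / mean count within the synonymous family."""
    families = defaultdict(list)
    for codon, aa in codon_to_aa.items():
        families[aa].append(codon)
    values = {}
    for aa, codons in families.items():
        total = sum(codon_counts.get(c, 0) for c in codons)
        if total == 0:
            continue  # amino acid absent from the sequence
        expected = total / len(codons)
        for c in codons:
            values[c] = codon_counts.get(c, 0) / expected
    return values


def skew(x, y):
    """Compositional skew, e.g. AT skew = (A - T) / (A + T)."""
    return (x - y) / (x + y)


# Toy example: a hypothetical table containing only the phenylalanine family.
toy_table = {"TTT": "F", "TTC": "F"}
counts = count_codons("TTTTTTTTTTTC")  # TTT x3, TTC x1
print(rscu(counts, toy_table))
print(skew(3, 1))
```

An RSCU value above 1 means the codon is used more often than expected under equal usage within its family; a negative AT skew means T exceeds A, and a positive GC skew means G exceeds C.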

Table 2 Codon usage of mitochondrial protein coding genes of Lamellidens marginalis

Fig. 2

Relative synonymous codon usage (RSCU) of the mitochondrial genome of Lamellidens marginalis. The 20 codon families consisting of a total of 60 degenerate synonymous codons are plotted on the x-axis. The label for the codons that compose each family is shown in the boxes below the x-axis, and the colours correspond to the colours in the stacked columns. The most used synonymous codon in each family is in green. The RSCU values are shown on the y-axis

Across the species, the nad6 gene showed the highest divergence value (0.427), followed by nad2 (0.397) and atp8 (0.396) (Fig. 3). When the arrangement of mitochondrial genes was compared across the species of Unionidae, L. marginalis displayed a UF1-type gene arrangement, except for the location of the putative ‘f-orf’.
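As an illustration of how a gene-wise divergence value can be obtained from an alignment, the following Python sketch computes the uncorrected p-distance (the proportion of differing sites) and its mean over all sequence pairs. This is an assumption made for illustration only: the distance model applied in MEGA X is not stated in the text, and the function names are hypothetical.

```python
from itertools import combinations


def p_distance(seq_a, seq_b, skip_chars="-N?"):
    """Proportion of differing sites between two aligned sequences.

    Sites with a gap or ambiguous character in either sequence are skipped.
    """
    if len(seq_a) != len(seq_b):
        raise ValueError("sequences must be aligned to the same length")
    compared = 0
    differences = 0
    for a, b in zip(seq_a.upper(), seq_b.upper()):
        if a in skip_chars or b in skip_chars:
            continue
        compared += 1
        if a != b:
            differences += 1
    if compared == 0:
        raise ValueError("no comparable sites")
    return differences / compared


def mean_pairwise_distance(aligned_seqs):
    """Average p-distance over all pairs of aligned sequences of one gene."""
    pairs = list(combinations(aligned_seqs, 2))
    return sum(p_distance(a, b) for a, b in pairs) / len(pairs)


# Toy alignment of one gene from three hypothetical species.
print(p_distance("ACGT-A", "ACGA-T"))
print(mean_pairwise_distance(["AAAA", "AAAT", "AATT"]))
```

Applying such a function gene by gene to the aligned PCGs would give one divergence value per gene, comparable in spirit to the values plotted in Fig. 3.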

Fig. 3

Gene-wise divergence rates across the Unionidae species

Transfer and ribosomal RNA genes

Of the 23 tRNAs, 21 were encoded on the light strand, while the remaining two were encoded on the heavy strand. The length of the tRNAs varied from 59 bp (trnY1) to 68 bp (trnS1). Two tRNA genes each were identified for the amino acids leucine, serine and tyrosine. Most of the tRNAs showed cloverleaf-like structures without a variable loop (Supplementary File 1). Fourteen tRNAs (trnT, trnW, trnM, trnE, trnS1, trnS2, trnA, trnH, trnD, trnG, trnV, trnI, trnC and trnY) showed ‘G-U’ mismatches within the stem regions.

The 12S rRNA was located between trnK and trnR. Its nucleotide frequency was A: 36.4%, T: 26.8%, G: 15.4% and C: 21.5%, with an A + T content of 63.2%. The large subunit (16S rRNA) was located between trnL2 and trnY2, with a nucleotide frequency of A: 29.1%, T: 34.7%, G: 21.0% and C: 15.2% (Supplementary Fig. S3 and S4).

Phylogenetic and biogeographic analysis

The phylogenetic trees built by Maximum Parsimony (MP), Maximum Likelihood (ML) and Bayesian Inference (BI) displayed similar topologies. In the consensus phylogenetic tree, the species formed three major clades with significant bootstrap values and posterior probabilities (bootstrap: 90%; posterior probability: 0.9). Clade I consists of species of Unioninae (tribes: Anodontini; Cristariini; Lanceolariini; Unionini; Nodulariini), Gonideinae (Lamprotulini) and Parreysiinae (Lamellidentini). Clade II includes species of Gonideinae (Gonideini; Pseudodontini; Lamprotulini; Rectidentini and Chamberlainiini). Clade III comprises species from the subfamily Ambleminae (tribes: Lampsilini and Quadrulini). Parreysiinae formed a sister group to the Unioninae subfamily. Gonideinae formed a sister group to Parreysiinae and Unioninae. Ambleminae formed a distinct clade (Fig. 4). In the biogeographic analysis, the most recent common ancestor (MRCA) of Unioninae and Parreysiinae was placed in the Jurassic (mean age 189 Ma, 95% HPD 179–199 Ma) (Supplementary Fig. S5).

Fig. 4

Phylogenetic tree of Unionidae reconstructed from concatenated F-type mitochondrial DNA protein-coding genes. Values for branch support are represented in the following order: Maximum Parsimony/Maximum Likelihood/Bayesian Posterior

Discussion

Based on previously reported data, the size of the F-type mitochondrial genome in the Unionidae family varies from 15,637 bp (Anodonta anatina) to 16,746 bp (Chamberlainia hainesiana), with a mean of about 16 kilobases (kb). In the present study, the mitogenome size was 15,732 bp. Variation in genome size is due to differences in the length of the non-coding region [11] and to gene duplication [33]. In the present study, the cumulative size of the non-coding region was 1096 bp, dispersed throughout the genome.

In the present study, codon bias was observed for TTT (F) followed by TTA (L). Previous studies reported similar observations in the Unionidae family and in the basal molluscan representative Katharina tunicata [34, 35]. This suggests a historical constraint on codon usage across the phylum [36].

Unlike other animals, mussels have doubly uniparental inheritance, in which both parents transmit their mitochondrial genomes to the offspring [37,38,39,40]. Accordingly, the mitogenomes transmitted by the female and the male are known as the “F-genome/F-type” and “M-genome/M-type”, respectively. The gene arrangement differs between these gender-associated mitogenomes: F-type mitogenomes have trnH between nad3 and trnA, whereas M-type mitogenomes have trnH between nad5 and trnQ [41, 42]. In the present study, trnH was located between nad3 and trnA, which confirmed that the mitogenome is of the F-type. Further, gender-specific novel open reading frames (f-orf: female; m-orf: male) that could code for novel proteins have been reported from Unionidae species [43, 44]. Previous studies have reported an unassigned region between trnE and nad2 and attributed it to the f-orf [41, 45]. However, in the present study, we could not find a considerably large fragment (> 100 bp) in this region. Instead, the region between trnH and nad3 showed an ORF with a putative transmembrane domain. No previous study has reported an ORF in this region. Further, the putative ‘f-orf’ is shorter than the other reported f-orfs. The BLASTn analysis of the putative ‘f-orf’ did not return any hits against the NCBI GenBank reference database. We hypothesize that this region could be the ‘f-orf’ in L. marginalis. Nevertheless, this has to be confirmed with a larger sample size. In the present study, the unassigned/intergenic spacer regions between trnQ-nad5, nad5-trnF and trnA-trnH could be the control regions that regulate DNA replication and transcription [46].
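The search for an open reading frame in an unassigned region can be sketched in Python as below. This is not the algorithm of NCBI ORF Finder. As assumptions for illustration, it scans only the three forward frames, treats the five start codons observed in the PCGs of this study (ATG, ATA, ATT, GTG, TTG) as valid starts, uses TAA/TAG as stops, and all names are hypothetical.

```python
START_CODONS = {"ATG", "ATA", "ATT", "GTG", "TTG"}  # starts observed in this study
STOP_CODONS = {"TAA", "TAG"}                        # stops observed in this study


def find_orfs(seq, min_codons=1):
    """Return (start, end, n_codons) for forward-strand ORFs.

    start/end are 0-based, end-exclusive and include the stop codon;
    n_codons counts the coding codons, excluding the stop.
    """
    seq = seq.upper()
    orfs = []
    for frame in range(3):
        start = None
        for i in range(frame, len(seq) - 2, 3):
            codon = seq[i:i + 3]
            if start is None and codon in START_CODONS:
                start = i                      # first start codon opens the ORF
            elif start is not None and codon in STOP_CODONS:
                n_codons = (i - start) // 3
                if n_codons >= min_codons:
                    orfs.append((start, i + 3, n_codons))
                start = None                   # look for the next ORF in this frame
    return orfs


# Toy sequence: ATG + three GCT codons + TAA, embedded in flanking bases.
print(find_orfs("CCATGGCTGCTGCTTAAGG"))

# A 135 bp ORF including its stop codon encodes (135 - 3) / 3 = 44 amino acids.
print((135 - 3) // 3)
```

The last line shows how the reported ORF length of 135 bp corresponds to 44 amino acids once the stop codon is subtracted.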

In Bivalvia, the number of tRNAs and their arrangement are dynamic and often differ from the standard set of 22 genes [47]. In the present study, an additional tRNA for tyrosine was observed, which could be due to tandem duplication of the original gene [48]. Previous studies have attributed tandem duplication of genes to slipped-strand mispairing and imprecise termination of replication [49, 50]. Further, the gene for tRNA (Glycine) completely overlaps the nad1 gene, and this kind of gene integration has been reported in other bivalves [51]. This suggests a selective constraint on the size of the mitochondrial genome.

The location, content and structure of the control region vary greatly among the Unionidae species [41, 52]. Previous studies assigned the non-coding region between trnQ-nad5-trnF as the control region because of their ability to form the secondary structures and high AT content [41, 53]. In this study also, the same region displays the properties of the control region.

In Unionidae, gene arrangement in the mitochondrial genome is more variable than in other animal groups [36]. By comparing the reported mitochondrial genomes, Lopes-Lima et al. [54] reported two types of gene order in the F-type mitogenomes of the Unionidae family (UF1 and UF2) and one gene order in the M-type mitogenomes (UM1). The gene arrangement of Lamellidens marginalis corresponds to the UF1 type (cox1-cox2-nad3-trnH-trnA-trnS2-trnE-nad2).

In Bivalvia, the genes atp6, atp8, nad2, nad4L, nad5 and nad6 have been reported to have higher evolutionary rates than other genes [55, 56]. In this study also, we observed high divergence rates in nad6, nad2, atp8 and nad5. We hypothesize that these genes (except atp8; see [57]) could also be used for population genetic studies of F-type mitochondrial lineages of the species.

Previous studies used morphological and anatomical characters such as shell size [58], morphology of glochidia larvae [59] and demibranchs [60] to classify the mussels. However, these traits are prone to phenotypic plasticity and often inflate the number of species [61]. With the development of molecular biology and statistical tools, several researchers have used molecular phylogenetic approaches to resolve taxonomic ambiguity, describe new species, and establish evolutionary relationships among the Unionidae species [54, 61,62,63,64,65,66,67,68]. However, the phylogeny of the Unionidae family remains incomplete due to limited sampling of species and limited geographical coverage.

In the present phylogenetic tree, the species of the subfamily Unioninae formed a monophyletic group with significant bootstrap values. This clade consists of the tribes Cristariini, Anodontini, Unionini, Lanceolariini and Nodulariini. Two species of Lamprotulini (Lamprotula gottschei and Schistodesmus lampreyanus) were placed within the Unioninae cluster. These species may have been misidentified by previous researchers, as has been reported especially for L. gottschei [60,61,62,63,64,65,66,67,68,69,70]. The tribes Cristariini and Anodontini formed sister groups. The species of the tribe Unionini (Acuticosta chinensis, Unio pictorum, Aculamprotula tientsinensis and A. tortuosa) were paraphyletic. This observation warrants further studies on these species. The present species L. marginalis (Parreysiinae) formed a sister branch to the Unioninae subfamily, and a similar observation has been reported by previous researchers [71, 72].

In the present study, the divergence time of the Parreysiinae from its most recent common ancestor (MRCA) was placed in the Mesozoic era (Jurassic period). Bolotov et al. [22] reported that Parreysiinae is the most ancient clade and that the MRCA of the Parreysiinae could have originated in western Indo-China. A larger number of species from India is required for further biogeographic studies of Lamellidens species.

In conclusion, the study characterized the complete mitochondrial genome of Lamellidens marginalis and reports its phylogenetic position within the Unionidae family. This information will be useful for the management and conservation of the mussel resources.