Introduction

Crabs are decapod crustaceans of the suborder Brachyura, characterized by short stalked eyes; short, broad and more or less flattened bodies (carapace); and small abdomens folded under the thorax. They inhabit marine, brackish and freshwater environments. Crab fishery is now emerging as an important sector in Bangladesh. The country mainly exports three groups of crabs: mud crabs (Scylla serrata or other Scylla species), three-spot swimming crabs (Portunus sanguinolentus) and blue swimming crabs (Portunus pelagicus) (Roy et al. 2012). In the fiscal year 2018–2019, Bangladesh exported 470.23 metric tons of crabs after fulfilling local demand (DoF 2019). A total of 38 crab species under 11 families have been recorded from Bangladesh, of which 18 are categorized as Data Deficient, 15 as Least Concern and five as Vulnerable (IUCN Bangladesh 2015). Most of them are either heavily exploited or under intense pressure from habitat destruction and from other anthropogenic and natural disturbances.

The accurate identification of organisms is crucial for biodiversity assessment and conservation. The identification of brachyuran crabs is usually based on morphometric and meristic characteristics. Traditional morphological identification sometimes fails to discriminate look-alike or damaged specimens. Molecular characterization makes it possible to discriminate closely related or cryptic species (Bezeng and van der Bank 2019), damaged specimens, eggs, larvae (Brandão et al. 2016), or any life stage for which morphotaxonomy is inadequate. One of the key factors for the successful application of DNA barcoding is the availability of reliable sequences in reference libraries. Newly generated DNA barcodes can be checked for taxonomic conflicts and used for species identification and product analysis by comparing their sequences against such a barcode reference library. Using the molecular approach, ambiguities in the taxonomic identification of some crab species, descriptions of new species and even the detection of mislabeled crab species in markets have been successfully resolved in different geographic locations (Abbas et al. 2016; Balasubramanian et al. 2014; Lai et al. 2010; Knowlton and Leray 2015; Raupach et al. 2015; Van der Meij et al. 2015).

DNA barcoding, using COI as a universal gene region and a standard analytical technique, has greatly facilitated species discovery and identification in a wide variety of lineages (Hebert et al. 2003a, 2003b, 2004a, 2004b; Hajibabaei et al. 2006; Lopez-Vaamonde et al. 2021; Montes et al. 2017; Ševčík et al. 2016; Ward et al. 2005). For most organisms, the COI gene has been suggested as the standard barcoding marker, and genetic distance and phylogenetic tree-based analyses have been suggested as the ideal barcoding approaches (Hebert et al. 2003a; Ratnasingham and Hebert 2007; Ward et al. 2005). High species-level identification rates based on COI barcoding are well documented for many groups, including 98% for marine fishes and 93.6% for birds (Ward 2009) and 95.27% for northwestern Pacific Mollusca (Sun et al. 2016), with increases in species diversity observed in many regions (Puckridge et al. 2013; Ward et al. 2008; Zemlak et al. 2009). However, some complications, such as hybridization and introgression between species and the discrimination of recently diverged species, limit the use of COI barcoding (Moritz and Cicero 2004; Ward 2009; Ward et al. 2005).

As an alternative candidate barcode, mitochondrial markers such as 16S ribosomal RNA could be considered. As a conserved gene, 16S rRNA can measure the true divergence between distantly related organisms and can be easily amplified and sequenced across various animals (Lakra et al. 2009; Ma et al. 2013, 2015). It has been used successfully to distinguish species in many groups, including Zoantharia (Sinniger et al. 2008), hydrozoans (Zheng et al. 2013), fishes (Chakraborty and Iwatsuki 2006; Lee et al. 2014) and amphibians (Vences et al. 2005). The combination of conserved and variable regions makes the gene popular for reconstructing animal phylogenies (Vences et al. 2005), allowing the study of both ancient evolutionary relationships and recent separation events.

The present study aims to explore the use of the COI and 16S rRNA genes in DNA barcoding of the crabs of Bangladesh. We focus on the pros and cons of the two candidate barcodes for studying genetic divergence and understanding phylogenetic relationships among species. Understanding their effectiveness will allow a more informed use of these marker genes in diversity, evolution and conservation studies.

Materials and methods

Specimen collection and species identification

The study was conducted in the southern region of Bangladesh. The specimens were adult crabs collected mainly from the Cox’s Bazar (21.43 N 91.82 E), Moheshkhali (21.29 N 91.53 E), Banshkhali (21.99 N 91.95 E), Hatiya (22.30 N 91.06 E) and Patuakhali (22.36 N 90.33 E) coastal areas between July 2017 and December 2019. Immediately after collection, the crab specimens were kept in an icebox and carried to the Advanced Fisheries and DNA Barcoding Laboratory, Department of Zoology, University of Dhaka. Morphological identification of the collected species was done preliminarily during field sampling and then validated against the published taxonomic literature (Carpenter 2002; IUCN Bangladesh 2015; Ahmed et al. 2008). Tissue from the claws of each fresh specimen was dissected out with a sterile blade and preserved in 90% ethanol for further molecular analysis. The voucher specimens were deposited at the Dhaka University Zoology Museum (DUZM) and tagged with a DUZM voucher ID.

DNA extraction, PCR amplification and DNA sequencing

DNA was isolated from 5 mg of tissue from each specimen using the Invitrogen™ PureLink™ Genomic DNA Mini Kit, following the manufacturer’s protocol. The quality and quantity of the extracted DNA were measured using a NanoDrop™ spectrophotometer. COI and 16S rRNA gene sequences were amplified by polymerase chain reaction (PCR) with the primers LCO-1490 (forward) 5’ TCAACAAATCATAAGGACATTGG 3’ and HCO-2198 (reverse) 5’ TAAACTTCAGGGTGTCCAAAGAATCA 3’ (Folmer et al. 1994) for COI, and the primers 16Sar (forward) 5’ CGCCTGTTTATCAAAAACAT 3’ and 16Sbr (reverse) 5’ CCGGTCTGAACTCAGATCATGT 3’ (Palumbi et al. 1991) for 16S rRNA. Each PCR was conducted in a 25 µl volume containing 23 µl of PCR Master Mix and 2 µl of DNA sample, mixed and spun for 30 s to homogenize the mixture. The PCR Master Mix consisted of 12.5 µl Taq Polymerase, 8.5 µl Nano Pure water, 1 µl forward primer and 1 µl reverse primer. The PCR amplifications were performed on an Applied Biosystems Thermal Cycler (Thermo Fisher Scientific) under the same conditions for both COI and 16S rRNA: an initial denaturation at 95 ℃ for 5 min, followed by 41 cycles of denaturation at 95 ℃ for 30 s, annealing at 54 ℃ for 30 s and extension at 72 ℃ for 1 min, and a final extension at 72 ℃ for 5 min. To protect the amplified products from damage, the PCR products were kept at room temperature for 15 min and then stored at -26 ℃ until further downstream application. PCR products were separated on a 1% agarose gel and purified using the PureLink™ PCR purification kit. Good-quality purified PCR products with a DNA concentration > 10 ng/µl were sent to First BASE laboratories, Malaysia, for sequencing. Sequencing was done by Sanger dideoxy sequencing on an ABI PRISM 3730xl Genetic Analyzer using the BigDye® Terminator v3.1 cycle sequencing kit chemistry.
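The reaction set-up and cycling programme described above can be tallied with a short Python sketch. The volumes, cycle count and step durations are taken from the text; the function and variable names are illustrative assumptions, and the run time excludes thermal cycler ramp times, which are not reported.

```python
# Master mix components in microlitres, as stated in the text.
master_mix_ul = {
    "Taq Polymerase": 12.5,
    "Nano Pure water": 8.5,
    "forward primer": 1.0,
    "reverse primer": 1.0,
}
dna_template_ul = 2.0


def total_reaction_volume(mix, template_ul):
    """Sum the master mix components and the DNA template volume."""
    return sum(mix.values()) + template_ul


def programmed_run_seconds(initial_s, cycles, step_seconds, final_s):
    """Programmed hold time only; ramping between temperatures is ignored."""
    return initial_s + cycles * sum(step_seconds) + final_s


volume = total_reaction_volume(master_mix_ul, dna_template_ul)
# 5 min initial denaturation, 41 cycles of 30 s / 30 s / 60 s, 5 min final extension.
run_s = programmed_run_seconds(5 * 60, 41, (30, 30, 60), 5 * 60)

print(f"Reaction volume: {volume} µl")
print(f"Programmed run time: {run_s / 60} min")
```

Under these assumptions the components add up to the stated 25 µl reaction, and the programmed holds amount to 92 min per run.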

Bioinformatics analysis

Contigs were assembled with the CAP3 DNA assembly program in the bioinformatics software Unipro UGENE (Okonechnikov et al. 2012). The analysis included 36 DUZM COI and 16S rRNA sequences, along with 49 sequences of similar species retrieved from the NCBI GenBank database, including the outgroup Thenus indicus. All COI and 16S rRNA sequences were aligned automatically using MUSCLE (Edgar 2004) and then adjusted manually. The boxplot distribution of the %GC content was constructed in RStudio (Team 2015). For the distance-based method, pairwise genetic divergence for each marker was determined by calculating the Kimura two-parameter (K2P) distance (Kimura 1980) using MEGA X (Kumar et al. 2018). Nucleotide saturation was tested by calculating the substitution saturation index using DAMBE version 7.0.35 (Xia 2018; Xia et al. 2003). Phylogenetic trees were constructed for the COI and 16S rRNA sequences using MEGA X (Kumar et al. 2018) with the Maximum Likelihood method, and the robustness of clustering was assessed by bootstrap analysis with 1000 replicates.
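For readers unfamiliar with the K2P model, the pairwise distance can be sketched in Python as below. This is a minimal illustration of the Kimura (1980) formula, d = -½ ln(1 - 2P - Q) - ¼ ln(1 - 2Q), where P and Q are the proportions of transitional and transversional differences. It is not the MEGA X implementation; the function name and the choice to skip gaps and ambiguous bases pairwise are assumptions made for the example.

```python
import math

PURINES = {"A", "G"}
VALID = {"A", "C", "G", "T"}


def k2p_distance(seq1, seq2):
    """Kimura two-parameter distance between two aligned sequences.

    Sites where either sequence has a gap or an ambiguous base are skipped
    (pairwise deletion, an assumption of this sketch).
    """
    if len(seq1) != len(seq2):
        raise ValueError("sequences must be aligned to the same length")
    n = transitions = transversions = 0
    for a, b in zip(seq1.upper(), seq2.upper()):
        if a not in VALID or b not in VALID:
            continue
        n += 1
        if a == b:
            continue
        # Purine<->purine or pyrimidine<->pyrimidine changes are transitions.
        if (a in PURINES) == (b in PURINES):
            transitions += 1
        else:
            transversions += 1
    if n == 0:
        raise ValueError("no comparable sites")
    p = transitions / n
    q = transversions / n
    w1 = 1 - 2 * p - q
    w2 = 1 - 2 * q
    if w1 <= 0 or w2 <= 0:
        raise ValueError("sequences too divergent for the K2P correction")
    return -0.5 * math.log(w1) - 0.25 * math.log(w2)


# Toy example: one transition (A->G) and one transversion (A->C) in 10 sites.
d = k2p_distance("AAAAAAAAAA", "GAAAAAAAAC")
print(f"K2P distance: {100 * d:.2f}%")
```

Multiplying the result by 100 gives the K2P% scale used in the tables and figures of this study.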

Results

A total of 36 sequences were generated (20 COI and 16 16S rRNA) from 14 species of crabs belonging to 7 families. Among them, Galene bispinosa (Family Galenidae), Charybdis japonica and Portunus reticulatus (Family Portunidae) were newly recorded species from Bangladesh (Ahmed et al. 2021). Morphological identification of the 14 species was further validated by molecular characterization based on both COI and 16S rRNA sequences; nucleotide BLAST searches showed ≥ 96% identity with available sequences, and the sequences were then deposited in NCBI GenBank (Table 1). The average length of the aligned sequences was 596 bp for COI and 483 bp for 16S rRNA. The 16S rRNA sequence of Zosimus aeneus was, however, shorter than the others and had low coverage in BLAST, which might be due to poor sequencing or DNA extraction. For the partial COI sequences, the Maximum Likelihood estimate of the transition/transversion bias (R) was 1.85, and the nucleotide frequencies were 25.84% (A), 36.37% (T/U), 20.69% (C) and 17.1% (G). For the 16S rRNA sequences, the Maximum Likelihood estimate of the transition/transversion bias (R) was 3.135, and the nucleotide frequencies were 34.8% (A), 34.72% (T/U), 11.38% (C) and 19.1% (G). The GC content is summarized as a boxplot distribution in Fig. 1, representing the %GC at the species level for the COI and 16S rRNA sequences. Among the three codon positions of the COI sequences, the largest variation, with the highest SEM value of 1.185, was observed at the 3rd codon position. The overall %GC content was higher for the COI sequences (mean 37.79 ± 2.02) than for the 16S rRNA sequences (mean 30.48 ± 1.26).
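The %GC values above, overall and by codon position, follow from simple base counts. The Python sketch below illustrates the calculation under stated assumptions: the function names are hypothetical, the toy sequence is not from this study, and the reading frame is assumed to start at the first alignment column unless a different `frame` is given.

```python
def gc_percent(seq):
    """Percentage of G and C among unambiguous bases (gaps and N are ignored)."""
    bases = [b for b in seq.upper() if b in "ACGT"]
    if not bases:
        raise ValueError("no unambiguous bases in sequence")
    return 100.0 * sum(b in "GC" for b in bases) / len(bases)


def gc_by_codon_position(seq, frame=0):
    """%GC at the 1st, 2nd and 3rd codon positions of a coding sequence.

    Positions are sliced before gaps are dropped, so alignment columns
    keep their codon position.
    """
    coding = seq[frame:]
    return tuple(gc_percent(coding[i::3]) for i in range(3))


# Toy coding sequence (three GAT codons), for illustration only.
print(gc_by_codon_position("GATGATGAT"))

# Consistency check on the reported frequencies: %GC is the sum of %C and %G.
coi_gc = round(20.69 + 17.1, 2)
rrna_gc = round(11.38 + 19.1, 2)
print(coi_gc, rrna_gc)
```

The reported C and G frequencies sum to 37.79% for COI and 30.48% for 16S rRNA, matching the mean %GC values given for the two markers.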

Table 1 NCBI GenBank accession numbers of the Cytochrome C oxidase subunit 1 (COI) gene and 16S ribosomal RNA (16S rRNA) gene sequences of crab species generated in this study

Fig. 1 Boxplot distribution of the %GC content of the COI and 16S rRNA sequences at the species level

Genetic divergence pattern analyses

The K2P% genetic distances at each taxonomic level are summarized in Table 2. For the COI gene, the average genetic distances within species, genus and family were 0.234 ± 0.353, 16.89 ± 4.108 and 21.83 ± 2.360, respectively. In contrast, for 16S rRNA the average divergences within species, genus and family were 0.052 ± 0.197, 4.886 ± 1.311 and 9.799 ± 1.824, respectively. The pattern of K2P% divergence within species, genus and family is plotted in Fig. 2. For both markers, genetic divergence increased progressively with taxonomic level, which supports a marked change in genetic divergence at the species boundary.

Table 2 Genetic divergence based on Kimura-2-parameter (K2P%) distances at different taxonomic ranks

Fig. 2 Histogram of the K2P% divergence within species, genus and family of the (a) COI and the (b) 16S rRNA sequences

Saturation test

To identify saturation, the substitution saturation index (Iss) was compared with its critical value (Iss.c). For the COI sequences, Iss < Iss.c at the 1st and 2nd codon positions and Iss > Iss.c at the 3rd codon position for both symmetrical and asymmetrical tree topologies, indicating saturation at the 3rd codon position. For the 16S rRNA sequences, Iss < Iss.c for both the symmetrical and asymmetrical tree topologies, suggesting little or no saturation.

Phylogenetic tree analysis

Intraspecific monophyletic clustering with high bootstrap support (99–100% BP) was observed for both markers, reflecting accurate taxonomic assignment of the species. However, within genera, 16S rRNA gave comparatively higher clade support than COI. Moreover, the phylogenetic tree of the COI sequences showed a long-branch attraction (LBA) artifact, as species from different families were found to be in monophyly (Fig. 3). In contrast, congeneric and confamilial sequences clustered together with no phylogenetic discordance in the ML tree of the 16S rRNA sequences (Fig. 4).

Fig. 3 Maximum Likelihood (ML) phylogenetic tree constructed based on COI sequences. The sequences of the present study are labeled DUZM

Fig. 4 Maximum Likelihood (ML) phylogenetic tree constructed based on 16S rRNA sequences. The sequences of the present study are labeled DUZM

Discussion

In this study, 36 partial sequences (20 COI and 16 16S rRNA) of 14 different crab species were successfully generated using two widely recognized identification markers, COI and 16S rRNA. A series of comparative analyses was conducted on both marker genes to clarify their strengths and drawbacks in species identification.

The mean overall %GC content of COI was 37.79 ± 2.02% (range: 34.6%-42.1%, SEM: 0.452), which was significantly higher than the 30.48 ± 1.26% of 16S rRNA (range: 28.3%-32.9%, SEM: 0.315) (p value < 0.0001). GC-rich regions have been shown to create incongruences in phylogenetic tree topology (Romiguier et al. 2010; Spencer 2006), as these regions have a higher rate of evolution (Roux et al. 2016), which is likely to cause long-branch attraction artifacts and heterotachy-driven biases (Philippe et al. 2005). Among the COI sequences, the average %GC content was 50.67 ± 2.05, 41.70 ± 0.526 and 20.93 ± 5.30 for the 1st, 2nd and 3rd codon positions, respectively (1st > 2nd > 3rd) (Fig. 1). However, the variation was greatest at the 3rd codon position, which ranged from 12.7 to 32.7% with an SEM of 1.185; a similar pattern was observed in decapods (Matzen et al. 2011). The 1st and 2nd codon positions ranged from 46.6 to 53.6% and from 40.9 to 42.9%, with SEM values of 0.459 and 0.118, respectively.
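The SEM values quoted above are consistent with the usual definition, SEM = SD / √n. The Python sketch below reproduces them under the assumption, not stated explicitly in the text, that n equals the number of sequences generated per marker (20 for COI and 16 for 16S rRNA); the function name is illustrative.

```python
import math


def standard_error(sd, n):
    """Standard error of the mean from a standard deviation and sample size."""
    if n <= 0:
        raise ValueError("sample size must be positive")
    return sd / math.sqrt(n)


# Standard deviations as reported; sample sizes are assumed (see lead-in).
sem_coi_overall = standard_error(2.02, 20)
sem_16s_overall = standard_error(1.26, 16)
sem_coi_third = standard_error(5.30, 20)

print(round(sem_coi_overall, 3), round(sem_16s_overall, 3), round(sem_coi_third, 3))
```

With these sample sizes the sketch returns 0.452, 0.315 and 1.185, the values reported for overall COI, overall 16S rRNA and the COI 3rd codon position.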

The pattern of mean K2P% (within species < within genus < within family) shows that divergence increased with taxonomic level for both the COI and 16S rRNA sequences. All the species could be discriminated efficiently with both markers using a threshold of 2% divergence within species (Hebert et al. 2003b). For the COI sequences, the mean K2P% divergence among individuals within species was 0.234, compared with 16.89 among species within a genus. Thus, congeneric species were approximately 72 times more divergent than conspecific individuals. Within genus, the highest divergence (22.68%) was found between P. reticulatus and P. sanguinolentus and the lowest (7.99%) between P. reticulatus and P. pelagicus. Within family, the highest divergence was 27.65% between P. pelagicus and C. natator and the lowest was 18.24% between P. pelagicus and C. feriata (Table 2). For the 16S rRNA marker, the mean divergence within genus was 4.886%, which was 94-fold higher than the mean divergence of 0.052% within species. The highest congeneric divergence was 5.93% between C. natator and C. feriata and the lowest was 2.459% between P. pelagicus and P. reticulatus. The highest confamilial divergence was 12.18% between P. reticulatus and C. natator and the lowest was 5.979% between P. pelagicus and S. olivacea (Table 2). The higher mean divergence among congeneric COI sequences and the large genetic divergence between closely related species indicate that COI could be better than 16S rRNA at discriminating congeneric species. The utility of a marker for species discrimination relies on the barcoding gap, estimated as the difference between the minimum K2P% within the genus and the maximum K2P% within species. The barcoding gap was 6.55% for COI and 1.67% for 16S rRNA.
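The fold differences and the barcoding gap discussed above can be expressed as a short Python sketch. The mean divergences are taken from the text; the two distance lists passed to `barcoding_gap` are hypothetical placeholders rather than data from this study, and the function names are illustrative.

```python
def fold_divergence(mean_within_genus, mean_within_species):
    """How many times more divergent congeneric species are than conspecifics."""
    return mean_within_genus / mean_within_species


def barcoding_gap(intraspecific, congeneric):
    """Minimum congeneric distance minus maximum intraspecific distance.

    A positive value means the two distance distributions do not overlap.
    """
    return min(congeneric) - max(intraspecific)


# Mean K2P% values reported in the text.
coi_fold = fold_divergence(16.89, 0.234)
rrna_fold = fold_divergence(4.886, 0.052)
print(round(coi_fold), round(rrna_fold))

# Hypothetical K2P% distances, for illustration only.
example_intra = [0.0, 0.5, 1.2]
example_inter = [8.0, 15.0, 22.0]
print(barcoding_gap(example_intra, example_inter))
```

The ratios round to 72 for COI and 94 for 16S rRNA, in line with the fold differences given in the text.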

Genetic saturation of each marker gene was studied to better understand its efficiency in providing phylogenetic signal. Like high GC content, genetic saturation is also responsible for long-branch attraction (Lartillot et al. 2007). The substitution saturation index was measured for the COI and 16S rRNA sequences to test for saturation. At the 1st and 2nd codon positions of the COI sequences, the Iss of 0.4307 was significantly lower than the Iss.c of 0.7056 (p < 0.0001). However, at the 3rd codon position the Iss of 0.9929 was significantly higher than the Iss.c of 0.6265 (p < 0.0001), making COI less suitable for phylogenetic tree construction (Fig. 3). In contrast, 16S rRNA had an Iss of 0.4197, lower than the Iss.c of 0.7207 (p < 0.0001), indicating no saturation and making these gene sequences more efficient for constructing the species phylogeny.
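The decision rule applied to the saturation index can be summarized in a few lines of Python. This sketch only encodes the comparison of Iss with Iss.c as described in the text; it does not compute the index itself, which was done in DAMBE. The function name, the 0.05 significance level and the use of 0.0001 as a stand-in for "p < 0.0001" are assumptions.

```python
def interpret_saturation(iss, iss_c, p_value, alpha=0.05):
    """Classify substitution saturation from Iss, Iss.c and the test p value."""
    if p_value >= alpha:
        return "inconclusive"
    return "little saturation" if iss < iss_c else "substantial saturation"


# Iss and Iss.c values reported in the text; p is an upper bound stand-in.
results = {
    "COI 1st+2nd codon positions": interpret_saturation(0.4307, 0.7056, 0.0001),
    "COI 3rd codon position": interpret_saturation(0.9929, 0.6265, 0.0001),
    "16S rRNA": interpret_saturation(0.4197, 0.7207, 0.0001),
}
for partition, verdict in results.items():
    print(f"{partition}: {verdict}")
```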

Phylogenetic analyses of crab species have been reported in a number of studies (Haye et al. 2002; Hernández et al. 2019; Ocampo et al. 2013; Schubart et al. 2001; Scott Harrison 2004). However, it was difficult to compare our results with those of other authors because different species were studied. Here we attempt to understand the evolutionary relationships among crabs commonly found in Bangladesh and to identify which marker is more effective in providing true phylogenetic signal. Maximum Likelihood (ML) was chosen as the statistical method for the phylogenetic analysis because of its robustness. The lowest BIC and AICc values identified GTR + G + I and HKY + G as the best-fit substitution models for COI and 16S rRNA, respectively, and the respective model was used for the phylogenetic tree construction of each gene. In the ML analyses of the COI and 16S rRNA sequences, all the morphologically assigned species formed monophyletic clusters with strong bootstrap support (Figs. 3 and 4). The absence of taxonomic deviation at the species level confirmed the reliability of the sequences and the efficiency of both marker genes in species discrimination. Our study includes species mostly from the family Portunidae, in which the genus Scylla was in paraphyly with Charybdis. Comparing clade support within genera, 16S rRNA showed moderate to high support, with 87% BP within Charybdis and 98% BP within Portunus. In the COI sequences, although P. pelagicus and P. reticulatus were grouped with maximum support, the clade within the genus Charybdis was poorly supported (43% BP). Also, within Charybdis, C. natator and C. japonica clustered more closely with each other than with C. feriata. Furthermore, in the COI phylogenetic tree, M. planipes (family Matutidae) clustered with P. sanguinolentus of a different family, Portunidae, sharing a recent common ancestor with 52% BP (Fig. 3). This inefficiency in recovering the true relationships might result from saturation at the 3rd codon position of the COI sequences. In contrast, species of the same genus and family were grouped together, with no branch length inconsistency, in the ML tree topology of the 16S rRNA sequences (Fig. 4). This indicates that 16S rRNA is more efficient than COI in delineating taxa at the species, genus and family levels and in determining true divergence and evolutionary relationships among crab species. Although COI resolves the relationships between closely associated congeners poorly because of its high GC content and substitution saturation, it was better at differentiating closely related species, which is useful when other markers show inadequate variability.
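Model selection by information criteria, as used above, can be illustrated with the standard definitions AICc = 2k - 2 ln L + 2k(k + 1)/(n - k - 1) and BIC = k ln n - 2 ln L, where k is the number of free parameters, n the sample size and ln L the maximised log-likelihood. In the Python sketch below the log-likelihoods and parameter counts are hypothetical values chosen for illustration, not results from this study, and n is assumed to be the 596 aligned COI sites; MEGA X may define n and k differently.

```python
import math


def aicc(log_lik, k, n):
    """Corrected Akaike information criterion."""
    return 2 * k - 2 * log_lik + (2 * k * (k + 1)) / (n - k - 1)


def bic(log_lik, k, n):
    """Bayesian information criterion."""
    return k * math.log(n) - 2 * log_lik


# Hypothetical (log-likelihood, parameter count) pairs, for illustration only.
candidates = {
    "JC": (-5200.0, 50),
    "HKY+G": (-5000.0, 55),
    "GTR+G+I": (-4970.0, 60),
}
n_sites = 596  # assumed sample size: average aligned COI length

best_by_aicc = min(candidates, key=lambda m: aicc(*candidates[m], n_sites))
best_by_bic = min(candidates, key=lambda m: bic(*candidates[m], n_sites))
print(best_by_aicc, best_by_bic)
```

The model with the lowest score under each criterion is preferred; BIC penalizes additional parameters more strongly than AICc for samples of this size, so the two criteria do not always agree.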

Conclusion

The present study demonstrated that both the COI and 16S rRNA genes could efficiently discriminate crabs at the species level. COI was better at distinguishing closely related crab species, showing a wide range of divergence within genus and family. However, saturation and high %GC content at the 3rd codon position of the COI sequences make the marker inefficient in providing true phylogenetic signal. In contrast, 16S rRNA showed no substitution saturation and low %GC content and thus produced fewer incongruities in phylogenetic tree construction. Further studies with other crustaceans, such as shrimps, lobsters, crayfish, prawns and krill, might be performed to draw stronger conclusions about the efficiency of the COI and 16S rRNA genes in the identification and phylogenetic delineation of crustaceans.