Introduction

Penaeus monodon (Fabricius, 1798), the giant tiger prawn, is widely distributed throughout the Indo-West Pacific region and is the most important penaeid shrimp for both aquaculture and capture fisheries (Motoh, 1985; Kumar et al., 2007; You et al., 2008; Mandal et al., 2012; Waqairatu et al., 2012; Vaseeharan et al., 2013). Owing to its high nutritional value and economic importance, this species has been studied for sustainable aquaculture production (Kumar et al., 2007; Mandal et al., 2012). It has also been the subject of numerous studies on genetic variation aimed at maintaining its genetic diversity through management of wild populations (e.g. Benzie et al., 2002; You et al., 2008; Mandal et al., 2012; Waqairatu et al., 2012), but the sampling effort has been uneven across geographic regions (Keyse et al., 2014).

Several marine species in the Indo-West Pacific region are characterized by distinct eastern and western genetic lineages (Gaither et al., 2001; Keyse et al., 2014), including P. monodon (Benzie et al., 2002) and Fenneropenaeus indicus (Milne-Edwards, 1837) (Alam et al., 2015). These phylogeographic patterns may have resulted from restricted genetic connectivity of populations across the Sunda Shelf during periods of low sea level in the glacial periods of the Pleistocene (Voris, 2000). In F. indicus, distinct lineages were also found west of the Sunda Shelf, pointing to geographic isolation of the south-east African coast from the Bay of Bengal (Alam et al., 2015), in accordance with the split of the Western Indian Province from other regions in the Indo-West Pacific region (Bowen et al., 2012).

Penaeus monodon is known for high variation in molecular markers, which is geographically structured in the Indo-West Pacific region (Benzie et al., 2002; Kumar et al., 2007; You et al., 2008; Zhou et al., 2009; Mandal et al., 2012; Waqairatu et al., 2012). Variation in the mitochondrial DNA has been widely assessed for taxonomic identification and in phylogeographic and population genetic studies, e.g. in the control region (CR) (Walther et al., 2011; Vaseeharan et al., 2013), the 16S rRNA gene and a downstream fragment of the 16S rRNA and tRNAVal genes (Lavery et al., 2004; Kumar et al., 2007; Pascoal et al., 2008; Vaseeharan et al., 2013). The highest mtDNA diversity in P. monodon was found in Indonesia, with less diversity identified in south-east Africa and west Australia (Benzie et al., 2002). High genetic variation was also reported in the Indian population, being highest in the Andaman region followed by the east coast and the west coast (Kumar et al., 2007; Mandal et al., 2012). Penaeus monodon populations in the Indo-West Pacific region were broadly structured as three mtDNA groups, i.e. south-east African, south and south-east Asian and Pacific populations (Benzie et al., 2002; You et al., 2008; Waqairatu et al., 2012). The south and south-east Asian group is further divided into two groups, west and east of the Sunda Shelf (Benzie et al., 2002; Kumar et al., 2007; You et al., 2008; Mandal et al., 2012). The populations in the south-east African group were unique and genetically differentiated from the other main groups, which were partially admixed (Kumar et al., 2007; You et al., 2008; Mandal et al., 2012). Benzie et al. (2002) attributed the high genetic diversity in Indonesia to admixture resulting from migration of the divergent African populations into south-east Asia.

The high mtDNA diversity in P. monodon was proposed by Walther et al. (2011) to result from a paralogous duplication event, but studies of the mitochondrial CR defined three distinct lineages across the species distribution (A, B and C): A was mostly found in the Pacific Ocean region, B at the juncture of the Indian and West Pacific Oceans, and C in India and Africa (Zhou et al., 2009; Walther et al., 2011). Walther et al. (2011) detected different haplotypes within a subset of individuals from Indonesia and suggested that these could have arisen through duplication, either by insertion of mitochondrial sequences into the nuclear genome (numts) or within the mitochondrial genome, and concluded that previous studies had thus greatly overestimated the mtDNA variation. Although Walther et al. (2011) acknowledged that the occurrence of such divergent lineages could be due to heteroplasmy as a result of bi-parental inheritance, as has been observed in crosses between genetically distinct individuals (Dokianakis & Ladoukakis, 2014), this did not affect their main conclusion.

Nuclear genetic variation in P. monodon, as assessed by allozymes, DNA sequences and microsatellites, shows patterns at both small and large geographic scales. In allozymes, differences between the north and west coasts of Australia reflect colonization by populations from the east (Benzie et al., 1992, 1993). In contrast, populations from South Africa, Mozambique and Madagascar show panmixia, with high gene flow among all populations (Forbes et al., 1999). Microsatellites revealed a clear structure in the Indo-West Pacific populations of P. monodon (You et al., 2008; Waqairatu et al., 2012), and Duda & Palumbi (1999) observed a split between the eastern Indian Ocean and western Pacific based on the elongation factor 1-alpha gene.

Penaeus monodon is the shrimp species most targeted for its economic value in Bangladesh (Quader, 2010). Despite this, no molecular studies have been conducted on this species from Bangladesh, and only a single phylogeographic study has included a penaeid shrimp from the country (Alam et al., 2015). The diversity of shrimps is under continuous threat from post-larvae collection, coastal and marine pollution, impacts of coastal aquaculture, natural disasters, sea level rise, persistent organic pollutants, etc. However, the population of P. monodon appears to be large, considering the annual harvest sustained by suitable habitats in mangrove forests and large rivers (Quader, 2010; Department of Fisheries Bangladesh, 2013). Therefore, a study of the population structure and genetic diversity of P. monodon in Bangladesh waters, in comparison with other Indo-West Pacific regions, is of great importance for proper management and conservation of the species.

The main objectives of this study are to analyse the genetic diversity and origin of P. monodon in Bangladesh, based on mtDNA variation, and to reevaluate the larger phylogeographic patterns within the Indo-West Pacific region. To reach our objectives we hypothesize that (1) there is a high diversity within the Bangladesh population, as had been previously reported in other Indo-West Pacific regions, reflecting a large and single panmictic population which has expanded after the last glacial period of the Ice Age, (2) the Bangladesh population shares its most recent ancestry with the other populations from the Bay of Bengal (India, Sri Lanka and Thailand) and the distinct lineages within the Indo-West Pacific region correspond to the major geographic barriers in the region and (3) the sequence variation within mitochondrial CR reflects orthologous variation.

Materials and methods

Sample collection and DNA extraction

A total of 86 wild origin P. monodon were collected from four locations along the Bangladesh coastline (Fig. 1) during the period of December 2012 to September 2013 and preserved in 96% ethanol. Artisanal fishermen collected 17 sub-adult specimens (average weight of 40 g) from Sundarban mangrove forest (SB). Samples of adults, with an average weight of 100 g, were collected from the Barguna coast (BC, 10 individuals), Middle ground (one of the four fishing grounds) of the Bay of Bengal (MB, 32 individuals) and the coast of Sent Martin’s Island (SM, 27 individuals) by fish/shrimp trawlers. In addition, five specimens of P. monodon of wild origin were sampled from Negombo Lagoon (n = 3) and Trinco (n = 2) of Sri Lanka in 2008 and 2005, respectively. Total genomic DNA was extracted from ~1 mg pleopod tissue through overnight incubation at 56°C in a mixture of 6% Chelex and 0.2 mg ml−1 proteinase K followed by 10 min at 95°C.

Fig. 1 Sampling sites along the coast of Bangladesh. Capital letters indicate the four sampling sites: SB Sundarban, Sathkhira; BC Barguna coast, Barguna; MB Middle ground (one of the four fishing grounds), Bay of Bengal; and SM Sent Martin’s Island, Cox’s Bazar

Polymerase chain reaction (PCR) and sequencing

PCR was performed in a final volume of 10 µL, which included 30–150 ng DNA, 0.2 mM dNTP, 0.1% Tween 20, 1 × Standard Taq Buffer (New England Biolabs), 0.5 mg Bovine Serum Albumin, 0.5 U Taq Polymerase and 0.34 mM each of forward and reverse primers. Three regions of mtDNA were amplified and sequenced: the CR, a fragment of the 16S rRNA gene and a combined fragment of the 16S rRNA and tRNAVal genes. A 939 bp fragment of the CR was obtained from 28 specimens using the reverse primer 1R (Chu et al., 2003) and the forward primer PmCON-2F (Wilson et al., 2000). To look for the existence of heteroplasmy or duplicated sequences, as done by Walther et al. (2011), and to confirm the sequences, a second forward primer, 12S (Chu et al., 2003), was used together with the same reverse primer (1R) to amplify and sequence a 517 bp fragment from 82 specimens, including 24 specimens which had been sequenced with PmCON-2F. The shorter fragment from 14 specimens was also resequenced with the reverse primer 1R. In total, 86 sequences were obtained from Bangladesh (GenBank accession numbers: KT006166–KT006251), and a shared aligned region of 517 bp was used for the downstream analyses of genetic diversity and demographic changes. A 370 bp sequence of the 16S rRNA gene was obtained using 16STf (MacDonald et al., 2005) and 16Sbr (Palumbi et al., 1991) primers, for 10 (of which 4 specimens were sequenced from both directions) and five individuals from Bangladesh and Sri Lanka, respectively (GenBank accession numbers: KT006252–KT006266). A 491 bp sequence of the combined 16S rRNA and tRNAVal region was amplified from both forward and reverse directions using 16ScruC4 and 16ScruC3 primers (Pascoal et al., 2008) for five and three individuals from Bangladesh and Sri Lanka, respectively (GenBank accession numbers: KT006158–KT006165). Information about the primers used in this study is given in Appendix 1.

The amplification protocol of the CR fragments included an initial denaturation at 94°C for 4 min, 37 cycles of denaturation at 94°C for 30 s, annealing at 49°C for 45 s and extension at 72°C for 1 min, and a final extension at 72°C for 6 min. The protocols for the 16S rRNA gene and the combined fragment of 16S rRNA and tRNAVal genes were altered by increasing the annealing temperature to 55°C. The PCR products were sequenced with the Big Dye Terminator kit 3.1 (Applied Biosystems) and run on a Genetic Analyser (3500xL Applied Biosystems). All sequences were edited and aligned using ClustalW Multiple alignment implemented in Bioedit Sequence Alignment Editor (Hall, 1999). Detailed information of the sequences, including all GenBank sequences utilized for this study, is shown in Table 1 and Appendix 2.

Table 1 Information about the sequences for different fragments/genes of the mitochondrial DNA utilized for the study

Genetic diversity of P. monodon within Bangladesh

Genetic diversity of Bangladesh P. monodon, based on 86 CR sequences (517 bps), was summarized for each sampling site and across all sites by calculating gene diversity (h) and nucleotide diversity (π) using ARLEQUIN v3.5 (Excoffier & Lischer, 2011). Haplotype richness was calculated using the hierfstat package (Goudet, 2005) in R (R Core Team, 2014). The relationship among CR haplotypes was investigated by constructing an unrooted network using the median-joining algorithm in NETWORK version 4.6.1.3 (Bandelt et al., 1999). Analysis of molecular variance (AMOVA) of the samples from Bangladesh was conducted based on pairwise differences among sequences, and the proportion of variation among groups (Φ ST) was tested with 1000 permutations using ARLEQUIN v3.5 (Excoffier & Lischer, 2011). The total genetic diversity of the samples from Bangladesh was compared with the estimated genetic diversities of other populations in the Indo-West Pacific region, based on data listed in Table 1 and Appendix 2.
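The two diversity summaries named above can be illustrated with a minimal Python sketch. It assumes that gene diversity follows Nei's unbiased estimator, h = n/(n − 1) × (1 − Σ p_i²), and that nucleotide diversity is the mean number of pairwise differences divided by the sequence length; whether ARLEQUIN applies exactly these forms to this dataset is an assumption. The function names and the three toy sequences are hypothetical; the haplotype counts (one haplotype seen three times, one seen twice and 81 singletons among 86 specimens) are those reported in the Results.

```python
from itertools import combinations


def haplotype_diversity(counts):
    """Unbiased gene diversity from a list of haplotype counts (assumed Nei estimator)."""
    n = sum(counts)
    if n < 2:
        raise ValueError("at least two sequences are required")
    sum_sq = sum((c / n) ** 2 for c in counts)
    return n / (n - 1) * (1.0 - sum_sq)


def nucleotide_diversity(seqs):
    """Mean pairwise differences per site for equal-length aligned sequences."""
    length = len(seqs[0])
    pairs = list(combinations(seqs, 2))
    total = sum(sum(a != b for a, b in zip(s1, s2)) for s1, s2 in pairs)
    return total / len(pairs) / length


# Haplotype counts as reported for Bangladesh: 83 haplotypes in 86 specimens.
counts = [3, 2] + [1] * 81
h = haplotype_diversity(counts)
print(f"h = {h:.4f}")

# Hypothetical toy alignment, only to show the pi calculation.
toy = ["ACGT", "ACGA", "TCGA"]
print(f"pi = {nucleotide_diversity(toy):.4f}")
```

With almost every specimen carrying a unique haplotype, h approaches its upper bound, which is why it is of little use for comparing populations here and π is the more informative summary.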

Population demographic changes in P. monodon within Bangladesh were estimated, based on the 86 CR sequences (517 bps), by comparing the fit of the distribution of pairwise nucleotide differences (mismatch distribution) with the expected values of a demographic expansion, following the method developed by Rogers & Harpending (1992). The fit was estimated by the sum of squared deviations (SSD) and the raggedness index (Harpending, 1994), and tested using ARLEQUIN v3.5 (Excoffier & Lischer, 2011). The time since expansion was based on the median of the mismatch distribution (τ) and the mutation rate, µ = 3.44% per site per Myr, for the CR (see below), as t = τ/(2µl), where l is the length of the sequence. Tajima’s D (Tajima, 1989a, b, 1993) and Fu’s Fs (Fu, 1997) tests were performed using ARLEQUIN to further assess demographic changes or possible deviation from neutrality. In addition, demographic changes were evaluated with a Bayesian skyline plot (BSP) analysis in BEAST v1.7.5 (Drummond et al., 2007). The BSP analysis estimated the posterior probability of the effective population size (N e) using Markov Chain Monte Carlo (MCMC) procedures by tracing the ancestry to the most recent common ancestor within P. monodon. Markov chains were run for 5.0 × 10^7 generations and sampled every 1000th generation. The general time reversible (GTR) model with gamma and invariant sites parameters (G+I) was used, derived from a PhyML Test (Guindon et al., 2010) using the APE package (Paradis, 2006) in R (R Core Team, 2014). A strict molecular clock of 3.44% per million years (Myr) was used for the BSP analysis, which was calibrated for the CR based on a comparison of the divergence between penaeid species for the CR and COI region (see calibration below). Bayesian Skyline was used as the prior for the coalescent analysis, with a piecewise-constant skyline model. Log files were visualized for the posterior distributions of the Markov Chain statistics using TRACER v1.5 (Rambaut & Drummond, 2009), and 10% of the samples were discarded as burn-in during Skyline reconstruction. Skyline data were exported from TRACER v1.5 and redrawn in R (R Core Team, 2014).
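The conversion from τ to years given above, t = τ/(2µl), can be checked with a short sketch. The rate (0.0344 substitutions per site per Myr) and the sequence length (517 bp) are taken from the text; the τ value of 5.0 is hypothetical, because the fitted τ is not stated here, and the function name is illustrative only. The last lines back-calculate the τ implied by the expansion time of 143 Kyr reported in the Results; that figure is derived arithmetic, not a value given by the authors.

```python
def expansion_time_myr(tau, mu_per_site_per_myr, seq_length):
    """Time since expansion in Myr, using t = tau / (2 * mu * l)."""
    return tau / (2.0 * mu_per_site_per_myr * seq_length)


MU_CR = 0.0344   # 3.44% per site per Myr, as stated for the CR
L_CR = 517       # length of the aligned CR fragment in bp

tau_example = 5.0  # hypothetical tau, for illustration only
t_example = expansion_time_myr(tau_example, MU_CR, L_CR)
print(f"tau = {tau_example}: t = {t_example * 1000:.0f} Kyr")

# Back-calculation: the tau that would correspond to t = 143 Kyr.
tau_implied = 0.143 * 2.0 * MU_CR * L_CR
print(f"tau implied by 143 Kyr: {tau_implied:.2f}")
```

Because t scales inversely with µ, any uncertainty in the clock calibration carries over proportionally to the expansion time.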

The calibration of the molecular clock for the CR [3.44% (2.4–4.5%) per Myr] was based on a comparison of the divergence between penaeid species for the CR and the COI region, assuming a sequence divergence of 2% (1.4–2.6%) per Myr as for the COI gene of the genus Alpheus (Knowlton et al., 1993; Knowlton & Weigt, 1998). The average dissimilarity between the CR of P. monodon (EU426760.1) and F. indicus (FJ002577.1), Fenneropenaeus chinensis (Osbeck, 1765) (HM358499.1), Marsupenaeus japonicus (Bate, 1888) (AY853470.1) and Litopenaeus vannamei (Boone, 1931) (GQ857086.1) was 24.50%, whereas the average dissimilarity of the COI gene of P. monodon (KF604891.1) with F. indicus (HM214712.1), F. chinensis (EU366250.1), M. japonicus (AY787755.1) and L. vannamei (AY781297.1) was 14.25%. The molecular clock for the CR was estimated as 3.44% per Myr from the ratio of the dissimilarities of the CR and COI regions multiplied by the COI rate, i.e. (24.50/14.25) × 2. The molecular clock rate for the 16S rRNA was calibrated in the same way and resulted in 1.12% per Myr, where the average dissimilarity of the 16S rRNA gene between P. monodon (EU105473.1) and F. indicus (FJ002574.1), F. chinensis (AY264908.1), M. japonicus (EU056321.1) and L. vannamei (HQ127458.1) was 8.00%.
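The calibration arithmetic can be reproduced in a few lines. The sketch below scales the COI rate by the ratio of average dissimilarities, using only the percentages quoted in the text; the function name is hypothetical.

```python
def scaled_rate(d_marker, d_reference, reference_rate):
    """Scale a reference clock rate by the ratio of average dissimilarities."""
    return d_marker / d_reference * reference_rate


D_CR, D_COI, D_16S = 24.50, 14.25, 8.00   # average dissimilarities (%)
COI_RATE, COI_LOW, COI_HIGH = 2.0, 1.4, 2.6   # COI divergence (% per Myr)

cr_rate = scaled_rate(D_CR, D_COI, COI_RATE)
cr_low = scaled_rate(D_CR, D_COI, COI_LOW)
cr_high = scaled_rate(D_CR, D_COI, COI_HIGH)
rrna_rate = scaled_rate(D_16S, D_COI, COI_RATE)

print(f"CR: {cr_rate:.2f}% per Myr ({cr_low:.1f}-{cr_high:.1f})")
print(f"16S rRNA: {rrna_rate:.2f}% per Myr")
```

The lower and upper CR rates follow from applying the same ratio to the bounds of the COI rate, which matches the interval of 2.4–4.5% quoted for the CR.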

Origin of Bangladesh Penaeus monodon and phylogeography in the Indo-West Pacific region

The origin of Bangladesh P. monodon and the phylogeographic patterns were studied by constructing phylogenetic trees and by analysing genetic distances between the population samples. In total, 840 sequences for the CR of P. monodon, including 754 from GenBank and 86 from Bangladesh, were used to reconstruct the mtDNA phylogeny. The sequences were aligned using ClustalW Multiple alignment of Bioedit Sequence Alignment Editor (Hall, 1999) and trimmed to a 496 bp fragment; the sequences from the studies of Walther et al. (2011) and Mkare et al. (2014) were aligned over 225 bps. The sequences of the CR were classified into the different lineages (A, B and C), as defined by Zhou et al. (2009) and Walther et al. (2011), by constructing a maximum likelihood tree with partial deletion treatment for missing data and a general time reversible (GTR) model with gamma and invariant sites parameters (G+I) in MEGA ver. 6.06 (Tamura et al., 2013) (tree not shown). Divergence times of lineages A, B and C, based on the CR, were estimated in BEAST v1.7.5, using the calibrated molecular clock of 3.44% per Myr and a constant population size as the tree prior. Markov chains were run for 1.0 × 10^7 generations and sampled every 1000th generation. Effective sample sizes (ESS) of the Markov Chain sampled model parameters were inspected using TRACER v1.5 (Rambaut & Drummond, 2009), and deemed sufficient if >200. Maximum clade credibility trees were constructed using TreeAnnotator v1.7.5 (Rambaut & Drummond, 2013) with a 10% burn-in. The divergence times were read from the Bayesian inference trees using the software FigTree v1.4.0 (Rambaut, 2012).

Phylogenetic analyses of the sequences from the CR lineage C were also performed separately as all Bangladesh sequences resolved onto this lineage. Trees were constructed with and without the shorter sequences from Indonesia, i.e. 225 bps instead of 496 bps. Three hundred and eleven sequences were obtained when omitting Indonesia and 337 sequences when including Indonesia. The phylogeny included 12 CR sequences from Kenya (You et al., 2008), and the shorter sequences from Mkare et al. (2014) were omitted. Fenneropenaeus chinensis was used as an outgroup for all analyses described above. Four different tree priors were evaluated for population growth, as implemented in BEAST: constant size with both strict clock and relaxed clock, exponential growth, logistic growth and expansion growth with strict clock.

The geographic patterns in variation of the CR were evaluated using 714 sequences from lineages A, B and C, and, separately, for 337 sequences from the C lineage (excluding the shorter sequences from the study of Mkare et al. 2014; see Table 1; Appendix 2). For both analyses, pairwise genetic distances (Φ ST) based on the 496 bp fragment were calculated and tested by 1000 permutations using ARLEQUIN v3.5 (Excoffier & Lischer, 2011), except for comparisons with the Indonesian samples which were based on 225 bp. Multidimensional scale plots (Venables & Ripley, 2002) were drawn to visualize the differences among populations in R (R Core Team, 2014). Similarly, pairwise genetic distances (Φ ST) among 46 sequences of 16S rRNA from 6 populations (Bangladesh- 10, Sri Lanka- 5, India- 2, Malaysia- 4, China- 23 and Australia- 2 sequences) were calculated and tested by 1000 permutations using ARLEQUIN v3.5 (Excoffier & Lischer, 2011).
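The pairwise Φ ST values and their permutation tests can be illustrated with a simplified sketch. It assumes a single-level AMOVA on the number of pairwise differences, in the spirit of the framework implemented in ARLEQUIN, and tests significance by reshuffling individuals among populations; it is not ARLEQUIN's code, and details of that program may differ. All function names and the toy sequences are hypothetical.

```python
import random
from itertools import combinations


def n_diff(a, b):
    """Number of differing sites between two aligned sequences."""
    return sum(x != y for x, y in zip(a, b))


def ssd(seqs):
    """Sum of squared deviations: all pairwise differences divided by n."""
    n = len(seqs)
    if n < 2:
        return 0.0
    return sum(n_diff(a, b) for a, b in combinations(seqs, 2)) / n


def phi_st(pops):
    """Phi_ST from a single-level AMOVA on pairwise differences."""
    sizes = [len(p) for p in pops]
    n_total, n_pops = sum(sizes), len(pops)
    pooled = [s for p in pops for s in p]
    ssd_within = sum(ssd(p) for p in pops)
    ssd_among = ssd(pooled) - ssd_within
    ms_among = ssd_among / (n_pops - 1)
    ms_within = ssd_within / (n_total - n_pops)
    n0 = (n_total - sum(k * k for k in sizes) / n_total) / (n_pops - 1)
    var_among = (ms_among - ms_within) / n0
    total = var_among + ms_within
    return var_among / total if total else 0.0


def permutation_test(pops, n_perm=1000, seed=1):
    """Return observed Phi_ST and the share of permutations at least as large."""
    rng = random.Random(seed)
    observed = phi_st(pops)
    pooled = [s for p in pops for s in p]
    sizes = [len(p) for p in pops]
    hits = 0
    for _ in range(n_perm):
        rng.shuffle(pooled)
        shuffled, start = [], 0
        for k in sizes:
            shuffled.append(pooled[start:start + k])
            start += k
        if phi_st(shuffled) >= observed - 1e-12:
            hits += 1
    return observed, hits / n_perm


# Hypothetical toy data: two samples with mostly private haplotypes.
west = ["AAAA", "AAAA", "AAAT", "AATA"]
east = ["TTTT", "TTTT", "TTTA", "TTAT"]
phi, p_value = permutation_test([west, east])
print(f"Phi_ST = {phi:.2f}, P = {p_value:.3f}")
```

Negative estimates, as reported for the comparison among the Bangladesh sampling sites, arise when within-population variance exceeds the among-population component and are conventionally read as no differentiation.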

Orthologous or paralogous variation in P. monodon mitochondrial DNA?

To address the question of paralogous duplications raised by Walther et al. (2011), we searched for multiple peaks in the CR sequence electropherograms and compared sequences obtained with the two sets of primers (as described above). Furthermore, we evaluated whether the large split between the A and C lineages in the phylogeny of the CR was also observed in phylogenies of the other mtDNA markers, the 16S rRNA gene and the combined fragment of the 16S rRNA and tRNAVal genes, by comparing specimens sampled from the geographic regions characterized by the two lineages.

The 16S rRNA gene tree was constructed from 52 sequences (313 bps), containing 10 sequences from Bangladesh, five from Sri Lanka and 37 from other areas in the Indo-West Pacific (Table 1; Appendix 2). Metapenaeus monoceros (Fabricius 1798) (GenBank accession number: JX089983) was used as an outgroup. The combined 16S rRNA and tRNAVal tree was constructed from 18 sequences (504 bps), including five sequences from Bangladesh, three from Sri Lanka and 10 available sequences from GenBank (Table 1 and Appendix 2). Penaeus semisulcatus (De Haan, 1844) (GenBank accession number: EF589704) was used as an outgroup. The trees were constructed with both a strict clock and an uncorrelated lognormal relaxed clock using BEAST, as described above for the CR fragments, redrawn in R (R Core Team, 2014) using the package APE (Paradis, 2006), and simplified by collapsing sequences from the same cluster and geographic area into a single taxonomic unit, labelled by the geographic location. Posterior probabilities (PP) were used to estimate the reliability of the internal nodes. ‘GTR’ was determined as the best substitution model for CR (C lineage), with parameters G for gamma-distributed rate variation among sites and I for invariant sites. The ‘TN93’ model with G was selected for the 16S rRNA and the combined fragment of the 16S rRNA and tRNAVal genes. A calibrated molecular clock of 1.12% per Myr was used for the 16S rRNA gene and the combined fragment of the 16S rRNA and tRNAVal region to calculate the divergence time. Variation of the different markers used for the phylogenetic trees, the CR, the 16S rRNA and the combined fragment of 16S rRNA and tRNAVal, was summarized with nucleotide diversity (π) using ARLEQUIN v3.5 (Excoffier & Lischer, 2011). As for the CR trees, the 16S rRNA and combined 16S rRNA and tRNAVal trees were also constructed using four priors of population growth.

Results

Mitochondrial DNA diversity of P. monodon within Bangladesh

Genetic diversity of the P. monodon CR sequences from Bangladesh is high: 124 segregating sites are observed and haplotype diversity reaches its maximum value (h = 1.00), as almost all haplotypes are singletons (83 haplotypes were found in 86 specimens) (Table 2). Four indels were observed, ranging in length from 1 to 2 bp, two of which were polymorphic. One haplotype was identified in three specimens from locations SB (2 ind.) and MB, and another haplotype in two specimens from locations MB and SM (Fig. 2).

Table 2 Genetic diversity of P. monodon from Bangladesh based on 86 sequences (520 bps) of mitochondrial CR
Fig. 2 Median-joining haplotype network based on CR of mtDNA of 86 individual P. monodon sampled from four locations along the Bangladesh coastline. Each pie represents a haplotype and its size reflects the frequency of samples. Distances between pies correspond to mutational differences between the haplotypes. Shading (from black to white) denotes the sampling locations SB, BC, MB and SM, respectively (see Fig. 1)

The CR haplotype network (Fig. 2) reflects the high variation, with two distinct clusters (I and II) separated by 13 mutations but without any geographic structure among the four sampling locations along the Bangladesh coastline. No differentiation was observed among the Bangladesh samples with Analysis of Molecular Variance (AMOVA) (Φ ST = −0.0045, P = 0.60), and comparison of the frequencies of the two clusters among the four sampling sites did not reveal any significant structure (P = 0.86). The haplotype network has several alternative connections between haplotypes at the centre of the two clusters. This reflects the large number of haplotypes segregating within the population, with several homoplasious mutations. Forty-nine of the variable nucleotide sites have more than two different nucleotides.

The high haplotype diversity in Bangladesh is similar to that observed elsewhere in the Indo-West Pacific region, where it ranges from 0.97 to 1.00 (You et al., 2008; Zhou et al., 2009; Waqairatu et al., 2012; Mkare et al., 2014). The nucleotide diversity (π) in Bangladesh is also comparable to that of most samples with a single CR lineage; it is close to the average of the samples characterized by the C lineage but nominally lower than within the samples characterized by the A lineage, south and east of the Sunda Shelf (Table 3). Variation in samples where the distinct lineages A, B and C are found was considerably larger than in Bangladesh (Table 3).

The mismatch distribution for the Bangladesh P. monodon samples (Fig. 3) followed the sudden expansion model both for the SSD and the raggedness index (P > 0.40), despite clear bimodality, with a time of expansion at 143 (95% CI 90–397) Kyr. Deviation from equilibrium was also observed with Tajima’s D and Fu’s Fs, which were both negative and significant (Tajima’s D = −1.72, P = 0.011; Fu’s Fs = −24.29, P < 0.001), suggesting an expansion from a bottleneck or a selective sweep. The BSP analysis suggests that the Bangladesh population has a current effective population size (N e) of 102.8 × 10^6 (CI 22.7 × 10^6–411.5 × 10^6) individuals and has undergone a gradual increase in N e for the last ~200–350 Kyr (Fig. 4).

Fig. 3 Mismatch distributions, based on 86 sequences (517 bps) of mitochondrial CR of P. monodon from Bangladesh, under the sudden expansion model

Fig. 4 Bayesian skyline plot showing the past population dynamics of P. monodon in Bangladesh estimated from 86 sequences (517 bps) of CR. Dotted lines represent the 95% confidence intervals. Effective population size (N e × 10^6) per generation is traced back in time from the present to the past

Origin and phylogeography

The most recent common ancestor of the three CR lineages dates back to 6.6 (CI 4.8–8.7) Mya, and that of lineages A and B to 4.1 (CI 2.8–5.4) Mya (tree not shown). The distribution of the CR lineages showed clear longitudinal patterns in the Indo-West Pacific region (Table 3). Sequences in lineage C were found in all specimens from Madagascar to Bangladesh, except for two specimens from Sri Lanka, and decreased in frequency in south-east Asia, from 50% in Thailand to 12% in China (Table 3). Lineages A and B were found solely in populations from south-east Asia and the Pacific Ocean regions (Table 3).

Table 3 Population and lineage-wise distribution of CR sequences of P. monodon from the Indo-West Pacific region

The Bayesian inference analyses of lineage C of the CR showed a complex pattern in the Indo-West Pacific region. The tree had five clusters (PP ≥ 90, Fig. 5). Haplotypes from Bangladesh are characterized by the same notations (I and II) as in Fig. 2. A tree based on shorter sequences that included Indonesia was less well supported, and the Indonesian sequences aggregated within clusters I and II (tree not shown). The earliest splits [~3.6 (CI 2.6–4.3) Mya, cluster IV–V] within lineage C were found for unpublished CR sequences from the east coast of India and one sequence from China (PP ≥ 90, Fig. 5). The CR trees (lineage C) obtained with the different priors showed similar topologies with slight variation in divergence time, except for the tree based on the relaxed clock, which had wider confidence intervals (Appendix 3).

Fig. 5 Bayesian inference tree based on 311 CR (496 bps) sequences of P. monodon of lineage C (without Indonesia) under ‘GTR+G+I’ model following strict clock of 3.44% per Myr. Numbers at the nodes represent divergence in million years and shadings represent Bayesian posterior probabilities (PP, %). The tree is rooted with F. chinensis (GenBank accession number: DQ518969) with a divergence time of ~46.2 (CI 32.3–65.0) Mya (PP ≥ 90). Numbers following names of countries represent the number of haplotypes/sample size. Note: the clustering within clades I to III is insignificant except for a single node with PP ≥ 90, and different haplotypes from the same country have been grouped into a single branch

Nucleotide diversity of all CR sequences was 0.091, substantially larger than within the C lineage, where it was 0.040, reflecting the deep divergence of the A, B and C lineages. When the highly divergent lineage from Kenya and Madagascar was excluded, the diversity within the C lineage decreased to 0.032, but it was still higher than within Bangladesh (0.024), pointing to population structure in the Indian Ocean, as described further below. Less nucleotide diversity was found within the 16S rRNA gene (0.024) and the combined fragment of the 16S rRNA and tRNAVal genes (0.035) than for the CR sequences. Even though the samples do not cover the same geographic range, the two datasets include samples from the geographic areas characterized by the distinct A (Australia) and C (e.g. Bangladesh) CR lineages.

Pairwise genetic distances (Φ ST) for CR showed clear differentiation between populations of all lineages (Fig. 6; Table 4) and also within lineage C. The Madagascar and Kenya populations, which cluster within the C lineage, are closely related to each other (Φ ST = 0) but significantly different from other populations (Φ ST ranges from 0.44 to 0.82, P < 0.01; Table 4). The separation among the other populations follows the first discriminant axis from India in the west to the Philippines in the east (together with the Australian sample) (Fig. 6). Differences among populations including lineages A, B and C are almost all significant, except where samples harbour large variation resulting in lower Φ ST between populations and reduced statistical power, as between Thailand (W) and Thailand (E), between Vietnam and Indonesia and between Vietnam and China. The Bangladesh population is most similar to the populations of India and Sri Lanka (Φ ST = 0.08–0.11, omitting the small sample from IW) but significantly different from east and south India (Fig. 6; Table 4). The differentiation of the Bangladesh population from Sri Lanka, Thailand (W and E), Indonesia and China was not significant when lineages A and B were omitted (Φ ST ranges from 0 to 0.03, P > 0.05). The distances between populations based on the 16S rRNA gene support the split based on CR between the Bangladesh population and the populations from the Pacific region (Malaysia, China and Australia, P < 0.05) and did not reveal differentiation from India and Sri Lanka (P > 0.05).

Fig. 6 Multidimensional scale plot based on pairwise genetic distances (Φ ST) between 14 P. monodon populations, including variation in 714 sequences of all mtDNA CR lineages in the Indo-West Pacific region. Letters: MG Madagascar, KY Kenya, IW India West, IS India South, IE India East, SL Sri Lanka, BD Bangladesh, TW Thailand West, TE Thailand East, ID Indonesia, VN Vietnam, CH China, PH Philippines, AU Australia

Table 4 Pairwise genetic distances (Φ ST) between P. monodon populations in the Indo-West Pacific region

Orthologous or paralogous variation

No indications of different sequences within individuals, which could suggest heteroplasmy or duplications as reported for the Indonesian population by Walther et al. (2011), were observed in Bangladesh, even though the same individuals were amplified for both the short and the long fragment of the CR. The phylogenetic analyses of the two other mtDNA markers supported the overall split observed among the A, B and C lineages of the CR, although the time estimates differed (Figs. 7, 8). The 16S rRNA tree has three clusters (PP ≥ 90) for both the strict clock and the lognormal relaxed clock models; cluster A, with sequences from the western Pacific region, diverged from clusters C1 and C2, with sequences from all locations of the Indo-West Pacific region (Fig. 7), about 2.6 (CI 1.5–3.6) Mya. None of the samples in this analysis represent localities characterized by the CR lineage B. Similarly, the combined fragment of the 16S rRNA and tRNAVal genes produced three distinct clusters for both the strict clock and the lognormal (uncorrelated) relaxed clock (PP ≥ 90) (Fig. 8). Clusters A and B were composed of samples from the western Pacific region (Australia) and south-east Asia (Malaysia), respectively. Cluster C comprised samples from Bangladesh and Sri Lanka and also a specimen from Malaysia. Topology and the age of nodes of the trees constructed with the different priors for population growth were similar, except for the tree based on the relaxed clock, which had wider confidence intervals for the CR C lineage; these exceeded even the wider interval obtained when the lower and upper rates of the strict clock were applied to the CR under the constant size model (Appendix 3).

Fig. 7

Bayesian inference tree based on 16S rRNA gene of P. monodon under ‘TN93 + G’ model following strict clock of 1.12% per Myr. Numbers at the nodes represent divergence in million years and shadings represent Bayesian posterior probabilities (PP, %). The tree is rooted with M. monoceros (GenBank accession number: JX089983) with a divergence time of ~14.0 (CI 9.9–18.8) Mya (PP ≥ 90). Numbers following names of countries represent the number of haplotypes/sample size

Fig. 8

Bayesian inference tree based on a combined fragment of 16S rRNA and tRNAVal genes of P. monodon under ‘TN93 + G’ model following strict clock of 1.12% per Myr. Numbers at the nodes represent divergence in million years and shadings represent Bayesian posterior probabilities (PP, %). The tree is rooted with P. semisulcatus (GenBank accession number: EF589704) with a divergence time of ~14.3 (CI 10.0–19.8) Mya (PP ≥ 90). Numbers following names of countries represent the number of haplotypes/sample size

Discussion

Mitochondrial DNA in P. monodon from Bangladesh is characterized by a high degree of variation, with almost all individuals carrying a unique haplotype. High genetic diversity has also been reported in other P. monodon populations from the Indo-West Pacific region (You et al., 2008; Zhou et al., 2009; Waqairatu et al., 2012; Mkare et al., 2014). The distribution of the mtDNA variation along the Bangladesh coast does not provide any evidence of population structure, reflecting a random mixing of pelagic larval stages and/or migration of adults. A single population in Bangladesh based on mtDNA variation was also observed in F. indicus (Alam et al., 2015). The high mitochondrial variation in P. monodon in Bangladesh and the two distinct clusters indicate that the population is large and may have a long history (Rogers & Harpending, 1992), but the mismatch analyses support a sudden expansion around 90–397 Kyr ago. A similar result was obtained with the Bayesian skyline plot method, indicating growth over the last 200–350 Kyr, well before the onset of the last glacial epoch of the Pleistocene and earlier than the expansion in F. indicus from Bangladesh (Alam et al., 2015). This time estimate may, however, be an overestimate due to a possible admixture of distinct mtDNA clusters, as in Grant et al. (2012), who also demonstrated how a decline in population size may erase historical information; thus estimates of demographic changes should be interpreted with caution. Large variation was nevertheless observed within both clusters, where distinct haplotypes were found, and the high number of homoplasies may further lead to underestimation of the number of mutations that have occurred. More markers are needed to evaluate the time interval suggested by the analyses of the mtDNA and to obtain more precise estimates.
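Expansion dates from a mismatch analysis rest on the sudden-expansion model of Rogers & Harpending (1992), in which the mode of the mismatch distribution, τ, is related to the time since expansion by τ = 2ut, where u is the mutation rate of the whole sequence per generation and t is measured in generations. The conversion can be sketched as follows. The function name, the τ value, the per-site rate, the sequence length and the generation time are hypothetical placeholders chosen for illustration, not values used in this study.

```python
def expansion_time_years(tau, rate_per_site_per_year, seq_length, generation_years=1.0):
    """Time since a sudden expansion, from tau = 2 * u * t (Rogers & Harpending, 1992)."""
    # u: expected mutations per sequence per generation.
    u = rate_per_site_per_year * generation_years * seq_length
    # t in generations, then converted back to years.
    generations = tau / (2.0 * u)
    return generations * generation_years


# Hypothetical inputs for illustration only (not estimates from this study).
t = expansion_time_years(tau=10.0, rate_per_site_per_year=5e-8, seq_length=500)
print(f"Estimated time since expansion: {t:.0f} years")
```

Because t scales inversely with the assumed rate, the width of an expansion interval largely reflects the range of mutation rates considered.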

The overall genetic patterns based on all CR sequences closely follow the geographic structure in the Indo-West Pacific region, where neighbouring populations are generally more similar to each other than to those sampled further away. The Bangladesh P. monodon population shows the greatest similarity to the neighbouring populations in India and Sri Lanka but differs clearly from all other samples. Such a pattern has also been observed in a Macrobrachium rosenbergii (De Man 1879) population from Bangladesh, which showed closest similarity to the Indian and Sri Lankan populations (Hurwood et al., 2014). Madagascar and Kenya populations, coming from a different biogeographical province, the Western Indian Province (Bowen et al., 2012), were distinct from all other populations. The south-east Asian populations differed from the south Asian populations and the south-east African populations, but were somewhat related to the West Pacific populations. Biogeographic events related to the Sunda Shelf are known to have affected diversification of marine organisms in the Indo-West Pacific region (Gopurenko et al., 1999; Tsoi et al., 2007), and have been proposed to explain the splits among the South Pacific, eastern Australia and south-east Asia, and the split between the eastern Indian Ocean and the western Pacific, within P. monodon, as the two oceans were completely separated at certain times during the Tertiary and Pleistocene periods (Waqairatu et al., 2012). The oldest divergence within the species (estimated as the most recent common ancestor for all lineages), based on the mutation rate for the CR, predates the main temperature fluctuations and sea level changes which started around 2.8 Mya or at the onset of the Pleistocene, when the Indian and Pacific Oceans were completely separated (Voris, 2000; Benzie et al., 2002).
The split between lineage C, which characterizes the region west of the Sunda Shelf, and lineage A, the region east of the Sunda Shelf, dates back to 6.6 (CI 4.8–8.7) Mya, and lineages B and A diverged from each other before or at the onset of the Pleistocene, 4.1 (CI 2.8–5.4) Mya. The estimates are, however, dependent on the mutation rate and are older than the estimates obtained for the 16S rRNA and the tRNA genes. Concerns have been raised about using molecular clock rates estimated from comparisons between species to make inferences about divergence of populations within species, as they can lead to overestimation of the times (Ho et al., 2005, 2007), because not all of the variation may become fixed between species. However, the time estimates observed in this study are generally large and exceed the period of 1 Myr within which this may be of main concern (Papadopoulou et al., 2010). An admixture of all lineages may have occurred around the Sunda Shelf in Indonesia, Thailand, Vietnam and China, when the Indian and Pacific Oceans connected through the Sunda Shelf or even after the last glacial period (Bird et al., 2005). The admixture appears to have been directional: lineage C from the Bay of Bengal area, rather than from the distinct African populations as proposed by Benzie et al. (2002), may have migrated eastwards through the region, whereas only a small number of lineage A individuals have been found west of the region, in western Thailand and Sri Lanka. Baldwin et al. (1998) concluded that the genus Penaeus originated in the centre of the Indo-West Pacific region and migrated eastwards and westwards, forming two groups, in the eastern Pacific and the Indian Ocean. Similar diversification may have occurred within P. monodon, but despite these larger overall patterns, further splits are observed within lineage C, which comprises three main clusters: one with south-east African populations and two others with mixed populations from south and south-east Asia and from the Pacific Ocean region. The mixed populations, forming clusters that diverged approximately 2.3 (CI 1.7–2.6) Mya, could reflect isolation due to sea level changes during the Pleistocene in the Sunda Shelf. Such divergence has been reported for other Indo-West Pacific species (see Alam et al., 2015), e.g. in mud crab (Scylla serrata Forskal, 1775) (Gopurenko et al., 1999) and kuruma shrimp (Penaeus japonicus Bates 1888) (Tsoi et al., 2007). The Sunda Shelf may not be the only geographic barrier in the Indian Ocean. The large split between the African samples and the samples from India, Sri Lanka and Bangladesh might have occurred due to large geographical distances, and due to different surface and subsurface equatorial ocean currents in the western Indian Ocean (Pidwirny, 2006). Similarly, three distinct evolutionary lineages among F. indicus populations were recently reported west of the Sunda Shelf region: one in the Western Indian Ocean and Thailand and two in the eastern Indian Ocean, off Bangladesh and Sri Lanka, which might result from vicariance both before and during the Pleistocene (Alam et al., 2015).
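The dependence of split dates on the assumed mutation rate, noted above, can be illustrated with a simple strict-clock calculation. This is a minimal sketch, not a reproduction of the Bayesian dating used for the trees: it assumes that the rate is expressed per lineage, so that two lineages accumulate differences at twice that rate, and both the distance value and the function name are hypothetical. The 1.12% per Myr figure is the strict-clock rate quoted in the captions of Figs. 7 and 8; treating it as a per-lineage rate is an assumption made here for illustration.

```python
def divergence_time_myr(pairwise_distance, rate_per_lineage_per_myr):
    """Strict-clock split time in Myr: T = d / (2 * r), with r a per-lineage rate."""
    return pairwise_distance / (2.0 * rate_per_lineage_per_myr)


# Hypothetical corrected distance between two lineages (substitutions per site).
distance = 0.056

# Halving or doubling the assumed rate doubles or halves the estimated split time.
for rate in (0.0056, 0.0112, 0.0224):
    print(f"rate = {rate:.4f} per Myr -> T = {divergence_time_myr(distance, rate):.2f} Myr")
```

If a rate is instead reported as a pairwise divergence rate, the factor of two is dropped and T = d / rate.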

Walther et al. (2011) detected different sequences within a subset of individuals of P. monodon from Indonesia and concluded that they could have arisen due to duplication, where lineage C comprised paralogs, either due to insertion in the nuclear genome (numts) or within the mtDNA. The authors argued that previous studies had thus highly overestimated the variation within the mtDNA. No indications of numts, duplications or heteroplasmy, such as multiple peaks in the electropherograms or different sequences from the same individuals, were found in our sequences. In addition, genealogies based on the other two mtDNA markers, 16S rRNA and a fragment of the 16S rRNA and tRNAVal genes, support the split between the C and the A lineages, as seen in comparisons of specimens from Australia (CR lineage A) and specimens from Bangladesh (CR lineage C). Although the estimated divergence time differed between the trees, the confidence intervals were wide and the exact mutation rate of the different markers may also be uncertain. A split between populations from the eastern Indian Ocean and western Pacific has also been observed for the elongation factor 1-alpha gene (Duda & Palumbi, 1999). Sequences of the CR from Kenya (Mkare et al., 2014), which clustered in lineage C, from 129 individuals (126 haplotypes) of P. monodon, also showed no evidence of co-amplification of pseudogenes and/or paralogous genes. Walther et al. (2011) acknowledged that the occurrence of such divergent lineages could be due to heteroplasmy as a result of bi-parental inheritance. Heteroplasmy has been described for several species such as Atlantic salmon (Salmo salar Linnaeus, 1758) and brown trout (Salmo trutta Linnaeus, 1758) (Ciborowski et al., 2007) and among Drosophila species (Wolff et al., 2013; Dokianakis & Ladoukakis, 2014), and may be more likely to be detected in hybridizing species due to higher sequence divergence or higher paternal leakage. The distinct lineages A and C are likely to characterize previously isolated groups within P. monodon, or even cryptic species as in F. indicus (Alam et al., 2015). A secondary contact zone of the two lineages may exist in Indonesia and South-East Asia where paternal leakage may occur, resulting in heteroplasmy within individuals, as observed by Walther et al. (2011).

To conclude, this study has revealed that the Bangladesh P. monodon population is large and has high genetic diversity, in accordance with populations of this species throughout its distribution. The Bangladesh population, although genetically differentiated, was most similar to the populations in India and Sri Lanka, which belong to lineage C, one of three main mtDNA lineages within the species. The South-East Asia region was identified as containing an admixture of all lineages, whereas the western Indian Ocean and western Pacific Ocean regions were dominated by lineages C and A, respectively, but distinct population structure also occurred within the Western Indian Ocean. The high genetic diversity in Bangladesh P. monodon and the different lineages in the Indo-West Pacific region should be considered for proper management of genetic diversity and for aquaculture development of the species. This may be of particular concern where only one of the lineages is found, such as in the Bay of Bengal, where mixing of the different lineages should be avoided. The distinct lineages within the species may even represent cryptic species, considering the large divergence times that predate the split of many other species (e.g. Alam et al., 2015). Further analyses, based on nuclear genomic markers, are warranted to evaluate whether this evolutionary split of the mtDNA lineages represents cryptic species or a paralogous event. Breeding experiments among the lineages could provide further evidence of their divergence and possible paternal mtDNA transmission. The high genetic diversity and geographic patterns suggest that P. monodon is not threatened, but several concerns due to harvesting pressure, aquaculture and habitat destruction need to be considered.