Introduction

The Jinsha River flows through three provinces of China (Tibet, Yunnan, and Sichuan) and has abundant water resources and a rich diversity of fish species (Chen et al. 2002). A number of large power stations, including the Xiangjiaba, Xiluodu, Baihetan, and Wudongde cascade hydropower projects, have obstructed the natural connectivity of the river and have had severe effects on the distribution and ecology of fish communities in the Jinsha River (Dudgeon 2011; Li et al. 2020; Liu et al. 2011). Xiangjiaba and Xiluodu Stations are the two lowest cascade hydropower projects in the Jinsha River (Liu et al. 2019). Previous studies have confirmed that the construction of cascade hydropower projects has had a tremendous influence on this region, especially on fish diversity (Esguicero and Arcifa 2010; Li et al. 2020; Sun et al. 2014; Xie 2003; Xie et al. 2007). However, since the construction of the Xiangjiaba hydropower project in 2006, biological studies have mainly focused on distributions, biological characters of fishes, and surveys of fishery resources (Gao et al. 2013; Li et al. 2020; Lin et al. 2015). A molecular identification of fishes in the Xiangjiaba reservoir area is therefore needed.

Traditional fisheries resource surveys identify species based on morphological characters and require substantial taxonomic expertise, for example biological and anatomical knowledge (Prokofiev 2010; Rosso et al. 2012; Wang et al. 2018). However, traditional taxonomic methods cannot identify fishes at early life stages, adults lacking visible distinguishing features, or fragments of fish bodies (Shen et al. 2016; Trivedi et al. 2016). In addition, phenotypic plasticity, genotypic variability, cryptic species occurrence, and exotic invasive species might interfere with species delimitations that depend only on visible biological characters (Gutierrez-Gutierrez et al. 2013; He et al. 2011; Packer et al. 2009). DNA-based barcoding is being increasingly employed as an effective method of animal species identification. The method uses a fragment of the mitochondrial cytochrome c oxidase subunit I gene (COI) to differentiate animals within an area (Hebert et al. 2003; Ward 2009). The COI barcode region has been recognized as the standard for animal identification, and its efficiency has been demonstrated for assessing marine and freshwater fish diversity in different geographic areas. The identification success rate using this method has reached 98% (Aquilino et al. 2011; Chen et al. 2015; Lakra et al. 2011; Shen et al. 2016, 2019).

The aims of this study were to use DNA barcoding to identify fish species in the Xiangjiaba reservoir area and to evaluate the power of this method. Furthermore, this study aimed to construct a robust DNA barcode reference library that would provide baseline data for monitoring, conservation, and management of fish resources in the Xiangjiaba reservoir area.

Materials and methods

Sample collection

To assess the fish diversity in the Xiangjiaba reservoir area, we conducted six fishery surveys and obtained more than 3000 fish specimens from 2016 to 2018. A total of 333 specimens from 57 species, 44 genera and 14 families were used in this study. The specimen information is listed in Table S1. These specimens were collected from three different sampling sites in the Xiangjiaba reservoir area in the downstream region of the Jinsha River. The three sites were Shaonvping (in the tail tributary of Xiangjiaba reservoir, 104°30′87.88″E, 28°0.63′27.34″N), Suijiang (in the middle tributary of Xiangjiaba reservoir, 103°98′58.7″E, 28°61′52.85″N) and Guixi (in the tail tributary of Xiangjiaba reservoir, 103°88′21.04″E, 28°32′71.57″N) (Fig. 1). All specimens were stored as vouchers in a 10% formaldehyde solution for further morphological identifications and were deposited in our laboratory (Animal Genetics Lab, Jianghan University). Tissue samples were preserved in 95% ethanol until DNA extraction. A preliminary species identification was conducted by a trained identifier who possessed specialized knowledge of fish classifications and who referred to available published literature (Ding 1994; Nelson et al. 2016; Wu 1977; Wu and Wu 1992; Zou et al. 2020).

Fig. 1 Sampling sites of Xiangjiaba reservoir area indicated by the black circles

Genomic DNA extraction, PCR amplification, and sequencing

Total genomic DNA was extracted from fin clips according to the standard phenol–chloroform extraction method (Sambrook and Russell 2001). The universal primers used for amplifying the barcode region were FishF1 (5′-TCAACCAACCACAAAGACATTGGCAC-3′) and FishR1 (5′-TAGACTTCTGGGTGGCCAAAGAATCA-3′) (Ward et al. 2005). PCR was performed in 30 μl reaction volumes containing 15 μl sterilized distilled water, 12 μl 2 × Taq Master mix (MgCl2, PCR buffer, and dNTP), 1 μl forward primer, 1 μl reverse primer, and 1 μl of template DNA. The PCR protocol consisted of an initial denaturation at 94 °C for 5 min, 30 cycles of denaturation at 94 °C for 30 s, annealing at 50–56 °C for 45 s, and extension at 72 °C for 45 s, with a final extension at 72 °C for 10 min. The PCR products were fractionated by electrophoresis in 1.0% agarose gels and stained with GoldView. The amplified PCR products were purified using a DNA Agarose Gel Extraction Kit (Tiangen, Wuhan, China) and sequenced using an ABI3730 XL automatic DNA sequencer.

Data analyses

Sequence chromatograms were visually inspected and verified using the SeqMan program in the DNASTAR Lasergene package (DNASTAR Inc., Madison, WI, USA). Forward and reverse sequences were assembled into a consensus contig to avoid sequencing errors. The assembled sequences were aligned using MUSCLE v3.8.31 (Edgar 2004). The aligned coding sequences were trimmed to the same length for further sequence analyses in MEGA 5.0 (Tamura et al. 2011). The morphological identifications were further validated by searching the sequences of the 57 species against the BOLD database (Table S2). Queries that could not be matched accurately in the BOLD database were searched with BLAST against the NCBI database. A total of 333 COI sequences from this study have been deposited in the GenBank database. To exclude nuclear DNA pseudogenes and sequencing errors, we translated all the aligned COI sequences into amino acids. Genetic distances among species, genera, and subfamilies were computed as Kimura 2-parameter (K2P) distances and p-distances in MEGA 5.0 with the default parameters and 1000 bootstrap replicates. Base composition, numbers of invariable and conserved sites, and pairwise genetic distances were also computed in MEGA 5.0 (Tamura et al. 2011). To detect whether DNA barcode gaps existed, the interspecific and intraspecific K2P distances of all species were calculated and compared, including maximum, minimum, and mean values.
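As an illustration of the two distance measures named above, the following minimal Python sketch computes the p-distance and the K2P distance for one pair of aligned sequences. It is not the MEGA implementation: the function name `pairwise_distances` and the choice to skip sites containing gaps or ambiguous bases are assumptions made for this example.

```python
import math

PURINES = {"A", "G"}


def pairwise_distances(seq1, seq2):
    """Return (p_distance, k2p_distance) for two aligned sequences.

    Assumption for this sketch: sites where either sequence has a gap or
    an ambiguous base are ignored (pairwise deletion).
    """
    if len(seq1) != len(seq2):
        raise ValueError("sequences must be aligned to the same length")

    compared = transitions = transversions = 0
    for a, b in zip(seq1.upper(), seq2.upper()):
        if a not in "ACGT" or b not in "ACGT":
            continue
        compared += 1
        if a == b:
            continue
        # A<->G and C<->T are transitions; every other change is a transversion.
        if (a in PURINES) == (b in PURINES):
            transitions += 1
        else:
            transversions += 1

    if compared == 0:
        raise ValueError("no comparable sites")

    P = transitions / compared      # proportion of transitional differences
    Q = transversions / compared    # proportion of transversional differences
    p_distance = P + Q
    # Kimura 2-parameter distance: -1/2 ln(1 - 2P - Q) - 1/4 ln(1 - 2Q)
    k2p = -0.5 * math.log(1 - 2 * P - Q) - 0.25 * math.log(1 - 2 * Q)
    return p_distance, k2p
```

Because the K2P model corrects for multiple substitutions at the same site, the K2P distance is never smaller than the p-distance, and the difference between the two grows as sequences diverge.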

Bayesian inference (BI) was used to reconstruct the phylogenetic relationships of all species using MrBayes v3.2.3 (Huelsenbeck and Ronquist 2001; Ronquist et al. 2012) with the “unlink” and “prset ratepr = variable” model parameters. Two independent runs were performed, each with four Markov Chain Monte Carlo (MCMC) chains (three hot and one cold), for 50,000,000 generations initiated from a random tree, sampling one tree every 1000 generations. Convergence of the BI analyses was first assessed by requiring an average standard deviation of split frequencies below 0.01 and potential scale reduction factors (PSRFs) close to 1.0 for all parameters. Tracer v1.6 (Rambaut et al. 2014) was also used to check the convergence of the BI analyses. We discarded 12,500 trees as a conservative burn-in, and the remaining samples were used to generate a majority-rule consensus tree. The support value of each node of the BI tree was given as the Bayesian posterior probability (BPP). In addition, a neighbor-joining (NJ) tree was reconstructed in MEGA 5.0 with 1000 bootstrap replicates based on the K2P distance model to further evaluate phylogenetic relationships between species.
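The sampling settings above imply some simple bookkeeping, sketched here in Python. The helper name `burn_in_summary` is hypothetical, and the sketch assumes that the 12,500 discarded trees apply to each run separately, which the text does not state.

```python
def burn_in_summary(generations, sample_every, burn_in_trees):
    """Return (sampled_trees, burn_in_fraction, retained_trees) for one run.

    The tree sampled at generation 0 is not counted here; a sampler that
    also records the starting state would add one tree to the total.
    """
    sampled_trees = generations // sample_every
    burn_in_fraction = burn_in_trees / sampled_trees
    retained_trees = sampled_trees - burn_in_trees
    return sampled_trees, burn_in_fraction, retained_trees


# Values taken from the text: 50,000,000 generations, one tree every
# 1000 generations, 12,500 trees discarded.
sampled, fraction, retained = burn_in_summary(50_000_000, 1_000, 12_500)
print(f"{sampled} trees sampled, {fraction:.0%} burn-in, {retained} retained per run")
```

Under these assumptions, the burn-in corresponds to the first quarter of the sampled trees in each run.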

Results

A total of 333 fish mitochondrial DNA barcodes (672 bp) were obtained for the 57 species identified by prior morphological examination. The fish species belonged to 14 families (Cyprinidae, Botiidae, Nemacheilidae, Cichlidae, Percidae, Salangidae, Gobiidae, Centrarchidae, Hemiramphidae, Amblycipitidae, Siluridae, Clariidae, Ictaluridae, and Bagridae) and eight orders (Cypriniformes, Cichliformes, Perciformes, Osmeriformes, Gobiiformes, Centrarchiformes, Beloniformes, and Siluriformes). The sequences determined in this study have been deposited in the GenBank database (GenBank accession numbers: MT571700-MT572032, Table S1). The family Cyprinidae, consisting of 32 fish species, accounted for 56.1% of the species, followed by the family Bagridae with 15.8%. These 57 morphologically identified species were collected from three locations in the Xiangjiaba reservoir area (Table S1 and Fig. 1). The 57 species comprised 5 endemic species, 12 exotic invasive species, 37 widespread species, and 3 unnamed species. There were no deletions, insertions, or stop codons in any of the amplified sequences. The number of specimens analyzed per species ranged from 1 to 31, with 5.9 individuals per species on average, and 14 species were represented by a single specimen.
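A stop-codon screen of the kind reported above can be sketched in a few lines of Python. This is only an illustration, not the procedure used in MEGA: the function names are hypothetical, the stop codons are those of the vertebrate mitochondrial genetic code (TAA, TAG, AGA, AGG), and all three forward reading frames are checked because the text does not state the codon position at which the trimmed 672 bp alignment begins.

```python
# Stop codons in the vertebrate mitochondrial genetic code.
VERT_MITO_STOPS = {"TAA", "TAG", "AGA", "AGG"}


def stop_codon_positions(seq, frame=0):
    """Return 0-based start positions of stop codons in the given frame."""
    seq = seq.upper()
    return [
        i
        for i in range(frame, len(seq) - 2, 3)
        if seq[i:i + 3] in VERT_MITO_STOPS
    ]


def open_frames(seq):
    """Return the forward reading frames (0, 1, 2) that contain no stop codon."""
    return [frame for frame in range(3) if not stop_codon_positions(seq, frame)]
```

A genuine mitochondrial COI fragment is expected to have one forward frame without stop codons, whereas a nuclear pseudogene copy often has none.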

COI sequence information

The species names from morphological identifications matched results of blastn searches against NCBI and BOLD databases with at least 99.03% similarity, except for three species only identified at the corresponding genus level due to a lack of sequence information for those three species in the BOLD and NCBI databases. These specimens were named Rhodeus sp., Gnathopogon sp., and Pseudobagrus sp. as putatively undescribed species (Table S2).

The final COI sequence alignment (672 bp in total) comprised 386 conserved sites, 286 variable sites, and 274 parsimony-informative sites. The overall average nucleotide composition of the alignment was 24.4% A, 29.6% T, 27.7% C, and 18.3% G, exhibiting an A + T-rich pattern.
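For readers who wish to reproduce a composition summary of this kind, a minimal Python sketch follows. The function names are hypothetical, and pooling all sequences while ignoring gaps and ambiguous bases is an assumption of this example rather than a description of the MEGA calculation.

```python
from collections import Counter


def base_composition(sequences):
    """Return the percentage of A, C, G and T pooled over all sequences."""
    counts = Counter()
    for seq in sequences:
        counts.update(base for base in seq.upper() if base in "ACGT")
    total = sum(counts.values())
    if total == 0:
        raise ValueError("no unambiguous bases found")
    return {base: 100 * counts[base] / total for base in "ACGT"}


def at_content(composition):
    """Return the combined A + T percentage."""
    return composition["A"] + composition["T"]


# The percentages reported in the text give an A + T content of 54.0%.
reported = {"A": 24.4, "T": 29.6, "C": 27.7, "G": 18.3}
print(round(at_content(reported), 1))
```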

Genetic distance, barcoding gap, and species genetic diversity analysis

The K2P distances and p-distances were computed at various taxonomic levels, including the species, genus, and family levels (Table 1). The intraspecific K2P distances ranged from 0 to 1.69% (mean = 0.19%, standard error [SE] = 0.000) at the species level, whereas the intraspecific p-distances ranged from 0 to 1.60% (mean = 0.19%, SE = 0.000). The maximum K2P distances and p-distances within species were therefore less than 2% (Table S3). The interspecific K2P distances ranged from 0.45 to 11.14% (mean = 2.98%, SE = 0.002) at the genus level, whereas the interspecific p-distances ranged from 0.45 to 9.40% (mean = 2.59%, SE = 0.001). The inter-genus K2P distances ranged from 2.98 to 31.92% (mean = 11.95%, SE = 0.007), whereas the inter-genus p-distances ranged from 2.83 to 22.60% (mean = 9.43%, SE = 0.004). Genetic variation increased at higher taxonomic levels. The taxonomic resolution ratio (TRR) values for the K2P distances and p-distances were 15.68 and 13.63, respectively. Species discrimination for the 57 species was assessed using DNA barcoding gaps based on the intraspecific and interspecific K2P distances (Fig. 2). Barcoding gaps existed in all 57 species, and no overlaps were detected in any species.
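The ratio and the gap criterion described above reduce to simple arithmetic, sketched here in Python. The function names are hypothetical; the TRR is taken to be the mean interspecific distance within genera divided by the mean intraspecific distance, consistent with the quotient described in the Discussion, and a barcoding gap is taken to mean that the smallest interspecific distance exceeds the largest intraspecific distance.

```python
def taxonomic_resolution_ratio(mean_interspecific, mean_intraspecific):
    """Ratio of mean congeneric interspecific to mean intraspecific distance."""
    return mean_interspecific / mean_intraspecific


def has_barcode_gap(max_intraspecific, min_interspecific):
    """True when the nearest other species is farther than any conspecific."""
    return min_interspecific > max_intraspecific


# Mean distances (%) reported in the text.
trr_k2p = taxonomic_resolution_ratio(2.98, 0.19)  # about 15.68
trr_p = taxonomic_resolution_ratio(2.59, 0.19)    # about 13.63

# Carassius auratus, using the ranges reported in the Discussion:
# intraspecific 0.00-0.37%, interspecific to C. gibelio 0.60-1.37%.
carassius_gap = has_barcode_gap(max_intraspecific=0.37, min_interspecific=0.60)
```

The Carassius example shows that a barcoding gap can exist even when the interspecific distance falls below 2%, because the criterion compares each species with its own intraspecific maximum rather than with a fixed cutoff.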

Table 1 Summary of genetic distances (K2P) within species, genus and family levels
Fig. 2 DNA barcoding gaps for 57 fish species. The boxplots were constructed based on the interspecific distances from the Kimura 2-parameter model. Median interspecific distances with maximum and minimum values are represented by the upper and lower bars, respectively. The red line represents the mean intraspecific distance for each fish species, while the blue line represents the maximum intraspecific distance for each fish species

Among the 333 specimens, 140 haplotypes were detected using DnaSP 5.0 (Librado and Rozas 2009). Among 31 individuals of Hemiculter leucisculus, 11 haplotypes were detected. The haplotype diversity (h) and nucleotide diversity (π) for H. leucisculus were h = 0.871 and π = 0.00491, respectively.
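Haplotype counts and haplotype diversity of this kind can be sketched in Python as follows. The sketch assumes the commonly used Nei estimator, h = n/(n − 1) × (1 − Σ p_i²), where n is the number of sequences and p_i is the frequency of the i-th haplotype; whether DnaSP applies exactly this form to these data is an assumption, and the function names are hypothetical.

```python
from collections import Counter


def count_haplotypes(sequences):
    """Return the number of distinct sequences (haplotypes)."""
    return len(set(sequences))


def haplotype_diversity(sequences):
    """Nei's haplotype diversity for a list of aligned sequences."""
    n = len(sequences)
    if n < 2:
        raise ValueError("at least two sequences are required")
    counts = Counter(sequences).values()
    sum_squared_freqs = sum((c / n) ** 2 for c in counts)
    return n / (n - 1) * (1 - sum_squared_freqs)
```

The estimator is 0 when all individuals share one haplotype and approaches 1 when every individual carries a different haplotype, so the value of 0.871 reported for H. leucisculus indicates that its 31 individuals are spread fairly evenly over the 11 haplotypes.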

Bayesian inference tree-based identification

A Bayesian inference tree was generated based on the mitochondrial DNA barcodes of the 333 individuals (Fig. 3). Fifty-seven fish species formed distinct clusters in the Bayesian inference tree, comprising 44 genera of 14 families of 7 orders according to the latest classification standards for fishes (Froese and Pauly 2019; Nelson et al. 2016). Species discrimination was highly resolved in the Bayesian inference tree. Fish species with two or more individuals formed conspecific clusters with high Bayesian posterior probabilities, with the exception of the node that Hemibarbus maculatus shared with its closest clade in the Bayesian inference tree. As shown in the Bayesian inference tree (Fig. 3), 30 Cyprinidae species, 3 Nemacheilidae species, 9 Bagridae species, 2 species for each of the families Botiidae, Cichlidae, Gobiidae, and Siluridae, and 1 species for each of the remaining seven families (Clariidae, Ictaluridae, Hemiramphidae, Amblycipitidae, Centrarchidae, Percidae, and Salangidae) clustered together at the family level. In addition, the fish species of each genus clustered together as a single monophyletic group, except for the genus Pseudobagrus, which formed a single non-monophyletic cluster. Consistent phylogenetic topologies appeared in the neighbor-joining (NJ) tree.

Fig. 3 Bayesian inference tree of 333 COI sequences from 57 fish species constructed with MrBayes v3.2.3. Bayesian posterior probabilities are shown above the nodes. Note: n represents the number of specimens for each species

Discussion

The rapid development of molecular markers has provided an alternative to traditional morphology-based taxonomy, which is not only time consuming but also faces challenges from phenotypic plasticity in species classification (Ewert et al. 2005; Gutierrez-Gutierrez et al. 2013; Packer et al. 2009; Roskam and Brakefield 1999). DNA barcoding using the COI fragment of the animal mitochondrial genome is now a standard criterion for species identification (Hebert et al. 2003; Hebert and Gregory 2005). Importantly, DNA barcoding approaches have opened up a new avenue and have become an effective tool in fish species identification (Chen et al. 2015; Knebelsberger et al. 2014; McCusker et al. 2013; Shen et al. 2019; Wang et al. 2018). In practice, combining morphological taxonomy with DNA barcoding has been shown to be more effective in species identification (Pecnikar and Buzan 2014). The suggested standard COI threshold is that the average interspecific distance should be 10 times higher than the average intraspecific distance (Hebert et al. 2004). In the present study, the quotient was 15.68 for the K2P distance and 13.63 for the p-distance, both of which exceeded the tenfold threshold (Costa et al. 2007) and were effective for fish species delimitation in the Xiangjiaba reservoir area. The intraspecific K2P distances of all species were below 2%, suggesting that there were no cryptic species in the Xiangjiaba reservoir area. Meanwhile, we found relatively low interspecific genetic distances between several fish species. For example, the interspecific genetic distance between Carassius auratus and Carassius gibelio was 0.60–1.37%, below 2%. In comparison, the intraspecific genetic distance of Carassius auratus was 0.00–0.37%. The interspecific genetic distance between Rhodeus lighti and Rhodeus ocellatus was 0.45–0.60%. Remarkably low genetic distances at the genus level, as seen in the genera Carassius and Rhodeus, therefore appear to be a common pattern. Such relatively low interspecific genetic distances were also found in several species within the genus Triplophysa (Li et al. 2020), where they could be explained by mitochondrial introgression between species (Feng et al. 2019).

Fifty-seven fish species from the Xiangjiaba reservoir area were included in the DNA barcode reference library constructed here, in which each species with two or more haplotypes formed a distinct cluster with high posterior probability. The success rate of species identification for fishes in the Xiangjiaba reservoir area using DNA barcoding was 100%. Among the 57 identified fish species, only the genus Pseudobagrus was not monophyletic at the genus level. Both the genera Tachysurus and Leiocassis clustered with the genus Pseudobagrus under a single node. The genetic distances based on the K2P model within Pseudobagrus ranged from 2.48 to 8.99%. The phylogenetic relationships of this genus are controversial and were unresolved in previous studies (Cheng et al. 2009; Kottelat 2013; Zeng 2013). Zou et al. (2020) proposed that none of the genera Pelteobagrus, Leiocassis, and Pseudobagrus of the family Bagridae could be recovered as monophyletic. This result could be caused by rapid speciation and high variability during species evolution. Nevertheless, DNA barcoding was powerful and effective in fish species identification for the Xiangjiaba reservoir area.

During the evaluation of genetic diversity, four species (Coreius guichenoti, Pseudolaubuca engraulis, Pelteobagrus vachelli, and Saurogobio dabryi) had relatively higher haplotype diversity in the Xiangjiaba reservoir area. This suggests that these fishes possess higher genetic diversity, which may help them cope with harsh environmental conditions. Through this study, we have gained a better understanding of fish species from a molecular perspective, contributing to monitoring, conservation, and management of fish resources in the Xiangjiaba reservoir area.

Conclusions

In this study, 333 fish mitochondrial DNA barcodes (672 bp) were determined from 57 species belonging to 14 families in 8 orders. The mean genetic divergences at the species, genus, and family levels were 0.19%, 2.98%, and 11.95%, respectively, based on the Kimura 2-parameter model. There were clear DNA barcoding gaps between the intraspecific and interspecific distances from the K2P model. The Bayesian inference (BI) and neighbor-joining (NJ) trees showed that all individuals clustered as 57 distinct species, confirming the accuracy of the morphological identification. Our results demonstrated that DNA barcoding was highly efficient for the identification of fish species, and the method provided baseline data for ecological assessments, management, and conservation of fish resources in the Xiangjiaba reservoir area.