Introduction

Pareuchiloglanis sinensis (Siluriformes: Sisoridae) is an endemic highland fish species that has been detected only in some rivers of south-west China, i.e. the Jinsha river, the Dadu river and the Bailong river (figure 1) (Chu et al. 1999). In our study, 48 individuals were collected from the Dadu river, a branch of the Yangtze river. As of March 2014, a total of 26 dams had been constructed on the river and others were under construction or planned, which poses a new threat to freshwater ecosystems and fish diversity in the Dadu river (https://www.wilsoncenter.org/publication/interactive-mapping-chinas-dam-rush). To facilitate a better understanding of the genetic diversity and population structure of P. sinensis for resource conservation, we isolated and characterized 28 polymorphic microsatellites of P. sinensis, because microsatellites are the markers of choice for a variety of population genetic studies. Compared with the traditional methods of simple-sequence repeat (SSR) marker development, next-generation sequencing is more cost efficient (Zheng et al. 2013; Liu et al. 2017). RNA-seq data were generated by Ma et al. (2016). In this study, we used unigenes assembled from these RNA-seq data to develop polymorphic SSRs with the Perl script MISA (http://pgrc.ipk-gatersleben.de/misa/misa.html).

Materials and methods

Sample collection

In this study, methods involving fish were conducted in accordance with the Laboratory Animal Management Principles of China. Forty-eight individuals of P. sinensis were collected from the Dadu river.

Figure 1. Distribution of the natural species range and the sampled locality of P. sinensis. Red triangles indicate the natural species range of P. sinensis, and the black dot represents the locality sampled in the current study.

RNA-seq data

RNA-seq data were generated by Ma et al. (2016). FastQC (https://www.bioinformatics.babraham.ac.uk/projects/fastqc) was used to assess the quality of the reads. We trimmed adapter sequences and low-quality bases (Phred score <20) with Cutadapt (Martin 2011). The cleaned reads were assembled using Trinity (Haas et al. 2013) with default parameters. Contigs longer than 200 bp were retained for further analysis. The CD-HIT-EST program (Li and Godzik 2006) with an identity threshold of 95% was used to remove low-coverage artifacts or redundancies. The resulting unigenes were used for microsatellite marker detection.
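The length filter applied to the assembled contigs can be sketched as follows. This is a minimal illustration only: the function names and the in-memory FASTA handling are assumptions made for the example and are not taken from the original pipeline, which relied on Trinity and CD-HIT-EST.

```python
from typing import Dict, Iterable, Iterator, Tuple

MIN_CONTIG_LENGTH = 200  # bp; contigs must be strictly longer than this


def read_fasta(lines: Iterable[str]) -> Iterator[Tuple[str, str]]:
    """Yield (header, sequence) pairs from FASTA-formatted lines."""
    header, chunks = None, []
    for line in lines:
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            if header is not None:
                yield header, "".join(chunks)
            header, chunks = line[1:], []
        elif header is not None:
            chunks.append(line)
    if header is not None:
        yield header, "".join(chunks)


def filter_contigs(lines: Iterable[str],
                   min_length: int = MIN_CONTIG_LENGTH) -> Dict[str, str]:
    """Keep only contigs longer than min_length bases (hypothetical helper)."""
    return {name: seq for name, seq in read_fasta(lines) if len(seq) > min_length}
```

A call such as `filter_contigs(open("assembly.fasta"))` would return the retained contigs keyed by header; the file name here is a placeholder.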

Table 1 Summary of SSRs identified in P. sinensis transcriptome unigenes.
Table 2 Characterization of 28 transcriptome-derived microsatellites of P. sinensis. Primers redesigned from original sequence.
Figure 2. Heatmap of the frequency of repeats identified by RNA-seq. Dinucleotide repeats were the most frequent (72.53%), followed by trinucleotide (22.56%) and tetranucleotide (4.56%) repeats. SSRs with nine tandem repeats (20.90%) were the most common.

EST-SSR detection and primer development

Microsatellites within the unigene assembly were detected using the Perl script MISA (http://pgrc.ipk-gatersleben.de/misa/misa.html). Only SSR loci with di- to hexanucleotide motifs were considered, with a minimum of 6, 5, 5, 5 and 5 repeats, respectively. Mononucleotide repeats were excluded from the EST-SSR search as their polymorphism is often difficult to interpret (Lopez et al. 2015).
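The search criteria above can be illustrated with a short regular-expression scan. This is a simplified sketch and not a reimplementation of MISA: the names `MIN_REPEATS`, `SSR`, `is_primitive` and `find_ssrs` are hypothetical, compound SSRs are not handled, and a real SSR directly adjacent to a homopolymer run could be missed in rare cases.

```python
import re
from typing import List, NamedTuple

# Minimum repeat number per motif length (di- to hexanucleotide), as in the text.
MIN_REPEATS = {2: 6, 3: 5, 4: 5, 5: 5, 6: 5}


class SSR(NamedTuple):
    motif: str
    repeats: int
    start: int  # 0-based, inclusive
    end: int    # 0-based, exclusive


def is_primitive(motif: str) -> bool:
    """True if the motif is not a repetition of a shorter unit (e.g. 'AA', 'ACAC')."""
    n = len(motif)
    return not any(n % k == 0 and motif == motif[:k] * (n // k) for k in range(1, n))


def find_ssrs(seq: str) -> List[SSR]:
    """Report perfect SSRs that meet the minimum repeat numbers."""
    seq = seq.upper()
    hits = []
    for size, min_rep in MIN_REPEATS.items():
        # One motif copy followed by at least (min_rep - 1) further copies.
        pattern = re.compile(r"([ACGT]{%d})\1{%d,}" % (size, min_rep - 1))
        for m in pattern.finditer(seq):
            motif = m.group(1)
            # Skips mononucleotide runs and motifs already covered by a shorter unit.
            if is_primitive(motif):
                hits.append(SSR(motif, len(m.group(0)) // size, m.start(), m.end()))
    return sorted(hits, key=lambda s: s.start)
```

For example, in a sequence containing (AC)7, (ATG)5 and (AG)5, only the first two would be reported, because the dinucleotide threshold is six repeats.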

The EST-SSR primers were designed using Primer 3.0 (Untergasser et al. 2012) under the following criteria: (i) primer length ranged from 18 to 25 bases (optimum: 20 bp); (ii) PCR product size ranged from 100 to 300 bp; (iii) melting temperature was between \(58{^{\circ }}\hbox {C}\) and \(63{^{\circ }}\hbox {C}\) (optimum: \(60{^{\circ }}\hbox {C}\)) and (iv) GC content was 40–60% (optimum: 50%).
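The four design criteria can be expressed as a simple screening check on candidate primer pairs. The sketch below is illustrative only: the class and function names are hypothetical, the melting temperature is taken as an input (for instance, the value reported by the primer design software) rather than computed, and the optimum values are not modelled.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class PrimerCriteria:
    """Acceptance ranges taken from criteria (i) to (iv) in the text."""
    min_len: int = 18
    max_len: int = 25
    min_product: int = 100
    max_product: int = 300
    min_tm: float = 58.0
    max_tm: float = 63.0
    min_gc: float = 40.0
    max_gc: float = 60.0


def gc_percent(primer: str) -> float:
    """GC content of a primer as a percentage of its length."""
    primer = primer.upper()
    return 100.0 * (primer.count("G") + primer.count("C")) / len(primer)


def primer_ok(primer: str, tm: float, c: PrimerCriteria = PrimerCriteria()) -> bool:
    """Check length, melting temperature and GC content of a single primer."""
    return (c.min_len <= len(primer) <= c.max_len
            and c.min_tm <= tm <= c.max_tm
            and c.min_gc <= gc_percent(primer) <= c.max_gc)


def pair_ok(forward: str, forward_tm: float, reverse: str, reverse_tm: float,
            product_size: int, c: PrimerCriteria = PrimerCriteria()) -> bool:
    """Check both primers and the expected PCR product size."""
    return (primer_ok(forward, forward_tm, c)
            and primer_ok(reverse, reverse_tm, c)
            and c.min_product <= product_size <= c.max_product)
```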

DNA extraction, PCR conditions and amplification of SSRs

We dissected a small piece of white muscle tissue or fin from the right side of the body of each specimen. All of the tissue samples were preserved in 95% ethanol. Total genomic DNA was extracted from the muscle tissue or fin by performing a standard salt extraction.

The polymerase chain reaction (PCR) amplification was carried out in a \(30\ \mu \hbox {L}\) reaction mixture containing \(\sim \) \(100\ \hbox {ng}\) of template DNA, \(1\ \mu \hbox {L}\) of each primer (10 pmol), \(3\ \mu \hbox {L}\) of \(10 \times \) reaction buffer, \(1.5\ \mu \hbox {L}\) of dNTPs (2.5 mM each) and 2.0 U of Taq DNA polymerase.

The PCR conditions for SSR included an initial denaturation step at \(94{^{\circ }}\hbox {C}\) for 5 min, followed by 30 cycles of denaturation at \(94{^{\circ }}\hbox {C}\) for 30 s, annealing at \(60{^{\circ }}\hbox {C}\) for 40 s and extension at \(72{^{\circ }}\hbox {C}\) for 30 s, followed by a final extension at \(72{^{\circ }}\hbox {C}\) for 10 min and storage at \(4{^{\circ }}\hbox {C}\).
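The cycling programme can be written down as a small data structure, which also makes the total programmed hold time easy to work out. This is a sketch under stated assumptions: the variable and function names are hypothetical, and ramp times between temperatures and the final \(4{^{\circ }}\hbox {C}\) storage step are ignored.

```python
from typing import List, Tuple

Step = Tuple[str, float, int]  # (name, temperature in deg C, duration in seconds)

INITIAL_DENATURATION: Step = ("initial denaturation", 94.0, 5 * 60)
CYCLE: List[Step] = [
    ("denaturation", 94.0, 30),
    ("annealing", 60.0, 40),
    ("extension", 72.0, 30),
]
N_CYCLES = 30
FINAL_EXTENSION: Step = ("final extension", 72.0, 10 * 60)


def programmed_hold_seconds() -> int:
    """Sum of all programmed hold times, excluding ramping and 4 deg C storage."""
    per_cycle = sum(duration for _, _, duration in CYCLE)
    return INITIAL_DENATURATION[2] + N_CYCLES * per_cycle + FINAL_EXTENSION[2]
```

Under these assumptions the programmed holds add up to 300 + 30 × 100 + 600 = 3900 s, i.e. 65 min.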

Amplification products were separated using 20% polyacrylamide gel. Some loci did not amplify in all samples even after we adjusted the PCR conditions; these loci were excluded from further testing. In addition, only loci that showed polymorphism were considered for genotyping analyses. Fluorescently labelled primers were then synthesized for these loci to verify the fragment lengths visualized on the polyacrylamide gel.

Genotyping

Forward primers (table 1 in electronic supplementary material at http://www.ias.ac.in/jgenet/) were labelled with the FAM or HEX dye at the \(5^{\prime }\)-end. The PCR conditions were the same as described above. The amplified products were detected on an ABI 3130xl Genetic Analyzer and scored using GeneMapper software (Applied Biosystems, Foster City, USA).

Microsatellite data analysis

Genetic parameters of the polymorphic microsatellite loci, namely polymorphism information content (PIC), the number of alleles (\(N_{\mathrm{A}}\)), observed heterozygosity (\(H_{\mathrm{O}}\)) and expected heterozygosity (\(H_{\mathrm{E}}\)), were calculated using POPGENE 1.32 (Quardokus 2000). Possible deviations from the Hardy–Weinberg equilibrium (HWE) were tested by Fisher’s exact test with Bonferroni correction.
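For readers who want to see how these per-locus statistics are defined, a minimal sketch is given below. It is not the POPGENE implementation: it uses the uncorrected expected heterozygosity \(1 - \sum p_i^2\) and the standard PIC formula \(1 - \sum p_i^2 - \sum_{i<j} 2p_i^2 p_j^2\), so values may differ slightly from software that applies a sample-size correction. The function names, the genotype encoding and the significance level of 0.05 used in the usage note are assumptions for illustration.

```python
from collections import Counter
from itertools import combinations
from typing import Dict, List, Tuple

Genotype = Tuple[int, int]  # two allele sizes (bp) of one diploid individual


def allele_frequencies(genotypes: List[Genotype]) -> Dict[int, float]:
    """Relative frequency of each allele across all gene copies."""
    counts = Counter(allele for genotype in genotypes for allele in genotype)
    total = sum(counts.values())
    return {allele: n / total for allele, n in counts.items()}


def locus_summary(genotypes: List[Genotype]) -> Dict[str, float]:
    """Number of alleles, observed and expected heterozygosity, and PIC."""
    p = list(allele_frequencies(genotypes).values())
    h_o = sum(1 for a, b in genotypes if a != b) / len(genotypes)
    h_e = 1.0 - sum(x * x for x in p)  # uncorrected expected heterozygosity
    pic = h_e - sum(2 * pi ** 2 * pj ** 2 for pi, pj in combinations(p, 2))
    return {"NA": len(p), "HO": h_o, "HE": h_e, "PIC": pic}


def bonferroni_alpha(alpha: float, n_tests: int) -> float:
    """Per-test significance threshold after Bonferroni correction."""
    return alpha / n_tests
```

With 28 loci tested and an assumed nominal level of 0.05, `bonferroni_alpha(0.05, 28)` gives the per-locus threshold against which each HWE p-value would be compared.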

Results and discussion

In this study, 47,989 unigenes generated from RNA-seq data were used to detect potential microsatellite loci. A total of 9471 SSRs were identified in 7832 sequences, of which 1354 contained more than one SSR (table 1). Seventy motifs were obtained, of which the most frequent was AC/GT (428, 54.65%), followed by AG/CT (406, 16.43%), ATC/ATG (138, 5.02%), AGG/CTT (123, 3.99%), AAG/CTT (101, 4.35%) and GTA/CAT (88, 3.79%) (table 1 in electronic supplementary material). Detailed analysis showed that dinucleotide repeats were the most frequent (72.53%), followed by trinucleotide (22.56%) and tetranucleotide (4.56%) repeats. SSRs with nine tandem repeats (1980, 20.90%) were the most common, followed by those with eight tandem repeats (1333, 14.07%) (figure 2).

To test the applicability and polymorphism of the SSR markers, 120 primer pairs were chosen randomly and validated across 48 P. sinensis individuals collected from the Dadu river (dot in figure 1). Of the 120 primer pairs, only 86 (71.67%) amplified successfully. Twenty-eight of the microsatellite loci showed polymorphism (table 2). Fluorescently labelled primers were further synthesized for these loci. The results showed that the number of alleles (\(N_{\mathrm{A}}\)) for each locus ranged from 2 to 14 and the mean number of alleles per locus was 7. The observed heterozygosity (\(H_{\mathrm{O}}\)) and expected heterozygosity (\(H_{\mathrm{E}}\)) varied from 0.104 to 0.958 and from 0.157 to 0.844, with averages of 0.583 and 0.613, respectively (table 2). Twenty loci exhibited high polymorphism (\(\hbox {PIC} {>} 0.5\)). Across all samples, 14 of the 28 loci showed significant departures from the HWE (table 2).

P. sinensis is an endemic species with a narrow distribution that faces threats from human disturbance and habitat destruction. Thus, it is crucial that the current resources of P. sinensis be protected. The microsatellite markers developed in our study serve as a useful tool for conservation genetic studies and population evaluation of P. sinensis.