Introduction

The Neotropical region, particularly in Brazil, harbors the highest diversity of freshwater fish worldwide (Reis et al. 2016) as a result of a long and complex biogeographic history driven by continental drift and geomorphological dynamics of hydrographic basins (Lundberg et al. 1998; Ribeiro 2006; Ribeiro et al. 2018). This scenario is conspicuously intricate for the coastal basins in eastern Brazil because of the several rearrangements of their drainages related to tectonic activities and sea level fluctuations (Bizerril 1994; Ribeiro 2006; Ribeiro et al. 2016; Thomaz and Knowles 2018). In this sense, both headwater capture events (Christofoletti 1975; Bishop 1995) and retraction of sea level could have determined the past connection in paleodrainages, followed by their separation into the isolated hydrographic basins observed at present (Dias et al. 2014; Thomaz and Knowles 2018). Most likely, these forces have played a major role in the high richness and endemism of ichthyofauna from coastal drainages in eastern Atlantic basins (Gery 1969; Bizerril 1994). Over the last decade, the number of endemic species (e.g. Zanata and Camelier 2009; Severi et al. 2010), cryptic forms (e.g. Bitencourt et al. 2011; Medrado et al. 2012, 2015; Oliveira et al. 2016; Souza et al. 2018) and undescribed or new taxa (e.g. Cetra et al. 2009, 2010; Trindade et al. 2010; Camelier and Zanata 2014; Barreto et al. 2018) in rivers from northeastern Brazil has increased. In this context, DNA barcode (Hebert et al. 2003) studies have been highly efficient in resolving taxonomic uncertainties and accurately describing the richness of the Neotropical ichthyofauna (Pereira et al. 2013; Barreto et al. 2018; Anjos et al. 2020).

Nonetheless, little is known about the spatial–temporal diversification patterns in fish species from eastern coastal basins, properly recognized as areas of “insufficient knowledge” (Nogueira et al. 2010). To change this scenario, small-sized and/or philopatric taxa can be used as potential models to infer biogeographic processes and speciation of freshwater fish in Neotropical rivers since they usually present restricted gene flow, being particularly susceptible to vicariance effects and allopatric isolation, thus accumulating high rates of endemism (Castro 1999; Montoya-Burgos 2003). This is particularly true for the dwarf plecos (Loricariidae: Hypoptopomatinae), a group of benthic and low-vagility species commonly found in habitats such as rapids and waterfalls in several eastern basins in South America (Roxo et al. 2017a).

The genus Parotocinclus is one of the most diverse taxa within Hypoptopomatinae, comprising 34 recognized species so far (Eschmeyer et al. 2019). However, their taxonomic and evolutionary relationships remain mostly unclear, including evidence of undescribed species and taxa that should be reallocated to other genera (Gauger and Buckup 2005; Roxo et al. 2014) or to distinct tribes (Roxo et al. 2019). Moreover, the “diagnostic” morphological features of several species in this genus are subtle and usually too imprecise to differentiate them properly (Armbruster 2004; Cramer et al. 2011). Taxonomic reports in isolated coastal basins from the state of Bahia, northeastern Brazil, identified five species of Parotocinclus: P. cristatus (Garavello 1977), P. jimi (Garavello 1977), P. minutus (Garavello 1988), P. bahiensis (Britski and Garavello 2009) and P. arandai (Sarmento-Soares et al. 2009).

Like most species from these overlooked river basins, P. cristatus lacks detailed biological information; the available knowledge is restricted to its taxonomic identification and distribution. Originally described only from the Almada River basin, southern Bahia (Garavello 1977), the distribution of this species was further expanded to three adjacent drainages, all characterized by intensive human occupation and environmental degradation: Cachoeira and Una (southern range) and Contas (northern range) basins (Schaefer 2003; Cetra et al. 2009; Camelier and Zanata 2014). P. cristatus is thus restricted to four coastal hydrographic systems in the Eastern Atlantic within the Northeastern Atlantic Forest (NAF) ecoregion, suggesting that its populations share a common biogeographic history (Camelier and Zanata 2014). However, recent studies based on genetic data in other taxa of Neotropical fish from these basins have detected putative new species and evolutionary relationships among lineages that contradict the pattern expected from present hydrogeological configurations (Barreto et al. 2018; Souza et al. 2018).

The taxonomic uncertainties of Parotocinclus and their biological features (low vagility, small body size, abundance and site-fidelity) make them appealing models to trace phylogeographic history and to assess the actual diversity of regional species from isolated coastal basins in northeastern Brazil based on molecular markers. These data are particularly important because most rivers from this region combine high endemism, scarce information and increased human impacts (pollution, damming, deforestation of margins, and introduction of non-native species), being regarded as hotspots for the conservation of aquatic organisms (Menezes et al. 2007; Cetra et al. 2010; Nogueira et al. 2010; Gomes et al. 2012). Therefore, we carried out the first phylogeographic analysis of P. cristatus throughout its entire range, in order to test the hypothesis of cryptic diversity and to infer its genetic structure along the coastal drainages in NAF. Besides shedding some light on the intricate biogeographic processes of eastern Atlantic rivers, the present results also stress the importance of these areas to biodiversity conservation in times when environmental issues and support to science in Brazil have been neglected.

Methods

Sampling and data collection

We collected 108 specimens of Parotocinclus cristatus (4–19 specimens per locality) in the main river and tributaries from Upper, Middle and Lower Almada basin as well as Contas, Cachoeira and Una basins (Fig. 1a, b), encompassing the known range of this species. In addition, three specimens of Parotocinclus jimi were also sampled in a tributary from Middle Contas River basin (see Online Resource, ESM 1) for comparative analyses. The permission for collecting these taxa was granted by the Chico Mendes Institute of Biodiversity – ICMBio (license SISBIO n. 26752). The euthanasia (according to Blessing et al. 2010) and experimental procedures were approved by the Ethics Committee of Utilization of Animals from Universidade Estadual do Sudoeste da Bahia (CEUA/UESB, number 32/2013). Voucher specimens were identified and deposited in the fish collection at Universidade Estadual Paulista (UNESP) in Botucatu-SP, National Institute of Atlantic Forest in Santa Teresa-ES, and in the Zoology Museum at Universidade Federal da Bahia (UFBA) in Salvador-BA.

Fig. 1

Map of sampled collection sites (a) of Parotocinclus cristatus (b) in northeastern Brazil. The detail in b highlights the tuft of denticles on occiput, a main diagnostic feature for this species. See online version for colored figure

The genomic DNA was isolated from muscle tissues (stored at −20 °C in 100% ethanol) using the Wizard® Genomic DNA Purification (Promega) kit according to the manufacturer’s instructions. Two fragments of the mitochondrial DNA (mtDNA) genes Cytochrome c Oxidase subunit I (COI) and Cytochrome b (Cyt-b) were amplified using the VF1_t1/VR1_t1 (Ivanova et al. 2007) and GluDG.L/H16460 (Perdices et al. 2002) primer sets, respectively. A nuclear fragment of the Rhodopsin gene was amplified using Rod-F2w/Rod-R4n primers (Sevilla et al. 2007). Each PCR (Polymerase Chain Reaction) comprised 1 × buffer, MgCl2 at 2 mM, 0.2 ng/μL of each primer, 0.2 mM of dNTPs, 50 ng of template DNA, 0.04 U/μL of Taq DNA polymerase (Invitrogen) and ultrapure water to a final volume of 15 μL. Annealing temperature tests for each primer set, checked by electrophoresis in 1.2% agarose gel, revealed single bands at all tested temperatures (50–60 °C). Thus, the PCR conditions were: a first denaturation step at 95 °C for 4 min; 35 cycles at 95 °C (40 s), 54 °C (40 s) and 72 °C (90 s), and a final extension step at 72 °C for 7 min.

The PCR products were purified in 20% polyethylene glycol (PEG) according to Paithankar and Prasad (1991) and the sequencing reactions were performed bidirectionally using BigDye Terminator v. 3.1 Cycle Sequencing (Applied Biosystems/Life Technologies, USA) according to the manufacturer’s instructions. After precipitation in EDTA (125 mM), 100% and 70% ethanol, the sequences were automatically detected in an ABI 3500 XL Genetic Analyzer sequencer (Applied Biosystems/Life Technologies).

DNA sequence analyses

The DNA fragments were checked using BLAST (Basic Local Alignment Search Tool) in NCBI (National Center for Biotechnology Information, http://www.ncbi.nlm.nih.gov) and compared to sequences available in GenBank (http://www.ncbi.nlm.nih.gov/genbank) for homologies. The consensus sequences were aligned using the Clustal W tool (Thompson et al. 1994) after editing in BioEdit Sequence Alignment Editor 7.2.6.1 (Hall 1999). In the case of Rhodopsin fragments, heterozygous sites were manually coded according to the IUPAC (International Union of Pure and Applied Chemistry) ambiguity code.
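As an illustration of the IUPAC coding of heterozygous sites, the following minimal Python sketch collapses two aligned alleles into a single consensus sequence. The function name `merge_alleles` and the input strings are hypothetical; the editing in this study was done manually in BioEdit, not with this code.

```python
# IUPAC ambiguity codes for the six possible two-base combinations.
IUPAC_TWO_BASE = {
    frozenset("AG"): "R",
    frozenset("CT"): "Y",
    frozenset("CG"): "S",
    frozenset("AT"): "W",
    frozenset("GT"): "K",
    frozenset("AC"): "M",
}


def merge_alleles(allele_a: str, allele_b: str) -> str:
    """Collapse two aligned alleles (A/C/G/T only) into one IUPAC-coded sequence."""
    if len(allele_a) != len(allele_b):
        raise ValueError("alleles must be aligned to the same length")
    consensus = []
    for a, b in zip(allele_a.upper(), allele_b.upper()):
        # Homozygous sites keep the base; heterozygous sites get the ambiguity code.
        consensus.append(a if a == b else IUPAC_TWO_BASE[frozenset((a, b))])
    return "".join(consensus)


print(merge_alleles("ACGT", "ACAT"))
```

Gaps or pre-existing ambiguity codes are not handled in this sketch and would raise a `KeyError`.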

All contigs were translated to check whether they represented coding regions free of stop codons by using the software MEGA v. 10 (Kumar et al. 2018). The COI sequences as well as the fragments from other genes were uploaded in the BOLD (Barcode of Life Database) platform (project “Phylogeography of Parotocinclus cristatus—PCA” as part of the FISHBOL campaign). The final alignment and editing were performed along with a total of 36 DNA sequences from other Parotocinclus species, 19 sequences of closely related genera from Hypoptopomatinae and six representatives of Loricariidae (used as outgroups) available in BOLD and GenBank (Online Resource, ESM 2, ESM 3).

Genetic structure

To infer the interrelationships among samples, a haplotype network was built for each locus based on the median-network method (Bandelt et al. 1999) in the software PopART (http://popart.otago.ac.nz). For the following analyses, we used only the mitochondrial markers since the phylogeographic signal of Rhodopsin sequences was weak (see Results).

The number of polymorphic sites (S), total number of mutations (Eta), and haplotype (h) and nucleotide (π) diversity were estimated using the software DNA Sequence Polymorphism (DnaSP) v. 6 (Rozas et al. 2017). The same software was used to assess the demographic history through Tajima’s D (Tajima et al. 1998), Fu’s Fs (Fu 1997) and R2 (Ramos-Onsins and Rozas 2002) tests based on 10,000 coalescent simulations. The Extended Bayesian Skyline Plot (EBSP) (Heled and Drummond 2008) available in the software BEAST v. 1.10.4 (Suchard et al. 2018) was used to infer putative changes in effective population size over time. Two independent EBSP runs were performed for each phylogroup with 10 million generation chains sampled every 1000 generations and 10% burn-in, following the strict-clock model, mutation rate of 1% under a normal distribution prior and GTR + G substitution model as estimated by jModelTest v. 2.1.10. The quality and the convergence of runs and the coalescence graph were verified using Tracer v. 1.7.1 (Rambaut et al. 2018).
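For readers unfamiliar with these summary statistics, the sketch below shows how S, h and π can be computed from a set of aligned sequences using the standard textbook definitions. It is not the DnaSP implementation; the function names and the toy alignment are hypothetical, and gaps or missing data are not treated specially.

```python
from collections import Counter
from itertools import combinations


def segregating_sites(seqs):
    """Number of alignment columns with more than one state (S)."""
    return sum(len(set(column)) > 1 for column in zip(*seqs))


def haplotype_diversity(seqs):
    """h = n / (n - 1) * (1 - sum of squared haplotype frequencies)."""
    n = len(seqs)
    freqs = [count / n for count in Counter(seqs).values()]
    return n / (n - 1) * (1 - sum(p * p for p in freqs))


def nucleotide_diversity(seqs):
    """pi = mean number of pairwise differences per site."""
    length = len(seqs[0])
    pairs = list(combinations(seqs, 2))
    diffs = sum(sum(a != b for a, b in zip(s, t)) for s, t in pairs)
    return diffs / (len(pairs) * length)


toy = ["AAAA", "AAAA", "AAAT", "AATT"]  # hypothetical alignment
print(segregating_sites(toy), haplotype_diversity(toy), nucleotide_diversity(toy))
```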

The genetic structure was inferred by a spatial analysis of molecular variance (SAMOVA) in the software SAMOVA 2.0 (Dupanloup et al. 2002) with the number of groups (k) ranging from 2 to 7 with 1000 simulations for each k value. The most suitable k was established according to the highest structure among groups based on FCT and FST index values, as recommended by the abovementioned authors. In addition, a Bayesian Analysis of Population Structure (BAPS) was also carried out in the software BAPS v. 6.0 (Corander et al. 2008), assuming admixture model and 500 repetitions per individual.

The pairwise FST values and gene flow (Nm) were used to evaluate the genetic differentiation between pairs of populations and the partition of the genetic variation within and among populations. Both were determined by an analysis of molecular variance (AMOVA) (Excoffier et al. 1992) in the software Arlequin v. 3.5.2.2 (Excoffier and Lischer 2010) with 10,000 random permutations. The significance levels of pairwise FST values were adjusted using Bonferroni correction (Rice 1989). The Rcmd function available in Arlequin v. 3.5.2.2 was used to plot the pairwise FST values. The correlation between geographic (straight line between sites) and genetic distances was estimated by Mantel’s test (Mantel 1967) in the web server IBDWS 3.23 (http://ibdws.sdsu.edu/~ibdws/) (Jensen et al. 2005) for the samples from Almada River basin, since only one location was sampled along the Cachoeira, Una and Contas basins.
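The relationships behind these analyses can be sketched as follows. This is a simplified illustration under stated assumptions, not the Arlequin or IBDWS code: Nm is derived from FST under Wright's island model, where the denominator factor (2 is commonly used for haploid mtDNA, 4 for diploid nuclear loci) is an assumption the user must set; the Bonferroni threshold divides alpha by the number of pairwise comparisons; and the Mantel test is a plain permutation test on Pearson's r. All function names and the example matrices are hypothetical.

```python
import random
from itertools import combinations
from math import sqrt


def nm_from_fst(fst, factor=2):
    """Island-model gene flow estimate: Nm = (1 - FST) / (factor * FST)."""
    if not 0 < fst < 1:
        raise ValueError("FST must lie strictly between 0 and 1")
    return (1 - fst) / (factor * fst)


def bonferroni_alpha(alpha, n_populations):
    """Per-test significance level for all pairwise population comparisons."""
    n_tests = n_populations * (n_populations - 1) // 2
    return alpha / n_tests


def _pearson(x, y):
    mx, my = sum(x) / len(x), sum(y) / len(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)


def mantel(geo, gen, permutations=999, seed=0):
    """One-tailed Mantel test: returns (r, p) for two square distance matrices."""
    n = len(geo)
    pairs = list(combinations(range(n), 2))
    geo_vec = [geo[i][j] for i, j in pairs]
    r_obs = _pearson(geo_vec, [gen[i][j] for i, j in pairs])
    rng = random.Random(seed)
    order = list(range(n))
    hits = 1  # the observed arrangement counts as one permutation
    for _ in range(permutations):
        rng.shuffle(order)  # relabel sites in the genetic matrix
        permuted = [gen[order[i]][order[j]] for i, j in pairs]
        if _pearson(geo_vec, permuted) >= r_obs - 1e-12:
            hits += 1
    return r_obs, hits / (permutations + 1)


# Hypothetical sites along a river; genetic distance grows linearly with distance.
positions = [0, 1, 3, 7, 15]
geo = [[abs(a - b) for b in positions] for a in positions]
gen = [[2 * d for d in row] for row in geo]
print(nm_from_fst(0.25), bonferroni_alpha(0.05, 7), mantel(geo, gen))
```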

DNA barcoding and species delimitation methods

Following the standard procedure in DNA barcode analyses (Hebert et al. 2003), a pairwise distance matrix using the Kimura-2-parameter (K2P) model (Kimura 1980) based on COI fragments was obtained in MEGA v. 10 (Kumar et al. 2018) and used to generate a Neighbor-Joining (NJ) tree (Saitou and Nei 1987) with 1000 bootstrap replicates (Online Resource, ESM 4).
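The K2P distance corrects the observed proportion of differences for multiple substitutions while treating transitions and transversions separately: d = −½ ln(1 − 2P − Q) − ¼ ln(1 − 2Q), where P and Q are the proportions of transitional and transversional differences. A minimal Python sketch of this formula is given below; the function name and test sequences are hypothetical, and sites with gaps or ambiguity codes are simply skipped (an assumption comparable to pairwise deletion).

```python
from math import log, sqrt

PURINES = {"A", "G"}
PYRIMIDINES = {"C", "T"}


def k2p_distance(seq1, seq2):
    """Kimura 2-parameter distance between two aligned sequences."""
    transitions = transversions = compared = 0
    for a, b in zip(seq1.upper(), seq2.upper()):
        if a not in "ACGT" or b not in "ACGT":
            continue  # skip gaps and ambiguity codes
        compared += 1
        if a == b:
            continue
        # A<->G and C<->T are transitions; every other change is a transversion.
        if {a, b} <= PURINES or {a, b} <= PYRIMIDINES:
            transitions += 1
        else:
            transversions += 1
    if compared == 0:
        raise ValueError("no comparable sites")
    p = transitions / compared
    q = transversions / compared
    return -0.5 * log((1 - 2 * p - q) * sqrt(1 - 2 * q))


# One transition in ten sites: slightly above the uncorrected p-distance of 0.1.
print(k2p_distance("AAAAAAAAAA", "GAAAAAAAAA"))
```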

Considering the robustness of the COI marker for discriminating species (Ratnasingham and Hebert 2013) and that the use of combined species delimitation methods increases their reliability (e.g. Fujisawa and Barraclough 2013; Tang et al. 2014; Anjos et al. 2020), we selected five different methods based on distance and coalescence approaches to analyze the COI fragments. The sequences stored in BOLD were validated as barcode tags, being automatically assigned to Barcode Index Numbers (BINs) through the algorithm RESL (Refined Single Linkage) (Ratnasingham and Hebert 2013). Additionally, ABGD (Automatic Barcode Gap Discovery) (Puillandre et al. 2012) was performed after inclusion of the pairwise genetic distance of COI sequences in a free online platform (http://wwwabi.snv.jussieu.fr/public/abgd/). In the case of the General Mixed Yule-Coalescent (GMYC) analysis (Fujisawa and Barraclough 2013), an ultrametric tree built in the software BEAST 1.10.4 (Suchard et al. 2018) was used as input through the web server http://species.h-its.org/gmyc/ after estimation of the best-fit substitution model by jModelTest 2.1.10 (Darriba et al. 2012). For the Bayesian Poisson Tree Process (bPTP) analysis (Zhang et al. 2013) (also available in https://species.h-its.org/) and Multi-rate Poisson Tree Process (mPTP) (Kapli et al. 2017) (https://mptp.h-its.org/#/tree) a Maximum Likelihood (ML) tree was built in the software RAxML 8.2.10 (Stamatakis 2014) using the CIPRES Science Gateway v. 3.3 (http://www.phylo.org/index.php) (Miller et al. 2010). The input file for bPTP included one individual per haplotype to avoid generating an unrealistic number of species (Blair and Bryson 2017) while the mPTP encompassed the complete sequence database, as recommended by Kapli et al. (2017).
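Reducing a dataset to one individual per haplotype, as done for the bPTP input, amounts to keeping the first sequence of each group of identical sequences. The Python sketch below illustrates one way to do this; the function name and sample identifiers are hypothetical, and the sequences are assumed to be aligned and of equal length.

```python
def collapse_to_haplotypes(records):
    """Keep one representative per unique sequence.

    `records` maps sample IDs to aligned sequences. Returns a dict with the
    first sample of each haplotype and a list of the sample IDs sharing it.
    """
    representatives = {}
    members = {}
    for sample_id, seq in records.items():
        key = seq.upper()
        if key not in members:
            members[key] = []
            representatives[sample_id] = seq
        members[key].append(sample_id)
    return representatives, list(members.values())


records = {"ind1": "ACGT", "ind2": "acgt", "ind3": "ACGA"}  # hypothetical samples
print(collapse_to_haplotypes(records))
```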

Phylogenetic and divergence time inferences

In order to evaluate the evolutionary relationships among Parotocinclus representatives, phylogenetic trees based on ML and Bayesian inference (BI) were generated from the three amplified loci (COI, Cyt-b and Rhodopsin) using the platform CIPRES Science Gateway 3.3 (http://www.phylo.org/index.php) (Miller et al. 2010). The ML inference was carried out in RAxML 8.2.10 (Stamatakis 2014) using the GTR + G model and 1000 bootstrap replicates, while BI was performed in MrBayes v. 2.6.3 (Huelsenbeck and Ronquist 2001) using the best-fit substitution model for each locus (GTR + I + G for COI, GTR + G for Cyt-b and GTR + I + G for Rhodopsin). The BI phylogenies comprised two independent runs of 10 million generations with four Markov chains, sampling a tree every 1000 generations and a burn-in of 10%. All models were estimated in jModelTest 2.1.10 (Darriba et al. 2012) using the Akaike Information Criterion (AIC) for ML and the Bayesian Information Criterion (BIC) for BI.

The divergence time was estimated from a calibrated tree in the software BEAST v. 1.10.4 (Suchard et al. 2018) based on the COI database since it encompassed the highest number of sequences and because the mutation rate for this locus (~ 1%) is well established for fishes (Strecker et al. 2004; Ornelas-García et al. 2008; Thomaz et al. 2015). We assumed a strict-clock model as supported by the ML clock test in the software MEGA v. 10 (Kumar et al. 2018). Therefore, we carried out two runs and four chains for 100 million generations sampling every 1000 steps, following a mutation rate of 1% per million years, GTR + I + G substitution model, and a normal distribution prior. The quality and the convergence of runs were evaluated in Tracer v. 1.7.1 (Rambaut et al. 2018) to check if the Effective Sample Size (ESS) values were above 200. The final tree was generated in TreeAnnotator v. 1.10.4 (Suchard et al. 2018) using 10% of burn-in and the branch support was based on posterior probabilities (PP). All trees were visualized and edited in FigTree v. 1.4.4 (http://tree.bio.ed.ac.uk/software/figtree/) and Inkscape v. 0.92.3 (https://inkscape.org/).
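The logic of a strict clock can be illustrated with a back-of-the-envelope calculation: if both lineages accumulate substitutions at the same rate μ since their split, a pairwise distance d corresponds to a divergence time of roughly t = d / (2μ). The Python sketch below assumes that the 1% per million years figure is a per-lineage rate (an assumption; if it were a pairwise divergence rate, the factor of 2 would be dropped) and uses an illustrative distance of 5%. It is not a substitute for the Bayesian estimates from BEAST, which also account for the substitution model, the tree prior and uncertainty.

```python
def divergence_time_my(pairwise_distance, rate_per_lineage_per_my=0.01):
    """Rough strict-clock divergence time (million years): t = d / (2 * mu).

    `pairwise_distance` is a proportion (0.05 = 5%); the default rate of
    0.01 substitutions/site/My per lineage is an assumption of this sketch.
    """
    if rate_per_lineage_per_my <= 0:
        raise ValueError("rate must be positive")
    return pairwise_distance / (2 * rate_per_lineage_per_my)


# An illustrative 5% corrected distance maps to about 2.5 million years.
print(divergence_time_my(0.05))
```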

Results

Genetic structure

The haplotype network based on mitochondrial genes supported the high genetic differentiation between Contas, Almada and Cachoeira + Una groups as well as the structure between samples from Lower and Middle/Upper portions in Almada River basin. On the other hand, the haplotype network based on Rhodopsin sequences discriminated only two haplotypes corresponding to samples from Cachoeira + Una and Almada + Contas basins (Fig. 2a). For the Cyt-b, the group from Middle/Upper Almada River presented the highest levels of haplotype (h) and nucleotide (π) diversity, as well as the only significant signal of population expansion (Table 1). Moreover, the demographic history inferred from EBSP showed a subtle expansion in this group between 80 and 40 thousand years ago, followed by population stability ever since (Online Resource, ESM 5).

Fig. 2

Summary analysis of population structure within Parotocinclus cristatus: (a) haplotype network based on COI, Cyt-b and Rhodopsin markers. The circles represent the haplotypes (their size is proportional to the frequency of each haplotype), while hatch marks indicate the number of mutations among haplotypes. Distinct colors are used to represent the collection sites. The boxes bounding the groups in the COI and Cyt-b haplotype networks show the clusters obtained after Bayesian Analysis of Population Structure (BAPS) based on mitochondrial DNA sequences; (b) spatial analysis of molecular variance (SAMOVA) in COI and Cyt-b fragments with estimation of fixation index values for each population cluster (k) after 1000 partition simulations. See online version for colored figure

Table 1 Population genetic parameters based on COI and Cyt-b for each phylogroup in Parotocinclus cristatus

According to SAMOVA, the grouping composed of four clusters recovered simultaneously the highest FST and FCT values (Fig. 2b). On the other hand, the highest posterior probability in BAPS supported the formation of four and three clusters for the Cyt-b and COI markers, respectively (Fig. 2a). Indeed, the AMOVA considering three hierarchical levels revealed that about 89% to 95% of the genetic variation is explained by the differentiation among Contas, Almada and Cachoeira + Una lineages (Table 2).

Table 2 Analysis of Molecular Variance (AMOVA) among and within Contas, Almada, and Cachoeira + Una genetic groups of Parotocinclus cristatus

The pairwise FST values and gene flow (Nm) were congruent with the structure pattern indicated by both mitochondrial markers (Fig. 3a, c). The non-significant FST values (p ≥ 0.05) and high Nm values (> 2) (according to Hebert 2004) between samples from the Middle and Upper Almada indicate active gene flow between these locations. On the other hand, the samples from the Lower portion of the Almada River basin were highly structured in relation to the others in the same basin, with low gene flow (FST > 0.25; Nm < 1; p < 0.05) (according to Wright 1978). Similarly, high and significant FST values (p < 0.05) and a relatively low gene flow (Nm = 1.3) were observed between the samples from Cachoeira and Una River basins. A significantly positive correlation (p < 0.05) was observed between genetic and geographic distance among samples in the Almada River basin (Fig. 3b, d).

Fig. 3

Heatmap highlighting the pairwise FST values and respective significant levels after Bonferroni correction (NS = non-significant, * = significant) (a, c) and graph of Mantel’s test (b, d) based on COI (a, b) and Cyt-b (c, d) fragments from samples of Parotocinclus cristatus. The Mantel’s test was restricted to the samples from Almada River basin. See online version for colored figure

DNA barcoding and species delimitation methods

The COI fragments (589 bp) revealed high genetic distances among samples (Online Resource, ESM 4). The mean genetic distance considering all collected specimens morphologically identified as P. cristatus was 4.4%. Nonetheless, three genetic groups were consistently recovered with intraspecific divergence below 0.5%: (1) the samples from Almada River basin (0.4%), (2) the samples from Cachoeira and Una River basins (0.2%) and (3) the specimens from Contas River basin (0%). The genetic distance between the Cachoeira + Una group and the other samples (Almada and Contas) was 9%. The genetic distance between samples from Almada and Contas River basins was 5%.

When the COI sequences of P. cristatus from the present study were compared to those available for congeneric species, the genetic divergence ranged from 10% (Parotocinclus sp1 vs. Cachoeira + Una group) to 25% (Parotocinclus sp4 vs. Almada group). In relation to other genera in the subfamily, the distance values varied from 9% (Pseudotothyris obtusa vs. Contas group) to 21% (Otothyris travassosi vs. Cachoeira + Una group) (Online Resource, ESM 4).

In relation to the species delimitation methods, the RESL algorithm available in BOLD platform discriminated four MOTUs: BOLD:ADI4749 (Almada group), BOLD:ADI4931 (Contas group), BOLD:ADH9476 (Cachoeira + Una group) and BOLD:ADP2528 (Parotocinclus jimi, middle Contas River). The same pattern was obtained by ABGD (p < 0.05), GMYC, bPTP (JOB ID: 48393) and mPTP analyses (Fig. 4).

Fig. 4

DNA barcode analysis in Parotocinclus cristatus. Three asterisks on branches represent high support values for all inferences (NJ > 98%, ML > 98%, BI > 0.9). Two asterisks indicate that at least two inferences were highly supported (> 98%, > 0.9). The columns on right summarize the results obtained by distinct species delimitation algorithms. See online version for colored figure

Phylogenetic and divergence time inferences

The topologies of the NJ, ML and BI trees were similar for the COI and Cyt-b (835 bp) markers (Fig. 4; Online Resource, ESM 6). High bootstrap and posterior probability values supported the same clusters indicated by the species delimitation methods, except for the bootstrap values (< 98%) in the ML inference based on COI data for the Almada group (Fig. 4). In addition, the BI based on Cyt-b sequences recovered two groups within the samples from Almada River basin, separating the individuals from Lower and Middle/Upper portions (Online Resource, ESM 6). The groups from Contas and Almada River basins were closely related, being placed as the sister group of the Cachoeira + Una cluster. On the other hand, the interrelationships among the sampled specimens, other congeneric species and other representatives of Hypoptopomatinae had low support values for both mitochondrial markers. The only exception was the grouping among P. jimi, Parotocinclus sp2, P. cearensis and P. spilosoma observed in the BI tree based on COI data (Fig. 4).

As expected for nuclear genes, the ML and BI phylogenetic reconstruction based on the Rhodopsin sequences (473 bp) revealed less accentuated phylogenetic structure (Online Resource, ESM 6). Nonetheless, two major clusters were recovered, represented by samples from Almada + Contas River basins and the populations from Cachoeira + Una basins.

Two groups emerged around 4 million years ago (mya) (95% Highest Posterior Density, HPD: 2.79–4.76) in the Early Pliocene, when the Cachoeira + Una group diverged from the Contas + Almada cluster (Fig. 5). The subsequent cladogenetic event was estimated around 2.15 mya (95% HPD: 1.47–2.87) during the Late Pliocene, leading to the separation between groups from Contas and Almada River basins. The divergence between the groups from Lower and Middle/Upper portions of Almada basin took place about 0.63 mya (95% HPD: 0.41–0.89) (Pleistocene).

Fig. 5

Timetree for Parotocinclus cristatus based on COI dataset (the posterior probability values are shown below the nodes). The numbers in brackets and the blue bars indicate the 95% Highest Posterior Densities (HPD). See online version for colored figure

Discussion

Cryptic diversity in Parotocinclus cristatus

The taxonomic status and the phylogenetic relationships within Hypoptopomatinae are particularly confusing as evidenced by several reports of new descriptions, redescriptions and reallocations of genera and tribes in which putative diagnostic features have proved to be polymorphic or shared among other taxa (e.g. Roxo et al. 2015, 2017b, 2019; Ramos et al. 2016; Lehmann et al. 2018). Moreover, many Neotropical fish species can be more widely distributed than previously recognized (Garavello 1977; Schaefer 2003; Camelier and Zanata 2014), which jeopardizes the definition of their actual range. The resulting biased estimates of richness (Linnean shortfall) and species ranges (Wallacean shortfall) are critical to conservation efforts and environmental policies since threatened species might remain ignored (Bini et al. 2006; Casciotta et al. 2013; Hortal et al. 2015). Currently, DNA-based approaches have been successfully used to resolve both shortfalls in Neotropical ichthyofauna and to infer their evolutionary history, with emphasis on DNA barcoding using COI sequences (e.g. Roxo et al. 2017a; Barreto et al. 2018; Souza et al. 2018).

Accordingly, the genetic divergence among the populations of P. cristatus based on COI indicated a complex pattern of cryptic diversity. The mean intraspecific distance within P. cristatus (4.4%) was ~ 3.4 times lower than that observed among congeneric species (14.8%) (Online Resource, ESM 4). On the other hand, three population groups presented reduced intraspecific divergence (0 to 0.4%) but remarkably high genetic differentiation from each other (5 to 9%), as follows: (1) samples from Cachoeira and Una River basins, (2) the specimens from Almada River basin, and (3) the population of P. cristatus from Contas River (Online Resource, ESM 4). Indeed, taking the genetic distance among and within these groups into account, the barcode gap is above the tenfold threshold established for discriminating species in barcoding studies (Hebert et al. 2003). Similarly, the genetic divergence among these groups is much higher than the minimum value of 2%, widely used to indicate interspecific differences in fishes (e.g. Ward et al. 2009). Therefore, the genetic differentiation among groups from Cachoeira + Una, Almada, and Contas hydrographic basins is compatible with the existence of, at least, three MOTUs (see Floyd et al. 2002), characterizing P. cristatus as a species complex.
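The barcode-gap argument can be checked with simple arithmetic on the values reported above. The sketch below uses one common reading of the tenfold rule, in which the threshold is ten times the mean within-group divergence; that reading, the function name and the way the values are grouped are assumptions of this illustration.

```python
def tenfold_rule(intra_pct, inter_pct, factor=10):
    """Return (threshold, passed) for the tenfold barcode-gap criterion.

    threshold = factor * mean within-group distance; the criterion is met
    when the smallest between-group distance exceeds that threshold.
    """
    mean_intra = sum(intra_pct) / len(intra_pct)
    threshold = factor * mean_intra
    return threshold, min(inter_pct) > threshold


# Within-group (Almada, Cachoeira + Una, Contas) and between-group distances, in %.
threshold, passed = tenfold_rule([0.4, 0.2, 0.0], [9.0, 5.0])
print(threshold, passed)
```

With these values the threshold is about 2%, which coincides with the 2% cutoff commonly applied to fishes, and the smallest between-group distance (5%) exceeds it. A stricter reading based on the largest within-group distance (10 × 0.4% = 4%) is also exceeded.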

Corroborating this suggestion, the phylogenetic inferences based on mitochondrial genes and all species delimitation algorithms recovered the same number of independent taxonomic units within P. cristatus (Fig. 4) (Online Resource, ESM 6). Tang et al. (2014) recommend combining Poisson Tree Process (such as bPTP and mPTP) and GMYC methods in order to increase the reliability of species identification, as carried out in this study. In addition, we included the BIN analysis, also considered a highly informative approach to discriminate MOTUs inasmuch as this method is free of putative biases caused by incomplete or poorly resolved phylogenetic inferences (Ratnasingham and Hebert 2007), as observed in the controversial and species-rich group of Hypoptopomatinae catfishes (Roxo et al. 2019).

The only exception to the abovementioned pattern refers to the tree topology and the haplotype network based on Rhodopsin sequences, which grouped the MOTUs from Contas and Almada River basins (Fig. 2a). This result might be related to the relatively low evolutionary rate of this nuclear gene, which fails to discriminate between closely related species (Behrens-Chapuis et al. 2015). On the other hand, the conspicuous and highly supported genetic differentiation observed in a more conservative DNA marker for the Cachoeira + Una cluster in relation to the others reinforces the divergence of the former as a quite distinctive cluster (Fig. 2a) (Online Resource, ESM 6).

Furthermore, the three clades within P. cristatus presented deep divergence in relation to congeneric species, including P. jimi from Contas River basin, with genetic distances up to 25% (Parotocinclus sp. 4). These values are comparable to or even higher than those observed among distinct genera (Online Resource, ESM 4). For instance, Epactionotus bilineatus and Hisonotus depressicauda presented a genetic divergence of 9%, similar to that observed among some Parotocinclus species, revealing the confusing systematic relationships of Hypoptopomatinae and the paraphyletic status of Parotocinclus (Gauger and Buckup 2005; Cramer et al. 2011; Roxo et al. 2019). These issues are likely to be related to the underrepresentation of genetic data as well as to the large number of “obscure taxa” awaiting identification at species-level in this fish group (Page 2016), as also demonstrated in the present study.

Therefore, the actual richness in dwarf plecos remains largely unresolved, thus hindering their phylogenetic reconstruction (Roxo et al. 2017a). Consequently, undescribed taxa from poorly known regions combined with increased threats to local biodiversity, as observed in the presently studied areas, are potentially driven to extinction before being properly recognized (Niemiller et al. 2013). This scenario is particularly alarming for the coastal basins from eastern Brazil, since Roxo et al. (2014) proposed that unique evolutionary lineages of dwarf plecos are common along these drainages, favored by the philopatric behavior of this group of fishes. Likewise, the presence of conspicuous MOTUs within P. cristatus from nearby coastal rivers in northeastern Brazil supports the hypothesis proposed by these authors.

Phylogeographic and conservation inferences

The AMOVA based on mtDNA markers confirmed that most of the genetic variation in P. cristatus is explained by the differentiation among the clusters from Almada, Contas and Cachoeira + Una hydrographic basins (Table 2), providing additional support to the barcode and phylogenetic inferences. On the other hand, under a phylogeographic approach, the mitochondrial markers revealed four groups (Figs. 2, 3) since the populations from Upper/Middle and Lower Almada were clearly discriminated. This pattern was particularly evident in the results based on Cyt-b sequences, showing the high sensitivity of this marker to discriminate species or unique lineages (Leonardo et al. 2016; Velasco et al. 2016). The high haplotype diversity detected in the Cyt-b gene combined with the high number of sampled specimens in Upper/Middle Almada portions could also explain the only significant signal of demographic expansion observed in this group (Table 1; Online Resource, ESM 5). Both features could overestimate the population size (Thomaz et al. 2015).

In addition, the fixation index (FST) and gene flow (Nm) values, together with the high and significant correlation between genetic and geographic distances, also supported the differentiation of lineages within the Almada River basin (Fig. 3). Apparently, the distances among collection sites may have influenced the divergence levels in this basin. The sampled areas in the Upper and Middle Almada portions are geographically close to each other (~ 11.5 km), thereby favoring gene flow among specimens and a less accentuated population structure, while the site sampled in Lower Almada is at least 38.71 km away from the other populations along this river system (Fig. 1). In general, these data fit the stepping-stone model of genetic differentiation, in which migration rates are higher among nearby subpopulations, while more isolated groups accumulate greater genetic differences (Slatkin 1987). Such divergence within a single basin is important to direct conservation policies, which should focus on conserving local populations. Unfortunately, only a small region in Lower Almada (called “Lagoa Encantada”) has been officially protected, and it lacks effective management and monitoring (Gomes et al. 2012). In fact, protected areas in Brazil are not designed to assure the conservation of freshwaters (Azevedo-Santos et al. 2018).
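The relationship between FST, Nm and isolation by distance discussed above can be sketched as follows. The code assumes Wright's island-model approximation, FST ≈ 1/(2Nm + 1) for haploid, maternally inherited markers such as mtDNA (factor 4 for diploid nuclear loci), and a simple permutation-based Mantel test; both are illustrative assumptions, since this section does not state which estimators or software were used. All function names are hypothetical.

```python
import random
from itertools import combinations


def nm_from_fst(fst, ploidy_factor=2):
    """Effective number of migrants per generation under the island model.

    ploidy_factor=2 assumes haploid mtDNA; use 4 for diploid nuclear loci.
    """
    if not 0 < fst < 1:
        raise ValueError("FST must lie strictly between 0 and 1")
    return (1 / fst - 1) / ploidy_factor


def pearson(x, y):
    """Pearson correlation coefficient between two equal-length lists."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    vx = sum((a - mx) ** 2 for a in x)
    vy = sum((b - my) ** 2 for b in y)
    return cov / (vx * vy) ** 0.5


def mantel(gen, geo, permutations=999, seed=0):
    """One-sided Mantel test between two symmetric distance matrices.

    Returns the observed correlation and a permutation p-value obtained by
    shuffling the population labels of the genetic matrix.
    """
    n = len(gen)
    pairs = list(combinations(range(n), 2))  # upper-triangle index pairs
    geo_vec = [geo[i][j] for i, j in pairs]
    observed = pearson([gen[i][j] for i, j in pairs], geo_vec)

    rng = random.Random(seed)
    order = list(range(n))
    hits = 0
    for _ in range(permutations):
        rng.shuffle(order)
        permuted = [gen[order[i]][order[j]] for i, j in pairs]
        if pearson(permuted, geo_vec) >= observed:
            hits += 1
    return observed, (hits + 1) / (permutations + 1)
```

Under these assumptions, an FST of 0.5 between two mtDNA samples corresponds to Nm = 0.5, that is, fewer than one effective migrant per generation. The Mantel function takes square matrices of pairwise genetic and geographic (for instance, river-course) distances among collection sites.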

Again, the Cachoeira + Una clade presented the largest divergence in relation to the other groups, even considering the analysis using nuclear DNA (Rhodopsin) fragments (Online Resource, ESM 6), regarded as less efficient markers in detecting structure than the mtDNA sequences (Zink and Barrowclough 2008). Such remarkable differentiation might be explained by the long period of isolation of the Cachoeira + Una populations from the others, estimated at about 4 mya (Early Pliocene), resulting in a higher accumulation of mutations after vicariance. Analogously, the closer relationship between the Contas and Almada groups could be related to a more recent divergence (2.15 mya), while the population split between Lower and Upper/Middle Almada dates back to 800 thousand years ago, during the Pleistocene (Fig. 5).

In fact, the warm climate in the Early Pliocene raised the sea level (Weitzman et al. 1988; Camelier et al. 2018). Thus, formerly connected fluvial systems (Thomaz and Knowles 2018) could have favored the dispersal and colonization of these fish prior to this period, boosted by the environmental plasticity of Parotocinclus species (Cetra et al. 2009). As the sea level rose, these systems would have become isolated, leading to the deep genetic differentiation between the Cachoeira + Una and Almada + Contas groups.

On the other hand, the genetic similarity between populations from the Cachoeira and Una River basins (Fig. 1) contrasts with the great geographic distance among collection sites (43.1 km in a straight line and even greater along the river pathway) and with the fact that they belong to distinct hydrographic systems. Nonetheless, successive events of headwater capture (inferred by the presence of several river elbows and proximity between tributaries, highlighted in Online Resource, ESM 7) and the fact that both drainages share a common geological formation (“Barreiras”) across most of their extension (Nacif 2000) could account for the close relationship between the local ichthyofaunas. Additionally, the Una River basin encompasses a less rugged relief situated at lower altitudes than the Cachoeira basin (DePaula et al. 2012). Such a difference in landscape is particularly suitable for headwater capture (Christofoletti 1975; Bishop 1995) from the high-altitude basins (e.g. Cachoeira) to those at low altitudes (e.g. Una).

Likewise, the genetic similarities between samples from the Almada and Contas basins when compared to the other groups are likely associated with the neotectonic activities in the Brazilian Atlantic coast (Saadi et al. 2002). A recent report demonstrated that relatively recent events of river capture should play a major role in the biogeographic patterns of the ichthyofauna from coastal basins in northeastern Brazil because of the narrow extension of the continental shelf along this region (Thomaz and Knowles 2018). This hypothesis is also supported by the close evolutionary relationships observed for distinct fish groups from Almada and Lower Contas, such as Characidae (Barreto et al. 2018) and Cichlidae (Souza et al. 2018). Moreover, the photointerpretation of the Almada River basin reveals a strong pattern of asymmetric straightness, i.e., most tributaries are found on the left margin along the northern portion, and they present a high number of anomalies such as accentuated meandering and several sharp turns (elbows) (Online Resource, ESM 7). These features are compatible with intense geological activity (Christofoletti 1975; Bishop 1995) that could have increased river capture events from the Contas into the Almada River basin when compared to the nearby southern basins.

The present results revealed a more complex biogeographic pattern than that previously suggested by parsimony analysis of endemicity, which placed the ichthyofauna from the Cachoeira and Almada River basins as closely related (Camelier and Zanata 2014), showing the importance of molecular techniques and geological data to infer the evolutionary history of Neotropical fishes. In this sense, caution is advised when the evolutionary relationships of regional ichthyofauna are based only on morphological features and geographic distance among basins, particularly in controversial taxa such as Hypoptopomatinae and many other Neotropical fish groups. Finally, the high levels of endemism and hidden diversity in Parotocinclus from eastern coastal basins, together with the insufficient knowledge about them, justify categorizing these drainages as priority areas for freshwater biodiversity conservation before species communities and phylogenetic diversity are eventually lost.