Introduction

More than 5600 neotropical freshwater fish species are known, but estimates of the actual total exceed 7000 (Albert and Reis 2011). The substantial share of neotropical biodiversity that remains undescribed is a problem for conservation biology, known as the taxonomic impediment. Proposed remedies range from integrative taxonomy (Dayrat 2005) to “turbo-taxonomy” (Riedel et al. 2013) approaches, which combine DNA-based identification, concise morphological description and high-resolution images.

The DNA barcoding initiative was proposed as a standard system for species identification, using a 650-base-pair fragment of the mitochondrial gene cytochrome c oxidase subunit I (COI) (Hebert et al. 2003). Barcoding has been used to flag cryptic species and to accelerate the description of new ones (Hebert et al. 2004; Bickford et al. 2006; Handfield and Handfield 2006; Smith et al. 2006; Bucklin et al. 2007; Asgharian et al. 2011; Baldwin et al. 2011; Melo et al. 2011; Silva et al. 2013), as well as to identify fish products (Carvalho et al. 2011, 2015) and ichthyoplankton (Becker et al. 2015). However, this monogenic approach has known limitations in species identification due to hybridization, introgression or low genetic differentiation (Ward 2009). For instance, two recently diverged fish species (Nannoperca australis Günther, 1861 and N. obscura (Klunzinger, 1872)) could not be genetically distinguished even when their entire mitochondrial genomes were available (Prosdocimi et al. 2012).

Despite the controversy regarding the application of DNA barcoding in the discovery of species (Brower 2006; Collins and Cruickshank 2013), barcode libraries were successfully developed for the identification of fishes in marine and freshwater environments (Ward et al. 2005; Hubert et al. 2008; Lara et al. 2010; Carvalho et al. 2011; Mabragaña et al. 2011; Nwani et al. 2011; Pereira et al. 2011; Zhang and Hanner 2011; Mejía et al. 2012; Ribeiro et al. 2012; Rosso et al. 2012; Weigt et al. 2012; Zhang and Hanner 2012; Keskin and Atar 2013; Pereira et al. 2013; Gomes et al. 2015). The rate of species discrimination in these studies ranged from 95.0 to 100 %.

Current DNA barcoding studies are constantly flagging potential new species, and guidelines are needed to order and classify species diversity (Gomes et al. 2015). Under such guidelines, deeply divergent genealogical lineages are considered good species or candidate species according to the standards of divergence for the group under study (Padial et al. 2010). Therefore, biodiversity discovery and species identification can benefit from an integrative taxonomy approach, combining morphological characteristics and a standard DNA identification system, such as DNA barcoding (Ratnasingham and Hebert 2013; Gomes et al. 2015). For instance, even without a hypothesis of population coherence, the single-locus general mixed Yule-coalescent (GMYC) model performed well for both testing and discovering species from large sample sets of Madagascan insects (Monaghan et al. 2009). Other algorithms such as RESL, ABGD, CROP, and jMOTU were subsequently tested for their speed and effectiveness in recovering species boundaries, with RESL being selected for the Barcode Index Number (BIN) system (Ratnasingham and Hebert 2013). These approaches provide an alternative to traditional tree-based methods for preliminary biodiversity screening and species identification.

We used an extensive sample collection throughout the basin and analyzed morphological characteristics, barcode sequences and data available on BOLD (Barcode of Life Database) to perform an integrative taxonomic identification of the fishes from the Jequitinhonha River Basin (JRB). The JRB is one of the least studied river basins, even though it is considered to encompass a large proportion of the threatened species in Southeast Brazil (Rosa and Lima 2008). The JRB is part of the Coastal Drainages of Eastern Brazil (CDEB), a series of isolated hydrographic basins draining the Brazilian Shield. Their present conformation is probably the result of ancient (Gondwanaland breakup) and recent (headwater captures between adjacent basins) tectonic events (Albert and Reis 2011; Camelier and Zanata 2014). Together, they comprise the ecoregion with the highest proportion of endemic freshwater fishes in Brazil (Abell et al. 2008). However, the basin’s fish fauna is only partially documented, mainly due to the lack of sampling in the headwaters and low-order streams (Machado et al. 2008).

Currently, 72 formally described species are known from the JRB, but our observations indicate the presence of around 110 native and non-indigenous fish species in the basin. This largely unknown ichthyofauna allowed us to test the efficiency of our integrative taxonomy approach for species discovery and identification of cryptic diversity, thus helping to accelerate the detection of hidden biodiversity by flagging potential new candidate species.

Materials and methods

Fish sampling

We sampled fishes from September 2011 to July 2013 at 51 sites along the upper, middle and lower JRB. These sites included the mainstream, major tributaries and low-order streams. We used several types of fishing gear (gillnets, seines, sieves, castnets, baited traps and angling) in order to recover as many fish species as possible. Fragments of fresh tissue (fin clips, muscle or gill) were stored in ethanol (100 %) for molecular analysis. When possible, at least five specimens of each species sampled were selected for DNA sequencing, from different sampling sites if available. Vouchers were photographed and geo-referenced.

Morphological taxonomy

Traditional morphological taxonomy was conducted using dichotomous keys and species descriptions based on Eigenmann (1917, 1918, 1921), Géry (1977), Menezes and Géry (1983), Britski et al. (1988, 2012), Oyakawa (1993), Silfvergrip (1996), Albert et al. (1999), Oliveira and Oyakawa (1999), Pereira and Reis (2002), Castro and Vari (2004), Triques and Vono (2004), Garavello (2005), Kullander and Ferreira (2006), Reis et al. (2006), Oyakawa and Mattox (2009), and Martins et al. (2014). Scientific names were updated according to Eschmeyer (2016).

Voucher specimens were stored in the Museu de Ciências Naturais da Pontifícia Universidade Católica de Minas Gerais and Museu de Zoologia da Universidade Estadual de Londrina. All collection data including date of capture, collectors, geographic coordinates, and photographs are available on the BOLD (www.bold.org) project “DNA Barcode of fish from the Jequitinhonha River Basin”.

DNA barcoding

DNA was extracted from each sample using one of three methods: the Phire Animal Tissue Direct PCR kit, the commercial kit NucleoSpin Tissue, or a salt extraction protocol (adapted from Aljanabi and Martinez 1997). The barcode region was amplified using the primers described in Ward et al. (2005) and Ivanova et al. (2007). Each PCR contained 2.5 µL of 10× PCR buffer, 2.0 µL dNTPs (10 mM), 0.75 µL MgCl2 (50 mM), 0.5 µL of each primer (10 mM), Taq polymerase (5 U/µL), 1.0 µL of DNA and 17.65 µL of ultrapure water in a final volume of 25 µL. Amplification conditions consisted of an initial denaturation step at 95 °C for 2 min, followed by 35 cycles of denaturation at 94 °C for 30 s, primer annealing at 54 °C for 30 s and extension at 72 °C for 1 min, with a final extension step at 72 °C for 10 min. PCR with the Phire Animal Tissue Direct PCR kit followed the manufacturer’s instructions with minor changes. The efficiency of PCR amplification was confirmed by visualization of the amplified DNA fragments on a 1 % agarose gel.
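The reagent list above can be checked with simple volume bookkeeping. The sketch below sums the volumes stated in the text and reports what remains of the 25 µL final volume. The Taq polymerase volume is not stated, so it is deliberately left out rather than guessed. The `master_mix` helper and its 10 % pipetting overage are illustrative assumptions, not part of the published protocol.

```python
# Per-reaction volumes (µL) exactly as listed in the text.
# The Taq polymerase volume is not given there, so it is omitted here.
reagents_ul = {
    "10x PCR buffer": 2.5,
    "dNTPs (10 mM)": 2.0,
    "MgCl2 (50 mM)": 0.75,
    "forward primer": 0.5,
    "reverse primer": 0.5,
    "template DNA": 1.0,
    "ultrapure water": 17.65,
}
FINAL_VOLUME_UL = 25.0

# Sum of the listed components and the volume left unaccounted for.
listed_total = round(sum(reagents_ul.values()), 2)
unaccounted = round(FINAL_VOLUME_UL - listed_total, 2)


def master_mix(n_reactions, overage=0.10):
    """Scale the listed volumes for n reactions (hypothetical helper).

    Template DNA is excluded because it is added per tube; `overage`
    is an assumed pipetting allowance, not a value from the text.
    """
    factor = n_reactions * (1.0 + overage)
    return {
        name: round(volume * factor, 2)
        for name, volume in reagents_ul.items()
        if name != "template DNA"
    }


if __name__ == "__main__":
    print(f"listed components: {listed_total} µL")
    print(f"left for unlisted components: {unaccounted} µL")
    print(master_mix(10))
```

The unaccounted volume is what the listed components leave for the polymerase and any rounding in the recipe.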

DNA sequencing was performed using the BigDye™ Terminator v.3.1 (Applied Biosystems) commercial kit in a reaction with a 10.0 µL final volume that contained: 1.0 µL PCR product, 1.0 µL of primer (10 µM), 1.0 µL of Big Dye Terminator, 1.5 µL of 5× buffer and 5.5 µL of ultrapure water. The reaction was carried out in a thermocycler Veriti™ (Life Biosystems) according to the following conditions: 2 min initial denaturation at 96 °C followed by 35 cycles of denaturation at 96 °C for 30 s, annealing at 50 °C for 15 s, and extension at 60 °C for 4 min. DNA sequences were generated in the ABI3500 automatic sequencer following the manufacturer’s instructions.

Data analysis

The DNA sequences were edited in SeqScape v.1.0 software, which was also used to obtain the consensus sequences. COI sequences were uploaded to the Barcode of Life Database platform (BOLD), which was used to estimate standard DNA barcode statistics: the Nearest Neighbor Distance (NND), the Barcode Index Number (BIN), and the intra- and inter-group genetic distances at the species, genus and family levels. The NND analysis estimates the minimum genetic distance between pairs of species. The BIN system uses the RESL algorithm to compare groups of specimens assigned to a species through prior taxonomic work with the clusters inferred from COI sequence variation. The BIN analysis classifies clusters as discordant (clusters that include more than one species), concordant (clusters constituted of one species) or singletons (a single sequence reported).
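The three BIN categories defined above amount to a simple rule on the species labels attached to each cluster. The following is a minimal sketch of that rule only, not of BOLD's own implementation. The function name `classify_bins`, the record layout and the BIN labels in the example are hypothetical, and the input is assumed to hold every sequence in each BIN, including public BOLD records from other projects.

```python
from collections import defaultdict


def classify_bins(records):
    """Classify BIN clusters from (bin_id, species_name) pairs.

    singleton  -> the BIN holds a single sequence
    concordant -> several sequences, all assigned to one species
    discordant -> sequences assigned to more than one species
    """
    members = defaultdict(list)
    for bin_id, species in records:
        members[bin_id].append(species)

    status = {}
    for bin_id, names in members.items():
        if len(names) == 1:
            status[bin_id] = "singleton"
        elif len(set(names)) == 1:
            status[bin_id] = "concordant"
        else:
            status[bin_id] = "discordant"
    return status


if __name__ == "__main__":
    # Hypothetical BIN labels and assignments, for illustration only.
    example = [
        ("BIN_A", "Species one"),
        ("BIN_A", "Species one"),
        ("BIN_B", "Species two"),
        ("BIN_B", "Species three"),
        ("BIN_C", "Species four"),
    ]
    for bin_id, label in sorted(classify_bins(example).items()):
        print(bin_id, label)
```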

The intra- and inter-group genetic distances were calculated using the Kimura 2-Parameter (K2P) model of nucleotide evolution (Kimura 1980). MEGA v.5 (Tamura et al. 2011) was used for distance calculations and to construct dendrograms by the Neighbor-Joining (NJ) method (Saitou and Nei 1987).
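For readers unfamiliar with the K2P model, the distance between two aligned sequences depends only on the proportion of transitional differences (P) and transversional differences (Q): d = -1/2 ln[(1 - 2P - Q) sqrt(1 - 2Q)]. The sketch below is an illustrative stand-alone implementation, not the MEGA code used in this study. The function name `k2p_distance` is hypothetical, and skipping sites with gaps or ambiguous bases (pairwise deletion) is an assumption about gap handling.

```python
import math

PURINES = {"A", "G"}
PYRIMIDINES = {"C", "T"}


def k2p_distance(seq1, seq2):
    """Kimura 2-parameter distance between two aligned DNA sequences.

    Sites where either sequence has a gap or an ambiguous base are
    skipped (pairwise deletion, an assumption of this sketch).
    """
    if len(seq1) != len(seq2):
        raise ValueError("sequences must be aligned to the same length")

    transitions = transversions = compared = 0
    for a, b in zip(seq1.upper(), seq2.upper()):
        if a not in "ACGT" or b not in "ACGT":
            continue
        compared += 1
        if a == b:
            continue
        pair = {a, b}
        if pair <= PURINES or pair <= PYRIMIDINES:
            transitions += 1      # A<->G or C<->T
        else:
            transversions += 1    # purine <-> pyrimidine

    if compared == 0:
        raise ValueError("no comparable sites")

    p = transitions / compared
    q = transversions / compared
    # d = -1/2 * ln[(1 - 2P - Q) * sqrt(1 - 2Q)]
    # math.log raises ValueError if the sequences are too divergent
    # for the model (argument not positive).
    return -0.5 * math.log((1 - 2 * p - q) * math.sqrt(1 - 2 * q))


if __name__ == "__main__":
    d = k2p_distance("ACGTACGTAC", "ACGTACGTAT")  # one C<->T transition
    print(f"K2P distance: {d:.4f} ({100 * d:.2f} %)")
```

For closely related sequences the K2P distance is only slightly larger than the raw proportion of differing sites, which is why percentage thresholds such as 2 % are often applied to either measure.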

Delimitation of cryptic and candidate species

We followed the approach of Padial et al. (2010) and Gomes et al. (2015) to identify candidate and cryptic species within morpho-species identified using traditional morphological taxonomic keys. In brief, potential candidate species were flagged if they met the following two criteria: (1) a concordant BIN cluster and (2) a nearest neighbor distance (NND) higher than 2 %. Cryptic species were defined as those showing an intraspecific divergence higher than 2 % with no morphological differences detected between specimens a priori. All candidate species were named using the binomial name of the most similar or closely related nominal species, followed (in square brackets) by “Ca” (for candidate) plus the corresponding concordant BIN number. All indices were automatically estimated in BOLD Workbench (www.boldsystems.org).
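The decision rules above can be written down compactly. The sketch below is one possible encoding, assuming the BIN status and the distances have already been exported from BOLD. The names `flag_taxon` and `THRESHOLD_PERCENT` are hypothetical, and the numeric values in the usage example are placeholders for illustration, not results from this study. How singleton BINs should be treated is not specified in the criteria, so this sketch only accepts BINs labeled concordant.

```python
THRESHOLD_PERCENT = 2.0  # heuristic cutoff used for both criteria


def flag_taxon(name, bin_id, bin_status, nnd_percent, max_intraspecific_percent):
    """Apply the candidate and cryptic species criteria to one morpho-species.

    candidate: concordant BIN and nearest neighbor distance > 2 %
    cryptic:   maximum intraspecific divergence > 2 % (with no a priori
               morphological difference, which must be checked separately)
    """
    is_candidate = bin_status == "concordant" and nnd_percent > THRESHOLD_PERCENT
    deep_intraspecific = max_intraspecific_percent > THRESHOLD_PERCENT
    return {
        "name": f"{name} [Ca {bin_id}]" if is_candidate else name,
        "candidate": is_candidate,
        "possible_cryptic": deep_intraspecific,
    }


if __name__ == "__main__":
    # Placeholder distances, for illustration only.
    print(flag_taxon("Leporinus sp.", "ABY2741", "concordant", 5.0, 0.3))
    print(flag_taxon("Hoplias gr. malabaricus", "BIN_X", "concordant", 1.5, 0.4))
```

Keeping the two flags separate mirrors the text: a deep intraspecific split prompts a second morphological look, whereas the candidate name is only assigned once both BIN concordance and the NND criterion are met.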

Results

A total of 260 DNA barcodes were generated for 52 morpho-species, composed of 31 species identified a priori through traditional morphology-based data and 21 undescribed species identified to the genus level, representing over 49 % of the estimated fish fauna for the JRB. An average of 5.07 specimens per species was analyzed, ranging from 1 to 13 specimens per species. The retained barcodes showed no insertions, deletions or stop codons; in some cases, good-quality sequences featuring stop codons were observed and removed from the analysis, since they may represent pseudogenes. The average size of the sequences obtained was 630 bp and the observed frequency of each nucleotide was: G (15.26 %), C (23.38 %), A (21.4 %) and T (25.86 %).

The mean genetic distance (K2P) increased with taxonomic level: 0.44 % within species, 12.16 % within genera and 20.58 % within families (Table 1). The average intraspecific genetic distance was approximately 27 times lower than the interspecific congeneric divergence. All species were grouped as monophyletic clades in the NJ tree (Fig. 1). No shared haplotypes between species were observed and 91.3 % of all species presented intraspecific divergence values lower than 2 %, allowing their differentiation using the 2 % heuristic intraspecific cutoff value. However, the Nearest Neighbor Distance (NND) analysis revealed two pairs of species showing K2P genetic divergence values lower than 2 %: Prochilodus hartii Steindachner, 1875 versus P. argenteus Spix & Agassiz, 1829 (1.09 %) and P. costatus Valenciennes, 1850 versus P. argenteus (1.72 %). Nevertheless, these species were grouped in distinct NJ clades and BINs, which allowed their correct identification through DNA barcoding (Fig. 1).
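The ratio and the NND screen reported above can be reproduced from the stated numbers. The sketch below uses only values quoted in this paragraph; the variable names are illustrative.

```python
# Mean K2P distances (%) reported in the text (Table 1).
mean_within_species = 0.44
mean_within_genus = 12.16

# Ratio of congeneric to intraspecific divergence ("approximately 27 times").
ratio = mean_within_genus / mean_within_species

# Nearest neighbor distances (%) for the two species pairs named in the text.
nnd_percent = {
    ("Prochilodus hartii", "Prochilodus argenteus"): 1.09,
    ("Prochilodus costatus", "Prochilodus argenteus"): 1.72,
}

CUTOFF_PERCENT = 2.0
# Pairs falling below the heuristic cutoff need extra evidence
# (distinct NJ clades and BINs) before they are accepted as separable.
below_cutoff = sorted(pair for pair, d in nnd_percent.items() if d < CUTOFF_PERCENT)

if __name__ == "__main__":
    print(f"congeneric / intraspecific ratio: {ratio:.1f}")
    for a, b in below_cutoff:
        print(f"NND below {CUTOFF_PERCENT} %: {a} vs {b}")
```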

Table 1 Observed genetic distance (K2P) within species, genera and families (SE = standard error)

Fig. 1 Neighbor-joining (K2P) tree of 52 species from the JRB and photos of fish depicting cases of deep intraspecific distances and new species

The Barcode Index Number analysis revealed 53 clusters (Table S1, supplementary material): 20 BINs classified as discordant (clusters that include more than one species), 27 as concordant (clusters constituted of one species) and six as singletons (BINs with a single sequence). The singletons were reported for Lycengraulis grossidens (Spix & Agassiz, 1829) (BIN ACG7769), Steindachneridion amblyurum (Eigenmann & Eigenmann, 1888) (BIN ACK6736), Steindachnerina elegans (Steindachner, 1875) (BIN ACB9900), Trichomycterus jequitinhonhae Triques & Vono, 2004 (BIN ACG7527), Trichomycterus sp.1 (BIN ACE1805) and Trichomycterus sp.2 (BIN ACF7493). Discordant BINs were related to species complexes (e.g. Astyanax) and to erroneous entries present in the database, such as sequences of Arthropoda included within the Phalloceros BIN.

Within Pimelodella, two clades with high intraspecific genetic divergence (10.2–11.6 %) were observed (Fig. 1), consistent with two distinct BIN clusters (ACG8222 and ACG8223). A secondary morphological analysis, carried out after sorting the specimens into their respective clades, supported the genetic differentiation observed. Since no phylogenetic information is available for Pimelodella, we suggest identifying the two new cryptic candidate species as Pimelodella sp.1 [Ca ACG8223] and Pimelodella sp.2 [Ca ACG8222].

New candidate species supported by DNA barcoding and morphology

Of the twenty-one undescribed species, which include specimens identified only to the genus level (sp.) and those designated with the qualifiers “aff.” and “gr.”, fifteen were detected as new candidate species, since they possessed concordant BINs and genetic p-distances greater than 2 % from the nearest neighbor (Table 2). Astyanax aff. fasciatus (Cuvier 1819), Astyanax aff. lacustris (Lütken 1875), Hypostomus sp.1 and Hypostomus sp.2 were not considered new candidate species because their BIN clusters included many other nominal species. Hoplias gr. malabaricus (Bloch 1794), Phalloceros sp. and Rhamdia aff. quelen (Quoy & Gaimard 1824) had a genetic p-distance lower than 2 % from the nearest neighbor and were therefore not classified as candidate species (Table 2).

Table 2 List of all twenty-one undescribed and fifteen new candidate species from the JRB

Within R. aff. quelen, two clades with intraspecific divergence values reaching 4.47 % were observed, corresponding to two distinct BINs. One concordant BIN (ACF6906) was considered endemic, since it comprised two specimens restricted to the JRB. The other, discordant BIN (AAA6323) clustered four specimens from the JRB with 32 specimens from other river basins previously deposited in BOLD. We designated the new endemic candidate species as R. aff. quelen [Ca ACF6906].

Three Anostomidae species are currently known from the JRB: Leporinus elongatus Valenciennes, 1850, L. steindachneri Eigenmann, 1907 and Hypomasticus garmani (Borodin, 1929). We sampled a fourth species, designated here as Leporinus sp. [Ca ABY2741] (Fig. 1). This new candidate species is a small-sized Leporinus characterized by a terminal mouth and three round dark blotches on the lateral line, surrounded by many small blotches above and below the lateral line. The terminal mouth distinguishes it from H. garmani, which has an inferior mouth. Leporinus sp. [Ca ABY2741] also has three teeth on each premaxilla and four on each dentary plate (dental formula 3/4), distinguishing it from L. elongatus (dental formula 3/3) and L. steindachneri (dental formula 4/4).

Regarding Characidium, only one morpho-species, not yet described, is reported for the JRB (Bizerril and Lima 2005). However, the high intraspecific distance (21 %) between two sympatric morphotypes was correlated with the presence (Characidium sp.1 [Ca ACE1586]) and absence (Characidium sp.2 [Ca ACB9964]) of scales on the isthmus, allowing their discrimination and their designation as two distinct candidate species (Fig. 1).

The rare and endangered species Rhamdia jequitinhonha Silfvergrip, 1996 represents an important barcode record for the JRB ichthyofauna. Since its description by Silfvergrip (1996), only three individuals were known from fish collections (Machado et al. 2008). Here, we analyzed three more specimens, which share a single haplotype and form a unique and consistent BIN (ACF9378). When comparing the R. jequitinhonha haplotype to R. aff. quelen haplotypes, a high divergence ranging from 16.3 to 18.4 % was observed.

Discussion

The integrative approach combining extensive sample collection, DNA barcoding and morphology-based assessment supported fifteen new candidate species, including two cryptic species, among the 21 undescribed species from the JRB (Table 2). Seven undescribed species were not considered new candidate species, either because their BIN clusters included many other nominal species (Astyanax aff. fasciatus, A. aff. lacustris, Hypostomus sp.1 and Hypostomus sp.2) or because their genetic distance from the nearest neighbor was lower than 2 % (Hoplias gr. malabaricus, Phalloceros sp., Rhamdia aff. quelen).

The integration of traditional morphological taxonomy and genetic indices (BIN, NND and the intraspecific distance) allowed the identification of candidate species within the genera Rhamdia, Characidium, Hypostomus, Pareiorhaphis and Pimelodella, showing that DNA barcodes are useful to rapidly flag cryptic and candidate species. However, DNA barcode libraries are not yet complete for neotropical fish species, which makes it risky to designate species solely from DNA barcodes using the BIN and NND tools implemented on the BOLD platform.

The mean intraspecific divergence (0.44 %) for the JRB fishes was similar to those observed for other neotropical basins: 0.13, 0.5 and 1.30 % for the Paraíba do Sul (Pereira et al. 2011), São Francisco (Carvalho et al. 2011) and Upper Paraná River Basins (Pereira et al. 2013), respectively. The 2 % heuristic intraspecific cutoff value correctly discriminated 92 % of all species from the JRB. However, species pairs with interspecific values lower than 2 % were flagged by the NND analysis as problematic, namely Prochilodus argenteus versus P. hartii and P. argenteus versus P. costatus (interspecific distances of 1.09 and 1.72 %, respectively). Although these interspecific values are low, in all cases it was possible to correctly discriminate these species due to their separation into distinct monophyletic clades with no haplotype sharing and the clustering of specimens in distinct BINs. Low interspecific divergence values have been reported before for P. argenteus versus P. costatus (1.4 %) (Carvalho et al. 2011; Pereira et al. 2011).

Newly recorded species for the JRB were barcoded, including Leporinus taeniatus Lütken, 1875, previously known to occur only in the São Francisco River Basin and coastal river drainages of Bahia State. This species was collected exclusively in an area subjected to stream capture events from the São Francisco River Basin (Saadi 1995). Furthermore, the divergence between specimens of L. taeniatus from the JRB and the São Francisco River Basin was low (0–0.4 %), suggesting that L. taeniatus might be considered an introduced species in the JRB.

One case of a cryptic species was found within Pimelodella, where two distinct BINs were recovered, consistent with two clusters separated by a deep intraspecific distance (over 10 % divergence). After separating specimens into their two respective clades, morphological divergence was detected (results not shown). We nevertheless classified this case as a cryptic species, because its morphological differentiation was possible only after our integrative approach using DNA barcodes.

The integrative taxonomic approach, combining traditional morphology-based assessment and DNA barcoding, resulted in the comprehensive identification of the JRB fish fauna, comprising common, rare, introduced, endangered and undescribed species. Therefore, despite severe criticism of the monogenic DNA barcoding approach for species identification and discovery (Brower 2006; Collins and Cruickshank 2013), our integrative taxonomy approach may be a remedy to the taxonomic impediment for neotropical fishes, accelerating species identification by flagging cryptic and candidate species. The correct estimation of neotropical fish biodiversity is important to adequately conserve the megadiverse ichthyofauna found in this highly impacted region.