Introduction

The Neotropics harbor approximately 4,500 species of freshwater fishes (Reis et al., 2003, 2016). Of these, over 2,000 are found in the Amazon Basin, 45% of which are endemic, making the Amazon Basin the most species-rich river basin in the world. The Amazon Basin itself is traditionally divided into eastern, central and western basins (Hoorn et al., 2010). The eastern basin drains the Precambrian crystalline craton, which the Amazon River divides into the Guyana Shield to the north and the Brazilian Shield to the south. The western and central basins are highly sedimented and characterized by extensive floodplains, and the western basin is bordered by the Andean foothills. Amazonian ichthyofauna is similarly divided. Species of the Guyana and Brazilian Shields tend to have restricted distributions and are largely not shared between the two shields or with the central and western Amazon Basin, whereas fishes occurring in the central and western Amazon Basin tend to be broadly distributed (Hubert & Renno, 2006). The exception is the Andean piedmont, which harbors a distinct and still poorly known fauna. Despite the large number of described species, fishes are among the taxonomically least studied groups. In fact, Reis et al. (2016) estimate that 34–42% of Neotropical freshwater fishes remain undescribed, with the majority of these species concentrated in the Amazon Basin. The large number of undescribed species in the Amazon Basin reflects the vastness of the basin (over 5 million km²), the inaccessibility of much of the area, principally in regions upstream of geological barriers such as rapids, and the scarcity of museum material representative of widespread taxa. Finally, cryptic and pseudocryptic species (the latter having morphological differences that are apparent but have been overlooked) also appear to be common, and their identification, at least initially, relies almost entirely on genetic data.

Considering their extraordinary species diversity, Amazonian fishes have received comparatively little attention from the molecular biodiversity community, and most studies have focused on phylogeography or phylogeny (e.g., Sivasundar et al., 2001; Hubert et al., 2007; Willis et al., 2007; Farias & Hrbek, 2008; Farias et al., 2010). Molecular biodiversity studies based on DNA barcoding of the Amazonian fish fauna have begun to appear only recently, covering the two recognized species of the cichlid genus Astronotus (Colatreli et al., 2012), 10 species of rosy tetras of the characid genus Hyphessobrycon (Castro Paz et al., 2014) and 14 species of pencilfishes of the lebiasinid genus Nannostomus (Benzaquem et al., 2015). The first truly comprehensive basin-wide molecular biodiversity study investigated the piranhas and pacus of the family Serrasalmidae (Machado et al., 2018), a charismatic group that is well studied compared with the rest of the Neotropical ichthyofauna. That study revealed that, of the 68 species identified a priori (60 valid nominal species plus eight easily distinguishable morphotypes representing undescribed species), up to 23 were represented by more than one lineage, for a total of 118 species/lineages. Ichthyofaunal diversity thus appears to be underestimated by 73.5% in the serrasalmid study and by 78.6% in the pencilfish study. These figures appear broadly concordant with the estimate of Reis et al. (2016) that 34–42% of Neotropical freshwater fishes, concentrated primarily in the Amazon Basin, are yet to be described, with rarefaction analyses of species descriptions based on predominantly morphological evidence, and with broad faunal trends indicating a 19% rate of paraphyly across all vertebrates (Ross, 2014).
Cryptic diversity, however, appears to be substantially lower in other South American basins, such as the Upper Paraná (5.5%; Pereira et al., 2013), the São Francisco (8.9%; de Carvalho et al., 2011), the Paraíba do Sul (5.2%; Pereira et al., 2011) and the coastal drainages of the states of São Paulo and Rio de Janeiro (11.2–22.5%; Henriques et al., 2015).
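The underestimation percentages reported above for the serrasalmid and pencilfish studies follow from simple arithmetic on the counts: 68 a priori species versus 118 lineages, and 14 species versus the 25 species/lineages delimited by Benzaquem et al. (2015). The Python sketch below reproduces them. It also shows one way, assumed here rather than taken from Reis et al. (2016), to put a fraction of undescribed species on the same scale: if a fraction f of the true total is undescribed, the described count undercounts the total by f/(1 − f).

```python
def underestimation_pct(recognized: int, delimited: int) -> float:
    """Percentage by which the a priori species count undercounts delimited lineages."""
    return 100.0 * (delimited - recognized) / recognized

def implied_underestimation_pct(fraction_undescribed: float) -> float:
    """If a fraction f of the true total is undescribed, the described count
    is (1 - f) of the total and undercounts it by f / (1 - f)."""
    return 100.0 * fraction_undescribed / (1.0 - fraction_undescribed)

print(f"Serrasalmidae: {underestimation_pct(68, 118):.1f}%")  # 68 a priori species, 118 lineages
print(f"Nannostomus: {underestimation_pct(14, 25):.1f}%")     # 14 species, 25 lineages
for f in (0.34, 0.42):
    print(f"{f:.0%} undescribed -> {implied_underestimation_pct(f):.1f}% underestimation")
```

Under this assumed conversion, 34–42% undescribed corresponds to roughly 52–72% underestimation, which is one way to read the serrasalmid and pencilfish figures as broadly concordant with that estimate.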

In contrast to their African sister clade, Neotropical cichlids have diversified almost entirely in riverine environments, filling a wide array of ecological roles. To date, the Neotropical Cichlidae comprise 570 species, 108 of which were described in the last 10 years (Eschmeyer & Fong, 2018). Over 50% of these species occur in the Amazon Basin (Kullander, 2003). Despite their comparatively lower taxonomic diversity in the Neotropics, cichlids are a conspicuous component of the Amazonian ichthyofauna. They include broadly distributed species such as the filter feeders of the genera Chaetobranchus and Chaetobranchopsis, the sand sifters of the genera Acarichthys, Satanoperca and Geophagus, the laterally compressed flooded-vegetation specialists of the genera Pterophyllum, Mesonauta and Symphysodon, the hyperpredators of the genus Cichla and the diminutive leaf-litter specialists of the genera Apistogramma, Dicrossus and Crenicara, as well as narrowly distributed rheophilic endemics of the genera Retroculus, Teleocichla and Crenicichla. Cichlids are also popular aquarium fishes; thus, many more phenotypes than described species are known. Despite general interest in the group from both scientists and hobbyists, the taxonomy of many Amazonian cichlid species and species groups is complex and remains poorly resolved (e.g., Ready et al., 2006; Bleher et al., 2007; Farias & Hrbek, 2008; Amado et al., 2011; Colatreli et al., 2012; Willis et al., 2012a, b; Tougard et al., 2017).

Therefore, for the reasons elaborated above, we aimed to evaluate the molecular diversity of cichlids sampled from six localities in the Amazon Basin. Three of the localities, the Tapajós, Xingu and Araguaia Rivers, drain the Brazilian Shield and are classified as clear water rivers. Two, the Negro and Jatapu Rivers, drain the Guyana Shield and are black water rivers. The last, the Purus River, drains the western Amazonian sedimentary basin and is classified as a white water river (Sioli, 1984; Venticinque et al., 2016).

Materials and methods

Sampling

Cichlids were collected in six major rivers of the Amazon Basin (Fig. 1) under a license from IBAMA/SISBIO No. 11325-1, and access to genetic resources was authorized by Permit No. 034/2005/IBAMA. Field collection permits are conditional on the collection of organisms being undertaken in accordance with the ethical recommendations of the Conselho Federal de Biologia (CFBio; Federal Council of Biologists), Resolution 301 (December 8, 2012).

Fig. 1

Map of collecting localities. *Note water-type classification of Sioli (1984) and Venticinque et al. (2016)

From each sampled specimen, we took a small tissue sample from the right side of the caudal peduncle or from the right pectoral fin. The tissues were stored in 95% ethanol until processed in the laboratory; the vouchers were fixed in 10% formalin and, after fixation, maintained in 70% ethanol. All tissues were deposited in the CTGA tissue collection of the Federal University of Amazonas (UFAM), and the vouchers were deposited in the ichthyological collection of the National Institute of Amazonian Research (INPA).

Vouchers were identified to species by taxonomic specialists using available comparative material, identification keys, original descriptions and redescriptions of species. Individuals that could not be identified to species level were reported as “Genus sp.” (possible new/unidentified species), “Genus gr. species” (possible new/unidentified species, member of a particular species group), “Genus aff. species” (closely related species, possibly new) or “Genus cf. species” (a species that conforms to the diagnosis but occurs outside its presumed distribution, possibly new). For summary statistics, “Genus aff. species” and “Genus cf. species” were treated as equivalent to “Genus species”. Some of these taxa are known from the aquarium literature and were therefore given trade-name epithets.
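The rule used for summary statistics amounts to a small label-normalization step. The Python sketch below is illustrative only (the function name `summary_name` is hypothetical, not part of the study's pipeline): it drops the “aff.” and “cf.” qualifiers so those labels are tallied with the corresponding binomial, while “sp.” and “gr.” labels are left unchanged and therefore counted as distinct taxa.

```python
import re

# Hypothetical helper: "aff." and "cf." qualifiers are treated as equivalent to
# the plain binomial for summary statistics; "sp." and "gr." labels are kept as-is.
QUALIFIER = re.compile(r"\b(?:aff|cf)\.\s+")

def summary_name(label: str) -> str:
    """Return the label used when tallying species."""
    return QUALIFIER.sub("", label.strip())

labels = [
    "Aequidens pallidus",
    "Aequidens cf. mauesanus",
    "Crenicichla cf. cincta",
    "Teleocichla cf. gephyrogramma",
    "Apistogramma sp.",
]
for raw in labels:
    print(f"{raw!r} -> {summary_name(raw)!r}")
```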

DNA extraction, amplification and sequencing

Genomic DNA was isolated using the phenol–chloroform method (Sambrook & Russell, 2001). The mitochondrial cytochrome c oxidase subunit 1 (COI) gene was amplified via polymerase chain reaction (PCR) using primers from Colatreli et al. (2012) in a final volume of 15 μl per sample containing 6.8 μl of ultrapure H2O, 1.5 μl of MgCl2 (25 mM), 1.5 μl of dNTPs (10 mM), 1.5 μl of 10× buffer (100 mM Tris–HCl, 500 mM KCl), 1.2 μl of each primer (2 μM; see Table S2), 0.3 μl of Taq DNA polymerase (1 U/μl) and 1 μl of DNA (30–60 ng/μl), in a Veriti® Thermal Cycler (Applied Biosystems). PCR conditions were an initial denaturation at 94°C for 2 min; 35 cycles of denaturation at 93°C for 10 s, annealing at 50°C for 35 s and extension at 68°C for 90 s; and a final extension at 68°C for 7 min. The quality of the amplified product was evaluated on a 1% agarose gel.
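As a quick check on the recipe above, the component volumes should sum to the stated 15 μl, and each reagent's final concentration is its stock concentration scaled by its volume fraction (C_final = C_stock × V_added / V_total). The Python sketch below does this arithmetic. It takes the stock concentrations exactly as written; the text does not say whether the 10 mM dNTP stock is per nucleotide or total, so the dNTP value inherits that ambiguity.

```python
from math import isclose

TOTAL_UL = 15.0  # final reaction volume stated in the text

# name: (volume added in µl, stock concentration, unit of the stock)
PCR_MIX = {
    "ultrapure H2O": (6.8, None, None),
    "MgCl2": (1.5, 25.0, "mM"),
    "dNTPs": (1.5, 10.0, "mM"),
    "10x buffer": (1.5, 10.0, "x"),
    "forward primer": (1.2, 2.0, "µM"),
    "reverse primer": (1.2, 2.0, "µM"),
    "Taq polymerase": (0.3, 1.0, "U/µl"),
    "template DNA": (1.0, None, None),
}

def check_volume(mix, total):
    """True if the component volumes add up to the stated reaction volume."""
    return isclose(sum(v for v, _, _ in mix.values()), total)

def final_amounts(mix, total):
    """Final concentration as C_stock * V / V_total; the enzyme is reported as units per reaction."""
    out = {}
    for name, (vol, stock, unit) in mix.items():
        if stock is None:
            continue  # water and template have no stock concentration to dilute
        if unit == "U/µl":
            out[name] = (vol * stock, "U per reaction")
        else:
            out[name] = (stock * vol / total, unit)
    return out

print(check_volume(PCR_MIX, TOTAL_UL))
for name, (value, unit) in final_amounts(PCR_MIX, TOTAL_UL).items():
    print(f"{name}: {value:.2f} {unit}")
```

With these inputs the mix gives 2.5 mM MgCl2, 1 mM dNTPs, 1× buffer, 0.16 μM of each primer and 0.3 U of Taq per reaction. Passing the 10 μl sequencing recipe (4.4 + 1.3 + 2.0 + 0.3 + 2 μl) to `check_volume` likewise confirms that it sums to its stated volume.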

The PCR products were purified with exonuclease and alkaline phosphatase enzymes (Werle et al., 1994) and sequenced using the BigDye Terminator Cycle Sequencing v3.1 kit (ThermoFisher) in a final volume of 10 μl per sample containing 4.4 μl of ultrapure water, 1.3 μl of 5× sequencing buffer, 2.0 μl of forward (M13-21) or reverse (M13-27) primer (2 μM), 0.3 μl of BigDye (Applied Biosystems, Foster City, CA, USA) and 2 μl of purified DNA. The reaction products were precipitated with 100% ethanol and EDTA (125 mM), resuspended in 10 μl of Hi-Di formamide and sequenced on an ABI 3130xl automated sequencer (ThermoFisher).

Data analysis

The forward and reverse sequences were assembled, aligned, edited and translated into putative amino acid sequences in Geneious v7.0.6 (Drummond et al., 2014). Sequences were then compared against GenBank using BLAST (Altschul et al., 1990) to verify that the sequenced region corresponded to COI. Subsequently, the sequences were aligned using MAFFT v7.307 (Katoh & Standley, 2013) and checked manually for insertions, deletions and stop codons by in silico translation to amino acids in Geneious. The alignment was then trimmed to 435 bp to minimize missing data and to remove potentially erroneous bases at the ends of the contigs. Finally, the dataset was reduced to unique haplotypes using the function hapCollapse (http://github.com/legalLab/protocols-scripts).
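Two of the checks described here, screening for in-frame stop codons and collapsing identical sequences into haplotypes, can be sketched in a few lines of Python. This is neither the hapCollapse function (an R script whose internals are not reproduced here) nor Geneious. It is a minimal sketch that assumes aligned sequences and a known reading-frame offset `frame`, which depends on where the alignment was trimmed. Stop codons follow the vertebrate mitochondrial genetic code (NCBI translation table 2), in which TGA encodes tryptophan and TAA, TAG, AGA and AGG are stops. The toy sequences are hypothetical.

```python
# Stop codons of the vertebrate mitochondrial code (NCBI translation table 2).
# TGA codes for tryptophan in this code, so it is not listed.
VERT_MITO_STOPS = {"TAA", "TAG", "AGA", "AGG"}

def in_frame_stops(seq: str, frame: int = 0) -> list[int]:
    """0-based positions of in-frame stop codons; `frame` (0, 1 or 2) is assumed known."""
    seq = seq.upper()
    return [i for i in range(frame, len(seq) - 2, 3) if seq[i:i + 3] in VERT_MITO_STOPS]

def collapse_haplotypes(seqs: dict[str, str]) -> dict[str, list[str]]:
    """Group specimens with identical aligned sequences.

    Keys are the first specimen ID seen for each haplotype; values list every
    specimen carrying it. Sequences that differ only by missing data (N or gaps)
    are treated as distinct, a simplification of this sketch.
    """
    first_id: dict[str, str] = {}
    groups: dict[str, list[str]] = {}
    for specimen, seq in seqs.items():
        key = seq.upper()
        if key not in first_id:
            first_id[key] = specimen
            groups[specimen] = []
        groups[first_id[key]].append(specimen)
    return groups

toy = {"s1": "ATGTGAACC", "s2": "ATGTGAACC", "s3": "ATGTAGACC"}
print({k: in_frame_stops(v) for k, v in toy.items()})
print(collapse_haplotypes(toy))
```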

The best-fitting model of molecular evolution (TN93+Γ) was selected in jModelTest2 (Darriba et al., 2012). Bayesian phylogenetic inference was performed in BEAST2 (Bouckaert et al., 2014) under default parameters. We ran two independent runs of 20 million generations each. After checking for stationarity and convergence of the two chains in TRACER (Rambaut et al., 2013), we combined the two runs, removed burn-in and subsampled the topologies to obtain a final set of 3,000 trees, which were used to produce a maximum clade credibility tree in TREEANNOTATOR (Bouckaert et al., 2014).

Species delimitation was performed using GMYC (generalized mixed Yule coalescent model; Fujisawa & Barraclough, 2013) and bGMYC (its Bayesian implementation; Reid & Carstens, 2012) on the ultrametric tree generated in TREEANNOTATOR, and using mPTP (the multi-rate Poisson tree processes method; Kapli et al., 2017) on a maximum likelihood phylogram optimized with the optim.pml function of phangorn (Schliep, 2011). Finally, we calculated p-distances between all sister taxa and delimited species with the locMin and tclust functions of SPIDER (Brown et al., 2012), which implement a distance-threshold optimization and clustering approach. All analyses were carried out using the R packages splits_1.0-19 (Fujisawa & Barraclough, 2013), spider_1.5.0 (Brown et al., 2012), phangorn_2.3.1 (Schliep, 2011) and ape_5.0 (Paradis et al., 2004) in R v3.4.1 (R Development Core Team, 2011), and with the standalone software mptp_0.2.3 (Kapli et al., 2017). Results were visualized using ggtree (Yu et al., 2017), also in R.
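The distance-based step can be illustrated with a short Python sketch: uncorrected p-distances computed by pairwise deletion of gaps and ambiguous bases, followed by single-linkage clustering at a fixed threshold, which is the general idea behind threshold clustering such as SPIDER's tclust. The sketch does not reproduce SPIDER's code, and it omits the locMin step that estimates the threshold from local minima in the distribution of pairwise distances. The threshold values and toy sequences in the example are hypothetical.

```python
from itertools import combinations

UNAMBIGUOUS = set("ACGT")

def p_distance(a: str, b: str) -> float:
    """Proportion of differing sites among sites where both bases are A, C, G or T."""
    compared = differences = 0
    for x, y in zip(a.upper(), b.upper()):
        if x in UNAMBIGUOUS and y in UNAMBIGUOUS:
            compared += 1
            differences += x != y
    if compared == 0:
        raise ValueError("sequences share no comparable sites")
    return differences / compared

def threshold_clusters(seqs: dict[str, str], threshold: float) -> list[list[str]]:
    """Single-linkage clusters: two specimens are linked if their p-distance is below `threshold`."""
    parent = {sid: sid for sid in seqs}

    def root(sid: str) -> str:
        while parent[sid] != sid:
            parent[sid] = parent[parent[sid]]  # path halving keeps the union-find trees shallow
            sid = parent[sid]
        return sid

    for a, b in combinations(seqs, 2):
        if p_distance(seqs[a], seqs[b]) < threshold:
            parent[root(a)] = root(b)

    clusters: dict[str, list[str]] = {}
    for sid in seqs:
        clusters.setdefault(root(sid), []).append(sid)
    return sorted(clusters.values())

toy = {"A1": "ACGTACGTAC", "A2": "ACGTACGTAA", "B1": "TTGTACGAAC"}
print(threshold_clusters(toy, threshold=0.2))
```

Because clustering is single-linkage, a specimen joins a cluster if it is below the threshold from any one member, so raising the threshold can merge clusters through intermediate specimens.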

Results

We obtained 230 partial COI sequences (435 bp) of 56 morpho-species, 34 nominal and 22 undescribed, in 18 cichlid genera. The number of specimens per species varied from 1 to 16 with an average of 4.1 specimens per species (Table 1). There were no unexpected stop codons, insertions or deletions in sequences; 232 sites were variable. All sequences were deposited in GenBank under accession numbers MH931536–MH931765 and in BOLD under the Project ID "CCHLD".

Table 1 Dataset statistics broken down per species (species assigned from morphological assessment and including valid nominal species and putatively undescribed species), including (from left to right): individual count, number of haplotypes, maximum intraspecific divergence (p-distance), minimum interspecific divergence (p-distance), monophyly, and number of delimited clusters by method (mPTP, locMin, bGMYC, GMYC)

The four species delimitation analyses were broadly concordant, delimiting 53 (mPTP), 55 (bGMYC), 56 (locMin) and 57 (GMYC) species/lineages (Fig. 2). Species in which all four methods delimited multiple lineages were Aequidens pallidus (Heckel, 1840), Apistogramma agassizii (Steindachner, 1875), Biotodoma cupido (Heckel, 1840), Caquetaia spectabilis (Steindachner, 1875) and Crenicichla cf. cincta Regan, 1905. Multiple lineages were delimited in Geophagus proximus (Castelnau, 1855) and G. altifrons Heckel, 1840 by locMin, bGMYC and GMYC, in Satanoperca jurupari (Heckel, 1840) by locMin, mPTP and GMYC, and in Crenicichla regani Ploeg, 1989 by locMin and GMYC.

Fig. 2

Maximum clade credibility chronogram from 1,000 posterior trees generated using BEAST. The dataset comprised 137 unique haplotypes (from a total of 230 sequences). Bayesian posterior probabilities above 0.95 are shown as dark nodes. Species delimitations are shown by method as colored boxes; due to the large number of unique colors, some may appear similar

None of the four single-locus species delimitation methods distinguished the following pairs: Teleocichla preta Varella, Zuanon, Kullander & López-Fernández, 2016 and Teleocichla cf. gephyrogramma Kullander, 1988, which occur sympatrically in the rapids of the lower Xingu River; Cichla melaniae Kullander & Ferreira, 2006 and C. temensis Humboldt, 1821 from the Xingu and Jatapu Rivers; and A. pallidus (Heckel, 1840) and Aequidens cf. mauesanus Kullander, 1997 from the Jatapu and Purus Rivers (Fig. 1). The minimum interspecific distance was 1.1% between the two species of Teleocichla, 1.8% between the two species of Cichla and 1.6% between the two Aequidens species.

The maximum observed intraspecific distance was 11.0% (B. cupido). The minimum observed interspecific distance was 0.2% (between G. altifrons and G. proximus from the Jatapu River). When only taxa that were delimited were included, minimum interspecific distances exceeded 4.4% (Table 1).
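The per-species summaries in Table 1 (maximum intraspecific and minimum interspecific divergence) can be computed from a pairwise distance table. The Python sketch below assumes distances are supplied as a dictionary keyed by unordered specimen pairs; a species is flagged as lacking a barcode gap when its nearest heterospecific specimen is no more distant than its most divergent conspecific. The specimen IDs, species labels and distances in the example are hypothetical.

```python
from itertools import combinations
from typing import Optional

def divergence_summary(species_of: dict[str, str],
                       dist: dict[frozenset, float]) -> dict[str, tuple[Optional[float], Optional[float]]]:
    """Per species: (maximum intraspecific, minimum interspecific) distance.

    `species_of` maps specimen ID -> species; `dist` maps frozenset({id1, id2})
    -> pairwise distance. Species represented by one specimen get None as
    their intraspecific value.
    """
    max_intra: dict[str, float] = {}
    min_inter: dict[str, float] = {}
    for a, b in combinations(species_of, 2):
        d = dist[frozenset((a, b))]
        sa, sb = species_of[a], species_of[b]
        if sa == sb:
            max_intra[sa] = max(max_intra.get(sa, 0.0), d)
        else:
            for s in (sa, sb):
                min_inter[s] = min(min_inter.get(s, float("inf")), d)
    return {s: (max_intra.get(s), min_inter.get(s)) for s in sorted(set(species_of.values()))}

def lacks_barcode_gap(max_intra: Optional[float], min_inter: Optional[float]) -> bool:
    """True when the nearest other species is no farther than the most divergent conspecific."""
    return max_intra is not None and min_inter is not None and min_inter <= max_intra

species_of = {"x1": "Species X", "x2": "Species X", "y1": "Species Y"}
dist = {
    frozenset(("x1", "x2")): 0.012,
    frozenset(("x1", "y1")): 0.050,
    frozenset(("x2", "y1")): 0.060,
}
for sp, (intra, inter) in divergence_summary(species_of, dist).items():
    print(sp, intra, inter, lacks_barcode_gap(intra, inter))
```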

Discussion

Broad-scale species delimitation studies of tropical freshwater ichthyofauna, whether single- or multi-locus, are practically nonexistent. Yet freshwater fauna and habitats are among the most endangered on the planet (Dudgeon et al., 2006). Freshwater habitats are also among the most biodiverse, harboring ~50% of teleost species (Eschmeyer, 1998) in 0.007% of Earth's water (rivers, lakes and marshes/swamps; Shiklomanov, 1993). Assessing this diversity is therefore vital if we want to conserve it, together with the evolutionary and ecological processes that generate and maintain it.

As already observed in many other DNA barcoding and molecular biodiversity studies, the number of delimited species/lineages generally exceeds the number of nominal taxa, or even of morpho-species, analyzed. Our results show the same trend: while we analyzed 56 morpho-species, 34 of which were nominal taxa, the species delimitation analyses indicated between 53 and 57 species/lineages. However, morpho-species and delimited species/lineages do not correspond strictly. Assuming that most morpho-species are species (e.g., species of Cichla) and that delimited lineages within morpho-species (e.g., within A. agassizii) are also species, the total number of species is between 62 and 63. In principle, this represents an 11–13% increase in the number of species/lineages, a low figure even compared with the well-known faunas of the United States and Canada (18%; April et al., 2011). It is also low relative to other studies of Amazonian fishes, but not especially surprising given the relatively small number of localities and river basins examined and the broad geographic distributions of many of the studied species. Additionally, the COI fragment examined here is shorter than in most other DNA barcoding studies, which is known to lead to underestimates of diversity (Pérez-Miranda et al., 2018). The rate of non-monophyly in our data was 3.5%. This is lower than the rates estimated for Actinopterygii (23%) and Perciformes (22%) in an analysis of all barcode data deposited in BOLD (Ross, 2014), and is again likely explained, at least in part, by our sampling design.

Although only a few studies have focused on Neotropical clades, Rossini et al. (2016), in a study of the notoriously taxonomically complex genus Astyanax, distributed from the southwestern USA to Argentina, delimited 124–156 species/lineages from 116 morphologically identified species. While concordance between delimited species/lineages and morpho-species was high, there were also many cases in which different morpho-species shared haplotypes or were not delimited as distinct species/lineages. At the same time, Astyanax bimaculatus (Linnaeus, 1758) alone was delimited into 24 distinct species/lineages. Similarly, Castro Paz et al. (2014) observed that three of the 10 nominal species of the “bleeding heart” tetras of the genus Hyphessobrycon were paraphyletic, while four species shared haplotypes or were not delimited as distinct. Benzaquem et al. (2015), working with 14 Amazonian species of Nannostomus, delimited 25 species/lineages; the additional species/lineages were concentrated in just four broadly distributed species occupying distinct Amazonian drainages.

Multiple deeply divergent lineages within the same morpho-species, structured by river basin, appear to be a general phenomenon in Amazonian fishes. Not only are all instances of multiple species/lineages of cichlids identified in this study and in that of Colatreli et al. (2012) structured by river basin, but a recent study of the serrasalmids (the carnivorous piranhas and the herbivorous pacus) shows the same pattern (Machado et al., 2018). Structuring patterns are also geographically nested. Rivers draining the same geological formation (the Guyana Shield, the Brazilian Shield or the Andes, i.e., the western sedimentary Amazon Basin) tend to share species/lineages when sharing is observed. When species/lineages are shared across geological formations, sharing tends to be between the Guyana and Brazilian Shields, probably because both are of similar age and became effectively separated from each other only with the formation of the Amazon River starting in the Miocene (Hoorn et al., 2010). Their waters are also of similar physicochemical composition: essentially free of sediment and minerals, with Guyana Shield rivers tending to be high and Brazilian Shield rivers low in humic acids. This contrasts with rivers of the western sedimentary Amazon Basin and the main channel of the Amazon, which are nutrient rich, sediment laden and of near-neutral pH.

Lineage structuring is not restricted to the level of different basins. For example, ecologically driven lineage diversification at a small spatial scale has been observed in the Crenicichla mandelburgeri Kullander, 2009 complex of the Middle Paraná River (Piálek et al., 2012; Burress et al., 2018). Endemic rheophilic cichlids are also structured into distinct lineages occupying different sections of the Araguaia (Hrbek et al., 2018) and Congo (Markert et al., 2010) rivers.

Not all cichlid morpho-species were delimited in this study, however. C. melaniae and C. temensis, as well as T. preta and Teleocichla cf. gephyrogramma, were not delimited as distinct by any of the four single-locus species-discovery (SLSD) methods, while only three of the four methods delimited C. melaniae, C. piquiti and C. temensis, and G. proximus and G. altifrons, as distinct species. The species of Cichla shared haplotypes, while distinct clades and species of Geophagus occupied different river basins but were not delimited as distinct species/lineages. The inability of SLSD methods to delimit all species of Cichla is not surprising (Willis et al., 2012b; Willis, 2017). The group has diversified recently, and species that diverged after colonizing upstream habitats from downstream habitats tend to be phylogenetically nested within their downstream congeners (Willis et al., 2007). Hybridization has also played a role in the evolutionary history of this group (Willis et al., 2012b, 2014), as well as in the genus Symphysodon (Farias & Hrbek, 2008; Amado et al., 2011), and probably in other Amazonian cichlids and in cichlids outside the Amazon Basin (Crenicichla in Burress et al., 2018). Nevertheless, species of Cichla are morphologically distinct and diagnosable, although species-level differences lie entirely in external morphology, color pattern and body color (Kullander & Ferreira, 2006).

Although none of the SLSD methods separated T. preta from Teleocichla cf. gephyrogramma, the two species are morphologically distinct. Teleocichla preta is clearly distinguishable from all other species of Teleocichla by its size (it is larger than the other species of the genus), black body color, faint vertical bars or zig-zags, body and caudal peduncle depth, and the presence of molariform teeth on the lower pharyngeal plate, and specifically from Teleocichla cf. gephyrogramma by the absence of a caudal-fin blotch (Varella et al., 2016). It is unclear why monophyly was not recovered despite clear morphological and probably ecological differences. However, introgressive hybridization or accelerated adaptive diversification driven by niche partitioning of rheophilic habitats, as observed in other rheophiles (Roxo et al., 2017), are potential explanations.

Hybridization, as well as recent divergence, results in shared haplotypes or non-monophyly of species. All SLSD methods assume species-level monophyly, and under the neutral model the time to monophyly for a single locus is, on average, 1 Ne, where Ne is the effective population size of the organism (Hudson & Coyne, 2002). Therefore, species with a large effective population size, which is itself correlated with the area of available habitat, are expected to be non-monophyletic even if they are evolutionary lineages sensu de Queiroz (2007) and diagnosable using other characters. It is also to be expected that many recently diverged species, even if monophyletic, will not be delimited by SLSD methods, because patterns of intra- and interspecific coalescence are not yet distinguishable owing to the recency of divergence. The “recency” of divergence is again a function of effective population size. Therefore, recently diverged species with large effective population sizes will often remain undetected by SLSD methods, and complementary analyses are necessary to detect them.

For all these reasons, the species/lineage diversity of Amazonian cichlids is substantially underestimated, reflecting a general phenomenon prevalent across the Amazonian ichthyofauna.

Implications for conservation

The Amazon Basin supports ever larger human populations, and its natural resources are being exploited for the global commodities market. Its natural vegetation is being converted to farmland and rangeland, and its rivers are being impounded for hydroelectric power generation and for transportation. It is estimated that up to 55% of Brazil's hydroelectric potential has yet to be exploited (International Energy Agency, 2013), and nearly all of this potential lies within the Amazon Basin (Lees et al., 2016; Latrubesse et al., 2017). Several large dams, such as Tucuruí, Balbina, Santo Antônio, Jirau and the recently completed Belo Monte, have already been built in the Brazilian Amazon. More than 200 additional dams, such as those of the Tapajós Hydroelectric Complex, have been proposed by South American governments (Finer & Jenkins, 2012; Castello et al., 2013; Lees et al., 2016). If these plans were enacted, only three Amazon tributaries would remain unimpounded (Castello & Macedo, 2016).

Although hydroelectric projects have been lauded as cheap and clean energy alternatives, dam construction and operation cause substantial environmental stresses, including the destruction of rheophilic habitats (Clausen & York, 2008; Castello & Macedo, 2016; Pelicice et al., 2017) and their associated ichthyofauna (Agostinho et al., 2008). Lotic habitats are turned into lentic habitats, with large changes in species composition and community structure in the modified regions. These modified habitats are susceptible to invasion by non-native species, which are often introduced intentionally to promote commercial fisheries (Akama, 2017). As a consequence, biodiversity and ecosystem services are lost before we even have a notion of the standing biodiversity.

By depriving ourselves of this biodiversity knowledge, we also deprive ourselves of an understanding of the processes that generated and maintained this biodiversity, and consequently of the ability to mitigate its loss or to restore evolutionary and ecological processes.

As mentioned earlier, freshwater habitats make up just 0.007% of Earth's water, while oceans contain 96.5% of all global water (Shiklomanov, 1993). Freshwater habitat capable of supporting ichthyofauna is further unequally divided: rivers comprise 2.1%, marshes/swamps 11.8% and lakes the remaining 87.1% (Shiklomanov, 1993). The East African rift lakes contain 29% of all water held in lakes, while the Amazon River Basin holds 20% of global river water (Moura et al., 2016). In this light, Amazonian cichlid diversity is remarkable. The Amazon Basin, home to over half of the 570 Neotropical cichlid species, holds only about 1.7% of the volume of freshwater habitat in which the ~2,000 species of East African rift lake cichlids evolved.
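The 1.7% figure can be traced through the percentages given in this paragraph. The Python sketch below assumes it was obtained by expressing each water body as a share of global freshwater habitat (the Amazon's 20% of river water times the rivers' 2.1%, and the rift lakes' 29% of lake water times the lakes' 87.1%) and then taking the ratio of the two; this reconstruction is an assumption, not a calculation stated in the cited sources.

```python
# Shares of global freshwater habitat (Shiklomanov, 1993), in percent.
RIVERS_PCT = 2.1
LAKES_PCT = 87.1
# Fractions held by each system (Moura et al., 2016).
AMAZON_OF_RIVERS = 0.20
RIFT_LAKES_OF_LAKES = 0.29

amazon_pct = AMAZON_OF_RIVERS * RIVERS_PCT        # Amazon share of freshwater habitat
rift_pct = RIFT_LAKES_OF_LAKES * LAKES_PCT        # rift-lake share of freshwater habitat
amazon_vs_rift_pct = 100 * amazon_pct / rift_pct  # Amazon volume relative to the rift lakes

print(f"Amazon: {amazon_pct:.2f}% of freshwater habitat")
print(f"East African rift lakes: {rift_pct:.1f}% of freshwater habitat")
print(f"Amazon relative to rift lakes: {amazon_vs_rift_pct:.1f}%")
```

Under this assumption the Amazon holds about 0.42% and the rift lakes about 25.3% of freshwater habitat, a ratio of roughly 1.7%.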