Introduction

Cichlid fish exhibit a high diversity of morphology, coloration and behavior (Kullander, 2003). The family is subdivided into four subfamilies: Etroplinae, which is present in India and Madagascar, Ptychochrominae, which is native to Madagascar, Cichlinae, from the Neotropical region, and Pseudocrenilabrinae, which can be found in Africa and the Middle East. Collectively, cichlids form a monophyletic group that is well supported by morphological (Kaufman & Liem, 1982; Stiassny, 1987) and molecular data (Friedman et al., 2013). Molecular clock analyses have suggested that the separation of African and Neotropical cichlids occurred much later (65 MYA) than the geological separation of the African and South American continents (> 100 MYA) during the fragmentation of Gondwana (Matschiner et al., 2017). Thus, an inter-oceanic dispersal event has been proposed as the main cause of the current cichlid distributions. However, other analyses have suggested that the separation between Neotropical and African lineages is much older (~ 100 MYA), in line with the geological evidence (Irisarri et al., 2018).

In many animals, visual systems have been subject to strong evolutionary pressures related to the characteristics of their habitats (Osorio & Vorobyev, 2008; Carleton et al., 2020). In fishes, the photic conditions of the water column have been specifically implicated (Kroger, 2003), and in cichlid fishes, the light environment has been suggested to interact with several behaviors, including intraspecific recognition, feeding, and habitat selection (Parry et al., 2005; Carleton, 2009; Miyagi et al., 2012). The cichlid family has become an important model for the evolutionary study of visual systems, and there has been a particularly strong focus on the suite of opsin genes responsible for light-absorbing visual pigments (Carleton, 2009).

Visual pigments consist of retinal chromophores (vitamin A aldehydes) that are bound to visual opsin proteins, which absorb a specific wavelength of the light spectrum (Bowmaker, 1995). In cichlids, eight opsin genes are responsible for light absorption, and retinal chromophore usage can shift between A1- (11-cis-retinal) and A2- (11-cis-3,4-dehydroretinal) derived chromophores (Hárosi, 1994; Terai et al., 2017; Escobar-Camacho et al., 2019). Seven of these genes encode cone opsins, which are responsible for color vision and absorb wavelengths of light ranging from ultraviolet to red. One further gene encodes a rod opsin, a visual pigment responsible for scotopic vision (Carleton, 2009; Carleton et al., 2010; Escobar-Camacho et al., 2017; Torres-Dowdall et al., 2017).

Several studies have explored the molecular evolution of visual opsin genes within the Neotropical cichlids (Weadick et al., 2012; Escobar-Camacho et al., 2017, 2019; Hauser et al., 2017; Härer et al., 2018), and included comparisons with those of African species (Schott et al., 2014; Torres-Dowdall et al., 2015; Fabrin et al., 2017). These comparisons are interesting because the diversity of habitats occupied by Neotropical species enables novel investigations of the links between the photic environment and the evolution of opsin genes. They also allow further evaluation of the generality of evolutionary mechanisms proposed for cichlids inhabiting the African Great Lakes (Terai et al., 2002; Seehausen et al., 2008; Hofmann et al., 2010; Miyagi et al., 2012; Terai et al., 2017).

The Neotropical aquatic light environments are highly variable. Turbidity, in particular, has a great impact on water transparency (Costa et al., 2013). Consequently, this variation can affect the molecular evolution of opsin genes by acting as a selective pressure and shifting the spectra that can be absorbed by a specific visual pigment (Davies et al., 2012). Additionally, the light environment can affect expression levels of opsin genes. For example, a recent study (Härer et al., 2019) showed that cichlid species do not express all the cone opsin genes, and that the set of expressed genes usually varies according to the light conditions. This palette can also vary from juveniles to adults, and it can be reversed depending on variations in the environmental light spectra (Härer et al., 2019).

Studying the molecular evolution of the opsin protein coding sequences, including amino acid substitutions at key spectral tuning sites of the protein, is essential to understand the evolution of visual systems, especially since some substitutions can shift the wavelength of maximum absorbance (λmax) of the visual pigment (Yokoyama, 2008; Carleton, 2009; Carleton et al., 2016). These proteins have been well studied in cichlids inhabiting the African Great Lakes (Hofmann et al., 2009, 2010; Terai et al., 2017), and the long-wavelength opsin (LWS) has been the focus of many studies because it is one of the main cone opsin genes under selection (Terai et al., 2002; Seehausen et al., 2008; Miyagi et al., 2012).

Here we identified codons under selection and compared LWS evolution of Neotropical cichlids (here represented by cichlids from South America) to that of cichlids from the African Great Lakes. The specific objectives of this study are (1) to identify molecular variation at key sites of the LWS opsin protein using nucleotide and amino acid sequences from a group of Neotropical cichlids, and (2) to compare the variation within these sites to that of African cichlids for which data are available in public databases.

Material and methods

Sampling and molecular analyses

A total of 17 South American cichlid species (Cichlinae) were used in this study (Appendix Supplementary Table 1). Muscular tissue samples were obtained from site 6 of the Long-term Ecological Program (Programa Ecológico de Longa Duração–PELD/CNPq of the State University of Maringá–UEM) in Brazil (sampling was conducted following the Animal Ethical Committee protocol CEUA 123/2010), and from the UEM ichthyological collection and the Pontifical Catholic University of Rio Grande do Sul (PUC-RS). Samples were preserved in 1.5 mL microtubes filled with ethanol.

Table 1 Results of LWS Clade model C (CmC) analysis of African and Neotropical cichlids

DNA extractions were performed using either a Wizard® Genomic DNA kit or a ReliaPrep™ gDNA kit (Promega, Madison, Wisconsin, USA) according to the manufacturer’s instructions. DNA was run on a 1% agarose gel and quantified by comparison with λ DNA of known concentration. All samples were stored at −20°C. PCRs were performed using the primers F1E1 (5´-GCGGTACCATGAAGATACAACAA-3´) and R1E6 (5´-GGATACTTCAGAACCATCATC-3´) (modified from Miyagi et al., 2012). Sequencing focused on obtaining key sites of the LWS opsin protein; thus, the primer LWS_R5 (Miyagi et al., 2012) was also used. The expected amplicon size was ~ 2100 bp. Each PCR was run in a 25 μL solution containing 20 mM Tris-HCl (pH 8.4), 50 mM KCl, 1.5 mM MgCl2, 2.5 μM primers, 0.1 mM dNTPs and 0.1 μL of Platinum® Taq DNA polymerase (Invitrogen, ThermoFisher Scientific, USA). Thermal cycling conditions used for amplification were: 3 min at 94°C, followed by 35 cycles of 30 s at 94°C, 30 s at 51°C and 90 s at 72°C, with a final extension step of 7 min at 72°C. PCR products were prepared for sequencing (Rosenthal et al., 1993) using a Big Dye Terminator v3.1 kit (ThermoFisher Scientific, Waltham, Massachusetts, USA) and sequenced in an automated sequencer (3730XL, Applied Biosystems) at the Laboratório Central de Tecnologias de Alto Desempenho em Ciências da Vida of the State University of Campinas, Brazil. Sequenced sites comprised 175 codons of the coding sequence from the partial transmembrane region 3 to the partial transmembrane region 7 of the protein, based on the crystal structure of the bovine rhodopsin (Palczewski et al., 2000), spanning from exon 3 to partial exon 5. Sequences were edited using the software package BioEdit (Hall, 1999).
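
As a rough check on the thermal cycling conditions given above, the nominal run time can be summed directly from the stated hold times. This is only a minimal sketch: it ignores ramping between temperatures (an assumption, so real runs take longer), and the function name is illustrative rather than part of any instrument software.

```python
def nominal_run_time_s(initial_s, cycles, step_durations_s, final_s):
    """Sum hold times only; ramping between temperatures is ignored."""
    return initial_s + cycles * sum(step_durations_s) + final_s


# Values taken from the protocol above: 3 min at 94 °C, 35 cycles of
# 30 s / 30 s / 90 s, and a final 7 min extension at 72 °C.
total_s = nominal_run_time_s(3 * 60, 35, [30, 30, 90], 7 * 60)
print(f"{total_s} s = {total_s / 60:.1f} min")  # 5850 s = 97.5 min
```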

Cichlidae database and molecular analyses

Our final database included partial LWS sequences of 61 cichlid species, of which 17 were Neotropical (11 from this study) and 44 were African (obtained from GenBank) (Appendix Supplementary Table 1). Sequences from all samples were first categorized according to their lineage (Neotropical lineage; African lineage) and then aligned using the Clustal W algorithm (Thompson et al., 1994) as implemented in MEGA X (Kumar et al., 2018). Amino acid numbering in the final alignment followed the same order as that of bovine rhodopsin.
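
Numbering an alignment against a reference protein, as described above for bovine rhodopsin, can be sketched as follows. The toy sequences, the start position and the function name are hypothetical and are not taken from the study; the sketch assumes only that the reference is present in the alignment with gaps marked by "-".

```python
def reference_numbering(aligned_ref, first_residue_number):
    """Map each alignment column to a reference residue number.

    Columns where the reference has a gap ('-') get None, since they
    have no counterpart in the reference protein.
    """
    numbers = []
    current = first_residue_number
    for symbol in aligned_ref:
        if symbol == "-":
            numbers.append(None)
        else:
            numbers.append(current)
            current += 1
    return numbers


# Hypothetical toy data: a short aligned reference fragment, an aligned
# query fragment, and an assumed start position.
aligned_ref = "AS-TV"
aligned_query = "ASGTI"
numbering = reference_numbering(aligned_ref, first_residue_number=160)

for column, (ref_number, residue) in enumerate(zip(numbering, aligned_query)):
    print(column, ref_number, residue)
```

Query residues that fall in columns where the reference has a gap receive no reference number, which keeps the numbering of downstream sites stable.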

The best nucleotide substitution models were selected using jModelTest 2 (Darriba et al., 2012). Maximum Likelihood (ML) nucleotide trees including all species were built using RAxML-NG (Kozlov et al., 2019). For both trees (the first using only exons, the second using exons and introns), the autoMRE option was applied for bootstrap resampling. Notably, not all species were included in the second gene tree (that is, the one containing both exons and introns), as some of them only had coding sequences in public databases. Yet, since all 61 species had coding sequences, these were used for the evolutionary analyses. Trees were edited using FigTree v1.4.3 (Rambaut, 2016).

Amino acid variation following nucleotide substitutions in exons was analyzed using the codeml program as implemented in pamlX (Xu & Yang, 2013) and FUBAR (Fast Unconstrained Bayesian AppRoximation) (Murrell et al., 2013), run on the Datamonkey platform (Delport et al., 2010). We used the Clade model C (CmC) to analyze the divergence between African and Neotropical clades and the site models M0, M1a, M2a, M7, and M8 to estimate codon variation. Briefly, CmC compares the clades of a phylogeny by assuming that they have diverged under different selective pressures (Yang et al., 2005), with M2a_rel as the null model (Weadick & Chang, 2012). The site models, on the other hand, estimate ω, the ratio of nonsynonymous (dN) to synonymous (dS) substitution rates, and test for positive selection. M0, M1a, and M2a are used to compare ω variation, whereas the M7 and M8 models are useful for estimating the sites under positive selection (Yang, 2007). The positively selected sites inferred using the M8 model are presented based on the Bayes Empirical Bayes (BEB) analysis with a posterior probability (PP) > 80%. FUBAR also estimates sites under positive or purifying selection from the synonymous (α) and non-synonymous (β) substitution rates, but uses a Bayesian algorithm instead. Thus, the significance of positive or negative selection is returned as a posterior probability (Murrell et al., 2013).
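
Nested codeml models such as those listed above are conventionally compared with a likelihood ratio test, in which twice the difference in log-likelihood is referred to a chi-square distribution. The text does not spell out the test, so the sketch below is an assumption about the procedure, and the log-likelihood values are hypothetical placeholders rather than results from this study. The degrees of freedom shown (1 for M2a_rel vs CmC, 2 for M1a vs M2a and for M7 vs M8) are the values commonly used for these comparisons.

```python
import math


def chi2_sf(statistic, df):
    """Upper-tail probability of a chi-square distribution (df = 1 or 2)."""
    if statistic <= 0:
        return 1.0
    if df == 1:
        return math.erfc(math.sqrt(statistic / 2.0))
    if df == 2:
        return math.exp(-statistic / 2.0)
    raise ValueError("closed form implemented only for df = 1 or 2")


def likelihood_ratio_test(lnl_null, lnl_alt, df):
    """Return (2 * delta lnL, p-value) for a pair of nested models."""
    statistic = 2.0 * (lnl_alt - lnl_null)
    return statistic, chi2_sf(statistic, df)


# Hypothetical log-likelihoods (null, alternative) and degrees of freedom.
comparisons = {
    "M2a_rel vs CmC": (-2310.0, -2304.0, 1),
    "M1a vs M2a": (-2320.0, -2310.0, 2),
    "M7 vs M8": (-2321.0, -2311.5, 2),
}

for name, (lnl_null, lnl_alt, df) in comparisons.items():
    stat, p = likelihood_ratio_test(lnl_null, lnl_alt, df)
    print(f"{name}: 2dlnL = {stat:.2f}, df = {df}, p = {p:.3g}")
```

The closed-form tail probabilities keep the example free of external dependencies; a statistics library would be the usual choice for other degrees of freedom.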

Results

Partial coding DNA sequences of LWS were obtained. The final alignment was 730 bp when including introns, and 525 bp and 175 amino acids when only using exons. Overall, Neotropical cichlids exhibited the highest nucleotide variation among all samples analyzed here (Fig. 1). This was also reflected in the LWS gene tree containing only exons (Appendix Supplementary Fig. 1). Monophyly of both Neotropical and African clades seems to be well supported when using sequences that include introns. The same, however, was not observed in the nucleotide tree only using exons (Appendix Supplementary Fig. 1).

Fig. 1 Maximum likelihood gene tree built using partial DNA sequences of the LWS gene containing exons and introns (730 bp) of cichlid species (HKY + G). Filled circles on nodes indicate bootstrap support > 80%. Scale bar indicates number of substitutions per site

According to the CmC analyses, the African/Neotropical partition was the best-fitting partition (Table 1) and differed significantly from the null model M2a_rel. Furthermore, African cichlids showed a higher rate of positive selection (5.93) than Neotropical species (1.33), as indicated by ωd, which is a measure of divergent selective pressure (Yang et al., 2005).

In accordance with its higher ωM0 (1.10), the African lineage had the highest proportion of sites under positive selection (18%). Interestingly, despite the Neotropical clade exhibiting a much lower ωM0 (0.22), this group also had the highest proportion of sites under neutral evolution (17%) (site-model results are shown in Table 2).

Table 2 Results of LWS site-models analyses of different cichlid groups

Estimates of codons under selection obtained using the FUBAR test (Table 3) showed that Neotropical cichlids have more sites under selection than species from the African lakes. In contrast to the Neotropical group, for which most sites were estimated to be under negative selection, the African clade had more sites under positive selection (see Appendix Supplementary Table 3 for details on species from each location). As expected, most sites under selection differed between groups (Table 3). Residue 164 had the highest β value (non-synonymous substitution rate) in almost all groups when compared to other spectral tuning sites. The highest α value (synonymous substitution rate) for Neotropical cichlids, 9.907, came from residue 261 (Appendix Supplementary Fig. 2). Altogether, our results showed that although the Neotropical clade had the highest variation, positive selection in this group (1.33) was weaker than that observed in African species (5.93).
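
The way per-site FUBAR output translates into counts of sites under positive or negative selection can be sketched as below. The site labels, rates and posterior probabilities are hypothetical toy values, and the 0.9 cut-off is an assumption, since the threshold applied to the FUBAR posterior probabilities is not stated in the text.

```python
from dataclasses import dataclass


@dataclass
class SiteEstimate:
    site: str
    alpha: float          # synonymous substitution rate
    beta: float           # non-synonymous substitution rate
    prob_positive: float  # posterior probability that beta > alpha
    prob_negative: float  # posterior probability that alpha > beta


def classify(estimate, threshold=0.9):
    """Label a site from its posterior probabilities (assumed cut-off)."""
    if estimate.prob_positive >= threshold:
        return "positive"
    if estimate.prob_negative >= threshold:
        return "negative"
    return "not significant"


# Hypothetical toy values, not results from this study.
toy_sites = [
    SiteEstimate("site_a", alpha=0.5, beta=4.2, prob_positive=0.95, prob_negative=0.02),
    SiteEstimate("site_b", alpha=6.1, beta=0.3, prob_positive=0.01, prob_negative=0.98),
    SiteEstimate("site_c", alpha=1.0, beta=1.1, prob_positive=0.40, prob_negative=0.35),
]

counts = {}
for estimate in toy_sites:
    label = classify(estimate)
    counts[label] = counts.get(label, 0) + 1

print(counts)
```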

Table 3 Sites under selection estimated using the FUBAR test

Discussion

Similar to findings from previous studies on RH1 (Schott et al., 2014; Torres-Dowdall et al., 2015), our work supports the monophyly of Neotropical and African cichlids. Yet, this was not the case among Neotropical tribes. This unexpected grouping is most likely due to the effects of purifying selection, which limits substitutions occurring in sites important for protein functioning. Also, we need to consider that only a partial fragment of the LWS gene was analyzed in this study. Parallel evolution of coding regions might also mean that trees based on the LWS gene are not appropriate for generating species phylogenies (Terai et al., 2017). Indeed, while Fabrin et al. (2017) demonstrated monophyly of Neotropical and African cichlids using the LWS gene, they did not consider important spectral tuning sites in this family of proteins, nor did they include other cichlid species.

CmC analyses showed divergence between African and Neotropical cichlids, suggesting that different selective pressures might be driving the molecular evolution of this gene in these two lineages. Similar patterns have been observed in studies focusing on the RH1 gene (Schott et al., 2014; Torres-Dowdall et al., 2015), where rates of positive selection were also higher in African than in Neotropical cichlids. This difference in divergence patterns of the LWS gene between African and Neotropical species may be related to differences in evolutionary history and ecological pressures. It is important to note that the species studied from the Neotropical lineage comprise a relatively ancient group, whereas the African species studied were all relatively recently diverged haplochromine species (Genner et al., 2007; Koblmüller et al., 2011).

Factors such as light intensity and depth have been shown to be strongly correlated with allelic variation of the LWS gene in Pundamilia spp. from Lake Victoria (Seehausen et al., 2008). Similarly, Irisarri et al. (2018) showed that depth has an impact on rates of positive selection of this gene in cichlids from Lake Tanganyika. Therefore, the maintenance of this variation is likely due to ecological factors (Miyagi et al., 2012). Here we did not analyze as many species as Schott et al. (2014) or Torres-Dowdall et al. (2015), and did not specifically analyze variation in relation to different ecological factors. It may be valuable to explore the role of ecological differences in future studies on the molecular evolution of the LWS gene in Neotropical cichlids.

The divergence observed in the CmC analysis was further supported by the site-model analyses. Residues under selection differed between African and Neotropical cichlids. Comparing the ωM0 values between African and Neotropical cichlids (Table 2) and considering that cone opsin genes are under weak purifying selection (Irisarri et al., 2018), purifying selection on the LWS gene seems to be stronger in Neotropical cichlids than in African cichlids, which was confirmed by the FUBAR results. The negative selection in Neotropical cichlids might mean that once mutations are fixed from an ancestral residue, they cannot be readily reverted, leading to accumulations of nucleotide substitutions but not amino acid changes (Storz, 2016).

Differences in spectral tuning sites are likely to be important for adaptation to environments with different light levels, which might be key for speciation of cichlids (Hofmann et al., 2009; Carleton et al., 2016). Notably, residue 164 had a high non-synonymous substitution rate, and a S164A substitution can shift the wavelength of light absorption of the LWS protein (Asenjo et al., 1994; Yokoyama, 2008). Further, the M8 model suggested that this particular site is under positive selection in both African and Neotropical groups, and both 164S and 164A were found in Neotropical species from this study. Escobar-Camacho et al. (2017) also found this substitution in the Amazonian species Pterophyllum scalare (Schultze, 1823), Symphysodon discus (Heckel, 1840) and Astronotus ocellatus (Agassiz, 1831). Additionally, other studies have reported intraspecific variation at site 164 (164A and 164S) in cichlid species from Lake Nicaragua (Torres-Dowdall et al., 2017) and Lake Victoria in Africa (Terai et al., 2017).
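
The distinction between non-synonymous changes such as S164A and synonymous changes can be illustrated at the codon level using the standard genetic code. The codons below are illustrative only; the actual codons present at site 164 in the sequenced species are not given in the text, and the function name is hypothetical.

```python
from itertools import product

# Standard genetic code, with codons ordered by base in the order T, C, A, G.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): amino_acid
    for codon, amino_acid in zip(product(BASES, repeat=3), AMINO_ACIDS)
}


def classify_change(codon_before, codon_after):
    """Return both amino acids and whether the change alters the residue."""
    aa_before = CODON_TABLE[codon_before]
    aa_after = CODON_TABLE[codon_after]
    kind = "synonymous" if aa_before == aa_after else "non-synonymous"
    return aa_before, aa_after, kind


# Illustrative codons only (assumed, not taken from the alignment).
print(classify_change("TCT", "GCT"))  # ('S', 'A', 'non-synonymous')
print(classify_change("TCT", "TCC"))  # ('S', 'S', 'synonymous')
```

A single first-position change is enough to convert a serine codon into an alanine codon, whereas a third-position change within the same codon family leaves the residue unchanged.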

Although other key sites (181, 269, and 292) were not found to be under selection in the Neotropical group, the amino acids next to these sites were under purifying/negative selection. Härer et al. (2018) also analyzed the DNA sequences of LWS in Neotropical cichlids and identified eight amino acids that varied, four (206, 214, 217, and 287) of which are found in our region of study and one (site 217) that we found to be under positive selection in our data (217S, 217G, and 217A). Thus, this site variation warrants further study.

In conclusion, we showed that there is significant variation in a partial coding DNA sequence of the LWS gene of Neotropical cichlids, and that this gene is under weak positive selection in this lineage. Further, our results support the divergence between African and Neotropical lineages. We also identified codons from Neotropical species that are under selection, most of them under negative selection. Our data suggest that selection pressure in African cichlids is stronger than in Neotropical cichlids. Future work focused on Neotropical species could help to further understand the pressures driving the molecular evolution of LWS, particularly analyses of the function of different sites under selection and their relevance to light absorption.