Introduction

African cichlids are a model for understanding speciation (Kornfield and Smith 2000; Danley and Kocher 2001; Genner and Turner 2005). Lakes Malawi, Victoria, and Tanganyika all house large flocks of endemic species. The flocks in Malawi and Victoria are each composed of large radiations of closely related haplochromine cichlids that have evolved at least partially through hybridization (Seehausen 2004). These radiations are an excellent natural laboratory for the study of speciation because of the large numbers of species that have arisen in such a short period of time.

Numerous models have been proposed for how species are formed, and the possibility or probability of each is a matter of some contention (Kirkpatrick and Ravigne 2002). The classic model of allopatric speciation requires extrinsic (geographic) barriers to gene flow (Mayr 1982). However, there is also evidence for parapatric and sympatric speciation, in which divergence is driven primarily by ecological (Schluter 2001) or sexual selection (Zahavi 1975; Weatherhead and Robertson 1979; Lande 1981). Sympatric speciation is a particularly attractive explanation for the recent rapid radiation of haplochromine cichlids (Seehausen and van Alphen 1999).

Cichlid visual systems have been the subject of intensive study because of their direct link to mate identification and preference (Maan et al. 2004; Maan et al. 2006). Visual cues appear to be sufficient for conspecific mate recognition (Seehausen and van Alphen 1998; Kidd et al. 2006), and may serve as a criterion for intraspecific mate choice (Maan et al. 2004). The degradation of the visual environment in Lake Victoria has been linked to the breakdown of visual mate recognition systems and of reproductive isolation between species (Seehausen et al. 1997). Behavioral studies have demonstrated that sister species of Lake Victoria cichlids have different chromatic sensitivities, and that male nuptial coloration tends to match these sensitivities (Maan et al. 2006). Therefore, the inherent sensitivities of cichlid visual systems may determine mate choice and contribute to speciation (Kawata et al. 2007; Seehausen et al. 2008).

Visual sensitivities are determined by the visual pigments of the photoreceptors. These consist of a chromophore (typically 11-cis retinal) bound to an opsin protein (Palczewski et al. 2000). The opsin protein sets the absorption properties of the pigment and is therefore a fundamental source of functional variation in visual systems (Yokoyama and Yokoyama 1996). Amino acid changes at multiple sites can alter the characteristic spectral absorbance of the protein (Takahashi and Ebrey 2003; Sugawara et al. 2005), and a relatively small number of sequence changes can produce a functionally novel opsin class (Yokoyama and Radlwimmer 2001). This property is especially important for color vision (Carleton et al. 2000; Spady et al. 2005). Gene duplication and divergence have led to the evolution of new opsin classes with distinct visual sensitivities in mammals and birds (reviewed by Ebrey and Koutalos 2001), crustaceans (Porter et al. 2009), and teleosts (Hofmann and Carleton 2009). Evolution of opsin coding sequences has thus produced the diversity of opsin chromatic sensitivities observed in organisms today.

East African cichlid fishes are an ideal system for the study of functional variation in cone opsins (Carleton et al. 2006). Cichlids have seven distinct cone opsin genes that arose through a series of gene duplications (Spady et al. 2006). Differences among species in the expression of these gene duplicates cause important differences in visual sensitivity among Lake Malawi species (Parry et al. 2005). These fishes are diverse in male nuptial coloration and are thought to be actively speciating (Danley and Kocher 2001). Functional diversity in cone opsin genes that causes differences in female visual sensitivity is a possible explanation for the divergence of female preferences for different male colorations.

Hofmann et al. (2009) proposed that, while shifts in visual sensitivity in the intermediate spectral range (~410–520 nm) can be accomplished by altering opsin gene expression, spectral shifts at the ends of the spectral range can only be achieved by varying opsin sequence. In this study, we sequenced all seven known cone opsin genes in several species from two species-rich genera: Metriaclima and Labidochromis. We hypothesized that sequence variation altering amino acid residues at retinal binding pocket sites would be found in the SWS1 and LWS opsins, because these proteins lie at the extreme ends of the cichlid opsin spectral range. Conversely, we hypothesized that the remaining five opsin genes would be conserved at these functional sites, because expression shifts among these genes could serve as the tuning mechanism.

Materials and Methods

DNA Samples

Twenty-four fish from the genus Metriaclima were lab-reared from captive-bred stocks of known geographic origin in Africa. Caudal fin clips were taken from three individuals of each of the following eight species: Metriaclima barlowi, Metriaclima benetos, Metriaclima callainos, Metriaclima fainzilberi, Metriaclima lombardoi, Metriaclima mbenji, Metriaclima phaeos, and Metriaclima pyrsonotus. Samples from six species of Labidochromis were taken from a mixture of laboratory and wild-caught individuals. Labidochromis caeruleus and Labidochromis chisumulae samples were prepared from abdominal muscle tissue taken from three frozen individuals that were originally procured from Old World Exotic Fish (FL, USA). Fin clips were taken from the following wild-caught fishes at Lake Malawi: one Labidochromis flavigulus, two Labidochromis gigas, two Labidochromis ianthinus, and three Labidochromis vellicans (Table S1). Genomic DNA was extracted using a Qiagen DNeasy kit (Valencia, CA, USA).

Gene Amplification and Sequencing

Gene-specific primers were used to amplify each of the seven cone opsin genes: SWS1 (UV sensitive), SWS2B (violet sensitive), SWS2a (blue sensitive), RH2B (blue-green sensitive), RH2Aα (green sensitive), RH2Aβ (green sensitive), and LWS (red sensitive). Most genes were amplified in two or three fragments. These fragments overlapped to produce one contiguous sequence for every gene except RH2B, for which we did not amplify or sequence across the ~1.5-kb second intron. PCR products were purified using a Qiagen QIAquick kit (Valencia, CA, USA) and then cycle-sequenced using internal primers (Table S2) and the BigDye Terminator v3.1 Cycle Sequencing Kit (Applied Biosystems, Foster City, CA, USA). Sequencing was performed on either a 3730xl DNA Analyzer or a 3100 Genetic Analyzer (Applied Biosystems).

Sequence Assembly and Analysis

Sequence assembly was performed using Sequencher 4.7 (Gene Codes Corporation, Ann Arbor, MI, USA). For a few Labidochromis sequences, long intronic repeats within the LWS or SWS2B genes made assembly difficult; these sequences were aligned to a genomic reference sequence from Metriaclima zebra, and the repeat was excised to allow further analysis. Coding sequences were identified by comparison to previously sequenced cichlid opsin cDNAs, and amino acid sequences were inferred. For phylogenetic analyses, nucleotide and protein alignments of large species panels were performed using the L-INS-i strategy in MAFFT (European Bioinformatics Institute). Bootstrap trees were generated in PAUP (Swofford 2002), and maximum-likelihood trees, whose branch lengths were used to compare genetic distances, were inferred under a general time reversible (GTR) model in GARLI v0.951 (Zwickl 2006; www.bio.utexas.edu/faculty/antisense/garli/Garli.html).
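PAUP performs bootstrapping internally; the short sketch below only illustrates what a single nonparametric bootstrap replicate is: alignment columns resampled with replacement, with every taxon receiving the same column draw. The function name and the toy alignment are hypothetical, and the tree search on each replicate is left out.

```python
import random

def bootstrap_replicate(alignment: dict[str, str], rng: random.Random) -> dict[str, str]:
    """Build one nonparametric bootstrap pseudo-replicate of an alignment.

    Columns are drawn with replacement until the replicate has as many
    columns as the original. Every taxon receives the same column draw, so
    each resampled column keeps its original taxon-to-character pairing.
    """
    lengths = {len(seq) for seq in alignment.values()}
    if len(lengths) != 1:
        raise ValueError("all sequences must have the same aligned length")
    n_cols = lengths.pop()
    picks = [rng.randrange(n_cols) for _ in range(n_cols)]
    return {taxon: "".join(seq[i] for i in picks) for taxon, seq in alignment.items()}

# Toy alignment (hypothetical sequences, not data from this study).
toy = {"taxonA": "ACGTAC", "taxonB": "ACGTTC", "taxonC": "ATGTTC"}
replicate = bootstrap_replicate(toy, random.Random(42))
```

Each replicate would then be analyzed with the same tree search as the original data, and the bootstrap support for a clade is the fraction of replicate trees that contain it.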

Haplotype blocks were analyzed using DnaSP 5.10 (Librado and Rozas 2009). DnaSP was used to computationally phase the diploid sequence data, estimating the sequences of the haploid alleles. Recombination sites were identified for both the SWS1 locus and the RH2 tandem array. Haplotype blocks were then defined as continuous regions of invariant sequence between these recombination sites and mapped onto the amino acid sequences.
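As an illustration of how recombination breakpoints can be located in phased haplotypes and then used to delimit blocks, the sketch below applies the classical four-gamete test: for two biallelic sites under an infinite-sites model, observing all four two-site combinations implies at least one recombination event between them. This is not necessarily the procedure DnaSP applied to these data, and the function names and toy haplotypes are hypothetical.

```python
def segregating_sites(haplotypes: list[str]) -> list[int]:
    """Alignment columns that carry more than one character state."""
    return [i for i in range(len(haplotypes[0]))
            if len({h[i] for h in haplotypes}) > 1]

def fails_four_gamete(haplotypes: list[str], i: int, j: int) -> bool:
    """True if sites i and j show at least four distinct two-site combinations."""
    return len({(h[i], h[j]) for h in haplotypes}) >= 4

def haplotype_blocks(haplotypes: list[str]) -> list[tuple[int, int]]:
    """Split a phased alignment into half-open [start, end) blocks.

    Simplification: only adjacent segregating sites are tested, and each
    inferred breakpoint is placed at the downstream site of the pair.
    """
    sites = segregating_sites(haplotypes)
    cuts = [b for a, b in zip(sites, sites[1:])
            if fails_four_gamete(haplotypes, a, b)]
    starts = [0] + cuts
    ends = cuts + [len(haplotypes[0])]
    return list(zip(starts, ends))

# Hypothetical phased haplotypes: sites 1 and 3 show all four gametes,
# while sites 3 and 5 are perfectly associated.
haps = ["CAGGTG", "CAGCTC", "CTGGTG", "CTGCTC"]
blocks = haplotype_blocks(haps)
```

A full analysis would test all pairs of segregating sites rather than adjacent pairs only, which can reveal additional breakpoints.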

Allele Identification

The cDNA sequences were compared to previously sequenced opsins from cichlids in Lakes Malawi, Victoria, and Tanganyika (Table S3). The Nile tilapia (Oreochromis niloticus) was used as a standard outgroup for all gene families. All known visual pigment spectral shifts involve substitutions that change amino acid polarity at sites directed into the retinal binding pocket (Chang et al. 1995; Takahashi and Ebrey 2003; Yokoyama 2008). Cichlid cDNA sequences were translated to amino acids and aligned with bovine rhodopsin to identify the sites corresponding to the retinal binding pocket in the bovine rhodopsin crystal structure (Palczewski et al. 2000). We use the corresponding bovine rhodopsin site numbers for all cichlid opsin amino acid positions. Sites were annotated according to whether they lay in the retinal binding pocket close to 11-cis retinal, in the transmembrane regions, or outside these regions (see Fig. 3 of Carleton et al. 2005b for site locations). Next, amino acid changes were examined to determine whether they involved a change in polarity. Only sites that were both in the retinal binding pocket and involved a change in polarity were considered to have the potential to cause functional differences. Functional alleles were then identified for individuals and species (e.g., the M. zebra allele) and were classified by their fixed residues at these retinal binding pocket sites. In addition, functional differences between alleles were estimated by comparing sequences from species whose spectral sensitivities are known from microspectrophotometry (MSP) or protein expression (Parry et al. 2005; Jordan et al. 2006).
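The two-criterion filter described above (binding pocket site AND polarity change) can be sketched as follows. The polarity grouping is an assumption: the text does not state which scheme was used, and hydrophobicity scales differ for borderline residues such as G, C, Y, and H. The function names and the invented site 300 are hypothetical.

```python
# Assumed polarity grouping (one common textbook scheme, not necessarily the
# one used in this study). POLAR includes the charged side chains.
NONPOLAR = frozenset("AVLIMFWPG")
POLAR = frozenset("STCYNQDEKRH")

def changes_polarity(a: str, b: str) -> bool:
    """True if residues a and b fall in different polarity classes."""
    for residue in (a, b):
        if residue not in NONPOLAR and residue not in POLAR:
            raise ValueError(f"unrecognized residue code: {residue!r}")
    return (a in NONPOLAR) != (b in NONPOLAR)

def candidate_tuning_sites(variants: dict[int, tuple[str, str]],
                           pocket_sites: set[int]) -> list[int]:
    """Sites (bovine rhodopsin numbering) that lie in the retinal binding
    pocket and carry a polarity-changing substitution."""
    return sorted(site for site, (a, b) in variants.items()
                  if site in pocket_sites and changes_polarity(a, b))

# The four SWS1 substitutions reported in the Results, plus an invented
# polarity-neutral substitution at a hypothetical non-pocket site 300.
sws1_variants = {83: ("G", "S"), 114: ("S", "A"), 160: ("T", "A"),
                 204: ("T", "I"), 300: ("V", "I")}
pocket = {83, 114, 160, 204}
print(candidate_tuning_sites(sws1_variants, pocket))
```

Under this grouping all four SWS1 substitutions pass both criteria, matching the Results; a different polarity scale could reclassify some substitutions, so the scheme should be reported alongside any such filter.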

Results

All genomic sequences were deposited in GenBank, and the accession numbers are listed in Table S3. The RH2B gene contains a long second intron (~1.5 kb) that we did not sequence across, so the sequences for this gene were submitted in two segments corresponding to the regions upstream and downstream of this intron (Table S3).

The seven cone opsin genes differed markedly in their overall degree of sequence variation. Three genes showed no likely functional variation, whereas the other four displayed substantial variation, including variable sites likely to contribute to functional differences in visual pigment spectral sensitivity. The following genes could not be assembled because sufficient unambiguous sequence could not be obtained: one SWS2a (M. benetos), two RH2Aβ (L. flavigulus and L. ianthinus), two RH2Aα (L. vellicans and M. benetos), and one LWS (L. vellicans).

Genes with No Functional Allelic Variation

Three genes lacked any amino acid polymorphisms likely to cause functional shifts: SWS2a, RH2B, and RH2Aα (Figs. 1, 2). The SWS2a gene was invariant across the Labidochromis species examined (Fig. 2). In Metriaclima, SWS2a had no variation at retinal binding pocket sites. Therefore, SWS2a had no sequence variation predicted to be functionally significant. RH2Aα had a few variable transmembrane sites in each genus. However, none of these sites were in the retinal binding pocket, and none involved a change in polarity. Therefore, RH2Aα also had no functionally significant variation. The RH2B gene had several variable transmembrane sites, but only one of these has a high likelihood of being functionally significant: both genera are polymorphic at site 44 (M44I), a retinal binding pocket site where the substitution involves a change in amino acid polarity. Labidochromis is also polymorphic at site 124 (A124S). However, although these sites lie in or near the retinal binding pocket, neither is associated with known absorbance shifts in other systems (reviewed by Takahashi and Ebrey 2003). In particular, the A124S polymorphism (~8 Å from retinal) is two residues away from site 122 (~5 Å from retinal), which is known to cause shifts in the spectral sensitivity of rhodopsin pigments (Yokoyama et al. 2008). Because residue 124 lies two positions along the alpha helix from this known tuning site (about 200° of rotation, or a little over half a helical turn), its side chain is unlikely to face the bound retinal. Therefore, RH2B is unlikely to show functional variation.
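The geometric argument about sites 122 and 124 can be checked with idealized alpha-helix parameters (3.6 residues per turn, 100° of rotation per residue). This sketch assumes that residues 122 to 126 lie on a regular helix (in bovine rhodopsin they fall in transmembrane helix III); real transmembrane helices are often kinked or irregular, so this is only a first approximation. The function names are hypothetical.

```python
# Idealized alpha-helix geometry: 3.6 residues per turn, 100 degrees per residue.
RESIDUES_PER_TURN = 3.6
DEGREES_PER_RESIDUE = 100.0

def helical_offset_deg(site_a: int, site_b: int) -> float:
    """Smallest angle between two side chains on an ideal helix, viewed
    down the helix axis (a helical-wheel projection)."""
    raw = (abs(site_b - site_a) * DEGREES_PER_RESIDUE) % 360.0
    return min(raw, 360.0 - raw)

def fraction_of_turn(site_a: int, site_b: int) -> float:
    """Number of helical turns separating two residues along the backbone."""
    return abs(site_b - site_a) / RESIDUES_PER_TURN

for other in (123, 124, 125, 126):
    print(other, helical_offset_deg(122, other), round(fraction_of_turn(122, other), 2))
```

On this model, residues i+3 and i+4 point roughly the same way as residue i (60° and 40° away), whereas residue i+2 sits about 160° around the helix, which is why site 124 is expected to point away from the face that site 122 presents to the chromophore.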

Fig. 1

Summary of opsin polymorphisms in the genus Metriaclima. Colored rows represent genes with suspected functional variation

Fig. 2

Summary of opsin polymorphisms in the genus Labidochromis. Colored rows represent genes with suspected functional variation

We discovered pseudogene alleles for two of the opsin genes. M. fainzilberi and M. mbenji were polymorphic for an SWS2a allele encoding a slightly truncated protein, with a premature stop codon eight residues before the normal stop codon. Expression of the SWS2a gene has not been detected in any Malawi rock-dweller, so this truncated allele is not likely to be functionally significant (Hofmann et al. 2009). All three L. caeruleus individuals were homozygous for an RH2Aα allele in which an insertion disrupts the splice site of the third intron; a stop codon then occurs before the start of the normal fourth exon.

SWS1

In Metriaclima, 14 nonsynonymous changes were found in SWS1 sequences (Fig. 1). Eight of the polymorphisms were present in the transmembrane regions of the protein, with four in retinal binding pocket sites. All four caused a change in polarity. Based on these polymorphisms, two functional alleles were identified. One corresponds to the M. zebra allele, which absorbs maximally (λmax) at 368 nm, while the other has a λmax of 378 nm and was previously described in Pseudotropheus acei (Parry et al. 2005). These two alleles differ at four key retinal binding pocket sites at amino acid positions 83, 114, 160, and 204. The M. zebra allelic class (henceforth referred to as UV368) is defined by a GSTT sequence at these sites, whereas the P. acei allele (henceforth referred to as UV378) is defined by SAAI. Most of the Metriaclima have the UV368 allele. However, M. callainos and M. phaeos, as well as all of the published sequences for SWS1 from Lake Victoria cichlids, possess UV378 (Fig. 3). Although the Labidochromis are polymorphic for seven nonsynonymous changes in the gene, none of these are binding pocket polymorphisms (Fig. 2). Labidochromis are all fixed for UV368 and show none of the functional variation found in Metriaclima.
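Because the two SWS1 allele classes are defined entirely by the residues at four sites, assigning an allele reduces to a lookup. This minimal sketch assumes the input has already been translated and renumbered to bovine rhodopsin positions; the function name is hypothetical.

```python
# Bovine-rhodopsin-numbered sites that distinguish the two SWS1 functional
# alleles, and the residue strings that define each allele (from the text).
SWS1_KEY_SITES = (83, 114, 160, 204)
SWS1_ALLELES = {"GSTT": "UV368", "SAAI": "UV378"}

def classify_sws1(residues: dict[int, str]) -> str:
    """Return the SWS1 allele class for residues keyed by bovine site number.

    Combinations matching neither defining string (for example, a
    recombinant) are reported as unclassified rather than forced into a class.
    """
    key = "".join(residues[site] for site in SWS1_KEY_SITES)
    return SWS1_ALLELES.get(key, f"unclassified ({key})")

print(classify_sws1({83: "G", 114: "S", 160: "T", 204: "T"}))
print(classify_sws1({83: "S", 114: "A", 160: "A", 204: "I"}))
```

Reporting mixed combinations explicitly, instead of assigning them to the nearest class, keeps any recombinant alleles visible in downstream analyses.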

Fig. 3

Gene tree for the SWS1 gene. The red branches represent reference sequences taken from Lake Tanganyika. The dark blue branches represent the UV378 allele, with the light blue clade being the Victorian subset of the UV378 allele. The pink clade represents all of the UV368 alleles described in this study, as well as some reference sequences. This clade appears to be unique to Lake Malawi

SWS2B

SWS2B showed much less variation than SWS1. In Labidochromis, two nonsynonymous polymorphisms were found in the violet-sensitive gene. Both occur in transmembrane regions and involve a change in polarity (Fig. 2). Of these sites, only the A269T polymorphism is in the retinal binding pocket. This polymorphism is known to cause a functional change and is associated with an 11 nm shift in the cottoid fishes of Lake Baikal (Cowing et al. 2002). The T269 allele was found only in L. flavigulus; all of the other Labidochromis were fixed for A269. All of the Metriaclima were fixed for a variant of the A269 allele (Fig. 1).

RH2Aβ

The RH2Aβ gene showed extensive diversity. The Metriaclima had 17 variable sites in this gene (Fig. 1), eight of them in transmembrane regions. However, only two were in the retinal binding pocket, and only one of those involved a change in polarity. Although this gene displays a large amount of amino acid variation, the only polymorphism in a protein region likely to be functionally significant is M183L. Yokoyama et al. (2008) identified residue changes at position 183 as a determinant of λmax in opsins from the RH class. This residue lies in an extramembrane loop that penetrates the retinal binding pocket of the opsin protein and is in close proximity to the bound chromophore (Palczewski et al. 2000). The shift from a sulfur-containing residue (M) to one without sulfur (L) at this site likely has a functional effect. Therefore, RH2Aβ was considered to have two primary functional alleles. However, current MSP data for variation at this site in cichlids are insufficient to determine the magnitude of this λmax shift. While most of the Metriaclima are fixed for the M183 allele, all three M. lombardoi and one M. barlowi carry leucine at site 183.

The Labidochromis had 10 nonsynonymous changes in the RH2Aβ gene (Fig. 2). However, the large amount of polymorphism in this genus does not include any variation in the retinal binding pocket. All of the Labidochromis possess a variant of the M183 allele. Therefore, this gene does not show any functional variation in Labidochromis.

LWS

The Labidochromis had a total of seven nonsynonymous polymorphisms, three of which would result in binding pocket polarity shifts (Fig. 2). Nine polymorphic sites were found in Metriaclima, only one of which causes a polarity shift in the binding pocket (Fig. 1). Both genera had alleles corresponding to the H (A164) and M2 (S164) alleles described by Terai et al. (2006). These alleles differ by an S164A polymorphism, which causes a 7 nm shift in humans (Asenjo et al. 1994). However, expressed proteins of cichlid alleles differing at this site did not show spectral shifts when reconstituted with the A1 chromophore (Terai et al. 2006). Because Malawi cichlids primarily use the A1 chromophore, this site may not be functionally significant in the cichlid LWS background.

Labidochromis caeruleus is polymorphic for Y261F. This substitution is associated with a 10 nm shift in humans (Asenjo et al. 1994), but no MSP data for this substitution are available for cichlids or for fishes generally. The F261 variant was found only in H-type alleles, whereas the Y261 variant is present in both the H and M2 allelic classes.

Haplotype Blocks Around Opsin Genes

The pattern of association among amino acid polymorphisms varied both between genes and between genera. None of the genes showed clear haplotypes of associated polymorphisms in Labidochromis. The largest continuous differentiated block within Labidochromis is in the RH2Aβ locus; it extends for roughly 300 bases and includes amino acid positions 56–162 (Fig. 4). Otherwise, the high proportion of heterozygosity and singleton polymorphisms in the genus disrupted any obvious physical structure.

Fig. 4

Table of all variable amino acid residues in the RH2Aβ gene in the genus Labidochromis. Pink and light blue portions represent haplotype blocks in the amino acid sequence. Shaded “T”s denote polymorphic sites in the transmembrane regions, while yellow cells represent sites with polymorphisms resulting in polarity shifts

The Metriaclima, however, displayed haplotypes in both the SWS1 gene and the RH2 tandem array. In the SWS1 locus, the extent of variation was clearly correlated with the functional allele type. The UV378 allele appears highly conserved in the small number of individuals in which it was found, with no amino acid polymorphisms after residue 21 of the protein. This continuous block extends for the remainder of the gene, including the introns, and corresponds to ~1 kb (Fig. 5). The UV368 alleles were variable at several sites throughout the gene. Of the 14 polymorphic sites in SWS1, the UV368 alleles shared only one residue (I165) in addition to the four functional sites described above (G83, S114, T160, and T204). Perhaps most importantly, no continuous block spans these four sites. Sites 103 and 201 are polymorphic because of recombination, so the longest continuous block in the UV368 allele extends for ~380 bp when the intron is included (Fig. 5). This haplotype structure corresponds well with the large genetic distances observed among the UV368 alleles (Fig. 3). However, the long haplotype structure of the UV378 alleles may simply reflect the small number of individuals (n = 6) in which this allele was observed.

Fig. 5

Table of all variable amino acid residues in the SWS1 gene in the genus Metriaclima. Pink and light blue portions represent invariable blocks in the amino acid sequence. Bright green residue numbers represent sites that are present in the retinal binding pocket and are suspected to be functional. Shaded “T”s denote polymorphic sites in the transmembrane regions, while yellow cells represent sites with polymorphisms resulting in polarity shifts

The RH2 genes display a potential large-scale haplotype structure that extends over a much larger portion of the genome than that observed in SWS1. The three RH2 genes form a 26 kb tandem array (RH2B-RH2Aα-RH2Aβ; Carleton et al. in prep.). Corresponding blocks at the margins of these RH2 genes suggest that haplotype blocks may extend through the intergenic spaces in this array, although these regions were not sequenced independently. There are two different haplotype blocks that center around the RH2Aα gene. One is based on the RH2Aα locus of M. barlowi, which is correlated with a conserved block of 500 bp in the RH2B gene. These genes are ~15 kb apart, suggesting that our conserved sequences may represent the edges of a very large haplotype block. Similarly, the RH2Aα locus found in two M. lombardoi is a continuous haplotype block that is correlated with the first half of the otherwise highly variable RH2Aβ gene. The genomic distance between these genes is ~10 kb (Fig. 6). This linkage breaks down approximately 300 bp into the RH2Aβ coding sequence, before the L183M polymorphic site. These possible haplotype blocks are visually apparent in the sequence, and they were confirmed computationally using DnaSP.

Fig. 6

Table of all variable amino acid residues in the RH2 genic tandem array in the genus Metriaclima. Pink and light blue portions represent invariable blocks in the amino acid sequence. Bright green residue numbers represent sites that are present in the retinal binding pocket that are suspected to be functional, while dark green residues are present in the binding pocket but probably have no functional effect. Shaded “T”s denote polymorphic sites in the transmembrane regions, while yellow cells represent sites with polymorphisms resulting in polarity shifts. The physical distance between the RH2B and RH2Aα genes is ~15 kb, while the distance between the RH2Aα and RH2Aβ genes is ~10 kb. The orientation of the RH2Aα gene is reversed in the genome when compared to the other two, and so its sites are listed in reverse order

Gene Trees and Genetic Distance Between Alleles

Gene trees were constructed using the sequences described here as well as several reference sequences from cichlids of Lakes Victoria and Tanganyika and the riverine cichlid O. niloticus. The topology of the SWS1 gene tree indicates that the split between the UV378 and UV368 alleles is old, probably dating to very early in the Malawi radiation. The total branch length within the UV368 clade is roughly threefold greater than that within the UV378 clade, even when the Victorian UV378 sequences are included (Fig. 3). Assuming a constant rate of molecular evolution (a linear molecular clock), we estimated the approximate divergence time of these two alleles. This estimate scales the genetic distance between the two alleles by their distance to the Nile tilapia SWS1 sequence, using previously estimated divergence times between the Malawi cichlids and tilapia of 20 MY (Genner et al. 2007) or 45 MY (Genner et al. 2007; Azuma et al. 2008). These calibrations place the divergence of the two alleles between 2.3 and 5.3 MY ago. This is close to the age of the lake and dates back to the basal radiation of the Malawi cichlids (Genner et al. 2007). Rapid diversification of SWS1 alleles could bias these linear estimates, but the inferred timing of the allelic split is consistent with the early radiation of the Malawi/Victoria cichlid clade.
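The clock calculation amounts to a proportion: T_alleles = T_calibration × d_alleles / d_outgroup. The distances themselves are not reported in the text; the reported 2.3 to 5.3 MY range corresponds to a distance ratio of roughly 0.12 (2.3/20 ≈ 0.115; 5.3/45 ≈ 0.118). The sketch below uses hypothetical distances for illustration only, and the function name is likewise hypothetical.

```python
def clock_divergence_time(d_alleles: float, d_outgroup: float,
                          t_outgroup_my: float) -> float:
    """Strict (linear) molecular clock: divergence time scales with distance.

    d_alleles:     genetic distance between the two ingroup alleles
    d_outgroup:    genetic distance from the ingroup to the outgroup (tilapia)
    t_outgroup_my: assumed ingroup-outgroup divergence time, in millions of years
    Pairwise distances accumulate along both lineages, so that factor of two
    cancels in the ratio.
    """
    if d_outgroup <= 0:
        raise ValueError("outgroup distance must be positive")
    return t_outgroup_my * d_alleles / d_outgroup

# Hypothetical distances (not values from this study).
d_uv368_uv378, d_to_tilapia = 0.012, 0.10
estimates = {cal: clock_divergence_time(d_uv368_uv378, d_to_tilapia, cal)
             for cal in (20.0, 45.0)}
print(estimates)
```

Because the estimate is linear in the calibration, the uncertainty in the Malawi/tilapia split (20 versus 45 MY) propagates directly into a more than twofold range for the allelic split.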

The tree for RH2Aβ shows marked variation in divergence between alleles. Some branches are much longer than others, such as the one leading to the L183 allele, which was found in only two species. This allele is clearly derived within the Malawi flock (Fig. 7). The long branches within the L183 clade might suggest that this allele is evolving rapidly. Notably, the distance between the M183 alleles of Victoria and Malawi is smaller than that between the terminal L183 allele of M. lombardoi and the M183 clade (Fig. 7). This suggests either rapid diversification or pseudogenization of the L183 allele of RH2Aβ.

Fig. 7

Gene tree for the RH2Aβ gene. The red branches represent reference sequences taken from Lake Tanganyika. The dark blue branches are Malawian versions of the M183 allele, with the light blue clade being the Victorian subset of this allele. The pink clade represents the L183 allele

The LWS gene shows a complex, intertwined relationship among sequences from Lakes Victoria and Tanganyika, but appears to be monophyletic within Malawi (Fig. 8). The Malawi M2 allelic clade is nested within the monophyletic Malawi H grouping. Because these two allele classes differ at site 164 (S164A), this topology suggests that the S164A polymorphism arose independently in the M2 alleles of Malawi and of Victoria. In addition, the F261 alleles are not separated from the general H clade by any appreciable distance, indicating either a lack of selection on this allele or a very recent origin.

Fig. 8

Gene tree for the LWS gene. The red branches represent reference sequences taken from Lake Tanganyika. The light blue branches are H-type alleles from Victoria, whereas the yellow branches represent L-type alleles (Terai et al. 2006). The dark blue clade contains the H-type alleles from Lake Malawi, with the green clade containing the M2-type alleles. The two pink branches are the two individuals of Labidochromis caeruleus that possess the functional F261 polymorphism

Discussion

Cichlid opsin genes have been examined at the population level for Lake Victoria cichlids in previous studies (Terai et al. 2006; Seehausen et al. 2008). A few studies have examined single individuals within species from Lake Malawi. However, this is the first study to examine multiple individuals from multiple species within two important Lake Malawi cichlid genera, Metriaclima and Labidochromis. As such, it provides new insight into the degree of both inter- and intraspecific opsin sequence variation in these fishes. Such variation could be important for cichlid mate choice and cichlid speciation.

Opsin sequence polymorphisms with known or potential functional effects were found in four of the seven genes studied. Of these four genes, two were found to have functional variation in Metriaclima (SWS1 and RH2Aβ) and two in Labidochromis (SWS2B and LWS; examples illustrated in Fig. 9). The variable sites in genes SWS1, SWS2B, and LWS are all associated with absorbance shifts of 7 nm or more. These data suggest that the potential for spectral tuning exists in both genera studied, and that the region of spectral tuning varies between these genera. Based on previous analyses in cichlid opsins for species from Lakes Malawi and Victoria, we hypothesized that variation would predominate in the SWS1 and LWS genes (Hofmann et al. 2009). While the presence of variation in the SWS1 and LWS genes was in accordance with this initial hypothesis, the distribution of this variation between genera and the variable sites in SWS2B and RH2Aβ were unexpected.

Fig. 9

Panel of species containing functional variants when compared to the reference alleles from Metriaclima zebra (a). b M. callainos—UV378 variant of SWS1. c M. lombardoi—L183 variant of the RH2Aβ gene. d Labidochromis flavigulus—T269 variant of SWS2B. e L. caeruleus—F261 variant of LWS. Photos courtesy of Ad Konings, Cichlid Press

The SWS1 gene encodes the shortest-wavelength-sensitive opsin in the cichlid visual palette. Spectral tuning in the UV portion of the spectrum cannot be accomplished by changes in gene expression; it must occur through sequence variation in SWS1. The genetic distance between the two functional alleles is fairly large. This may indicate rapid diversification within the UV368 allele, an initial divergence between UV368 and UV378 around the time the lake formed, or a combination of these two scenarios (Fig. 3). More specifically, UV378 appears to be fairly conserved, as the branch lengths among the extant alleles in both Malawi and Victoria are short. The UV368 allele, which has only been described in Lake Malawi, has relatively long branches leading to and within its clade, indicating rapid diversification within the lake. Our data suggest that the UV368 allele in particular is undergoing accelerated evolution. This is supported by recent SNP analyses showing that the SWS1 gene has abnormally high Fst among M. zebra populations (Loh et al. 2008). Future work should test for evidence of a selective sweep, as has been seen in the LWS gene in Lake Victoria cichlids (Terai et al. 2006). The fact that both Malawi and Victoria taxa fall within the UV378 clade suggests that this allele was present in the riverine ancestor of both radiations. This would suggest that the UV368 allele arose and rapidly diversified within the clear waters of the newly formed Lake Malawi (Fig. 3). However, there are no obvious differences in the color patterning or ecology of the observed Metriaclima species that we would intuitively associate with divergent selection for these two SWS1 alleles.

Sequence variation in the LWS gene is associated with speciation in Lake Victoria (Terai et al. 2002; Terai et al. 2006; Seehausen et al. 2008). Variation in this gene would also be anticipated in the sand-dwelling cichlids of Lake Malawi, given that many of these species are known to express LWS actively (Carleton et al. 2005a; Hofmann et al. 2009). However, the potentially functional Y261F polymorphism is absent from all of the sand-dwellers examined to date. The Metriaclima possess a subset of the alleles known from Lake Victoria, but the polymorphism present in this genus probably causes no functional change without the A2 chromophore used by Lake Victoria cichlids (Terai et al. 2006). The Labidochromis, however, possess the additional Y261F variable site, which could cause a shift in λmax similar to, or possibly larger than, that observed between the Victorian L and H alleles (Terai et al. 2006). This variable site may indicate spectral tuning of the long-wavelength pigment in this genus.

The Metriaclima RH2Aβ gene presents an unusual situation. It is the most polymorphic of all the genes described here, with 17 nonsynonymous changes throughout the protein. Although the L183 allele was found in only four fishes (all three M. lombardoi and one of three M. barlowi) and appears to be unique to Lake Malawi, the genetic distance within this clade of four sequences is threefold greater than the distance from a Tanganyikan species (Tropheus duboisi) to a Malawi species (Tramitichromis intermedius; Fig. 7). These data, along with the haplotype structure observed in the M. lombardoi individuals, suggest one of three alternative scenarios: (i) rapid functional diversification between the M183 and L183 alleles, (ii) degradation and potential undescribed pseudogenization of the L183 allele, or (iii) recombination of the M183 allele with a separate gene of unknown origin. Future studies quantifying the relative expression of the RH2Aβ and RH2Aα genes may help distinguish between the first two scenarios. The third scenario, however, would require identifying a plausible source of the recombinant sequence.

Intraspecific variation was observed for the LWS gene, much as it was for RH2Aβ. For LWS, our L. caeruleus samples included one individual homozygous for the Y261 allele, one homozygous for the F261 allele, and one heterozygote. Because only three individuals were sampled per species and multiple functional alleles were nonetheless observed, these polymorphisms likely represent alleles of relatively high frequency within their respective species. This has important implications for speciation, as visual diversification within a species could readily lead to ecological or sexual isolation. The variant LWS alleles in L. caeruleus illustrate this point. If the Y261F polymorphism causes a shift in spectral sensitivity similar to the one it causes in humans (about 10 nm), it would represent a marked change in long-wavelength sensitivity in this species. Across the lake, different subpopulations of L. caeruleus have solid yellow males, pearly white males, or males with vertical black bars (Fig. 10). Although speculative, it is possible that females in different populations carry different LWS alleles and mate assortatively as a result. Our data therefore suggest a scenario compatible with divergence via sensory drive (Terai et al. 2006; Kawata et al. 2007; Seehausen et al. 2008). This hypothesis both warrants and requires further investigation.
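The inference that alleles recovered from only three fish are probably common can be made explicit with a simple binomial sampling calculation. The Python sketch below is illustrative only and is not an analysis performed in this study: it assumes unrelated individuals sampled at random from a randomly mating population, so that each of the six sampled gene copies independently carries a given allele with probability p. The function names are hypothetical.

```python
"""Probability of detecting an allele in a small diploid sample.

Illustrative sketch only: assumes random sampling of unrelated
individuals from a randomly mating population, so that each of the
2n sampled gene copies independently carries the allele with
probability p. These assumptions are not tested by the data.
"""


def detection_probability(p: float, n_individuals: int) -> float:
    """Chance that at least one of 2n gene copies carries an allele at frequency p."""
    if not 0.0 <= p <= 1.0:
        raise ValueError("allele frequency must lie in [0, 1]")
    n_copies = 2 * n_individuals  # diploid: two gene copies per fish
    return 1.0 - (1.0 - p) ** n_copies


def min_frequency_for(target: float, n_individuals: int) -> float:
    """Smallest allele frequency that is detected with probability >= target."""
    if not 0.0 <= target < 1.0:
        raise ValueError("target probability must lie in [0, 1)")
    n_copies = 2 * n_individuals
    # Invert 1 - (1 - p)^k = target for p.
    return 1.0 - (1.0 - target) ** (1.0 / n_copies)


if __name__ == "__main__":
    n = 3  # three individuals sampled per species
    for p in (0.01, 0.05, 0.10, 0.25, 0.50):
        print(f"p = {p:.2f}: P(detect) = {detection_probability(p, n):.3f}")
    print(f"frequency needed for a 50% chance: {min_frequency_for(0.5, n):.3f}")
```

Under these assumptions, an allele at 5% frequency would appear in a three-fish sample only about a quarter of the time, whereas an allele at 50% frequency would almost always appear. Rare alleles can still be sampled by chance, so the calculation supports, but does not prove, the interpretation that the observed alleles are common.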

Fig. 10

Variation in male nuptial coloration of Labidochromis caeruleus at several locations across the lake: a Lion’s Cove, b Lundo Island, c Mbowe, and d Ruarwe. Photos courtesy of Ad Konings, Cichlid Press.

Several pseudogenes were found among these sequences. These are the first pseudogenes we have detected in Malawi cichlid opsins, although pseudogenes have been reported in Tanganyikan cichlid opsins (Spady et al. 2005). The pseudogenization of RH2Aα in L. caeruleus is particularly important. It is unclear whether the duplicate genes RH2Aβ, RH2Aα, or both are actively expressed in the cichlid retina. Because both genes produce pigments with similar λmax, the two copies may be functionally redundant. If so, the loss of a functional RH2Aα in L. caeruleus leaves RH2Aβ as the only source of an RH2A pigment, and we would therefore predict that RH2Aβ is expressed in this species. In other species, we have predicted that RH2Aα is the predominant RH2A gene expressed, based on comparisons of visual pigment λmax measured by MSP with that of expressed proteins. Because the two genes are so similar in sequence, it is difficult to assay their expression separately. Further studies that separately quantify RH2Aα and RH2Aβ expression are needed and might confirm that RH2Aβ expression compensates for the pseudogenization of RH2Aα in L. caeruleus.

A final topic that deserves mention is the physical structure of the polymorphisms and their organization into haplotype blocks throughout the genome. In the Labidochromis, there is no discernible haplotype pattern in LWS. Only two associated variable residues were found in the entire SWS2B locus, with the rest of the sequence fixed across the two major alleles. This large haplotype block common to both alleles strongly suggests that the T269 allele was recently derived from the A269 allele. In contrast, the Metriaclima have large, distinct haplotype blocks in or around their variable genes (Figs. 5, 6). In SWS1, the UV378 allele is fixed across roughly two-thirds of the protein sequence, whereas the UV368 allele is much more variable. In M. lombardoi, two corresponding haplotype blocks are found at the margins of both RH2Aβ and RH2Aα. These blocks flank a 10-kb intergenic region, potentially indicating the presence of a single larger block. However, this block may instead represent a haplotype centered on the RH2Aα locus, because M. barlowi has two corresponding blocks at the margins of RH2Aα and RH2B, which are separated by roughly 15 kb. If these blocks extend through the intergenic regions, they are two to three times larger than those described for the Victorian LWS genes (Terai et al. 2006), suggesting very strong or very recent selection on the RH2Aα locus. Extensive population sampling and sequencing of the intergenic regions will be required to determine whether the reduced variation in this region truly reflects a selective sweep.

In sum, the work presented here demonstrates that the sequence polymorphisms needed to tune spectral sensitivity are present not only within the Malawian flock as a whole, but also within genera and even within species. Both the Metriaclima and the Labidochromis possess two polymorphic genes that could tune spectral sensitivity between congeners. Thus, the potential for opsin sequence changes to shift spectral sensitivity was found in both genera, but in each case functional variation within a given opsin gene was confined to a single genus. This suggests that selection may act on different regions of the spectrum in these two genera. Although variation in gene expression clearly plays a key role in visual sensitivity within Lake Malawi, opsin sequence polymorphisms may allow spectral tuning in situations where gene expression is relatively constrained. As such, opsin sequence variation may contribute to sensory drive and subsequent species divergence in Lake Malawi, much as it does in Lake Victoria.