Introduction

Opsins comprise two protein families, called type I and type II opsins, that share detailed functional similarities but whose homology is often doubted. Both opsin classes are seven-transmembrane (7-TM) proteins that bind a light-reactive chromophore to mediate a diversity of responses to light (Spudich et al. 2000). In both families, the chromophore (retinal) binds to the seventh TM domain via a Schiff base linkage to a lysine residue. Type I opsins are found in both prokaryotic and eukaryotic microbes, where they function as light-driven proton pumps and sensory receptors and serve various other, still unknown, functions. Type II opsins are known only in eumetazoan animals and are involved in the regeneration of converted chromophores and in the photosensitive elements of visual perception and circadian rhythms (Plachetzki et al. 2007). We test a hypothesis on the origins of type II (animal) opsins for two reasons. First, if type I and type II opsins are homologous, they should share the same origination event. As such, our test of type II origins has implications for homology or nonhomology of the opsin types. Second, although an origin by internal domain duplication is supported for some type I opsins, no one has investigated a similar hypothesis for type II opsins.

Internal Domain Duplication and the Origin of Seven-Transmembrane Proteins

Numerous prokaryotic TM proteins probably originated by duplication of their TM domains, evidenced by nonrandom sequence similarity between domains within proteins. Although numerous duplication patterns have been documented in proteins with various numbers of TM domains, most germane to this study are 7-TM proteins. Multiple 7-TM proteins, including type I opsins, have been identified that exhibit a specific duplication pattern in which the first three TM segments (TMSs) are significantly similar to the last three (Shimizu et al. 2004), indicating the following internal duplication at the origin of the protein:

$$ \underline{\text{1-2-3}}\text{-4} \;\to\; \underline{\text{1-2-3}}\text{-4-}\underline{\text{5-6-7}} $$

The above pattern (internal duplication of three TM domains, with the fourth domain retained between the two copies) was first proposed for the origin of type I opsins (Taylor and Agarwal 1993). This proposal led to some controversy in the literature. For example, some researchers maintained that type I bacteriorhodopsins did not show strong evidence for a single intragenic duplication event (Kuan and Saier 1994; Soppa 1994). More recently, Ihara et al. (1999) provided clear statistical evidence that a type I opsin found in archaea originated from a duplication event. Furthermore, Zhai et al. (2001) showed that the lysosomal cystine transporter (LCT), a 7-TM protein found in humans, not only is homologous to type I opsins but also evolved from an intragenic duplication event. Given this more recent work on type I opsins, the consensus appears to be converging on an intragenic duplication event giving rise to the family.

In contrast to these studies on the origins of type I opsins and other 7-TM proteins, almost no study has investigated a similar hypothesis for the origins of type II opsins. The one exception is Taylor and Agarwal (1993), who, based on the knowledge of opsin phylogeny at the time, assumed that type I and type II opsins were homologous and therefore indicated that the type I duplication pattern should be visible in type II opsins as well. Since that proposal, no one has explicitly tested the hypothesis. Here, we test the hypothesis of internal domain duplication at the origin of type II opsins. If the pattern of similarity between the first three and last three TM domains is found in type II opsins, as it is for type I opsins, then this result would be consistent with homology of opsins. Type II opsins are homologous with many G-protein coupled receptors (GPCRs), which implies that type II opsins share their origins with those GPCRs. As such, homology of type I and type II opsins would imply homology of type I opsins and GPCRs, logic that has been used to argue that type I opsins are a good model for GPCR function. Conversely, since an internal domain duplication event is suspected to have given rise to type I opsins, failure to detect it in type II opsins could provide further evidence for convergent evolution of opsins. Either scenario presents interesting implications for the evolution of opsins and, more generally, GPCRs.

Here, we test for a duplicative origin of type II opsins using a global alignment scheme tailored specifically for intragenic duplication and using a large set of type II opsins. Although our method supports internal duplication in other 7-TM proteins, including type I opsins, we find no support for the same pattern of similarity in type II opsins. These results provide another line of argument for the convergent evolutionary origins of prokaryotic and animal opsin genes.

Methods

Transmembrane Segment Prediction

The protein sequences included in our data set were downloaded from GenBank (Benson et al. 2005) using BioPerl (Stajich et al. 2002). The exact set of GenBank accession numbers and the source code used in our experiments are both available for download from http://www.cs.ucsb.edu/~nlarusso/research/opsins. TMSs of each protein were extracted with ConPred II (Arai et al. 2004), a TM topology prediction tool that uses a majority-voting approach combining nine other methods: KKD (Klein et al. 1985), TMPred (Hofmann and Stoffel 1993), TopPred II (Claros and Heijne 1994), DAS (Cserzo et al. 1997), TMAP (Persson and Argos 1997), MEMSAT 1.8 (Jones et al. 1994), SOSUI (Hirokawa et al. 1998), TMHMM 2.0 (Krogh et al. 2001), and HMMTOP 2.0 (Tusnady and Simon 1998). According to its authors, ConPred II has the highest accuracy (69.9) for predicting both the number and the location of the TMSs. The sequences were first evaluated by DetecSig (Lao and Shimizu 2001) to remove any signal peptides prior to extracting TMSs, because signal peptides are also highly hydrophobic and have been shown to affect the quality of TM topology predictions (Lao et al. 2002). From our initial set of proteins, we discarded any that ConPred II did not predict to have exactly seven TMSs. The resulting data set consisted of 801 type II opsins from a diverse set of metazoans.
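As an illustration of the seven-TMS filter and of how a predicted topology can be turned into the TMS and loop regions used later, the following is a minimal Python sketch. The function name `split_regions` and the input format (0-based, half-open `(start, end)` coordinates for each predicted TMS) are assumptions made for this example; they are not the output format of ConPred II or the code distributed with the study.

```python
def split_regions(sequence, tms_bounds):
    """Split a protein into TMS and loop regions (hypothetical helper).

    tms_bounds is assumed to be a list of 0-based, half-open (start, end)
    pairs, one per predicted TMS, sorted along the sequence.
    Returns None unless exactly seven TMSs were predicted; otherwise returns
    13 strings: TMS-1, loop, TMS-2, ..., loop, TMS-7. The N- and C-terminal
    tails are left out.
    """
    if len(tms_bounds) != 7:
        return None  # mirrors discarding proteins without exactly seven TMSs
    regions = []
    for k, (start, end) in enumerate(tms_bounds):
        regions.append(sequence[start:end])  # the TMS itself
        if k < 6:
            next_start = tms_bounds[k + 1][0]
            regions.append(sequence[end:next_start])  # loop to the next TMS
    return regions


# Toy sequence: 2-residue tails, seven 4-residue "TMSs", 2-residue loops.
toy_seq = "MM" + "GG".join(["LLLL"] * 7) + "KK"
toy_bounds = [(2 + 6 * k, 6 + 6 * k) for k in range(7)]
toy_regions = split_regions(toy_seq, toy_bounds)
```

With the toy input, `toy_regions` holds seven `"LLLL"` strings interleaved with six `"GG"` loops, and the terminal tails are dropped.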

To validate our test for domain duplication, we created a control set of several 7-TM proteins that have been shown previously to exhibit the intragenic duplication pattern of interest. Several 7-TM proteins from Zhai et al. (2001), Ihara et al. (1999), and Shimizu et al. (2004) were added to our dataset and subjected to the same experimental process as all 801 type II opsins. We were not able to test some of the proteins mentioned in the previous works, as DetecSig determined that the first TM segment was actually a signal peptide, so the proteins’ predicted topologies had only six TMSs. The accession numbers of the control group are BAA76589, BAC53250, CAA21219, CAD14038, O93740, P02945, P25619, and Q53496.

Sequence Alignment

For each protein, we ran three sets of experiments to check for intragenic similarity. The first set compared partial sequences (PSs) containing three TMSs along with the sequences of hydrophilic loops connecting them. This set consists of only two alignments: 1-2-3 vs. 5-6-7 and 1-2-3 vs. 4-5-6.

Additionally, we compared PSs containing pairs of TMSs, again including the hydrophilic loops connecting them. These experiments were limited to comparisons between TMSs with the same helix orientation within the cell membrane, such that the sequences in both TMSs were ordered from inside to outside the cell, or vice versa. This set resulted in a total of six alignments: 1-2 vs. 3-4, 1-2 vs. 5-6, 2-3 vs. 4-5, 2-3 vs. 6-7, 3-4 vs. 5-6, and 4-5 vs. 6-7. Finally, we compared each individual TMS with every other TMS in the protein for a total of \(\binom{7}{2} = 7!/(2!\,(7-2)!) = 21\) alignments. These three tests were chosen because they cover all of the TM protein topology evolution schemes mentioned by Shimizu et al. (2004) and Taylor and Agarwal (1993).
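The three comparison sets can be enumerated programmatically. The sketch below is illustrative only: the function name `enumerate_comparisons` is hypothetical, and the orientation rule is encoded under the assumption that consecutive helices alternate orientation, so two adjacent-TMS pairs match when their starting indices differ by an even number.

```python
from itertools import combinations


def enumerate_comparisons(n_tms=7):
    """Return (triples, pairs, singles) as lists of (left, right) index tuples."""
    # Set 1: the two three-TMS comparisons named in the text (hard-coded).
    triples = [((1, 2, 3), (5, 6, 7)), ((1, 2, 3), (4, 5, 6))]
    # Set 2: adjacent TMS pairs (i, i+1) vs (j, j+1) that do not overlap and
    # whose first helices share an orientation (j - i even and at least 2).
    pairs = [((i, i + 1), (j, j + 1))
             for i in range(1, n_tms)
             for j in range(i + 2, n_tms, 2)]
    # Set 3: every unordered pair of single TMSs.
    singles = [((i,), (j,)) for i, j in combinations(range(1, n_tms + 1), 2)]
    return triples, pairs, singles


triples, pairs, singles = enumerate_comparisons()
total = len(triples) + len(pairs) + len(singles)
```

For seven TMSs this yields 2 + 6 + 21 comparisons, matching the 29 comparisons reported in the Results.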

Sequence Alignment Algorithm

PSs were aligned by regions, such that each TMS was restricted to align only to another TMS, using a customized version of the Needleman–Wunsch (1970) algorithm for global sequence alignment. Consequently, each hydrophilic loop was also aligned only to another loop. For example, when aligning 1-2 vs. 5-6, TMS-1 could only align with TMS-5, TMS-2 could only align with TMS-6, and the hydrophilic regions could only align with each other. This alignment restriction guaranteed that TMSs and hydrophilic loops did not overlap, ensuring that the resulting alignment score measured similarities between corresponding regions only. The N- and C-termini were discarded for these alignments because they are not well conserved. The additional parameters used were as follows: gap start penalty of −11, gap extend penalty of −1, and the BLOSUM45 similarity matrix.
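One way to realize a region-restricted global alignment is to align each pair of corresponding regions separately and sum the scores. The following Python sketch takes that approach with an affine-gap global alignment; it is not the customized implementation used in the study. Two assumptions are made: a gap of length k costs −11 − (k − 1), and a toy match/mismatch function (`toy_score`, hypothetical) stands in for BLOSUM45 so that the example stays self-contained.

```python
NEG_INF = float("-inf")


def toy_score(x, y):
    """Placeholder substitution score (NOT BLOSUM45): +5 match, -2 mismatch."""
    return 5 if x == y else -2


def global_score(a, b, score=toy_score, gap_open=-11, gap_extend=-1):
    """Best global alignment score of a and b with affine gap penalties."""
    n, m = len(a), len(b)
    # M: a[i-1] aligned to b[j-1]; X: a[i-1] against a gap; Y: b[j-1] against a gap.
    M = [[NEG_INF] * (m + 1) for _ in range(n + 1)]
    X = [[NEG_INF] * (m + 1) for _ in range(n + 1)]
    Y = [[NEG_INF] * (m + 1) for _ in range(n + 1)]
    M[0][0] = 0
    for i in range(1, n + 1):
        X[i][0] = gap_open + (i - 1) * gap_extend
    for j in range(1, m + 1):
        Y[0][j] = gap_open + (j - 1) * gap_extend
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            s = score(a[i - 1], b[j - 1])
            M[i][j] = max(M[i - 1][j - 1], X[i - 1][j - 1], Y[i - 1][j - 1]) + s
            X[i][j] = max(M[i - 1][j] + gap_open,
                          X[i - 1][j] + gap_extend,
                          Y[i - 1][j] + gap_open)
            Y[i][j] = max(M[i][j - 1] + gap_open,
                          Y[i][j - 1] + gap_extend,
                          X[i][j - 1] + gap_open)
    return max(M[n][m], X[n][m], Y[n][m])


def restricted_score(regions_a, regions_b, **kwargs):
    """Sum of per-region scores, so TMS aligns only to TMS and loop to loop."""
    if len(regions_a) != len(regions_b):
        raise ValueError("both partial sequences need the same number of regions")
    return sum(global_score(x, y, **kwargs) for x, y in zip(regions_a, regions_b))
```

For a comparison such as 1-2 vs. 5-6, `regions_a` would be `[TMS-1, loop, TMS-2]` and `regions_b` would be `[TMS-5, loop, TMS-6]`. Because every region is aligned end to end on its own, no gap can span a TMS/loop boundary.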

Significance of Sequence Similarity

The significance of the sequence similarity was measured by generating null distributions from randomly shuffled sequence regions. Each protein was broken up into 13 segments, one for each TMS and hydrophilic looping region. The amino acids within each region were then shuffled among themselves, ensuring the resulting random sequence preserved the characteristics of a 7-TM protein (i.e., segments of hydrophobic amino acids corresponding to the TMSs). One hundred random sequences were generated for each protein using this method.
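The shuffling procedure can be sketched in a few lines of Python. The function names and the fixed random seed are choices made for this example, and the input is assumed to be the list of 13 region strings (seven TMSs and six loops) for one protein.

```python
import random


def shuffle_within_segments(segments, rng):
    """Shuffle the residues inside each segment, keeping segment order."""
    shuffled = []
    for seg in segments:
        residues = list(seg)
        rng.shuffle(residues)  # in-place permutation of this segment only
        shuffled.append("".join(residues))
    return shuffled


def null_set(segments, n=100, seed=0):
    """Generate n randomized versions of one protein's segment list."""
    rng = random.Random(seed)  # seeded so the example is reproducible
    return [shuffle_within_segments(segments, rng) for _ in range(n)]


example_segments = ["LLVAIF", "KRD", "IIFGLV"]
example_null = null_set(example_segments, n=100)
```

Each randomized sequence keeps the length and amino acid composition of every segment, so hydrophobic residues stay inside the TMSs.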

For each comparison, a partial sequence of a given protein in the dataset was first aligned to its corresponding intraprotein partial sequence (1-2-3 against 5-6-7, etc.), then against the set of randomly generated partial sequences. Because the resulting score distributions do not appear to be normal, we determined the P-value of each alignment empirically, by ranking the score from the actual protein alignment among the scores from the random set. The P-value therefore describes the rank of the observed intragenic similarity score for a given comparison relative to the distribution of scores from randomized sequences.
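A rank-based P-value of this kind can be computed as the fraction of null scores that are at least as high as the observed score. The sketch below uses that definition as an assumption; the text does not state the exact formula or how ties were handled.

```python
def empirical_p_value(observed, null_scores):
    """Fraction of null scores that are >= the observed score."""
    if not null_scores:
        raise ValueError("need at least one null score")
    at_least_as_high = sum(1 for s in null_scores if s >= observed)
    return at_least_as_high / len(null_scores)


p_example = empirical_p_value(10, [1, 2, 3, 20])
```

With 100 randomized sequences per protein, the smallest nonzero P-value this definition can produce is 0.01.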

Results

We ran our sequence alignment on 801 type II opsins, representing a broad range of metazoans. For each alignment, the average and standard deviation of the P-value were computed across all proteins. The results for the control group, composed of type I opsins, are shown in Fig. 1. Additionally, the results for all 29 comparisons of type II opsins are shown in Fig. 2.

Fig. 1 The P-value box plot for the control dataset containing type I opsins that were previously found to exhibit intragenic duplication. The middle point of each comparison (triangle) shows the mean P-value; the remaining markers represent the 75th and 25th percentiles and the minimum and maximum P-values

Fig. 2 The P-value box plot over the entire type II opsin dataset. The middle point of each comparison (triangle) shows the mean P-value; the remaining markers represent the 75th and 25th percentiles and the minimum and maximum P-values

We did find some proteins with a consistently low P-value for several comparisons, but these were a low proportion of our dataset (only 9% had a P-value <0.05).

For type II opsins, the 1-2-3 → 5-6-7 duplication pattern, originally proposed by Taylor and Agarwal (1993), had an average P-value of 0.38, one of the lowest averages over the entire dataset, yet considerably higher than the standard 0.05 significance level. This weak signal was also observable in the related subcomparisons between TMSs 2-3 vs. 6-7 and TMSs 1-2 vs. 5-6. The 1-2-3 vs. 4-5-6 comparison had the highest average P-value, at 0.72.

Similarity and identity comparisons also yielded little evidence for an internal duplication event at the origin of type II opsins. In general, all the comparisons’ percentage similarity and identity were statistically indistinguishable from those of the random alignments. In a previous study of TM duplication by Shimizu et al. (2004), conclusions of intragenic duplication were based on at least 25% identity for the aligned TMSs. In our analysis of type II opsins, we found very few comparisons with similarly high averages. For those comparisons that did show high percentage identities, we saw similar patterns in the randomized sets as well, and thus most values were not significant in comparison.

In further attempts to identify any duplication patterns (in addition to the 1-2-3 → 5-6-7 pattern previously mentioned), we examined how the different partial sequence comparisons ranked relative to each other on an individual protein basis. For each opsin analyzed, we ordered the scores of each PS comparison. For example, the protein AHH68096 (medium wave cone opsin, Danio rerio) had the ordering of its PS comparisons reported in Table 1.

Table 1 Example ranking of the partial sequence (PS) comparisons for protein AHH68096

We computed the average position of each PS comparison across proteins to ascertain whether a particular pattern is favored among all type II opsins. This analysis differs from the previous comparison of P-values in that it uses a relative rank within a protein, a measurement that is independent of the rates of evolution of the group of proteins. The averages of these positions are shown in Fig. 3. From the figure it is evident that the comparisons associated with the 1-2-3 → 5-6-7 duplication pattern (e.g., 1-2 vs. 5-6, 2-3 vs. 6-7) do have slightly lower mean ranks, but not significantly lower than would be expected from a random distribution of rankings.
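The within-protein ranking can be illustrated with a short Python sketch. It assumes that rank 1 goes to the highest-scoring comparison in each protein and that ties are broken arbitrarily; neither detail is specified in the text. The function name and the toy input are hypothetical.

```python
def mean_ranks(scores_by_protein):
    """Average within-protein rank of each comparison (rank 1 = highest score).

    scores_by_protein maps a protein ID to a dict of comparison label -> score.
    """
    totals, counts = {}, {}
    for scores in scores_by_protein.values():
        # Order this protein's comparisons from highest to lowest score.
        ordered = sorted(scores, key=scores.get, reverse=True)
        for rank, comparison in enumerate(ordered, start=1):
            totals[comparison] = totals.get(comparison, 0) + rank
            counts[comparison] = counts.get(comparison, 0) + 1
    return {c: totals[c] / counts[c] for c in totals}


toy_scores = {
    "protein_A": {"1-2 vs 5-6": 3, "2-3 vs 6-7": 1, "3-4 vs 5-6": 2},
    "protein_B": {"1-2 vs 5-6": 9, "2-3 vs 6-7": 8, "3-4 vs 5-6": 1},
}
toy_means = mean_ranks(toy_scores)
```

Because only the ordering within each protein matters, proteins with uniformly high or low alignment scores contribute equally to the averages.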

Fig. 3 The ranges of the relative rankings for the entire dataset. The middle point of each comparison (diamond) shows the mean ranking; the square and triangle represent plus and minus one standard deviation of the rankings, respectively

Discussion

Through our experiments, we found no evidence that type II opsins originated by internal domain duplication, consistent with previous claims that type I and type II opsins are not homologous. If all opsins are instead homologous, they, by definition, derive from a common ancestral gene and share a single origination event. Since type I opsins likely originated by TMS duplication (Taylor and Agarwal 1993), we hypothesized that, if homologous, type II opsins should also show evidence for origination by TMS duplication. Although our methodology found support for TMS duplication in a control group that included type I opsins, we found no support for similar duplications in type II opsins.

Other explanations could also account for the lack of similarity between domains within type II opsins. First, our method may not be sensitive enough to detect similarity caused by an ancient duplication event. However, the method was sensitive enough to detect similarity among type I opsin domains, providing further support for their origin by duplication and speaking to the sensitivity of the test to detect domain duplication. One caveat with this argument is that if rates of evolution have been faster in type II compared to type I opsins, the historical signal of duplicative origin could have been erased in type II, but not type I, opsins. Comparing rates of evolution of nonhomologous genes from different kingdoms would be a challenge, requiring estimates of the absolute rates of evolution, which would be dependent on external calibration with fossils or other information about the absolute timing of divergence. Despite these potential caveats, differing evidence for duplicative origins of type I and type II opsins is consistent with other lines of evidence for nonhomology of the two major opsin groups.

Despite functional similarities, various arguments indicate that type I and type II opsins are not homologous, i.e., they are not descended from a common ancestral gene. First, there is no similarity of primary amino acid sequences between the two groups (reviewed by Spudich et al. 2000). Some have explained this lack of similarity as expected, given the large amount of time separating prokaryotic and eukaryotic opsins. However, time alone cannot account for it, as evidenced by the discovery of type I opsin homologues in eukaryotic opisthokonts (the clade including fungi, animals, and single-celled relatives of animals), which shows that primary sequence similarity is retained between prokaryotic type I opsins and some eukaryotic genes (Bieszke et al. 1999). Specifically, sequence analysis of the nop-1 gene in the fungus Neurospora crassa revealed that it also contains seven TMSs and a nonrandom similarity to prokaryotic type I opsins, which has been used to establish homology between nop-1 and archaeal type I opsins (Bieszke et al. 1999). If similarity can be detected between type I opsins and eukaryotic nop-1 genes, then the ancient divergence between prokaryotes and eukaryotes does not explain the lack of similarity between type I and type II opsins. A remaining caveat to the argument from primary sequence is that type II opsins could have experienced an increased rate of evolution. As discussed above, this hypothesis would be very difficult to test reliably.

A second line of evidence for nonhomology of type I and type II opsins is structural. Crystal structures for type I (Kimura et al. 1997) and type II (Palczewski et al. 2000) opsins have now been solved. Structural comparisons reveal that type II and type I opsins differ significantly in the size and organization of their hydrophilic looping regions as well as the arrangements of their seven TMSs (reviewed by Spudich et al. 2000). A third line of evidence for nonhomology of opsins is the probable origin of type II opsins within animals. Plachetzki et al. (2007) searched genome databases of early-branching animals and found type II opsins in cnidarians, but not the earlier-branching demosponge Amphimedon. Opsins were also not detected in the animal relative Monosiga or multiple fungal genomes. These organisms possess GPCR genes, but none contain signatures of opsin, like the conserved lysine that binds light-reactive retinal. These results indicate that opsin likely originated from a GPCR that gained the ability to bind retinal, an event that probably occurred after the origin of animals (Plachetzki et al. 2007). As such, even if GPCR genes are homologous with bacterial rhodopsins, each gene family gained its interaction with light independently by convergent (or parallel) origins of chromophore-binding ability.

Together, the above arguments indicate that type II opsins evolved separately from type I opsins, pointing to a remarkable convergent evolution of the two opsin types, as suggested by Spudich et al. (2000). Convergence might imply that the 7-TM structure common to both families of opsins is vital to the sensory functionality of the protein. Given that opsins are found abundantly across the three domains, the sensory functions of opsins appear to be a necessity for many diverse life forms. Despite this growing consensus that the two types of opsins are not homologous (Bieszke et al. 1999; Spudich et al. 2000), some researchers do not preclude homologous origins, given such strong functional similarities (Deininger et al. 2000; Ebnet et al. 1999).

Even if type I and type II opsins have separate origins, type II opsins and homologous GPCR genes could still have originated through an internal duplication event separate from that of nonhomologous type I opsins. Although we found no statistical evidence for it, such a hypothesis is still possible. Since GPCR homologues are known from outside animals, an ancient internal duplication event would have occurred at least several hundred million years ago. If rates of evolution are high enough, this is sufficient time for duplicated TM domains to diverge and erase the historical signal of duplication. Given the constraints on TM proteins (e.g., amino acids in the TMSs must be hydrophobic), the power of shuffling analyses may be low. Additionally, rates of evolution or time of origin for type I opsins and GPCRs may be different. As discussed above, these hypotheses would be very difficult to test.

Conclusion

In summary, we used customized global alignment schemes to test the hypothesis of internal domain duplication at the origin of type II opsins. Our methods supported a duplicative origin of type I opsins, consistent with previous studies (Ihara et al. 1999; Zhai et al. 2001). Using these same methods, we find no statistical support for a similar origin of type II opsins. These results are consistent with multiple other lines of evidence that type I and type II opsins represent a remarkable convergence. Although we find no signal of duplication of type II opsin membrane components, we cannot formally rule out the possibility that this signal was erased since the time of origin of these opsins.