Introduction

Cytochrome P450s (P450s) are a superfamily of hemoproteins known to metabolize about 75% of marketed human drugs (Guengerich 2008). P450 genes additionally play important roles in the biosynthesis of steroid hormones, oxidation of unsaturated fatty acids and the metabolism of fat-soluble vitamins; they are also involved in the activation of procarcinogens like polycyclic aromatic hydrocarbons (PAHs) in tobacco smoke (Hasler 1999). Over 3000 P450 genes have been described in animals, whereas humans possess 57 functional genes (Guengerich 2008). Allelic variants of several human P450 genes underlie the high inter-individual variation in metabolic conversion and excretion rates of drugs and therefore in drug response (Weide and Steijns 1999). As up to 100,000 deaths per year in the USA alone are attributed to adverse responses to drugs, understanding the variable expression of P450 genes in drug response has become a major goal in the burgeoning field of pharmacogenetics (Lazarou et al. 1998; Pirmohamed et al. 2004). The best studied P450 gene is the highly polymorphic CYP2D6, accounting for 25% of the metabolism of commonly used drugs (Ingelman-Sundberg 2005). Described mutations resulting in various changes in enzyme function include whole-gene deletion, mutations leading to altered splicing or frame shifts, single nucleotide polymorphisms and gene duplication (Gaedigk et al. 1991; Johansson et al. 1993; Oscarson et al. 1997; Ingelman-Sundberg 2004). Numerous approaches for genotyping and phenotyping the CYP2D6 enzyme have been developed (Frank et al. 2007; Leandro-Garcia et al. 2009), leading to the description of 78 allelic variants (http://www.cypalleles.ki.se/cyp2d6.htm). In contrast to such detailed studies in humans, little is known about the sequence variation or function of this gene in non-human animals.

Orthologs of the CYP2D6 gene have been characterized in several animals of interest to veterinary medicine, with the main focus of such research being expression and substrate specificity (Garfinkel 1958; Ioannides 2006; Trepanier 2006; Seliskar and Rozman 2007). The role of CYP2D6 in domestic cats has attracted attention due to their low levels of high-affinity acetaminophen-UDP-glucuronosyltransferase, which underlies a high risk of acetaminophen intoxication (Court and Greenblatt 1997; Dong et al. 2000). While there is a great deficit in knowledge of non-human P450s, intraspecific polymorphisms have been found (Court et al. 1999; Paulson et al. 1999; Bull et al. 2003; Ioannides 2006), suggesting that the high polymorphism and general importance of this gene family are prevalent among mammals.

If variation in mammalian P450s, and specifically in the CYP2D6 gene, is adaptive, there should be signals of selection across a broader range of taxa than those thus far characterized. To gain perspective on the evolutionary significance of mammalian CYP2D6, we assessed the distribution of synonymous and non-synonymous mutations within and among taxa as well as changes in amino acid (AA) characteristics in selected felids and compared our data to that available from other mammals, including allelic variants from humans.

Materials and Methods

Liver samples of three domestic cats (Felis catus), four cheetahs (Acinonyx jubatus), three bobcats (Lynx rufus), three leopard cats (Prionailurus bengalensis), three cougars (Puma concolor), three lions (Panthera leo), and three Siberian tigers (Panthera tigris altaica) were stored at −70°C or in liquid nitrogen. One additional liver sample from a domestic cat was received from a local veterinarian and stored at 4°C in RNAlater (Ambion, Austin, TX; Table S1).

Primer Design

Primers for Polymerase Chain Reactions (PCR) were designed against the predicted feline CYP2D6-mRNA sequence retrieved from the “Genome Annotation Resource Fields” (GARFIELD) online browser (Pontius and O’Brien 2007) and placed in conserved regions (based on an alignment of mRNA sequences from Homo sapiens, Pan troglodytes, Macaca mulatta, Bos taurus, Canis lupus familiaris, and Felis catus), resulting in selected primers FcCYP2D6-190fb (5′-AGCCCAGCACGCACTGAG-3′), FcCYP2D6-4191r (5′-CAGGAGCACAGGGATGGAGTTCAGG-3′), FcCYP2D6-6125f (5′-ATAAAGCTGTGAGCAACG-3′) and FcCYP2D6-8179r (5′-ACTGGTTTATTGCATATCAGG-3′).

RNA Extraction and Reverse Transcription

Total RNA was extracted using RNeasy Mini Kit (Qiagen) or SV Total RNA Isolation System (Promega) according to manufacturer’s guidelines and stored at −60°C. cDNA was prepared using the ImProm-II™ Reverse Transcription System (Promega) and reverse transcription reactions were carried out with both random and oligodT primers using 1–2 μg total RNA per reaction and in a final reaction volume of 40 μl.

Amplification and Sequencing of CYP2D6

A 719-bp long 5′ fragment of the CYP2D6 gene was amplified using primer pair FcCYP2D6-190fb/FcCYP2D6-4191r and a 1020-bp long 3′ fragment was amplified using primer pair FcCYP2D6-6125f/FcCYP2D6-8179r. Each 25 μl reaction mix consisted of 4 μl of cDNA, 25 pmol of both forward and reverse primer, and either 5 μl GC buffer (5×), 0.75 μl DMSO, 0.25 mM dNTP mix, 0.5 U Phusion™ DNA Polymerase (New England Biolabs) and H2O, or 2 μl AmpliTaq® 360 Buffer (10×), 1 mM MgCl2, 1 mM dNTP mix, 0.75 U AmpliTaq® 360 DNA Polymerase (Applied Biosystems), and H2O. PCR conditions for the 5′ fragment were as follows: initial denaturation at 98°C for 30 s, followed by 35 cycles at 98°C for 10 s, 60°C for 30 s and 72°C for 35 s, and final extension at 72°C for 10 min. The 3′ fragment was amplified under identical conditions but with an annealing temperature of 55°C. If necessary, PCR products were excised from an agarose gel and purified using the Wizard® SV Gel and PCR Clean-Up System (Promega).

PCR products (20–50 ng) were sequenced in both directions (Big Dye, ABI) on an ABI 3130xl Genetic Analyzer and manually checked for heterozygote positions. Homology of the 5′ and 3′ PCR product was verified by a 100% sequence match of the 133-bp long overlapping region.

CYP2D6 Sequences Obtained from Public Databases

Complete mRNA sequences of putative CYP2D6 orthologs of 11 mammals, including 33 cDNA sequences of functional human CYP2D6 alleles, were downloaded from GenBank. An mRNA sequence of the putative feline CYP2D6 ortholog was obtained by searching the GARFIELD web browser generated by the Laboratory of Genomic Diversity and Advanced Biomedical Computing Center (Pontius and O’Brien 2007; Table 1).

Table 1 GenBank accession numbers of gene sequences used in this study

ND5, RAG2, GNAZ, and ATP7A Sequences

Sequences of three nuclear protein coding genes, recombination activating gene 2 (RAG2), guanine nucleotide binding protein, alpha z subunit (GNAZ) and ATPase, copper-transporting, alpha polypeptide (ATP7A), as well as of the mitochondrial NADH dehydrogenase subunit 5 (ND5) gene were either taken from Johnson et al. (2006) or downloaded from GenBank (Table 1). To the best of our knowledge these genes are either neutrally evolving or at least do not show signs of being under strong positive selection.

The nuclear genes were chosen based on sequence availability for the taxa under study and to provide a baseline for among taxa comparisons of dN/dS ratios. The dN/dS ratio is the ratio between the rate of nonsynonymous (dN) substitutions and the rate of synonymous (dS) substitutions in a coding sequence (Nei and Gojobori 1986; Yang and Nielsen 2000). Alignments were done using the ClustalW algorithm in MEGA 4.1 software (Tamura et al. 2007).

Data Analysis

Pairwise p distances were calculated in MEGA 4.1 software and Maximum Likelihood (ML) distances in PAUP 4.0 (Swofford 2002) using substitution models selected with MODELTEST 3.7 (Posada and Crandall 1998) via a Hierarchical Likelihood Ratio Test. Models were calculated separately for human alleles and the remaining sequences (including one human reference sequence). Sliding window analysis was accomplished using DnaSP 5.10 (Librado and Rozas 2009) with 100 bp windows and 25 bp steps. The numbers of segregating sites (s values) were calculated for the global data set as well as subsets defined as “felids”, “human alleles”, and “primates without humans”. Putative secondary structure for all the sequences was predicted using the online tool GOR IV (Garnier et al. 1996) and a “consensus-secondary structure” was created by selecting the most frequent predicted secondary structure at each amino acid position. For each secondary structure type (alpha-helix, extended strand, and random coil) mean s values per site (standardized by subtracting the arithmetic mean of each subset and dividing by its standard deviation) were calculated. The numbers of synonymous and non-synonymous substitutions were calculated in 99-bp step windows. For both felids and humans, for which intraspecific data were available, synonymous and nonsynonymous substitutions were identified and mapped on a phylogenetic tree (from Johnson et al. 2006). These substitutions were then classified as either polymorphic (within) or fixed (between) species and placed in a 2 × 2 contingency table. The data were then subjected to a McDonald–Kreitman test with Williams' correction (McDonald and Kreitman 1991).
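The 2 × 2 McDonald–Kreitman comparison described above can be sketched as a G-test of independence. The sketch below is only illustrative: it assumes one common textbook form of the Williams correction for a 2 × 2 table, the function name is hypothetical, and the counts in the usage lines are invented, not the data from this study.

```python
import math

def mk_g_test(poly_syn, poly_nonsyn, fixed_syn, fixed_nonsyn):
    """G-test of independence on a 2 x 2 McDonald-Kreitman table.

    Rows: polymorphic (within species) vs. fixed (between species).
    Columns: synonymous vs. nonsynonymous substitutions.
    Returns the Williams-corrected G statistic and its P value (1 df).
    """
    table = [[poly_syn, poly_nonsyn], [fixed_syn, fixed_nonsyn]]
    rows = [sum(r) for r in table]
    cols = [table[0][j] + table[1][j] for j in range(2)]
    n = sum(rows)
    if min(rows + cols) == 0:
        raise ValueError("every row and column needs at least one substitution")

    # Uncorrected G = 2 * sum(observed * ln(observed / expected)).
    g = 0.0
    for i in range(2):
        for j in range(2):
            obs = table[i][j]
            if obs > 0:
                expected = rows[i] * cols[j] / n
                g += 2.0 * obs * math.log(obs / expected)

    # Williams correction factor for a 2 x 2 table (assumed textbook form).
    q = 1.0 + ((n / rows[0] + n / rows[1] - 1.0)
               * (n / cols[0] + n / cols[1] - 1.0)) / (6.0 * n)
    g_adj = max(g / q, 0.0)  # guard against tiny negative rounding error

    # Chi-square survival function with 1 degree of freedom.
    p_value = math.erfc(math.sqrt(g_adj / 2.0))
    return g_adj, p_value

# Hypothetical counts: equal ratios within and between species.
g_adj, p_value = mk_g_test(10, 20, 30, 60)
print(round(g_adj, 3), round(p_value, 3))
```

A non-significant P value, as in the hypothetical usage lines, means the ratio of nonsynonymous to synonymous changes does not differ detectably between polymorphic and fixed substitutions.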

Pairwise dN/dS ratios of nuclear genes were calculated in DnaSP following Nei and Gojobori (1986) with a Jukes Cantor correction and plotted against pairwise distances between taxa drawn from the ND5 sequence data.
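As a rough illustration of the final step of this calculation, the sketch below applies the Jukes-Cantor correction to proportions of synonymous and nonsynonymous differences and forms their ratio. It assumes the counts of differences and of potential sites have already been obtained (the codon-by-codon counting of the Nei and Gojobori method is not reproduced here); the function names and the numbers in the usage lines are hypothetical.

```python
import math

def jukes_cantor(p):
    """Jukes-Cantor corrected distance for a proportion p of differing sites."""
    if not 0.0 <= p < 0.75:
        raise ValueError("p must lie in [0, 0.75)")
    return -0.75 * math.log(1.0 - 4.0 * p / 3.0)

def dn_ds(nonsyn_diffs, nonsyn_sites, syn_diffs, syn_sites):
    """Return (dN, dS, dN/dS) from counts of differences and potential sites."""
    d_n = jukes_cantor(nonsyn_diffs / nonsyn_sites)
    d_s = jukes_cantor(syn_diffs / syn_sites)
    # The ratio is undefined when no synonymous change has been observed.
    ratio = d_n / d_s if d_s > 0 else float("nan")
    return d_n, d_s, ratio

# Hypothetical pairwise comparison: 10 nonsynonymous differences over
# 1000 nonsynonymous sites and 10 synonymous differences over 300 sites.
d_n, d_s, ratio = dn_ds(10, 1000, 10, 300)
print(round(d_n, 4), round(d_s, 4), round(ratio, 3))
```

A ratio below 1 is usually read as purifying selection, a ratio near 1 as neutral evolution, and a ratio above 1 as a possible sign of positive selection.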

For the Felidae subset as well as for a “large dataset” (whole dataset but including only one reference sequence for humans), codon-specific dN/dS ratios were calculated with a ML approach using multiple models (M0: one ratio, M3: discrete, M7: beta and M8: beta and ω) implemented in the program CODEML of the PAML software package (Yang 2007). The tree topology for the PAML input file was based on combining the felid phylogeny from Johnson et al. (2006) with our own UPGMA-tree topology for the remaining taxa. Likelihood ratio tests (LRTs) were conducted by comparing likelihood values of two site-model pairs: M0 versus M3 and M8a versus M8. M3 and M8 are more general models than M0 and M8a, both allowing the presence of positive selection. Twice the log-likelihood difference of each model pair is assumed to follow a χ2 distribution, and significant P values indicate the presence of positive selection. The M0 versus M3 comparison can also be considered a test for varying dN/dS ratios among codon sites (Yang et al. 2000a, b; Swanson et al. 2003). In addition, an LRT based on branch models (one ratio vs. free ratio model, Yang 1998), testing for varying dN/dS ratios among lineages, as well as the branch-site test of positive selection (Zhang et al. 2005), testing for positive selection on specified “foreground” branches within the tree (in our case, felids), were conducted. The degree of changes in amino acid properties indicating positive selection was calculated using the software TreeSAAP (Woolley et al. 2003). Out of the 31 amino acid properties implemented in this program, those putatively indicating positive selection were selected and a sliding window analysis involving the selected properties was conducted to detect regions under positive selection. Magnitudes of property changes were divided into eight categories and, to reduce false positives, only highly significant (P < 0.001) changes within categories 6–8 were considered as radical (McClellan et al. 2005).
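The likelihood ratio tests described above reduce to a short calculation once the log-likelihoods of a nested model pair are known. The following is a minimal sketch rather than part of PAML: the function names are hypothetical, the log-likelihood values in the usage lines are invented, and the degrees of freedom must be supplied by the user (1 is assumed here for illustration). The χ2 tail probability is implemented only for 1 or an even number of degrees of freedom to keep the example within the standard library.

```python
import math

def chi2_sf(x, df):
    """Upper-tail probability of a chi-square variate (df = 1 or even df only)."""
    if x < 0:
        raise ValueError("x must be non-negative")
    if df == 1:
        return math.erfc(math.sqrt(x / 2.0))
    if df > 0 and df % 2 == 0:
        # Closed form for even df: exp(-x/2) * sum_{i < df/2} (x/2)^i / i!
        half = x / 2.0
        term, total = 1.0, 1.0
        for i in range(1, df // 2):
            term *= half / i
            total += term
        return math.exp(-half) * total
    raise ValueError("this sketch supports only df = 1 or even df")

def likelihood_ratio_test(lnl_null, lnl_alt, df):
    """Return the LRT statistic 2 * (lnL_alt - lnL_null) and its P value."""
    statistic = max(2.0 * (lnl_alt - lnl_null), 0.0)
    return statistic, chi2_sf(statistic, df)

# Hypothetical log-likelihoods for a null model and a more general model.
statistic, p_value = likelihood_ratio_test(-5000.0, -4990.0, df=1)
print(statistic, p_value < 0.001)
```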

Results

A total of 1506 base pairs were aligned in 67 sequences (Fig. S1; GenBank Acc. No.: Table 1). There were 676 segregating sites, 487 of which were parsimony informative resulting from 931 substitutions and 15 insertion/deletion polymorphisms. All primates carried a 9-bp deletion at position 10 resulting in three amino acids less than other species. Two Siberian tigers (PTI-72, 73) showed a 3 bp insertion (TAC) at positions 331–333 and therefore an additional tyrosine in the protein. A 3 bp deletion in the human allele CYP2D6*9 occurred at position 853–855. Overall mean pairwise p distances were 0.108 (±0.087) and 0.002 (±0.001) for inter- and intra-specific comparisons, respectively (Table S2).

Predicted amino acids were also aligned (Fig. S2); 257 of the 501 sites were variable and 187 of these were parsimony informative. The P450 signature motif was located at sites 440–449. Of the four amino acid residues involved in substrate binding (Rowland et al. 2006), only one was conserved. Little saturation was seen at first and third codon sites, whereas no saturation was observed with respect to transversions and substitutions at second codon positions.

Sliding Window Analysis

Sliding window analysis revealed a high heterogeneity in sequence variation across the CYP2D6 gene. The highest variability was found between positions 526 and 875 with a peak around position 650 (Fig. 1), whereas felid sequences displayed a low anomaly at this position, as well as some increase in variation near the 3′ end (i.e., pos. 1251–1450). Primate sequences showed an additional signal of high variation around position 352. The most conserved region was between positions 1251 and 1450 encompassing the cysteine pocket; however, felids showed additional variation in this region. Relative to interspecific comparisons, human alleles showed little variation throughout the gene, but nonetheless showed their regions of highest s values between positions 250–500 and 750–1100.
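The windowed counts summarized above can be illustrated with a small sketch that counts segregating sites in overlapping windows of an alignment. This is not the DnaSP implementation: the function name is hypothetical, the toy alignment is invented, and gaps and ambiguous bases are simply ignored within a column, which is an assumption that may differ from how DnaSP treats such sites.

```python
def segregating_sites_by_window(alignment, window=100, step=25):
    """Count segregating sites in sliding windows of an aligned sequence set.

    alignment: list of equal-length nucleotide strings.
    Returns a list of (start, end, count) tuples with 1-based positions.
    """
    length = len(alignment[0])
    if any(len(seq) != length for seq in alignment):
        raise ValueError("all sequences must have the same aligned length")

    # A column is segregating if it holds more than one nucleotide state;
    # gaps and ambiguity codes are ignored here (assumption of this sketch).
    variable = []
    for column in zip(*alignment):
        states = {base.upper() for base in column} - {"-", "N", "?"}
        variable.append(len(states) > 1)

    results = []
    for start in range(0, length - window + 1, step):
        count = sum(variable[start:start + window])
        results.append((start + 1, start + window, count))
    return results

# Toy alignment with two variable columns (positions 4 and 8).
toy = ["AAAAAAAA", "AAATAAAC", "AAATAAAA"]
print(segregating_sites_by_window(toy, window=4, step=2))
```

With the defaults (window=100, step=25) the function mirrors the window and step sizes stated in the Methods.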

Fig. 1

Results of sliding window analysis for various data subsets (window size: 100 bp; step size 25 bp): Black bars on top indicate substrate recognition sites described by Gotoh (1992). The membrane anchor (striped) and the proline-rich hinge region (checkered) described by Yamazaki et al. (1993) as well as the cysteine pocket (white bar; Modi et al. 1996) are also shown. Asterisks indicate amino acids involved in substrate binding (Rowland et al. 2006)

dN/dS Ratios

The mean dN/dS ratio of CYP2D6 (0.565 ± 0.484) was higher than those of the other nuclear genes analyzed (i.e., RAG2, 0.127 ± 0.085; GNAZ, 0.008 ± 0.112; ATP7A, 0.317 ± 0.298). Intraspecific CYP2D6 dN/dS ratios (0.730 ± 0.663) were significantly higher than interspecific ratios (0.440 ± 0.253) (Mann–Whitney U; P < 0.0001). The distribution of dN/dS ratios showed a marked peak within Felidae, followed by a sharp reduction and gradual negative correlation with increasing evolutionary distance (Fig. 2). The highest interspecific dN/dS ratios were all within Felidae. Ratios >1 involved tigers and lions, as well as puma and leopard cats, reflecting evolutionary distances of 3.72 and 6.7 Ma, respectively (Johnson et al. 2006).

Fig. 2

Bi-variate plot of pairwise dN/dS ratios of CYP2D6 (open circle), RAG2 (open triangle), GNAZ (plus) and ATP7A (cross) against pairwise p distances of the mtDNA NADH-5 gene. Divergence times are from Johnson et al. (2006) and Springer et al. (2003)

The number of synonymous substitutions exceeded the number of nonsynonymous substitutions for the gene as a whole. Across gene regions, only the first 99-bp window had more nonsynonymous (18) than synonymous substitutions (13) (Fig. 3). Primates showed the same pattern with five nonsynonymous and four synonymous substitutions in the first window. Within felids, 11 out of the 16 windows displayed more nonsynonymous than synonymous substitutions. Because of the low sequence variation in human alleles, no conclusion about the distribution of synonymous and nonsynonymous substitutions throughout the gene could be drawn. There were no significant differences in the number of segregating sites with respect to the three specified secondary structures of the gene.

Fig. 3

Distribution of synonymous and nonsynonymous substitutions throughout CYP2D6 in various data subsets. Open bars represent synonymous substitutions, filled bars represent nonsynonymous substitutions

The McDonald–Kreitman test revealed no statistically significant differences between the ratios of synonymous and nonsynonymous substitutions within and between species (data not shown). PAML likelihood ratio tests indicated both varying dN/dS ratios among sites and positive selection within felids (M0 vs. M3: P < 0.001; M8a vs. M8: P < 0.001), as well as across the entire dataset (M0 vs. M3: P < 0.001; M8a vs. M8: P = 0.0076). Substrate recognition sites and the transmembrane anchor included multiple codons with high dN/dS ratios, whereas the conserved cysteine pocket and the proline-rich hinge region revealed only one ratio >1 (Fig. 4a). Interestingly, the codon just upstream of the completely conserved cysteine showed a high number of nonsynonymous substitutions. Based on posterior probabilities from the Bayes empirical Bayes (BEB) procedure, PAML detected positive selection for 16 codons within Felidae, and 15 overall (Table 2). An LRT of the two different branch models supported varying dN/dS ratios among lineages (Fig. S3). Based on the LRT of the branch-site models, there was no statistically significant support for sites under positive selection within Felidae.

Fig. 4

Results of PAML and TreeSAAP analyses for “large dataset”: a Mean dN/dS ratio for each codon. Dotted line indicates a ratio of 1. b Sliding window analysis of amino acid properties detected to be under positive selection. Dotted line represents cut-off values for significant (P < 0.001) positive selection. Black bars on top indicate substrate recognition sites described by Gotoh (1992). The membrane anchor (striped) and the proline-rich hinge region (checkered) described by Yamazaki et al. (1993) as well as the cysteine pocket (white bar; Modi et al. 1996) are shown. Asterisks indicate amino acids involved in substrate binding

Table 2 Results of PAML and TreeSAAP analyses

Over the whole dataset, three of 31 analyzed amino acid properties revealed signals of positive selection (P < 0.001): alpha-helical tendencies, equilibrium constant and power to be at the C-terminal. Regions of significant positive selection in each of the three properties were identified by sliding window analysis (Fig. 4b). Codon sites with high dN/dS ratios showed high numbers of radical amino acid property changes (Table 2), compared to the mean across all sites (0.46 ± 0.94). For the Felidae subset, no amino acid property was detected to be under positive destabilizing selection.

Discussion

Our analysis of the CYP2D6 gene revealed higher than average levels of variability, various signs of positive selection and distinct patterns of sequence variation with respect to specific functional domains.

Sliding window analysis clearly showed that the region of the highest sequence variability coincides with predicted substrate recognition sites. These sites have been shown to exhibit the highest sequence variability within P450s in general (Gotoh 1992; Werck-Reichhart and Feyereisen 2000). The region of lowest sequence variability is located around the heme-binding cysteine (Rowland et al. 2006). The conservation of this region and especially the cysteine AA itself is of great importance because it provides the thiolate ligand to bind the prosthetic group (Gotoh et al. 1983).

More detailed support for positive selection is gained through the dN/dS ratio analysis.

The mean dN/dS ratio across CYP2D6 (0.565) was substantially higher than those of the three other nuclear genes tested, and considerably higher than averages reported in multiple studies of mammalian genes. Wang et al. (2007) reported a mean dN/dS ratio between human and mouse of 0.138 (12,052 genes) and between human and old world monkeys of 0.177 (14,204 genes). Zhang (2000) calculated mean dN/dS ratios of 47 nuclear genes among primates (0.282), artiodactyls (0.274) and rodents (0.191). A study on 240 human–mouse orthologs, expressed specifically in liver, yielded an average dN/dS ratio of 0.233 (Zhang and Li 2004). CYP2D6, also a liver-expressed gene, exhibited a dN/dS ratio of 0.257 for the human–mouse comparison.

While some of our analytical results may suggest or support positive selection, the non-monotonic distribution of dN/dS values with respect to evolutionary distance (Fig. 2) might also be a sign of initial retention and eventual loss of slightly deleterious mutations (Balbi and Feil 2007). However, such patterns across evolutionary time frames of several million years do not correspond with literature examples of this phenomenon, which involve comparisons within or among closely related species (Yang et al. 2000a, b; Rocha et al. 2006). While some effect from small sample bias cannot be excluded, an alternative explanation may relate to the fact that slightly deleterious mutations are not removed as effectively in species with small populations (Hughes and Friedman 2009). Three of the studied felids are listed as vulnerable or endangered by IUCN and felids in general have relatively low effective population sizes. However, our comparisons are interspecific, so it is not clear how large an effect small effective population sizes have on our data.

The distribution of codon-specific dN/dS ratios across the gene was not homogeneous, as higher ratios were primarily located near substrate recognition sites or the membrane anchor. Interestingly, the distribution of radical changes in amino acid properties did not coincide with this pattern, as such changes were not found at substrate recognition sites, nor were they relevant within Felidae. We speculate that AA changes (or variability in general) at substrate recognition or membrane anchor sites could be of adaptive importance, but most likely AA changes do not involve “radical” property changes because such sites must maintain their basic properties to be functional. Thus, the approach developed with the program TreeSAAP, aimed at revealing radical AA changes that might be associated with adaptive significance, may not have the same application potential across all types of genes. Such changes may signal major functional shifts in a gene, but not minor adjustments in substrate specificity.

Analysis within Felidae revealed a different pattern than that seen for the global data set in that the region around the heme pocket was not as conserved as within humans or primates, perhaps indicating relaxed selection pressure on this region, relative to other mammalian orders. In addition, the variability peak associated with substrate recognition sites was not as pronounced as for the global data set. This result may relate to the fact that felids are strict carnivores and thus may be less exposed to plant toxins, long thought to be the most common P450 substrates.

Surprisingly, sequence variation of human alleles was very low despite the gene’s high expression variability. While we only analyzed putatively functional alleles, several of them are described to result in decreased enzyme function. Moreover, a number of reported allelic variants are not functional. Thus, some sequence variability results in reduced or disrupted function, whereby variable copy number is most likely responsible for increased functionality. The prevalence of variable copy number or enzyme dysfunction of CYP2D6 in non-humans is not known.

Another very important factor influencing the CYP2D6 enzyme and its gene might be diet. So far, no clear pattern of P450 expression between species of particular dietary type, such as herbivores or carnivores, has been found (Fink-Gremmels 2008), but several authors report dietary effects on P450 enzyme expression. For example, rabbits showed an increase in mRNA expression of certain P450s corresponding to weaning time (Pineau et al. 1991). Variation of quality and quantity of particular micro- and macro-nutrients in the diet has also been reported to influence P450 expression (Ioannides 1999 and references therein). However, it should be noted that those studies only focused on the expression levels. Our data encompassed sequences of species representing different dietary types (e.g., carnivorous felids, herbivorous horse and cattle and omnivorous rat, mouse, and humans). Out of the four identified substrate binding sites of CYP2D6, only one was conserved and two revealed AA changes correlated with diet. Position 124 revealed a common phenylalanine across all carnivores and primates, whereas herbivores had either a valine or isoleucine; position 220 revealed a common aspartic acid across all carnivores, whereas herbivores and omnivores had either glutamine or glutamic acid.

We conclude that the CYP2D6 gene in non-human mammals exhibits considerable variation that may reflect some adaptive significance and should be considered a candidate gene in understanding dietary adaptation. Such studies may be particularly relevant in domesticated animals. Domesticated species tend to have elevated dN/dS ratios as compared to their wild relatives due to reduced selection pressure, altered breeding patterns and reduced effective population sizes (Cruz et al. 2008; MacEachern et al. 2009). Enzymatic studies on domestic cats report variable CYP2D activities, at least between males and females (Shah et al. 2007). Thus, there is some potential to pursue our general overview of felids with a more extensive intra-specific study of domestic cats. Whether based on copy number or sequence variation, a more thorough consideration of CYP2D6 function in domestic or veterinary animals may uncover breed-specific differences that can support more effective drug therapy.