Introduction

Cytochrome P450 (CYP) enzymes are an ancient superfamily of monooxygenase proteins found in all domains of life (Nelson et al. 1993). The CYP superfamily has a total of 977 families of which 69 are present in animals (Nelson 2009). Most vertebrate species have approximately 50–100 CYP genes. Similar CYP families are found across vertebrate species. In rare cases, a CYP family will be found in one vertebrate class but not others; for example, only one family, CYP39, was found in mammals but not in fish (Nelson 2003). The CYP2 family is the largest and most diverse of the vertebrate CYPs (Nelson 2003; Nelson et al. 2004). The number of genes per CYP2 subfamily is variable and can be quite large in some species, particularly mouse, rat, and zebrafish. In addition, the CYP2 gene subfamilies are large in number and largely taxonomically distinct (Nelson 2003). Mammalian CYP2s are responsible for the metabolism of structurally diverse drugs, steroids, and carcinogens. CYP2s thus play a significant role in the metabolism of a variety of exogenous and endogenous compounds (Lee et al. 2008; Lee 2008; Wang and Tompkins 2008). Due to their catalytic importance in Phase 1 oxidation of human pharmaceuticals, CYP2 research has been dominated by studies in mammalian model species and is fundamentally lacking in other vertebrate taxa (Lewis 2002). Mammalian CYP2C and CYP2D genes have attracted the most attention since they metabolize a wide array of substrates, including many clinically significant pharmaceuticals (e.g., s-warfarin and fluoxetine, respectively).

There are ~12,000 known CYP genes (Nelson 2009) found in all domains of life, and discovery of new CYP sequences is expected due to ongoing genome sequencing of diverse species. Maintaining a clear nomenclature system is essential for gene annotation purposes and to prevent confusion across studies (Nelson 2009; Nelson et al. 2004). The current nomenclature system is based on a hierarchical clustering of genes into families (>40% amino acid identity) and subfamilies (>55% amino acid identity; Nelson 2003). CYP nomenclature denotes families by number and subfamilies by letter; genes are numbered in order of discovery. For example, the CYP2D6 gene was the sixth gene identified within the ‘D’ subfamily of the second family. When sequences cannot be clearly placed in a family or subfamily, phylogenetics may assist with nomenclature decisions (Nelson et al. 2004). In general, phylogenetic reconstructions support CYP nomenclature in vertebrates.
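The naming rules described above can be illustrated with a minimal Python sketch. The identity thresholds (>40% for a family, >55% for a subfamily) and the CYP2D6 example come from the text; the function names `parse_cyp_name` and `shared_rank` are hypothetical, and the sketch ignores the phylogenetic judgment that the nomenclature committee applies to borderline sequences.

```python
import re

# Thresholds taken from the text (percent amino acid identity).
FAMILY_THRESHOLD = 40.0
SUBFAMILY_THRESHOLD = 55.0


def parse_cyp_name(name):
    """Split a full gene name such as 'CYP2D6' into (family, subfamily, gene number)."""
    match = re.fullmatch(r"CYP(\d+)([A-Z]+)(\d+)", name)
    if match is None:
        raise ValueError(f"not a full CYP gene name: {name!r}")
    family, subfamily, gene = match.groups()
    return int(family), subfamily, int(gene)


def shared_rank(percent_identity):
    """Return the lowest rank two sequences would share under the identity rule alone."""
    if percent_identity > SUBFAMILY_THRESHOLD:
        return "same subfamily"
    if percent_identity > FAMILY_THRESHOLD:
        return "same family"
    return "different family"
```

For example, `parse_cyp_name("CYP2D6")` yields family 2, subfamily D, gene 6, matching the description of CYP2D6 as the sixth gene of the D subfamily of the second family.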

The CYP2 family includes 29 subfamilies across vertebrate species but only a few subfamilies are found in multiple vertebrate taxonomic classes. Comparison of human and fugu (Takifugu rubripes) CYP2 subfamilies found only the CYP2R and CYP2U subfamilies shared amongst a total of 19 CYP2 subfamilies (Nelson 2003). Comparison of CYP2 subfamilies in vertebrate genomes suggests there are lineage-specific CYP2 subfamilies in mammals (CYP 2A, 2B, 2C, 2E, 2F, 2G, 2S, 2W), fish (CYP 2K, 2M, 2N, 2P, 2V, 2X, 2Y, 2Z, 2AA, 2AD, 2AE), birds (CYP2H), and amphibians (CYP2Q; Table 1). The CYP2U and CYP2R genes were present in the vertebrate ancestor and are shared across all vertebrate classes (Nelson 2003). Other CYP2 subfamilies are found in multiple, but not all, vertebrate lineages; CYP2D is found in the mammalian, amphibian, and avian lineages (Nelson 2009) and CYP2J has been identified in both mammals and amphibians (Nelson 2009). Neither CYP2D nor CYP2J genes are found in actinopterygian (ray-finned fish) species. The CYP2 family has been primarily studied in vertebrate species. However, there are representatives of this family in arthropods and crustacean species, such as CYP2L in lobster (Nelson 1998; Rewitz et al. 2006), and a significant number of CYP2-like genes were found in the sea urchin and Ciona genomes (73 and 40+ genes, respectively; Goldstone et al. 2006). Investigating the relationship between vertebrate CYP2 subfamilies is crucial for understanding the evolutionary history of this diverse gene family and may provide clues to the function of novel CYP2 enzymes in non-mammalian lineages.

Table 1 Lineage specificity of the vertebrate CYP2 subfamilies

X-ray crystallography analyses of a Pseudomonas putida P450 protein identified six putative substrate binding sites (SRS; Poulos et al. 1987), regions where amino acids were in close proximity to the substrate and potentially important in substrate recognition and/or binding. Gotoh (1992) aligned mammalian CYP (1, 2, and 3) sequences and determined that regions with high amino acid substitution rates correlated with the P. putida binding regions. Gotoh (1992) suggested that these regions were a conserved feature of CYP proteins, that these were substrate recognition sites and that the variable sequence was responsible for the diversity of substrates across the CYP proteins.

Analyses of functional divergence have been completed in a phylogenetic context for both CYP1 (Goldstone et al. 2007) and CYP3A proteins (McArthur et al. 2003). These studies have identified a correlation between regions of high rates of divergence and SRSs (Goldstone et al. 2007; McArthur et al. 2003). The analysis of CYP3A genes revealed some SRSs that correlated with regions of high sequence divergence; however, others were either in regions which presented alignment difficulty or had low sequence divergence (McArthur et al. 2003), suggesting that not all of the original six SRSs are supported in more robust analyses. Even in cases where the SRS regions were correlated with high sequence divergence, the size and boundaries of the SRSs were not strongly conserved (McArthur et al. 2003).

The overall evolutionary relationships of the vertebrate CYP2 subfamilies are poorly understood, which is why the present phylogenetic reconstruction included CYP2 sequences from the major vertebrate lineages: Aves, Amphibia, Actinopterygii, and Mammalia. Species with completed genomes were targeted to ensure coverage of the complete CYP2 complement from multiple species within Mammalia and Actinopterygii. Genome annotation of CYP2 sequences was completed for four species and combined with BLAST searching to amass 196 sequences for phylogenetic reconstruction. CYP2 phylogenetic analyses were performed with maximum likelihood and Bayesian inference to determine the relationships of 24 CYP2 vertebrate subfamilies. Finally, DIVERGE analyses were completed for prediction of type I and type II functional divergence (Gu 2006). Type I functional analysis examined site-specific changes in evolutionary rates for 17 subfamilies and was utilized to predict regions with possible functional divergence within CYP2 proteins. Type II functional analysis detected sites with radical biochemical changes in sister CYP2 subfamilies, with a particular focus on regions of the protein that form the active site.

Methods

CYP2 Gene Sequences

CYP2 sequences were acquired via three methods: the Cytochrome P450 homepage (Nelson 2009) BLAST server (http://drnelson.utmem.edu/CytochromeP450.html), de novo sequence prediction, and from sequence databases (NCBI Genbank and Ensembl; Table 2). The complete CYP2 gene complement had been previously identified in fugu (Takifugu rubripes; Nelson 2003; Nelson 2009), zebrafish (Danio rerio; Nelson 2009), and human (Nelson 2009; Nelson et al. 2004) and these sequences were retrieved from the Cytochrome P450 homepage BLAST server. These sequences represent fully curated and annotated sequences with nomenclature approved by the P450 nomenclature committee. De novo CYP2 sequences were predicted and annotated for the three-spined stickleback (Gasterosteus aculeatus), medaka (Oryzias latipes), and dog (Canis familiaris) genomes (Supplementary Table 2). CYP2 sequences from fugu were the primary basis of annotation of stickleback (genome assembly version 1; V1) and medaka (V1). Additional sequences from zebrafish CYP2 subfamilies (CYP 2V, 2AA, 2AD, and 2AE) were used in genome searching and annotation because these subfamilies were absent in fugu (Nelson 2003; Nelson 2009). CYP2 sequences for the dog (V2.1) genome were predicted de novo using human CYP2 sequences as a reference. CYP2 sequences from the cow (Bos taurus) genome were available on the Cytochrome P450 homepage BLAST server and were used in searching the V4 genome assembly. Lastly, human CYP2 sequences were utilized in exhaustive and extensive BLAST searches (primarily GenBank; limited searching in Ensembl) to identify additional CYP2 sequences in a number of vertebrate species: rabbit (Oryctolagus cuniculus), koala (Phascolarctos cinereus), opossum (Monodelphis domestica), platypus (Ornithorhynchus anatinus), chicken (Gallus gallus), frog (Xenopus tropicalis and X. laevis), rainbow trout (Oncorhynchus mykiss), and Atlantic salmon (Salmo salar).

Table 2 CYP2 sequences by source and species

The annotation approach for CYP2 sequences was the same for all de novo sequences. Existing sequences (fugu, zebrafish, and human) were utilized in BLAST searches and genome regions with high identity (>60%) were retrieved. Based on previous studies of mammalian CYP2 sequences (Nelson 2003; Nelson et al. 2004) and our own analyses of fugu and zebrafish CYP2 genes (data not shown), CYP2s are composed of nine exons, with the exception of CYP2R and CYP2U genes which have five exons per gene (Rogers and Wall 1980). Exon size is similar for all CYP2 exons in mammalian (Nelson et al. 2004) and actinopterygian species (data not shown), but intron size can vary significantly (Nelson 2003). All vertebrate CYP coding transcripts are approximately 1,500 base pairs in length. BLAST searches with high sequence similarity and nine (or five if CYP2U and CYP2R) exons found in appropriate order (i.e., exon 1 must be adjacent to exon 2, etc.) were retrieved. These nucleotide sequences were imported into MacClade 4.0 (Maddison and Maddison 2000) for annotation of exon and intron boundaries, based on derived eukaryotic consensus splice sites (Nelson et al. 2004; Rogers and Wall 1980). Putative CYP2 sequences were assessed for accurate splice site boundaries via BLAST 2 Sequence server (i.e., dog CYP2F vs. human CYP2F). For quality assurance, all genes were assessed for the appropriate number of exons, correct exon order, similar exon sizes, a coding transcript length of ~1500 bp and the presence of start and stop codons. For all de novo sequences, RefSeq and EST data were used, where possible, to support the annotated sequences. Putative subfamily assignments were derived based on a sequence identity matrix generated via a CLUSTALX (Thompson et al. 1997) algorithm in BioEdit (Hall 1999). All de novo annotations were submitted to the P450 nomenclature committee for sequence verification and naming.
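The quality-assurance criteria listed above (exon number, transcript length, and start and stop codons) can be sketched as a simple check. This is an illustrative Python sketch only, not the procedure used in the study: the function name `qc_problems`, the representation of a gene as a list of exon sequences in transcript order, and the 10% length tolerance are all assumptions, and the exon-size comparison is omitted.

```python
STOP_CODONS = {"TAA", "TAG", "TGA"}
# Per the text, CYP2R and CYP2U genes have five exons; other CYP2 genes have nine.
FIVE_EXON_SUBFAMILIES = {"CYP2R", "CYP2U"}


def qc_problems(subfamily, exons, target_length=1500, tolerance=0.10):
    """Return a list of QC failures for one predicted gene.

    exons: exon nucleotide sequences in transcript order (assumed input format).
    tolerance: allowed fractional deviation from ~1500 bp (an assumed value).
    """
    problems = []
    expected = 5 if subfamily in FIVE_EXON_SUBFAMILIES else 9
    if len(exons) != expected:
        problems.append(f"expected {expected} exons, found {len(exons)}")
    cds = "".join(exons).upper()
    if abs(len(cds) - target_length) > tolerance * target_length:
        problems.append(f"coding length {len(cds)} bp is far from {target_length} bp")
    if not cds.startswith("ATG"):
        problems.append("missing start codon")
    if len(cds) % 3 != 0 or cds[-3:] not in STOP_CODONS:
        problems.append("missing in-frame stop codon")
    return problems
```

An empty list means the predicted gene passed every check in the sketch; each string in a non-empty list names one criterion that failed.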

CYP2 Gene Alignment

All retrieved and annotated CYP2 amino acid sequences were aligned using CLUSTALX (Thompson et al. 1997). Ciona intestinalis sequences (CYP2J30, CYP2N, CYP2U1) were used as an outgroup based on the phylogenies of Goldstone et al. (2007). Manual adjustments to the alignment (Supplementary nexus file) were performed using MacClade 4.08 (Maddison and Maddison 2000) and Mesquite 2.0 (Maddison and Maddison 2009). Sequences that presented difficulties within the alignment were removed. For example, sequences with poor alignment (i.e. <40% amino acid identity to other CYP2s, large gaps in data within the trimmed alignment, or poor alignment in regions of strong homology), partial sequences (i.e. those with less than 450 amino acids or had incorrect number of exons), and identical sequences with different names were removed. Alignment regions with gaps and uncertain homology were excluded (masked) from the final phylogenetic analyses. The CYP2 family phylogenetic reconstructions were based on 196 aligned sequences with a total length of 326 amino acids (Supplementary nexus file).
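As a rough illustration of alignment masking, the Python sketch below drops every column that contains a gap in any sequence. This is a deliberately simplified stand-in: the masking in this study was performed manually and also considered uncertain homology, which this sketch does not attempt to model. The function name `mask_gapped_columns` and the gap symbols are assumptions.

```python
def mask_gapped_columns(alignment, gap_chars="-?"):
    """Remove alignment columns that contain a gap in any sequence.

    alignment: dict mapping sequence name to aligned sequence (equal lengths).
    Returns the masked alignment and the indices of the retained columns.
    """
    lengths = {len(seq) for seq in alignment.values()}
    if len(lengths) > 1:
        raise ValueError("aligned sequences must all have the same length")
    n_columns = lengths.pop() if lengths else 0
    keep = [
        i
        for i in range(n_columns)
        if all(seq[i] not in gap_chars for seq in alignment.values())
    ]
    masked = {
        name: "".join(seq[i] for i in keep) for name, seq in alignment.items()
    }
    return masked, keep
```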

Phylogenetic Analyses

Phylogenetic reconstruction of CYP2 evolutionary history was performed by maximum likelihood and Bayesian inference, both using the JTT + I + Γ (Jones et al. 1992) substitution model with unequal amino acid frequencies determined by ProtTest (Abascal et al. 2005). The maximum likelihood analyses were computed on a RAxML (Randomized Accelerated Maximum Likelihood) BlackBox server (Stamatakis et al. 2008) with 100 bootstrapping replicates. Bootstrapping analysis was computed using maximum likelihood with estimated proportion of invariant sites and with indicated outgroup (Ciona) sequences.

Phylogenetic reconstructions with Bayesian inference were completed using the MrBayes computer software program (V3.1.1; Huelsenbeck et al. 2001). MC3 (Metropolis-coupled, Markov chain, Monte Carlo) searches were performed using four incrementally heated chains with distinct random initial trees for 20 million generations, with a sampling frequency every 1,000 generations. Posterior probabilities were estimated after the removal of MC3 burn-in. Posterior probabilities for nodes that appeared in the maximum likelihood tree but were not present in the consensus Bayesian phylogenetic reconstruction were assessed using 19.96 million-sampled topologies from the Bayesian analysis in PAUP 4.0 (Swofford, 2000).

Additional maximum likelihood analyses and Bayesian inferences were completed for all CYP2 sequences in two clusters: the CYP2C-2E cluster and the CYP2J-2P-2N-2AD-2AE-2Z cluster. These two clusters had high branch order complexity and poor support within the full phylogenetic analyses (Supplementary Fig. 1). The analyses for the CYP2C-2E and CYP2J-2P-2N-2AD-2AE-2Z clusters included a more relaxed masking of the alignment, resulting in alignment length increased by 20 and 17 amino acids, respectively. Phylogenetic reconstructions of the modified alignments were conducted via maximum likelihood analyses.

Functional Analysis

To develop a better understanding of CYP2 functional evolution, an analysis of the amino acid alignment in the context of the maximum likelihood tree was conducted using DIVERGE (Gu 1999). Type I functional analyses with DIVERGE utilize a phylogenetic tree to assess site-specific changes in evolutionary rates within amino acid alignments when comparing subclades (CYP2 subfamilies in our case). DIVERGE uses the coefficient of evolutionary functional divergence (θ) to measure change in site-specific evolutionary rate: θ = 0 indicates no functional divergence while increasing values indicate increasing functional divergence, with θ = 1 being the maximum. Using θ, we tested for significant functional divergence in each pairwise comparison among 17 different CYP2 subfamilies (likelihood ratio test [LRT], P < 0.05).
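The significance criterion can be illustrated with a short Python sketch. It assumes that the LRT statistic is compared against a chi-square distribution with one degree of freedom, which is an assumption made here for illustration and is not stated in the text; the function names are hypothetical. For one degree of freedom the chi-square tail probability has the closed form erfc(sqrt(x/2)), so no statistics library is needed.

```python
import math


def lrt_p_value(lrt_statistic):
    """Upper-tail probability of a chi-square(1 df) variable (assumed null distribution)."""
    if lrt_statistic < 0:
        raise ValueError("LRT statistic must be non-negative")
    return math.erfc(math.sqrt(lrt_statistic / 2.0))


def is_significant(lrt_statistic, alpha=0.05):
    """Apply the P < 0.05 criterion used in the text."""
    return lrt_p_value(lrt_statistic) < alpha
```

Under this assumption, an LRT statistic above roughly 3.84 corresponds to P < 0.05.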

Type II functional analyses with DIVERGE assessed amino acid positions with physico-chemical changes between subfamilies. We focused on radical changes and applied a cut-off value of θ > 1 for site-specific posterior probabilities. Type II functional analyses were completed for sister subfamilies identified in our phylogenetic analyses (CYP 2C/2E, 2AA/2X, 2K/2W, 2F/2A/2B, 2J/2P/2AD/2N/2Z, and 2U/2R). The sequence alignment was reassessed for each subfamily cluster. Regions of uncertain homology were reexamined and a larger number of residues were included in the type II functional divergence analyses because of the higher similarity of sequences within sister subfamilies as compared to the global CYP2 alignment.
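The notion of a radical physico-chemical change can be made concrete with a small Python sketch. The four-group amino acid classification below (positive, negative, hydrophilic, hydrophobic) is an assumption adopted for illustration, and the function `radical_sites` is a hypothetical helper that only flags columns where each subfamily is fixed within one group and the two groups differ. It does not reproduce the posterior calculations performed by DIVERGE.

```python
# Assumed four-group physico-chemical classification of amino acids.
GROUPS = {
    "positive": "KRH",
    "negative": "DE",
    "hydrophilic": "STNQCGP",
    "hydrophobic": "AILMFWVY",
}
GROUP_OF = {aa: group for group, members in GROUPS.items() for aa in members}


def is_radical(aa_a, aa_b):
    """True when two amino acids fall into different physico-chemical groups."""
    return GROUP_OF[aa_a] != GROUP_OF[aa_b]


def radical_sites(clade_a, clade_b):
    """Return 0-based columns where each clade is confined to one group and the groups differ.

    clade_a, clade_b: lists of aligned sequences of equal length.
    Columns containing gaps or unknown symbols are skipped.
    """
    sites = []
    columns = zip(zip(*clade_a), zip(*clade_b))
    for index, (col_a, col_b) in enumerate(columns):
        groups_a = {GROUP_OF.get(aa) for aa in col_a}
        groups_b = {GROUP_OF.get(aa) for aa in col_b}
        if None in groups_a or None in groups_b:
            continue
        if len(groups_a) == 1 and len(groups_b) == 1 and groups_a != groups_b:
            sites.append(index)
    return sites
```

Under this classification a threonine-to-valine replacement is radical (hydrophilic to hydrophobic), whereas an isoleucine-to-leucine replacement is not.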

Results

CYP2 Annotation and Nomenclature

The CYP2 alignment was composed of 196 sequences of which 58 were de novo gene annotations, representing 24 CYP2 subfamilies (Fig. 1; Table 1). The CYP2 complement for a given vertebrate species ranged from 12 to 20 genes for most species. Some species had large numbers of CYP2 genes (e.g., zebrafish had 44 CYP2 genes; Table 2). There were 27 CYP2 genes for opossum even though this may not be the complete gene complement for this species, since the opossum genome project assembly and annotation were incomplete at the time of this study and we did not exhaustively search the genome. CYP2 gene expansion was not uniform across CYP2 subfamilies but was concentrated in a distinct subset of subfamilies; opossum, for example, had eight genes in the CYP2C subfamily, while zebrafish had 12 genes in the CYP2AA subfamily, nine in the CYP2K subfamily and seven in the CYP2X subfamily.

Fig. 1

Vertebrate CYP2 phylogenetic analysis. A circular phylogenetic representation of the vertebrate CYP2 family highlighting lineages contained in each subfamily. The ProtTest program determined a JTT + I + G (Jones Taylor Thornton + invariant sites + gamma distribution) model for these protein sequences. A maximum likelihood approach (RAxML) was applied to the final alignment of CYP2 sequences to determine the phylogenetic history. Bayesian inference was computed based on the final CYP2 alignment. Major distal nodes are labeled (1–8) for reference throughout the text and subsequent figures; bootstrapping/posterior probability values were 100/1.00, 98/1.00, 96/1.00, 36/1.00, 99/1.00, 92/1.00, 40/0.95, and 91/1.00 for nodes 1–8, respectively. The CYP2AE/2V cluster represents both subfamilies and contains one sequence per subfamily

In nearly all cases, genome annotation efforts identified sequences from most known mammalian or fish CYP2 subfamilies. A CYP2D gene has not been identified, with confidence, in the dog genome via de novo annotation; although there are regions with some high sequence similarity to human CYP2D6, a continuous region containing nine exons was not found in the dog genome (V2.1). Medaka and stickleback genes were not identified for the CYP2V, CYP2AA, and CYP2AD subfamilies, yet these subfamilies are found in zebrafish (Nelson 2009). The genes retrieved for opossum and rabbit most likely account for the majority of CYP2 genes in these species, as representatives from most of the mammalian CYP2 subfamilies (with the exception of CYP2D and CYP2S) were found.

CYP2 sequences were available from the mouse and rat genomes. However, these species have very large (>35 genes per species) CYP2 gene complements due to high gene duplication rates in some subfamilies (Nelson et al. 2004). As such, rodent CYP2 genes (including rodent-specific subfamilies CYP2T and CYP2AB) were excluded from our analyses, particularly because there were other mammalian genomes available.

Overall, five vertebrate CYP2 subfamilies were not included as they were either not identified by BLAST search (CYP2Q, CYP2AC, CYP2AF) or were rodent-specific (CYP2T, CYP2AB; Nelson 2009; Nelson et al. 2004). The CYP2Q and CYP2AF subfamilies are only found in amphibian and avian species, respectively; the CYP2AC subfamily is found in both avian and amphibian species.

CYP2 Phylogenetic Analyses

For the analysis of overall CYP2 phylogeny, MC3 burn-in length was 40,000 of 20,000,000 generations, resulting in 19,960,000 sampled trees for calculation of posterior probabilities. Replicate MC3 analysis supported convergence via Are We There Yet (AWTY, data not shown; Nylander et al. 2008). The overall topology of the maximum likelihood tree was in agreement with the Bayesian inference consensus tree (Supplementary Fig. 1). The phylogenetic strength of this analysis was interpreted based on bootstrap values and posterior probabilities. Internal nodes in the CYP2 phylogeny had strong bootstrap and posterior probability support (>88 and >0.93, respectively; Figs. 2, 3; Supplementary Fig. 1). All subfamilies clustered into monophyletic groups, with patterns of internal branching following vertebrate speciation patterns (Li et al. 2007; Prasad et al. 2008), with few exceptions (CYP2N/CYP2AD, frog CYP2C8, chicken CYP2C45, stickleback CYP2N17, and fugu CYP2N12). Some subfamilies (CYP2C and the cluster including CYP2J) illustrated a complex history of speciation and gene duplication. Gene clusters surrounding mammalian CYP2C (Fig. 2) and CYP2J (Fig. 3) were subjected to separate phylogenetic analyses.

Fig. 2

Phylogenetic analysis of the CYP2C clade. Phylogenetic analyses of the CYP2C cluster were conducted due to high complexity within the vertebrate CYP2 phylogeny (Node 2, Fig. 1). A maximum likelihood tree is shown with bootstrapping and posterior probability values labeled on all distal nodes (see “Methods” section for details). Placental, marsupial, and outgroup (CYP2E) clades are shaded black, grey, and white, respectively. Internal nodes are collapsed; internal nodes had good phylogenetic support with bootstrapping/posterior probability values >88/>0.93, respectively. For these analyses the alignment regions were expanded to a total of 346 amino acids. Mammalian CYP2E sequences were used as an outgroup

Fig. 3

Phylogenetic analysis of the CYP2J-2Z cluster. Phylogenetic analyses of the CYP2J-2Z cluster were conducted due to high complexity in the vertebrate CYP2 phylogeny (Nodes 6 and 7, Fig. 1). A maximum likelihood tree is shown with bootstrapping and posterior probability values labeled on all distal nodes (see “Methods” section for details). Mammalia and Amphibia, Actinopterygii, and outgroup (Ciona) clades are shaded black, light grey, and white, respectively. Internal nodes are collapsed; all internal nodes had good phylogenetic support with bootstrapping/posterior probability values >97/1.00, respectively. For these analyses the alignment regions were expanded to a total of 343 amino acids. Ciona CYP2N/2J and CYP2X sequences were used as outgroup clades

The placement of some sequences in the CYP2 vertebrate phylogeny did not match their assigned nomenclature. The frog CYP2C8 and chicken CYP2C45 sequences clustered outside of the remaining CYP2C subfamily (Fig. 1). Chicken CYP2C45 clustered with the CYP2H (avian) subfamily, whereas frog CYP2C8 did not cluster with any specific subfamily but was placed between the CYP2Y and CYP2M subfamilies (Fig. 1). The two unnamed opossum CYP2 sequences (XP_001374840 and XP_001364901) were found to cluster with the CYP2C and CYP2E subfamilies, respectively (Supplementary Fig. 1). Stickleback CYP2N17 and fugu CYP2N12 did not cluster with other CYP2N sequences but instead clustered with the CYP2AD subfamily with strong support (Supplementary Fig. 2).

Basal Vertebrate Subfamilies (2U, 2R, 2D)

The vertebrate CYP2 tree topology was rooted with three Ciona CYP2 sequences. Two ancestral subfamilies, CYP2R and CYP2U, were at the base of the vertebrate CYP2 phylogeny (Fig. 1). While the placement of the CYP2R and CYP2D subfamilies lacked confidence, the basal position of the CYP2U subfamily was well supported (Supplementary Fig. 1). The CYP2U subfamily in our analysis included a total of eight sequences from actinopterygian and mammalian species, whereas the CYP2R subfamily included sequences from actinopterygian, avian, and mammalian species. CYP2D gene loss appeared limited to Actinopterygii and possibly Mammalia (marsupial species only) classes. Full length sequences were not found via BLAST search in all vertebrate classes for the CYP 2U, 2R or 2D subfamilies. Specifically, no full length sequence was identified for CYP2U and CYP2D in Aves, or CYP2U and CYP2R in Amphibia. However, this likely does not reflect gene loss as partial sequences with high sequence identity were found in their respective genomes (data not shown). For example, a partial avian CYP2U sequence with high (60%) sequence identity to the cow CYP2U1 sequence was found but had only three of five exons and was therefore excluded from our analysis. The evolution of vertebrate CYP2s is shown in Fig. 1; a more detailed figure, with complete labels for all branches and node support, is found in Supplementary Fig. 1.

Actinopterygian CYP2 Subfamilies Cluster with Mammalian CYP2J

After the basal CYP2U, CYP2R and CYP2D subfamilies, a gene duplication event (Node 7, Fig. 1) led to the evolution of two major clusters of CYP2 subfamilies, one of which included the CYP2Z, CYP2AE, CYP2V, CYP2P, CYP2AD, CYP2N and CYP2J subfamilies (Fig. 3; Supplementary Fig. 1). This cluster (Node 8, Fig. 1) presented high complexity with poorly supported branch topology in the vertebrate CYP2 phylogeny. Actinopterygian subfamilies and the mammalian CYP2J subfamily were included in this cluster (Fig. 3; Supplementary Fig. 1). These subfamilies were subjected to separate phylogenetic analyses to provide higher resolution in the phylogenetic tree (Fig. 3; Supplementary Fig. 2). In this second analysis, the placement of CYP2AE and CYP2V subfamilies within this cluster remained uncertain (both subfamilies are only found in zebrafish). CYP2Z (Node 8, Fig. 1) genes were identified in all four actinopterygian species (medaka, fugu, stickleback, and zebrafish) and formed the basal branch of this cluster with strong phylogenetic support (Fig. 3). The remaining subfamilies (CYP 2J, 2P, 2N, 2AD) shared a common ancestral gene duplication event (Fig. 1); however, inconsistencies were found in the branch topology for the focused phylogenetic analysis (Fig. 3). The CYP2J subfamily clustered with the CYP2P subfamily (Fig. 3) and not CYP2N and CYP2AD (Fig. 1), and the bootstrap value and posterior probability increased to 38% and 0.85, respectively. In both analyses, CYP2N and CYP2AD subfamilies were sister subfamilies with similar phylogenetic support values.

Gene Expansions in Zebrafish CYP2 Subfamilies

The second major cluster that arose from node 7 (Fig. 1) included an expansion of actinopterygian genes (CYP 2X-2AA-2K-2Y subfamilies) and the majority of mammalian CYP2 genes (Node 4, Fig. 1). The CYP2AA and CYP2X subfamilies both had highly amplified gene numbers in zebrafish; CYP2AA was a zebrafish-specific subfamily, whereas CYP2X was identified in all actinopterygian species. For the CYP2X subfamily, most actinopterygian species had a single CYP2X gene while zebrafish had seven (Supplementary Fig. 1). This zebrafish-specific high gene duplication rate was also found in the CYP2K subfamily (Node 6, Fig. 1) and these genes were paralogs to the mammalian CYP2W subfamily. In addition, two putative sequences (opossum CYP2 putative (XP_001369607) and frog CYP2 (NP_001037917)) both clustered with the CYP2W subfamily.

Mammalian CYP2 Subfamilies

The majority of the mammalian CYP2 sequences were clustered together with the strongly supported basal actinopterygian subfamily CYP2Y (Node 5, Fig. 1). While the CYP2M subfamily was basal to the mammalian CYP2s, it is restricted to salmonid species and support for its placement was not strong (Supplementary Fig. 1). There was a major clade within the CYP2 phylogeny that included only sequences from Mammalia and a single Aves subfamily (Node 4, Fig. 1). Within this clade, one major cluster (CYP 2A-2G-2B-2S-2F) represents mammalian-specific CYP2 subfamilies. The tree topology placed the CYP2F subfamily basal to CYP2A, CYP2G, CYP2B, and CYP2S. Subfamilies CYP2A and CYP2G arose from ancestral gene duplication, as did the CYP2B and CYP2S subfamilies (Fig. 1). The avian CYP2H subfamily was basal to the mammalian-specific CYP2C and CYP2E subfamilies (Node 3, Fig. 1).

Phylogenetic complexity and amplification of CYP2 paralogs was evident in the CYP 2C-2E-2H cluster (Node 3, Fig. 1). The CYP2C subfamily was subjected to separate analysis and support for most nodes increased (Fig. 2; Supplementary Fig. 3). Interestingly, all human CYP2C genes were located in a single clade, while rabbit and cow CYP2C genes were not (Fig. 2).

Type I Functional Divergence

The SRS regions from Gotoh (1992) were mapped onto the unmasked CYP2 alignment (Supplementary nexus file). Three SRSs (SRS2, SRS3, and SRS6) were located within the masked regions of our alignment. The remaining SRS regions (SRS1, SRS4, and SRS5) were in areas of our alignment used for phylogenetic analyses (Fig. 4). Using the masked alignment and the maximum likelihood tree topology, we determined the evolutionary rates of functional divergence of CYP2s using DIVERGE. A total of 17 of 24 subfamilies were included in the DIVERGE type I analyses as these met the required criteria for site-specific divergence analysis (minimum of four sequences per subfamily). The coefficient of evolutionary functional divergence (θ), its standard error, and the likelihood ratio test (LRT) statistic were determined for each pairwise comparison (Supplementary Table 1). A heat map (Fig. 4b) was generated based on type I pairwise comparisons of all 17 CYP2 subfamilies in DIVERGE. With the exception of five pairwise comparisons (CYP 2N/2P, 2N/2Z, 2Z/2D, 2N/2Y, 2N/2A), the remaining 131 pairwise comparisons of vertebrate CYP2 subfamilies exhibited statistically significant divergence in site-specific rate of evolution (LRT, P < 0.05). However, divergence of site-specific rates of evolution was not clustered in the SRS regions or any other part of the alignment, but was distributed across the majority of the CYP2 alignment (Fig. 4b).
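The comparison counts above follow from simple combinatorics: 17 subfamilies give 17 × 16 / 2 = 136 unordered pairs, and removing the five non-significant comparisons leaves 131. A minimal Python check is shown below; the function name `pairwise_comparisons` is introduced here for illustration only.

```python
from math import comb


def pairwise_comparisons(n_subfamilies):
    """Number of unordered pairs among n subfamilies, i.e. n choose 2."""
    return comb(n_subfamilies, 2)


total = pairwise_comparisons(17)         # all subfamily pairs in the type I analysis
non_significant = 5                      # pairs listed in the text as exceptions
significant = total - non_significant    # pairs with significant divergence

print(total, significant)  # prints: 136 131
```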

Fig. 4

CYP2 sequence schematic and DIVERGE heat map of 17 CYP2 subfamily pairwise comparisons. Panels A and B are aligned to scale using the CYP2C8 sequence. (A) A scaled schematic of the CYP2 gene that illustrates helices (A–L) and substrate recognition sites (SRS 1–6). The heat map (B) reflects 136 CYP2 subfamily pairwise comparisons in a graphical representation of DIVERGE analyses. Masked regions not utilized for DIVERGE or phylogenetic analyses are shaded grey. Black regions indicate amino acid positions with a coefficient (theta = θ) of evolutionary rate of functional divergence less than 0.5; in comparison, the red regions represent DIVERGE theta (θ) ≥ 0.5. Mapped CYP2 helices (A–L) are labeled above the heat map; the right side of the rows indicates pairwise subfamily comparisons; start residue positions for unmasked regions are labeled below. In particular, the labeled CYP2A, CYP2E, and CYP2N regions represent a total of 35/48 subfamily pairwise comparisons of CYP2A or CYP2E

Type II Functional Divergence

Additional residues were added to SRS regions in several cases for type II functional analyses because we reassessed the alignment for each sister subfamily comparison. Four sister subfamily comparisons (2U/2R, 2AD/2N, 2A/2B/2F and 2AA/2X) added residues in SRS2, SRS3, and SRS6. The 2K/2W and 2C/2E subfamily analyses added residues in SRS2 and SRS6. Type II functional analyses of the 2J/2AD/2N/2P/2Z subfamilies did not add any additional residues in the SRS regions. Type II functional divergence analyses identified residues with radical biochemical changes between sister subfamilies, with significant site-specific posterior probabilities ranging from θ = 1.00 to 7.11. Seven pairwise comparisons (2C/2E, 2K/2W, 2N/2P, 2N/2Z, 2AD/2P, 2AD/2Z, 2AD/2N) had no detectable type II functional divergence.

In the basal vertebrate CYP2 subfamilies, CYP2U and CYP2R, 70/431 sites had radical changes (θ > 1) but these were not clustered within the protein. There were eight amino acids with radical changes in SRS3 (helix G), including a site with a high site-specific posterior probability change (θ > 5.00). At this site, an R in CYP2U (R288 in human CYP2U1) changes to an F in CYP2R proteins (F247 in human CYP2R1), causing a change from a positively charged to a hydrophobic amino acid (data not shown). The SRS5 region had radical changes at three of 10 possible sites.

In the CYP2J/2N/2AD/2P/2Z subfamily cluster (Node 8, Fig. 1), radical amino acid changes were only seen between CYP2J and either CYP 2P or 2Z (Table 3). A total of 14 radical changes were dispersed throughout the protein. Interestingly, one site (T196 in human CYP2J2) had radical changes for most CYP2J comparisons, with the exception of CYP 2J/2AD. The threonine (hydrophilic) in the CYP2J proteins changed to valine (hydrophobic) in CYP2N, 2P, and 2Z proteins (Table 3). Only one site had a radical change between CYP2J and CYP2AD (Table 3).

Table 3 Sites of radical change in the CYP 2J/2AD/2N/2P/2Z subfamilies

Type II functional divergence analysis detected 100 radical amino acid changes (θ > 1) for the CYP 2AA/2X comparison. Most of these sites were distributed throughout the protein, with some clusters in helices A, F, K″, the FG-loop, and B4-1 sheet. The FG-loop had the highest number of radical changes and contained the sites with the highest θ (data not shown).

For the CYP 2F/2A, CYP 2F/2B, and CYP 2A/2B comparisons, a total of 95, 54, and 36 radical changes (θ > 1) were identified, respectively. The largest number of radical changes and highest posterior probability values were found in comparisons with CYP2F (2F/2A > 2F/2B > 2A/2B). A total of six radical changes were shared in all three comparisons (residues 189, 276, 277, 324, 406, and 476 in Cow_CYP2A13). Most radical changes were detected in the active site (Fig. 7). Helix F, which contains SRS2, had changes with the highest θ.

Discussion

CYP2 Gene Sequences

Two-thirds of the sequences in the CYP2 alignment were from either the P450 homepage or de novo annotation (Table 2). These sequences have been verified and named by the P450 nomenclature committee. Species with complete, annotated genomes typically had 12–20 CYP2 genes per genome, although species that are known for high gene amplification can have well over 40 genes (e.g., zebrafish). Gene number amplification was detected in the Actinopterygii class for the CYP2X, 2AA, and 2K subfamilies. These three subfamilies are predominantly populated with zebrafish sequences (Supplementary Fig. 1). Amplification of CYP2 sequences is thus not uniform but specific to certain subfamilies. Similarly, high gene duplication rates are seen in rodents and marsupials for specific subfamilies (CYP 2B, 2C, 2D, 2J; Nelson et al. 2004). These subfamilies are composed of 12–23 genes in the mouse genome and 6–16 genes in the rat genome (Nelson 2009). The CYP2C33v4 cluster is only found in opossum species, although possible CYP2C33v4 orthologs have been identified in pig (Nelson 2009). Functional and expression data are limited for these genes. It is unclear why an expansion of some CYP2 subfamilies occurred in the rat and mouse genomes; perhaps further research can shed light on their expression, function, and specificity (Nelson et al. 2004).

This study focused on vertebrate CYP2 genes in species with completed genomes. Representative species from the vertebrate lineages (placental and marsupial mammals, bird, amphibian and fish) were chosen to ensure wide coverage of vertebrates. Due to high amplification of CYP2 genes in rodents (Nelson 2009), we chose to exclude rodents and focus on mammalian species in which the number of CYP2 genes was more limited but still included the major mammalian CYP2 subfamilies. Zebrafish also have a large number of duplications within the CYP2 family, but because only a limited number of completed genomes are available for comparison within the Actinopterygii lineage, zebrafish sequences were included. The full CYP2 complement was thus obtained for three mammalian (human, dog, cow) and four actinopterygian (zebrafish, fugu, medaka, stickleback) species. A significant number of CYP2 genes, representing most mammalian CYP2 subfamilies, were included from rabbit and opossum. Chicken and frog species were included to provide coverage of the major branches of vertebrate diversity.

Five vertebrate CYP2 subfamilies (CYP2T, CYP2Q, CYP2AB, CYP2AC, and CYP2AF) were not identified during BLAST searching, which utilized human and fugu CYP2 sequences. The CYP2T, CYP2Q, CYP2AB, and CYP2AC subfamilies are specific to rodents, amphibian, avian and rodent/amphibian species, respectively (Nelson 2003; Nelson et al. 2004). A CYP2G sequence from dog has also been recently identified (Nelson 2009) but was not found by our search strategy. Thus, our sequences are not a complete representation of every possible CYP2 gene or subfamily in vertebrates but an assessment of CYP2 genes in the major subfamilies, particularly in the mammalian and actinopterygian lineages.

CYP2 Nomenclature Issues

The phylogenetic reconstructions did not support all CYP2 gene nomenclature and we suggest that the nomenclature be reassessed for some genes. The frog CYP2C8 (NP_001079610) sequence clustered outside of the CYP2C subfamily and thus the appropriate subfamily for this gene is unclear. The chicken CYP2C45 (NP_001001752) clustered well within the CYP2H subfamily. Certain sequences, including opossum CYP2 (XP_001374840), lacked nomenclature at retrieval but our analyses found the gene to cluster with the CYP2C subfamily. Similarly, our analyses strongly support the placement of the opossum CYP2 (XP_001364901) gene within the CYP2E subfamily. Surprisingly, both stickleback CYP2N17 and fugu CYP2N12 may need reassignment to the CYP2AD subfamily.

Cytochrome P450 2 Family Phylogeny

Our CYP2 phylogenetic analysis generated a high resolution tree with strong support (posterior probabilities and bootstrap values) for all internal nodes. With the exception of possible nomenclature problems for the CYP2N and CYP2AD subfamilies, all CYP2 subfamilies were monophyletic. The branching patterns within subfamilies often matched vertebrate speciation patterns (Prasad et al. 2007). For the Actinopterygii, speciation patterns had zebrafish as the most distal, medaka as an intermediate and fugu and stickleback as sister species, as expected (Li et al. 2007).

Basal Vertebrate Subfamilies (2U, 2R, 2D)

In contrast to the suggestion of Nelson et al. (2004), the CYP2 phylogeny indicates that the vertebrate ancestor had three CYP2 genes: a CYP2U, a CYP2R, and another CYP2 gene that diversified into the remaining CYP2 subfamilies (Fig. 5). Prior CYP phylogenetic analyses showed CYP2U and CYP2R subfamilies to have deep branches within the CYP2 family (Nelson et al. 2004). While support for the basal position of these subfamilies is strong, CYP2R and CYP2D placement within the basal branches is weak in our phylogenetic tree. However, CYP2R and CYP2U are the only CYP2 subfamilies with representatives from all vertebrate classes (Fig. 5) and the primary sequence structure of both CYP2U and CYP2R includes five exons; all other CYP2 genes, including CYP2D genes, contain nine exons (Nelson et al. 2004). Collectively, this supports the placement of CYP2U as the ancestral subfamily at the base of the CYP2 phylogenetic tree followed by CYP2R. With the development of the CYP2D ancestor, the gene structure for CYP2 genes increased to nine exons (Nelson et al. 2004). In our analyses, representatives from CYP2U, CYP2R, and CYP2D were not identified from all vertebrate lineages, yet CYP2U and CYP2R are expected in all vertebrate taxa, as seen in Thomas (2007). CYP2D sequences are expected in all but Actinopterygii lineages (Nelson 2003; Thomas 2007). Determining gene loss in certain lineages can be difficult based on available sequences and genomes. For example, CYP2S, CYP2F, CYP2R, and CYP2U genes have yet to be identified in rabbit (Nelson 2009), for which genome sequences are lacking. Partial sequences for avian CYP2U and CYP2D, partial amphibian CYP2U, and a full amphibian CYP2R sequence have been identified by Thomas (2007), and these were considered in our analysis of gene duplication history (Fig. 5). We hypothesize that the lack of CYP2D sequences in the Actinopterygii class is the product of gene loss (Fig. 5), based on their absence in all four actinopterygian species with completed genomes.
Gene duplication and speciation patterns were also inferred for the CYP2AA/2X/2K/2W and CYP2Y/2M/2F/2S/2B/2G/2A clusters and are in supplementary Figs. 4 and 5, respectively.

Fig. 5

Speciation patterns and the gene duplications in the CYP 2U, 2R, and 2D subfamilies. The evolution of vertebrate subfamilies is symbolized by duplication (filled diamond) and gene loss (open box) events. Diversification of vertebrate species is symbolized by speciation patterns (filled star). Dashed lines represent areas of topology with weak support. Four sequences (Amphibia 2R, 2U, and Aves 2U, 2D) were not detected within the present analysis; however, supporting data were identified and the sequence topology was adapted accordingly (Thomas 2007)

Actinopterygian CYP2 Subfamilies Cluster with Mammalian CYP2J

For much of the phylogenetic tree, Actinopterygii and Mammalia CYP2 subfamilies were located in distinct clades (Fig. 1). One major cluster of actinopterygian CYP2 subfamilies (2P-2N-2AD-2AE-2V-2Z) contained the mammalian and avian CYP2J subfamily (Node 8, Fig. 1). This clade was subjected to separate phylogenetic analysis to accurately resolve complexity associated with internal nodes (Fig. 3; inferred speciation and duplication patterns are in supplementary Fig. 6). The CYP2J subfamily shares a common ancestral CYP2 with the CYP2P subfamily (Fig. 3), even though CYP2P genes have higher sequence similarity to CYP2N and CYP2AD (data not shown). Studies by Oleksiak et al. (2003) suggest that CYP2J, CYP2N and CYP2P enzymes have functional similarities; the regio- and enantioselectivities of killifish (Fundulus heteroclitus) CYP2P3 for arachidonic acid were similar to those of mammalian CYP2J2. The CYP2J, CYP2P and CYP2N enzymes have similar structure, metabolic pathways, and expression patterns, with high levels in heart, kidney, and intestines (Scarborough et al. 1999). Oleksiak et al. (2003) suggested that CYP2P and CYP2N had a common ancestral gene, which is supported in our phylogenetic study with inclusion of CYP2J and CYP2AD in that common ancestry. Further investigation into subfamily CYP2AD would be of importance in identifying what functional role it may have in common with CYP2J, CYP2P, or CYP2N. The CYP2AE and CYP2V subfamilies were only represented by a single zebrafish gene and their exact placement relative to the CYP2AD, CYP2N, CYP2P, CYP2J, and CYP2Z subfamilies was unresolved (Fig. 3).

Mammalian CYP2C and CYP2E Subfamilies

The most recently evolved subfamilies in CYP2 vertebrate evolution are the mammalian CYP2C and CYP2E subfamilies (Fig. 6). A series of gene duplication events occurred within the CYP2C subfamily, with the first giving rise to the CYP2C and CYP2C33v4 clades (Node 1, Fig. 1). The CYP2C33v4 clade is basal and includes many marsupial genes from both opossum and koala. No CYP2C genes from placental mammals clustered with the CYP2C33v4 genes, suggesting gene loss in most, but not all, placental mammals (Fig. 6) as a CYP2C33v4 gene has been identified in pigs (Nelson 2009). Within the CYP2C33v4 cluster, two koala CYP2C sequences (CYP2C47 and CYP2C48) have high sequence similarity (>80%) to four CYP2C33v4 sequences (data not shown).

Fig. 6

Speciation patterns and the gene duplications in the CYP 2C, 2E, and 2H subfamilies. The evolution of vertebrate subfamilies is symbolized by duplication (filled diamond) and gene loss (open box) events. Diversification of vertebrate species is symbolized by speciation patterns (filled star). The chicken 2C45 sequence is not a true 2C sequence: it does not cluster with the 2C sequences, and its sequence identity to other CYP2C sequences is less than 70%. Opossum CYP2 (XP_001364901), positioned between the 2E (mammalian) and 2H (avian) subfamilies, may be a putative CYP2E sequence due to its strong phylogenetic placement and sequence identity >70% to 2E subfamily sequences (Supplementary Fig. 1)

The CYP2C clade contained both placental and marsupial genes, although these were distinctly clustered and separated by a speciation event (Fig. 6). The CYP2C sequences are known to have LINE-1 (L1) elements, which are interspersed repetitive DNA elements that can replicate via retrotransposition (Boissinot et al. 2000; Nelson et al. 2004). The L1 elements are primarily found in intron 5 of CYP2C sequences. Multiple L1 elements are found in CYP2C18, CYP2C19, and CYP2C9 (Nelson et al. 2004), which have sequence similarity of 84–92%. These types of recombination factors can lead to production of novel hybrid CYP2C genes.

Type I Functional Divergence

Six substrate recognition sites (SRSs) were identified by a correlation between high substitution rates and similarity in location to residues that appeared to be involved in substrate interactions in a P. putida P450 101A gene (Gotoh 1992). The SRSs were identified based on mammalian CYP sequences, including CYP2A, CYP2B, CYP2C, CYP2D, CYP2E, and CYP2F genes, and were suggested to be functionally significant for all CYPs (Gotoh 1992; Lewis 2003). Gotoh (1992) suggested that residues with high rates of change are responsible for the specificity and functional diversity of CYPs. Pairwise comparisons of 17 CYP2 subfamilies were completed using DIVERGE to determine which regions of the CYP2 genes had elevated rates of evolutionary divergence. The six SRS regions proposed by Gotoh (1992) were mapped onto the CYP2 alignment (Fig. 4a); three of these SRSs were in masked regions of the alignment and three SRSs were included in our phylogenetic and divergence analyses. There was no association between high evolutionary rates and the SRS locations in the alignment (Fig. 4b). The heat map (Fig. 4b) illustrates that there are residues with high amino acid divergence; however, with the exception of selected regions of θ < 0.5, the divergence was distributed throughout the protein alignment. Notably, SRS1 showed no statistically significant change in evolutionary rates of functional divergence (Fig. 4).
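The number of pairwise comparisons follows directly from the number of subfamilies: 17 subfamilies give 17 × 16 / 2 = 136 unordered pairs, which matches the 136 comparisons represented in the heat map. A minimal sketch of this count, using placeholder labels rather than the actual subfamily names:

```python
from itertools import combinations
from math import comb

# Placeholder labels (hypothetical); the 17 subfamilies analysed are not enumerated here.
subfamilies = [f"subfamily_{i}" for i in range(1, 18)]

# Every unordered pair of distinct subfamilies is one pairwise comparison.
pairs = list(combinations(subfamilies, 2))

print(len(pairs))  # 136
assert len(pairs) == comb(17, 2)
```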

Analysis of subfamily divergence indicates some subfamily-specific patterns (Fig. 4b). For example, comparisons involving subfamilies CYP2A or CYP2E have high evolutionary rates of functional divergence throughout the majority of the sequence alignment, whereas comparisons to CYP2N show low rates of divergence. A total of five subfamily comparisons were not statistically significant based on LRT values (Supplementary Table 1). One such comparison was for CYP2N/CYP2P. These two subfamilies seem to share common functional characteristics (Oleksiak et al. 2003), supported by our DIVERGE results. The CYP2N/CYP2A comparison also lacked statistically significant divergence, which was surprising due to the large evolutionary distance between the two subfamilies (Fig. 1). Further examination of these two subfamilies would be of interest to determine whether CYP2N (fish) and CYP2A (mammalian) share commonalities in function.

Type II Functional Divergence

Our type II functional analyses identified sites with radical biochemical changes between closely related subfamilies. These sites were localized to the appropriate protein structure (helix or β-sheet) and clusters of sites were identified in the amino acid sequence. Particular attention was paid to those sites found in regions important to the active site of the protein. Like the type I functional analyses, type II functional analyses of CYP2 subfamilies had biochemical changes spread throughout the protein without strong clustering of radical changes for most subfamilies. Seven of the pairwise subfamily comparisons had no type II functional divergence, indicating that the CYP 2C/2E, 2K/2W, 2N/2P, 2N/2Z, 2AD/2P, 2AD/2Z, and 2AD/2N sister clades may be functionally similar. Certainly, functional studies (Oleksiak et al. 2003) and type I divergence support similar function between CYP2J and CYP2P subfamilies.

Studies of cytochrome P450 crystal structure have identified functional regions of the protein. Conservative sites appear to surround the heme group (de Graaf et al. 2005; Hasemann et al. 1995; Mestres 2005) and conformational similarity was found in eukaryotic and bacterial P450s for this region, even though overall sequence identity was low at 10–30% (de Graaf et al. 2005; Hasemann et al. 1995; Mestres 2005). The conserved interior structure extends to helices D, E, J, J′, K, K′, K″, L, and the central-carboxyl terminal of the I helix (de Graaf et al. 2005; Hasemann et al. 1995; Mestres 2005). Higher variability in structure was associated with sites involved in substrate recruitment and binding (Hasemann et al. 1995; Mestres 2005). The active site has been associated with helices B′ (SRS1), C, C′, F (SRS2), G (SRS3), the N-terminus of the I helix, and the B1-4 and B4-2 (SRS6) sheets (de Graaf et al. 2005; Hasemann et al. 1995; Lewis 2002; Rowland et al. 2006). Helices F and G and the FG-loop have the highest variability in sequence identity and length due to their importance for regiospecificity and spatial movement (Mestres 2005).

Type II functional analysis identified a limited number of radical changes in the CYP 2J/2P and CYP2J/2N comparisons (Table 3), supporting Oleksiak et al.’s (2003) contention that CYP2J, 2N and 2P have similar catalytic function. CYP2J and CYP2P metabolize arachidonic acid and share regio- and enantioselectivities (Oleksiak et al. 2003). CYP2P and CYP2J have not only high sequence similarity, but also similar catalytic function and tissue distribution (Oleksiak et al. 2003). Only two sites were identified with radical changes between the CYP2J and CYP2P subfamilies, while a total of nine radical changes were detected between the CYP2J and CYP2N subfamilies. The two CYP2J/2P sites were also identified in the type II functional analysis of CYP2J and CYP2N (Table 3). CYP2Ns oxidize arachidonic acid with regio- and enantioselectivities that are very different from those of CYP2P (Oleksiak et al. 2000). All radical changes were located within conserved helices or in between conserved regions, with no sites identified in regions important for the active site.

In contrast to the CYP2J/2N/2P comparisons, type II functional divergence was identified in regions of the active site for CYP2F, CYP2A and CYP2B. Helix F and the B4-2 sheet, which include SRS2 and SRS6 regions, respectively, had the highest site-specific θ and number of radical changes (Fig. 7a). These radical changes were highest between CYP2F and CYP2A subfamilies, suggesting that the substrate access channel for substrate binding may be different between these two subfamilies. During metabolism, the product exits the CYP protein through a region defined by helices G and I; one-third of the residues in this region had radical changes (θ > 1; Fig. 7d) between CYP2F and CYP2A. In addition, flexibility and variability in the B′ helix, BC loop and SRS6 are necessary for easy passage of the substrate and product (de Graaf et al. 2005; Hasemann et al. 1995). CYP2A and CYP2F type II functional analysis identified one-third of these regions to have radical changes (Figs. 7b, c). Collectively, this suggests that the substrates for CYP2F and CYP2A proteins are likely to be distinct. For example, CYP2A6 shows high coumarin 7-hydroxylase activity (Yano et al. 2006), whereas CYP2F1 is found to be important in metabolizing pneumotoxins such as naphthalene (Tournel et al. 2007). CYP2A6 and CYP2A13 show similar substrate selectivity, yet differences are seen in metabolism of substrates (DeVore et al. 2008).

Fig. 7

Radical changes in the active site between CYP2F and CYP2A subfamilies. Type II functional divergence of biochemical changes identified 31 radical changes (θ > 1) between the CYP2F and CYP2A subfamilies. These sites are shown mapped on the hypothetical human CYP2F1 crystal structure (a). Helix G is identified in a, b, and c by 1* for orientation of the protein. The protein structure is rotated and enlarged to clearly show changes in (b) the N-terminus of helix I (yellow, residues 284–298) and C-terminus of helix B (part of SRS1, green, residues 108, 109); (c) SRS6 (B4-1 and B4-2 sheets, red, residues 471–479); (d) helix F (which includes SRS2, light purple, residues 196–221), helix G (which includes SRS3, light blue, residues 235–251) and the FG-loop (orange, residues 224, 228) (Color figure online)

Conclusion

Our phylogenetic reconstruction provides an evolutionary understanding of CYP2 function and diversity. The topology illustrates that most subfamilies are lineage-specific, with the exception of a few ancestral subfamilies (CYP2U, CYP2R, CYP2D; Fig. 5). The CYP2 family has diversified throughout all vertebrate species and has expanded gene copy numbers within particular subfamilies (e.g., zebrafish CYP2X, CYP2AA and CYP2K). Since knowledge of function for genes outside of the mammals is limited, our comparative approach provides insight into CYP2 function and diversity for other vertebrate lineages. Our data indicate that the predicted SRSs do not correlate with evolutionary rates of amino acid divergence within the CYP2 alignment. Certain SRSs had small clusters of residues with radical biochemical changes between sister subfamilies, but this result was not consistent across all subfamilies, nor were the majority of residues in an SRS region involved. Collectively, our type I and II functional divergence analyses suggest that SRSs are not necessarily functionally important for CYP2 proteins. Overall, our evolutionary investigation of CYP2s provides a number of hypotheses worth testing in a functional context.