Introduction

The Pax genes encode transcription factors that are vital for many developmental processes and play important roles in a diverse range of diseases (Chi and Epstein 2002; Robson et al. 2006). They are defined by a 128-amino-acid DNA-binding domain, termed the “paired domain,” that folds as two subdomains, termed the PAI (N-terminal) and RED (C-terminal) subdomains (Xu et al. 1995). The PAI subdomain cooperates with the RED subdomain and is required for binding to target sequences (Czerny et al. 1993; Pellizzari et al. 1999; Zwollo et al. 1997). The genes are subdivided into four classes based on the presence or absence of additional motifs (Robson et al. 2006). Class I contains Pax1 and Pax9, which also encode an octapeptide sequence that interacts with the Groucho corepressor (Eberhard et al. 2000; Kreslova et al. 2002) but lack a homeodomain. The class II Pax genes, Pax2, Pax5, and Pax8, encode the octapeptide and also a partial homeodomain, whereas the class III genes, Pax3 and Pax7, encode the octapeptide plus a full homeodomain. Finally, the class IV Pax genes, Pax4 and Pax6, encode the full homeodomain but lack the octapeptide. In addition, Pax proteins contain a transactivation domain located in their C-terminal regions (Chalepakis et al. 1994; Dorfler and Busslinger 1996; Kalousova et al. 1999; Nornes et al. 1996; Schafer et al. 1994; Tang et al. 1998). The C-terminal regions of Pax2, Pax5, Pax8, and Pax4 also contain inhibitory domains (Dorfler and Busslinger 1996; Fujitani et al. 1999). Such inhibitory domains have not yet been clearly demonstrated for the other Pax genes.

Despite two apparent rounds of whole-genome duplication in the vertebrate lineage, the human and mouse genomes have surprisingly few genes (Lander et al. 2001; Putnam et al. 2008; Waterston et al. 2002), apparently because many duplicates were lost. It has been suggested that alternative splicing may help compensate for such gene loss by allowing a greatly expanded and diversified proteome to be produced from a relatively small number of genes (Graveley 2001). All four classes of vertebrate Pax genes have alternatively spliced transcripts, with many isoforms having distinct activities (Anspach et al. 2001; Azuma et al. 2005; Kozmik et al. 1993; Lamey et al. 2004; Miyamoto et al. 2001; Nornes et al. 1996; Ritz-Laser et al. 2000; Robichaud et al. 2004; Wang et al. 2007).

Cephalochordates (amphioxus), which diverged from vertebrates about half a billion years ago (Shu et al. 1999), represent basal chordates (Blair and Hedges 2005; Bourlat et al. 2006; Philippe et al. 2005). Amphioxus shares a fundamental body plan with the vertebrates but is much simpler both structurally and genomically, and it is therefore useful as a model for the ancestral vertebrate before the vertebrate lineage apparently underwent several whole-genome duplications (Holland et al. 2004; Holland 2003; Putnam et al. 2007, 2008). Because the amphioxus genome shows very little gene duplication or gene loss, it has only a single gene in each of the four Pax classes (Glardon et al. 1998; Holland et al. 1999, 1995; Kozmik et al. 1999). The functional domains characteristic of each Pax class are conserved between amphioxus and vertebrates and are retained in all the vertebrate duplicates, suggesting strong evolutionary constraints for maintaining the particular domain combinations even after gene duplication (Glardon et al. 1998; Holland et al. 1995, 1999; Kozmik et al. 1999) (Fig. 1A–D). To date, alternative splicing of amphioxus Pax genes has been described only for AmphiPax4/6 (Glardon et al. 1998) and AmphiPax2/5/8 (Kozmik et al. 1999), with two isoforms of AmphiPax2/5/8 displaying functional equivalence to isoforms of human Pax8 (Kreslova et al. 2002). However, the extent of alternative splicing in amphioxus Pax transcripts has not been systematically investigated.

Fig. 1

Graphical representations of ClustalW alignments. The functional domains are conserved in vertebrate and amphioxus Pax genes and are maintained in all vertebrate duplicates. Exon numbers and functional domains are included to allow comparison of previously reported vertebrate alternative splicing events with the amphioxus events reported in this study. Sequence conservation is lower in the C-terminal regions, and therefore the alignment of corresponding exons there is less certain (introns not to scale). (A) Alignment of AmphiPax1/9 (Bf) (accession no. AJ238974) and mouse (Mm) Pax1 (accession no. NM008780) and Pax9 (accession no. NM001041). The coding sequence of AmphiPax1/9 is spread over five exons (see Results). (B) Alignment of AmphiPax2/5/8 α (Bf) (accession no. AF053762), with exon 7 from the β form (AF053763) also included, and mouse Pax2 (accession no. NP035167), Pax5 (accession no. CAM23221), and Pax8 (accession no. NM011040). The coding sequence of AmphiPax2/5/8 is spread over 11 exons. The region encoded by exon 4 in amphioxus is split into multiple exons in the vertebrate genes. An exon equivalent to exon 8 in human Pax8 is also found in Xenopus Pax2 but is labeled exon 9 (Heller and Brändli 1997). (C) Alignment of AmphiPax3/7 (Bf) (accession no. AF165886), mouse Pax3 (accession no. AK044985), and mouse Pax7 (accession no. AF254422). The coding region of AmphiPax3/7 is spread over six exons. The paired domain and octapeptide are contained within exon 1 in amphioxus but are spread over multiple exons in vertebrates. (D) Alignment of AmphiPax4/6 (Bf) (accession no. AJ223444), mouse Pax4 (accession no. AF031150), and mouse Pax6 (accession no. CAA453380). The coding sequence of AmphiPax4/6 is spread over 13 exons (because of the use of multiple start codons, exons 1 and 2 are missing from this cDNA sequence)

Recent studies have found an inverse correlation between gene duplication and alternative splicing (Kopelman et al. 2005; Su et al. 2006), suggesting that the two mechanisms could be interchangeable sources of functional diversification. However, it was recently shown that the two processes have different effects on the proteome, with alternative splicing having a greater impact on protein sequence and structure than does duplication followed by divergence (Talavera et al. 2007). To contribute to our understanding of the relationship between alternative splicing and gene duplication and investigate the evolution of alternative splicing within the chordate Pax genes, we systematically surveyed the alternative splicing of the amphioxus Pax transcripts. The nested PCR approach we used is more sensitive and more comprehensive in terms of tissue types and developmental stages surveyed than any study carried out to date for vertebrate Pax genes.

Our results show that all four amphioxus Pax transcripts undergo alternative splicing, but to differing extents. Compared with vertebrates, amphioxus has approximately the same number of, or even fewer, splice forms per Pax gene. This indicates not only that the number of alternative splicing events per gene has not decreased following gene duplication, but also that the total number of alternatively spliced Pax isoforms for the nine vertebrate Pax genes is probably considerably higher than for the four amphioxus genes. Alternative splicing of both amphioxus and vertebrate Pax genes is predicted to alter known functional domains dramatically, creating much greater differences than those among the duplicates of a given vertebrate Pax gene. Moreover, although most alternative splicing events differ between the amphioxus and vertebrate Pax homologues, several are conserved. A notable example is an event that removes most of the paired domain of AmphiPax2/5/8 and vertebrate Pax5 and is known to alter DNA binding (Zwollo et al. 1997). This conservation of mRNA splice forms over a wide phylogenetic distance implies conservation of protein function and suggests that comparing alternative splice forms across large phylogenetic distances may be a useful strategy for identifying functionally important isoforms.

Materials and Methods

Identification and Characterization of Alternatively Spliced Transcripts

The technique used to isolate isoforms has been described previously (Gorlov and Saunders 2002) and involves two rounds of PCR. In brief, a first round of RT-PCR with primers spanning a given region of the transcript (see below) is followed by electrophoretic separation of the RT-PCR product and DNA extraction (QIAquick Gel Extraction Kit; Qiagen, Valencia, CA, USA) from sections of the agarose gel surrounding the expected band. These sections potentially contain the PCR products of uncharacterized isoforms. Extractions were performed regardless of whether an additional band was evident after the initial RT-PCR. The extracted material was then used as template for a second round of PCR with nested primers. Isoforms differing in length from the major isoform by more than ∼150 bp were identified using nested primers spanning the entire transcript. Splice variants that differ only slightly from the major isoform(s) (∼25 bp) were isolated using the same technique but with nested primers flanking each exon. For example, to analyze potential alternative splicing of all or part of AmphiPax2/5/8 exon 2, the method above was used with nested primers targeted against exons 1 and 3 instead of primers spanning the entire transcript. The same approach was applied to exons along all four AmphiPax transcripts. Potential isoforms evident after either the first or the second round of PCR were eluted, cloned directly into a TA vector (Invitrogen, Carlsbad, CA, USA), and characterized by automated sequencing (Seqxcel Inc., La Jolla, CA, USA). In some cases, additional PCRs were performed using various combinations of primers to obtain information on the context of splicing events or to check for intron retention. The sequences of all primers and their exon locations are listed in the supplementary material.
The use of standard splice donor and/or acceptor sites was confirmed using the Branchiostoma floridae v.1.0 genome sequence (http://www.genome.jgi-psf.org/Brafl1/Brafl1.home.html).
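The reason for switching from whole-transcript primers to exon-flanking primers can be made concrete with a little arithmetic: with nested primers in the exons on either side of a candidate exon, inclusion and skipping give products that differ by exactly the exon length, so the event is visible only if that length reaches the gel's resolution. The Python sketch below assumes the ∼25 bp detection limit stated in this paper; the function names and the exon and primer-offset lengths in the example are hypothetical and are not taken from any AmphiPax gene.

```python
"""Expected nested-PCR product sizes for an exon flanked by primers (toy sketch)."""

MIN_RESOLVABLE_BP = 25  # approximate detection limit stated in the text


def amplicon_sizes(upstream_tail: int, exon_len: int, downstream_head: int) -> dict[str, int]:
    """Return expected product lengths (bp) with the candidate exon included or skipped.

    upstream_tail: bases from the forward primer's 5' end to the end of the upstream exon
    exon_len: length of the candidate alternatively spliced exon
    downstream_head: bases from the start of the downstream exon to the reverse primer's 5' end
    """
    skipped = upstream_tail + downstream_head
    return {"included": skipped + exon_len, "skipped": skipped}


def resolvable(sizes: dict[str, int], min_diff: int = MIN_RESOLVABLE_BP) -> bool:
    """True if the two products differ by at least the assumed gel resolution."""
    return abs(sizes["included"] - sizes["skipped"]) >= min_diff


if __name__ == "__main__":
    # Hypothetical lengths: 120 bp and 100 bp of flanking exon, 150-bp candidate exon.
    sizes = amplicon_sizes(120, 150, 100)
    print(sizes, resolvable(sizes))
```

Under these assumptions a 24-bp cassette exon would fall just below the limit, which is why trinucleotide and hexanucleotide events cannot be detected this way.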

Animal Collection and RT-PCR Analysis

Branchiostoma floridae adults and developmental stages were obtained as previously described (Holland and Yu 2004) and stored in 4 M guanidinium thiocyanate, 25 mM sodium citrate, 0.5% Sarcosyl, 0.1 M β-mercaptoethanol. Total nucleic acid was isolated by multiple rounds of pH 4.7 phenol–chloroform (5:1) extraction followed by ethanol precipitation. DNA was removed with RNase-free DNase (New England Biolabs, Ipswich, MA, USA). RNA (5 μg) was reverse transcribed into cDNA with Superscript II (Invitrogen, Carlsbad, CA, USA) and stored at −20°C. The cDNA was subjected to 36 cycles of PCR: 1 min at 94°C, 40 s at 56°C, and then 72°C for either 2 min (surveys across entire transcripts) or 1 min (surveys across individual exons). The second round of PCR was identical to the first except that 32 cycles were performed.

Results

Experimental Design

To identify as many Pax splice forms as possible and to determine developmental stage specificity, we surveyed splicing with a nested RT-PCR-based technique (Gorlov and Saunders 2002) using RNA isolated from amphioxus neurulae, early larvae, and adults. Each of these stages had been shown by semiquantitative RT-PCR (data not shown) and in situ hybridization to express all four Pax genes (Glardon et al. 1998; Hetzer-Egger et al. 2000; Holland et al. 1999, 1995; Kozmik et al. 1999). Comparison of the sequences of PCR products with the Branchiostoma floridae v.1.0 genome sequence (http://www.genome.jgi-psf.org/Brafl1/Brafl1.home.html) confirmed the use of standard splice donor and/or acceptor sites (GT and AG, respectively) in all the splicing events we found.
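As a minimal illustration of the splice-site check described above, the sketch below tests whether an intron, located by exon boundaries inferred from a cDNA-to-genome comparison, begins with GT and ends with AG on the sense strand. It assumes 0-based coordinates that have already been determined; the alignment step itself is not shown, and the function name and toy sequence are hypothetical.

```python
"""Check that an intron is flanked by canonical GT...AG dinucleotides (toy sketch)."""


def uses_canonical_sites(genomic: str, exon_end: int, next_exon_start: int) -> bool:
    """Return True if the intron between two exons starts with GT and ends with AG.

    genomic: sense-strand genomic sequence
    exon_end: 0-based index one past the last base of the upstream exon
    next_exon_start: 0-based index of the first base of the downstream exon
    """
    seq = genomic.upper()
    if next_exon_start - exon_end < 4:
        raise ValueError("intron too short to carry both splice-site dinucleotides")
    donor = seq[exon_end:exon_end + 2]
    acceptor = seq[next_exon_start - 2:next_exon_start]
    return donor == "GT" and acceptor == "AG"


# Hypothetical sequence: exon (AAACCC) + intron (GTAAGTTTTAG) + exon (GGGTTT).
toy = "AAACCC" + "GTAAGTTTTAG" + "GGGTTT"
print(uses_canonical_sites(toy, 6, 17))
```

Shifting the downstream boundary by a single base makes the check fail, which is the kind of discrepancy such a comparison would flag.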

Although our survey was as comprehensive as possible, we could not identify isoforms produced by alternative promoters, because the PCR method requires the sequence of the first and last exons. The alternative splicing events detected are therefore restricted to alternative splice donors, alternative splice acceptors, cassette exons, and retained introns. An analysis of alternative splicing events conserved between mouse and human suggests that approximately 70% of all alternative splicing events fall into one of these four categories (Sugnet et al. 2004). However, even by conducting PCR across each exon as well as across the entire transcript, we could only detect length differences of ∼25 bp or more. Alternative splicing of trinucleotide and hexanucleotide sequences is known to occur in vertebrate Pax genes and to be functionally important (Kozmik et al. 1997; Lamey et al. 2004), so events of this size would have escaped detection. Despite these limitations, this method is more sensitive and, in terms of the number of tissue types and developmental stages surveyed, more comprehensive than any survey of alternative splicing in vertebrate Pax genes to date, and it provides a lower limit for the amount of alternative splicing of amphioxus Pax transcripts. We consider an alternative splicing event to be conserved between amphioxus and vertebrates only if the exon or retained intron in question is located in a position equivalent to that of an alternatively spliced exon or retained intron within the vertebrate genes (Fig. 1). The comparisons use alternative splicing events found in the Pax transcripts of a wide range of vertebrate species; however, most events have been isolated in human and/or mouse, and these are used where possible. Notable exceptions include Pax2, for which the most comprehensive survey has been performed in Xenopus (Heller and Brändli 1997), and Pax6, for which a survey performed in pigeon has made an important contribution to our knowledge of splicing events (Bandah et al. 2007).
To give an indication of expression levels, we have stated which variants could only be isolated via nested PCR. All new sequences isolated, as well as splice donors and acceptors used, are provided in the supplementary material.

Alternative Splicing Creates Two Isoforms of AmphiPax1/9

Although the AmphiPax1/9 coding region was thought to include four exons (Hetzer-Egger et al. 2000), the B. floridae v. 1.0 genome sequence reveals an additional intron disrupting what was designated exon 4, resulting in a total of five exons (Figs. 1A and 2A). In addition to the single isoform of AmphiPax1/9 previously characterized, which we term 5a(−) (Hetzer-Egger et al. 2000; Holland et al. 1995), our survey revealed a longer transcript [5a(+)], resulting from the use of an alternative upstream splice acceptor in exon 5 (Fig. 2B). However, this transcript has an altered reading frame which would code for a truncated protein, missing the C-terminal end of the likely transactivation domain. The transactivation domain also undergoes alternative splicing in vertebrates (Nornes et al. 1996), but the exons and splice sites involved differ from those in amphioxus, suggesting independent evolution. Semiquantitative RT-PCR with primers flanking exon 5a showed that the 5a(−) isoform is minor and is developmentally regulated relative to the 5a(+) isoform. When volumes loaded on a gel were adjusted to contain equal amounts of the 5a(+) isoform, the 5a(−) form could only be detected at the early larval stage (Fig. 2C).
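The loading adjustment used for this semiquantitative comparison amounts to dividing a target amount of the reference band by each sample's reference signal per microliter. The sketch below assumes that band signal is proportional to the volume loaded (i.e., that the gel is within its linear range); the function name, stage names, and signal values are hypothetical, not measurements from this study.

```python
"""Loading volumes that equalize a reference band across samples (toy sketch)."""


def loading_volumes(signal_per_ul: dict[str, float], target_signal: float) -> dict[str, float]:
    """Volume (µl) of each sample needed to load `target_signal` units of the reference band.

    Assumes band signal scales linearly with the volume loaded.
    """
    volumes = {}
    for sample, signal in signal_per_ul.items():
        if signal <= 0:
            raise ValueError(f"no reference-band signal for {sample!r}")
        volumes[sample] = target_signal / signal
    return volumes


# Hypothetical pilot-gel signals for the 5a(+) band, in arbitrary units per µl.
pilot = {"gastrula": 2.0, "early neurula": 4.0, "early larva": 1.0, "adult": 5.0}
print(loading_volumes(pilot, target_signal=10.0))
```

Once lanes are normalized in this way, differences in the intensity of the 5a(−) band across stages reflect changes in its abundance relative to 5a(+).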

Fig. 2

Alternative splicing creates two isoforms of AmphiPax1/9. (A) Use of the B. floridae v.1.0 genome sequence suggests that the coding sequence of AmphiPax1/9 is distributed over five exons. On the basis of vertebrate evidence, the transactivation domain is expected to be located within, or distributed over, exons 3–5. (B) In addition to the previously characterized isoform, termed 5a(−), another isoform that uses an alternative upstream splice acceptor was found. This results in a longer transcript, termed 5a(+), that is predicted to code for a truncated protein with an altered potential transactivation domain. (C) Semiquantitative RT-PCR performed with primers flanking exon 5a suggests that the exon 5a splicing event is developmentally regulated. PCR product was loaded to give equal quantities of the 5a(+) form for each stage, and under these conditions the 5a(−) form could only be detected at the early larval stage. sm, size marker; gas, gastrula; en, early neurula; el, early larval; ad, adult

AmphiPax2/5/8 Transcripts Undergo Considerable Alternative Splicing

The AmphiPax2/5/8 coding region is spread across 11 exons (Figs. 3 and 4A) and is extensively alternatively spliced. Two isoforms, α and β, which result from the skipping or inclusion of exon 7 (Fig. 4B) and possess different transactivation properties, were previously described (Kozmik et al. 1999; Kreslova et al. 2002). We found four different splicing events involving exons 1–5, all of which would create isoforms lacking portions of the paired domain and would presumably have altered DNA-binding properties (Fig. 3). All events involving exons 1–5, with the exception of exon 4a alternative splicing, were isolated using nested PCR at neurula and larval stages. Skipping of exon 2 would remove most of the PAI subdomain of the paired domain and shift the reading frame, leading to a premature termination codon (PTC) in exon 3. This event has also been found in the Pax5 transcripts of humans, mice, frogs, and zebrafish (Borson et al. 2002; Heller and Brändli 1999; Kwak et al. 2006; Zwollo et al. 1997), and it has been suggested that an internal ATG site, which is conserved in AmphiPax2/5/8, may serve as an alternate start codon (Zwollo et al. 1997). Western blots of cell lines expressing human Pax5 showed that, for a small percentage of full-length transcripts, the internal ATG can serve as a start codon (Zwollo et al. 1997). If so, skipping of exon 2 may regulate the relative proportions of full-length transcripts and of those lacking the paired domain.

Fig. 3

Alternative splicing in the region that encodes the AmphiPax2/5/8 N-terminus (exons 1–5) would create isoforms lacking portions of the paired domain and the octapeptide sequence. Exon 2 skipping would remove almost all of the PAI subdomain of the paired domain and cause a frame shift, resulting in a premature termination codon (PTC) in exon 3. A conserved internal ATG site in exon 3 has been suggested as an alternate start codon in vertebrates (Lowen et al. 2001; Zwollo et al. 1997). Skipping of exon 3 does not alter the reading frame and would result in the deletion of almost all of the RED subdomain. An upstream alternative splice acceptor at the 5′ end of exon 4 would cause the inclusion of eight amino acids, termed 4a, at the C-terminal end of the RED subdomain. Exon 4 exclusion would not only remove 4a but also delete the octapeptide sequence, altering the reading frame and producing a PTC within exon 5. Translated from the standard start codon, this isoform would include most of the paired domain plus an additional 13 amino acids. On the basis of known splice donor-acceptor combinations, and assuming that the downstream start codon is conserved, there are potentially eight N-terminal isoforms, five of which would also be expected to encode a C-terminal region

Fig. 4

Alternative splicing in the region that encodes the AmphiPax2/5/8 C-terminus would create isoforms with altered transactivation and inhibitory domains. (A) The region encoding the C-terminus (exons 6–11), including an alternatively spliced exon 10 whose variant forms are termed 10a and 10b. The previously described exon 7 has been split into 7a and 7b (see below). (B) The previously described AmphiPax2/5/8 isoforms. The α form (accession no. AF053762) skips exons 7 and 10 and uses the upstream exon 11 stop codon. The β form (accession no. AF053763) includes exons 7a and 7b and uses a stop codon within exon 7, creating an altered, serine/threonine-rich C-terminal sequence. (C) This survey isolated a further seven alternative splicing events, numbered here as C-terminal splice variants I–VII. Variant I uses a downstream splice acceptor within exon 11, resulting in an altered reading frame. Two splice forms (II and III) skip exons 7–10 and lack the entire transactivation domain. Variant II uses the downstream exon 11 splice acceptor, resulting in the PHT reading frame (Vorobyov and Horst 2006), whereas variant III uses the upstream acceptor, resulting in the α reading frame. Splice variants IV, V, and VI include the novel exon 10, disrupting the previously characterized inhibitory domain, but use different combinations of alternative splice donors and acceptors. Splice form VII includes exon 7a but uses a splice donor upstream of the exon 7 stop codon in combination with the standard splice acceptor 5′ of exon 8. This is predicted to result in isoforms containing a serine/threonine-rich region but potentially possessing any of the above downstream C-terminal sequences. On the basis of known splice donor-acceptor combinations, there are potentially 13 C-terminal isoforms

Skipping of exon 3 does not alter the reading frame and would result in the deletion of nearly all of the RED subdomain of the paired domain. Use of an upstream alternative splice acceptor at the 5′ end of exon 4 would add eight amino acids (termed 4a) at the C-terminal end of the RED subdomain. In addition to deleting the C-terminal portion of the paired domain, skipping of exon 4 would remove the octapeptide and alter the reading frame, causing a PTC within exon 5. This isoform, which would include most of the paired domain plus an additional 13 amino acids but no C-terminal region, may therefore still bind to DNA and, if so, could act as a dominant-negative regulator.
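The frame arguments in this section rest on two simple rules: skipping an internal coding exon preserves the downstream reading frame only if the exon length is a multiple of 3, and the consequence of a shift can be read off by scanning codons for the first stop. The Python sketch below encodes both rules. The eight-amino-acid (24 nt) 4a segment is taken from the text; the function names and the toy transcript, in which a hypothetical 5-nt exon is skipped, are illustrative assumptions.

```python
"""Reading-frame and premature-stop checks for exon skipping (toy sketch)."""
from typing import Optional

STOP_CODONS = {"TAA", "TAG", "TGA"}


def skipping_preserves_frame(exon_len: int) -> bool:
    """Skipping an internal coding exon keeps the downstream frame only if len % 3 == 0."""
    return exon_len % 3 == 0


def first_stop(cds: str, start: int = 0) -> Optional[int]:
    """0-based index of the first stop codon in the frame beginning at `start`, or None."""
    seq = cds.upper()
    for i in range(start, len(seq) - 2, 3):
        if seq[i:i + 3] in STOP_CODONS:
            return i
    return None


# Hypothetical exons; exon_b (5 nt) is the alternatively spliced one.
exon_a, exon_b, exon_c = "ATGGCC", "AAAGC", "TAAGCCGTAG"
included = exon_a + exon_b + exon_c   # ATG GCC AAA GCT AAG CCG TAG
skipped = exon_a + exon_c             # ATG GCC TAA ... (premature stop)

print(skipping_preserves_frame(len(exon_b)), first_stop(included), first_stop(skipped))
print(skipping_preserves_frame(24))  # an 8-codon segment such as 4a keeps the frame
```

In the toy example, skipping the 5-nt exon moves the first in-frame stop from position 18 to position 6, i.e., it creates a PTC, whereas adding or removing a 24-nt segment leaves the downstream frame intact.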

In addition to the four alternative splicing events in the 5′ half of AmphiPax2/5/8, we found seven involving exons 6–11 (Fig. 4C). These include isoforms that would lack the transactivation domain or possess altered inhibitory domains and would therefore probably have different transactivation properties, as previously shown for the α and β isoforms (Kreslova et al. 2002). The α isoform, which skips exon 10 and uses an upstream exon 11 splice acceptor, resulting in what we term the α reading frame (Fig. 4B), has both the C-terminal activation domain and the adjacent inhibitory domain. The β isoform includes exon 7, which contains a stop codon (Fig. 4B), and encodes a serine/threonine-rich C-terminal region that fails to transactivate in vitro (Kreslova et al. 2002). We have numbered the newly discovered C-terminal splice variants I–VII (Fig. 4C). Variant I resembles the α isoform but has an altered reading frame because it uses an alternative splice acceptor within exon 11; this would create a C-terminal domain termed the “paired-type homeodomain tail” (PHT), a sequence previously suggested to be the true Pax2/5/8 C-terminus (Vorobyov and Horst 2006). Our results confirm the use of this reading frame, but we suggest that its use, and therefore the presence or absence of the PHT, results from alternative splicing. Two splice forms (II and III) skip exons 7–10, and both lack the entire transactivation domain. Variant II uses the downstream splice acceptor within exon 11, resulting in the PHT frame, and appears to be conserved in vertebrate Pax8 (Poleev et al. 1995), whereas variant III, isolated via nested PCR, uses the upstream splice acceptor, resulting in the α reading frame. Splice variants IV, V, and VI include all or most of exon 10 and use different combinations of alternative splice donors and acceptors in exons 10 and 11, resulting in variant inhibitory and/or transactivation domains.
Both variant IV and variant VI were isolated via nested PCR. Variant IV uses an internal splice donor within exon 10 and the standard exon 11 splice acceptor and changes the reading frame for exon 11. Variant V uses the splice donor within exon 10 in combination with the internal exon 11 acceptor, resulting in the α reading frame minus 6 amino acids of the inhibitory domain. Variant VI uses a downstream splice donor that results in an extra four amino acids at the C-terminal of exon 10, termed 10b, together with the internal exon 11 splice acceptor, resulting in the PHT reading frame. Finally, splice form VII was isolated via nested PCR and, like the β isoform, includes exon 7. However, rather than continuing to the stop codon used by the β form (Kreslova et al. 2002), it uses a different splice donor in combination with the standard splice acceptor 5′ of exon 8. This exon has been termed 7a. Its inclusion results in an isoform with the serine/threonine-rich region as in the β form, but with the downstream C-terminal sequences described above. This insertion is comparable to exon 8 of human Pax8 (Fig. 1B) in that the human exon is also alternatively spliced and contains a comparable serine/threonine region (Kozmik et al. 1993).
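The figure legends give counts of potential isoforms "on the basis of known splice donor-acceptor combinations." One way to make such a count explicit, sketched below, is to treat each exon segment as a node in a directed acyclic graph whose edges are observed donor-acceptor junctions and to count start-to-end paths. This is a hypothetical formalization rather than the procedure used here: it assumes that junctions combine freely (so the count is an upper bound), and the function name and toy graph are illustrative and do not reproduce the AmphiPax2/5/8 numbers.

```python
"""Count potential isoforms as paths through a splice graph (toy sketch)."""
from functools import lru_cache


def count_isoforms(graph: dict[str, list[str]], start: str, end: str) -> int:
    """Number of distinct start-to-end paths in a directed acyclic splice graph.

    Each node is an exon segment; each edge is an observed donor->acceptor junction.
    Assumes junctions combine freely, so the result is an upper bound on real isoforms.
    """

    @lru_cache(maxsize=None)
    def paths_from(node: str) -> int:
        if node == end:
            return 1
        return sum(paths_from(nxt) for nxt in graph.get(node, []))

    return paths_from(start)


# Hypothetical graph: ex2 or ex3 can be skipped (not both); ex4 has two acceptors (4a, 4b).
toy_graph = {
    "ex1": ["ex2", "ex3"],
    "ex2": ["ex3", "ex4a", "ex4b"],
    "ex3": ["ex4a", "ex4b"],
    "ex4a": ["end"],
    "ex4b": ["end"],
}
print(count_isoforms(toy_graph, "ex1", "end"))
```

Memoizing the per-node count keeps the calculation linear in the number of junctions, which matters little for a handful of exons but avoids enumerating every path explicitly.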

AmphiPax3/7 Has Multiple Isoforms

The coding region of AmphiPax3/7 is spread across six exons (Figs. 1C and 5A). In addition to the published sequence, which corresponds to the 3(+) isoform (Fig. 5B) (Holland et al. 1999), nested PCR revealed an isoform lacking exon 3 (Fig. 5B; 3−). This splicing event eliminates 42 bases at the 3′ end of the region encoding the homeobox, alters the reading frame of exon 4, and results in a stop codon 5 nucleotides into exon 5. This region is equivalent to exon 6 in human and mouse Pax3 and Pax7 (Fig. 1C). Consequently, this isoform would lack most of the third helix of the homeodomain, which, for other Pax genes, has been shown to mediate both DNA and protein interactions (Bruun et al. 2005), and would have a shortened potential transactivation domain with an altered sequence. A possibly homologous splicing event, termed 3f, has been described for mouse Pax3 (Barber et al. 1999; Seo et al. 1998). Because several isoforms of vertebrate Pax3 and Pax7 result from retained introns, we specifically checked the equivalent AmphiPax3/7 introns and found that introns 2, 4, and 5 were retained in amphioxus transcripts (Fig. 5B; In2+, In4+, In5+). In all cases, retention introduces a stop codon (predicted from the B. floridae genome) within the retained intron. Moreover, had a splice donor existed downstream of the intron primer site but upstream of the predicted stop codon, the corresponding isoforms would have been expected to appear in the nested exon–exon PCR survey; this suggests that these events are due to retained introns rather than to alternative downstream splice donors. Figure 1C shows that these amphioxus introns are equivalent to introns 5, 7, and 8, respectively, in vertebrate Pax3 and Pax7. Retention of introns 5 and 7 has been reported for zebrafish Pax7 (Seo et al. 1998), and retention of intron 8 has been reported for zebrafish Pax7 and mouse and human Pax3 and Pax7 (Barber et al. 1999; Seo et al. 1998; Vorobyov and Horst 2004), suggesting that these events may be highly conserved.
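The positional criterion for conservation (an amphioxus event counts as conserved only if it falls at a position equivalent to an alternatively spliced vertebrate exon or intron) can be written down as a simple lookup. The sketch below uses only the AmphiPax3/7 intron equivalences and the vertebrate retention reports stated in this paragraph; the dictionary and function names are illustrative, and the mapping covers just these three introns.

```python
"""Apply the positional conservation criterion to AmphiPax3/7 retained introns (sketch)."""

# Amphioxus -> vertebrate Pax3/Pax7 intron equivalences stated in the text (Fig. 1C).
AMPHI_TO_VERT_INTRON = {2: 5, 4: 7, 5: 8}

# Vertebrate intron-retention reports cited in the text, keyed by vertebrate intron number.
VERT_RETENTION_REPORTS = {
    5: ["zebrafish Pax7"],
    7: ["zebrafish Pax7"],
    8: ["zebrafish Pax7", "mouse and human Pax3 and Pax7"],
}


def conserved_retentions(amphi_retained: list[int]) -> dict[int, tuple[int, list[str]]]:
    """Map each retained amphioxus intron to its vertebrate equivalent if that intron is also retained."""
    result = {}
    for intron in amphi_retained:
        vert = AMPHI_TO_VERT_INTRON.get(intron)
        if vert is not None and vert in VERT_RETENTION_REPORTS:
            result[intron] = (vert, VERT_RETENTION_REPORTS[vert])
    return result


for amphi, (vert, reports) in conserved_retentions([2, 4, 5]).items():
    print(f"amphioxus intron {amphi} ~ vertebrate intron {vert}: {', '.join(reports)}")
```

An intron with no mapped vertebrate equivalent, or whose equivalent has no reported retention, is simply left out, matching the conservative definition of conservation used in this study.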

Fig. 5

Alternative splicing of AmphiPax3/7. (A) The protein is encoded by six exons and includes a paired domain, an octapeptide, and a complete homeodomain. The transactivation domain would be encoded by exons 4–6. (B) In addition to the published sequence (3+) (Holland et al. 1999), the survey revealed an isoform lacking exon 3 as well as isoforms retaining introns 2, 4, and 5. The removal of exon 3 would remove 14 amino acids from the C-terminus of the homeodomain and alter the reading frame of exon 4, resulting in a stop codon within exon 5. The retention of intron 2 would remove the same 14 amino acids but create an altered C-terminal sequence. The retention of intron 4 or intron 5 would each create a truncated transactivation domain. On the basis of known splice donor-acceptor combinations, there are potentially six AmphiPax3/7 isoforms

Alternative Splicing of AmphiPax4/6 Transcripts Creates an Isoform Lacking the PAI Subdomain

Previous work identified five isoforms of amphioxus Pax4/6 (Glardon et al. 1998). Genomic analysis indicates that three of these (J2, 4.1, and 12.2) use an alternative downstream promoter, while the other two use the upstream promoter (Fig. 6A). Our survey confirmed these splicing events (Fig. 6A and B) and, using nested PCR, revealed several more in the region encoding the N-terminal half, all of which would use the upstream promoter. One of these involves a new exon (exon 2.1) that would change the sequence on the N-terminal side of the paired domain; producing an in-frame protein from this transcript would require use of the upstream promoter and/or start codon. Another involves alternative splicing of exon 4 (Fig. 6A). The exon 4(+/−) event is analogous to the alternative splicing of exon 2 in AmphiPax2/5/8. It is predicted to remove almost the entire PAI subdomain of the paired domain and alter the reading frame, leading to a PTC within the sequence normally coding for exon 7. However, the use of alternative downstream start codons or promoters to produce isoforms lacking the paired domain is well documented (Bandah et al. 2007; Carriere et al. 1993; Jaworski et al. 1997; Zhang and Emmons 1995). As suggested for AmphiPax2/5/8 exon 2, alternative splicing of exon 4 may therefore offer a mechanism to regulate the relative proportions of ‘paired’ and ‘paired-less’ forms of AmphiPax4/6.

Fig. 6

Alternative splicing of AmphiPax4/6. (A) Previously described splicing events (Glardon et al. 1998) are shown. Genomic analysis suggests that the previously described transcripts J2, 4.1, and 12.2 use an alternative promoter (downstream arrow), while the remaining transcripts use an upstream promoter (upstream arrow). This survey revealed new isoforms among transcripts driven from the upstream promoter. These include transcripts containing exon 2.1, which is predicted to create a new sequence on the N-terminal side of the paired domain but would require a novel upstream start codon to produce an in-frame protein. The alternative splicing of exon 4 is predicted to remove almost the entire PAI subdomain of the paired domain, altering the reading frame and causing a PTC within the sequence normally coding for exon 7. (B) This survey confirms previously described splice sites, including an event (13a+/−) predicted to remove highly conserved residues, one of which is a target for MAP kinase-mediated regulation in vertebrates (Mikkola et al. 1999). On the basis of known splice donor-acceptor combinations, and not assuming the existence of uncharacterized start codons, there are potentially 18 AmphiPax4/6 isoforms

Discussion

Alternative splicing of primary transcripts is one means for proteome expansion in metazoans (Blencowe 2006) and is known to be functionally important in both vertebrate and amphioxus Pax genes (Epstein et al. 1994; Kreslova et al. 2002). Our analyses point to both conserved and divergent splicing events impacting known functional domains and suggest that levels of alternative splicing in the four amphioxus Pax genes are comparable to those in each gene of the equivalent vertebrate family. Thus, the total number of isoforms for the nine vertebrate genes is considerably higher than for the four amphioxus genes.

Alternative Splicing in the N-terminal Encoding Region Suggests Functional Conservation and Divergence

Following gene duplication, alternatively spliced isoforms of the ancestral gene can be subfunctionalized either by being split between the duplicates, and thus becoming encoded by distinct genes (MacLean et al. 1997), or by the loss of some splice forms from individual duplicates. Neofunctionalization of splice forms, defined as the evolution in any of the postduplication genes of an alternative splicing event that is absent from the ancestral form, can also occur. Vertebrate Pax genes appear to have undergone both subfunctionalization and neofunctionalization of alternatively spliced forms. An example of the former is the skipping of exon 2 (Fig. 3), which is conserved between AmphiPax2/5/8 and human, mouse, frog, and zebrafish Pax5 (Borson et al. 2002; Heller and Brändli 1999; Kwak et al. 2006; Zwollo et al. 1997) but has apparently been lost from vertebrate Pax2 and Pax8: there are no published reports of this splice form for Pax2 or Pax8, and we could find no evidence for it in mammalian EST sequences (data not shown). As noted above, this event removes most of the PAI subdomain of the paired domain. If the canonical ATG is used as the start codon, isoforms lacking exon 2 would have a premature stop codon within the sequence normally encoding exon 3 and would be predicted to produce a truncated, out-of-frame protein containing no part of any functional domain, which would therefore likely be nonfunctional. However, translation can initiate from a downstream ATG within exon 3 (see Fig. 3) that is conserved in AmphiPax2/5/8 and vertebrate Pax5, producing in-frame isoforms (e.g., Pax5b and Pax5e) that lack most of the RED subdomain as well as the PAI subdomain (Lowen et al. 2001; Zwollo et al. 1997). Although such isoforms would bind paired domain binding sites poorly, if at all (Zwollo et al. 1997), there is evidence that they increase the transactivation activity of other Pax5 isoforms (Lowen et al. 2001).
In principle, transcripts lacking exon 2 could use either ATG as a start codon. Translation from the downstream ATG would yield the same protein as that produced from the downstream ATG of transcripts that include exon 2, whereas translation from the upstream ATG of exon 2(−) transcripts would yield an out-of-frame, severely truncated peptide. Increased skipping of exon 2 would therefore likely shift the relative proportion of functional isoforms toward those initiating from the downstream ATG. The conservation of isoforms lacking the same region of the paired domain in amphioxus Pax2/5/8 and human Pax5 suggests not only that these forms are functional but also that they have important roles in early development.
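The frame-shift argument above can be illustrated with a toy transcript. The following is a minimal Python sketch under stated assumptions: the three exon sequences are invented for illustration (they are not amphioxus or vertebrate Pax sequences), the standard genetic code is assumed, and translation simply runs from a chosen ATG to the first stop codon. Because the hypothetical exon 2 is 7 nt long (not a multiple of 3), skipping it puts the upstream ATG out of frame with the downstream codons, whereas translation from the in-frame ATG in exon 3 gives the same product whether or not exon 2 is present.

```python
# Standard genetic code (NCBI translation table 1), codons enumerated in TCAG order.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    b1 + b2 + b3: AMINO_ACIDS[16 * i + 4 * j + k]
    for i, b1 in enumerate(BASES)
    for j, b2 in enumerate(BASES)
    for k, b3 in enumerate(BASES)
}

# Hypothetical exons (NOT real Pax sequences), chosen only to make the arithmetic visible.
EXON1 = "ATGGCC"             # contains the upstream ("canonical") ATG
EXON2 = "GAAGTTC"            # 7 nt, so skipping it shifts the downstream frame
EXON3 = "GACTAATGAAAGGCTAG"  # contains a downstream ATG, in frame with the upstream ATG in the full-length transcript


def translate_from(mrna: str, start: int) -> str:
    """Translate from `start` to the first stop codon (the stop itself is not included)."""
    protein = []
    for pos in range(start, len(mrna) - 2, 3):
        residue = CODON_TABLE[mrna[pos:pos + 3]]
        if residue == "*":
            break
        protein.append(residue)
    return "".join(protein)


def compare_isoforms() -> dict:
    """Return (upstream-ATG product, downstream-ATG product) for transcripts with and without exon 2."""
    transcripts = {"exon2(+)": EXON1 + EXON2 + EXON3, "exon2(-)": EXON1 + EXON3}
    results = {}
    for name, mrna in transcripts.items():
        # Search for the downstream start codon only within exon 3.
        downstream_atg = mrna.find("ATG", len(mrna) - len(EXON3))
        if downstream_atg == -1:
            raise ValueError(f"no downstream ATG found in {name}")
        results[name] = (translate_from(mrna, 0), translate_from(mrna, downstream_atg))
    return results


if __name__ == "__main__":
    print(compare_isoforms())
    # {'exon2(+)': ('MAEVRLMKG', 'MKG'), 'exon2(-)': ('MAD', 'MKG')}
```

In this toy case the upstream ATG of the exon 2(−) transcript reads only three codons before hitting a stop, while both transcripts yield the identical downstream-ATG product, mirroring the reasoning in the text.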

We found comparable alternative splicing in AmphiPax4/6, where skipping of exon 4 also removes most of the PAI subdomain of the paired domain, shifting the reading frame and introducing a premature stop codon. Although an identical isoform has not been reported for vertebrate Pax4 or Pax6, splicing events that remove the paired domain have been reported (Gorlov and Saunders 2002). Moreover, isoforms lacking the paired domain (often termed paired-less) are generated through the use of downstream promoters and start codons in the Pax6 genes of C. elegans (Zhang and Emmons 1995) and several vertebrates (Bandah et al. 2007; Carriere et al. 1993; Jaworski et al. 1997), and the products of the two Drosophila paralogues, eyg and toe, lack the PAI subdomain and bind DNA only via their RED subdomains and homeodomains (Jun et al. 1998). Possible functions of such forms are suggested by studies showing that a paired-less form of Pax6, which interacts with full-length Pax6 via the homeodomain, increases transactivation from paired domain binding sites (Bruun et al. 2005; Mikkola et al. 2001). Further functions are suggested by the cooperative interactions that occur between the paired domain and the homeodomain (Jun and Desplan 1996); for example, Pax6 isoforms with altered paired domains affect transactivation from a reporter construct containing homeodomain binding sites (Mishra et al. 2002). Translation initiating at a highly conserved methionine codon within amphioxus exon 6 may likewise produce paired-less forms in amphioxus, and, as suggested above for the AmphiPax2/5/8 exon 2(−) form, exon 4 skipping may regulate the relative proportions of paired and paired-less forms.

Comparison of our findings with those reported for vertebrates provides evidence for neofunctionalization of Pax splice variants following gene duplication in vertebrates. For example, in addition to the exon 2(−) form of human Pax5, there are at least six events that skip multiple exons within the N-terminal encoding region (e.g., an exon 2,3,4,5[−] form) (Borson et al. 2002). However, we found no evidence for any of these forms in AmphiPax2/5/8 at the stages we analyzed. Equivalent variants have not been reported for vertebrate Pax2 or Pax8, although, in the absence of a systematic survey of isoforms, such variants may yet exist. We also cannot rule out the possibility that these six splice variants predate the amphioxus-vertebrate divergence and were subsequently lost in the amphioxus lineage. Even so, because the percentage of genes and exons undergoing alternative splicing appears to be higher in vertebrates than in invertebrates (Kim et al. 2007), the simplest explanation is that these isoforms represent neofunctionalization of Pax5 within the vertebrate lineage.

Another example of likely neofunctionalization is the alternative splicing of a functionally important 42-bp insertion (exon 5a) (Epstein et al. 1994; Kozmik et al. 1997), which occurs in all vertebrate Pax6 genes investigated to date, including those of fish (Puschel et al. 1992). The absence of exon 5a from the Pax4/6 genes of both amphioxus (Glardon et al. 1998) (Fig. 6A) and sea urchin (Czerny and Busslinger 1995) suggests that it evolved within the vertebrate lineage.

Several alternative splice forms involving the N-terminal coding region of amphioxus Pax transcripts appear to have no clear counterparts in vertebrates, for example, the alternative splicing of exons 3, 4, and 4b in AmphiPax2/5/8 and all of the events in AmphiPax4/6 (Figs. 3 and 7). These splicing events may represent neofunctionalization within the amphioxus lineage. However, comparable splice forms may exist but have not yet been detected in the orthologues of vertebrates and/or of other invertebrates such as sea urchin. Although many splice forms of Pax genes have been described, especially in vertebrates (e.g., Bandah et al. 2007; Borson et al. 2002), more comprehensive analyses are clearly needed.

Fig. 7

Alternative splicing of amphioxus and vertebrate Pax genes, with implications for the ancestral genes, assuming no large-scale loss of alternative splicing (see Discussion for references). (A) The single, probably nonconserved alternative splicing event found in amphioxus and vertebrate Pax9 suggests that little or no alternative splicing was present in the common ancestor. (B) The number of alternative splicing events appears to have undergone a moderate expansion in vertebrate Pax3 and Pax7. However, the events in amphioxus have counterparts in the vertebrate genes, suggesting a common ancestor with multiple events. One event included in this number may not be detectable with our survey method. (C) The alternative splicing of amphioxus Pax2/5/8 is comparable to that of each of the vertebrate genes. Some of these events appear to be conserved, suggesting an ancestral gene with multiple alternative splicing events, with many other events being specific to the amphioxus or vertebrate lineage. (D) The number of events in amphioxus is at least comparable to that in the vertebrate genes. No event appears clearly conserved between vertebrates and amphioxus, suggesting that the events arose independently after the two lineages diverged; the splicing status of the ancestral gene therefore cannot be inferred. *Two of the events included in this number involve the use of alternative promoters. Such events would not be detected by the techniques used here; however, excluding them does not alter the overall conclusion.

We found no alternative splicing in the N-terminal half of AmphiPax1/9 (Fig. 2), and none has been reported in the comparable region of vertebrate Pax1 or Pax9. The alternative splicing events found in AmphiPax3/7 (Fig. 5) predominantly affect the C-terminal region and are therefore discussed below. However, it is worth noting that, in vertebrates, alternative splicing inserts a functionally important glutamine into the paired domain of both Pax3 and Pax7, and a glycine-leucine dipeptide into the Pax7 paired domain (Lamey et al. 2004). These events are possible because the paired domain of vertebrate Pax3 and Pax7 is split over three exons (Fig. 1C); because the paired domain of AmphiPax3/7 is encoded by a single exon, the same alternative splicing cannot occur. These events could represent neofunctionalization within the vertebrates or loss within the amphioxus lineage following the amphioxus-vertebrate divergence.

Alternative Splicing in the C-terminal Encoding Region is Widespread in the Transcripts of Pax Genes

Alternative splicing affecting the C-terminal transactivation domain occurs in all classes of vertebrate and amphioxus Pax genes. However, the evolutionary conservation of splicing events affecting this region is more difficult to ascertain because the sequence and intron/exon organization downstream of the homeodomain are less well conserved, and many events appear to be lineage specific. Even so, some of these isoforms, such as AmphiPax2/5/8 C-terminal II (Fig. 4C), human Pax8e (Poleev et al. 1995), and Pax5Δ789 (Robichaud et al. 2004), do appear to be homologous. All three isoforms are predicted to lack the entire transactivation domain but to include the region normally encoding the inhibitory domain, albeit in an altered reading frame. Isoforms that lack, or have dramatically altered, transactivation domains (e.g., AmphiPax2/5/8 β, which lacks the transactivation domain owing to inclusion of exon 7) may act as competitive inhibitors of other isoforms (Kreslova et al. 2002). In AmphiPax2/5/8, the removal of 19 bp at the 5′ end of exon 11 creates the PHT reading frame (Vorobyov and Horst 2006) but also removes an SSYPYYS sequence (C-terminal I, II, and VI). This event appears highly conserved, as a 19-bp deletion is also found in the final exon of frog and human Pax2 (Heller and Brändli 1997; Tavassoli et al. 1997), where it removes the homologous SSPYYYS sequence. In addition, the insertion of the serine/threonine-rich exon 7a (Fig. 4, C-terminal VII) appears homologous to the alternatively spliced, serine/threonine-rich exon 8 in human (Kozmik et al. 1993). Another possibly conserved splicing event is the skipping of exon 3 in Pax3/7 genes. The exon 3(−) form of AmphiPax3/7 (Fig. 5B) would remove 14 amino acids from the C-terminal end of the homeodomain and cause a frameshift and premature stop codon affecting the presumed transactivation domain. A homologous splice form, termed Pax3f, occurs in mouse Pax3 (Barber et al. 1999).
As mentioned above, the retention of introns in AmphiPax3/7 is homologous to several events in vertebrates and would truncate the transactivation domain to varying extents (Barber et al. 1999; Seo et al. 1998; Vorobyov and Horst 2004). The apparent conservation of isoforms over such a wide phylogenetic distance suggests they share a function common to all chordates.
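Whether a length change preserves the reading frame reduces to arithmetic modulo 3: an insertion or deletion whose length is a multiple of 3 adds or removes whole codons, whereas any other length shifts every downstream codon. The sketch below uses a hypothetical helper (not a published tool) to apply this to two lengths discussed in this section, the 42-bp exon 5a of vertebrate Pax6 and the 19-bp deletion in AmphiPax2/5/8 exon 11 and the final exon of Pax2. It assumes the change lies entirely within the coding sequence and ignores splice-junction context.

```python
def frame_consequence(length_nt: int) -> dict:
    """Classify an insertion or deletion of `length_nt` nucleotides within a coding sequence."""
    if length_nt <= 0:
        raise ValueError("length must be a positive number of nucleotides")
    whole_codons, leftover = divmod(length_nt, 3)
    return {
        "in_frame": leftover == 0,     # True: downstream codons are read unchanged
        "whole_codons": whole_codons,  # codons added or removed outright
        "leftover_nt": leftover,       # nonzero: every downstream codon is shifted
    }


if __name__ == "__main__":
    examples = {
        "exon 5a insertion (vertebrate Pax6)": 42,
        "19-bp deletion (AmphiPax2/5/8 exon 11; Pax2 final exon)": 19,
    }
    for label, length in examples.items():
        print(label, frame_consequence(length))
```

For exon 5a this gives 14 whole codons and no shift, matching an in-frame insertion of 14 residues; for the 19-bp deletion the leftover nucleotide means the downstream reading frame changes, consistent with the altered C-terminal reading frame described above.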

Transcripts of AmphiPax4/6 also undergo alternative splicing in the C-terminal encoding region. However, the use of alternative splice sites in exon 10 or 13, as in AmphiPax4/6, has not so far been described in vertebrate Pax6. Isoforms of mammalian Pax4 with altered transactivation domains have been isolated (Miyamoto et al. 2001; Tokuyama et al. 1998), but comparison of the exons involved suggests that they do not represent conserved events. Use of the downstream splice acceptor within exon 13 of AmphiPax4/6 results in the inclusion of exon 13b, which contains a conserved serine residue (Fig. 6B); phosphorylation of this serine by mitogen-activated protein kinase (MAPK) in vertebrate Pax6 alters its transactivation ability (Mikkola et al. 1999). Use of the upstream splice acceptor 5′ of exon 13 leads to the inclusion of exon 13a, resulting in a premature stop codon and a truncated protein lacking the conserved serine. In human Pax6, numerous missense mutations alter the transactivation domain; patients with such mutations typically suffer from aniridia due to haploinsufficiency (Hanson et al. 1993; Mikkola et al. 1999; Singh et al. 2001). One such mutation, which affects a conserved residue in exon 13 of human Pax6 (a region removed by alternative splicing in AmphiPax4/6), alters the DNA-binding affinity of the homeodomain (Singh et al. 2001). Whether the ability of the C-terminal region to influence the DNA-binding domains is a general property of Pax proteins remains uncertain, but it is supported by the altered DNA-binding properties of human Pax8 isoforms with modified C-terminal regions (Poleev et al. 1995). If it is general, the vast array of often quite divergent alternative splicing events in these regions would allow lineage-specific repertoires of Pax proteins, each with a range of DNA-binding specificities. More complete investigations of alternative splicing in the 3′ coding region of vertebrate Pax genes are needed.

Isoforms with Premature Termination Codons (PTCs)

As discussed above, we found several Pax2/5/8, 3/7, and 4/6 alternative splicing events that would introduce a PTC and, in some cases, would appear to encode a nonfunctional protein unless translated from a downstream start codon (Figs. 3 and 6A). Nonsense-mediated decay (NMD) is a eukaryotic mRNA surveillance pathway that degrades PTC-containing transcripts (Conti and Izaurralde 2005). Coupling between NMD and alternative splicing has been proposed in mammalian cells, such that the introduction of a PTC by an alternative splicing event provides a mechanism for regulating protein levels (Lejeune and Maquat 2005). The extent of this coupling is still unclear, as the majority of PTC-containing transcripts are present at uniformly low levels, apparently independent of NMD (Pan et al. 2006). Nevertheless, this mechanism may be highly conserved, and the PTC-containing transcripts we found could be part of a mechanism regulating the level of functional Pax proteins. Alternatively, PTC-containing transcripts may result from splicing errors: a comparison of human and mouse ESTs suggested that a certain proportion of all splicing is aberrant, producing truncated, nonfunctional proteins (Sorek et al. 2004). Even so, we doubt that the alternative splicing events we found in amphioxus Pax genes, although evidently present only at low levels (i.e., detectable only by nested PCR), represent random splicing errors. As noted above, some of these rare splice forms are conserved between amphioxus and vertebrates, suggesting that they are functional. For example, the conservation of exon 2 skipping in AmphiPax2/5/8 and vertebrate Pax5 suggests that, although this event can only be detected by nested PCR, it generates functional proteins. Moreover, apart from the isoforms conserved across the chordate phylum, many PTC-containing Pax transcripts are conserved, to varying extents, within the vertebrate subphylum.
Indeed, because alternatively spliced exons and retained introns that alter the reading frame, and thereby introduce PTCs, are common in vertebrate Pax genes (e.g., Barber et al. 1999; Kozmik et al. 1993; Zwollo et al. 1997), their appearance in amphioxus is not surprising. Moreover, if we were isolating only low-level aberrant splicing, we might expect similar numbers of alternatively spliced transcripts for each of the four amphioxus Pax genes, given the sensitivity of two rounds of PCR, the number of independent primer sets used to screen each complete Pax transcript (see supplementary materials), and the assumption that primer efficiencies are not biased toward any single Pax gene. Instead, the numbers differ, with far fewer for Pax1/9 than for Pax2/5/8. One explanation for the low abundance of some of these isoforms may be that they are expressed in only a small population of cells. For example, at the neurula stage, AmphiPax2/5/8 is expressed in the few pigment cells of the frontal eye, in a slightly larger number of cells in the developing kidney, and in more cells in the central nervous system and the developing gill slits (Kozmik et al. 1999). Thus, although we cannot rule out aberrant splicing, a high degree of conserved, gene-specific aberrant splicing within the Pax family, presumably reflecting conservation of alternative splice sites for reasons other than the production of altered Pax proteins, would itself be a phenomenon worthy of further study.
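Whether a PTC-containing isoform is a plausible NMD substrate is often estimated from the position of the PTC. A widely used rule of thumb from mammalian studies holds that a stop codon lying more than about 50–55 nt upstream of the last exon-exon junction tends to trigger NMD, whereas one in the last exon, or just upstream of the last junction, tends to escape it. The sketch below encodes that rule only as an illustration; the 50-nt threshold, the function name `predicts_nmd`, and the example exon lengths are assumptions, and it is not known whether amphioxus NMD follows the same positional rule.

```python
from typing import Sequence


def predicts_nmd(exon_lengths: Sequence[int], ptc_start: int, threshold_nt: int = 50) -> bool:
    """Apply the mammalian '50-nt rule' to a spliced transcript (illustrative only).

    exon_lengths: lengths (nt) of the exons in transcript order.
    ptc_start:    0-based transcript coordinate of the first nucleotide of the PTC.
    threshold_nt: assumed minimum distance upstream of the last exon-exon junction.
    """
    if len(exon_lengths) < 2:
        return False  # no exon-exon junction, so the junction-based rule does not apply
    last_junction = sum(exon_lengths[:-1])  # transcript coordinate where the last exon begins
    return last_junction - ptc_start > threshold_nt


if __name__ == "__main__":
    exons = [100, 150, 200]  # hypothetical exon lengths; last junction at position 250
    for ptc in (120, 230, 300):
        print(ptc, predicts_nmd(exons, ptc))
```

With these hypothetical exons, a PTC at position 120 lies 130 nt upstream of the last junction and is predicted to trigger NMD, whereas PTCs at 230 (20 nt upstream) and 300 (in the last exon) are predicted to escape it.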

Evolution of the Pax Family and Alternative Splicing

Although there is some uncertainty regarding the duplication history of the Pax genes, it seems likely that duplication of a single Proto-Pax gene in the urmetazoan ancestor, before the divergence of the cnidarian and bilaterian lineages, gave rise to the two precursors of the Pax1/9/3/7 and Pax2/5/8/4/6 lineages, and that further duplications produced the four classes of Pax genes found in amphioxus, plus a fifth, Pox-neuro, that was lost in chordates (Balczarek et al. 1997; Hoshiyama et al. 2007; Matus et al. 2007; Vorobyov and Horst 2006). Within the lineage leading to vertebrates, further whole-genome duplications followed by gene loss are thought to have produced the nine Pax genes found in most vertebrates (Holland et al. 2004; Holland 2003; Putnam et al. 2007, 2008). Our results suggest that, in addition to having more Pax genes, vertebrates have at least as many alternative splicing events per Pax gene as amphioxus and, in some cases, more. The numbers of alternative splicing events, with their implications for the ancestral genes, are summarized in Fig. 7. In some cases, the equivalent exon undergoes alternative splicing in all or some of the vertebrate paralogues, suggesting that the event occurred in the ancestral gene and was maintained following duplication. For this comparison, however, such events are counted separately because, in all cases, the amino acid sequence of the exon has diverged and therefore no longer creates an identical isoform. Vertebrate Pax9 (Nornes et al. 1996) and AmphiPax1/9 each have two known isoforms (Fig. 7A), although analyses of human and mouse ESTs suggest that vertebrate Pax9 has more (de la Grange et al. 2005; Stamm et al. 2006; Thanaraj et al. 2004). In addition, multiple isoforms of Pax1/9 have been found in the tunicate Halocynthia roretzi (Ogasawara et al. 1999), suggesting an independent expansion of Pax1/9 splice forms in this fast-evolving group. Similarly, the level of alternative splicing in AmphiPax2/5/8 appears comparable to that reported for each of human and mouse Pax2, Pax5, and Pax8, implying an overall expansion of the isoforms available to vertebrates (Fig. 7C) (e.g., Borson et al. 2002; Heller and Brändli 1997, 1999; Kozmik et al. 1993; Mackereth et al. 2005; Pellizzari et al. 2006; Poleev et al. 1995; Robichaud et al. 2004; Sekine et al. 2007; Tavassoli et al. 1997; Ward et al. 1994; Zwollo et al. 1997). For Pax4 and Pax6, the amount of alternative splicing in vertebrates is broadly equivalent to, or greater than, that in AmphiPax4/6, although all events appear to be lineage specific (Fig. 7D) (Bandah et al. 2007; Carriere et al. 1993; Epstein et al. 1994; Gorlov and Saunders 2002; Inoue et al. 1998; Mishra et al. 2002; Miyamoto et al. 2001; Tao et al. 1998; Tokuyama et al. 1998). For Pax3/7, with the exceptions described above (Fig. 5), we see no evidence in amphioxus for many of the isoforms previously described in vertebrates (Barr et al. 1999; Lamey et al. 2004; Parker et al. 2004; Tsukamoto et al. 1994; Vorobyov and Horst 2004) and conclude that the repertoire of splice variants has probably expanded in the vertebrate lineage (Fig. 7B).

The method we used to isolate amphioxus Pax isoforms, which employs multiple rounds of PCR with primers flanking single exons as well as spanning the entire transcript (Gorlov and Saunders 2002), is probably more sensitive than that used in any previous survey of splicing in vertebrate Pax genes. Moreover, because we used whole embryos and adults, our survey encompassed all tissue types. Consequently, in the absence of equally comprehensive studies of vertebrate Pax splice forms, it seems likely that more isoforms of vertebrate Pax genes remain to be discovered. However, even on the basis of previously reported alternative splicing alone, the total number of alternatively spliced isoforms for the nine vertebrate Pax genes appears considerably higher than that for the four amphioxus genes. This conclusion is consistent with the recent finding that, in general, the percentage of genes and exons undergoing alternative splicing is higher in vertebrates than in invertebrates (Kim et al. 2007).

It has been shown that, in general, gene duplication and alternative splicing are inversely related (Kopelman et al. 2005; Su et al. 2006), suggesting that they are interchangeable mechanisms of proteome diversification. However, this relationship does not hold for amphioxus and vertebrate Pax genes: the number of alternatively spliced isoforms per Pax gene appears to be at least equivalent in amphioxus and vertebrates and, in some cases, greater in the latter (Fig. 7). This apparent discrepancy may reflect the age of the duplications. The duplication of the Pax genes at the base of the vertebrate lineage is thought to be quite ancient, perhaps 520–650 million years ago (Panopoulou et al. 2003; Robinson-Rechavi et al. 2004; Shu et al. 1999), whereas the inverse correlation is much more pronounced for recent duplicates (less than ∼80–90 million years old) (Kopelman et al. 2005; Su et al. 2006). This long interval may have allowed the evolution of a large amount of neofunctional alternative splicing, following what may have been initial rounds of subfunctionalization after the duplication events, a pattern that may be more common in anciently duplicated gene families.

A comparison of amphioxus and vertebrate splicing events that affect domains of known function suggests that the differences among splice variants of a given Pax gene are considerably more dramatic than those among the vertebrate paralogues, in which all the functional domains have remained intact (Glardon et al. 1998; Holland et al. 1995, 1999; Kozmik et al. 1999) (Fig. 1A–D). This is consistent with a study demonstrating that gene duplication and alternative splicing are not interchangeable mechanisms of proteome diversification (Talavera et al. 2007). The same study also suggested that the inverse correlation between gene duplication and alternative splicing might reflect negative selection against alternatively spliced duplicates, owing to the need to maintain the dosage balance of multiple regulatory factors simultaneously. The discovery of alternative splicing events in AmphiPax2/5/8 and AmphiPax3/7 that are apparently conserved with vertebrates suggests that considerable alternative splicing was present in the common ancestor and provides two examples of alternatively spliced genes that were duplicated and maintained.

Possible Role of Expanded Pax Alternative Splicing in Vertebrates

Given the apparent expansion of alternative splicing within the vertebrate Pax lineage, it is interesting to consider the possible roles of these isoforms. As described above, functional studies demonstrate that Pax isoforms have altered DNA-binding and transactivation capacities, suggesting that they may bind different gene promoters and/or drive different levels of transcription from the same promoter. Microarray analysis supports this idea, showing that different isoforms of Pax3 regulate distinct but overlapping sets of genes (Wang et al. 2007). Vertebrate Pax genes may therefore influence a far wider range of genes, in a much more subtle manner, than amphioxus Pax genes with their more limited repertoire of splice variants. The developmental roles of these additional splice forms in vertebrates are incompletely understood. However, the additional isoforms could have contributed to the acquisition of new roles by Pax3-expressing cells at the edges of the neural plate during the evolution of the neural crest. Pax3 is required for normal migration and differentiation of the neural crest (Robson et al. 2006), which evolved after amphioxus diverged from the lineage leading to tunicates and vertebrates (Shimeld and Holland 2000). Interestingly, transfection of Pax3 splice variants that are not conserved with amphioxus into melanocytes, which derive from the neural crest, has isoform-specific effects on cell growth, migration, proliferation, and apoptosis (Wang et al. 2006). Possible insight into the importance of lineage-specific alternative splicing in vertebrate Pax4 and Pax6 is provided by the alternative splicing of exon 5a in vertebrate Pax6, which has been shown to play a distinct role in postnatal iris formation and to be important for the structural integrity of the cornea, lens, and retina (Singh et al. 2002). Our study (Fig. 6A), along with previous investigations (Czerny and Busslinger 1995; Glardon et al. 1998), suggests that this event does not occur in invertebrates, which is consistent with a role in the development of vertebrate-specific features of the eye. The developmental roles of the expanded alternative splicing seen in vertebrate Pax2, Pax5, and Pax8 are largely unknown. However, alternative splicing of exon 8 in human Pax5, an event that does not occur in the nearest equivalent exon of amphioxus, has been implicated in the altered regulation of genes in human lymphocytic leukemia B cells (Oppezzo et al. 2005). The Pax2, Pax5, and Pax8 genes are involved in several developmental processes that have become highly elaborated within the vertebrate lineage (Chi and Epstein 2002), and further investigation of the functions of specific isoforms is clearly warranted.

In summary, our comparative study of alternative splicing in amphioxus and vertebrate Pax genes shows that, for this gene family, there is no inverse relationship between alternative splicing and gene duplication. Many events appear to be lineage specific, but some splice forms that dramatically affect functional motifs are conserved. Such evolutionary conservation suggests that these isoforms are not simply by-products of aberrant splicing and highlights the need for future experiments to test their function.