Introduction

Cancer/testis (CT-) antigens are a group of diverse proteins that are predominantly expressed in normal testis and in cancer cells. They form 20 families, of which MAGE-A is one of the largest and historically the first (reviewed by Scanlan et al. 2002). Interest in these genes stems from their expression pattern, which makes them promising targets for immunotherapy. Several protein families contain a common domain, and thus the corresponding genes form a larger MAGE (for “melanoma antigen”) superfamily (Chomez et al. 2001).

Initially identified in human melanoma (van der Bruggen et al. 1991), MAGE-A genes were found in a variety of cancers, whereas among normal tissues their expression is largely limited to the germ line cells from testis, ovary, and placenta. Other CT-antigens, including non-MAGE ones, were identified by a number of experimental techniques including SEREX (serological expression cloning), differential mRNA expression analysis, and cytotoxic T lymphocyte epitope cloning, as well as mining EST databases for genes with cancer/testis expression patterns (Chomez et al. 2001; Scanlan et al. 2002). Genes from MAGE-A, MAGE-B, and MAGE-C families are expressed in germ line and cancer cells, whereas genes from the remaining MAGE families, most notably MAGE-D (Lucas et al. 1999; Pold et al. 1999) and NECDIN/MAGE-L2 (Jay et al. 1997; Boccaccio et al. 1999), are expressed ubiquitously. The latter are candidate genes for the Prader–Willi syndrome (Jay et al. 1997; Boccaccio et al. 1999).

All MAGE-A genes have one protein-coding exon, preceded by several noncoding exons (De Plaen et al. 1994; Rogner et al. 1995). The same holds for several other MAGE families (MAGE-B, MAGE-G, MAGE-H), whereas coding regions of genes from the MAGE-C and MAGE-D families are interrupted by introns (Lucas et al. 2000; Chomez et al. 2001). Some of these genes are known to be alternatively spliced with isoforms having different 3′-untranslated regions or even coding regions (De Plaen et al. 1997; Lucas et al. 2000).

Most MAGE genes map to human chromosome X. The largest families form loci at q28 (MAGE-A; De Plaen et al. 1994; Rogner et al. 1995), p21–p22 (MAGE-B; Muscatelli et al. 1995; Dabovic et al. 1995), and q26–q27 (MAGE-C; Lucas et al. 1998), whereas MAGE-F1 is encoded on chromosome 3 (Stone et al. 2001).

Based on the fact that most MAGE families contain no introns in protein-coding regions and show a very narrow expression pattern, whereas MAGE-D genes contain numerous introns, are expressed ubiquitously, and have nonmammalian orthologs, it was suggested that MAGE genes had been formed by retroposition of the ancestral MAGE-D gene (Chomez et al. 2001). If this is correct, the insertion of introns in the upstream region and the appearance of alternative splicing of MAGE-A genes should have happened after the founder MAGE-A gene was introduced into the genome. The orthologous mouse family Mage-a, consisting of seven active genes, maps to two loci on mouse chromosome X (De Plaen et al. 1999; Chomez et al. 2001). Like their human counterparts, these genes are transcribed in cancer cell lines and in testis. Human MAGE-A and mouse Mage-a proteins form two separate branches on the tree of all MAGE proteins (Chomez et al. 2001; Cannon and Young 2003), and thus it is likely that the multiplication of the ancestral gene occurred independently in these two genomes.

Alternative splicing was recently established as one of the main mechanisms of generating protein diversity in multicellular eukaryotes, and now at least half of human genes are believed to be alternatively spliced (Mironov et al. 1999; Brett et al. 2002; for a review see Modrek and Lee 2002). Moreover, recent comparisons of the human and mouse genomes demonstrated that about half of alternatively spliced genes have genome-specific isoforms (Modrek and Lee 2003; Nurtdinov et al. 2003; Thanaraj et al. 2003), although the functionality of these isoforms was questioned (Kan et al. 2002; Sorek et al. 2004). Creation of alternatively spliced exons is often associated with exon duplication (Kondrashov and Koonin 2001; Letunic et al. 2002) or Alu insertion (Sorek et al. 2002). It has been demonstrated that point mutations can influence the choice of acceptor splicing sites and the ratio of alternatively spliced isoforms (Lev-Maor et al. 2003). On the other hand, splicing-affecting mutations may account for at least 15% of human genetic diseases, and likely even more (Krawczak et al. 1992; Nakai and Sakamoto 1994; Faustino and Cooper 2003). However, evolution of splicing patterns within genomes has not been studied.

In this study we analyze alternative splicing of the MAGE-A genes using mapping of available ESTs to the genomic sequence. We demonstrate the existence of gene-specific isoforms and study the influence of point mutations on constitutive and alternative splicing. To our knowledge, this is the first attempt to reconstruct the evolution of alternative splicing in a family of recently duplicated paralogs. As such, it provides an additional level of resolution to large-scale comparative analyses of alternative splicing in human and mouse genomes.

Methods

Genome and EST sequences were taken from the Human Genome Browser (Karolchik et al. 2003). The human genome assembly of April 2003 (UCSC version hg15) was used. The BLAT-generated alignments from the Human Genome Browser were additionally verified using EST-to-genome alignment by Pro-EST (Mironov et al. 1999). Gene expression data were from Su et al. (2002), obtained via the Human Genome Browser.

Multiple alignment of protein and nucleic acid sequences was done using CLUSTALW with default parameters (Thompson et al. 1994). Phylogenetic trees were constructed using the maximum likelihood algorithm implemented in PHYLIP with default parameters (Felsenstein 1996). The trees were plotted using GeneMaster (Andrey A. Mironov, unpublished).

Alignment regions corresponding to splicing sites were analyzed manually. Mutations changing consensus nucleotides (c/a)AG/GTRAG in donor sites and polyY-NCAG/G in acceptor sites, as well as mutations creating AG dinucleotides upstream of acceptor sites, were considered as weakening the splicing sites, and mutations changing the invariant dinucleotides GT and AG in donor and acceptor sites, respectively, as completely disrupting splicing (Gelfand 1989; Iida 1990; Stenson et al. 2003). Conversely, mutations creating GT and AG dinucleotides or making a site closer to the consensus were considered as increasing the likelihood of splicing.
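The classification rules for donor sites can be illustrated with a minimal sketch. It is not the procedure actually used (the sites were analyzed manually); the 8-nt site layout (3 exonic plus 5 intronic positions), the function names, and the category labels are assumptions introduced here for illustration only, and the example sequences in the usage note are hypothetical.

```python
# Donor consensus (c/a)AG/GTRAG: 3 exonic positions followed by 5 intronic ones.
# Each entry lists the nucleotides accepted at that position (R = A or G).
DONOR_CONSENSUS = ["CA", "A", "G", "G", "T", "AG", "A", "G"]


def consensus_matches(site):
    """Count positions of an 8-nt donor site that match the consensus."""
    return sum(base in allowed for base, allowed in zip(site.upper(), DONOR_CONSENSUS))


def has_gt(site):
    """True if the invariant GT dinucleotide is present at intron positions +1, +2."""
    return site.upper()[3:5] == "GT"


def classify_donor_mutation(ancestral, derived):
    """Classify the change between two aligned 8-nt donor-site sequences.

    Labels are illustrative: 'disrupting' (GT lost), 'creating' (GT gained),
    'weakening' / 'strengthening' (fewer / more consensus matches), 'neutral'.
    """
    if len(ancestral) != 8 or len(derived) != 8:
        raise ValueError("expected 8-nt sites: 3 exonic + 5 intronic positions")
    if has_gt(ancestral) and not has_gt(derived):
        return "disrupting"
    if not has_gt(ancestral) and has_gt(derived):
        return "creating"
    delta = consensus_matches(derived) - consensus_matches(ancestral)
    if delta > 0:
        return "strengthening"
    if delta < 0:
        return "weakening"
    return "neutral"
```

For example, under these assumptions a hypothetical change CAGGTAAG to CAGGTAGA (AG to GA at intron positions 4 and 5) is classified as weakening, whereas CAGGCAAG to CAGGTAAG is classified as creating a site.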

Results

Genes

The locus containing MAGE-A genes spans about 3 Mb of chromosome X (band Xq28, Fig. 1). There are two recent duplications: an inverted duplication of MAGE-A9 and the adjacent region, forming sublocus I, and an inverted duplication of the region between MAGE-A3/6 and MAGE-A2, forming sublocus II. In the second case the duplicated genes MAGE-A3 and MAGE-A6 are very similar (99% identity at the nucleotide level and 95% protein identity) but still less similar than the two copies of MAGE-A9 and the two copies of MAGE-A2 (which differ by only 8 substitutions in 5800 aligned nucleotides and 2 substitutions in 4000 aligned nucleotides, respectively). Following the standard nomenclature (Human Genome Browser; Chomez et al. 2001; Scanlan et al. 2002), we retain different names for MAGE-A3 and MAGE-A6 and consider them separately, whereas variants of MAGE-A2 and MAGE-A9, denoted “a” and “b,” are not distinguished below.
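To put the substitution counts for the MAGE-A9 and MAGE-A2 copies on the same scale as the 99% figure quoted for MAGE-A3/MAGE-A6, they can be converted to percent identity. The short sketch below does only this arithmetic; the function name is hypothetical, and it assumes that the quoted counts refer to substitutions in gap-free aligned positions.

```python
def percent_identity(substitutions, aligned_length):
    """Percent identity of two aligned sequences, ignoring gaps (an assumption)."""
    if aligned_length <= 0:
        raise ValueError("aligned_length must be positive")
    return 100.0 * (1 - substitutions / aligned_length)


# Counts quoted in the text for the two recently duplicated gene pairs.
mage_a9_copies = percent_identity(8, 5800)  # about 99.86%
mage_a2_copies = percent_identity(2, 4000)  # about 99.95%
```

Both values exceed the 99% nucleotide identity of MAGE-A3 and MAGE-A6, consistent with treating the MAGE-A9 and MAGE-A2 copies as indistinguishable variants.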

Figure 1

Genomic organization of the MAGE-A locus. Boldface arrows: MAGE-A genes. Long arrows: inverted repeats.

The phylogenetic trees of the proteins (Fig. 2a) and of the nucleotide sequences immediately upstream of the coding regions (Fig. 2b) are similar. There are three branches in each tree. The first branch (called subfamily I) contains MAGE-A8, MAGE-A9, MAGE-A10, and MAGE-A11. The second one (subfamily II) contains two pairs, MAGE-A3/MAGE-A6 and MAGE-A2/MAGE-A12 (the former pair results from a recent duplication, whereas the latter pair is more diverged, with 91% nucleotide and 87% protein identity). The definition of these subfamilies is also supported by common duplication/insertion events (see below). The third branch, containing the three remaining genes, MAGE-A1, MAGE-A4, and MAGE-A5, is weakly supported by the bootstrap analysis.

Figure 2

Phylogenetic trees of the MAGE-A family. a Proteins. b Aligned ∼1800-nucleotide regions upstream of the protein-coding regions of MAGE-A genes. Bootstrap support is shown by filled (>95%) and open (>75%) circles; unmarked nodes have support <55%. The units of branch length (shown by numbers) are the expected fraction of amino acids changed (for proteins, 0.01 is 1 PAM; shown if exceeding 0.02) and the expected nucleotide substitution per site (for DNA alignments, shown if exceeding 0.01).

These trees are slightly different from the tree of Chomez et al. (2001), where subfamily I does not form a cluster, MAGE-A8 is clustered with MAGE-A4 and MAGE-A1, and MAGE-A5 is missing altogether. The remaining differences can be explained by placing the root of the tree of Chomez et al. (2001) within our subfamily I.

The order of genes in the MAGE-A locus weakly correlates with the phylogenetic trees. MAGE-A12 is located between two copies of MAGE-A2, which in turn are framed by MAGE-A3 and MAGE-A6 (Fig. 1). Similarly, MAGE-A11 is located between two copies of MAGE-A9, and MAGE-A8 is immediately downstream of the duplicated area; this sublocus is separated by about 2 Mb from the second sublocus containing the remaining genes (including MAGE-A10 from the same subfamily).

The coding region of each gene is contained in the last exon, preceded by several alternatively spliced untranslated exons. The coding region of MAGE-A5 is interrupted by a premature stop codon that results from a CGA→TAA mutation; this gene is likely a recently inactivated, transcribed pseudogene (or it encodes a shorter protein of 120 amino acids). MAGE-A7 seems to be a ghost or a pseudogene, as it could not be found either in GenBank or in the UCSC Human Genome Browser; indeed, initially it had been reported to be nontranscribed (De Plaen et al. 1994). MAGE-A10 is extended by about 30 amino acids at the C terminus, whereas close to the N terminus, it contains a serine-rich insert of about 20 amino acids. The length of the remaining proteins is approximately 300 amino acids.

Expression array analysis showed that MAGE-A1, -A2, -A3, -A4, -A5, -A6, and -A12 were highly expressed in testis and cancer and, to a lesser extent, in thymus, placenta, and ovary, followed by pancreas and brain (Table 1) (Su et al. 2002). Expression of the subfamily I genes was weak in all tissues. On the other hand, RT-PCR demonstrated that, in addition to cancer and testis, MAGE-A3, -A4, -A8, -A9, -A10, -A11, and -A12 were expressed in placenta, whereas MAGE-A1, -A2, -A6, and -A12 were not (De Plaen et al. 1994).

Table 1 Tissue specificity of MAGE-A gene expression

Analysis of EST data demonstrated that most genes have one predominant isoform and several isoforms supported by only one or two ESTs. The main isoform is always shown as the first one in Fig. 3, with two exceptions: There is no predominant isoform of MAGE-A2, where all isoforms are supported by one or two ESTs, and there are two main isoforms of MAGE-A4, corresponding to the second and the fourth variants of the initial exon (counting from the right). Most ESTs were derived from cancer cell line libraries with the following exceptions: All MAGE-A5 ESTs were from normal placenta; the main MAGE-A6 isoform and two major MAGE-A4 isoforms were observed in cancer cell lines as well as normal testis and brain (medulla); the minor MAGE-A9 isoform was observed in testis; and, finally, multiple ESTs derived from the MAGE-A4 coding exon common to all isoforms were seen in placenta.

Figure 3

Schematic representation of the exon–intron structure of the MAGE-A genes. Boxes: exons. Thick lines: introns. Gray boxes: protein-coding exon 0. Checkered boxes: homologous initial exons. Dotted vertical line: boundary of the well-alignable region. The major isoform supported by multiple ESTs is the first one in all cases excluding MAGE-A2 (no predominant isoform) and MAGE-A4 (two main isoforms, the second and fourth ones from the top; see the text). a Simple cases. b Remaining genes of subfamily II (dotted horizontal line in MAGE-A12: deletion removing parts of exons 1 and 0 and the intron between these exons). c Subfamily I. d Multiplication of the initial exon in MAGE-A4 (double-dotted vertical line in MAGE-A4: the area between the initial exon and coding exon 0 contains no exons and is not shown).

About 1800 bp upstream of the coding region could be aligned in all representatives of the family. This region contains four groups of noncoding exons with slightly different or alternative splicing sites (Fig. 3). This region also contains several deletions or insertions that are specific for branches of the phylogenetic trees and thus support our definitions of subfamilies: a deletion in MAGE-A3 and MAGE-A6 and two likely insertions, a long one common to subfamily I and a short one common to subfamily II. One more deletion is specific to MAGE-A12. All deletions and/or insertions are flanked by short (4–7 nt) direct or inverted repeats. Notably, an alignment excluding the deleted/inserted regions produces the same phylogenetic tree as the long, complete alignment.

Upstream of this region, genes of subfamily I cannot be aligned with the remaining genes, and farther upstream there are multiple duplications of the starting exon in most genes (see below), so that a family-wide alignment is not meaningful.

Noncoding exons of human MAGE-A genes could not be aligned with noncoding exons of mouse Mage-a genes (data not shown). Indeed, as mentioned in the Introduction, the human MAGE-A genes were duplicated after the divergence of the human and mouse genomes. Thus the mouse genes were not considered in this study.

Exons and Splicing Sites

As there is no universal correspondence between the upstream exons for different genes, for consistency the exon groups are numbered in the 3′-to-5′ order, thus the only coding exon is numbered 0, the preceding exon is numbered 1, etc. The region well alignable throughout the MAGE-A family contains four groups of noncoding exons and the coding exon, by the above convention numbered 4 through 0 (if counting in the standard 5′-to-3′ direction). Not all exons are present in all genes; there are also alternative sites (specified by lowercase letters, e.g., exon 1a, acceptor site 2b, etc.). The isoforms are shown in Fig. 3 and discussed in detail below.
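The numbering convention can be stated compactly as a small sketch. The function name and the representation of exons as (start, end) pairs in transcript orientation are assumptions made here for illustration; in the analysis itself the numbers refer to groups of homologous exons, not to every exon of an individual isoform.

```python
def number_exons_3prime_to_5prime(exons):
    """Number exons in the 3'-to-5' order.

    `exons` is a list of (start, end) pairs given in transcript orientation,
    so that larger coordinates lie closer to the 3' end. The most 3' exon
    (the coding exon in MAGE-A genes) receives number 0.
    """
    ordered = sorted(exons, key=lambda exon: exon[0], reverse=True)
    return {number: exon for number, exon in enumerate(ordered)}


# Hypothetical coordinates: two noncoding exons followed by the coding exon.
example = number_exons_3prime_to_5prime([(100, 180), (900, 1900), (400, 450)])
```

With these hypothetical coordinates, the coding exon (900, 1900) is numbered 0, the preceding exon (400, 450) is numbered 1, and the initial exon (100, 180) is numbered 2.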

Exon E1

Exon E1 is present in all isoforms of all genes, excluding MAGE-A12, where two deletions remove the intron and the adjacent parts of exons 1 and 0, creating a chimeric exon, 1–0. There are alternative acceptor sites: a site upstream of the main one in MAGE-A3 and MAGE-A6 and a site downstream of the main one in MAGE-A1. The latter is only 8 nt downstream of the main site and, thus, has a poor splicing context (interfering AG immediately upstream). Indeed, it is used in only one EST of five MAGE-A1 ESTs, and the corresponding sites in other genes are inactive, although in some genes the sequence is almost the same (Fig. 4a). The upstream site in MAGE-A3 and MAGE-A6 has been created by an activating mutation G→T at position −5 that has enhanced the polypyrimidine tract. Inactive sites in other genes have G or C there. Farther upstream, these genes have AA dinucleotides that weaken the polypyrimidine tract but, on the other hand, might serve as a branch point.

Figure 4

Alternative and constitutive acceptor sites of exon 1. Underlined: exons. Wavy underlined: alternative regions (belonging to the intron or exon dependent on the choice of an alternative site). Upper line: deduced ancestral sequence. Boldface: nucleotides, conforming to the consensus splicing signals. Italics: nucleotides not conforming to the consensus splicing signals. Boldface italics: nucleotides that are consensus for one site and nonconsensus for the other site (only for observed sites and ancestral sequences). Shaded: mutations that could have changed site functionality (see text).

Cassette Exon E2

Cassette exon E2 was observed in MAGE-A2, MAGE-A9, MAGE-A10, and MAGE-A8, although potentially it could be incorporated into mRNA in some other genes as well, as the splicing sites are conserved. The only gene where this exon is included in the major isoform is MAGE-A9.

It has two alternative donor sites, the upstream one and the downstream one. Neither site is used in MAGE-A8, and in one isoform the intron between exon 2 and exon 1 is retained, producing a long exon, 2-1. Both sites are used in MAGE-A10, where the use of the upstream site leads to an exon that is only 19 nt long. This site is activated by the mutation GCA→GAG in the exon positions of the site, making it closer to the consensus (Fig. 5).

Figure 5

Splicing sites of exon 2. Acceptor sites and the downstream donor site are shown. Double-underlined: cassette exons. Other notation as in Fig. 4.

The history of the acceptor sites of exon 2 is somewhat complicated. Exons denoted 2a and 2 are not really alternative. In genes from subfamily II, a 7-nt insertion flanked by a direct repeat CAG/GA (overlapping with the intron–exon junction) created two possible sites (Fig. 5). In MAGE-A2 the upstream site is used, as the downstream site is in a weak context, namely, an upstream AG, G at position −3, and a weak polypyrimidine tract.

Cassette Exon E3

Cassette exon E3 is specific to MAGE-A8, where it was observed in only one EST. The corresponding region has been created by a long insertion in subfamily I. The insertion is flanked by a direct repeat of TGAGGAC. It contains the donor site of exon E3. The corresponding positions in other members of the family do not contain the GT dinucleotide, which is destroyed by point mutations or short deletions (not shown). The acceptor site in MAGE-A8 corresponds to a well-alignable region, and it has been created by a short gene-specific insertion that created the AG dinucleotide and the polypyrimidine tract (Fig. 6).

Figure 6

Donor splicing site of exon 3 in MAGE-A8. Notation as in Figs. 4 and 5.

Cassette Exon E4

Cassette exon E4 was observed in a pair of genes from subfamily I, MAGE-A10 and MAGE-A11, and a pair of genes from subfamily II, MAGE-A2 and MAGE-A12, although potentially it could exist also at least in MAGE-A1 (despite a slightly weaker donor site) and MAGE-A9 (which is basically indistinguishable in the site regions from MAGE-A10); on the other hand, the acceptor site region is covered by a long deletion in MAGE-A3 and MAGE-A6, and this deletion is flanked by an inverted repeat CCCCT–AGGGG. In MAGE-A12 it was observed in only one EST, whereas in the other three genes it was included in the majority of ESTs.

The dynamics of the site choice in this exon are rather clear. In MAGE-A2, the ancestral (upstream) donor site was weakened by mutations AG→GA at positions 4–5, and this led to the use of the downstream site that extends the exon by 14 nt; the latter was created by a G→T mutation that produced the canonical GT dinucleotide (Fig. 7a). The relative timing of these events is unknown, but not very important in this context. The use of very close acceptor sites in MAGE-A2, MAGE-A12, MAGE-A10, and MAGE-A11 (exons 4, 4a, 4b) can be explained by a simple rule: The upstream AG is used, and if it is inactivated by a point mutation, the next one is used; in MAGE-A11 the downstream acceptor site is enhanced by a G→C mutation in position −3 (Fig. 7b). Finally, a MAGE-A10-specific alternative acceptor site (exon 4c) was created by a series of upstream transversions and deletions that removed upstream purines and thus created a perfect polypyrimidine tract (Fig. 7c).
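The rule for closely spaced acceptor sites can be written as a one-step scan. The sketch below is only an illustration of that rule: the function name is hypothetical, the sequences are invented and are not taken from Fig. 7, and the context effects discussed above (position −3, polypyrimidine tract) are deliberately ignored.

```python
def choose_acceptor_ag(window):
    """Return the 0-based index of the most upstream AG in `window`.

    `window` is the candidate acceptor region read 5' to 3'. Returns None
    if the region contains no AG dinucleotide.
    """
    index = window.upper().find("AG")
    return index if index != -1 else None


# Hypothetical region with two candidate AGs; the upstream one is chosen.
ancestral_choice = choose_acceptor_ag("TTTCAGGCAGG")
# A point mutation (A to G) inactivates the upstream AG; the next AG is chosen.
mutated_choice = choose_acceptor_ag("TTTCGGGCAGG")
```

In this invented example the choice shifts from index 4 to index 8, that is, by 4 nt downstream, which mimics the small shifts between exons 4, 4a, and 4b.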

Figure 7

Donor and acceptor splicing sites of exon 4. a Donor sites. b, c Acceptor sites. Notation as in Figs. 4 and 5.

Initial Exons

In the region 5′ of exon 4, subfamily I and the remaining genes need to be aligned separately. There is also no correspondence between exons of different genes. However, the main initial exons are homologous (Table 2). In MAGE-A2, MAGE-A12, MAGE-A8, and MAGE-A10 these exons are unique, whereas in the remaining genes the corresponding region was duplicated several times. The most dramatic expansion of this region happened in MAGE-A4 (Fig. 3d), where this region forms a tandem repeat of nine copies; in addition, there are six more partial copies. The core of the repeated region is about 100 nt, covering the donor site of the initial exon. The region corresponding to the initial exon can be aligned well in all genes from subfamily I, but it is expressed only in MAGE-A8 and MAGE-A10, and not in MAGE-A9 and MAGE-A11.

Table 2 Region of the donor site of the main initial exon

As usual, the exact boundaries of the repeated region are difficult to define. Sometimes the aligned region corresponding to the donor site is rather narrow, whereas in other cases several hundred nucleotides may be aligned.

In some genes there are additional or alternative start exons. In particular, alternative initial exons may be used in MAGE-A9 (where a shortened version of the main initial exon is preferred), in MAGE-A11 (an alternative start exon 22 kb upstream of the main one), in MAGE-A3 (an alternative start exon downstream of the main one), and in MAGE-A12 (an alternative start exon downstream of the main one). The donor site of the downstream start exon of MAGE-A3 lies within the region corresponding to exon 5 of MAGE-A2 (see below); it has been created by a C→T mutation that produced a GT dinucleotide (Fig. 8a). Similarly, the donor site of exon 5 of MAGE-A3 has been created by a C→G mutation that produced a GT dinucleotide (Fig. 8a). The donor site of the downstream start exon of MAGE-A12 is the same as the donor site of exon 6 of MAGE-A2 (see below); it has been created by a C→A mutation that improved the match to the consensus (Fig. 8b).

Figure 8

Donor sites of downstream alternative start exons. a MAGE-A2 and MAGE-A3. b MAGE-A2 and MAGE-A12. Notation as in Figs. 4 and 5.

In MAGE-A4, of 15 candidate initial exons formed by the expanded repeat, 8 are used; 6 of these are within the long tandem repeat (three candidate exons in the repeat region are not used). However, as most isoforms are observed only once or twice, all statements about the absence of isoforms are very preliminary; in fact, it is likely that most candidate exons will be observed once more ESTs for this gene are sequenced. In addition, there is an isoform starting with initial exon 8 and using an internal exon bounded by an acceptor site upstream of the initial exon 7 and the donor site of initial exon 7. Similarly, in MAGE-A11 there is an additional internal exon following the main initial one.

The most complicated situation is that in MAGE-A2 (Fig. 3b). There are two cassette exons, 6 and 5, and two variants of the start exon, the standard initial exon and a longer exon that starts downstream of the main one. The donor site of the longer exon is downstream of the acceptor site of exon 6, and thus there is a short overlap (11 nt) between these two exons (double-underlined in Fig. 8b). In addition, one EST contains exon 6-5, which spans exons 6 and 5, the retained intron between them, and the sequence upstream of exon 6. This exon is spliced to the acceptor site of exon E4, but after that the EST is incomplete.

Alternatively Spliced and Chimeric Isoforms

The balance between isoforms is usually rather uneven. In most cases one isoform clearly dominates (the only exceptions seem to be MAGE-A4, where two isoforms using initial exons i2 and i4 are prevalent, and MAGE-A2, where each of the multiple isoforms is supported by one or two ESTs). Exon 4 tends to be used when the gene has sufficiently strong splicing sites (it is always used in MAGE-A10 and MAGE-A11, used in most cases in MAGE-A2, and rarely used in MAGE-A12). On the contrary, exon 2 is constitutive only in MAGE-A9, whereas it is seen only in minor isoforms in MAGE-A2 and MAGE-A10. Exon 3 of MAGE-A8 is supported by only one EST. Of the start exons, the main one (or its duplicates in MAGE-A4) is preferred by all genes but MAGE-A9, where the major isoform uses the shorter version.

Finally, there are several chimeric ESTs containing exons of different genes (Fig. 9). One EST splices the initial exon of MAGE-A12 to the candidate downstream gene BC013171. There is also one EST splicing the initial exon of MAGE-A10 to exons 1 and 0 of MAGE-A5; there is one short exon in the intervening region, supporting the hypothesis that these isoforms are produced by splicing of read-through transcripts.

Figure 9

Chimeric ESTs. Notation as in Fig. 3.

Discussion

Inactivation of splicing sites due to mutations is a well-known phenomenon, extensively studied in the context of human genetic disease (reviewed, e.g., by Faustino and Cooper 2003; Stoilov et al. 2002). In most cases this leads to exon skipping or activation of cryptic sites: Numerous examples were observed in many different genes (Nakai and Sakamoto 1994; O’Neill et al. 1998; Tuffery-Giraud et al. 1999; Stenson et al. 2003). On the other hand, there are cases of creation of new splicing sites by point mutations (Nelson and Green 1990; O’Neill et al. 1998; Bagnall et al. 1999). The use of specific sites is difficult to predict, although there is some correlation between site choice and its closeness to the consensus (Iida 1990; Ketterling et al. 1999; Lev-Maor et al. 2003). In most cases considered here, the creation of new sites or the choice between alternative sites can indeed be explained by mutations improving the match to the consensus or, vice versa, changing functionally important nucleotides.

One specific case is the choice between AG dinucleotides in acceptor sites. We have observed several cases when the most upstream AG is used in a group of several AG dinucleotides. Inactivation of the upstream AG by mutation leads to activation of the (cryptic) AG downstream. Indeed, the avoidance of upstream AG in acceptor sites is a well-known phenomenon (Gelfand 1989). Another feature of functional acceptor sites is C (or, to a lesser extent, T) at position −3, immediately preceding the AG, and again, we have observed mutations that activate sites by changing the nucleotide at this position to C. This agrees with the results of Lev-Maor et al. (2003), where the choice of acceptor splicing sites in alternative exons created by insertions of Alu repeats was studied.

Exon duplication is a well-known mechanism of molecular evolution, in many cases mediated by alternative splicing (Kondrashov and Koonin 2001; Letunic et al. 2002). In most studied cases it concerns internal protein-coding exons, although this might be due to limitations of the applied computational techniques and the fact that protein-coding regions are more conserved than noncoding ones. Indeed, alternative splicing is often associated with differential choice of promoters (Mironov et al. 1999; Tasic et al. 2002). In MAGE-A it is not clear whether this mechanism is implicated, since all genes of this family seem to have the same tissue specificity.

Finally, chimeric mRNAs are often a likely product of splicing of read-through transcripts (Romani et al. 2003).

Two remaining questions are whether our scenario of evolution of splicing sites is correct and whether the observed isoforms are functional. As regards the former question, in all cases we invoked the most parsimonious explanation assuming the smallest number of evolutionary events. Indeed, in most cases the observed mutations are gene- or lineage-specific.

The problem of functionality is more complicated. It is well known that transformed cells produce numerous aberrant mRNAs, possibly due to relaxation of control mechanisms. In particular, many alternative isoforms, especially those supported by unique ESTs, might be nonfunctional (Kan et al. 2002). So, one possible explanation for the observed diversity of spliced isoforms of MAGE-A genes might be that they represent rare mis-splicing events. We believe that this is not the case.

Indeed, in at least some cases several full-length mRNA isoforms have been observed (De Smet et al. 1994; De Plaen et al. 1994; Rogner et al. 1995); in other cases, alternative exons were found in different tissue samples (Ali Osmay Güre, personal communication). This means that the fraction of the corresponding isoforms is large enough to be detectable. Further, although in many cases the observed isoforms are supported by unique ESTs, alternative splicing sites occur in several different isoforms. Moreover, a more recent version hg16 of the Human Genome Browser contained only a few new isoforms and all of them were generated by the sites considered here and conformed to the observed patterns (data not shown).

There is also remarkable consistency in isoforms of different MAGE-A genes. It is well known that genomic sequences contain a large number of cryptic splicing sites (e.g., Thanaraj 2000). One could expect that aberrant splicing events would indiscriminately use these sites. Instead, the same alternative sites and exons are utilized by several genes, whereas the differences in splicing of MAGE-A genes in most cases can be explained by mutations in splicing sites.

Thus the observed isoforms are real in the sense that they represent naturally occurring events. Their functionality in the sense of differential regulation, specific properties, expression patterns, etc., remains an open issue for experimental analysis.

From the evolutionary point of view, it seems that we witness an early stage of gene diversification. A plausible scenario seems to be retroposition of an ancestral MAGE-D gene (Chomez et al. 2001) in the common ancestor of human and mouse, accompanied by loss of a part of the coding region. The pre-MAGE-A gene thus created was subject to several independent duplications in each of these genomes. Duplications in the human genome seem to be continuing in the sense that some of them are very recent.

Diversification on the level of genes was accompanied by diversification on the level of alternatively spliced isoforms. Alternative splicing did not influence the coding region but generated different 5′-untranslated regions in mRNAs, since all alternatively spliced exons lie upstream of the single coding exon. This process was shaped by two types of events: deactivating mutations in splicing sites, leading to exon loss or intron retention, and birth of new sites by point mutations or insertions creating GT or AG dinucleotides in a proper context, leading to emergence of new alternatively spliced exons. Both types of events could cause exon truncation or extension due to the use of preexisting cryptic sites or newly generated sites.

Analysis of the Mage-a genes in mouse and other families of MAGE/Mage genes will show how common the observed situation is. It should be particularly interesting to analyze the MAGE-B family, where the human and mouse genes are intermixed in the phylogenetic tree, and the MAGE-D family, whose members are highly conserved in the human and mouse genomes and can be found in more distant genomes as well.