Introduction

TAR (transformation-associated recombination) cloning was developed to selectively isolate large genomic fragments from complex genomes (Larionov et al. 1996a, b). The method exploits the high level of recombination between homologous DNA sequences during transformation in the yeast Saccharomyces cerevisiae. For isolation, genomic DNA is transfected into yeast spheroplasts along with a TAR vector that contains targeting hooks homologous to the genomic DNA sequences. Recombination between the vector and the targeted genomic fragment results in rescue of the fragment as a yeast artificial chromosome (YAC). The method has been used in our laboratory for isolation of dozens of human and mouse single-copy genes and specific chromosomal fragments (Larionov et al. 1997; Kouprina et al. 1997, 1998b; Cancilla et al. 1998; Annab et al. 2000; Kouprina and Larionov 1999; Kim et al. 2000; Humble et al. 2000; Leem et al. 2002). Usually both parental alleles of the gene are isolated to allow haplotype analysis. The technique also provides a way to selectively clone mutant genes from clinical material (Kouprina, unpublished). Other applications of TAR cloning include closing gaps and verifying contig assembly in draft genome sequences (Kouprina et al. 2003).

Because TAR cloning is based on in vivo recombination in yeast, the efficiency of gene targeting should depend on the length of the targeting sequences and the extent of their homology to the targeted genomic region. Recently, we determined that the minimal size of homology required for gene isolation is 60 bp (Noskov et al. 2001) and that further lengthening does not increase the yield of gene-positive clones. In this study we analyzed the effect of sequence divergence on the efficiency of gene capture. We found that divergence of up to 15% did not prevent gene isolation. Such tolerance to DNA divergence means that TAR cloning techniques can be applied to the isolation of chromosomal duplications and gene homologues from different organisms.

Materials and Methods

Yeast Strains, Cell Lines, and Transformation

We used the highly transformable Saccharomyces cerevisiae strain VL6-48N (MAT alpha, his3-Δ1, trp1l, ura3-Δ1, lys2, ade2-101, met14, cir°), which has HIS3 and URA3 deletions, for the transformation experiments (Noskov et al. 2002). Spheroplasts were prepared as described previously (Kouprina and Larionov 1999). Agarose plugs (60 µl) containing approximately 2 µg of high molecular weight DNA were prepared from normal human MRC-5 fibroblasts (American Type Culture Collection), liver cells of the Tg.AC mouse (hemi- and homozygous for the transgene), or the monochromosomal hybrid cell line UV5HL9-5B containing human chromosome 19 (Lawrence Livermore National Laboratory). Linearized TAR cloning vectors (1 µg) were added to genomic DNA (2 µg) and mixed with yeast spheroplasts. Transformants were selected on synthetic histidine minus plates.

TAR Cloning Vectors

We constructed the yeast–E. coli shuttle TAR cloning vector pMK–Alu (Alu–HIS3-CEN6–BAC–Alu) for generation of the human chromosome 19-specific library as follows. We created two Alu targeting sequences using a pair of overlapping oligonucleotides, Alu-T and Alu-D (Table 1). Alu-T contained the first 45 bases of the Alu consensus sequence (Batzer et al. 1994), and Alu-D contained the next 44 bases (positions 46 through 89). These oligonucleotides were annealed and filled in with Taq DNA polymerase. The 148-bp fragment obtained was digested with XbaI and ApaI endonucleases and ligated into polylinker sites of the basic TAR vector pVC604 (Kouprina and Larionov 1999), producing pVC604–Alu45. To generate the yeast–E. coli shuttle vector pMK–Alu, a 2559-bp Alu–CEN6–HIS3–Alu cassette in pVC604–Alu45 was PCR-amplified with the primers 604-P and 604-C (Table 1) and ligated with a 6384-bp SalI fragment of the pBeloBAC11 vector (accession No. U51113). Before use, the vector was linearized at the BamHI site between the two Alu targeting hooks.

Table 1 Primers used in this study

Seven TAR vectors with targeting hooks containing different numbers of base substitutions were generated from the SV60 vector (Noskov et al. 2001), which contains a 60-bp unique sequence of the mouse transgene SV40 promoter and a 130-bp segment of the common mouse B1 repeat. Mutations in the SV40 targeting sequence were introduced by PCR amplification of the SV40–B1 cassette with a B1-specific primer and a set of 60-bp SV40-specific primers containing mismatches. The amplified cassettes were recloned into the basic TAR vector pVC604 (Kouprina and Larionov 1999) as an ApaI–SmaI fragment, producing a set of vectors with hooks having different levels of divergence (Fig. 2). The hook in SV60–3M contained three mismatches (A-to-G transitions at positions 5, 25, and 45). The targeting hook in SV60–6M contained six mismatches (A-to-G transitions at positions 5, 15, 25, and 45; a T-to-C transition at position 35; and a G-to-A transition at position 55). The hook in SV60–12M carried 12 mismatches (A-to-G transitions at positions 5, 15, 25, 40, 45, and 60; T-to-C transitions at positions 20, 30, 35, and 50; a G-to-A transition at position 55; and a C-to-T transition at position 10). The hook in SV60–18M contained 18 mismatches (A-to-G transitions at positions 5, 15, 25, 38, 40, 45, and 60; T-to-C transitions at positions 20, 28, 30, 35, 48, 50, and 58; G-to-A transitions at positions 18 and 55; and a C-to-T transition at position 10). Vectors SV60–10ML, SV60–10MC, and SV60–10MR contained SV40 targeting hooks with the mismatches clustered in different parts of the hook. Mutations were introduced by PCR, just as they were in the hooks with scattered mismatches. The targeting hook in SV60–10ML contained 10 mismatches at the proximal end of the hook (positions 1–20), SV60–10MC contained 10 mismatches in the middle of the hook (positions 21–40), and SV60–10MR contained 10 mismatches at the distal end of the hook (positions 41–60). For each region, a transition mutation was generated at every second base pair. The inserts were sequenced as appropriate. Schemes of mutated targeting hooks are shown in Fig. 2. Prior to the experiment, the TAR vectors were linearized with SalI to release the targeting hooks.
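The mismatch positions listed for the scattered hooks can be written down and checked mechanically. The following Python sketch is an illustration only, not part of the original protocol: the names `SCATTERED_HOOKS`, `mismatch_positions`, and `expected_count` are hypothetical, and only the SV60–3M, SV60–6M, and SV60–12M position lists are encoded, copied from the description above.

```python
# Mismatch positions (1-based, within the 60-bp hook), copied from the text.
# Keys such as "A>G" denote the substitution type; all names are illustrative.
SCATTERED_HOOKS = {
    "SV60-3M": {"A>G": [5, 25, 45]},
    "SV60-6M": {"A>G": [5, 15, 25, 45], "T>C": [35], "G>A": [55]},
    "SV60-12M": {
        "A>G": [5, 15, 25, 40, 45, 60],
        "T>C": [20, 30, 35, 50],
        "G>A": [55],
        "C>T": [10],
    },
}


def mismatch_positions(hook):
    """Return the sorted mismatch positions of one hook."""
    return sorted(pos for positions in hook.values() for pos in positions)


def expected_count(name):
    """Read the mismatch count encoded in a vector name such as 'SV60-12M'."""
    return int(name.split("-")[1].rstrip("M"))


for name, hook in SCATTERED_HOOKS.items():
    positions = mismatch_positions(hook)
    # The number of listed positions should agree with the vector name.
    assert len(positions) == expected_count(name)
    print(name, positions)
```

Run on these lists, the sketch shows that the three hooks are nested: every mismatch in SV60–3M is also present in SV60–6M, and every mismatch in SV60–6M is also present in SV60–12M.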

Figure 2

Alignment of targeting sequences derived from the transgene-specific sequence with different numbers of mismatches. All sequences are 60 bp. Each mismatch is indicated by a vertical arrow.

For cloning the human and mouse HPRT genes, we constructed the TAR vectors pMHPRT–Alu and pMHPRT–B1, respectively. Both contained the same targeting hook specific to the 3′ mouse HPRT sequence. As a second hook, pMHPRT–B1 contained a common mouse B1 repeat and pMHPRT–Alu a common human Alu repeat. The specific targeting hook was based on a mouse genomic HPRT sequence in the public draft sequence (contig NW_042621; positions 15231843–15265284). The targeting sequence was PCR-amplified from mouse genomic DNA using a pair of specific primers, mHP-F and mHP-R (Table 1). The forward primer added an EcoRI restriction endonuclease site to the amplified DNA, while the reverse primer added an XhoI site. A 96-bp XhoI–EcoRI fragment was cloned into the XhoI and EcoRI sites of the pVC604 polylinker. A 189-bp ApaI–XhoI Alu repeat or a 130-bp ApaI–ApaI B1 repeat was also cloned into the polylinker. Prior to the experiments, the vectors were linearized with XhoI to release the targeting hooks.

For cloning segmental pericentromeric duplications from the human genome, we constructed the TAR vector pCHR10–PCD. The vector contained a targeting hook specific for the pericentromeric region of chromosome 10 and, as a second hook, an Alu sequence. The targeting hook was based on the public draft human genome sequence (accession No. AL133173; positions 178082–178177). The targeting sequence was PCR-amplified from genomic DNA prepared from a monochromosomal hybrid cell line with a specific primer pair, CH10-F and CH10-R (Table 1). The forward primer added an XhoI restriction endonuclease site to the amplified DNA, while the reverse primer added an EcoRI site. A 108-bp XhoI–EcoRI fragment was cloned into a pVC604 polylinker. The second hook, 45 bp of Alu consensus sequence (Batzer et al. 1994), was synthesized and cloned into the EcoRI–XbaI sites of the polylinker. Before the transformation experiments, the TAR cloning vector was linearized with EcoRI to release the targeting hooks.

Construction of the Human Chromosome 19-Specific YAC/BAC Minilibrary

Genomic DNA was prepared from the human–rodent monochromosomal hybrid cell line UV5HL9-5B, which contained human chromosome 19, and presented to yeast spheroplasts along with the BamHI-linearized TAR vector pMK–Alu. Recombination between Alu sequences in the vector and those from human chromosome fragments led to the establishment of circular YACs containing a BAC cassette. Five hundred randomly chosen His+ transformants were characterized by clamped homogeneous electrical field (CHEF) gel electrophoresis followed by probing with an Alu-specific probe (Kouprina et al. 1998a). Among the 500 transformants, 380 (76%) contained human DNA inserts. Agarose plugs prepared from the 380 transformants containing human-positive YAC/BACs were used for electroporation into E. coli cells. All YAC/BACs were successfully transferred from yeast to electrocompetent DH10B bacterial cells, as described previously (Kouprina and Larionov 1999). BAC DNAs were isolated using a standard protocol.

Sequencing of BAC Ends

We sequenced the ends of the inserts of 150 YAC/BAC clones generated with the Alu-containing TAR vector pMK–Alu using the standard vector-specific primers M13F and M13R. We compared the sequences to the draft human genome sequence of chromosome 19 at the web sites of the National Center for Biotechnology Information (http://www.ncbi.nlm.nih.gov/genome/guide/) and the University of California, Santa Cruz (http://genome.ucsc.edu/goldenPath/apr2001Traks.html) (Build 30, June 2002), using BLAST genome analysis software (http://www.ncbi.nlm.nih.gov/BLAST/blast_databases.html).

Analysis of Yeast Transformants

We used a pair of primers specific to the ζ-globin promoter region to PCR-screen transformants for the presence of the Tg.AC transgene sequence (Humble et al. 2000). These primers generate a 419-bp PCR product that is diagnostic for recombination between the TAR vector and genomic Tg.AC transgene sequences. For transformants that contained a human HPRT sequence, we used primer pair 46L plus 47R, which amplifies a 575-bp sequence of exon 2 along with its flanking introns (Noskov et al. 2002). We developed two primer pairs diagnostic for mouse HPRT exon 3, mHPRTex3-F1 plus mHPRTex3-R1 and mHPRTex3-F2 plus mHPRTex3-R2 (Table 1). Yeast genomic DNA was isolated from transformants as described previously (Kouprina and Larionov 1999) and PCR-amplified by either (a) 30 cycles of 45 s at 94°C, 15 s at 50°C, and 45 s at 68°C or (b) 45 s at 94°C and 45 s at 68°C, followed by a 7-min extension cycle at 72°C. To test for the presence of the HPRT coding region in YAC clones, we used PCR with nine pairs of primers specific for exons 1–9 (Table 1). For detection of clones with the region duplicated on chromosomes 2, 10, 16, 22, and X, we performed PCR with duplication-specific primers CDD-F and CDD-R (Table 1), which were designed from the genomic DNA sequence 399 bp downstream of the targeted region. To assign TAR isolates to the duplication on a particular chromosome, we analyzed transformants by PCR with primers developed for chromosomal duplications (Horvath et al. 2000a). Each PCR product was sequenced and compared to duplicated genomic sequences. To determine the size of YAC inserts, we prepared chromosome-size DNA from yeast transformants, exposed it to low-dose γ-rays (5 krad), separated it by CHEF gel electrophoresis, and blotted and hybridized it with a diagnostic probe.

Rescue of HPRT YAC Ends

The human HPRT YAC ends were rescued and sequenced with a standard protocol. Yeast genomic DNA from YAC-containing clones was digested with the restriction endonuclease ApaI, ligated, and transfected into E. coli. Transformants were selected on ampicillin-containing medium (50 µg/ml). Plasmids with rescued YAC ends were isolated with a standard protocol and the insert was sequenced with a T7 primer.

Results

Analysis of Products of Recombination Between Alu Targeting Hooks and Human Genomic DNA

In our previous work, we used Alu-containing TAR vectors to construct human chromosome-specific libraries (Kouprina et al. 1998a; Nihei et al. 2002). Analysis of TAR isolates revealed no bias in human chromosomal regions recovered during TAR cloning with the vectors containing Alu consensus repeats as targeting sequences. The presence of approximately 1,500,000 Alu repeat copies with different levels of divergence (up to 15–20%) in the human genome suggested that isolation of genomic fragments was the result of interaction between repeats that were not necessarily 100% homologous. To test whether targeting similar but not identical (homeologous) sequences is indeed efficient, we sequenced the ends of inserts in YAC/BACs generated from human chromosome 19 (see Materials and Methods for details). The vector used for library construction, pMK–Alu, contained two 45-bp Alu targeting sequences developed from an Alu consensus sequence (Batzer et al. 1994). Figure 1 summarizes the data from comparison of the Alu sequences from the pMK–Alu vector with the Alu sequences at the ends of inserts of 150 randomly chosen clones. Approximately 40% of the end sequences exactly matched the 45-bp Alu targeting hooks in the TAR vector. Thirty, 13, and 9% of the sequences differed from the targeting hooks by one, two, and three base substitutions, respectively. A small fraction (9%) of TAR clones arose by targeting the genomic Alus with more than three base-pair substitutions and/or single-base deletions or insertions. For half the clones, the end sequences (up to 700 bp) were confidently identified in the draft sequence of chromosome 19, which allowed us to identify the targeted genomic Alus and compare them directly with the Alu sequences in the hooks.
The comparison showed that in most cases (115/157), substitutions detected in the YAC/BAC end sequences were also present in the targeted genomic Alus, which is in agreement with our suggestion that divergence between Alu repeats does not hinder their targeting. It is impossible, however, to make any quantitative estimates using those data. The draft sequence of human chromosome 19 has been built only in part by sequencing the clones generated from UV5HL9-5B, the monochromosomal hybrid cell line that was used to construct the YAC/BAC library analyzed in this study. Most of the sequences came from analyses of the libraries constructed from different sources of genomic DNA, suggesting that some DNA changes derive from Alu sequence polymorphisms.
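The binning used in this comparison (exact match, one, two, three, or more than three substitutions relative to the hook) can be sketched as follows. This is a hedged illustration rather than the analysis actually performed: the function names are hypothetical, the short example sequences are arbitrary placeholders rather than Alu sequences, and the sketch counts substitutions only between pre-aligned sequences of equal length, ignoring insertions and deletions.

```python
from collections import Counter

BIN_LABELS = ("0", "1", "2", "3", ">3")


def count_substitutions(hook, target):
    """Count positions at which two pre-aligned, equal-length sequences differ."""
    if len(hook) != len(target):
        raise ValueError("sequences must be pre-aligned to equal length")
    return sum(1 for a, b in zip(hook.upper(), target.upper()) if a != b)


def bin_label(n_substitutions):
    """Map a substitution count to one of the bins 0, 1, 2, 3, or >3."""
    return str(n_substitutions) if n_substitutions <= 3 else ">3"


def substitution_profile(hook, targets):
    """Return the percentage of target sequences falling into each bin."""
    counts = Counter(bin_label(count_substitutions(hook, t)) for t in targets)
    return {label: 100.0 * counts[label] / len(targets) for label in BIN_LABELS}


# Arbitrary placeholder sequences (hypothetical, for illustration only).
hook = "ACGTACGTACGT"
ends = ["ACGTACGTACGT", "ACGTACGTACGA", "ACGAACGTACGA", "TCGAACGAACGA"]
print(substitution_profile(hook, ends))
```

Applied to real data, `hook` would be the 45-bp Alu targeting sequence and `ends` the aligned Alu segments recovered from the insert ends.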

Figure 1

Analysis of products of recombination between Alu targeting hooks in the vector and human genomic DNA. We used MEGA2 software to estimate the number of substitutions (Kumar et al. 2001) and Statistica for graphical representation of the results.

Effect of Target Sequence Divergence on the Efficiency of Gene Isolation

To estimate quantitatively how the degree of divergence affects gene targeting, we chose the mouse Tg.AC transgene as a target. This region was previously used to determine the minimal size of homology required for TAR cloning (Noskov et al. 2001). The Tg.AC transgenic region consists of approximately 40 copies of the transgene integrated into a unique site on chromosome 11. Each transgene unit consists of a ζ-globin promoter fused to a v-Ha-ras structural gene with a terminal simian virus 40 (SV40) polyadenylation signal and spans about 4 kb.

For isolation of the Tg.AC transgene from the mouse genome, we constructed four TAR vectors with divergent targeting sequences: SV60–3M, SV60–6M, SV60–12M, and SV60–18M (Fig. 2). Each vector contained a 130-bp B1 repeat and a 60-bp transgene-specific hook with randomly scattered mismatches. The results of the Tg.AC transgene cloning are summarized in Table 2. The yield of gene-positive clones decreased as the level of divergence in the targeting hook increased. No transgene-positive clones were found when the vector containing 18 mismatches was used, suggesting that 30% divergence prevented efficient isolation of the transgene.

Table 2 Effect of scattered targeting hook mismatches on the yield of transgene-positive clones

We also analyzed how the distribution of mismatches within the hook affects gene capture. For this purpose, we constructed three additional TAR vectors, SV60–10ML, SV60–10MC, and SV60–10MR, with mismatches clustered, respectively, in the 5′, internal, and 3′ parts of the hook (Fig. 2). Each construct contained 10 mismatches that corresponded to about 17% divergence between the hook and the genomic target. The results are summarized in Table 3. The yield of gene-positive clones with these constructs was reduced about three to six times compared with the control construct (SV60–0M), whose hook was 100% homologous. The reduction was not as dramatic as it was with the constructs containing 12 randomly scattered mismatches (Table 2), demonstrating that not only the number of mismatches, but also their distribution, affects the gene targeting efficiency. Thus, our experiments on mouse transgene cloning demonstrated that divergence of up to 15% in a small (60-bp) targeting hook does not prevent transgene isolation by TAR in yeast.
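The divergence values quoted for the hooks follow directly from the number of mismatches and the 60-bp hook length. A minimal sketch of that arithmetic is given below; the names `MISMATCH_COUNTS` and `percent_divergence` are hypothetical, ASCII hyphens replace the dashes in the vector names, and the mismatch counts are taken from the vector descriptions in this paper.

```python
HOOK_LENGTH_BP = 60  # length of the transgene-specific targeting hook

# Number of mismatches per vector, as stated in the text.
MISMATCH_COUNTS = {
    "SV60-0M": 0,     # control, 100% homologous hook
    "SV60-3M": 3,
    "SV60-6M": 6,
    "SV60-12M": 12,
    "SV60-18M": 18,
    "SV60-10ML": 10,  # clustered mismatches
    "SV60-10MC": 10,
    "SV60-10MR": 10,
}


def percent_divergence(mismatches, hook_length=HOOK_LENGTH_BP):
    """Divergence as the percentage of hook positions carrying a mismatch."""
    return 100.0 * mismatches / hook_length


for vector, n in MISMATCH_COUNTS.items():
    print(f"{vector:10s} {n:2d} mismatches  {percent_divergence(n):5.1f}% divergence")
```

With these inputs, 18 mismatches correspond to 30% divergence and 10 mismatches to about 17%, as stated above.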

Table 3 Effect of clustered targeting hook mismatches on the yield of transgene-positive clones

Isolation of Pericentromeric Duplications

The human genome contains a large number of duplicated segments, ranging from a few to a hundred kilobases. Repetitive complexity of the segments leads to sequence misassignment and misassembly. Duplication-rich pericentromeres are particularly problematic in terms of genome assembly (Jackson et al. 1999; Horvath et al. 2000a, b; Bailey et al. 2001). Duplications on different chromosomes consist of similar but not identical sequences with a level of divergence <10% (Bailey et al. 2001), so these regions can be isolated by in vivo recombination in yeast, which, as shown, tolerates that level of DNA divergence. In the present work we chose for targeting a duplication found in five human chromosomes (2, 10, 16, 22, and X) with >93% homology (Horvath et al. 2000a). One of the targeting sequences in the TAR vector pCHR10–PCD was developed from the pericentromeric duplicated segment from human chromosome 10q11. This 99-bp sequence is 92, 95, 96, and 98% homologous with the corresponding region on chromosomes 22, 16, 2, and X, respectively. We chose a 45-bp Alu sequence as a second hook. TAR cloning was carried out as described previously (Kouprina and Larionov 1999). Analysis of 840 transformants revealed 21 positive clones. Thus, the yield was about 2.5%, i.e., similar to that observed during cloning of the multicopy mouse transgene. CHEF gel electrophoresis analysis showed that the size of the YAC inserts varied from 50 to 250 kb. Alu profiles revealed at least six types of DNA inserts. To assign each TAR isolate to a particular chromosome, we analyzed DNAs isolated from the clones by PCR using primers that amplify paralogous chromosomal duplications (Horvath et al. 2000a). After sequencing of the PCR products, some of the TAR isolates were mapped to a specific chromosome (Table 4). Thus, our results showed that TAR cloning was suitable for isolation and characterization of chromosomal duplications.
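The yield and divergence figures in this section can be reproduced with a few lines of arithmetic. The sketch below is illustrative only: the variable names are hypothetical, the homology values are paired with chromosomes in the order in which they are listed above (an assumption), and the 15% bound is the tolerance reported for the 60-bp transgene hook earlier in this study.

```python
# Counts reported for the pericentromeric duplication experiment.
positive_clones = 21
transformants_screened = 840
yield_percent = 100.0 * positive_clones / transformants_screened

# Percent homology of the chromosome 10 hook to each paralogous region,
# paired with chromosomes in the order listed in the text (assumption).
HOMOLOGY_TO_CHR10_HOOK = {"22": 92, "16": 95, "2": 96, "X": 98}
TOLERATED_DIVERGENCE_PERCENT = 15  # upper bound found with the 60-bp hook

divergence_percent = {
    chromosome: 100 - homology
    for chromosome, homology in HOMOLOGY_TO_CHR10_HOOK.items()
}

print(f"yield: {yield_percent:.1f}% positive clones")
for chromosome, divergence in divergence_percent.items():
    within = divergence <= TOLERATED_DIVERGENCE_PERCENT
    print(f"chromosome {chromosome}: {divergence}% divergence, within tolerance: {within}")
```

Under these assumptions, the most divergent paralogue (chromosome 22, 8% divergence) still falls below the 15% bound.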

Table 4 Characterization of TAR isolates containing pericentromeric duplications

Isolation of a Gene Homologue By TAR Cloning

To examine whether TAR cloning can be applied to the isolation of gene homologues, we chose the human HPRT gene as a target. We constructed the TAR vector pMHPRT–Alu containing the 96-bp unique sequence from exon 9 of mouse HPRT as a specific hook and a common human Alu repeat as the second hook (see Materials and Methods). The targeting sequence had 88% homology to exon 9 of human HPRT (Fig. 3A). For the control (i.e., isolation of mouse HPRT with a homologous targeting sequence), we used the pMHPRT–B1 TAR vector, which contained the same mouse-specific targeting hook and a mouse B1 repeat as the second hook. pMHPRT–Alu was transfected along with human genomic DNA, and pMHPRT–B1 was transfected along with mouse genomic DNA, into yeast spheroplasts. To identify HPRT-positive clones among transformants, we used PCR with a set of primers specific for mouse exon 3 (Table 1) and human exon 2 (Noskov et al. 2002). The yield of clones with human HPRT and mouse HPRT was almost the same (Fig. 3B), demonstrating that 12% divergence between the 96-bp mouse HPRT targeting sequence and the targeted human HPRT region did not significantly affect isolation of the human HPRT gene. Eight mouse HPRT-positive clones were identified and further examined for the presence of all nine exons by a PCR assay with exon-specific primers (Table 1). Because all isolates were positive for the exon sequences, the TAR isolates contained the entire mouse HPRT genomic sequence.

Figure 3

Isolation of the human HPRT gene by a TAR vector containing a mouse HPRT-specific hook. A Comparison of the 96-bp targeting mouse HPRT-specific sequence with the 93-bp homologous region of human HPRT. Differences between the two sequences (shaded areas) derive from seven mismatches (four transitions and three transversions) and five single-base deletions. The level of homology is 88%. B Yield of HPRT-positive clones from mouse and human genomic DNAs.
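The 88% figure can be recovered from the differences listed in the legend, assuming that each mismatch and each single-base deletion counts as one differing position over the 96-bp mouse hook (this counting convention is an assumption, not stated in the text):

```latex
\[
\text{homology} \;=\; \frac{96 - (7 + 5)}{96} \times 100\%
\;=\; \frac{84}{96} \times 100\%
\;=\; 87.5\% \;\approx\; 88\%
\]
```

Under the same convention, the divergence is 12/96 = 12.5%, in line with the roughly 12% quoted in the text.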

To analyze the products of homologous recombination between the mouse hook and the human genomic HPRT sequence, we rescued the ends of human HPRT YACs and sequenced them (see Materials and Methods). Most of the end sequences analyzed (9/12) exactly matched the 96-bp HPRT targeting hook in the TAR vector. One sequence represented mouse HPRT, and the other two were hybrids between human- and mouse-specific sequences. Human-specific sequences were at the proximal end and mouse sequences were at the distal end, as would be predicted by a crossover with no detectable gene conversion.

Discussion

The mechanisms of mitotic and meiotic recombination in yeast have been extensively studied during the last two decades (Paques and Haber 1999). It is well documented that both processes are greatly inhibited by divergence between recombination substrates in yeast, and much of the inhibition is due to antirecombination activity of the mismatch repair system (Selva et al. 1995; Datta et al. 1997). At the same time, little has been published on the mechanism of TAR that underlies the TAR cloning technique. This recombination differs from recombination in mitotic cells in the following features: (1) the frequency of recombination between the transforming DNAs is much higher (Ma et al. 1987; Larionov et al. 1994; Raymond et al. 2002), (2) the minimal length of homology sufficient for recombination is smaller (Hua et al. 1997; Noskov et al. 2001), and (3) recombination is not as sensitive to DNA divergence (Pompon and Nicolas 1989; Mezard et al. 1992; Mezard and Nicolas 1994; Larionov et al. 1994; Priebe et al. 1994). The differences are presumably explained by structural differences in the recombination substrates. While recombination between chromosomes includes interaction between chromatinized targets, TAR is based on interaction between naked DNA molecules.

Tolerance to DNA divergence during TAR in yeast suggests that TAR cloning could be used for isolation of similar but not identical genomic regions. The present work represents a systematic study of the effect of DNA divergence on the efficiency of gene capture by TAR cloning. Using a set of recombination substrates with different numbers of mismatches, we showed that DNA divergence of up to 15% did not significantly affect the efficiency of gene capture. This observation greatly expands the potential of the TAR cloning technique for analysis of complex genomes and comparative genomics.

We have demonstrated that TAR cloning is applicable to the isolation of pericentromeric duplications, which share a high degree of sequence similarity and are particularly problematic in terms of contig assembly (Horvath et al. 2000b; Bailey et al. 2001). It is worth noting that paralogous duplicated segments are common in mammalian chromosomes (Jackson et al. 1999; Bailey et al. 2001), and as a result they represent a significant impediment to complete sequencing of the human genome. Many of the euchromatic regions of human chromosomes that are littered with highly homologous but not identical duplications are implicated in disease-causing recurrent chromosomal structural rearrangements (Mazzarella and Schlessinger 1998). Thus, a specialized technique such as TAR cloning would be essential for assembling these exceptional regions. Among other applications, the TAR cloning system can be used for isolation of duplications of subtelomeric regions, which remain relatively poorly characterized at the molecular level. Furthermore, the method can be extended to the isolation of gene homologues. We demonstrated this by efficiently and accurately isolating the human HPRT genomic region using a TAR vector containing a mouse HPRT targeting sequence that was about 88% homologous to the targeted region. Taking into account that such a level of divergence is characteristic for mammalian gene homologues (Makalowski and Boguski 1998), most orthologous chromosomal regions can be selectively cloned by in vivo recombination in yeast using the same targeting hook for further structural, functional, and comparative analyses.

In summary, we have demonstrated in this study that the TAR cloning system provides a powerful tool for analyzing chromosomal duplications and characterizing gene homologues from different organisms.