Introduction

Transgenic animals are frequently generated by random integration of foreign DNA. Except for constructs based on retroviruses, lentiviruses and transposons, the foreign DNA fragments often integrate into the host genome as tandem repeats encompassing several copies of the foreign gene. Transgene expression can be affected by the host genome at the insertion site but also by the number of integrated copies. Numerous studies have focused on the use of insulators and regulatory elements to prevent transgene silencing and to obtain reliable expression that is independent of the integration site and correlated with the number of integrated copies (Giraldo et al. 2003; Houdebine 2007, 2009; Montoliu et al. 2009).

In our previous experiments (Saidi et al. 2007), we generated several transgenic mouse lines by microinjection of a series of large (30–145 kb, Fig. 1) porcine genomic DNA fragments into fertilized oocytes. By using large DNA fragments, we expected reliable expression of the pig Whey Acidic Protein (pWAP) gene, one of the genes present in the pig genomic DNA insert. The pWAP gene expression varied more than anticipated among the different transgenic mouse lines for each DNA insert. This effect was unrelated to the number of integrated copies. We thus decided to investigate whether this variability could be explained by the structure of the integrated DNA at the insertion site in the host genome.

Fig. 1

Location of specific TAIL-PCR primers in the various pig DNA inserts used to generate transgenic mice. Three pig DNA fragments (30, 130, 145 kb) were recovered from the BAC 344H5 by digestion with restriction enzymes (NotI, NruI or AscI) as previously described (Saidi et al. 2007). The position of the 5′P and 3′OH borders of the inserts is given relative to the transcription start site of the pig WAP gene. All the inserts harboured the same 3′OH border. Thus, the same set of primers (p1-3, p2-3, p3-3) was used for the study of the 3′OH border flanking regions. The lower panel presents the sets of primers used for the 145 kb insert. All other sequences are available upon request. Numbers indicate the position of the nucleotides at the border of the insert. Numbering begins at the 5′P border of the 145 kb fragment

To reach this goal, we attempted to identify the transgene insertion site within the mouse genome. Several protocols were implemented, with unequal success. The TAIL-PCR method (thermal asymmetric interlaced PCR, Liu et al. 1995; Liu and Chen 2007; Tadege et al. 2008) allowed us (1) to analyze transgene/genomic borders and internal concatamer junctions for eleven transgenic lines, (2) to obtain sequence information for seven borders, (3) to place three transgenes in the mouse genome, and (4) to obtain sequence data for seven transgene junctions in concatamers.

Materials and methods

Production of transgenic mouse lines

Transgenic mouse lines (C57-Bl6/CBA F1) were obtained by microinjection of long pig DNA fragments (Fig. 1) as presented in our previous paper (Saidi et al. 2007). Transgenic mice were heterozygous for the integrated transgene. All the procedures for the preparation of large DNA inserts from BACs followed the recommendations described by VanKeuren et al. (2009). Large DNA inserts (from 30 to 145 kb) were prepared by digestion of BAC DNA using specific restriction enzymes, followed by pulsed-field gel electrophoresis (FIGE) on a 1% agarose gel in 0.5× TBE buffer. DNA was recovered from the gel by gelase digestion (Tebu-bio, Le Perray en Yvelines, France) or electroelution (BioTrap, Whatman Schleicher & Schüell, Versailles, France). The product was dialysed against microinjection buffer (Tris pH 7.5 10 mM; EDTA 0.1 mM; NaCl 100 mM; spermine 30 μM; spermidine 70 μM) and used for microinjection within 2 weeks. Integrity of the DNA was assessed by FIGE. Microinjection was performed when the expected DNA fragment size was observed on the gel without any detectable DNA fragmentation.

Extraction of genomic DNA from tail tip

Genomic DNA was prepared from tail tips by proteinase K digestion. Biopsies were incubated overnight at 37°C or for 2 h at 56°C in 500 μl of proteinase K extraction buffer (Tris pH 8 5 mM; EDTA 5 mM; 1% SDS, sodium acetate 300 mM; 100 μg proteinase K added immediately before use), then treated with 1 volume of phenol:chloroform:isoamyl alcohol. DNA was recovered after isopropanol precipitation (0.6 volume).

Thermal asymmetric interlaced PCR (TAIL-PCR)

The protocol was adapted from experiments carried out with transgenic plants (Liu et al. 1995; Tadege et al. 2008) using tail tip DNA as template (Saidi et al. 2007) and sets of primers (Table 1). Primary TAIL-PCR reactions (20 μl) contained 25 ng of genomic DNA, 1× PCR buffer, 200 μM dNTP mix, 300 nM specific primer 1, and an arbitrary degenerate ADn primer (3 μM) with 1.5 units of Taq polymerase. Significant results were obtained with high quality Taq polymerase (Takara ExTaq, Lonza, Courtaboeuf, France) and good quality genomic DNA preparations. The sequences of five distinct degenerate primers (AD1, AD2, AD3, AD5, AD6) were those already designed by others (Liu et al. 1995; Tadege et al. 2008). These sequences are given in Table 1. Specific primers were designed according to the expected position of the borders of the microinjected DNA fragment. For each microinjected DNA fragment, several sets of primers matching more or less internal positions were tested to anticipate the possibility of deletions of the extremities of the transgene (not shown). Sets of primers giving the best results are given in Fig. 1. Sequences of specific primers are available upon request. Each secondary 20 μl TAIL-PCR was performed with the specific primer 2, the same degenerate AD primer as in the primary reaction and 2 μl of the 50-fold diluted PCR product of the primary reaction (final dilution = 500). Each tertiary TAIL-PCR was performed with the specific primer 3, the same degenerate AD primer as in the secondary reaction and the 50-fold diluted PCR product of the secondary reaction. The concentrations of all components of the PCR reactions were similar in primary, secondary and tertiary TAIL-PCR. The thermal conditions for TAIL-PCR are given in Table 2. The final products of the third amplification were analyzed on 1% agarose gels in 1× TBE. The DNA fragments longer than 200 base pairs were isolated by column purification from the agarose gel (GenElute, Sigma) or directly from the tertiary PCR reaction (MSB Spin PCRapace, Invitek, Berlin, Germany). Sequencing was performed (MWG Eurofins service, Ebersberg, Germany) either directly on the purified product, or after cloning in the pGEM-T Easy cloning vector (Promega, Courtaboeuf, France). The specific primer used in the third round of PCR was used as sequencing primer.
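The dilution arithmetic of the nested rounds can be checked with a short calculation. The sketch below is purely illustrative: the function name is ours, and it only reproduces the stated case of 2 μl of 50-fold diluted product transferred into a 20 μl reaction.

```python
from fractions import Fraction


def final_dilution(pre_dilution, transfer_ul, reaction_ul):
    """Overall fold-dilution of the previous round's product in the next reaction.

    pre_dilution : fold-dilution applied before the transfer (50 in the text)
    transfer_ul  : volume of diluted product added to the next reaction (2 ul)
    reaction_ul  : final volume of the next reaction (20 ul)
    """
    return pre_dilution * Fraction(reaction_ul, transfer_ul)


# Secondary TAIL-PCR as described in the text: 50-fold x (20 / 2) = 500-fold
print(final_dilution(50, 2, 20))  # 500
```

The same function applies to the tertiary round only under the assumption that the transferred volume is again 2 μl, which the protocol does not state explicitly.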

Table 1 Sequence of the arbitrary degenerate primers
Table 2 TAIL-PCR programs

To confirm the data, PCR amplification was performed on the transgenic genomic DNA as template, using primers specific for the pig DNA insert and for the identified flanking region. Confirmation was achieved when only one DNA fragment was amplified with the expected size and sequence.

Sequence analysis

Homology searches were performed by BLAT analysis using the Ensembl web site (http://www.ensembl.org/index.html). The position of the integration site within the mouse genome was deduced from the right and left transgene flanking sequences.
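As an illustration of how two flanking hits can be compared, the following minimal sketch classifies a pair of mapped flanks. It is not the procedure used in this work: the `FlankHit` structure, the `compare_flanks` function, the 1,000 bp threshold and all coordinates are hypothetical and serve only to make the reasoning explicit.

```python
from dataclasses import dataclass


@dataclass
class FlankHit:
    chrom: str  # chromosome reported for the flanking sequence
    pos: int    # coordinate of the host base adjacent to the transgene (hypothetical)


def compare_flanks(left, right, max_gap=1000):
    """Compare the host positions flanking the two transgene borders.

    Returns (status, gap), where gap is the number of host bases lying
    between the two junction bases (None if the chromosomes differ).
    max_gap is an arbitrary illustrative threshold, not a published value.
    """
    if left.chrom != right.chrom:
        return ("different chromosomes", None)
    gap = abs(right.pos - left.pos) - 1
    if gap <= max_gap:
        return ("same locus", gap)
    return ("same chromosome, distant", gap)


# Hypothetical coordinates: junction bases 94 positions apart enclose 93 host bases
print(compare_flanks(FlankHit("7", 1000), FlankHit("7", 1094)))
```

A small gap is compatible with a short deletion of host sequence at the insertion site, whereas a large gap on the same chromosome calls for further interpretation.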

Evaluation of transgene copy number

The integrated copy number was evaluated as previously described (Saidi et al. 2007) by real time PCR using the 2^(−ΔΔCt) method with the purified BAC 344H5 as standard. The mouse β2-microglobulin gene was used as reference gene. For each line, the copy number was determined using different sets of primers specific for the injected pig DNA insert: one set located at the 5′P border, one at the 3′OH border and one on the pig WAP gene. All sequences are available upon request.
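For readers unfamiliar with the calculation, a generic form of the 2^(−ΔΔCt) estimate is sketched below. It assumes amplification efficiencies close to 100% and a calibrator sample of known copy number. The function name and all Ct values are hypothetical, and the exact way the BAC standard entered the calculation is described in Saidi et al. (2007), not here.

```python
def relative_copy_number(ct_target, ct_ref, ct_target_cal, ct_ref_cal,
                         calibrator_copies=1.0):
    """Generic 2^(-ddCt) estimate of transgene copy number.

    ct_target, ct_ref         : Ct of the transgene and reference gene in the sample
    ct_target_cal, ct_ref_cal : the same two Ct values in the calibrator
    calibrator_copies         : known copy number of the calibrator (assumed)
    """
    d_ct_sample = ct_target - ct_ref
    d_ct_calibrator = ct_target_cal - ct_ref_cal
    dd_ct = d_ct_sample - d_ct_calibrator
    return calibrator_copies * 2 ** (-dd_ct)


# Hypothetical Ct values: the target appears 2 cycles earlier than in the calibrator
print(relative_copy_number(22.0, 24.0, 24.0, 24.0))  # 4.0
```

Comparing the estimates obtained with the three primer sets (5′P border, 3′OH border, pig WAP gene) is what allows partially deleted copies to be suspected.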

Results

The TAIL-PCR method consists of three rounds of nested hemispecific PCR amplifications (Fig. 2a). Each reaction includes a long, high-Tm primer complementary to the border of the transgene and a degenerate, low-Tm primer. Non-specific products were eliminated by two successive PCRs using nested primers (Fig. 2b, compare lanes I, II, and III). The TAIL-PCR results in the amplification of fragments overlapping the border of the transgene and the adjacent host genome (Liu and Whittier 1995; Tadege et al. 2008).
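Because each nested primer lies closer to the border than the previous one, a specific product is expected to shorten from one round to the next by roughly the distance between successive primers. The sketch below expresses this criterion. The function name, the tolerance and the band sizes are hypothetical, and band sizes read from an agarose gel are only approximate.

```python
def is_specific_series(band_sizes, primer_offsets, tolerance=20):
    """Check whether bands shrink as expected across nested PCR rounds.

    band_sizes     : [size_I, size_II, size_III] in bp, one band per round
    primer_offsets : [d12, d23], distances in bp between successive nested primers
    tolerance      : accepted deviation in bp (arbitrary illustrative value)
    """
    for i, offset in enumerate(primer_offsets):
        observed_shift = band_sizes[i] - band_sizes[i + 1]
        if abs(observed_shift - offset) > tolerance:
            return False
    return True


# Hypothetical example: primers spaced 80 bp and 60 bp apart
print(is_specific_series([900, 820, 760], [80, 60]))  # True
print(is_specific_series([900, 900, 900], [80, 60]))  # False
```

A band of constant size across the three rounds is more likely to arise from the degenerate primer alone.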

Fig. 2

Agarose gel electrophoresis analysis of the TAIL-PCR products. a Different steps of the TAIL-PCR. b Typical electrophoresis analysis of the products amplified after the different rounds of PCR in the line 130 kb-535. Amplified DNA fragments showing a decreasing size in the successive PCR amplifications were presumed to be specific for the border of the transgene. The identity of the best set of primers (specific and degenerate) varied with the transgenic line. A small number of specific fragments were observed (labelled by white arrows). These DNA fragments were recovered from the gel and sequenced. c Electrophoresis of amplified products after two rounds of PCR in the line 145 kb-28. No specific fragment was amplified after the second PCR. The numbers I, II, and III refer to the round of PCR. ADn arbitrary degenerate primer n, L kb ladder

To anticipate possible deletions of the transgene borders, several sets of primers located more than 150 base pairs (bp) upstream of the theoretical border were tested. The length of the fragments recovered in the TAIL-PCR experiments ranged between 200 and 2,000 bp, as is commonly reported for this method. However, internal sets of primers gave no better amplification than the sets of primers located near the border.

Eleven transgenic lines were analyzed. A typical analysis of the amplified fragments by gel electrophoresis is given in Fig. 2. Analyses of the sequences are summarized in Table 3 and Figs. 3 and 4. The position of the transgene within the genome was successfully identified in only three lines (145 kb-29, 145 kb-30, 30 kb-110, Fig. 3). In two of these lines (145 kb-29 and 145 kb-30), the 3′OH and the 5′P borders were flanked by regions that both belong to the same mouse genome locus (in the chromosome 7 D1 band and the chromosome 5 E1 band, respectively), which confirms the transgene position in the mouse genome. In the line 145 kb-29, the integration of the transgene coincided with a 93 bp deletion in the mouse chromosome 7 harbouring the transgene, without any modification in the second allele (not shown). In the line 145 kb-30, the 3′OH and 5′P borders were located in chromosome 5 in a head-to-tail orientation but separated by more than 60 kb (Fig. 3). In this line, only one copy of the transgene was integrated. We can hypothesize that the transgene was split into two fragments integrated in a reverse orientation. We could not confirm this hypothesis, because the 60 kb region lying between the two hypothetical fragments is too short for the detection of two distinctly labelled points in FISH experiments. Alternatively, in the mouse used for producing this transgenic line, the sequence of the locus in chromosome 5 may have been in an inverted orientation compared with the published mouse genome. No further identification of such a chromosome reorganization was undertaken in this line. In the line 30 kb-110, the 3′OH border of the transgene was flanked by a region that belongs to mouse chromosome 15 (B3.3 band). We were unable to characterize the region flanking the 5′P border of the transgene, probably because the 5′P border of the 30 kb DNA insert is characterized by a GC-rich sequence spanning more than 1 kb, which may reduce the chance of performing specific PCRs successfully.

Table 3 Summary of the analysis of genomic DNA sequences flanking pig DNA transgenes
Fig. 3

Detailed description of the transgene borders and their flanking sequences in lines 145 kb-29, 145 kb-30, and 30 kb-110. The 5′P or 3′OH border of the transgene is represented by a grey box, with the set of specific primers that were used for the characterization (horizontal black arrows). Boxes filled with dots represent internal pig DNA fragments. Numbers indicate their position in the 145 kb pig DNA insert. Number 1 corresponds to the 5′P border, and 145,000 to the 3′OH border. Numbers in grey boxes indicate the position of the mouse chromosome integration site as given by the Ensembl mouse database

Fig. 4

Sequence of the junctions between tandemly integrated copies of the transgene. The first line represents the expected sequence, reconstituted by the fusion of restricted DNA inserts. Other lines underneath indicate the actually determined sequence. Dotted lines represent missing nucleotides in the expected sequence. The numbers indicate the position of the first nucleotide of the linked region. Arrows below each sequence indicate the orientation relative to the pig WAP gene in the insert

In the other lines, the position of the transgene in the genome was not identified. In some of the lines (145 kb-28 and 145 kb-863) no specific DNA bands were observed by electrophoresis (Fig. 2c). In the line 130 kb-19, the region flanking the 3′OH border of the transgene could not be assigned to any location in the mouse genome. One hypothesis is that the transgene was inserted in repeated sequences, preventing its positioning in the mouse genome.

Surprisingly, the junctions between the DNA inserts could not be characterized in all cases. Expected 5′P to 3′OH junctions (head-to-tail) were found with almost the exact sequence in only three lines (145 kb-815; 145 kb-820; 130 kb-23, Fig. 4) but not in the other lines harbouring several copies of the transgene. Only in line 145 kb-820 was the NotI site reconstituted. Moreover, we found that internal regions of the foreign DNA insert were linked to the 5′P or the 3′OH borders of the transgene (lines 145 kb-29; 145 kb-809; 130 kb-19; 130 kb-23; 130 kb-535) as indicated in Table 3 and Figs. 3 and 4. Thus, rearrangements of the DNA insert occurred, with multiple recombination events before or at the time of integration into the host genome. These rearrangements are likely an obstacle to the characterization by TAIL-PCR of genome-transgene or copy–copy junctions in most lines.

We then tried to evaluate the integrity of the inner part of the transgene by amplification using specific primers. This led us to identify putative rearrangements in the insert such as those presented for the line 145 kb-29 (Fig. 3). In this line, a 344 bp long DNA fragment encompassing the expected 3′OH border of the transgene was inserted in the 90 973–90 976 region of the pig DNA insert. The actual border of the transgene (chromosome 7-transgene junction) was 432 bp farther, corresponding to position 91 408 of the pig DNA insert. Rearrangements are therefore not specific to one or the other microinjected DNA fragment. On the contrary, they are observed in several mouse transgenic lines whatever the length of the microinjected pig DNA insert. To summarize, our TAIL-PCR analysis shows evidence of unexpected rearrangements in five transgenic lines (145 kb-29, 145 kb-809, 130 kb-19, 130 kb-23, 130 kb-535) and insert concatenation with minor or no deletion in three lines (145 kb-815, 145 kb-820, 130 kb-23), while in two lines no data could be generated (145 kb-28, 145 kb-863).

Discussion

The characterization of genome integration sites is mainly based upon the search for host genome sequences adjacent to each extremity of the transgene. Several commonly employed methods consist of a set of multiple PCRs using primers anchored on the theoretical extremities of the transgene. These methods have the advantage of being rapid and easy to perform. However, deletions of the extremities of the microinjected DNA fragment, as already noted by previous authors (Chen et al. 1995), are an obstacle to the success of these techniques. Nevertheless, among the various methods that we tested (not shown), the TAIL-PCR gave us the best results.

TAIL-PCR allowed us to characterize transgene integration sites in the mouse genome in several but not all of the transgenic lines. Other techniques, such as a recently published refined and improved TAIL-PCR technique (Liu and Chen 2007), should help us to identify the integration sites more easily. In our hands, the inverse PCR based technique (Ochman et al. 1988) was inefficient. In order to improve the efficiency of inverse PCR, others have suggested digesting the transgenic mouse DNA with the enzyme that was used for the preparation of the insert to be injected (Liang et al. 2008). Theoretically, this procedure has the advantage of eliminating head-to-tail, head-to-head or tail-to-tail junctions. In lines derived from the microinjection of the 145 kb fragment, the NotI enzyme was used. In the line 145 kb-820, the NotI site was found between the two copies of the transgene (Fig. 4). However, in the present work, digestion of the DNA with NotI did not eliminate the tandemly integrated copies (not shown). One hypothesis is that the action of the NotI enzyme was hampered by DNA methylation, which can occur after insertion in the host genome. DNA methylation analysis should be performed to test this hypothesis.

The most surprising finding of our work concerns the complex rearrangements between regions of the pig DNA insert. Moreover, the frequency of these rearrangements was unexpectedly high, since it concerned five of the eleven lines analyzed. One hypothesis is that, in the present study, the long DNA insert was fragmented when microinjection was performed, despite the care taken to avoid this artefact. However, fragmentation due to the mechanical shearing of long DNA fragments is probably not the only cause of the rearrangements. It has been postulated that random DNA integration into the host genome occurs through DNA double-strand break repair by illegitimate integration via micro-homologies or non-homologous end joining (NHEJ) (Bishop 1996; Würtele et al. 2003; Kamisugi et al. 2006). Rearrangements were also observed when DNA inserts a few kilobases long were microinjected in early mouse embryos (Chen et al. 1995; Pawlik et al. 1995). Several studies were designed to determine whether this phenomenon can be influenced by the sequence of the injected DNA insert. The addition of SINE elements (short interspersed elements) at both sides of foreign DNA increased the efficiency of integration in the mouse embryo genome, and this was probably due to the enhancement of homologous recombination (Kang et al. 1999). In the present work, the long pig DNA fragments encompass series of repetitive sequence regions such as SINEs or LINEs. Our hypothesis is that the presence of these elements along the foreign DNA favoured the multiple rearrangements at the integration site or even before integration. The occurrence of such rearrangements can hardly be anticipated. Given the length of the pig DNA inserts, we can suspect that other internal rearrangements occurred. Thus, the internal structure of the transgene should be investigated through complementary methods. The Southern blot technique could be helpful for this determination. However, this technique is laborious, especially in the case of DNA inserts hundreds of kilobases long. Moreover, it does not allow the identification of small rearrangements such as those we observed at the 3′OH border of the transgene in the line 145 kb-29.

It is increasingly accepted that microinjection of long DNA fragments is a valuable technique to generate transgenic animals that reliably express the genes included, because important regulatory regions are expected to be present (Giraldo et al. 2003; VanKeuren et al. 2009). Previous studies have shown that deletions occur frequently in large transgenes (Chandler et al. 2007). We also reported this event previously, and we eliminated from our study all the transgenic lines harbouring deleted copies. More complex rearrangements involving large BAC transgenes have already been reported (Abrahams et al. 2003). The TAIL-PCR method allowed us to characterize such rearrangements. Notably, the present work addresses the difficulty of cloning integration sites of BAC transgenes. In the literature, the difficulties and the failures to characterize integration sites are poorly documented. Indeed, previously published papers report successful cloning of integration sites (Liang et al. 2008; Mehta et al. 2009). The difference between our experiments and the previous ones could result from the multiple rearrangements of our transgenes, which generate unexpected borders. Moreover, we noticed that the successful reported experiments are frequently based upon transgenesis with transposons or retroviruses, which integrate into the host genome with minor rearrangements, in contrast to BAC transgenes. Finally, the present paper stresses the frequency of such complex rearrangements. This should be carefully considered when transgenic animals are produced with large genomic DNA fragments.