Introduction

In recent years the domestic dog (Canis familiaris) has emerged as a powerful model organism for comparative biomedical research. In particular, the pathophysiological similarities between numerous human and dog diseases place genomic studies of the dog in a high-profile role for the advancement of our understanding of genetic diseases. Furthermore, development of a detailed knowledge of genetic abnormalities associated with numerous diseases in the domestic dog has the potential to aid in diagnosis, prognosis, and the choice of therapy for both dogs and humans. The development of molecular cytogenetic reagents and resources for analysis of the canine genome has advanced rapidly over the past decade, expanding our knowledge of canine genome organization through the creation of standardized chromosome nomenclature as well as comparative cytogenetic maps and integrated genome maps (Breen 2008). A major new component in the canine genomics ‘toolbox’ is the development of two cytogenetically validated, genome-assembly integrated bacterial artificial chromosome (BAC) panels, the first at 10 Mb resolution (Thomas et al. 2007) and the second at 1 Mb resolution (Thomas et al. 2008). These BAC panels provide the opportunity to perform canine genome-integrated array-based comparative genome hybridization (aCGH) as well as appositionally based multicolor fluorescence in-situ hybridization (FISH) analysis of canine cells (Thomas et al. 2008). These complementary approaches allow for the identification of aberrations in genome organization, through the analysis of genome-wide DNA copy number changes and structural alterations. Such aberrations to the genomic landscape are particularly evident in cancers, where they have been recorded for a plethora of human malignancies (http://cgap.nci.nih.gov/Chromosomes/Mitelman) and several canine malignancies (Breen 2008). 
Ultimately, identification of the genes within a defined area of recurrent chromosome abnormality provides a framework for further investigation, potentially leading to the development of targeted therapies for disease.

Advances in bioinformatic analysis in recent years have greatly enhanced our ability to characterize chromosomal abnormalities with new levels of sophistication. The combination of bioinformatics (in-silico analysis) and molecular cytogenetics (in-situ analysis) has been used previously to refine karyotypes and gene comparisons of several species; for example, the synteny block arrangements of the white-cheeked gibbon and the horse, both with respect to the human genome, were defined using paired-end sequence analysis and FISH analysis (Roberto et al. 2007, Chowdhary & Raudsepp 2008), while Zoo-FISH mapping of the cat genome in conjunction with comparative sequence analysis produced the current cat–human and cat–dog synteny block arrangements (Murphy et al. 2007).

Mammalian sex chromosomes are not homologous for the majority of their length and the pseudoautosomal region (PAR) is the site of obligatory pairing and recombination between X and Y in male meiosis (Perry et al. 2001). As such the PAR provides a naturally occurring and ubiquitous opportunity for demonstrating the strength of using a combined cytogenetic and bioinformatic approach to resolving the extent of DNA copy number changes and refining the corresponding chromosome breakpoint. In humans, both ends of the X and Y chromosomes contain homologous regions that pair regularly during male meiosis and undergo recombination. The larger region of homology, PAR1, spans 2.6 Mb and the smaller region, PAR2, extends approximately 320 kb (Strachan & Andrew 2004). Previous comparisons between human/chimp, mouse, and horse have led to the hypothesis that the PAR varies in size and gene content across mammals (Perry et al. 2001, Mangs & Morris 2007, Raudsepp & Chowdhary 2008). In contrast to humans, mice have a much smaller PAR (approximately 700 kb) in which recombination occurs regularly between X and Y (Perry et al. 2001). Along with human/chimp and mouse, the equine PAR is now well characterized, spanning approximately 1.8 Mb (30% smaller than in humans) and, interestingly, containing orthologues of genes from the 1 Mb region on human chromosome X beyond PAR1 (Raudsepp & Chowdhary 2008). There appears to be similarity across mammalian species with human PAR1, but no equivalent to human PAR2 has been seen in other mammals, including primate species (Strachan & Andrew 2004, Mangs & Morris 2007).

Major differences exist between humans, mice, and other mammals in the position of the PAR boundary (PAB) and the gene content within the region. The PAB is the point at which the identical segments of the X and Y chromosomes diverge into X-specific and Y-specific sequences (Ellis & Goodfellow 1989). For human PAR1 the PAB lies within the XG gene (Ellis et al. 1994a,b), while in mice the PAB resides within the Mid1 gene (Perry et al. 2001), whose human orthologue MID1 is found within the X chromosome-specific sequence that is not a part of the PAR. In the cat genome the identification and mapping of PAR markers in a 1.5 Mb-resolution radiation hybrid map has refined the PAB to less than 200 kb between the SHROOM2 and the WWC3 genes (Murphy et al. 2007). The gene located closest to the equine PAB on the PAR side of the chromosome is protein kinase (PRKXY) (Raudsepp & Chowdhary 2008), which is not a PAR gene in human, being located 3.6 Mb farther along the sequence map of HSA X (http://genome.ucsc.edu/).

Compared with human, mouse and horse, relatively little is known about the PAR in the domestic dog, providing an excellent opportunity for characterization by a combined in-silico/in-situ approach. The sex chromosomes are the only bi-armed chromosomes in the canine karyotype and at ∼126 Mb and ∼27 Mb, dog chromosomes X and Y (CFA X and CFA Y) are respectively the largest and smallest chromosomes in the canine karyotype (Langford et al. 1996, Lindblad-Toh et al. 2005, Breen 2008). Although the PAR of the domestic dog is not well studied, previous evidence has indicated that it maps to the telomeric end of Xp within the region p22.3-p22.2 (Spriggs et al. 2003). This region of CFA X contains two markers (AHTx21 and AHTx13) that also map to CFA Y, suggesting that, cytogenetically, the canine PAR extends from the Xp telomere to at least Xp22.3.

In this study we used the 7.6× female canine genome assembly to direct multicolor FISH analysis of the male canine karyotype to refine the breakpoint region associated with the naturally occurring DNA copy number variation present in the PAR. We demonstrate that the canine PAR is much larger than the human PAR1, contains more genes, and extends farther down CFA Xp. Furthermore, this approach allowed us to define the canine PAB to within a single BAC clone in the dog genome assembly. Computational analysis using the ∼1.5× male canine genome sequence was subsequently used to refine the PAB to kilobase resolution and allow for comparative analysis of the PAR across mammals. We demonstrate the ability to use a combined in-situ/in-silico approach to identify and subsequently refine boundaries associated with chromosome aberrations to base pair resolution.

Materials and methods

Clone selection

In the absence of Y chromosome sequence in the 7.6× female canine genome it was necessary to first define the extent of the dog PAR using cytogenetics. With the exception of a single Y-specific clone selected from the RPCI-81 (male) BAC library, all clones described in the present study were selected from the CHORI-82 dog (female) BAC library (http://bacpac.chori.org/library.php?id5253; Children’s Hospital Oakland Research Institute, Oakland, CA, USA), derived from a Boxer, and from which the 7.6× dog genome assembly was constructed (Lindblad-Toh et al. 2005). As part of the development of a 1 Mb-resolution genome-integrated cytogenetically verified canine BAC set (Thomas et al. 2008), a series of BAC clones were FISH mapped to CFA Xp, grossly defining the PAB to within a 1 Mb interval, between CFAX:5,646,909 bp and CFAX:6,794,779 bp. To further resolve this 1 Mb interval we used the UCSC genome browser (http://genome.ucsc.edu) to interrogate the canFam2 build of the canine genome assembly to identify additional BAC clones within this region (Figure 1).

Figure 1

(a) Schematic representation of the location of the primary BAC clones on CFA Xp and CFA Y, used to cytogenetically identify the PAB. The colored circles correspond to the cytogenetic location of each BAC on CFA Xp and its labeled fluor. The previously defined PAB is shown on the far left between clones 122O09 and 335G17, which are 1 Mb apart on CFA X (Thomas et al. 2008). Additional FISH reactions showed the PAB to lie between two partially overlapping clones, 172L08 and 397M10, as marked by a wavy line. (b) Hybridization of four BACs with overlapping sequence on CFA Xp and a CFA Y marker clone 035H24 from the RPCI-81 dog BAC library. The clone 172L08 hybridized to both CFA X and Y while 397M10 hybridized only to CFA Xp, defining the PAB between these two clones. (c) The BAC clone layout as depicted in UCSC’s dog genome browser (http://genome.ucsc.edu/). The expanded area shows the exact start positions of BACs 172L08 and 397M10, with the non-overlapping sequence between them being approximately 32 kb.

Cytogenetic evaluation

Metaphase chromosome preparations were produced by mitogenic stimulation of peripheral lymphocytes from several male dogs using conventional techniques of colcemid arrest, hypotonic treatment, and methanol–glacial acetic acid fixation (Breen et al. 1999). BAC DNA was isolated from 10 ml bacterial cultures using standard alkaline lysis and all FISH mapping was performed according to our routine multicolor FISH protocols (Breen et al. 2004). Briefly, 500 ng samples of each BAC DNA were labeled by nick translation to incorporate one of five spectrally resolvable fluorescent nucleotides, Spectrum Red/Orange/Green-deoxyuridine triphosphate (dUTP) (Vysis, Downers Grove, IL, USA), diethylaminomethylcoumarin-5-dUTP, or cyanine5-dUTP (Perkin Elmer Life Sciences, Boston, MA, USA). Groups of differentially labeled clones were then used in each FISH reaction as described previously (Breen et al. 2004), beginning with six non-overlapping clones selected to span the 1 Mb interval defined previously by the BACs 122O09 and 335G17 and which we have shown previously (Thomas et al. 2008) to map either side of the canine PAB (Figure 1). To further refine the region containing the PAB, additional BAC clones were selected subsequently from the genome assembly by a process of clone walking along this region of CFA Xp and each clone was assigned by FISH to verify the hybridization pattern. In order to accurately identify and orient CFA Y, a previously determined Y-specific BAC clone, 035H24 from the RPCI-81 dog BAC library (Thomas et al. 2005), was included in each reaction.

Computational analysis of overlapping BAC clones

Once BAC clones were identified by FISH analysis as spanning the PAB, they were analyzed computationally to further resolve the PAB. Repeat-masked canFam2(chrX:6 580 407–6 630 000 bp) derived from a female Boxer was downloaded from the UCSC Genome Browser (http://genome.ucsc.edu) and searched against 6.2 million unassembled shotgun sequence reads from a male poodle genome (Kirkness et al. 2003) using BLASTn (Altschul et al. 1997). Homologous sequences, and the mated sequences to which they were linked physically on plasmid clones, were then searched against the complete canFam2 (female) assembly to identify those with best hits to this region of CFA X and simultaneously define stretches of sequences that are not apparent in the female genome assembly.
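The second search step described above (placing reads and their mates on the complete assembly and keeping those whose best hit falls in the target interval) can be illustrated with a minimal Python sketch. It assumes the BLAST results are available in the standard 12-column tabular layout; the function names, the use of bit score to rank hits, and the example lines are illustrative assumptions and are not taken from the analysis reported here.

```python
from typing import Dict, Iterable, NamedTuple


class Hit(NamedTuple):
    read_id: str      # query: shotgun read identifier
    subject: str      # subject sequence name, e.g. "chrX"
    s_start: int      # alignment start on the subject
    s_end: int        # alignment end on the subject
    bitscore: float


def best_hits(tabular_lines: Iterable[str]) -> Dict[str, Hit]:
    """Keep the highest-scoring hit per read.

    Assumes tab-separated columns in the order: qseqid, sseqid, pident,
    length, mismatch, gapopen, qstart, qend, sstart, send, evalue, bitscore.
    """
    best: Dict[str, Hit] = {}
    for line in tabular_lines:
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and comment lines
        f = line.split("\t")
        hit = Hit(f[0], f[1], int(f[8]), int(f[9]), float(f[11]))
        current = best.get(hit.read_id)
        if current is None or hit.bitscore > current.bitscore:
            best[hit.read_id] = hit
    return best


def in_region(hit: Hit, chrom: str, start: int, end: int) -> bool:
    """True if the hit overlaps chrom:start-end (inclusive)."""
    lo, hi = sorted((hit.s_start, hit.s_end))  # minus-strand hits have start > end
    return hit.subject == chrom and lo <= end and hi >= start


# Invented example lines (hypothetical reads, not data from this study).
example = [
    "readA\tchrX\t99.0\t500\t5\t0\t1\t500\t6590000\t6590499\t0.0\t900",
    "readA\tchr5\t90.0\t500\t50\t0\t1\t500\t100\t599\t1e-50\t400",
    "readB\tchr1\t98.0\t400\t8\t0\t1\t400\t2000\t2399\t0.0\t700",
]
hits = best_hits(example)
region = ("chrX", 6_580_407, 6_630_000)
reads_in_region = sorted(r for r, h in hits.items() if in_region(h, *region))
```

Reads whose best hit lies in the interval are then the ones whose mates are worth inspecting, since a mate that cannot be placed on the female assembly may originate from Y-specific sequence.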

Results

Cytogenetic interpretation of the canine PAR

Previous cytogenetic analysis of CHORI-82 BAC clones that map to CFA X at 1 Mb intervals revealed two clones, 122O09 and 335G17, that mapped to opposing sides of the canine PAB on CFA Xp, with 122O09 (5 646 909–5 840 739 bp) hybridizing to both X and Y and 335G17 (6 794 779–6 989 959 bp) appearing only on CFA X (Thomas et al. 2008). To further resolve the boundary, we used the UCSC genome browser (http://genome.ucsc.edu/) to identify six non-overlapping BAC clones within this region from the female genome assembly and then cytogenetically assigned each of these clones. These data indicated that the PAB was present within the 342 kb region defined by two of these six clones, 493H08 (6 473 120–6 653 980 bp) and 103O15 (6 641 624–6 815 138 bp). An additional five overlapping clones were selected within this 342 kb, and FISH data of these clones refined the PAB to a region comprising four overlapping clones, 97B04, 9I11, 172L08, and 397M10, that span 279 459 bp of the CFA Xp sequence at 6 533 887–6 813 346 bp (Figure 1). The FISH pattern of these four clones revealed that three of the clones, 97B04, 9I11, and 172L08, contained sequence that resided within the PAR, while the fourth clone, 397M10, hybridized only to CFA Xp. Clones 172L08 and 397M10 overlapped for the majority of their lengths, but 172L08 begins at 6 580 407 bp on CFA Xp and 397M10 begins at 6 612 687 bp on CFA Xp, leaving a non-overlapping region of approximately 32 kb between the clone start positions, where we suggest the canine PAB is located. Thus, using female genome assembly directed molecular cytogenetics, the region of CFA Xp in which the PAB resides was reduced to 32 kb (Figure 1), and the size of the canine PAR was estimated at approximately 6.6 Mb.
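The interval sizes quoted in this section follow directly from the clone coordinates. The short Python check below reproduces that arithmetic using only the positions given in the text; the variable and function names are illustrative.

```python
# Coordinates in bp on CFA X (canFam2), as reported in the text.
CLONE_493H08 = (6_473_120, 6_653_980)
CLONE_103O15 = (6_641_624, 6_815_138)
FOUR_CLONE_REGION = (6_533_887, 6_813_346)
START_172L08 = 6_580_407
START_397M10 = 6_612_687


def combined_span(a, b):
    """Distance from the leftmost start to the rightmost end of two intervals."""
    return max(a[1], b[1]) - min(a[0], b[0])


# Region bounded by the two flanking clones (reported as 342 kb).
flank_span = combined_span(CLONE_493H08, CLONE_103O15)

# Region covered by the four overlapping clones (reported as 279 459 bp).
four_clone_span = FOUR_CLONE_REGION[1] - FOUR_CLONE_REGION[0]

# Sequence present in 172L08 but not in 397M10 (reported as about 32 kb).
pab_window = START_397M10 - START_172L08

print(flank_span, four_clone_span, pab_window)
```

Because the PAR is measured from the Xp telomere, a boundary lying between 6 580 407 bp and 6 612 687 bp corresponds to the PAR size estimate of roughly 6.6 Mb.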

Computational analysis of the PAB

The sequence reads from the UCSC genome browser contained within BAC clones 172L08 and 397M10, canFam2(chrX:6 580 407–6 630 000 bp) derived from a female Boxer, were searched against 6.2 million unassembled shotgun sequence reads from a male poodle genome (Kirkness et al. 2003) using BLASTn (Altschul et al. 1997) to identify the best hits from the male poodle genome to the specific region in CFA X. Forty-three mated pairs of sequence reads were revealed, where one or both reads had best hits to the region. For 41 of these, both of the paired reads mapped consistently (i.e., with expected orientation and separation on the canFam2 sequence), indicating female origin. However, for two clones (19600425738530 and 19600424387951), only one of the end reads (ti|1938244836 and ti|1928078697, respectively) could be mapped to this region, indicating that while one end resides on the X chromosome the other end does not. Since the mated reads (ti|1925875314 and ti|1927677701) are not homologous to CFA X: 6 580 407–6 630 000 bp, these may therefore be derived from CFA Y, suggesting that the sequences within clones 19600425738530 and 19600424387951 may span the PAB (see Figure 2). The computational analysis was able to take the 32 kb region defined by cytogenetics and further refine the putative PAB to a region spanning approximately 2 kb between the end read of clone 19600425738530, ti|1928078697, and the front read of clone 19600424387951, ti|1925875314 (Figure 2).
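The mate-pair reasoning in this paragraph can be sketched in Python as a simple classifier: pairs whose two reads face each other at a plausible separation are treated as consistent with the female assembly, while pairs with only one read in the interval are flagged as boundary candidates. The `Placement` type, the category labels, the 10 kb separation limit, and the example positions are all assumptions made for illustration; the source does not state the insert size or the exact criteria used.

```python
from typing import NamedTuple, Optional, Tuple


class Placement(NamedTuple):
    chrom: str
    pos: int      # leftmost aligned base on the assembly
    strand: str   # "+" or "-"


def classify_pair(a: Optional[Placement], b: Optional[Placement],
                  region: Tuple[str, int, int],
                  max_insert: int = 10_000) -> str:
    """Classify a mated read pair relative to a target interval.

    max_insert is a hypothetical upper bound on clone insert size.
    A read with no placement on the assembly is passed as None.
    """
    chrom, start, end = region

    def inside(p: Optional[Placement]) -> bool:
        return p is not None and p.chrom == chrom and start <= p.pos <= end

    a_in, b_in = inside(a), inside(b)
    if not (a_in or b_in):
        return "outside"
    if a is not None and b is not None and a.chrom == b.chrom:
        fwd, rev = (a, b) if a.strand == "+" else (b, a)
        facing = fwd.strand == "+" and rev.strand == "-" and fwd.pos <= rev.pos
        if facing and rev.pos - fwd.pos <= max_insert:
            return "consistent"      # expected orientation and separation
    if a_in != b_in:
        return "candidate"           # one end in the interval, mate not
    return "inconsistent"            # both ends in the interval, wrong layout


REGION = ("chrX", 6_580_407, 6_630_000)

# Invented placements for illustration only.
both_ends = classify_pair(Placement("chrX", 6_600_000, "+"),
                          Placement("chrX", 6_603_000, "-"), REGION)
one_end = classify_pair(Placement("chrX", 6_610_000, "+"), None, REGION)
```

A "candidate" call only indicates that the mate is absent from this interval of the female assembly; as in the text, assigning such a mate to CFA Y remains an inference that needs independent support.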

Figure 2

The sequence within BAC clones 172L08 and 397M10, canFam2(chrX:6 580 407–6 630 000), searched against 6.2 million unassembled shotgun sequence reads from a male poodle genome (Kirkness et al. 2003) using BLASTn (Altschul et al. 1997). The PAB was narrowed by further computational analysis to an approximately 2 kb region between two reads (ti|1928078697 and ti|1925875314) within the male poodle genome sequence (Kirkness et al. 2003) as marked by the dashed green box.

Comparison of canine and human PAR-specific genes

Refinement of the canine PAR with molecular cytogenetics and bioinformatic analysis allows the positions of previously compiled genes that lie within the human PAR1 and PAB (Mangs & Morris 2007) to be compared with those of canine PAR and PAB genes by use of the UCSC human and dog genome browsers. The four currently identified genes in human PAR2 (SPRY3, SYBL1, IL9R, and CXYorf1) are not present in the canine PAR and are located on the distal end of CFA Xq at 126 Mb. Assignment of 111 BAC clones spaced at ∼1.1 Mb intervals along the full length of CFA X did not reveal the existence of a second PAR in the canine genome when hybridized to male chromosome spreads (Thomas et al. 2008). This is consistent with the absence of a PAR2 in any species investigated thus far other than human. We were able to expand on previous studies that had mapped the genes Ant3 and Csf2ra to the canine PAR (Toder et al. 1997) by defining the extent of the canine PAR and showing that the majority of genes located within the human PAR1 are also found within the dog PAR (Figure 3). The location and order of the genes within the human PAR1 are very similar to their corresponding positions within the dog PAR. The human PAR1 contains at least 24 genes, while the dog PAR contains upward of 30 genes. The difference between the human PAR1 and canine PAR is found in the span of the region and the specific gene that contains the PAB. The human PAB gene, XG, which codes for the XG blood group, is found in the dog PAR at approximately 1.4 Mb, only about a fifth of the way along the ∼6.6 Mb canine PAR. The dog PAB lies within the gene Shroom2 (apical protein of Xenopus-like), which lies on the human X chromosome at approximately 9.7 Mb and which is not present within the human PAR (http://genome.ucsc.edu/) (Mangs & Morris 2007).
The canine PAR is smaller than the proposed ancestral PAR, the boundary of which is thought to lie within the gene AMELX at approximately 11.22 Mb on HSA X and 7.7 Mb on CFA X (Iwase et al. 2003, 2007) (http://genome.ucsc.edu/). Variation in the region is also prevalent when comparing the horse, human, mouse, and canine PAR. The horse PAR, although smaller in size, contains seven genes present in the canine PAR that are located proximal to human PAR1 (Raudsepp & Chowdhary 2008) (Figure 3). The mouse PAB is estimated to be between 30 and 50 kb and within the Mid1 gene on chromosome X and the gene’s truncated partner on the Y chromosome (Perry et al. 2001). In humans the PAB gene, XG, also has its truncated partner on HSA Y (Ellis et al. 1994a,b). Interestingly, the canine PAB sits above the MID1 gene on CFA Xp, the gene that contains the mouse PAB (Figure 3).

Figure 3

Positions of PAR genes in Mb from the p-arm telomere and their comparative status in human (NCBI Build 36.1), horse (equCab1), dog (canFam2), and mouse (NCBI Build 37) according to the UCSC genome browser (http://www.genome.ucsc.edu/) (Raudsepp & Chowdhary 2008, Perry et al. 2001). The dashed lines indicate the direction of genes on the X chromosome with human, horse, and dog PARs residing on Xp above the centromere and the mouse PAR on Xq directly above the telomere. The human PAR1 is shown in blue and additional genes only present in the horse, dog, and mouse PARs are represented and localized on HSA Xp corresponding to color. The size of the PAR and the PAB, indicated by a yellow line, differs across species with the mouse PAR’s gene content being vastly different (Perry et al. 2001). The horse PAR contains more genes than the human PAR but is smaller in size (Raudsepp & Chowdhary 2008). The canine PAR extends the farthest on CFA Xp and contains several more genes than the human PAR1.

Discussion

Separately, the availability of a high-quality 7.6× canine genome assembly (Lindblad-Toh et al. 2005) and a cytogenetically verified panel of evenly spaced genome-integrated bacterial artificial chromosome (BAC) clones (Thomas et al. 2008) each provide valuable resources for interrogation of aberrant genome architecture. The aim of this study was to demonstrate the power of using a combinatorial in-situ/in-silico approach to identify and further refine chromosome breakpoint boundaries. This was achieved by using this approach to characterize the naturally occurring sex chromosome boundary presented by the PAR and subsequently identifying the location of the canine PAB. Our cytogenetic approach allowed us to determine that (1) the canine PAR spans approximately 6.6 Mb of the canine genome and (2) the canine PAB exists between two partially overlapping BAC clones that have a non-overlapping distance of 32 kb.

However, while FISH is a valuable tool for visualizing regions surrounding chromosome breakpoints/boundaries, the limitations of optical microscopy restrict the resolution of characterization, and so refinement of the precise location of boundaries to the kilobase and even base pair level requires bioinformatic analysis. In this study, computational analysis was used to compare genomic DNA sequence within this 32 kb cytogenetically defined region of the female dog genome assembly (Lindblad-Toh et al. 2005) against shotgun reads from a male poodle genome (Kirkness et al. 2003). The result was a reduction of the region containing the PAB to approximately 2 kb (Figure 2), about 6% of the cytogenetically determined 32 kb interval.

As well-annotated genome assemblies become more available, computational analysis is proving valuable for identifying structural changes and comparative status across mammalian species. Comparative maps of the dog, cat, and horse genomes exhibiting syntenic sequence segments with human, chimp, mouse, cattle, pig, and rat have emerged as important tools for the identification of genes with important traits such as disease susceptibility/resistance, growth, and performance (Lindblad-Toh et al. 2005, Murphy et al. 2007, Chowdhary & Raudsepp 2008). The location and function of the PAR and PAB defined in other mammalian genomes strengthened previous knowledge concerning unique genetic, physical, and evolutionary properties (Perry et al. 2001) and the characterization of the canine PAR and PAB in this study further enhances our understanding of the comparative nature of mammalian genomics.

Our use of bioinformatic analysis provided the ability to identify the gene that contained the canine PAB, Shroom2: apical protein of Xenopus-like (http://genome.ucsc.edu/), which is located 5.2 Mb from the human PAB gene XG, adding to the growing body of evidence that large variation between mammalian PARs exists. It is hypothesized that the eutherian PARs evolved rapidly and led to divergence in PAR sequences across species (Filatov & Gerrard 2003, Park et al. 2005, Blaschke & Rappold 2006, Bussell et al. 2006). Multiple chromosomal rearrangements, resulting in a degree of shuffling/alteration in gene content that in turn created new barriers to XY recombination, have led to different sizes and compositions of PARs across eutherian lineages (Park et al. 2005). Our findings support the hypothesis that the ancestral eutherian PAR was larger than the present human PAR, explaining why, despite the high degree of conserved synteny between mammalian X chromosomes, gene content and size vary in the PAR.

A high rate of recombination within the PAR compared to the rest of the genome makes this region interesting but has also made the PAB difficult to characterize. In mice the PAB is more complex than a simple transition from PAR sequence into X-specific or Y-specific sequence; a variable tandem repeat array lies within the sequence directly adjacent to the PAR. The number of repeat units seems to vary greatly between mice of different strains as well as mice from the same strain (Perry et al. 2001). Similarly, the identified shotgun reads from the male poodle genome (Kirkness et al. 2003) that were found to span the PAB when compared with canFam2 using BLASTn (Altschul et al. 1997) contained repetitive sequence. However, these repetitive sequences were not located on the reference CFA X near the proposed PAB. Currently there exists no contiguous Y chromosome sequence with which to align the 7.6× female canine genome assembly (Lindblad-Toh et al. 2005) in order to completely demarcate the PAB on CFA Y. The continued characterization of the male poodle genome (Kirkness et al. 2003) will permit assembly of contigs with long-range contiguity, allowing for greater resolution of the PAB on CFA Y.

The characterization of the canine PAR and PAB supports the use of the dog as an important model system for biomedical research. The comparison of human and canine PAR gene content shows once more that humans and dogs share many genomic similarities, reinforcing the dog genome as an excellent resource for the advancement of canine and comparative disease studies. The approach we used to characterize the canine PAR, its size and its boundary, allowed us to demonstrate the strength of combining cytogenetic resources with computational analysis to identify chromosomal breakpoint boundaries to kilobase/base pair resolution. Advances in bioinformatic and cytogenetic resources in recent years have allowed a synergistic application to the characterization of genomes, and these processes now serve key roles in precisely defining chromosomal breakpoint boundaries in aberrant genomes. The tandem application of an in-situ/in-silico approach to the study of chromosomal abnormalities present in diseases such as cancer promises the ability to pinpoint key changes leading to aberrant phenotypes, thus improving our understanding of human and canine health.