Introduction

Macrosatellites are large, repetitive sequences composed of many kilobase-sized tandem repeats, and there are at least 50 such satellites in the human genome (Warburton et al. 2008; Tremblay et al. 2010). However, due to the limitations of sequence assembly, such repetitive loci are still largely unexplored “white space on a map” in most genomes (Alkan et al. 2011). One of the most extensively studied human macrosatellites is D4Z4, which is of particular interest because of its link to the human genetic disorder facioscapulohumeral muscular dystrophy (FSHD).

Nested within each 3.3-kb repeat unit of D4Z4 is one copy of the intronless double-homeobox gene DUX4 (Hewitt et al. 1994). Aberrant expression of DUX4 in muscle fibers of patients with FSHD has been causally linked to muscle degeneration and disease (reviewed by van der Maarel et al. 2011). In humans, large tandem arrays of D4Z4 sequences are located at the subtelomeres of chromosomes 4 and 10. These arrays are highly polymorphic in copy number, and each of the four alleles usually contains 11 to >150 D4Z4 units, making DUX4 the human protein-encoding gene with the highest overall copy number (Alkan et al. 2009). In FSHD, the chromosome 4 array is contracted to fewer than 11 repeats (Wijmenga et al. 1992; van der Maarel et al. 2011). This is thought to “relax” the D4Z4 chromatin and cause the de-repression and transcription of DUX4 in muscle, where this gene is usually silenced (Snider et al. 2010; Lemmers et al. 2010a).

For human D4Z4, we have extensive knowledge about its internal organization (Lemmers et al. 2001, 2004), epigenetic modifications (de Greef et al. 2009; Zeng et al. 2009), repeat number distribution (Rossi et al. 2007), and recent evolution (Lemmers et al. 2010b), but we know little about the organization and evolution of D4Z4-related sequences in other mammals.

Previous studies showed that both the tandem array organization and the subtelomeric localization of D4Z4 are conserved in chimpanzee, orangutan, and gorilla (Clark et al. 1996; Rudd et al. 2009). D4Z4-like sequences containing intronless DUX4 open-reading frames were also identified in the genomes of the deeply rooted mammalian clade Afrotheria (elephant, hyrax, and tenrec), while a homolog with a similar structure (Dux) was found in mouse and rat (Clapp et al. 2007). Like DUX4 in primates and Afrotheria, mouse Dux is intronless, and many copies of it are embedded within larger repeats (4.9 kb) in tandem array macrosatellites (Clapp et al. 2007).

Other mammals apparently lack intronless DUX genes, although their genomes do contain a number of intron-containing DUX homologs (DUXA, DUXB, Duxbl, and DUXC) (Leidenroth and Hewitt 2010). Of these four genes, DUXC is the most closely related to DUX4 (Leidenroth and Hewitt 2010). DUXC is also the only intron-containing DUX gene to share a conserved C-terminal domain with DUX4 and Dux (Clapp et al. 2007; Leidenroth and Hewitt 2010). This domain can act as a transcriptional activator: several cases of Ewing-like sarcoma are linked to a fusion of the DUX4-CTD to another DNA binding protein (CIC) by translocations (Kawamura-Saito et al. 2006).

Previously, we reported evidence for DUXC homologs in the mammalian groups Laurasiatheria (dog, cow, dolphin, and bat) and Xenarthra (armadillo and sloth) (Leidenroth and Hewitt 2010). Intriguingly, we never identified any intronless (DUX4-like) retrogenes in any of these species. Conversely, genomes that contain DUX4 or Dux appeared not to contain DUXC. Based on their reciprocal species distribution pattern and close relatedness, DUX4 and DUXC could be functional homologs.

Until now, DUXC was presumed to be a single-copy gene. We had therefore previously hypothesized that the intronless DUX4 and Dux macrosatellites arose in the common ancestor of placental mammals through the reverse-transcription and retrotransposition of a spliced ancestral DUXC mRNA into a new genomic location. It was thought that local copy-number expansion of these new retrogenes then created the D4Z4 macrosatellites (Clapp et al. 2007; Leidenroth and Hewitt 2010). This model implied that the primate, murine, and Afrotheria lineages had lost DUXC but retained the intronless DUX macrosatellites, while the Laurasiatheria and Xenarthra retained DUXC but lost DUX4.

Here, we present new data that suggest an alternative evolutionary model. The genome of the most recent common ancestor of all placental mammals contained a DUXC macrosatellite but no intronless tandem arrays. The intronless DUX4 and Dux genes then arose independently several times by separate retrotransposition events that displaced the ancestral DUXC macrosatellite in primates, murines, and the Afrotheria.

Materials and methods

Sample acquisition and tissue culture

Illumina reads were downloaded from the DNA database of Japan (http://trace.ddbj.nig.ac.jp). The cow fibroblast cell line GM06034 was purchased from the Coriell Institute for Medical Research. The cow embryonic fibroblast cell line (BFF3) was donated by Ramiro Alberio. Cells were grown under standard tissue culture conditions. For the DUXC FISH, blood samples from Bos taurus and Bubalus bubalis were used. Tenrec (Microgale cowani), hyrax (Procavia capensis), and elephant (Loxodonta africana) cell lines were provided by Willem Rens, Department of Veterinary Medicine, University of Cambridge.

Copy-number analysis with mrsFAST and mrCaNaVaR

We used mrsFAST version 2.3.0.2 (Hach et al. 2010) and mrCaNaVaR version 0.32 (Alkan et al. 2009). We had previously assembled the B. taurus DUXC locus from trace archive data (Clapp et al. 2007). Using this assembly, we built a small reference genome that included DUXC (5.9 kb), with DUXA (5.5 kb) and ZAR1 (6.3 kb) as controls and cow chromosome 24 (66 Mb) to estimate background read-depth.

The reference genome was processed with RepeatMasker (www.repeatmasker.org). Tandem Repeats Finder (Benson 1999) was run with parameters “2 7 7 80 10 50 500 -m”. Genome assembly gaps were downloaded from UCSC (http://genome.ucsc.edu). The reference sequence was indexed with mrsFAST --index, and copy windows were defined with mrCaNaVaR --prep. Reads were mapped using default parameters and a Hamming-distance threshold of 5 %. Aligned *.sam files were processed with the mrCaNaVaR --read and --call modes, and copy-number calls were calculated as an average across windows spanning DUXA (two windows) and DUXC (three windows); outer windows were excluded to avoid edge effects.
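The window-averaging step described above can be illustrated with a minimal Python sketch. It is not part of the mrCaNaVaR pipeline; the function name and the per-window values are hypothetical and serve only to show how dropping the outer windows works.

```python
from statistics import mean


def average_interior_calls(window_calls):
    """Average per-window copy-number calls, skipping the first and last window.

    The outer windows are dropped because they overlap the edges of the
    locus, where read depth tends to fall off (edge effects).
    """
    if len(window_calls) < 3:
        raise ValueError("need at least three windows to drop both outer ones")
    return mean(window_calls[1:-1])


# Hypothetical calls for a locus tiled by five windows (illustration only,
# not values from the study): three interior windows are averaged.
example_calls = [150.0, 210.0, 200.0, 190.0, 120.0]
print(average_interior_calls(example_calls))
```

With five windows, three interior windows contribute to the average, which matches the three-window layout described for DUXC.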

DNA preparation and restriction digests

Genomic DNA was extracted using standard methods (Miller et al. 1988). For pulsed-field gel electrophoresis (PFGE), cells were embedded in low gelling agarose type VII (SIGMA-Aldrich) and equilibrated in the appropriate restriction enzyme buffer. Enzymes were purchased from Roche (BlnI, BamHI, and EcoRV), NEB (PvuII), Fermentas (BglII), and Promega (HindIII).

Gel electrophoresis and Southern blot

Digested plugs were equilibrated in 0.5 × TBE buffer and run in a Bio-Rad Chef Mapper tank in a 0.5 × TBE 1 % gel. Cow DNA was separated at 15 °C for 48 h with 5–120 s ramp at 4.5 V/cm to resolve the smaller fragments, or at 16 °C for 26 h with 8–120 s ramp at 6 V/cm to resolve the larger fragments. Linear 1 % agarose gels (20 × 20 cm) were run for 24 h at 40 V in 1 × TAE buffer. DNA markers for sizing were: Lambda ladder (NEB), MidRange PFG marker I (NEB), or digoxigenin-labeled high-molecular weight marker II (Roche). DNA was transferred to a positively charged nylon membrane (Roche) using standard protocols.

Probes were amplified using BioMix™ Red (Bioline) and cloned into pGEM T-Easy vector (Promega). Primer pairs were 5′ CTATACAGCACTCATCAAATCTAGC 3′ + 5′ CCCAAAAGCAATGCCAAACTAGTC 3′ (p13E-11) and 5′ TGGTTTCAAAACCGAAGAGC 3′ + 5′ AGGAGAGGACCCTGGAGAAG 3′ (cow DUXC). Digoxigenin-labeled probes were synthesized with the PCR DIG probe synthesis kit (Roche) according to the manufacturer's instructions. To probe the lambda ladder, 0.5 μg of BstEII-cut lambda DNA (NEB) was labeled with Fluorescein High-Prime (Roche) according to the manufacturer's instructions.

Membranes were pre-hybridized with DIG EasyHyb (Roche), and probes were hybridized overnight at 42 °C (p13E-11) or 59 °C (DUXC) in a roller oven. For signal detection, the Roche DIG wash and block buffer set was used. For linear gels, anti-digoxigenin-AP antibody (Roche) was diluted 1:20,000 in 20 ml fresh 1 × DIG blocking solution. For pulsed-field gels, antibody detection was instead performed in 5 ml blocking solution supplemented with 2 μl anti-fluorescein NEF709 antibody (Perkin-Elmer). Signal was detected with CDP-Star (Roche).

DUX4/DUXC metaphase FISH analysis

The Afrotheria DUX4 probes were amplified from genomic DNA by PCR and cloned into T-Vector. Tenrec (M. cowani): 5′ GTGGCCAGGAAGATGACAAA 3′ + 5′ TGACGCTTTCAGAGGCTTGT 3′. Hyrax (P. capensis): 5′ GCTTTGCCCTCGTTTACCTG 3′ + 5′ GGAGGCATTTCCTTTCGCAAC 3′. Elephant (L. africana): 5′ GAACTCCTCCCTGCCATCAC 3′ + 5′ TCTCTCCCCACAGTGCTTGA 3′. Probes were between 2.1 and 2.4 kb in size and labeled using biotin. The hyrax and tenrec probes span part of the DUX4 ORF. FISH was performed as described previously (Rens et al. 2006). The DUXC probe in pGEM T-Easy (see above) was labeled and hybridized to metaphase chromosomes using fluorescence in situ hybridization and cow RBPI-banding methods as described elsewhere (Iannuzzi and Di Berardino 2008).

Sequence analysis and alignments

BLAST analysis, trace archive searches, and phylogenetic analysis were performed as previously described (Leidenroth and Hewitt 2010).

Results

Telomeric DUX4 macrosatellites are also present in the Afrotheria

The telomeric location of DUX4 on chromosomes 4 and 10 is conserved in primates (Clark et al. 1996; Rudd et al. 2009). In human and ape genomes, additional D4Z4-related arrays are preferentially found in pericentromeric regions and the heterochromatin of acrocentric chromosomes (Lyle et al. 1995; Clark et al. 1996). To identify DUX4 in the Afrotheria genomes, we hybridized elephant, hyrax, and tenrec chromosomes with species-specific DUX4 probes (Fig. 1 and Online Resource 1). In all three species, most signals mapped to telomeric or pericentromeric regions of acrocentric chromosomes. In elephant, there are multiple telomeric signals of varying intensity, where stronger signals could represent higher sequence identity or copy number. In hyrax, there is a single telomeric signal on an acrocentric chromosome. Tenrec shows signals near centromeres on two chromosomes. The signals on the two chromosomes have different intensities, probably indicating two arrays of different sizes. Further support for DUX4 tandem arrays in the Afrotheria comes from the reference assemblies of elephant and hyrax, which include contigs with four and five repeats (Fig. 1d and e).

Fig. 1

Telomeric DUX4 satellites are also present in the Afrotheria. a–c Representative metaphase FISH images, with chromosomes mounted in DAPI and DUX4 probes labeled with Cy3. The species-specific DUX4 probes produce strong telomeric signals in elephant, hyrax, and tenrec (white arrows). d Dot plot of hyrax contig 209751 aligned against itself using BLAST. This contig contains four tandem copies of ~3.9 kb each and a partially inverted repeat. Arrow direction indicates orientation of the DUX4 open-reading frame within the repeat. e The elephant contig 85902/85903 (scaffold 112) contains four copies of ~5.5 kb each. The red arrow marks a repeat with a DUX4 ORF that has an internal stop codon

We could not determine which Afrotheria chromosomes carried the DUX4 signals, but we would expect no conservation of the chromosomal location of DUX4 between Afrotheria and primates: in the primate lineage, DUX4 maps distal to FRG1 at the end of chromosome 4q. However, in other mammals, FRG1 is not located at the end of a chromosome but internally; it is located next to the gene ASAH, which has an ortholog on human chromosome 8p (Grewal et al. 1997). There is conservation of synteny between FRG1 and ASAH in all mammalian groups except primates. In the ancestral Eutherian genome, FRG1 and ASAH were neighboring loci, but in primates, a chromosomal fission event distal to FRG1 separated them, generating 4qter (Ferguson-Smith and Trifonov 2007). In non-primates, there are no DUX genes located near FRG1 homologs. This strongly suggests that in primates, DUX4 was transferred distal to FRG1 on 4qter after the chromosomal fission. Therefore, although Afrotheria genomes do contain DUX4 arrays, these are not found at the orthologous location to that of the primate arrays. Inspection of the elephant genome (loxAfr3) confirms conservation of this ancestral FRG1/ASAH linkage group, with no evidence for DUX4 at this location.

DUXC copy-number analysis by read-depth analysis

Our surveys of sequence archives indicated that DUXC might also be present at high copy number, with the number of sequence traces far exceeding the average fold-coverage of genomes. This was surprising, as we expected DUXC to be a single-copy gene like DUXA, DUXB, or Duxbl. Given the close relatedness of DUXC to the high copy-number genes DUX4 and Dux, we decided to investigate this further. We chose to study DUXC in cow, as we had access to next-generation sequencing data for different cattle breeds, and a cell line for experimental confirmation.

Gene copy numbers can be estimated by counting next-generation sequencing reads. While bacterial shotgun cloning bias makes inferences from Sanger traces difficult, short-read technologies largely avoid this issue.

Several algorithms can estimate genomic copy numbers from the “relative enrichment” of reads over the average background of diploid loci. MrsFAST and mrCaNaVaR have been developed to study genome-wide copy numbers (Alkan et al. 2009; Hach et al. 2010). In three human genomes, these algorithms identified DUX4 as the gene with the highest overall copy number (Alkan et al. 2009). In HapMap individual Yoruba NA18507, mrCaNaVaR counted 97 total diploid copies for DUX4 (Alkan et al. 2009). This agrees reasonably well with a DUX4-copy estimate for NA18507 (82 copies) that was based on PFGE (Online Resource 2a).
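The principle of “relative enrichment” can be sketched in a few lines of Python. This is a simplified illustration under stated assumptions, not the algorithm implemented in mrCaNaVaR: it omits GC correction and window handling, and the function name and depth values are hypothetical.

```python
def copy_number_from_depth(target_depth, background_depth, ploidy=2):
    """Estimate total copy number from read depth.

    target_depth     -- mean read depth over the locus of interest
    background_depth -- mean read depth over loci assumed to be diploid
    ploidy           -- copies represented by the background depth (2 = diploid)
    """
    if background_depth <= 0:
        raise ValueError("background depth must be positive")
    return ploidy * target_depth / background_depth


# Hypothetical depths (illustration only, not values from the study).
background = 12.0  # mean depth over single-copy, diploid sequence
print(copy_number_from_depth(12.0, background))   # a single-copy control locus
print(copy_number_from_depth(600.0, background))  # a highly amplified locus
```

A locus covered at the background depth is called at two copies, and a locus covered fifty times more deeply is called at one hundred copies, which is why single-copy controls are useful for checking the calibration.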

We used these tools to test DUXC copy number in B. taurus by remapping publicly available Illumina data to a custom-built reference (see “Materials and methods”). We included DUXA and ZAR1, a known single-copy locus in cow (Uzbekova et al. 2006), as controls. The analysis takes into account bias from over- and underrepresentation of sequences of extreme GC contents (Online Resource 2b). Online Resource 3 summarizes our copy-number estimates for datasets from four different breeds. While both ZAR1 and DUXA were assigned copy-number calls of around two, the copy-number predictions for DUXC were far higher, ranging from 174 (Sahiwal breed) to more than 400 (N'Dama breed). We interpreted these results as order-of-magnitude estimates and as evidence for DUXC amplification, rather than as precise measurements of copy number. Nonetheless, they indicate that DUXC is present at a copy number similar to that of DUX4 in humans.

The DUXC copies are arranged in large tandem arrays

The bioinformatics analysis does not reveal whether the hundreds of DUXC copies are dispersed as single sequences across the genome, or whether they are clustered in a single locus such as a tandem-array macrosatellite. We tested the BFF3 cell line with PCR primers facing in opposite orientations in DUXC (Fig. 2a). We observed a product of 576 bp (data not shown), as would be predicted by a tandem-array organization of adjacent repeats (confirmed by Sanger sequencing, data not shown). PCR and trace archive data allowed the repeat unit size to be defined as 5.89 kb, and showed “inverted” mate pairs, where the end sequences of clones were orientated in opposite directions, indicating a tandem-array arrangement.
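The logic of the outward-facing primer test can be illustrated with a short in-silico sketch in Python. The repeat unit here is a random toy sequence and the primer coordinates are hypothetical; they are not the DUXC sequence or the primers used in this study. The point is only that primers facing away from each other give no product on a single copy but span the junction between two head-to-tail copies.

```python
import random

_COMPLEMENT = str.maketrans("ACGT", "TGCA")


def revcomp(seq):
    """Reverse complement of an uppercase DNA string."""
    return seq.translate(_COMPLEMENT)[::-1]


def product_length(template, fwd_primer, rev_primer):
    """Length of the first product with fwd_primer upstream of rev_primer.

    Returns None if the reverse primer site does not lie downstream of the
    forward primer on this template.
    """
    start = template.find(fwd_primer)
    if start == -1:
        return None
    rev_site = revcomp(rev_primer)  # reverse primer as it reads on the top strand
    site = template.find(rev_site, start + len(fwd_primer))
    if site == -1:
        return None
    return site + len(rev_site) - start


random.seed(0)
unit = "".join(random.choice("ACGT") for _ in range(1000))  # toy 1-kb repeat unit
fwd = unit[900:920]           # near the unit end, pointing toward the end
rev = revcomp(unit[130:150])  # near the unit start, pointing toward the start

print(product_length(unit, fwd, rev))      # single copy: primers face apart
print(product_length(unit * 2, fwd, rev))  # two tandem copies: junction is spanned
```

In this toy layout the junction product covers the last 100 bp of one unit and the first 150 bp of the next, so its length depends only on the primer positions relative to the unit boundaries, not on the number of repeats.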

Fig. 2

DUXC in cow is present in large tandem arrays. All gels were subjected to Southern blotting using the cow DUXC probe. a Diagram of one repeat unit of the DUXC sequence. Positions of restriction enzyme sites for Southern blots in parentheses. Exons are represented by boxes, homeoboxes are highlighted in blue. If repeats are arranged in tandem, a digest would yield positive fragments of 3907 bp with BamHI. 4334 and 1548 bp bands are produced by PvuII, but only the large fragment is detected by this probe. Primer positions for the repeat-junction spanning PCR are shown (total product size 576 bp). b Southern blot using enzymes predicted not to cut within the DUXC array (PFGE parameters 15 °C for 48 h with 5–120 s ramp at 4.5 V/cm). Large positive fragments >1 Mb are unresolved under these conditions. HindIII and EcoRV digests also yield smaller fragments of around 85 kb and 242 kb. c PFGE using parameters to improve resolution of the large fragments (16 °C for 26 h with 8–120 s ramp at 4.5 V/cm). d Linear gel electrophoresis shows two digests with a single-cutter (BlnI) and two double-cutters (BamHI and PvuII) that yield bands of the expected sizes of 5.9, 3.9, and 4.3 kb, respectively. LL lambda ladder, MR mid range marker, DL DIG ladder

Digestion with enzymes that have no restriction site within DUXC (BglII, HindIII, and EcoRV), followed by PFGE and Southern blotting, releases large fragments of 1–2 Mb (Fig. 2b and c). HindIII and EcoRV generate additional fragments of around 85 and 242 kb, and a size decrease in one of the large fragments. Thus, this individual animal carries large arrays of 150–350 repeats, which is consistent with our bioinformatics analysis of other individuals. There is one predicted BlnI site per DUXC repeat. Accordingly, linear gel electrophoresis followed by Southern blot shows a single strong band, migrating at the size of one repeat (Fig. 2d). There are two predicted cut sites each for BamHI and PvuII, which generate bands of 3.9 and 4.3 kb, respectively, consistent with a head-to-tail orientation (Fig. 2a, d).
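The arithmetic behind these two inferences can be written out as a small Python sketch. It assumes the 5.89-kb repeat unit reported above and, for the digest example, a unit length of 5882 bp (the sum of the two PvuII fragments given in the Fig. 2 legend). The cut-site coordinates are hypothetical and chosen only for illustration, as are the function names.

```python
def repeats_from_fragment(fragment_kb, unit_kb=5.89):
    """Approximate number of repeat units in a fragment made entirely of repeats."""
    return fragment_kb / unit_kb


def tandem_digest_fragments(unit_bp, cut_sites):
    """Internal fragment sizes from complete digestion of a head-to-tail array.

    cut_sites are positions within one repeat unit. Because every unit is
    cut at the same positions, the last site of one unit and the first site
    of the next define a fragment that spans the repeat junction.
    """
    sites = sorted(cut_sites)
    if not sites:
        return []  # a non-cutter leaves the array intact
    fragments = [b - a for a, b in zip(sites, sites[1:])]
    fragments.append(unit_bp - sites[-1] + sites[0])  # junction-spanning fragment
    return sorted(fragments, reverse=True)


# Array size implied by 1-Mb and 2-Mb PFGE fragments
print(round(repeats_from_fragment(1000)), round(repeats_from_fragment(2000)))

# Single cutter (hypothetical site): every fragment equals one repeat unit
print(tandem_digest_fragments(5882, [3000]))

# Double cutter (hypothetical sites 1548 bp apart): two fragment classes
print(tandem_digest_fragments(5882, [1000, 2548]))
```

Fragments of 1 and 2 Mb correspond to roughly 170 and 340 units, in line with the 150–350 range given above. A single cutter returns the unit length, whereas a double cutter returns two fragment classes whose sizes sum to the unit length; a head-to-head or dispersed arrangement would not give this pattern.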

The DUXC macrosatellite in cow is located in a pericentromeric region

We hybridized metaphase chromosomes of two species of cattle with our DUXC probe. In B. taurus, we observed a single strong signal in the chromosome BTA7q12 pericentromeric region (Fig. 3a). This also confirmed that the signals on our pulsed-field gels represent a single DUXC locus. In river buffalo (B. bubalis), there was a strong signal at the homoeologous locus in the pericentromeric region of BBU9q12 (Fig. 3b).

Fig. 3

The DUXC array in cow is a single locus near pericentromeres. a RBPI banding and overlaid FITC signals with a DUXC probe show that the DUXC array in cow localizes to a single locus in the pericentromeric region on 7q12. b In river buffalo, the probe hybridizes to the homoeologous locus on 9q12. There is only one hybridizing locus in each species, indicating that the DUXC macrosatellite is a single locus containing many copies

Other Laurasiatheria also have DUXC tandem arrays

To see if the genomic organization of DUXC was conserved in other species, we analyzed reference genomes. In the B. taurus assembly, BLASTn detects a single DUXC sequence at the tip of chromosome 7, which agrees with the FISH signal (Fig. 4). It is not surprising that the reference contains only a single repeat compared to the hundreds we observed by PFGE, as reads from identical repeats either “collapse” onto one single sequence upon assembly or are excluded from the assembly. Therefore, we independently assembled cow DUXC Sanger sequences from the NCBI trace archive. This also indicated a high copy number and a tandem-array organization (Online Resource 4).

Fig. 4

Telomeric DUXC tandem arrays are present in other Laurasiatheria. a Ensembl BLAST analysis of the DUXC orthologs from cow, pig, and dog all show loci at telomeric regions. b–d Dot plots of three DUXC loci. The sequences were aligned against themselves using BLAST with default parameters. Arrow indicates orientation of transcription. For example, b shows that the dolphin contig 166 contains four DUXC copies in tandem, with a single repeat unit size of ~8.5 kb. c shows the alpaca DUXC with three 8.5 kb repeats in tandem. d shows a dot plot of two tandem repeats of DUXC in sloth, a member of the Xenarthra

Using a similar analysis, we found numerous examples of DUXC tandem arrays in other members of the Laurasiatheria. The dolphin assembly has two separate contigs, each containing four tandem DUXC repeats, and we also observed inverted mate-pair traces for this species. The alpaca genome also contains a contig with tandemly arrayed DUXC copies (Fig. 4 and Online Resource 4). Both the pig and dog reference genomes contain a single copy of their respective DUXC ortholog in the subtelomeric regions on chromosome 17 (Fig. 4), but these are probably also collapsed tandem arrays, which is supported by trace archive data with multiple inverted mate pairs in both of these species. Although the horse reference genome contains a single DUXC copy on chromosome 1, our trace archive interrogation identified several others, for which we also observed inverted mate pairs (Online Resource 4). Interestingly, we also found a tandem array DUXC locus in sloth, a member of the Xenarthra (Fig. 4). Together, this suggests that the tandem-array organization of DUXC is conserved throughout the Laurasiatheria and Xenarthra.
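Why a self-alignment reveals tandem repeats can be shown with a minimal Python sketch. It is a k-mer stand-in for the BLAST dot plots used here, not the analysis itself: the contig is synthetic, and the function names, the value of k, and the unit and flank lengths are all assumptions made for illustration. In a dot plot, tandem copies appear as diagonals parallel to the main diagonal, displaced by multiples of the unit length; the sketch recovers that displacement directly.

```python
import random
from collections import Counter


def dominant_repeat_offset(seq, k=12):
    """Most frequent distance between consecutive occurrences of the same k-mer.

    In a head-to-tail tandem array, each k-mer recurs one unit length
    downstream, so the dominant offset estimates the repeat unit size.
    Returns None if no k-mer occurs more than once.
    """
    last_seen = {}
    offsets = Counter()
    for i in range(len(seq) - k + 1):
        kmer = seq[i:i + k]
        if kmer in last_seen:
            offsets[i - last_seen[kmer]] += 1
        last_seen[kmer] = i
    if not offsets:
        return None
    return offsets.most_common(1)[0][0]


def random_dna(n, rng):
    """Random DNA string of length n drawn with the given random generator."""
    return "".join(rng.choice("ACGT") for _ in range(n))


rng = random.Random(1)
unit = random_dna(500, rng)  # synthetic 500-bp repeat unit
contig = random_dna(200, rng) + unit * 4 + random_dna(200, rng)  # four copies plus flanks

print(dominant_repeat_offset(contig))  # distance between neighboring copies
```

Real repeat copies diverge from one another, so an alignment-based method such as BLAST is more tolerant than exact k-mer matching; the sketch only conveys the geometry that makes the repeat unit size readable from a self-comparison.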

Additionally, there is a preferential localization of both DUX4 and DUXC to telomeric or pericentromeric regions. However, for most Afrotheria DUX4 and Laurasiatheria DUXC homologs, the short sequence contigs preclude any analysis of synteny. Although pig, dog, and cow DUXC loci have been assigned to chromosomes in Ensembl (17ter, 17ter, and 7ter, respectively), there was no conservation of synteny between these regions.

The recent arrival of DUX4 on 4qter in primates means that for primate DUX4 and Laurasiatheria DUXC, the observed subtelomeric localization is not due to chromosomal synteny. Therefore, this preferential localization could be due to other mechanisms such as convergent evolution, or a mechanistic pressure.

Discussion

Conservation of the genomic organization between DUX4 and Dux

We have shown that the unusual genomic organization of DUX4 is found not only in primates but also in the Afrotheria. The murine Dux locus shares both sequence and genomic organization with DUX4 (Clapp et al. 2007), but does not share any synteny with primate DUX4. Although the Dux tandem arrays are not in a telomeric or pericentromeric location, they lie adjacent to a murine-specific chromosomal fusion point (Clapp et al. 2007). According to the NCBI m37 assembly, the genomic sequence adjacent to Dux contains approximately 500 bp of a degenerate (TTAGGG)n array (10:57582100-57582613), indicative of a recent subtelomeric origin (Flint et al. 1997). This illustrates both the high mobility and plasticity of DUX-containing macrosatellites, and the difficulty of assigning orthology.

Origins of intronless DUX macrosatellites

Genomes that contain DUX4 or Dux arrays appear to lack DUXC, and vice versa. As DUX4 and Dux lack introns but share highly similar homeodomain sequences as well as the C-terminal transcriptional activation domain with DUXC, these intronless genes probably arose by retrotransposition from an ancestral DUXC gene. Unexpectedly, we have found that DUXC is also a telomeric/pericentromeric high-copy macrosatellite, which suggests a simple model for the observed distribution of these genes in mammals (Fig. 5). In this model, DUX4 (and Dux) arose multiple times independently in different mammalian lineages, with the DUXC retrotranspositions occurring not at random genomic sites but at the parental DUXC macrosatellite. Although in most cases, processed retrogenes insert into random genomic locations and are often “dead on arrival” (Vinckenbosch et al. 2006), reverse-transcripts can also displace their parental gene (Derr and Strathern 1993) by repair of double-strand breaks through homologous recombination (Hu 2006). The high DUXC copy number would have provided many homologous targets for recombination; additionally, high-copy genes may be highly expressed in the germline and provide many cDNA template molecules, with ample opportunity for retrogenes to arise (Vinckenbosch et al. 2006; Fink 1987). Although little is known about the expression of DUX genes in humans, robust DUX4 expression has been shown in human testis by Snider et al. (2010), who also reported DUXC transcripts in dog testis. Similarly, the mouse Duxbl homolog is expressed in testis and ovary (Wu et al. 2010).

Fig. 5

A new model for the evolution of DUX gene macrosatellites. The genome of the common ancestor of placental mammals contained a DUXC macrosatellite. In some mammalian lineages, spliced DUXC sequences were retrotransposed back into the array, resulting in intron loss by gene conversion followed by array homogenization. This was the origin of DUX4 macrosatellites, which thus displaced DUXC macrosatellites in primates and Afrotheria, while Laurasiatheria and Xenarthra maintained the DUXC array

In our model, DUXC already existed as a high-copy tandem array in the common ancestor of placental mammals. A retrocopy integration in the germline of an intronless, spliced DUXC sequence into a single DUXC repeat unit could be followed by the local spreading and homogenization of the intronless variant through the rest of the tandem repeats. Such array homogenization (“concerted evolution”) is known to occur in tandem arrays like the rDNA clusters (Ganley and Kobayashi 2007).

According to this model, DUX4, Dux, and DUXC did not need to acquire their tandem array structures independently. Replacement of the parental DUXC gene in this manner would also explain why mammals have either DUX4/Dux or DUXC, but never both (this work; Leidenroth and Hewitt 2010). Thus, it may be more appropriate to think of the DUX4/Dux retropositions as gene conversions leading to intron loss of DUXC (Hu 2006), which makes DUXC and DUX4 effectively “retro-orthologs”.

We found no evidence for conservation of synteny for the DUXC orthologs and Afrotheria DUX4, but telomeric and subtelomeric regions are well known for their plasticity (Mefford and Trask 2002) and are therefore often poorly integrated in genome assemblies. This also means that lower-resolution techniques such as chromosome painting are unlikely to answer the question of synteny between these DUX genes. However, the impending arrival of long-read sequencing could soon offer a useful alternative tool.

The unusual genomic arrangement of DUXC, DUX4, and Dux appears to be conserved throughout the placental mammals. There must be mechanistic or selective pressures maintaining all of these unusual tandem arrays in mammals, but these are currently unknown.