Nuclear paralogs of mitochondrial genes (numts; Lopez et al. 1994) are the product of the translocation of mitochondrial sequences into the nuclear genome. The characterization and documentation of numts are of evolutionary interest because they offer an insight into the dynamics of genome evolution. Numts have been described from many vertebrate taxa, including the ‘reptiles’ (Hay et al. 2004; Podnar et al. 2007), birds (summarized in Qu et al. 2008), and mammals (Lopez et al. 1994; Lu et al. 2002; Schmitz et al. 2005; Kim et al. 2006; Triant and DeWoody 2009). However, mitochondrial pseudogenes appear to be absent in basal vertebrates (the ‘fishes’) despite the prevalence of mtDNA-based phylogenetic studies (Bensasson et al. 2001). The presence or absence of mitochondrial pseudogenes in agnathan vertebrates is especially interesting as this group is representative of the basal-most clade of extant vertebrates (Osorio and Retaux 2008). Further, reports of large-scale chromatin diminution (loss of > 20% of the embryonic genome) that occur during development in the sea lamprey (Petromyzon marinus; Smith et al. 2009) have stimulated investigation of the genomic structure of this evolutionarily important group (e.g., Covelo-Soto et al. 2014). Here, we report the first characterization of a numt in two lamprey species: Ichthyomyzon castaneus and I. gagei. The southern brook lamprey (I. gagei) is a non-parasitic species that is considered to be a derivative of the parasitic chestnut lamprey (I. castaneus) with which it regionally co-occurs (Docker 2009). Attempts to establish DNA bar-code data for I. castaneus and I. gagei failed to distinguish the two species (April et al. 2011), suggesting that they have recently diverged from their common ancestor.

Previously (Strange et al. 2016), we found sequence ambiguities (e.g., double peaks) while sequencing the mitochondrial NADH dehydrogenase subunit 5 (mt-ND5) gene of I. gagei; we subsequently designed allele-specific primers (as suggested by Song et al. 2008) to selectively amplify and determine the ‘correct’ mitochondrial sequence. The non-mitochondrial sequence includes a single-nucleotide deletion approximately 170 bp downstream of the ND5 start codon (Fig. 1a); this deletion results in a frameshift mutation that would render the ND5 protein non-functional, a diagnostic feature of mitochondrial pseudogenes (Triant and DeWoody 2008). We determined that the pseudogene includes the complete protein-coding sequences of ND5 and ND6 (ca. 2.3 kb), and we focused on this region (henceforth referred to as ψND5/6) in the analyses that follow.
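The effect of a single-nucleotide deletion on the reading frame can be illustrated with a minimal Python sketch. The sequence below is invented for illustration and is not lamprey ND5 data; the function names (`codons`, `delete_base`, `first_changed_codon`) are hypothetical helpers, not part of any method used in this study.

```python
from typing import List, Optional


def codons(seq: str) -> List[str]:
    """Split a sequence into complete triplets, dropping any trailing bases."""
    usable = len(seq) - len(seq) % 3
    return [seq[i:i + 3] for i in range(0, usable, 3)]


def delete_base(seq: str, position: int) -> str:
    """Return the sequence with the base at a 1-based position removed."""
    return seq[:position - 1] + seq[position:]


def first_changed_codon(reference: str, variant: str) -> Optional[int]:
    """Return the 1-based index of the first codon that differs, or None."""
    pairs = zip(codons(reference), codons(variant))
    for index, (ref, var) in enumerate(pairs, start=1):
        if ref != var:
            return index
    return None


# Toy example only: an invented 18-bp reading frame, not real sequence data.
reference = "ATGGCTCTAACCGGATTC"
variant = delete_base(reference, 8)  # remove one base inside the third codon

print(codons(reference))
print(codons(variant))
print(first_changed_codon(reference, variant))
```

In this toy case every codon from the deletion onward is read in a different frame, which is why a single missing base is sufficient to disrupt the downstream protein sequence.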

Fig. 1

a Consensus alignment of the 5′ end of mitochondrial (mt) and pseudogene (ψ) copies of the ND5 coding sequence in Ichthyomyzon gagei. Points (.) indicate nucleotides identical between the two sequences, substitutions are represented by letters, and the deletion at position 170 is indicated by an asterisk. b Phylogenetic relationships of lamprey mitochondrial ND5/6 sequences and the corresponding ψND5/6 pseudogene of Ichthyomyzon castaneus and I. gagei as inferred from parsimony and likelihood analyses. Filled circles at nodes represent bootstrap support greater than 95%. Branch lengths are proportional to the likelihood estimates of the number of substitutions per site. The state from which samples of I. castaneus and I. gagei were obtained is indicated by two-letter suffixes (see Table 1): AR Arkansas, IN Indiana, KY Kentucky, MO Missouri, and TX Texas.

Table 1 Species, collection localities, and GenBank accession numbers for lampreys used in this study

Application of our allele-specific primers to other specimens of I. gagei (three populations), I. castaneus (three populations), and several other lamprey species provides evidence that the ψND5/6 pseudogene is restricted to I. castaneus and I. gagei (Table 1). Within each individual examined, the ψND5/6 and mtND5/6 sequences differ from each other by an average of 57.17 sites (2.48%), including the previously indicated single deletion at position 170 in all ψND5/6 sequences. Phylogenetic analysis of the mtND5/6 and ψND5/6 sequences (Fig. 1b) revealed relationships consistent with previous mtDNA-based studies (Lang et al. 2009). All ψND5/6 sequences from I. castaneus and I. gagei formed a monophyletic group sister to the mitochondrial ND5/6 sequences of the same species, suggesting that the duplication that gave rise to the pseudogene occurred in their unique common ancestor. We found greater divergence (uncorrected p) between the mtND5/6 and ψND5/6 clades (2.45%) than within either clade; further, the average divergence within the mtND5/6 clade (0.61%) was notably higher than that within the ψND5/6 clade (0.14%). When compared to the inferred ancestral ND5/6 sequence, the mitochondrial clade experienced 44 substitutions, while the pseudogene clade experienced seven substitutions and a single deletion. Relative-rate tests confirmed that the rate of sequence evolution in the mtND5/6 clade is approximately 5.5 times that of the ψND5/6 clade (P = 0.00005). This large difference between the evolutionary rates of the mtND5/6 and ψND5/6 sequences suggests that ψND5/6 resides in the nucleus, where the substitution rate is significantly lower than in the mitochondrial genome (e.g., Zischler et al. 1995; Lu et al. 2002).
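For readers unfamiliar with the uncorrected p distance used above, it is the proportion of compared alignment sites at which two sequences differ. A minimal Python sketch follows; the function name `p_distance`, the toy sequences, and the decision to skip gapped sites (pairwise deletion) are assumptions made for illustration, and the text does not state how gaps were treated in the reported values.

```python
def p_distance(seq_a: str, seq_b: str, gap: str = "-") -> float:
    """Uncorrected p distance between two aligned sequences.

    Assumption: sites where either sequence has a gap are excluded
    (pairwise deletion). Other gap treatments give different values.
    """
    if len(seq_a) != len(seq_b):
        raise ValueError("sequences must be aligned to the same length")

    compared = 0
    differences = 0
    for a, b in zip(seq_a.upper(), seq_b.upper()):
        if a == gap or b == gap:
            continue  # skip gapped sites under the pairwise-deletion assumption
        compared += 1
        if a != b:
            differences += 1

    if compared == 0:
        raise ValueError("no ungapped sites to compare")
    return differences / compared


# Toy aligned sequences (invented): one substitution and one gapped site.
mt_toy = "ACGTACGTAC"
psi_toy = "ACGTTCG-AC"
print(p_distance(mt_toy, psi_toy))
```

As a rough consistency check on the figures reported above, 57.17 differing sites over an alignment of roughly 2,300 sites corresponds to about 2.5%, in line with the stated 2.48%.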

ψND5/6 is clearly a duplication of the ND5/6 region that occurred in the ancestor of I. gagei and I. castaneus. Our previous work with I. gagei revealed no evidence of a duplication of this region within its mitochondrial genome (Strange et al. 2016), suggesting that ψND5/6 is most likely located somewhere among the nuclear chromosomes. Further, ψND5/6 exhibits a decreased rate of sequence evolution relative to its mitochondrial counterpart, consistent with the lower rate of evolution found in the nuclear genome. While cytogenetic evidence is needed to demonstrate its location within the nucleus, we propose that ψND5/6 is a recent transfer from the mitochondrial genome to the nuclear genome.

Given the phylogenetic position of lampreys relative to other vertebrates, the presence of numts within lampreys suggests two hypotheses regarding the evolution of vertebrate genomes. First, the mechanism by which numts are inserted into the nuclear genome may be an ancestral vertebrate characteristic that has been retained in agnathans and amniotes but lost in teleost fishes. A second (and more intriguing) hypothesis is that the presence of numts in lampreys is a product of the genome rearrangements these animals undergo during development (e.g., Smith et al. 2009; Covelo-Soto et al. 2014). Further investigation of these hypotheses will provide insight into the evolution of vertebrates and their genomic organization.