Introduction

With regard to size and structure, the animal mitochondrial (mt) genome is particularly uniform. With >800 animal mt genomes sequenced (1127 as of 17 October 2007), the vast majority contain the same set of 36–37 genes on a single circle of double-stranded DNA, with minimal intergenic and noncoding DNA (Boore 1999). However, there are isolated exceptions to these general features. For example, subgenomic minicircles (circularized fragments of the mt genome) have been detected in a sporadic but diverse range of animals: in mesozoans (primitive worm-like parasites) (Watanabe et al. 1999; Awata et al. 2005), humans (Kajander et al. 2000), and nematodes (Lunt and Hyman 1997; Armstrong et al. 2000). Subgenomic minicircles are an anomaly, as it has been proposed that they represent the end-products of recombination (Lunt and Hyman 1997), a process considered absent in animal mt genomes (Moritz et al. 1987). Although they could be conceivably produced by incomplete replication of a “master circle” (followed by circularisation), the coexistence of reciprocal subgenomic circles is most consistent with their production through recombination (Lunt and Hyman 1997).

However, it is not clear how prevalent subgenomic minicircles are, and whether they always indicate the operation of recombination. We know of only four reports of minicircles (cited above), and these are from very divergent groups. Nematodes potentially represent the group in which subgenomic minicircles are most widespread, given that they have been described in two divergent genera: Meloidogyne (M. javanica; Lunt and Hyman 1997) and Globodera (G. pallida [Armstrong et al. 2000] and G. rostochiensis [Gibson et al. 2007a]). These two nematode genera belong to different families within the superfamily Tylenchoidea. However, the natures of the subgenomic circles are very different in these two nematode genera. The subgenomic circles of M. javanica are either very small (250-bp minicircles) or very large (16,000-bp maxicircles), while those of the two Globodera species are of intermediate size (range, 6400–9500 bp). The minicircles of M. javanica are thought to be produced by nonhomologous (but repeat mediated) intra-mt recombination and contain no genes or gene fragments, while those of Globodera are thought to be produced by homologous inter-mt recombination, and contain overlapping sets of genes (Armstrong et al. 2000; Gibson et al. 2007a, b). This overlapping arrangement was reminiscent of the multipartite structure of the plant mt genomes (Palmer and Shields 1984) and was named as such in G. pallida (Armstrong et al. 2000).

We recently completed an extensive sequence characterization and analysis of five subgenomic mtDNAs (scmtDNAs: small circular mtDNAs) of G. pallida (Gibson et al. 2007b). This indicated that the subgenomes were generally mosaics of each other, composed of fragments (some as long as 6500 bp) present on other circles. Although these fragments showed a very high level of sequence similarity (94–98%), some circles contained gene copies that appeared nonfunctional, containing point indels that would disrupt the reading frame. These were presumed to be pseudogenes but could conceivably be rescued by RNA editing.

We are interested in determining the evolutionary extent of multipartite mt genomes among relatives of G. pallida and, as a first step toward that, report here the complete sequence of a subgenomic circle from Globodera rostochiensis. One advantage of examining mt genomes in the latter nematode is that there is an extensive EST database for it (nearly 6000 EST sequences; nematode.net). We can therefore examine whether any apparently nonfunctional mt genes are expressed.

Materials and Methods

Genomic DNA extraction

Globodera rostochiensis nematodes were imported into Australia from the Scottish Crop Research Institute (Invergowrie, Scotland) as a suspension in 95% ethanol. Genomic DNA was extracted both from multiple adults and from individual cysts. In both cases, the biological material was centrifuged at 1500 g for 5 min to separate it from the ethanol, and the pellet extracted by the method of Sunnucks and Hales (1996). However, the proteinase K step was varied for the extraction of single cysts; the homogenate was incubated at 65°C for 1 h, then 95°C for 10 min, to inactivate the proteinase K. DNA was resuspended in TE buffer (1 mM Tris-HCl, 0.1 mM EDTA [pH 8]) and stored at 4°C until use.

We verified by two methods that the nematodes extracted were indeed G. rostochiensis. In the first, we used a primer pair and PCR amplification conditions that are specific to the G. rostochiensis ITS1 region (Mulholland et al. 1996). This amplified a fragment of the expected size, while amplification with G. pallida specific primers (Mulholland et al. 1996) did not. In the second method, we amplified and sequenced the ITS1 and ITS2 regions using the primers described by Subbotin et al. (2001) and aligned our sequence with those in GenBank for G. pallida and G. rostochiensis. Our sequence was identical to some of the G. rostochiensis sequences.

Globodera pallida nematodes (populations Gourdie, Luffness, and P4A) were cultivated and genomic DNA extracted in Scotland, as described (Armstrong et al. 2000). These populations are named after the region from which they were first collected; Gourdie and Luffness are both from the United Kingdom, and P4A is from South America.

Amplification and cloning of a mt subgenome from G. rostochiensis: Gro-scmtDNA I

A mt subgenomic circle (Gro-scmtDNA I) was amplified from a genomic extract of multiple individuals of G. rostochiensis in two overlapping pieces (hereafter referred to as fragments 1 and 2). Fragment 1 was amplified using primers designed to anneal to regions of the mt genes for ND4 and Cytb. The ND4 primer was designed to a conserved region of this gene, identified after alignment of ND4 genes from related nematodes, with the primer designed using CODEHOP (Rose et al. 2003). The Cytb primer was designed during the sequencing of two mt subgenomes in the close relative G. pallida (Gibson et al. 2007b). Sequence data from fragment 1 were used to design outward facing primers for the amplification of fragment 2. Two primer combinations were trialed for fragment 2. Primer sequences are listed in Table 1.

Table 1 Primer sequences used for amplification of fragments 1 and 2 from G. rostochiensis, coding regions of fragments 1 and 2 from G. rostochiensis, and ND4, COII, and COIII fragments from G. pallida

Amplification reactions utilized the Expand Long Template PCR System (Roche, Australia). Each 20-μl reaction contained Expand Long Template buffer 2, 490 μM dNTPs, a 200 nM concentration of each primer (Sigma, Australia), 2.5 U of Expand Long Template Enzyme mix (containing Taq and Tgo DNA polymerases), and 0.5 μl of extracted genomic DNA. Cycling conditions consisted of 2 min at 92°C; followed by 35 repeats of 10 s at 92°C, 30 s at 55–60°C, and 7 min at 68°C; followed by 7 min at 68°C. Amplicons of fragment 1 were purified after agarose gel electrophoresis by gel extraction, using the QIAquick Gel Extraction Kit (Qiagen, Australia). Amplicons of fragment 2 were similarly purified, but using the Wizard SV Gel and PCR Clean-Up System (Promega, Australia).

The polymerase enzymes used to generate fragments 1 and 2 do not leave an unpaired adenine at the 3′ end. Each fragment was adenylated by incubation at 70°C for 30 min with 5 U of Taq DNA polymerase (Promega, Australia) and 200 μM dATP in 10 mM Tris-HCl (pH 9 at 25°C), 50 mM KCl, 0.1% Triton X-100, 2.5 mM MgCl2. After adenylation, fragments 1 and 2 were cloned into the pGEM-T Easy vector system (Promega, Australia).

Amplification and cloning of the coding regions of fragments 1 and 2, from single G. rostochiensis cysts

A 1.5-kb fragment spanning most of the ND4 gene was amplified from the genomic DNA of a single G. rostochiensis cyst; the ND4 gene is the only coding region on fragment 1. Similarly, a 4.2-kb fragment spanning the coding region of fragment 2 was amplified from the genomic DNA of a single cyst. The primer sequences for both reactions are listed in Table 1. Each 20-μl reaction contained OptiBuffer (composition withheld by manufacturer; Bioline, Australia), 2.75 mM MgCl2, 490 μM dNTPs, a 200 nM concentration of each primer, 1.2 U of BIO-X-ACT Long DNA polymerase (containing a combination of DNA polymerases; composition withheld by the manufacturer; Bioline, Australia), and 1 μl of extracted genomic DNA (diluted 1/50 or 1/100, depending on the reaction). Cycling conditions were 2 min at 94°C; followed by 35 repeats of 10 s at 94°C, 30 s at 50, 52, or 54°C, and 4 min (fragment 1) or 10 min (fragment 2) at 68°C; followed by 5 min at 68°C. The polymerase enzymes used do leave an unpaired adenine at the 3′ end. Both fragments were gel purified, using the Wizard SV Gel and PCR Clean-Up System (Promega, Australia). Amplicons were then cloned into the pGEM-T Easy Vector System (Promega, Australia).

Amplification and cloning of ND4, COII, and COIII gene fragments from G. pallida

Genomic DNA was extracted from ∼100 cysts of G. pallida populations Gourdie, Luffness, and P4A using a Qiagen DNeasy Tissue Kit (Qiagen, UK). The DNA was quantified using a nanodrop spectrophotometer (Labtech International, UK), and dilutions made with water to 2 ng/μl and stored at −20°C.

RNA was extracted from ∼20 mg of juvenile nematodes by grinding in a 1.5-ml centrifuge tube with a plastic pestle and fine sand using TRI Reagent (Sigma-Aldrich, UK) following the manufacturer’s protocol. Following resuspension of the extracted RNA in water, a subsample was quantified with a nanodrop spectrophotometer and treated with DNA-free (Ambion, UK). First-strand cDNA was synthesized with ∼100 ng of total RNA using the SuperScript III First-Strand Synthesis System (Invitrogen, UK) and OligoT primer supplied in the kit. Reactions were terminated at 85°C for 5 min. RNA extractions were attempted from individual nematodes but did not yield sufficient RNA for downstream cDNA synthesis and PCR reactions.

PCR was performed with genomic DNA or cDNA in 25-μl reactions with 2.5 μl GoTaq Flexi 10× buffer, 1.0 mM MgCl2, a 0.2 mM concentration of each dNTP, 200 nM of each primer (MWG, Germany) (Table 1), 1 U of GoTaq DNA polymerase, and 1 μl of DNA. Cycling conditions consisted of 4 min at 94°C; 35 cycles of 10 s at 94°C, 30 s at 55–60°C, and 1 min at 72°C; and a final extension at 72°C for 10 min. PCR products were purified after agarose gel electrophoresis by gel extraction using the QIAquick Gel Extraction Kit or the MinElute PCR Purification Kit (Qiagen, UK) and cloned into the pGEM-T Easy vector system (Promega, UK) or pJET1 (Fermentas, UK).

Sequencing

Both strands of positive clones were sequenced by primer walking. Sequencing reactions were carried out using the BigDye Terminator v3.1 Cycle Sequencing Kit (Applied Biosystems, Australia), as described by the manufacturer. Sequencing products were separated using an ABI 3130xl Genetic Analyzer (Applied Biosystems, Australia) for G. rostochiensis genes and an ABI 377 for G. pallida genes. ChromasPro (v. 1.33; Technelysium Ltd., Australia) was used to edit sequence electropherograms and to construct a contiguous consensus sequence of each fragment for G. rostochiensis genes. For G. pallida, the sequence data were processed through PhredPhrap and consensus sequences produced by CONSED (Gordon et al. 1998); sequences were then aligned using ClustalW (Chenna et al. 2003). The sequence of the entire G. rostochiensis subgenome is deposited under GenBank accession number EF193005.

Sequence analysis

Protein-coding genes were identified using ORFinder (http://www.ncbi.nlm.nih.gov/gorf) specifying the “invertebrate mitochondrial” genetic code. An rRNA BLAST search (http://www.psb.ugent.be/rRNA/blastrrna.html) was used to search for rRNA genes. tRNA genes were identified using tRNAscan-SE 1.21 (Lowe and Eddy 1997), specifying the following: Source, Nematode Mito; Genetic Code for tRNA Isotype Prediction, Invertebrate Mito; and Cove Score Cutoff, 10, due to the observed structural plasticity of nematode tRNA genes (Suematsu et al. 2005). Tandem repeats were identified using Tandem Repeats Finder (tandem.bu.edu), with the default settings; dispersed repeats were identified by pairwise dot-plot comparison (http://www.isrec.isb-sib.ch/java/dotlet/Dotlet.html); inverted repeats were searched for using einverted (http://www.bioweb.pasteur.fr/seqanal/interfaces/einverted.html).
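The effect of specifying the invertebrate mitochondrial genetic code can be illustrated with a minimal sketch. This is not ORFinder itself; it only assumes the standard NCBI layout of translation table 5, in which TGA encodes Trp, ATA encodes Met, and AGA/AGG encode Ser. The function names `translate` and `internal_stops` are introduced here purely for illustration.

```python
# Illustrative sketch only: translate a coding sequence with NCBI translation
# table 5 (invertebrate mitochondrial) and report premature stop codons.
BASES = "TCAG"
# Amino acids for codons in the order TTT, TTC, TTA, TTG, TCT, ... GGG (table 5).
AMINO_ACIDS = "FFLLSSSSYY**CCWWLLLLPPPPHHQQRRRRIIMMTTTTNNKKSSSSVVVVAAAADDEEGGGG"
CODON_TABLE = {
    b1 + b2 + b3: AMINO_ACIDS[16 * i + 4 * j + k]
    for i, b1 in enumerate(BASES)
    for j, b2 in enumerate(BASES)
    for k, b3 in enumerate(BASES)
}


def translate(seq):
    """Translate complete codons in frame 1; unknown codons become 'X'."""
    seq = seq.upper().replace("U", "T")
    return "".join(
        CODON_TABLE.get(seq[i:i + 3], "X") for i in range(0, len(seq) - 2, 3)
    )


def internal_stops(seq):
    """Return 1-based codon positions of stops, ignoring the final codon."""
    protein = translate(seq)
    return [i + 1 for i, aa in enumerate(protein[:-1]) if aa == "*"]


print(translate("ATGTGAAGATAA"))       # TGA and AGA are sense codons here
print(internal_stops("ATGTAAGCATAA"))  # a stop at codon 2 precedes the end
```

Under the standard nuclear code the first example would terminate at TGA, which is why the choice of genetic code matters when deciding whether a mt gene is interrupted.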

Results

Amplification and sequencing of fragments 1 and 2

The amplification of entire mt genomes from species for which there is no sequence information can be problematic. Previously, mtDNA from the multipartite G. pallida was characterized after preparation of a mt library, using material isolated by ethidium bromide-cesium chloride isopycnic centrifugation (Armstrong et al. 2000). To avoid preparing such a library in G. rostochiensis, and to take advantage of the sequence information gleaned from G. pallida, amplification primers were designed to conserved regions of two genes found on scmtDNAs characterized in G. pallida: ND4 and Cytb. Amplification of genomic DNA (extracted from multiple individuals of G. rostochiensis) with these primers produced an amplicon of approximately 3.5 kb, but at levels insufficient for subsequent cloning. This fragment was gel-purified, reamplified using the same primers, and cloned. Seven clones containing a 3.5-kb insert were identified and kept for further characterization. These were named Gro-AR1–5, Gro-AR7, and Gro-AR9. Fragment 1 from Gro-AR1–3 was completely sequenced by primer walking.

Sequence information from fragment 1 was used to design outward facing primers, to facilitate the amplification of the remainder of the mt circle (i.e., fragment 2). Two primer combinations were trialed, with both amplifying a single, strong product of ∼7.5 kb. Cloning of this amplicon proved difficult, possibly due to the large size of the fragment or to inefficient A-tailing. Five clones containing a 7.5-kb insert were isolated for further characterization and named Gro-AR10, Gro-AG3, Gro-AG11, Gro-AG17, and Gro-AG21. Fragment 2 from Gro-AR10 was completely sequenced by primer walking, while a subregion of fragment 2 (∼4 kb) was sequenced for the remaining clones.

Sequence analysis of Gro-scmtDNA I

As expected, fragments 1 and 2 overlapped at each end, consistent with their amplification from a circular molecule. However, the overlaps were not precise. Comparing fragment 1 from Gro-AR1 with fragment 2 from Gro-AR10, there was a 641-bp overlap between the 5′ end of fragment 1 and the 3′ end of fragment 2, with 4 of these 641 nucleotides not matching. Similarly, there was a 908-bp overlap between the 3′ end of fragment 1 and the 5′ end of fragment 2, with 14 of these 908 bp not matching (see Fig. 1 for an indication of where the overlaps occur). As both fragments were amplified from genomic DNA extracted from multiple individuals, we interpret these imprecise matches as indicating that there is some heteroplasmy, either between or within individuals, in Gro-scmtDNA I. This is consistent with other data, including (i) slight sequence variation between cloned amplicons of fragment 1 and (ii) amplification from single cysts (both reported below).
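The reasoning that two fragments overlapping at both ends derive from a circular molecule can be sketched as follows. This is a toy illustration, not the procedure used for assembly in this study: the function names, the `min_len` and `max_mismatches` parameters, and the 24-bp example sequence are all hypothetical.

```python
# Illustrative sketch only: detect end overlaps between two fragments and
# infer the length of the circle they were amplified from.
def end_overlap(left, right, min_len=4, max_mismatches=0):
    """Length of the longest suffix of `left` matching a prefix of `right`."""
    for k in range(min(len(left), len(right)), min_len - 1, -1):
        mismatches = sum(a != b for a, b in zip(left[-k:], right[:k]))
        if mismatches <= max_mismatches:
            return k
    return 0  # no overlap of at least min_len found


def circle_length(frag1, frag2, **kwargs):
    """Circle size = summed fragment lengths minus both end overlaps."""
    overlap_a = end_overlap(frag1, frag2, **kwargs)  # 3' end of 1, 5' end of 2
    overlap_b = end_overlap(frag2, frag1, **kwargs)  # 3' end of 2, 5' end of 1
    return len(frag1) + len(frag2) - overlap_a - overlap_b


# Hypothetical 24-bp "circle" cut into two fragments that overlap by 4 bp
# at each junction.
circle = "ATGGCCATTGTAATGGGCCGCTGA"
frag1 = circle[:12]
frag2 = circle[8:] + circle[:4]
print(circle_length(frag1, frag2))
```

Applied to the real data, the same arithmetic subtracts the 641-bp and 908-bp overlaps from the summed fragment lengths. Setting `max_mismatches` above zero would tolerate imprecise overlaps such as those described above, at the cost of a higher risk of spurious matches in short or repetitive sequence.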

Fig. 1

Map of Gro-scmtDNA I, a mt subgenome amplified from G. rostochiensis. The size and overlap of the two amplified fragments are schematically shown inside the circle. Protein-coding genes that are interrupted by stop codons are shaded gray (ND4, Cytb). Those that are not interrupted by stop codons are shaded black (COIII, ND3). tRNA genes are indicated by the single letter for the amino acid they encode (T, Q). Tandem repeats (tr1, tr2, tr3) and dispersed repeats (dr1, dr2) are also shown, drawn proportionally to their size. No inverted repeats were found

Taking the overlaps into account, the size of Gro-scmtDNA I (based on Gro-AR1 and Gro-AR10) is 9210 bp, similar to the size of scmtDNAs reported in G. pallida (range: 6400–9428 bp [Gibson et al. 2007b]). Gene content and organization were then mapped onto Gro-scmtDNA I. Four protein-coding genes were detected, based on similarity to other nematode mt genes (Fig. 1). These genes are ND4, COIII, ND3, and Cytb. Although three tRNA genes were detected (tRNA Thr, tRNA Leu(UUR), tRNA Gln), tRNA Leu(UUR) overlapped significantly with a protein-coding gene (ND4), and a comparison with tRNA Leu(UUR) genes from other nematodes indicated very poor similarity. We tentatively conclude that there are just two tRNA genes on this subgenome. No rRNA genes were detected. Except for the tRNA Leu(UUR) gene, all genes are encoded on the same mt strand.

Alignment of the noncoding region with those of scmtDNAs sequenced from G. pallida (scmtDNAs I-V) indicated only minor similarity. However, there are substantial differences when the noncoding regions are compared within G. pallida. For example, a comparison of the x222 region (the major noncoding region) of scmtDNAs I–III from G. pallida indicates that these are all highly similar (>90% identity), but different from scmtDNAs IV and V (∼70% identity). However, scmtDNAs IV and V are highly similar to each other (>90% identity) (T. Gibson, unpublished data). Although the noncoding region of Gro-scmtDNA I showed only minor similarity to those from G. pallida, we see this as consistent with the considerable variation that exists between the noncoding regions of some of the scmtDNAs of G. pallida. A map of Gro-scmtDNA I is presented in Fig. 1.

Protein-coding genes

Pairwise nucleotide alignment of the ND4 gene with its homologue from G. pallida results in 78% of positions being identical, while the pairwise amino acid alignment results in 70% of positions being identical (Fig. 2). No gaps are introduced into the amino acid alignment. Although there are two premature stop codons in the G. rostochiensis ND4 gene (positions 73 and 112), a high degree of similarity/identity with the G. pallida gene is maintained both before and after these stop codons (Fig. 2). We interpret this as evidence that the ND4 gene in G. rostochiensis either is nonfunctional or requires posttranscriptional editing.

Fig. 2

Alignment of the amino acid sequences of ND4 from G. rostochiensis (Gro) and G. pallida (Gpa). Identical positions are shaded dark gray; similar positions (BLOSUM62 matrix) are shaded light gray. Stop codons are indicated by an asterisk. Note that pairwise identity remains high, both before and after the premature stop codons in the G. rostochiensis sequence

Neither the COIII nor the ND3 gene is interrupted by stop codons. Pairwise alignment of the COIII amino acid sequence with its homologue from G. pallida indicates 70% identity, while that between ND3 and the G. pallida homologue is 62%. However, Cytb is similar to ND4, in that the genomic sequence cannot code for a functional gene product. Pairwise nucleotide alignment between Cytb from G. rostochiensis and G. pallida indicated 71% identity, but the amino acid alignment was much poorer, with only 21% identity, due to two point indels that disrupt the reading frame. Further, numerous stop codons are present within the G. rostochiensis Cytb sequence, despite nucleotide similarity remaining relatively high throughout the alignment. Again, we take this as evidence that the Cytb gene either is nonfunctional or requires posttranscriptional editing.
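The percentage identities quoted for these pairwise comparisons can be computed from an existing alignment along the following lines. This is a minimal sketch under one stated assumption: identity is taken as the number of matching columns divided by the number of alignment columns, excluding columns where both sequences carry a gap. Alignment programs differ in their choice of denominator, so the values reported here need not have been derived in exactly this way. The function name `percent_identity` is hypothetical.

```python
# Illustrative sketch only: percent identity of two pre-aligned sequences
# (nucleotide or amino acid), with '-' marking alignment gaps.
def percent_identity(aligned_a, aligned_b, gap="-"):
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have equal length")
    # Ignore columns in which both sequences are gapped.
    columns = [
        (a, b) for a, b in zip(aligned_a, aligned_b)
        if not (a == gap and b == gap)
    ]
    if not columns:
        return 0.0
    matches = sum(a == b for a, b in columns)
    return 100.0 * matches / len(columns)


# Hypothetical toy alignment: 4 of 6 columns match.
print(round(percent_identity("ACGT-A", "ACCTTA"), 1))
```

Running the same function on the nucleotide alignment and on the alignment of the conceptual translations makes the contrast described for Cytb explicit: a point indel costs a single nucleotide column but alters every downstream codon.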

Genome organization

A comparison of the organization of Gro-scmtDNA I with that of the five mt subgenomes that have been completely sequenced from G. pallida reveals some overlap in genomic organization. In particular, there are conserved blocks of genes when Gro-scmtDNA I is compared with scmtDNA I from G. pallida (Fig. 3). Both have the genes tRNA Thr, ND4, and COIII lying in a contiguous block, although scmtDNA I from G. pallida has a tRNA His gene in this block, which is missing from Gro-scmtDNA I. Both genomes also have ND3 lying directly upstream from Cytb (Fig. 3).

Fig. 3

Comparison of linear maps of scmtDNA I from Globodera species. Conserved blocks of genes include T-ND4-COIII (with an additional H in G. pallida) and ND3-Cytb. Genes are drawn proportional to their size, including the noncoding regions

Variation in the ND4 gene between different copies of Gro-scmtDNA I

In our recent report on the complete sequence and characterization of five subgenomic circles from G. pallida (Armstrong et al. 2000; Gibson et al. 2007b), we found some genomic sequences that could code for a functional gene product, whereas others either were nonfunctional or required editing. Similarly, the genomic sequences of ND4 and Cytb in Gro-scmtDNA I either are nonfunctional or require editing. However, a limited number of amplicons were sequenced for both of these genes, and the possibility remained that different copies of Gro-scmtDNA I might contain genomic sequences that could code for functional ND4 or Cytb genes. To investigate this, we sequenced the ND4 region from multiple clones, after amplification from genomic DNA extracted from (a) multiple individuals and (b) a single cyst. Cysts contain ∼500 eggs, and as all the eggs are the progeny of a single female, this represents the largest extractable form of a single individual available. This approach would thus detect any intra- or interindividual variation in the ND4 gene. An alignment of all ND4 sequence data is included as Supplementary Fig. 1. A portion of this alignment, demonstrating the salient features, is shown in Fig. 4.

Fig. 4

Subsection of an alignment of the ND4 gene amplified from mixed individuals (unboxed) and a single cyst (boxed). Sequences generated from fragment 1 are bracketed, as are sequences generated from fragment 2. Substitutional variation is evident between amplicons generated from mixed individual DNA, but not from clones generated from the single cyst. Variation in the length of polythymidine tracts is evident in both amplicons. The entire alignment is available in supplementary alignment 1, including ESTs for ND4

As ND4 is present on both fragment 1 and fragment 2, we sequenced this gene in multiple clones from amplicons of both fragments. Focusing on the ND4 region of amplicons generated from multiple individuals (Fig. 4; unboxed sequences), these sequences were highly similar (98–100% identity). However, only two clones were identical along the entire length of the gene: Gro-AR3 and Gro-AR4 (Supplementary Fig. 1). Despite these differences, none of these ND4 sequences can be conceptually translated into a functional ND4 protein; point indels (which disrupt the reading frame) or premature stop codons are present in all clones (Supplementary Fig. 1).

Differences between these clones fall into two categories: substitutional variation (with transitions more common than transversions) and variation in the number of thymidine residues in a polythymidine tract (Fig. 4). The latter type of change is the most frequently observed difference between clones. These results suggest that these clones are representative of highly similar minicircles, with slight differences between minicircles present in individuals of G. rostochiensis.
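The two categories of difference can be separated computationally, as in the sketch below. It is an illustration rather than the method used here: the threshold of four consecutive thymidines (`min_len`) and both function names are assumptions, and the toy sequences are invented.

```python
# Illustrative sketch only: locate polythymidine tracts and test whether two
# clone sequences differ solely in the length of such tracts.
import re


def poly_t_tracts(seq, min_len=4):
    """Return (start, length) for every run of at least min_len T residues."""
    pattern = re.compile(r"T{%d,}" % min_len)
    return [(m.start(), len(m.group())) for m in pattern.finditer(seq.upper())]


def differ_only_in_tract_length(seq_a, seq_b, min_len=4):
    """True if the sequences differ, but only in poly-T tract lengths."""
    pattern = re.compile(r"T{%d,}" % min_len)

    def collapse(seq):
        # Shrink every qualifying tract to exactly min_len residues.
        return pattern.sub("T" * min_len, seq.upper())

    return seq_a.upper() != seq_b.upper() and collapse(seq_a) == collapse(seq_b)


clone_a = "ACGTTTTTTGCA"  # hypothetical clone with a 6-T tract
clone_b = "ACGTTTTTGCA"   # same sequence with a 5-T tract
clone_c = "ACGTTTTTTGCC"  # substitution at the last base instead
print(poly_t_tracts(clone_a))
print(differ_only_in_tract_length(clone_a, clone_b))
print(differ_only_in_tract_length(clone_a, clone_c))
```

One limitation of this collapsing approach is that a tract falling below `min_len` in one clone is no longer recognized as a tract, so the threshold should sit below the shortest tract length of interest.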

The ND4 gene was then sequenced from multiple clones after amplification of the coding regions of fragments 1 and 2, using DNA extracted from a single cyst. This approach should detect any intraindividual variation (Fig. 4; boxed sequences). Of these 13 clones, only two pairs have identical sequences: Gro-AR13 and Gro-AR21, and Gro-AR19 and Gro-TG2. No substitutional variation was evident between ND4 regions generated from the single cyst (see Supplementary Fig. 1). In contrast, variation in the number of thymidine residues in a polythymidine tract was again observed (Fig. 4). These results suggest that Gro-scmtDNA I is heteroplasmic within an individual, with variation restricted to the number of thymidine residues in a polythymidine tract. Taken together with the results from the sequencing of ND4 from multiple individuals, substitutional variation appears to occur only between individuals, but occasional substitutional variations were observed within an individual, in the remaining protein-coding genes (see below).

Variation in the COIII, ND3, and Cytb genes between different copies of Gro-scmtDNA I

COIII, ND3, and Cytb are only present on fragment 2. These genes were sequenced from five clones after amplification from DNA extracted from multiple individuals (Supplementary Fig. 2; unboxed sequences) and from six clones after amplification from DNA extracted from a single cyst (Supplementary Fig. 2; boxed sequences). The trends in variation were very similar to those seen for the ND4 gene. No clones were identical. Substitutional variation was most often seen among clones of amplicons generated from DNA extracted from multiple individuals. However, there were two instances of substitutional variation among clones generated from single cysts. These are at positions 1774 and 1934 (both within the COIII gene; see Supplementary Fig. 2); both result in conservative amino acid changes (Ala to Thr, and Gly to Val, respectively). Nevertheless, substitutional variation occurs more frequently among clones generated from mixed individuals. We interpret this as an indication that substitutional variation between individuals is more common than that within individuals, but that some heteroplasmy within coding regions does occur. Again, variation in the length of polythymidine tracts was more frequently observed (Supplementary Fig. 2).

In Gro-AR10 (the clone initially characterized and completely sequenced), COIII was found to be full-length, uninterrupted by stop codons and without point indels that would disrupt the reading frame. Of the 11 clones generated that contained COIII, 7 had full-length genes, with 4 having point indels that disrupted the reading frame (Supplementary Fig. 2). However, in each of these four cases, the point indel lay within a polythymidine tract (or a polyuridine tract, from a mRNA perspective), such that a posttranscriptional U-insertion would correct the reading frame. Although insertional editing is generally rare in animals (Horton and Landweber 2002), this specific type of posttranscriptional editing has been described in the mt genome of another nematode, Teratocephalus lirellus (Vanfleteren and Vierstraete 1999). In this type of editing, a single U is inserted in the mRNA of a gene encoded by the mt genome, at a site in which the genomic sequence is represented by a polythymidine tract.
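The logic of asking whether a single U-insertion within a polythymidine tract would restore the reading frame can be sketched as follows. This is a hypothetical illustration, not the analysis pipeline of this study: it assumes NCBI translation table 5, a tract threshold of four thymidines, and an invented 32-nt sequence, and the names `is_intact_orf` and `rescuing_t_insertions` are introduced only for this example.

```python
# Illustrative sketch only: find poly-T tracts in a genomic sequence where
# inserting one T (one U in the mRNA) yields an uninterrupted reading frame.
import re

BASES = "TCAG"
# NCBI translation table 5 (invertebrate mitochondrial), codon order TTT..GGG.
AMINO_ACIDS = "FFLLSSSSYY**CCWWLLLLPPPPHHQQRRRRIIMMTTTTNNKKSSSSVVVVAAAADDEEGGGG"
CODON_TABLE = {
    b1 + b2 + b3: AMINO_ACIDS[16 * i + 4 * j + k]
    for i, b1 in enumerate(BASES)
    for j, b2 in enumerate(BASES)
    for k, b3 in enumerate(BASES)
}


def translate(seq):
    return "".join(
        CODON_TABLE.get(seq[i:i + 3], "X") for i in range(0, len(seq) - 2, 3)
    )


def is_intact_orf(seq):
    """Whole codons only, a terminal stop, and no stop before it."""
    protein = translate(seq)
    return len(seq) % 3 == 0 and protein.endswith("*") and "*" not in protein[:-1]


def rescuing_t_insertions(genomic, min_tract=4):
    """Start positions (0-based) of tracts where one extra T rescues the ORF."""
    genomic = genomic.upper()
    hits = []
    for m in re.finditer(r"T{%d,}" % min_tract, genomic):
        edited = genomic[:m.start()] + "T" + genomic[m.start():]
        if is_intact_orf(edited):
            hits.append(m.start())
    return hits


# Hypothetical sequence: the first tract is one T short, which shifts the
# frame onto a premature TAA; the second tract is already the right length.
genomic = "ATGGCATTTTTTGGATAAGCCTTTTAGGCTAA"
print(is_intact_orf(genomic))
print(rescuing_t_insertions(genomic))
```

In this toy case only an insertion in the first tract restores an open frame, because an insertion in the downstream tract leaves the premature stop in place. A site identified in this way shows only that editing could correct the frame, not that it does.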

Of the 11 clones that contained the ND3 gene, 8 had full-length genes, with 1 clone requiring one U-insertion to correct the reading frame, 1 clone requiring two U-insertions (at separate sites), and 1 clone requiring one U-deletion to correct the reading frame. In each case, the insertion/deletion required lay within a polyuridine tract. Although posttranscriptional U-deletions were not observed in T. lirellus, the amount of sequence data screened for editing was much lower in that study (1160 bp).

Finally, examination of the 11 Cytb sequences revealed 2 that could code for a full-length Cytb (Gro-TG2 and Gro-AG17; these were both two amino acids shorter than the Cytb gene of G. pallida), although another (Gro-TG5) is only nine amino acids shorter. One clone contained a single large deletion (65 bp; Gro-AG21). There is substitutional variation between clones generated from multiple individuals but not from those generated from the single cyst. Again, variation in the number of thymidine residues in a polythymidine tract was observed in both cysts and multiple individuals (Supplementary Fig. 2).

Comparison of genomic sequences with the EST database is consistent with the operation of mRNA editing

One advantage of examining mt genomes in G. rostochiensis is that there is an extensive EST database for it (nearly 6000 EST sequences; nematode.net). We can therefore look for evidence of mRNA editing in G. rostochiensis, by comparing our genomic sequences with the EST database. The EST database contained seven sequences putatively identified as ND4 genes. Alignment of these ESTs with our genomic sequence data revealed that five of these were highly similar (98–100% identity) to our genomic sequences. An alignment of these five ESTs with our ND4 genomic sequences appears in Supplementary Fig. 1. Although the comparisons are restricted due to the short length of the EST data collected (range: 400–600 bp), two of the ESTs were identical to some of the genomic ND4 sequences, at least over the regions for which comparisons were possible. Specifically, GenBank accession BM355935 is identical to four of the cyst clones (Gro-TG1, Gro-TG2, Gro-TG5, Gro-TG3.7) and two of the mixed individual clones (Gro-AG3, Gro-AR19), while GenBank accession AW506104 is identical to three mixed individual clones (Gro-AR1, Gro-AR11, Gro-AR22). Two of the ESTs are uninterrupted by stop codons (BM355935 and AW506104) and thus could represent mature mRNAs. Pairwise alignment of these two indicates complete identity at all positions for which data are available, except at a polyguanine tract that represents the first 11 nucleotides sequenced for one of the ESTs, and so may not be reliable.

Nevertheless, the remaining three ESTs are interrupted by stop codons. Alignment of these five ESTs indicates that variation is predominantly restricted to the length of polythymidine tracts; a portion of the alignment showing two such positions is shown (Fig. 5). The data thus are consistent with the notion that insertion and/or deletion mRNA editing occurs for mt genes in G. rostochiensis. Nevertheless, editing is clearly not efficient, as some of the ESTs retain deleterious features, such as premature stop codons.

Fig. 5

Subsection of an alignment of three ESTs for the G. rostochiensis ND4 gene. Arrows indicate the positions of variation in the length of polythymidine tracts. There is one additional position that shows variation in the entire alignment, when the five ND4 ESTs are aligned

Just two sequences with similarity to COIII were identified in the EST database: BM344353 and BM343222. An alignment of these ESTs with our genomic COIII sequences appears in Supplementary Fig. 3. Both ESTs are identical with the common COIII genomic sequence (although differences at the very ends of the ESTs were ignored, as these are least reliable). Further, the ESTs were uninterrupted by stop codons. This result is consistent with our observation that most genomic COIII sequences did not require editing.

There are no ESTs that have been identified as having similarity to the ND3 gene, and only one with similarity to Cytb. The Cytb EST is identical to the common genomic sequence for Cytb (Supplementary Fig. 4), except for residues within six bases of both ends of the EST. The Cytb EST is uninterrupted by stop codons but only covers about one-third of the Cytb gene. This is consistent with the EST representing the mature mRNA sequence.

Evidence of editing of mt transcripts in G. pallida

We then examined whether there was similar evidence of editing of mt transcripts in G. pallida. For these experiments, instead of comparing genomic DNA (gDNA) with ESTs, mRNA was extracted, converted to cDNA, and multiple clones sequenced. In this way, editing could be examined by directly comparing mRNA and gDNA from the same population of nematodes. We sequenced mRNA and gDNA fragments of the COII, ND4, and COIII genes from three populations of G. pallida: Gourdie, Luffness, and P4A. Each of these genes is present on two subgenomes (scmtDNAs I and II); single-nucleotide differences between these two genomic copies make it possible to deduce the origin of each gDNA and mRNA molecule sequenced.
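Deducing the subgenome of origin from diagnostic single-nucleotide differences can be sketched as below. The alignment columns and bases in `DIAGNOSTIC_SITES` are entirely hypothetical placeholders, since the diagnostic bases are not listed in the text; the function name `assign_origin` is likewise an assumption.

```python
# Illustrative sketch only: assign an aligned gDNA or cDNA clone sequence to
# scmtDNA I or II from the bases it carries at diagnostic alignment columns.
from collections import Counter

# HYPOTHETICAL diagnostic sites: 0-based alignment column -> base per copy.
DIAGNOSTIC_SITES = {
    5: {"scmtDNA I": "A", "scmtDNA II": "G"},
    11: {"scmtDNA I": "C", "scmtDNA II": "T"},
}


def assign_origin(aligned_read, sites=DIAGNOSTIC_SITES):
    votes = Counter()
    for column, alleles in sites.items():
        if column >= len(aligned_read):
            continue  # the read does not cover this diagnostic column
        base = aligned_read[column].upper()
        for origin, allele in alleles.items():
            if base == allele:
                votes[origin] += 1
    if not votes:
        return "unassigned"
    ranked = votes.most_common()
    if len(ranked) > 1 and ranked[0][1] == ranked[1][1]:
        return "ambiguous"  # diagnostic sites disagree with each other
    return ranked[0][0]


print(assign_origin("ACGTTAGGCTACGG"))
print(assign_origin("ACGTTGGGCTATGG"))
print(assign_origin("ACGTTAGGCTATGG"))
```

Reporting conflicting sites as ambiguous, rather than forcing an assignment, keeps possible recombinant or chimeric clones visible instead of silently counting them toward one subgenome.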

Supplementary Fig. 5 shows an alignment of the ND4 cDNA and gDNA sequences. This alignment demonstrates precisely the same type of variation seen with the G. rostochiensis comparisons: variation in the length of polythymidine tracts. For example, the polythymidine tract starting at position 555 is between 15 and 20 nucleotides long, with no consistent differences between populations or between gDNA and cDNA. Another length-variable polythymidine tract is at position 589. Although P4A did not vary at either site, it varied at other polythymidine tracts (see Supplementary Fig. 5). Variation in the length of polythymidine tracts was also evident in both the gDNA and the cDNA sequences of COII and COIII (see Supplementary Figs. 6 and 7).

Evidence that expression of mt genes reflects the subgenomes that are present

In G. pallida, COII, ND4, and COIII are each present on two subgenomes, scmtDNA I and II. Despite high nucleotide similarity between the two copies of each gene, only the copies on scmtDNA I appear functional. Conceptual translation of these three genes on scmtDNA I produces proteins highly similar to those of other nematode mt genes, while conceptual translation of the scmtDNA II genes produces highly divergent proteins, because premature stop codons and/or frameshifts disrupt the reading frame (Gibson et al. 2007b). We proposed that preferential expression of the genes on scmtDNA I might alleviate the deleterious effects that expression from scmtDNA II would have. Two populations of G. pallida nematodes have distinctly different levels of scmtDNAs I and II. Southern blot experiments with probes that can differentiate between scmtDNAs I and II indicated that population Gourdie has predominantly scmtDNA II, while population Luffness has predominantly scmtDNA I (Armstrong et al. 2000). We amplified and sequenced fragments of ND4, COII, and COIII from both gDNA and cDNA from both populations. Primer annealing regions were conserved between scmtDNAs I and II, such that genes on both subgenomes would be amplified, if present. In this way, we could examine whether expression was (a) consistent with the subgenomes that were present or (b) preferentially from scmtDNA I.

As expected, amplification of gDNA from population Gourdie indicated that ND4, COII, and COIII were primarily of the scmtDNA II type (see positions 672 and 716; Supplementary Fig. 5), reflecting the predominance of scmtDNA II in population Gourdie. However, amplified cDNA from population Gourdie was also of the scmtDNA II type. This indicates that preferential expression from the low levels of scmtDNA I in population Gourdie does not provide a means of escaping the potentially deleterious effects of the genes encoded on scmtDNA II.

Similarly, amplified gDNA from population Luffness was primarily derived from scmtDNA I, reflecting the predominance of this subgenome in this population (Armstrong et al. 2000). Amplified cDNA was also predominantly from scmtDNA I, although two cDNA clones (6 and 8) were surprisingly derived from scmtDNA II. Clearly, there is no mechanism to prevent the transcription of scmtDNA II in G. pallida. It appears that the transcription of mt genes in G. pallida is a function of the levels of each of the subgenomes present.

Consistent with this, amplified gDNA and cDNA from population P4A were both derived from scmtDNAs I and II. Southern hybridization experiments with P4A genomic extracts indicate that this population has both scmtDNAs present (Blok and Phillips, in press).
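The logic of assigning each sequenced clone to a subgenome, and then comparing the gDNA and cDNA tallies within a population, can be illustrated as follows. This is only a sketch under stated assumptions: positions 672 and 716 are the ND4 alignment positions referred to above, but the bases attached to them, the clone data, and the function names are hypothetical placeholders.

```python
from collections import Counter

# Hypothetical diagnostic bases. The positions come from the ND4 alignment
# discussed in the text; the bases are placeholders, not the real alleles.
DIAGNOSTIC = {
    672: {"I": "A", "II": "G"},
    716: {"I": "C", "II": "T"},
}

def assign_subgenome(clone):
    """clone maps alignment position -> base; returns 'I', 'II' or 'ambiguous'."""
    calls = set()
    for pos, alleles in DIAGNOSTIC.items():
        base = clone.get(pos, "N").upper()
        for name, allele in alleles.items():
            if base == allele:
                calls.add(name)
    # A clone is assigned only if every informative site agrees.
    return calls.pop() if len(calls) == 1 else "ambiguous"

def tally(clones):
    """Count how many clones derive from each subgenome."""
    return Counter(assign_subgenome(c) for c in clones)

# Toy clones (hypothetical): three scmtDNA II-type and one scmtDNA I-type gDNA clone.
gdna = [{672: "G", 716: "T"}] * 3 + [{672: "A", 716: "C"}]
cdna = [{672: "G", 716: "T"}, {672: "G", 716: "C"}]
print(tally(gdna))
print(tally(cdna))
```

If expression simply reflects the subgenomes present, the cDNA tally should resemble the gDNA tally; preferential expression from scmtDNA I would instead skew the cDNA tally toward type I.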

Discussion

We amplified an entire mt subgenome from the nematode G. rostochiensis, in two overlapping pieces. Fragment 1 amplified only weakly, and it did not contain one of the genes that one of the primers (129F-TG-SC3-1F) was designed to anneal to: Cytb is absent from fragment 1 (Fig. 1). Although the targeted annealing region for this primer was present twice in the subgenome (in the two copies of dispersed repeat 1; see Fig. 1), a comparison of the primer sequence of 129F-TG-SC3-1F with the targeted regions indicated a poor match: only 68% and 70% similarity. Examination of the sequence overlap of fragments 1 and 2 indicated that the region that the primer did anneal to was not identical to the primer sequence. In contrast, the remainder of the subgenome (fragment 2) was readily amplified once primers were designed directly from fragment 1 sequence data. This suggests that the weak amplification of fragment 1 was due not to Gro-scmtDNA I being present at very low levels in the nematode, but to a poor match between the genome and one of the primers used.
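A primer-to-target similarity figure of the kind quoted above can be obtained from a simple ungapped comparison. The sketch below assumes that similarity means percent identity over equal-length, already aligned sequences; the function name and the toy sequences are hypothetical and are not the 129F-TG-SC3-1F primer or its target.

```python
def percent_identity(primer, target):
    """Ungapped percent identity between two equal-length sequences."""
    if len(primer) != len(target):
        raise ValueError("sequences must be the same length")
    matches = sum(a == b for a, b in zip(primer.upper(), target.upper()))
    return 100.0 * matches / len(primer)

# Toy example (hypothetical sequences): 8 of 10 positions match.
print(percent_identity("ACGTACGTAC", "ACGTTCGTAA"))
```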

The multipartite structure of the G. rostochiensis mt genome

We report here the sequence and organization of just a single subgenomic circle from G. rostochiensis. The size, gene content, and organization of Gro-scmtDNA I are similar to those of some of the subgenomic circles characterized in G. pallida (Armstrong et al. 2000; Gibson et al. 2007b). In addition, we have amplified and entirely sequenced six other subgenomes from G. rostochiensis (Gibson et al. 2007a), some of which also have a gene content and organization similar to those reported for G. pallida (Armstrong et al. 2000; Gibson et al. 2007b). For example, subgenome IV from G. rostochiensis contains the same 17 tRNA genes (in the same relative positions) that are present on subgenome IV of G. pallida. Subgenome V from G. rostochiensis contains a COI and a tRNA Lys gene, again in the same relative positions as those on subgenome V from G. pallida. Nevertheless, some genes normally found on nematode mt genomes have not been identified on any of these seven subgenomic circles. These missing genes may reside on other, as yet uncharacterized, subgenomes. Taken together, these results suggest that the mt genome of G. rostochiensis is multipartite, with a structure highly similar to that reported in G. pallida.

The possibility that the amplified fragments are numts, rather than bona fide mitochondrial fragments, has been raised previously, and dismissed (Armstrong et al. 2000; Gibson et al. 2007b), for G. pallida. The same arguments can be applied to the data acquired for G. rostochiensis. Briefly, the presence of both functional genes and pseudogenes on the same multigenic numt is unlikely (except in a very recent integration), as all genes would be expected to degenerate at the same rate. The two PCR fragments reported here overlap (almost) precisely at both ends (each overlap is >500 bp long), as do the six additional subgenomic circles reported separately (Gibson et al. 2007a). Finally, given the similarity between the organization of the subgenomes reported for these two closely related species, we consider it unlikely that the amplified fragments are numts.

Evidence for posttranscriptional editing of Globodera mt genes

In our recent report on the complete sequence of a number of subgenomic circles in G. pallida (Gibson et al. 2007b), we reported that many of these subgenomes contained genes that were highly similar to the mt genes of other nematodes, but with premature stop codons or point indels that would render the gene transcript nonfunctional. We identified these genes as pseudogenes and discounted editing as a means of rescuing these genes. However, in the present study, we sequenced multiple clones from DNA extracted both from multiple individuals and from single cysts, and compared these sequences with ESTs generated from the whole-genome sequencing project for G. rostochiensis. This established a number of trends that were consistent with the operation of posttranscriptional editing. First, comparison of genomic sequences generated from single cysts indicated a predominant type of variation in all genes: variation in the length of polythymidine tracts. Such variation was evident at multiple sites in each of the four protein-coding genes. Further, EST data indicated that at least some of these genes are expressed, as some EST sequences match precisely the genomic sequence. Finally, there are cases of multiple ESTs for the same mt gene (ND4), and a comparison of these EST sequences indicated again that the predominant source of variation is in the length of polythymidine tracts (Fig. 5).

One shortcoming of using EST sequences is that there may be genetic variation between the nematodes used to generate the ESTs and the nematodes used in the present study. Such genetic variation may spuriously suggest editing. Ideally, editing is detected when mRNA and gDNA sequences from the same individual are compared, so that population-level and interindividual genetic variation cannot confound the comparison. However, such direct comparisons are not possible in Globodera. Variation in the length of polythymidine tracts exists within the gDNA of individuals (see Fig. 4, boxed sequences). Thus, intraindividual genetic variation makes the detection of editing difficult. The different ESTs produced could be from the different gDNA sequences, requiring no editing for their production. Similarly, the different mRNAs sequenced from G. pallida could have been produced from the different gDNA sequences that G. pallida carries (Supplementary Figs. 5–7).

Given the recent report of precisely the same type of editing in another nematode, T. lirellus (Vanfleteren and Vierstraete 1999), we suggest that editing is responsible for the variation between the gDNA and the EST/mRNA sequences reported here. Moreover, as the genes on scmtDNA II in G. pallida are the predominant ones expressed in some populations (Gourdie; compared with scmtDNA I) but cannot code directly for functional products, we consider that editing must be mandatory to avoid the deleterious effects that would otherwise ensue. We therefore suggest that each of the genes present on Gro-scmtDNA I may be functional. For example, the Cytb gene appears divergent at the 3′ end compared with the Cytb gene from G. pallida (which does not require editing), but conceptual editing of the mRNA (two U-insertions) removes the many premature stop codons and produces a protein much more similar to the G. pallida Cytb protein (Fig. 6), particularly at the C-terminal end.

Fig. 6 Alignment of Cytb genes from G. pallida (Gpa), unedited G. rostochiensis (Gro), and the conceptually edited version in G. rostochiensis, after insertion of two uridines at the positions marked with arrows (Gro edited). The U-insertions would convert the highly divergent C-terminal end of the gene into one sharing many more identities with the G. pallida gene
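The effect of conceptual editing on the reading frame can be illustrated with a toy example. The sketch below is not the analysis behind Fig. 6: it assumes the invertebrate mitochondrial genetic code (NCBI translation table 5), and the sequence, the insertion position, and the function names are hypothetical. It shows how inserting a single T (U in the mRNA) into a polythymidine tract can remove an in-frame stop codon.

```python
# Invertebrate mitochondrial code (NCBI translation table 5), assumed here.
# Codons are ordered TCAG at each position; '*' marks a stop codon.
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CCWWLLLLPPPPHHQQRRRRIIMMTTTTNNKKSSSSVVVVAAAADDEEGGGG"
CODON_TABLE = {
    a + b + c: AMINO[16 * i + 4 * j + k]
    for i, a in enumerate(BASES)
    for j, b in enumerate(BASES)
    for k, c in enumerate(BASES)
}

def translate(seq):
    """Translate in frame 1; a trailing partial codon is ignored."""
    seq = seq.upper().replace("U", "T")
    return "".join(CODON_TABLE.get(seq[i:i + 3], "X")
                   for i in range(0, len(seq) - 2, 3))

def insert_t(seq, pos, n=1):
    """Conceptually edit a sequence by inserting n T residues at 0-based pos."""
    return seq[:pos] + "T" * n + seq[pos:]

# Toy sequence (hypothetical): an 8-nt poly(T) tract puts TAG in frame.
unedited = "ATGGC" + "T" * 8 + "AGCAGGA"
edited = insert_t(unedited, 5)  # lengthen the tract to 9 nt

print(translate(unedited))
print(translate(edited))
```

In this toy case the unedited sequence terminates at a premature stop codon, whereas the edited sequence reads through the tract in frame.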

Explanations that do not invoke editing

However, given the difficulty of directly demonstrating editing in this biological system (due to the intraindividual variation in genomic sequences), alternate explanations warrant consideration. One such explanation is that slippage of the U-tract during translation may avoid these apparent frameshift mutations. Such slippage, also called +1 frameshift, has been suggested to occur in the mitochondrial Cytb gene in a genus of ants (Beckenbach et al. 2005) and in the mitochondrial ND3 gene in some birds and turtles (Mindell et al. 1998). If occurring here, the slippage would have to involve both +1 and +2 frameshifts if all genomic sequences were to give rise to functional products, and would be the most extensive example of such slippage so far reported. Importantly, such slippage would explain why some EST and cDNA sequences retain frameshift mutations and premature stop codons—these frameshifts may not be deleterious if slippage is occurring. Alternatively, a minor proportion of the various copies of a mitochondrial gene may code for full-length mRNA that requires no editing; this may be sufficient to maintain mitochondrial function.

We discount experimental error as a source of this apparent editing in Globodera. The DNA polymerase used in the G. rostochiensis amplifications had proofreading activities (both 5′→3′ and 3′→5′). The DNA polymerase used in the G. pallida amplifications did not, but we see no evidence of polymerase errors in our data. For example, there are many single-nucleotide differences between the ND4, COII, and COIII genes previously sequenced from the scmtDNA I and II subgenomes (Armstrong et al. 2000; Gibson et al. 2007b); these single-nucleotide differences are consistently observed in the G. pallida fragments amplified here (see Supplementary Figs. 5–7). Further, we observed consistent differences in the length of polythymidine tracts unrelated to experimental conditions. For example, at position 555 of the ND4 fragment in G. pallida (Supplementary Fig. 5), variation in the length of the polythymidine tract is seen in populations Gourdie and Luffness. However, no variation is seen in population P4A, despite the screening of eight genomic and eight cDNA clones for such variation. If the variation in the length of polythymidine tracts were produced by amplification errors, it should be present equally in all three populations.

For all data, both DNA strands were sequenced to confirm the base identity at each position. We reexamined all electropherograms where variation in the length of polythymidine tracts was evident, and saw no evidence of base-calling errors in the sequence of either strand. The sequencing signal at the position of substitutions and indels was clear and unambiguous; Supplementary Fig. 8 shows an example. However, the most convincing evidence comes from the observation of precisely the same type of variation among ESTs (Fig. 5), sequences generated independently of our laboratories. Taking this evidence together, we conclude that the variation in the length of polythymidine tracts is biological rather than artifactual. This suggests that the pseudogenes that we previously identified in G. pallida (Gibson et al. 2007b) may also be rescued by posttranscriptional editing.

Insertion and/or deletion editing of mRNA is extremely rare. Thus far, U insertion and U deletion have only been described in the mitochondria of kinetoplastids (Estevez and Simpson 1999) and the nematode Teratocephalus lirellus (Vanfleteren and Vierstraete 1999), while C insertion and U insertion have been described in myxomycete mitochondria (Mahendran et al. 1991). Finally, G insertions and A insertions have been shown in some viruses (Horton and Landweber 2002).

Orr et al. (1997) searched for evidence of such editing in the mt genomes of the nematode Caenorhabditis elegans and found no consistent differences between cDNA and genomic DNA sequences. The only report of insertional editing in any animal gene comes from Vanfleteren and Vierstraete (1999), as described above. These authors noted that insertional editing occurred at sites with a very consistent pattern: wherever six (and in one case, seven) thymidines occurred in the mtDNA sequence. Although the pattern we see in G. rostochiensis is similar, there are differences. If variation between clones in the length of polythymidine tracts is evidence of editing sites, we also see such sites in G. rostochiensis. However, the length of these polythymidine tracts is 7–10 nucleotides (Table 2), most commonly 8 or 9 nucleotides (there are no polythymidine tracts in the Cytb gene of T. lirellus >7 nt long). Moreover, some Cytb copies would require U deletion(s) to maintain the reading frame, while others may require the insertion of more than a single U (see Fig. 4 for an example of a polythymidine tract that varies by as many as 5 nt in length). Further, there are equally long polythymidine tracts (8 nt) that appear not to be edited, as there was no variation detected between copies (Table 2, lower panel).

Table 2 Variation in the length of poly(T) tracts between copies of the Cytb gene in G. rostochiensis

Thus, we suggest that insertion and possibly deletion editing occur sporadically throughout the nematodes. Given that the genomes of a number of other nematodes are currently being entirely sequenced, examination of the ESTs generated should provide a better indication of the prevalence of editing.