Introduction

The animal mitochondrial genome is a closed, maternally inherited circular molecule that typically contains 37 genes coding for 13 proteins, 2 ribosomal RNAs (rRNAs) and 22 transfer RNAs (tRNAs) [40, 57]. Mitochondrial DNA (mtDNA) sequence data are important for studies of taxonomy and phylogeny [5], genetic structure [7], biological identification [21] and conservation genetics [54], because mtDNA evolves more rapidly than the nuclear genome [4]. Recently, comparisons of complete mtDNA sequences and gene arrangements have been employed as powerful new tools for resolving phylogenetic relationships [2, 15, 20, 30]. As of May 2008, complete mtDNA sequences of 80 avian species were available in GenBank, but none of them belonged to a bustard (Gruiformes: Otididae). The Great Bustard (Otis tarda) is listed as a vulnerable species, with an estimated global population of 31,000–37,000 individuals [1]. Obtaining the complete mtDNA sequence of O. tarda will therefore help us learn more about this vulnerable species.

The classification of the order Gruiformes has been very unstable throughout the taxonomic history of birds. Traditionally, Gruiformes was divided into 12 families: Otididae (Bustards), Aramidae (Limpkin), Cariamidae (Seriemas), Eurypygidae (Sunbittern), Gruidae (Cranes), Heliornithidae (Sungrebes), Mesitornithidae (Mesites), Pedionomidae (Plains-wanderer), Psophiidae (Trumpeters), Rallidae (Rails), Rhynochetidae (Kagu), and Turnicidae (Buttonquails) [22, 56]. Sibley and Monroe [46], however, recognised only nine families on the basis of DNA hybridization. Dickinson [10] removed only Pedionomidae from the traditional Gruiformes and recognised 11 families.

Recently, Fain et al. [13] studied the phylogeny of “core Gruiformes” including Gruidae, Aramidae, Psophiidae, Heliornithidae, and Rallidae. In this study, by including five more Gruiformes families, we aimed to discuss the phylogenetic relationship and classification of Gruiformes based on sequences of three mtDNA genes (12S rRNA, 16S rRNA and tRNA-Val).

Materials and methods

Samples and DNA extraction

The O. tarda sample (Sample No. AV01065) was obtained from the Animal Conservation Biology Laboratory, College of Life Sciences of Anhui Normal University. The mtDNA was extracted from muscle tissue stored at −80°C using the GENMED mtDNA Extraction Kit.

Primer design, PCR amplification and sequencing

The complete O. tarda mtDNA was amplified by the polymerase chain reaction (PCR). Since no complete mtDNA sequence was available for any member of the family Otididae, the primers were designed in five steps. First, universal primers from the published literature (Table 1) were tried for PCR, and the fragments obtained were sequenced. Second, based on the sequences obtained in the first step, further primers were designed using Primer 5.00 (PREMIER Biosoft International). Third, based on an alignment of the complete mtDNA sequences of five avian species (Gallus gallus, NC_001323; Arenaria interpres, NC_003712; Larus dominicanus, NC_007006; Falco sparverius, NC_008547; Ninox novaeseelandiae, AY309457), which are phylogenetically distant from Otididae, primers were designed in the conserved regions using Oligo 6.0 [41]. Fourth, the primers designed in the second and third steps were paired for PCR. Fifth, after these four steps most of the mtDNA sequence had been obtained, but some gaps remained, so additional primers were designed from the obtained sequences using Primer 5.00 to fill the gaps by PCR. The PCR products were expected to be shorter than 1,400 bp, and each segment overlapped the next by 60–250 bp. Altogether, 21 pairs of primers (Table 1) were employed for amplifying and sequencing the complete mtDNA of O. tarda.
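The size and overlap constraints on the PCR products can be illustrated with a minimal Python sketch. The function names and the toy coordinates in the usage note are hypothetical and are not the primer positions of Table 1; only the genome length (16,849 bp), the 1,400 bp size limit and the 60–250 bp overlap range are taken from the text.

```python
import math

GENOME_LENGTH = 16_849               # total length of the O. tarda mtDNA (bp)
MAX_PRODUCT = 1_400                  # PCR products were expected to be shorter than this (bp)
MIN_OVERLAP, MAX_OVERLAP = 60, 250   # overlap between adjacent segments (bp)


def overlaps(starts, lengths, genome_length):
    """Overlap (bp) between each amplicon and the next one around a circular genome.

    starts: 1-based start positions in ascending order; lengths: amplicon sizes.
    The last amplicon is compared with the first one, shifted by one genome length.
    """
    n = len(starts)
    result = []
    for i in range(n):
        end = starts[i] + lengths[i] - 1      # last base covered by amplicon i
        next_start = starts[(i + 1) % n]
        if i == n - 1:
            next_start += genome_length       # wrap around the circle
        result.append(end - next_start + 1)
    return result


def tiling_ok(starts, lengths, genome_length):
    """True if every product is short enough and every overlap is within range."""
    sizes_ok = all(length < MAX_PRODUCT for length in lengths)
    overlaps_ok = all(MIN_OVERLAP <= o <= MAX_OVERLAP
                      for o in overlaps(starts, lengths, genome_length))
    return sizes_ok and overlaps_ok


def min_amplicons(genome_length=GENOME_LENGTH):
    """Lower bound on the number of amplicons needed under the stated constraints."""
    new_bases = (MAX_PRODUCT - 1) - MIN_OVERLAP   # most new sequence one amplicon can add
    return math.ceil(genome_length / new_bases)
```

For example, three hypothetical 1,100 bp amplicons starting at positions 1, 1,001 and 2,001 on a 3,000 bp circle overlap by 100 bp each, so `tiling_ok([1, 1001, 2001], [1100, 1100, 1100], 3000)` is true. Under these constraints `min_amplicons()` gives a lower bound of 13 amplicons for the O. tarda genome, which is consistent with the 21 primer pairs actually used.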

Table 1 Primer sequence used in this study

PCR mixtures contained 100 ng template DNA, 3 μl of 10× reaction buffer, 2 μl of 25 mmol/l MgCl2, 2 μl of 2 mmol/l dNTPs, 1 μl of 10 μmol/l each primer, 1 unit of Taq DNA polymerase (Promega), and sterile double-distilled water to make up a final volume of 30 μl. PCR was performed in an MJ Model PTC-200 thermal cycler; the program consisted of an initial denaturation at 94°C for 5 min, 32 cycles of denaturation at 94°C for 50 s, annealing at 52–56°C for 1 min and extension at 72°C for 55 s, and a final extension at 72°C for 7 min. The PCR products were electrophoresed on 1% agarose gels and then purified with a PCR cleanup kit (V-gene) for sequencing on an automatic DNA sequencer (Applied Biosystems, 3730).
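The final reagent concentrations implied by this recipe follow from simple dilution arithmetic (stock concentration × added volume / final volume). The short Python sketch below works them out, together with the nominal duration of the cycling program; the function and variable names are illustrative, and ramping time between temperatures is ignored.

```python
TOTAL_VOLUME_UL = 30.0  # final reaction volume stated in the text


def final_concentration(stock_conc, added_volume_ul, total_volume_ul=TOTAL_VOLUME_UL):
    """Concentration after dilution, in the same unit as the stock concentration."""
    return stock_conc * added_volume_ul / total_volume_ul


def cycling_time_s(initial_s, cycles, steps_s, final_s):
    """Nominal program duration in seconds (temperature ramping not included)."""
    return initial_s + cycles * sum(steps_s) + final_s


mgcl2_mM = final_concentration(25.0, 2.0)   # about 1.67 mmol/l MgCl2
dntp_mM = final_concentration(2.0, 2.0)     # about 0.13 mmol/l dNTPs
primer_uM = final_concentration(10.0, 1.0)  # about 0.33 umol/l of each primer
buffer_x = final_concentration(10.0, 3.0)   # 1x reaction buffer

# 5 min initial denaturation, 32 cycles of 50 s + 60 s + 55 s, 7 min final extension
program_s = cycling_time_s(5 * 60, 32, (50, 60, 55), 7 * 60)
print(f"{program_s / 60:.0f} min")
```

The program time comes to 6,000 s, that is, 100 min of hold time per run.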

Sequence analysis

Nucleotide sequences were edited using the program DNAstar (DNAstar Inc.) and aligned with ClustalX [52]. Protein-coding genes were identified using Sequin 5.35 and refined manually. The tRNA genes were identified using the software tRNAscan-SE 1.21 (http://lowelab.ucsc.edu/tRNA-SE), and their cloverleaf secondary structures and anticodon sequences were identified using DNASIS 2.5 (Hitachi Software Engineering Inc.). The two rRNA genes and the tRNA-Cys and tRNA-Ser (AGY) genes were identified by comparison with the known complete mtDNA sequences of Porphyrio hochstetteri (Gruiformes: Rallidae) (EF532934) and Gallus gallus (NC_001323). The complete mitochondrial genome sequence of O. tarda has been deposited in GenBank under accession number FJ751803.

Phylogenetic analysis

We sampled 10 of the families that have at some time been included in the order Gruiformes. Eurypygidae and Cariamidae were not included because corresponding sequence data were lacking. Because Charadriiformes has generally been considered the order closest to Gruiformes [27, 47], and recent studies have indicated that Pedionomidae and Turnicidae should be placed in Charadriiformes [12, 33], representatives of five families widely accepted as members of Charadriiformes were also included. Gallus gallus (Aves: Galliformes) and Anser albifrons (Aves: Anseriformes) were designated as outgroups to root the trees. The sequences of O. tarda were obtained in this study, and the other sequences were obtained from GenBank (Table 2).

Table 2 Species in phylogenetic analysis

Gblocks 0.91b [6] was used to delete gaps within certain regions of the 12S and 16S rRNA genes to avoid the alignment difficulties introduced by indels. Phylogenetic analysis was performed using the maximum parsimony (MP) and maximum likelihood (ML) algorithms implemented in PAUP*4.0b10 [50]. For the MP analysis, a heuristic search with 1,000 random-addition-sequence replicates and tree bisection-reconnection (TBR) branch swapping was executed to obtain the MP tree. The robustness of the MP tree was assessed with 1,000 bootstrap replicates.

The program Modeltest 3.6 [36] was used to choose an appropriate substitution model for the ML analysis. The selected model (GTR + I + G) was subsequently used in PAUP*4.0b10 to search for the ML tree using a heuristic search with 1,000 random-addition-sequence replicates and TBR branch swapping. The reliability of the phylogenetic relationships was evaluated with 1,000 bootstrap replicates.
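The bootstrap procedure used to assess both the MP and ML trees rests on resampling alignment columns with replacement. The Python sketch below illustrates only this resampling step on a hypothetical toy alignment; it is not the PAUP* implementation, and the tree search that would be repeated on each pseudo-replicate is omitted.

```python
import random


def bootstrap_alignment(alignment, rng):
    """Return one bootstrap pseudo-replicate of an alignment.

    alignment: dict mapping taxon name to an aligned sequence (all of equal length).
    Columns are drawn with replacement, and the same columns are used for every taxon.
    """
    length = len(next(iter(alignment.values())))
    columns = [rng.randrange(length) for _ in range(length)]
    return {name: "".join(seq[i] for i in columns) for name, seq in alignment.items()}


# Hypothetical toy alignment, for illustration only
toy = {"taxon_A": "ACGTACGT", "taxon_B": "ACGTTCGT", "taxon_C": "ACCTTCGA"}
rng = random.Random(1)  # fixed seed so that the example is reproducible
replicates = [bootstrap_alignment(toy, rng) for _ in range(1000)]
```

Bootstrap support for a clade is then the percentage of replicate trees in which that clade is recovered.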

In addition, Bayesian inference of phylogeny was performed using the program MrBayes 3.0 [23]. The substitution model selected by Modeltest 3.6 was likewise used in the Bayesian analysis. The analysis started with randomly generated trees; four Markov chains under default heating values were run for 4 million generations and sampled every 100 generations. The “burn-in” was determined by checking when the likelihood values had reached stationarity.
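With 4 million generations sampled every 100 generations, the run yields about 40,000 sampled states. The Python sketch below shows this arithmetic and one possible way to locate the start of the stationary phase in a log-likelihood trace. The threshold rule is an assumption made for illustration (it compares each sample with the mean and standard deviation of the second half of the trace) and is not the procedure used in this study; the trace is synthetic.

```python
import statistics


def n_sampled_states(generations, sample_freq):
    """Number of sampling points, not counting the starting state."""
    return generations // sample_freq


def burn_in_index(log_likelihoods, k=2.0):
    """Index of the first sample within k standard deviations of the late-run mean.

    Illustrative heuristic only: the second half of the trace is assumed to be
    stationary, and the burn-in ends at the first sample that reaches its range.
    """
    tail = log_likelihoods[len(log_likelihoods) // 2:]
    threshold = statistics.mean(tail) - k * statistics.stdev(tail)
    for i, value in enumerate(log_likelihoods):
        if value >= threshold:
            return i
    return len(log_likelihoods)


# Synthetic trace: a steep climb followed by a plateau around -100
trace = [-500, -400, -300, -200, -100, -100, -101, -99, -100, -100, -101, -99]
print(n_sampled_states(4_000_000, 100), burn_in_index(trace))
```

In practice the trace would also be inspected visually, and samples before the chosen index would be discarded before posterior probabilities are summarised.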

Results and discussion

Characteristics of the O. tarda mitochondrial genome

The total length of the O. tarda mtDNA is 16,849 bp. The arrangement of the mitochondrial genome is the same as that of typical avian mtDNA [15] and is shown in Fig. 1. The genome contains 13 protein-coding genes (ATP6, ATP8, COI-III, ND1-6, ND4L, and Cyt b), 2 ribosomal RNA genes (12S rRNA and 16S rRNA), 22 transfer RNA genes and a putative control region (Table 3). The ND6 gene and eight tRNA genes are encoded by the L-strand, whereas the other genes are encoded by the H-strand. The overall base composition of the L-strand (A = 30.5%, T = 24.2%, C = 31.6%, G = 13.7%) is similar to that of other avian species. The A + T content of 54.7% is within the range (51.6–55.7%) reported for avian mitochondrial genomes [18].
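Base composition and A + T content are simple frequency counts over one strand. A minimal Python sketch, with hypothetical function names and a toy sequence, is given here; the last two lines check that the percentages reported above are internally consistent.

```python
from collections import Counter


def base_composition(seq):
    """Percentage of A, C, G and T in a sequence (other symbols are ignored)."""
    counts = Counter(base for base in seq.upper() if base in "ACGT")
    total = sum(counts.values())
    return {base: 100.0 * counts[base] / total for base in "ACGT"}


def at_content(composition):
    """A + T content in percent."""
    return composition["A"] + composition["T"]


toy = base_composition("AATTCCGG")  # toy sequence: 25% of each base

# Values reported for the O. tarda L-strand
reported = {"A": 30.5, "T": 24.2, "C": 31.6, "G": 13.7}
reported_at = round(at_content(reported), 1)       # 54.7
reported_total = round(sum(reported.values()), 1)  # 100.0
```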

Fig. 1

Mitochondrial genome of Otis tarda. Genes encoded by the heavy strand are shown outside the circle, whereas those encoded by the light strand are shown inside the circle. Gene abbreviations used are 12S, 12S rRNA; 16S, 16S rRNA; ND1-6, NADH dehydrogenase subunits 1–6; COI-III, cytochrome oxidase subunits I–III; AT6 and AT8, ATPase subunits 6 and 8; Cyt b, cytochrome b. tRNA genes are denoted by the one-letter codes of the amino acids they specify

Table 3 Organization of the mitochondrial genome of Otis tarda

Protein-coding genes

Among the 13 protein-coding genes, the longest is the ND5 gene (1,815 bp) and the shortest is the ATPase8 gene (168 bp). The most common start codon is ATG, which is found in eight genes. Nonstandard start codons are found in the COI and ND5 genes (GTG), the ND2 gene (ATA) and the ND3 gene (ATT). As in the mtDNA of other birds, TAA is the most frequent stop codon in O. tarda. TAG and AGG are each used twice, and AGA is found in the ND5 gene. In COIII and ND4, a terminal T probably serves as the stop signal after it is completed to UAA by posttranscriptional polyadenylation [31]. In the ND3 gene of O. tarda mtDNA, one base is not translated. The same untranslated base occurs in many birds and in one turtle species, and the frameshift mechanism responsible is unknown [29].
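The classification of stop signals described above can be sketched in Python as follows. The stop codon set is that of the vertebrate mitochondrial genetic code (TAA, TAG, AGA and AGG); the function names and the toy coding sequences are hypothetical, and the untranslated base in ND3 is not handled.

```python
# Stop codons of the vertebrate mitochondrial genetic code
STOP_CODONS = {"TAA", "TAG", "AGA", "AGG"}


def stop_signal(cds):
    """Return the stop signal at the 3' end of a coding sequence.

    A complete stop codon is returned as is. An incomplete stop (a terminal
    T or TA left over after the last full codon) is returned padded with '-'.
    None is returned if no stop signal is recognised.
    """
    cds = cds.upper()
    remainder = len(cds) % 3
    if remainder == 0:
        last = cds[-3:]
        return last if last in STOP_CODONS else None
    tail = cds[-remainder:]
    if tail in ("T", "TA"):
        return tail + "-" * (3 - remainder)
    return None


def completed_in_mrna(signal):
    """Stop codon in the mRNA once polyadenylation has filled the missing bases."""
    return signal.replace("-", "A").replace("T", "U")


# Hypothetical toy coding sequences, for illustration only
assert stop_signal("ATGAAATAA") == "TAA"
assert stop_signal("ATGAAAT") == "T--"
assert completed_in_mrna("T--") == "UAA"
```

A terminal T as in COIII and ND4 is therefore reported as `T--` and becomes UAA in the mature transcript.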

tRNA genes

The tRNA genes range in size from 66 to 74 bp. The sequences of the tRNA genes can be folded into the canonical cloverleaf secondary structure, except for tRNA-Cys and tRNA-Ser (AGY), which lack the “DHU” arm (not shown in this paper). The failure of tRNA-Ser (AGY) to fold into the canonical cloverleaf secondary structure is common in vertebrate mtDNA [45, 58, 59].

A survey of GenBank shows that, as in O. tarda, the tRNA-Cys of a few avian mtDNAs cannot form a canonical cloverleaf secondary structure. The same has been found in Gekko gecko (Reptilia: Gekkonidae) [17].

Spacers, overlaps, WANCY region and OL

As in other vertebrate mtDNA, spacers and overlaps are found in the mtDNA of O. tarda (Table 3). The overlaps and spacers total 33 bp and 107 bp, respectively. The ATPase8 and ATPase6 genes share the longest overlap, of 10 bp. The longest spacer, of 41 bp, is located between the tRNA-Thr gene and the ND6 gene. In this spacer, base C is highly abundant (58.5%), whereas base G is absent.
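Spacers and overlaps can be derived directly from the start and end coordinates of adjacent genes. The Python sketch below shows the calculation; the function names and the three gene records are hypothetical and were chosen only so that the toy example produces a 10 bp overlap and a 41 bp spacer. They are not the coordinates of the O. tarda genome.

```python
def junctions(genes):
    """Gap between each pair of adjacent genes.

    genes: list of (name, start, end) with 1-based inclusive coordinates,
    sorted by start position. A positive gap is an intergenic spacer and a
    negative gap is an overlap (in bp).
    """
    result = []
    for (name1, _, end1), (name2, start2, _) in zip(genes, genes[1:]):
        result.append((name1, name2, start2 - end1 - 1))
    return result


def totals(gaps):
    """Total spacer length and total overlap length (both in bp)."""
    spacers = sum(gap for _, _, gap in gaps if gap > 0)
    overlap = sum(-gap for _, _, gap in gaps if gap < 0)
    return spacers, overlap


# Hypothetical coordinates, for illustration only
toy_genes = [("geneA", 1, 168), ("geneB", 159, 842), ("geneC", 884, 1400)]
toy_gaps = junctions(toy_genes)
```

Here `toy_gaps` is `[("geneA", "geneB", -10), ("geneB", "geneC", 41)]`, so `totals(toy_gaps)` returns a total spacer length of 41 bp and a total overlap of 10 bp.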

In many vertebrates, the origin of L-strand replication (OL) is located within a cluster of five tRNA genes: tRNA-Trp, tRNA-Ala, tRNA-Asn, tRNA-Cys and tRNA-Tyr (the WANCY region). O. tarda lacks this OL region, as do many birds [44]. This may indicate that avian mitochondrial genomes diverged from their mammalian and amphibian counterparts during the course of vertebrate evolution [9]. Interestingly, recent studies have shown that the OL region is also absent from crocodilian mtDNA [24, 26, 58].

Control region

The control region (CR) of the O. tarda mtDNA is 1,265 bp long and is located between the tRNA-Glu and tRNA-Phe genes. The mitochondrial control region (mtCR) is responsible for transcription and replication of the mitochondrial genome [51]. The overall base composition of the O. tarda mtCR (L-strand) is A, 31.5%; T, 28.1%; G, 13.4%; C, 27.0%. These values show an asymmetry between A + T (59.6%) and G + C (40.4%) in this sequence. Three domains have been recognized within the CR: the 5′-peripheral domain, the central conserved domain and the 3′-peripheral domain [42, 49]. In the avian CR, the 5′-peripheral domain contains the extended termination-associated sequence (ETAS) and the termination-associated sequence (TAS); the central conserved domain contains the F, E, D and C boxes; and the 3′-peripheral domain contains the origin of H-strand replication (OH), conserved sequence block 1 (CSB1) and the H- and L-strand transcriptional promoter (HSP-LSP) sites [39].

After alignment with the corresponding consensus sequences described for mammals and birds [9, 37, 38, 49, 53], several conserved sequence boxes, namely F, E, D, C and CSB1, were identified in the O. tarda mtCR (highlighted in Fig. 2a). A putative ETAS block (ETAS1, highlighted in Fig. 2a) was located, which has 66.1% similarity to the mammalian consensus ETAS1 sequence [43]. A putative TAS (boxed in Fig. 2a), which was described by Foran et al. [14] and is conserved in many avian species [38], was found inside ETAS1. The HSP and LSP sites were located by alignment with the conserved sequence described by L’Abbé et al. [25]. The position of OH was located (Fig. 2a) on the assumption that a poly-C sequence upstream of CSB1 represents the origin of H-strand replication [53].

Fig. 2

The L-strand sequence and a schematic representation of the Otis tarda control region (CR). a The sequence of the CR. Underlined at the front of the sequence is the interrupted poly-C sequence. Also underlined is the simple sequence repeat (SSR), a tetranucleotide microsatellite (CAAA). Highlighted is the extended termination-associated sequence (ETAS)-1. The boxed sequence inside ETAS1 is the termination-associated sequence (TAS). Also highlighted are the F, E, D and C boxes and conserved sequence block (CSB)-1. The symbol represents origin of heavy strand replication (OH). The symbols represent the light (L) and heavy (H) strand promoter (LSP-HSP) sites. b A schematic representation of the CR, which shows the flanking genes and portions of the CR sequence represented in this figure

The 5′-peripheral domain contains an interrupted poly-C sequence (underlined in Fig. 2a). This structure is conserved across many avian species [37, 38]. Although it could potentially form a stable hairpin structure [37], its function has not been determined yet.

There is a simple sequence repeat (SSR) at the end of the 3′-peripheral domain (Fig. 2a). The SSR comprises 31 perfect tetranucleotide microsatellite repeats, (dC-dA-dA-dA)31·(dG-dT-dT-dT)31. The same tetranucleotide microsatellite repeat has also been found in Rhea americana (Aves: Rheidae) [19] and Pygoscelis adeliae (Aves: Spheniscidae) [39]. The large repeats that are usually found in the 3′-peripheral domain of many avian CRs are absent from the O. tarda mtCR.
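A perfect microsatellite such as the (CAAA)31 array can be located with a regular expression that matches uninterrupted runs of the motif. The Python sketch below uses a hypothetical toy sequence, not the O. tarda control region itself, and the function name is illustrative.

```python
import re


def longest_perfect_repeat(seq, motif):
    """Locate the longest uninterrupted tandem array of a motif.

    Returns (0-based start position, number of repeat units), or None if the
    motif does not occur in the sequence.
    """
    pattern = re.compile(f"(?:{re.escape(motif.upper())})+")
    best = max(pattern.finditer(seq.upper()),
               key=lambda match: len(match.group()),
               default=None)
    if best is None:
        return None
    return best.start(), len(best.group()) // len(motif)


# Hypothetical toy sequence: 31 CAAA units between short flanking sequences
toy_seq = "GGT" + "CAAA" * 31 + "CTT"
hit = longest_perfect_repeat(toy_seq, "CAAA")
```

In this toy example `hit` is `(3, 31)`, that is, an array of 31 units (124 bp) starting at the fourth base.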

Phylogeny

The results of the phylogenetic analysis are shown in Fig. 3. Because the MP and ML trees have the same topology, only the ML tree is shown (the numbers above the branches represent bootstrap support for MP/ML). Rhynochetidae has long been classified as a member of Gruiformes [10, 22, 46, 56]. In our study, however, Rhynochetidae falls outside the traditional Gruiformes with strong support in the MP, ML and Bayesian (BA) trees (94% and 98% bootstrap support and 100% posterior probability, respectively). Rhynochetidae was once thought to be allied with the family Ardeidae (Aves: Ciconiiformes) because it possesses powder down feathers [32]. Based on sequences of intron 7 of the nuclear-encoded β-fibrinogen gene (FGB-int7), Fain and Houde [11] found that Rhynochetidae did not group with the traditional Gruiformes, but instead with the proposed clade Metaves, which also includes the doves, hoatzin, flamingos, tropicbirds, sandgrouse, grebes and some other birds. Hackett et al. [16], however, found that Rhynochetidae grouped with families from Caprimulgiformes and Apodiformes based on a large data set representing 19 nuclear loci. Thus, more phylogenetic studies are needed to confirm the taxonomic status of Rhynochetidae.

Fig. 3

Phylogenetic relationships of Gruiformes based on mitochondrial 12S rRNA, 16S rRNA and tRNA-Val sequences. (Left) Trees obtained with maximum parsimony (CI = 0.3638, RI = 0.5126) and maximum likelihood (−LnL = 35142.11765) analyses. Numbers represent bootstrap values (MP/ML) and only those >70% are shown. Asterisks indicate bootstrap values of 100%. (Right) Tree obtained with Bayesian analysis (−LnL = 30320.417, Pinvar = 0.369941). Numbers represent posterior probabilities and only those >95% are shown. Asterisks indicate posterior probabilities of 100%

In the MP and ML trees, Otididae (Bustards) is embedded in the clade of traditional Gruiformes without strong support. In the BA tree, however, Gruidae, Aramidae, Psophiidae, Heliornithidae and Rallidae are grouped into a clade defined as “core Gruiformes” [13], and Otididae is a sister group to core Gruiformes and Charadriiformes with strong support (97% posterior probability). Moreover, the studies of Fain and Houde [11] and Hackett et al. [16] showed that Otididae was outside core Gruiformes. Recently, Fain and Houde [12] placed bustards in a separate order, “Otidiformes”. This placement is supported by the result of the Bayesian analysis in this study. Because only five genera of Otididae were sampled in this study, the trees show only part of the phylogeny of Otididae, which has been studied by Pitra et al. [35] and Broders et al. [3].

Pedionomidae and Turnicidae are embedded in the Charadriiformes clade in all trees. This result is also supported by recent phylogenetic studies [12, 16, 33] and indicates that Pedionomidae and Turnicidae belong to Charadriiformes.

In the MP and ML trees, Mesitornithidae is a sister group to the traditional Gruiformes clade with weak support, whereas in the BA tree it is a sister group to the Charadriiformes clade, also with weak support. Fain and Houde [11] found that Mesitornithidae grouped with the proposed clade Metaves. The study of Hackett et al. [16] confirmed that Mesitornithidae was a sister group of the doves. The lack of homologous sequence data in our analysis may be the reason that the position of Mesitornithidae is unstable in the phylogenetic trees.

In contrast to the overall taxonomic uncertainty surrounding many lineages considered gruiform, a consensus has begun to emerge that there is a monophyletic “core” consisting of Gruidae, Aramidae, Psophiidae, Heliornithidae and Rallidae [11, 13]. This is supported by the result of the Bayesian analysis in this study. The phylogenetic relationship of the core Gruiformes based on the Bayesian analysis is shown in Fig. 4. It has the same topology as that found by Livezey [27] and Fain et al. [13]. In the MP and ML trees of this study, when Otididae is removed, the phylogenetic relationship of the five families is the same as that in Fig. 4.

Fig. 4

Phylogenetic relationship of core Gruiformes