Introduction

Common carp (Cyprinus carpio L.) is a widespread freshwater fish that originated in Eurasia. With a history of culture and domestication spanning several thousand years, common carp have become one of the most important food fishes and a popular aquarium fish, comprising a large number of strains worldwide (Zhou et al. 2003b).

Because of long-term geographic isolation, selection, and breeding, common carp have developed distinct phenotypes in body shape, skin color, scale pattern, body size, growth rate, and stress tolerance that adapt them to various environments, providing abundant germplasm for further genetic selection. China has the greatest number of common carp populations and domesticated strains in the world. Renowned Chinese strains include Hebao red carp (HB), Xingguo red carp (XC), Oujiang color carp (OJ), and Songpu mirror carp (SP). Intensive hybrid breeding of common carp over the last half century in China has generated many hybrid strains. Jian carp (JC) is one of the most successful of these hybrid strains and has contributed significantly to the rapid growth of common carp. However, aquaculture of hybrid strains has also had significant adverse impacts. Escape, artificial propagation, and release of hybrids into natural waters have changed the genetic profiles of many wild populations, with serious consequences for management, conservation, and breeding. Phylogenetic analysis is widely used in germplasm protection and conservation to clarify the phylogenetic relationships among populations and strains and to reveal their obscured breeding histories; such studies benefit both genetic breeding and germplasm conservation efforts.

Many studies have examined the origin and phylogenetic relationships of common carp strains using various genetic tools, including microsatellites, random amplified polymorphic DNA, amplified fragment length polymorphism, restriction fragment length polymorphism, and mitochondrial DNA (mtDNA) sequence analyses (Bartfai et al. 2003; Cheng et al. 2010; Mabuchi et al. 2008; Zhou et al. 2003a). Some of these studies have provided insight into the phylogenetic relationships among specific strains and wild populations, and all of them indicate that common carp populations belong to the European and East Asian subspecies. Genetic differentiation into the two subspecies, Cyprinus carpio carpio in Europe and Cyprinus carpio haematopterus in East Asia, is clearly evident, and the existence of a third subspecies (C. carpio rubrofuscus) that may have diverged from C. carpio haematopterus in China has been suggested (Kohlmann and Kersten 2013; Zhou et al. 2004a). However, many phylogenetic relationships and breeding histories in common carp remain uncertain. For example, the origin and phylogeny of koi (KC), the most popular cultured ornamental carp in the world, have long been controversial. Balon (1995) believed that KC originated from pond carp in Japan, China, or Germany, whereas Froufe et al. (2002) suggested that KC originated from European common carp. A phylogenetic analysis of mtDNA sequences by Wang and Li (2004b) found that KC may be derived from OJ inhabiting Zhejiang province, China. The relationships among the three major domestic strains in South China, namely XC, HB, and OJ, are also controversial (Wang and Li 2004a). In addition, many hybrid common carp strains occur in China. Because hybrid breeding involves multiple rounds of hybridization and genetic introgression, these hybrid strains have ambiguous genetic backgrounds when their breeding histories are not well recorded.
These phylogenetic uncertainties require the development of comprehensive genetic tools and thorough sampling to resolve the obscure origins and relationships among common carp. A substantial investment has been made in common carp genetics and genomics, which has greatly expanded the available genetic resources and tools. These resources include a large number of genetic markers (Ji et al. 2012; Kongchum et al. 2010; Zhang et al. 2008), genetic and physical maps (Cheng et al. 2010; Xu et al. 2011c; Zhang et al. 2013), expressed sequence tags and RNA-seq (Moens et al. 2007; Williams et al. 2008; Xu et al. 2011a), microRNA (Yan et al. 2012), a bacterial artificial chromosome (BAC) library (Li et al. 2011), and BAC-end sequences (Xu et al. 2011b). The complete common carp genome has been recently published and released (Xu et al. 2014). With these molecular tools and resources, phylogenetic analyses will provide a better understanding of common carp phylogeny and comprehensively unveil obscure relationships among strains.

MtDNA is the most popular genetic tool for phylogenetic studies. However, because of cost and technical constraints, most studies have used only partial mtDNA sequences. The cytochrome c oxidase subunit II, cytochrome b (Cyt b), and displacement loop (D-loop) regions are the most commonly used sequences for phylogenetic analyses (Kohlmann and Kersten 2013; Zhou et al. 2004b). However, different genes and regions in the mitogenome evolve at different rates, which may lead to conflicting results in phylogenetic analyses. Many researchers have therefore suggested analyzing multiple genes or gene clusters rather than a single gene or region of the mitogenome, as this approach increases the likelihood of recovering the true phylogeny of a particular species (Cunningham 1997; Wiens 1998b). Using more mtDNA sequence yields higher resolution and accuracy of phylogenetic relationships (Wiens 1998a), which makes complete mitochondrial genome (CMG) sequences desirable for accurate phylogenetic studies. For example, teleost CMGs have been used extensively to reinvestigate the main patterns of higher teleostean phylogeny and have resolved persistent controversies over higher-level teleost relationships. CMGs from ten phyllostomid bat species were sequenced on the Illumina platform and used to investigate the phylogeny and evolutionary history of leaf-nosed bats (Botero-Castro et al. 2013), and ten new pulmonate CMGs were sequenced and used to evaluate pulmonate relationships through phylogenetic analyses (White et al. 2011).

Herein, we utilized the high-throughput whole-genome resequencing data of 26 common carp individuals from nine strains and extracted the mitochondrial genome sequences using a bioinformatics approach. The CMGs were collected, assembled, and subjected to a phylogenetic analysis of these individuals. Utilizing CMG sequences will provide additional resolution to decipher difficult phylogenetic relationships and benefit germplasm conservation and genetic breeding efforts for common carp.

Materials and Methods

Sample Collection

Twenty-six individuals from nine common carp strains were collected across Europe and China (Fig. 1), representing typical lineages in both regions. Danube River carp (DR, n = 3), Tisza River carp (TR, n = 3), and Thailand carp (TC, n = 2) were obtained from the live carp gene bank of the Research Institute for Fisheries and Aquaculture, Hungary. All other strains and wild populations were collected in China: SP (n = 4) from the Heilongjiang Fishery Research Institute of the Chinese Academy of Fishery Sciences (CAFS); HB (n = 4) from Wuyuan county, Jiangxi province; XC (n = 3) from Xingguo county, Jiangxi province; OJ (n = 3) from Longquan county, Zhejiang province; KC (n = 2) from the Beijing Fishery Research Institute; and JC (n = 2) from the CAFS Freshwater Fisheries Research Center.

Fig. 1 Sample collection sites and their geographic distribution

Genome Sequencing, Extraction of the Mitochondrial Sequences, and Assembly

Blood samples or fin clips were collected from each individual, and genomic DNA was extracted using the DNeasy Blood & Tissue Kit (Qiagen, Hilden, Germany). Paired-end libraries with an insert size of 180 bp were constructed following the standard Illumina protocol and sequenced on one flow cell (eight lanes) of the Illumina HiSeq 2000 platform. Low-quality and short reads were removed using SolexaQA (Cox et al. 2010) to obtain a set of usable reads. These data were originally generated to identify genome-wide single-nucleotide polymorphisms (SNPs) and to analyze genetic diversity (Xu et al. 2014); here, we used them to extract mitochondrial sequences. Paired-end reads from each individual were aligned to the mitochondrial reference genome (NC_018366) (Wang et al. 2013) using the Burrows–Wheeler Aligner (Li and Durbin 2009). All mapped reads were collected and assembled into the mitochondrial genome of each individual using Newbler (Roche Scientific, Mannheim, Germany). The assembled CMG sequences were validated by alignment against the reference using CLUSTALW2 (http://www.ebi.ac.uk/Tools/msa/clustalw2/).
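The exact commands used to collect the mapped reads are not given. A minimal Python sketch of that step follows, assuming the BWA alignments were written in SAM format; the function name is hypothetical. It keeps the names of reads whose FLAG field (the second SAM column) lacks the 0x4 "unmapped" bit, which is the same criterion as `samtools view -F 4`.

```python
def mapped_read_names(sam_lines):
    """Return the names of reads that aligned to the mitochondrial reference.

    Each SAM alignment line has tab-separated fields; field 2 (FLAG) is a
    bitwise integer in which bit 0x4 marks an unmapped read. Header lines
    start with '@' and are skipped.
    """
    names = set()
    for line in sam_lines:
        if not line.strip() or line.startswith("@"):
            continue
        fields = line.rstrip("\n").split("\t")
        qname, flag = fields[0], int(fields[1])
        if not flag & 0x4:  # bit 0x4 unset: the read is mapped
            names.add(qname)
    return names


# Toy SAM records (hypothetical reads, not the carp data).
sam = [
    "@SQ\tSN:NC_018366\tLN:16581",
    "r1\t99\tNC_018366\t100\t60\t100M\t=\t180\t180\tACGT\tIIII",
    "r2\t4\t*\t0\t0\t*\t*\t0\t0\tACGT\tIIII",
]
print(mapped_read_names(sam))
```

The retained read names (and their mates, if both ends are wanted for assembly) would then be written out as the input to the per-individual assembly.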

Sequence Analysis

The D-loop, Cyt b, and CMG sequences of the 26 individuals were aligned using MEGA 6 (Tamura et al. 2013) and analyzed for base composition, conserved sites, variable sites, parsimony-informative sites, and transition/transversion ratio. DnaSP v5 (Librado and Rozas 2009) was used to identify haplotypes and calculate DNA polymorphism statistics. Arlequin v3.5 (Excoffier and Lischer 2010) was used to calculate the connection lengths (distances) between haplotypes, and the haplotype network was laid out with NETWORK 4.6.1.2 (http://www.fluxus-engineering.com) to visualize the relationships among haplotypes directly.
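MEGA 6 computes these site statistics internally. To illustrate what the categories mean, here is a minimal sketch under the simplifying assumption of an ungapped alignment containing only A, C, G, and T; the function names are hypothetical. A site is parsimony-informative when at least two character states each occur in at least two sequences. The pairwise transition/transversion counter does not reproduce how MEGA summarizes these counts across all 26 sequences.

```python
from collections import Counter

PURINES = {"A", "G"}
PYRIMIDINES = {"C", "T"}


def site_statistics(seqs):
    """Count conserved, variable, and parsimony-informative sites.

    seqs: equal-length aligned sequences containing only A, C, G, T
    (gapped or ambiguous columns are assumed to be removed beforehand).
    """
    length = len(seqs[0])
    if any(len(s) != length for s in seqs):
        raise ValueError("sequences must be aligned to the same length")
    stats = {"conserved": 0, "variable": 0, "parsimony_informative": 0}
    for column in zip(*seqs):
        counts = Counter(column)
        if len(counts) == 1:
            stats["conserved"] += 1
            continue
        stats["variable"] += 1
        # Informative: at least two states, each shared by at least two sequences.
        if sum(1 for c in counts.values() if c >= 2) >= 2:
            stats["parsimony_informative"] += 1
    return stats


def transitions_transversions(seq1, seq2):
    """Count transitions (A<->G, C<->T) and transversions between two aligned sequences."""
    ts = tv = 0
    for a, b in zip(seq1, seq2):
        if a == b:
            continue
        if {a, b} <= PURINES or {a, b} <= PYRIMIDINES:
            ts += 1
        else:
            tv += 1
    return ts, tv


toy = ["ACGTA", "ACGTG", "ATGTG", "ATGCA"]  # toy alignment, not the carp data
print(site_statistics(toy))
print(transitions_transversions(toy[0], toy[3]))
```

In such a tally, conserved plus variable sites always equal the alignment length, which the counts in Table 3 satisfy (e.g., 896 + 31 = 927 for the D-loop).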

Divergence Time and Phylogenetic Analysis

Pairwise genetic distances were calculated in MEGA 6 from the CMG sequences alone using the maximum composite likelihood model, with both transitions and transversions counted as nucleotide substitutions. Divergence times were then calculated using Nei’s formula with the Cyprinidae evolutionary rate. Phylogenetic relationships were estimated separately from the D-loop (927 bp), Cyt b (1141 bp), and CMG (16,581 bp) sequences, each using both the maximum likelihood (ML) and neighbor-joining (NJ) methods. Bayesian information criterion values were calculated for each substitution model to select the best-fit model for the ML analysis. The Tamura–Nei model was used for the ML analyses and the maximum composite likelihood model for the NJ analyses. Each analysis was run with 1000 bootstrap replicates. PhyML was also used for the phylogenetic analysis as an independent ML approach for comparison.
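MEGA's NJ implementation is not reproduced here. As an illustration of how NJ builds a tree from a pairwise distance matrix, the following minimal sketch implements the standard neighbor-joining algorithm (Saitou and Nei's Q-criterion) and returns a Newick string. The function name is hypothetical, the toy distances are a textbook-style matrix rather than the carp distances, and tie-breaking may differ from MEGA's.

```python
def neighbor_joining(dist):
    """Build an unrooted NJ tree and return it as a Newick string.

    dist: dict mapping each taxon to a dict of distances to every other taxon
    (symmetric, no self-distances).
    """
    d = {a: dict(row) for a, row in dist.items()}  # working copy
    nodes = list(d)
    while len(nodes) > 2:
        n = len(nodes)
        r = {a: sum(d[a][b] for b in nodes if b != a) for a in nodes}
        # Pick the pair minimizing Q(a, b) = (n - 2) * d(a, b) - r(a) - r(b).
        best = None
        for i, a in enumerate(nodes):
            for b in nodes[i + 1:]:
                q = (n - 2) * d[a][b] - r[a] - r[b]
                if best is None or q < best[0]:
                    best = (q, a, b)
        _, a, b = best
        # Branch lengths from a and b to their new internal node.
        la = 0.5 * d[a][b] + (r[a] - r[b]) / (2 * (n - 2))
        lb = d[a][b] - la
        new = f"({a}:{la:.5f},{b}:{lb:.5f})"
        d[new] = {}
        for c in nodes:
            if c in (a, b):
                continue
            # Distance from the new node to each remaining taxon.
            dc = 0.5 * (d[a][c] + d[b][c] - d[a][b])
            d[new][c] = dc
            d[c][new] = dc
        nodes = [c for c in nodes if c not in (a, b)] + [new]
    a, b = nodes
    return f"({a}:{d[a][b]:.5f},{b}:0.00000);"


names = "abcde"
rows = [[0, 5, 9, 9, 8], [5, 0, 10, 10, 9], [9, 10, 0, 8, 7],
        [9, 10, 8, 0, 3], [8, 9, 7, 3, 0]]
toy_dist = {x: {y: rows[i][j] for j, y in enumerate(names) if y != x}
            for i, x in enumerate(names)}
print(neighbor_joining(toy_dist))
```

Because this toy matrix is additive, NJ recovers its tree exactly (for example, a and b join first with branch lengths 2 and 3). Bootstrap support would come from repeating this on resampled alignments.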

Results

Mitochondrial Genomes and Their Features

A total of 332.3 Gb of whole-genome sequence data were generated on the Illumina platform, comprising reads derived from both the mitochondrial and nuclear genomes of the 26 individuals. The raw reads were mapped to the common carp mitochondrial reference genome to separate mitochondrial reads from the nuclear background. A total of 2,065,531 mapped reads were collected, representing 206.5 Mb of mitochondrial sequence from the 26 individuals. Sequencing depth relative to the reference mitochondrial genome ranged from 193× to 1054×, with a mean of ~479× (Table 1). The mitochondrial genomes of all 26 individuals were completely assembled for further analysis.
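As a quick consistency check of these figures, the sketch below recomputes the mean depth. It assumes 100-bp reads (the read length is not stated, but 206.5 Mb / 2,065,531 reads is about 100 bp) and that mapped bases are spread over the 16,581-bp mitogenome of each of the 26 individuals.

```python
def mean_depth(mapped_bases, genome_length, n_individuals):
    """Average per-individual depth: mapped bases / (genome length x individuals)."""
    return mapped_bases / (genome_length * n_individuals)


MAPPED_READS = 2_065_531  # from the text
READ_LENGTH = 100         # assumption: implied by 206.5 Mb / 2,065,531 reads
GENOME_LENGTH = 16_581    # assembled CMG length (bp)
N_INDIVIDUALS = 26

mapped_bases = MAPPED_READS * READ_LENGTH
depth = mean_depth(mapped_bases, GENOME_LENGTH, N_INDIVIDUALS)
print(f"mapped bases: {mapped_bases / 1e6:.1f} Mb, mean depth: {depth:.0f}x")
```

This gives a mean depth of about 479×, matching the reported value; the printed 206.6 Mb differs from the reported 206.5 Mb only by rounding (206.5531 Mb).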

Table 1 Total paired-end reads, data size, and sequencing coverage of mitochondrial genomes in each individual

All assembled CMGs were 16,581 bp long, with a GC content of 43.2–43.4 % (Table 2), similar to that of the reference mitochondrial genome. Each CMG contained 13 protein-coding genes, 22 tRNA genes, two rRNA genes, and a D-loop region, and the organization and positions of these features conformed to the typical vertebrate mitochondrial genome model.

Table 2 Nucleotide composition of D-loop, Cyt b, and CMG

In all 26 individuals, the D-loop, Cyt b, and CMG sequences were 927, 1141, and 16,581 bp long, respectively. They contained 896, 1118, and 16,347 conserved sites; 31, 23, and 234 variable sites; 31, 22, and 227 parsimony-informative sites; 9, 7, and 69 transitions; and 1, 1, and 4 transversions, respectively (Table 3).

Table 3 Statistics of the D-loop, Cyt b, and CMG in 26 individuals

Haplotype Analysis

We identified 12 haplotypes among the 26 individuals (Table 4 and Fig. 2). The Asian and European individuals were divided into two clades containing nine and three haplotypes, respectively. Six strains (DR, JC, KC, XC, OJ, and TC) each contained a single haplotype, two strains (HB and TR) contained two haplotypes each, and one strain (SP) contained three. Every haplotype was restricted to a single strain except H7, which was shared by SP and XC. SP also carried haplotypes H6 and H8, which appear to be relatively new and were likely derived from H7. JC is a hybrid strain bred with HB as the maternal parent through multiple rounds of hybridization. However, haplotype H4 was found exclusively in JC, suggesting either that H4 is a newly evolved haplotype or that our samples did not capture all HB haplotypes. Investigating additional HB and JC samples would clarify this issue.
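The haplotype counts above come from DnaSP. Conceptually, a haplotype is a distinct sequence, so grouping identical sequences and recording which strains carry each one yields counts like those in Table 4. The sketch below assumes complete, ungapped aligned sequences (DnaSP's handling of gaps and missing data is not reproduced) and uses a hypothetical function name; its labels follow input order and will not match the H1–H12 numbering in this study.

```python
from collections import defaultdict


def assign_haplotypes(samples):
    """Group identical aligned sequences into haplotypes.

    samples: iterable of (sample_id, strain, sequence) tuples.
    Labels H1, H2, ... are assigned in input order.
    Returns (haplotype per sample, set of strains carrying each haplotype).
    """
    label_of_seq = {}
    hap_of_sample = {}
    strains_of_hap = defaultdict(set)
    for sample_id, strain, seq in samples:
        if seq not in label_of_seq:
            label_of_seq[seq] = f"H{len(label_of_seq) + 1}"
        hap = label_of_seq[seq]
        hap_of_sample[sample_id] = hap
        strains_of_hap[hap].add(strain)
    return hap_of_sample, dict(strains_of_hap)


# Toy input (not the carp data): two strains sharing one haplotype.
toy = [("s1", "SP", "ACGT"), ("s2", "XC", "ACGT"),
       ("s3", "SP", "ACGA"), ("s4", "DR", "TCGA")]
haps, strains = assign_haplotypes(toy)
shared = sorted(h for h, s in strains.items() if len(s) > 1)
print(haps, shared)
```

A haplotype carried by more than one strain, like H7 shared by SP and XC, shows up in `shared`.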

Table 4 Haplotypes in nine common carp strains and their frequencies

Fig. 2 Haplotype network of the nine common carp strains studied here. Each strain is represented by a different colored circle, and the area of each circle represents the number of individuals (n = 1–4) sequenced in that strain. The 12 identified haplotypes are labeled H1–H12

Phylogenetic Analysis

A phylogenetic analysis of the CMGs from the 26 individuals was performed using the ML method, and the resulting ML tree is shown in Fig. 3. We also performed an NJ analysis and constructed the NJ tree in MEGA6 (Fig. 4), and we constructed an ML tree in PhyML to evaluate the reliability of the ML analysis. The NJ tree supported the ML tree topology, with only minor differences in bootstrap support (BS) values, providing confidence in the inferred common carp phylogeny. The PhyML topology was consistent with that from MEGA6 (Supplementary Fig. S1).
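The BS values compared here come from nonparametric bootstrapping: alignment columns are resampled with replacement to build pseudo-replicate alignments, a tree is inferred from each, and the support for a branch is the percentage of replicate trees that recover it. The sketch below shows only the resampling step, with hypothetical function names; tree inference for each replicate and clade counting are omitted.

```python
import random


def bootstrap_replicate(seqs, rng):
    """Resample alignment columns with replacement (nonparametric bootstrap)."""
    length = len(seqs[0])
    cols = [rng.randrange(length) for _ in range(length)]
    return ["".join(s[i] for i in cols) for s in seqs]


def bootstrap_replicates(seqs, n=1000, seed=1):
    """Generate n pseudo-replicate alignments with a reproducible random seed."""
    rng = random.Random(seed)
    return [bootstrap_replicate(seqs, rng) for _ in range(n)]


reps = bootstrap_replicates(["ACGT", "ACGA"], n=5, seed=0)
for rep in reps:
    print(rep)
```

Each replicate keeps the original alignment length and row order, so every resampled column is one of the original columns.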

Fig. 3 Maximum likelihood-based phylogenetic tree of the full-length mitochondrial DNA sequences in 26 samples from nine strains of common carp. Bootstrap support values for each branch are included in the tree. Two distinct clades, A and B, are identified along with two subclades (A1 and A2) of clade A

Fig. 4 Neighbor-joining-based phylogenetic tree of the full-length mitochondrial DNA sequences in 26 samples from nine strains of common carp. Bootstrap support values for each branch are included in the tree. Two distinct clades, A and B, are identified along with two subclades (A1 and A2) of clade A

Two distinct clades corresponding to the two subspecies were found. Clade A comprised strains inhabiting East Asia, including SP, XC, JC, HB, KC, TC, and OJ, and was clearly divided into subclades A1 and A2: A1 consisted of SP, XC, JC, and HB, whereas A2 consisted of KC, TC, and OJ. Clade B comprised the European strains of the subspecies C. carpio carpio, DR and TR. All strains and populations in clade A belong to the East Asian subspecies C. carpio haematopterus except SP, which originated from a European mirror carp (C. carpio carpio) strain that was introduced into China in the 1970s and bred to adapt to the local aquaculture environment. Nevertheless, SP showed high similarity to XC in subclade A1. This exception is discussed further below.

We further constructed ML and NJ trees from partial mitochondrial sequences commonly used in phylogenetic analyses, namely Cyt b (Figs. 5 and 6) and the D-loop (Figs. 7 and 8). The major features of these topologies were consistent with those of the CMG trees, but some differences were detected. For example, the Cyt b-based ML and NJ trees barely discriminated HB from SP and XC, whereas in the CMG trees HB was clearly separated from SP and XC and clustered with JC. The D-loop-based ML and NJ trees also differed markedly from the CMG trees in the topology of OJ, KC, and TC: in the CMG trees, OJ and TC clustered first and then joined KC, whereas in the D-loop trees, OJ and KC clustered first and then joined TC.

Fig. 5 Maximum-likelihood-based phylogenetic tree of cytochrome b (Cyt b) sequences in 26 samples from nine strains of common carp. Bootstrap support values for each branch are included in the tree. Two distinct clades, A and B, are identified along with two subclades (A1 and A2) of clade A

Fig. 6 Neighbor-joining-based phylogenetic tree of cytochrome b (Cyt b) sequences in 26 samples from nine strains of common carp. Bootstrap support values for each branch are included in the tree. Two distinct clades, A and B, are identified along with two subclades (A1 and A2) of clade A

Fig. 7 Maximum-likelihood-based phylogenetic tree of the D-loop sequences in 26 samples from nine strains of common carp. Bootstrap support values for each branch are included in the tree. Two distinct clades, A and B, are identified along with two subclades (A1 and A2) of clade A

Fig. 8 Neighbor-joining-based phylogenetic tree of the D-loop sequences in 26 samples from nine strains of common carp. Bootstrap support values for each branch are included in the tree. Two distinct clades, A and B, are identified along with two subclades (A1 and A2) of clade A

Time of Divergence Estimate

We calculated Nei’s genetic distance (D) among the nine strains from the CMG sequences (Table 5). D values ranged from 0.00006 to 0.00699, with a mean of 0.0048. XC and SP had the smallest D value (0.00006), whereas OJ and DR had the largest (0.00699). The D values between SP and the strains of the East Asian subspecies C. carpio haematopterus were much smaller than those between SP and the strains of the European subspecies C. carpio carpio. These results conflict with the breeding history of SP, which was bred from mirror carp introduced from Europe.

Table 5 Nei’s standard genetic distance among nine common carp strains based on the CMG

Mitochondrial genomes usually evolve faster than nuclear genomes. A rate of 2 % per million years has generally been calculated for Cyt b genes in several bony fish species (Wu et al. 2013), and a rate of 3.3 % per million years has been calculated for Cyt b and 16S rRNA genes to estimate divergence times in cyprinids. Because different genes and regions in the mitochondrial genome may evolve at different rates, we used an evolutionary rate of 3.3 × 10⁻⁸ substitutions per site per year to estimate the divergence times of the common carp lineages (Tong and Wu 2001). The divergence time between two lineages was calculated using Nei’s formula, T = D/(2α), where T is the number of years since divergence, D is the genetic distance, and α is the evolutionary rate. The estimated divergence times ranged from 0.91 × 10³ to 1.06 × 10⁵ years ago, with a mean of 7.27 × 10⁴ years ago. OJ and DR showed the longest divergence time, suggesting that the European subspecies C. carpio carpio (clade B) and the Asian subspecies C. carpio haematopterus (clade A) diverged about 100,000 years ago. The average D value between subclades A1 and A2 was 0.0058, corresponding to a divergence time of about 8.79 × 10⁴ years ago for these two Asian subclades.
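The divergence-time arithmetic can be reproduced directly from Nei’s formula. The sketch below takes α = 3.3 × 10⁻⁸ substitutions per site per year and the D values reported above; the function name is hypothetical.

```python
ALPHA = 3.3e-8  # substitutions per site per year (cyprinid rate used in the text)


def divergence_time(d, alpha=ALPHA):
    """Nei's formula T = D / (2 * alpha): years since two lineages split."""
    return d / (2 * alpha)


reported = {
    "XC vs SP (smallest D)": 0.00006,
    "OJ vs DR (largest D)": 0.00699,
    "mean D": 0.0048,
    "subclade A1 vs A2": 0.0058,
}
for label, d in reported.items():
    print(f"{label}: D = {d}, T = {divergence_time(d):.3g} years")
```

Rounded to three significant figures, these reproduce the values in the text (0.91 × 10³, 1.06 × 10⁵, 7.27 × 10⁴, and 8.79 × 10⁴ years). The factor of 2 reflects that both lineages accumulate substitutions independently after they split.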

Discussion

As low-cost, high-throughput next-generation sequencing technologies have been widely employed for genome sequencing in the past several years, CMG sequences can easily be extracted from whole-genome sequencing data based on sequence similarity and assembled into the CMG of the target species. Alternatively, CMGs can be sequenced directly on a next-generation sequencing platform, using barcodes to distinguish samples. A number of phylogenetic and evolutionary studies have reported CMGs sequenced on next-generation platforms for various applications (Davila et al. 2011; Fujii et al. 2010; Rodriguez-Moreno et al. 2011). Here, we extracted CMG sequences from Illumina genome resequencing data of 26 common carp individuals from nine lineages and, for the first time in a cyprinid species, performed a phylogenetic analysis based on CMG sequences. This approach provides a rapid and effective alternative for collecting CMG sequences and conducting phylogenetic analyses in other aquaculture species. However, the throughput of the Illumina platform far exceeds the size of a mitochondrial genome, so a large number of barcoded libraries must be pooled to reduce the cost per mitochondrial genome and increase efficiency. This is not feasible when only a small number of mitochondrial genomes are sequenced; in such cases, Ion Torrent (Life Technologies, Carlsbad, CA, USA) is a better alternative. Ion Torrent generates 100 Mb to 1 Gb of sequence within several hours, depending on the chip, enabling rapid and economical sequencing of mitochondrial genomes for various applications.
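To make the throughput argument concrete, the rough estimate below (not part of the original analysis) computes how many barcoded mitogenomes one run could support. It assumes a hypothetical target depth of 100×, that every read is mitochondrial (true only for enriched or amplified mtDNA libraries), and that yield is split evenly across barcodes; the function name is hypothetical.

```python
import math

MITOGENOME_BP = 16_581  # CMG length reported in this study


def samples_per_run(run_yield_bp, target_depth, genome_bp=MITOGENOME_BP):
    """Upper bound on barcoded mitogenomes per run at a given mean depth.

    Assumes all reads are mitochondrial and yield is divided evenly among
    barcodes; real runs support fewer samples because of uneven pooling
    and off-target reads.
    """
    return math.floor(run_yield_bp / (target_depth * genome_bp))


for yield_bp in (100_000_000, 1_000_000_000):  # Ion Torrent range quoted in the text
    print(yield_bp, samples_per_run(yield_bp, target_depth=100))
```

Under these assumptions, even a 100-Mb run could hold about 60 mitogenomes and a 1-Gb run about 600, which illustrates why much larger Illumina runs need extensive multiplexing to be cost-effective.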

Three red common carp strains are commonly cultured in China, namely HB, OJ, and XC, with documented culture histories of 800, 1200, and 1300 years, respectively. OJ is mainly distributed in Zhejiang province, whereas HB and XC are cultured in Jiangxi province, as shown in Fig. 1 (Wang and Li 2004a). A phylogenetic analysis of these three strains based on mtDNA cytochrome oxidase II sequences suggested that XC and OJ may have originated from a single monophyletic group, whereas HB may have originated independently (Li and Wang 2001). However, these findings were inconsistent with the biogeographic evidence that HB and XC both inhabit the Yangtze River watershed, whereas OJ is found in the Oujiang River watershed. Here, XC and HB clustered in subclade A1, and OJ clustered in subclade A2 with TC and KC. Our data therefore suggest that XC is more closely related to HB than to OJ, which is consistent with the biogeographic evidence.

JC was bred from a hybrid between HB (maternal parent) and Yuanjiang carp (paternal parent) and has been commonly cultured in China for the past 20 years. In our phylogenetic analysis, JC and HB clustered closely in subclade A1, which is consistent with the breeding history of JC. However, the genetic distance between JC and HB was about 0.00224, suggesting that slight genetic divergence occurred in JC during intensive selection and breeding after the initial hybridization between HB and Yuanjiang carp.

KC is the most renowned ornamental carp, and more than 100 color patterns have been developed through artificial selection over the past century. However, the origin of KC and its relationships with other common carp lineages remain controversial. In our phylogenetic topology, OJ, TC, and KC clustered in subclade A2, distinctly separated from subclade A1 (XC, HB, JC, and SP). We therefore suggest that the ancestral populations of KC or TC may have originated from OJ, or that these strains share a common origin.

SP is a Chinese domesticated mirror carp strain derived from German mirror carp introduced to China in the 1970s. Breeding records indicate no hybridization with Asian strains; thus, SP should genetically belong to the European subspecies, and a phylogenetic study based on genome-wide SNP loci confirmed this relationship (Xu et al. 2014). Surprisingly, our results show that SP clustered closely with XC, with a very small genetic distance. The only reasonable explanation is that XC was the maternal parent when SP was domesticated in the 1970s. Notably, the divergence time between SP and XC was estimated from molecular evidence to be about 0.91 × 10³ years, which is a substantial overestimate of the actual divergence time if SP hybridized with XC in China. Divergence time estimates based on commonly used mtDNA sequences are likely affected by high substitution rates. For example, divergence times derived from mitochondrial data can be considerably overestimated and systematically biased toward the calibration point because of substitution saturation when younger, more effective calibration points are lacking; for recent splitting events, such estimates may be ten times older than the actual divergence date (Zheng et al. 2011). A population structure analysis based on genome-wide SNP loci showed that SP carries a genetic introgression signature from Asian populations (see Fig. 3c in Xu et al. [2014]). We therefore suggest that, after its introduction into China, European mirror carp was first hybridized with XC to enhance adaptability and was then continuously backcrossed with European mirror carp to achieve desired traits, such as high growth rate and stress tolerance. SP, with stable traits and performance, was selected after several rounds of backcrossing and introgression.

Conclusion

We extracted mitochondrial genome sequences from high-throughput genome resequencing data of 26 individuals representing nine common carp lineages from Europe and East Asia and performed phylogenetic analyses based on these CMGs and on partial mitochondrial sequences. The results clarify several previously unresolved and controversial aspects of common carp phylogeny and breeding history and provide additional resources for germplasm conservation and genetic breeding of common carp.