Introduction

The Japanese archipelago is an island system extending over 2,000 km in the northeast–southwest direction at the eastern margin of East Asia. Its unique, rich biota with regional heterogeneity has provided intriguing material for historical biogeography (Watanabe et al. 2006; Sota and Hayashi 2007; Yoshikawa et al. 2008). Repeated connection and disconnection of the Japanese archipelago with the continent of Asia, as well as development of the mountainous topography, presumably played important roles in the formation of the biota. The separation of the archipelago from the continent of Asia began in the Early Miocene (Yonekura et al. 2001). In the initial stage of the archipelago separation, the Fossa Magna, a channel crossing the center of the present Honshu Island, separated the southwestern and northeastern landmasses. The two landmasses gradually connected after 15 million years ago (Ma), followed by the uplift of mountain ranges around the region during the Late Miocene–Early Pliocene (Machida et al. 2006). Western Japan was connected to eastern China during the Middle Miocene to the Late Pliocene, and thereafter a land bridge intermittently connected western Japan to the Korean peninsula during glacial periods (Yonekura et al. 2001). Tectonic movements involving uplift of mountains became active especially in the Pleistocene, resulting in mountain coverage of about 61 % of the archipelago. Owing to its mid-latitude location, the archipelago has not experienced extensive glaciation except for a part of the high mountain areas (Kaizuka and Chinzei 1986). Therefore, the major Japanese biota was probably formed by repeated vicariance and dispersal since the Miocene, rather than by range expansion from glacial refugia.

Freshwater fishes provide clues to the relationships between geology and biota formation, because they are easily isolated by geological barriers such as mountains and straits (Avise 2000). Close relationships of the freshwater fish fauna between Japan and East Asia have been emphasized (Aoyagi 1957); the former might have originated by dispersal of the latter through ancient river systems in the Late Pliocene and the Pleistocene (Lindberg 1972; Nishimura 1980; Mizuno 1987). However, recent molecular phylogenetic studies have suggested that the divergence times of some Japanese and Korean species date back to the Late Miocene (e.g., Kitagawa et al. 2005).

Remarkable genetic divergence among regional populations of species is known in many Japanese freshwater fishes (Watanabe et al. 2006). However, reliable estimation of their divergence time has not been available in most cases and some are controversial. For example, the northeastern–southwestern division is a major division for the Japanese freshwater fish fauna (Nishimura 1980; Watanabe 2012), and inter- and intraspecific genetic divergences in several fish groups are also associated with this division (Watanabe et al. 2000; Mihara et al. 2005). Although some attribute this divergence to the uplifting of mountains around the Fossa Magna (Central Highlands) in the Late Miocene–Pliocene (Watanabe et al. 2006), others suggest that the differentiation occurred at the formation of the Japanese archipelago in the Early Miocene (Setiamarga et al. 2009). To date, only fragmented information has been available to address the timescale of the distribution process of freshwater fishes (and other aquatic animals) in the Japanese archipelago (see Watanabe et al. 2006).

Considering the fundamentally low dispersal ability of primary freshwater fishes, widely distributed species provide an excellent opportunity to reconstruct the formation pattern of biotas, because they may retain many historical signs in their phylogeographic patterns. In this study, we focused on the cyprinid pike gudgeon Pseudogobio esocinus, which is widely distributed in rivers and lakes in almost all areas of Japan (except for the northernmost island of Hokkaido and the Ryukyu Islands), as well as in the Korean peninsula and northern China (Fig. 1). Three valid congeners occur in the continental areas: Pseudogobio vaillanti (wide ranges of China; Xia et al. 2005), Pseudogobio guilinensis (the Pearl River system) and Pseudogobio banggiangensis (Vietnam). Recent studies have revealed that both P. esocinus and P. vaillanti include two or more highly differentiated mitochondrial (mt) DNA lineages (Xia et al. 2005; Tominaga et al. 2009), suggesting that species of Pseudogobio will be highly informative regarding the formation process of freshwater fish fauna in East Asia.

Fig. 1 Sampling locations of Pseudogobio esocinus, P. vaillanti and P. guilinensis, with location codes

This study investigated the intraspecific differentiation of P. esocinus, which may reflect the distribution process of this wide-ranging species. Molecular phylogenetic and phylogeographic analyses of Japanese P. esocinus were performed together with the continental congeners, using specimens sampled from its entire geographic range. We used sequence data of the mtDNA cytochrome b (cyt b) gene and, for a subset of specimens, three nuclear DNA loci. Based on hypotheses of vicariance, we employed the molecular phylogeny to date the major lineages and reconstructed the chronological development of the geographic distribution. We found three cryptic lineages in Japanese P. esocinus that are differentiated at the species level and revealed contrasting patterns among their distribution processes. These results are discussed with respect to previous information on the fauna and phylogeography of aquatic animals in Japan. The results suggest the importance of studying wide-ranging species for clarifying geographic heterogeneities and historical formation of biotas.

Materials and methods

Samples. We collected 1,208 individuals from 200 locations in 135 river systems throughout the distribution area of Pseudogobio esocinus in Japan [Fig. 1; see Electronic Supplementary Material (ESM) Table S1]. Of these, 138 individuals from 26 locations in Japan were the same as those used by Tominaga et al. (2009). Four individuals from the Imjin River system in Gyeonggi-Do in Korea, used by Tominaga et al. (2009), were also included in the analysis.

Laboratory procedures. DNA was isolated from fins or tissue stored in 99 % ethanol using the Genomic DNA Purification Kit (Promega, Madison, WI, USA). Polymerase chain reaction (PCR) was conducted to amplify the cyt b region using the primer pair L14724 and H15915, as described by Tominaga et al. (2009). Three single-copy nuclear DNA loci, myosin heavy polypeptide 6 (myh6), a novel protein similar to vertebrate ryanodine receptor 3 (RYR3) and glycosyltransferase (Glyt), were amplified using primer pairs and PCR conditions according to Li et al. (2007). The PCR products were purified with ExoSAP-IT (USB, Cleveland, OH, USA) and sequenced using the BigDye Terminator Cycle Sequencing FS Ready Reaction Kit ver. 3.1 (Applied Biosystems, Foster City, CA, USA). Sequencing was performed on an ABI3130 automatic DNA sequencer (Applied Biosystems). We obtained the sequences of 689 bp of the 3′ side of the cyt b region for all individuals. For a subset of the specimens, sequences were determined for the full length of the cyt b region (1,140 bp) and three nuclear DNA loci (myh6, 732 bp; RYR3, 822 bp; Glyt, 816 bp), as described below.

Fourteen cyt b sequences of related species (Pseudogobio vaillanti, Pseudogobio guilinensis, Abbottina rivularis and Biwia zezera) were obtained from the DNA databank DDBJ/EMBL/GenBank (ESM Table S1). The sequences of P. vaillanti were sampled from all five major clades detected by a phylogeographic analysis in Xia et al. (2005). The sequences of the three nuclear DNA loci of B. zezera from Lake Biwa (myh6, AB536844; RYR3, AB536850; Glyt, AB536847; Watanabe et al. 2010) and A. rivularis from the Kiso River system (accession nos. AB794032, AB794037 and AB794041; this study) were used as outgroups. The sequences were edited and aligned by eye using the alignment explorer in MEGA4 (Tamura et al. 2007). All newly obtained sequences were deposited in the DDBJ/EMBL/GenBank (accession nos. AB793798–AB794041 and LC062387–LC062392).

Phylogenetic analyses and divergence time estimation. Haplotypes of the 3′ side of the mtDNA cyt b region (689 bp) from all 1,212 individuals constituted data set A, with 14 sequences of the related species. To detect major lineages, a phylogenetic tree from data set A was constructed by the neighbor-joining (NJ) method as implemented in MEGA4. The Jukes–Cantor (JC) model was selected to calculate the evolutionary distance, according to a guideline by Nei and Kumar (2000). Nodal support of the tree was assessed by the bootstrap method with 1,000 resamplings. Major “clades” and “subclades” were defined based on the NJ tree.
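
As an illustration of the distance underlying the NJ tree, the following minimal sketch computes the uncorrected p-distance between two aligned sequences and applies the standard Jukes–Cantor correction, d = −(3/4) ln(1 − 4p/3). The two short sequences and the function names are hypothetical and are not taken from the data sets; MEGA4 performs the equivalent calculation internally.

```python
import math


def p_distance(seq_a: str, seq_b: str) -> float:
    """Proportion of differing sites between two aligned sequences."""
    if len(seq_a) != len(seq_b):
        raise ValueError("sequences must be aligned to the same length")
    # Compare only sites where both sequences carry an unambiguous base.
    pairs = [(a, b) for a, b in zip(seq_a.upper(), seq_b.upper())
             if a in "ACGT" and b in "ACGT"]
    if not pairs:
        raise ValueError("no comparable sites")
    return sum(a != b for a, b in pairs) / len(pairs)


def jukes_cantor_distance(p: float) -> float:
    """Jukes-Cantor corrected distance: d = -(3/4) * ln(1 - 4p/3)."""
    if not 0.0 <= p < 0.75:
        raise ValueError("JC distance is undefined for p >= 0.75")
    return -0.75 * math.log(1.0 - 4.0 * p / 3.0)


# Hypothetical 20-bp fragments differing at two sites (illustration only).
hap_1 = "ATGGCAAGCCTACGAAAAAC"
hap_2 = "ATGGCAAGTCTACGAAAGAC"

p = p_distance(hap_1, hap_2)
d = jukes_cantor_distance(p)
print(f"p-distance = {p:.3f}, JC distance = {d:.4f}")
```

The corrected distance is always at least as large as the p-distance, because the correction accounts for multiple substitutions at the same site.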

We determined the full length (1,140 bp) of cyt b sequences in 50 individuals of P. esocinus representing each major clade and subclade detected in the above tree (47 individuals from Japanese populations and three individuals from a Korean population). These 50 haplotypes constituted data set B with the 14 sequences of the related species as used in data set A. Furthermore, three nuclear DNA loci were sequenced in 19 individuals of P. esocinus representing each clade and subclade from data set B (17 Japanese and two Korean specimens). Heterozygous sites within individual specimens were rare (0–0.68 % for each gene); these were coded as IUPAC degenerate nucleotide symbols and used directly in the analyses. The data from the three loci were treated as data set C with the sequences of two related species (B. zezera and A. rivularis).
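
The IUPAC coding of heterozygous sites can be sketched as follows. This is a minimal illustration that assumes the two alleles of an individual are available as aligned strings; the function names and the example alleles are hypothetical, and only two-base ambiguity codes are handled.

```python
# Standard IUPAC symbols for single bases and two-base ambiguities.
IUPAC_CODES = {
    frozenset("A"): "A", frozenset("C"): "C",
    frozenset("G"): "G", frozenset("T"): "T",
    frozenset("AG"): "R", frozenset("CT"): "Y", frozenset("CG"): "S",
    frozenset("AT"): "W", frozenset("GT"): "K", frozenset("AC"): "M",
}


def genotype_to_iupac(allele_1: str, allele_2: str) -> str:
    """Collapse two aligned alleles into one IUPAC-coded sequence."""
    if len(allele_1) != len(allele_2):
        raise ValueError("alleles must have the same length")
    return "".join(IUPAC_CODES[frozenset((a, b))]
                   for a, b in zip(allele_1.upper(), allele_2.upper()))


def heterozygous_fraction(coded: str) -> float:
    """Fraction of sites carrying a two-base ambiguity symbol."""
    return sum(c in "RYSWKM" for c in coded) / len(coded)


# Hypothetical alleles differing at two of six sites (illustration only).
coded = genotype_to_iupac("ACGTAC", "ACATAT")
print(coded, heterozygous_fraction(coded))
```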

Using data sets B and C, we constructed the maximum likelihood (ML) phylogeny as implemented in Treefinder ver. October 2008 (Jobb 2008) to obtain more reliable phylogenetic relationships. The sequence data were partitioned by gene and codon position. The most suitable substitution model for each partition was selected with Kakusan 4 (Tanabe 2007) using the Akaike information criterion (AIC). In both data sets, the proportional model, which considers dispersion of the mean substitution rate between partitions, was selected (ESM Table S2). The ML tree was repeatedly searched by the nearest-neighbor interchanges method until the likelihood value no longer increased. Nodal support of the tree was assessed by the bootstrap method with 1,000 resamplings.
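
For readers unfamiliar with AIC-based model selection, the sketch below shows the criterion itself, AIC = 2k − 2 ln L, where k is the number of free parameters and ln L the maximized log-likelihood. The candidate models, log-likelihoods and parameter counts are hypothetical values chosen for illustration; they are not output from Kakusan 4 or Treefinder.

```python
def aic(log_likelihood: float, n_params: int) -> float:
    """Akaike information criterion: AIC = 2k - 2 ln L."""
    return 2.0 * n_params - 2.0 * log_likelihood


# Hypothetical candidates for one partition: (ln L, free substitution parameters).
candidates = {
    "JC69": (-2510.4, 0),
    "HKY85": (-2450.7, 4),
    "GTR+G": (-2446.9, 9),
}

scores = {name: aic(lnl, k) for name, (lnl, k) in candidates.items()}
best_model = min(scores, key=scores.get)

for name, score in sorted(scores.items(), key=lambda item: item[1]):
    print(f"{name:6s} AIC = {score:.1f}")
print("selected:", best_model)
```

In this made-up example the richest model has the highest likelihood but is not selected, because its extra parameters do not improve the fit enough to offset the penalty term.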

Based on data set C, we also inferred the “species tree” using *BEAST (Heled and Drummond 2010) included in BEAST ver. 1.6 (Drummond and Rambaut 2007). The distinct groups defined by phylogenetic analysis of mtDNA were treated as “species” in this analysis. The evolutionary model was selected by locus using Kakusan; the K80 model was used for all the three loci. The strict molecular rate model and the random local clock model (Drummond and Suchard 2010) were applied. In the Markov chain Monte Carlo (MCMC) analysis, samples from the posterior were drawn at every 1,000 generations over a total of 50 million generations, following a discarded burn-in of the first 5 million generations. The results from two independent runs were combined. The parameter estimates, convergence of the chains to the stationary distribution and large effective sample size (ESS; >200) were checked using Tracer ver. 1.5 (Rambaut and Drummond 2009).

The divergence time of each major clade and subclade of P. esocinus and the overall mean substitution rate of Pseudogobio were estimated using the Bayesian MCMC approach implemented in BEAST using only species of Pseudogobio from data set B. The TN+G+I model was used as the optimal model chosen by Kakusan using the Bayesian information criterion. The Yule process was used as the tree prior. As the hypothesis of a strict molecular clock was rejected by the analysis implemented in HyPhy ver. 2.0 (Pond et al. 2005), we used the random local clock model.

Although P. esocinus fossil records are not available, geological ages of dispersal barriers might be informative for freshwater fishes (e.g., Burridge et al. 2008); i.e., uplift of mountains separating drainages can be an important cause of geographic vicariance for primary freshwater fishes. Several species of Japanese primary freshwater fishes (e.g., Rhinogobius flumineus, see Shimizu et al. 1993; Biwia zezera, see Watanabe et al. 2010) show population differentiation between the western and eastern sides of the Suzuka Mountains, central Honshu. Therefore, the uplift age of the mountain system (1–1.5 Ma; Kawabe 1989) was employed as a minimum limit of the calibration point when it can be assumed to separate reciprocal clades; an exponential distribution was set for the tree root height prior to maximize the prior distribution at 1.0 Ma (offset = 1.0, mean = 0.167), when the mountains were certainly uplifted (constraint 1).
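
The shape of the prior used for constraint 1 can be checked with a short calculation. The sketch below evaluates quantiles of an exponential distribution shifted by an offset, using the stated values (offset = 1.0, mean = 0.167); the function name is hypothetical, and the parameterization (offset plus an exponential with the given mean) is assumed to match the one used in BEAST.

```python
import math


def offset_exponential_quantile(q: float, offset: float, mean: float) -> float:
    """Quantile of offset + Exponential(mean): offset - mean * ln(1 - q)."""
    if not 0.0 <= q < 1.0:
        raise ValueError("q must be in [0, 1)")
    return offset - mean * math.log(1.0 - q)


offset, mean = 1.0, 0.167  # values given for constraint 1, in Ma

median = offset_exponential_quantile(0.50, offset, mean)
upper_95 = offset_exponential_quantile(0.95, offset, mean)
print(f"median = {median:.3f} Ma, 95 % quantile = {upper_95:.3f} Ma")
```

Under this assumption the prior density is highest at 1.0 Ma and about 95 % of its mass lies between 1.0 and 1.5 Ma, which appears consistent with the 1–1.5 Ma uplift age cited for the Suzuka Mountains.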

We set a lax constraint associated with the southwestern–northeastern vicariance for the phylogeny. In freshwater organisms, the differentiation of the sister species or subspecies across the center of Honshu is often hypothesized to result from the vicariance associated with uplift of the Central Highlands (3–6 Ma; Machida et al. 2006; Watanabe et al. 2006). Setiamarga et al. (2009), however, recently estimated that the two regional medaka (Oryzias latipes) populations differentiated around 18 Ma upon the formation of the Japanese archipelago (15–20 Ma; Yonekura et al. 2001). Therefore, we employed a uniform prior distribution between 3 and 20 Ma for the correspondent node (constraint 2).

In the MCMC analysis, samples from the posterior were drawn at every 5,000 generations over a total of 50 million generations, following a discarded burn-in of the first 5 million generations. The analysis was repeated and the samples from two independent runs were combined. The parameter estimates, convergence of the chains to the stationary distribution and large effective sample size (ESS; >200) were checked using Tracer.

Elimination of individuals with haplotypes suspected to originate from artificial transplantation. Before the phylogeographic analysis, we carefully eliminated artificial factors that disturbed the geographic distribution of mtDNA haplotypes. Many species of freshwater fishes, including P. esocinus, have spread from Lake Biwa to many rivers in Japan, accompanied by the stocking of Ayu (Plecoglossus altivelis), the most important target of inland fisheries in Japan (Watanabe 2012). We excluded individuals from the population data that had haplotypes belonging to the clade with haplotypes from Lake Biwa, which appeared sporadically in the distribution areas of the other clades. Seventy-eight individuals from 30 locations in 28 river systems were removed from data set A, with the remaining individuals constituting data set D (ESM Table S1).

Demographic analyses. We defined “regional population groups” in Japanese P. esocinus by geographic divisions congruent with the distribution area of the mtDNA subclades that were revealed from the phylogenetic analysis. To estimate the demographic history of each mtDNA regional population group, we applied Bayesian skyline plot (BSP) analysis (Drummond et al. 2005), implemented in BEAST using data set D. In the MCMC analysis, samples from the posterior were drawn every 5,000 generations over a total of 50 million generations, following a discarded burn-in of the first 5 million generations. The analysis was repeated and the samples from two or four independent runs were combined. For each dataset, the substitution model was selected by Kakusan, and the substitution rate that was inferred using the random local clock model by BEAST was used to estimate the time to expansion. The parameter estimates, convergence of the chains to the stationary distribution and ESS (>200) were checked using Tracer.

We also used mismatch distribution analysis (Rogers and Harpending 1992), and Tajima’s (1989) D and Fu’s (1997) Fs tests of selective neutrality to detect the population expansion, using ARLEQUIN ver. 3.1 (Excoffier et al. 2005). Using the formula τ = 2ut, where τ is the estimated parameter and u is the substitution rate for the whole haplotype, we calculated t, which is the time of the latest expansion under the models. The substitution rate that was inferred using the random local clock model by BEAST was used to calculate t.
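
The conversion from τ to expansion time can be sketched as follows, solving τ = 2ut for t with u taken as the per-site rate multiplied by the haplotype length. The τ value is hypothetical. The per-site rate (0.90 %/Myr per lineage) is the mean rate reported in the Results, and applying it to the 689-bp fragment is an assumption made here for illustration.

```python
def expansion_time_myr(tau: float, rate_per_site_per_myr: float,
                       sequence_length_bp: int) -> float:
    """Solve tau = 2ut for t, where u is the rate for the whole haplotype."""
    u = rate_per_site_per_myr * sequence_length_bp  # substitutions/haplotype/Myr
    if u <= 0:
        raise ValueError("substitution rate must be positive")
    return tau / (2.0 * u)


rate = 0.0090   # 0.90 %/Myr per lineage (mean rate reported in the Results)
length = 689    # bp of the partial cyt b fragment (assumed to apply here)
tau = 3.0       # hypothetical mismatch-distribution estimate

t = expansion_time_myr(tau, rate, length)
print(f"t = {t:.3f} Myr")
```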

Haplotype network and nested clade phylogeographic analysis. A nested clade phylogeographic analysis (NCPA; Templeton 1998) was performed to infer the phylogeographic history of Japanese P. esocinus using data set D. Although the validity of NCPA is controversial (e.g., Beaumont and Panchal 2008; Knowles 2008), we employed NCPA for a heuristic search of phylogeographic signals and carefully interpreted the result. Latitudinal and longitudinal data were used as geographic information on sampling locations. Haplotype networks were created by the statistical parsimony method using TCS ver. 1.21 (Clement et al. 2000) with a connection limit of five steps. Ambiguities and loops in the haplotype networks were resolved following the criteria suggested by Crandall and Templeton (1993). Nesting of the haplotype networks was implemented by the nesting algorithm in ANeCA (Panchal 2007). Contingency permutation tests using GeoDis ver. 2.6 (Posada et al. 2000) were used to test the null hypothesis of random distribution of the haplotypes within the nested clade. The statistical significance of the clade distance values (Dc; i.e., geographic range of a clade) and nested clade distance values (Dn; i.e., average distance from the center of a clade to all individuals that comprise the subclades) was also calculated with the Dunn–Sidak multiple test correction in GeoDis. Significant values were interpreted using the latest GeoDis inference key (January 2011 version). Phylogeographic inferences were performed for clades above the two-step level. Direction of range expansion was interpreted by following Templeton (1998).
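
The Dunn–Sidak correction mentioned above adjusts the per-test significance level so that the family-wise error rate over k independent tests stays at α: α′ = 1 − (1 − α)^(1/k). The sketch below shows only this general formula; the number of tests is hypothetical, and how GeoDis counts tests within a nesting clade is not reproduced here.

```python
def dunn_sidak_alpha(alpha: float, n_tests: int) -> float:
    """Per-test significance level: 1 - (1 - alpha) ** (1 / n_tests)."""
    if not 0.0 < alpha < 1.0:
        raise ValueError("alpha must be in (0, 1)")
    if n_tests < 1:
        raise ValueError("n_tests must be at least 1")
    return 1.0 - (1.0 - alpha) ** (1.0 / n_tests)


# Hypothetical case: four distance statistics tested within one nesting clade.
adjusted = dunn_sidak_alpha(0.05, 4)
print(f"per-test alpha = {adjusted:.5f}")
```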

Results

Phylogeny and population subdivision. In total, 267 mtDNA haplotypes were revealed in 1,208 Japanese Pseudogobio esocinus, and three mtDNA haplotypes were found in four Korean specimens (data set A; ESM Table S1). Based on the phylogenetic analysis, the Japanese haplotypes were divided into three major clades (A–C; Fig. 2; ESM Fig. S1). The three Korean haplotypes, together with P. vaillanti haplotypes from the Liao River basin, northern China, formed an independent clade (K). Excluding the haplotypes from the Liao River basin, P. vaillanti and P. guilinensis were divided into four clades: clades Y1 and Y2 consisted of P. vaillanti from the Yangtze River basin, clade H comprised P. vaillanti from Hainan Island and clade P contained P. vaillanti and P. guilinensis from the Pearl River basin.

Fig. 2 Neighbor-joining tree of haplotypes of Pseudogobio esocinus and related species based on 689 bp of partial mitochondrial cyt b sequences (data set A; see text) with the Jukes–Cantor model, shown with bootstrap probabilities (>50 %) of 1,000 resamplings for major nodes. Clade (A–C) and subclade names (e.g., “A-1”) are given. The regional population codes corresponding to the latter are shown in parentheses

Each Japanese clade showed a characteristic geographic distribution (Fig. 3). Clade A, consisting of 187 haplotypes from 891 specimens, was distributed mainly in southwestern Japan throughout Kyushu, Shikoku and western Honshu. Clade B, consisting of 20 haplotypes from 102 specimens, was found only in part of the Pacific Ocean side of western Honshu, its range completely contained within that of Clade A. Clade C included 60 haplotypes from 215 specimens and was distributed widely in eastern Honshu.

Fig. 3 Distribution patterns of each clade and subclade of Pseudogobio esocinus in Japan. The codes for clades, subclades and regional population groups are the same as in Fig. 2

The Japanese clades were further divided into several subclades that were distributed in distinct areas (Figs. 2, 3). Seven subclades were found in clade A: A-1, from the area surrounding the Seto Inland Sea (Setouchi area; SET); A-2, from the Tokai district (TOK); A-3, from eastern San-in district (ESI); A-4, from western San-in district (WSI); A-5, from the Hokuriku district (HOK); A-6, from the river systems flowing into the seas of Hibiki and Genkai (RHG); and A-7, from central–southern Kyushu (CSK). However, haplotypes belonging to subclade A-1 were often sporadically found in individuals from outside of their main distribution area and treated as transplants from Lake Biwa. Two subclades were found in clade B: B-1, from SET; and B-2, from TOK. Three subclades were found in clade C: C-1, widely from eastern Honshu (WEH); C-2, from two river systems from Joetsu area (JOE), together with C-1 haplotypes in one of the two rivers; and C-3, exclusively from the Fuji River system (FUJ).

The ML and Bayesian phylogenetic analyses using data set B revealed that neither P. esocinus nor P. vaillanti was monophyletic (Fig. 4; ESM Fig. S1). The eight major clades, comprising three Japanese and five continental clades, were statistically well supported, with high ML bootstrap values (≥95 %) and Bayesian posterior probabilities (all 1.00). Clade A was most closely related to clades K and Y1. Clades B and C were reciprocally monophyletic.

Fig. 4 Linearized Bayesian phylogenetic tree of selected haplotypes of species of Pseudogobio from data set B (see text) based on the entire cyt b sequence (1,140 bp), shown with posterior probability (>0.9) and node bars of 95 % credibility interval for major nodes. Divergence times were estimated for the nodes with a circled number (see Table 1). The nodes with a number on black background are calibration points. The codes for clades, subclades and regional population groups are the same as in Fig. 2

The phylogenetic tree for Japanese and Korean specimens based on three nuclear DNA loci (data set C) indicated two clades (clades nA and nB; Fig. 5a; see also ESM Fig. S2). Clade nA included specimens with mtDNA from clades A and K, and clade nB included those with mtDNA from clades B and C. Only one specimen having clade C mtDNA (C44, from the Fuji River system) was included in clade nA. Except for this, monophyly of nuclear DNA sequences from the specimens with clade A mtDNA was supported (bootstrap value = 89 %). In contrast, the monophyly of each of clades K, B and C was not sufficiently supported because of the poor resolution of the tree. The species tree analysis, conducted after excluding the specimen with C44 mtDNA, supported the monophyly of the specimens with A and K mtDNA (posterior probability >0.99) and that of specimens with B and C mtDNA (1.00), under both the strict molecular rate model and the random local clock model (Fig. 5b).

Fig. 5 a Maximum-likelihood tree of selected individuals of Pseudogobio esocinus and related species based on the three nuclear loci (myh6, RYR3 and Glyt; total 2,370 bp; data set C; see the text), shown with bootstrap probabilities (>60 %) of 1,000 resamplings. OTUs are labeled by mitochondrial haplotype codes. Shaded OTUs indicate the same sequences. b The “species tree” of P. esocinus and related species based on the three nuclear loci, estimated by *BEAST, is shown with posterior probability (>0.9). One specimen belonging to clade nA with the mtDNA clade C haplotype (C44) was excluded from this analysis

Divergence time. Divided by the Suzuka Mountains, both clades A and B included two reciprocal subclades (subclade A-1/A-2 and B-1/B-2; Figs. 2, 3, 4; ESM Fig. S1). The uplift age of the Suzuka Mountains (constraint 1) was employed as the constraint for both node ages (nodes 8 and 14; Fig. 4). Clades B and C were reciprocally monophyletic and distributed in the southwestern and northeastern areas across the Central Highlands around the Fossa Magna area, respectively. Therefore, constraint 2 was applied to their most recent common ancestor node (node 13; Fig. 4).

The divergence time of the common ancestor of clade A was estimated to be about 2.2 Myr [95 % highest posterior density (HPD), 1.4–3.2 Myr; node 4; Table 1], and all of the major subclades diverged between 1 and 2 Ma (nodes 5–9). The split of clade A from its continental sister clade (including clades K and Y1) was estimated to date back about 5.6 (3.6–7.8) Ma (node 3). Clades B and C formed a clade and their divergence time was estimated to be about 4.9 (3.0–6.9) Myr (node 13, calibration point). The time of major divergence within clade B (subclades B-1 and B-2; node 14, calibration point) was estimated at 1.2 (1.0–1.4) Myr. Subclades C-2 and C-3 were estimated to have diverged at 0.8 (0.4–1.4; node 16) and 1.0 Ma (0.5–1.6; node 15), respectively.

Table 1 Estimated divergence times under random local clock model

The divergence time of the common ancestor of Pseudogobio species was dated to about 9.0 (5.5–12.8) Myr (node 1). The mean substitution rate of cyt b of species of Pseudogobio was estimated to be 0.90 %/Myr (0.60–1.24 %/Myr) per lineage.
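
To show how a per-lineage rate relates to a divergence time, the sketch below applies the simple strict-clock relation t = d / (2r), where d is the pairwise sequence divergence and r the per-lineage rate, so that divergence between two lineages accumulates at 2r. The distance d is a hypothetical value, and this back-of-envelope calculation is not the random local clock analysis used for Table 1.

```python
def divergence_time_myr(pairwise_distance: float,
                        rate_per_lineage_per_myr: float) -> float:
    """Strict-clock divergence time: t = d / (2r)."""
    if rate_per_lineage_per_myr <= 0:
        raise ValueError("rate must be positive")
    return pairwise_distance / (2.0 * rate_per_lineage_per_myr)


d = 0.05  # hypothetical pairwise distance (5 % sequence divergence)

# Mean rate and its interval as reported above (substitutions/site/Myr/lineage).
t_mean = divergence_time_myr(d, 0.0090)
t_young = divergence_time_myr(d, 0.0124)  # faster rate gives a younger age
t_old = divergence_time_myr(d, 0.0060)    # slower rate gives an older age

print(f"t = {t_mean:.2f} Myr ({t_young:.2f} to {t_old:.2f} Myr)")
```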

Demography and phylogeography. BSP analyses, using data set D, indicated that the recent population expansion started from 0.5 Ma or later for the respective regional population groups A-1 (SET), A-2 (TOK) and C-1 (WEH) (Fig. 6; ESM Fig. S3). In the mismatch distribution analysis, SSD and ruggedness index suggested no deviation from the simulated sudden expansion/spatial expansion models for almost all populations and subpopulations (ESM Table S3). Regional population groups A-1 (SET) and C-1 (WEH) significantly deviated from neutrality in both Tajima’s D and Fu’s Fs tests (ESM Table S3), and mismatch distribution showed a smooth unimodal shape (Fig. 6). Expansion of all the regional populations was estimated to have occurred later than 1 Ma based on τ values.

Fig. 6 Bayesian skyline plots (left) and mismatch distributions (right) of the regional populations of Japanese Pseudogobio esocinus for which the P value of Tajima’s D or Fu’s Fs test was significant (<0.05). In the Bayesian skyline plots, the middle lines represent mean estimates of the effective population size, and the areas between two thin lines represent 95 % highest posterior densities. The mismatch distributions of observation data (pairwise differences) are denoted by vertical bars, and the expected distributions under sudden expansion (solid lines) and spatial expansion models (broken lines) are superimposed. See ESM Fig. S3 for the results of all regional populations

Twelve separate haplotype networks were generated by TCS (ESM Fig. S4). These subdivisions were entirely congruent with the 12 regional population groups identified by phylogenetic analysis (Figs. 2, 3). Phylogeographic inferences by NCPA for clades above the two-step level are shown in ESM Table S4. In the network of A-1 (SET), range expansion from the eastern to western areas was inferred for clades 3-9 and 4-11 (Fig. 7a). In the network of A-2 (TOK), range expansion from central to northern and eastern areas was inferred for the total network (clade 4-12; Fig. 7a). Restricted gene flow was inferred for the network of A-6 (clade 3-2; Fig. 7b). The network of A-7, distributed in Kyushu Island, exhibited a star-like form with haplotype A149 at the center; this haplotype was widely distributed in Kyushu and possessed by about one-fourth of the A-7 specimens. Restricted gene flow/dispersal, but with some long-distance dispersal, was inferred for this network (clade 3-1; Fig. 7b).

Fig. 7 Schematic figures for the formation of phylogeographic patterns of Pseudogobio esocinus in Japan inferred by nested clade phylogeographic analysis. See ESM Fig. S4 and Table S4 for the details

Network C-1 showed a star-like form with the central clade 3-18, which was distributed in northern Tohoku district and connected to all other three-step clades (ESM Fig. S4). Haplotype C40, found from the Yamagata area, was at the center of clade 3-18. Long-distance colonization from the Yamagata area to other areas of eastern Honshu and/or past fragmentation was inferred for the entire C-1 network (clade 4-13; Fig. 7a). No conclusive inference was obtained for clade B networks (clades 3-11 and 3-13).

Discussion

Phylogenetic relationships. The present study revealed cryptic diversity within Pseudogobio esocinus, one of the most common freshwater fish species of Japan. Three highly differentiated mtDNA lineages (clades A–C) were found in its Japanese populations [clades A and B correspond to “Group 1” and “Group 2”, respectively, in Tominaga et al. (2009)]. These clades were grouped into two monophyletic groups (A and B+C), which almost completely correspond to the clades nA and nB in the nuclear DNA phylogeny, respectively. Because of low resolution, the nuclear DNA phylogeny did not adequately support mitochondrial clades B and C. However, essentially no topological conflicts were detected between the mitochondrial and nuclear DNA trees. The “species tree” based on the three nuclear genes is also consistent with the result from mtDNA data. Furthermore, clades B and C are distributed allopatrically in southwestern and northeastern Japan, respectively. These consistent mitochondrial and nuclear DNA data and distribution patterns strongly support the idea that the Japanese P. esocinus includes three evolutionary entities that basically correspond to the three mtDNA lineages (hereafter referred to as Groups A, B, and C, respectively). Among the three, Groups A and B are considered to be, at least mostly, reproductively isolated from each other (i.e., different species), because they are distributed sympatrically in western Japan and caught simultaneously in some localities. Group C, which is a sister form of Group B, is also regarded as a different species from Group A. However, the data for nuclear DNA loci may be insufficient due to the limited sample size and the rather poor resolution of the tree. Also, as we found an individual with clade C mtDNA and clade nA nuclear DNA from the Fuji River system in the boundary region between Groups A and C, a hybrid zone may exist. Further investigations on hybridization and taxonomic re-examinations, including the continental congeners, are needed.

The divergence time between Groups B and C, estimated using the lax constraint (3–20 Ma) for this divergence, emphasized the importance of the uplifting of the Central Highlands, central Honshu, in establishing their distribution. The two groups showed distinct distributions separated by the highlands and diverged from each other around 4.9 Ma (3.0–6.9 Ma, 95 % HPD). The estimated average substitution rate of the cyt b region in this study, which was 0.90 %/Myr (0.60–1.24 %/Myr) per lineage, was similar to the published rates for Cyprinidae (e.g., 0.76 %, Zardoya and Doadrio 1999; 0.65 %, Dowling et al. 2002). Given the plausibility of this rate, the estimated divergence time favors the hypothesis that vicariance of northeastern/southwestern P. esocinus occurred by uplifting of the Central Highlands in the Late Miocene to the Early Pliocene (Machida et al. 2006), and does not support the hypothesis that vicariance occurred by formation of the eastern/western landmasses dating back to the Early Miocene (Setiamarga et al. 2009). Some previous studies on freshwater fishes based on the traditional molecular clock approach also support the former hypothesis (Mihara et al. 2005; Watanabe et al. 2006). The latter hypothesis was proposed for two medaka regional groups whose distribution may not be equivalent to true northeastern/southwestern vicariance, because its northern group is restricted to the Sea of Japan side (Takehana et al. 2003). Additionally, the dating by Setiamarga et al. (2009) might require re-examination, because they used fossil records dated to the Palaeogene or older (mostly Mesozoic and Palaeozoic eras) for inferring differentiation in the Neogene. In any case, the common ancestor of Groups B and C of P. esocinus would have already occurred in the western and eastern areas of the Japanese archipelago in the Early Pliocene, when the Central Highlands were not yet a critical dispersal barrier of freshwater organisms (Watanabe et al. 2006).

As in some previous phylogeographic studies comparing Japanese and continental fish populations, the results of this study provide a view different from traditional hypotheses, which posit relatively recent origins of the Japanese lineages by dispersal through ancient river systems that appeared during regressive periods, mainly in the Pleistocene (e.g., Aoyagi 1957; Lindberg 1972; Nishimura 1980; Mizuno 1987). The mtDNA clade A of Group A, distributed in southwestern Japan, formed a sister group to clade K (Korean P. esocinus) + Y1 (P. vaillanti from north-eastern China), with an estimated divergence time of 3.6–7.8 Ma (95 % HPD; Late Miocene–Pliocene). Recent phylogeographic studies have also suggested that some Japanese freshwater animals have been isolated from their closest continental counterparts since the Pliocene or the Miocene (e.g., Coreoperca kawamebari, see Shirai et al. 2003; spiny loach, Kitagawa et al. 2005; clawed salamander, Yoshikawa et al. 2008). Group A’s limited distribution in southwestern Japan, however, suggests that its range expansion occurred after the uplift of the Central Highlands about 3–6 Ma (Machida et al. 2006). This hypothesis is supported by the estimated time of the most recent common ancestor of clade A (1.4–3.2 Ma).
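Because the argument repeatedly translates HPD intervals into geological epochs, a small lookup can make that mapping explicit. The sketch below is illustrative only: the boundary ages are approximate values from the International Chronostratigraphic Chart and are an assumption here, since the text does not state which timescale it follows, and the helper names are hypothetical.

```python
# Approximate base (older boundary) of each interval in Ma, youngest first.
# Assumed from the International Chronostratigraphic Chart, not from the paper.
EPOCHS = [
    ("Holocene", 0.0117),
    ("Late Pleistocene", 0.129),
    ("Middle Pleistocene", 0.774),
    ("Early Pleistocene", 2.58),
    ("Late Pliocene", 3.600),
    ("Early Pliocene", 5.333),
    ("Late Miocene", 11.63),
]


def epoch_of(age_ma):
    """Return the interval containing age_ma (a boundary age counts as the younger interval)."""
    if age_ma < 0:
        raise ValueError("age must be non-negative")
    for name, base in EPOCHS:
        if age_ma <= base:
            return name
    raise ValueError("age is older than the intervals tabulated here")


def epochs_spanned(young_ma, old_ma):
    """List every interval touched by an age range, youngest first."""
    names = [name for name, _ in EPOCHS]
    start = names.index(epoch_of(young_ma))
    end = names.index(epoch_of(old_ma))
    return names[start:end + 1]


# Split of clade A from the continental lineage (95 % HPD from the text).
print(epochs_spanned(3.6, 7.8))
# Most recent common ancestor of clade A (interval from the text).
print(epochs_spanned(1.4, 3.2))
```

Under these assumed boundaries, the two intervals overlap only in the Late Pliocene, which matches the reading that the split from the continental lineage preceded the expansion within Japan.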

In conclusion, our results suggest that Group A split from the continental lineage in the Late Miocene or Early Pliocene, but spread over western Japan only in the Late Pliocene or later, as a newcomer; this contrasts with Groups B+C, the older colonizers, which had already expanded widely in Japan by the Early Pliocene (Fig. 8). Another widely distributed freshwater fish, the Cobitis biwae species complex, shows a similar pattern in which old lineages have remained in eastern Japan and a part of western Japan, and a newcomer lineage has a wide range in western Japan (Kitagawa et al. 2003). These examples illustrate the multi-layered formation of the Japanese freshwater fish fauna, which is related to the complex geographic history shared by the Japanese archipelago and the continent.

Fig. 8 Schematic figures of the formation of geographic distributions of the three groups of Pseudogobio esocinus in Japan, as inferred by the present phylogenetic and phylogeographic analyses. a A common ancestor of Groups B and C was distributed in the Japanese archipelago in the Late Miocene. b Groups B and C were isolated by uplifting of the Central Highlands in the Early Pliocene, and the ancestor of Group A started to differentiate from the continental population. c Group A widely expanded in southwestern Japan until the Early Pleistocene, but could not expand beyond the Central Highlands. d Present distribution area

Phylogeographic history of regional population groups

Groups A and B showed similar geographic distributions in southwestern Japan and shared major phylogeographic patterns, suggesting a common biogeographic history. Their distribution ranges were presumably shaped by the Second Setouchi Palaeo-Lake/River systems, which connected a wide area of western Japan from the Pliocene to the early–middle Pleistocene (Ota et al. 2004). The regional population groups of Groups A and B are separated by major mountain systems, and their divergence was estimated to have occurred in the early Pleistocene (about 1.1–2.2 Ma). This isolation among the regional groups is likely related to the disappearance of the Second Setouchi Palaeo-Lake/River systems and to active orogenic movements in southwestern Japan since the early–middle Pleistocene (Ota et al. 2004). As inferred by the BSP and mismatch distribution analyses, most of the regional groups in southwestern Japan experienced population expansion in the middle Pleistocene or later, when the major mountain systems had already been uplifted and had divided the regions.

Distribution patterns similar to those of the regional population groups of Groups A and B of P. esocinus are also found in other freshwater fish species. For example, the Suzuka Mountains are an important geographic barrier that has influenced the phylogeographic structures of other freshwater species (e.g., Rhinogobius flumineus, see Shimizu et al. 1993; Biwia zezera, see Watanabe et al. 2010) and may have contributed to the heterogeneity of the freshwater fish fauna (Watanabe 2012). In contrast, the minnow Hemigrammocypris rasborella and the freshwater goby Odontobutis obscura, both widely distributed in western Japan, exhibit slightly different patterns from Groups A and B; they consist of two genetically differentiated regional groups in the Setouchi area, which might be associated with two palaeo-river systems that flowed eastward and westward, respectively, in that area (Sakai et al. 1998; Watanabe et al. 2014). No such differentiation was found in Groups A and B, and NCPA suggested westward gene flow for some subclades of Group A in this area. This inference, however, should be statistically re-examined using genetic markers with higher resolution. The differences in the distribution and phylogeographic patterns among freshwater fish species may be related to differences in their habitat, dispersal ability, and history of regional extinction.

Although differentiation among the regional population groups of Groups A and B primarily reflects major watershed divisions, some remarkable cases of range expansion across watersheds were also detected. For example, some Group A populations that genetically belong to regional groups on the Pacific Ocean side inhabit river systems flowing into the Sea of Japan, beyond the watersheds (e.g., the Gono and Yura rivers; Fig. 3). As paleogeographic evidence for river capture exists for these river systems (Gono River, Inami 1951; Yura River, Okada and Takahashi 1969), the present phylogeographic patterns are probably attributable to such events. Similar range expansions beyond the watersheds are supported by the distributions of fish species and karyotypic races (Kimizuka 1987), as well as by phylogeographic studies of other freshwater animals (e.g., catfish, Watanabe and Nishida 2003; bitterling, Kitazima et al. 2015; atyid shrimp, Fujita et al. 2011).

The phylogeographic patterns of Group C, which is widely distributed in northeastern Japan, differ markedly from those of Groups A and B in southwestern Japan. Subclades C-2 (JOE) and C-3 (FUJ) occurred in narrow areas at the western edge of Group C’s range (Fig. 3) and were inferred to have differentiated from the other subclades in the early Pleistocene (Fig. 4; nodes 15 and 16). In contrast, subclade C-1 (WEH) was distributed in almost all areas of eastern Honshu, excluding the range of subclade C-3 (FUJ), without remarkable regional differentiation. NCPA and demographic analyses indicated that this subclade has rapidly expanded from the Yamagata area (the Sea of Japan side of northern Honshu; Fig. 7) since about 0.3 Ma (Figs. 6, 7; ESM Table S3). The Yamagata area is known as a biogeographically unique area where several freshwater fish species are found as relics (e.g., nine-spined stickleback Pungitius sp., Takahashi et al. 2001; Cottus nozawae, see Yokoyama and Goto 2002). The phylogeographic patterns of Group C of P. esocinus, as well as the presence of these relics, imply that the Yamagata area has maintained stable freshwater environments and has served as an important refugium for freshwater fishes since the middle Pleistocene.

The contrasting phylogeographic patterns between northeastern and southwestern Japan probably reflect their topographic and geohistorical differences. The wide distribution range and rapid expansion of subclade C-1 (WEH) suggest that the Ou Mountains, the backbone mountain range of northeastern Japan, did not function as a critical migration barrier for P. esocinus. The Ou Mountains run north to south throughout eastern Honshu and have been actively uplifted since 3 Ma (Nakajima et al. 2006; Fig. 3). All large river systems in eastern Honshu flow out from the Ou Mountains, and the river systems flowing to the Sea of Japan and those flowing to the Pacific Ocean are often divided by a low-elevation watershed with an accompanying basin (Kaizuka and Chinzei 1986). These geographic conditions and tectonic movements might have enabled some freshwater fishes to expand their ranges in northeastern Japan even after the middle Pleistocene. Similar contrasting patterns between northeastern and southwestern Japan are exhibited by the freshwater lamprey Lethenteron reissneri species complex (Yamazaki et al. 2003). However, the cause of the presumed range reduction and vicariance that preceded the expansion of subclade C-1 is unclear. Also, in contrast to the above cases, some freshwater animals show genetic differentiation between regional populations divided by the Ou Mountains (e.g., loach Lefua echigonia, see Mihara et al. 2005; giant water bug Appasus major, see Suzuki et al. 2014). Further comparative phylogeographic studies are necessary to elucidate which geographic and geologic events, such as orogenic/volcanic activities and sea-level fluctuations, have influenced the formation of the freshwater fauna and its phylogeographic structure in northeastern Japan.