Introduction

Genetic variation in many kinds of populations has been evaluated at global, national, and regional levels using molecular markers. Cultivated rice, Oryza sativa L., has two major subspecies, indica and japonica. Large-scale comparisons of genome sequences have clearly elucidated the domestication of indica and japonica from their wild ancestor O. rufipogon (Huang et al. 2012). The genomes of 12 Oryza species have been compared globally, revealing large DNA expansions, contractions, inversions, and translocations (Ammiraju et al. 2006, 2008; Kim et al. 2008). Cultivated rice is known to have originated in tropical regions, yet it now grows under an extremely wide range of climatic conditions, from 53°N to 40°S latitude. However, how local populations were established, and where they originated, remains unclear.

Next-generation sequencing technologies can directly detect genome-wide DNA polymorphisms and thereby identify variety-specific mutations. The genes that establish and shape adaptability to local regions can therefore now be identified. The availability of the complete genome sequence of the rice variety NIPPONBARE (IRGSP 2005) has enabled the discovery of genome-wide DNA polymorphisms in germplasm, varieties, and breeding lines of interest. DNA variations underlying varietal differences are an important genetic resource in the gene pools of local populations adapted to local environmental conditions. Many japonica varieties in Japan have been re-sequenced to detect DNA polymorphisms (Nagasaki et al. 2010; Yamamoto et al. 2010; Arai-Kichise et al. 2011, 2014). However, it remains unclear whether these polymorphisms contributed to the diversification of populations and to novel adaptive phenotypes.

Accessions from marginal populations were previously shown to carry an increased number of deleterious non-synonymous SNPs (Alonso-Blanco et al. 1999; Schmid et al. 2003; Nordborg et al. 2005; Günther and Schmid 2010), which may be involved in local adaptation to specific environmental conditions at the edge of the species range. To enable the discovery of local adaptability based on sequence diversity, we focused on a local rice population from Hokkaido, the northernmost region of Japan and one of the northern limits of rice cultivation in the world. We previously identified the genes responsible for the extremely early heading date that allows adaptation to these conditions (Fujino and Sekiguchi 2005a, b, 2008; Nonoue et al. 2008; Fujino et al. 2013). We also characterized the genetic structure of the local population established during the 100-year history of rice breeding programs (Shinada et al. 2014). Our findings revealed that the local population had been divided into six groups according to the objectives of the rice breeding programs in each generation. Group I consisted of landraces and their pedigrees established before 1941. Group II consisted of lines derived from group I and established between 1919 and 1953. Group IIIa was established between 1935 and 1977, group IIIb between 1962 and 1989, group IV between 1940 and 1990, and group V after 1983.

KITAAKE, a rice variety from Hokkaido, is known for unique features such as an extremely short life cycle and high transformation efficiency (Kim et al. 2013). It belongs to group V of the local population and played an important role in the formation of this new genetic group (Shinada et al. 2014). In the present study, we re-sequenced the KITAAKE genome using next-generation sequencing technology. Insertions and deletions (indels) found in the KITAAKE genome were traced in different populations: wild rice with the A-genome, cultivated rice from around the world, and Japanese local populations. The results indicated that the rapid accumulation of pre-existing mutations played a major role in establishing and shaping adaptability to local regions in current rice breeding programs.

Materials and methods

Plant material

The japonica rice variety KITAAKE was used for re-sequencing. Three populations were used to survey the distribution of the indel polymorphisms found in the KITAAKE genome. The first was the Hokkaido Rice Core Panel (HRCP), which included 63 landraces and breeding lines representing the genetic diversity of the local population (Shinada et al. 2014), with the japonica variety NIPPONBARE as a reference. The second was the Japanese rice core collection (JRC), a set of 40 genetically diverse landraces selected to represent the wide genetic diversity among landraces from Japan (Ebana et al. 2008); JRC is the ancestral population of HRCP. The third was the world rice core collection (WRC), a set of genetically diverse landraces collected from 19 countries and selected to represent the wide genetic diversity among cultivated rice varieties (Kojima et al. 2005); 57 of these varieties were used for data analysis. Forty-eight accessions of Oryza species with the A-genome were also used: 18 O. rufipogon, eight O. barthii, nine O. glumaepatula, seven O. longistaminata, and six O. meridionalis. These wild rice accessions were derived from Rank 1 (44 highly recommended representative accessions from 18 species) and Rank 2 (a recommended collection of 65 accessions covering all species) of the wild rice core collection at the National Institute of Genetics, Japan (http://www.shigen.nig.ac.jp/rice/oryzabase/locale/change?lang=en). Seeds were provided by the Local Independent Administrative Agency Hokkaido Research Organization, the National Agricultural Research Organization, and the National Institute of Agrobiological Sciences, Japan. DNA of the Oryza species was provided by the National Institute of Genetics, Japan.

Re-sequencing

Genomic DNA extracted from <20 seedlings of KITAAKE was used for paired-end sequencing on an Illumina HiSeq 2000. Raw sequence data were deposited in the DDBJ BioProject database under accession number PRJDB2868. To ensure quality, the raw data were processed in two steps: adapter sequences were removed, and reads containing low-quality bases (quality value ≤5) were discarded. The trimmed reads were aligned to the reference genome Os-NIPPONBARE-Reference-IRGSP-1.0 using SOAP2 (Li et al. 2009).
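The two read-cleaning steps can be illustrated with a minimal Python sketch. It is not the pipeline actually used: it assumes FASTQ-style records with Phred+33 quality encoding, removes adapters by exact match to a hypothetical adapter prefix (`ADAPTER`), and discards any read that still contains a base with a quality value ≤5.

```python
"""Sketch of adapter trimming plus low-quality read removal.

Assumptions (not from the original pipeline): Phred+33 quality strings,
exact-match adapter removal using the hypothetical ADAPTER prefix, and a
read is dropped if any remaining base has Q <= MIN_QUALITY.
"""
from typing import Iterable, Iterator, Tuple

ADAPTER = "AGATCGGAAGAGC"  # hypothetical adapter prefix, for illustration only
MIN_QUALITY = 5            # reads with any base at Q <= 5 are discarded

Read = Tuple[str, str, str]  # (read_id, sequence, quality string)


def trim_adapter(seq: str, qual: str, adapter: str = ADAPTER) -> Tuple[str, str]:
    """Cut the read at the first exact occurrence of the adapter, if present."""
    pos = seq.find(adapter)
    if pos == -1:
        return seq, qual
    return seq[:pos], qual[:pos]


def has_low_quality_base(qual: str, threshold: int = MIN_QUALITY) -> bool:
    """True if any base has a Phred+33 score <= threshold."""
    return any(ord(ch) - 33 <= threshold for ch in qual)


def clean_reads(reads: Iterable[Read]) -> Iterator[Read]:
    """Trim adapters, then yield only non-empty reads without low-quality bases."""
    for read_id, seq, qual in reads:
        seq, qual = trim_adapter(seq, qual)
        if not seq or has_low_quality_base(qual):
            continue  # drop empty or low-quality reads
        yield read_id, seq, qual
```

Checking every base against the threshold mirrors the text's rule of removing reads that contain any low-quality base, rather than trimming low-quality read ends.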

After alignment to the reference genome, indels longer than 27 bp were predicted using BreakDancer (Chen et al. 2009). Polymorphisms with a sequence depth of <5 and heterozygous genotype (GT) calls were considered low quality and removed. SnpEff with default settings (Cingolani et al. 2012) annotated the remaining polymorphisms according to their genomic location (intron, 5′ or 3′ untranslated region (UTR), upstream (1000 bp), downstream (500 bp), coding sequence (CDS), splice site, or intergenic region) on the NIPPONBARE reference genome in RAP-DB (http://rapdb.dna.affrc.go.jp/). Because BreakDancer predicts structural variants (SVs) as regions rather than exact positions, an SV was classified as high impact if its region overlapped an exon of a gene.
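The filtering and high-impact rules can be sketched as follows. The record layout (`SVCall` with `depth` and `genotype` fields) and the `"het"` label are hypothetical stand-ins, not BreakDancer's output format, and the sketch reads the filter as discarding calls that fail either criterion. A call is flagged as high impact when its region overlaps any exon, mirroring the region-based rule described above.

```python
from dataclasses import dataclass
from typing import Dict, List, Tuple

MIN_DEPTH = 5  # calls supported by fewer reads are treated as low quality


@dataclass
class SVCall:
    """Hypothetical, simplified SV record (not BreakDancer's native format)."""
    chrom: str
    start: int     # 0-based, inclusive
    end: int       # 0-based, exclusive
    depth: int     # supporting read depth
    genotype: str  # "hom" or "het" (hypothetical labels)


def passes_filter(sv: SVCall) -> bool:
    """Keep calls with depth >= MIN_DEPTH that are not heterozygous."""
    return sv.depth >= MIN_DEPTH and sv.genotype != "het"


def overlaps(a_start: int, a_end: int, b_start: int, b_end: int) -> bool:
    """True if half-open intervals [a_start, a_end) and [b_start, b_end) overlap."""
    return a_start < b_end and b_start < a_end


def is_high_impact(sv: SVCall, exons: Dict[str, List[Tuple[int, int]]]) -> bool:
    """Flag an SV whose region overlaps any exon on the same chromosome."""
    return any(overlaps(sv.start, sv.end, s, e) for s, e in exons.get(sv.chrom, []))
```

Testing overlap on whole regions, rather than on a single breakpoint, is what makes a region-level caller usable for exon-level impact calls.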

DNA analysis

Total DNA was isolated from young leaves using the CTAB method (Murray and Thompson 1980). To validate indel polymorphisms between NIPPONBARE and KITAAKE, 85 PCR products were sequenced directly by cycle sequencing with BigDye terminators on an ABI PRISM 3700 automated sequencer (Applied Biosystems). A total of 52 indel markers distributed over the whole genome were used to detect indel polymorphisms among the populations (Table S1). PCR, electrophoresis, and sequencing were performed as described previously (Fujino et al. 2004, 2005).

Data analysis

Genetic distance matrices based on the indel markers were calculated using Nei's index (Nei and Li 1979) in Phyltools ver. 1.32 (Buntjer 1997). Neighbor-joining (NJ) clustering was performed using the PHYLIP 3.66 package (Felsenstein 1993), and dendrograms were drawn using NJplot v2.3 (Perrière and Gouy 1996). The bootstrap probability of each cluster was calculated from 1000 resampled data sets using Phyltools and PHYLIP.
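The distance and clustering steps can be approximated in plain Python for illustration. This sketch is not Phyltools or PHYLIP: it computes the Nei and Li distance D = 1 − 2n_xy/(n_x + n_y) from binary (0/1) band profiles, builds a neighbor-joining topology with the standard Q-criterion (branch lengths omitted), and draws one bootstrap replicate by resampling marker columns with replacement. Encoding each indel locus as two band columns (one per allele) is an assumption.

```python
"""Sketch: Nei-Li distances, neighbor joining, and one bootstrap replicate.

Plain-Python illustration only; not Phyltools or PHYLIP. Profiles are 0/1
band vectors (assumed encoding: two columns per indel locus, one per allele).
"""
import random
from itertools import combinations
from typing import List, Sequence, Tuple, Union

Tree = Union[str, Tuple["Tree", "Tree"]]


def nei_li_distance(x: Sequence[int], y: Sequence[int]) -> float:
    """D = 1 - 2*n_xy / (n_x + n_y), with n_xy = bands shared by x and y."""
    n_xy = sum(1 for a, b in zip(x, y) if a and b)
    n_x, n_y = sum(x), sum(y)
    if n_x + n_y == 0:
        return 0.0  # no bands scored in either profile
    return 1.0 - 2.0 * n_xy / (n_x + n_y)


def distance_matrix(profiles: List[Sequence[int]]) -> List[List[float]]:
    """Pairwise Nei-Li distances between all profiles."""
    return [[nei_li_distance(p, q) for q in profiles] for p in profiles]


def neighbor_joining(labels: List[str], dist: List[List[float]]) -> Tree:
    """Return an unrooted NJ topology as nested tuples; branch lengths omitted."""
    nodes: List[Tree] = list(labels)
    d = [list(row) for row in dist]
    while len(nodes) > 2:
        n = len(nodes)
        r = [sum(row) for row in d]
        # Q(i, j) = (n - 2) * d(i, j) - r_i - r_j; join the pair that minimises it.
        i, j = min(
            combinations(range(n), 2),
            key=lambda p: (n - 2) * d[p[0]][p[1]] - r[p[0]] - r[p[1]],
        )
        keep = [k for k in range(n) if k not in (i, j)]
        # Distance from the new internal node u to every remaining node k.
        to_u = [(d[i][k] + d[j][k] - d[i][j]) / 2 for k in keep]
        d = [[d[a][b] for b in keep] + [to_u[idx]] for idx, a in enumerate(keep)]
        d.append(to_u + [0.0])
        nodes = [nodes[k] for k in keep] + [(nodes[i], nodes[j])]
    return (nodes[0], nodes[1])


def bootstrap_profiles(profiles: List[Sequence[int]], rng: random.Random) -> List[List[int]]:
    """One bootstrap replicate: resample marker columns with replacement."""
    n_cols = len(profiles[0])
    cols = [rng.randrange(n_cols) for _ in range(n_cols)]
    return [[p[c] for c in cols] for p in profiles]
```

Repeating `bootstrap_profiles` 1000 times and rebuilding the tree from each replicate yields the replicate trees from which cluster support would be counted; that counting step is not shown.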

Results

Detection of indel polymorphisms

A total of 14.28 Gb of high-quality data comprising 158.73 M short reads, with an average read length of 90 bp, were obtained. Of these, 153.41 M reads were mapped to the NIPPONBARE genome, and 130.92 M of the mapped reads (85.3 %) were uniquely mapped, giving an effective sequencing depth of 37.0-fold. After filtering, 4440 SVs, comprising 1520 deletions and 2920 insertions, were identified as high-quality polymorphisms in the KITAAKE genome relative to the NIPPONBARE genome. These SVs were unevenly distributed along the chromosomes (Figs. S1, S2). The average number of deletions per chromosome was 126.7, ranging from 45 on chromosome 5 to 200 on chromosome 11, while the average number of insertions was 243.3, ranging from 170 on chromosome 9 to 357 on chromosome 1 (Fig. 1a). The size of the indels varied widely (Fig. 1b, c).
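The summary figures above are internally consistent, as a few lines of arithmetic confirm. The sketch below simply recomputes the derived values from the reported counts. It does not attempt to reproduce the 37.0-fold depth, which depends on the genome length and read set used in that calculation.

```python
"""Recompute derived statistics from the read and SV counts reported above."""

TOTAL_BASES_GB = 14.28
TOTAL_READS_M = 158.73
MAPPED_READS_M = 153.41
UNIQUE_READS_M = 130.92
DELETIONS, INSERTIONS = 1520, 2920
N_CHROMOSOMES = 12  # rice nuclear chromosomes


def summary() -> dict:
    """Derived values, rounded as in the text."""
    return {
        # Gb / M reads = kb per read; * 1000 converts to bp
        "mean_read_length_bp": round(TOTAL_BASES_GB / TOTAL_READS_M * 1000),
        "unique_pct_of_mapped": round(100 * UNIQUE_READS_M / MAPPED_READS_M, 1),
        "total_svs": DELETIONS + INSERTIONS,
        "mean_deletions_per_chr": round(DELETIONS / N_CHROMOSOMES, 1),
        "mean_insertions_per_chr": round(INSERTIONS / N_CHROMOSOMES, 1),
    }


if __name__ == "__main__":
    for key, value in summary().items():
        print(f"{key}: {value}")
```

Note that the 85.3 % figure is a share of the mapped reads (153.41 M), not of all reads.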

Fig. 1

Variations in indel polymorphisms between KITAAKE and NIPPONBARE. a Distribution of indels on chromosomes. Closed and open circles indicate deletions and insertions, respectively. b Frequency distribution in the size of deletions. c Frequency distribution in the size of insertions

A total of 70 indel polymorphisms larger than 100 bp (21 insertions and 49 deletions) were selected and validated by Sanger sequencing (Tables S2, S3). Based on sequence comparisons of these indel fragments between the NIPPONBARE and KITAAKE genomes, they were classified into seven categories according to the original mutation event (Table 1). Among the deletion polymorphisms, non-transposable element (non-TE) fragments had been deleted from the KITAAKE genome at 26 sites, whereas 22 TE-like fragments had been inserted and one fragment had been tandemly duplicated in the NIPPONBARE genome. Among the insertion polymorphisms, nine TE-like fragments had been inserted, six fragments had been tandemly duplicated, and one non-TE fragment had been inserted in the KITAAKE genome, whereas five non-TE fragments had been deleted from the NIPPONBARE genome. Among the deletions in the KITAAKE genome, five caused loss of a whole gene, eight caused loss of a whole exon, and eight deleted part of an exon (Table S2). Among the insertions, three occurred in exons, five in introns, and 13 in 5′ upstream regions (Table S3). These indels may have abolished or altered gene function.
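The seven categories split into three deletion classes and four insertion classes. Tallying the counts given above (category labels paraphrased from the text) confirms that they account for all 49 deletions and 21 insertions.

```python
"""Tally the seven mutation-event categories (Table 1) reported in the text."""

# Deletion polymorphisms (sequence absent from KITAAKE relative to NIPPONBARE)
DELETION_EVENTS = {
    "non-TE fragment deleted in KITAAKE": 26,
    "TE-like fragment inserted in NIPPONBARE": 22,
    "tandem duplication in NIPPONBARE": 1,
}
# Insertion polymorphisms (sequence present in KITAAKE but not in NIPPONBARE)
INSERTION_EVENTS = {
    "TE-like fragment inserted in KITAAKE": 9,
    "tandem duplication in KITAAKE": 6,
    "non-TE fragment inserted in KITAAKE": 1,
    "non-TE fragment deleted in NIPPONBARE": 5,
}


def totals() -> tuple:
    """Return (number of categories, deletions, insertions, all validated indels)."""
    n_del = sum(DELETION_EVENTS.values())
    n_ins = sum(INSERTION_EVENTS.values())
    return len(DELETION_EVENTS) + len(INSERTION_EVENTS), n_del, n_ins, n_del + n_ins


print(totals())  # (7, 49, 21, 70)
```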

Table 1 Classification of indel polymorphisms based on original mutation events

Changes in allele frequencies during rice breeding programs

To trace the establishment of the rice variety KITAAKE, the distribution and frequency of the KITAAKE alleles at 45 indel loci were determined in the six groups of HRCP (Fig. 2). Homozygosity for the KITAAKE alleles increased markedly during rice breeding programs in Hokkaido. The frequency of the KITAAKE alleles per locus was 59.1–73.6 % in groups I–IV, compared with 90.9 % in group V (Table 2). Thus, the KITAAKE alleles were favored not only in KITAAKE itself but throughout group V, and these changes were brought about by rice breeding programs. In group I, 17 loci were homozygous for the KITAAKE alleles; similarly, 13–17 loci were homozygous in groups II–IV. Of these, seven loci were shared among these groups, whereas the remaining loci differed between groups. In contrast, 30 loci were homozygous for the KITAAKE alleles in group V, and the KITAAKE alleles were also frequent (76.9–92.3 %) at the remaining loci.
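Group-level statistics of this kind can be computed as sketched below, under stated assumptions: genotypes are coded 1 for the KITAAKE allele and 0 for the NIPPONBARE allele, missing calls are `None`, and a locus counts as homozygous for the KITAAKE allele in a group when every scored variety in that group carries it. The helper names and encoding are hypothetical, and the published values may have been computed differently.

```python
from typing import Dict, List, Optional, Set

# Hypothetical encoding: one list per variety, one entry per locus;
# 1 = KITAAKE allele, 0 = NIPPONBARE allele, None = missing call.
Genotypes = List[List[Optional[int]]]


def kitaake_frequency(group: Genotypes) -> float:
    """Mean frequency of the KITAAKE allele over all non-missing calls in a group."""
    calls = [g for variety in group for g in variety if g is not None]
    return sum(calls) / len(calls)


def fixed_kitaake_loci(group: Genotypes) -> Set[int]:
    """Loci at which every scored variety in the group carries the KITAAKE allele."""
    n_loci = len(group[0])
    fixed = set()
    for locus in range(n_loci):
        calls = [v[locus] for v in group if v[locus] is not None]
        if calls and all(c == 1 for c in calls):
            fixed.add(locus)
    return fixed


def shared_fixed_loci(groups: Dict[str, Genotypes], names: List[str]) -> Set[int]:
    """Loci fixed for the KITAAKE allele in every one of the named groups."""
    return set.intersection(*(fixed_kitaake_loci(groups[n]) for n in names))
```

For example, `shared_fixed_loci(groups, ["I", "II", "IIIa", "IIIb", "IV"])` would return the loci fixed in all of groups I–IV, the quantity behind the "seven shared loci" statement.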

Fig. 2

Changes in allele frequencies of the NIPPONBARE and KITAAKE alleles at each locus in the genetic groups of HRCP. Open and closed circles indicate the NIPPONBARE and KITAAKE alleles, respectively. Loci are numbered 1–45 according to their chromosomal locations, as listed in Table S4

Table 2 Distribution of the KITAAKE alleles in HRCP at 45 loci

The number of KITAAKE alleles carried by each variety varied with group differentiation (Table 2). Varieties in groups I–IV carried similar numbers of KITAAKE alleles, from 26.2 loci on average in group II to 31.3 loci in group IIIb. Varieties in group V carried more, 40.3 loci on average, ranging from 36 loci in NANATSUBOSHI to 45 loci in KITAAKE. In group I, the KITAAKE alleles were absent at five loci (Fig. 2; Table S4). At these loci, the KITAAKE alleles appeared during later rice breeding programs: IN_19, DEL_53, and IN_12 in group II, DEL_15 and DEL_40 in group IIIa, and DEL_38 in group V.
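Per-variety allele counts, and the group in which an allele first appears, can be derived from the same kind of 0/1 genotype table. This sketch assumes the nominal group order I → II → IIIa → IIIb → IV → V, even though the groups' establishment periods overlap; the function names, encoding, and toy data are hypothetical.

```python
from typing import Dict, List, Optional

# Nominal HRCP group order (the groups' establishment periods overlap).
GROUP_ORDER = ["I", "II", "IIIa", "IIIb", "IV", "V"]

# groups[name] = list of varieties; each variety is a list of 0/1 calls per locus
# (1 = KITAAKE allele). Hypothetical encoding for illustration.
Groups = Dict[str, List[List[int]]]


def kitaake_allele_count(variety: List[int]) -> int:
    """Number of loci at which a variety carries the KITAAKE allele."""
    return sum(variety)


def mean_count(groups: Groups, name: str) -> float:
    """Mean number of KITAAKE alleles per variety in one group."""
    varieties = groups[name]
    return sum(kitaake_allele_count(v) for v in varieties) / len(varieties)


def first_group_with_allele(groups: Groups, locus: int) -> Optional[str]:
    """Earliest group in GROUP_ORDER in which any variety carries the KITAAKE allele."""
    for name in GROUP_ORDER:
        if any(v[locus] == 1 for v in groups.get(name, [])):
            return name
    return None
```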

Origins and distributions of the KITAAKE alleles

The origins and distributions of the KITAAKE alleles were surveyed in three different populations: JRC, WRC, and wild rice. The KITAAKE alleles were found in all populations (Table 3). Of the 28 loci examined, the KITAAKE alleles were detected at 24 loci in wild rice. Five loci were shared by all A-genome species, while the others showed specific distributions (Table S5). Five loci showed an O. rufipogon-specific distribution, while three loci showed a non-O. rufipogon-specific distribution. The KITAAKE alleles were not detected at four loci in wild rice. Of these, the KITAAKE alleles were found at IN_07, DEL_13, and DEL_49 in WRC, but not at DEL_63 (Table S6). The KITAAKE allele at DEL_63 was detected in three varieties in JRC (Table S7). The average number of KITAAKE alleles carried by each accession ranged from 7.0 loci in O. glumaepatula to 10.3 loci in O. rufipogon (Table 3). Two O. rufipogon accessions, W1945 and W2003, carried the KITAAKE alleles at 14 loci, whereas one O. glumaepatula accession, W1183, carried them at only six loci (Table S5).
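The distribution categories used here can be expressed as a small classification rule. In this sketch, "non-O. rufipogon-specific" is read as "found in other A-genome species but not in O. rufipogon"; that reading, the category strings, and the function name are assumptions. The same rule applies to the japonica, aus, and indica groups of WRC by changing the taxon set and the focal group.

```python
from typing import FrozenSet, Iterable

A_GENOME_SPECIES: FrozenSet[str] = frozenset(
    {"O. rufipogon", "O. barthii", "O. glumaepatula", "O. longistaminata", "O. meridionalis"}
)


def classify_distribution(
    carriers: Iterable[str],
    taxa: FrozenSet[str] = A_GENOME_SPECIES,
    focal: str = "O. rufipogon",
) -> str:
    """Classify a locus by which taxa carry the KITAAKE allele (category names assumed)."""
    found = set(carriers) & taxa  # ignore labels outside the taxon set
    if not found:
        return "absent"
    if found == taxa:
        return "shared by all"
    if found == {focal}:
        return f"{focal}-specific"
    if focal not in found:
        return f"non-{focal}"
    return "partial"
```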

Table 3 Distribution of the KITAAKE alleles in different populations

In WRC, the KITAAKE alleles were detected at all 28 loci examined except DEL_63 (Table 3). Four loci showed a japonica-specific distribution and one locus an indica-specific distribution. At the remaining loci, the KITAAKE alleles were distributed across all three groups (japonica, aus, and indica) at 17 loci and across two groups at five loci. The average number of KITAAKE alleles carried by each variety was similar among groups: 14.4 loci in japonica, 12.8 in aus, and 11.5 in indica. Two japonica varieties, WRC50 and WRC53, carried the KITAAKE alleles at 18 loci, whereas one indica variety, WRC05, carried them at only 8 loci (Table S6).

In JRC, the KITAAKE alleles were detected at all 46 loci examined except IN_12 and DEL_15 (Table 3). The average number of KITAAKE alleles carried by each variety was 21.4 loci, ranging from 29 loci in JRC05 and JRC46 to 8 loci in JRC40 and JRC52 (Table S7).

A phylogenetic analysis of these populations was performed using the indel polymorphisms. JRC was clearly classified into two major groups corresponding to lowland and upland types (Fig. S3). WRC was clearly classified into two major groups corresponding to japonica and indica/aus types (Fig. S4). Wild rice was clearly differentiated into five major clusters corresponding to the Oryza species (Fig. S5). In contrast, the clustering of HRCP did not correspond to the six major groups previously defined using SSR markers (Shinada et al. 2014) (Fig. S6).

Discussion

Sequence polymorphisms arising from mutations are involved not only in phenotypic differences but also in crop domestication, diversification, and plant breeding programs. Variety-specific mutations may have contributed to shaping adaptability during the establishment of local populations. Genomes in local populations are structured by artificial selection under genotype × environment interactions, in recurrent cycles of hybridization within local populations or with exotic germplasm during plant breeding programs (Shinada et al. 2014). These processes may generate genetic variations that are desirable for local populations. In the present study, genome-wide DNA polymorphisms were identified between NIPPONBARE and KITAAKE (Figs. 1, S1, S2), and their origins and distributions were traced using cultivated and wild rice populations (Tables 2, 3). The results indicated that alleles widely distributed throughout wild rice had accumulated in the local population from Hokkaido via cultivated rice from around the world and via Japanese landraces, the ancestral population of the Hokkaido population. These results strongly suggest that combinations of pre-existing mutations were related to the establishment of adaptability. This approach, re-sequencing local varieties adapted to unique environmental conditions, will be useful for exploiting genetic resources in plant breeding programs in local regions.

The KITAAKE alleles were distributed in JRC, WRC, and wild rice (Table 3) and may be useful for classifying each population, consistent with other studies (Kojima et al. 2005; Zhu and Ge 2005; Ebana et al. 2008). This indicates that the alleles distinguishing NIPPONBARE and KITAAKE, two varieties from Japan, represent the genetic diversity of these genetically diverse populations. The mutation events causing polymorphisms between NIPPONBARE and KITAAKE were already distributed across Oryza species, not only in the ancestral wild rice O. rufipogon but also in related wild rice species with the A-genome (Table 3). Because these wild rice species grow in geographically separate regions (Asia, Africa, South America, and Australia), outcrossing between accessions of different species could not have occurred. This suggests that the KITAAKE alleles might have arisen in an ancestral population of A-genome wild rice and then dispersed widely among the species. The KITAAKE alleles derived from O. rufipogon subsequently accumulated in cultivated rice and were introduced into HRCP from its ancestral population, JRC, during rice breeding programs (Fig. 2; Tables 2, 3). Combinations of these KITAAKE alleles may be desirable for establishing adaptability to the marginal regions of rice cultivation.

It still remains unclear how these alleles accumulated in a local population in marginal regions. The frequency of the KITAAKE alleles changed markedly between groups I–IV and group V of HRCP (Fig. 2). The results of this study indicate that intensive selection during rice breeding programs in local regions contributed to shaping adaptability to local regions (Fig. 3). The ancestral population of Hokkaido rice, JRC, had sufficient genetic diversity for the development of the numerous varieties grown in various regions of Japan today. Rice breeding programs in local regions, mainly using hybridization within the ancestral population, then generated rice varieties such as NIPPONBARE in Aichi, in central Japan, and KITAAKE in Hokkaido. These varieties were clearly distinguished by variety-specific alleles derived from the ancestral population under selection for adaptability to local regions and for human demands at the time. These variety-specific alleles were already present in the ancestral population, but no single variety carries all of them. Some were homozygous in landrace-type varieties, which may be associated with adaptability to local environmental conditions, while others, widely distributed in the ancestral population, were continuously introduced through parental varieties during rice breeding programs in local regions. Intensive selection focused on current breeding objectives and guided by breeding theory rapidly fixed the desirable genotypes.

Fig. 3

Model of rice breeding programs in local regions. a NIPPONBARE was established in Aichi Prefecture, in central Japan, and KITAAKE in Hokkaido, the northernmost area of Japan. Both were selected from the ancestral population under different environmental conditions. b Landrace-type cultivars were selected from the ancestral population before the adoption of Mendelian breeding. During systematic breeding programs, desirable variations present in the ancestral population have been continuously introduced into the local population and fixed there to establish desirable phenotypes. Open and closed circles indicate the NIPPONBARE and KITAAKE alleles, respectively. Triangles and squares indicate unidentified alleles other than these two

In addition, novel mutations that occurred during rice breeding programs in local regions played important roles in shaping adaptability to local regions. We previously identified two genes under selection in rice breeding programs in Japan: loss-of-function alleles caused by the 71-bp deletion in qLTG3-1 for low-temperature germinability and by the 19-bp deletions in Hd5 for flowering time (Fujino et al. 2008, 2013; Fujino and Iwata 2011; Fujino and Sekiguchi 2011). These mutation events occurred in genetic backgrounds specific to local regions, and the resulting phenotypes under local environmental conditions may be useful in rice breeding programs in local regions.

Plant breeding programs impose intensive selection pressures that focus not only on adaptability to local environmental conditions but also on cultivation methods and market demands, and such programs have restricted genetic diversity within local populations (Dilday 1990; Fu et al. 2003; Le Clerc et al. 2005; Roussel et al. 2005; Yamamoto et al. 2010). The results of the present study suggest that the homozygous genotypes found in current varieties of local populations are valuable for rice breeding programs. However, they represent a genetically uniform ideotype tailored to current breeding objectives. Introducing genetic diversity will enhance new varieties by combining the adaptability established in local rice breeding programs with novel traits from exotic germplasm.

Author contribution statement

KF conceived and designed the experiments and wrote the manuscript. OM and KF performed the experiments. OM, KT, IT, and KF analyzed the data. OM, KT, IT, and KF approved the final manuscript.