Introduction

Domesticated plants have expanded their ranges from their points of origin during a history of continuous selection (Tenaillon et al. 2004; Hyten et al. 2006; Haudry et al. 2007). Population genetic studies have revealed that adaptability is a major factor that generates genetic groups within a crop species (Han et al. 2016; Li et al. 2020; Morales-Hojas et al. 2020; Sansaloni et al. 2020). During continuous selection for adaptability to local environmental conditions, genetic diversity should have been shaped to suit crop production. However, the genetic mechanisms by which crop adaptability drove the diversification of genetic population structure are unknown.

Asian cultivated rice, Oryza sativa L., was domesticated around 10,000 years ago (Fuller 2011; Huang et al. 2012; Choi et al. 2017) and later spread around the world from 53°N to 40°S latitude (Lu and Chang 1980; Agrama et al. 2010). The control of heading date is key to its adaptation to specific ecological conditions and environments (Zhao et al. 2011; Guo et al. 2020; Hu et al. 2019; Fujino et al. 2019a; Fujino and Ikegaya 2020). Artificial selection for natural variations may have optimized heading date for better agricultural fitness.

The genetic basis of heading date in rice varieties from Hokkaido (41° 02′–45° 03′ N latitude), the northernmost region of Japan at the northern limit of rice cultivation, is well understood (Fujino and Sekiguchi 2005a, b; Nonoue et al. 2008; Shibaya et al. 2011; Fujino et al. 2013, 2019a, b, c). Rice cultivation in Hokkaido began only 150 years ago (Fujino et al. 2019c). Mutations in both Grain number, plant height and heading date 7 (Ghd7) and O. sativa Pseudo-Response Regulator 37 (OsPRR37), notated as ghd7osprr37 and called EARLY DUO, may underlie the extremely early heading date (Fujino et al. 2019a, b). Ghd7 encodes a CCT (CO, CO-LIKE, and TIMING OF CAB1) domain protein (Xue et al. 2008). OsPRR37 is an ortholog of the circadian clock genes PRR3/7 in Arabidopsis (Nakamichi et al. 2005; Murakami et al. 2007; Koo et al. 2013; Gao et al. 2014). EARLY DUO combines two loss-of-function alleles. In Ghd7-0a, a single nucleotide substitution, G → T, causes a premature stop codon (Xue et al. 2008). In OsPRR37, a single nucleotide substitution, T → C, causes an amino acid substitution (Murakami et al. 2003; Koo et al. 2013; Gao et al. 2014).

Furthermore, EARLY DUO switches the effect of Hd1, which is a major gene in the control of rice heading date, from delay to promotion of heading under naturally long-day conditions in the field (Yano et al. 2000; Fujino et al. 2019a). Ghd7 and OsPRR37 have roles in the adaptability to cultivation at extremes of latitude (Li et al. 2015; Zhang et al. 2015, 2019; Fujino et al. 2019b; Fujino and Yamanouchi 2020). In addition, another allele, Ghd7-2tp, which carries an insertion of a transposon-like sequence, has been identified (Fujino and Yamanouchi 2020). This allele has a weak genetic effect on heading date and can also switch the effect of Hd1 (Fujino and Yamanouchi 2020).

Extremely early heading in rice underlies adaptability to higher latitudes with longer daylength during rice growth. Here, we demonstrated selection for extremely early heading date during the expansion of the rice growth range in Japan. First, we traced mutations in Ghd7 and OsPRR37 in varieties in northern Japan. Then, we elucidated the genetic population structure in the varieties. Finally, we propose a model of the establishment of varieties with extremely early heading date.

Materials and methods

Plant materials and growth conditions

We used eight rice populations (Table 1). We grew 59 varieties from the Hokkaido Rice Core Panel (HRCP), which represents genetic diversity among the gene pool of varieties bred in Hokkaido during the last 100 years (Shinada et al. 2014; Fujino et al. 2015, 2017). Varieties in HRCP head extremely early (Fujino et al. 2019b). We grew 48 varieties from the Japanese Rice Core Collection (JRC), which represents genetic diversity among the ancestral gene pool of varieties bred in Japan (Ebana et al. 2008). The genetic population structure and genetic diversity among the JRC have been well characterized by whole-genome sequencing (Tanaka et al. 2021). We developed a population of ancestral varieties of Hokkaido (AnH), consisting of 320 varieties collected in the Tohoku region of Japan and preserved in the Genebank. We also grew landraces from Hokkaido (HL), landraces from Tohoku (Lthk), breeding lines from Tohoku (Bthk), landraces from Hokuriku (HKR), and varieties from the initial phase of rice breeding in Japan (VIB) (Fujino et al. 2019b; Fujino and Yamanouchi 2020).

Table 1 List of populations used in this study

Seeds of rice varieties were provided by the Genebank of NARO (Tsukuba, Japan) and the Local Independent Administrative Agency, Hokkaido Research Organization, Hokkaido Central Agricultural Experiment Station (Takikawa, Japan).

DNA analysis

For DNA isolation, seeds obtained from Genebank were sown. Total DNA was isolated from young leaves by the CTAB method (Murray and Thompson 1980). PCR, electrophoresis, and detection of the products were performed as described by Fujino et al. (2004, 2005). The genotypes of Ghd7 and OsPRR37 were determined by PCR and CAPS (Cleaved Amplified Polymorphic Sequence) according to Fujino et al. (2019b) and Fujino and Yamanouchi (2020) (Fig. 1). Primers for haplotyping Ghd7 and OsPRR37 were developed by using the “myINDEL” procedure (Table S1) (Fujino et al. 2018). In addition, seven SSR markers were used (IRGSP 2005).

Fig. 1 Schematic representations. a Chromosome 7. The genes focused on in this study, Ghd7 and OsPRR37, are located at 9.152 and 29.616 Mb, respectively, in IRGSP 1.0. b Alleles for early heading date. Top, loss-of-function allele of Ghd7; middle, allele with a transposon insertion (triangle); bottom, loss-of-function allele of OsPRR37

Genotyping by ddRAD-Seq and data analysis

Genome-wide SNP genotyping was performed with double-digest restriction-site-associated DNA (ddRAD-Seq) analysis (Peterson et al. 2012; Shirasawa et al. 2016). ddRAD-Seq reads were aligned to the reference genome (Os-Nipponbare-Reference-IRGSP-1.0) in BWA-MEM v. 0.7.17 software (Li and Durbin 2009). Variant calling was performed in GATK HaplotypeCaller v. 4.1.4.1 software (van der Auwera et al. 2013). Biallelic SNPs were selected and those with minor allele frequencies of < 5% or missing rates of ≥ 20% were removed in vcftools v. 0.1.16 software (Danecek et al. 2011). Missing genotypes were imputed in Beagle v. 5.1 software (Browning et al. 2018).
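The SNP filtering criteria described above (minor allele frequency below 5% or missing rate of 20% or more lead to removal) can be illustrated with a minimal Python sketch. This is not the pipeline used in the study, which relied on vcftools; the genotype coding (0, 1, 2 alternate-allele counts, `None` for a missing call), the function names, and the toy SNPs are assumptions made only for illustration.

```python
from typing import List, Optional

# Assumed coding for this sketch: 0, 1, 2 = number of alternate alleles
# in a diploid call; None = missing call.
Genotypes = List[Optional[int]]


def minor_allele_frequency(calls: Genotypes) -> float:
    """Minor allele frequency computed over non-missing calls."""
    observed = [g for g in calls if g is not None]
    if not observed:
        return 0.0
    alt_freq = sum(observed) / (2 * len(observed))
    return min(alt_freq, 1.0 - alt_freq)


def missing_rate(calls: Genotypes) -> float:
    """Fraction of samples without a genotype call."""
    return sum(g is None for g in calls) / len(calls)


def keep_snp(calls: Genotypes, min_maf: float = 0.05,
             max_missing: float = 0.20) -> bool:
    """Keep a SNP unless MAF < min_maf or missing rate >= max_missing."""
    return (missing_rate(calls) < max_missing
            and minor_allele_frequency(calls) >= min_maf)


# Hypothetical toy SNPs scored in ten samples (not data from the study).
snp_ok = [0, 0, 2, 2, 0, 2, None, 0, 2, 0]            # 10% missing, common alleles
snp_monomorphic = [0] * 10                             # MAF = 0
snp_missing = [0, 2, None, None, 0, 2, 0, 2, 0, 2]     # 20% missing

kept = [name for name, calls in [("snp_ok", snp_ok),
                                 ("snp_monomorphic", snp_monomorphic),
                                 ("snp_missing", snp_missing)]
        if keep_snp(calls)]
print(kept)
```

In this sketch, the SNP with exactly 20% missing calls is removed, in line with the "≥ 20%" criterion stated in the text.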

SNPs were pruned in PLINK v. 1.9 software (Chang et al. 2015). The population structure was inferred in Admixture v. 1.3.0 software (Alexander et al. 2009). Hierarchical clustering of populations was performed by using the R function “hclust” and the pruned marker genotypes.
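To illustrate what hierarchical clustering of marker genotypes involves, the following Python sketch merges the two closest groups repeatedly until a chosen number of clusters remains. It is only a sketch under stated assumptions: the study used the R function "hclust", and the distance measure (Manhattan distance between genotype vectors), the complete-linkage criterion, the function names, and the toy genotypes are all choices made here for illustration and are not taken from the source.

```python
from typing import Dict, List


def manhattan(a: List[int], b: List[int]) -> int:
    """Manhattan distance between two genotype vectors (assumed metric)."""
    return sum(abs(x - y) for x, y in zip(a, b))


def complete_linkage_clusters(genotypes: Dict[str, List[int]],
                              k: int) -> List[List[str]]:
    """Agglomerative clustering: merge the two closest clusters until k remain.

    The distance between two clusters is the largest pairwise distance
    between their members (complete linkage).
    """
    clusters = [[name] for name in sorted(genotypes)]

    def dist(c1: List[str], c2: List[str]) -> int:
        return max(manhattan(genotypes[a], genotypes[b])
                   for a in c1 for b in c2)

    while len(clusters) > k:
        pairs = ((i, j) for i in range(len(clusters))
                 for j in range(i + 1, len(clusters)))
        i, j = min(pairs, key=lambda p: dist(clusters[p[0]], clusters[p[1]]))
        clusters[i] = clusters[i] + clusters[j]
        del clusters[j]
    return [sorted(c) for c in clusters]


# Hypothetical genotypes (0 or 2 alternate alleles at six pruned SNPs).
toy = {
    "v1": [0, 0, 0, 0, 2, 2], "v2": [0, 0, 0, 2, 2, 2],
    "v3": [2, 2, 2, 2, 0, 0], "v4": [2, 2, 2, 0, 0, 0],
    "v5": [0, 2, 0, 2, 0, 2], "v6": [0, 2, 0, 2, 0, 0],
}
print(complete_linkage_clusters(toy, 3))
```

Cutting the merge process at three groups mirrors the way a dendrogram can be read as a small number of clusters, although the actual clusters in the study come from the authors' own analysis.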

Sequence data from this study have been deposited in EMBL/GenBank under accession number DRA0011916. Raw sequence data, which were deposited in the DDBJ Sequence Read Archive (DRA), are listed in Table S2.

Results

Haplotypes around Ghd7 and OsPRR37 in HRCP

In HRCP, only two haplotypes around Ghd7, Hap HI and Hap HII, were identified (Fig. 2, Table S3). They were defined by using 19 marker loci and were specific to each allele of Ghd7. Hap HI corresponded to the loss-of-function allele Ghd7-0a (Fig. 2, Table S3), in which a 1 831 066-bp region between DNA markers Ghd7_06 and Ghd7_18 was conserved in 49 varieties. Hap HII corresponded to Ghd7-2tp, in which a 2 104 335-bp region between DNA markers RM21323 and Ghd7_21 was conserved in 10 varieties (Fig. 2, Table S3).

Fig. 2 Haplotypes around Ghd7. a Markers on IRGSP 1.0. Boxes: black, Ghd7; gray, flanking markers; white, genotype markers. b Upper row, Hap HI (Ghd7-0a); lower row, Hap HII (Ghd7-2tp). c Upper row, haplotype without the FNP in Ghd7; lower row, haplotype without the transposon-like insertion in Ghd7, in JRC. DEL = deletion; INS = insertion; NP = the Nipponbare allele; mt = mutation; WT = wild type; A, B, C = different sizes of the amplicons

Haplotypes around OsPRR37 were identified by using the genotypes of 27 marker loci in HRCP (Fig. 3, Table S4). Both the wild-type (WT) and mutant alleles of OsPRR37 were detected in HRCP, in five and 54 varieties, respectively (Table S4). Hap Ha, corresponding to the loss-of-function allele osprr37, was conserved in a 128 152-bp region between DNA marker RM22170 and OsPRR37 itself (Fig. 3, Table S4).

Fig. 3 Haplotypes around OsPRR37. a Markers on IRGSP 1.0. Boxes: black, OsPRR37; gray, flanking markers; white, genotype markers. b Upper row, haplotype with the FNP for osprr37 (mt); lower row, haplotype without the FNP (WT), in HRCP. c Upper row, haplotype with the FNP (osprr37); middle row, recombinant haplotype; lower row, haplotype without the FNP (WT), in JRC. DEL = deletion; INS = insertion; NP = the Nipponbare allele; mt = mutation; WT = wild type; A, B = different sizes of the amplicons

Haplotypes around Ghd7 and OsPRR37 in JRC

Among the 48 varieties of JRC, haplotypes around Ghd7 were determined by using 17 markers (Fig. 2, Table S5). There were 23 haplotypes, J1–J23, in the 3 232 714-bp region between Ghd7_02 and Ghd7_31 (Table S5). Only two Hokkaido varieties, Akage (JRC17) and Fukoku (JRC46), carried haplotypes Hap HI and HII, respectively. Four varieties had Hap HI in the 306-kb region between markers RM21331 and Ghd7_14 but lacked the functional nucleotide polymorphism (FNP) for the loss-of-function allele Ghd7-0a (Table S5). Eight varieties had Hap HII in the 1137-kb region between markers Ghd7_10 and Ghd7_18 but lacked the insertion found in Ghd7-2tp (Table S5).

There were six OsPRR37 haplotypes, J1–J6, in the 128 152-bp region between RM22170 and OsPRR37 itself (Table S5). Only Akage (JRC17) and Fukoku (JRC46) carried the loss-of-function allele (Table S5). Nine varieties had the Hap Ha haplotype without the FNP for OsPRR37.

Distributions of Ghd7 and OsPRR37 alleles

To elucidate the distributions of the alleles, we genotyped FNPs in Ghd7 and OsPRR37 in 220 varieties in four populations (152 in Lthk, 35 in Bthk, 12 in HKR, and 21 in VIB) in addition to HRCP, HL, and JRC (Tables 2, S4–S10). A single variety, Tamagawawase, in Lthk carried a loss-of-function allele only in Ghd7 (ghd7 OsPRR37), while 13 varieties in Lthk, two in Bthk, and two in HKR carried a loss-of-function allele only in OsPRR37 (Ghd7 osprr37) (Tables S6–S10). Six varieties in Lthk and one in Bthk carried loss-of-function alleles in both genes (ghd7 osprr37) (Tables S6, S7); these were collected from Aomori and Akita prefectures. Ghd7-2tp was detected in 12 varieties in Lthk, three in Bthk, and one in HKR (Tables S6–S8).
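The tabulation behind such genotype distributions can be sketched as follows. This is a minimal illustration, not the authors' procedure: the function names, the population labels "PopA" and "PopB", and the records are hypothetical, and the sketch distinguishes only functional versus loss-of-function alleles at each gene (it ignores the third allele, Ghd7-2tp).

```python
from collections import Counter
from typing import Iterable, Tuple


def two_locus_genotype(ghd7_lof: bool, osprr37_lof: bool) -> str:
    """Label a variety by its alleles at both genes.

    Lower case marks a loss-of-function allele, following the notation
    used in the text (e.g. "ghd7 osprr37" for the double mutant).
    """
    ghd7 = "ghd7" if ghd7_lof else "Ghd7"
    osprr37 = "osprr37" if osprr37_lof else "OsPRR37"
    return f"{ghd7} {osprr37}"


def tabulate(records: Iterable[Tuple[str, bool, bool]]) -> Counter:
    """Count varieties per (population, two-locus genotype)."""
    return Counter((pop, two_locus_genotype(g, o)) for pop, g, o in records)


# Hypothetical records: (population, Ghd7 loss-of-function?, OsPRR37 loss-of-function?)
records = [
    ("PopA", True, True),
    ("PopA", False, True),
    ("PopA", False, True),
    ("PopB", True, False),
    ("PopB", False, False),
]
counts = tabulate(records)
for (population, genotype), n in sorted(counts.items()):
    print(population, genotype, n)
```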

Table 2 Distributions of genotype of Ghd7 and OsPRR37 in different populations

Haplotypes around Ghd7 and OsPRR37 between populations

Six haplotypes around Ghd7 were identified by using three markers, Ghd7_12, Ghd7 itself, and Ghd7_28 (Table S11). We found two mutant haplotypes: G7af (which corresponds to Hap HI) in 49 HRCP varieties and G7bf (Hap HII) in nine HRCP varieties. We also found G7af in seven of the 152 Lthk varieties and G7bf in 13 Lthk varieties. We found G7aF (WT) in 15 Lthk varieties and G7bF (WT) in 110 Lthk varieties. G7bF might be ancestral to G7af. G7af carries the loss-of-function ghd7, whereas G7aF carries the gene without the FNP. G7bF was widely distributed over all populations.

Ten haplotypes around OsPRR37 were identified by using three markers, RM22170, RM22175, and OsPRR37 itself (Table S12). We found the mutant haplotype R37af (which corresponds to Hap Ha) in 54 HRCP and 20 Lthk varieties. We found R37aF (WT) in 57 Lthk varieties. R37aF might be ancestral to R37af. R37af carries the loss-of-function osprr37, whereas R37aF carries the gene without the FNP. R37aF was widely distributed over all populations and was predominant in two populations, Lthk and Bthk.

Genetic population structure of varieties in northern Japan

Next, we performed ddRAD-Seq on the varieties from the Tohoku region. ddRAD-Seq analysis sequenced a total of 585 million reads (59 Gb) from 378 accessions. The mean was 1.5 million reads (156 Mb) per variety. After filtering, 5938 SNPs among 301 varieties (53 in HL and 248 in AnH) were used for further analysis. After variant pruning, 2067 SNPs were used for clustering, Admixture, and principal component analyses (Fig. S1).
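As a quick arithmetic check, the per-variety means follow directly from the reported totals. The short Python sketch below uses only the numbers given in the text; the variable names are illustrative.

```python
# Totals reported in the text.
total_reads = 585e6   # 585 million reads
total_bases = 59e9    # 59 Gb
accessions = 378

mean_reads = total_reads / accessions
mean_bases = total_bases / accessions

print(f"{mean_reads / 1e6:.1f} million reads, "
      f"{mean_bases / 1e6:.0f} Mb per accession")
```

Both values agree with the reported means of 1.5 million reads and 156 Mb per variety.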

The dendrogram clearly shows three clusters, N1, N2, and N3 (Fig. 4, Tables S13, S14). The clusters corresponded well with the three populations obtained in the Admixture analysis using K = 3. In the AnH population, 61 of 103 varieties (59.2%) were in N3, and only 10 varieties (9.7%) were in N1 (Table S13). All HL varieties were grouped together in principal component analysis (Fig. 5), in which the first, second, and third principal components explained 18.2%, 12.5%, and 8.7%, respectively, of the total variation.

Fig. 4 Classification of rice varieties from the HL and AnH populations with a dendrogram and population structures inferred by Admixture software

Fig. 5 Principal component analysis using genotypes from ddRAD-Seq analysis. Symbols: ●, AnH; +, HL populations

Distributions of Ghd7 and OsPRR37 genotypes among the 103 AnH varieties showed a clear association with the population structure (Table 3). In cluster N3, mutations in Ghd7 and OsPRR37 were identified: one variety with ghd7 OsPRR37, six with Ghd7-2tp OsPRR37, and five with Ghd7 osprr37 (Table 3). Only one variety in N3 had both mutations, ghd7 osprr37. Varieties tended to be differentiated toward the north (Fig. 6, Table S13). All 44 HL varieties were classified in N1 (Table S14). The double mutation ghd7 osprr37 was found in 29 HL varieties (Tables S13, S15).

Table 3 Distributions of Ghd7 and OsPRR37 genotypes in the AnH population

Fig. 6 Distributions of the clusters among prefectures. Black, gray, and white sectors of the pie charts indicate clusters N1, N2, and N3, respectively

The double mutation ghd7 osprr37 was found in two AnH varieties in N1 and in one AnH variety in N3 (Table S16). A single mutation in either Ghd7 or OsPRR37 was found in 22 AnH varieties (Table S16).

Discussion

As crops around the world have been continuously selected for adaptability to local environmental conditions, the underlying genes could limit their genetic potential. The process of selection for adaptability could reveal novel phenotypes among local populations. Here, we elucidated the genetic basis of the adaptability of rice to Hokkaido, at the northern limit of rice cultivation, due to EARLY DUO (ghd7 osprr37). We traced the origins of the mutations in Ghd7 and OsPRR37 and characterized the genetic population structure among ancestral varieties in northern Japan.

The mutations in Ghd7 and OsPRR37 had distinct distributions (Table 2). The ancestral population of varieties with extremely early heading date, AnH, was divided into three clusters (Fig. 4). All HL varieties were classified into cluster N1. Three mutant genotypes (ghd7 OsPRR37, Ghd7 osprr37, and ghd7 osprr37) were present in all clusters with different frequencies (Table 2). Furthermore, their distributions differed geographically (Fig. 6). The results suggest that spontaneous mutations in Ghd7 and OsPRR37 occurred independently and were later combined as ghd7osprr37.

Ghd7 and OsPRR37 also have major roles in heading date and local adaptability in Heilongjiang, China (Yamamoto et al. 2000; Lin et al. 2003; Shibaya et al. 2011; Li et al. 2015; Fujino et al. 2019b; Zhenhua et al. 2021). Novel alleles for heading date might have driven evolutionary and selection forces for the expansion of rice cultivation around the world. We propose a model of the establishment of ghd7 osprr37, similar to demographic scenarios in weedy rice varieties (Fig. 7) (Sun et al. 2019). The distinct distributions of the mutations in Ghd7 and OsPRR37 reveal the selection history of earlier heading date in Japan in cluster N3. The underlying mutations might have occurred locally in the ancestral varieties of cluster N3, and these genetic events could have split the varieties carrying the mutations off as cluster N1. EARLY DUO thus underlies the spread of rice cultivation into Hokkaido, and extremely early heading date contributed to the differentiation of the local population structure.

Fig. 7 Proposed model of the establishment of the local population in Hokkaido. a Differentiation of cluster N1 from N3 (comprising varieties from Tohoku and Hokuriku). Selection of ghd7 osprr37 for extremely early heading date split off cluster N1. The mutations in the two genes might have been combined into the double mutation as EARLY DUO. Varieties with EARLY DUO spread into Hokkaido. b History of artificial selection in Japan. Arrow indicates the direction of selection. Checkered boxes show varieties bred from them; left, Honshu; right, Hokkaido

These genetic events are likely to have been completed within a short time span. Historical records show that society in southern Tohoku began around the year 700. Rice cultivation in Hokkaido started in the late 1800s (Fujino et al. 2019c). Thus, the genetic events for extremely early heading date are likely to have occurred within this span.

Adaptability to local environmental conditions is associated with rice yield (Huang et al. 2012; Fujino and Ikegaya 2020; Fujino 2020). Earlier heading date seems to be associated with a shorter vegetative phase, fewer seeds, and a shorter reproductive phase (Fujino et al. 2017, 2019b). These disadvantages need to be overcome in present breeding programs (Fujino et al. 2019b). Understanding the genetic mechanisms that shape adaptability would facilitate rice breeding.