Introduction

Perilla is a genus in the mint family, Lamiaceae; it is a traditional oil crop in Korea and a medicinal herb in Asia [1]. Perilla seeds are a good source of unsaturated fatty acids, including the polyunsaturated alpha-linolenic acid (ALA) and linoleic acid and the monounsaturated oleic acid. Omega-3 fatty acids (ALA) make up 54–64% of perilla seed oil, a higher proportion than in most other plant oils. Perilla oil also contains omega-6 (linoleic acid; ~14%) and omega-9 (oleic acid) fatty acids. These fatty acids are beneficial to human health and contribute to the prevention of several diseases, including cardiovascular disorders, cancer, inflammation, and rheumatoid arthritis [2, 3].

Oil accumulation occurs at different stages of seed development in different species. For example, according to Msaada et al. [4], coriander seeds start to accumulate lipids from the 5th day after flowering, and the maximal oil content is observed when the seeds reach full maturity. In contrast, Perilla frutescens seeds accumulate lipids very slowly during the early stages of maturation [5], and the maximal rate of lipid accumulation is observed between 15 and 20 days after flowering. A plateau in lipid accumulation has been reported as characteristic of the last phase of P. frutescens seed maturation, between 20 and 31 days after flowering. A comparable trend has been observed in oilseed rape (Brassica napus): lipids accumulate rapidly between 25 and 35 days after flowering and more slowly during the early and late stages of maturation [6]. Understanding flowering and maturity is therefore one of the important factors in improving the economic value of perilla as an oilseed crop.

Perilla is a typical short-day (SD) plant [7]. Flowering time (FTi) and maturity are greatly influenced by day length in many plants [8, 9]. The molecular mechanisms of flowering and the genes involved in the pathway have been characterized in the model plant Arabidopsis thaliana [10]. Arabidopsis FLOWERING LOCUS T (FT) and one of its rice orthologues, HEADING DATE 3a (Hd3a), are key flowering integrators that encode a florigen, which is transported in a regulated manner from the leaves through the phloem to the shoot or lateral apical meristems, where it induces the development of floral meristems [11, 12]. The amount of FT transcript, which is directly induced by the transcriptional activator CONSTANS (CO), strongly influences the timing of flowering. The circadian clock and light signaling tightly control CO protein activity throughout the day in the companion cells of the leaf phloem [13]. QTLs related to flowering and maturity have also been detected in soybean [14], rice [15], wheat [16], and other plants [17]. However, no flowering-related QTLs or genes have yet been reported in perilla, an SD plant.

A genetic map is required for QTL analysis, but none has been published for perilla so far. Several RAPD [18] and SCAR [19] markers and around a hundred SSR markers [1, 20,21,22,23,24] have been reported for classifying perilla species, but this number is insufficient to construct a high-density genetic map, so additional molecular markers need to be developed. SNPs are the most abundant class of polymorphisms in the genomes of most living organisms and are among the most efficient markers for finding candidate genes associated with QTLs [25]. In addition, GBS is a useful and cost-effective tool for resequencing large numbers of samples and can be applied to various areas of plant genetics and breeding, including SNP discovery, high-density genetic mapping, and QTL analysis, especially in little-studied plants [26,27,28,29,30,31,32,33].

In this study, we therefore first constructed a high-density genetic map via GBS using 96 F2 plants from an interspecific cross between P. citriodora and P. hirtella. We developed the F2 population from two different species for two purposes. The first was to construct a high-density genetic map for anchoring the scaffolds of the perilla genome; we therefore selected two species that can be crossed with each other and that carry many polymorphic SNPs between them. The second was to analyze QTLs for agronomic traits. Using this linkage map, we then sought to identify QTLs related to flowering and seed maturity.

Materials and methods

Population mapping and phenotyping

A segregating population of 96 F2 plants derived from an interspecific cross between P. citriodora (P1) and P. hirtella (P2) was grown and evaluated for mapping at the National Institute of Crop Science, RDA (Miryang, Korea) [34]. Phenotypic data were collected for three traits: days to visible flower bud (DtoFB), days to flowering (DtoF), and days to maturity (DtoM). DtoFB is the number of days from sowing until a flower bud became visible on the shoot tip, DtoF is the number of days from sowing until the first flower opened on the shoot tip, and DtoM is the number of days from sowing until the first seeds turned brown. The difference between the parents was relatively small, but the F2 population showed greater variation. Pearson's correlation coefficients and other statistical analyses of the measured traits were performed using Minitab 18 (State College, PA, USA) and SAS 9.3 (SAS Institute, Cary, NC, USA).
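Pearson's r and its usual significance test are easy to reproduce outside Minitab or SAS. The Python sketch below is an illustration only, not the analysis actually run in this study; the function names and the example values are hypothetical. It computes r between two trait vectors and the t statistic for testing ρ = 0 with n − 2 degrees of freedom.

```python
import math


def pearson_r(x, y):
    """Pearson product-moment correlation between two equal-length trait vectors."""
    if len(x) != len(y) or len(x) < 3:
        raise ValueError("need two vectors of equal length (n >= 3)")
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def t_statistic(r, n):
    """t statistic for H0: rho = 0, with n - 2 degrees of freedom (requires |r| < 1)."""
    return r * math.sqrt((n - 2) / (1 - r * r))


# Hypothetical example values (days), not data from this study
dtofb = [62, 65, 70, 68, 74, 71]
dtof = [70, 72, 79, 75, 83, 80]
r = pearson_r(dtofb, dtof)
print(round(r, 3), round(t_statistic(r, len(dtofb)), 2))
```

For example, with n = 96 plants and r = 0.81, t_statistic gives t ≈ 13.4 on 94 degrees of freedom, far above conventional critical values.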

DNA extraction and GBS library preparation

Fresh young leaves were collected, and DNA was extracted using a modified cetyltrimethylammonium bromide (CTAB) method [35]. DNA was quantified using a Thermo Scientific NanoDrop 8000 spectrophotometer (Fisher Scientific; Hampton, NH, USA). Genomic DNA from the 96 F2 lines and the parents was used to prepare the GBS libraries. For each F2 line, DNA was digested with restriction enzymes and ligated to barcoded adapters; 96 different barcode sequences were used to tag the samples [36]. The GBS libraries were constructed with the restriction enzymes PstI (CTGCAG) and MspI (CCGG) following a protocol modified from Elshire et al. [36]. The libraries were pooled and sequenced with Illumina TruSeq v3 paired-end chemistry (101-bp reads) on an Illumina HiSeq 2000; all 96 samples were sequenced in one lane.
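To illustrate which genomic fragments a PstI–MspI double digest yields, the following Python sketch performs an in silico digest on a sequence string. It assumes top-strand cut positions CTGCA^G (PstI) and C^CGG (MspI), ignores the sticky-end overhangs, and keeps only fragments with one PstI end and one MspI end inside a size window. The 50–600 bp window and the function name are illustrative assumptions, not parameters from this study.

```python
import re

# Recognition sequence and top-strand cut offset for each enzyme.
# Both sites are palindromic, so scanning one strand finds every site.
ENZYMES = {
    "PstI": ("CTGCAG", 5),  # CTGCA^G
    "MspI": ("CCGG", 1),    # C^CGG
}


def pst_msp_fragments(seq, min_len=50, max_len=600):
    """Return (start, end, left_enzyme, right_enzyme) for PstI-MspI fragments.

    Coordinates are 0-based top-strand cut positions; overhangs are ignored.
    The default size window is a hypothetical choice.
    """
    seq = seq.upper()
    cuts = []
    for name, (site, offset) in ENZYMES.items():
        # Zero-width lookahead so that overlapping sites are also reported
        for m in re.finditer(f"(?={site})", seq):
            cuts.append((m.start() + offset, name))
    cuts.sort()
    fragments = []
    for (start, left), (end, right) in zip(cuts, cuts[1:]):
        # Keep only fragments flanked by one cut of each enzyme
        if {left, right} == {"PstI", "MspI"} and min_len <= end - start <= max_len:
            fragments.append((start, end, left, right))
    return fragments


toy = "AA" + "CTGCAG" + "T" * 60 + "CCGG" + "AA"
print(pst_msp_fragments(toy))
```

Restricting to mixed-end fragments mirrors the idea behind two-enzyme GBS designs, in which only fragments carrying both adapter types are expected to amplify efficiently; the exact behavior of the protocol used here may differ.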

Sequencing and SNP genotyping

After sequencing, the raw reads were de-multiplexed according to their barcode sequences and trimmed using a Python script, which split the raw Illumina FASTQ file into 96 FASTQ files, one per sample, based on the barcode sequences, and filtered out reads containing ambiguous bases in the barcodes [26]. Reads containing only the common adapter were also trimmed using Cutadapt [37]. The de-multiplexed reads were then trimmed using the SolexaQA package v.1.13 [38]. Because base quality commonly declines toward the ends of Illumina reads, read ends were trimmed where the Phred quality score dropped below Q = 20 (an error probability of 0.01). Stretches of ambiguous N nucleotides at both the 5′ and 3′ ends were also trimmed, and poor-quality reads and reads shorter than 25 bases were discarded.
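The de-multiplexing and quality-trimming steps can be sketched in Python as follows. This is a simplified illustration under stated assumptions, not the authors' script or SolexaQA itself: barcodes are matched as exact read prefixes (so a read with an N in its barcode cannot be assigned), quality trimming keeps the longest contiguous run of bases with Phred Q ≥ 20 (similar in spirit to SolexaQA's DynamicTrim), terminal N stretches are then stripped, and reads shorter than 25 bases are discarded. The barcode table and all function names are hypothetical.

```python
def phred_to_error(q):
    """Convert a Phred quality score Q to an error probability P = 10^(-Q/10)."""
    return 10 ** (-q / 10)


def demultiplex(seq, barcodes):
    """Assign a read to a sample by exact barcode prefix.

    `barcodes` maps barcode -> sample name (hypothetical). Returns
    (sample, barcode_length), or None when no barcode matches; reads with
    an N in the barcode region can never match and are therefore dropped.
    """
    for barcode, sample in barcodes.items():
        if seq.startswith(barcode):
            return sample, len(barcode)
    return None


def longest_good_run(quals, min_q=20):
    """(start, end) of the longest contiguous run of bases with quality >= min_q."""
    best_start, best_end, run_start = 0, 0, 0
    for i, q in enumerate(list(quals) + [-1]):  # sentinel closes the last run
        if q < min_q:
            if i - run_start > best_end - best_start:
                best_start, best_end = run_start, i
            run_start = i + 1
    return best_start, best_end


def trim_read(seq, quals, min_q=20, min_len=25):
    """Quality-trim a read, strip terminal N stretches, and drop short reads."""
    start, end = longest_good_run(quals, min_q)
    trimmed = seq[start:end].strip("N")
    return trimmed if len(trimmed) >= min_len else None


barcodes = {"ACGT": "F2_001", "TTGCA": "F2_002"}  # hypothetical barcodes
read = "ACGT" + "NGATTACAGATTACAGATTACAGATTACA"
quals = [30] * len(read)
hit = demultiplex(read, barcodes)
if hit is not None:
    sample, offset = hit
    print(sample, trim_read(read[offset:], quals[offset:]))
```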

The clean reads were aligned to the draft genome sequence of P. citriodora (unpublished data) for SNP calling using the Burrows–Wheeler Aligner (BWA; 0.6.1–r104) [39]. Mapped reads were extracted from the resulting BAM file for further analyses using SAMtools v.0.1.16 [40]. Requiring high mapping quality ensured reliable (i.e., unique) read mapping, which is important for variant calling. SNPs were called with the SAMtools varFilter command only at variable positions with a minimum mapping quality (-Q) of 30, and the minimum and maximum read depths were set to 3 and 100, respectively. An in-house script that considered bi-allelic loci was then used to select significant sites among the called SNP positions [41].
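The site-level filters described above (mapping quality ≥ 30, read depth between 3 and 100, bi-allelic loci only) can be expressed as a simple predicate. The Python sketch below assumes a hypothetical per-site summary (mapping quality, total depth, and allele counts pooled across samples); it does not reproduce SAMtools varFilter or the in-house script [41].

```python
def keep_site(mapq, depth, allele_counts, min_mapq=30, min_depth=3, max_depth=100):
    """Return True if a candidate SNP site passes the filters described above.

    mapq: mapping quality of reads supporting the site (hypothetical summary)
    depth: total read depth at the site
    allele_counts: dict allele -> read count pooled across samples
    """
    if mapq < min_mapq:
        return False  # unreliable, possibly non-unique alignments
    if not min_depth <= depth <= max_depth:
        return False  # too little evidence, or suspiciously high coverage
    observed = [allele for allele, count in allele_counts.items() if count > 0]
    return len(observed) == 2  # keep bi-allelic loci only


sites = [
    (40, 12, {"A": 7, "G": 5}),          # passes
    (20, 12, {"A": 7, "G": 5}),          # mapping quality too low
    (40, 150, {"A": 90, "G": 60}),       # depth above 100
    (40, 12, {"A": 6, "G": 4, "T": 2}),  # tri-allelic
]
print([keep_site(*s) for s in sites])
```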

Linkage map construction

A linkage map of perilla was constructed with IciMapping ver. 4.1 [42] from 2518 filtered SNPs detected in the F2 progeny of the cross between P. citriodora and P. hirtella. SNPs were grouped into ten linkage groups (LGs) at a logarithm of the odds (LOD) threshold ≥ 3. The 2518 SNPs were ordered within the 10 LGs using the nnTwoOpt algorithm [43], and the order was fine-tuned by rippling, with the sum of adjacent recombination fractions (SARF) as the criterion and a window size of 5. Recombination fractions were converted into genetic distances (cM) using Kosambi's mapping function [44]. The final linkage map was drawn using MapChart ver. 2.3 [45].
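Kosambi's mapping function converts a recombination fraction r (0 ≤ r < 0.5) into a map distance d = (1/4) ln[(1 + 2r)/(1 − 2r)] Morgans, i.e. 25 ln[(1 + 2r)/(1 − 2r)] cM. The Python sketch below implements this conversion together with the SARF criterion and a simplified rippling step that tries every permutation inside a sliding window of 5 markers and keeps any order with a lower SARF. It illustrates the criterion only and is not the IciMapping implementation; the function names and the pairwise recombination-fraction matrix layout are assumptions.

```python
import math
from itertools import permutations


def kosambi_cm(r):
    """Kosambi map distance in cM for a recombination fraction 0 <= r < 0.5."""
    if not 0 <= r < 0.5:
        raise ValueError("r must lie in [0, 0.5)")
    return 25.0 * math.log((1 + 2 * r) / (1 - 2 * r))


def sarf(order, rf):
    """Sum of adjacent recombination fractions; rf[i][j] is the pairwise r."""
    return sum(rf[a][b] for a, b in zip(order, order[1:]))


def ripple(order, rf, window=5):
    """Slide a window along the order, keeping the lowest-SARF permutation."""
    order = list(order)
    for start in range(max(1, len(order) - window + 1)):
        best = order
        for perm in permutations(order[start:start + window]):
            candidate = order[:start] + list(perm) + order[start + window:]
            if sarf(candidate, rf) < sarf(best, rf):
                best = candidate
        order = best
    return order


# Toy data: four markers whose true order is 0-1-2-3, 5% recombination per step
rf = [[abs(i - j) * 0.05 for j in range(4)] for i in range(4)]
print(ripple([0, 2, 1, 3], rf), round(kosambi_cm(0.1), 2))
```

For example, r = 0.1 corresponds to about 10.14 cM under Kosambi's function. An exhaustive window search is only practical for small windows (5! = 120 orders per step), which is one reason rippling is used for fine-tuning after a heuristic ordering such as nnTwoOpt.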

QTL analysis

QTL mapping of the three traits was performed by composite interval mapping (CIM) in Windows QTL Cartographer ver. 2.5 [46, 47] using forward–backward stepwise regression. The LOD threshold for significance (P = 0.05) was determined from 1000 permutations. For each QTL and trait, the software also estimated the percentage of phenotypic variance explained and the additive and dominance effects. Gene action was inferred from the ratio of the dominance to the additive effect, as described by Stuber et al. [48].
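The permutation-based threshold and the gene-action rule can be illustrated with a deliberately simplified model. The Python sketch below substitutes single-marker additive regression for CIM (no cofactors and no interval scanning), codes F2 genotypes hypothetically as −1, 0, and 1 (P2 homozygote, heterozygote, P1 homozygote), and takes the 95th percentile of the genome-wide maximum LOD over permuted phenotypes as the threshold. The |d/a| cut-offs (additive ≤ 0.20, partial dominance up to 0.80, dominance up to 1.20, overdominance above 1.20) are those commonly attributed to Stuber et al. and are stated here as an assumption.

```python
import math
import random


def marker_lod(geno, pheno):
    """LOD for single-marker additive regression (a simple stand-in for CIM).

    geno: genotypes coded -1/0/1 (hypothetical: P2 hom., het., P1 hom.)
    """
    n = len(pheno)
    mean_g, mean_p = sum(geno) / n, sum(pheno) / n
    sxx = sum((g - mean_g) ** 2 for g in geno)
    syy = sum((p - mean_p) ** 2 for p in pheno)
    if sxx == 0 or syy == 0:
        return 0.0  # monomorphic marker or constant phenotype
    sxy = sum((g - mean_g) * (p - mean_p) for g, p in zip(geno, pheno))
    rss_full = max(syy - sxy * sxy / sxx, 1e-12 * syy)  # guard against zero
    return (n / 2) * math.log10(syy / rss_full)


def permutation_threshold(markers, pheno, n_perm=1000, alpha=0.05, seed=1):
    """(1 - alpha) quantile of the genome-wide maximum LOD under permutation."""
    rng = random.Random(seed)
    shuffled = list(pheno)
    maxima = []
    for _ in range(n_perm):
        rng.shuffle(shuffled)  # breaks any genotype-phenotype association
        maxima.append(max(marker_lod(g, shuffled) for g in markers))
    maxima.sort()
    index = min(n_perm - 1, math.ceil((1 - alpha) * n_perm) - 1)
    return maxima[index]


def gene_action(a, d):
    """Classify gene action from |d/a| (cut-offs assumed, see lead-in)."""
    if a == 0:
        raise ValueError("additive effect must be non-zero")
    ratio = abs(d / a)
    if ratio <= 0.20:
        return "additive"
    if ratio <= 0.80:
        return "partial dominance"
    if ratio <= 1.20:
        return "dominance"
    return "overdominance"


# Simulated data: 20 random markers, marker 0 carries a simulated QTL
rng = random.Random(0)
markers = [[rng.choice([-1, 0, 1]) for _ in range(96)] for _ in range(20)]
pheno = [70 + 3 * g + rng.gauss(0, 2) for g in markers[0]]
threshold = permutation_threshold(markers, pheno, n_perm=200)
print(round(threshold, 2), round(marker_lod(markers[0], pheno), 2), gene_action(3.0, 0.4))
```

Taking the maximum over all markers in each permutation is what turns the per-marker statistic into a genome-wide threshold that controls the family-wise error rate.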

Orthologous genes

To predict perilla orthologues of genes related to flowering time (FTi), TBLASTN searches were performed using BLAST 2.2.82, with the protein sequences of 20 A. thaliana FTi genes as queries against the perilla draft genome sequence. The FTi genes were selected from Blumel et al. [49]. Among these 20 FTi genes, 15 were functional genes orthologous to genes from other crops or species, and 10 were homologous to A. thaliana genes. The BLAST results were filtered at an e-value of 1e−10 and a query coverage ≥ 50% to predict ortholog genes. The position of each ortholog on the genetic map was inferred from the markers closest to its physical location.
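The filtering and map-placement steps can be sketched in Python. The sketch assumes BLAST+ tabular output produced with `-outfmt "6 qseqid sseqid evalue qstart qend qlen sstart send"` and computes query coverage per HSP as (qend − qstart + 1)/qlen; BLAST+ also offers a per-subject `qcovs` field, which can give different values when a gene is split across several HSPs. The hit lines, the marker table layout, and the function names are hypothetical.

```python
import bisect

# Assumed column order: -outfmt "6 qseqid sseqid evalue qstart qend qlen sstart send"
FIELDS = ["qseqid", "sseqid", "evalue", "qstart", "qend", "qlen", "sstart", "send"]
INT_FIELDS = {"qstart", "qend", "qlen", "sstart", "send"}


def parse_hits(lines):
    """Yield one dict per tabular BLAST hit line."""
    for line in lines:
        if not line.strip() or line.startswith("#"):
            continue
        rec = dict(zip(FIELDS, line.rstrip("\n").split("\t")))
        for key in INT_FIELDS:
            rec[key] = int(rec[key])
        rec["evalue"] = float(rec["evalue"])
        yield rec


def passes(rec, max_evalue=1e-10, min_query_cov=0.50):
    """Apply the e-value and per-HSP query-coverage filters."""
    coverage = (rec["qend"] - rec["qstart"] + 1) / rec["qlen"]
    return rec["evalue"] <= max_evalue and coverage >= min_query_cov


def nearest_marker(scaffold, pos, markers):
    """Closest mapped marker on the same scaffold, or None.

    markers: dict scaffold -> list of (physical_position, marker_id), sorted.
    """
    sites = markers.get(scaffold, [])
    if not sites:
        return None
    positions = [p for p, _ in sites]
    i = bisect.bisect_left(positions, pos)
    nearby = sites[max(0, i - 1):i + 1]  # neighbours on either side of pos
    return min(nearby, key=lambda s: abs(s[0] - pos))[1]


# Toy hit lines and marker table (hypothetical values)
hits = ["geneA\tscaffold_12\t1e-120\t1\t900\t1173\t5000\t7700",
        "geneB\tscaffold_3\t2e-6\t10\t90\t695\t100\t340"]
markers = {"scaffold_12": [(1200, "M101"), (9800, "M102")]}
for rec in filter(passes, parse_hits(hits)):
    print(rec["qseqid"], nearest_marker(rec["sseqid"], rec["sstart"], markers))
```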

Results

Evaluation of phenotypic data

DtoFB, DtoF, and DtoM differed significantly within the F2 mapping population and between the parents. Descriptive statistics are summarized in Table 1. All three traits showed normal frequency distributions (Fig. S1). All three traits were positively correlated with each other, and DtoF was significantly correlated with the other two traits (Table 2). The highest correlation was observed between DtoFB and DtoF (r = 0.81). Flower buds appeared about 5 days later in P1 than in P2.

Table 1 Descriptive statistics of three flowering-related traits in the F2 population of P. citriodora × P. hirtella
Table 2 Pearson’s correlation coefficients for three flowering-related traits in the F2 population of P. citriodora × P. hirtella

Genome-wide identification of SNPs using GBS

The genomic DNA of the 96 F2 lines was digested with the restriction enzymes PstI and MspI, and a 96-plex GBS library was constructed. Sequencing in one lane of a HiSeq 2000 platform generated 51.9 Gbp (514,560,106 reads) (Table S1). After filtering, including removal of the barcodes and the PstI and MspI overhang sequences, 503,215,708 (97.8%) of the sequencing reads were de-multiplexed. After removal of ambiguous nucleotides, 459,068,228 (91.15%) high-quality trimmed reads with a Phred quality score ≥ 20 remained (Table S2). The trimmed reads were aligned to the draft genome sequence of P. citriodora, and SNPs were mined from the sequence data [41] using our in-house GBS analysis pipeline [26, 50]. A total of 91,132 raw SNPs were identified in the 96 F2 lines (Table S2). These SNPs were filtered to identify putative markers using a maximum of 30% missing genotypes across individuals and a minor allele frequency (MAF) ≥ 0.05, which yielded 14,223 SNPs. Next, 9607 SNPs polymorphic between P. citriodora and P. hirtella, the parents of the mapping population, were identified. After a bin map was generated from the genotypes with a window size of 50 kb and SNPs were selected accordingly, 2518 SNP markers were retained.
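The marker-selection cascade (missing-rate and MAF filters, parental polymorphism, 50-kb binning) can be sketched as follows. The Python example assumes a hypothetical record layout in which genotypes are coded as alternate-allele dosages (0, 1, 2, or None for missing) and, as a further assumption, keeps the SNP with the fewest missing calls in each 50-kb bin; the selection rule used in the actual pipeline [26, 50] may differ.

```python
def missing_rate(genos):
    """Fraction of F2 individuals with a missing (None) genotype call."""
    return sum(g is None for g in genos) / len(genos)


def minor_allele_freq(genos):
    """MAF from alternate-allele dosages 0/1/2, ignoring missing calls."""
    called = [g for g in genos if g is not None]
    if not called:
        return 0.0
    p = sum(called) / (2 * len(called))
    return min(p, 1 - p)


def parents_polymorphic(p1, p2):
    """Both parents homozygous (0 or 2) for different alleles."""
    return p1 in (0, 2) and p2 in (0, 2) and p1 != p2


def select_markers(snps, max_missing=0.30, min_maf=0.05, window=50_000):
    """Filter SNPs, then keep one SNP per (scaffold, 50-kb bin).

    Keeping the SNP with the fewest missing calls per bin is an assumption.
    """
    bins = {}
    for snp in snps:
        f2 = snp["f2"]
        if missing_rate(f2) > max_missing or minor_allele_freq(f2) < min_maf:
            continue
        if not parents_polymorphic(snp["p1"], snp["p2"]):
            continue
        key = (snp["scaffold"], snp["pos"] // window)
        best = bins.get(key)
        if best is None or missing_rate(f2) < missing_rate(best["f2"]):
            bins[key] = snp
    return sorted(bins.values(), key=lambda s: (s["scaffold"], s["pos"]))


# Toy records (hypothetical): two SNPs share a bin, one fails MAF, one fails parents
snps = [
    {"scaffold": "s1", "pos": 100, "p1": 0, "p2": 2, "f2": [0, 1, 2, 1, None]},
    {"scaffold": "s1", "pos": 20_000, "p1": 0, "p2": 2, "f2": [0, 1, 2, 1, 1]},
    {"scaffold": "s1", "pos": 60_000, "p1": 0, "p2": 2, "f2": [0, 0, 0, 0, 0]},
    {"scaffold": "s2", "pos": 10, "p1": 0, "p2": 0, "f2": [0, 1, 2, 1, 1]},
]
print([(s["scaffold"], s["pos"]) for s in select_markers(snps)])
```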

Construction of a high-density linkage map

A linkage map composed of 2518 markers on 10 LGs (Fig. 1, Table 3) was constructed from the GBS data. The total length of the linkage map was 1309.39 cM. LG01 (271.86 cM) was the largest LG and LG10 (45.8 cM) the smallest. The number of markers per LG ranged from 60 (LG10) to 661 (LG01), with an average of 251.8 markers per LG. The average marker interval was 0.56 cM; LG06 was the densest LG (average interval 0.3 cM) and LG02 the least dense (0.85 cM). A summary of the linkage map is presented in Table 3 and illustrated in Fig. 1.

Fig. 1

SNP distribution. Distribution of single-nucleotide polymorphism (SNP) markers on the linkage map of perilla constructed from the F2 population derived from the parental lines P. citriodora and P. hirtella. QTLs associated with the three traits (DtoFB, DtoF, DtoM) are shown on the left side of each linkage group: black boxes indicate QTLs for DtoFB, white boxes QTLs for DtoF, and checked boxes QTLs for DtoM. The locations of perilla orthologues of FTi-regulating genes are indicated by underlined TAIR symbols

Table 3 Statistics of the linkage map constructed from the F2 population of P. citriodora × P. hirtella

QTL mapping

QTL mapping was performed for the three flowering-related traits by CIM using the integrated genotype and phenotype data. A total of six QTLs were identified for the three traits, distributed on LG01, LG02, and LG08 (Fig. 2). Peak LOD scores ranged from 6.49 (qDFB2) to 13.57 (qDFB1) for DtoFB and from 3.98 (qDF8.2) to 7.37 (qDF2) for DtoF, and reached 4.50 (qDM2) for DtoM. The identified QTLs are summarized in Table 4.

Fig. 2

Quantitative trait loci. QTLs associated with three flowering-related traits [days to visible flower bud (DtoFB), days to flowering (DtoF), and days to maturity (DtoM)] identified via CIM on the LGs of the F2 population derived from the parental lines P. citriodora and P. hirtella. The horizontal line represents the LOD threshold. a Days to visible flower bud, b days to flowering, c days to maturity

Table 4 Summary of the QTL regions of the three flowering-related traits

DtoFB

Two QTLs were detected for DtoFB, on LG01 and LG02, with peak LOD scores of 13.57 (qDFB1) and 6.49 (qDFB2), respectively (Fig. 2a). At both loci, the alleles from P. citriodora (P1) increased DtoFB. The QTLs explained 44.07% (qDFB1) and 26.69% (qDFB2) of the phenotypic variance, respectively. This finding suggests that the P. citriodora alleles may delay DtoFB.

DtoF

Three QTLs were detected for DtoF, on LG02 and LG08, with peak LOD scores of 7.37 (qDF2), 4.20 (qDF8.1), and 3.98 (qDF8.2), respectively (Fig. 2b). The QTLs explained 36.13% (qDF2), 12.90% (qDF8.1), and 14.03% (qDF8.2) of the phenotypic variance, respectively. Positive additive effects derived from P. citriodora (P1) were observed for qDF2, whereas negative additive effects derived from P. hirtella (P2) were observed for qDF8.1 and qDF8.2.

DtoM

One QTL was detected for DtoM, on LG02, with a peak LOD score of 4.50 (qDM2) (Fig. 2c). This QTL explained 19.34% of the phenotypic variance. qDM2 showed a positive additive effect derived from P. citriodora (P1), which was associated with delayed DtoM.

One-way analysis of variance (ANOVA) for QTL markers

The effects of markers in the six QTL regions were confirmed by one-way ANOVA. Fifty-eight markers had LOD values exceeding the LOD threshold for their QTL: 18 markers each for qDFB1, qDFB2, and qDF2, and one marker each for qDF8.1, qDF8.2, and qDM2 (Table S3). For markers in qDFB1 and qDFB2, DtoFB differed significantly between the P. citriodora and P. hirtella genotype classes, with R2 values of 29.68–44.07% and 12.37–31.38%, respectively; DtoFB was about 6–7 days later for the P. citriodora genotype than for the P. hirtella genotype (P < 0.001). Markers in qDF2, qDF8.1, and qDF8.2 also differed significantly between the P. citriodora and P. hirtella genotype classes. R2 values ranged from 18.26 to 38.41% for qDF2 markers and were 12.90% and 14.03% for the qDF8.1 and qDF8.2 markers, respectively. The P. citriodora genotype at qDF2 was associated with a significant 4- to 5-day delay in DtoF, and the P. hirtella genotype at qDF8.1 and qDF8.2 with a significant delay in DtoF of about 5 days. Lastly, the qDM2 marker also differed significantly between the P. citriodora and P. hirtella genotype classes (R2 = 19.34%), with plants carrying the P. citriodora genotype maturing about 4 days later than those carrying the P. hirtella genotype.
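A one-way ANOVA for a single marker partitions phenotypic variation into between-genotype and within-genotype components, and R2 is the between-genotype share of the total sum of squares. The Python sketch below computes F and R2 from hypothetical groups of phenotypes keyed by marker genotype class; obtaining a P value would additionally require the F-distribution CDF (for example from SciPy), which is omitted to keep the example dependency-free.

```python
def one_way_anova(groups):
    """One-way ANOVA over marker genotype classes.

    groups: dict genotype class -> list of phenotype values (hypothetical).
    Returns (F, df_between, df_within, R2), where R2 = SS_between / SS_total.
    """
    values = [v for g in groups.values() for v in g]
    n, k = len(values), len(groups)
    grand_mean = sum(values) / n
    ss_between = sum(len(g) * (sum(g) / len(g) - grand_mean) ** 2 for g in groups.values())
    ss_total = sum((v - grand_mean) ** 2 for v in values)
    ss_within = ss_total - ss_between
    df_between, df_within = k - 1, n - k
    if ss_within == 0:
        f_stat = float("inf")  # no residual variation within classes
    else:
        f_stat = (ss_between / df_between) / (ss_within / df_within)
    return f_stat, df_between, df_within, ss_between / ss_total


# Toy DtoFB values (days) for plants grouped by genotype at one marker
groups = {"P1 hom.": [84, 86, 85, 88], "het.": [82, 83, 81, 84], "P2 hom.": [78, 79, 80, 77]}
f_stat, df_b, df_w, r2 = one_way_anova(groups)
print(round(f_stat, 2), df_b, df_w, round(r2, 3))
```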

Orthologous genes

To identify candidate genes within the QTLs related to DtoFB, DtoF, and DtoM, perilla orthologues of FTi-regulating genes were identified using BLAST. Protein sequences of 20 A. thaliana FTi genes were collected from the NCBI database, and TBLASTN searches were performed against the perilla draft genome sequence to estimate the presence and copy number of each FTi gene in perilla. After filtering (e-value 1e-10, query coverage 50%), all FTi genes except ELF3 and FLC (AGL25, FLF) were retained. Each gene had at least one copy in perilla, and TFL1 had six copies (Table S4). Of all these copies, two genes were located in QTL regions: ELF4, in the qDFB1 region of LG01, and GI, in the qDF2 and qDFB2 regions of LG02. AP1 was not in a QTL region but was close to qDF8.1, and CO and COL5 were close to qDF8.2 (Fig. 1). Resequencing of the parents revealed six SNPs in ELF4 and fourteen in GI (Table 5). Of these, one SNP in ELF4 and two in GI caused non-synonymous amino acid changes.
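Classifying a SNP as synonymous or non-synonymous only requires translating the reference and alternate codons. The Python sketch below uses the standard genetic code and assumes the SNP position has already been converted to a 0-based coordinate within a coding sequence given 5′ to 3′ from the start codon (strand handling and gene-model coordinates are outside its scope); the function name and the toy sequence are hypothetical. A change to a stop codon ('*') is reported as non-synonymous.

```python
BASES = "TCAG"
# Standard genetic code, codons enumerated in TCAG order; '*' marks a stop codon
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    a + b + c: AMINO_ACIDS[16 * i + 4 * j + k]
    for i, a in enumerate(BASES)
    for j, b in enumerate(BASES)
    for k, c in enumerate(BASES)
}


def snp_effect(cds, pos, alt):
    """Classify a SNP at 0-based CDS position `pos` as synonymous or not.

    Returns (effect, change), where change looks like 'L2I'
    (reference residue, codon number, alternate residue).
    """
    cds = cds.upper()
    codon_start = pos - pos % 3
    ref_codon = cds[codon_start:codon_start + 3]
    offset = pos % 3
    alt_codon = ref_codon[:offset] + alt.upper() + ref_codon[offset + 1:]
    ref_aa, alt_aa = CODON_TABLE[ref_codon], CODON_TABLE[alt_codon]
    effect = "synonymous" if ref_aa == alt_aa else "non-synonymous"
    return effect, f"{ref_aa}{codon_start // 3 + 1}{alt_aa}"


toy_cds = "ATGCTTGAA"  # hypothetical CDS: Met-Leu-Glu
print(snp_effect(toy_cds, 5, "C"), snp_effect(toy_cds, 3, "A"))
```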

Table 5 SNP of ELF4 and GI gene and effect of SNP in translation

Discussion

This study used GBS to identify a large number of molecular markers and QTLs involved in flowering and maturation in the relatively little-studied genus Perilla. No perilla genome sequence has yet been published, and only a few markers (e.g., RAPD, SCAR, and SSR) are available [18,19,20,21,22,23]. Significant effort is therefore required to develop new molecular markers for constructing high-quality linkage maps of perilla. Applied to the perilla F2 population, the GBS method enabled polymorphic SNP discovery and genotyping to be performed simultaneously. This allowed us to construct a fairly high-density linkage map without needing to test or apply existing markers. GBS, which has also been applied to other crops, therefore appears to be an effective and widely applicable method for genotyping various crops [26,27,28,29,30,31,32,33].

The oil content and composition of perilla seeds are known to change with the seed development stage [5]. In particular, the proportions of 18:3 α-linolenic acid and 18:2 linoleic acid change little in the early period after flowering but shift sharply as maturation begins: at the beginning of seed formation the contents of the two fatty acids are similar, whereas during maturation 18:3 α-linolenic acid increases and 18:2 linoleic acid decreases. The maturation period can therefore be a very important trait in breeding programs. In the present study, flowering time was significantly and positively correlated with seed maturity (Table 2). The QTL analysis provides a more detailed explanation of this correlation. For example, the QTLs for DtoFB and DtoF share LG02 (qDFB2 and qDF2), but the loci with the highest LODs are located at different positions: the major QTL for DtoFB (qDFB1) is located on LG01, whereas the major QTL for DtoF (qDF2) is located on LG02. Similarly, the single QTL for DtoM (qDM2) was located on LG02, but its LOD was relatively low (LOD = 4.50). This finding suggests that DtoM is under weaker genetic influence than DtoFB or DtoF.

In Arabidopsis, CO is a key integrator in the induction of FT expression [51]. Daily CO expression profiles are regulated mainly by circadian clock-regulated proteins, including FLAVIN-BINDING, KELCH REPEAT, F-BOX 1 (FKF1), GIGANTEA (GI), and CYCLING DOF FACTOR (CDF) proteins [52,53,54]. Under LD conditions, FKF1 and GI form a complex that degrades the CDF proteins, which repress CO transcription in the morning, and thereby up-regulates CO expression at the end of the day [53]. GI also interacts directly with EARLY FLOWERING 4 (ELF4), which sequesters GI from the nucleoplasm, where GI binds the CO promoter, into discrete nuclear bodies [55]. This ELF4-dependent subnuclear localization of GI affects the level of CO expression. Light signaling pathways and the circadian clock coordinate the control of CO activity to induce FT [10]. In SD plants, the roles of CO-like (COL) genes in the control of flowering, such as those of the legume orthologues, may differ from that of CO in the LD plant Arabidopsis. In soybean, GmCOL1a and GmCOL1b suppressed flowering under LD conditions [56], and it has been suggested that their role in regulating GmFT2a and GmFT5a may be similar to that of Hd1 in regulating Hd3a in rice. Hd1 activates Hd3a expression under SD conditions but suppresses it under LD conditions [8, 57, 58]. Three important modules that inhibit flowering under LD conditions have been reported in soybean, an SD plant: the PHYA-E1, GI-CO, and miRNA-dependent modules [59]. In this study, combining QTL analysis with ortholog comparison made it easier to identify candidate genes such as PcELF4 and PcGI. This result suggests that the GI-CO module proposed in soybean may be involved in regulating flowering time in perilla.

Gene-based molecular markers are helpful for molecular breeding. If the functions of these perilla genes are verified in future molecular and genetic studies, for example using mutants, they could be developed into molecular markers for the flowering period of perilla.