According to historical records, a large proportion of the modern Han population in Northeast China descends from immigrants who came from Shandong Province, Hebei Province, and adjacent regions from the latter half of the nineteenth century onward [1]. Peripheral blood samples were collected from a total of 1026 male Han individuals in Changchun City, Jilin Province, Northeast China, after all individuals provided informed consent. The study was approved by the ethics committee for biological research at the Fudan School of Life Sciences. Genomic DNA was extracted using the Gentra Puregene Blood Kit (Qiagen). Polymerase chain reaction (PCR) amplification of 17 Y-chromosomal short tandem repeat (Y-STR) loci was performed using the AmpFlSTR Yfiler™ PCR Amplification Kit (Thermo Fisher Scientific Company, Carlsbad, CA, USA) on a GeneAmp PCR System 9700 (Thermo Fisher Scientific Company). PCR products were separated by capillary electrophoresis on an ABI PRISM 3130xL Genetic Analyzer (Thermo Fisher Scientific Company), and genotypes were assigned with GeneMapper ID software v3.2 (Thermo Fisher Scientific Company). Allele and haplotype frequencies, haplotype diversity (HD), haplotype match probability (HMP), discrimination capacity (DC), and pairwise Rst values were calculated using the software package Arlequin version 3.5 [2]; the results are provided in Supplementary Tables S1, S2, S3, and S4. Pairwise Rst values were visualized in multidimensional scaling (MDS) plots generated with SPSS v22 (Chicago, IL, USA). A separate MDS analysis was also conducted that included only Han populations. Our data have been submitted to YHRD under accession number YA004260 (Changchun, China [Han], n = 1026).
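To make the Rst calculation more concrete, the following is a minimal sketch and not a reimplementation of Arlequin. It assumes that pairwise Rst is obtained from a one-level analysis of molecular variance (AMOVA) in which the distance between two haplotypes is the sum over loci of squared differences in repeat number, and that each haplotype is a tuple of integer repeat counts with no missing data and no special handling of multi-copy loci. The function names `squared_repeat_distance` and `pairwise_rst` are hypothetical, and Arlequin's permutation tests for significance are not reproduced.

```python
from itertools import combinations


def squared_repeat_distance(a, b):
    """Rst-like distance: sum over loci of squared differences in repeat number."""
    if len(a) != len(b):
        raise ValueError("haplotypes must cover the same loci")
    return sum((x - y) ** 2 for x, y in zip(a, b))


def _ssd(haplotypes):
    """Sum of squared deviations: distances over unordered pairs, divided by n."""
    n = len(haplotypes)
    return sum(squared_repeat_distance(a, b) for a, b in combinations(haplotypes, 2)) / n


def pairwise_rst(pop1, pop2):
    """Hypothetical helper: Rst between two samples from a one-level AMOVA."""
    pops = [list(pop1), list(pop2)]
    n_total = sum(len(p) for p in pops)

    # Partition the total sum of squared deviations into within- and among-sample parts
    ssd_total = _ssd(pops[0] + pops[1])
    ssd_within = sum(_ssd(p) for p in pops)
    ssd_among = ssd_total - ssd_within

    df_among = len(pops) - 1
    df_within = n_total - len(pops)
    msd_among = ssd_among / df_among
    msd_within = ssd_within / df_within

    # n0 corrects the among-sample variance component for unequal sample sizes
    n0 = (n_total - sum(len(p) ** 2 for p in pops) / n_total) / df_among
    var_among = (msd_among - msd_within) / n0
    var_within = msd_within

    total = var_among + var_within
    # Identical, invariant samples carry no variance; report 0 rather than divide by 0
    return var_among / total if total != 0 else 0.0


if __name__ == "__main__":
    # Toy three-locus haplotypes (hypothetical repeat counts, not study data)
    pop_a = [(13, 30, 23), (14, 30, 23), (13, 29, 24)]
    pop_b = [(16, 28, 21), (17, 28, 21), (16, 29, 22)]
    print(round(pairwise_rst(pop_a, pop_b), 3))
```

Because Rst is estimated from variance components, small negative values can occur when two samples are no more different than random draws from a single population.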

A total of 767 distinct haplotypes were observed among the 1026 individuals in the present study, of which 634 were unique and 133 were shared by 2 to 12 individuals (Table S1). The HD of the Changchun Han population was 0.999104, the HMP was 0.001869, and the DC was 0.747563. The HD, HMP, and DC values of the 29 other populations included in the comparison are given in Table S2. We compared our data (Han_Jilin_WQ) with an independent dataset of 196 Han individuals from the same location (Han_Jilin_YH) reported by Han et al. [3]. The HD values of the two datasets are similar, whereas the DC of the present study is lower than that reported by Han et al. [3]. This is partly expected because DC tends to decrease as sample size increases, and it may also reflect unnoticed close paternal relatives in our large dataset.
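The forensic parameters above follow standard definitions: HMP is the sum of squared haplotype frequencies, HD is Nei's unbiased diversity n/(n − 1) × (1 − HMP), and DC is the number of distinct haplotypes divided by the sample size. The sketch below assumes these definitions; the function name `forensic_parameters` is hypothetical and this is not the Arlequin code.

```python
from collections import Counter


def forensic_parameters(haplotypes):
    """Hypothetical helper: haplotype counts plus HD, HMP and DC for a sample.

    Each haplotype is assumed to be a hashable tuple of alleles, one per locus.
    """
    n = len(haplotypes)
    if n < 2:
        raise ValueError("at least two haplotypes are required")
    counts = Counter(haplotypes)
    freqs = [c / n for c in counts.values()]

    hmp = sum(p ** 2 for p in freqs)      # haplotype match probability
    hd = n / (n - 1) * (1 - hmp)          # Nei's unbiased haplotype diversity
    dc = len(counts) / n                  # discrimination capacity
    unique = sum(1 for c in counts.values() if c == 1)

    return {
        "n_haplotypes": len(counts),
        "unique": unique,                 # haplotypes seen exactly once
        "shared": len(counts) - unique,   # haplotypes carried by 2+ individuals
        "HD": hd,
        "HMP": hmp,
        "DC": dc,
    }
```

As a consistency check on the reported values, 767/1026 ≈ 0.747563 reproduces the DC, and 1026/1025 × (1 − 0.001869) ≈ 0.999105 agrees with the reported HD of 0.999104 to within the rounding of HMP.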

To investigate the genetic relationship between the Changchun Han population and other populations in China, Y-STR data for 9066 reference individuals from 16 Han and 13 ethnic minority populations were obtained from earlier reports (see details in Table S2) [3,4,5,6,7,8,9,10,11,12,13,14,15,16,17,18,19,20,21]. As shown in Table S3 and Fig. S1, Changchun Han and most other Han populations cluster together in the center of the MDS plot, whereas minority populations are scattered around the periphery. This pattern is consistent with the previously reported genetic homogeneity of Han populations [22]; the Han population from Luzhou City in Southwest China [9] was an exception in our analysis. When only Han populations were included in the MDS analysis, the Changchun Han population showed close genetic affinity with Han populations from North and Northeast China (Table S4 and Fig. S2). In addition, our sample (Han_Jilin_WQ) lies close to the dataset from the same location (Han_Jilin_YH) reported by Han et al. [3] in the MDS plot (Fig. S2). These results are consistent with a recent study [3] and with historical records of the origin of the Han population in Northeast China.
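The MDS step can be illustrated with classical (Torgerson) MDS applied to a pairwise Rst matrix. This is a swapped-in technique for illustration only: the SPSS procedures used in the study fit the configuration iteratively and may yield somewhat different coordinates. The function name `classical_mds` and the convention of setting negative Rst values to 0 before embedding are assumptions of this sketch.

```python
import numpy as np


def classical_mds(dist, n_components=2):
    """Hypothetical helper: classical MDS so Euclidean distances approximate `dist`."""
    d = np.asarray(dist, dtype=float)
    d = np.clip(d, 0.0, None)                  # assumed convention: negative Rst set to 0
    n = d.shape[0]
    j = np.eye(n) - np.ones((n, n)) / n        # centering matrix
    b = -0.5 * j @ (d ** 2) @ j                # double-centered squared distances
    eigvals, eigvecs = np.linalg.eigh(b)       # eigenvalues returned in ascending order
    order = np.argsort(eigvals)[::-1][:n_components]
    vals = np.clip(eigvals[order], 0.0, None)  # discard non-Euclidean (negative) parts
    return eigvecs[:, order] * np.sqrt(vals)   # one row of coordinates per population
```

Populations that lie close together in the resulting coordinates have small pairwise Rst, which is how the central grouping of Han populations in the MDS plot is read.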

In conclusion, we report haplotype frequencies and forensic parameters for 17 Y-STR loci in a Han population from Changchun City, Jilin Province, Northeast China. Our analysis indicated that the Changchun Han population is genetically close to Han populations from North and Northeast China. We believe that these data are valuable for both forensic and population genetic studies.