Dongfang is one of the seven county-level cities of Hainan Province, Southern China, and is located on the western coast of Hainan Island (Fig. S1). According to the 2010 census, Dongfang had a population of 408,309, with the Han as the largest ethnic group.

In this study, blood samples were collected with informed consent from 984 unrelated Han individuals in Dongfang, Southern China. This work was approved by the Ethics Committee of the Institute of Forensic Science, Ministry of Public Security, People’s Republic of China. PCR was performed using the DNATyper™ Y29 PCR Amplification kit (Beijing, China) on the GeneAmp PCR System 9700 (Thermo Fisher Scientific, USA). Amplified products were separated on the ABI PRISM 3730xL Genetic Analyzer (Thermo Fisher Scientific, USA), and raw data were analyzed using the GeneMapper ID-X Analysis Software (Thermo Fisher Scientific).

Allele and haplotype frequencies in the Dongfang Han population were estimated by direct counting [1]. Haplotype diversity (HD) was calculated as $HD = \frac{n}{n-1}\left(1 - \sum_i P_i^2\right)$, where $n$ is the sample size (the total number of haplotypes observed) and $P_i$ is the frequency of the $i$th haplotype. Gene diversity (GD) was computed for each locus with the same formula, using allele frequencies in place of haplotype frequencies. Match probability (MP) was calculated as $MP = \sum_i P_i^2$, where $P_i$ is the frequency of the $i$th haplotype. Discrimination capacity (DC) was calculated as the ratio of the number of different haplotypes to the total number of haplotypes. Pairwise genetic distances (RST) between populations and the corresponding p values were estimated by analysis of molecular variance (AMOVA) and visualized in a multidimensional scaling (MDS) plot using the AMOVA & MDS tool on the Y-STR Haplotype Reference Database (YHRD) website (http://www.yhrd.org). A neighbor-joining (NJ) phylogenetic tree was constructed from the RST matrix using MEGA 7.0.
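The frequency-based statistics defined above can be sketched in Python as follows. This is a minimal illustration, not the authors' pipeline: it assumes each haplotype is a tuple holding one allele per locus (a multi-copy locus such as DYF387S1 would need to be encoded as one hashable value, e.g. a string joining both copies), and the function names, locus names, and toy data are hypothetical.

```python
from collections import Counter


def nei_diversity(observations):
    """Unbiased diversity n/(n-1) * (1 - sum(p_i^2)) over a list of hashable observations."""
    n = len(observations)
    counts = Counter(observations)  # direct counting of each distinct observation
    sum_sq = sum((c / n) ** 2 for c in counts.values())
    return n / (n - 1) * (1 - sum_sq)


def match_probability(haplotypes):
    """MP = sum of squared haplotype frequencies."""
    n = len(haplotypes)
    return sum((c / n) ** 2 for c in Counter(haplotypes).values())


def discrimination_capacity(haplotypes):
    """DC = number of different haplotypes / total number of haplotypes."""
    return len(set(haplotypes)) / len(haplotypes)


def gene_diversity_per_locus(haplotypes, locus_names):
    """GD at each locus: the HD formula applied to that locus's allele column."""
    return {
        name: nei_diversity([h[j] for h in haplotypes])
        for j, name in enumerate(locus_names)
    }


# Hypothetical toy data: 4 men typed at 3 made-up loci (not study data).
loci = ["locus_A", "locus_B", "locus_C"]
haplotypes = [(13, 24, 10), (13, 24, 10), (14, 23, 10), (15, 22, 11)]

hd = nei_diversity(haplotypes)
mp = match_probability(haplotypes)
dc = discrimination_capacity(haplotypes)
gd = gene_diversity_per_locus(haplotypes, loci)
print(f"HD={hd:.4f} MP={mp:.4f} DC={dc:.4f}")
print({name: round(value, 4) for name, value in gd.items()})
```

For this toy sample, three different haplotypes are seen among four men, so MP = 0.5² + 0.25² + 0.25² = 0.375, HD = 4/3 × (1 − 0.375) ≈ 0.8333, and DC = 3/4.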

A total of 749 different haplotypes were observed among the 984 individuals, 645 of which were unique (Table S1). The HD was 0.9988, the DC was 0.7612, and the MP was 0.0025. Allele frequencies are shown in Table S2. A total of 283 different alleles were observed across the 29 Y-STR loci, with allele frequencies ranging from 0.0010 to 0.8059. DYF387S1 showed the highest GD in the Dongfang Han population (0.9588), whereas DYS438 showed the lowest (0.3858). These results indicate that the 29 Y-STR loci are highly polymorphic and informative in the Dongfang Han population.
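As a quick arithmetic check, the DC follows directly from the haplotype counts reported above:

$$
\mathrm{DC} = \frac{\text{number of different haplotypes}}{\text{total number of haplotypes}} = \frac{749}{984} \approx 0.7612
$$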

Pairwise RST and p values were computed on the Yfiler haplotype set (Table S3) between the Dongfang Han population and 15 other populations retrieved from the YHRD: 8 Han populations from Gansu [YA004048], Guizhou [2], Jiangsu [3], Jiangxi [3], Jilin [4], Liaoning [5], Shenzhen [1], and Yunnan [6], and 7 ethnic minority populations (Xinjiang Hui [7], Xinjiang Kazakh [8], Liaoning Manchu [9, 10], Gansu Tibetan [YA004043], Xinjiang Uighur [11], Liaoning Xibe [12, 13], and Liangshan Yi [14]). As shown in Table S4, the smallest genetic distance was found between the Dongfang Han and Guizhou Han populations (RST = 0.0155), and the largest was observed with the Gansu Tibetan population (RST = 0.1284). After Bonferroni correction (p < 0.0004, 136 pairs), no significant differences were found between the Dongfang Han population and the Han populations from Jiangxi (p = 0.014) and Yunnan (p = 0.0027). Population relationships were visualized with an MDS plot (Fig. S2) and an NJ phylogenetic tree (Fig. S3). In the MDS plot (Fig. S2), the relaxed MDS calculation (threshold RST = 0.01) generated cluster 1, comprising the Han populations from Gansu, Guizhou, Jiangsu, Jilin, and Liaoning together with the Liaoning Manchu population. All Han populations were located in the left part of the plot, suggesting genetic similarity among Han populations despite their wide geographic distribution. The NJ tree (Fig. S3) showed two main clusters: the Xinjiang Hui, Xinjiang Kazakh, Gansu Tibetan, and Xinjiang Uighur populations formed the lower cluster, whereas the remaining 12 populations grouped together in the upper cluster. These results indicate that the Dongfang Han population is most closely related to the Guizhou Han population. Pairwise RST and p values (Table S5) were also computed on the minimal haplotype set (Table S6) and visualized in an MDS plot (Fig. S4).
Comparison of the minimal-set and Yfiler-set results indicated that most pairwise genetic distances increased as more Y-STRs were included in the haplotype set, possibly because the additional Y-STRs show greater genetic diversity among these populations.
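The pairwise comparisons, and the dependence of the distances on which loci are included, can be illustrated with a minimal Python sketch of a two-population AMOVA estimate of RST, in which the distance between two haplotypes is the sum of squared repeat-number differences across loci. This is not the YHRD implementation: it assumes single-copy loci with integer repeat counts, omits the permutation test that yields p values, and uses hypothetical function names and toy haplotypes. The Bonferroni helper assumes a family-wise α of 0.05, which the text does not state; with the 136 comparisons given above, 0.05/136 ≈ 0.00037.

```python
from itertools import combinations


def squared_distance(h1, h2):
    """Sum over loci of squared differences in repeat number between two haplotypes."""
    return sum((a - b) ** 2 for a, b in zip(h1, h2))


def ssd(haplotypes):
    """Sum of squared deviations: squared distances over all unordered pairs, divided by n."""
    n = len(haplotypes)
    return sum(squared_distance(a, b) for a, b in combinations(haplotypes, 2)) / n


def pairwise_rst(pop1, pop2):
    """AMOVA-style RST for two populations of haplotypes (tuples of repeat counts)."""
    n1, n2 = len(pop1), len(pop2)
    total = n1 + n2
    ssd_within = ssd(pop1) + ssd(pop2)
    ssd_among = ssd(list(pop1) + list(pop2)) - ssd_within
    # Sample-size coefficient for P = 2 groups: (N - sum(n_p^2)/N) / (P - 1)
    n_coef = total - (n1 ** 2 + n2 ** 2) / total
    sigma2_within = ssd_within / (total - 2)  # df_within = N - P
    sigma2_among = (ssd_among - sigma2_within) / n_coef  # df_among = P - 1 = 1
    denom = sigma2_among + sigma2_within
    # Undefined when every haplotype is identical; report 0 in that case.
    return sigma2_among / denom if denom != 0 else 0.0


def select_loci(haplotypes, indices):
    """Restrict each haplotype to a subset of loci (e.g. a smaller kit's loci)."""
    return [tuple(h[i] for i in indices) for h in haplotypes]


def bonferroni_threshold(alpha, n_tests):
    """Per-comparison significance level after Bonferroni correction."""
    return alpha / n_tests


# Hypothetical toy populations typed at two loci (not study data).
pop_a = [(13, 10), (13, 11), (14, 10)]
pop_b = [(13, 12), (14, 13), (13, 12)]

rst_one_locus = pairwise_rst(select_loci(pop_a, [0]), select_loci(pop_b, [0]))
rst_two_loci = pairwise_rst(pop_a, pop_b)
print(f"RST, 1 locus: {rst_one_locus:.3f}; RST, 2 loci: {rst_two_loci:.3f}")
print(f"Bonferroni threshold for 136 pairs: {bonferroni_threshold(0.05, 136):.5f}")
```

In this toy example the first locus has identical allele frequencies in both populations, so the estimate is negative (−0.5; negative values can occur when within-population variation exceeds between-population variation), whereas adding the second locus, which differs between the populations, raises RST to 8/11 ≈ 0.727.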

In conclusion, the results of the present study indicate that the 29 Y-STR loci are highly polymorphic and informative in the Dongfang Han population, and that the Dongfang Han population is most closely related to the Guizhou Han population.

Our haplotype data have been submitted to the YHRD under accession number YA004350 (Dongfang, China [Han]). This manuscript follows the guidelines for the publication of population data [15, 16].