Vietnam borders China, and in recent years many Vietnamese women have married Chinese men and settled in China, mainly in Yunnan and Guangxi in Southwest China. The sampling location of the Vietnamese population in this paper is indicated in Fig. S1. To establish a database of the Vietnamese population in China, we used the PowerPlex® 21 kit (Promega Corporation), which includes 20 autosomal short tandem repeat (STR) loci: D3S1358, D1S1656, D6S1043, D13S317, Penta E, D16S539, D18S51, D2S1338, CSF1PO, Penta D, TH01, vWA, D21S11, D7S820, D5S818, TPOX, D8S1179, D12S391, D19S433, and FGA. In this study, we present allele frequencies and forensic statistical parameters of the 20 STR loci in the Vietnamese population from Yunnan, China, and compare pairwise genetic distances with other populations.

Buccal swabs or blood samples were collected, after informed consent, from 522 unrelated healthy individuals in Wenshan Zhuang and Miao Autonomous Prefecture, Yunnan province, southwest China. DNA was extracted using the Chelex 100 protocol [1]. The 21 loci of the PowerPlex® 21 System PCR kit (Promega Corporation), that is, the 20 STR loci plus Amelogenin, were amplified according to the manufacturer’s recommendations. PCR products were separated and detected by capillary electrophoresis on an ABI 3130 Genetic Analyzer (Applied Biosystems, Foster City, CA, USA). Genotyping was performed using the allelic ladders provided with the kit and GeneMapper v.3.2 software (Applied Biosystems, Foster City, CA, USA). The control DNA (2800M) included in the kit was used for quality control. DNA typing and assignment of nomenclature were based on the ISFG recommendations [2, 3]. All experimental steps were carried out according to the Laboratory Internal Control Standards and Kit Controls. In addition, the laboratories participating in this study were accredited according to ISO/IEC 17025:2005, General Requirements for the Competence of Testing and Calibration Laboratories (CNAS-CL01, Accreditation Criteria for the Competence of Testing and Calibration Laboratories). Allele frequencies, forensic parameters, and Hardy–Weinberg equilibrium were evaluated using the Modified-PowerStats v1.2 software obtained from Promega [4]. Population differentiation between the studied population and previously published reference data was analyzed using Arlequin v3.5 [5]. Nei’s standard genetic distance between populations was calculated using the PHYLIP 3.69 package [6], and a neighbor-joining (NJ) phylogenetic tree was constructed and visualized with MEGA 4 [7].
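The per-locus statistics named above can be illustrated with a minimal Python sketch. It is not the PowerStats implementation; it assumes the commonly used definitions: allele frequencies by gene counting, PD as one minus the sum of squared observed genotype frequencies, PIC as 1 − Σp_i² − Σ_{i<j} 2p_i²p_j², and PE = h²(1 − 2hH²), where h is the observed heterozygosity and H = 1 − h. The function names and the four toy genotypes are hypothetical and are not data from this study.

```python
from collections import Counter
from itertools import combinations


def power_of_exclusion(h_obs):
    """PE = h^2 * (1 - 2*h*H^2), with H = 1 - h (assumed definition)."""
    homo = 1.0 - h_obs
    return h_obs ** 2 * (1.0 - 2.0 * h_obs * homo ** 2)


def locus_parameters(genotypes):
    """Summarise one STR locus.

    genotypes: list of (allele_a, allele_b) pairs, one pair per individual.
    Alleles must be mutually comparable (all numbers or all strings).
    """
    n = len(genotypes)

    # Allele frequencies by gene counting: each individual carries two alleles.
    allele_counts = Counter(a for pair in genotypes for a in pair)
    freqs = {a: c / (2 * n) for a, c in allele_counts.items()}

    # Observed heterozygosity: share of individuals with two different alleles.
    h_obs = sum(1 for a, b in genotypes if a != b) / n

    # Matching probability: sum of squared observed genotype frequencies.
    geno_counts = Counter(tuple(sorted(pair)) for pair in genotypes)
    match_prob = sum((c / n) ** 2 for c in geno_counts.values())

    # Polymorphism information content from the allele frequencies.
    p = list(freqs.values())
    pic = (1.0 - sum(x * x for x in p)
           - sum(2.0 * x * x * y * y for x, y in combinations(p, 2)))

    return {
        "allele_freqs": freqs,
        "Hobs": h_obs,
        "PD": 1.0 - match_prob,
        "PIC": pic,
        "PE": power_of_exclusion(h_obs),
    }


# Hypothetical toy locus with four individuals (illustration only).
toy = [(6, 7), (6, 6), (7, 9), (6, 9)]
result = locus_parameters(toy)
print(result)
```

Microvariant alleles such as 9.3 can be passed as floats or strings, as long as one type is used consistently within a locus.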

The distribution of allele frequencies and forensic statistical parameters in the Vietnamese population is shown in Table S1. A total of 224 alleles and 864 genotypes were found across all loci, with 6–18 alleles observed per locus. The observed heterozygosity (Hobs) ranged from 0.5766 (TPOX) to 0.8697 (FGA), and the polymorphism information content (PIC) ranged from 0.5399 (TPOX) to 0.9044 (Penta E). The power of discrimination (PD) ranged from 0.7836 (TPOX) to 0.9840 (Penta E), with a combined power of discrimination (CPD) of 0.99999999999999999999999126. The power of exclusion (PE) ranged from 0.2638 (TPOX) to 0.7341 (FGA), with a combined power of exclusion (CPE) of 0.999999975. All loci were consistent with Hardy–Weinberg equilibrium (p value > 0.05) (Table S1). The 20 loci therefore show high forensic efficiency and are useful for individual identification and paternity testing in the Vietnamese population in Yunnan, China. The p values of the differentiation tests between the studied population and 22 other populations (7 Han and 15 ethnic minority populations) at the same loci are shown in Table S2. After Bonferroni correction, statistically significant differences were found between the present population and Kejia Guangdong Han, Suzhou Han, Suzhou Hani, Neimenggu Han, Guizhou Dong, Tibetan, Yunnan Nu, and Yunnan Yi (unpublished data) at 13 STR loci; Shandong Han, Shanxi Han, Yunnan Han, Jilin Man, Guangdong Guangxi Gelao, and Yunnan Derung at 12 loci; Dongbei Chaoxian, Guangdong Guangxi Jing, and Guizhou Bouyei at 11 loci; Guangdong Guangxi Zhuang at 9 loci; Anhui Han at 8 loci; Yunnan Bai at 7 loci; Guizhou Miao at 5 loci; and Yunnan Miao (unpublished data) at 4 loci. Genetic distances between the populations ranged from 0.003242 to 0.242788 (Table S3). Based on the matrix of Nei’s standard genetic distances (Table S3) and the NJ phylogenetic tree (Fig. S2), the Vietnamese population in Yunnan is more closely related to the Yunnan Miao and the Guizhou Miao than to the other populations compared. In this phylogenetic tree, Yunnan Nu and Yunnan Derung form a separate cluster.
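Two of the calculations behind these results can be sketched as follows, assuming the loci are independent so that the combined power is 1 − Π(1 − x_i) over the per-locus values, and assuming a plain Bonferroni adjustment of α divided by the number of tests. The function names are hypothetical, and the family size of 20 tests is an illustrative assumption; the number of tests actually used for the correction is not stated here.

```python
from math import prod


def combined_complement(per_locus_values):
    """Return prod(1 - x_i) over the loci.

    The combined power (CPD or CPE) is 1 minus this value. The complement is
    returned because 1 - 1e-24 is not representable in double precision.
    """
    return prod(1.0 - x for x in per_locus_values)


def bonferroni_threshold(alpha, n_tests):
    """Per-test significance threshold under a Bonferroni correction."""
    return alpha / n_tests


# The two PD extremes quoted in the text (TPOX and Penta E), for illustration.
pd_values = [0.7836, 0.9840]
residual = combined_complement(pd_values)
print("combined PD of the two loci:", 1.0 - residual)

# Hypothetical family of 20 tests at alpha = 0.05 (assumption, see above).
print("Bonferroni threshold:", bonferroni_threshold(0.05, 20))
```

With all 20 loci the complement is far smaller than double-precision rounding error near 1, so it is more practical to report the complement itself or to use arbitrary-precision arithmetic when writing out the combined value.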

In conclusion, we report the allele frequencies and forensic parameters of 20 autosomal STR loci in the Vietnamese population of Yunnan, China. Our results show that these 20 STR loci provide highly informative polymorphic data for paternity testing, individual identification, and population genetic studies. This manuscript follows the guidelines for publication requested by the journal [8, 9].