Forensic genetics laboratories rely on allele frequencies from short tandem repeat (STR) loci, such as the 20 core loci recommended for CODIS (Combined DNA Index System) in the USA [1]. The increase in the number of core CODIS loci has led to the development of expanded STR kits, and the additional loci require supporting population data for casework analyses. In Peru, studies of more than 20 STRs in more than 100 individuals have been carried out in mestizo populations [2,3,4], but not in any of the 76 native ethnic groups, or populations united by existing cultural and linguistic ties [5]. Among these 76 ethnic groups, the largest is the Quechua, followed by the Aymara [6]. The Aymara are found predominantly in southeast Peru, in the department of Puno and in the border areas with Bolivia [7]. They have been studied with nuclear genes, mitochondrial DNA, and insertion-deletion markers [8,9,10,11,12] but not with autosomal STRs.

This study was approved by the ethics committee of the Institute of Tropical Medicine “Daniel Alcides Carrion” of the Universidad Nacional Mayor de San Marcos (Certificate of Approval CIEI-2018-015). Before participating in the study, all participants freely and voluntarily signed an informed consent form. DNA was obtained from 190 unrelated adults from the Aymara population of the province of Puno. All participants, as well as their parents and grandparents, were born in Puno. Samples were taken from 37 individuals residing in the Puno Islands (Anapia, Suana, and Iscaya), 58 individuals residing in districts bordering Bolivia (Unicachi, Ollaraya, Tinicachi, and Yunguyo), and 95 individuals residing in districts that are not on the border with Bolivia (Zepita, Platería, Cuturapi, Juli, Pomata, Huacullani, Copani, and Yunguyo). Samples were collected by finger puncture, and 4–6 drops of blood were placed on a Nucleid Card (Copan). The cards were stored under sterile, ambient conditions until use.

Prior to sample processing, the Nucleid Cards (Copan) were dried at 37 °C for 60 min. A 1.2-mm punch was then taken from each card and amplified by direct PCR using the VeriFiler Express kit (Life Technologies) following the manufacturer’s instructions. Amplified products were detected on the Applied Biosystems™ 3500xL Genetic Analyzer (Life Technologies) following the manufacturer’s recommended protocol. After capillary electrophoresis, the data were imported into GeneMapper® ID-X v1.5 software to generate the genetic profiles.

The power of discrimination (PD), polymorphism information content (PIC), probability of exclusion (PE), observed heterozygosity (Ho), and match probability (PM) were calculated with PowerStats v1.2 software [13]. The expected heterozygosity (He), tests for departures from Hardy-Weinberg equilibrium, the combined PD, the combined PE, and the population distances (Fst and Reynolds’ genetic distance) were calculated with Arlequin software [14]. The population studied (n = 190) was compared with other populations [4, 15,16,17] using the 15 STR markers common to the different studies (D3S1358, vWA, D16S539, CSF1PO, TPOX, D8S1179, D21S11, D18S51, D19S433, TH01, FGA, D5S818, D13S317, D7S820, and D2S1338).
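
As a rough illustration of the per-locus parameters listed above, the following minimal Python sketch computes them from a list of genotypes using the formulas commonly given for these statistics. It is not the PowerStats or Arlequin implementation. The function name `locus_statistics`, the toy genotypes, and the choice of the unbiased (sample-size-corrected) estimator for He are assumptions made for the example.

```python
from collections import Counter
from itertools import combinations


def locus_statistics(genotypes):
    """Per-locus forensic parameters from a list of (allele, allele) tuples.

    Hypothetical helper for illustration; one tuple per individual.
    """
    n = len(genotypes)
    # Sort each genotype so (10, 11) and (11, 10) are counted together.
    normalized = [tuple(sorted(g)) for g in genotypes]
    genotype_freqs = [c / n for c in Counter(normalized).values()]

    allele_counts = Counter(a for g in normalized for a in g)
    total_alleles = 2 * n
    p = [c / total_alleles for c in allele_counts.values()]
    sum_p2 = sum(x * x for x in p)

    # Observed heterozygosity: proportion of heterozygous individuals.
    ho = sum(1 for a, b in normalized if a != b) / n
    # Expected heterozygosity, unbiased estimator (assumed here).
    he = total_alleles / (total_alleles - 1) * (1 - sum_p2)
    # Match probability: sum of squared observed genotype frequencies.
    pm = sum(f * f for f in genotype_freqs)
    # Power of discrimination is the complement of the match probability.
    pd = 1 - pm
    # Polymorphism information content.
    pic = 1 - sum_p2 - sum(2 * a * a * b * b for a, b in combinations(p, 2))
    # Probability of exclusion from heterozygote (h) and homozygote (H) proportions.
    hom = 1 - ho
    pe = ho ** 2 * (1 - 2 * ho * hom ** 2)

    return {"Ho": ho, "He": he, "PM": pm, "PD": pd, "PIC": pic, "PE": pe}


# Toy data (not from the study): four individuals typed at one locus.
toy = [(10, 11), (10, 10), (11, 12), (11, 10)]
stats = locus_statistics(toy)
```

With real data, the function would be called once per locus on the 190 genotypes, and the resulting values compared against the software output as a sanity check.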

The allele frequencies and other relevant population parameters for the 23 autosomal STR markers for the Aymara sample population are shown in Supplemental Table 1. All markers were highly polymorphic [18]. There were no detectable departures from Hardy-Weinberg equilibrium expectations except for the FGA locus (p = 0.0248). After applying the Bonferroni correction, the FGA marker no longer departed significantly from expectations.
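
The effect of the Bonferroni correction on the FGA result can be sketched as follows. The significance level of 0.05 and the use of 23 tests (one per autosomal STR marker) are assumptions for illustration, since the text does not state them explicitly.

```python
def bonferroni_threshold(alpha, n_tests):
    """Per-test significance threshold under the Bonferroni correction."""
    return alpha / n_tests


ALPHA = 0.05   # assumed nominal significance level
N_LOCI = 23    # assumed number of tests: one per autosomal STR marker
FGA_P = 0.0248  # p-value reported for the FGA locus

threshold = bonferroni_threshold(ALPHA, N_LOCI)
# FGA is significant before correction but not after it.
significant_before = FGA_P < ALPHA
significant_after = FGA_P < threshold
```

Under these assumptions the corrected threshold is about 0.0022, well below the FGA p-value of 0.0248.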

The average PD per locus was 0.8668 and ranged from 0.6768 to 0.9738. The highest value was observed at PENTA E (0.9738, 15 alleles) and the lowest at D2S441 (0.6768, 7 alleles). The highest and lowest PIC values were likewise observed at PENTA E (0.8793) and D2S441 (0.4312), respectively. The PE ranged from 0.1532 (D2S441) to 0.7847 (PENTA E). The PM values ranged from 0.0262 (PENTA E) to 0.3232 (D2S441). He and Ho were similar within each locus, and the averages were 0.7217 and 0.7204, respectively. The He ranged from 0.4718 (D2S441) to 0.8917 (PENTA E), while the Ho ranged from 0.4579 (D2S441) to 0.8947 (PENTA E). Under the assumption of independence, the combined PD was greater than 0.99999999, and the combined PE was 0.99999994.
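
Under independence, combined values are usually obtained with the product rule, as one minus the product of the per-locus complements. The sketch below applies that rule to the only two loci whose individual values are quoted in the text (PENTA E and D2S441), so the results are far below the 23-locus figures reported above. The helper name `combine` is hypothetical, and the exact procedure used by the software is not confirmed here.

```python
from math import prod


def combine(values):
    """Combine per-locus probabilities assuming independent loci."""
    return 1 - prod(1 - v for v in values)


# Per-locus values quoted in the text for PENTA E and D2S441 only.
pd_two_loci = [0.9738, 0.6768]
pe_two_loci = [0.7847, 0.1532]

combined_pd_two = combine(pd_two_loci)
combined_pe_two = combine(pe_two_loci)

# Consistency check: PM is the complement of PD at each locus.
pm_two_loci = [round(1 - v, 4) for v in pd_two_loci]
```

The complements of the two PD values reproduce the reported PM values (0.0262 and 0.3232), which is a quick check that the tabulated figures are internally consistent.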

Population distances based on Fst and Reynolds’ genetic distance [14] are shown in Supplemental Table 2. In this study, an Fst value of less than 0.01 was taken to indicate little population subdivision, and a value greater than 0.01 was taken to indicate notable population substructure [19, 20]. The Aymara population, the Bolivian population [15], and the Peruvian population [4] cluster close to each other, and the greatest distances are to the Ashaninca [16] and US Hispanics [17] (Supplemental Fig. 1). When the Aymara sample is separated by region, the three groups (the Puno Islands, the districts bordering Bolivia, and the districts not on the border with Bolivia) are similar (Supplemental Table 3 and Supplemental Fig. 2). The data suggest that the Aymara are similar to the Peruvian and Bolivian populations [4, 15] and that the Ashaninca population [16] is notably different from all other populations analyzed herein.
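
The relation between the two distance measures and the 0.01 cutoff can be sketched as below, assuming the usual definition of Reynolds’ distance as D = -ln(1 - Fst). The function names and the example Fst values are hypothetical and are not taken from the supplemental tables. The text does not say how a value of exactly 0.01 is treated, so the sketch arbitrarily assigns it to the "notable" class.

```python
import math


def reynolds_distance(fst):
    """Reynolds' genetic distance from a pairwise Fst, D = -ln(1 - Fst)."""
    if fst >= 1:
        raise ValueError("Fst must be below 1")
    return -math.log(1.0 - fst)


def classify_fst(fst, cutoff=0.01):
    """Label a pairwise Fst using the 0.01 cutoff described in the text."""
    # Assumption: a value exactly at the cutoff is labeled "notable".
    return "little subdivision" if fst < cutoff else "notable substructure"


# Hypothetical pairwise Fst values, for illustration only.
example_fst = {"pair_A": 0.004, "pair_B": 0.035}
summary = {
    name: (reynolds_distance(value), classify_fst(value))
    for name, value in example_fst.items()
}
```

For small Fst the two measures are nearly identical, so both lead to the same ranking of population pairs.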

This study provides STR population data from the Aymara, one of the ethnic groups of Peru. The results support the use of the population data for statistical calculations in human identity testing cases in Peru.