Bangladesh is a South Asian country bordered by India to the north, east, and west, Myanmar to the southeast, and the Bay of Bengal to the south. Over 98% of the population belongs to the Bengali ethnolinguistic group. The remainder, about 2% of the total population, consists mostly of indigenous ethnic minorities. So far, 35 such small groups have been identified, living in different pockets of the hill zone and in some areas of the plains. The individuals recruited in this study are from the Bengali population of Bangladesh.

Blood samples were collected from 137 unrelated Bangladeshi males following procedures in accordance with the Helsinki Declaration of 1964, as revised in 1983 [1]. Genomic DNA was extracted from the blood samples using the Chelex-100 protocol described by Walsh et al. [2] and quantified with a NanoDrop-1000 spectrophotometer (Thermo Fisher Scientific, USA). The 23 Y-STR loci (DYS19, DYS385a/b, DYS389I, DYS389II, DYS390, DYS391, DYS392, DYS393, DYS437, DYS438, DYS439, DYS448, DYS456, DYS458, DYS481, DYS533, DYS549, DYS570, DYS576, DYS635, DYS643, and Y-GATA-H4) were amplified with the PowerPlex® Y23 System PCR amplification kit (Promega Corporation, USA) on a Veriti® Thermal Cycler (Applied Biosystems, Foster City, CA, USA) according to the manufacturer's recommendations. The amplified products were separated by capillary electrophoresis on an ABI 3500 Genetic Analyzer (Applied Biosystems, Foster City, CA, USA) using POP-4 polymer and 3500 Series Data Collection Software ver. 1.0, with CC5 ILS-500 as the internal size standard. Peak sizing and genotype assignment were performed with GeneMapper® ID-X Software v1.2. Alleles were designated according to the recommendations of the DNA Commission of the International Society for Forensic Genetics (ISFG) for forensic STR analysis [3].

Allele frequencies at each locus were calculated by direct counting. Gene (or haplotype) diversity (GD) was calculated as $GD = \frac{n}{n-1}\left(1 - \sum_{i} P_i^{2}\right)$, where $P_i$ is the frequency of the $i$th allele or haplotype and $n$ is the number of samples [4]. The discrimination capacity (DC) of the haplotypes was calculated as $DC = H/n$, where $H$ is the number of distinct haplotypes and $n$ is the total number of samples. A total of 134 distinct haplotypes were observed among the 137 individuals, giving a DC of 0.978 (Table S1). None of these haplotypes matched any haplotype available in the Y-Chromosome Haplotype Reference Database (YHRD, Release 49, February 17, 2015). Across the 23 loci, gene diversity ranged from 0.315 (DYS391) to 0.945 (DYS385a/b), and allele frequencies ranged from 0.007 to 0.818 (Table S2).
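The two statistics above are simple enough to check by hand. The following is a minimal Python sketch, not the software used in this study; the function names `gene_diversity` and `discrimination_capacity` are hypothetical, and it assumes each sample is represented by one allele call (for a single locus) or by one tuple of allele calls (for a full haplotype).

```python
from collections import Counter


def gene_diversity(observations):
    """Unbiased gene (or haplotype) diversity, GD = n/(n-1) * (1 - sum(P_i^2)).

    `observations` holds one allele call (or one haplotype tuple) per sample;
    the frequencies P_i are obtained by direct counting.
    """
    n = len(observations)
    if n < 2:
        raise ValueError("at least two samples are needed")
    counts = Counter(observations)
    sum_sq = sum((c / n) ** 2 for c in counts.values())
    return n / (n - 1) * (1 - sum_sq)


def discrimination_capacity(haplotypes):
    """DC = H / n: number of distinct haplotypes divided by the number of samples."""
    return len(set(haplotypes)) / len(haplotypes)


# Toy locus with hypothetical calls 14, 14, 15, 16, 14 -> P = 0.6, 0.2, 0.2
print(round(gene_diversity([14, 14, 15, 16, 14]), 3))  # 0.7
# DC reported in the text: 134 distinct haplotypes among 137 samples
print(round(134 / 137, 3))  # 0.978
```

Because tuples are hashable, the same `gene_diversity` function gives haplotype diversity when each observation is a whole haplotype rather than a single allele.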

Haplotypes with double alleles were observed at the DYS576, DYS391, and DYS458 loci. Double alleles at DYS19, DYS389I, DYS389II, DYS390, DYS391, DYS393, DYS435, DYS437, DYS438, DYS439, and DYS448 have been reported by many researchers [5]. Unlike the other chromosomes, the Y chromosome does not recombine with the X chromosome along almost its entire length; recombination is confined to its tips, the pseudoautosomal regions. However, the Y chromosome is prone to intrahelical recombination, which can lead to segmental duplication or deletion of both coding and non-coding sequences. Genomic studies have revealed that about one quarter of the euchromatic male-specific region of the Y chromosome consists of eight massive palindromes. Each palindrome can fold like a hairpin, bringing its two arms together and allowing intrahelical recombination between similar sequences on the same chromosome. This process, called gene conversion, aids in the detection and repair of mutations in this part of the Y chromosome. The observed double alleles are attributed to segmental duplication caused by intrahelical recombination [6].
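Screening typed profiles for double alleles amounts to comparing the number of distinct allele calls at each locus with the number expected. The sketch below is an illustrative Python helper under stated assumptions, not part of GeneMapper® ID-X: the name `flag_double_alleles` and the dictionary profile format are hypothetical, DYS385a/b is treated as the only locus where two calls are normal, and the allele values are placeholders rather than study data.

```python
# Number of distinct allele calls expected per locus in a single-source male
# profile. DYS385a/b is a duplicated locus reported as one marker, so up to two
# calls are normal there; every other locus is assumed to be single-copy.
EXPECTED_CALLS = {"DYS385a/b": 2}


def flag_double_alleles(profile):
    """Return, sorted, the loci showing more distinct allele calls than expected.

    `profile` maps locus name -> list of allele calls (hypothetical format).
    """
    return sorted(
        locus
        for locus, calls in profile.items()
        if len(set(calls)) > EXPECTED_CALLS.get(locus, 1)
    )


# Hypothetical profile fragment; the allele values are placeholders.
profile = {
    "DYS576": [17, 18],
    "DYS19": [14],
    "DYS385a/b": [13, 17],
    "DYS458": [16, 17],
}
print(flag_double_alleles(profile))  # ['DYS458', 'DYS576']
```

A duplication that carries two copies of the same repeat length cannot be detected this way, since it produces a single (possibly taller) peak.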

Four null alleles, at DYS448, DYS549, DYS392, and DYS385a/b, were observed in the same haplotype. Null alleles have been reported primarily in Asian populations [7–9]. Null alleles may result from a mutation at a primer binding site or from a large deletion in the Y chromosome [10]. To check for a possible mutation in the primer binding site, the sample was retyped with the Y-Filer™ kit (Life Technologies, USA). This analysis reproduced the null alleles at all three loci shared by the two kits (DYS448, DYS392, and DYS385a/b; DYS549 is not included in Y-Filer™), in concordance with the PowerPlex Y23 results. Because Promega Corporation and Life Technologies use different primer sequences in their kits, a primer binding site mutation would be expected to cause allele dropout with only one of them; the null alleles at these three loci are therefore unlikely to be due to such a mutation [11]. Further studies involving sequencing of these regions may provide a definitive answer.
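The retyping logic described above can be summarised as a small classification step. A minimal Python sketch follows, assuming each profile is a dictionary from locus name to a list of allele calls, with an empty list standing for a null allele; `classify_null_alleles`, `ppy23`, and `yfiler` are hypothetical names, and the DYS19 entry and its allele value are placeholders standing in for the remaining, non-null loci.

```python
def classify_null_alleles(primary, secondary):
    """Classify loci that gave no allele call (null) with the primary kit.

    Both arguments map locus name -> list of allele calls (hypothetical format).
    Returns three sorted lists:
      reproduced - also null with the secondary kit,
      recovered  - the secondary kit yields an allele (as expected if a
                   primer binding site mutation affects only the primary kit),
      untested   - the locus is absent from the secondary kit.
    """
    reproduced, recovered, untested = [], [], []
    for locus, calls in sorted(primary.items()):
        if calls:
            continue  # an allele was called: not a null allele
        if locus not in secondary:
            untested.append(locus)
        elif secondary[locus]:
            recovered.append(locus)
        else:
            reproduced.append(locus)
    return reproduced, recovered, untested


# Null pattern described in the text; DYS19 and its allele value are placeholders.
ppy23 = {"DYS448": [], "DYS549": [], "DYS392": [], "DYS385a/b": [], "DYS19": [14]}
yfiler = {"DYS448": [], "DYS392": [], "DYS385a/b": [], "DYS19": [14]}
print(classify_null_alleles(ppy23, yfiler))
# (['DYS385a/b', 'DYS392', 'DYS448'], [], ['DYS549'])
```

A "recovered" locus would only be consistent with a primer binding site mutation, not proof of one; likewise, "reproduced" nulls point towards, but do not establish, a deletion.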

Pairwise genetic distances (RST) and the associated P values between the studied population and other populations available in the YHRD were computed with the YHRD AMOVA tool (http://www.yhrd.org/Analyse/AMOVA) [12]. A total of 23 population samples were included in the analysis (Table S3). To portray the relationships among populations, a UPGMA tree was constructed from the pairwise RST values using PowerMarker v3.25 and visualized with the traditional rectangular layout in MEGA 6. The analysis showed that the Bangladeshi Bengali population is most closely related to the Indo-Pakistani population living in the UK (RST = 0.0013, P = 0.2855), followed by the Indian Tamil population (RST = 0.109, P = 0.0253) (Figure S1).
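UPGMA (unweighted pair group method with arithmetic mean) repeatedly merges the two closest clusters and sets the distance from the merged cluster to every remaining cluster to the leaf-count-weighted average of the two previous distances. The Python sketch below implements this standard algorithm on a small hypothetical distance matrix; it is not PowerMarker's code, the function name `upgma` is hypothetical, and it assumes non-negative distances (negative RST estimates, which can occur, would need to be set to zero first). The tree is returned in Newick format, with each new node placed at half the distance at which its two clusters merge.

```python
from itertools import combinations


def upgma(labels, matrix):
    """Standard UPGMA clustering of a symmetric distance matrix.

    labels: population names; matrix: list of lists of pairwise distances.
    Returns a Newick string with branch lengths rounded to 4 decimals.
    """
    # cluster id -> (Newick subtree, number of leaves, node height)
    clusters = {i: (name, 1, 0.0) for i, name in enumerate(labels)}
    # distances between current clusters, keyed by the unordered id pair
    dist = {
        frozenset((i, j)): matrix[i][j]
        for i, j in combinations(range(len(labels)), 2)
    }
    next_id = len(labels)
    while len(clusters) > 1:
        # 1. find and remove the closest pair of clusters
        pair = min(dist, key=dist.get)
        i, j = sorted(pair)
        tree_i, n_i, h_i = clusters.pop(i)
        tree_j, n_j, h_j = clusters.pop(j)
        height = dist.pop(pair) / 2  # new node sits at half the merge distance
        subtree = f"({tree_i}:{height - h_i:.4f},{tree_j}:{height - h_j:.4f})"
        # 2. average linkage: distance to each remaining cluster is the
        #    leaf-count-weighted mean of the two merged clusters' distances
        for k in clusters:
            d_ik = dist.pop(frozenset((i, k)))
            d_jk = dist.pop(frozenset((j, k)))
            dist[frozenset((next_id, k))] = (n_i * d_ik + n_j * d_jk) / (n_i + n_j)
        clusters[next_id] = (subtree, n_i + n_j, height)
        next_id += 1
    (tree, _, _), = clusters.values()
    return tree + ";"


# Hypothetical distances between three populations (not values from this study)
names = ["PopA", "PopB", "PopC"]
d = [[0.0, 0.02, 0.06],
     [0.02, 0.0, 0.06],
     [0.06, 0.06, 0.0]]
print(upgma(names, d))  # (PopC:0.0300,(PopA:0.0100,PopB:0.0100):0.0200);
```

Averaging distances weighted by cluster size is what makes UPGMA produce an ultrametric tree, which is why it is commonly used for population-distance dendrograms even though it assumes a constant rate of divergence.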

In conclusion, the PowerPlex Y23 system, with its six additional loci relative to the Y-Filer™ kit, provides new information on the distribution of 23-locus Y-STR haplotypes and demonstrates its usefulness for determining paternal lineage and for personal identification in the Bangladeshi Bengali population. It is also worth noting that increasing the number of highly polymorphic Y-STR markers reduces the number of shared haplotypes and thereby increases discrimination power, which is of great importance to the forensic community.