Introduction

Spinocerebellar ataxias (SCAs) are a group of genetically heterogeneous neurodegenerative disorders characterized by degeneration of the cerebellum and its afferent and efferent pathways. Several of them, including SCA1, SCA2, Machado–Joseph disease (MJD/SCA3), SCA6, SCA7, SCA12, SCA17, and dentatorubro-pallidoluysian atrophy, are caused by expansions of CAG trinucleotide repeats, most of which encode polyglutamine tracts [1].

The overall prevalence of SCAs is ~3 per 100,000 individuals, with MJD/SCA3, SCA6, and SCA1 among the most common subtypes worldwide [2, 3]. Nevertheless, the prevalence and relative frequency of SCA subtypes vary between populations. SCA1 accounts for approximately 6% of patients with autosomal dominant cerebellar ataxia (ADCA) worldwide; however, its relative frequency among ADCAs ranges from 0% in Mexico and South Korea [4, 5] to 88% in Siberian Yakuts [6]. SCA3 is considered the most frequent ADCA in Japan, Taiwan, Portugal, and Brazil, with relative frequencies between 43 and 85% [7,8,9]. In contrast, no cases of SCA3 have been reported in Czech and Finnish populations, and its relative frequency is very low in Southern India (0.8%) [10,11,12]. The overall prevalence of SCA6 is approximately 0.02 to 0.31 per 100,000 individuals, and SCA6 accounts for approximately 15% of ADCAs worldwide [3, 13]. The highest relative frequencies of SCA6 have been reported in Japan (25.5%), Australia (30%), and Germany (42%) [7, 14, 15]. However, no SCA6 cases have been reported in Mexican or Northern Indian populations [5, 16].

The frequency of large alleles within the non-pathogenic range (large normal, or LN, alleles) can influence repeat disease prevalence by altering the microsatellite mutation rate. Disease-causing microsatellites have higher mutation rates than non-disease microsatellites [17], which stems from their propensity to form secondary DNA structures that promote defective DNA synthesis mediated by repair complexes [18, 19]. Microsatellite mutation rate increases with repeat size, even within the non-pathogenic range [20,21,22]; thus, a high population frequency of LN alleles is expected to increase the probability that at least one non-expanded allele will reach a pathogenic size. There is a well-documented correlation between Huntington disease prevalence and the frequency of LN alleles [23, 24]. Likewise, for SCA2, the high frequency of LN alleles has been related to the high prevalence of this disease in Cubans [25]. An association between LN allele frequency and disease prevalence has also been proposed for SCA1, MJD/SCA3, and SCA6, but other studies have not confirmed it [5].
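The reasoning that more LN alleles should yield more de novo expansions can be illustrated with a toy calculation. The sketch below assumes two hypothetical per-allele, per-generation expansion probabilities, one for LN alleles and one for shorter normal alleles; these rates are placeholders chosen only for illustration, not estimates from this study or the cited literature. It treats transmissions as independent and reports the expected number of expansions and the probability that at least one occurs.

```python
def expected_expansions(n_alleles, f_ln, mu_ln, mu_normal):
    """Expected de novo expansions among n_alleles transmitted alleles.

    f_ln: population frequency of large normal (LN) alleles.
    mu_ln, mu_normal: HYPOTHETICAL per-allele expansion probabilities.
    """
    per_allele = f_ln * mu_ln + (1.0 - f_ln) * mu_normal
    return n_alleles * per_allele

def prob_at_least_one(n_alleles, f_ln, mu_ln, mu_normal):
    """Probability of >= 1 expansion, treating transmissions as independent."""
    per_allele = f_ln * mu_ln + (1.0 - f_ln) * mu_normal
    return 1.0 - (1.0 - per_allele) ** n_alleles

# Hypothetical rates: LN alleles assumed 100 times more likely to expand.
MU_LN, MU_NORMAL = 1e-5, 1e-7
for f_ln in (0.05, 0.16, 0.30):
    e = expected_expansions(100_000, f_ln, MU_LN, MU_NORMAL)
    p = prob_at_least_one(100_000, f_ln, MU_LN, MU_NORMAL)
    print(f"f_LN={f_ln:.2f}  expected={e:.3f}  P(>=1)={p:.3f}")
```

Under any choice where LN alleles expand more readily than shorter alleles, both quantities rise with f_LN; the absolute numbers depend entirely on the assumed rates.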

Lack of interruptions in the CAG repeat tract is also associated with microsatellite instability. A pure CAG repeat tract is prone to form hairpin structures that promote faulty DNA synthesis mediated by repair complexes targeting the DNA hairpins [18, 19]. Conversely, sequences interrupting CAG repeat tracts destabilize these secondary structures and impair hairpin formation [18, 26]. The ATXN1 microsatellite can contain up to three CAT trinucleotides within the CAG tract, and the frequency of interrupted alleles varies across populations. Studies performed in North American, Indian, Chinese, and Siberian Yakut populations suggest that the higher the population frequency of alleles carrying a single CAT interruption, the higher the relative frequency of SCA1 among SCAs [27,28,29]. Furthermore, except for very rare cases, expanded alleles lack CAT interruptions [30]. A study of the SCA17-causing microsatellite showed similar results, with the mutation frequency of pure CAG tracts at least twice as high as that of alleles bearing CAA interruptions [31].
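To make the notion of an interrupted tract concrete, the sketch below splits an in-frame repeat tract into codons, counts CAT interruptions, and reports the longest uninterrupted CAG run. Using the longest pure run as a rough proxy for hairpin-forming potential is a simplifying assumption for illustration, not a structural model, and the example sequences are hypothetical.

```python
def summarize_tract(seq):
    """Count CAT interruptions and the longest pure CAG run in an in-frame tract."""
    seq = seq.upper()
    if len(seq) % 3 != 0:
        raise ValueError("tract length must be a multiple of 3")
    codons = [seq[i:i + 3] for i in range(0, len(seq), 3)]
    longest = current = 0
    for codon in codons:
        if codon == "CAG":
            current += 1
            longest = max(longest, current)
        else:
            current = 0  # any non-CAG codon breaks the pure run
    return {"total_codons": len(codons),
            "cat_interruptions": codons.count("CAT"),
            "longest_pure_cag": longest}

# Hypothetical tracts of equal length (30 codons).
pure = "CAG" * 30
interrupted = "CAG" * 12 + "CAT" + "CAG" * 5 + "CAT" + "CAG" * 11
print(summarize_tract(pure))
print(summarize_tract(interrupted))
```

Both tracts have the same total length, but the interrupted one has no pure CAG stretch longer than 12 codons.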

The distribution of normal alleles at ATXN1, ATXN3, and CACNA1A has been described in several ethnic groups from Asia, Europe, and Africa, but studies in Latin American populations other than Brazilians and Mexicans are lacking [5, 9, 11, 32,33,34,35,36]. Furthermore, the presence and frequency of CAT interruptions in the ATXN1 gene have been described in some populations but not in Latin America [27, 28, 36]. Studying the population frequency of LN alleles and of repeat interruptions could therefore help inform the epidemiology of these diseases in Latin America.

The Neurogenetics Research Center has implemented genotyping of the ATXN1, ATXN3, and CACNA1A CAG microsatellites, which has allowed us to diagnose SCA1, MJD/SCA3, and SCA6 and to study the epidemiology of these diseases in the Peruvian population. Little is known about the factors that drive the prevalence and incidence of these diseases in Peru. In this study, we sought to test whether allele size distribution correlates with the frequency of SCA1, MJD/SCA3, and SCA6 in Peru. To this end, we PCR-genotyped 213 individuals of self-reported mestizo ancestry from Northern Lima, Peru, and compared the allele distributions of the ATXN1, ATXN3, and CACNA1A unstable microsatellites with the frequency of SCA1, MJD/SCA3, and SCA6 across populations. We also tested 40 individuals for the presence of CAT interruptions within the ATXN1 microsatellite.

Methods

Subjects

Using the Minsage software, we estimated that 213 subjects were required to obtain a 95% probability of detecting an allele with a population frequency of 1.4% without assuming Hardy–Weinberg equilibrium [37]. We chose this target because 1.4% was the frequency of the rarest allele observed in Brazilians [32], at any of the studied genes, that was frequent enough (≥ 1%) not to be considered a mutation. We therefore recruited 213 healthy unrelated individuals from Northern Lima over a 6-month period in 2013. All participants were self-declared mestizo adults (admixed Peruvian individuals of mostly Amerindian and European ancestry) with Peruvian ancestors for at least two generations; Lima's population has admixed ancestry, mostly Native American and European [38]. Participants were 26.3% male, with ages ranging from 18 to 84 years (median 36, interquartile range 29.25). Trained neurologists performed a standardized neurological examination on all participants to exclude current gait instability and any other neurodegenerative disorder. The study was approved by the Institutional Review Board of the Instituto Nacional de Ciencias Neurológicas, and informed consent was obtained from all individuals.
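The sample-size target can be reproduced with a simple detection-probability argument. The sketch below is our own reconstruction, not Minsage's documented algorithm: we assume that "without Hardy–Weinberg equilibrium" corresponds to the worst case in which every copy of the allele sits in a homozygote, so each individual is a single independent draw with carrier probability p. Under Hardy–Weinberg equilibrium, the 2n chromosomes are treated as independent draws instead.

```python
import math

def detection_probability(p, n, assume_hwe=False):
    """Probability of observing at least one copy of an allele of frequency p.

    Under HWE the 2n chromosomes are independent draws. Without HWE we assume
    the worst case, in which every copy sits in a homozygote, so only the n
    individuals act as independent draws (carrier frequency equals p).
    """
    draws = 2 * n if assume_hwe else n
    return 1.0 - (1.0 - p) ** draws

def min_sample_size(p, power=0.95, assume_hwe=False):
    """Smallest number of individuals giving at least `power` detection probability."""
    draws_needed = math.log(1.0 - power) / math.log(1.0 - p)
    return math.ceil(draws_needed / 2) if assume_hwe else math.ceil(draws_needed)

print(min_sample_size(0.014))                   # 213
print(min_sample_size(0.014, assume_hwe=True))  # 107
print(round(detection_probability(0.014, 213), 3))  # 0.95
```

Under this worst-case assumption the calculation returns 213 individuals, matching the figure reported above; assuming Hardy–Weinberg equilibrium would roughly halve the requirement.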

Samples

Genomic DNA was extracted from peripheral blood samples by the salting-out method and stored at −20 °C until analysis [39]. DNA concentration and quality were assessed with an Epoch spectrophotometer (BioTek®, Winooski, USA).

Genotyping

Samples were genotyped for CAG repeat size at ATXN1, ATXN3, and CACNA1A. Each locus was amplified by PCR in a separate reaction using an ABI Prism thermal cycler. Primers for (CAG)n in ATXN1 were Rep1: 5′-GTACGTCCACATTTCCAGTT-3′ and Rep2: 3′-CAACATGGGCAGTCTGAG-5′ [40]; for (CAG)n in ATXN3, MJD52: 5′-CCAGTGACTACTTTGATTCG-3′ and MJD25: 3′-AACCCTCACTAGATCCATTC-5′ [41]; and for CACNA1A, S-5-F1: 5′-CACGTGTCCTATTCCCCTGTGATCC-3′ and S-5-R1: 5′-CACCAGCGGCCCTCGGAGGTACCCA-3′ [42]. PCR products were sized by 6% polyacrylamide gel electrophoresis (PAGE). Briefly, we used samples of known genotype (previously sized by capillary electrophoresis) to fit a linear regression of repeat size on migration distance in the gel. We then used this linear model to predict the repeat size of unknown samples from their migration distance.
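The calibration step described above amounts to an ordinary least-squares fit of repeat size on migration distance, followed by prediction for unknown bands. The sketch below follows the text in using a straight line; the ladder distances and repeat sizes are hypothetical, and the sketch makes no claim about how well a linear model fits outside the calibrated range.

```python
def fit_line(x, y):
    """Ordinary least-squares fit y = intercept + slope * x."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxx = sum((xi - mx) ** 2 for xi in x)
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    slope = sxy / sxx
    return my - slope * mx, slope

# HYPOTHETICAL calibration samples: migration distance (mm) of bands whose
# repeat size was previously determined by capillary electrophoresis.
ref_distance = [60.0, 50.0, 40.0, 30.0]
ref_repeats = [20, 25, 30, 35]

intercept, slope = fit_line(ref_distance, ref_repeats)

def predict_repeats(distance):
    # Round to the nearest whole repeat, since alleles differ by whole triplets.
    return round(intercept + slope * distance)

print(predict_repeats(44.0))  # 28
```

Larger alleles migrate less, so the fitted slope is negative; in practice each gel would need its own calibration samples.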

Because the mutability of the ATXN1 CAG microsatellite is influenced not only by repeat tract length but also by the presence of CAT interruptions, we also assessed this interruption in healthy individuals. We selected the forty individuals carrying the largest ATXN1 normal alleles and tested them for the presence of CAT interruptions by restriction fragment length polymorphism (RFLP) analysis. PCR products were digested with LweI (SfaNI; 10 U/μL, Fermentas Life Sciences) for 5 h at 37 °C and then resolved by 8% PAGE.
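The RFLP assay works because a CAT codon flanked by CAG codons creates the LweI/SfaNI recognition sequence GCATC (CAG CAT CAG contains G‑CAT‑C), which a pure CAG tract lacks. The sketch below performs an in silico scan of a repeat tract for this site on both strands. It assumes the tract alone is scanned; real amplicons include flanking sequence that may contain additional sites, and a CAT not preceded by G would not create a site. The sequences are hypothetical.

```python
SFANI_SITE = "GCATC"     # LweI/SfaNI recognition sequence (top strand)
SFANI_SITE_RC = "GATGC"  # the same site read on the opposite strand

def find_all(seq, motif):
    """Return start positions of (possibly overlapping) motif occurrences."""
    hits, start = [], seq.find(motif)
    while start != -1:
        hits.append(start)
        start = seq.find(motif, start + 1)
    return hits

def sfani_sites(seq):
    """Positions of SfaNI sites on either strand of a DNA string."""
    seq = seq.upper()
    return sorted(find_all(seq, SFANI_SITE) + find_all(seq, SFANI_SITE_RC))

pure = "CAG" * 30
interrupted = "CAG" * 12 + "CAT" + "CAG" * 5 + "CAT" + "CAG" * 11
print(len(sfani_sites(pure)), len(sfani_sites(interrupted)))  # 0 2
```

A digested band pattern therefore reveals that at least one interruption is present; inferring the exact number from fragment sizes would require more careful analysis than this sketch.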

Statistical Analysis

Hardy–Weinberg equilibrium was tested with Genepop v4.5.1 [43]. For each gene, alleles were classified as normal or large normal (LN) according to previously reported thresholds: alleles with more than 31, 27, and 13 CAG repeats at ATXN1, ATXN3, and CACNA1A, respectively, were considered LN [44]. Normality of each allele size distribution was assessed with the Shapiro–Wilk (S-W) test. LN allele frequencies were compared across populations with chi-squared and Fisher's exact tests (α = 0.05) in Stata v12. To adjust for multiple testing, we applied the Bonferroni correction, declaring a test significant only if its p value was below 0.05 divided by the total number of comparisons. We estimated 95% confidence intervals for LN allele frequencies with the Clopper–Pearson method [45], as implemented in the binom package [46] for R v3.5 [47].
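The allele classification, exact confidence interval, and Bonferroni cutoff described above can be sketched in a few lines. This is a minimal pure-Python illustration, not the binom package's implementation: the Clopper–Pearson bounds are found by bisection on binomial tail probabilities instead of beta quantiles, which gives the same interval up to numerical precision. The LN counts are those reported in the Results (out of 2 × 213 alleles); the number of comparisons passed to the Bonferroni function is hypothetical.

```python
import math

# LN thresholds from the Methods: alleles with MORE than this many CAG repeats.
LN_THRESHOLD = {"ATXN1": 31, "ATXN3": 27, "CACNA1A": 13}

def is_large_normal(gene, repeats):
    return repeats > LN_THRESHOLD[gene]

def binom_cdf(k, n, p):
    """P(X <= k) for X ~ Binomial(n, p)."""
    return sum(math.comb(n, i) * p**i * (1 - p) ** (n - i) for i in range(k + 1))

def _root_increasing(f, lo=0.0, hi=1.0, iters=100):
    """Bisection for the root of a function that increases on [lo, hi]."""
    for _ in range(iters):
        mid = (lo + hi) / 2
        if f(mid) < 0:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2

def clopper_pearson(x, n, alpha=0.05):
    """Exact (Clopper-Pearson) two-sided CI for a binomial proportion x/n."""
    # Lower bound: p at which P(X >= x) = alpha/2 (this tail grows with p).
    lower = 0.0 if x == 0 else _root_increasing(
        lambda p: (1 - binom_cdf(x - 1, n, p)) - alpha / 2)
    # Upper bound: p at which P(X <= x) = alpha/2 (this tail shrinks with p).
    upper = 1.0 if x == n else _root_increasing(
        lambda p: alpha / 2 - binom_cdf(x, n, p))
    return lower, upper

def bonferroni_cutoff(alpha, n_comparisons):
    """Per-test significance threshold after Bonferroni correction."""
    return alpha / n_comparisons

N_ALLELES = 2 * 213  # two chromosomes per genotyped individual
for gene, ln_count in (("ATXN1", 68), ("ATXN3", 40), ("CACNA1A", 31)):
    lo, hi = clopper_pearson(ln_count, N_ALLELES)
    print(f"{gene}: {ln_count / N_ALLELES:.1%} (95% CI {lo:.1%} to {hi:.1%})")

# Hypothetical: 10 population comparisons give a per-test cutoff of 0.005.
print(bonferroni_cutoff(0.05, 10))
```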

The relationship between LN allele frequency and relative SCA frequency across populations was assessed with Spearman's rank correlation coefficient. All figures were created using R v3.5.
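Spearman's coefficient is Pearson's correlation computed on ranks, with tied values receiving their average rank. The sketch below implements it and, for small numbers of populations, an exact two-sided p value by enumerating all permutations. It is a pure-Python illustration rather than the R routine used in the study, and the six population values are hypothetical, not the data in Table 1 or Fig. 2.

```python
import itertools
import math

def average_ranks(values):
    """1-based ranks; tied values receive the mean of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        for k in range(i, j + 1):
            ranks[order[k]] = (i + j) / 2 + 1
        i = j + 1
    return ranks

def pearson(x, y):
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)

def spearman(x, y):
    # Spearman's rho is Pearson's r computed on the ranks.
    return pearson(average_ranks(x), average_ranks(y))

def permutation_p(x, y):
    """Exact two-sided p value over all permutations of y (small n only)."""
    observed = abs(spearman(x, y))
    perms = list(itertools.permutations(y))
    extreme = sum(abs(spearman(x, list(p))) >= observed - 1e-12 for p in perms)
    return extreme / len(perms)

# HYPOTHETICAL: LN allele frequency vs relative SCA frequency in 6 populations.
ln_freq = [0.02, 0.05, 0.07, 0.10, 0.12, 0.20]
sca_freq = [0.00, 0.05, 0.10, 0.08, 0.15, 0.25]
rho, p = spearman(ln_freq, sca_freq), permutation_p(ln_freq, sca_freq)
print(f"rho = {rho:.3f}, exact p = {p:.4f}")  # rho = 0.943, exact p = 0.0167
```

Full enumeration grows factorially, so for more than about eight populations a sampled permutation test or an asymptotic approximation would be needed.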

Results

Distribution of CAG Repeat Size of ATXN1, ATXN3, and CACNA1A Genes in Peruvian Population

The ATXN1 allele size distribution was approximately normal (S-W, p > 0.05) and slightly asymmetric, whereas ATXN3 and CACNA1A showed markedly asymmetric, non-normal distributions (S-W, p < 0.05) (Fig. 1). ATXN1 presented 12 different alleles ranging from 24 to 35 CAG repeats, with 29 and 30 repeats being the most frequent. LweI digestion showed that all forty tested individuals carried at least one CAT interruption within the ATXN1 CAG repeat tract on both chromosomes. ATXN3 showed 28 different alleles ranging from 9 to 38 CAG repeats; the most frequent were 14, 23, and 24 repeats. CACNA1A displayed 12 different alleles ranging from 3 to 17 CAG repeats, the most frequent being 11, 12, and 13 repeats. All three loci deviated from Hardy–Weinberg equilibrium (p = 0.014 for ATXN1 and p < 0.001 for ATXN3 and CACNA1A) and showed a heterozygote deficit, as indicated by moderate positive FIS values (0.114, 0.154, and 0.227 for ATXN1, ATXN3, and CACNA1A, respectively).
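A positive FIS signals fewer heterozygotes than expected from allele frequencies. The sketch below uses the simple estimator FIS = 1 − Ho/He, with He corrected for sample size by the factor 2N/(2N − 1). Genepop's own FIS estimators differ in detail, so this illustrates the concept rather than reproducing the values reported above. The genotypes are hypothetical.

```python
from collections import Counter

def fis_estimate(genotypes):
    """Return (Ho, He, FIS) from a list of (allele1, allele2) repeat-size genotypes.

    Ho: observed heterozygosity.
    He: expected heterozygosity with the small-sample correction 2N/(2N-1).
    FIS = 1 - Ho/He; positive values indicate a heterozygote deficit.
    """
    n = len(genotypes)
    ho = sum(a != b for a, b in genotypes) / n
    counts = Counter(allele for pair in genotypes for allele in pair)
    two_n = 2 * n
    gene_diversity = 1 - sum((c / two_n) ** 2 for c in counts.values())
    he = two_n / (two_n - 1) * gene_diversity
    return ho, he, 1 - ho / he

# HYPOTHETICAL ATXN1-like genotypes (CAG repeat numbers), not study data.
genotypes = [(29, 30), (29, 29), (30, 30), (29, 30), (31, 31), (29, 31)]
ho, he, fis = fis_estimate(genotypes)
print(f"Ho={ho:.3f} He={he:.3f} FIS={fis:.3f}")  # Ho=0.500 He=0.712 FIS=0.298
```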

Fig. 1

Distribution of CAG repeat size in ATXN1, ATXN3, and CACNA1A genes. The X-axis represents the CAG repeat number and the Y-axis the percent frequency of each allele

Frequency of LN Alleles

Of the 426 alleles genotyped per gene, 68 (16.0%), 40 (9.4%), and 31 (7.3%) were LN alleles at ATXN1, ATXN3, and CACNA1A, respectively. Comparisons of LN allele frequency between Peru and other populations did not reveal a consistent association with the relative frequency of SCA1, MJD/SCA3, or SCA6 (Table 1). LN allele frequency was not correlated with the relative frequency of SCA1 or MJD/SCA3 (Spearman, p > 0.05). However, the frequency of CACNA1A LN alleles correlated significantly with the relative frequency of SCA6 (Fig. 2).
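Each population comparison in Table 1 reduces to a 2 × 2 table of LN versus normal alleles. The sketch below implements a Pearson chi-squared test without continuity correction (using the fact that, for 1 degree of freedom, the upper tail equals erfc(√(χ²/2))) and a two-sided Fisher's exact test that sums all tables with the same margins that are no more probable than the observed one. It is an illustration, not the Stata procedure used in the study. The Peruvian ATXN1 counts come from the Results; the comparison population is hypothetical.

```python
import math

def chi2_2x2(a, b, c, d):
    """Pearson chi-squared test without continuity correction for [[a, b], [c, d]]."""
    n = a + b + c + d
    observed = ((a, b), (c, d))
    rows, cols = (a + b, c + d), (a + c, b + d)
    stat = 0.0
    for i in range(2):
        for j in range(2):
            expected = rows[i] * cols[j] / n
            stat += (observed[i][j] - expected) ** 2 / expected
    # For 1 degree of freedom, P(chi2 > stat) = erfc(sqrt(stat / 2)).
    return stat, math.erfc(math.sqrt(stat / 2))

def fisher_2x2(a, b, c, d):
    """Two-sided Fisher exact p: sum of tables (fixed margins) no more probable than observed."""
    r1, r2, c1 = a + b, c + d, a + c
    total = math.comb(r1 + r2, c1)
    support = range(max(0, c1 - r2), min(r1, c1) + 1)
    probs = {x: math.comb(r1, x) * math.comb(r2, c1 - x) / total for x in support}
    p_obs = probs[a]
    # Small relative tolerance so tables tied with the observed one are included.
    return min(1.0, sum(p for p in probs.values() if p <= p_obs * (1 + 1e-7)))

# Peru (ATXN1): 68 LN alleles out of 426, from the Results.
# Comparison population: HYPOTHETICAL 20 LN alleles out of 400.
peru_ln, peru_n = 68, 426
other_ln, other_n = 20, 400
table = (peru_ln, peru_n - peru_ln, other_ln, other_n - other_ln)
print(chi2_2x2(*table))
print(fisher_2x2(*table))
```

Each resulting p value would then be compared with the Bonferroni-adjusted cutoff rather than with 0.05.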

Table 1 Comparison of the frequency of LN alleles for the CAG repeats in ATXN1, ATXN3, and CACNA1A between the Peruvian population and other populations
Fig. 2

Population frequency of CACNA1A LN alleles correlates with relative SCA6 frequency across populations

Discussion

The frequency of LN alleles at disease-causing unstable microsatellites has been proposed to be associated with the prevalence of the corresponding diseases, including SCAs [23, 24, 44], presumably because mutation rate increases with microsatellite length, even within the non-pathogenic range [22, 50]. In this study, we sought to test whether allele size distribution correlates with the frequency of SCA1, MJD/SCA3, and SCA6 in Peru. For this purpose, we PCR-genotyped 213 self-reported mestizo individuals from Northern Lima, Peru, and compared the allele distributions of the ATXN1, ATXN3, and CACNA1A unstable microsatellites across populations with differing frequencies of SCA1, MJD/SCA3, and SCA6. The frequency of LN alleles in the studied population was 16%, 9.4%, and 7.3% for ATXN1, ATXN3, and CACNA1A, respectively, and was not consistently associated with the relative frequencies of SCA1, MJD/SCA3, and SCA6 across populations.

Observed allele size ranges were similar to those reported in other populations (Fig. 1). The ATXN1 allele size range in Peruvians (24–35 CAG repeats) was similar to that of Mexican (26–38) [5], Chinese (26–35) [28], and African (21–36) [51] populations. The ATXN3 allele size range (9–38) was similar to that of other Latin American populations, such as Mexicans (14–34) [5] and Brazilians (12–34) [32], as well as to that of the Portuguese (14–36) [52] and other populations across the globe [14, 16, 44]. The CACNA1A repeat range (3–17) was similar to previous reports worldwide [5, 44, 53, 54]. We did not find any individual carrying expanded or mutation-prone intermediate alleles at any of the genes analyzed.

The frequency of ATXN1 LN alleles in Peru (16%) was moderate compared with other populations (Table 1). Interestingly, it was similar to that of Caucasians (16%), even though the relative frequency of SCA1 in Caucasians (15%) is much higher than in Peru (2.6%) [44, 49]. Given the intermediate frequency of ATXN1 LN alleles in Peruvians (Table 1), a moderate frequency of SCA1 cases would be expected in Peru under the hypothesis that LN alleles are a source of de novo repeat expansions [23, 24, 44]. Moreover, the frequency of ATXN1 LN alleles in the Mexican population (27.5%) is among the highest worldwide, yet no SCA1 cases have been reported there [5]. Furthermore, populations with differing frequencies of SCA1, such as Brazilian (Paraná) [32], Northern Indian [16], Caucasian [44], and Portuguese [9] populations, do not show statistically significant differences in LN allele frequency. In particular, Siberian Yakuts show a markedly high relative frequency of SCA1 (88.1%), but their frequency of LN alleles (11.2%) is similar to that of Peruvians (16.0%, Table 1). These results seem to challenge the role of LN alleles in influencing SCA1 prevalence; however, if this effect exists, it may be stronger in populations with a high frequency of alleles carrying few CAT interruptions. For instance, the frequency of singly interrupted CAG tracts in Siberian Yakuts (65%) is much greater than that in populations where SCA1 has a low prevalence, such as the Chinese Han (8%) [28]. Similarly, the frequency of LN alleles in India is similar to that of Caucasian populations, but SCA1 prevalence in India is higher; this difference has been attributed to the frequency of singly interrupted ATXN1 alleles (2% in Poles versus 16% in Indians) [12, 36].

All 40 samples analyzed tested positive for CAT interruptions within the CAG tract on both chromosomes. Sequences interrupting (CAG)n tracts destabilize repeat-induced secondary DNA structures and impair the formation of hairpins that promote expansions during DNA synthesis [18, 19, 26]. Thus, the lack of agreement between LN allele frequency and SCA1 frequency in Peruvians might be explained by a high frequency of interrupted alleles, possibly carrying two or more CAT interruptions, in this population. However, because our RFLP assay detected only the presence of interruptions, further analyses quantifying the number of CAT interruptions per CAG tract are required to test this hypothesis.

The frequency of LN alleles at the ATXN3 locus was also intermediate in Peru (9.4%) compared with the most extreme values, such as those of Southern India (1%) and Japan (21%) [12, 44]. The LN allele frequency in Peruvians was similar to that of Spanish [53], Northern Indian [16], and Caucasian [44] populations, where the relative frequency of SCA3 ranges between 5 and 30%. However, the LN allele frequency in Peru was greater than in Mexican [5], Southern Brazilian [32], and Australian [14] populations, which have a low frequency of LN alleles (1.75–5%) but a high relative frequency of SCA3 (12–74%) compared with Peru (5.26%) [49]. Similarly, Czech and Finnish populations have an LN allele frequency similar to that of Peruvians but no reported cases of SCA3 [10, 11]. These seemingly random differences and similarities in allele frequencies between divergent populations could be explained by genetic drift [55]. Therefore, we could not identify a clear-cut correlation between LN allele frequency and relative SCA3 frequency, as suggested in a previous study [44]. This is consistent with the few putative origins of the SCA3 mutation: two common haplotypes are associated with the SCA3 mutation in populations worldwide, and these haplotypes are not associated with large normal alleles in healthy populations [56, 57].
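The genetic drift explanation can be illustrated with a neutral Wright–Fisher model, our illustrative choice since the text does not specify a model. In the sketch below, several hypothetical isolated populations start with the same LN allele frequency, and each generation the 2N alleles are resampled at random from the previous generation's pool. Population size, starting frequency, and number of generations are arbitrary assumptions.

```python
import random

def wright_fisher(p0, n_individuals, generations, rng):
    """Simulate an allele frequency under neutral drift in a diploid population."""
    two_n = 2 * n_individuals
    p = p0
    for _ in range(generations):
        # Each of the 2N alleles in the next generation is drawn from the current pool.
        count = sum(rng.random() < p for _ in range(two_n))
        p = count / two_n
    return p

rng = random.Random(2013)
# HYPOTHETICAL: five isolated populations founded with the same LN frequency.
finals = [wright_fisher(0.10, 100, 200, rng) for _ in range(5)]
print([round(f, 3) for f in finals])
```

Even without selection or differences in mutation rate, the simulated frequencies typically diverge across replicates, which is the kind of unstructured variation the drift argument invokes.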

The frequency of CACNA1A LN alleles in Peru was moderate (7.28%) compared with that of other populations, such as Northern India (1.7%) and Japan (20%) (Table 1) [16, 44]. Interestingly, populations with a higher frequency of CACNA1A LN alleles tended to have a higher relative frequency of SCA6 (Fig. 2), consistent with previous studies [44]. However, this correlation was largely driven by a single data point, the Japanese population: removing it rendered the correlation non-significant (p = 0.0745). Further studies are necessary to confirm the observed correlation.
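The sensitivity check described above is a leave-one-out analysis: the correlation is recomputed with each population omitted in turn, and a large change flags an influential point. The sketch below recomputes Spearman's rho for each omission; a full analysis would also recompute the p value for each reduced data set. The population labels and values are hypothetical, not those in Fig. 2.

```python
import math

def ranks(values):
    """1-based average ranks (ties share the mean of their positions)."""
    s = sorted(values)
    return [(2 * s.index(v) + s.count(v) + 1) / 2 for v in values]

def spearman(x, y):
    rx, ry = ranks(x), ranks(y)
    n = len(rx)
    mx, my = sum(rx) / n, sum(ry) / n
    num = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    den = math.sqrt(sum((a - mx) ** 2 for a in rx) * sum((b - my) ** 2 for b in ry))
    return num / den

def leave_one_out(labels, x, y):
    """Spearman's rho recomputed with each population omitted in turn."""
    return {lab: spearman(x[:i] + x[i + 1:], y[:i] + y[i + 1:])
            for i, lab in enumerate(labels)}

# HYPOTHETICAL populations: LN allele frequency vs relative SCA6 frequency.
labels = ["pop1", "pop2", "pop3", "pop4", "pop5", "pop6"]
ln_freq = [0.02, 0.05, 0.07, 0.10, 0.12, 0.20]
sca6 = [0.00, 0.05, 0.10, 0.08, 0.15, 0.25]
print(round(spearman(ln_freq, sca6), 3))
for lab, rho in leave_one_out(labels, ln_freq, sca6).items():
    print(lab, round(rho, 3))
```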

Overall, we found no evidence of an association between the relative frequencies of SCA1 and MJD/SCA3 and the frequency of LN alleles at their causal genes. CACNA1A LN alleles showed a statistically significant correlation with SCA6 frequency, but its robustness remains doubtful. Thus, our results do not support a previous proposal that LN allele frequency is associated with SCA prevalence [44].

Further studies are needed to clarify the role of LN allele frequency, especially for ATXN1 and CACNA1A. Future analyses of ATXN1 allele distribution should quantify the number of CAT interruptions, which we did not do in this study, since this feature appears to play a major role in ATXN1 repeat instability. The correlation between CACNA1A LN allele frequency and SCA6 frequency should be confirmed with data from additional populations. Future studies should also use disease prevalence rather than relative disease frequency, because the relative frequency of any particular SCA is affected by the epidemiology of the other disorders.
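The problem with relative frequency can be seen with a small arithmetic example: two hypothetical populations of equal size with identical SCA6 case counts, but different numbers of SCA3 cases, yield the same SCA6 prevalence and very different relative SCA6 frequencies. All counts below are invented for illustration.

```python
def relative_frequency(cases, subtype):
    """Share of one SCA subtype among all SCA cases counted in a population."""
    return cases[subtype] / sum(cases.values())

def prevalence_per_100k(cases, subtype, population):
    """Cases of one subtype per 100,000 inhabitants."""
    return cases[subtype] * 100_000 / population

# HYPOTHETICAL: identical SCA6 burden, different SCA3 burden.
pop_a = {"SCA1": 10, "SCA3": 20, "SCA6": 10}
pop_b = {"SCA1": 10, "SCA3": 180, "SCA6": 10}
for name, cases in (("A", pop_a), ("B", pop_b)):
    rel = relative_frequency(cases, "SCA6")
    prev = prevalence_per_100k(cases, "SCA6", 1_000_000)
    print(name, rel, prev)  # A 0.25 1.0 / B 0.05 1.0
```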