Introduction

Spinocerebellar ataxia type 10 (SCA10, OMIM #603516) is a late-onset autosomal dominant cerebellar ataxia with variably associated extracerebellar findings, most characteristically epilepsy, but also psychiatric disorders, cognitive disability, polyneuropathy, and pyramidal signs (Teive et al. 2004, 2010, 2011; Ashizawa 2012). The clinical spectrum apparently varies with the patients' ethnic background (Teive et al. 2004, 2010; Rasmussen et al. 2001). The disease is caused by an expansion of the pentanucleotide repeat ATTCT in intron 9 of the ATXN10 gene, located on chromosome 22q13.3 (Teive et al. 2011; Ashizawa 2012). Normal alleles range from 9 to 32 repeats, whereas alleles from affected subjects range from 800 up to 4500 repeats (Matsuura et al. 1999, 2000; Grewal et al. 1998, 2002). Reduced penetrance has been reported for intermediate-size alleles of 280–850 repeats (Alonso et al. 2006; Raskin et al. 2007).

SCA10 has a geographical distribution throughout the Americas. Until very recently, it had only been described in patients from Mexico, Brazil, Argentina, Venezuela, Colombia, the USA, and Peru (Matsuura et al. 2002; Rasmussen et al. 2001; Gatto et al. 2007; Gallardo and Soto 2009; Roxburgh et al. 2013; Kapur and Goldman 2012; Bushara et al. 2013; Leonardi et al. 2014; Teive and Ashizawa 2013). Interestingly, with the exception of one Japanese and one Chinese Han family, patients have a recognized family history of Amerindian ancestry, and the literature reports two affected subjects descending from the Yaqui of Mexico and from the Sioux, also of North America (Kapur and Goldman 2012; Bushara et al. 2013; Wang et al. 2015; Naito et al. 2017). Previous haplotype studies in a limited number of Brazilian and Mexican families have shown a common shared haplotype (Almeida et al. 2009).

The recent description of two SCA10 families from Asia suggests that the original SCA10 mutation might have occurred very early, before the divergence of Proto-Amerindians from ancestral Asians (Wang et al. 2015; Naito et al. 2017). SCA10 is now present in admixed American populations as a result of admixture with Europeans and Africans who arrived in the Americas in the sixteenth and seventeenth centuries (Avena et al. 2012; Price et al. 2007; Lins et al. 2010; Seldin et al. 2007). However, the ancestral origin of SCA10 mutations remains unclear, since a possible single origin still has to be confirmed by more extensive analyses. SCA10 is a rare disease belonging to a group of genetically heterogeneous neurodegenerative disorders; more than 40 loci with overlapping clinical presentations have so far been associated with spinocerebellar ataxia. Founder effects are known to play an important role in the prevalence of genetic disorders restricted to certain geographical areas. Elucidating the natural history of the disease should therefore assist clinicians, raise awareness of the disease, and support more comprehensive genetic counselling, ultimately improving patients' quality of life. Here, we present a larger study of patients from two geographical regions, providing molecular evidence to test the hypothesis of an Amerindian ancestral founder mutation in SCA10 patients.

Materials and Methods

SCA10 Families and Control Population

This study included 16 Brazilian and 21 Peruvian unrelated families, comprising 29 and 27 individuals, respectively. The Brazilian families come from five states: 7 from Rio Grande do Sul, 3 from Santa Catarina, 3 from São Paulo, 1 from Goiás, and 2 from Rio Grande do Norte. The Peruvian families also come from different regions of Peru, mainly the Pacific coast and the highlands. Samples from Peru and from Brazilian states other than Rio Grande do Sul were obtained through the Rede Neurogenética. Inclusion criteria were clinical signs and symptoms suggestive of ataxia and a confirmed molecular diagnosis of SCA10 (an expanded ATTCT allele). Samples from 49 unrelated healthy Amerindian “Quechua” individuals from southern Peru and 51 unrelated Brazilian individuals were included as controls (Almeida et al. 2009). All participants provided signed informed consent. This study is part of a more comprehensive project approved by the Ethics Committee of the Clinical Hospital of Porto Alegre (Project #07-259).

ATXN10 Mutation Analysis

Peripheral blood samples (5 mL) were collected from patients and healthy individuals. Genomic DNA was isolated from peripheral blood leucocytes using the salting-out method (Miller et al. 1998). Expanded alleles were detected by repeat-primed polymerase chain reaction (RP-PCR) using a locus-specific primer, a repeat-specific primer, and a tailed primer (Table S1), under previously described conditions (Matsuura et al. 2002).
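The logic of RP-PCR can be illustrated with a toy model. Because the repeat-specific primer can anneal at many positions inside the ATTCT tract, each allele yields a ladder of products spaced by the 5-bp repeat unit, and an expanded allele produces a ladder that continues well beyond the size range of normal alleles. The sketch below is a simplified illustration under stated assumptions, not the protocol used in this study: the flank length (`flank_bp`), the detection window (`window_bp`), and the function names are hypothetical, and the tailed primer is not modelled.

```python
REPEAT_UNIT_BP = 5  # length of one ATTCT unit


def rp_pcr_ladder(repeat_count, flank_bp=100, window_bp=500):
    """Toy model of RP-PCR product sizes (bp) for a single allele.

    Assumption: the repeat-specific primer primes once from each ATTCT unit,
    so products grow in 5-bp steps from flank_bp; products longer than the
    capillary detection window (window_bp) are not observed. The tailed
    primer is ignored.
    """
    sizes = (flank_bp + REPEAT_UNIT_BP * k for k in range(1, repeat_count + 1))
    return [size for size in sizes if size <= window_bp]


def looks_expanded(ladder, normal_max_repeats=32, flank_bp=100):
    """True if the ladder extends past the largest normal allele (32 repeats)."""
    threshold = flank_bp + REPEAT_UNIT_BP * normal_max_repeats
    return any(size > threshold for size in ladder)


normal = rp_pcr_ladder(14)      # a normal-range allele
expanded = rp_pcr_ladder(1500)  # an expanded allele; the ladder runs to the window edge
print(len(normal), normal[-1], max(expanded))          # 14 170 500
print(looks_expanded(normal), looks_expanded(expanded))  # False True
```

The cutoff only encodes the 32-repeat upper bound of normal alleles given in the Introduction; real electropherograms would additionally need peak detection and quality checks.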

Genotyping of Polymorphisms Flanking the ATXN10 Gene

All subjects were genotyped for four highly polymorphic short tandem repeats (STRs) and four intragenic single nucleotide polymorphisms (SNPs). Haplotypes were initially constructed with the four SNPs (rs5764850 C/A, rs72556348 G/A, rs72556349 G/A, rs72556350 C/T) and two dinucleotide STRs (D22S1140 and D22S1153), as previously published (Almeida et al. 2009; Gheno et al. 2017). To gain deeper insight into the ancestral origin of the haplotype, two additional tetranucleotide STRs (D22S532 and GATA030P) were added to the original marker set, which together span a 5.2 cM region harbouring the ATTCT repeats. These markers were selected based on two criteria: (1) physical distance from the SCA10 locus and (2) the presence of alleles more frequent in Native Americans than in other populations. The STR dataset used for this selection has been published elsewhere (Wang et al. 2007). The genomic positions of the markers and of the mutation are shown in Fig. 1.

Fig. 1

Diagram of the chromosome 22 region including the ATXN10 gene, which harbours the SCA10 mutation. The ATTCT repeat and flanking regions are shown with exons (vertical lines) and intergenic/intronic regions (horizontal line), indicating the locations of the four intragenic SNPs and four microsatellite markers used for full haplotype reconstruction. Distances from the ATTCT expansion are: −1.5 Mb for D22S1140; −68 kb for D22S532; −1198 bp for rs5764850; +48 bp for rs72556348; +303 bp for rs72556349; +370 bp for rs72556350; +21.3 kb for D22S1153; and +78.5 kb for GATA030P
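The haplotype names used in the Results (for example 19-CGGC-14 and 19-15-CGGC-14-10) appear to list alleles in physical order along chromosome 22, with the four SNP alleles written as one block and STR alleles given as repeat counts. The sketch below, which assumes exactly that convention, orders markers by the offsets in the Fig. 1 caption and builds the label; `MARKER_OFFSETS` and `haplotype_label` are illustrative names, not part of any published tool.

```python
# Offsets (bp) of each marker from the ATTCT expansion, from the Fig. 1 caption.
MARKER_OFFSETS = {
    "D22S1140": -1_500_000,
    "D22S532": -68_000,
    "rs5764850": -1_198,
    "rs72556348": 48,
    "rs72556349": 303,
    "rs72556350": 370,
    "D22S1153": 21_300,
    "GATA030P": 78_500,
}
SNPS = {"rs5764850", "rs72556348", "rs72556349", "rs72556350"}


def haplotype_label(alleles):
    """Build a label such as '19-CGGC-14' from {marker: allele}.

    Assumed convention: markers in physical order; adjacent SNP alleles are
    concatenated into one block; STR alleles are repeat counts.
    """
    tokens = []  # list of [is_snp, text]
    for marker in sorted(alleles, key=MARKER_OFFSETS.__getitem__):
        allele = str(alleles[marker])
        if marker in SNPS and tokens and tokens[-1][0]:
            tokens[-1][1] += allele  # extend the current SNP block
        else:
            tokens.append([marker in SNPS, allele])
    return "-".join(text for _, text in tokens)


core = {"D22S1140": 19, "rs5764850": "C", "rs72556348": "G",
        "rs72556349": "G", "rs72556350": "C", "D22S1153": 14}
extended = {**core, "D22S532": 15, "GATA030P": 10}
print(haplotype_label(core))      # 19-CGGC-14
print(haplotype_label(extended))  # 19-15-CGGC-14-10
```

Under this reading, the two tetranucleotide markers sit on either side of the original core, which is why the extended label places D22S532 after D22S1140 and GATA030P after D22S1153.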

The polymorphic regions were amplified with the primer sequences listed in Table S1. Three genotyping methods were used. First, rs5764850 was genotyped by real-time PCR using a validated TaqMan® assay (Assay ID C_30474239_39; Applied Biosystems). PCR was performed in a final volume of 8 μL containing 4 ng of DNA, 0.2 μL of the specific TaqMan assay, and 4 μL of 2× PCR Genotyping Master Mix (Applied Biosystems). Cycling included an initial step at 50 °C for 2 min (activation of AmpErase UNG) and 95 °C for 10 min (AmpliTaq® Gold activation), followed by 40 cycles of denaturation at 95 °C for 15 s and annealing/extension at 60 °C for 1 min. The allelic discrimination step was performed at 60 °C for 1 min. Genotypes were called from the allelic discrimination plot using Sequence Detection System software version 1.2.1 on the ABI Prism 7500 Sequence Detector System (Applied Biosystems).
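Genotype calling from an allelic discrimination plot can be sketched as classifying each sample by the angle of its point in the plane of the two reporter dyes. This is a hypothetical rule for illustration only, not the algorithm implemented in the Sequence Detection System software: the dye-to-allele assignment (`allele_vic`, `allele_fam`), the minimum signal, and the angle cutoff are all assumptions.

```python
import math


def call_genotype(vic, fam, min_signal=0.5, homozygote_angle=25.0,
                  allele_vic="C", allele_fam="A"):
    """Hypothetical genotype call from normalised VIC and FAM reporter signals.

    The angle of the (vic, fam) point is 0 degrees for a pure VIC signal and
    90 degrees for a pure FAM signal; points near either axis are called
    homozygous, points in between heterozygous. Which allele each dye reports
    (allele_vic, allele_fam) is an assumption, as are both thresholds.
    """
    if math.hypot(vic, fam) < min_signal:
        return None  # too little signal: e.g. no-template control or failed well
    angle = math.degrees(math.atan2(fam, vic))
    if angle < homozygote_angle:
        return (allele_vic, allele_vic)
    if angle > 90.0 - homozygote_angle:
        return (allele_fam, allele_fam)
    return (allele_vic, allele_fam)


for vic, fam in [(2.0, 0.2), (1.5, 1.4), (0.3, 2.2), (0.1, 0.1)]:
    print(vic, fam, call_genotype(vic, fam))
```

Using an angle rather than raw intensities makes the call insensitive to overall signal strength, which varies with DNA input; commercial software typically clusters all samples on a plate jointly instead.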

The remaining three SNPs were genotyped by DNA sequencing of a single 483-bp fragment. A 25 μL reaction was prepared containing 100 ng of genomic DNA, 5.0 pmol of each primer (ATXN10SNPR and ATXN10SNPF1), 200 μM dNTPs (Amersham-Pharmacia), 10 mM Tris–HCl (pH 8.3), 50 mM KCl, 0.75 mM MgCl2, and 1.5 U of Taq DNA Polymerase (Invitrogen). Amplification was performed in a thermal cycler as follows: initial denaturation at 94 °C for 10 min; 30 cycles of 94 °C for 30 s, 63 °C for 1 min, and 72 °C for 30 s; and a final extension at 72 °C for 10 min. Amplification was verified by electrophoresis of an aliquot on a 1.5% (w/v) gel. Amplicons were purified with 2.5 U Exonuclease I and 0.25 U Shrimp Alkaline Phosphatase before the sequencing reaction, which was performed with the BigDye® Terminator v3.1 Cycle Sequencing Kit (Applied Biosystems) according to the manufacturer's instructions. Sequencing products were resolved by capillary electrophoresis on an ABI 3130xl Genetic Analyzer (Applied Biosystems).
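When a heterozygous site is sequenced directly from genomic PCR products, base callers commonly report the two overlapping peaks as an IUPAC ambiguity code (for example R for A/G). The sketch below decodes such calls into genotypes for the three sequenced SNPs. The SNP alleles come from the text, but the positions within the 483-bp amplicon are hypothetical (chosen only to preserve the spacing implied by the Fig. 1 offsets), and `genotype_snps` is an illustrative helper.

```python
# IUPAC codes that Sanger base callers commonly use for two overlapping peaks.
IUPAC_PAIRS = {
    "A": ("A", "A"), "C": ("C", "C"), "G": ("G", "G"), "T": ("T", "T"),
    "R": ("A", "G"), "Y": ("C", "T"), "M": ("A", "C"),
    "K": ("G", "T"), "S": ("C", "G"), "W": ("A", "T"),
}

# SNP alleles from the text; 0-based amplicon positions are HYPOTHETICAL,
# chosen only to keep the +48/+303/+370 bp spacing from Fig. 1.
SNP_SITES = {
    "rs72556348": (100, ("G", "A")),
    "rs72556349": (355, ("G", "A")),
    "rs72556350": (422, ("C", "T")),
}


def genotype_snps(sequence):
    """Map each SNP to a genotype tuple, or None if the base call is unusable."""
    calls = {}
    for snp_id, (pos, expected) in SNP_SITES.items():
        pair = IUPAC_PAIRS.get(sequence[pos].upper())
        if pair is None or not set(pair) <= set(expected):
            calls[snp_id] = None  # 'N' or an unexpected allele: review the trace
        else:
            # report alleles in the order given in the text (e.g. G before A)
            calls[snp_id] = tuple(sorted(pair, key=expected.index))
    return calls


# Hypothetical 483-base read: G at rs72556348, R (G/A) at rs72556349, C at rs72556350.
read = ["N"] * 483
read[100], read[355], read[422] = "G", "R", "C"
print(genotype_snps("".join(read)))
```

Rejecting bases outside the expected allele pair flags possible sequencing artefacts or unexpected variants for manual inspection of the trace.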

Finally, the four STR markers were genotyped simultaneously by multiplex PCR in a final volume of 25 μL containing 20 ng of genomic DNA, 2.5 pmol of each primer, 200 μM dNTPs (Amersham-Pharmacia), 15 mM Tris–HCl (pH 8.3), 75 mM KCl (PCR Buffer II, Applied Biosystems), 0.75 mM MgCl2, and 1 U of AmpliTaq® Gold DNA Polymerase (Applied Biosystems). One primer of each pair was labelled at its 5′ end with a different fluorescent dye to discriminate the fragments. Multiplex PCR was performed in a thermal cycler with an initial denaturation at 95 °C for 10 min, followed by 30 cycles with annealing at 56 °C for 30 s, and a final extension at 72 °C for 5 min. After amplification, an aliquot of PCR products was mixed with formamide (HiDi™ Formamide, Applied Biosystems) and the GeneScan™ 500-LIZ size standard (Applied Biosystems), and capillary electrophoresis was performed on an ABI 3130xl Genetic Analyzer (Applied Biosystems) using a 36 cm × 50 μm capillary containing Performance Optimized Polymer-7 (POP-7, Applied Biosystems). Samples were injected for 22 s at 3.0 kV and electrophoresed at 15 kV for 40 min at 60 °C. Amplicon lengths were estimated against the GeneScan™ 500-LIZ size standard using GeneMapper® 3.2 software (Applied Biosystems). Samples whose allele lengths had been determined by DNA sequencing were used as internal controls in the amplification reaction.
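Sizing against the GeneScan 500-LIZ standard gives fragment lengths in base pairs, whereas alleles are named by repeat count. With a control whose repeat number is known from sequencing, as described above, one way to convert sizes into repeat counts is to subtract the flanking length implied by the control and divide by the repeat-unit length. The sketch assumes the repeat units stated in the text (2 bp for D22S1140 and D22S1153, 4 bp for D22S532 and GATA030P); the fragment sizes in the example and the 0.3-unit tolerance are hypothetical.

```python
# Repeat-unit lengths (bp) of the four STRs, as described in the text.
REPEAT_UNIT_BP = {"D22S1140": 2, "D22S1153": 2, "D22S532": 4, "GATA030P": 4}


def calibrate_offset(control_size_bp, control_repeats, marker):
    """Non-repeat (flanking) length implied by a sequenced control allele."""
    return control_size_bp - control_repeats * REPEAT_UNIT_BP[marker]


def size_to_repeats(size_bp, offset_bp, marker, tolerance=0.3):
    """Nearest whole repeat count for a sized peak, or None if it is off-ladder."""
    units = (size_bp - offset_bp) / REPEAT_UNIT_BP[marker]
    nearest = round(units)
    if abs(units - nearest) > tolerance:
        return None  # more than 0.3 unit from any allele: re-check sizing
    return nearest


# Hypothetical example: a sequenced D22S1153 control with 14 repeats sized at 181.3 bp.
offset = calibrate_offset(181.3, 14, "D22S1153")
print(size_to_repeats(189.2, offset, "D22S1153"))  # 18
print(size_to_repeats(190.2, offset, "D22S1153"))  # None (off-ladder)
```

Calibrating against a sequenced control, rather than trusting the raw size, absorbs the systematic offset between apparent and true length that fluorescent labels and polymer mobility introduce.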

Statistical Analysis

Allele frequencies were determined by direct counting. Haplotypes were inferred using PHASE v. 2.1.1 (Stephens et al. 2001). Statistical analyses were performed with the WINPEPI program. The chi-square test was used to compare allele frequencies between patients and controls, and the null hypothesis was rejected at p < 0.05. Phylogenetic relationships among haplotypes were inferred with Network software 4.6.1.3.
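Direct allele counting and a Pearson chi-square test on a 2 × 2 table (for example, carriers of one haplotype versus all others, in patients versus controls) can be sketched as follows. This is an illustration only and is not guaranteed to match WINPEPI output, which may also report continuity-corrected or exact tests; with small expected counts an exact test is usually preferred. The counts in the example are hypothetical.

```python
import math
from collections import Counter


def allele_frequencies(genotypes):
    """Direct counting: every diploid genotype contributes two alleles."""
    counts = Counter(allele for genotype in genotypes for allele in genotype)
    total = sum(counts.values())
    return {allele: n / total for allele, n in counts.items()}


def chi_square_2x2(a, b, c, d):
    """Pearson chi-square without continuity correction for [[a, b], [c, d]].

    Rows are groups (e.g. patients, controls); columns are categories
    (e.g. haplotype present, absent). With 1 degree of freedom the upper-tail
    p-value equals erfc(sqrt(chi2 / 2)).
    """
    n = a + b + c + d
    chi2 = n * (a * d - b * c) ** 2 / ((a + b) * (c + d) * (a + c) * (b + d))
    return chi2, math.erfc(math.sqrt(chi2 / 2))


print(allele_frequencies([(19, 14), (19, 19)]))  # {19: 0.75, 14: 0.25}
chi2, p = chi_square_2x2(12, 8, 9, 31)  # hypothetical counts
print(f"chi2 = {chi2:.2f}, p = {p:.4f}")
```

The closed-form p-value relies on the fact that a chi-square variable with one degree of freedom is the square of a standard normal variable, so no statistics library is needed for the 2 × 2 case.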

Results

A total of 29 Brazilian (16 families) and 27 Peruvian (21 families) SCA10 patients, as well as 49 Amerindian Quechua controls, were genotyped for eight markers within and flanking the ATXN10 gene (Fig. 1). STR alleles were arbitrarily named according to their number of repeat units. For families with more than one affected member, the genotype of the index case was used in the statistical analysis to avoid counting the same haplotype twice. Considering the intragenic SNPs alone, the CGGC haplotype was conserved in all SCA10 families and, as expected, was the most common SNP haplotype among Brazilians (78.2%), Peruvians (90.5%), and Quechuas (94.9%).

The allele frequency distribution of the genetic markers across all samples is shown in Table S2. D22S532 was the only marker whose allele frequencies differed statistically between each patient group and the Quechua controls. Haplotypes were first constructed with the informative SNPs and STRs D22S1140 and D22S1153, spanning a region of ~1.7 Mb on both sides of the pathogenic expansion. We identified 16 and 13 different haplotypes in Brazilian and Peruvian families, respectively. Of these, 2 Brazilian and 5 Peruvian haplotypes were linked to the SCA10 mutation (Table 1). Amerindian Quechua controls showed 14 different haplotypes. The most frequent shared SCA10-linked haplotype, 19-CGGC-14, was present in 47% of Brazilian and 63% of Peruvian families (Table 2). This haplotype is also present in Amerindian Quechua controls at a very similar frequency (57%), as shown in Table S3. No statistically significant differences were found between patients and Amerindian Quechua controls.
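Because the genotypes are unphased, the haplotypes above had to be inferred statistically (PHASE, see Statistical Analysis). The sketch below only enumerates the haplotype pairs compatible with one individual's multilocus genotype, which shows why inference is needed: each additional heterozygous locus doubles the number of possible phase configurations. PHASE chooses among such configurations with a Bayesian coalescent-based model across the whole sample, which this sketch does not attempt; the genotype in the example is hypothetical.

```python
from itertools import product


def compatible_haplotype_pairs(genotype):
    """All unordered haplotype pairs consistent with an unphased genotype.

    genotype: one (allele1, allele2) tuple per locus, in physical order.
    With h heterozygous loci there are 2 ** (h - 1) distinct pairs (h >= 1).
    """
    het = [i for i, (a, b) in enumerate(genotype) if a != b]
    pairs = set()
    # Keep the first heterozygous locus fixed so each pair is listed once.
    for flips in product((False, True), repeat=max(len(het) - 1, 0)):
        swap = dict(zip(het[1:], flips))
        hap1, hap2 = [], []
        for i, (a, b) in enumerate(genotype):
            if swap.get(i, False):
                a, b = b, a
            hap1.append(a)
            hap2.append(b)
        pairs.add(tuple(sorted((tuple(hap1), tuple(hap2)))))
    return sorted(pairs)


# Hypothetical genotype: D22S1140, four SNPs, D22S1153 (heterozygous at 3 loci).
genotype = [(19, 21), ("C", "C"), ("G", "G"), ("G", "A"), ("C", "C"), (14, 18)]
for h1, h2 in compatible_haplotype_pairs(genotype):
    print("-".join(map(str, h1)), "/", "-".join(map(str, h2)))
```

For this genotype four phase configurations are possible; adding the two tetranucleotide STRs could raise that to sixteen, which is why population-level statistical phasing rather than manual reconstruction is used.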

Table 1 Frequencies of haplotypes associated with SCA10
Table 2 Haplotype 19-CGGC-14 versus other haplotypes among groups

Considering all markers as an extended haplotype, the most frequent one in both patient groups was 19-15-CGGC-14-10.

Fig. 2 compares the frequencies of the most common haplotypes defined by markers D22S532 and GATA030P alone (H2 and H3) in the present study with those of other populations worldwide. Black bars represent the combined frequency of both haplotypes (H2 + H3), which is high in Amerindians (38.2%; America data in Fig. 2), as previously reported by Wang et al. (2007). H2 and H3 correspond to haplotypes #10 and #28, respectively, in our study (Table S4). White bars represent the remaining 46 haplotypes. Interestingly, both haplotypes that are highly prevalent in Amerindians (H2 and H3) were also found at high frequency on the mutant chromosomes of both SCA10 patient groups: 50.0% in Brazilians and 77.0% in Peruvians (Fig. 2).

Fig. 2

Distribution of haplotype frequencies of ancestry-informative markers among ethno-geographical groups and on 19-CGGC-14 SCA10-linked chromosomes in patients. The names of the haplotypes and their alleles are shown in the box at the upper right of the figure. Black bars represent the combined H2 and H3 haplotypes of markers D22S532 and GATA030P, while white bars represent all other haplotypes. Data from Brazilian and Peruvian SCA10 patients were generated in the present study. Data from the seven additional regions come from healthy individuals reported by Wang et al. (2007)

Fig. 3 shows a median-joining network of haplotypes constructed from CGGC plus two dinucleotide STRs (D22S1140 and D22S1153), including SCA10 patients, Quechua controls, and Brazilian controls. The network was built with data from this study and previously published data, always using the same six markers. It shows a similar, unimodal-like pattern across groups.

Fig. 3

Network of CGGC haplotype lineages in SCA10 patients, Quechua controls, and Brazilian controls, showing phylogenetic relationships based on CGGC and two STRs (D22S1140 and D22S1153). H1: 12-CGGC-14; H2: 12-CGGC-18; H3: 12-CGGC-19; H4: 15-CGGC-14; H5: 17-CGGC-14; H6: 18-CGGC-14; H7: 18-CGGC-17; H8: 19-CGGC-14; H9: 18-CGGC-17; H10: 19-CGGC-18; H11: 19-CGGC-20; H12: 20-CGGC-14; H13: 20-CGGC-15; H14: 21-CGGC-17; H15: 21-CGGC-18; H16: 21-CGGC-21; H17: 12-CGGC-17; H18: 15-CGGC-20; H19: 16-CGGC-15; H20: 17-CGGC-15; H21: 17-CGGC-17; H22: 17-CGGC-18; H23: 19-CGGC-21; H24: 20-CGGC-18; H25: 21-CGGC-14; H26: 22-CGGC-14; H27: 16-CGGC-14; H28: 19-CGGC-19; H29: 20-CGGC-20; H30: 21-CGGC-16
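The median-joining algorithm implemented in Network combines minimum spanning trees and adds inferred median vectors. As a simplified sketch of the first ingredient only, the code below builds a minimum spanning tree with Prim's algorithm over haplotypes written in the notation of Fig. 3. It assumes a stepwise distance for STR alleles (absolute repeat difference) and one step per differing SNP base; this weighting is an assumption and may differ from the settings used to draw Fig. 3.

```python
def haplotype_distance(h1, h2):
    """Assumed mutational distance between labels such as '19-CGGC-14'.

    Numeric (STR) fields add their absolute repeat difference (stepwise
    model); the SNP block adds one step per differing base.
    """
    steps = 0
    for f1, f2 in zip(h1.split("-"), h2.split("-")):
        if f1.isdigit() and f2.isdigit():
            steps += abs(int(f1) - int(f2))
        else:
            steps += sum(x != y for x, y in zip(f1, f2))
    return steps


def minimum_spanning_tree(haplotypes):
    """Prim's algorithm on the complete distance graph; returns (from, to, steps) edges."""
    tree = [haplotypes[0]]
    edges = []
    while len(tree) < len(haplotypes):
        best = min(
            ((a, b, haplotype_distance(a, b))
             for a in tree for b in haplotypes if b not in tree),
            key=lambda edge: edge[2],
        )
        edges.append(best)
        tree.append(best[1])
    return edges


for edge in minimum_spanning_tree(["19-CGGC-14", "19-CGGC-18", "20-CGGC-14", "12-CGGC-14"]):
    print(edge)
```

In this toy example every edge attaches to 19-CGGC-14, giving a star-like tree centred on that haplotype; a full median-joining network would also retain equally short alternative links and add median nodes.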

Discussion

SCA10 is a rare disease of unknown prevalence that has been described mostly in Latin American populations (Brusco et al. 2004; Fujigasaki et al. 2002; Sulek-Piatkowska et al. 2010; Vale et al. 2010; Wang et al. 2011). To date, SCA10 has been reported outside the Americas in only three cases: a patient born on the northern coast of Peru, suggesting significant Amerindian ancestry (ref. 19); a Japanese family (Wang et al. 2015); and, more recently, a Chinese Han family (Naito et al. 2017).

In the present study, the common haplotype 19-CGGC-14 was found in a large cohort of SCA10 families from both Brazil and Peru, and the frequency of this SCA10-linked haplotype did not differ among groups (Table 1). This haplotype was originally termed the “SCA10 haplotype” because it was carried by two families from the southernmost state of Brazil and by one Mexican family, suggesting a probable common origin of the mutation (Almeida et al. 2009); it was subsequently found in 19 out of 20 patients in a previous study by our group (Gheno et al. 2017).

Amerindian Quechua controls from Peru were chosen for the present study because of their large contribution to the gene pool of the present-day admixed Peruvian population. All Peruvian SCA10 cases in our cohort come from areas overlapping with traditionally Quechua-speaking regions. Interestingly, the shared haplotype 19-CGGC-14 was the most frequent among Amerindian Quechua controls (57%), a frequency that does not differ statistically from those observed in Brazilian (47%) and Peruvian (63%) patients. However, these frequencies differ significantly from that of Brazilian controls (12%; p < 0.001), as shown in Table 2.

The Americas have a history of extensive admixture among three continental populations: Africans, Europeans, and Native Americans. Brazil is the most admixed country on the continent; Native American ancestry is highest in the northwest (Amazonia), whereas African and European ancestry are more widespread (Ruiz-Linares et al. 2014). This heterogeneity of the Brazilian population may explain the statistical difference between Brazilian patients and controls. The Peruvian population has substantial and widespread Native American ancestry, very low African ancestry, and European ancestry in northern and central areas (Ruiz-Linares et al. 2014). Therefore, under the hypothesis of an Amerindian ancestral origin of the SCA10 mutation, no statistical difference was expected between Peruvian patients and Quechua controls, although samples were not tested for individual admixture proportions.

Nevertheless, these data must be interpreted carefully before claiming an exclusively Amerindian origin of the SCA10 mutation, because this multilocus combination is quite frequent on normal chromosomes of Brazilians, Mexicans, and Portuguese, as previously shown (Almeida et al. 2009).

To address this issue, we extended the haplotype by including two additional STRs, D22S532 and GATA030P. For these two loci, two haplotypes, H2 (13–10) and H3 (15–10), are carried by almost 40% of Amerindians (Wang et al. 2007), unlike populations from other continents, as shown in Fig. 2. We therefore considered both to be good indicators of Amerindian ancestry. The two most frequent extended haplotypes in controls are 19-13-CGGC-14-10 (14%) and 19-15-CGGC-14-10 (13%) (Table S4). Haplotype 19-15-CGGC-14-10 is carried by 50% of Brazilian and 65% of Peruvian patients, as shown in Table 3; note that the causative mutation need not have occurred on the most prevalent haplotype. The frequencies of these haplotypes in populations worldwide and on 19-CGGC-14 SCA10-linked chromosomes in both patient groups are compared in Fig. 2. Interestingly, H3 was found in cis with 19-CGGC-14 and the mutant allele in 7 of 14 Brazilian (50%) and 11 of 17 Peruvian (65%) patients. These findings agree with our previous hypothesis of a common ancestral origin of the mutation and further support a probable Amerindian origin of the chromosomal block of markers flanking the ATTCT mutation. Furthermore, a founder effect might be contributing to the distribution of SCA10 throughout American countries (Almeida et al. 2009).
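Because these proportions rest on small numbers of chromosomes, it can help to express their uncertainty. As an illustration only (no confidence intervals are reported in the study), the sketch below computes Wilson score 95% intervals for the counts given above, 7 of 14 Brazilian and 11 of 17 Peruvian patients; the Wilson interval is chosen here as one standard example, and other interval methods would give somewhat different bounds.

```python
import math


def wilson_interval(successes, n, z=1.96):
    """Wilson score interval for a binomial proportion (z = 1.96 gives ~95%)."""
    p = successes / n
    denom = 1 + z**2 / n
    centre = (p + z**2 / (2 * n)) / denom
    half = z * math.sqrt(p * (1 - p) / n + z**2 / (4 * n**2)) / denom
    return centre - half, centre + half


for group, k, n in [("Brazilian", 7, 14), ("Peruvian", 11, 17)]:
    low, high = wilson_interval(k, n)
    print(f"{group}: {k}/{n} = {k / n:.0%} (95% CI {low:.0%} to {high:.0%})")
```

The resulting intervals are wide (roughly 27% to 73% and 41% to 83%) and overlap substantially, which is consistent with the caution about interpretation expressed elsewhere in the Discussion.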

Table 3 Frequencies of the extended haplotypes associated with SCA10

Data from different fields of knowledge, including genetics, support an Asian origin of present-day Native Americans. Migrants from Asia settled Beringia, a previously uninhabited land mass of about 1 million km² exposed during the last glacial maximum at the end of the Pleistocene (Bortolini et al. 2003). They remained there for at least 5000 years, long enough for numerous autochthonous mutations found only in Native Americans to have arisen (Bortolini et al. 2003, 2014; Fagundes et al. 2008; González-José et al. 2008). After the disappearance of Beringia and the emergence of the Bering Strait, this population entered and colonized the Americas (González-José et al. 2008; Bortolini et al. 2014). Given the wide distribution of the disease across the American continent, it is therefore possible to speculate that the SCA10 mutation arose in this Beringian population.

Furthermore, genetic diversity is higher and population structure lower in western South America (Andean area) than in eastern South America (Amazonia, Chaco, and the Brazilian Central Plateau; Fagundes et al. 2008; Tarazona-Santos et al. 2001). This may explain the relatively higher diversity of Peruvian haplotypes associated with the SCA10 mutation compared with Brazilian families, in which only two haplotypes, 19-CGGC-14 and 19-CGGC-18, were associated with the SCA10 mutation. The 19-CGGC-18 haplotype differs from the most common haplotype by four repeat units at the D22S1153 marker, consistent with the gain or loss of repeat units, a relatively common mutational event in STRs.

CGGC haplotype lineage networks were constructed to better understand the phylogenetic relationships among haplotypes associated with SCA10. All groups showed a similar, unimodal-like pattern (Fig. 3). Such a pattern is expected in networks of patients whose mutation shares a single origin, whereas different mutational origins would generate a bimodal-like pattern.

It should be taken into consideration that, despite their high prevalence in Native Americans, haplotypes H2 and H3 are also found in other populations worldwide. Unfortunately, no genetic markers exclusive to Native Americans are available in the chromosomal region assessed in this study that could provide unambiguous information about the geographical origin of SCA10. Given this limitation, additional studies should evaluate the individual genetic ancestry of family members carrying the most common haplotype, using different approaches, to further assess the founder effect for SCA10 in the Americas.

To our knowledge, this is the most extensive study to date addressing the ancestral origin of SCA10, including a large number of SCA10 families from two geographical regions of South America. Until now, the ancestry of the mutation had been hypothesized on theoretical grounds only. Here we provide additional molecular evidence supporting a single, common Amerindian origin for the SCA10 mutation.