Introduction

Type 2 diabetes (T2D) and dyslipidemias are two major health concerns in Hispanic populations. When referring to Hispanics, we use the broad US census definition that states that “Hispanic or Latino refers to a person of Cuban, Mexican, Puerto Rican, South or Central American, or other Spanish culture or origin regardless of race,” so it is important to note that within this group, there is substantial heterogeneity with regard to cultural, geographic, economic, environmental, lifestyle, and genetic factors that are known to impact diabetes and dyslipidemia susceptibility. In 2012, diabetes was the 5th leading cause of death among Hispanics in the USA (7th in the overall population) [1]. This disease has substantial economic costs (direct medical costs and reduced productivity), which were estimated to be $245 billion in 2012 [2], as well as profound societal costs in terms of reduced quality of life. The impact of diabetes is particularly strong among Hispanics, given its high prevalence in this group: during 2010–2012, the age-adjusted percentage of people aged 20 years or older with diagnosed diabetes was estimated as 12.8 % in Hispanics vs. 7.6 % in non-Hispanic whites [3]. Substantial variation in rates of diabetes has been reported for different Hispanic groups: 8.5 % for Central and South Americans, 9.3 % for Cubans, 13.9 % for Mexican Americans, and 14.8 % for Puerto Ricans [3]. Dyslipidemias are a major risk factor for cardiovascular disease (CVD), which was the 2nd leading cause of death in Hispanics (first in the overall US population) [1] in 2010. The prevalence of dyslipidemias is very high in Hispanics. For example, in a recent study based on a large sample (N > 169,000) from Northern California [4], individuals of Mexican ancestry had some of the highest prevalences of high low-density lipoprotein cholesterol (LDL-C) (56.8 %), low high-density lipoprotein cholesterol (HDL-C) (50.9 %), and high triglyceride (TG) concentrations (45.4 %) of all the groups included in the study. Similarly to diabetes, there is a heterogeneous pattern of dyslipidemias among Hispanics. In a recent report of the Hispanic Community Health Study/Study of Latinos (HCHS-SOL), including Hispanic individuals of diverse origins (Cubans, Dominicans, Mexicans, Puerto Ricans, Central Americans, and South Americans) [5], 65 % of the participants had some form of dyslipidemia, but distinctive patterns were found between groups for some lipid traits. The highest prevalence of elevated LDL-C was found among Cubans (44.5 vs. 36 % in the full sample), and the highest prevalence of high TG was found among Central Americans (18.3 vs. 14.8 % in the full sample). It should be noted that the threshold used to define high TG concentrations in this study, >200 mg/dL, was higher than the threshold used in the previous reference, >130 mg/dL, and this explains the difference in prevalence between the datasets.

The substantial variation in the prevalence of diabetes and dyslipidemias reported in different Hispanic groups is not surprising, given the broad geographic origins and the diverse genetic and cultural backgrounds of individuals classified as Hispanic. Most of the populations of the Americas are primarily the result of an admixture process involving indigenous Americans, Africans, and Europeans, which started approximately 500 years ago. However, the relative genetic contributions from each of these parental groups show substantial geographic variation [68]. This can be clearly seen in Fig. 1, which depicts a principal component (PC) plot of four American samples (Mexican Americans, Puerto Ricans, Colombians, and Peruvians) from the 1000 Genomes Project (Phase 3) [9], together with European, African, and indigenous American samples. Differences in continental admixture proportions are evident in the plot defined by the first two PC axes (Fig. 1a). The Mexican American and Peruvian individuals primarily have indigenous American and European ancestry, although some Peruvian individuals also show evidence of African ancestry. The Peruvian sample on average has higher indigenous American ancestry than the Mexican American sample. In contrast to the Mexican American and Peruvian samples, the Colombian and Puerto Rican samples have higher African and lower indigenous American contributions (particularly the Puerto Rican sample). Other interesting patterns emerge when evaluating lower PC axes. For example, in the plot of the first and fourth PC axes (Fig. 1b), it is evident that all the American samples align with the Southern European samples (Spain and Italy), but not with the samples from Central and Western Europe (England, CEU sample) or Finland, in agreement with historical data. Finally, the plot of the first and seventh PC axes reveals that the Mexican American sample aligns with the indigenous samples from Meso-America (Nahua and Maya), and the Peruvian sample aligns with the indigenous samples of South America (Quechua from Peru and Aymara from Bolivia), as expected given the geographic locations of the admixed and indigenous American samples. In contrast, neither the Puerto Ricans nor the Colombians align with the indigenous American samples. This is presumably because the indigenous samples included in this plot are not the proper representatives of the indigenous American groups involved in the admixture process that gave rise to the current Puerto Rican and Colombian populations. These PC plots are relevant because they show that (1) there is a broad range of individual admixture proportions in the American samples, which may require appropriate modelling in genetic association tests in order to avoid false positives, and (2) in addition to potential differences in average continental admixture proportions, there may also be differences between admixed populations at the intra-continental level. The variation in genetic structure, particularly at the continental level, in admixed samples throughout the Americas can has practical consequences for genome-wide association (GWA) studies. It is well known that linkage disequilibrium (LD) patterns differ between continental populations [10]. Populations of African ancestry tend to have lower levels of LD than non-African populations, which have undergone bottlenecks and serial founder effects during and after the out-of-Africa migrations. Additionally, allele frequencies may differ across populations [11]. When performing GWA studies, differential allele frequency or LD patterns between populations may have analytical consequences in multiple study phases including discovery, replication, and fine mapping [12••, 13].

Fig. 1
figure 1

Principal component (PC) plot of the four admixed American populations of the 1000 Genomes Project Phase 3 panel (Colombians, Mexican Americans, Peruvians, and Puerto Ricans), as well as African (Esan from Nigeria, Gambians, Luhya from Kenya, Mende from Sierra Leone, and Yoruba from Nigeria), European (Finnish, British, CEU, Italians, and Spaniards), and indigenous American (Nahua and Maya from Mexico, Quechua from Peru, and Aymara from Bolivia) samples. The PC plot was done with the program Eigenstrat, after pruning the markers based on linkage disequilibrium (r 2 > 0.1) and removing four genomic regions on chromosomes 2, 6, 8, and 17. The total number of markers used for the PC plot was ∼98,500. The eigenvectors were inferred using the individuals in the parental populations (African, European, and indigenous American), and then all the individuals were projected onto those eigenvectors. a Plot of PC axes 1 and 2. AFR African samples (blue squares), EUR European samples (red triangles), NAM indigenous American samples (pink pentagons). CLM (Colombians, green circles), MXL (Mexican Americans, purple circles), PEL (Peruvians, orange circles), PUR (Puerto Ricans, light blue circles). b Plot of PC axes 1 and 4. In this plot, the European samples are separated in three groups: Southern Europeans (Spaniards and Italians), Central and Western Europeans (British and CEU), and Finnish. c Plot of PC axes 1 and 7. In this plot, the indigenous American samples are separated in two groups: Mesoamericans (Nahua and Maya) and South Americans (Quechua and Aymara)

Sociocultural and economic factors are also key to understanding the higher prevalence of diabetes and dyslipidemias in Hispanics as a group, and the differences in prevalence observed between Hispanics of diverse origins. In this respect, significant differences in education level and income exist between Hispanics and non-Hispanic whites [14]. Differences in income, education, and several variables used to assess acculturation (e.g., migration, language use and ability, ethnic identity, acculturative distress, and family experience) have also been described between Hispanic groups [15]. In the HCHS-SOL study focused on type 2 diabetes in Hispanics [16], the investigators reported that the number of years living in the USA, education, and household income had a significant effect on diabetes risk. Similarly, in the HCHS-SOL studies focused on dyslipidemia [5], language preference, income, and education were associated with dyslipidemia prevalence in Hispanics.

There have been numerous candidate gene studies in Hispanics focused on type 2 diabetes and lipid traits. However, this review will only focus on the more recent studies undertaken at the genome-wide level. We will describe the main results of linkage and GWA studies in Hispanics and discuss these results in the context of the findings reported in other population groups. We also highlight other genomic approaches that offer great promise to uncover risk factors for type 2 diabetes and dyslipidemias, such as metagenomic analyses.

Genome-Wide Analyses of Type 2 Diabetes in Hispanic Populations

In the mid-1990s, a critical nexus was reached: the combination of sufficiently efficient computational methodologies [1719] and comprehensive, well-mapped interrogations of the human genome [20, 21] gave rise to an era of genome-wide searches for variation contributing to genetically complex traits. A genome-wide linkage scan for type 2 diabetes in Mexican Americans from Starr County, TX, utilized affected sib-pair approaches and successfully identified the NIDDM1 region on chromosome 2 in 1996 [22]. The effects of NIDDM1 were further found to be interacting with variation on chromosome 15 to increase risk of type 2 diabetes in Mexican Americans [23]. In 2000, a gene in the NIDDM1 region, calpain10, was identified as a diabetes susceptibility gene [24], and although calpain10 has not been replicated by GWA studies, Capn10 knockout experiments in mice have demonstrated that the gene plays an important role in diabetes and obesity [25]. Concurrently, genome-wide linkage analyses of type 2 diabetes and age at onset in 27 extended Mexican American pedigrees identified a significantly linked region on the q arm chromosome 10, which was later found to harbor TCF7L2 [26, 27]. The highest reported prevalence of diabetes in the world is in the indigenous Pima population in Arizona [28]. Genome-wide linkage scans on sibling pairs from this population in the mid-1990s identified a region on 11q23-24 significantly linked to BMI and showing evidence of pleiotropic effects on type 2 diabetes [29, 30]. Since this time, efforts to understand the complex underlying genetic etiology of type 2 diabetes have shifted to ever larger-cohort analyses of ever-increasing amounts of genetic variation.

Today, more than 100 loci for type 2 diabetes and glycemic traits have been identified through numerous GWA studies of common and rare variation in populations of diverse ancestral origins [31]; however, to date, very few GWA studies have been published in cohorts of Mexican ancestry. The first GWA study performed in a non-European cohort was published in 2007 and comprised 561 Mexican American type 2 diabetes cases and controls drawn from the Starr County Health Studies [32]. Although no loci reached genome-wide significance, several loci identified in prior GWA studies in Europeans were replicated [32]. This analysis was subsequently expanded (N = 1273) and meta-analyzed with a cohort from Mexico City (N = 1310) in 2011 [33, 34]. The most significant variants observed in this meta-analysis included known regions near HNF1A and KCNQ1. Top association signals were then meta-analyzed with the DIAGRAM and DIAGRAM+ datasets of European ancestry individuals, resulting in two regions reaching genome-wide significance: HNF1A and CDKN2A/CDKN2B (Table 1). Top association signals in both studies were annotated to explore their roles as expression quantitative trait loci (eQTL) in both adipose and muscle tissues, revealing a marked excess of trans-acting eQTL in top signals in both tissue types.

Table 1 Genome-wide significant regions identified in GWA studies for type 2 diabetes in Hispanics

Recently, using chip-based and imputed genotypes, the Slim Initiative in Genomic Medicine for the Americas (SIGMA) Type 2 Diabetes Consortium identified a novel risk haplotype defined by five SNPs in the gene SLC16A11 in a Mexican and Latin American cohort (N = 8214) [35••] (Table 1). The 73-kb haplotype, which is common in indigenous American and East Asian populations, but rare in European and African individuals, was simultaneously identified in a Japanese cohort [36]. Investigations into the origin of this haplotype suggested that based on length, similarity to Neandertal sequence, and an estimated coalescence that postdates the estimated modern human and Neandertal evolutionary split, it entered modern human lineages through introgression via admixture with Neandertals. SLC16A11 is expressed in the liver, and expression in HeLa cells resulted in significant increases in triacylglycerol levels, suggesting that SLC16A11 may increase risk to type 2 diabetes through hepatic lipid metabolism [35••].

Several groups are now publishing analyses of low frequency and rare variant effects on type 2 diabetes risk in Hispanic and indigenous American populations [37•, 38], and more can be expected in the near future. The SIGMA consortium recently performed whole-exome sequencing on a Hispanic sample comprising 3756 individuals and identified one rare missense variant in HNF1A which, based on attempted replication in multiple ancestral groups, appears to be private to Hispanic populations. Characterization in experimental assays confirmed reduced activation of the mutant HNF1A protein on its target promoter. The SIGMA studies highlight the advantages of investigating type 2 diabetes in large cohorts drawn from understudied populations, such as Hispanic groups, in which allele frequency differences may provide increased power to detect novel risk variants [35••].

In an effort to explore the effects of genes influencing glucose homeostasis on type 2 diabetes risk in Hispanic populations, the Genetics Underlying Diabetes in Hispanics (GUARDIAN) Consortium was assembled, including seven Mexican-origin cohorts without type 2 diabetes in a discovery phase and six Mexican-origin cohorts including diabetes cases and controls in a translation phase, for a total sample of 19,871 [39••]. This strategy of utilizing measures closer to gene action may result in increased power to detect genes relevant to type 2 diabetes risk. Genomic regions associated with measures of glucose homeostasis in the discovery group were carried forward and evaluated for association with the clinical outcome of type 2 diabetes in the translation cohorts. Using this approach, GUARDIAN identified both novel and replicated known risk variants (Table 1). The most significant findings localized to known genes KCNQ1, CDKAL1, CDKN2A/B, and MTNR1B. MTNR1B, a gene previously implicated in type 2 diabetes and fasting glucose in populations of European descent, was found in this analysis to be associated with acute insulin response. MTNR1B is expressed in human islet cells, co-localizes with insulin, and is thought to inhibit glucose-stimulated insulin secretion [40, 41]. These analyses demonstrate the complex, interacting, and pleiotropic nature of type 2 diabetes and quantitative measures that underlie it.

Despite comparatively limited cohort sizes, analyses of type 2 diabetes risk in Hispanic populations have driven diabetes gene discovery by leveraging high disease prevalence, population-specific haplotypic variation, and a private mutation spectrum. There is evidence that these findings are relevant across ancestry: effects of variation in Hispanic populations are significantly directionally consistent with analyses in European ancestry, even at fairly modest levels of significance (p < 0.01) [12••, 42, 43]. Furthermore, due to differential LD structure, inclusion of Hispanic populations in trans-ethnic fine mapping and meta-analyses provides an opportunity to narrow windows of association and localize causal alleles [12••].

As noted by Below et al. and others [34, 44], there is a significant enrichment of eQTLs among top type 2 diabetes-associated loci. Genetic heritability estimates for type 2 diabetes are markedly higher than can be explained by the variation identified to date; to characterize this “missing” heritability, Torres et al. composed multiple SNP subsets by partitioning interrogated maker sets into groups by status as eQTL in several insulin-responsive peripheral tissues [45•]. They discovered that these subsets explain a greater portion of type 2 diabetes risk than expected by chance, suggesting a significant role of regulatory variation in diabetes susceptibility. Several reasons have been suggested as to why so much of the genetic heritability of type 2 diabetes remains unmapped to risk loci [46]. Conclusive identification of less common (0.5–5 % MAF) variation of modest effect will require investments in extremely large sample sizes. The heterogeneous nature of Hispanic populations increases the challenge because to detect variation or effects specific to groups or environments may require sample sizes beyond what exist to be collected. There is evidence that parent of origin may influence effects of variants on type 2 diabetes risk [47]. Studies in mouse models also demonstrate that some genetic effects on type 2 diabetes and related traits are modified by sex, diet, and epigenetic effects, indicating that careful environmental modelling and stratification will be necessary to identify some loci subject to interaction effects [48]. Genetic characterizations of larger Hispanic samples are underway, but especially in the case of extremely rare or private variation, a return to family-based study designs will improve power through enrichment of allelic observations and increased environmental and genetic homogeneity [49].

Genome-Wide Analyses of Blood Lipid Levels in Hispanic Populations

As described for type 2 diabetes, genome-wide studies focused on lipid traits in Hispanic populations have been scarce, particularly when compared with studies in European populations. The initial genome-wide studies were linkage studies in Hispanic families. In one of the earliest studies, Rainwater et al. [50] identified two quantitative trait loci (QTL) on chromosomes 3 and 4 that influenced variation in cholesterol concentrations of small LDL particles. Subsequent studies reported evidence of linkage on chromosome 11 for TG/HDL-C ratio [51], chromosome 2 and 16 for total cholesterol (TC) [52, 53], chromosome 12 and 15 for TG [54, 55], and chromosome 7 for LDL-C/HDL-C ratio [53]. The aforementioned studies used a few hundred highly polymorphic microsatellites spanning the human genome, and the linkage signals were quite broad. Two of the studies followed up the linkage signals with association analyses, reporting associations of markers in the gene ADIPOR2 with TG on chromosome 12 [54], and markers near the gene FLJ45974 with LDL-C/HDL-C ratio [53]. More recently, Hellwege et al. [56•] published a much denser family-based linkage analysis using markers of the Illumina HumanExome Beadchip. This study described 104 regions with LOD scores greater than 3 in Hispanic families. The region showing the strongest evidence of linkage was the known CETP gene for HDL-C, and multiple variants within this gene also showed strong associations with this trait [56•].

GWA studies for lipid traits in Hispanics using dense microarrays have only appeared in the literature relatively recently. Table 2 reports the genome-wide association signals that have been described in studies focused on Hispanic populations. Comuzzie et al. [57] carried out a family-based GWA study for obesity-related traits in children participating in the VIVA LA FAMILIA study, and reported a significant association of variants in the APOA5-ZNF259 region with TG concentrations.

Table 2 Genome-wide significant regions identified in GWA studies for lipid traits in Hispanics

In 2013, Coram et al. [58•] performed a GWA study for lipid traits in a cohort of 3642 Hispanic participants from the Women’s Health Initiative SNP Health Association Resource (WHI-SHARe) and reported genome-wide significant signals within or near the genes GCKR, LPL, and APOA/APOC for TG and CETP and APOA/APOC for HDL-C. These authors also showed that there is a substantial overlap in the genes associated with lipid traits in different population groups. When testing the markers showing genome-wide significance or suggestive evidence of association (p ≤ 10−5) in European GWA studies in the Hispanic and African American WHI cohorts, a strong enrichment of small p values was observed in both cohorts. Additionally, there was a significant correlation of the allelic effects of markers with p ≤ 10−5 identified in Europeans in the Hispanic and African American cohorts. The genomic regions showing association in Europeans accounted for a disproportionate amount of variance in both cohorts.

Weissglas-Volkov et al. [59•], also in 2013, performed a GWA study for HDL-C and TG in a sample from Mexico (N = 4361). Genome-wide significant signals were observed in seven regions previously described in European populations: APOA5, GCKR, and LPL for TG and ABCA1, CETP, LIPC, and LOC55908 for HDL-C. A novel genome-wide significant region was also found near the NPC1 gene for TG. In agreement with the findings of Coram et al.[58•], Weissglas-Volkov et al. also observed a highly significant concordance excess in the directionality of effects in Europeans and Hispanics for 100 SNPs that had been identified in European populations. However, the authors also noted that for 82 of the 100 loci, the SNP with the strongest evidence of association in the Mexican sample was not the same as the lead SNP/best-proxy SNP reported in the European sample, and the LD between the European lead SNP and the SNP showing the strongest association in the Mexican sample was low (r 2 < 0.3), suggesting different haplotypic structures or contributing variants underlying risk at these regions in their sample. Using cross-ethnic mapping, the authors were able to narrow several credible regions identified in the European studies (APOA5, MLXIPL, and CILP2). In particular, in the APOA5 region, the number of likely susceptibility variants was reduced to only one, rs964184.

In 2014, Ko et al. [60•] used a different analytical approach from that used in previous efforts, by including only variants showing differences in frequency between European and Native American populations in the GWA study. This study comprised 3323 individuals in the discovery effort and 6159 additional samples in the replication stage, and reported genome-wide significant signals in regions already described in European studies (CETP for HDL-C, LPL and ZNF259 for TG, and CELSR and CETP for TC). Novel signals were identified within or near the genes UGT8 and RORA (for HDL-C) and SIK3 (for HDL-C and TG). The signal observed in SIK3 was close to the APOA5/ZNF259 region on chromosome 11 and corresponds to an intronic SNP that is common in Mexicans but not observed in a Finnish sample [60•]. An interesting finding of this study was that in two regions associated with TG (LPL and APOA5/ZNF259), there was an excess of indigenous American ancestry in Mexican individuals with high TG concentrations with respect to individuals with low TG concentrations. In order to carry out this analysis, the authors used the program LAMP-LD [61] to infer locus ancestry in the relevant genomic regions.

Very recently, Below et al. [62•] published the results of a meta-analysis of lipid traits in 4383 individuals of Mexican ancestry. The genome-wide significant and suggestive associations were followed up in three additional Hispanic samples comprising 7876 individuals. In this study, genome-wide significant signals were identified in or near CELSR2, ZNF259/APOA5, KANK2/DOCK6, and NCAN/MAU2 for TC; LPL, ABCA1, ZNF259/APOA5, LIPC, and CETP for HDL-C; CELSR2, APOB, and NCAN/MAU2 for LDL-C; and GCKR, TRIB1, ZNF259/APOA5, and NCAN/MAU2 for TG. All these regions have been previously described in GWA studies in European populations. However, based on LD and conditional analyses, the authors reported that the lead SNPs observed in the samples of Mexican ancestry for some of the regions (ABCA1 and LIPC for HDL-C, and NCAN/MAU2 for TG) seem to be independent of the lead SNPs described in the European studies. Combining the Mexican dataset with the European Global Lipids Genetics Consortium (GLGC) [63] dataset, five novel genome-wide significant regions were identified in or near the genes FN1 and SAMM50 for TC, LOC100996634 and COPB1 for HDL-C, and LINC00324/CTC1/PFAS for LDL-C. In agreement with the findings of Coram et al. [58•] and Weissglas-Volkov et al. [59•], Below et al. [62•] also observed strong concordance in direction of effects and little evidence of heterogeneity in effect sizes when comparing lead SNPs in the European and Mexican samples. Finally, the authors also carried out genome-wide association analyses focused on eQTLs reported for four different cell types (lymphoblastoid cell lines (LCL), human liver, adipose, and muscle). These analyses uncovered genome-wide significant signals that were not observed in the original GWA study (LPL, LOC102467079, and LIPC for HDL-C; DOK6 for LDL-C; and TRIB1 for TG). Interestingly, several of the significant eSNPs identified in the meta-analysis were the lead SNPs or had p values close to those of the lead SNP in the meta-analysis (e.g., SNPs in the NCAN, CETP, and CELSR2 regions), suggesting that at least some of the top signals observed in the meta-analysis may have a functional effect through regulation of gene expression. Finally, SNPs showing low p values in the meta-analysis were enriched for eQTLs in most of the cell types (except for LDL-C and TC in LCL, which showed no evidence of enrichment), but the levels of enrichment were tissue dependent. These findings emphasize the importance of using a wide variety of tissue types to understand and interpret the possible functional relevance of genetic effects, and suggest tissue-specific roles for some of these eQTLs.

Metagenomic Approaches in Hispanic Populations

In the previous sections, we highlighted the main findings of genome-wide studies for type 2 diabetes and lipid traits focusing on the human genome. However, there are other genomes that are also relevant to study in order to discover potential risk factors for type 2 diabetes and dyslipidemias. In this respect, in the last 5 years, researchers have uncovered interesting associations of metagenomic profiles with these traits. Most of these studies have focused on the gut microbiota. There is substantial evidence indicating that the microbiome may influence the development of diabetes (for relevant examples, see Musso et al. [64]; Amar et al. [65]; Cani et al. [66]; Qin et al. [67]; Karlsson et al. [68••]; Cai et al. [69]). Importantly, it has been shown that the metagenomic markers for type 2 diabetes may differ between populations. Karlsson et al. [68••] reported that the most discriminatory metagenomic clusters identified in a European sample were different to those of a Chinese cohort. These authors highlighted the need to develop metagenomic predictive tools for type 2 diabetes specifically for the geographical location of the populations studied. More recent research has also demonstrated the involvement of the gut microbiome in lipid levels. Fu et al. [70] identified associations of bacterial taxa with blood lipid concentrations and reported that microbiota explained 6 % of the TG variance and 4 % of the HDL variance, independently of sex, age, and genetic risk factors. Recently, Ross et al. [71] characterized the gut microbiome of 63 individuals from the Cameron County Hispanic Cohort (CCHC) by sequencing the 16S rRNA gene. No associations were observed between any of the taxa identified and diabetes status, cholesterol, or TG. However, this study identified significant differences for several taxa (at the phyla, family, and genus level) between the CCHC and the Human Microbiome Project (HMP) stool data, which was used as a sample representative of a healthy Western microbiome. Interestingly, the shifts in the microbial community identified in CCHC vs. HMP are similar to those observed in studies focused on obese individuals and those with type 2 diabetes. Other gut microbiome studies are currently under way in Hispanic populations, and they offer great promise to develop new predictive tools and potential new targets for type 2 diabetes treatment [72, 73].

Conclusion

In this review, we have highlighted the major findings reported in genome-wide studies in Hispanic populations. Although the number of studies has been limited, and the sample sizes are still relatively small in comparison to published studies in European populations, these efforts have provided important insights about the architecture of type 2 diabetes and lipid traits. GWA studies in Hispanics have identified novel susceptibility regions for type 2 diabetes and lipid traits, which require replication in subsequent studies with larger sample sizes to confirm effects. In this respect, it is important to highlight that replication efforts in Hispanics are complicated by the known heterogeneity in ancestral background in this group. Additionally, these studies have uncovered independent signals in some of the regions previously described in other populations. Some of these signals, such as SLC16A11 [35••], RORA, and SIK3 [60•], may be population specific. Although there is clear evidence of allelic heterogeneity for some regions, in general, a substantial concordance has been described in the direction and size of effects between Hispanics and European populations, both for type 2 diabetes and lipid traits. This is something that has been also reported for other population groups [12••, 74, 75]. In particular, the recent trans-ethnic meta-analysis of type 2 diabetes, which included samples from European, East Asian, South Asian, and Hispanic populations, has provided interesting insights about the disease [12••], some of which may also be applicable to lipid traits. In the trans-ethnic meta-analysis of type 2 diabetes, a significant directional concordance in effect was observed between ethnicities for markers with p ≤ 0.001 or p ≤ 0.01 in the European meta-analysis, but this was not the case for markers showing no association (p > 0.05). Additionally, there was no evidence of heterogeneity of allelic effects at the majority of established loci for the disease. As pointed out by the authors, this suggests that (1) many other common variants with small effects remain to be discovered, but this will require larger sample sizes, and (2) it is improbable that the signals observed in most of the susceptibility regions described so far are due to rare variants (the “synthetic association hypothesis”), because these variants are typically not shared between population groups and the patterns of LD with rare variants would be expected to vary between ethnicities (in addition, the largest whole-exome sequencing dataset for type 2 diabetes published to date, that of the SIGMA Consortium [37•], has failed to find supportive evidence for that hypothesis). The trans-ethnic meta-analysis of type 2 diabetes also showed that, for regions showing shared signals between populations, combining samples from different population groups may result in better fine-mapping resolution, due to the differences in LD patterns between population groups, in agreement with findings for other traits, including lipids [7678]. Finally, the analyses focused on gene expression data in Hispanics have shown that this is an insightful approach from different perspectives: (1) statistical power is increased due to reduced penalty for multiple testing with respect to standard GWA studies, (2) the enrichment in eQTLs observed for genetic markers showing associations with type 2 diabetes or lipid levels suggests that some of the susceptibility regions may be due to variants having an effect on transcriptional regulation, and (3) it highlights the importance of studying tissue-specific effects in order to understand the potential mechanisms behind some of the observed associations.

It is important to note that most of the signals described in this review correspond to common markers. An exception has been the recent whole-exome sequencing study that identified the association of a rare missense variant in HNF1A with type 2 diabetes [37•]. The limited sample sizes used in the discovery efforts in Hispanic populations, typically smaller than 5000 individuals, render these studies underpowered to identify low-frequency variants (variants with a frequency between 0.5 and 5 %). Elucidating the role that low-frequency variants have on complex traits and diseases in Hispanics will require studies based on much larger sample sizes and very dense datasets (e.g., whole-genome or whole-exome sequencing studies, or studies using datasets imputed with dense reference panels). Some recent large-scale studies in European populations are providing important insights into the impact of low-frequency and rare variants on type 2 diabetes [79, 80] and lipid traits [81]. Ongoing meta-analytic efforts based on dense imputed data (e.g., data imputed with Phase 3 panel of the 1000 Genomes Project [82] or the dense haplotype panels created by the Haplotype Reference Consortium (http://www.haplotype-reference-consortium.org)) with the participation of many consortia and tens of thousands of samples are currently under way in Hispanics, not only for type 2 diabetes and lipid traits but also for anthropometric traits (body mass index, height, and waist-hip ratio) and blood pressure. These expanded efforts, in combination with those in other population groups, and future trans-ethnic meta-analyses will bring a better understanding of the genetic architecture of these traits, by uncovering more novel common and low-frequency variants, identifying population-specific effects and allelic and locus heterogeneity, and highlighting causal variants taking advantage of better fine-mapping resolution and better annotation of the human genome.