Introduction

Atrial fibrillation (AF) is the most common cardiac arrhythmia, affecting nearly 34 million people worldwide. AF can lead to stroke, heart failure, dementia, and death [1]. In addition to clinical risk factors such as advanced age, hypertension, diabetes, obesity, and male sex [2, 3], many studies have underlined the substantial impact of genetic risk factors in AF.

AF is heritable, with affected individuals having a strong genetic predisposition [4]. It has been shown that one in four AF-affected individuals has a first-degree relative with AF [5]. Genome-wide association studies (GWAS) have identified more than 100 AF-associated common loci [6, 7], and candidate gene studies have identified more than 35 genes as susceptibility loci [8]. More recently, genome- and exome-sequencing studies have found rare genetic variants associated with AF [9, 10].

The current lack of highly effective therapies reflects gaps in our understanding of the mechanisms leading to AF. Uncovering the genetic predisposition to AF can substantially improve risk prediction models and algorithms that currently rely on clinical variables, enable a better understanding of the discrete subtypes of AF, and open novel therapeutic options [11, 12].

The different subsets of AF could each be associated with a specific set of genes. For example, lone and early-onset AF is mostly attributed to variations in genes involved in electrical signaling, which cause electrical disturbances that manifest as AF [13]. Genes involved structurally and functionally with ion channels, intercellular signaling, and homeostatic control are widely implicated in AF pathogenesis. Figure 1 illustrates different mechanisms implicated in AF pathogenesis.

Fig. 1

Atrial fibrillation pathogenesis. The illustration depicts the four functional mechanisms involved in the pathogenesis of atrial fibrillation. Mutations in transcription factors lead to abnormalities in cardiac development. Structural gene mutations affect sarcomere architecture. Mutations in cardiac ion channels alter electrical properties resulting in refractoriness and action potential duration (APD) shortening. Mutations in genes encoding calcium handling proteins affect contractility

Though AF can be monogenic, as studied in large families, the majority of AF presents as a common complex trait with a significant interplay between predisposing genes and environmental factors [13, 14]. It is interesting to note that AF susceptibility is influenced by both rare variants and common single nucleotide polymorphisms (SNPs) in shared sets of genes. Table 1 summarizes the biological functions of well-established genes associated with AF across many studies. GWAS require no prior hypothesis and are designed to identify common disease-associated variants among the SNPs selected for genotyping on microarrays. Together with high-throughput whole-genome sequencing (WGS) and whole-exome sequencing (WES) studies, which discover potential rare variants, GWAS have led to a paradigm shift in human genetics through their comprehensive approach and their ability to delve deeper into the genetic content. This review focuses on GWAS and high-throughput sequencing studies of AF published in the last decade to give an insight into the latest findings.

Table 1 List of commonly implicated genes across multiple studies and their functions

Genes identified by GWAS

The earliest GWAS of this decade in 2010 by Ellinor et al. that sought to identify common variants associated with lone AF included 1355 patients and 12,844 referents of European ancestry. The study defined lone AF as AF with an onset before 66 years of age and without a preceding history of myocardial infarction, heart failure, or known left ventricular systolic dysfunction. A significant association was identified with intronic variant rs13376333 in the KCNN3 gene that encodes a potassium intermediate/small conductance calcium-activated channel (subfamily N, member 3) and is involved in atrial repolarization [26].
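As a rough illustration of how a single-variant case-control association of this kind can be tested, the sketch below runs a 1-degree-of-freedom chi-square test on a 2x2 table of allele counts and compares the result with the conventional genome-wide significance threshold (P < 5 × 10⁻⁸). It is not the method of Ellinor et al., whose analysis pipeline is not described here; the function name and all counts are hypothetical and chosen only for illustration.

```python
import math

GENOME_WIDE_ALPHA = 5e-8  # conventional genome-wide significance threshold


def allelic_chi_square(case_alt, case_ref, ctrl_alt, ctrl_ref):
    """Chi-square test (1 df) on a 2x2 table of allele counts.

    Assumes every row and column total is non-zero.
    Returns (chi2, p_value).
    """
    table = [[case_alt, case_ref], [ctrl_alt, ctrl_ref]]
    total = case_alt + case_ref + ctrl_alt + ctrl_ref
    row_sums = [case_alt + case_ref, ctrl_alt + ctrl_ref]
    col_sums = [case_alt + ctrl_alt, case_ref + ctrl_ref]

    chi2 = 0.0
    for i in range(2):
        for j in range(2):
            expected = row_sums[i] * col_sums[j] / total
            chi2 += (table[i][j] - expected) ** 2 / expected

    # For 1 df, the chi-square upper tail equals erfc(sqrt(chi2 / 2)).
    p_value = math.erfc(math.sqrt(chi2 / 2.0))
    return chi2, p_value


# Hypothetical allele counts (two alleles per person), not taken from any study.
chi2, p = allelic_chi_square(case_alt=700, case_ref=1300, ctrl_alt=500, ctrl_ref=1500)
print(f"chi2 = {chi2:.2f}, genome-wide significant: {p < GENOME_WIDE_ALPHA}")
```

Real GWAS typically use regression models adjusted for covariates such as ancestry; the unadjusted allelic test is shown only because it makes the logic of the comparison easy to follow.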

In 2017, Lee et al. conducted a two-stage GWAS in Korean patients with early-onset AF. Stage 1 consisted of 672 patients who underwent radiofrequency catheter ablation (RFCA) and 3700 controls. The stage 2 replication cohort consisted of 200 independent cases and 1812 controls. The study identified genes for critical transcription factors (PRRX1, PITX2, TBX5, ZFHX3, HAND2) in addition to NEURL and PPFIA4 [39]. PRRX1 encodes a homeodomain transcription factor highly expressed in the developing heart. A PRRX1 knockout study found impaired fetal pulmonary vasculature development [34]. Paired-like homeodomain 2 (PITX2) encodes a cardiac transcription factor located at 4q25, the locus identified by the first GWAS to be significantly associated with AF in Europeans and Chinese [30]. The region 4q25 was found to harbor strong AF susceptibility signals by a study that interrogated nine established AF-associated loci in more than 60,000 individuals of European ancestry [40]. The role of PITX2 in specifying pulmonary venous myocardium is considered important, given that the pulmonary veins are the source of the AF trigger in most cases [41, 42]. Adult and larval zebrafish pitx2c (cardiac-specific isoform of pitx2) loss-of-function models displayed atrial conduction defects, altered cardiac metabolism and structural remodelling phenotypes similar to those observed in human AF patients. The study demonstrated the role of PITX2 in regulating the expression of ion channels involved in cardiac electrical conduction by RNA-seq analysis of loss- and gain-of-function embryonic hearts in zebrafish. Pitx2c−/− adult atria exhibit loss of cardiac tissue integrity, electrophysiological defects leading to arrhythmia, fibrosis and other features of AF. The findings of this study support the hypothesis that underlying cardiomyopathy predisposes to AF [43]. TBX5 plays a critical role in cardiac development. It reduces fibrosis and improves cardiac function by reprogramming non-myocytes. Previously, gain-of-function mutations in TBX5 have been associated with early-onset AF [44].

The co-regulation activity of the transcription factors TBX5 and PITX2 is required for cardiac calcium regulation, and the transcriptional regulatory network that they govern is considered essential for atrial development [45]. ZFHX3, a transcription factor associated with the JAK/STAT signaling pathway, could play a role in electrical and structural atrial remodeling mediated via the inflammatory process [38]. The transcription factor HAND2 is involved in the reprogramming of cardiac fibroblasts into functional cardiac-like myocytes and is associated with heart repair [22].

The NEURL gene encodes an E3 ubiquitin ligase. A functional study using embryonic zebrafish showed that knockdown of the NEURL ortholog specifically altered atrial action potential duration (APD) without affecting cardiac contractile function or heart rate [15]. The study also supports the role of NEURL in atrial repolarization and hence, AF. NEURL has been found to interact with PITX2 [39]. PPFIA4 encodes a member of the evolutionarily conserved liprin protein family that regulates the disassembly of focal cell adhesion to aid cell–matrix interactions [33].

In 2017, a well-powered Japanese GWAS by Low et al., with 8180 AF cases and 28,612 controls and an additional follow-up of 3120 cases and 125,064 controls, identified many of the previously reported AF risk loci (KCNN3, PRRX1, CAND2, PITX2, GJA1, CAV1, C9orf3, SYNPO2L, NEURL, CUX2, TBX5, SYNE2, HCN4, ZFHX3) and new loci near KCND3, PPFIA4, SLC1A4, HAND2, NEBL, SH3PXD2A [46]. The majority of the genetic variants identified by GWAS are found in the non-coding regions of the genome [6, 47]. These variants are most likely involved in gene regulation. An expression quantitative trait locus (eQTL) can be defined as “a locus that explains a fraction of the genetic variance of a gene expression phenotype”. eQTL analysis involves testing the association between such variants and gene expression levels in relation to the phenotype [48]. By eQTL analysis, Low et al. found that the locus 1q31.1, which includes PPFIA4, contains a SNP, rs17461925, that was significantly associated with increased expression of PPFIA4 in the aorta. The number of different loci identified in the study showcases the polygenic nature of AF, implicating genes with diverse functions as described below. Most of the genes encode ion channel proteins involved in conduction and cell coupling. GJA1 encodes the cardiac gap junction protein connexin43, involved in cell–cell electrical coupling. This connexin protein is found altered in cardiac anomalies [21]. CAV1, selectively expressed in the atria, encodes caveolin-1, a cellular membrane protein involved in signal transduction. It negatively regulates the potassium channel protein KCNH2, involved in cardiac repolarization [18, 25]. The hyperpolarization-activated cyclic nucleotide-gated channel HCN4 is a cardiac ion channel protein highly expressed in the sinoatrial node. HCN4 is the predominant gene encoding the pacemaker channel in the heart [49]. Mutations in HCN4 are associated with various forms of sinus nodal dysfunction [23, 50].
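The eQTL definition quoted above can be made concrete with a small sketch: expression of a gene is regressed on the number of copies of an allele (dosage 0, 1 or 2), and the squared correlation gives the fraction of expression variance explained by the locus. This is only a minimal ordinary least-squares illustration under the assumption of an additive model; it is not the eQTL pipeline used by Low et al., and the function name and data are hypothetical.

```python
def eqtl_fit(dosages, expression):
    """Ordinary least-squares fit of expression on allele dosage.

    Returns (slope, intercept, r_squared), where r_squared is the
    fraction of expression variance explained by the genotype.
    """
    n = len(dosages)
    if n != len(expression) or n < 3:
        raise ValueError("need at least three paired observations")

    mean_x = sum(dosages) / n
    mean_y = sum(expression) / n
    sxx = sum((x - mean_x) ** 2 for x in dosages)
    syy = sum((y - mean_y) ** 2 for y in expression)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(dosages, expression))
    if sxx == 0:
        raise ValueError("genotype is monomorphic in this sample")

    slope = sxy / sxx                      # expression change per allele copy
    intercept = mean_y - slope * mean_x
    r_squared = (sxy ** 2) / (sxx * syy) if syy > 0 else 0.0
    return slope, intercept, r_squared


# Hypothetical dosages and normalized expression values, for illustration only.
dosages = [0, 0, 1, 1, 2, 2]
expression = [1.0, 1.2, 1.9, 2.1, 3.0, 3.2]
slope, intercept, r2 = eqtl_fit(dosages, expression)
print(f"slope = {slope:.2f} per allele, variance explained = {r2:.2f}")
```

A positive slope corresponds to the kind of finding reported for rs17461925, where the allele is associated with increased expression; published analyses additionally adjust for covariates and correct for multiple testing.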

KCND3 also encodes an ion channel protein with diverse functions relevant to atrial fibrillation pathogenesis such as regulating neurotransmitter release, neural excitability, heart rate, and smooth muscle contraction [46]. A previous study found a gain-of-function mutation in KCND3 to be associated with the early onset of persistent lone AF [24]. CAND2 on the other hand has been shown to prolong the atrial action potential duration similar to NEURL [15]. C9orf3 encodes aminopeptidase that cleaves angiotensin III to generate angiotensin IV which is involved in the renin-angiotensin pathway (NCBI Gene ID: 84909). SYNPO2L encodes Synaptopodin 2-Like Protein, an actin-associated protein with a role implicated in heart morphogenesis and sarcomere organization by gene ontology (NCBI Gene ID: 79933). A zebrafish knockdown study has shown the important role of SYNPO2L in the structural development and function of the sarcomere [51]. Similarly, SYNE2 encodes the isoforms of nesprin-2, expressed in the heart and skeletal muscle. Nesprin-2 is found localized throughout the sarcomere where it anchors the nucleus to the cytoskeleton and maintains nuclear structural integrity [36]. Low et al. in this study identified a common missense variant in the NEBL gene that encodes a nebulin-like protein expressed in cardiac muscle that binds to actin, which interacts with thin filaments and Z-line-associated proteins in striated muscle [46, 52].

A recent study found that inhibiting the function of SH3PXD2A, a scaffolding protein, led to altered regulation of extracellular matrix degradation and axon guidance by growth cone invadosomes [53]. The study by Low et al. in the Japanese population has thus added a new dimension to the understanding of the molecular pathogenesis of AF, pointing to genes involved in axon guidance and neural crest development [46].

In 2018, a GWAS in the Norwegian population by Nielsen et al., which included 6337 AF cases and 61,407 controls and an independent replication cohort of 30,679 AF cases and 278,895 controls, identified two AF loci. The region 2q31 was linked to the TTN gene [54], which encodes the sarcomeric protein titin, the largest human protein, which is highly expressed in cardiac tissue and is a component of striated heart muscle [37]. The other locus, 1p32, was previously found to be associated on a genome-wide scale with ECG-derived measurements (QRS amplification and duration) that depict cardiac structure and function, most likely associated with the development of AF. Pathway and functional enrichment analyses of most of the risk variants at AF-associated loci in the study point to regions of open chromatin during fetal heart development, implying that impaired cardiac development in the fetus can lead to critical AF risk factors in adult life, such as altered elastic properties of the atrium resulting in increased left atrial pressure, which can facilitate premature depolarizations from the pulmonary veins, the potential AF trigger [54].

A large GWAS by Nielsen et al. in 2018, which included 60,620 AF cases and 970,216 controls from biobanks of samples of European ancestry, identified risk variants at 111 genomic loci and prioritized 151 candidate genes. These included genes involved in cardiac and skeletal muscle function and integrity (AKAP6, CFL2, MYH6, MYH7, MYO18B, MYO1C, MYOCD, MYOT, MYOZ1, MYPN, PKP2, RBM20, SGCA, SSPN, SYNPO2L, TTN, TTN-AS, WIPF1) and mediation of developmental events (ARNT2, EPHA3, FGF5, GATA4, GTF2I, HAND2, LRRC10, NAV2, NKX2-5, PITX2, SLIT3, SOX15, TBX5), along with genes likely to be involved in intracellular calcium handling in the heart (CALU, CAMK2D, CASQ2, PLN), angiogenesis (TNFSF12, TNFSF12-TNFSF13), hormone signaling (CGA, ESR2, IGF1R, NR3C1, THRB), and function of cardiac ion channels (HCN4, KCND3, KCNH2, KCNJ5, KCNN2, KCNN3, SCN10A, SCN5A, SLC9B1). Most of the AF-associated risk variants were present in non-coding regions, suggesting their importance in transcriptional regulation in the adult heart, in the development of the fetal heart, or both [55].

A more recent GWAS identified 18 loci associated with left atrium (LA) volume and function, demonstrating an association between LA traits derived from cardiac magnetic resonance (CMR) and the development of AF [17]. The study provides evidence for an intrinsic relationship between the genetic underpinnings of AF and compromised LA structure and function. The study identified genes linked with cardiomyopathy (DSP, SIX5, CILP) and arrhythmia (MYO18B, TTN, CASQ2, C9orf3). Mutations in DSP, which encodes desmoplakin, are associated with cardiomyopathies [56, 57]. SIX5 is localized to a chromosomal region that harbors genes linked with myotonic dystrophy [58]. CILP, which encodes cartilage intermediate layer protein 1, has been identified as a marker of fibrosis potentially leading to heart failure. The study also found SNPs associated with the trait LA active emptying fraction (LAAEF). A decrease in LAAEF has been associated with increased risk of AF [59]. CASQ2 encodes calsequestrin2, which is important for calcium handling and cardiac contraction [17]. Myosin-18B (MYO18B) was recently shown to support myofibril organization in differentiating cardiomyocytes and acts as an actin cross-linker and myosin-2 in cardiac sarcomere [28]. MYO18B was also proposed to be associated with nemaline myopathy and cardiomyopathy [60].

A gene-based association study of 660 patients with a history of either paroxysmal or persistent AF suggested a contribution of ZFHX3 to AF remodeling and response to catheter ablation, an established AF treatment. When single SNPs do not exhibit a large effect in univariate GWAS analyses, gene-based tests (miniSNP; VEGAS, the versatile gene-based association study) can detect genome-wide significant genes [61].

A study that investigated variants causal to AF from loci previously identified by GWAS and a meta-analysis of GWAS found a rare loss-of-function variant, c.105+1G>T, in SYNPO2L to be associated with a significantly increased risk of developing AF (odds ratio, OR 3.5) [6, 62]. The largest meta-analysis of GWAS of combined ancestry included more than 65,000 cases and identified 97 loci implicating genes in cardiac development, electrophysiological, contractile, and structural pathways, highlighting the pleiotropic nature of AF-associated genetic factors [6]. The locus upstream of PITX2 at 4q25 was found to be the most significant AF-associated region in Europeans, Japanese and African Americans. The loci identified in the meta-analysis were classified into four themes: loci that are the primary targets of current antiarrhythmic therapy (SCN5A, KCNH2); loci associated with transcriptional regulation, a key feature of AF (TBX5, NKX2); genes whose altered expression levels underlie AF (PRRX1, KCNJ5); and loci that are implicated in Mendelian forms of arrhythmia syndromes (CASQ2, PKP2). Mutations in CASQ2 and PKP2 can lead to ventricular tachycardia and to impaired cardiomyocyte communication and structural integrity, respectively. A previous meta-analysis of the earlier GWAS identified PITX2, ZFHX3, and KCNN3 as the top three susceptibility loci, along with SNPs in or near potential candidate genes involved in cardiopulmonary development, pace-making activity, and signal transduction (PRRX1, CAV1, SYNE2, HCN4, SYNPO2L, and MYOZ1) [36].

In a large-scale, multi-ethnic meta-analysis of common and rare variant association studies, Christophersen et al. combined GWAS, exome-wide association studies (ExWAS), and rare variant association studies (RVAS) from 33 studies in the Atrial Fibrillation Genetics (AFGen) Consortium, including 22,806 individuals with atrial fibrillation and 132,612 referents [7]. The study identified 12 novel genetic loci that exceeded genome-wide significance, implicating genes involved in cardiac electrical and structural remodeling. The comprehensive meta-analysis identified loci most frequently associated with AF (TTN, PLN, KCNN2, KCNJ5, SH3PXD2A, and SCN10A) and some uncommon AF-linked genes (KIFAP3, ANXA4, CEP68, WNT8A, PCM1, and SOX5). KCNN2 encodes the calcium-dependent potassium channel SK2, which forms heteromeric channel complexes with SK3, encoded by the previously implicated KCNN3 gene. KCNN2, along with KCNJ5, is known to be involved in the maintenance of the atrial cardiac action potential.
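To show how per-study results for one variant can be pooled in a meta-analysis of this kind, the sketch below applies a fixed-effect, inverse-variance weighted combination of log odds ratios. The text above does not state which model the cited meta-analyses used, so this is an assumed, commonly used approach rather than their method; the function name and the per-study numbers are hypothetical.

```python
import math


def fixed_effect_meta(betas, standard_errors):
    """Inverse-variance weighted (fixed-effect) meta-analysis.

    betas: per-study effect estimates (e.g. log odds ratios).
    standard_errors: matching per-study standard errors (all > 0).
    Returns (pooled_beta, pooled_se, z, two_sided_p).
    """
    if len(betas) != len(standard_errors) or not betas:
        raise ValueError("need matching, non-empty inputs")

    weights = [1.0 / se ** 2 for se in standard_errors]
    weight_sum = sum(weights)
    pooled_beta = sum(w * b for w, b in zip(weights, betas)) / weight_sum
    pooled_se = math.sqrt(1.0 / weight_sum)
    z = pooled_beta / pooled_se
    # Two-sided p-value from the standard normal distribution.
    p = math.erfc(abs(z) / math.sqrt(2.0))
    return pooled_beta, pooled_se, z, p


# Hypothetical log odds ratios and standard errors from three studies.
beta, se, z, p = fixed_effect_meta([0.10, 0.30, 0.20], [0.10, 0.20, 0.15])
print(f"pooled OR = {math.exp(beta):.2f}, z = {z:.2f}, p = {p:.3g}")
```

Studies with smaller standard errors (usually the larger cohorts) receive more weight, which is why pooling many cohorts can push modest per-study signals past the genome-wide significance threshold.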

GWAS have demonstrated the polygenic nature of AF. Genome-based prediction models can be used to evaluate genetic risk scores. “An AF polygenic risk score summarizes the cumulative genetic risk” [63]. In 2014, Tada et al. evaluated the risk of incident AF with 12 SNPs associated with AF at a genome-wide significance level (P < 5 × 10⁻⁸). AF genotype-risk score (GRS) identified 20% of individuals at two-fold increased risk for incident AF and at 23% increased risk for ischemic stroke [64]. A recent study has shown that genome-wide polygenic risk scores (PRS) derived for common diseases can identify individuals at risk of the disease similar to monogenic mutations. The study employed genome-wide polygenic scores derived from more than 6 million variants from across the genome and identified 6.1% of the population at more than threefold risk for AF [65].
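The idea of a score that "summarizes the cumulative genetic risk" can be sketched as a weighted sum: for each variant, the number of risk alleles carried (0, 1 or 2) is multiplied by a per-allele weight, usually the log odds ratio from a GWAS, and the products are added up. The sketch below is a minimal illustration of that general form, not the scoring procedure of the studies cited above; the variant names, weights, individuals and the top-fraction helper are all hypothetical.

```python
def polygenic_risk_score(dosages, weights):
    """Weighted sum of risk-allele dosages.

    dosages: dict mapping variant id -> risk-allele count (0, 1 or 2).
    weights: dict mapping variant id -> per-allele weight (e.g. log odds ratio).
    """
    missing = set(weights) - set(dosages)
    if missing:
        raise KeyError(f"no genotype for variants: {sorted(missing)}")
    return sum(weights[v] * dosages[v] for v in weights)


def top_fraction(scores, fraction):
    """Return the ids of individuals whose scores fall in the top `fraction`."""
    n_top = int(len(scores) * fraction)
    ranked = sorted(scores, key=scores.get, reverse=True)
    return set(ranked[:n_top])


# Hypothetical weights and genotypes, for illustration only.
weights = {"snp_a": 0.10, "snp_b": 0.20, "snp_c": 0.30}
cohort = {
    "p1": {"snp_a": 0, "snp_b": 0, "snp_c": 0},
    "p2": {"snp_a": 2, "snp_b": 1, "snp_c": 0},
    "p3": {"snp_a": 1, "snp_b": 1, "snp_c": 1},
    "p4": {"snp_a": 2, "snp_b": 2, "snp_c": 2},
    "p5": {"snp_a": 0, "snp_b": 1, "snp_c": 0},
}
scores = {pid: polygenic_risk_score(g, weights) for pid, g in cohort.items()}
print(top_fraction(scores, 0.2))
```

A 12-SNP genotype-risk score and a genome-wide score built from millions of variants share this form and differ mainly in how many variants enter the sum and how the weights are derived.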

Integrative studies

Recent advances in the “omics” technologies have led to an integrated approach for the simultaneous study of genomics, epigenomics, transcriptomics and proteomics for a holistic understanding of the pathophysiological processes underlying complex human diseases [66]. One study set out to unravel the genetic predisposition to AF by probing the non-coding regions. It first prioritized candidate genes by cross-referencing human and mouse transcriptomic, epigenomic, and chromatin conformation datasets, and studied the impact of regulatory elements on putative target gene expression. It also identified potential cardiac regulatory elements at the regions associated with AF using the Assay for Transposase-Accessible Chromatin with high-throughput sequencing (ATAC-seq), a method for genome-wide mapping of chromatin accessibility [67], and deployed EMERGE, an enhancer prediction tool [31]. The study identified BMP10, SMYD2, PITX2, MYOT, TBX5, GJA1, CAV1, HCN4, SPATS2L, and PLN as genes likely to be affected by AF-associated variants and highlighted the role of regulatory region variants in modulating the expression of these potential AF genes. BMP10 and SMYD2 exhibit highly enriched expression in the right atrium. Though the exact function of SPATS2L is not known, a previous study implicated this gene in QT duration [68]. PLN, which is enriched in adult tissue, showed the highest differential expression in cardiomyocytes in this study. PLN encodes a major substrate for the cAMP-dependent protein kinase in cardiac muscle and was previously found to be involved in AF pathogenesis [32].

Wang et al. integrated information from three kinds of omics data from samples of European ancestry using computational approaches. GWAS data from the Atrial fibrillation genetics (AFGen) consortium 2017 were integrated with epigenome-wide association study (EWAS) and transcriptome-wide association study (TWAS) which analyzed whole blood DNA methylation markers and whole blood gene expression respectively. The study converted summary statistics from individual omics into gene-level associations and performed tissue-specific network analysis to identify 1931 potential AF-related genes which is much higher than the number of genes identified by GWAS (AFGen 2018) alone. Most of the genes identified in the study were involved in cardiac muscle structure [69].

Genes identified by high throughput sequencing

In recent years, whole-genome or exome sequencing studies have identified variants that are less frequent but have a greater effect on the disease phenotype. In lone AF, Olesen et al. found a higher prevalence of rare variations in previously implicated ion-channel proteins involved in cardiac depolarization (e.g. SCN5A and SCN1-3B) or cardiac repolarization (e.g. KCNQ1, KCNA5, KCND3, KCNE1, KCNH2, and KCNJ2). Compared with data from the NHLBI Exome Variant Server (EVS), rare variations (minor allele frequency, MAF < 0.1) were significantly more prevalent in early-onset lone AF patients, with an odds ratio (OR) greater than the ORs reported for common variations by GWAS [10]. Resequencing of genes for cardiac potassium channels found rare variants in these genes to be significantly more frequent in a cohort of familial AF probands than in healthy controls [70].
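The two quantities used in this comparison, minor allele frequency and odds ratio, can be computed from simple counts. The sketch below derives the MAF from an allele count, and an odds ratio with an approximate 95% confidence interval (Woolf's log method) from carrier counts in cases and controls. It is a generic illustration rather than the analysis performed by Olesen et al.; the function names and all counts are hypothetical, and the rarity cutoff is left to the caller.

```python
import math


def minor_allele_frequency(alt_allele_count, n_individuals):
    """MAF for a biallelic variant in a diploid sample."""
    freq = alt_allele_count / (2 * n_individuals)
    return min(freq, 1.0 - freq)


def odds_ratio_with_ci(case_carriers, case_noncarriers,
                       ctrl_carriers, ctrl_noncarriers, z=1.96):
    """Odds ratio and approximate confidence interval (Woolf's method).

    Assumes all four counts are greater than zero.
    Returns (odds_ratio, lower, upper).
    """
    a, b, c, d = case_carriers, case_noncarriers, ctrl_carriers, ctrl_noncarriers
    odds_ratio = (a * d) / (b * c)
    se_log_or = math.sqrt(1 / a + 1 / b + 1 / c + 1 / d)
    lower = math.exp(math.log(odds_ratio) - z * se_log_or)
    upper = math.exp(math.log(odds_ratio) + z * se_log_or)
    return odds_ratio, lower, upper


# Hypothetical counts, for illustration only.
maf = minor_allele_frequency(alt_allele_count=30, n_individuals=1000)
or_, lo, hi = odds_ratio_with_ci(20, 80, 10, 90)
print(f"MAF = {maf:.3f}, OR = {or_:.2f} (95% CI {lo:.2f} to {hi:.2f})")
```

With small carrier counts, as is typical for rare variants, the interval becomes wide, which is one reason rare-variant studies often aggregate variants across a gene or gene set before testing.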

Lieve et al. used a panel of genes associated with different cardiomyopathies and catecholaminergic polymorphic ventricular tachycardia to identify the functional genetic determinants in a Dutch family with ventricular arrhythmias and early-onset AF. The next-generation sequencing (NGS) analysis identified a gain-of-function mutation in SCN5A in the proband. SCN5A encodes the alpha subunit of the main cardiac sodium channel. A novel missense mutation, p.M1851V, co-segregated with the clinical phenotype in the family and led to increased channel availability and increased window current [35].

Tucker et al. sequenced ~ 158 kilobases (kb) including PRRX1 in 962 individuals of European descent with or without AF and identified rs577676 within an enhancer that interacts with the PRRX1 promoter to potentially alter its expression in the left atrium. Functional analyses found that suppression of PRRX1 could shorten atrial action potential duration considered to be the hallmark of AF [71]. In a previous WES study on the affected members of a family presenting with a complex phenotype of AF, atrial, and ventricular septal defects, Tucker et al. identified four rare mutations with a gain-of-function effect in the GATA6 gene that encodes a transcription factor that regulates the expression of genes required to maintain electrical stability in the atrium [20]. They also studied a large cohort of early-onset AF using a combination of high-resolution melting and Sanger sequencing to screen GATA6 exons. GATA6 has been implicated in the maintenance of cardiac precursor cells in an undifferentiated proliferative state during cardiac development [72].

Gudbjartsson et al. performed WGS on 2636 Icelanders with AF. A recessive frameshift deletion in MYL4 (c.234delC, p.Cys78Trpfs*29) was identified for the first time. MYL4 encodes myosin essential light chain, the key sarcomeric component found in both embryonic muscle and adult atria [9]. Another study that performed WES on a family presenting with a syndrome characterized by early-onset AF (age < 35 years), conduction disease, and signs of atrial myopathy identified a novel p.Glu111Lys mutation in MYL4 in all its affected members. A primary atrial-specific sarcomeric defect was found to cause the rare AF subtype in this family [27].

A large-scale case–control study sought to identify genes associated with early-onset AF defined as AF onset < 66 years of age. The authors performed WGS with a mean genome coverage of 99.1% and found a loss of function variant in the TTN gene [73]. Interestingly, an excess of loss of function (LOF) variants has been reported in TTN when WES was performed on 24 families. These titin-truncating variants (TTNtv) were also found in an independent cohort of early-onset AF patients. TTNtvs could disrupt the assembly of sarcomeres in both atria and ventricles [74].

A high-throughput sequencing study in 20 parent–offspring trios identified 5 novel rare variants that included a variant in the 5′ regulatory region of PITX2 that downregulated PITX2c in atrial myocytes and four exonic nonsense mutations in SYNE2, ZFHX3, and KCNN3 [75]. A novel mutation (p.Trp498Ter) in the LMNA gene was found by exome sequencing to co-segregate with AF-affected members in a four-generation family from the north of China [76]. LMNA belongs to the family of intermediate filaments type V lamins. Loss-of-function mutations similar to the novel mutation identified in this study have been found to underlie the mechanism of LMNA-associated dilated cardiomyopathy [77]. Lubitz et al. examined WES data of 1734 AF cases and 9423 controls of predominantly European ancestry. The study found no significant association between coding region variants and AF, and observed no significant variant enrichment in the previously reported loci in AF cases, suggesting that coding region variants may not be the predominant factors leading to common forms of AF [78].

A cohort study by Yoneda et al. in 2021 performed whole-genome sequencing in 1293 patients with early-onset AF to analyze genes included in the commercial gene panels available for arrhythmias and cardiomyopathies. The study found a greater overlap between AF-associated variants and loci linked with inherited cardiomyopathies. The study supports genetic testing for early-onset AF, with positive genetic test results in up to 16.8% of patients diagnosed with AF before 30 years of age [79].

Conclusions

Recent findings point to the genetic complexity of AF, with both common and rare variants adding to the heterogeneity. Most AF cases are multifactorial, following the traits of a common complex disorder. Polygenic risk scores that take into consideration the contributions of common variants from relevant pathways can better predict genetic susceptibility. Recent studies have identified loci linked with myocardial structural components, transcription factors and atrial development, cardiomyocyte contractility, and electrophysiology as contributors to AF pathogenesis. The major challenge lies in translating the genetic data into clinically useful information and clinical decision-making.