Introduction

Hypertrophic cardiomyopathy (HCM) is defined by cardiac hypertrophy, generally of the left ventricle, in the absence of a clinically identifiable etiology. HCM is a significant cause of sudden cardiac death (SCD) in the young and one of the most common causes of SCD among young athletes [1, 2]. Clinically, HCM has a heterogeneous presentation with varying degrees of hypertrophy, fibrosis, myocyte disarray, left ventricular outflow tract obstruction, ventricular septal morphology, associated symptoms, and sudden death susceptibility [3]. This phenotypic heterogeneity is mirrored by variability in the genetic mutations that confer HCM disease susceptibility.

HCM is the most common inherited cardiovascular disease, and hundreds of mutations in at least 27 putative HCM-susceptibility genes have been identified. Clinical genetic tests for HCM are available and focus particularly on HCM-susceptibility genes that encode components of the cardiac sarcomere [4, 5], including proteins of the thick, intermediate, and thin myofilaments [6]. Overall, depending on the phenotypes of the HCM cohorts analyzed, approximately 30–60 % of patients with HCM will host a mutation in one of these genes, with MYH7 and MYBPC3 being the most prevalent HCM genotypes [7].

Despite the significant progress that has been made in identifying disease-associated mutations, the utility of the HCM genetic test is limited by the background genetic variation inherent to these sarcomeric genes in seemingly healthy subjects [8]. Indeed, identification of a variant of unknown significance (VUS) by the HCM genetic test highlights the dilemma inherent in this clinical test. Recently, the first studies to explore this question have identified a startling number of previously cardiomyopathy-associated genetic variants in population-based exome data [9]. Additionally, new analyses have demonstrated a background rate of rare variants in large cohorts of individuals not enriched for the HCM phenotype, such as the Framingham Heart Study and Jackson Heart Study, the NHLBI Exome Sequencing Project (ESP), and the 1000 Genomes Project (1kG), further complicating the interpretation of a VUS in the HCM genetic test [10–14]. Recently, amino acid level conservation was identified as a potential method to distinguish pathogenic mutations from benign variants [11]. Additionally, gene-specific probabilities of pathogenicity have been identified, which aid the interpretation of HCM genetic testing. However, these studies have largely relied on publicly available whole exome sequenced control cohorts whose results have not been validated for the HCM genes. Additionally, many of these studies have been underpowered to perform detailed analyses, as relatively small HCM case cohorts were used for comparisons.

To this end, we sought to (1) validate, using a set of Sanger-sequenced, unrelated controls, the spectrum and prevalence of background genetic variation in publicly available control data that would meet the criteria of a “positive” HCM genetic test; (2) identify information to aid in the probabilistic interpretation of an HCM genetic test result using a large cohort of HCM cases; and (3) combine the new and current knowledge into an algorithm to aid providers in the interpretation of an HCM genetic test.

Methods

Study Cohorts

The Sanger-sequenced control cohort was comprised of 427 unrelated, ostensibly healthy individuals from various racial and ethnic backgrounds who were subjected to HCM genetic testing by Transgenomic Inc. in New Haven, CT, USA (formerly PGxHealth). These control individuals were volunteers recruited and genotyped by Transgenomic as part of the clinical HCM genetic test validation process. Ethnicity, sex, and age at genotyping were recorded for each subject. A normal 12-lead electrocardiogram, Holter monitoring, and echocardiographic analysis were not a prerequisite for the individuals to be included in this cohort.

Additional estimated frequency of background genetic variation was procured from online databases of next-generation sequencing results in large populations of individuals, including the 1000 Genomes Project (1kG) (www.1000genomes.org/) [15] and the ESP (esp.gs.washington.edu/drupal/) [16].

The HCM case cohort was comprised of 2,178 individuals diagnosed with HCM or referred for HCM genetic testing. This cohort included 1,053 unrelated index cases evaluated and diagnosed at the Mayo Clinic HCM Clinic in Rochester, MN, USA, who had consented for HCM genetic testing between April 1997 and April 2007. An additional 1,125 cases were referred to Transgenomic Inc. for HCM clinical genetic testing. Ethnicity, sex, and age at genotyping were recorded for these additional cases. The veracity of the clinical diagnosis of HCM in these referred cases could not be confirmed. The demographic and clinical characteristics of the cohorts are summarized in Table 1.

Table 1 Cohort demographics

Genetic Analysis

Comprehensive genetic analysis of all subjects was conducted for the coding exons of MYH7-encoded myosin heavy chain (MYH7), MYL2-encoded regulatory myosin light chain (MYL2), MYL3-encoded essential myosin light chain (MYL3), MYBPC3-encoded cardiac myosin binding protein C (MYBPC3), ACTC1-encoded actin (ACTC1), TNNC1-encoded cardiac troponin C (TNNC1), TNNI3-encoded cardiac troponin I (TNNI3), TNNT2-encoded cardiac troponin T (TNNT2), and TPM1-encoded alpha-tropomyosin (TPM1).

For cases identified at Mayo Clinic, following receipt of written consent for this Mayo Foundation Institutional Review Board-approved protocol, genetic analysis was performed using PCR, DHPLC, and direct DNA sequencing as previously described [17].

Referral cases sent for commercial FAMILION-HCM genetic testing and the Sanger-sequenced controls were genotyped via Sanger sequencing in both the forward and reverse directions. For both case cohorts and the Sanger-sequenced controls, the genotyping methods were optimized for the genes analyzed. The 1kG and ESP samples had previously undergone whole exome sequencing, and genotypes were procured from online databases. The 1kG and ESP whole exome sequencing was not specifically optimized for the genes analyzed in this study. Additionally, large gene rearrangements could not be assessed using the methods in this study.

Genetic Variant Classification

Genetic variants predicted to alter the protein were identified, including missense variants, in-frame and frameshift insertions/deletions, canonical splice-site variants (±1–2), and nonsense variants resulting in premature truncation. Variants were annotated as rare non-synonymous variants (rNSVs) if they were identified in the case cohort and completely absent from the control cohorts, or if they were identified in only a single individual across all control cohorts assessed, irrespective of ethnicity. This strict definition was used in order to polarize the variants for the comparisons between case and control.

The designation of rNSVs herein is not meant to imply pathogenicity or even functional relevance to the respective protein. This designation is intended to reflect that had these variants been discovered during the course of a clinical HCM genetic test, each would be considered a possible pathogenic variant or VUS. All variants were classified using standard HGVS nomenclature. In-frame and frameshift insertions and deletions, splice junction, and nonsense rNSVs likely resulting in a shortened protein product were designated as “radical” rNSVs. As with the designation of rNSVs, the designation of radical is not meant to imply disease pathogenicity but is used as a descriptor for further analysis.
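The classification rules above can be sketched in a few lines of code. This is a minimal illustration under stated assumptions, not the pipeline used in the study: the consequence labels, the `Variant` record, and the `control_carrier_counts` mapping (number of control individuals carrying each variant, pooled across all control cohorts) are hypothetical names introduced only for this example.

```python
from dataclasses import dataclass

# Hypothetical consequence labels; the text names the variant classes but not a coding scheme.
PROTEIN_ALTERING = {"missense", "inframe_indel", "frameshift", "splice_site", "nonsense"}
# "Radical" covers every protein-altering class except missense, as defined in the text.
RADICAL = PROTEIN_ALTERING - {"missense"}


@dataclass(frozen=True)
class Variant:
    gene: str
    hgvs: str          # HGVS description, used here only as an identifier
    consequence: str   # one of the labels above, or e.g. "synonymous"


def classify(variant, control_carrier_counts):
    """Label a variant following the rNSV definition described in the text."""
    if variant.consequence not in PROTEIN_ALTERING:
        return "not_protein_altering"
    # rNSV: absent from all controls, or seen in exactly one control individual.
    carriers = control_carrier_counts.get((variant.gene, variant.hgvs), 0)
    if carriers > 1:
        return "not_rare"
    return "radical_rNSV" if variant.consequence in RADICAL else "missense_rNSV"


if __name__ == "__main__":
    example = Variant("MYBPC3", "hypothetical_frameshift", "frameshift")
    print(classify(example, control_carrier_counts={}))
```

As in the text, the labels describe rarity and predicted protein effect only; they say nothing about pathogenicity.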

Primary Sequence Conservation and Linear Topology Analysis

Primary sequence analysis was conducted utilizing primary sequences from the UCSC Genome Browser [18]. To calculate the degree of conservation of individual residues across species, primary sequences from 44 species including primates, other placental mammals, and non-mammalian vertebrates were used. The “non-identity score” was determined by counting the number of primary sequences harboring an amino acid not identical to the human residue at that location (substituted). A non-identity score of 0 indicates that the variant involves an amino acid that is not substituted across the species queried. The non-identity score ranged from 0 (no substitution across species) to 44 (high interspecies variability). Protein domains in linear topology analysis were annotated using published literature [19, 20]. Additionally, a moving window analysis was performed using windows examining approximately 5 % of the overall protein (60 amino acid and 100 amino acid moving windows for MYH7 and MYBPC3, respectively). The percentages of case rNSVs and of all control variants (not only control rNSVs) were calculated for each window, and windows with a statistical overrepresentation of case rNSVs were identified.
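The two calculations described above, the non-identity score and the per-window variant counts, can be illustrated as follows. This is a sketch under assumptions rather than the code used in the study: the function names are hypothetical, alignment gaps (written `-`) are assumed not to count as substitutions, and the statistical test applied to each window is not specified in the text and is therefore left out.

```python
def non_identity_score(human_residue, orthologous_residues):
    """Count species whose aligned amino acid differs from the human residue.

    Assumption: alignment gaps ("-") are skipped rather than counted as substitutions.
    """
    return sum(
        1
        for residue in orthologous_residues
        if residue != "-" and residue != human_residue
    )


def window_counts(positions, protein_length, window):
    """Count variants in every sliding window of `window` residues.

    Positions are 1-based amino acid coordinates; window i covers
    residues [i + 1, i + window].
    """
    counts = []
    for start in range(1, protein_length - window + 2):
        end = start + window - 1
        counts.append(sum(start <= position <= end for position in positions))
    return counts


if __name__ == "__main__":
    # A residue identical in all 44 species scores 0 (unsubstituted).
    print(non_identity_score("R", ["R"] * 44))
    # Toy protein of 10 residues, window of 5, variants at residues 1, 3 and 10.
    print(window_counts([1, 3, 10], protein_length=10, window=5))
```

Running `window_counts` once on case rNSV positions and once on control variant positions gives the per-window numerators from which the percentages compared in the analysis can be derived.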

Statistical Analysis

Statistical analysis utilized Fisher’s exact test and two-sided t tests, where appropriate, with a threshold of significance set to P < 0.05. Values were expressed as mean ± standard deviation for cohort demographics and clinical characteristics. All other statistical analyses expressed values as mean ± standard error. Sequence conservation scores were analyzed using Wilcoxon nonparametric two-sample tests. To determine the likelihood of disease causation, an estimated predictive value (EPV) expressed with 95 % confidence interval was determined as previously described [21]. For the purpose of this calculation, only rNSVs were used for both cases and controls, while polymorphisms did not contribute to the calculated EPV. Unless otherwise stated, the Sanger-sequenced and ESP controls were used as the source of control frequencies for the EPV calculations.
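The EPV formula itself is given in reference [21] and is not reproduced in this text. The sketch below assumes a commonly used form, the fraction of case carriers in excess of the control background, EPV = (case frequency − control frequency) / case frequency; both that formula and the function name are assumptions of this example, and the 95 % confidence interval is omitted because its derivation is not described here.

```python
def estimated_predictive_value(case_frequency, control_frequency):
    """Return the EPV as a fraction between 0 and 1 (negative if controls exceed cases).

    Assumed formula: (case frequency - control frequency) / case frequency.
    """
    if not 0 < case_frequency <= 1:
        raise ValueError("case_frequency must be in (0, 1]")
    if not 0 <= control_frequency <= 1:
        raise ValueError("control_frequency must be in [0, 1]")
    return (case_frequency - control_frequency) / case_frequency


if __name__ == "__main__":
    # Radical rNSV frequencies quoted in the Discussion: 11.2 % of cases, 0.14 % of controls.
    epv = estimated_predictive_value(0.112, 0.0014)
    print(f"EPV = {100 * epv:.2f} %")
```

Under this assumed formula, a variant class seen equally often in cases and controls has an EPV of 0, and the EPV approaches 100 % as the control background becomes negligible relative to the case frequency.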

Results

Yield in HCM Cases

Overall, the yield of HCM genetic testing for the Mayo cases was found to be 29.2 % (308/1,053). The yield among the cases referred for the Transgenomic HCM genetic test was slightly higher at 33.6 % (378/1,125, P = 0.03). Given the similar yields between the two case cohorts, the cases were combined for the remaining analyses.
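The comparison of yields between the two case cohorts is a 2 × 2 test of the kind named in the Statistical Analysis section. A self-contained sketch of a two-sided Fisher’s exact test is shown below, applied to the counts above (308 of 1,053 Mayo cases versus 378 of 1,125 referral cases). The function name is hypothetical, and the two-sided convention used (summing all tables no more probable than the observed one) is an assumption, as the text does not state which convention was applied.

```python
from math import comb


def fisher_exact_two_sided(a, b, c, d):
    """Two-sided Fisher's exact P value for the 2x2 table [[a, b], [c, d]]."""
    row1, row2, col1 = a + b, c + d, a + c
    total = row1 + row2

    def weight(k):
        # Unnormalised hypergeometric weight of the table with k in the top-left cell.
        return comb(row1, k) * comb(row2, col1 - k)

    observed = weight(a)
    lowest, highest = max(0, col1 - row2), min(row1, col1)
    # Sum the weights of every table that is no more probable than the observed one.
    tail = sum(
        w for w in (weight(k) for k in range(lowest, highest + 1)) if w <= observed
    )
    return tail / comb(total, col1)


if __name__ == "__main__":
    # Rows: Mayo cases, referral cases; columns: genotype positive, genotype negative.
    p_value = fisher_exact_two_sided(308, 1053 - 308, 378, 1125 - 378)
    print(f"P = {p_value:.3f}")
```

Exact integer arithmetic is used throughout, so no tolerance is needed when comparing table probabilities; the only floating-point step is the final division.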

The overall yield in the combined case cohort was found to be 31.5 % (686/2,178). The majority of case rNSVs were missense, with a yield of 20.3 % among the cases. rNSVs in MYH7 and MYBPC3 accounted for the vast majority of the rNSVs, with yields of 15.6 and 12.0 %, respectively. The predominant rNSV type differed strikingly between the two genes, with MYH7 having a missense yield of 11.7 %, while MYBPC3 had a radical yield of 10.5 %. The remaining seven genes had a combined yield of 4.8 %, with only TNNI3 and TPM1 having an individual yield greater than 1 %. These results are summarized in Fig. 1.

Fig. 1

Yield of the HCM genetic test in cases. Bar graph representing the percent yield of the HCM genetic test in HCM cases by gene. The overall yield (black) reflects all rNSVs identified. Missense rNSVs (gray) include rNSVs altering a single amino acid. Radical rNSVs (white) include all in-frame and frameshift insertions and deletions, splice junction, and nonsense rNSVs. The numbers above the bars represent the percentage yield

A total of 392 rNSVs were identified in the cases. Of these, 57 were found to be absent from controls and nominally overrepresented in cases (P < 0.05, Table 2). These rNSVs accounted for 49.0 % (336/686) of the genotype-positive HCM cases. This overrepresentation makes these rNSVs near-definitive pathogenic mutations. Although accounting for a large portion of genotype-positive cases, these 57 rNSVs only represent 14.5 % (57/392) of the rNSVs identified. Due to this disproportion between the number of cases and the number of amino acids altered, these rNSVs heavily skew any case–control analysis. Thus, in an effort to determine the impact of the next “new” rNSVs throughout these genes, these overrepresented rNSVs were removed from subsequent analyses comparing case and control sample counts. The yield of rNSVs within the HCM cases before and after the removal of the overrepresented rNSVs is listed in Table 3.

Table 2 Statistically near-definite HCM-causative missense mutations
Table 3 Yield by gene in cases and controls

rNSV Frequency in Control Cohorts

In order to validate the background rate of “genetic noise” inherent in the HCM genetic test previously reported from the 1kG and ESP, comprehensive genetic analysis of a Sanger-sequenced control cohort was performed. Among the Sanger-sequenced, ostensibly healthy individuals, 5.2 % hosted a rare genetic variant, unique among the control cohort. A complete list of all variants found in the Sanger-sequenced control individuals is summarized in Table 4. In comparison, 2.8 % of the ESP cohort had a missense rNSV, a frequency which was significantly lower than the Sanger-sequenced cohort (P = 0.01). The yield of the 1kG cohort was only 0.9 %, which was lower than the Sanger-sequenced cohort (P = 1.6 × 10⁻⁶) and the ESP cohort (P = 9.1 × 10⁻⁵). These results are summarized in Fig. 2a.

Table 4 Frequency and type of missense variant in Sanger sequence controls
Fig. 2

Frequency of rare genetic variants present in the nine genes associated with sarcomeric/myofilament HCM across cohorts of healthy individuals. Bar graph representing the percent yield of the HCM genetic test among the Sanger-sequenced control cohort (white), the ESP (light gray), and the 1kG (black) cohorts. a Percent yield among all control samples. b Percent yield among Caucasian and African descent samples. c Percent yield among Caucasian samples. d Percent yield among samples of African descent. *P < 10⁻⁴ compared with respective control cohort percentage; †10⁻⁴ < P < 0.05

To determine whether variation in the relative racial/ethnicity composition of the cohorts was the source of the disparity, the yield of each cohort was recalculated based on relative distribution of race/ethnicity. As the ESP cohort is exclusively Caucasian and African descent, when limiting the Sanger-sequenced cohort to Caucasian and African descent individuals, the yield was 3.3 %, which was similar to the ESP yield (P = NS). This held true when the yield was calculated separately for Caucasian and African descent samples. The 1kG yields remained lower when calculated independently for each ethnic group. While there were differences between the yields across cohorts, there were no differences in yield between ethnicities within each cohort or when the cohorts were combined. These results are summarized in Fig. 2b–d and Table 5.

Table 5 Yield of control cohorts by race and ethnicity

Due to the low background rate in the 1kG samples, this cohort was excluded from the remaining analyses. Therefore, the combined yield of 2.9 % from the ESP samples and the Sanger-sequenced samples was used for all remaining comparisons. This yield was predominantly a result of missense rNSVs; however, a background rate of 0.14 % was identified for radical rNSVs. The identified control radical rNSVs are listed in Table 6. The control yield for each gene is summarized in Table 3.

Table 6 Radical variants identified in controls

Linear Topology and Primary Sequence Conservation

Despite possible clustering in MYH7, no topological region hosted a statistically significant overrepresentation of case-derived variants. Additionally, a moving window analysis did not identify a statistical overrepresentation of case rNSVs when correcting for multiple comparisons. The localization of case rNSVs and control variants for MYH7 and MYBPC3 is illustrated in Fig. 3. Due to the low yield in the remaining genes, a topology was not analyzed.

Fig. 3

MYH7 and MYBPC3 linear topology. Linear topology of MYH7 (a) and MYBPC3 (b) with the location of all control missense variants (lines below topology) and HCM case rNSVs (lines above topology) identified

Primary sequence analyses revealed that, overall, rNSVs identified among HCM cases localized to amino acids that were more conserved than those affected by rNSVs identified in control individuals. Among the 208 distinct missense rNSVs identified in control individuals, 35.6 % involved residues unsubstituted across species. In comparison, a significantly larger proportion of HCM case rNSVs, 56.7 % (164/289), localized to a universally conserved (i.e., unsubstituted) residue (P = 3.3 × 10⁻⁶).

To quantify this further, a non-identity score was calculated; rNSVs identified in control individuals demonstrated weaker conservation, resulting in a higher non-identity score (4.1 ± 0.4) compared to cases (2.1 ± 0.3, P = 3.2 × 10⁻⁵). While case rNSVs were more conserved than control rNSVs overall, this difference was driven only by missense rNSVs in MYH7 and MYBPC3, as the removal of these rNSVs left only a nonsignificant trend in the same direction for the non-identity score between cases and controls in the remaining genes [case, 2.1 ± 0.5; control, 2.8 ± 0.6 (P = NS)]. These results are summarized in subpanels a and b of Fig. 4, respectively.

Fig. 4

Amino acid conservation. a Bar graph of the percentage of rNSVs identified in the control (white) and HCM case (gray) cohorts, respectively, that involve residues completely conserved across species. The numbers within the bars represent the percentage of rNSVs at unsubstituted residues. b Bar graph of the non-identity score of rNSVs identified in the control (white) and HCM case (gray) cohorts, respectively. The numbers within the bars represent the non-identity score. Error bars indicate standard error. *P < 10⁻⁴ compared with respective control cohort percentage; †10⁻⁴ < P < 0.05

Interpreting a Genetic Test Result Utilizing an Estimated Predictive Value Calculation

Interpretation of genetic testing becomes more complicated in the setting of the identified background rate; therefore, EPVs were calculated in an effort to assess whether a “positive” genetic test result is reporting a probable disease-causative mutation or only a possible one (i.e., a VUS). A radical rNSV in MYBPC3 resulted in an EPV of 98 (95–99), suggesting a high probability of pathogenicity. The frequency of radical rNSVs in the remaining genes was too low to provide confidence in the pathogenicity of radical rNSVs. A single missense rNSV yielded a relatively lower EPV of 78 (74–82). Interestingly, there were drastic differences in the EPV between genes when missense rNSVs were assessed by individual gene. By itself, a rare MYH7 missense rNSV has a moderate EPV of only 87 (74–96). Surprisingly, the identification of a missense rNSV in MYBPC3 provides a low EPV of 69 (58–78). Among the minor genes, only ACTC1 and TPM1 had EPVs suggesting a high probability of pathogenicity (EPV > 90), and missense rNSVs in TNNI3 conferred an EPV of 84 (54–95), suggesting an intermediate probability of pathogenicity. The remaining genes (MYL2, TNNC1, TNNT2, MYL3) all had lower probabilities of pathogenicity for missense rNSVs (EPVs < 75). Due to the rarity of missense rNSVs in the minor genes, it is difficult to upgrade a VUS in one of these genes without additional evidence.

As conservation helped to distinguish between case and control rNSVs in MYH7 and MYBPC3, we assessed whether conservation across species could upgrade or downgrade the EPVs for missense variants in these genes. Pathogenic rNSVs were identified as those located at residues unsubstituted across all species examined. The EPV for a missense rNSV in MYH7 increased from 87 (83–91) to 92 (87–95) when the rNSV involved an unsubstituted residue and decreased to 79 (68–86) when it involved a substituted residue. The EPV for a universally conserved MYBPC3 missense rNSV increased from 69 (58–78) to 85 (74–91) and decreased from 69 to 52 (27–69) when the residue involved was substituted in >1 of the 44 analyzed species. These results are summarized in Table 7.

Table 7 Summary of calculated EPVs

Discussion

Genetic Testing Noise in Sudden Cardiac Death-Predisposing Diseases

Only within the past decade have the first systematic studies exploring background genetic variation in diseases that predispose individuals to SCD become available. Ostensibly healthy individuals subjected to genetic analysis of the three canonical genes of long QT syndrome (LQTS), KCNQ1, KCNH2, and SCN5A, had a background rate of ∼5 % [21]. Analysis of healthy individuals subjected to genotyping of canonical arrhythmogenic right ventricular cardiomyopathy (ARVC) desmosomal genes (PKP2, DSP, DSG2, DSC2, JUP, and TMEM43) identified a background rate of ∼16 %, three times greater than the background genetic variation in the three LQTS-susceptibility genes [22].

Recently, studies have identified an alarming background rate of rare variants within the nine HCM-susceptibility genes examined in this study, which are estimated to account for at least 90–95 % of currently genetically identifiable HCM. Similar to the previous studies, we identify a conservative background rate of ∼3 % that may be as high as ∼5 %, in these nine sarcomeric genes. Using a large cohort of Sanger-sequenced controls as a “gold standard” of genetic analysis, we were able to validate the background rate among the ESP samples.

Differences in Reported Genetic Variation in Healthy Individuals

The background rate found among Sanger-sequenced control individuals was 5.2 %, which is significantly higher than that found in online databases and previously published manuscripts on this topic [23]. This was largely due to differences in ethnicity; however, the low yield among the 1kG samples may have another, more concerning explanation. As the publicly available control cohorts are all based on next-generation sequencing technologies, there may be issues of varying degrees of coverage and of inefficient exon capture. A recent publication highlighted the degree of variation in coverage across a number of genes associated with HCM when using next-generation sequencing technologies [24]. This would suggest caution when using publicly available next-generation sequencing-based control data to determine background rates or assign disease pathogenicity, especially when using the 1kG population.

Interpretation of the HCM Genetic Test

The HCM genetic test is recommended for patients with clinically suspected HCM primarily for the purpose of mutation-specific cascade genetic testing of the index case’s relatives [6, 25]. In this way, a clear biomarker of disease can risk stratify family members without current echocardiographic or clinical manifestation of disease and guide future clinical screening of genotype-positive family members [25, 26]. In addition, there is emerging evidence that a positive genetic test, regardless of the type of rNSVs identified, is an independent prognostic indicator of outcome in HCM [7, 27]. Based on this, the need to interpret the pathogenic significance of a positive genetic test result is of critical importance.

Recent studies have begun to address this dilemma identifying a number of distinguishing features of pathogenic mutations. The overrepresentation of rNSVs in cases over controls has been identified as a characteristic of disease-causing genes for the interpretation of whole exome data; however, there are strikingly different levels of intolerance among different disease genes [28]. This highlights the necessity of disease-specific examination. Recent studies have begun to examine the HCM genes specifically. A recent study found that genes associated with HCM had a high intolerance to genetic variation and found that amino acid conservation may help distinguish pathogenic variants from the benign genetic variants identified in up to 3 % of the population [11]. A second study identified gene-specific probabilities of pathogenicity and highlighted four genes (MYH7, MYBPC3, TNNT2, and TNNI3) with the highest pathogenic probability [12]. While these studies have begun to address the issue of the background genetic noise, most were underpowered to perform many assessments.

In our study, we harness the power of over 2,000 genetically tested HCM cases. We identify three of the same genes as Lopes et al. [12] with only TNNT2 being identified with a low probability of pathogenicity. This may be due to cohort-specific differences; additionally, our removal of the near-definitive, overrepresented variants may lower the probability of TNNT2. Lopes et al. [12] identified seven cases with TNNT2 mutations with three of these samples hosting the same TNNT2 p.Ala104Val mutation. Removal of this clearly pathogenic variant may reduce the probability of the next new variant to a similarly low probability identified in our study.

Additionally, we have identified a large number of near-definitive pathogenic rNSVs, as well as several characteristics of rNSVs identified in HCM cases compared with controls that may aid the cardiologist or genetic counselor in this interpretation. A recent report identified that radical rNSVs in the HCM genes are likely pathogenic [11]. While we identified radical rNSVs in 11.2 % of all HCM cases versus 0.14 % of control subjects which strongly supports that frameshift insertions and deletions, splice junction, and nonsense rNSVs are likely pathogenic, over 90 % of these radical rNSVs fall in MYBPC3 suggesting that radical rNSVs within this gene are likely pathogenic, while caution must be used when assessing radical rNSVs in the remaining genes. However, since all the radical control rNSVs were identified in the ESP cohort, our interpretation may be conservative as a recent paper identified that a much larger percentage of radical variants versus missense variants identified by next-generation sequencing were in fact false positives, as they were not confirmed by Sanger sequencing [29]. Conversely, while missense rNSVs were overrepresented 7.5-fold in HCM cases, the 2.8 % background rate of mutation-like, but presumably benign, variants is too high to ignore. This creates a challenge in interpreting a genetic test that has identified a missense VUS, especially in a number of the minor genes where the identified EPV was less than 75.

Previous studies have indicated that both LQTS- and ARVC-associated mutations preferentially localize to specific structure-function domains (regions). Here, however, when correcting for multiple comparisons, we do not observe an overrepresentation in any of the previously reported regions within either the MYH7-encoded beta myosin heavy chain or the MYBPC3-encoded myosin binding protein C.

In agreement with studies on other diseases, HCM case rNSVs preferentially involved amino acids that were conserved across species. This would suggest the use of conservation as a potential mechanism for distinguishing case rNSVs from benign control variants. However, the statistical difference between case and control rNSVs was driven only by variants in MYH7 and MYBPC3, as reflected in a lower non-identity score than control rNSVs for both genes, but not for the remaining minor genes. This has implications for many in silico prediction tools such as SIFT and PolyPhen, as these tools largely rely on conservation. Our data suggest limiting the use of conservation-based tools to the two major HCM genes, MYH7 and MYBPC3, or at least exercising extreme caution when applying the various in silico prediction tools to variants involving a minor gene.

Based on our findings and incorporating current literature on background noise in the HCM genetic test, we have created an algorithm to aid in the interpretation of a so-called positive HCM genetic test (Fig. 5). An rNSV with specific evidence of pathogenicity (e.g., segregation with disease, functional data) should be considered a probably pathogenic mutation. Additionally, we provide evidence that the absence of an rNSV in a large set of controls, together with overrepresentation in cases, is sufficient to elevate it to a probably pathogenic mutation. Herein, we have identified 57 rNSVs, listed in Table 2, that meet these criteria and should be considered actionable for predictive mutation-specific genetic testing among family members. However, pathogenic uncertainty persists for the next rNSV identified. It is reasonable to assume that radical rNSVs in MYBPC3 are near-definite HCM-susceptibility mutations, while greater caution must be applied to the remaining genes. When a missense rNSV is identified in MYH7 or MYBPC3, it should be interpreted in the context of the conservation of the residue. Currently, rNSVs involving one of the minor genes, where we have found that conservation does not distinguish case mutations from control variants, should be interpreted cautiously, taking into account the EPV as well as the pretest probability. Without additional evidence for causality, such a genetic test result should not be used for predictive testing of the index case’s relatives. Importantly, the results of the HCM genetic test, or any other genetic test, should never completely usurp clinical judgment.

Fig. 5

Interpretation of HCM genetic test schematic. An algorithm to guide the interpretation of a positive HCM genetic test. A positive genetic test result involving an rNSV with additional evidence of pathogenicity [i.e., (1) overrepresented in cases and absent in controls, (2) segregates with disease, or (3) functional evidence] is a high probability disease mutation. Additionally, a radical mutation in MYBPC3 is also a high probability disease mutation. If a single missense rNSV is identified, there is less statistical rigor to classify this rNSV as a definite HCM-causative mutation. Here, the missense rNSV’s probability of pathogenicity is influenced by the gene in which it is located. Additionally, conservation across species can enhance the interpretation for MYBPC3 and MYH7 missense rNSVs
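The decision flow summarized in the schematic can be written out as a short function. This is only an illustrative sketch of the logic as described in the text and caption, not a clinical tool and not the authors’ implementation: the function name, argument names, and returned wording are hypothetical, and the EPV values are those quoted in the Results for missense rNSVs in MYH7 and MYBPC3.

```python
MAJOR_GENES = {"MYH7", "MYBPC3"}

# Missense EPVs (%) quoted in the Results, keyed by (gene, residue unsubstituted across species).
# For MYBPC3, the "substituted" value is the one reported for residues substituted in >1 species.
MISSENSE_EPV_BY_CONSERVATION = {
    ("MYH7", True): 92,
    ("MYH7", False): 79,
    ("MYBPC3", True): 85,
    ("MYBPC3", False): 52,
}


def interpret(gene, variant_class, has_independent_evidence=False,
              residue_unsubstituted=None):
    """Return a coarse interpretation of an rNSV following the schematic.

    has_independent_evidence: overrepresented in cases and absent in controls,
    segregation with disease, or functional evidence.
    """
    if has_independent_evidence:
        return "high probability disease mutation"
    if variant_class == "radical":
        if gene == "MYBPC3":
            return "high probability disease mutation"
        return "uncertain: radical rNSV outside MYBPC3, interpret with caution"
    if (variant_class == "missense" and gene in MAJOR_GENES
            and residue_unsubstituted is not None):
        epv = MISSENSE_EPV_BY_CONSERVATION[(gene, residue_unsubstituted)]
        return f"missense rNSV in {gene}: estimated predictive value about {epv} %"
    return "uncertain: weigh gene-specific EPV and pretest probability"


if __name__ == "__main__":
    print(interpret("MYH7", "missense", residue_unsubstituted=True))
    print(interpret("TNNT2", "missense"))
```

The ordering of the checks mirrors the schematic: variant-specific evidence is considered first, then variant type, then gene and residue conservation.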

Given the transition of clinical genetic testing to whole exome sequencing platforms, the American College of Medical Genetics recently provided a list of genes for which it recommends returning incidental findings; all nine of the HCM genes analyzed in this study are included. Given our finding that up to 5 % of the general population may host a rare variant in these nine genes, the ability to assess the pathogenicity of these rare variants is key. Assessing an incidental variant in the context we provide may give physicians a statistical framework to distinguish pathogenic from benign variants. With this transition and the increased utilization of clinical genetic testing, the identification of overrepresentation in cases for a particular rare variant can carry great weight in the interpretation of the variant’s pathogenicity. To that end, a publicly available database providing HCM case counts for a particular mutation, such as ClinVar (http://www.ncbi.nlm.nih.gov/clinvar/), would be of immense value to clinicians. This would allow physicians to interpret genetic results with constantly evolving information.

Limitations

There are several limitations inherent to this study. While we classify any variant identified in the controls as benign, there may be control variants that are pathogenic. However, a recent study examining the impact of rare genetic variation in HCM genes among well-phenotyped samples from the Framingham and Jackson Heart Studies found that only a very small subset of the general population samples carrying these rare variants had evidence of HCM, and many of the previously reported pathogenic mutations identified in these samples had a much smaller estimated effect size than previously reported from HCM families [14]. Given these recent data and the overall prevalence of 1 in 500 for HCM, the vast majority of the rNSVs identified in the controls can be assumed to have little effect individually on the HCM phenotype. Further, if some of the control rNSVs are in fact pathogenic, the EPVs would be underestimated, making our estimates conservative. The lack of phenotypic information on the referral cohort presents another limitation, as the yield of the HCM genetic test can be affected by the phenotypic strength of the cases assessed. However, when examining the cohorts separately, we see that there is a similar yield between the Mayo cases and the referral cases (29.2 and 33.6 %, respectively). While the yields are lower than the published yield of ∼50 %, this is expected given the stringent requirement of absence of identified variants among the controls examined. Additionally, the previously published yield is likely slightly inflated by benign variants that have now been identified in controls, given that most previous studies required absence only in small control cohorts. Additionally, these samples were all subject to genetic testing based on a physician’s supposition of an HCM diagnosis.

Given the number of cohorts examined in this study, there is heterogeneity in the sequencing technology, depth of sequencing, accuracy, etc.; we have made every attempt to control for these issues while maintaining a cohort size powered to do the comparisons. Further, there are differences in the racial/ethnic composition of our control and case cohorts. Specifically, our HCM case cohort is predominantly Caucasian due to the geographic location of subject recruitment. Given that there were no statistical differences between ethnicities, we decided to use the overall yield among all ESP and Sanger controls. This may inflate the relative control rNSV yield due to the greater diversity of that cohort relative to the cases, but would again result in a more conservative estimate. Despite these limitations, several statistical observations appear robust. Future validation of these observations in independent studies utilizing phenotypically robust cohorts of healthy individuals as well as subjects with HCM should help to mathematically upgrade or downgrade a VUS.

Conclusions

This study provides and validates the ESP-derived background rate of genetic variation in ostensibly healthy individuals for the HCM genetic test using a cohort analyzed via the gold standard, Sanger sequencing. While mutations are clearly overrepresented in disease, a background rate of up to ∼5 % among healthy individuals prevents pathogenic certainty. We have identified a number of EPV-based associations that can guide a clinician in the interpretation of an HCM genetic test.