Abstract
Identification of genetic variants with effects on trait variability can provide insights into the biological mechanisms that control variation and can identify potential interactions. We propose a two-degree-of-freedom test for jointly testing mean and variance effects to identify such variants. We implement the test in a linear mixed model, for which we provide an efficient algorithm and software. To focus on biologically interesting settings, we develop a test for dispersion effects, that is, variance effects not driven solely by mean effects when the trait distribution is non-normal. We apply our approach to body mass index in the subsample of the UK Biobank population with British ancestry (n ~408,000) and show that our approach can increase the power to detect associated loci. We identify and replicate novel associations with significant variance effects that cannot be explained by the non-normality of body mass index, and we provide suggestive evidence for a connection between leptin levels and body mass index variability.
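As a rough illustration of what a two-degree-of-freedom joint test involves, the sketch below combines a mean-effect statistic and a variance-effect statistic into one test. It assumes that the two inputs are z-scores that are independent standard normals under the null hypothesis, so that the sum of their squares follows a chi-square distribution with two degrees of freedom. That is an assumption of this sketch, not a description of the linear mixed model implementation released with the paper, and the function name `joint_mean_variance_p` is hypothetical.

```python
import math


def joint_mean_variance_p(z_mean, z_variance):
    """Return (statistic, P value) for a joint 2-d.f. test.

    Assumes z_mean and z_variance are independent N(0, 1) under the null
    (an assumption of this sketch, not something stated in the abstract).
    """
    statistic = z_mean ** 2 + z_variance ** 2
    # The survival function of a chi-square with 2 d.f. is exp(-x / 2).
    p_value = math.exp(-statistic / 2.0)
    return statistic, p_value


# Hypothetical z-scores, for illustration only.
stat, p = joint_mean_variance_p(z_mean=4.0, z_variance=3.0)
print(stat, p)
```

Because a variant with no variance effect still pays for the extra degree of freedom, a joint test of this kind gains power only when the variance effect is large enough to offset that cost.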
Data availability
The primary data analyzed in this study come from the UK Biobank. Applications for access can be made on the UK Biobank website.
References
Price, A. L., Spencer, C. C. A. & Donnelly, P. Progress and promise in understanding the genetic basis of common diseases. Proc. Biol. Sci. 282, 20151684 (2015).
Hill, W. G., Goddard, M. E. & Visscher, P. M. Data and theory point to mainly additive genetic variance for complex traits. PLoS Genet. 4, e1000008 (2008).
Sudlow, C. et al. UK biobank: an open access resource for identifying the causes of a wide range of complex diseases of middle and old age. PLoS Med. 12, e1001779 (2015).
Marchini, J., Donnelly, P. & Cardon, L. R. Genome-wide strategies for detecting multiple loci that influence complex diseases. Nat. Genet. 37, 413–417 (2005).
Paré, G., Cook, N. R., Ridker, P. M. & Chasman, D. I. On the use of variance per genotype as a tool to identify quantitative trait interaction effects: a report from the Women’s Genome Health Study. PLoS Genet. 6, e1000981 (2010).
Struchalin, M. V., Dehghan, A., Witteman, J. C., van Duijn, C. & Aulchenko, Y. S. Variance heterogeneity analysis for detection of potentially interacting genetic loci: method and its limitations. BMC Genet. 11, 92 (2010).
Hill, W. G. & Mulder, H. A. Genetic analysis of environmental variation. Genet. Res. (Camb). 92, 381–395 (2010).
Forsberg, S. K. G. et al. The multi-allelic genetic architecture of a variance-heterogeneity locus for molybdenum concentration in leaves acts as a source of unexplained additive genetic variance. PLoS Genet. 11, e1005648 (2015).
Ivarsdottir, E. V. et al. Effect of sequence variants on variance in glucose levels predicts type 2 diabetes risk and accounts for heritability. Nat. Genet. 49, 1398–1402 (2017).
Kitano, H. Biological robustness. Nat. Rev. Genet. 5, 826–837 (2004).
Rönnegård, L. & Valdar, W. Recent developments in statistical methods for detecting genetic loci affecting phenotypic variability. BMC Genet. 13, 63 (2012).
Yang, J. et al. FTO genotype is associated with phenotypic variability of body mass index. Nature 490, 267–272 (2012).
Cao, Y., Wei, P., Bailey, M., Kauwe, J. S. K. & Maxwell, T. J. A versatile omnibus test for detecting mean and variance heterogeneity. Genet. Epidemiol. 38, 51–59 (2014).
Cao, Y., Maxwell, T. J. & Wei, P. A family-based joint test for mean and variance heterogeneity for quantitative traits. Ann. Hum. Genet. 79, 46–56 (2015).
Rönnegård, L., Felleki, M., Fikse, F., Mulder, H. A. & Strandberg, E. Genetic heterogeneity of residual variance: estimation of variance components using double hierarchical generalized linear models. Genet. Sel. Evol. 42, 8 (2010).
Box, G. E. P. Non-normality and tests on variances. Biometrika 40, 318–335 (1953).
Locke, A. E. et al. Genetic studies of body mass index yield new insights for obesity biology. Nature 518, 197–206 (2015).
Zhang, Z. et al. Mixed linear model approach adapted for genome-wide association studies. Nat. Genet. 42, 355–360 (2010).
Yang, J., Zaitlen, N. A., Goddard, M. E., Visscher, P. M. & Price, A. L. Advantages and pitfalls in the application of mixed-model association methods. Nat. Genet. 46, 100–106 (2014).
Bycroft, C. et al. Genome-wide genetic data on ~500,000 UK Biobank participants. Preprint at https://www.biorxiv.org/content/early/2017/07/20/166298 (2017).
Turcot, V. et al. Protein-altering variants associated with body mass index implicate pathways that control energy intake and expenditure in obesity. Nat. Genet. 50, 26–41 (2018).
Horikoshi, M. et al. New loci associated with birth weight identify genetic links between intrauterine growth and adult height and metabolism. Nat. Genet. 45, 76–82 (2013).
Freathy, R. M. et al. Variants in ADCY5 and near CCNL1 are associated with fetal growth and birth weight. Nat. Genet. 42, 430–435 (2010).
Hivert, M. F. et al. Genetic determinants of adiponectin regulation revealed by pregnancy. Obesity (Silver Spring) 25, 935–944 (2017).
Perry, J. R. B. et al. Parent-of-origin-specific allelic associations among 106 genomic loci for age at menarche. Nature 514, 92–97 (2014).
Zeggini, E. et al. Replication of genome-wide association signals in UK samples reveals risk loci for type 2 diabetes. Science 316, 1336–1341 (2007).
Larsen, T. M., Toubro, S. & Astrup, A. PPARgamma agonists in the treatment of type II diabetes: is increased fatness commensurate with long-term efficacy? Int. J. Obes. Relat. Metab. Disord. 27, 147–161 (2003).
Young, A. I., Wauthier, F. & Donnelly, P. Multiple novel gene-by-environment interactions modify the effect of FTO variants on body mass index. Nat. Commun. 7, 12724 (2016).
Kilpeläinen, T. O. et al. Genome-wide meta-analysis uncovers novel loci influencing circulating leptin levels. Nat. Commun. 7, 10494 (2016).
Bulik-Sullivan, B. K. et al. LD Score regression distinguishes confounding from polygenicity in genome-wide association studies. Nat. Genet. 47, 291–295 (2015).
Marazzi, A. Algorithms, Routines, and S-Functions for Robust Statistics. (Chapman and Hall/CRC, New York, 1993).
Lippert, C. et al. FaST linear mixed models for genome-wide association studies. Nat. Methods 8, 833–835 (2011).
Wolfinger, R., Tobias, R. & Sall, J. Computing Gaussian likelihoods and their derivatives for general linear mixed models. SIAM J. Sci. Comput. 15, 1294–1310 (1994).
Lippert, C. et al. The benefits of selecting phenotype-specific variants for applications of mixed models in genomics. Sci. Rep. 3, 1815 (2013).
Devlin, B. & Roeder, K. Genomic control for association studies. Biometrics 55, 997–1004 (1999).
Acknowledgements
This work was supported by Wellcome Trust grant 095552/Z/11/Z to P.D. and grants 090532/Z/09/Z and 20314/Z/16/Z as core support for the Wellcome Trust Centre for Human Genetics. A.Y. was supported by a Wellcome Trust Doctoral Studentship (099670/Z/12/Z) and by the Li Ka Shing Foundation.
Author information
Contributions
A.Y. developed the method, led its application to the UK Biobank data, and wrote the paper. F.L.W. was involved in the development and application of the method. P.D. supervised the research and wrote the paper. All work undertaken by F.L.W. was done while F.L.W. was at the University of Oxford.
Ethics declarations
Competing interests
P.D. is a founder and director of Genomics plc, and a partner of Peptide Groove LLP. The remaining authors declare no competing interests.
Additional information
Publisher’s note: Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
Integrated supplementary information
Supplementary Figure 1 Association signal of the additive variance (AV) test for simulated phenotypes with different parameters.
The expected −log₁₀ (P value) of the AV test for different additive and log-linear variance effects of the test locus is indicated by shading. Phenotypes were simulated for 100,000 unrelated individuals (Methods). The test locus had frequency 0.05. To make this plot comparable to Fig. 1, we used the same set of additive effects. As in Fig. 1, the strength of the additive effect is parameterized by the amount of variance explained, h², if the allele frequency is 0.5. Here the allele frequency is 0.05, so the actual variance explained is 0.19 times the variance explained when the allele frequency is 0.5. The log-linear variance effect is indicated on the y axis and corresponds approximately to the proportional change in phenotypic variance per allele. We have highlighted two regions of parameter space: the area inside the green lines is where the association signal is stronger under the AV test than under the additive test, and the area inside the yellow lines is where the AV test is genome-wide significant (P < 5 × 10⁻⁸) but the additive test is not.
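The factor of 0.19 quoted in this legend can be checked with a short calculation. The sketch below assumes Hardy–Weinberg equilibrium and a fixed per-allele additive effect, so that the variance explained is proportional to 2f(1 − f) for allele frequency f. The function names are illustrative and do not come from the paper's software.

```python
def genotype_variance(freq):
    """Variance of a 0/1/2 genotype count under Hardy-Weinberg equilibrium."""
    return 2.0 * freq * (1.0 - freq)


def variance_explained_ratio(freq, reference_freq=0.5):
    """Variance explained at `freq` relative to `reference_freq`.

    Assumes the same per-allele additive effect at both frequencies,
    so the effect size cancels out of the ratio.
    """
    return genotype_variance(freq) / genotype_variance(reference_freq)


ratio = variance_explained_ratio(0.05)
print(round(ratio, 2))  # 0.19
```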
Supplementary Figure 2 Comparison of association signal for the additive variance (AV) and additive tests for different sample sizes.
The association signal when testing for both additive and log-linear variance effects (AV test) compared to testing for only an additive effect (additive test) in simulations. The y axis gives the expected log ratio (base 10) of the P value from the additive test to the AV test for different variance effects of the test SNP (x axis), with values above zero indicating a stronger signal from the AV test. The simulations were performed for sample sizes of 10,000 (red), 50,000 (green), and 100,000 (blue), indicated with the different colored curves. The log ratio is plotted as a crossed box if the expected P value from the additive variance test would pass the standard genome-wide significance threshold of 5 × 10⁻⁸, and it is plotted with a triangle if neither of the expected P values from the two tests would pass the significance threshold.
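The quantity on the y axis and the marker rule in this legend can be written out as a small sketch. The function names and the example P values are hypothetical. The legend does not say how a point is drawn when only the additive test passes the threshold, so that case returns `None` here rather than guessing.

```python
import math

GENOME_WIDE_THRESHOLD = 5e-8


def log10_p_ratio(p_additive, p_av):
    """log10(P_additive / P_AV); values above zero favor the AV test."""
    return math.log10(p_additive) - math.log10(p_av)


def marker(p_additive, p_av, threshold=GENOME_WIDE_THRESHOLD):
    """Marker described in the legend for one pair of expected P values."""
    if p_av < threshold:
        return "crossed box"  # AV test passes the threshold
    if p_additive >= threshold:
        return "triangle"  # neither test passes the threshold
    return None  # additive test only: not described in the legend


# Hypothetical expected P values, for illustration only.
print(log10_p_ratio(1e-6, 1e-9), marker(1e-6, 1e-9))
```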
Supplementary Figure 3 Relationship between additive and variance effects from GIANT meta-analyses.
Estimated additive (x axis) and variance (y axis) effects on BMI are plotted for all genome-wide loci, shaded in proportion to the negative log₁₀ (P value) for an additive effect, up to a maximum of negative log₁₀ (5 × 10⁻⁸), the conventional boundary for genome-wide significance. The additive effects are taken from Locke et al. (Nature 518, 197–206, 2015), and the variance effects are taken from Yang et al. (Nature 490, 267–272, 2012). Because of the mean–variance relationship of untransformed BMI, any locus with an additive effect is expected to have a variance effect, even after inverse normal transformation. The red line has slope 0.1071, determined by robust regression of genome-wide variance effects on additive effects, with weights proportional to the inverse square of the standard error of the estimated variance effects.
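The weighting scheme described in this legend can be illustrated with ordinary weighted least squares. This is a simplified stand-in: the legend specifies a robust regression, and the robust step is omitted here. The function name and the noise-free toy data are made up for illustration and are not the GIANT estimates.

```python
def weighted_slope(additive_effects, variance_effects, variance_effect_se):
    """Slope from weighted least squares of variance effects on additive effects.

    Weights are 1 / SE^2 of the variance-effect estimates. An intercept is
    fitted; whether the published line includes one is not stated in the legend.
    """
    weights = [1.0 / se ** 2 for se in variance_effect_se]
    total = sum(weights)
    x_bar = sum(w * x for w, x in zip(weights, additive_effects)) / total
    y_bar = sum(w * y for w, y in zip(weights, variance_effects)) / total
    s_xy = sum(
        w * (x - x_bar) * (y - y_bar)
        for w, x, y in zip(weights, additive_effects, variance_effects)
    )
    s_xx = sum(w * (x - x_bar) ** 2 for w, x in zip(weights, additive_effects))
    return s_xy / s_xx


# Noise-free toy data constructed with a true slope of 0.1.
x = [-0.04, -0.02, 0.0, 0.01, 0.03, 0.05]
y = [0.1 * xi + 0.002 for xi in x]
se = [0.01, 0.02, 0.01, 0.03, 0.02, 0.01]
slope = weighted_slope(x, y, se)
print(slope)
```

Weighting by the inverse squared standard error gives imprecisely estimated variance effects less influence on the fitted line. A robust fit additionally limits the influence of outlying loci.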
Supplementary Figure 4 Relationship between estimated leptin effect and estimated dispersion effect on BMI.
Estimated leptin effects (s.d. change in leptin per allele; x axis) and estimated dispersion effects on BMI (y axis) are plotted for the top 100 approximately independent SNPs ranked by evidence for a leptin effect (Methods). The leptin effects are taken from Kilpeläinen et al. (Nat. Commun. 7, 10494, 2016), and the dispersion effects are taken from our analysis of the UK Biobank. The red line gives the estimated expected dispersion effect for a given leptin effect (Methods).
Supplementary information
Supplementary Text and Figures
Supplementary Figures 1–4, Supplementary Tables 1 and 5, and Supplementary Note
Supplementary Table 2
Genome-wide summary statistics for BMI
Supplementary Table 3
Summary statistics for genome-wide significant SNPs
Supplementary Table 4
Summary statistics from the gene-by-environment interaction analysis
Cite this article
Young, A.I., Wauthier, F.L. & Donnelly, P. Identifying loci affecting trait variability and detecting interactions in genome-wide association studies. Nat Genet 50, 1608–1614 (2018). https://doi.org/10.1038/s41588-018-0225-6