The National Longitudinal Study of Adolescent to Adult Health (Add Health) Sibling Pairs Genome-Wide Data

McQueen, Matthew B.; Boardman, Jason D.; Domingue, Benjamin W.; Smolen, Andrew; Tabor, Joyce; Killeya-Jones, Ley; Halpern, Carolyn T.; Whitsel, Eric A.; Harris, Kathleen Mullan

doi:10.1007/s10519-014-9692-4

Original Research
Published: 07 November 2014

Volume 45, pages 12–23, (2015)

Behavior Genetics

Abstract

Here we provide a detailed description of the genome-wide information available on the National Longitudinal Study of Adolescent to Adult Health (Add Health) sibling pair subsample (Harris et al. in Twin Res Hum Genet 16:391–398, 2013). A total of 2,020 samples were genotyped (including duplicates) arising from 1,946 Add Health individuals from the sibling pairs subsample. After various steps for quality control (QC) and quality assurance (QA), we have high quality genome-wide data available on 1,888 individuals. In this report, we first highlight the QC and QA steps that were taken to prune the data of poorly performing samples and genetic markers. We further estimate the pairwise biological relationships using genome-wide data and compare those estimates to the assumed relationships in Add Health. Additionally, using genome-wide data from known regional reference populations from Europe, West Africa, North and South America, Japan and China, we estimate the relative genetic ancestry of the respondents. Finally, rather than conducting a traditional cross-sectional genome-wide association study (GWAS) of body mass index (BMI), we opted to utilize the extensive publicly available genome-wide information to conduct a weighted GWAS of longitudinal BMI while accounting for both family and ethnic variation.

Introduction

The National Longitudinal Study of Adolescent to Adult Health (Add Health) is a nationally representative longitudinal study including over 20,000 adolescents originally sampled in Grades 7–12 in the United States between 1994 and 1995. Add Health respondents have been followed through adolescence and into early adulthood with four in-home interviews (1995, 1996, 2001–2002 and 2008–2009). The Add Health design included the oversampling of approximately 3,000 pairs of individuals who were raised in the same household. These pairs of individuals are biologically related to varying degrees, including monozygotic (MZ) and dizygotic (DZ) twins, full siblings, half siblings and unrelated individuals. For further details on the study design and sampling scheme for the Add Health Sibling Pairs Sample, including phenotypic, environmental and biological assessments, see Harris et al. (2013). During the fourth in-home visit (Wave IV; 2008–2009), Add Health collected saliva on the entire sample of Add Health respondents (N = 15,701), including the sibling pairs subsample. The consent rate (consent to provide saliva for DNA extraction) among the sibling pairs subsample for Wave IV saliva collection was an impressive 96 %, which was similar to the consent rate for the entire Add Health sample. See Harris et al. (2013) for additional information on the Add Health Study design and genetic data.

Genome-wide association studies have largely been conducted using case–control and/or cross-sectional study designs primarily due to efficiency and ease of collection. The integration of genome-wide data into well-characterized longitudinal and prospective cohort studies that include biological relationships such as the Add Health sibling pair subsample has been much more limited. Notable exceptions include the Framingham Heart Study [NHLBI SNP-Health Association Resource (SHARe)] that follows multi-generational samples prospectively, and the Health and Retirement Study (HRS) that surveys a representative sample of individuals over the age of 50 every 2 years and follows them prospectively. Utilizing genetic data from longitudinal and prospective cohort studies has many potential advantages including refinement of phenotypic endpoints, phenotypic change and trajectory. Within the context of the ethnically diverse Add Health sibling pairs subsample of adolescents and young adults, there are additional advantages including family-based assessment and utilizing measured environmental and social factors collected over time.

Here, we provide a description of the genome-wide data that were generated on the Add Health sibling pairs subsample. In particular, we focus on describing the targeted sample for genotyping, the quality control (QC) and quality assurance (QA) steps that were taken and how putative biological relationships were assessed. Using genome-wide data from known reference populations, we also show the genetic ancestry of the Add Health sibling pairs subsample. We also explore the genetic heritability of body mass index (BMI) using the genome-wide data from the Add Health sibling pairs subsample. Finally, rather than conducting a traditional cross-sectional genome-wide association study (GWAS) of BMI, we opt to utilize the rich genome-wide information publicly available to conduct a weighted GWAS of longitudinal BMI while accounting for both family and ethnic variation. Funding for the genotyping of the sibling pairs subsample was provided by the National Institute of Child Health and Human Development (R01 HD060726).

Materials and methods

Quality control and quality assurance

The QA/QC Report for the Add Health Sibling Pairs Sample is provided in the Supplemental Materials. Briefly, there we describe how the sample was selected, prepared and genotyped, the number of markers removed, the number of samples removed, sex checks and duplicate concordance. The number of individual samples deemed of high quality for subsequent relationship testing, ancestry estimation and genome-wide analysis is N = 1,888. The number of SNP markers (chromosomes 1–22 and X) with a genotyping call rate of at least 95 % is N = 940,862.

Computer software

For biological relationship testing, PLINK (Purcell et al. 2007) and Kinship-based Inference for GWAS (KING; Manichaikul et al. 2010) were used. For genetic ancestry estimation, we used KING (Manichaikul et al. 2010) and ADMIXTURE (Alexander et al. 2009). R (R Core Team 2013) was used for graphical display of ancestry information. For the estimation of heritability using genome-wide data, genome-wide complex trait analysis (GCTA; Yang et al. 2011) was used. For the GWAS we used SAS 9.3 (SAS Institute, Cary, NC, USA) and R. Once again, R was used for graphical display of the genome-wide association results.

Estimation of genetic relatedness

Using information from chromosomes 1–22 (919,509 SNP markers) on the clean set of 1,888 individual samples, we estimated identity by state (IBS) and identity by descent (IBD) using PLINK (Purcell et al. 2007) as well as the Kinship Coefficient using KING (Manichaikul et al. 2010). These measures are used to test duplicate concordance, confirm expected biological relationships, identify unknown or cryptic relatedness in the sample and provide the information necessary to assess genetic ancestry. The relationship measures are calculated pairwise for all individuals in the dataset. As generally recommended, we pruned autosomal SNPs to establish an approximately independent set of SNP markers to be used for IBS, IBD and Kinship Coefficient estimation. We used a linkage disequilibrium threshold (r²) of 0.20 with a window size of 50 SNPs and a shift of 5 SNPs at each step (PLINK command: --indep-pairwise 50 5 0.20). After pruning, a set of 231,649 autosomal SNP markers in approximate linkage equilibrium was used to estimate the relationship measures. Pairwise mean IBD was estimated using PLINK (“PI_HAT”). However, PLINK’s estimates of IBD may be biased in stratified (multi-ethnic) samples (Manichaikul et al. 2010; Thornton et al. 2012). Therefore, we relied upon the KING package to provide estimates of relationship (Kinship) that are robust to stratification.

Estimation of genetic ancestry

We explored genetic ancestry in two different ways. Note that the sample of N = 1,888 individuals with clean genotypes includes two MZ twin pairs. For the purposes of estimating genetic ancestry, we removed one individual (randomly) from each of the two MZ twin pairs resulting in a final analysis sample of N = 1,886. For our first approach to estimating genetic ancestry, we used KING (Manichaikul et al. 2010) to identify clusters of individuals based upon genetic similarity. KING uses multidimensional scaling (MDS) with Euclidean distance to generate principal coordinates (PCs) that can be used to identify population substructure. For the KING procedure, we used the same set of 231,649 autosomal SNP markers in approximate linkage equilibrium that was used for the estimation of genetic relatedness.

Second, we explored genetic ancestry using the software package, ADMIXTURE (Alexander et al. 2009). ADMIXTURE uses an efficient likelihood model-based estimation of genetic ancestry using genome-wide data. For the ADMIXTURE procedure, we opted for a supervised analysis utilizing a series of known genetic ancestry populations as fixed groups to estimate the proportion of ancestry that individuals from the Add Health sibling pairs subsample share with each ancestral reference population. The ancestral populations used were derived from the Human Genome Diversity Project (HGDP; Li et al. 2008) and International Haplotype Map Project (HapMap; International HapMap 3 Consortium, 2010). Specifically, we utilized 108 samples from the HGDP to represent the Americas (Surui, Maya, Karitiana, Pima and Colombian), and 402 samples from HapMap to represent Europe (CEU), Africa (YRI), China (CHB) and Japan (JPT). In all, we identified 257,035 SNP markers that overlap across the Add Health sibling pairs subsample, the HGDP sample and the HapMap sample. For efficiency using the program ADMIXTURE, we created an autosomal SNP marker set that was in approximate linkage equilibrium (123,198 SNPs) to estimate ancestry.

GCTA heritability of BMI

We used the GCTA software (Yang et al. 2011) to estimate heritability of BMI as measured in Add Health as part of the Wave II, Wave III and Wave IV data collection. BMI was calculated using the standard formula of mass (kg) divided by height (m) squared (kg/m²) for each respondent. GCTA works by first estimating the genetic relatedness between all possible pairs of individuals. The genetic relatedness measures are known to be sensitive to population stratification, so for this application, we restricted the analysis to white respondents only. The subsequent step in the GCTA process is the estimation of a random effects model, where the random effects have a covariance structure based on the estimated genetic relatedness values. The percentage of total variance associated with the genetic random effects is considered the estimated heritability. For this particular study, we removed all pair-wise relationship measures above 0.025.
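The variance decomposition described above can be written compactly. The following is a sketch in the notation commonly used for this type of model; the symbols (y for the phenotype vector, X and β for fixed effects, g for the genetic random effects, A for the matrix of estimated genetic relatedness, σ_g² and σ_e² for the genetic and residual variances) are assumptions introduced here for illustration and do not appear in the text.

```latex
\begin{aligned}
\mathbf{y} &= \mathbf{X}\boldsymbol{\beta} + \mathbf{g} + \boldsymbol{\varepsilon}, \\
\mathbf{g} &\sim N(\mathbf{0}, \mathbf{A}\sigma_g^2), \qquad
\boldsymbol{\varepsilon} \sim N(\mathbf{0}, \mathbf{I}\sigma_e^2), \\
\operatorname{Var}(\mathbf{y}) &= \mathbf{A}\sigma_g^2 + \mathbf{I}\sigma_e^2, \\
h^2 &= \frac{\sigma_g^2}{\sigma_g^2 + \sigma_e^2}.
\end{aligned}
```

Under this formulation, the reported heritability corresponds to h², the share of total phenotypic variance attributed to the genetic random effects.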

Genome-wide association approach

To conduct SNP-by-SNP genome-wide association analysis of BMI, we started initially with 919,509 autosomal markers with a genotyping call rate greater than 95 %. Further steps involved removing SNP markers that show evidence of deviation from Hardy–Weinberg equilibrium (HWE) in 492 unrelated, self-identified white individuals extracted from the entire sample. These 492 individuals were selected via a two-step process. First, we focused on the homogenous self-identified white sample followed by the random selection of one individual from each biological relationship pair. In all, 6,237 autosomal SNPs were flagged for potential deviation from HWE (p < 0.001) and removed from the genome-wide association analysis. On the basis of minor allele frequency (MAF), we further removed SNP markers with an MAF < 0.01 (32,313). Therefore, the final genome-wide association marker set includes 880,959 autosomal SNPs. As noted previously, the sample of N = 1,888 individuals with clean genotypes includes two MZ twin pairs. For the purposes of the genome-wide association analysis, we removed one individual (randomly) from each of the two MZ twin pairs, resulting in a final analysis sample of N = 1,886.
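The marker filtering above can be illustrated with a minimal Python sketch. It is not the pipeline used in the study: the function names are hypothetical, the HWE test shown is a simple 1-df chi-square goodness-of-fit test (the text does not state which HWE test was applied), and genotype counts are passed in as plain tuples. Only the thresholds (p < 0.001 for HWE, MAF < 0.01) and the marker counts are taken from the text.

```python
from math import erfc, sqrt


def maf(n_aa, n_ab, n_bb):
    """Minor allele frequency from genotype counts (assumes at least one genotype)."""
    n = n_aa + n_ab + n_bb
    p = (2 * n_aa + n_ab) / (2 * n)
    return min(p, 1 - p)


def hwe_chisq_p(n_aa, n_ab, n_bb):
    """P-value of a 1-df chi-square test for Hardy-Weinberg equilibrium.

    Assumption: a chi-square test stands in for whatever test the study used.
    """
    n = n_aa + n_ab + n_bb
    p = (2 * n_aa + n_ab) / (2 * n)
    q = 1 - p
    expected = (n * p * p, 2 * n * p * q, n * q * q)
    observed = (n_aa, n_ab, n_bb)
    if min(expected) == 0:
        return 1.0  # monomorphic marker: no deviation can be tested
    chi2 = sum((o - e) ** 2 / e for o, e in zip(observed, expected))
    # Survival function of the chi-square distribution with 1 degree of freedom.
    return erfc(sqrt(chi2 / 2))


def keep_marker(hwe_counts, maf_counts, hwe_alpha=0.001, maf_min=0.01):
    """Keep a marker unless it is flagged for HWE deviation or low MAF.

    hwe_counts: genotype counts in the unrelated reference subset.
    maf_counts: genotype counts in the sample used for the MAF filter.
    """
    return hwe_chisq_p(*hwe_counts) >= hwe_alpha and maf(*maf_counts) >= maf_min


# Bookkeeping of the marker counts reported in the text.
final_markers = 919_509 - 6_237 - 32_313
```

With the counts from the text, `final_markers` evaluates to 880,959, matching the size of the final association marker set.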

To optimize statistical power, rather than conduct a traditional family-based association analysis on the related sets of individuals, we opted for a more flexible linear mixed effects model (Bates et al. 2014). This approach allowed us to model longitudinal measures of BMI (Waves II, III and IV) from all 1,886 individuals while accounting for biological relationships (if present) and within-individual variation in BMI as well as controlling for age, sex and MDS-derived components of ancestry. BMI measures from women who were pregnant were excluded from this analysis. Note that only five respondents did not have BMI measures across all three waves.
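One plausible way to write such a model is sketched below. The text does not specify the random-effects structure, so the nested random intercepts for family and individual, and all symbols, are assumptions made for illustration: f indexes families, i individuals within a family, t waves, G is the allele count at the tested SNP, and MDS_k is the k-th MDS-derived ancestry component.

```latex
\begin{aligned}
\mathrm{BMI}_{fit} &= \beta_0 + \beta_G\, G_{fi} + \beta_{\mathrm{age}}\, \mathrm{age}_{fit}
  + \beta_{\mathrm{sex}}\, \mathrm{sex}_{fi} + \sum_{k} \gamma_k\, \mathrm{MDS}_{k,fi}
  + u_f + v_{fi} + \varepsilon_{fit}, \\
u_f &\sim N(0, \sigma_u^2), \qquad v_{fi} \sim N(0, \sigma_v^2), \qquad
\varepsilon_{fit} \sim N(0, \sigma_\varepsilon^2).
\end{aligned}
```

In this sketch, u_f captures similarity among biological relatives, v_{fi} captures the correlation among repeated BMI measures on the same person, and β_G is the per-allele additive effect tested at each SNP.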

Weighted association

A major issue plaguing genome-wide studies is multiple testing that arises from testing hundreds of thousands (if not millions) of SNP markers for association with the disease or trait of interest. In response to this issue, many investigators have advocated the use of a Bonferroni correction to limit the probability of committing type-I errors. However, this comes at a cost of simultaneously increasing the probability of committing type-II errors, thereby diminishing the opportunity of detecting true association signals. This is particularly true of smaller genome-wide association datasets such as the sibling pairs sample. One solution is to incorporate prior information into the association scan. In this study, we use a weighted association approach as implemented by Roeder et al. (2006) to accomplish this. While there are a variety of ways to construct weights, only two criteria must be met: each weight must be greater than 0, and the mean of the weights must be 1. There are numerous sources of prior information that can motivate the weighting scheme including linkage scans, bioinformatics information, as well as previously conducted (and independent) genome-wide association signals (Roeder et al. 2007; Roeder and Wasserman 2009). Further, the prior information can be in the form of test statistics (e.g. LOD scores, Z scores) or p-values (Roeder et al. 2006).

The weights for this study were derived from the GWAS on BMI as conducted by the Genetic Investigation of Anthropometric Traits (GIANT) consortium (Speliotes et al. 2010). Details on the sample and the analysis procedures can be found elsewhere (Speliotes et al. 2010). Briefly, the GIANT consortium conducted a GWAS on BMI using 249,796 individuals and made the association signals for each of the ~2.8 M SNP markers available to the public. In particular, the p-values from the GWAS served as the prior information used to devise the weighting scheme for the genome-wide association scan from this study. In the original introduction to this approach, Roeder et al. (2006) introduced exponential and cumulative weighting procedures. We opted for a cumulative weighting scheme that can be less sensitive to large prior association signals and we also used a scaling factor (B) of 2 (Roeder et al. 2006).

We focused on markers that either overlapped between the GIANT consortium and this study or GIANT consortium markers that were in reasonable linkage disequilibrium (r² > 0.80) with SNP markers from this study. In all, we identified 717,411 markers to be tested for association that also have corresponding weights from the GIANT consortium. As a result, p-values from this study may be up-weighted or down-weighted depending upon the association signal from the GIANT GWAS. More specifically, the unweighted (nominal) p-values from this study are divided by the weights as assigned through the GIANT GWAS to generate the weighted p-values. The weighting procedure was conducted in R using the “weighted_FDR.R” script that can be found at http://www.wpic.pitt.edu/wpiccompgen/fdr/.
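The final step, dividing nominal p-values by mean-one weights, can be sketched in a few lines of Python. This is an illustration only and not the “weighted_FDR.R” script: the function names are hypothetical, the raw prior scores are arbitrary positive numbers rather than the cumulative weights of Roeder et al. (2006), and capping weighted p-values at 1 is an assumption made here so that the output remains a valid p-value.

```python
def normalise_weights(raw_scores):
    """Rescale positive prior scores so that the weights have mean 1."""
    if not raw_scores or any(s <= 0 for s in raw_scores):
        raise ValueError("every raw score must be greater than 0")
    mean = sum(raw_scores) / len(raw_scores)
    return [s / mean for s in raw_scores]


def weighted_p_values(p_values, weights):
    """Divide each nominal p-value by its weight (capped at 1 by assumption)."""
    if len(p_values) != len(weights):
        raise ValueError("p_values and weights must have the same length")
    return [min(1.0, p / w) for p, w in zip(p_values, weights)]


# Toy example: four markers with hypothetical prior scores.
weights = normalise_weights([4.0, 1.0, 1.0, 2.0])          # mean-one weights
adjusted = weighted_p_values([1e-4, 1e-4, 0.8, 0.03], weights)
```

A weight above 1 shrinks the p-value (up-weighting the marker), while a weight below 1 inflates it; because the weights average 1, markers can only be favoured at the expense of others.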

Results

Biological relationships

Table 1 uses the Kinship Coefficient generated from KING to tabulate the observed pairwise relationship status based upon genome-wide data versus the expected pairwise relationships based upon information from Add Health (using the Add Health variable, “sibcl4”). The sibcl4 variable is one of the classification variables available for the Add Health sibling pairs subsample. This particular classification designates pairs of respondents into monozygotic twin pair (MZ), dizygotic twin pair (DZ), full sibling pair (FS), half-sibling pair (HS), cousin pair (CO), unrelated pair (UN) and undetermined relationship (UD). A total of 1,781,328 (1,888 choose 2) pairwise relationship comparisons were conducted. As recommended by the authors of KING (Manichaikul et al. 2010), a Kinship Coefficient greater than 0.354 is categorized as an MZ twin pair (duplicates have been removed), between 0.177 and 0.354 as 1st degree relationship (DZ twin pairs and full sibling pairs; no parent-offspring pairs are present), between 0.0884 and 0.177 as 2nd degree relationship (half-sibling pair and avuncular), between 0.0442 and 0.0884 as 3rd degree relationship (half-avuncular and first cousin) and less than 0.0442 as not related (NR). Note that the expected values for the Kinship Coefficient are 0.50, 0.25, 0.125, 0.0625 and 0.0 for MZ, DZ/FS, HS, CO and UN respectively, and the boundaries suggested by the authors of KING account for the variability in the estimated Kinship based upon genome-wide data. As can be seen in Table 1, the vast majority of expected relationships are consistent with the observed genetic relationships. However, there are notable discrepancies. For example, there are 33 expected full sibling pairs that are consistent with being half-sibling pairs according to the genetic data. Further, there are two pairs of MZ twins: one pair was thought to be a DZ twin pair, while the relationship of the other was unknown prior to this study. Additionally, a pair of individuals who were thought to be an MZ twin pair is likely a DZ twin pair. As can also be seen in the table, we have detected 266 pairs of individuals thought to be unrelated who are at least distantly related. The majority of these pairs (264) are 3rd degree relationships (i.e., cousins). In total, there are 664 full sibling/DZ twin pairs (1,328 individuals) that would be utilized for studies employing a traditional sibling pair family-based design. Based upon these relatedness measures, we have created a new variable available in the Add Health data sources called “sibclg” that specifies the biological relationship based upon the genome-wide information as presented here. This variable will be made available to Add Health researchers through the Restricted-Use Data mechanism of Add Health and will be included with other variables related to the sibling pair data.
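The classification rule above can be expressed as a short Python sketch. The cut-offs and the number of individuals are taken from the text; the function name, the label strings and the treatment of values falling exactly on a boundary are assumptions made for illustration.

```python
from math import comb

# Lower bounds on the Kinship Coefficient for each category, as quoted in the text
# (Manichaikul et al. 2010). Checked from most to least related.
KING_BOUNDS = [
    (0.354, "MZ"),
    (0.177, "1st degree"),
    (0.0884, "2nd degree"),
    (0.0442, "3rd degree"),
]


def classify_kinship(phi):
    """Map an estimated Kinship Coefficient to a relationship category.

    Assumption: a value exactly equal to a boundary falls into the lower category.
    """
    for lower, label in KING_BOUNDS:
        if phi > lower:
            return label
    return "NR"  # not related


# Number of distinct pairs among the 1,888 genotyped individuals.
n_pairs = comb(1888, 2)
```

Applied to the expected coefficients listed in the text (0.50, 0.25, 0.125, 0.0625 and 0.0), the function returns MZ, 1st degree, 2nd degree, 3rd degree and NR respectively, and `n_pairs` reproduces the 1,781,328 comparisons.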

Table 1 Observed versus expected relationship status

Self-reported ethnicity and multidimensional scaling (MDS)

We used the Add Health “ah_race” variable for self-reported ethnicity. This variable includes five categories: White, Black, Native American, Asian and Hispanic. Add Health respondents who self-reported as Hispanic were included in the Hispanic category irrespective of whether they also self-reported as White, Black, etc. Of the 1,886 individuals included in this analysis, 917 self-identify as White, 677 as Black, 209 as Hispanic, 73 as Asian and 8 as Native American; 2 individuals have unknown self-identified ethnicity (see Table 2). Supplemental figure 3 shows the first 5 principal coordinate (PC) MDS estimates from KING, color-coded by self-identified ancestry. As can be seen in supplemental figure 3A, the first principal coordinate (PC1) distinguishes European ancestry from African ancestry while the second principal coordinate (PC2) captures variation between European, Asian and, to an extent, Hispanic ancestry. In supplemental figure 3B we see variation from Hispanic samples including a small set of self-identified Native American individuals. Supplemental figures 3C and 3D further distinguish between and within self-identified ethnic groups.

Table 2 Self-identified race and proportion of genetic ancestry

Self-reported ethnicity and genetic ancestry

Table 2 and supplemental figure 4 display, for each individual from the Add Health sibling pairs sample, the proportion of ancestry shared with each of the reference populations. As can be seen in Table 2 and supplemental figure 4A, the vast majority of individuals who self-identify as White have predominately European ancestry (CEU). Supplemental figure 4B illustrates the admixed ancestry typical of African Americans, self-identifying as Black. Self-identified Black individuals have a varying degree of African (YRI), European (CEU) and American (AMR) ancestry as can be seen in Table 2. Likewise, self-identified Hispanic and Native American individuals (supplemental figures 4C and 4D) display an admixed ancestry largely comprised of American (AMR), European (CEU), African (YRI) and to a lesser extent, Chinese (CHB) and Japanese (JPT) ancestry. Finally, supplemental figure 4E includes self-identified Asians. As can also be seen in Table 2, the ancestry of this subgroup is primarily of Chinese origin (CHB) but with measurable admixture of Japanese (JPT) and European (CEU).

GCTA heritability and weighted genome-wide association analysis of BMI

The GCTA heritability of Wave II, Wave III and Wave IV BMI based upon a sample of white respondents was estimated to be 0.82 (SE = 0.081), 0.71 (SE = 0.091), and 0.67 (SE = 0.084) respectively. Using a bivariate approach, the GCTA genetic correlation between Waves II/III, Waves II/IV and Waves III/IV were estimated to be 0.95 (SE = 0.031), 0.85 (SE = 0.042) and 0.99 (SE = 0.033) respectively. The degree of relationship among the 1,886 individuals used in the genome-wide association analysis varies. In all, there are 614 individuals who are not 1st degree relatives (siblings) of others in the sample (N = 614), 609 sibling pairs (N = 1,218 individuals) and 18 sibling trios (N = 54). Tables 2 and 3 provide the characteristics of the sample used for the genome-wide analysis. As noted previously and seen in Table 2, approximately half of the sample self-identifies as White, 36 % Black, 11 % Hispanic, 4 % Asian and less than 1 % as Native American. Table 3 shows the frequency of males (48 %) and females (52 %) as well as the mean age and BMI of the sample at each wave of collection. Consistent with other studies, the mean (and standard deviation) of BMI increases throughout young adulthood.

Table 3 Characteristics of the genome-wide association sample

To assess for the presence of systematic biases in genome-wide analyses, we generated a quantile–quantile (Q–Q) plot of the unweighted p-values (supplemental figure 5). As can be seen in the Q–Q plot, there is no evidence of widespread bias generating the associations. To illustrate the distribution of the weights as derived from the GIANT consortium, we provide a simple histogram (Fig. 1). As can be seen in Fig. 1, the vast majority of the p-values genome-wide are effectively down-weighted (weight < 1; gray bar, N = 522,093) while a fair number of p-values are substantially up-weighted (weight > 1; colored bars, N = 195,318). These weights were applied to each of the nominal p-values generated via the linear mixed effect model of longitudinal BMI. The resulting −log₁₀ weighted p-values are displayed using a traditional Manhattan Plot (Fig. 2). The red horizontal line on Fig. 2 represents a genome-wide significant threshold (p = 5 × 10⁻⁸) while the blue horizontal line represents a threshold of p = 5 × 10⁻⁵. Table 4 displays the 39 SNP associations achieving p < 5.0 × 10⁻⁵ ordered by genomic location. We chose a threshold of p < 5.0 × 10⁻⁵ as a reasonable, albeit arbitrary, threshold for association signals that warrant potential follow-up as other studies have done (e.g. see Carty et al. 2012). In the table, we report the SNP marker name, chromosome, base pair location, and the nearest gene and where that SNP is located relative to that gene. We also report the allele conferring risk (increasing BMI units), the frequency of that allele and the other allele present in the data (reported as forward strand). Finally, we report the results of the analysis including the linear mixed model coefficient (corresponding 95 % confidence interval), the unweighted p-value, the weight applied to that SNP marker and the corresponding weighted p-value. In this analysis, the linear mixed model coefficient may be interpreted as the additive effect of the risk allele on BMI in units of kg/m². For example, the C allele of SNP rs1421085 (FTO gene) is associated with a 0.27 kg/m² increase in BMI. Overall, the list of 39 SNPs includes some marker pairs that are likely in high linkage disequilibrium (LD) from the same genic region. Additionally, the influence of the weighting scheme can clearly be seen in the table. Using weights in this way allows strongly significant markers to rise to the top even when down-weighted. There are six SNP markers that were down-weighted, yet still achieved a genome-wide suggestive level. However, the remaining 30 SNP markers were up-weighted. Therefore, this is largely a list of SNPs that have been pushed towards the top of the association signals as they are SNPs with prior information indicating evidence of association with BMI (GIANT consortium) and achieved at least nominal significance in the Add Health sample. A notable signal includes the highly replicable FTO gene region (chromosome 16), providing evidence that the Add Health sibling pairs sample is an informative genetic dataset for future use.

Table 4 Top weighted genome-wide association signals (p < 5 × 10⁻⁵) ordered by genomic location

Discussion

The primary focus of this study was to introduce the Add Health sibling pairs subsample genome-wide association data and conduct initial analyses to demonstrate the scientific potential of the data as a resource to the Add Health community of researchers. Given the unconventional (among traditional genome-wide studies) Add Health pairs subsample, we adopted a relatively unconventional approach to carry out the genome-wide analysis. First, it is estimated that 96 % of all genome-wide studies have been conducted on people of European descent (Bustamante et al. 2011). The reasons and explanations for focusing so exclusively on samples of European descent range from convenience and efficiency (using existing cohort studies that focus on subjects of European descent) to minimizing sources of genetic heterogeneity (Pulit et al. 2010; Bustamante et al. 2011). However, recently, there has been a series of studies that have empirically demonstrated a critical role of multiethnic studies in genome research of complex disease (Pulit et al. 2010; Masunuru et al. 2012; Carlson et al. 2013; Gong et al. 2013; Manichaikul et al. 2012; Manku et al. 2013; Marigorta et al. 2013; Sabater-Lleal et al. 2013). Often, these multiethnic studies will conduct genome-wide analyses within a relatively homogenous European descent sample and simultaneously conduct a genome-wide analysis among a more genetically diverse sample such as African-Americans before combining the association signals using meta-analysis. An alternative approach, and one that was chosen for the present study, conducts the genome-wide analysis on the entire sample across multiple ethnic backgrounds. This approach has been successfully conducted in other studies of complex disease (for examples, see Kurreeman et al. 2012 and Xu et al. 2013). 
However, rather than use the Add Health sample as a discovery sample (often requiring very large sample sizes) we adopted a weighting scheme based upon the GIANT consortium (Speliotes et al. 2010) that is comprised of a series of European descent samples. Therefore, the approach taken for the present study is one that explores the extent to which the variants discovered in European descent GIANT consortium may also be of relevance to the multiethnic Add Health pairs subsample.

We note here that there are a multitude of valid and reasonable approaches that investigators may take when conducting a genetic study of a multiethnic, family-based sample with longitudinal measures of phenotype, behaviors and the environment. For example, Add Health researchers may be interested in imputing genotypes for purposes of combining association signals across multiple data sources genotyped on different platforms. Add Health researchers may also be interested in incorporating Add Health sampling weights and/or taking into account phenotypic clustering particularly when studying genetic risk factors within an environmental context. Furthermore, the family-based structure of the data would allow for more specific analyses using informative sibling pair family units through a variety of family-based association approaches (e.g. FBAT, Laird et al. 2000).

Through the weighted genome-wide association analysis, we observed association signals that align with previous and in some cases, established genetic variants associated with BMI. For example, we were able to identify the FTO region that has been previously identified and replicated (Frayling et al. 2007). Additionally, we were able to identify variants that are upweighted through the GIANT consortium results, but do not achieve genome-wide significance in either GIANT or the Add Health sibling pairs subsample. These variants may be of particular interest for researchers who wish to explore GxE interactions in the Add Health sample to further explain the variability of the effect of these variants on BMI over time (age and development), behaviors and under particular environmental contexts.

The Add Health study is unique because of the explicit emphasis on properly characterizing the multilevel and multidimensional aspects of adolescents’ lives as they transition to adulthood. This design in conjunction with the related and unrelated pairs data (see Harris et al. 2013) has expanded the scope of the gene-environment interaction perspective to a multilevel perspective in which environmental influences are measured at the level of the state (Boardman 2009), neighborhood (Cleveland 2003), and schools (Boardman et al. 2012). Most importantly, the research design enables the measurement of factors such as social norms (Boardman et al. 2008) that are otherwise difficult to assess. The assessment of these contextual factors has been highlighted as a critical area for future research in gene-environment interplay (Spittel et al. 2013) and the utilization of genome-wide data in conjunction with this social environmental backdrop may provide important insights in the etiology of complex morbidities such as obesity (Boone-Heinonen and Gordon-Larsen 2012).

As described elsewhere (Boardman et al. 2013a, b), the existing gene–environment interaction typology includes models in which genetic risk may be the most evident in the least risky, the most risky, or the typical environments. Depending on the anticipated G×E relationship and the specific phenotype, environments may either trigger or control genetic expression in a causal manner, or they may simply mask otherwise small genetic associations. Without a representation of the full range of environments, one may conclude that a specific polymorphism is either protective, risky, or not associated with a particular phenotype. Belsky and Pluess (2009) make a very strong case for the differential susceptibility hypothesis, which argues that environmentally sensitive loci will be protective in the most enriching environments but deleterious in unhealthy environments. This cross-over association cannot be identified without a representative sample from the full continuum of environments, which is, again, why the representativeness of the Add Health study is such an important resource in conjunction with the pairs data.
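The cross-over pattern can be illustrated with a simple linear interaction model, y = b0 + b_G·G + b_E·E + b_GxE·G·E, in which the per-allele effect at environment E is b_G + b_GxE·E. The sketch below is a hypothetical illustration rather than an analysis from this report: the coefficients, the 0 to 1 environment scale (0 = least enriching, 1 = most enriching) and the function names are all assumptions, and y is taken to be an adverse outcome.

```python
def genetic_effect(beta_g, beta_gxe, env):
    """Per-allele effect at environment value `env` under the linear
    interaction model y = b0 + beta_g*G + beta_e*E + beta_gxe*G*E."""
    return beta_g + beta_gxe * env


def crossover_point(beta_g, beta_gxe):
    """Environment value at which the per-allele effect changes sign,
    or None when there is no interaction and hence no cross-over."""
    if beta_gxe == 0:
        return None
    return -beta_g / beta_gxe


# Hypothetical coefficients chosen only to produce a cross-over.
beta_g, beta_gxe = 0.5, -1.0
environments = [0.0, 0.25, 0.5, 0.75, 1.0]
effects = [genetic_effect(beta_g, beta_gxe, e) for e in environments]

# A sample restricted to the least enriching environments sees only risk.
restricted = [eff for e, eff in zip(environments, effects) if e <= 0.25]
```

Under these assumed values the allele raises the adverse outcome below E = 0.5 and lowers it above, so a sample covering only one end of the environmental range would classify the same locus as purely risky or purely protective.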

Finally, it is important to note that GWAS is but one use of genome-wide data. For instance, the genome-wide relationship models discussed above (Yang et al. 2010) can be extended to incorporate these multilevel design features so that the contextual variation in the heritability of health behaviors can be examined using measured genetic similarity rather than assumed similarity from sibling-based models. Similarly, sibling fixed effects approaches can take advantage of the “random assignment” of risk alleles to examine siblings residing and socializing in similar environments compared to those in very different social contexts (Fletcher and Lehrer 2011). These methods provide unique and new possibilities to identify causal models and have thus far not been extended to the genome-wide level.
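The within-family logic behind a sibling fixed effects estimate can be sketched as follows. This is a minimal, assumption-laden illustration for a single variant and is not the procedure of Fletcher and Lehrer (2011): genotype and phenotype are centered on their family means, so anything shared by siblings (family environment, ancestry) drops out and only within-family differences in allele count inform the slope. The function name and the toy data are hypothetical.

```python
from collections import defaultdict


def sibling_fixed_effects_slope(family_ids, genotypes, phenotypes):
    """Within-family estimate of the per-allele effect for one variant.

    Each sibling's genotype and phenotype are centered on the family mean
    before computing the least-squares slope, which removes any family-level
    intercept. Families with identical genotypes contribute nothing.
    """
    families = defaultdict(list)
    for fam, g, y in zip(family_ids, genotypes, phenotypes):
        families[fam].append((g, y))

    numerator = 0.0
    denominator = 0.0
    for members in families.values():
        g_mean = sum(g for g, _ in members) / len(members)
        y_mean = sum(y for _, y in members) / len(members)
        for g, y in members:
            numerator += (g - g_mean) * (y - y_mean)
            denominator += (g - g_mean) ** 2
    if denominator == 0:
        raise ValueError("no within-family genotype variation")
    return numerator / denominator


# Toy data: family intercepts of 20, 30 and 25 and a true per-allele
# effect of 1.5, with no noise. Family C is genotype-concordant.
fam = ["A", "A", "B", "B", "C", "C"]
geno = [0, 1, 1, 2, 2, 2]
pheno = [20.0, 21.5, 31.5, 33.0, 28.0, 28.0]
slope = sibling_fixed_effects_slope(fam, geno, pheno)
```

Because the toy phenotypes are generated without noise, the within-family slope recovers the assumed per-allele effect even though the family intercepts differ widely.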

Add Health sibling pairs subsample data access

The genome-wide data and phenotype measures used in this study will be made available to the scientific community through the NIH database of Genotypes and Phenotypes (dbGaP) by January 2015. Researchers interested in using the Add Health sibling pairs subsample genome-wide data will be required to access genotype data through the dbGaP authorized access system. Once genotype data are available through the dbGaP and access has been granted, researchers who request other phenotypic data not in dbGaP will be able to apply for a Genome-wide Data Restricted Access Agreement through Add Health beginning in 2015 (http://www.cpc.unc.edu/projects/addhealth). This process will allow approved investigators access to the entire Add Health sibling pairs subsample longitudinal data in addition to the genome-wide data.

References

Alexander DH, Novembre J, Lange K (2009) Fast model-based estimation of ancestry in unrelated individuals. Genome Res 19(9):1655–1664
Bates D, Maechler M, Bolker B, Walker S (2014) lme4: linear mixed-effects models using Eigen and S4. R package version 1.1-7. http://CRAN.R-project.org/package=lme4
Belsky J, Pluess M (2009) Beyond diathesis stress: differential susceptibility to environmental influences. Psychol Bull 135(6):885–908
Boardman JD (2009) State-level moderation of genetic tendencies to smoke. Am J Public Health 99(3):480–486
Boardman JD, Domingue BW, Fletcher JM (2012) How social and genetic factors predict friendship networks. Proc Natl Acad Sci USA 109(43):17377–17381
Boardman JD, Daw J, Freese J (2013a) Defining the environment in gene–environment research: lessons from social epidemiology. Am J Public Health 103(Suppl 1):S64–S72
Boardman JD, Domingue BW, Blalock CL, Haberstick BC, Harris KM, McQueen MB (2013b) Is the gene–environment interaction paradigm relevant to genome-wide studies? The case of education and body mass index. Demography 51(1):119–139
Boone-Heinonen J, Gordon-Larsen P (2012) Obesogenic environments in youth: concepts and methods from a longitudinal national sample. Am J Prev Med 42(5):e37–e46
Bustamante CD, Burchard EG, De la Vega FM (2011) Genomics for the world. Nature 475(7355):163–165
Carlson CS, Matise TC, North KE, Haiman CA, Fesinmeyer MD, Buyske S, Schumacher FR, Peters U, Franceschini N, Ritchie MD, Duggan DJ, Spencer KL, Dumitrescu L, Eaton CB, Thomas F, Young A, Carty C, Heiss G, Le Marchand L, Crawford DC, Hindorff LA, Kooperberg CL, PAGE Consortium (2013) Generalization and dilution of association results from European GWAS in populations of non-European ancestry: the PAGE study. PLoS Biol 11(9):e1001661
Carty CL, Johnson NA, Hutter CM, Reiner AP, Peters U, Tang H, Kooperberg C (2012) Genome-wide association study of body height in African Americans: the Women’s Health Initiative SNP Health Association Resource (SHARe). Hum Mol Genet 21(3):711–720
Cleveland HH (2003) Disadvantaged neighborhoods and adolescent aggression: behavioral genetic evidence of contextual effects. J Res Adolesc 13(2):211–238
Fletcher JM, Lehrer SF (2011) Genetic lotteries within families. J Health Econ 30(4):647–659
Frayling TM, Timpson NJ, Weedon MN, Zeggini E, Freathy RM, Lindgren CM, Perry JR, Elliott KS, Lango H, Rayner NW, Shields B, Harries LW, Barrett JC, Ellard S, Groves CJ, Knight B, Patch AM, Ness AR, Ebrahim S, Lawlor DA, Ring SM, Ben-Shlomo Y, Jarvelin MR, Sovio U, Bennett AJ, Melzer D, Ferrucci L, Loos RJ, Barroso I, Wareham NJ, Karpe F, Owen KR, Cardon LR, Walker M, Hitman GA, Palmer CN, Doney AS, Morris AD, Smith GD, Hattersley AT, McCarthy MI (2007) A common variant in the FTO gene is associated with body mass index and predisposes to childhood and adult obesity. Science 316(5826):889–894
Gong J, Schumacher F, Lim U, Hindorff LA, Haessler J, Buyske S, Carlson CS, Rosse S, Bůžková P, Fornage M, Gross M, Pankratz N, Pankow JS, Schreiner PJ, Cooper R, Ehret G, Gu CC, Houston D, Irvin MR, Jackson R, Kuller L, Henderson B, Cheng I, Wilkens L, Leppert M, Lewis CE, Li R, Nguyen KD, Goodloe R, Farber-Eger E, Boston J, Dilks HH, Ritchie MD, Fowke J, Pooler L, Graff M, Fernandez-Rhodes L, Cochrane B, Boerwinkle E, Kooperberg C, Matise TC, Le Marchand L, Crawford DC, Haiman CA, North KE, Peters U (2013) Fine mapping and identification of BMI loci in African Americans. Am J Hum Genet 93(4):661–671
Harris KM, Halpern CT, Haberstick BC, Smolen A (2013) The National Longitudinal Study of Adolescent Health (Add Health) sibling pairs data. Twin Res Hum Genet 16(1):391–398
Kurreeman FA, Stahl EA, Okada Y, Liao K, Diogo D, Raychaudhuri S, Freudenberg J, Kochi Y, Patsopoulos NA, Gupta N, Sandor C, Bang SY, Lee HS, Padyukov L, Suzuki A, Siminovitch K, Worthington J, Gregersen PK, Hughes LB, Reynolds RJ, Bridges SL, Bae SC, Yamamoto K, Plenge RM (2012) Use of a multiethnic approach to identify rheumatoid-arthritis-susceptibility loci, 1p36 and 17q12. Am J Hum Genet 90(3):524–532
Laird NM, Horvath S, Xu X (2000) Implementing a unified approach to family-based tests of association. Genet Epidemiol 19(Suppl 1):S36–S42
Li JZ, Absher DM, Tang H, Southwick AM, Casto AM, Ramachandran S, Cann HM, Barsh GS, Feldman M, Cavalli-Sforza LL, Myers RM (2008) Worldwide human relationships inferred from genome-wide patterns of variation. Science 319(5866):1100–1104
Manichaikul A, Mychaleckyj JC, Rich SS, Daly K, Sale M, Chen WM (2010) Robust relationship inference in genome-wide association studies. Bioinformatics 26(22):2867–2873
Manichaikul A, Palmas W, Rodriguez CJ, Peralta CA, Divers J, Guo X, Chen WM, Wong Q, Williams K, Kerr KF, Taylor KD, Tsai MY, Goodarzi MO, Sale MM, Diez-Roux AV, Rich SS, Rotter JI, Mychaleckyj JC (2012) Population structure of Hispanics in the United States: the multi-ethnic study of atherosclerosis. PLoS Genet 8(4):e1002640
Manku H, Langefeld CD, Guerra SG, Malik TH, Alarcon-Riquelme M, Anaya JM, Bae SC, Boackle SA, Brown EE, Criswell LA, Freedman BI, Gaffney PM, Gregersen PA, Guthridge JM, Han SH, Harley JB, Jacob CO, James JA, Kamen DL, Kaufman KM, Kelly JA, Martin J, Merrill JT, Moser KL, Niewold TB, Park SY, Pons-Estel BA, Sawalha AH, Scofield RH, Shen N, Stevens AM, Sun C, Gilkeson GS, Edberg JC, Kimberly RP, Nath SK, Tsao BP, Vyse TJ (2013) Trans-ancestral studies fine map the SLE-susceptibility locus TNFSF4. PLoS Genet 9(7):e1003554
Marigorta UM, Navarro A (2013) High trans-ethnic replicability of GWAS results implies common causal variants. PLoS Genet 9(6):e1003566
Musunuru K, Romaine SP, Lettre G, Wilson JG, Volcik KA, Tsai MY, Taylor HA, Schreiner PJ, Rotter JI, Rich SS, Redline S, Psaty BM, Papanicolaou GJ, Ordovas JM, Liu K, Krauss RM, Glazer NL, Gabriel SB, Fornage M, Cupples LA, Buxbaum SG, Boerwinkle E, Ballantyne CM, Kathiresan S, Rader DJ (2012) Multi-ethnic analysis of lipid-associated loci: the NHLBI CARe project. PLoS One 7(5):e36473
Pulit SL, Voight BF, de Bakker PI (2010) Multiethnic genetic association studies improve power for locus discovery. PLoS One 5(9):e12600
Purcell S, Neale B, Todd-Brown K, Thomas L, Ferreira MA, Bender D, Maller J, Sklar P, de Bakker PI, Daly MJ, Sham PC (2007) PLINK: a tool set for whole-genome association and population-based linkage analyses. Am J Hum Genet 81(3):559–575
R Core Team (2013). R: a language and environment for statistical computing. R Foundation for Statistical Computing, Vienna, Austria. http://www.R-project.org/
Roeder K, Wasserman L (2009) Genome-wide significance levels and weighted hypothesis testing. Stat Sci 24(4):398–413
Roeder K, Bacanu SA, Wasserman L, Devlin B (2006) Using linkage genome scans to improve power of association in genome scans. Am J Hum Genet 78(2):243–252
Roeder K, Devlin B, Wasserman L (2007) Improving power in genome-wide association studies: weights tip the scale. Genet Epidemiol 31(7):741–747
Sabater-Lleal M, Huang J, Chasman D, Naitza S, Dehghan A, Johnson AD, Teumer A, Reiner AP, Folkersen L, Basu S, Rudnicka AR, Trompet S, Mälarstig A, Baumert J, Bis JC, Guo X, Hottenga JJ, Shin SY, Lopez LM, Lahti J, Tanaka T, Yanek LR, Oudot-Mellakh T, Wilson JF, Navarro P, Huffman JE, Zemunik T, Redline S, Mehra R, Pulanic D, Rudan I, Wright AF, Kolcic I, Polasek O, Wild SH, Campbell H, Curb JD, Wallace R, Liu S, Eaton CB, Becker DM, Becker LC, Bandinelli S, Räikkönen K, Widen E, Palotie A, Fornage M, Green D, Gross M, Davies G, Harris SE, Liewald DC, Starr JM, Williams FM, Grant PJ, Spector TD, Strawbridge RJ, Silveira A, Sennblad B, Rivadeneira F, Uitterlinden AG, Franco OH, Hofman A, van Dongen J, Willemsen G, Boomsma DI, Yao J, Jenny NS, Haritunians T, McKnight B, Lumley T, Taylor KD, Rotter JI, Psaty BM, Peters A, Gieger C, Illig T, Grotevendt A, Homuth G, Völzke H, Kocher T, Goel A, Franzosi MG, Seedorf U, Clarke R, Steri M, Tarasov KV, Sanna S, Schlessinger D, Stott DJ, Sattar N, Buckley BM, Rumley A, Lowe GD, McArdle WL, Chen MH, Tofler GH, Song J, Boerwinkle E, Folsom AR, Rose LM, Franco-Cereceda A, Teichert M, Ikram MA, Mosley TH, Bevan S, Dichgans M, Rothwell PM, Sudlow CL, Hopewell JC, Chambers JC, Saleheen D, Kooner JS, Danesh J, Nelson CP, Erdmann J, Reilly MP, Kathiresan S, Schunkert H, Morange PE, Ferrucci L, Eriksson JG, Jacobs D, Deary IJ, Soranzo N, Witteman JC, de Geus EJ, Tracy RP, Hayward C, Koenig W, Cucca F, Jukema JW, Eriksson P, Seshadri S, Markus HS, Watkins H, Samani NJ, Wallaschofski H, Smith NL, Tregouet D, Ridker PM, Tang W, Strachan DP, Hamsten A, O’Donnell CJ (2013) Multiethnic meta-analysis of genome-wide association studies in >100 000 subjects identifies 23 fibrinogen-associated Loci but no strong evidence of a causal association between circulating fibrinogen and cardiovascular disease. Circulation 128(12):1310–1324
Speliotes EK, Willer CJ, Berndt SI, Monda KL, Thorleifsson G, Jackson AU, Allen HL, Lindgren CM, Luan J, Mägi R, Randall JC, Vedantam S, Winkler TW, Qi L, Workalemahu T, Heid IM, Steinthorsdottir V, Stringham HM, Weedon MN, Wheeler E, Wood AR, Ferreira T, Weyant RJ, Segrè AV, Estrada K, Liang L, Nemesh J, Park JH, Gustafsson S, Kilpeläinen TO, Yang J, Bouatia-Naji N, Esko T, Feitosa MF, Kutalik Z, Mangino M, Raychaudhuri S, Scherag A, Smith AV, Welch R, Zhao JH, Aben KK, Absher DM, Amin N, Dixon AL, Fisher E, Glazer NL, Goddard ME, Heard-Costa NL, Hoesel V, Hottenga JJ, Johansson A, Johnson T, Ketkar S, Lamina C, Li S, Moffatt MF, Myers RH, Narisu N, Perry JR, Peters MJ, Preuss M, Ripatti S, Rivadeneira F, Sandholt C, Scott LJ, Timpson NJ, Tyrer JP, van Wingerden S, Watanabe RM, White CC, Wiklund F, Barlassina C, Chasman DI, Cooper MN, Jansson JO, Lawrence RW, Pellikka N, Prokopenko I, Shi J, Thiering E, Alavere H, Alibrandi MT, Almgren P, Arnold AM, Aspelund T, Atwood LD, Balkau B, Balmforth AJ, Bennett AJ, Ben-Shlomo Y, Bergman RN, Bergmann S, Biebermann H, Blakemore AI, Boes T, Bonnycastle LL, Bornstein SR, Brown MJ, Buchanan TA, Busonero F, Campbell H, Cappuccio FP, Cavalcanti-Proença C, Chen YD, Chen CM, Chines PS, Clarke R, Coin L, Connell J, Day IN, den Heijer M, Duan J, Ebrahim S, Elliott P, Elosua R, Eiriksdottir G, Erdos MR, Eriksson JG, Facheris MF, Felix SB, Fischer-Posovszky P, Folsom AR, Friedrich N, Freimer NB, Fu M, Gaget S, Gejman PV, Geus EJ, Gieger C, Gjesing AP, Goel A, Goyette P, Grallert H, Grässler J, Greenawalt DM, Groves CJ, Gudnason V, Guiducci C, Hartikainen AL, Hassanali N, Hall AS, Havulinna AS, Hayward C, Heath AC, Hengstenberg C, Hicks AA, Hinney A, Hofman A, Homuth G, Hui J, Igl W, Iribarren C, Isomaa B, Jacobs KB, Jarick I, Jewell E, John U, Jørgensen T, Jousilahti P, Jula A, Kaakinen M, Kajantie E, Kaplan LM, Kathiresan S, Kettunen J, Kinnunen L, Knowles JW, Kolcic I, König IR, Koskinen S, Kovacs P, Kuusisto J, Kraft P, Kvaløy K, 
Laitinen J, Lantieri O, Lanzani C, Launer LJ, Lecoeur C, Lehtimäki T, Lettre G, Liu J, Lokki ML, Lorentzon M, Luben RN, Ludwig B, Manunta P, Marek D, Marre M, Martin NG, McArdle WL, McCarthy A, McKnight B, Meitinger T, Melander O, Meyre D, Midthjell K, Montgomery GW, Morken MA, Morris AP, Mulic R, Ngwa JS, Nelis M, Neville MJ, Nyholt DR, O’Donnell CJ, O’Rahilly S, Ong KK, Oostra B, Paré G, Parker AN, Perola M, Pichler I, Pietiläinen KH, Platou CG, Polasek O, Pouta A, Rafelt S, Raitakari O, Rayner NW, Ridderstråle M, Rief W, Ruokonen A, Robertson NR, Rzehak P, Salomaa V, Sanders AR, Sandhu MS, Sanna S, Saramies J, Savolainen MJ, Scherag S, Schipf S, Schreiber S, Schunkert H, Silander K, Sinisalo J, Siscovick DS, Smit JH, Soranzo N, Sovio U, Stephens J, Surakka I, Swift AJ, Tammesoo ML, Tardif JC, Teder-Laving M, Teslovich TM, Thompson JR, Thomson B, Tönjes A, Tuomi T, van Meurs JB, van Ommen GJ, Vatin V, Viikari J, Visvikis-Siest S, Vitart V, Vogel CI, Voight BF, Waite LL, Wallaschofski H, Walters GB, Widen E, Wiegand S, Wild SH, Willemsen G, Witte DR, Witteman JC, Xu J, Zhang Q, Zgaga L, Ziegler A, Zitting P, Beilby JP, Farooqi IS, Hebebrand J, Huikuri HV, James AL, Kähönen M, Levinson DF, Macciardi F, Nieminen MS, Ohlsson C, Palmer LJ, Ridker PM, Stumvoll M, Beckmann JS, Boeing H, Boerwinkle E, Boomsma DI, Caulfield MJ, Chanock SJ, Collins FS, Cupples LA, Smith GD, Erdmann J, Froguel P, Grönberg H, Gyllensten U, Hall P, Hansen T, Harris TB, Hattersley AT, Hayes RB, Heinrich J, Hu FB, Hveem K, Illig T, Jarvelin MR, Kaprio J, Karpe F, Khaw KT, Kiemeney LA, Krude H, Laakso M, Lawlor DA, Metspalu A, Munroe PB, Ouwehand WH, Pedersen O, Penninx BW, Peters A, Pramstaller PP, Quertermous T, Reinehr T, Rissanen A, Rudan I, Samani NJ, Schwarz PE, Shuldiner AR, Spector TD, Tuomilehto J, Uda M, Uitterlinden A, Valle TT, Wabitsch M, Waeber G, Wareham NJ, Watkins H, Wilson JF, Wright AF, Zillikens MC, Chatterjee N, McCarroll SA, Purcell S, Schadt EE, Visscher PM, Assimes TL, 
Borecki IB, Deloukas P, Fox CS, Groop LC, Haritunians T, Hunter DJ, Kaplan RC, Mohlke KL, O’Connell JR, Peltonen L, Schlessinger D, Strachan DP, van Duijn CM, Wichmann HE, Frayling TM, Thorsteinsdottir U, Abecasis GR, Barroso I, Boehnke M, Stefansson K, North KE, McCarthy MI, Hirschhorn JN, Ingelsson E, Loos RJ (2010) Association analyses of 249,796 individuals reveal 18 new loci associated with body mass index. Nat Genet 42(11):937–948
Spittel ML, Spotts EL, Deeds BG (2013) Integration of behavioral, social science and genetics research: exploring public health significance. Am J Public Health 103(Suppl 1):S5–S7
Thornton T, Tang H, Hoffmann TJ, Ochs-Balcom HM, Caan BJ, Risch N (2012) Estimating kinship in admixed populations. Am J Hum Genet 91(1):122–138
Xu H, Yang W, Perez-Andreu V, Devidas M, Fan Y, Cheng C, Pei D, Scheet P, Burchard EG, Eng C, Huntsman S, Torgerson DG, Dean M, Winick NJ, Martin PL, Camitta BM, Bowman WP, Willman CL, Carroll WL, Mullighan CG, Bhojwani D, Hunger SP, Pui CH, Evans WE, Relling MV, Loh ML, Yang JJ (2013) Novel susceptibility variants at 10p12.31-12.2 for childhood acute lymphoblastic leukemia in ethnically diverse populations. J Natl Cancer Inst 105(10):733–742
Yang J, Benyamin B, McEvoy BP, Gordon S, Henders AK, Nyholt DR, Madden PA, Heath AC, Martin NG, Montgomery GW, Goddard ME, Visscher PM (2010) Common SNPs explain a large proportion of the heritability for human height. Nat Genet 42(7):565–569
Yang J, Lee SH, Goddard ME, Visscher PM (2011) GCTA: a tool for genome-wide complex trait analysis. Am J Hum Genet 88(1):76–82

Acknowledgments

This research uses data from Add Health, a program project directed by Kathleen Mullan Harris and designed by J. Richard Udry, Peter S. Bearman, and Kathleen Mullan Harris at the University of North Carolina at Chapel Hill, and funded by Grant P01-HD31921 from the Eunice Kennedy Shriver National Institute of Child Health and Human Development, with cooperative funding from 23 other federal agencies and foundations. Information on how to obtain the Add Health data files is available on the Add Health website (http://www.cpc.unc.edu/addhealth). The genome-wide data generated for the Add Health sibling pairs subsample, as well as investigator effort for MBM, JDB, KMH, BD and AS, was funded by Grant R01-HD060726 from the Eunice Kennedy Shriver National Institute of Child Health and Human Development.

Conflict of Interest

Matthew McQueen, Jason Boardman, Benjamin Domingue, Andrew Smolen, Joyce Tabor, Ley Killeya-Jones, Carolyn Halpern, Eric Whitsel and Kathleen Harris declare they have no conflicts of interest.

Human and Animal Rights and Informed Consent

All procedures were followed in accordance with the ethical standards of the UNC-Chapel Hill and CU Boulder IRB committees. Informed consent was obtained from all participants.

Author information

Authors and Affiliations

Department of Integrative Physiology, University of Colorado Boulder, 354 UCB, Boulder, USA
Matthew B. McQueen
Institute of Behavioral Science, University of Colorado Boulder, Boulder, USA
Jason D. Boardman & Benjamin W. Domingue
Institute for Behavioral Genetics, University of Colorado Boulder, Boulder, USA
Andrew Smolen
Carolina Population Center, University of North Carolina at Chapel Hill, Chapel Hill, USA
Joyce Tabor, Ley Killeya-Jones, Carolyn T. Halpern & Kathleen Mullan Harris
Department of Maternal and Child Health, Gillings School of Global Public Health, University of North Carolina at Chapel Hill, Chapel Hill, USA
Carolyn T. Halpern
Department of Epidemiology, Gillings School of Global Public Health, University of North Carolina at Chapel Hill, Chapel Hill, USA
Eric A. Whitsel
Department of Medicine, School of Medicine, University of North Carolina at Chapel Hill, Chapel Hill, USA
Eric A. Whitsel
Department of Sociology, University of North Carolina at Chapel Hill, Chapel Hill, USA
Kathleen Mullan Harris
Carolina Center for Genome Sciences, University of North Carolina at Chapel Hill, Chapel Hill, USA
Kathleen Mullan Harris

Corresponding author

Correspondence to Matthew B. McQueen.

Additional information

Edited by Valerie Knopik.

Electronic supplementary material

Below is the link to the electronic supplementary material.

Supplementary material 1 (DOCX 1566 kb)

About this article

Cite this article

McQueen, M.B., Boardman, J.D., Domingue, B.W. et al. The National Longitudinal Study of Adolescent to Adult Health (Add Health) Sibling Pairs Genome-Wide Data. Behav Genet 45, 12–23 (2015). https://doi.org/10.1007/s10519-014-9692-4

Received: 23 May 2014
Accepted: 20 October 2014
Published: 07 November 2014
Issue Date: January 2015
DOI: https://doi.org/10.1007/s10519-014-9692-4

Introduction