Introduction

Antisocial behaviors (ASB) are disruptive acts characterized by covert and overt hostility and violation of the rights and safety of others [1]. The emotional, social, and economic costs incurred by victims of ASB are far-reaching, ranging from victims’ psychological trauma to reduced productivity when victims miss work to costs incurred by taxpayers in order to staff and run a justice system [2, 3]. ASB has been recognized not merely as a social problem, but also as a mental health economic priority [4]. In addition to causing harm to others, those with ASB are themselves at elevated risk of criminal convictions as well as mental health and substance abuse problems [5]. Moreover, given the relative stability of ASB [6], it is important to also examine personality traits potentially tied to overt behaviors. Previous meta-analyses demonstrated that the Five-Factor Model of personality (FFM), particularly the domains of Agreeableness, Conscientiousness, and Neuroticism, is potentially critical for better illuminating the correlates and causes of ASB [7, 8]. Given all this, it is a research imperative to illuminate the mechanisms underlying the pathogenesis, emergence, and persistence of ASB.

Toward this end, statistical genetic studies have consistently revealed the relevance of environmental and genetic risk factors in the genesis of inter-individual differences in ASB. Family studies, mostly conducted in samples of European ancestry, have demonstrated a considerable heritable component for ASB, with estimates of ~50% [9] across studies. The increasing availability of genome-wide data, along with data on dimensional ASB measures, facilitates building more advanced explanatory models aimed at identifying trait-relevant genetic variants that could serve as moderators of socio-environmental factors and vice versa. Moreover, while heritability estimates can differ across subtypes of ASB (e.g., significantly higher twin-based heritability estimates for aggressive forms (65%) versus non-aggressive, rule-breaking forms (48%) of ASB [10]), these subtypes are genetically correlated (rg = 0.38) [11].

Measuring antisocial behavior, a broad view

Considering multiple forms of ASB together increases the power of genetic analyses and may improve our ability to detect new genetic variants. Here, we thus examine a broadly defined construct of ASB, an approach that has successful precedents. Large-scale genomic studies have indicated substantial genetic overlap among psychiatric disorders [12]. A recent genome-wide meta-analysis across eight neuropsychiatric disorders revealed extensive pleiotropic genetic effects (N = 232,964 cases and 494,162 controls) [13, 14]. The study found that 109 out of the total 146 contributing loci were associated with at least two psychiatric disorders, suggesting broad liability to these conditions. Moreover, the Externalizing Consortium recently conducted a multivariate analysis of large-scale genome-wide association studies (GWAS) of seven externalizing-related phenotypes (N = ~1.5 million) and found 579 genetic associations with a general liability to externalizing behavior [15]. Although these very large multivariate approaches are crucial in enhancing genetic discovery across phenotypes, they do not detect all the genetic variation relevant to individual disorders. Since ASB is a critical issue for psychiatry and for society, the present study uniquely focuses on (severe) forms of ASB and persistence over the lifespan. To do so, we initiated the Broad Antisocial Behavior Consortium (BroadABC) to perform large-scale meta-analytical genetic analyses utilizing a broad range of phenotypic ASB measures (e.g., conduct disorder symptoms, aggressive behavior, and delinquency). In our first meta-analysis [16], we demonstrated that effect sizes for SNPs with suggestive evidence of association with ASB were small, as anticipated for most polygenic traits. Still, we found that the collective effect across all of the included variants (typically referred to as “SNP heritability”) explained roughly 5% of the total variation in ASB [16], which is in line with meta-analyses of the ACTION [17] and EAGLE [18] consortia.

To date, however, no previous GWAS meta-analysis targeting broad ASB has detected SNPs or genes that are well replicated. The polygenic architecture of ASB underscores the importance of employing very large samples to yield sufficient power to detect genetic loci of small effect size. Therefore, we substantially boost statistical power by quadrupling the sample size and adding new cohorts to the BroadABC consortium.

In our meta-analysis, we also include the results of a GWAS of disruptive behavior disorders (DBDs) in the context of attention-deficit/hyperactivity disorder (ADHD), which identified three genome-wide significant loci for DBDs [19]. The present study considers multiple measures of ASB in people with and without psychiatric diagnoses across 28 samples to reveal the genetic underpinnings of ASB phenotypes typically studied in psychology, psychiatry, and criminology. These larger samples allow well-powered genetic correlation analyses and improved polygenic risk scores (PRS). Five independent cohorts (total N = 8058) were employed to validate the ASB PRS in different populations, at different developmental stages, and for different ASB phenotypes. Moreover, we conducted a follow-up analysis using a mouse model of pathological aggression. Since ASB is known to correlate phenotypically with an array of cognitive and health problems [20,21,22,23], we tested for genetic overlap between ASB and a range of other traits and disorders, including anthropometric, cognitive, reproductive, neuropsychiatric, and smoking-related traits.

Results

Meta-analysis on broad ASB identifies association with common variants in FOXP2

After quality control and imputation to the Haplotype Reference Consortium (HRC) or 1000 Genomes Project reference panel (see Online Methods), 85,359 individuals from 28 cohorts and a maximum of 7,392,849 variants were available for analysis. We carried out a pooled-sex GWAS meta-analysis for the broad ASB phenotype with METAL [24] and found one genome-wide significant locus on chromosome 7 (chromosome band 7q31.1, Fig. 1a and Supplementary Table 3). The lead SNP was rs12536335 (p = 6.32 × 10⁻¹⁰; Fig. 1b, c), located in an intronic region upstream of one of the transcriptional start sites for the forkhead box protein P2 (FOXP2) gene [25, 26]. Consistent with this finding, a gene-based association test carried out with MAGMA [27] identified a significant association for FOXP2 (p = 7.43 × 10⁻⁷, Supplementary Note 3, Supplementary Fig. 1, and Supplementary Table 6). The FOXP2 gene has been related to the development of speech and language [28], yet is also implicated in a wide range of other traits and diagnoses [29] (see Fig. 1d). MAGMA generalized gene-set and tissue-specific gene-set analyses (sex-combined) yielded no significant gene-sets after Bonferroni correction for multiple testing. The top gene-set for generalized gene-set analysis was activated NTRK2 signals through RAS signaling pathway (Supplementary Table 7), while the top tissue in the tissue-specific gene expression analysis was the hypothalamus (Supplementary Table 8). We next ran sex-specific GWAS meta-analyses. These analyses did not identify SNPs that reached genome-wide significance (Supplementary Tables 4 and 5).

Fig. 1: SNP-based results from the GWAS meta-analysis (N = 85,359) on broad antisocial behavior.

a Manhattan plot of the GWAS meta-analysis, showing the negative log10-transformed p value for each SNP. SNP two-sided p values from a linear model were calculated using METAL [24], weighting SNP associations by sample size. b Regional association plot around chromosome 7:114043159 with functional annotations of SNPs in LD with the lead SNP rs12536335 (shown in purple). The plot displays each SNP's GWAS p value against its chromosomal position, where colors represent linkage disequilibrium (r² values) with the most significantly associated SNP. c The plot displays CADD scores (Combined Annotation Dependent Depletion) and RegulomeDB scores of these SNPs. d PheWAS plot showing the significance of associations of common variation in the FOXP2 gene with a wide range of traits and diagnoses based on MAGMA gene-based tests (with Bonferroni-corrected p value: 1.05e−5), as obtained from GWASAtlas (https://atlas.ctglab.nl).

Mouse model of pathological aggression

Whole-genome sequencing analysis of single nucleotide variants (SNVs) in aggressive, antisocial BALB/cJ mice compared to BALB/cByJ control mice revealed differences between these lines located in introns of Foxp2 (rs241912422) and Cntnap2 (rs212805467; rs50446478; rs260305923; rs242237534), a well-studied neural target of this transcription factor.

Heritability and polygenic scoring

SNP heritability

To assess the proportion of variance in liability for broad ASB explained by all measured SNPs, we computed the SNP-based heritability (h²SNP) through LD score regression (LDSC) [30]. The h²SNP was estimated to be 3.4% (s.e. = 1.2%) in the quantitative BroadABC data, 24.8% (s.e. = 3.1%) in the Psychiatric Genomics Consortium/iPSYCH case-control data (with a prevalence estimate of 2% in the population), and 7.7% (s.e. = 1.1%) in the combined meta-analysis.
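For case-control data, an observed-scale heritability estimate is commonly rescaled to the liability scale using the population prevalence and the proportion of cases in the sample. The Python sketch below illustrates that standard rescaling under stated assumptions: the function name is hypothetical, the sample prevalence is taken simply as cases/(cases + controls) from the DBD case-control counts reported in the Methods, and the code is not the pipeline used in this study.

```python
from statistics import NormalDist


def liability_scale_factor(pop_prev: float, sample_prev: float) -> float:
    """Multiplier that rescales observed-scale h2 to liability-scale h2.

    pop_prev    -- assumed prevalence of the disorder in the population (K)
    sample_prev -- proportion of cases in the GWAS sample (P)
    """
    std_normal = NormalDist()
    # Liability threshold above which an individual is affected.
    threshold = std_normal.inv_cdf(1.0 - pop_prev)
    # Height of the standard normal density at that threshold.
    z = std_normal.pdf(threshold)
    return (pop_prev ** 2 * (1.0 - pop_prev) ** 2) / (
        sample_prev * (1.0 - sample_prev) * z ** 2
    )


# Inputs taken from the text: 2% population prevalence, 3802 cases, 31,305 controls.
K = 0.02
P = 3802 / (3802 + 31305)
factor = liability_scale_factor(K, P)
print(f"liability-scale h2 = observed-scale h2 x {factor:.3f}")
```

Because the multiplier depends strongly on the assumed population prevalence, liability-scale estimates are only comparable across studies that assume similar prevalences.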

Polygenic risk scoring in five independent cohorts

To assess how well the PRS derived from our ASB GWAS meta-analysis predicts other measures of ASB, we carried out PRS analyses in five independent cohorts, none of which appeared in the GWAS meta-analysis (Fig. 2 and Supplementary Note 7).

Fig. 2: Bar charts illustrating the proportion of variance (incremental R², or ΔR²) explained by the PRSs.

PRSs are shown for broad ASB associated with childhood ASB in the Dunedin Longitudinal Study (A), with externalizing behavior in the E-Risk Study (B), with Conduct Disorder (C) and Oppositional Defiant Disorder (D) in the Philadelphia Neurodevelopmental Cohort Study, with ASB in the Quebec Longitudinal Study of Children’s Development (E), and with time-aggregated ASB in the Quebec Newborn Twin Study (F). Asterisks (*) show statistical significance after applying a Bonferroni correction on the 22 tested phenotypes at p < 0.0023.

Dunedin Longitudinal Study

In New Zealand, participants were derived from the Dunedin Longitudinal Study [31] (N = 1037, assessed 14 times from birth to age 45 years). We tested nine phenotypes and found significant associations with the BroadABC-based PRS for two: childhood ASB and official records of juvenile convictions. Although not all survived Bonferroni adjustment, we found nominally significant (p < 0.05) associations with the BroadABC-based PRS for eight phenotypes. We did not find evidence for a PRS association with partner violence. Lastly, we compared individuals grouped into the following four distinct developmental trajectories of ASB using general growth mixture modeling: low ASB across childhood through adulthood, childhood-limited ASB, adolescent-onset ASB, and life-course persistent (LCP) ASB [32]. Individuals following the LCP antisocial trajectory were characterized by the highest levels of genetic risk (see Supplementary Fig. 2); the nominally significant higher PRS of the LCP trajectory group compared to the low ASB group (p = 0.032 and p = 0.049, for p value thresholds 0.05 and 0.1, respectively) did not survive Bonferroni adjustment. For a full report of the findings in the Dunedin cohort, see Supplementary Table 9 and Supplementary Note 8.

Environmental Risk Longitudinal Twin Study (E-Risk)

In England and Wales, participants were included from the E-Risk Study (N = 2232, assessed five times from birth to age 18 years). We tested eight phenotypes and found significant associations for seven. PRS analyses revealed significant associations with parent- and teacher-reported ASB up to age 12 years, conduct disorder diagnosis up to age 12 years, the externalizing spectrum at age 18 years, and official records of criminal convictions up to age 22 years. For a full report of the findings in the E-Risk Study, see Supplementary Table 10 and Supplementary Note 8.

Philadelphia Neurodevelopmental Cohort (PNC)

In the United States, participants were included from the PNC Study (N = 4201). We tested two phenotypes and found significant associations for both. We found that higher PRS for ASB were associated with symptom counts of both conduct disorder (p < 0.0001, ΔR² = 1.0%, Supplementary Table 11) and oppositional defiant disorder (p < 0.0001, ΔR² = 0.4%, Supplementary Table 12).

Quebec Longitudinal Study of Children’s Development (QLSCD)

In Canada, participants were included from the QLSCD study (N = 599). We tested one phenotype and did not find a significant association (p > 0.05, Supplementary Table 13) between PRS and the score on a self-report questionnaire related to conduct disorder, delinquency, and broad ASB in young adults (age range = 18–19 years).

Quebec Newborn Twin Study (QNTS)

In Canada, participants were derived from the QNTS study (N = 341). We tested two phenotypes and found a significant association for one. We computed a factor score based upon five teacher-rated assessments of ASB in children during primary school (age range = 6–12 years). We found that higher PRS were associated with a higher factor score of ASB (p = 0.001 at p value threshold 0.4, adjusted ΔR² = 3.9%, Supplementary Table 14). We failed to find evidence for an association between PRS and self-reported ASB in young adults (p > 0.05).

Genetic correlations through LD score regression

ASB is known to correlate with an array of phenotypes [20,21,22]. At the same time, there has been a growing availability of publicly accessible genetic data across these phenotypes. To test whether these phenotypic associations are also reflected in genetic correlations, we performed analyses with LDSC in a selection of 73 traits and diagnoses (Supplementary Table 15 and Fig. 3). We found strong correlations between ASB and reproductive traits (e.g., younger age of first birth (rg = −0.58, s.e. = 0.06, p = 2.93 × 10⁻¹⁵)), cognitive traits (e.g., fewer years of schooling (rg = −0.49, s.e. = 0.06, p = 1.94 × 10⁻¹⁰)), anthropometric traits (e.g., increased waist-to-hip ratio (rg = 0.32, s.e. = 0.05, p = 5.59 × 10⁻⁶)), neuropsychiatric traits (e.g., more depressive symptoms (rg = 0.63, s.e. = 0.07, p = 2.45 × 10⁻¹⁶)), and smoking-related traits (e.g., ever smoked (rg = 0.54, s.e. = 0.08, p = 1.48 × 10⁻⁷)). It is important to emphasize here that correlation, even when genetic, does not imply causation.

Fig. 3: Significant genetic correlations of ASB with previously published results of other traits and diseases, computed using cross-trait LD score regression in LDHub, Bonferroni-corrected p value: 0.00068 (bars represent 95% confidence intervals).

For traits with significant correlations with ASB in multiple studies (see Supplementary Table 15), we report the most recent study here.

Discussion

Our GWAS meta-analysis of broad ASB in 85,359 individuals from population cohorts and those with a clinical diagnosis related to ASB revealed one novel associated locus on chromosome 7 (7:114043159, rs12536335), residing in the forkhead box P2 (FOXP2) gene. The lead SNP is relatively proximal (~14 kb upstream) to an important enhancer region located 330 kb downstream of the first transcriptional start site (TSS1) of the gene [26]. This SNP is also in the vicinity (~8 kb upstream) of a second transcriptional start site (TSS2) of FOXP2 that can drive expression of alternative transcripts. The FOXP2 gene is expressed in sensory, limbic, and motor circuits of the brain, as well as the lungs, heart, and gut [26]. It encodes a transcription factor that acts as a regulator of numerous target genes and has been implicated in multiple aspects of brain development (e.g., neuronal growth, synaptic plasticity) [33]. FOXP2 was first identified two decades ago when rare heterozygous mutations of the gene were linked to a monogenic disorder involving speech motor deficits, accompanied by impairments in expressive and receptive language [34, 35]. Nevertheless, there is scant evidence that common FOXP2 variants contribute to inter-individual differences in language function [36, 37]. Though prior behavioral research [38,39,40] reported a link between language problems and ASB, it is premature to over-interpret the FOXP2 findings here. SNPs at this locus have been associated, through GWAS, with a range of externalizing traits, including ADHD [41], cannabis use disorder [42], and generalized risk tolerance [43]. Given the involvement of SNPs at this locus in different behavioral traits and diagnoses, and considering the small effect sizes, it is clear that the association of FOXP2 variation with ASB has limited explanatory value on its own. That said, nothing yet precludes the possibility that this SNP may help to yield deeper insights once placed in broader context by future research.

In the present study we also compared the BALB/cJ strain, a mouse model of pathological aggression, to BALB/cByJ controls, and found intronic variants in Foxp2 and one of its downstream targets, Cntnap2. Previous studies in human cellular models have shown that the protein encoded by FOXP2 can directly bind to regulatory regions in the CNTNAP2 locus to repress its expression [44]. Interestingly, mice with cortical-specific knockout of Foxp2 have been reported to show abnormalities in social behaviors [45]. Although these findings may indicate that the intronic SNVs are relevant to the behavioral differences between the strains, further evidence is needed to show that the variants actually have functional relevance for the mouse phenotype. Future studies may utilize complementary data comparing gene expression in the two mouse lines or could investigate functional impact (e.g., do they map to credible enhancer regions, are they likely to alter binding for transcription factors?) of the SNVs identified.

Contrary to previous BroadABC GWAS analyses, we did not find evidence for sex-specific genetic effects in the present study. Although we did have access to sex-specific data in considerable subsets (N = 22,322 males, N = 26,895 females), the power to detect new variants employing such sample sizes is still limited. Compared to our previous study, we found that the variance explained in independent samples by PRS based on the resulting summary statistics has substantially increased from 0.21 to 3.9%. Essentially, we found consistent links of our ASB PRS with multiple antisocial phenotypes at different developmental stages, from different reporting sources, and reflecting measurements from different disciplines (psychology, psychiatry, criminology). These links were found in individuals from New Zealand, Britain, the United States, and Canada, born as much as 30 years apart. We also show that our ASB PRS were more strongly associated with more severe and persistent types of ASB.

Notwithstanding the increase in effect size of the PRS, and calculations yielding a more precise estimate, the variance explained by the PRS was still relatively small, which was expected in light of the low SNP heritability. Given the highly polygenic architecture of ASB, contributing SNPs have low average effect sizes, thus leading to limited predictive power in independent samples. New PRS methods along with further increasing sample sizes will likely further increase the amount of variance accounted for by the PRS. Moreover, the association may be enhanced by improving the quality of phenotype measurements, which is reflected by our PRS results demonstrating the most robust association with high-quality measurement of ASB (using a factor score based upon multiple assessments). Aggregating data from measurements across ages, as opposed to the measures assessed at a single time point, can lead to more reliable trait measures and to better prediction [46]. Phenotypically, adding more extreme ASB phenotypes to the GWAS meta-analysis might also lead to more explained variance. In addition, the inclusion of clinical samples displaying extreme ASB phenotypes (e.g., [multiple] homicide, sexual assaults, etc.) in GWAS could help ensure the generalizability of genetic findings to forensic populations. Thus, future efforts of the BroadABC will continue to focus on more severe forms of ASB and its persistence across the lifespan. Moreover, by considering genetically correlated traits through multi-trait GWAS methods [47] and multi-trait PRS methods [48] it might be possible to boost power for discovery through GWAS meta-analysis and PRS prediction. Lastly, a major limitation of the present study is that our GWAS results are limited to individuals of European ancestry. This Eurocentric bias may lead to more accurate predictions in individuals with European ancestry, compared to non-Europeans, thus potentially increasing disparities in outcomes related to ASB [49, 50]. To realize the full and equitable potential of polygenic risk, future genetic studies on ASB should also include non-European samples.

Developmental criminological research findings, such as the influential developmental taxonomy theory by Moffitt [51, 52], have suggested the existence of distinctive offending patterns across the life-course [53]. These developmental trajectories of ASB are thought to have different underlying etiological processes, with relatively more variance explained by genetic factors for life-course-persistent offending as compared to the more socially influenced adolescence-limited offending. Barnes et al. have previously produced evidence that heritability estimates were not uniform across different offending groups, suggesting that the causal processes may vary across offending patterns [54, 55]. In the present study we found a trend of higher PRS for ASB showing a stronger association with the life-course-persistent trajectory of ASB as compared to the low ASB group. The life-course-persistent trajectory is also known to be associated with profound brain alterations and diminished neurological health [56]. These findings are important since they have the capacity to help improve the current understanding of downstream neurobiological mechanisms relevant to the etiology of antisocial development [56]. Sufficiently powered future studies should thus aim to further elucidate the genetic risk and protective factors that underlie different offending trajectories [57].

Our genetic correlation analyses confirmed previously reported [16, 23, 58] correlations between ASB and a wide range of traits and diagnoses. The relatively small GWAS sample size of some traits, however, coupled with wide confidence intervals (such as for agreeableness, rg = −0.81, s.e. = 0.47), calls for larger samples in order to achieve more precise estimates concerning the genetic overlap between personality and ASB. It seems worthwhile to mention again here that partially overlapping genetic architectures do not provide causal insights of any kind. In this case they merely signify the presence of some potentially shared biological mechanisms linking the conditions [59]. One can reasonably conclude, though, that there are likely common underlying genetic factors which operate to increase a general vulnerability to a range of psychopathologies. These comorbid effects are in line with findings in the Dunedin Study demonstrating that life-course-persistent offenders are characterized by several pathological risk factors, related to domains of parenting, neurocognitive development, and temperament [52]. This signifies the importance of investigating pleiotropy and considering the complex etiology of the broader ASB phenotype. Large-scale collaborations, such as the BroadABC, will facilitate the expansion of epidemiological studies capable of further exploring the interaction of genetic risk and socio-environmental risks, and how these contribute to the multifaceted origin of ASB.

Methods

Samples

The meta-analysis included 21 new discovery samples of the BroadABC with GWAS data on a continuous measure of ASB, totaling 50,252 participants: The National Longitudinal Study of Adolescent to Adult Health [60] (ADH), Avon Longitudinal Study of Parents and Children [61,62,63], Brain Imaging Genetics [64], CoLaus|PsyCoLaus [65], Collaborative Study on the Genetics of Alcoholism [66], Finnish Twin Cohort [67] (FinnTwin), The Genetics of Sexuality and Aggression [68], Minnesota Center for Twin and Family Research [69], Phenomics and Genomics Sample [70], eight samples of the QIMR Berghofer Medical Research Institute (QIMR; 16Up project [16UP [71]], Twenty-Five and Up Study [25UP [72]], Genetics of Human Agency [73], Prospective Imaging Study of Ageing [74], Semi-Structured Assessment for the Genetics of Alcoholism SSAGA Phase 2 [SS2 [75]], Genetic Epidemiology of Pathological Gambling [GA [76]], Twin 89 Study [T89 [77]], and Nicotine Study [NC [78]]), Spit for Science [79] (S4S), two samples (from different genotype platforms) of the Twin Early Development Study [80], and the TRacking Adolescents’ Individual Lives Survey [81].

We complemented the above data with GWAS summary statistics on case-control data on DBDs from the recently published Psychiatric Genomics Consortium/iPSYCH consortium meta-analysis, which included data from seven cohorts (Cardiff sample, CHOP cohort, IMAGE-I & IMAGE-II samples, Barcelona sample, Yale-Penn cohort, and the Danish iPSYCH cohort), totaling 3802 cases and 31,305 controls [19].

We observed a high genetic correlation between the 21 meta-analyzed BroadABC samples and the 7 Psychiatric Genomics Consortium/iPSYCH samples, with the “Effective N” as weight (rg = 0.93, p = 9.04 × 10⁻⁸), indicating strong overlap of genetic effects. Hence, we continued with the combined 28 samples (N = 85,359) for all analyses.

All included studies were approved by local ethics committees, and informed consent was obtained from all of the participants. All study participants were of European ancestry. Full details on demographics, measurements, sample analysis, and quality control are provided in Supplementary Table 1.

Genome-wide association analysis and quality control of individual cohorts

In all 28 discovery samples, genetic variants were imputed using the reference panel of the HRC or the 1000G Phase 1 version 3 reference panel. The regression analyses were adjusted for age at measurement, sex, and the first ten principal components. To harmonize the imputation, data preparation, and genome-wide association (GWA) analyses, a specific analysis protocol (Supplementary Note 1) was followed in the 18 BroadABC discovery samples. Further details on the genotyping (platform and quality control criteria), imputation, and GWA analyses for each cohort are provided in Supplementary Table 2.

Two semi-independent analysts (JJT and EU) performed stringent within-cohort quality control, filtering out poorly performing SNPs. SNPs were excluded if they met any of the following criteria: study-specific minor allele frequency (MAF) corresponding to a minor allele count < 100, poor imputation quality (INFO or R² score < 0.6), and/or Hardy–Weinberg equilibrium p < 5 × 10⁻⁶. Moreover, we excluded SNPs and indels that were ambiguous (A/T or C/G with MAF > 0.4), duplicated, monomorphic, multiallelic, or reference-mismatched (Supplementary Note 2 and Supplementary Table 17). Then, we visually inspected the distribution of the summary statistics by creating quantile–quantile plots and Manhattan plots for the cleaned summary statistics from each cohort (Supplementary Notes 4–6). Discrepancies between the results files of the two semi-independent analysts were examined and errors were corrected.
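The per-variant exclusion criteria listed above can be expressed as a simple filter. The following Python sketch is illustrative only: the `Variant` record, its field names, and the example variants are hypothetical, and the minor allele count is approximated as 2 × N × MAF, which assumes autosomal diploid genotypes. Checks for duplicated, monomorphic, multiallelic, or reference-mismatched variants are omitted.

```python
from dataclasses import dataclass


@dataclass
class Variant:
    """Hypothetical per-cohort summary record for one variant."""
    snp_id: str
    allele1: str
    allele2: str
    maf: float        # study-specific minor allele frequency
    n_samples: int    # number of genotyped or imputed individuals
    info: float       # imputation quality (INFO or R2)
    hwe_p: float      # Hardy-Weinberg equilibrium p value


# Strand-ambiguous (palindromic) allele pairs.
PALINDROMIC = {frozenset("AT"), frozenset("CG")}


def passes_qc(v: Variant) -> bool:
    """Apply the thresholds stated in the text to a single variant."""
    minor_allele_count = 2 * v.n_samples * v.maf  # assumption: diploid autosome
    if minor_allele_count < 100:
        return False
    if v.info < 0.6:
        return False
    if v.hwe_p < 5e-6:
        return False
    if frozenset((v.allele1, v.allele2)) in PALINDROMIC and v.maf > 0.4:
        return False
    return True


variants = [
    Variant("snp_common", "A", "G", 0.25, 2000, 0.95, 0.50),
    Variant("snp_rare", "A", "G", 0.01, 2000, 0.95, 0.50),
    Variant("snp_lowinfo", "A", "G", 0.25, 2000, 0.40, 0.50),
    Variant("snp_hwe", "A", "G", 0.25, 2000, 0.95, 1e-8),
    Variant("snp_ambiguous", "A", "T", 0.45, 2000, 0.95, 0.50),
    Variant("snp_at_lowmaf", "A", "T", 0.20, 2000, 0.95, 0.50),
]
kept = [v.snp_id for v in variants if passes_qc(v)]
print(kept)
```

Note that palindromic variants are removed only when the MAF exceeds 0.4, since at lower frequencies the strand can usually be resolved from allele frequencies.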

Meta-analyses on combined and sex-specific samples

A meta-analysis of the GWAS results of the 28 discovery samples was performed through fixed-effects meta-analysis in METAL, using SNP p values weighted by effective sample size. We meta-analyzed the BroadABC data (N = 50,252) with case-control data of the Psychiatric Genomics Consortium/iPSYCH consortium meta-analysis (N cases = 3802, N controls = 31,305, N effective = 6323) in METAL, leading to a total effective sample size of 56,575. Since some individuals participating in the eight QIMR studies may be related to some degree, we first meta-analyzed those samples with the sample overlap option in METAL before meta-analyzing those results with the rest of the samples. Although throughout the paper we report the total sample size (N = 85,359), we used the effective sample size (the METAL output) to calculate the PRS, SNP-based heritability (h²SNP), and genetic correlations. After combining all cleaned GWAS data files, meta-analysis results were filtered to exclude any variants with N < 30,000. Consequently, we removed 2,134,049 SNPs, resulting in 7,392,849 SNPs available for analysis. To investigate sex-specific genetic effects, we also ran the meta-analysis in the datasets for which we had sex-specific data (N = 50,252). However, sex-specific SNP heritabilities, as estimated with LD Score Regression, were small and non-significant (3.7% (s.e. = 2.2%) for males and 1.0% (s.e. = 1.8%) for females). Due to the non-significant sex-specific heritability estimates, the genetic correlation of male and female ASB could not be estimated reliably and no sex-specific follow-up analyses were conducted.
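A sample-size-weighted meta-analysis of this kind combines per-study Z scores using weights proportional to the square root of each study's (effective) sample size. The Python sketch below illustrates that general scheme for a single variant; it is not METAL itself, the function names are hypothetical, and the per-study Z scores are invented for illustration. Only the two sample sizes and the N < 30,000 filter come from the text.

```python
import math
from statistics import NormalDist


def sample_size_weighted_z(z_scores, sample_sizes):
    """Combine per-study Z scores with weights w_i = sqrt(N_i)."""
    weights = [math.sqrt(n) for n in sample_sizes]
    numerator = sum(w * z for w, z in zip(weights, z_scores))
    denominator = math.sqrt(sum(w * w for w in weights))
    return numerator / denominator


def two_sided_p(z):
    """Two-sided p value for a standard normal test statistic."""
    return 2.0 * NormalDist().cdf(-abs(z))


# Sample sizes from the text: BroadABC N and the case-control effective N.
sample_sizes = [50252, 6323]
total_effective_n = sum(sample_sizes)

# Hypothetical per-study Z scores for one variant (illustration only).
z_scores = [4.1, 2.3]
z_meta = sample_size_weighted_z(z_scores, sample_sizes)
p_meta = two_sided_p(z_meta)

# Variants with a combined N below 30,000 were excluded after the meta-analysis.
keep_variant = total_effective_n >= 30000
print(total_effective_n, round(z_meta, 3), p_meta, keep_variant)
```

Weighting by the square root of N gives larger studies more influence without requiring effect sizes to be on a common scale, which is useful when quantitative and case-control phenotypes are combined.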

Whole-genome sequencing based on genetic differences between the BALB/c strains

Through whole-genome sequencing, we identified single nucleotide variants that distinguish aggressive BALB/cJ mice from control BALB/cByJ mice [82]. Sequencing libraries were prepared from high-quality genomic DNA using the TruSeq DNA PCR-Free kit (Illumina) and ultra-deep whole-genome sequencing (average 30X read-depth across the genome) was performed on a HiSeq X Ten System (Illumina). We developed an efficient data processing and quality control pipeline. Briefly, raw sequencing data underwent stringent quality control and were aligned to the mm10 reference genome (BALB/cJ versus BALB/cByJ strain comparison). Isaac [83] was used to align reads and call single nucleotide variations (SNVs). We excluded SNVs that were covered by fewer than 20 reads or that were not present in both animals from the same strain. SnpEff [84] was used to annotate SNVs and explore their effects on gene function. SNVs differing between the two strains were annotated to a total of 1573 genes, which were subdivided into three different categories (intronic/exonic non-coding and synonymous variants (1422 genes), untranslated regions (90 genes), missense mutations and splicing variants (61 genes)).

Polygenic risk score analyses

PRS were created for ASB using all available SNPs of the discovery dataset [85, 86]. PRS were computed as the weighted sum of the effect-coded alleles per individual. We calculated the PRS for subjects of five independent datasets, selected for their detailed phenotypes related to antisocial outcomes: (1) the Dunedin Study [46], (2) the E-Risk Study [87], (3) the Philadelphia Neurodevelopmental Cohort [88], (4) the Quebec Longitudinal Study of Child Development [89], and (5) the Quebec Newborn Twin Study [90]. All individuals were of European ancestry. To maintain uniformity across target cohorts, we adhered to the following parameters. Clumping was performed by removing markers in linkage disequilibrium, utilizing the following thresholds: maximum r² = 0.2, window size = 500 kb. We excluded variants within regions of long-range LD [91] (including the Major Histocompatibility Complex, see Supplementary Table 16 for exact regions). Second-generation PLINK [92] was employed to construct PRS for each phenotype, at the following 11 thresholds: p < 1 × 10⁻⁶, p < 1 × 10⁻⁴, p < 1 × 10⁻³, p < 1 × 10⁻², p < 0.05, p < 0.1, p < 0.2, p < 0.3, p < 0.4, p < 0.5, p < 1.0. There are minor differences in the thresholding approach across the independent cohorts: when constructing the PRS for both ASB phenotypes, the Philadelphia Neurodevelopmental Cohort Study considered all aforementioned thresholds except p < 1 × 10⁻², while the other cohorts considered all thresholds except p < 1.0 (see Supplementary Tables 9–14 for the exact thresholds). To correct for multiple testing, we applied a Bonferroni correction on the 22 tested phenotypes (α = 0.00227).
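The scoring step described above, a weighted sum of effect-allele counts restricted to SNPs below a p value threshold, can be sketched as follows. This Python example is a minimal illustration rather than the PLINK-based pipeline used here: the function name, SNP identifiers, effect sizes, and genotype dosages are all hypothetical, and clumping is assumed to have been done beforehand.

```python
# The 11 p value thresholds listed in the text.
THRESHOLDS = [1e-6, 1e-4, 1e-3, 1e-2, 0.05, 0.1, 0.2, 0.3, 0.4, 0.5, 1.0]


def polygenic_score(dosages, effects, p_values, p_threshold):
    """Weighted sum of effect-allele dosages for SNPs with p < p_threshold.

    dosages  -- {snp_id: effect-allele count (0 to 2) for one individual}
    effects  -- {snp_id: GWAS effect size used as the weight}
    p_values -- {snp_id: GWAS p value}
    SNPs are assumed to be already clumped (independent).
    """
    score = 0.0
    for snp_id, weight in effects.items():
        if snp_id in dosages and p_values[snp_id] < p_threshold:
            score += weight * dosages[snp_id]
    return score


# Hypothetical summary statistics and one individual's genotype dosages.
effects = {"snp_a": 0.10, "snp_b": -0.20, "snp_c": 0.05}
p_values = {"snp_a": 1e-7, "snp_b": 0.03, "snp_c": 0.60}
dosages = {"snp_a": 2, "snp_b": 1, "snp_c": 1}

scores = {t: polygenic_score(dosages, effects, p_values, t) for t in THRESHOLDS}
for threshold, score in scores.items():
    print(f"p < {threshold:g}: PRS = {score:.2f}")
```

More liberal thresholds include more SNPs, which can add signal as well as noise; this is why the variance explained is typically reported per threshold.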

Genetic correlation analysis

To estimate the genetic correlation between ASB and a range of other phenotypes, we employed Linkage Disequilibrium Score Regression (LDSC) [30] through the LD Hub web portal (http://ldsc.broadinstitute.org/ldhub/) [93]. LD Hub, which is a centralized database of summary-level GWAS results, enables the screening of hundreds of traits. To maximize statistical power among the most likely candidate phenotypes, we focused on domains of traits that have been previously reported to be comorbid with ASB. Genetic correlations of ASB were thus calculated by selecting 68 phenotypes of health, physiological, personality, disorder and disease relevant outcomes in LD Hub. In addition, we manually ran genetic correlations analyses for a selection of traits that were not included in the LD Hub framework. Given their relevance for this study, we employed LDSC to examine the genetic correlation of ASB with FFM domains agreeableness, openness to experience and conscientiousness, by using data of the Genetics of Personality Consortium [94]. Similarly, we used data of the most recent GWAS meta-analysis on neuroticism [95] to compute the genetic correlation with ASB. Lastly, while excluding the iPSYCH/PGC samples (given the extensive sample overlap), we computed the genetic correlation of ASB with ADHD [41]. We corrected for multiple testing by applying a Bonferroni correction on the 73 tested genetic correlations (α = 0.0007).
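The multiple-testing threshold and the confidence intervals shown for the genetic correlations follow from simple arithmetic. The Python sketch below reproduces it under the assumption of a normal approximation (estimate ± 1.96 × s.e.) for the 95% intervals; the function names are hypothetical, and the example estimate is the years-of-schooling correlation reported in the Results.

```python
def bonferroni_alpha(n_tests, family_alpha=0.05):
    """Per-test significance level under a Bonferroni correction."""
    return family_alpha / n_tests


def confidence_interval(estimate, std_error, z=1.96):
    """Normal-approximation confidence interval (95% by default)."""
    return (estimate - z * std_error, estimate + z * std_error)


# 68 LD Hub phenotypes + 3 FFM domains + neuroticism + ADHD, as listed in the text.
n_tests = 68 + 3 + 1 + 1
alpha = bonferroni_alpha(n_tests)
print(f"{n_tests} tests, alpha = {alpha:.5f}")

# Years of schooling: rg = -0.49, s.e. = 0.06 (values from the Results).
lower, upper = confidence_interval(-0.49, 0.06)
print(f"95% CI: ({lower:.4f}, {upper:.4f})")
```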