Introduction

The APOE gene, located on chromosome 19, encodes apolipoprotein E, a protein of the apolipoprotein family that contributes to the regulation of cholesterol levels. There are three versions (alleles) of the APOE gene, ε2, ε3 and ε4, which combine into six genotypes: three homozygous (ε2/ε2, ε3/ε3 and ε4/ε4) and three heterozygous (ε2/ε3, ε2/ε4 and ε3/ε4). It is well established that elderly carriers of at least one APOE ε4 allele, even if not diagnosed with dementia, may exhibit impaired global cognitive performance, including episodic memory and executive functioning, as compared with non-carriers of this allele (Small et al. 2004; Wisdom et al. 2011). This contrasts with reports of superior cognitive performance in young APOE ε4 carriers (e.g., Rusted et al. 2013) in episodic memory (Mondadori et al. 2007), prospective memory (Evans et al. 2013), decision making (Marchant et al. 2010), speed of processing (Marchant et al. 2010), mental arithmetic (Puttonen et al. 2003) and verbal fluency (Marchant et al. 2010; Alexander et al. 2007; Marioni et al. 2016). Furthermore, Yu et al. (2000) reported higher intelligence among young female APOE ε4 carriers. Beyond the evidence from young adults, a study by Oriá et al. (2005) suggested superior semantic fluency in APOE ε4-positive children. According to Rusted et al. (2013) and Rusted and Carare (2015), it was attention rather than memory that was consistently superior among young APOE ε4 carriers. The relatively better performance of young APOE ε4 carriers compared with non-ε4 carriers has been explained by antagonistic pleiotropy, whereby an allele increases the chances of reproduction early in life and only later has a negative influence on fitness or survival (e.g., Han and Bondi 2008).

However, superior cognitive performance among young APOE ε4 carriers is by no means a robust finding, and many studies have failed to detect an APOE ε4 advantage (Matura et al. 2014; Dowell et al. 2013; Dennis et al. 2010). Some studies of young adult APOE ε4 carriers even showed the opposite, for example, inferior working memory (Eramudugolla et al. 2014) or inferior verbal and visual memory (Bloss et al. 2008; Eramudugolla et al. 2014). In a meta-analysis of data from more than 11,000 participants, Ihle et al. (2012) found no overall evidence for superior executive functioning, including working memory, in young adult APOE ε4 carriers. Ihle et al. (2012) suggested that one reason for the inconsistencies in the literature, apart from the Type I errors typical of studies with small samples, may be that effects of APOE ε4 on performance are process-specific and emerge mainly in tasks placing high demands on specific cognitive functions. A recent update of this meta-analysis by Weissberger et al. (2018), covering seven cognitive domains, found that APOE ε4 carriers marginally outperformed other APOE genotypes in executive functions. However, the authors advised extreme caution in interpreting these findings as evidence for antagonistic pleiotropy as suggested by Han and Bondi (2008). First, in fMRI studies the marginally superior performance of APOE ε4 carriers was not conclusively associated with activation of frontal executive networks. Second, the heterogeneous statistics across the studies included in the meta-analysis call the reliability of the conclusions into question. Moreover, in other studies the influence of APOE polymorphisms on cognitive functions interacted strongly with individual differences in further brain properties, for example, cholesterol transport, neuronal repair (Rusted and Carare 2015) or impairments of certain brain structures (Kunz et al. 2015).

Recent neuroimaging studies have provided solid evidence that APOE ε4 is negatively associated with cognitive abilities through its impact on brain structures (e.g., Fouquet et al. 2014; Alexopoulos et al. 2011). However, education level, as an environmental factor (Hubacek et al. 2001), has also been shown to influence grey matter volume and functional connectivity (Arenaza-Urquijo et al. 2013) and, more generally, to protect against developing Alzheimer's disease (AD) later in life (Valenzuela and Sachdev 2006). Hence, it is of interest whether variation in education acts as a modifier of the APOE ε4 effect on cognitive abilities. In a sample covering a wide age range, Arenaza-Urquijo et al. (2015) found that the ε4 allele was associated with increased memory-related cerebral metabolism only among individuals with higher education levels, and they suggested that education exerts a protective influence on the relation between gene and cognitive abilities, which may help delay the onset of cognitive decline in APOE ε4 carriers. Therefore, in the present study we considered education level as a crucial factor potentially mitigating the risk of early cognitive loss in APOE ε4 carriers.

A latent variable approach to measuring cognitive phenotypes

So far, studies on APOE and cognitive abilities have relied exclusively on observed test scores, which does not allow generalization beyond the individual differences captured by single tests (e.g., Wacker et al. 2012, for personality and the catechol-O-methyltransferase gene). In contrast, multivariate assessments used to estimate latent variables make it possible to account for measurement error and method specificity and thus to capture individual differences at the level of abilities rather than single-task performance.

Working memory capacity (WMC) is a central cognitive construct indicating a person’s capacity to bind and flexibly update information in short-term memory (e.g., Wilhelm et al. 2013). Many measures have been used to capture this ability. Importantly, it has been recognized that indicators from multiple tasks are necessary to control for irrelevant variance components and to generalize measurements to the construct level (Schmiedek et al. 2014). Wilhelm et al. (2013) showed that the capacity to build, maintain and rapidly update arbitrary bindings accounts for a large amount of the common variance across all traditionally applied WMC tasks, such as updating, n-back and complex span tasks. Thus, a latent variable indicated by tasks from any of these classes is expected to be a good measure of individual differences in WMC as a cognitive phenotype. However, because every single task also captures method-specific variance, performance on single tasks needs to be considered in the context of other tasks (see, for example, Schmiedek et al. 2009, 2014).

Another theoretical approach to WMC distinguishes between its primary and secondary components (e.g., Unsworth et al. 2014). Primary memory (PM) refers to the capacity used for short-term storage and direct processing of information, whereas secondary memory (SM) is the capacity to store, maintain and recall information over the long term. Aiming to establish the role of SM in WMC, correlational studies have explored whether SM mediates the relationship between WMC and fluid intelligence (gf; Unsworth and Spillers 2010; Unsworth et al. 2014; Shipstead et al. 2014). This research indicates that SM is an important cognitive ability to be considered along with the working memory system. Additionally, further studies argued that WMC and reasoning are nearly isomorphic abilities (Kyllonen and Christal 1990; Süß et al. 2002). The binding theory of WMC mentioned above (Oberauer et al. 2007) explains this strong association by the involvement of mental representations of novel structures that are necessary for solving both WMC and gf tasks.

In previous studies on APOE associations with working memory, n-back tasks have been widely applied (e.g., Sinclair et al. 2015), along with complex span (digit span; Estévez-González et al. 2004) and updating tasks (e.g., Velichkovsky et al. 2015; Reinvang et al. 2010). However, results have been inconsistent, suggesting both positive and negative impacts of APOE ε4 on cognitive abilities in young adulthood (Greenwood et al. 2005). For this reason, further studies are needed that capture APOE effects on working memory measured as a latent variable. Turning to long-term memory, Wolk et al. (2010) investigated memory retention and found it to be impaired among older APOE ε4 carriers. Similarly, Zokaei et al. (2019) showed impaired long-term memory among older APOE ε4 homozygotes, suggesting that the ε4 allele may be associated with WMC at least partly through the secondary memory system. In contrast to these memory studies, the association between APOE ε4 and accuracy and response times (RT) in gf tasks has been well explored (Woo and Kim 2017), but usually no effects have been detected in young ε4 carriers.

To summarize, working memory capacity is well established as a crucial domain of cognitive functioning (e.g., Baddeley and Hitch 1974; Wilhelm et al. 2013). There is robust evidence for strong correlations of WMC with SM and gf (Engle et al. 1999; Kyllonen and Christal 1990; Oberauer et al. 2005) and for the generalizability of WMC across different measurement paradigms (Süß et al. 2002). APOE effects on these cognitive domains among young, healthy adults have been studied, but only at the level of single tasks, yielding inconsistent results. Thus, we argue that the study of APOE effects on cognitive abilities should include multiple measures and latent variables representing WMC, SM and gf.

Aims of the present study

The general aim of this study was to explore the effects of APOE polymorphisms on WMC, SM and gf in young adulthood. Previous research suggests that APOE polymorphisms might differ in their influence on the domains of WMC, SM and gf. However, because of the strong associations between these cognitive phenotypes, genotype differences should follow similar patterns across cognitive domains. We argue that genotype differences should be studied at the level of latent variables in order to generalize beyond task specificity and measurement error.

We had no strong predictions regarding the direction of group differences between genotype variants. In light of the available literature, we expected young APOE ε4 carriers to perform better or worse than other genotype groups, depending on education level. Our main expectation was that the latent variable approach would shed light on the generalizability of the effects across task classes and would contribute to establishing robust results.

Method

The data analyzed here were collected in four independent studies, but crucial assessments overlapped between samples (see below for details). All participants reported being native German speakers. In the following, the procedures of APOE genotyping and task data collection are described separately for the four samples. Figure 1 shows that the genotype distribution was highly consistent across samples. Table 1 describes the demographic characteristics of each sample stratified by ε4/non-ε4 group. An overview of tasks is provided in Table 2.

Fig. 1

APOE genotype distribution for the four samples (N1–N4). Note: relative frequencies are displayed on the y-axis

Table 1 Demographic information stratified by ε4 carrier and non-ε4 carrier group
Table 2 Tasks and resulting indicators measured across samples

Participants

The first sample (N1) included 255 young adults who completed a series of working memory, secondary memory and reasoning tasks (see Table 2). The final analyzed sample included N1 = 245 participants (56% female) with successful APOE genotyping. Their age ranged between 18 and 35 years (M = 27.39, SD = 4.8), and educational background was heterogeneous: 28% had no high-school degree, 51% had a high-school degree but no college degree and 21% held academic degrees. The APOE genotype distribution was as follows: .8% ε2/ε2, 14.7% ε2/ε3, 3.6% ε2/ε4, 58.4% ε3/ε3, 18.4% ε3/ε4 and 4% ε4/ε4.

The second sample (N2) included 459 adults who completed two working memory tasks and a reasoning task (see Table 2). Of these participants, 330 agreed to provide DNA, and APOE genotyping was successful for N = 308 (73.1% female); eight participants were excluded because they were older than 45 years. The genotyping success rate was lower than in the first sample because a multiplex assay technology was used in this case. The age range in the final sample (N2 = 300) was 18–41 years (M = 23.4, SD = 3.5); 4.5% had no high-school degree, 69.1% had completed high school and 26.4% held college or higher degrees. The APOE genotype distribution was as follows: .3% ε2/ε2, 13.9% ε2/ε3, 2% ε2/ε4, 55.2% ε3/ε3, 26.9% ε3/ε4 and 1.6% ε4/ε4.

The third sample (N3) consisted of 244 young adults (50.8% female) who completed three tasks measuring working memory and reasoning and had valid genotype results (see Table 2). Their age ranged between 17 and 38 years (M = 25.96, SD = 4.5), and educational background was heterogeneous: 20.4% had no high-school degree, 68.4% had a high-school degree only and 10.8% held academic degrees. The APOE genotype distribution was as follows: .8% ε2/ε2, 10.2% ε2/ε3, 2% ε2/ε4, 63.5% ε3/ε3, 21.3% ε3/ε4 and 2% ε4/ε4.

The fourth sample (N4) included 206 young participants (51.8% female) who completed the Raven test (see Table 2) and provided valid APOE genotypes. Their age ranged between 18 and 40 years (M = 27.63, SD = 5.4), and educational background was heterogeneous: 24.3% had no high-school degree, 35% had a high-school degree only and 34% held academic degrees. The APOE genotype distribution was as follows: .5% ε2/ε2, 10% ε2/ε3, 3.9% ε2/ε4, 63.7% ε3/ε3, 20.9% ε3/ε4 and .5% ε4/ε4.
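Because all analyses contrast participants with at least one ε4 allele against everyone else, it may help to see how the reported genotype percentages translate into carrier rates and ε4 allele frequencies. The following Python sketch only re-does this arithmetic on the percentages listed above; the dictionary layout and the function names (`e4_copies`, `carrier_rate`, `e4_allele_frequency`) are illustrative assumptions, not part of the original analysis pipeline, and "e" stands in for "ε" in the labels.

```python
# Genotype percentages as reported in the Participants section ("e" stands for ε).
GENOTYPE_PCT = {
    "N1": {"e2/e2": 0.8, "e2/e3": 14.7, "e2/e4": 3.6, "e3/e3": 58.4, "e3/e4": 18.4, "e4/e4": 4.0},
    "N2": {"e2/e2": 0.3, "e2/e3": 13.9, "e2/e4": 2.0, "e3/e3": 55.2, "e3/e4": 26.9, "e4/e4": 1.6},
    "N3": {"e2/e2": 0.8, "e2/e3": 10.2, "e2/e4": 2.0, "e3/e3": 63.5, "e3/e4": 21.3, "e4/e4": 2.0},
    "N4": {"e2/e2": 0.5, "e2/e3": 10.0, "e2/e4": 3.9, "e3/e3": 63.7, "e3/e4": 20.9, "e4/e4": 0.5},
}


def e4_copies(genotype: str) -> int:
    """Count the ε4 alleles in a genotype label such as 'e3/e4'."""
    return genotype.split("/").count("e4")


def carrier_rate(pct: dict) -> float:
    """Percentage of participants with at least one ε4 allele (dummy code 1)."""
    return sum(p for g, p in pct.items() if e4_copies(g) > 0)


def e4_allele_frequency(pct: dict) -> float:
    """ε4 allele frequency in percent; every person carries two alleles."""
    return sum(p * e4_copies(g) for g, p in pct.items()) / 2


for sample, pct in GENOTYPE_PCT.items():
    print(f"{sample}: ε4 carriers {carrier_rate(pct):.1f}%, "
          f"ε4 allele frequency {e4_allele_frequency(pct):.1f}%")
```

For Sample 1, for instance, the carrier rate works out to 3.6 + 18.4 + 4 = 26% of 245 participants, i.e., roughly 64 ε4 carriers. Because the reported percentages are rounded, these figures are approximate.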

Measurements

Next, we provide short descriptions of the working memory, secondary memory and reasoning tasks used in the present study. These tasks were also applied and evaluated by Wilhelm et al. (2013). Task-specific indicators used in each sample are summarized in Table 2. Further details on the task procedures can be found in Wilhelm et al. (2013).

Binding tasks (binding)

The binding task included 15 trials for the letter-color domain (Bind_lc) and 14 trials each for the word-number (Bind_wn) and location-letter (Bind_ll) domains. In each trial, participants were presented with a sequence of item pairs specific to the task domain and asked to memorize the associations; in the letter-color binding task, for example, sequences of letter-color pairs were shown. Each pair was presented for 1 s with 3-s intervals in the letter-color task, for 2 s with 1-s intervals in the word-number task, and for 1.5 s with 500-ms intervals in the location-letter task. Participants were asked to recall the associations immediately after presentation. In all binding tasks, the number of pairs per trial (load level) ranged between two and six.

Updating tasks (updating)

Each updating task included 12 trials; in each trial, a series of two to five randomly selected items (words, digits or positions within a 3 × 3 grid) was presented on the screen. In the verbal and spatial-figural updating tasks, presentation durations depended on the load level of a given trial (level 2: 2 s; level 3: 2.4 s; level 4: 2.8 s; level 5: 3 s), whereas in the numerical updating task the presentation duration was 1.6 s in every trial; inter-stimulus intervals were 500 ms. Participants had to update and memorize the last item of each semantic category and to report the last item that had appeared in the trial.

Recall 1-Back (RNb)

In the verbal RNb task (RNb_v), participants were presented with one to three boxes per trial (depending on load level). Each box contained a letter, and as soon as a new letter appeared in a box, participants had to type in the letter that had previously been shown in that box. The numerical RNb task (RNb_n) followed the same procedure except that digits were presented instead of letters. In the spatial-figural RNb task (RNb_f), participants were shown one to three figures (depending on load level) randomly placed within a 3 × 3 grid. When a figure was presented in a new position, participants responded by mouse click to indicate the grid position where that figure had been shown just before. In the verbal task, each stimulus was presented for 2.5, 3.0 and 3.5 s at load levels 1–3, respectively. In the numerical task, the presentation durations for single stimuli were 2.5, 2.9 and 3.1 s, and in the spatial-figural task 2.5, 3.5 and 4.5 s, for load levels 1–3, respectively.
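The response rule of the recall 1-back paradigm can be made explicit in a few lines. The Python sketch below derives the correct answer for every event of a trial: the first stimulus in a box (or, in the spatial-figural version, the first position of a figure) only has to be encoded, and every later event requires the stimulus previously associated with that slot. The event format, the function names and the example trial are hypothetical illustrations; the scoring helper simply assumes that the proportion of correct responses is computed over events that required an answer.

```python
from typing import Hashable, List, Optional, Sequence, Tuple


def rnb_expected_responses(events: Sequence[Tuple[Hashable, str]]) -> List[Optional[str]]:
    """Correct recall-1-back answer for each event in presentation order.

    Each event is (slot, stimulus): in the verbal and numerical versions the slot
    is a box and the stimulus a letter or digit; in the spatial-figural version
    the slot is a figure and the stimulus its grid position. The first stimulus
    in a slot only has to be encoded (None); afterwards the correct answer is the
    stimulus that was shown in that slot just before.
    """
    last_seen = {}
    expected: List[Optional[str]] = []
    for slot, stimulus in events:
        expected.append(last_seen.get(slot))
        last_seen[slot] = stimulus
    return expected


def proportion_correct(responses: Sequence[Optional[str]],
                       expected: Sequence[Optional[str]]) -> float:
    """Share of correct answers among events that required a response."""
    scored = [(r, e) for r, e in zip(responses, expected) if e is not None]
    return sum(r == e for r, e in scored) / len(scored)


# Hypothetical load-level-2 trial with boxes "A" and "B"
events = [("A", "K"), ("B", "M"), ("A", "T"), ("B", "R"), ("A", "S")]
expected = rnb_expected_responses(events)  # [None, None, "K", "M", "T"]
```

Keeping only the most recent stimulus per slot mirrors the demand of the task itself: participants must continuously overwrite each box's content while still holding the previous one until it is reported.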

Complex span tasks (CSpan)

In the reading (CSpan_v; Kane et al. 2004), operation (CSpan_n) and rotation (CSpan_f) span tasks, participants had to remember the order in which letters were presented (reading span) or words were presented (operation span), or recall a sequence of arrows differing in length and pointing direction (rotation span), while performing a secondary task in parallel. The secondary task was to judge the semantic correctness of a sentence (reading span), the correctness of an equation (operation span) or the correct orientation of letters (rotation span). All items for the secondary tasks were presented on the screen. Each complex span task included 12 trials.

Memory updating (MU)

This task, used only in Sample 3 (see Table 2), was adapted from Oberauer et al. (2000). Single-digit numbers were presented consecutively in a 3 × 3 grid and had to be memorized. Subsequently, arrows pointing up or down appeared in individual cells of the grid, one at a time. For upward-pointing arrows, participants mentally added 1 to the digit previously presented in that cell; downward-pointing arrows instructed participants to decrease the digit by 1. After a series of updating steps, participants indicated the final digit for each cell. Eighteen experimental trials were presented.
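The bookkeeping that the memory updating task demands can be sketched as a small state update. The Python code below computes the correct final digits for one hypothetical trial from the initial digits and the sequence of arrows; the cell representation, the function names and the per-cell scoring helper are assumptions for illustration only. In particular, the sketch does not model any constraint the original task may impose to keep results within single digits.

```python
from typing import Dict, List, Tuple

Cell = Tuple[int, int]  # (row, column) within the 3 x 3 grid


def memory_updating_solution(initial: Dict[Cell, int],
                             arrows: List[Tuple[Cell, str]]) -> Dict[Cell, int]:
    """Final digit in each cell after applying the arrow operations in order.

    An "up" arrow adds 1 and a "down" arrow subtracts 1 from the digit
    currently held in that cell.
    """
    state = dict(initial)  # copy, so the initial display is left unchanged
    for cell, direction in arrows:
        if direction == "up":
            state[cell] += 1
        elif direction == "down":
            state[cell] -= 1
        else:
            raise ValueError(f"unknown arrow direction: {direction!r}")
    return state


def cells_correct(answer: Dict[Cell, int], solution: Dict[Cell, int]) -> float:
    """Proportion of cells reported correctly in one trial (hypothetical scoring)."""
    return sum(answer.get(c) == v for c, v in solution.items()) / len(solution)


# Hypothetical trial: two occupied cells, three updating steps
initial = {(0, 0): 3, (1, 2): 7}
steps = [((0, 0), "up"), ((1, 2), "down"), ((0, 0), "up")]
solution = memory_updating_solution(initial, steps)  # {(0, 0): 5, (1, 2): 6}
```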

Secondary memory tasks (SM)

The SM tasks were very similar to the binding tasks but did not require immediate recall. In the word–word task (SM_v), two blocks of 20 word pairs each were presented successively, whereas in the word–number task (SM_n) two blocks of 20 pairs, each consisting of a word and a two-digit number, were shown. In the letter-position task (SM_f), each block paired 12 letters with 12 positions in a 4 × 4 grid. All stimulus pairs were presented for 4 s, separated by 1-s intervals. After the learning phase, participants completed an intervening task lasting about 3 min. They then had to recall the paired information in response to either the first or the second element of each pair.

Fluid intelligence (gf)

Fluid intelligence was measured in Sample 1 with the Berlin Test of Fluid and Crystallized Intelligence (BEFKI; Wilhelm et al. 2013). Participants solved a series of verbal (gf_v), quantitative (gf_n) and figural (gf_f) deductive reasoning problems presented in multiple-choice format. Each task included 16 problems to be completed within 14 min. In Samples 3 and 4, fluid intelligence was measured with 16 items from Raven’s Advanced Progressive Matrices (Rav; Raven and Court 1979).

Genotyping

DNA extraction was performed on a MagNA Pure 96 robot using a commercial extraction kit (Roche Diagnostics, Mannheim, Germany). The APOE polymorphism was analyzed via real-time polymerase chain reaction (PCR) on a Cobas Z 480 light cycler system (Roche Diagnostics, Mannheim, Germany) or on a MALDI-TOF platform (Massarray 4, Agena, San Diego, CA, USA). Primers and hybridization probes for real-time PCR were designed and provided by TIB MolBiol, Berlin, Germany.

Data treatment and scoring

For all tasks, the proportion of correct responses across trials was computed as the performance indicator for modeling. To code APOE genotype groups, we contrasted participants without an APOE ε4 allele (coded as 0) with those carrying at least one APOE ε4 allele (coded as 1). This contrast was entered as a dummy variable in structural equation models predicting the latent variables representing cognitive abilities (Hildebrandt et al. 2016).

To code education level, we contrasted participants without a high-school degree (coded as 0) with those holding a high-school degree (coded as 1). A genotype × education product term was also calculated. As shown in Table 1, the education and genotype × education variables could only reasonably be used in Sample 1, the only sample including enough participants without a high-school degree in both the APOE ε4 carrier and the non-ε4 carrier groups to provide robust estimates.
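The three predictors described above can be illustrated with a minimal Python sketch. It assumes a hypothetical record format (a genotype label such as "e3/e4" and a Boolean flag for the high-school degree); the class and function names are illustrative and not taken from the original analysis scripts.

```python
from dataclasses import dataclass
from typing import Tuple


@dataclass
class Participant:
    genotype: str       # APOE genotype label, e.g. "e3/e4" (hypothetical format)
    high_school: bool   # True if the participant holds a high-school degree


def code_predictors(p: Participant) -> Tuple[int, int, int]:
    """Return (genotype, education, genotype x education) dummy codes.

    genotype  = 1 for at least one ε4 allele (ε2/ε4, ε3/ε4, ε4/ε4), else 0
    education = 1 with a high-school degree, 0 without
    """
    genotype = int("e4" in p.genotype.split("/"))
    education = int(p.high_school)
    return genotype, education, genotype * education


# Hypothetical participants
examples = [Participant("e2/e4", False), Participant("e3/e4", True), Participant("e3/e3", True)]
codes = [code_predictors(p) for p in examples]  # [(1, 0, 0), (1, 1, 1), (0, 1, 0)]
```

Note that the product term equals 1 only for ε4 carriers with a high-school degree, so in a regression its weight shifts the ε4 effect for the more educated group relative to the less educated one.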

For each performance indicator, univariate distributions were visually screened for outliers and distribution shape. Observations more than 1.5 interquartile ranges below the first or above the third quartile (i.e., outside the whiskers of a boxplot) were defined as univariate outliers (Tukey 1977). Among the 245 participants in Sample 1 with genetic data, there were 32 missing values out of 4410 data points (18 performance indicators × 245 individuals). These resulted from univariate outliers in the psychometric measurements that were set to missing, along with further missing values due to data recording issues. Multiple random imputation (Allison 2001) was applied to replace the 32 missing values. Imputation was favored over full information maximum likelihood (FIML) because the proportion of missing values was very small (32/4410 ≈ .7%; Dong and Peng 2013). The imputation relies on the assumption of normally distributed variables, which was violated by six indicators in the present case; however, the normal model performs well even for non-normally distributed variables when the amount of missing data is very limited (Schafer 1997). Of the 18 indicators reflecting average task performance as described in the Measurements section (see Table 2), ten had no outliers. There were no outliers in the inter-individual distributions in Samples 2, 3 or 4, and only one participant in Sample 2 had missing data (three data points); this case was deleted listwise in model estimation.
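The outlier rule and the missing-data rate for Sample 1 can be sketched in Python as follows. Quartiles are computed with `statistics.quantiles(..., method="inclusive")`, i.e., linear interpolation between order statistics; boxplot implementations may use slightly different quartile definitions, so this is an approximation of the Tukey rule rather than a reproduction of the original screening, and the example scores are hypothetical. The subsequent multiple imputation step is not shown.

```python
import statistics
from typing import List, Optional, Tuple


def tukey_fences(values: List[float], k: float = 1.5) -> Tuple[float, float]:
    """Lower and upper fence: Q1 - k*IQR and Q3 + k*IQR."""
    q1, _, q3 = statistics.quantiles(values, n=4, method="inclusive")
    iqr = q3 - q1
    return q1 - k * iqr, q3 + k * iqr


def set_outliers_missing(values: List[Optional[float]], k: float = 1.5) -> List[Optional[float]]:
    """Replace observations outside the fences with None (i.e., missing)."""
    observed = [v for v in values if v is not None]
    low, high = tukey_fences(observed, k)
    return [v if v is not None and low <= v <= high else None for v in values]


# Hypothetical proportion-correct scores for one indicator; None = recording problem
scores = [0.50, 0.55, 0.60, 0.62, 0.58, 0.10, None]
cleaned = set_outliers_missing(scores)  # 0.10 lies below the lower fence

# Missing rate reported for Sample 1: 32 of 245 x 18 data points
missing_rate = 32 / (245 * 18)
print(f"{missing_rate:.2%}")  # 0.73%
```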

Statistical analyses

Structural equation modeling (SEM) was applied to the data from all four samples. However, because the numbers of WMC, SM and gf indicators differed, the models differ across samples. In the first sample, we were able to model four task-class-specific WMC factors along with a higher-order factor representing general WMC. Additionally, we modeled a latent SM factor and a latent gf factor that were correlated with each other and with the general WMC factor. This model had previously been established by Wilhelm et al. (2013) with the same dataset. In the present work, all three latent factors were regressed onto the dummy variables described above, coding genotype group and education level, as well as onto the genotype × education product term. The regression weights thus indicate the genotype and education effects and their interaction on the general WMC factor, SM and gf.

Based on the assessments available in Sample 2 (see Table 2), we modeled two task-specific WMC factors. Because there were only two WMC factors, no higher-order WMC factor was modeled; instead, the task-specific factors were allowed to correlate. No SM or gf assessments were available in this sample. The indicators of the WMC factors were measured in trials with different load levels; therefore, we also tested genotype group differences separately for indicators of different difficulty levels.

In Sample 3 there were two indicators for WMC and one for gf (see Table 2). We used these indicators to model a WMC/gf latent variable, which was regressed onto the genotype-coding variables. Finally, Sample 4 contributed a latent gf factor estimated from performance on the Raven test.

As mentioned above and illustrated in Table 2, some of the assessments overlapped across studies. As a final analysis step, we merged Samples 1 and 2 and investigated genotype and education effects by means of a categorical regression analysis including the verbal and spatial-figural RNb tasks available in both studies. Similarly, Samples 1 and 3 were merged to analyze genotype-education-phenotype relations based on the rotation span task (CSpan_f).

Model fit was evaluated with the χ² statistic, which quantifies the discrepancy between the observed and the model-implied covariance matrices. We further considered alternative fit indices: the Comparative Fit Index (CFI), which should exceed .95 to conclude good model fit, and the Root Mean Square Error of Approximation (RMSEA) and the Standardized Root Mean Square Residual (SRMR), which should both be smaller than .08 in case of good fit (Hu and Bentler 1995).
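The RMSEA values reported in the Results section can be checked from χ², its degrees of freedom and the sample size using the usual point-estimate formula, RMSEA = sqrt(max(χ² − df, 0) / (df · (N − 1))). The Python sketch below assumes the N − 1 convention (some SEM programs use N, which changes the values only slightly) and takes N from the Participants section: 245 for Models 1 and 2 (Sample 1), 300 for Model 3 (Sample 2) and 244 for Model 4 (Sample 3). CFI cannot be recomputed this way because it also requires the χ² of the baseline model, which is not reported.

```python
import math


def rmsea(chi2: float, df: int, n: int) -> float:
    """Point estimate of RMSEA: sqrt(max(chi2 - df, 0) / (df * (n - 1)))."""
    return math.sqrt(max(chi2 - df, 0.0) / (df * (n - 1)))


# (model, chi-square, df, N) as reported in the Results section
REPORTED = [
    ("Model 1", 192.25, 133, 245),
    ("Model 2", 230.7, 163, 245),
    ("Model 3", 24.4, 12, 300),
    ("Model 4", 5.72, 2, 244),
]

for name, chi2, df, n in REPORTED:
    print(f"{name}: RMSEA = {rmsea(chi2, df, n):.3f}")
```

Rounded to two decimals, these values agree with the reported RMSEA of .04, .04, .06 and .09. With only 2 degrees of freedom, as in Model 4, small absolute misfit already produces a comparatively large RMSEA.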

Results

Results will be described in five sections, including sample-wise analyses (Sects. 1–4) and the analyses of merged samples (Sect. 5).

Results provided by Sample 1

The WMC, SM and gf factors estimated in the model established by Wilhelm et al. (2013) were first regressed onto the dummy variable, contrasting APOE ε4 carriers with all other genotype groups (Model 1). In a second step (Model 2), the factors in the same psychometric model were regressed onto two coding variables contrasting APOE ε4 carriers with non-ε4 carriers, as well as different education levels (without and with high-school degree) along with the product term of genotype and education. Additional results illustrating single factors of WMC, SM and gf regressed onto the same genotype-coding variables are provided in the Supplementary Material.

Because the model depicted in Fig. 2 has been established by Wilhelm et al. (2013) with the same sample data, we built upon this previous work and did not test alternative model structures for describing the cognitive phenotypes. The common variance measured by the four WMC task classes (Binding, Updating, RNb and CSpan) was captured by a higher order WMC factor. Loadings on the task-specific factors were large and significantly different from zero. A further latent variable accounted for common residual variance among verbal-numerical content (VN). In addition to WMC, the model included SM and gf as correlated factors. Method-specific variance induced by multiple applications of paired associate tasks (Passo) was captured by a method factor.

Fig. 2

Schematic representation of structural equation model exploring APOE ε4 effects on working memory capacity (WMC), secondary memory (SM) and fluid intelligence (gf). Binding—binding tasks; updating—updating tasks; RNb: recall 1-back; CSpan—complex span tasks; Bind_lc—letter-color binding; Bind_wn—word–number binding; Bind_ll—location-letter binding; Upd_v—verbal updating; Upd_n—numerical updating; Upd_f—spatial-figural updating; RNb_v—verbal recall 1-back; RNb_n—numerical recall 1-back; RNb_f—spatial-figural recall 1-back; CSpan_v—reading span; CSpan_n—operation span; CSpan_f—rotation span; SM_v—word–word secondary memory; SM_n—word–number secondary memory; SM_f—letter-position secondary memory; gf_v—verbal fluid intelligence; gf_n—numerical fluid intelligence; gf_f—figural fluid intelligence; Passo—paired associations; VN—verbal–numerical content

In Model 1, the standardized latent variables WMC, SM and gf were regressed onto the dummy variable contrasting APOE ε4 carriers with all other genotype groups. The model fitted the data very well: χ²(133) = 192.25, CFI = .97, RMSEA = .04, SRMR = .04. All factor loadings indicated in Fig. 2 were significant. The figure further shows that WMC, SM and gf were highly but not perfectly correlated; APOE effects were therefore expected to be similar in magnitude for all cognitive ability factors. Because the latent variables were standardized and the predictor (genotype) was dummy-coded, the regression weights testing gene effects (see Fig. 2) can be interpreted as genotype differences in standard deviation units. The genetic effects were negative, indicating that WMC in APOE ε4 carriers was lower than in non-ε4 carriers by about one third of a standard deviation. The APOE ε4 effect on WMC was statistically significant (WMC: −.32, p = .04). APOE ε4 effects on SM and gf were somewhat smaller but also negative, and they did not reach conventional significance levels (SM: −.2, p = .26; gf: −.31, p = .07), possibly because of the limited sample size. In summary, the data collected in Sample 1 revealed worse cognitive performance in young APOE ε4 carriers than in non-carriers.
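Why a dummy-coded predictor yields a weight in standard deviation units can be shown with an observed-score analogue. In the Python sketch below, a hypothetical outcome is simulated, standardized to mean 0 and SD 1 and regressed on a 0/1 carrier variable; the least-squares slope then equals the carrier minus non-carrier mean difference exactly, expressed in SDs of the outcome. The simulated data, the carrier rate of about 26% (taken from Sample 1) and the helper `ols_slope` are assumptions for illustration; the sketch is not the latent-variable SEM reported above.

```python
import random
import statistics
from typing import Sequence


def ols_slope(x: Sequence[float], y: Sequence[float]) -> float:
    """Slope of the simple least-squares regression of y on x."""
    mx, my = statistics.fmean(x), statistics.fmean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    return sxy / sxx


random.seed(1)
n = 245
# Hypothetical dummy: 1 = at least one ε4 allele (about 26% carriers, as in Sample 1)
carrier = [1 if random.random() < 0.26 else 0 for _ in range(n)]
# Hypothetical scores with a built-in carrier disadvantage
score = [random.gauss(0.0, 1.0) - 0.32 * c for c in carrier]

# Standardize the outcome to mean 0 and SD 1
mean, sd = statistics.fmean(score), statistics.stdev(score)
z = [(v - mean) / sd for v in score]

slope = ols_slope(carrier, z)
mean_difference = (statistics.fmean([v for v, c in zip(z, carrier) if c == 1])
                   - statistics.fmean([v for v, c in zip(z, carrier) if c == 0]))
print(f"slope = {slope:.4f}, carrier minus non-carrier = {mean_difference:.4f}")
```

The identity between slope and group mean difference holds for any 0/1 predictor, which is why a weight of −.32 can be read as carriers scoring about one third of an SD lower.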

In Model 2, the standardized latent variables WMC, SM and gf, configured as in Model 1, were regressed onto the dummy-coded genotype and education variables and their product, which tests the interaction between education and genotype group. The model fitted the data very well: χ²(163) = 230.7, CFI = .97, RMSEA = .04, SRMR = .04, and all factor loadings were statistically significant. The interaction effects were as follows: WMC: .32, p = .33; SM: .06, p = .86; gf: .83, p = .02. Thus, the genotype × education interaction was statistically significant for gf only, but all cognitive measures pointed in the same direction, suggesting lower cognitive performance of APOE ε4 carriers specifically among participants with lower education.
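How a product term translates into conditional genotype effects can be sketched with ordinary least squares on simulated observed scores. In this Python sketch (using NumPy), the ε4 effect for participants without a high-school degree is the genotype weight, and the effect for participants with a degree is the genotype weight plus the interaction weight. All data are hypothetical; the carrier rate (about 26%) and the proportion with a high-school degree (about 72%) are taken from Sample 1, and the sketch is an observed-variable analogue rather than the latent model estimated in Model 2.

```python
import numpy as np

rng = np.random.default_rng(2024)
n = 245
e4 = rng.binomial(1, 0.26, n)    # 1 = at least one ε4 allele (about 26% in Sample 1)
edu = rng.binomial(1, 0.72, n)   # 1 = high-school degree (about 72% in Sample 1)

# Hypothetical scores: an ε4 disadvantage without a degree that the interaction offsets
y = -0.8 * e4 + 0.3 * edu + 0.8 * e4 * edu + rng.normal(0.0, 1.0, n)

# Design matrix: intercept, genotype, education, genotype x education
X = np.column_stack([np.ones(n), e4, edu, e4 * edu])
b0, b_e4, b_edu, b_int = np.linalg.lstsq(X, y, rcond=None)[0]

# Conditional ("simple") ε4 effects implied by the product-term model
effect_without_degree = b_e4
effect_with_degree = b_e4 + b_int
print(f"ε4 effect without degree: {effect_without_degree:.2f}")
print(f"ε4 effect with degree:    {effect_with_degree:.2f}")
```

With this coding, a positive interaction weight means that the ε4 disadvantage is smaller among participants with a high-school degree, which is the pattern described for Model 2. Because the four-cell dummy model is saturated, the two conditional effects equal the carrier minus non-carrier mean differences within each education group.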

To visualize the interaction at the level of latent factor means, we estimated a multiple-group structural equation model in Sample 1, which allowed latent means of WMC, SM and gf to be estimated for different groups. Participants were divided into four groups: APOE ε4 carriers with a high-school degree (N11 = 41), ε4 carriers without a high-school degree (N12 = 23), non-ε4 carriers with a high-school degree (N13 = 136) and non-ε4 carriers without a high-school degree (N14 = 46). In this model, the latent variables were scaled by a reference indicator, latent means and variances were freely estimated across groups, and all other model parameters were constrained to be equal across groups. The latent means of WMC, SM and gf for the four groups are illustrated in Fig. 3. The figure shows no performance difference between genotypes in the group with higher education, whereas APOE ε4 carriers performed worse in WMC and gf in the group with lower education. Because the within-group sample sizes were rather small, these estimates of latent mean differences may not be very robust, but they point to a plausible gene-environment interaction.

Fig. 3

Interaction plot visualizing differences in latent means of WMC, SM and gf for genotype groups split by education. Blue lines indicate latent mean differences for non-ε4 carriers. Red lines indicate differences for ε4 carriers. For explanations of abbreviations, please see Fig. 2 (Color figure online)

Summarizing the results of Sample 1, young carriers of at least one APOE ε4 allele showed worse cognitive abilities than non-ε4 carriers. This negative APOE ε4 effect, however, appeared to be confined to ε4 carriers with lower education.

Results provided by Sample 2

In Sample 2 there were fewer WMC assessments than in Sample 1: only verbal and figural WMC task data were available, but from the same RNb tasks as applied in Sample 1. Average performance in each of three blocks of trials per task differing in difficulty (load levels 1, 2 and 3) was used as an indicator for modeling a verbal and a figural WMC factor (see Fig. 4). In Model 3, the two factors were allowed to correlate and were both regressed onto the dummy variable contrasting APOE ε4 carriers with non-ε4 carriers. Model 3 fitted the data very well: χ²(12) = 24.4, CFI = .97, RMSEA = .06, SRMR = .04; all factor loadings were significant and of considerable magnitude. However, the loading structure was slightly heterogeneous because the block of highest difficulty (load level 3) discriminated least between individuals with high and low WMC; for the verbal task domain, the block of medium difficulty had the best discriminative power. Overall, and contrary to Sample 1, young APOE ε4 carriers showed better performance, especially in the figural domain. The opposite effects in Samples 1 and 2 were similar in magnitude: Sample 2 showed statistically significant differences of about one quarter to one third of an SD in favor of APOE ε4 carriers compared with the non-ε4 genotype groups (regression weights and significance levels: RNb_v: .26, p = .03; RNb_f: .33, p = .03). However, as described above, APOE ε4 carriers in Sample 2 were relatively highly educated. Because there were very few APOE ε4 carriers with low education in Sample 2, education could not be tested as a moderator.

Fig. 4

Schematic representation of structural equation model (Model 3) exploring APOE ε4 effects on working memory capacity (WMC) as compared with non-ε4 carriers. RNb_v—verbal recall 1-back; RNb_f—spatial-figural recall 1-back; RNbv_1—verbal recall 1-back load level 1; RNbv_2—verbal recall 1-back load level 2; RNbv_3—verbal recall 1-back load level 3; RNbf_1—spatial-figural recall 1-back load level 1; RNbf_2—spatial-figural recall 1-back load level 2; RNbf_3—spatial-figural recall 1-back load level 3

Beyond education level, APOE ε4 effects may be moderated by task difficulty, on the assumption that more difficult tasks require the neurocognitive system to respond dynamically to varying challenges (Craik and Byrd 1982). Because the factor loadings of indicators with different difficulty levels were heterogeneous in Model 3 (Fig. 4), we tested genotype effects separately for each level of task difficulty. Table 3 provides standardized regression weights and significance levels. At all load levels, the regression weights corresponded to performance advantages for APOE ε4 carriers of less than .25 SD. Thus, for both verbal and figural tasks, categorical regression analyses revealed numerically better, but not significantly better, WMC performance among APOE ε4 carriers. Taken together, Sample 2 suggested that the ε4 allele was positively associated with WMC performance, but the load-level-specific effects were small and could not be statistically established.

Table 3 Regression weights illustrating the APOE ε4 effect on RNb at different load levels

Results provided by Sample 3

In Sample 3, we estimated a latent variable gf/WMC from three cognitive tasks: the rotation span task (CSpan_f), which was also used in Sample 1, the memory updating task (MU) and Raven’s progressive matrices (Rav). We then regressed gf/WMC onto the dummy-coded genotype variable, following a coding scheme similar to that used for Sample 2 (Fig. 5). The model fit was good: χ2 (2) = 5.72, CFI = .98, RMSEA = .09, SRMR = .03. In Model 4, which tested the effect of ε4 on gf/WMC relative to non-ε4 carriers, the regression weight was .19 (p = .26). Thus, the APOE ε4 allele had no significant effect on gf/WMC, although the effect was numerically positive.

Fig. 5

Schematic representation of structural equation model (Model 4) exploring the difference between ε4 carriers and non-carriers in the latent factor gf (fluid intelligence). MU memory updating; CSpan_f rotation span; Rav Raven’s progressive matrices

Results provided by Sample 4

In Sample 4, a latent gf variable was indicated by three item parcels of the Raven test (Rav 1–3). As in Sample 3, the latent gf variable was regressed onto the dummy-coded genotype variable using the same coding scheme. The model fit was very good: χ2 (2) = 0.52, CFI = 1.00, RMSEA = .00, SRMR = .02. Model 5, depicted in Fig. 6, indicates a positive, albeit non-significant, effect of the APOE ε4 allele on gf (.23, p = .28).

Fig. 6

Schematic representation of structural equation model (Model 5) exploring differences between ε4 carriers and non-carriers in the latent factor of gf. Rav Raven’s progressive matrices

Results based on merged Samples 1 and 2

Samples 1 and 2 overlapped in two performance indicators: verbal recall 1-back (RNb_v) and figural recall 1-back (RNb_f). For a more powerful test, we computed general RNb_v and RNb_f scores for Sample 2 by averaging the difficulty-level-specific task indicators and then merged the data of Samples 1 and 2, which allowed us to test genotype effects in a much larger sample of 556 participants in total. The merged sample comprised 56 non-ε4 carriers without and 331 with a high school degree, and 28 ε4 carriers without and 127 with a high school degree. Performance on each of the two overlapping indicators was regressed onto the genotype variable contrasting APOE ε4 carriers with non-ε4 carriers, separately within the two education subsamples. The results of these four linear regression models with a categorical predictor (summarized in Table 4) revealed that, in the subsample with a high school degree, both figural and verbal RNb performance was better among ε4 carriers than among non-carriers; this positive ε4 effect was significant for figural RNb. In the subsample without a high school degree, both figural and verbal RNb performance tended to be worse for ε4 carriers than for non-carriers, but these effects were not statistically significant.

Table 4 Regression weights illustrating the APOE ε4 effect on RNb in the merged sample (Samples 1 and 2)

Results based on merged Samples 1 and 3

Analogous to merging Samples 1 and 2, we also combined Samples 1 and 3 with respect to rotation span performance (CSpan_f). In this merged sample of 489 participants, which provided more statistical power, we used linear regression to test the effect of the ε4 allele on CSpan_f performance relative to non-ε4 carriers. The results revealed no genotype effects in either the higher- or the lower-education subsample: the regression weight was .02 (p = .53) for the subsample without a high school degree and −.00 (p = .90) for the subsample with a high school degree.

Discussion

The general aim of this study was to explore APOE genotype effects on cognitive abilities in young adulthood. Given the well-established association between the APOE ε4 allele and an increased risk of developing Alzheimer’s disease in older age, it is of crucial interest whether cognitive performance decrements or benefits are already evident at a younger age. For the present study, we were able to use a large amount of multivariate data collected in four independent samples, all including genotyping for APOE polymorphisms and multiple cognitive tasks. We argued that existing controversies about APOE effects on cognitive ability at a younger age may be resolved by a multivariate approach that controls for measurement error and for the specificity of measurement methods. Therefore, we applied structural equation modeling to capture the common variance among multiple cognitive tasks tapping the same abilities. The four samples were first analyzed separately because only a few assessments overlapped across the four studies. We then merged selected samples with overlapping cognitive tasks in order to extend our analyses to a larger number of participants. This strategy allowed more powerful tests of genotype effects but came at the cost of relying on phenotype measurements based on univariate assessments. The results can be summarized as follows: (1) Overall, APOE ε4 effects on cognitive performance were equivocal. (2) Cognitive performance in the different genotype groups depended on education level. In low-education groups, APOE ε4 carriers performed worse than non-ε4 carriers. In participants with higher education (i.e., with a high school degree), however, cognitive performance was either indistinguishable between APOE ε4 carriers and non-carriers, or genotype effects tended to be numerically positive, suggesting slightly better performance among APOE ε4 carriers.

APOE genotype associations

Sample 1 revealed statistically significant negative APOE ε4 effects on WMC, assessed as a latent variable integrating performance in binding, updating, recall 1-back and complex span tasks. In Sample 2, however, the effect reversed, including when the genotype effect was tested separately for WMC tasks of different difficulty levels. Moreover, although not statistically significant, the reasoning-related factors established in Samples 3 and 4 were associated with APOE ε4 in the same direction as in Sample 2; that is, ε4 carriers performed better than non-carriers. Overall, we are thus inclined to conclude that the effect of the APOE ε4 allele is negative in at least some cases and inconclusive in others. Mixed conclusions in the literature probably reflect the varying role of education. Furthermore, in Sample 1 the ε4 effects on WMC, SM and gf differed from each other, even though the three latent variables themselves were highly correlated (see Fig. 2). As revealed by Model 2, the ε4 effect on gf was much larger than on SM and WMC, indicating that gf might be more strongly influenced by the genotype than the other abilities. In Sample 3, the combined gf/WMC factor was estimated from a broader variety of task types than in the other samples, including both reasoning and WMC assessments; thus, two highly correlated but not completely isomorphic latent variables were confounded. Because both gf (Rav) and WMC (CSpan_f) tasks were used to estimate one latent factor, the genotype effect on cognitive ability in Sample 3 may have been partly blurred. In Sample 4, only reasoning measures were used to indicate gf, which may explain the larger effect size (.23 vs. .19 in Sample 3), even though Sample 4 (N4 = 206) was smaller than Sample 3 (N3 = 244) and its test therefore had less power.
The effect sizes in Sample 1, which differed across cognitive abilities, indicate that genotype effects on WMC and on reasoning should be investigated separately. In particular, future research exploring the association of genetic polymorphisms with cognitive abilities at the latent variable level should distinguish between memory tasks and mental transformation tasks.

The role of education

In our analysis of Sample 1, latent cognitive performance variables were regressed on genotype group, education level and their interaction. For Samples 2, 3 and 4, the interaction model could not be applied because power was insufficient in the low-education group; in particular, there were very few APOE ε4 carriers with low education. In the merged samples (Samples 1 and 2; Samples 1 and 3), the higher statistical power allowed us to investigate ε4 effects on task performance separately in the high- and low-education groups. The results indicate that the negative APOE ε4 effect found in Sample 1 alone is largely attributable to participants with lower education. After controlling for education, there were no overall effects of ε4 on WMC; negative genotype effects were present only in low-education participants. This accords with previous findings (e.g., Arenaza-Urquijo et al. 2015) that higher education may help postpone the onset of cognitive impairment among APOE ε4 carriers.
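Why a genotype-by-education interaction can mask an overall ε4 effect is easiest to see in a saturated 2 × 2 design. There, the least-squares coefficients of a model with an ε4 dummy, an education dummy and their product reduce to contrasts between the four cell means. The Python sketch below works on observed scores with hypothetical 0/1 codes (e4 = 1 for carriers, high_edu = 1 for a high school degree) and toy values. It is a simplified illustration, not the latent-variable model estimated for Sample 1. The ε4 coefficient is the carrier effect in the low-education group, and adding the interaction coefficient gives the carrier effect in the high-education group.

```python
from statistics import mean

# Hypothetical input: (score, e4, high_edu) tuples with 0/1 codes.
# In a saturated 2 x 2 design, OLS on e4, high_edu and e4 * high_edu
# reproduces the four cell means exactly, so each coefficient is a
# simple contrast between cell means.


def interaction_contrasts(records: list[tuple[float, int, int]]) -> dict[str, float]:
    cells: dict[tuple[int, int], list[float]] = {}
    for score, e4, high_edu in records:
        cells.setdefault((e4, high_edu), []).append(score)
    missing = {(0, 0), (0, 1), (1, 0), (1, 1)} - set(cells)
    if missing:
        # An empty cell (e.g. no low-education e4 carriers) leaves the
        # interaction model unidentified.
        raise ValueError(f"empty cells: {sorted(missing)}")
    m = {cell: mean(scores) for cell, scores in cells.items()}
    b_e4 = m[1, 0] - m[0, 0]  # e4 effect among low-education participants
    b_int = m[1, 1] - m[1, 0] - m[0, 1] + m[0, 0]  # difference in differences
    return {
        "intercept": m[0, 0],
        "e4": b_e4,
        "education": m[0, 1] - m[0, 0],
        "e4_x_education": b_int,
        "e4_effect_high_education": b_e4 + b_int,
    }


toy = [(10, 0, 0), (12, 0, 0), (8, 1, 0), (14, 0, 1), (15, 1, 1)]
print(interaction_contrasts(toy))
```

In this toy example the carrier effect is negative in the low-education cell (−3) but positive in the high-education cell (+1), so an analysis that pools both education groups would average over two effects of opposite sign.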

Strength and novelty of the present study

The most notable contribution and novelty of the current study is that the dependent phenotypic variables were estimated at the latent level, each indicated by multiple behavioral tasks, and associated with APOE genotypes in young adulthood. We argue that associations between complex psychometric measurements and APOE gene polymorphisms were inconsistent across previous studies, possibly because task specificity had not been accounted for. With the present comprehensive dataset, we aimed to address this possibility by accounting for measurement specificity through latent variable modeling. There was good empirical reason to do so: had we examined the single-task (univariate) data collected in Sample 1 separately, for example, we would have obtained regression weights for single task indicators (Table S1 in Supplementary Material) that were unstable and inconsistent across tasks, as reflected in the large p-values and rather small effect sizes for some of the tasks. Consequently, from a psychometric point of view, the best way of dealing with such inconsistencies is to generalize across single tasks by using latent variables. By building upon the well-established latent variable model of WMC (Wilhelm et al. 2013), we obtained more robust results in this specific sample. Nevertheless, we show that APOE effects on cognitive ability are complex and that inconsistencies may arise from heterogeneity in education level, task domain and task difficulty.

Possible limitations

Whether the APOE ε4 allele is found to be positively or negatively associated with cognitive ability may depend on various factors. In our samples, all participants were healthy young adults, whereas in some previously published studies the inferior cognitive performance of APOE ε4 carriers was also associated with mental disorders (e.g., Eramudugolla et al. 2014). Further, in all our samples, participants were roughly 20 to 40 years old, which is probably a wide age range compared with previous studies (Rusted et al. 2013). In the study by Eramudugolla et al. (2014), for example, cognitive abilities were assessed in participants grouped into age cohorts, each spanning 4 years. Indeed, Rusted and Carare (2015) discussed whether the association of APOE ε4 with cognitive ability depends on age and gender and concluded that systematic age and gender differences play an important role in gene-behavior relationships and may interact with other biological factors such as estrogen level. Another limitation of the present study is that the individual samples may not have been large enough. Because effect sizes are small, genetic association studies need very large samples to achieve sufficient statistical power (Izaks et al. 2011). In our study, the individual samples contained relatively few participants with certain genotypes, so the present findings can only be a preliminary step that makes the case for a multivariate approach in future studies.
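To make the sample-size argument concrete, the usual normal-approximation formula for comparing two group means gives the number of carriers needed to detect a standardized difference d with two-sided significance level α and power 1 − β: n_carriers = (1 + 1/k)(z_{1−α/2} + z_{1−β})^2 / d^2, where k is the assumed ratio of non-carriers to carriers. The Python sketch below implements this formula. The defaults α = .05 and power = .80 are conventional choices assumed here, not values taken from the study, and an exact t-test calculation would give slightly larger numbers.

```python
from math import ceil
from statistics import NormalDist


def required_group_sizes(d: float, ratio: float = 1.0,
                         alpha: float = 0.05, power: float = 0.80) -> tuple[int, int]:
    """Carriers and non-carriers needed to detect a standardized mean difference d.

    Normal approximation for a two-sided, two-sample comparison of means;
    ratio = non-carriers per carrier (1.0 means equal groups).
    """
    z = NormalDist()  # standard normal distribution
    z_alpha = z.inv_cdf(1 - alpha / 2)
    z_beta = z.inv_cdf(power)
    n_carriers = (1 + 1 / ratio) * (z_alpha + z_beta) ** 2 / d ** 2
    return ceil(n_carriers), ceil(ratio * n_carriers)


print(required_group_sizes(0.25))           # (252, 252)
print(required_group_sizes(0.25, ratio=3))  # (168, 503)
```

Under these assumptions, detecting a difference of a quarter of a standard deviation already requires about 500 participants with equal groups, and about 670 when non-carriers outnumber carriers three to one, as is typical because ε4 carriers are the minority genotype group.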

Conclusions

The aim of the present study was to explore how APOE ε4 is associated with cognitive abilities among young adults. Importantly, this study is the first to generalize across single measurement approaches for the cognitive phenotypes of secondary memory, working memory and reasoning. As the results stand, the ε4 allele has a negative effect on cognitive performance when combined with lower education. The evidence for antagonistic pleiotropy (i.e., positive effects of APOE ε4 in young adults) is, however, weak, given the role of education in the network of potential influences on cognitive abilities. Future studies of APOE polymorphism effects on cognitive ability in young adults will therefore need to take into account protective factors such as those discussed in the literature on cognitive reserve (Valenzuela and Sachdev 2006).