Introduction

Neurodevelopment is a complex interdependent system, and yet epidemiological studies of neurodevelopment generally consider performance across dimensions of neurodevelopment in isolation. For example, studies may report on the relationship between a sociodemographic variable and childhood intelligence, executive functioning, or behavior, but not on all three dimensions simultaneously. Modern approaches have emphasized the need for “deep phenotyping”. This approach uses statistical models that reflect the known complexity and interrelatedness of neurodevelopmental processes, and considers sources of variability within and across developmental or psychopathological domains [1].

One statistical approach to deep phenotyping involves dimension reduction techniques, which take advantage of information in patterns of covariance across instruments. There are major conceptual advantages to jointly modeling domains of neurodevelopment. Accounting for interrelatedness between developmental domains may be more clinically relevant, since behavioral traits rarely present in isolation. This may be particularly true for children with a clinical diagnosis. For instance, the hallmarks of an ADHD diagnosis include problems with attention, hyperactivity, executive functioning, and impulsivity [2]. Patients with ADHD often also present with anxiety, conduct disorder, depression, and difficulties in forming social relationships [3]. Even in a general population sample, traits such as anxiety and depression are often highly comorbid [4], suggesting potentially common neurological underpinnings for these traits. Despite these co-occurring patterns, etiological studies tend either to focus on a holistic outcome such as ADHD, or to adopt a trait-driven approach that statistically treats characteristics such as anxiety and depression as independent. However, in a general population sample, a more nuanced approach that accounts for correlational patterns across traits may be more neurologically and clinically relevant.

Many perinatal, social, behavioral, environmental, and demographic characteristics have been associated with neurodevelopment [5, 6]. However, few etiological studies have attempted to jointly model domains of neurodevelopment while accounting for their interrelationships. As neurological capabilities scaffold into phenotypes during childhood, examining the associations between prenatal and early life characteristics and neurodevelopmental phenotypes may provide more insights into underlying etiological pathways. Our goal was to estimate associations between perinatal, social, demographic and behavioral characteristics and neurodevelopmental phenotypes while accounting for outcome interdependencies using a phenotyping approach.

Methods

Study Population

The Mount Sinai Children’s Environmental Health Study is a prospective cohort study of primiparous women with singleton pregnancies who delivered at the Mount Sinai Hospital in New York City between May 1998 and July 2001 [7]. Mother/infant pairs were followed from pregnancy until the child was 7–9 years of age. Women were recruited during prenatal visits at either the Mount Sinai Diagnostic and Treatment Center, which serves a predominantly East Harlem population, or at one of two private practices on the Upper East Side of Manhattan. Exclusions have been detailed elsewhere [9]. After exclusions, there were 404 women with available birth data. Additionally, 48 women who enrolled after birth and participated in follow-up visits were included in the principal components analysis of neurodevelopmental outcomes. Participants were invited to return for neurodevelopmental follow-up visits with their child at ages 1, 2, 4–5, 6, and 7–9 years (Appendix 1).

Child Behavior and Executive Functioning

We measured children’s executive functioning with the Behavior Rating Inventory of Executive Function (BRIEF). The BRIEF is a parent-report assessment of the child’s executive functioning [10], which consists of 86 items rated on a 3-level scale from “never” to “almost always”. Validation studies report good reliability, with high test–retest reliability (mean rs = 0.81 for parents across scales) and internal consistency (Cronbach’s alphas ranging from 0.80 to 0.98 across scales) [10]. Individual items are summarized into eight clinical scales (Inhibit, Shift, Emotional Control, Initiate, Working Memory, Plan/Organize, Organization of Materials, and Monitor), each with age-normed T-scores. The clinical scales are then collapsed into two indices, the Behavioral Regulation Index and the Metacognition Index, which also have age-normed T-scores. The Behavioral Regulation and Metacognition indices are combined into the Global Executive Composite score. Mothers completed the BRIEF at the 4–5, 6, and 7–9 year follow-up visits, and we used the average T-scores across visits.

We assessed children’s problem and adaptive behaviors in the home and community setting with the parent-report version of the Behavior Assessment System for Children (BASC) [11]. Internal consistency reliability of this instrument is good (Cronbach’s alphas average 0.80 across scales and ages), and test–retest reliabilities are also high (mean rs = 0.85 for preschool children and 0.87 for children aged 6–11 years) [12]. Parents completed a survey of over 200 items that describe the frequency of specific behaviors on a four-point scale from “Never” to “Almost Always”. Items are collapsed into clinical and adaptive scales, which are normed to the general population by child age to produce T-scores. Clinical and adaptive scales are then consolidated into composites, which are also age-normed. These composites included externalizing problems (aggression, hyperactivity, conduct problems), internalizing problems (anxiety, depression, somatization), adaptive skills (adaptability, leadership, social skills), and the Behavioral Symptoms Index (aggression, hyperactivity, anxiety, depression, attention problems, atypicality). Withdrawal is the only scale that is not included in a composite. The BASC was completed at the 4–5, 6, and 7–9 year visits, and we used the average T-scores across visits. We also calculated Cronbach’s alphas for the BASC and the BRIEF in our sample.
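As a minimal sketch of the internal-consistency statistic used here, Cronbach’s alpha for k items is k/(k − 1) × (1 − sum of item variances / variance of the total score). The Python below assumes complete item-level data arranged as one list per item; the function name and toy data are hypothetical, and this is not the software used in the study.

```python
from statistics import variance


def cronbach_alpha(items):
    """Cronbach's alpha for k items, given as k lists of n respondents' scores.

    Assumes complete data (no missing items) and sample variances throughout.
    """
    k = len(items)
    if k < 2:
        raise ValueError("alpha needs at least two items")
    n = len(items[0])
    if any(len(col) != n for col in items):
        raise ValueError("every item needs the same respondents")
    # Sum of the separate item variances
    item_var_sum = sum(variance(col) for col in items)
    # Variance of each respondent's total score across the k items
    totals = [sum(col[i] for col in items) for i in range(n)]
    total_var = variance(totals)
    return (k / (k - 1)) * (1 - item_var_sum / total_var)


# Hypothetical toy data: 3 items answered by 5 respondents
toy = [
    [1, 2, 3, 4, 5],
    [2, 2, 3, 5, 5],
    [1, 3, 3, 4, 4],
]
print(round(cronbach_alpha(toy), 3))  # → 0.954
```

Identical items give an alpha of exactly 1, which is a quick sanity check on any implementation.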

Psychometric Intelligence Testing

Children were administered the Wechsler Preschool and Primary Scale of Intelligence-III (WPPSI-III) at age 6 (mean age = 6.2, SD = 0.2), and the Wechsler Intelligence Scale for Children-IV (WISC-IV) between the ages of 7 and 9 years (mean age = 7.8, SD = 0.8). Each instrument generates a Full Scale IQ (FSIQ) from its age-normed composites. Because the FSIQ scores from the WPPSI-III and WISC-IV were highly correlated (rs = 0.84 in our population), we preferentially used the WISC-IV scores for all subtests if a child returned for both visits. In our analyses, we could include only subscales that were administered in both the WPPSI-III and the WISC-IV (see "Appendix 2" for the full list of included scales).

Covariates

During their third trimester and at follow-up visits, mothers completed questionnaires that assessed a variety of sociodemographic, occupational, environmental, medical history, and behavioral characteristics. We classified maternal characteristics as follows: maternal education at follow-up (high school or less vs some college or higher), maternal age (< 20, 20–25, > 25), maternal race/ethnicity (white/non-white), smoking during pregnancy (we examined both a binary variable, any smoking during pregnancy vs no smoking, and a three-level categorical variable: no smoking, smoking in the first trimester only, and smoking in the second or third trimester), alcohol use during pregnancy (ever/never), and canned fish consumption during pregnancy (≥ 1 time per week or < 1 time per week). Frequency of canned fish consumption was queried in the prenatal questionnaire with the options never, less than 1×/month, 2×/month, 1×/week, 2×/week, and 3×/week or more. We dichotomized this variable at 1×/week based on prior literature.
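The covariate coding described above can be sketched as follows. The response labels and function names are hypothetical (the questionnaire’s actual data coding is not given here), and unanswered questions are kept missing so they can later be imputed rather than silently coded as unexposed.

```python
# Hypothetical labels mirroring the prenatal questionnaire options listed above.
WEEKLY_OR_MORE = {"1×/week", "2×/week", "3×/week or more"}


def canned_fish_weekly(response):
    """1 if canned fish was eaten at least once a week, 0 if less often,
    None if the question was not answered (left for imputation)."""
    if response is None:
        return None
    return 1 if response in WEEKLY_OR_MORE else 0


def smoking_three_level(first, second, third):
    """0 = no smoking, 1 = first trimester only, 2 = any smoking in the
    second or third trimester. Arguments are per-trimester booleans."""
    if second or third:
        return 2
    if first:
        return 1
    return 0


def smoking_any(first, second, third):
    """Binary version: any smoking during pregnancy vs none."""
    return int(first or second or third)


print(canned_fish_weekly("2×/month"), canned_fish_weekly("1×/week"))
print(smoking_three_level(True, False, False), smoking_any(True, False, False))
```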

A perinatal database at the Mount Sinai Department of Obstetrics, Gynecology, and Reproductive Science was used to abstract delivery characteristics and birth outcomes, including head circumference, birth weight, birth length, and gestational age. We categorized gestational age into preterm (< 37 weeks) or term (37 weeks or more) and used a continuous measure of head circumference (centimeters). Birth weight and length were dichotomized at the median (< 51 cm vs ≥ 51 cm for birth length; < 3270 g vs ≥ 3270 g for birth weight).

The Home Observation for Measurement of the Environment (HOME scale) (Caldwell and Bradley 1984) was administered at 12 and 24 months and includes the subscales involvement, learning materials, organization, acceptance, responsivity, and variety. Descriptions are included in "Appendix 3". We used the mean overall HOME score and mean HOME subscale scores across the 1- and 2-year visits. The overall HOME score had a wide range with sufficient variability and was included as a continuous variable. However, the HOME subscale scores had a limited range, with most observations clustered at the upper end of the distributions. We therefore categorized the HOME subscale scores into tertiles and included them as ordinal categorical variables; categorized in this way, they met the assumption of linearity.
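A minimal sketch of the HOME subscale processing described above: average each child’s score over the visits attended, then code the averages into tertiles entered as an ordinal 0/1/2 variable. The visit data are hypothetical, and the cut points come from Python’s `statistics.quantiles` (default "exclusive" method), which may differ from the percentile definition used in SAS; with heavily tied scores, the three groups will not be of equal size.

```python
from statistics import mean, quantiles


def visit_mean(scores):
    """Mean of a child's scores over the visits attended (None = missed visit).
    Returns None when no visit is available."""
    observed = [s for s in scores if s is not None]
    return mean(observed) if observed else None


def tertile_codes(values):
    """Code values as 0, 1 or 2 (lowest, middle, highest tertile).

    Cut points come from statistics.quantiles(n=3); ties at a cut point go
    to the lower group, so clustered scores give unequal group sizes.
    """
    present = [v for v in values if v is not None]
    low_cut, high_cut = quantiles(present, n=3)
    codes = []
    for v in values:
        if v is None:
            codes.append(None)  # stays missing, e.g. for multiple imputation
        elif v <= low_cut:
            codes.append(0)
        elif v <= high_cut:
            codes.append(1)
        else:
            codes.append(2)
    return codes


# Hypothetical Organization subscale scores at the 12- and 24-month visits
visits = [(5, 6), (6, 6), (4, None), (3, 5), (6, 5), (2, 4), (None, None)]
means = [visit_mean(v) for v in visits]
codes = tertile_codes(means)
print(codes)  # → [1, 2, 0, 0, 1, 0, None]
```

Entering the codes as a single 0/1/2 term is what makes a "one-tertile increase" interpretable as one regression coefficient.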

Maternal intelligence was assessed during pregnancy using the Peabody Picture Vocabulary Test-III [13].

Statistical Methods

Characteristics by Follow-Up Status

We compared demographic characteristics of mothers by follow-up and covariate status. We used chi-square goodness-of-fit tests with an alpha of 0.05 to assess whether mothers from the original birth cohort (n = 404) who returned for follow-up and completed the BASC, the BRIEF, and the WPPSI-III/WISC-IV differed from mothers who did not return for a complete follow-up visit.
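The comparison of mothers who did and did not return can be illustrated with a Pearson chi-square statistic on a contingency table of follow-up status by maternal characteristic. The sketch below is an assumption about the computation, not the study code; the counts are hypothetical, and the closed-form p-value applies only to tables with one degree of freedom (larger tables need a library routine such as `scipy.stats.chi2_contingency`).

```python
from math import erfc, sqrt


def pearson_chi_square(table):
    """Pearson chi-square statistic and degrees of freedom for an r x c table
    of counts (rows: follow-up status; columns: levels of a characteristic)."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    grand = sum(row_totals)
    stat = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            expected = row_totals[i] * col_totals[j] / grand
            stat += (observed - expected) ** 2 / expected
    dof = (len(table) - 1) * (len(table[0]) - 1)
    return stat, dof


def p_value_1df(stat):
    """Upper-tail p-value for a chi-square statistic with 1 degree of freedom,
    using P(X > x) = erfc(sqrt(x / 2)); other df need a library routine."""
    return erfc(sqrt(stat / 2))


# Hypothetical counts: rows = returned / did not return,
# columns = married / single, divorced, or widowed
table = [[30, 132],
         [70, 172]]
stat, dof = pearson_chi_square(table)
p = p_value_1df(stat)  # valid here because the table has dof == 1
print(f"chi2({dof}) = {stat:.2f}, p = {p:.3f}")
```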

Principal Components Analysis

We performed dimension reduction on the BASC, BRIEF, WPPSI-III, and WISC-IV using a principal components analysis with an orthogonal varimax rotation. We included standardized versions of both the composite scores and the subscales of the instruments; the neurodevelopmental scales included in this analysis are listed in "Appendix 2". We examined criteria for factorability, including Kaiser’s measure of sampling adequacy and Bartlett’s test of sphericity, both of which assess the suitability of the data for principal components analysis based on correlations among the variables [14]. We also examined communalities to assess the suitability of items, and average communality size to assess the adequacy of the sample size [15]. To determine the number of factors, we considered the number of factors with eigenvalues greater than one, as well as parallel analysis [16]. Parallel analysis computes eigenvalues from a correlation matrix derived from a random dataset with the same numbers of observations and variables as the original data, and compares them with the eigenvalues from the observed data; factors are supported only while their observed eigenvalues exceed the corresponding random-data eigenvalues. We also report measures of internal consistency and reliability for the factor structure. These include Cronbach’s alphas for the items within each factor, an overall alpha, the overall McDonald’s omega (total and hierarchical), and factor-specific omega total values. To aid interpretation, we oriented the factors so that favorable and unfavorable attributes point in the same direction across all factors in the regression analyses: positive scores on every factor indicate better outcomes, and negative scores indicate more adverse outcomes. For regression analyses, we standardized all factors to a mean of 0 and a standard deviation of 1, so that a regression coefficient of 1 corresponds to a one-standard-deviation difference in the factor score per unit of the covariate (for binary covariates, between the two categories).
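The two factor-retention rules described above (eigenvalues greater than one, and parallel analysis) can be sketched with NumPy. This is a minimal illustration under stated assumptions: it simulates many random datasets and uses the 95th percentile of the random eigenvalues as the threshold (a common variant; Horn’s original proposal used the mean), the function names are hypothetical, and simulated two-trait data stand in for the real instrument scores.

```python
import numpy as np


def eigenvalues_desc(data):
    """Eigenvalues of the correlation matrix of an n x p data array, largest first."""
    corr = np.corrcoef(data, rowvar=False)
    return np.sort(np.linalg.eigvalsh(corr))[::-1]


def retained_factors(data, n_iter=100, percentile=95, seed=0):
    """Return (parallel-analysis count, eigenvalue-greater-than-one count).

    Parallel analysis: simulate n_iter standard-normal datasets with the same
    n and p, and keep leading factors while the observed eigenvalue exceeds
    the chosen percentile of the simulated eigenvalues at that position.
    """
    rng = np.random.default_rng(seed)
    n, p = data.shape
    observed = eigenvalues_desc(data)
    simulated = np.array(
        [eigenvalues_desc(rng.standard_normal((n, p))) for _ in range(n_iter)]
    )
    threshold = np.percentile(simulated, percentile, axis=0)
    n_parallel = 0
    for obs, thr in zip(observed, threshold):
        if obs <= thr:
            break
        n_parallel += 1
    n_kaiser = int(np.sum(observed > 1.0))
    return n_parallel, n_kaiser


# Hypothetical data: 210 children, six scales driven by two latent traits
rng = np.random.default_rng(1)
trait_a, trait_b = rng.standard_normal((2, 210))
scales = np.column_stack([trait_a, trait_a, trait_a, trait_b, trait_b, trait_b])
scales = scales + 0.5 * rng.standard_normal(scales.shape)
n_parallel, n_kaiser = retained_factors(scales)
print(n_parallel, n_kaiser)
```

In this toy example both rules agree; in the real data they disagreed (six vs seven factors), which is why the choice also drew on substantive considerations.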

Sensitivity Analyses

In sensitivity analyses, we assessed whether the factor structure differed between whites and non-whites, with a promax rather than a varimax rotation, and when restricting to children with prenatal data. Additionally, to justify using mean scores over time, we assessed whether BRIEF and BASC scores were stable over time using two methods: first, we re-examined the factor structure at each time point; second, we examined Spearman’s correlation coefficients of the composite scores for participants with measurements at multiple time points.
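Spearman’s correlation, used above to check the stability of composite scores across visits, is the Pearson correlation of the ranks. A stdlib-only sketch, assuming tied scores receive their average rank and that children missing either time point are dropped; the function names and paired T-scores are hypothetical.

```python
from math import sqrt


def average_ranks(values):
    """1-based ranks; tied values share the average of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    start = 0
    while start < len(order):
        end = start
        while end + 1 < len(order) and values[order[end + 1]] == values[order[start]]:
            end += 1
        shared = (start + end) / 2 + 1  # mean of positions start..end, 1-based
        for k in range(start, end + 1):
            ranks[order[k]] = shared
        start = end + 1
    return ranks


def pearson(x, y):
    """Pearson correlation of two equal-length numeric sequences."""
    mx, my = sum(x) / len(x), sum(y) / len(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)


def spearman(x, y):
    """Spearman's rho over children observed at both time points."""
    pairs = [(a, b) for a, b in zip(x, y) if a is not None and b is not None]
    xs = [a for a, _ in pairs]
    ys = [b for _, b in pairs]
    return pearson(average_ranks(xs), average_ranks(ys))


# Hypothetical composite T-scores at the 6-year and 7-9-year visits
age_6 = [52, 61, 45, 70, 58, None, 49]
age_7_9 = [50, 65, 47, 68, 55, 60, 53]
print(round(spearman(age_6, age_7_9), 3))  # → 0.943
```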

Finally, we also fit an exploratory structural equation model (in R, using the psych package) to the variables in the full population (n = 210), with the same parameters specified as in the original PCA.

Association Analyses of Early Life Characteristics with Neurodevelopmental Factors

We estimated associations between characteristics hypothesized to be associated with neurodevelopment and the orthogonal varimax-rotated factors in mutually adjusted analyses. The covariates, chosen because they have previously been hypothesized to be associated with neurodevelopment, were maternal education, maternal race, smoking during pregnancy, alcohol consumption during pregnancy, canned fish consumption during pregnancy, birth head circumference, preterm birth, child sex, and the HOME subscale scores of organization, learning materials, involvement, and variety. Although canned fish consumption is not widely regarded as a critical predictor of neurodevelopment, a substantial recent literature supports prenatal fish consumption as an important contributor to neurodevelopment [16,17,18,19,21]. We multiply imputed missing covariate data (10 imputations) and estimated associations between early life characteristics and factor scores in multivariable linear regression models (PROC MI, PROC GLM, and PROC MIANALYZE in SAS). In sensitivity analyses, we evaluated whether associations differed by race (white vs non-white).
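PROC MIANALYZE combines the regression results from the imputed datasets using Rubin’s rules. The Python function below is a minimal sketch of the classical form of those rules for a single coefficient, not a reproduction of the SAS procedure; the function name and the ten estimates are hypothetical. The pooled variance adds the average within-imputation variance to the between-imputation variance inflated by (1 + 1/m).

```python
from math import sqrt
from statistics import mean, variance


def pool_rubin(estimates, std_errors):
    """Pool one regression coefficient across m imputed datasets (Rubin's rules).

    Returns the pooled estimate, its standard error, and Rubin's degrees of
    freedom for the t reference distribution.
    """
    m = len(estimates)
    q_bar = mean(estimates)                        # pooled point estimate
    u_bar = mean([se ** 2 for se in std_errors])   # within-imputation variance
    b = variance(estimates)                        # between-imputation variance
    total_var = u_bar + (1 + 1 / m) * b
    if b == 0:
        dof = float("inf")  # imputations agree: no extra uncertainty
    else:
        r = (1 + 1 / m) * b / u_bar  # relative increase in variance
        dof = (m - 1) * (1 + 1 / r) ** 2
    return q_bar, sqrt(total_var), dof


# Hypothetical coefficients and standard errors from m = 10 imputations
betas = [0.48, 0.52, 0.50, 0.47, 0.53, 0.49, 0.51, 0.50, 0.46, 0.54]
ses = [0.24] * 10
est, se, dof = pool_rubin(betas, ses)
print(f"pooled beta = {est:.2f}, SE = {se:.3f}, df = {dof:.0f}")
```

Confidence intervals would then use a t quantile with `dof` degrees of freedom, which needs a statistics library and is omitted here.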

To evaluate whether using the phenotypes resulted in different effect estimates compared with a more traditional approach of using instrument-specific composite scores, we performed a case study of a single exposure with well-characterized associations with neurodevelopment: smoking. We estimated associations between smoking and the phenotypes, and also between smoking and neurodevelopment as measured by the highest-loading composite score on each factor. In these analyses, we did not adjust for birth characteristics because they may lie on the causal pathway between smoking and neurodevelopment. The adjustment set was otherwise the same as in the primary analyses.

Unless otherwise noted, statistical analyses were performed in SAS v9.4. Internal consistency statistics for the factor structure (alphas and omegas) and the exploratory structural equation model were computed in R v3.3.1.

Results

Study Population Characteristics and Follow-up

Of the 404 eligible women who enrolled during pregnancy, 162 returned for at least one visit between 6 and 9 years and completed the BASC, the BRIEF, and the WPPSI-III/WISC-IV. An additional 48 women who enrolled after birth had at least one complete visit between 6 and 9 years, although these women had no prenatal data available (Appendix 1 and Table 1). In total, 210 participants returned for at least one complete visit between ages 6 and 9 years, and all were included in the principal components analysis. The majority of the 162 participants with prenatal data were young (64.9% under 25 at delivery) and non-white (82.5%). Most participants reported not drinking alcohol (83.0%) and not smoking during pregnancy (82.7%), and most had an education of high school or less at enrollment (73.3%; Table 1). The distributions of education at enrollment, maternal age at delivery, race, alcohol consumption during pregnancy, and smoking during pregnancy were generally similar among those who did and did not return for follow-up, although mothers who were single, divorced, or widowed were more likely to return for follow-up than mothers who were married (p = 0.03). Internal consistency was good in our sample and compared favorably with previously reported population values (average Cronbach’s alpha > 0.80 for the BASC and > 0.90 for the BRIEF).

Table 1 Characteristics of study population at enrollment and follow up, Mount Sinai Children’s Environmental Health Cohort

Principal Components Analysis

Kaiser’s measure of sampling adequacy was 0.71, above the standard of 0.60 [22], and Bartlett’s test of sphericity was significant (χ²(666) = 13,875, p < 0.01). Parallel analysis indicated that six factors had eigenvalues greater than those generated from random data, while seven factors had eigenvalues greater than one. The seven-factor solution was almost equivalent to the six-factor solution, except that it included a separate factor for verbal intelligence; in the six-factor solution, the verbal intelligence items loaded with the perceptual reasoning items. We selected the seven-factor solution because perceptual reasoning and verbal intelligence capture different aspects of intelligence [23], and because it was supported by the eigenvalue criterion and consistent with previous literature on neurodevelopment. All neurodevelopmental scales loaded on at least one factor at > 0.30 and had sufficiently high communalities (all > 0.50; average 0.79), so all scales were retained. Factor structures were similar for varimax and promax rotations. In order, the seven factors explained 37.92, 13.71, 7.86, 6.33, 5.10, 4.25, and 3.05% of the variance in the data, for a total of 78.22%. McDonald’s omega hierarchical (0.71) and Cronbach’s alpha (0.93) indicated good internal consistency; the standard cutoff for omega hierarchical in research settings is 0.70. Cronbach’s alphas for the items within each factor were also high, generally above the standard cutoff of 0.80 (Supplemental Table 6).
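For reference, the two factorability statistics reported above are usually defined as follows (standard textbook forms, which we assume were the ones used). Bartlett’s statistic is computed from the determinant of the p × p correlation matrix \(\mathbf{R}\) of n observations, and its degrees of freedom depend only on p, so the reported 666 degrees of freedom correspond to p = 37 scales entering the analysis. In Kaiser’s measure, \(r_{ij}\) are the observed correlations and \(a_{ij}\) the partial correlations controlling for all other variables.

\[
\chi^2 = -\left(n - 1 - \frac{2p + 5}{6}\right)\ln\lvert\mathbf{R}\rvert, \qquad \mathrm{df} = \frac{p(p-1)}{2}
\]

\[
\frac{p(p-1)}{2} = 666 \;\Longrightarrow\; p(p-1) = 1332 = 37 \times 36 \;\Longrightarrow\; p = 37
\]

\[
\mathrm{KMO} = \frac{\sum_{i \ne j} r_{ij}^2}{\sum_{i \ne j} r_{ij}^2 + \sum_{i \ne j} a_{ij}^2}
\]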

In order of variance explained, these seven factors are herein described as: (1) impulsivity/externalizing, (2) executive functioning, (3) internalizing, (4) perceptual reasoning, (5) adaptability, (6) processing speed, and (7) verbal intelligence (Table 2).

Table 2 Varimax rotated factor pattern structure and item loadings of childhood neurodevelopmental scales (n = 210)

Sensitivity Analyses

In sensitivity analyses, we examined the consistency of the factor structure by race (data not shown). The principal components analysis was similar when conducted separately for whites and non-whites. The factor structure was also similar when using a promax rotation and when restricting to children with available prenatal data. An exploratory structural equation model similarly resulted in a seven-factor solution. BRIEF and BASC scores were also stable over time: the seven-factor solution produced the same factors when using behavioral scores from each time point rather than the means, and correlations of behavioral composite scores over time were moderate to strong, ranging from 0.50 for some indices and cross-time comparisons to 0.73 for others (see Supplementary Table 5).

Early Life and Neurodevelopment Factors

After accounting for the interrelationships among neurodevelopmental outcomes, there were several notable associations (Table 3). The strongest associations for modifiable characteristics were for canned fish consumption, education, and preterm birth. Children of mothers who consumed canned fish at least once a week scored half a standard deviation higher on the Perceptual Reasoning factor (\(\hat{\beta}\) 0.50, 95% CI 0.03, 0.97). Children of mothers with a high school education or less had Verbal Intelligence factor scores approximately half a standard deviation lower than children of mothers with more education (\(\hat{\beta}\) −0.47, 95% CI −0.78, −0.17). Preterm birth was associated with more adverse Processing Speed (\(\hat{\beta}\) −0.72, 95% CI −1.31, −0.13) and Internalizing (\(\hat{\beta}\) −0.62, 95% CI −1.23, −0.02) factor scores. Of the HOME scores, only Organization was associated with neurodevelopment: a one-tertile increase (corresponding to slightly more than a one-point increase in the Organization score) was associated with a quarter of a standard deviation better Executive Functioning (\(\hat{\beta}\) 0.26, 95% CI 0.04, 0.49) and about a tenth of a standard deviation better Adaptability factor scores (\(\hat{\beta}\) 0.10, 95% CI 0.00, 0.19).

Table 3 Covariate-adjusted linear regression associations between prenatal and early life characteristics and neurodevelopmental factors in the Mount Sinai Children’s Environmental Health Cohort (n = 162)

Of the non-modifiable characteristics, white race was associated with higher Perceptual Reasoning and Verbal Intelligence factor scores (Perceptual Reasoning \(\hat{\beta}\) 0.68, 95% CI 0.25, 1.10; Verbal Intelligence \(\hat{\beta}\) 0.81, 95% CI 0.42, 1.20), but not with any other neurodevelopmental factor. Girls had higher Adaptability (\(\hat{\beta}\) 0.54, 95% CI 0.24, 0.84) and Processing Speed (\(\hat{\beta}\) 0.31, 95% CI 0.00, 0.62) factor scores. Finally, larger head circumference at birth was associated with worse Executive Functioning factor scores (\(\hat{\beta}\) −0.12, 95% CI −0.22, −0.01) but better Perceptual Reasoning factor scores (\(\hat{\beta}\) 0.10, 95% CI 0.00, 0.19).

These associations were generally similar in the crude analyses, although some additional characteristics had significant bivariate associations that were attenuated in multivariable-adjusted models (Appendix 4). The multivariable associations also generally held in stratum-specific analyses for whites and for non-whites, and were similar when additionally adjusting for maternal IQ, maternal marital status, and maternal age (data not shown).

As a case study, to compare associations across analysis methods, we examined the relationship of maternal prenatal smoking with neurodevelopment as measured by factor scores and the instrument-specific composite scores (Table 4). Of the 162 participants included in the analyses, 28 mothers reported any smoking during pregnancy, 17 of whom reported quitting before the second trimester.

Table 4 Comparison of prenatal smoking and neurodevelopment associations by analysis method (n = 162)

Any smoking during pregnancy was associated with worse impulsivity and externalizing scores under both methods (Impulsivity and Externalizing factor \(\hat{\beta}\) −0.51, 95% CI −0.92, −0.10; externalizing composite \(\hat{\beta}\) −0.60, 95% CI −1.00, −0.21). However, although smoking during pregnancy was not associated with Executive Functioning factor scores, it was associated with Metacognition Index scores approximately half a standard deviation worse (Executive Functioning factor \(\hat{\beta}\) −0.32, 95% CI −0.72, 0.10; Metacognition Index \(\hat{\beta}\) −0.47, 95% CI −0.86, −0.09). This pattern was more pronounced for smoking in later pregnancy (Executive Functioning factor \(\hat{\beta}\) −0.32, 95% CI −0.94, 0.30; Metacognition Index \(\hat{\beta}\) −0.55, 95% CI −1.12, 0.03), and smoking-neurodevelopment associations were generally stronger for participants who smoked during late pregnancy. While smoking in later pregnancy was negatively associated with the adaptive skills composite (\(\hat{\beta}\) −0.66, 95% CI −1.23, −0.10), the association was closer to the null for the corresponding factor (Adaptability factor \(\hat{\beta}\) −0.27, 95% CI −0.83, 0.31). Smoking was not associated with the internalizing composite or Internalizing factor scores, or with any of the IQ composites or factor scores. Overall, using orthogonally rotated factors, which are uncorrelated with one another, attenuated the estimates for all outcomes except the Impulsivity and Externalizing factor, although all estimates remained on the same side of the null.

Discussion

Summary

We identified seven factors that together captured 78% of the variation in a principal components analysis of the BASC, BRIEF, and WPPSI-III/WISC-IV in children between 6 and 9 years old. These factors were: (1) impulsivity/externalizing, (2) executive functioning, (3) internalizing, (4) perceptual reasoning, (5) adaptability, (6) processing speed, and (7) verbal intelligence. Although these factors roughly align with the composite indices of the included instruments, items from both the BASC and the BRIEF loaded onto each of the first three factors. This suggests a meaningful correlational structure that is perhaps as strong across the BASC and the BRIEF as it is within each instrument: the Behavioral Regulation Index of the BRIEF and the externalizing composite of the BASC appear to measure a similar underlying domain and load together, while the Metacognition Index of the BRIEF and the attention subscale of the BASC appear to measure another distinct domain and load together.

Additionally, etiological interpretations differed between factor scores and composite scores. Although smoking in later pregnancy was associated with externalizing, executive functioning, and adaptive skills when using the composite scores, it was associated only with the Impulsivity and Externalizing factor when using the varimax-rotated factor scores, which are constrained to be uncorrelated with one another. This suggests that at least part of the association between smoking during pregnancy and the Metacognition Index and the adaptive skills composite may be due to the correlation between those constructs and the externalizing composite. Accounting for such correlations by using factor scores may portray associations more accurately than instrument-specific composites alone. Finally, the modifiable characteristics of education, canned fish consumption during pregnancy, preterm birth, and HOME Organization were each associated with different neurodevelopmental factors.

Factor Structure

The factors aligned to some extent with measurement method. Specifically, scales from the BASC and the BRIEF loaded together, without IQ items, on four of the seven factors, while IQ items in the Processing Speed and Perceptual Reasoning factors loaded independently of the BASC and the BRIEF. The only exception was that organizational deficits from the BRIEF loaded positively on the Verbal Intelligence factor. Importantly, these clustering patterns also aligned with how the instruments are administered: the BRIEF and the BASC are both parent-report measures, while the WPPSI-III/WISC-IV are performance-based and administered by research personnel. A limitation of the analysis is that we were unable to include multi-method measurements of the same neurodevelopmental outcome; for instance, our analysis included parent-report measures of executive functioning but lacked performance-based assessments of executive functioning. Previous factor analyses of performance-based and parent-report measures of executive functioning have shown that measurement method may explain more variance than the underlying domain [24, 25]. In those studies, parent-reported and performance-based executive functioning loaded on separate factors, and the correlation between them was low, suggesting that they measured different underlying features of executive functioning. It is possible that this difference in measurement method drove the differential loadings of performance-based intelligence and parent-reported behavior and executive functioning in this analysis. Because factor structures of neurodevelopment vary according to the included instruments, specific phenotypes may vary across studies that adopt different measurement tools. However, generalizable phenotypes, such as the Impulsivity and Externalizing phenotype, may emerge from factor analyses of various instruments and may help in interpreting etiologic associations across studies.

Early Life Characteristics and Neurodevelopmental Factors

Several maternal and behavioral characteristics, as well as features of the home environment, were associated with factor scores after accounting for the interdependence of the outcomes. These associations were generally consistent with prior literature. Of the modifiable characteristics, there were particularly strong associations between consuming canned fish at least once a week during pregnancy and Perceptual Reasoning factor scores in childhood, between higher maternal education and Verbal Intelligence factor scores, and between preterm birth and Internalizing and Processing Speed factor scores, in line with previous reports [17, 18]. Although canned fish can be a source of contaminants that have been associated with adverse neurodevelopment, such as PCBs and mercury [26], fish is also a source of beneficial nutrients such as polyunsaturated fatty acids. Several studies of prenatal fish consumption support our finding that the benefits of fish consumption may outweigh the negative effects of environmental contamination [18, 27].

The only association that appeared inconsistent with prior findings was the relationship between head circumference at birth and Executive Functioning. Interestingly, while larger birth head circumference was associated with more deficits in Executive Functioning factor scores, it was also associated with better Perceptual Reasoning factor scores, in both unadjusted and adjusted analyses. Children with autism have larger-than-average head circumferences during childhood [28] and more problems with executive functioning [29], but they tend to have smaller head circumferences at birth [28]. In the animal literature, larger brain size across species has been associated with higher levels of inhibitory control [30], although these ecological data do not necessarily apply within species or at birth. This relationship remains relatively unexplored, and further research is needed before drawing more definitive conclusions.

A final limitation is the number of statistical tests performed in this analysis, with 11 early childhood predictors and 7 outcome factors. We did not apply a correction such as Bonferroni because the analysis was exploratory and because the covariates were chosen for their hypothesized associations with neurodevelopment; under these conditions, a Bonferroni correction is likely to be overly conservative [31]. Our methods also inherently reduce the dimensionality of the outcome space by condensing multiple neurodevelopmental assessments into seven factors rather than many times as many individual outcome scales. Additionally, effect sizes and confidence intervals are both integral to interpretation, and multiple testing corrections affect only the p values. Nonetheless, results should be interpreted with caution, and we cannot exclude the possibility that some associations are due to chance.
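For scale, a Bonferroni correction over every predictor–factor pair described above would divide the conventional α of 0.05 by the number of tests, which is equivalent to reporting roughly 99.94% rather than 95% confidence intervals:

\[
m = 11 \times 7 = 77, \qquad \alpha_{\text{Bonferroni}} = \frac{0.05}{77} \approx 6.5 \times 10^{-4}, \qquad 1 - \alpha_{\text{Bonferroni}} \approx 0.9994
\]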

Maternal Prenatal Smoking and Neurodevelopment

Consistent with the prior literature, children of participants who reported smoking during later pregnancy had worse scores on the BASC externalizing composite, the BRIEF Metacognition Index, and the BASC adaptive skills composite, after adjustment for maternal education and race. However, when we used orthogonally rotated factors, which are uncorrelated with one another, only the association between smoking and the Impulsivity and Externalizing factor remained in our study. The relationship between prenatal exposure to cigarettes and externalizing behaviors, ADHD, and executive functioning deficits in childhood has been frequently reported [32, 33]. However, observed relationships between prenatal cigarette smoking and deficits in executive functioning may partially reflect associations between smoking and impulsivity and externalizing behaviors [34], owing to the high correlation between these two constructs (rs between the externalizing composite and the Metacognition Index in our population = 0.62).

The reported association between smoking and Impulsivity and Externalizing may also be confounded by postnatal exposure to smoking and by unmeasured maternal traits that increase both the propensity for smoking and the risk of externalizing or executive functioning disorders (e.g., parental ADHD) [34], which we were unable to control for. Although we report the strongest associations for maternal smoking in the second and third trimesters with Impulsivity and Externalizing factor scores, there are several caveats for these findings. Women who fail to quit smoking during pregnancy may smoke more per day and over a longer period, so children of women who smoked in later pregnancy may have both a higher cumulative dose and a longer duration of exposure. Although the estimates for maternal first-trimester smoking were not significant, they were in the same direction as the later-pregnancy estimates. However, women who quit smoking before the second trimester may also have fewer problems with impulse control and addictive behaviors. Such characteristics may be genetically or environmentally related to their children's propensity for similar behaviors and could confound associations between maternal smoking and childhood behavior. Additionally, the number of participants in each smoking category was relatively small; only 11 women reported any smoking in later pregnancy. However, the patterns of associations were similar when including all 28 women who reported any smoking during pregnancy, so the small numbers may not pose a substantial threat.

Strengths and Weaknesses

A major strength of this analysis is the ability to examine multiple characteristics and multiple neurodevelopmental outcomes while accounting for interdependencies among the covariates and the outcomes. The longitudinal, prospective design of the original study allowed us to examine whether characteristics from the prenatal period were associated with outcomes in later childhood. Dimension reduction enabled the simultaneous examination of a wide range of behavioral and cognitive outcomes while taking advantage of the correlational structures underlying different instruments. Accounting for the correlational structure among the factors may help clarify associations of specific characteristics with specific neurodevelopmental phenotypes, as we demonstrated in the case study of maternal smoking. Another advantage is the reduction in the number of tests performed; examining associations with each subscale of each instrument would pose a larger threat from multiple testing. Finally, the principal components approach we employed offers conceptual advantages: factors that consolidate information across scales may provide a richer source of information than any single scale or composite. In etiological analyses, such factors may more closely mirror the biological pathways affected by prenatal and early life characteristics.

The most notable weakness is loss to follow-up; approximately 60% of the sample recruited at birth did not return for any follow-up assessment between the ages of 4 and 9 years. However, the distribution of characteristics was largely similar between those who did and did not return, suggesting that loss to follow-up was not strongly related to measured covariates, with the possible exception of marital status. Loss to follow-up did reduce the sample size, which may affect the quality of the principal components analysis. Earlier recommendations for the minimum sample size for factor analysis range from 100 to 300, although later studies have suggested that sample size is less important as long as communalities are high and enough items load strongly on each factor [15]. Our principal components analysis had high communalities, and several items loaded strongly on each factor, so the sample size of 210 used to derive the factor structure is likely adequate.
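The two adequacy criteria mentioned above can be checked directly from a loading matrix. The sketch below uses a small hypothetical loading matrix (not the study's) and an illustrative cutoff of 0.6 for a "strong" loading; both the data and the cutoff are assumptions, and the function names are hypothetical.

```python
def communalities(loadings):
    """Communality of each item: the sum of its squared loadings across retained components."""
    return [sum(value ** 2 for value in row) for row in loadings]

def strong_loadings_per_factor(loadings, cutoff=0.6):
    """Count items whose absolute loading on each factor meets the (hypothetical) cutoff."""
    n_factors = len(loadings[0])
    return [sum(1 for row in loadings if abs(row[j]) >= cutoff) for j in range(n_factors)]

# Hypothetical 4-item x 2-factor loading matrix, for illustration only (not study data)
loadings = [
    [0.80, 0.10],
    [0.70, 0.20],
    [0.10, 0.90],
    [0.20, 0.75],
]
h2 = communalities(loadings)
strong = strong_loadings_per_factor(loadings)
print([round(h, 3) for h in h2], strong)
```

Because an orthogonal rotation only redistributes variance among the retained factors, these communalities are the same before and after rotation.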

Another weakness is that, although we were able to include important covariates in all models, only the models for smoking were built etiologically to examine associations with neurodevelopment. Depending on the "exposure" of interest, the general predictive models may adjust for mediators or colliders, which may bias the estimated associations [35].

A feature of the study that is both a strength and a limitation is that the study population was quite diverse, including both wealthy, mostly white mothers from the Upper East Side of Manhattan and low-income, mostly minority mothers from East Harlem. Although these neighborhoods are adjacent, their socioeconomic features are widely divergent. Nevertheless, the multivariate regression results and the factor structure were similar for white and non-white participants, suggesting that this racial heterogeneity may not pose a substantial threat.

In summary, we identified several maternal, birth, and home environment characteristics that were associated with neurodevelopmental phenotypes in linear regression analyses in a racially and socioeconomically diverse, urban cohort of mother–child pairs enrolled during pregnancy. We demonstrated that associations between smoking and correlated outcome domains may be substantially attenuated after accounting for their correlational structure. This "deep phenotyping" approach, which takes advantage of orthogonal rotation techniques, may more accurately represent associations with correlated neurodevelopmental domains than traditional approaches that use instrument-specific composite scores. Phenotyping approaches may therefore be useful in future etiological analyses of neurodevelopment in rich datasets with sufficient participants and measurements of multiple neurodevelopmental outcomes.