1 Introduction

Fish is a low-fat, high-protein food containing nutrients that are important for optimal fetal brain and nervous system development (Innis 2007; Ralston and Raymond 2010). However, all fish also acquire small amounts of naturally occurring methylmercury (MeHg) from the aquatic food chain (National Research Council 2000). In sufficient concentrations, MeHg is a neurotoxicant that can affect the central nervous system (ATSDR 1999). The level of exposure associated with adverse effects is presently unknown. The fetus is known to be especially vulnerable to MeHg, and the period of highest risk is considered to be pregnancy (ATSDR 1999). Recent updates from U.S. federal agencies have encouraged women of childbearing age to eat fish that is low in mercury for its nutritional benefits (EPA/FDA 2014).

“The Seychelles Child Development Study (SCDS) is a longitudinal cohort study designed to test the hypothesis that prenatal MeHg exposure from a diet high in fish is associated with adverse neurodevelopmental outcomes. The Seychellois consume ocean fish daily, but do not consume sea mammals or fresh water fish” (Davidson et al. 2011, p. 712). Comprehensive neurodevelopmental assessments of the SCDS Main Cohort were carried out at 10 different ages between 6 months and 24 years and have found no consistent pattern of evidence to support the hypothesis that prenatal MeHg exposure from consumption of fish with naturally acquired levels is associated with delays in neurodevelopment. In fact, we have observed improved performance on some endpoints associated with increasing prenatal MeHg in the range achieved by fish consumption. These benefits were found as early as 29 months (Davidson et al. 1995) and were still present in the evaluation at 17 years of age (Davidson et al. 2011) and most recently at 22 and 24 years of age (van Wijngaarden et al. 2017). These findings have been attributed to the nutritional benefits of fish, although nutritional status indicators during pregnancy were not measured in the Main Cohort. Subsequently, we measured maternal nutritional status in a new cohort and found evidence supporting the benefit of certain nutrients (Davidson et al. 2008; Strain et al. 2008; Lynch et al. 2011; Strain et al. 2012).

Studies in the Faroe Islands have reported some adverse associations between prenatal MeHg exposure and child development (Grandjean et al. 1997; Debes et al. 2006). However, the Faroes population consumes pilot whales which are contaminated by other pollutants such as PCBs (see review by Weihe and Joensen 2012), while the Seychelles population does not consume sea mammals and their PCB levels are very low (Davidson et al. 1998). Follow-up studies in the Faroe Islands (Debes et al. 2016) and studies in Canadian Inuit with concomitant PCB exposure (Jacobson et al. 2015) also reported associations that they attributed to MeHg exposure.

In 1989, Cox and colleagues reported that the relationship between prenatal MeHg exposure and children’s development was possibly nonlinear and that prenatal exposure to MeHg above 10 ppm (measured in maternal hair) may be associated with declines in neurodevelopment (Cox et al. 1989). Primary analyses of the SCDS Main Cohort data were based on conventional linear models. These models assume a linear relationship between prenatal MeHg levels and neurodevelopmental test scores. However, if the true exposure/outcome relationship is nonlinear, then forcing a straight line fit to the data may lead to a biased inference about the nature of the association. Nonlinear relationships have been reported with other exposures such as the inverted U–shaped association between maternal blood manganese levels at delivery and birth weight in full-term infants (Zota et al. 2009).

We have previously explored nonlinear analysis (Axtell et al. 2000; Huang et al. 2005) using semiparametric additive models (Hastie and Tibshirani 1990) as secondary analyses of the SCDS Main Cohort data at the 66-month (Davidson et al. 1998) and 9-year (Myers et al. 2003) evaluations. A nonlinear curve, which appeared to be nearly flat when the prenatal MeHg level was below approximately 12 ppm and linear above that level, led Huang et al. (2005) to suggest that the beneficial nutritional effects may predominate at low prenatal MeHg exposures whereas adverse MeHg associations may occur at higher prenatal exposure levels. Additionally Huang et al. (2007) carried out a regression tree analysis (Breiman et al. 1984) of the 9-year data, examining whether the prenatal MeHg-outcome relationship could be non-homogeneous between subgroups of the population as there might be vulnerable subgroups.

In this paper we reanalyze the SCDS Main Cohort data at age 17 years with two approaches: semiparametric additive models to explore overall nonlinear prenatal MeHg trends, and a combination of regression tree and semiparametric additive models to explore nonlinear and non-homogeneous prenatal MeHg associations in different subpopulations.

2 Materials and methods

2.1 Participants and MeHg exposure

A total of 705 children of the 779 Seychellois infant-mother pairs originally enrolled in 1989–1990 (Marsh et al. 1995) were still eligible for evaluation at age 17 years. Seventy-four participants were excluded for a priori specified reasons including lack of prenatal exposure data, medical conditions that might affect development, or withdrawal from the study (Davidson et al. 1995). Prenatal MeHg exposure was measured in maternal hair samples as previously described (Cernichiari et al. 1995). “Recent postnatal exposure was measured in a 1-cm length of each child’s hair closest to the scalp taken at the time of testing. A total of 600 participants ranging in age from 15.7 to 18.4 years participated in the 17 year test battery” (Davidson et al. 2011, p. 712) and 462 of them had at least one neurocognitive endpoint and a full set of covariates. We compared the participants with and without a full set of covariates, and they appeared comparable (Davidson et al. 2011, p. 713 Table 1a). The sizes of the data sets range from 451 to 456 for the seven endpoints considered in this re-analysis.

Table 1 Tree analysis of covariate effects for the WJ-II Calculation outcome

This research was reviewed and approved by the Institutional Review Boards of both the University of Rochester and the Republic of Seychelles in accordance with national and institutional guidelines for the protection of human subjects.

2.2 Neurocognitive testing procedures

Since a straight line may be thought of as a rough approximation of a nonlinear curve, we used the results from the linear analysis (Davidson et al. 2011) to select endpoints for nonlinear analysis. We chose to reanalyze only those 7 endpoints for which the coefficient for prenatal MeHg exposure in the primary linear regression analysis had a two-tailed p value less than or equal to 0.2 (Davidson et al. 2011). The endpoints analyzed were as follows: the Woodcock–Johnson Test of Scholastic Achievement-II (WJ-II) Calculation and Applied Problems subtests, the California Verbal Learning Test (CVLT) Short Delay and Long Delay subtests, and three subtests from the Cambridge Neuropsychological Test Automated Battery (CANTAB): the Log Total Trials and the Log pre-Extra Dimensional Errors from the Intra-Extra Dimensional Shift Set (IED), and the Square Root Between Errors from Spatial Recognition Memory (SRM). For the CANTAB subtests, the Log and Square Root transformations were used to stabilize the variance in the linear analysis (Davidson et al. 2011) and we used the same transformed outcomes here. “Four subjects who tested as color blind were excluded from the CANTAB analyses since it requires color vision” (Davidson et al. 2011, p. 713). An increase in the score is associated with improved performance on the WJ-II and CVLT subtests, but for the three CANTAB subtests decreased scores mean improved performance. Participants in the 17 year test battery range in age from 15.7 to 18.4 years.

2.3 Covariates

The covariates included in this analysis were the same as those in the linear analysis (Davidson et al. 2011) and included the following: child sex, socioeconomic (SES) status (measured at 9 years using the Hollingshead Four-Factor Socioeconomic Status modified for use in the Seychelles), maternal intelligence (measured using the Kaufman Brief Intelligence Test Matrices at the 10.5 year subject evaluation; K-BIT), child’s age at testing, and recent postnatal MeHg exposure.

2.4 Statistical analysis

The analysis was carried out using the S-PLUS (Insightful 2007) and R (2015) software packages. Two statistical approaches were explored. First, semi-parametric additive models (Hastie and Tibshirani 1990) were applied to the data to explore overall nonlinear prenatal MeHg trends. This approach assumed nonlinear and homogeneous MeHg-outcome relationships while retaining the linear structure for covariates and postnatal MeHg effects as in the primary linear analysis (Davidson et al. 2011). The essential idea for fitting a nonlinear curve is locally weighted averaging of the observations that fall within a window, with the window moving continuously across the data range. Additive models have often been applied to deal with nonlinearity between the dependent and predictor variables in data analysis, e.g. Rahman et al. (2017). To estimate the nonlinear prenatal MeHg curve, we adopted penalized Fourier regression (Huang and Chan 2014), which combines un-penalized lines, as in linear models, with a penalized trigonometric series. For the trigonometric series, 6 cosine and 6 sine functions of x = MeHg, cos(2kπx/mrange) and sin(2kπx/mrange) for k = 1,…,6, are taken, where mrange denotes the range of prenatal MeHg exposure. The penalty for the trigonometric functions is constrained as λk², with λ chosen so that the approximate degrees of freedom (df) are 3 for each curve, whereas linear regression uses only 1 df. Huang and Chan (2014) have shown that the combined line-trigonometric approach is mathematically equivalent to a mixed effects (ME) model representation and to local linear smoothing methods (Fan and Gijbels 1996). A plot of the fitted smooth function describes the contribution of prenatal MeHg to the additive predictor for the developmental outcome.
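As an illustration of the basis and penalty just described, the following is a minimal sketch of a ridge-type fit with an un-penalized intercept and line plus 6 cosine/sine pairs penalized by λk². It is not the authors' S-PLUS/R implementation. The function names, the convention of counting effective df as the trace of the hat matrix minus 1 for the intercept, and the bisection search for λ are all assumptions made for illustration.

```python
import math

def fourier_design(x, n_pairs=6):
    """Rows [1, x, cos(2*pi*k*x/mrange), sin(2*pi*k*x/mrange)] for k = 1..n_pairs."""
    mrange = max(x) - min(x)
    rows = []
    for xi in x:
        row = [1.0, xi]
        for k in range(1, n_pairs + 1):
            angle = 2.0 * math.pi * k * xi / mrange
            row += [math.cos(angle), math.sin(angle)]
        rows.append(row)
    return rows

def penalty_weights(lam, n_pairs=6):
    """Ridge weights: 0 for intercept and line, lam * k^2 for each cos/sin pair."""
    weights = [0.0, 0.0]
    for k in range(1, n_pairs + 1):
        weights += [lam * k ** 2, lam * k ** 2]
    return weights

def solve(a, b):
    """Solve a @ z = b (b is a matrix) by Gauss-Jordan elimination with pivoting."""
    n = len(a)
    m = [list(a[i]) + list(b[i]) for i in range(n)]
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[piv] = m[piv], m[col]
        p = m[col][col]
        m[col] = [v / p for v in m[col]]
        for r in range(n):
            if r != col:
                f = m[r][col]
                m[r] = [vr - f * vc for vr, vc in zip(m[r], m[col])]
    return [row[n:] for row in m]

def effective_df(x, lam, n_pairs=6):
    """Trace of the hat matrix minus 1 (assumed convention: intercept not counted)."""
    design = fourier_design(x, n_pairs)
    p = len(design[0])
    xtx = [[sum(r[i] * r[j] for r in design) for j in range(p)] for i in range(p)]
    w = penalty_weights(lam, n_pairs)
    a = [[xtx[i][j] + (w[i] if i == j else 0.0) for j in range(p)] for i in range(p)]
    z = solve(a, xtx)  # (X'X + P)^{-1} X'X, whose trace equals the hat-matrix trace
    return sum(z[i][i] for i in range(p)) - 1.0

def choose_lambda(x, target_df=3.0, n_pairs=6):
    """Bisection on a log scale; effective df decreases as lam increases."""
    lo, hi = 1e-8, 1e8
    for _ in range(100):
        mid = math.sqrt(lo * hi)
        if effective_df(x, mid, n_pairs) > target_df:
            lo = mid
        else:
            hi = mid
    return math.sqrt(lo * hi)
```

With a very large λ the trigonometric terms are shrunk to zero and the fit collapses to the straight line (1 df under this convention), which is why the linear model appears as a limiting case of the smoother.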

The second approach considers nonlinear and non-homogeneous prenatal MeHg effects by a combination of regression tree (Breiman et al. 1984) and semi-parametric additive models. When a complex relationship is expected between the response variable and independent variables, regression tree methods can be used to capture non-additive and non-homogeneous effects by partitioning the data into homogeneous subsets or clusters, e.g. Kim (2010). We identified the patterns of covariate effects at different levels of WJ-II Calculation scores using a regression tree without regard to prenatal MeHg exposure. We selected the WJ-II Calculation subtest because Davidson et al. (2011) had previously identified its association with prenatal MeHg exposure (slope = 0.39, p = 0.02). The regression tree was formed by recursively partitioning the data into two groups. At each partitioning, a cut point for one of the covariates was chosen such that the two groups’ WJ-II Calculation scores were statistically the most different from each other. The repeated splitting of the data results in the growth of the tree, and the final clusters (groups) are termed “terminal nodes”. The more splitting is done, the smaller the bias and the larger the variance of the estimated means at each terminal node. The final tree size is determined by cross-validation (Breiman et al. 1984), a procedure similar to assessing the bias and variance trade-off in common regression problems (pruning the tree). The average of the responses falling in a terminal node (cluster) of the final tree is used to estimate the cluster mean for the covariate effects.
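A single partitioning step of such a tree can be sketched as follows. The sketch assumes the usual least-squares criterion of Breiman et al. (1984), in which "most different" is read as minimizing the within-group sum of squared errors; the function name and the toy values in the usage note are hypothetical and are not study data.

```python
def best_split(values, outcomes):
    """Return (cut, sse): the cut point on one covariate that minimises the
    summed within-group squared error of the outcome, or None if no split exists."""
    pairs = sorted(zip(values, outcomes))

    def sse(ys):
        mean = sum(ys) / len(ys)
        return sum((y - mean) ** 2 for y in ys)

    best = None
    for i in range(1, len(pairs)):
        if pairs[i][0] == pairs[i - 1][0]:
            continue  # no cut point between tied covariate values
        cut = (pairs[i - 1][0] + pairs[i][0]) / 2  # midpoint between neighbours
        left = [y for _, y in pairs[:i]]
        right = [y for _, y in pairs[i:]]
        total = sse(left) + sse(right)
        if best is None or total < best[1]:
            best = (cut, total)
    return best
```

For example, `best_split([10, 20, 30, 40], [70, 72, 90, 92])` places the cut at 25.0. Growing a tree repeats this search over all covariates within each resulting group, and cross-validation then decides how far to prune.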

After partitioning the data into clusters, we used semi-parametric additive models (Hastie and Tibshirani 1990) to estimate the nonlinear curve for prenatal MeHg associations for each cluster on the 7 outcomes, while adjusting for covariate associations based on the cluster means from the regression tree. Let Yij denote the jth observed dependent variable in cluster i, and Xij denote the corresponding prenatal MeHg exposure. The model under consideration is

$$\text{Model 1:}\quad E(Y_{ij}) = \alpha_i + s_i(X_{ij}).$$

For each of the outcomes, the cluster means αi’s were estimated by the averages of outcome values falling in that cluster. Model 1 assumes that the cluster means αi’s adequately account for the covariate effects and allows separate (heterogeneous) nonlinear effects of prenatal MeHg exposure among clusters by the si functions. The model makes no assumptions about the functional forms of the relationships between the means of outcomes (E(Yij)) and prenatal MeHg, and is thus able to highlight trends that differ from linear lines. However, this model assumes that the effects of αi and si(x) are additive. Observe that if si(x) = βx, then Model 1 reduces to a linear relationship of the outcome on prenatal MeHg. Thus the nonlinear assumption includes the linear association as a special case. The smooth functions in Model 1 were fit by penalized Fourier regression as in the additive models. A plot shows the curves of the fitted smooth functions si’s, and the cluster means αi’s are seen as the values of curves when prenatal MeHg is set to 0. The points on the plot are data (Xij, Yij), i.e. the values of prenatal MeHg exposure and the dependent variable. If the curves appear to be approximately parallel, then it means that the prenatal MeHg trends are homogeneous between clusters. If two curves cross at some level C of prenatal MeHg exposure, then one group has a larger response for Xij < C and the other group has a larger response at larger MeHg levels Xij > C, and hence their associations with MeHg may be heterogeneous at higher levels.
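The crossing-point reading of the plots can also be made numerically. The sketch below assumes that two fitted cluster curves have already been evaluated on a common grid of prenatal MeHg values; it locates sign changes of their difference and interpolates linearly. The function name and the toy curves in the usage note are hypothetical.

```python
def crossing_points(grid, curve_a, curve_b):
    """Exposure levels at which curve_a - curve_b changes sign (linear interpolation)."""
    diffs = [a - b for a, b in zip(curve_a, curve_b)]
    crossings = []
    for i in range(1, len(grid)):
        d0, d1 = diffs[i - 1], diffs[i]
        if d0 == 0.0:
            crossings.append(grid[i - 1])  # curves touch exactly at a grid point
        elif d0 * d1 < 0:
            t = d0 / (d0 - d1)  # fraction of the interval at which the difference is 0
            crossings.append(grid[i - 1] + t * (grid[i] - grid[i - 1]))
    if diffs and diffs[-1] == 0.0:
        crossings.append(grid[-1])
    return crossings
```

With `grid = [0, 4, 8, 12]`, a rising curve `[70, 72, 74, 76]` and a flat curve `[73, 73, 73, 73]`, the function reports a single crossing at 6.0. An empty list corresponds to curves that never cross on the grid, the situation compatible with homogeneous trends.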

3 Results

3.1 Covariate associations from regression tree

We report only the results of the regression tree on the covariate effects in the second approach, as the semi-parametric additive models assume linear covariate effects and the results were similar to those in Davidson et al. (2011). The tree analysis using the WJ-II Calculation subtest and covariates resulted in four clusters. The cluster rules, number (proportion) of observations in each cluster, and the mean (SD) of their pre- and postnatal MeHg exposure are presented in Table 1. The means for prenatal MeHg exposure appear similar between the clusters. In addition, an F-test for one-way analysis of variance (ANOVA) with p = 0.27 confirms that the differences between the prenatal MeHg means of the 4 clusters were not significant. An ANOVA F-test comparing the recent postnatal MeHg means between the 4 clusters was significant (p < 0.0001) with the highest average recent postnatal MeHg exposure in cluster 3 (17.2 ppm, SD 3.8 ppm). SES was selected as the first partition variable and the two groups had an average WJ-II Calculation score of 81.5 (range 11–121, n = 269) and 90.4 (range 54–152, n = 185). Maternal IQ measured by K-BIT and recent postnatal MeHg were next entered as partition criteria in the regression tree. These findings support the significance of SES and maternal IQ on the WJ-II Calculation scores as reported in Davidson et al. (2011). In the primary linear analysis the slopes for SES and maternal IQ were 0.25 and 0.15 respectively with significant p-values (<0.05) (Davidson et al. 2011). The slope for recent postnatal MeHg in that analysis was −0.04 but it was not significant (p = 0.82).
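For reference, the one-way ANOVA F statistic used for these between-cluster comparisons can be sketched as below. The function name and the toy groups in the usage note are hypothetical, not study data; converting the statistic to a p value additionally requires the F distribution with (k − 1, n − k) degrees of freedom.

```python
def one_way_anova_f(groups):
    """Return (F, df_between, df_within) for a list of groups of observations."""
    k = len(groups)
    n = sum(len(g) for g in groups)
    grand_mean = sum(sum(g) for g in groups) / n
    means = [sum(g) / len(g) for g in groups]
    # Between-group sum of squares: group size times squared deviation of its mean
    ss_between = sum(len(g) * (m - grand_mean) ** 2 for g, m in zip(groups, means))
    # Within-group sum of squares: deviations of observations from their group mean
    ss_within = sum(sum((y - m) ** 2 for y in g) for g, m in zip(groups, means))
    df_between, df_within = k - 1, n - k
    f_stat = (ss_between / df_between) / (ss_within / df_within)
    return f_stat, df_between, df_within
```

For instance, `one_way_anova_f([[1, 2, 3], [2, 3, 4], [5, 6, 7]])` gives F = 13 on (2, 6) degrees of freedom.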

The average WJ-II Calculation score for observations falling in each cluster, which estimates the αi’s in Model 1 for WJ-II Calculation, is also given in Table 1. Cluster 4 had the highest average WJ-II Calculation score of 90.4. As shown in Table 1, it contains 185 (40.75%) of the 454 observations, those with an SES ≥ 26.25. Cluster 2 has the second highest average WJ-II Calculation score of 85.0 and contains 178 (39.21%) children. Comparing the covariates between clusters 2 and 4, cluster 2 has a lower SES, but with a maternal IQ ≥ 67.5 and a recent postnatal MeHg exposure ≤13.1 ppm; their average performance on the WJ-II Calculation score differs by 5.4 points. Cluster 1, with 65 (14.32%) observations, has a lower SES than cluster 4, a maternal IQ < 67.5, and an average WJ-II Calculation score that is lower by about 15 points. Cluster 3 has the smallest number of observations, 26 (5.73%), and a low average WJ-II Calculation score of 73.5. Comparing clusters 2 and 3 suggests that a postnatal MeHg exposure >13.1 ppm may have a nonlinear adverse association, with an average difference of 11.5 points in WJ-II Calculation scores. An ANOVA F-test comparing the WJ-II Calculation score means between the 4 clusters is significant (p < 0.0001). For the other six outcomes, the estimated ith-cluster mean αi in Model 1 is the average of outcome values in each cluster and is given in Table 1. ANOVA F-tests comparing the means of the six outcomes between the 4 clusters are all significant, with p values ranging from 0.0034 to <0.0001. The results of the ANOVA F-tests support the partition into clusters by the regression tree, which resulted in different covariate characteristics between groups and significant differences in outcomes.
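The cluster proportions and score differences quoted above can be checked with a few lines of arithmetic. The sketch uses only the cluster sizes and means reported in the text; the variable names are illustrative. The four cluster sizes sum to 454, which also matches the first split on SES into groups of 269 and 185.

```python
# Cluster sizes and mean WJ-II Calculation scores as reported in the text
cluster_sizes = {1: 65, 2: 178, 3: 26, 4: 185}
cluster_means = {2: 85.0, 3: 73.5, 4: 90.4}

total = sum(cluster_sizes.values())
percentages = {c: round(100 * n / total, 2) for c, n in cluster_sizes.items()}

# Size of the lower-SES group created by the first split (clusters 1 to 3)
low_ses_n = total - cluster_sizes[4]

# Differences in mean score between the clusters compared in the text
diff_4_vs_2 = round(cluster_means[4] - cluster_means[2], 1)
diff_2_vs_3 = round(cluster_means[2] - cluster_means[3], 1)

print(total, low_ses_n, percentages, diff_4_vs_2, diff_2_vs_3)
```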

Covariate effects appear to be generally consistent with those reported in Davidson et al. (2011) and with previous knowledge. For example, for cluster 4 with SES ≥ 26.25, the average response is the largest for WJ-II and CVLT subtests and smallest for CANTAB subtests among the 4 clusters, implying participants with higher SES do better on these computerized neurocognitive tests.

3.2 Prenatal MeHg exposure

The results of the semi-parametric additive models, which are not related to the regression tree, are shown in Fig. 1 for WJ-II Calculation scores, CVLT-Long Delay, and log-transformed CANTAB IED Total Trials to Completion scores. The results for the other 4 outcomes (not shown) are similar. These analyses explore the overall nonlinear prenatal MeHg trends. They assume homogeneous MeHg-outcome relationships and the linear structure for covariates and recent postnatal MeHg associations as in Davidson et al. (2011). In the linear analysis (Davidson et al. 2011), increasing prenatal MeHg exposure was associated with improved WJ-II Calculation scores (slope = 0.39, p = 0.02) and beneficial log-transformed CANTAB IED Total Trials to Completion scores (slope = −0.01, p = 0.02), while the association for CVLT-Long Delay was not significant (slope = 0.02, p = 0.12). By visual examination, the curve for the WJ-II Calculation scores slowly increases at prenatal MeHg levels below approximately 15 ppm and is flat above 15 ppm, while linear trends are seen for CVLT-Long Delay and log-transformed CANTAB IED Total Trials for MeHg levels below approximately 8 and 12 ppm respectively. It is important to note the sparsity of data at the higher MeHg levels; trends in the upper range have more variation, making estimation more tentative. Overall, the plots in Fig. 1 show small oscillations from the linear trends that are consistent with the linear analysis (Davidson et al. 2011) at the lower MeHg levels, but effects in the upper range studied, above approximately 8–15 ppm of prenatal MeHg exposure, are uncertain due to the increase in variability.

Fig. 1

The curve of the fitted smooth function with 3 degrees of freedom for prenatal MeHg effects to the additive predictor for WJ-II Calculation subtest, CVLT—Long Delay subtest, and log CANTAB IED test total trials. This analysis is unrelated to regression trees and assumes homogeneous MeHg-outcome relationships and a linear structure for covariates and postnatal MeHg effects. The left panel shows the partial residual plots and the smoothed curves of the prenatal MeHg association. The same curves are shown in the right panel again without residuals on a magnified scale to clearly view the oscillating trends with the vertical marks along the bottom illustrating the distribution of prenatal MeHg levels. An increase in the score is associated with improved performance on the WJ-II Calculation and CVLT—Long Delay subtests, while for the log CANTAB IED test total trials decreased scores mean improved performance

Based on the second approach, which explores nonlinear and non-homogeneous prenatal MeHg effects, the curves for the 4 WJ-II Calculation clusters are shown in Fig. 2. The curves for clusters 2 and 4 are slowly increasing, consistent with the linear analysis. For cluster 1, the curve slowly increases to an exposure of approximately 15 ppm. For cluster 3, with only 26 observations, the curve increases to about 7 ppm and then decreases slowly to 15 ppm. The curves for clusters 1 and 3 cross at around 8 ppm and cluster 1 has a higher WJ-II Calculation score after the crossing point. This suggests non-homogeneity of prenatal MeHg associations at higher exposure levels in Clusters 1 and 3, but the two clusters account for only 20% of the data. The curves for the WJ-II Applied Problems (not shown) are similar to those for the WJ-II Calculation except that the changing point from increasing to decreasing for cluster 3 is approximately 8 ppm and the curves for clusters 1 and 3 cross at around 10 ppm. The linear analysis of the association between the WJ-II Applied Problems and prenatal MeHg exposure did not reach significance (slope = 0.19, p = 0.20) (Davidson et al. 2011). The R² values for WJ-II Calculation and Applied Problems increase to 0.16 and 0.15 respectively from those of the linear analysis (0.11 for each) (Davidson et al. 2011). This increase is somewhat expected since Model 1 uses a total of 16 df, while the linear analysis uses 7 df. F-tests for checking whether the changes in R² values are significant cannot be performed since the linear model in Davidson et al. (2011) is not nested in Model 1 (note that the covariate effects in Model 1 are taken into account by the αi’s).

Fig. 2

WJ-II Calculation subtest plots from the additive model with 3 degrees of freedom each for the smoothed terms of prenatal MeHg exposure, adjusted for cluster effects from the regression tree. Points show the observed values and lines show predicted values from the model: (a) dotted line (cluster 1), (b) short dash line (cluster 2), (c) long dash line (cluster 3), and (d) solid line (cluster 4). The line types are chosen to be in an increasing order (dotted, short dash, long dash, and solid) as the cluster index increases from 1 to 4. Panel (e) shows all 4 curves simultaneously and the line types correspond to those in (a)–(d). An increase in the WJ-II Calculation score is associated with improved performance.

The trends for CVLT-Long Delay and Short Delay were similar and hence we discuss only the Long Delay (Fig. 3). The curves for clusters 1, 2 and 4 are slowly increasing at lower MeHg levels, and become nearly flat afterwards. The curve for cluster 3 changes from increasing to flat to decreasing at around 8 ppm, with an influential value (22.7 ppm) driving the curve slightly upward at the end. Here the curves of clusters 1 and 3 are close at levels of 4–6 ppm and then cluster 1 has a higher score than cluster 3 at MeHg levels >9 ppm, again suggesting non-homogeneity of prenatal MeHg associations at higher exposure levels in Clusters 1 and 3. The R² values increase from 0.069 and 0.066 in the linear analysis to 0.084 and 0.11 in Model 1 for the CVLT-Long Delay and Short Delay respectively.

Fig. 3

CVLT-Long Delay subtest plots from the additive model with 3 degrees of freedom each for the smoothed terms of prenatal MeHg exposure, adjusted for cluster effects from the regression tree. Points show the observed values and lines show predicted values from the model: (a) dotted line (cluster 1), (b) short dash line (cluster 2), (c) long dash line (cluster 3), and (d) solid line (cluster 4). The line types are chosen to be in an increasing order (dotted, short dash, long dash, and solid) as the cluster index increases from 1 to 4. Panel (e) shows all 4 curves simultaneously and the line types correspond to those in (a)–(d). An increase in the CVLT-Long Delay score is associated with improved performance.

For the log-transformed CANTAB IED Total Trials to Completion scores, the curve for cluster 2 (Fig. 4) is slowly decreasing (a beneficial association); for cluster 4, the curve slowly decreases below approximately 10 ppm, then goes slightly upward and finally is nearly flat; for cluster 1, the curve is generally decreasing. The trend for cluster 3 is irregular, but there are only 26 observations in this cluster. The four curves change at different rates such that the curves for clusters 2 and 4 cross at around 11 ppm, and those for clusters 1 and 3 cross at around 10 ppm. Compared to the other clusters, cluster 4 starts with a lower number of IED total trials at the lowest prenatal MeHg exposure and, after the crossing point, has a higher number of IED trials than cluster 2. This suggests non-homogeneity of prenatal MeHg associations at higher exposure levels. A similar pattern was present in the other two CANTAB endpoints (Log IED pre-Extra Dimensional Errors and the Square Root Between Errors from SRM, which are not shown). The R² values increase to 0.064, 0.070, and 0.066 in Model 1 from 0.033, 0.033, and 0.051 respectively in the linear analysis for Log IED Total Trials, Log IED pre-Extra Dimensional Errors, and the Square Root Between Errors from SRM.

Fig. 4

CANTAB Intradimensional–Extradimensional Discrimination (IED) Total Trials plots after logarithmic transformation from the additive model with 3 degrees of freedom each for the smoothed terms of prenatal MeHg exposure, adjusted for cluster effects from the regression tree. Points show the observed values and lines show predicted values from the model: (a) dotted line (cluster 1), (b) short dash line (cluster 2), (c) long dash line (cluster 3), and (d) solid line (cluster 4). The line types are chosen to be in an increasing order (dotted, short dash, long dash, and solid) as the cluster index increases from 1 to 4. Panel (e) shows all 4 curves simultaneously and the line types correspond to those in (a)–(d). A decrease in the log CANTAB IED Total Trials to Completion score is associated with improved performance.

4 Discussion

In this analysis of 7 outcomes in the 17 year SCDS Main Cohort data, using nonlinear statistical methods that explain more variance than the linear analysis (Davidson et al. 2011), we found no evidence to support an adverse association with children’s neurodevelopment at prenatal MeHg exposure levels below 10 ppm measured in maternal hair growing during pregnancy. However, evaluating subgroups of this population with nonlinear analysis, we found a suggestion that there may be adverse associations between prenatal MeHg exposure above 10 ppm and children’s neurodevelopment in some individuals. The 10 ppm threshold used here follows that of Cox et al. (1989) since it is beyond the scope of this analysis to determine a threshold level, though Fig. 1 shows a range of 8–15 ppm and Huang et al. (2005) had earlier conjectured 12 ppm.

Linear analysis has been the a priori primary analysis since the SCDS began, but we have periodically analyzed the data for nonlinear or non-homogeneous trends (Axtell et al. 2000; Huang et al. 2005, 2007). The improved scores for the WJ-II Calculation and IED total trials seen in the linear analysis (Davidson et al. 2011) as MeHg exposure increases were also present in the semi-parametric additive models with the improving scores for WJ-II Calculation leveling off at approximately 15 ppm, and in the regression tree approach for clusters 2 and 4 that represent a majority (80%) of the data for both endpoints. Since there is no reason to believe that MeHg exposure at any concentration should improve performance, it seems likely that MeHg measurements are a surrogate marker for beneficial nutrients from fish consumption. Although maternal nutritional status was not measured in this cohort, we have measured nutrients in other SCDS cohorts (Davidson et al. 2008; Lynch et al. 2011; Strain et al. 2008, 2012).

The regression tree analysis supported the significance of maternal IQ and SES that was identified in the linear analysis and suggested a possible adverse association with recent postnatal MeHg in cluster 3 above an exposure of approximately 13 ppm. We are cautious about interpreting the recent postnatal MeHg exposure nonlinear associations in a small group of only 26 observations because the SCDS was originally designed to address prenatal MeHg exposure, and our measure of postnatal exposure reflects only a 1 month exposure period. In addition, we are not aware of any a priori basis why this group (lower SES, higher maternal IQ, and higher recent MeHg exposure) should be particularly sensitive to prenatal MeHg compared with the children in the other three clusters.

When additive models were applied to the 4 identified subject clusters, the dose response curves suggested some nonlinear patterns. That finding was not apparent in the linear analysis. The dose response curves for clusters 1 and 3 crossed at higher exposure levels for the WJ-II Calculation and Applied Problems scores and the CVLT Short- and Long-Delay. This finding suggests that there may be a non-homogeneous prenatal MeHg association in the upper range of exposure for two of the clusters. This non-homogeneity is also apparent in the 3 CANTAB subtests where there are several intersections of the 4 curves. The intersections for most of the dose response curves occur at levels close to 10 ppm and there are fewer subjects above 10 ppm. Although the subgroups generated by this regression tree analysis did not provide clear evidence of adverse associations, studies of other groups might find such associations (Julvez et al. 2013).

Although the possibility of adverse associations at prenatal MeHg exposures above about 10 ppm is intriguing, we are cautious in making that interpretation because of the limited number of subjects in that exposure range. These interesting dose response curves for prenatal MeHg exposure present in different subpopulations do suggest a need for development of new statistical procedures to confirm the variability between such curves. Because of the descriptive nature of additive models, we consider the results presented here to be exploratory.

The strength of this analysis is that it examines nonlinear and nonhomogeneous trends in the data that might not be detected by linear analysis. However, there are also limitations. Some subgroups identified by the tree analysis may be so small that their associations may be difficult to interpret in a population study. In addition, the interpretations of the fitted nonlinear curves are not as simple as in linear analysis.

In conclusion, this nonlinear additive model and regression tree analysis supports the linear analysis reported earlier (Davidson et al. 2011). We found no consistent evidence for associations between prenatal MeHg exposure from consuming fish with naturally occurring levels of MeHg and children’s neurodevelopment in the SCDS Main Cohort at 17 years of age. The current results do raise some interesting points, though. The dose response curves appear to be nonlinear at higher exposures, and some individuals or subpopulations with prenatal exposures above certain levels (possibly 10 ppm based on Cox et al. 1989) may have adverse associations with children’s neurodevelopment. However, the effects in the upper range of prenatal MeHg exposure studied are uncertain due to the sparsity of data at these levels in the SCDS. Further study is needed to continue examining nonlinear and non-homogeneous relationships between neurodevelopmental outcomes and prenatal MeHg exposure through fish consumption during pregnancy.