Introduction: Selection Processes and Risk in Perinatal Epidemiology

In perinatal epidemiology, we seek to establish the effects of exposures on outcomes among dynamic and complex populations: the population of people who may conceive, pregnant women, their fetuses, neonates, infants, and women in the postpartum and inter-conception periods (who may or may not become pregnant again). Some of these populations are not well-defined (e.g., people who may not desire children but are nonetheless at risk for pregnancy), some are difficult to enumerate (e.g., blastocysts and embryos in early gestation), and most of them present challenges to researchers.

These populations are biologically interrelated, and particularly so during pregnancy. Most populations in reproductive and perinatal epidemiology are characterized by key transitions that remove people or gestations from the “at risk” population, altering the pool among whom outcomes can be studied. Beginning before conception, there is an extended process of cohort attrition from embryonic development through birth and early childhood [1,2,3, 4•]. By the time a woman realizes she is pregnant, the most extensive cohort attrition has already occurred [4•, 5]. We cannot measure the instances of fertilization and can only measure implantations and subsequent losses with great difficulty [4•, 6, 7]. Of all fertilizations, it is estimated that only about one third result in a live birth, whether preterm or term [4•, 5]. These processes of selection (e.g., implantations to clinically recognized pregnancies, to birth) and attrition (e.g., the loss of preterm live births and stillbirths that are absent from the population of term live births) determine the populations that we study in perinatal epidemiology (illustrated in Fig. 1 over the course of gestation). An important implication of these transitions is the recognition that the population of live births has been culled significantly by the time we study it (e.g., to examine infant outcomes). Other populations within which we seek to analyze causal effects are similarly affected by selection processes, e.g., the population of preterm live births, who represent a small and highly selected subset of the gestations that reach viability [8••].

Fig. 1 Selection and attrition processes during the course of human reproduction and gestation

The selection occurring at each of these stages can result in biases of different types, depending on the research question. We employ causal diagrams to illustrate these biases [9, 10]. In studies seeking to establish the causal effect of prematurity on neonatal death [11], confounders include maternal and pregnancy characteristics (e.g., race/ethnicity, multi-fetal gestation) as well as pathologies like preeclampsia, chorioamnionitis, and those that we do not know (Supplemental Fig. S1, Panel A). However, if we restrict our population to preterm births and seek to establish the causal effect of a preterm birth precursor (e.g., preeclampsia) on neonatal death, then the same causal structure results in selection bias (Supplemental Fig. S1, Panel B) [8••, 12]. One cannot simply alter the sampling frame to circumvent this selection bias, as in the classical Berkson bias [13, 14], which can be avoided by recruiting study participants in a setting other than the hospital. For this question, no sample of preterm infants will be immune from this type of selection—except for the hypothetical case in which babies were randomly assigned to being delivered preterm. The forces that determine this sample composition are diverse and include a host of social, environmental, and biological processes that are largely out of the investigator’s control [11, 15, 16].

Thus, the selection processes that determine the populations we study in perinatal epidemiology are more fundamental—and more intractable—than in other fields of epidemiology. They are both widespread and largely unobservable. They affect the questions we ask (by defining the scope of causal effects we can hope to estimate), how we define our target population, the study designs we can use, the biases incurred in analyses among these populations, and the methods we may use to control for these biases. In this paper, we discuss these selection processes and their implications for perinatal epidemiology.

Methodological Considerations for Causal Inference in Perinatal Epidemiology: Recent Developments and Foundational Concepts

As epidemiologic methods have evolved in the last two decades to include newer approaches for estimating causal effects [9, 17,18,19,20,21,22,23], there has been a growing focus on applying tools and methods, such as causal diagrams and bias analysis, to address selection issues in perinatal epidemiology [8••, 24••, 25•, 26]. One well-known bias to which these tools are being applied affects gestation-stratified analyses of prenatal exposures and postnatal endpoints; it results from conditioning on gestational age, which is both a mediator and a collider [8••, 24••, 27]. The counterintuitive findings that can result are well documented (e.g., a seemingly protective effect of preeclampsia on cerebral palsy, infant death, and many other infant morbidities [28,29,30,31,32,33]), as is their non-causal basis [8••, 24••, 27].

One approach to addressing this particular selection issue is to analyze all pregnancies reaching a given gestation as the population at risk for pregnancy outcomes (variously described as the “ongoing pregnancies denominator” and the “fetuses-at-risk denominator”) [34, 35]. This approach prevents conditioning on gestational age after the beginning of the time at risk and has been applied to an increasing number of fetal, maternal, and infant outcomes [36, 37•, 38,39,40]. This approach is non-controversial for outcomes that occur before the onset of labor (e.g., induction of labor, antepartum stillbirth) [34, 41, 42, 43••, 44, 45], and is also intuitive: The population at risk for an antepartum stillbirth at or after a given gestation (say, 37 weeks) is not deliveries occurring at 37 weeks’ gestation; gestations that continue to 38 weeks, 39 weeks, and beyond were also at risk for this outcome at 37 weeks. Therefore, this approach uses the population of gestations reaching 37 weeks as the denominator for this outcome, regardless of whether birth occurred at 37 weeks or later. Extension of this formulation to neonatal and childhood conditions likely to have a prenatal origin has been proposed [38,39,40, 46]. However, the application of this approach to postnatal outcomes is controversial and has been shown to result in misleading estimates [43••, 47].
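To make the difference between the two denominators concrete, the following is a minimal sketch using invented weekly counts; the numbers, the dictionary layout, and the function names are assumptions made for illustration and do not come from any study. For simplicity, it also assumes that each antepartum stillbirth is counted in the week in which it is delivered.

```python
# Hypothetical counts, invented purely for illustration.
# deliveries[w]: all births (live or still) occurring at completed week w.
# stillbirths[w]: antepartum stillbirths delivered at week w.
deliveries = {37: 100, 38: 200, 39: 300, 40: 300, 41: 100}
stillbirths = {37: 2, 38: 2, 39: 2, 40: 2, 41: 2}


def ongoing_pregnancies(week, deliveries):
    """Gestations reaching `week`: everything delivered at or after that week."""
    return sum(n for w, n in deliveries.items() if w >= week)


def fetuses_at_risk_risk(week, deliveries, stillbirths):
    """Stillbirths at `week` divided by all gestations reaching `week`."""
    return stillbirths[week] / ongoing_pregnancies(week, deliveries)


def births_based_risk(week, deliveries, stillbirths):
    """Stillbirths at `week` divided only by deliveries occurring at `week`."""
    return stillbirths[week] / deliveries[week]


# Week, gestations at risk, and the two competing risk estimates.
for week in sorted(deliveries):
    print(
        week,
        ongoing_pregnancies(week, deliveries),
        round(fetuses_at_risk_risk(week, deliveries, stillbirths), 4),
        round(births_based_risk(week, deliveries, stillbirths), 4),
    )
```

With these made-up counts, the births-based denominator at 37 weeks ignores the 900 gestations that continued past 37 weeks yet were still at risk during that week, so it yields a much larger apparent risk than the ongoing pregnancies denominator.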

Although it prevents conditioning on gestational age after the beginning of time at risk, use of the ongoing pregnancies denominator for postnatal outcomes promptly runs into another, related problem: It results in the inclusion of denominator units that have not yet reached the beginning of the time at risk [43••, 44]. Studying the role of pre-conception and prenatal factors in relation to endpoints that can only be diagnosed among those having reached a specific milestone (such as live birth, or a given childhood age) represents a common challenge in perinatal epidemiology. In the example above of term antepartum stillbirth risk (i.e., at 37 weeks’ gestation or later), the infants who are born preterm (whether liveborn or stillborn) are not counted, nor are gestations that ended in early pregnancy loss. This concern about populations missing due to not reaching the beginning of the time at risk (i.e., left truncation [48, 49]) is also increasingly being discussed (and, when applied to neonatal outcomes occurring after live birth, has been termed “live birth bias” [25•, 26]).

Say that researchers are analyzing the effect of a preconception environmental exposure on the risk of a childhood outcome like autism spectrum disorders (ASD). Suppose that the exposure, like other prenatal and preconception causes of ASD (e.g., advanced parental age, ambient air pollution) [50,51,52], also increases risk of miscarriage and stillbirth [53,54,55]. The outcome of ASD, by definition, may only occur and be measured in conceptions that survive to viability and which result in a live birth that reaches a given age (approximately 3 years old, for ASD). Again, there is selection at several steps in the reproductive process (Supplemental Fig. S2). Given that both our environmental exposure of interest and other factors (referred to generally as “Exposure B” in Fig. S2, Panel A) affect the risk of miscarriage, of stillbirth, and of ASD, conditioning on survival beyond each of these steps opens unblocked backdoor paths between exposure and outcome, resulting in bias (Fig. S2, Panel B). One approach that has been proposed is to control for these common causes of the selection variables (e.g., fecundity, fetal loss) and the subsequent outcome (e.g., ASD), in an attempt to close biasing pathways opened up by selection [25•, 56]. However, the meaning of such an estimate, and of the counterfactual it represents, is unclear [47, 57••, 58].
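The selection mechanism described above can be sketched numerically. The following toy calculation is an illustration under assumed values, not an analysis of real data: every probability is invented, `U` stands in for an unmeasured factor such as “Exposure B,” and the exposure is deliberately given no effect on the outcome so that any association among survivors is attributable to selection alone.

```python
# All probabilities are hypothetical and chosen only to illustrate the structure.
P_U = {0: 0.5, 1: 0.5}  # unmeasured factor, independent of the exposure A


def p_survive(a, u):
    """Probability that a conception survives to the age at which the outcome is assessable.
    Both the exposure (a) and the unmeasured factor (u) lower survival."""
    return 0.9 - 0.3 * a - 0.3 * u


def p_outcome(u):
    """Outcome risk among survivors. It depends on u only: the exposure has NO causal effect."""
    return 0.05 + 0.15 * u


def risk_among_survivors(a):
    """P(outcome | exposure = a, survived), by exact enumeration over u."""
    surviving = sum(P_U[u] * p_survive(a, u) for u in P_U)
    cases = sum(P_U[u] * p_survive(a, u) * p_outcome(u) for u in P_U)
    return cases / surviving


risk_exposed = risk_among_survivors(1)
risk_unexposed = risk_among_survivors(0)
survivor_rr = risk_exposed / risk_unexposed

print(round(risk_exposed, 3), round(risk_unexposed, 3), round(survivor_rr, 3))
```

In this sketch the exposure appears protective among survivors even though it has no effect on the outcome, because exposed survivors are depleted of the high-risk unmeasured factor. The size (and even the presence) of the induced association depends on the assumed survival model; if the two factors acted multiplicatively on survival here, no association would be induced in this particular setup.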

A foundational epidemiologic concept is that of “population at risk” for a given outcome [59, 60••]. Only by rigorously defining, identifying, and sampling the population at risk of the outcome can we validly estimate the effect of an exposure on that outcome. Specifically, all members of the denominator must be able to become members of the numerator (i.e., be at risk of the outcome during follow-up). The following thought experiment shows what can happen when we fail to meet this basic criterion.

A Thought Experiment Demonstrating the Utility of Conditioning on Survival

Let us consider a large double-blind randomized controlled trial to test a treatment, to be initiated before conception, aimed at preventing autism spectrum disorders (ASD). Given that ASD are rare, but have a relatively high risk of recurrence [61, 62], researchers focus on women who have had a first child diagnosed with ASD and are planning a second child. In total, 4000 women are recruited, half of whom are randomized to treatment and half to placebo. Participants attempt conception for up to 12 cycles and collect weekly urine samples to detect implantation. Taking full advantage of the imaginary nature of this study, we assume full compliance and no dropouts. Even though this is an experimental pre-conception cohort with complete follow-up, providing an estimate based on all randomized women is not necessarily the best option. The treatment may affect any of the steps prior to a child surviving to age 3 (conception, early pregnancy loss, fetal survival, and childhood survival), when the outcome can be measured. The imaginary results of this trial are summarized in Table 1.

Table 1 Results of a hypothetical randomized controlled trial of a treatment preventing autism spectrum disorders (ASD) among women with a prior child with ASD

The last column shows the relative risk (RR) calculated based on the number of ASD cases divided by the denominator at each step. When all randomized women are considered, as customary in most trials, the treatment appears to reduce the probability of ASD by 36% (i.e., RR = 0.64). Yet, if a pharmaceutical company advertised this figure to demonstrate the effectiveness of their treatment, most would consider it misleading. Half of the drug’s apparent effect is due to attrition prior to the time when diagnosis of ASD is even possible: Women in the treatment arm have a 21% lower probability of having a child who survives to age 3, due mostly to decreases in conceptions. Given survival to age 3, the treatment reduces the probability of ASD by 18% (RR = 0.82). This is a much smaller protective effect than the intention-to-treat analysis implied. The overall RR of 0.64, despite being an “unbiased estimate” of the overall treatment effect, is neither very useful nor very transparent. Indeed, if the overall effect were the desired one, an even more impressive result could be achieved by giving women long-acting reversible contraception or sterilization.
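The relationship among these three figures can be checked with a short calculation. The sketch below uses only the rounded values quoted in the text; the variable names are ours, and the log-scale apportioning is one of several possible ways to express “how much” of the overall effect is due to attrition.

```python
import math

# Rounded figures quoted in the text (treatment versus placebo).
rr_survival = 1 - 0.21        # 21% lower probability of a child surviving to age 3
rr_asd_given_survival = 0.82  # RR of ASD among children surviving to age 3
rr_overall_reported = 0.64    # RR of ASD among all randomized women

# In each arm, P(ASD) = P(child survives to age 3) * P(ASD | survives to age 3),
# so the overall RR is the product of the two component RRs.
rr_overall_implied = rr_survival * rr_asd_given_survival

# Share of the overall effect, on the log scale, attributable to reduced survival.
share_from_attrition = math.log(rr_survival) / math.log(rr_overall_implied)

print(round(rr_overall_implied, 2), rr_overall_reported, round(share_from_attrition, 2))
```

The implied product lies close to the reported overall RR (the small gap reflects rounding of the inputs), and roughly half of the overall log-RR comes from the survival component, in line with the statement above.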

It could be argued that the more appropriate estimate is not the relative risk of having a child with ASD but, rather, the relative risk of having a child without ASD, as that is the desired endpoint. This is expressed by a risk ratio of (710/2000)/(881/2000) = 0.806. From this perspective, the placebo appears to be superior: Women in the treatment arm have a 19% lower probability of having a child without ASD (equivalently, women in the placebo arm have a 24% higher probability). However, this estimate answers a different question and obscures the fact that, given that a child survives to age 3, the treatment does reduce ASD risk.
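As a quick check of this ratio, the sketch below recomputes it from the counts given in the text (710 and 881 women, out of 2000 per arm, having a child without ASD). The variable names are ours, and exact fractions are used only to avoid rounding surprises.

```python
from fractions import Fraction

randomized_per_arm = 2000
no_asd_treatment = 710  # women in the treatment arm with a child without ASD
no_asd_placebo = 881    # women in the placebo arm with a child without ASD

# Risk ratio for the "desired endpoint", treatment relative to placebo.
rr_no_asd = Fraction(no_asd_treatment, randomized_per_arm) / Fraction(
    no_asd_placebo, randomized_per_arm
)

# The same comparison read in the other direction, placebo relative to treatment.
placebo_vs_treatment = 1 / rr_no_asd

print(round(float(rr_no_asd), 3), round(float(placebo_vs_treatment), 3))
```

The percentage difference depends on the direction in which the ratio is read: a ratio of about 0.81 for treatment versus placebo corresponds to a ratio of about 1.24 for placebo versus treatment.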

This (admittedly artificial) example highlights the difficulty of using a denominator that includes units not at risk, if the exposure differentially affects the probability of reaching the at-risk stage. If (continuing our hypothetical scenario) a brain lesion characteristic of ASD could be detected by ultrasound from week 20 of gestation, estimating the risk of this outcome among all pregnancies surviving to 20 weeks would be appropriate. However, given current clinical capabilities, measuring the risk of actual ASD only among survivors to age 3 is arguably more relevant from a clinical perspective.

Epidemiologists often condition on post-exposure events without agonizing over it. For example, when examining the risk of infertility in women exposed to maternal smoking in utero, not only do we condition on live birth but also on survival to sexual maturity, despite evidence that maternal smoking affects the probability of both these events [63, 64]. Yet, aside from the extreme difficulty of reconstructing a posteriori the original pregnancy cohort, it is an incontrovertible (if cynical) fact that infertility is not a concern for those who have died. Furthermore, using the entire cohort would attenuate the effect of prenatal exposure to maternal smoking by including in the study population units that are “prisoners” of the denominator as they cannot experience the outcome.
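The attenuation caused by “prisoners” of the denominator can be shown with a small numerical sketch. All counts below are hypothetical and were chosen only so that the exposure both reduces survival to sexual maturity and doubles the risk of infertility among those who reach it.

```python
# Hypothetical counts, invented for illustration only.
cohort = {
    "exposed": {"conceptions": 1000, "reach_maturity": 700, "infertile": 140},
    "unexposed": {"conceptions": 1000, "reach_maturity": 800, "infertile": 80},
}


def risk(group, denominator):
    """Infertility risk in `group`, using the chosen denominator key."""
    return cohort[group]["infertile"] / cohort[group][denominator]


# Denominator restricted to those who can experience the outcome.
rr_at_risk = risk("exposed", "reach_maturity") / risk("unexposed", "reach_maturity")

# Denominator that also counts conceptions that never reached sexual maturity.
rr_whole_cohort = risk("exposed", "conceptions") / risk("unexposed", "conceptions")

print(round(rr_at_risk, 2), round(rr_whole_cohort, 2))
```

Because the exposed group loses more members before they can become infertile, using all conceptions as the denominator pulls the risk ratio toward the null in this sketch (from 2.0 to 1.75).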

It is worth noting that, in a study such as the above, the study population is often restricted to pregnancy planners (e.g., [65,66,67]), which adds a further—and more controversial—layer of conditioning, if children of smokers were less likely to use contraception in a consistent manner (as has been reported for smokers [68]). Unlike the unborn conceptuses and the children who did not reach sexual maturity, those who are not included in a study because they had conceived by accident had a risk of infertility greater than 0, and their exclusion could lead to overestimating the effect of prenatal smoking on fertility.

While these examples effectively show the potential problems of using denominators that include units not at risk of the outcome, the fact that exposures can differentially affect competing events should, at a minimum, be discussed. This is particularly true when presenting comparisons over time or across populations, as the probability of surviving to given milestones may differ over time or between populations (e.g., survival of very preterm infants).

What Was the Question Again? Incorporating Stakeholder Perspectives into Formulation of Causal Questions

It bears repeating that formulating a good research question is fundamental for sound science. We argue that this principle be applied in perinatal epidemiology, despite the various factors that play into an investigator’s scientific and analytical choices (e.g., availability of a given dataset that may drive questions, responsiveness to funding agencies’ current priorities, the temptation to use “fashionable” methods). Without discounting the importance of incremental knowledge gains whose public health or clinical applications may only become apparent later, we nonetheless advocate for selecting questions whose results will drive further scientific discovery, policy, and practice (e.g., those that may be translated into public health interventions or policies, and which individual people care about [69, 70]). The preceding thought experiment demonstrates that the stage at which we define our causal question is critical for both estimation and interpretation of the effects. In the population of pre-pregnancy women, preventing conception itself is extremely effective in preventing a case of an adverse childhood outcome. However, non-outcomes are not equal in the eyes of stakeholders: A non-outcome owing to early pregnancy loss does not have the same meaning as a non-outcome owing to a conception that is carried to term, resulting in a child who is ASD-free.

The question “What is the effect of treatment among 3-year-olds?” imposes selection and does not express the full effects of the exposure on all reproductive processes leading up to the outcome. Furthermore, this selection is likely to result in bias because of many other competing risks throughout the reproductive process (Supplemental Fig. S3). There are various stages at which a censoring-outcome confounder may introduce bias under selection (Fig. S3, Panel A); however, a censoring-outcome confounder need only affect survival to one such stage to introduce this selection bias (as with Confounder C in Fig. S3, Panel B). Although the full effect of treatment is not captured by this question (“What is the effect of treatment among 3-year-olds?”), and bias due to selection is likely, this is nevertheless a question that people care about. In this case, it is likely the question that stakeholders care most about, and it should be prioritized even as we consider approaches to address the selection bias inherent in answering such questions. In these instances, whenever possible, we should also provide estimates of the effect of the exposure on key selection stages (e.g., the probability of clinical pregnancy and live birth).

Choosing a question that is relevant to the people who will use those findings sometimes results in bias, but changing the question to be one that does not incur these biases may make it less relevant. In the thought experiment above, we find the causal question of most interest is, “Compared to placebo, what is the causal effect of the experimental drug on ASD incidence among children at risk for this outcome?” Although this specific outcome and definition of time at risk imposes selection onto the population among whom we estimate the causal effect, these selection factors are motivated by the scientific question. In other words, the censoring processes (illustrated in Figs. 1, S2, and S3) are not nuisance parameters whose consequences we wish to adjust away; rather, they are variables of causal interest that meaningfully impact our scientific question. Specifically, these censoring variables affect how our population at risk is constituted. We see utility in understanding how these censoring processes and competing events may affect our estimated association, but like others [57••], we disagree that the most logical solution is to attempt to adjust away their influence. Doing so is conceptually analogous to redefining the population of interest as all conceptions or all women enrolled in the study. As noted above, this change in focus includes study participants who are never at risk for the outcome and also changes the question to one that patients care less about. Finally, it is not a plausible solution to adjust for all common causes of the outcome (here, ASD) and conception, fetal loss, stillbirth, and infant death. This represents an extremely large number of variables, some of which will likely remain undefined.

Changing the study population changes the causal question, which in turn alters applications of the results, sometimes dramatically. Thinking through, and defining, one’s research question in detail is an often overlooked, yet essential, step of the research process. Questions to address when defining the research question include the following: Who is the target population? How does our sample population differ from this target population? What is the outcome? Who is at risk for it, and over what time? What is the exposure, and how is it temporally related to the outcome? Given exposure and outcome, what are the confounders? How well can we measure each of these variables? What are the likely biases, and how can we adjust for or minimize these in the analysis phase?

In addition to these considerations, we also advocate for addressing the following point in helping guide question formulation: What could be done with these results if our study finds evidence of an association (or if it does not)? All these considerations matter, and the weight given to each depends on the investigator and the specific project. When studying the effects of prenatal or preconception exposures on childhood endpoints, focusing too much attention on one problem (e.g., how do we address bias owing to selection processes in human reproduction?) risks losing sight of other concerns. We argue that fundamental (and commonsense) concerns of outcome definition and analysis (e.g., ensuring that all study participants are at risk for the outcome) should take precedence, as should choosing questions that stakeholders care about—not just reducing bias.

Conclusion

Selection and resulting biases are omnipresent in human reproductive and perinatal processes. In fact, these processes of transition, attrition, and survival are arguably at the core of human reproduction [4•]. They complicate our task of estimating unbiased exposure-outcome associations, but it may not be in our best interest to focus solely (or even mostly) on minimizing such biases. To illustrate why, we present one last thought experiment, the most extreme so far.

Let us consider the ideal study design to study some adult outcome and contrast this with the optimal study design to study birth outcomes. Say we are interested in whether a given dietary pattern affects risk of developing incident hypertension among adults in their 40s and 50s. We can imagine enrolling a large sample of normotensive adults in their 20s, in a perfect universe with easy recruitment methods, excellent participation rates, and high retention throughout our desired study period. We can imagine a world where participants would gladly comply with whichever dietary regimen they are randomly assigned, from a large and diverse list of possibilities. We could follow these people for years or decades, and track their blood pressure trajectories, hypertension incidence, and a host of other risk factors and outcomes for good measure.

Now, picture conducting an analogous study to estimate the effects of various dietary regimens before and during pregnancy on a childhood outcome (say obesity, or ASD again). Even with the low administrative burdens, easy enrollment and retention, and a sample with high compliance, this task is considerably more difficult by comparison. In our hypertension study, we had resources to track the few losses to follow-up. The rate of mortality was not very high among our age group, so few participants were censored due to death. In contrast, in our preconception cohort, no matter how well funded or beautifully designed our study is, we can count on at least 25% of the original conception population at risk for childhood outcomes being lost, many of them before we can even enumerate them or know of their existence. Worse still, the losses may be differential due to the exposure. It is difficult to draw a parallel to the adult blood pressure study, but perhaps we could imagine a cataclysmic disaster or an alien invasion that removes a quarter of the population from the adult study, without our being able to track them (or even to know precisely how many were taken). In the adult study, we at least were aware of the existence of all study participants, but not so in our preconception cohort. In preconception cohorts, all women are enumerated and known, but the internal and hidden nature of conception and early embryonic development means that a large share of our potential at-risk population cannot even be detected. Are women not able to get pregnant? Losing conceptuses early? Losing embryos later but still before the pregnancy is recognized? Again, it is difficult to imagine how we could explain this challenge to our colleagues in adult cardiovascular epidemiology: Perhaps there is an impenetrable, opaque force-field that prevents some unknown proportion of the participants in their study from being enumerated, or even having their existence known to the study team. It is even harder to devise an analogy for infertility, yet another selection factor that can be affected by our exposure of interest (like early pregnancy loss), particularly since some infertile couples will never know their status if they never try to conceive.

There may be more apt analogies to describe the challenges facing perinatal epidemiologists in assessing causal effects amidst the dynamic, hidden, and interconnected populations of pregnancy, birth, and childhood, but we believe that no realistic ones can fully capture the dynamics of the many selection processes we have focused on here. What we propose is that, rather than trying to combat the alien invaders to recapture lost study participants, or to see through opaque and impenetrable force-fields, we acknowledge the challenges facing us and use our epidemiologic tools to understand them as best as we can. Thus, adjusting away selection may not always be possible in perinatal epidemiology, but we should remain vigilant in seeking to understand these processes and how they affect our results. We should also be mindful to formulate analytical approaches that conform to the foundational tenets of epidemiology (e.g., populations at risk), and questions that matter to stakeholders. When this process takes us down the path of confronting bias owing to selection processes and competing risks, we should do our best to understand and address these biases, while still letting our question drive the analytical approach.