1 Introduction

1.1 Overview

Causal analysis is concerned with mitigating the effect of confounding variables on inferences regarding the exposure–disease relationship. With binary exposures, propensity scores capturing the association between confounders and the exposure of interest can be used for matching, stratification, or regression adjustment [1]. Inverse probability of exposure weighting offers another framework for the estimation of causal effects. For continuous exposure variables, generalized propensity scores have been developed [2, 3].

The complex etiology of many chronic conditions motivates joint consideration of multiple exposure variables and their possible interacting effects [4]. Examples include the study of the effects of smoking and alcohol consumption on pancreatic cancer [5], the interacting effects of smoking and drinking with dietary intake of antioxidants on oral, pharyngeal, or laryngeal squamous cell carcinoma [6], and environmental studies of several different air pollutants and their effects on lung cancer risk [7]. When the joint effects of multiple exposures are under consideration, multivariate propensity scores can play a useful role. Vegetabile et al [8] describe a joint model for multivariate binary exposures through the specification of a model for a latent multidimensional Gaussian variable. Williams and Crespi [9] develop a generalized propensity score and use it to construct inverse multivariate density-weighted estimating functions. Chen and Zhou [10] propose the use of entropy balancing for multiple continuous treatments as a framework for causal analysis; this approach is best suited for settings in which the dimensions of the exposure and confounding variables are modest.

In settings with a single toxin, interest may lie in the causal effect of different exposure patterns defined in terms of the frequency of the exposure, the dose of the exposure on exposure occasions, and their interaction. Di Credico et al [11], for example, study the effect of the duration and intensity of smoking on the risk of head and neck cancers. This is the setting of our motivating study assessing the effect of maternal drinking patterns on childhood cognition. Here, even though previous studies show that high levels of prenatal alcohol exposure (PAE) can cause serious developmental problems in children [12,13,14,15], the impact of low levels of exposure is not well understood. Still less is known about the impact of “dose rate”, which refers to the possibility that infants exposed to the same total volume of alcohol over the course of a pregnancy may have very different cognitive deficits depending on whether the mother drinks small amounts relatively frequently, or whether she drinks large amounts but less frequently (chronic versus binge drinking). This motivates the development and study of methods to better understand the possibly complex causal effects of PAE on cognitive outcomes.

When considering confounding in the causal analysis of multiple exposures, the effect of confounders is usually modeled on each exposure variable marginally, with dependence between the exposures accommodated through joint models; the conditional associations are typically assumed to be functionally independent of any confounders. We consider a more general framework in which confounders may affect second moments (i.e., the association between different features of exposure to a single toxin). Specifically, we develop methods to study the joint effect of the frequency of drinking and the dose of alcohol per drinking occasion in expectant mothers on the cognitive development of their child [16]; we then apply the methods we develop to the motivating study.

The remainder of the paper is organized as follows. In Sect. 2, we define a response model involving the main effects of two binary exposures and their interaction, along with a generalized propensity score. We formulate a general model for the exposures in which confounders affect the first- and second-order moments so that they explain how the two aspects of exposure vary together. We then describe how this model can be used for regression adjustment and inverse weighting. The importance of modeling the effect of confounders on the association between exposures is investigated in Sect. 3 by evaluating the bias of causal estimands from regression adjustment and inverse probability of exposure weighting when naive propensity scores deal only with the marginal effects of confounders. In Sect. 4, we consider the case of continuous exposures, including semi-continuous exposures, and carry out an analogous investigation for analyses based on regression adjustment and inverse density weighting for estimating causal effects. A two-stage approach for estimating the causal effects of semi-continuous exposures is proposed in Sect. 5. An application to the Detroit Cohort Study [12] is given in Sect. 6, and concluding remarks and topics for future research are given in Sect. 7. We focus on the case of a bivariate exposure, but as suggested by the notation in Sect. 2.1, the methods can be naturally generalized to accommodate higher-dimensional exposures.

2 Causal Analysis with Multiple Exposures

2.1 Specification of the Full Model

Let Y denote a continuous response variable of interest, \({\varvec{A}}=(A_{1}, \dots , A_{p})^{\prime}\) a \(p \times 1\) vector of exposure variables with \(p \ge 2\), and \({\varvec{B}}=(A_{1}A_{2}, \ldots , A_{p-1}A_{p})^{\prime}\) a \(q\times 1\) vector of all pairwise products where \(q=p(p-1)/2\). Higher-order interaction terms can be considered, but here we focus on two-way interactions and so let \({\varvec{X}} = (1, {\varvec{A}}', {\varvec{B}}')'\) contain an intercept along with the exposure terms of interest. Furthermore, let \({\varvec{Z}}=(Z_1, \ldots , Z_r)^{\prime}\) be an \(r \times 1\) vector of confounding variables obscuring the effects of \({{\varvec{X}}}\) on the response Y. The data-generating model for the response is taken to have the form

$$\begin{aligned} Y = {{\varvec{X}}}'\varvec{\theta } +{\varvec{Z}}'\varvec{\zeta } + \epsilon \;, \end{aligned}$$
(2.1)

where \(\varvec{\theta } = ( \varvec{\alpha }', \varvec{\beta }')'\) with \(\varvec{\alpha }=(\alpha _0, \alpha _1,\dots ,\alpha _p)'\) and \(\varvec{\beta }=(\beta _1,\ldots ,\beta _q)'\), \(\varvec{\zeta }=(\zeta _1,\ldots ,\zeta _r)'\), and \(\epsilon\) is a mean zero error term with \({\varvec{X}}, {\varvec{Z}} \perp \epsilon\). Let \({\varvec{A}}_{(-j)} = (A_1,\ldots ,A_{j-1}, A_{j+1},\ldots , A_p)'\) be the vector of all exposure variables excluding \(A_j\), let \({\varvec{B}}_{(-j)}\) be the vector of all products of \(A_j\) with the \(p-1\) other exposures, and let \(\varvec{\beta }_{(-j)}\) be the corresponding vector of coefficients. The causal effect of a one unit change in \(A_j\) from \(a_{j\circ }\) to \(a_{j\circ } +1\) given \({\varvec{A}}_{(-j)}= {\varvec{a}}_{(-j)}\) and \({\varvec{Z}}\) is \(\alpha _j + {\varvec{a}}'_{(-j)} \varvec{\beta }_{(-j)}\). This model therefore enables one to study the causal interacting effects of the two exposures.

When \(\dim \{{\varvec{Z}}\}=r\) is large or one is concerned about getting the functional form of the effects of confounders correct in the response model, regression adjustment on the propensity score mitigates the effects of confounders. Let \(f_{a \mid z}({\varvec{a}} \mid {\varvec{z}})\) be the conditional joint probability distribution or mass function for multivariate continuous or discrete exposures given confounders. Imai and Van Dyk [17] defined the generalized propensity score (GPS) as \(G({\varvec{A}}, {\varvec{Z}})=f_{a \mid z}({\varvec{A}} \mid {\varvec{Z}})\) so that \({\varvec{Z}} \perp I({\varvec{A}}={\varvec{a}}) \mid G({\varvec{A}}, {\varvec{Z}})=g\). Additionally, they stated that if there is a finite-dimensional quantity \({\varvec{S}}({\varvec{Z}})\) such that \(G({\varvec{A}}, {\varvec{Z}})\) depends on \({\varvec{Z}}\) only through \({\varvec{S}}({\varvec{Z}})\), then matching or stratification on \({\varvec{S}}({\varvec{Z}})={\varvec{s}}\) is equivalent to matching or stratification on \(G({\varvec{A}}, {\varvec{Z}})\). For a single Bernoulli exposure variable, the generalized propensity score is the conventional propensity score, namely the conditional probability of being exposed given confounders (i.e., \(P(A_1=1 \mid {\varvec{Z}})\)), whereas for a scalar continuous exposure it is the conditional density of the exposure \(f_{a_1 \mid z}(A_1 \mid {\varvec{Z}})\); based on Assumption 3 of [17], it will often be sufficient to condition on a linear function of confounders such as \({\mathbb {E}}(A_1 \mid {\varvec{Z}})\). For multivariate exposures, the generalized propensity score is the multivariate joint distribution of exposures given confounders, \(G({\varvec{A}}, {\varvec{Z}})=f_{a \mid z}({\varvec{A}} \mid {\varvec{Z}})\). When the dependence of \({\varvec{A}}\) on \({\varvec{Z}}\) in \(G({\varvec{A}},{\varvec{Z}})\) can be expressed through a low-dimensional function \({\varvec{S}}({\varvec{Z}})\), this function \({\varvec{S}}({\varvec{Z}})\) can be considered the effective propensity score; we consider this in what follows.

2.2 Propensity Scores for Two Binary Exposures

2.2.1 Regression Adjustment

Here, we aim to determine the minimal requirements for a potential vector-valued propensity score \({{\varvec{S}}}({\varvec{Z}})\) to ensure valid causal inference regarding the exposure effects conditional on \({{\varvec{S}}}({\varvec{Z}})\). We assume that data are available from independent individual processes and consider properties of a single process in much of what follows. From (2.1), note

$$\begin{aligned} \mathbb {E}_{Y\mid {\varvec{X}}, {{\varvec{S}}}}(Y) = \mathbb {E}_{{\varvec{Z}}\mid {\varvec{X}},{{\varvec{S}}}}\{ \mathbb {E}_{Y\mid {\varvec{X}},{{\varvec{S}}},{\varvec{Z}}} (Y) \} = {{\varvec{X}}}'\varvec{\theta } + \mathbb {E}_{{\varvec{Z}}\mid {\varvec{X}},{{\varvec{S}}}}({\varvec{Z}}')\varvec{\zeta }\;. \end{aligned}$$
(2.2)

Using \(P(\cdot )\) to informally denote a joint probability mass function or density, we write

$$\begin{aligned} P({\varvec{Z}}\mid {\varvec{X}},{{\varvec{S}}}) = \displaystyle \frac{P({\varvec{X}}, {\varvec{Z}}, {{\varvec{S}}})}{\int _{{\varvec{Z}}} P({\varvec{X}}, {\varvec{Z}}, {{\varvec{S}}})d{\varvec{z}}} = \displaystyle \frac{P({\varvec{X}}\mid {\varvec{Z}}, {{\varvec{S}}})P({\varvec{Z}},{{\varvec{S}}})}{\int _{{\varvec{Z}}} P({\varvec{X}}\mid {\varvec{Z}}, {{\varvec{S}}}) P({\varvec{Z}}, {{\varvec{S}}})d{\varvec{z}}} \;, \end{aligned}$$
(2.3)

so if \({{\varvec{S}}}({\varvec{Z}})\) contains enough information to construct the conditional distribution for the exposure variables given \({\varvec{Z}}\) (i.e., if \({\varvec{X}} \perp {\varvec{Z}} \mid {\varvec{S}}\)), then following Assumption 3 of [17], \({\varvec{S}}\) serves as a valid propensity score. In this case, the final term in (2.2) is independent of \({\varvec{X}}\) and a working regression model of the form

$$\begin{aligned} {\mathbb {E}}(Y \mid {\varvec{X}}, {\varvec{S}} ) = {\varvec{X}}'\varvec{\lambda } + {{\varvec{S}}}' \varvec{\eta } \end{aligned}$$
(2.4)

will give an estimator \(\hat{\varvec{\lambda }}\) which is consistent for the causal parameter \(\varvec{\theta }\).
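
As a concrete illustration, the adjustment in (2.4) amounts to an ordinary least squares fit of Y on the exposure design matrix and the propensity score components. A minimal sketch in Python is given below; the function and variable names are our own, and the components of \({\varvec{S}}\) are assumed to have been estimated already.

import numpy as np

def gps_regression_adjustment(y, X, S):
    """Least-squares fit of the working model (2.4): regress Y on the
    exposure design matrix X (intercept, exposures, and their pairwise
    products) and the propensity score components S.  The coefficients
    on the columns of X estimate the causal parameters theta."""
    D = np.column_stack([X, S])
    coef, *_ = np.linalg.lstsq(D, y, rcond=None)
    return coef[:X.shape[1]]   # lambda-hat, consistent for theta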

Here, we give an illustrative example involving a bivariate binary exposure \({\varvec{A}}=(A_1, A_2)'\) where \(A_k\) indicates exposure status with respect to exposure k, \(k=1, 2\). Let \({\varvec{Z}}\) be the set of all potential confounders. Suppose the status for exposure k can be characterized by the marginal model \(P(A_k\mid {\varvec{Z}}) = P(A_k\mid {\varvec{Z}}_k)\), where \({\varvec{Z}}_k\) is a subvector of \({\varvec{Z}}\), \(k=1, 2\). We may assume, for example, that \(\text {logit}P(A_k=1\mid {\varvec{Z}}_k)={{\varvec{Z}}}'_k\varvec{\gamma }_k\). While such marginal models are typically used when considering scalar exposures, they may not be sufficient to fully characterize the joint distribution of exposures if some confounding variables exert a second-order effect. Let \({\varvec{Z}}_3\) be another set of confounders characterizing the dependence between \(A_1\) and \(A_2\) given \({\varvec{Z}}_1\) and \({\varvec{Z}}_2\) through a dependence regression model

$$\begin{aligned} \log \text {OR}(A_1, A_2 \mid {\varvec{Z}}) ={{\varvec{Z}}}'_3\varvec{\gamma }_{3}\; , \end{aligned}$$
(2.5)

where

$$\begin{aligned} \text {OR}(A_1, A_2 \mid {\varvec{Z}})= \frac{P(A_{1} = 1, A_{2} = 1 \mid {\varvec{Z}}) / P(A_{1} = 0, A_{2} = 1 \mid {\varvec{Z}})}{P(A_{1} = 1, A_{2} = 0 \mid {\varvec{Z}}) / P(A_{1} = 0, A_{2} = 0 \mid {\varvec{Z}})}. \end{aligned}$$

Based on the marginal models for \(A_k \mid {\varvec{Z}}_k\), \(k=1, 2\), and (2.5),

$$\begin{aligned} P(A_{1}=1, A_{2}= 1 \mid {\varvec{Z}}) = \left\{ \begin{array}{lcl} \frac{a-\left[ a^2-4 \text {OR}\left( \text {OR}-1\right) P(A_1=1\mid {\varvec{Z}}_1) P(A_2=1\mid {\varvec{Z}}_2)\right] ^{1 / 2}}{2\left( \text {OR}-1\right) }\;, &{}&{} \text {OR} \ne 1 \\ P(A_1=1\mid {\varvec{Z}}_1)P(A_2=1\mid {\varvec{Z}}_2)\;, &{}&{} \text {OR} = 1\\ \end{array} \right. \end{aligned}$$

where \(a=1-(1-\text {OR})\sum _{k=1}^2P(A_k=1\mid {\varvec{Z}}_k)\) [18] and we let \(\text {OR}\) denote \(\text {OR}(A_1, A_2 \mid {\varvec{Z}})\). The generalized propensity score is obtained by noting that \(P(A_{1} = 1, A_{2} = 0 \mid {\varvec{Z}})= P(A_{1} = 1 \mid {\varvec{Z}}_1)-P({\varvec{A}} = {{\textbf {1}}} \mid {\varvec{Z}})\) and \(P(A_{1} = 0, A_{2} = 1 \mid {\varvec{Z}})= P(A_{2} = 1 \mid {\varvec{Z}}_2)-P({\varvec{A}} = {{\textbf {1}}} \mid {\varvec{Z}})\), from which \(P({\varvec{A}} = {{\textbf {0}}} \mid {\varvec{Z}})\) can be obtained. A sufficient propensity score would be \({\varvec{S}}({\varvec{Z}}) = (S_1({\varvec{Z}}_1), S_2({\varvec{Z}}_2), S_3({\varvec{Z}}_3))'\) where \(S_k({\varvec{Z}}_k) = P(A_k=1\mid {\varvec{Z}}_k)\), \(k=1, 2\), and \(S_3({\varvec{Z}}) = P({\varvec{A}} = {{\textbf {1}}}\mid {\varvec{Z}})\); alternatively, one could define \(S_k({\varvec{Z}}_k)={{\varvec{Z}}}'_k \varvec{\gamma }_k\), \(k=1, 2, 3\).
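
The construction of the joint exposure probabilities from the two marginal models and the odds ratio model (2.5) can be sketched as follows; this is a minimal Python illustration with function names of our own choosing, written for a single confounder configuration.

import numpy as np

def joint_prob_11(p1, p2, odds_ratio):
    """P(A1=1, A2=1 | Z) from the marginals p1 = P(A1=1 | Z1),
    p2 = P(A2=1 | Z2) and the conditional odds ratio OR(A1, A2 | Z)."""
    if np.isclose(odds_ratio, 1.0):
        return p1 * p2
    a = 1.0 + (odds_ratio - 1.0) * (p1 + p2)
    disc = a ** 2 - 4.0 * odds_ratio * (odds_ratio - 1.0) * p1 * p2
    return (a - np.sqrt(disc)) / (2.0 * (odds_ratio - 1.0))

def joint_pmf(p1, p2, odds_ratio):
    """Full 2 x 2 joint probability mass function of (A1, A2) given Z."""
    p11 = joint_prob_11(p1, p2, odds_ratio)
    return {(1, 1): p11,
            (1, 0): p1 - p11,
            (0, 1): p2 - p11,
            (0, 0): 1.0 - p1 - p2 + p11}

The four cell probabilities returned by joint_pmf are exactly the quantities needed to form the generalized propensity score described above.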

Here, \({\varvec{Z}}_1\) and \({\varvec{Z}}_2\) are sets of confounders influencing the marginal causal effects of \(A_1\) and \(A_2\) on Y, and conditioning on \((S_1,S_2)\) is sufficient if \(A_1\) does not interact with \(A_2\) (i.e., if \(\beta _1=0\)). If \(\beta _1 \ne 0\), then it is necessary to model the association between \(A_1\) and \(A_2\) given \(({\varvec{Z}}_1', {\varvec{Z}}_2')'\) as a function of a possibly different set of confounders \({\varvec{Z}}_3\), as suggested in (2.5). We investigate the implications of failing to do this adequately in Sect. 3.

2.2.2 Inverse Probability Weighting

Here, we consider the use of a generalized propensity score as a weight in an inverse probability weighting analysis for two binary exposures. Let \(P({\varvec{A}} \mid {\varvec{Z}}; \varvec{\gamma })\) denote the conditional joint probability of \({\varvec{A}}=(A_1, A_2)'\) given confounders, under the positivity condition [1] that \(P({\varvec{A}}={\varvec{a}} \mid {\varvec{Z}}; \varvec{\gamma })>0\) for all possible values of \({\varvec{a}}\) and \({\varvec{Z}}\). The inverse “joint probability”-weighted estimating function [19] is expressed as

$$\begin{aligned} U_1(\varvec{\varphi };\varvec{\gamma })=\frac{I\left( {\varvec{A}}={\varvec{a}}\right) }{P\left( {\varvec{A}}={\varvec{a}} \mid {\varvec{Z}};\varvec{\gamma } \right) }D({\varvec{X}})\left[ Y-{\mathbb {E}}^R(Y \mid {\varvec{X}};\varvec{\varphi })\right] , \end{aligned}$$
(2.6)

where \({\mathbb {E}}^R(\cdot )\) represents an expectation in the setting where \({\varvec{X}} \perp {\varvec{Z}}\) and \({\mathbb {E}}^R(Y \mid {\varvec{X}};\varvec{\varphi })={{\varvec{X}}}'\varvec{\varphi }\) defines a marginal structural model. Here,

$$\begin{aligned} D({\varvec{X}}) = \frac{\partial }{\partial \varvec{\varphi }} {\mathbb {E}}^R(Y \mid {\varvec{X}};\varvec{\varphi })= (1,A_1,A_2,A_1A_2)'. \end{aligned}$$

To show that (2.6) is an unbiased estimating function, we take the expectation of \(U_1(\varvec{\varphi };\varvec{\gamma })\) with respect to \(Y \mid {\varvec{X}}, {\varvec{Z}}\) to obtain

$$\begin{aligned} \frac{I\left( {\varvec{A}}={\varvec{a}}\right) }{P\left( {\varvec{A}}={\varvec{a}} \mid {\varvec{Z}};\varvec{\gamma }\right) }D({\varvec{X}})\left[ {\mathbb {E}}(Y \mid {\varvec{X}}, {\varvec{Z}};\varPsi )-{\mathbb {E}}^R(Y \mid {\varvec{X}};\varvec{\varphi })\right] , \end{aligned}$$
(2.7)

where \(\varPsi =(\varvec{\theta }', \varvec{\zeta }')'\). The expectation of (2.7) with respect to \({\varvec{Z}} \mid {\varvec{A}}={\varvec{a}}\) gives

$$\begin{aligned} \sum _{{\varvec{Z}}}\frac{I\left( {\varvec{A}}={\varvec{a}}\right) }{P\left( {\varvec{A}}={\varvec{a}} \mid {\varvec{Z}};\varvec{\gamma } \right) }D({\varvec{X}})\left[ {\mathbb {E}}(Y \mid {\varvec{X}}, {\varvec{Z}};\varPsi )-{\mathbb {E}}^R(Y \mid {\varvec{X}};\varvec{\varphi })\right] P({\varvec{Z}} \mid {\varvec{A}}={\varvec{a}})\;. \end{aligned}$$
(2.8)

Since

$$\begin{aligned} P({\varvec{Z}} \mid {\varvec{A}}={\varvec{a}}) = \frac{P({\varvec{A}}={\varvec{a}} \mid {\varvec{Z}};\varvec{\gamma })P({\varvec{Z}};\varvec{\tau })}{P({\varvec{A}}={\varvec{a}})}, \end{aligned}$$

we can rewrite (2.8) as

$$\begin{aligned} \sum _{{\varvec{Z}}}\frac{I\left( {\varvec{A}}={\varvec{a}}\right) }{P\left( {\varvec{A}}={\varvec{a}} \right) }D({\varvec{X}})\left[ {\mathbb {E}}(Y \mid {\varvec{X}}, {\varvec{Z}};\varPsi )-{\mathbb {E}}^R(Y \mid {\varvec{X}};\varvec{\varphi })\right] P({\varvec{Z}};\varvec{\tau })\;, \end{aligned}$$

which effectively results in the expectation being taken in the setting where \({\varvec{Z}} \perp {\varvec{A}}\), equivalent to how this would be done in a hypothetical randomized trial [19]. Since (2.6) has expectation zero, solving \(U_1(\varvec{\varphi };\varvec{\gamma })=0\) defines an estimator that is consistent for the causal estimand of interest. In practice, \(\varvec{\gamma }\) must be estimated, but provided a \(\sqrt{n}\)-consistent estimator is used in its place, consistency of \(\hat{\varvec{\varphi }}\) is still achieved.
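
Because \(D({\varvec{X}})={\varvec{X}}\) here, solving the sample version of (2.6) reduces to weighted least squares with weights \(1/P({\varvec{A}}_i={\varvec{a}}_i \mid {\varvec{Z}}_i;\hat{\varvec{\gamma }})\). A minimal Python sketch follows; the names are our own, and p_joint is assumed to hold the fitted joint probabilities for the observed exposures.

import numpy as np

def ipw_bivariate_binary(y, a1, a2, p_joint):
    """Solve sum_i w_i X_i (y_i - X_i' phi) = 0 for the marginal structural
    model E^R(Y | X) = phi0 + phi1*A1 + phi2*A2 + phi3*A1*A2, with
    weights w_i = 1 / P(A_i = a_i | Z_i)."""
    X = np.column_stack([np.ones_like(a1, dtype=float), a1, a2, a1 * a2])
    w = 1.0 / p_joint
    XtWX = X.T @ (w[:, None] * X)
    XtWy = X.T @ (w * y)
    return np.linalg.solve(XtWX, XtWy)   # (phi0, phi1, phi2, phi3)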

3 Exposure Dependence Model Misspecification

3.1 Regression Adjustment

3.1.1 General Framework

Here, we derive the limiting bias of the estimators of exposure effects obtained through regression adjustment when the dependence model is misspecified through a) an inappropriate conditional independence assumption, or b) specification of an exchangeable dependence structure functionally independent of confounders; this is defined more formally in what follows.

Consider the following regression model with a generalized propensity score adjustment,

$$\begin{aligned} {\mathbb {E}}(Y \mid {\varvec{X}}, \tilde{{\varvec{S}}} )={\varvec{X}}'\varvec{\varphi }+ \tilde{{\varvec{S}}}'\varvec{\psi } , \end{aligned}$$
(3.1)

where \(\tilde{{\varvec{S}}}=\tilde{{\varvec{S}}}({\varvec{Z}})\) is a vector-valued propensity score which might be insufficient for adjustment under model misspecification, e.g., assuming conditional independence among exposures or using an inadequate dependence model. The contribution to the least squares equation for \(\varvec{\varphi }\) from a single individual based on (3.1) is \({\textbf{U}}(Y\mid {\varvec{X}}, \tilde{{\varvec{S}}};{\varvec{\varphi }}, \varvec{\psi }) = ({\varvec{X}}', \tilde{{\varvec{S}}}')'(Y- {{\varvec{X}}}'{\varvec{\varphi }}-\tilde{{\varvec{S}}}'\varvec{\psi })\). With a sample of n independent individuals yielding data \(\{(Y_i, {\varvec{X}}_i, {\varvec{Z}}_i), i=1,\ldots , n\}\), the least squares estimator of the exposure effect is the solution to \(\sum _{i=1}^n{\textbf{U}}(Y_i \mid {\varvec{X}}_i, \tilde{{\varvec{S}}}_i;{\varvec{\varphi }}, \varvec{\psi })=0\) and is denoted by \(\hat{\varvec{\varphi }}\); this is consistent for the solution to \({\mathbb {E}}\{{\textbf{U}}(Y\mid {\varvec{X}}, \tilde{{\varvec{S}}}; {\varvec{\varphi }}, \varvec{\psi })\}=0\) where the expectation is taken with respect to the full data generation process [20, 21].

Again we consider the bivariate case with \({{\varvec{X}}}=(1, A_1, A_2, A_1A_2)'\) and the corresponding coefficient \(\varvec{\varphi }=(\varphi _0, \varphi _1,\varphi _2,\varphi _3)'\). The explicit form of the limiting value \(\tilde{\varvec{\varphi }}\) of the estimator \(\hat{\varvec{\varphi }}\) of \(\varvec{\theta }\) in (2.1) is

$$\begin{aligned} {\tilde{\varphi }}_1&=\alpha _1+\varvec{\zeta }'\varDelta _{11}+\varvec{\psi }'\varDelta _{21}\;,\end{aligned}$$
(3.2)
$$\begin{aligned} {\tilde{\varphi }}_2&=\alpha _2+\varvec{\zeta }'\varDelta _{12}+\varvec{\psi }'\varDelta _{22}\;, \end{aligned}$$
(3.3)
$$\begin{aligned} {\tilde{\varphi }}_3&=\beta _1+\varvec{\zeta }'\Big (\frac{\Sigma _3}{\sigma _3^2}-\varDelta _{11}\frac{\rho _{13}\sigma _1}{\sigma _3}-\varDelta _{12}\frac{\rho _{23}\sigma _2}{\sigma _3}\Big ) \nonumber \\&\quad -\varvec{\psi }'\Big (\frac{\Lambda _3}{\sigma _3^2}+\varDelta _{21}\frac{\rho _{13}\sigma _1}{\sigma _3}+\varDelta _{22}\frac{\rho _{23}\sigma _2}{\sigma _3}\Big )\;, \end{aligned}$$
(3.4)

where \(\varvec{\zeta }\) and \(\varvec{\psi }\) are given in (2.1) and (3.1),

$$\begin{aligned} \varDelta _{1k}&=\frac{(\rho _{12}-\rho _{13}\rho _{23})(\frac{\sigma _k^2 \Sigma _{3-k}}{\sigma _1\sigma _2}-\frac{\sigma _k }{\sigma _3}\frac{\rho _{13}\rho _{23}}{\rho _{k3}}\Sigma _3)+(\frac{\rho _{13}^2\rho _{23}^2}{\rho _{k3}^2}-1)(\Sigma _k-\frac{\sigma _k }{\sigma _3}\rho _{k3}\Sigma _3)}{\sigma _k^2\{(\rho _{12}-\rho _{13}\rho _{23})^2-(1-\rho _{13}^2)(1-\rho _{23}^2)\}}\;,\\ \varDelta _{2k}&=\frac{(\rho _{12}-\rho _{13}\rho _{23})(\frac{\sigma _k^2 \Lambda _{3-k}}{\sigma _1\sigma _2}-\frac{\sigma _k }{\sigma _3}\frac{\rho _{13}\rho _{23}}{\rho _{k3}}\Lambda _3)+(\frac{\rho _{13}^2\rho _{23}^2}{\rho _{k3}^2}-1)(\Lambda _k-\frac{\sigma _k }{\sigma _3}\rho _{k3}\Lambda _3)}{\sigma _k^2\{(\rho _{12}-\rho _{13}\rho _{23})^2-(1-\rho _{13}^2)(1-\rho _{23}^2)\}}\;, \end{aligned}$$

\(\Sigma _k=\textrm{cov}({\mathbb {E}}(A_k \mid {\varvec{Z}}), {\varvec{Z}})\), \(\Lambda _k=\textrm{cov}({\mathbb {E}}(A_k\mid {\varvec{Z}}), \tilde{{\varvec{S}}})\), \(\Sigma _3=\textrm{cov}({\mathbb {E}}(A_1 A_2 \mid {\varvec{Z}}), {\varvec{Z}})\) , and \(\Lambda _3=\textrm{cov}({\mathbb {E}}(A_1A_2 \mid {\varvec{Z}}),\tilde{{\varvec{S}}} )\). We also have \(\sigma _k^2=\textrm{var}(A_k)\), \(\sigma _3^2=\textrm{var}(A_1A_2)\), \(\rho _{12}=\textrm{corr}(A_1, A_2)\), and \(\rho _{k3}=\textrm{corr}(A_k, A_1A_2)\), \(k=1, 2\); see Online Appendix A for more details.

With two binary exposures, \(\sigma _k^2=p_k(1-p_k)\) and \(\sigma _3^2=p_{12}(1-p_{12})\), where \(p_k=P(A_k=1)\), \(k=1,2\) , and \(p_{12}=P(A_1A_2=1)\). The correlation \(\rho _{12}\) is

$$\begin{aligned} \rho _{12}=\frac{p_{12}-p_1p_2}{\sqrt{p_1(1-p_1)p_2(1-p_2)}} \end{aligned}$$

[22], and

$$\begin{aligned} \rho _{k3}=\frac{p_{12}-p_kp_{12}}{\sqrt{p_k(1-p_k)p_{12}(1-p_{12})}}, \; k=1, 2. \end{aligned}$$

3.1.2 A Misspecified Dependence Model

If exposure variables are treated as conditionally independent given confounders, one might specify \(\tilde{{\varvec{S}}}\) as \(\tilde{{\varvec{S}}}=(S_1({\varvec{Z}}_1), S_2({\varvec{Z}}_2))'\) or \(\tilde{{\varvec{S}}}=(S_1({\varvec{Z}}_1), S_2({\varvec{Z}}_2), S_1({\varvec{Z}}_1)S_2({\varvec{Z}}_2))'\), where \(S_1({\varvec{Z}}_1)\) and \(S_2({\varvec{Z}}_2)\) are the marginal propensity scores. A naive model using regression adjustment might be specified as

$$\begin{aligned} {\mathbb {E}}(Y \mid {\varvec{X}}, {\varvec{S}}) = {{\varvec{X}}}'{\varvec{\varphi }} + \psi_1 S_1({\varvec{Z}}_1) + \psi_2 S_2({\varvec{Z}}_2), \end{aligned}$$
(3.5)

or possibly

$$\begin{aligned} {\mathbb {E}}(Y \mid {\varvec{X}}, {\varvec{S}} ) = {{\varvec{X}}}'{\varvec{\varphi }} + \psi_1 S_1({\varvec{Z}}_1) + \psi_2 S_2({\varvec{Z}}_2)+ \psi_3 S_1({\varvec{Z}}_1)S_2({\varvec{Z}}_2). \end{aligned}$$
(3.6)

In the most general setting, such propensity score regression adjustments are insufficient to render \({\varvec{X}}\perp {\varvec{Z}}\mid \tilde{{\varvec{S}}}\) since \(A_1A_2\) depends on the joint model of \(A_1\) and \(A_2\) given \({\varvec{Z}}\). As a result, estimates of the causal effects \(\varphi _1, \varphi _2\) and \(\varphi _3\) are inconsistent, with the asymptotic bias given by the second and third terms on the right-hand side of Eqs. (3.2)–(3.4). It can be seen that the bias depends on the second-order association between the exposures and confounders, the association between exposures and generalized propensity scores, and the variances of the exposures.

A more flexible exposure model may recognize the conditional association between exposures (given the marginal confounders \({\varvec{Z}}_1\) and \({\varvec{Z}}_2\)), but assume that confounders play no role in the second-order dependence. In this case, one might consider a propensity score term based on

$$\begin{aligned} {\mathbb {E}}(A_1A_2 \mid {\varvec{Z}})=\textrm{cov}(A_1, A_2 \mid {\varvec{Z}}_1, {\varvec{Z}}_2)+S_1({\varvec{Z}}_1)S_2({\varvec{Z}}_2), \end{aligned}$$

where \(\textrm{cov}(A_1, A_2 \mid {\varvec{Z}}_1, {\varvec{Z}}_2)=\rho \sqrt{V(A_1 \mid {\varvec{Z}}_1)V(A_2 \mid {\varvec{Z}}_2)}\) and \(V(A_k \mid {\varvec{Z}}_k)\) is the residual variance from the propensity score for exposure k, \(k=1, 2\). In this case, the regression model adjusted by the generalized propensity score may be,

$$\begin{aligned} {\mathbb {E}}(Y \mid {\varvec{X}}, {\varvec{S}})= {{\varvec{X}}}'{\varvec{\varphi }} + \psi_1 S_1({\varvec{Z}}_1) + \psi_2 S_2({\varvec{Z}}_2)+\psi_3 {\mathbb {E}}(A_{1}A_{2} \mid {\varvec{Z}}_{1}, {\varvec{Z}}_{2})\;. \end{aligned}$$
(3.7)

We explore the impact of such misspecifications in Sect. 3.3.
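
For two binary exposures, the exchangeable construction of \({\mathbb {E}}(A_1A_2 \mid {\varvec{Z}})\) entering (3.7) can be sketched as below; this is a minimal illustration (the function name is our own) using \(V(A_k \mid {\varvec{Z}}_k)=S_k({\varvec{Z}}_k)\{1-S_k({\varvec{Z}}_k)\}\).

import numpy as np

def expected_product_binary(s1, s2, rho):
    """E(A1 A2 | Z) under an exchangeable dependence model for two binary
    exposures: cov(A1, A2 | Z1, Z2) = rho * sqrt{V(A1|Z1) V(A2|Z2)} with
    V(Ak|Zk) = Sk (1 - Sk), plus the product of the marginal scores."""
    return rho * np.sqrt(s1 * (1.0 - s1) * s2 * (1.0 - s2)) + s1 * s2

The resulting column, together with \(S_1({\varvec{Z}}_1)\) and \(S_2({\varvec{Z}}_2)\), gives the adjustment terms used in (3.7).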

3.2 Inverse Probability Weighting

A misspecified propensity score will have a different impact in the framework of the inverse propensity score-weighted estimating function. Let \(P_0({\varvec{A}}={\varvec{a}} \mid {\varvec{Z}}; \varvec{\delta })\) denote the misspecified (naive) conditional joint density or mass function indexed by \(\varvec{\delta }\), and let \(U_{10}(\varvec{\varphi };\varvec{\delta })\) and \(U_{20}(\varvec{\delta })\) be the estimating functions as follows:

$$\begin{aligned} U_{10}(\varvec{\varphi };\varvec{\delta })&=\frac{I\left( {\varvec{A}}={\varvec{a}}\right) }{P_0\left( {\varvec{A}}={\varvec{a}} \mid {\varvec{Z}} ;\varvec{\delta }\right) }D({\varvec{X}})\left[ Y-{\mathbb {E}}^R(Y \mid {\varvec{X}};\varvec{\varphi })\right] \;,\end{aligned}$$
(3.8)
$$\begin{aligned} U_{20}(\varvec{\delta })&=\frac{\partial \log P_0({\varvec{A}} \mid {\varvec{Z}}; \varvec{\delta })}{\partial \varvec{\delta }}\;. \end{aligned}$$
(3.9)

Let \(\varvec{\delta }^*\) be the solution to \({\mathbb {E}}_{{\varvec{A}},{\varvec{Z}}}\{U_{20}(\varvec{\delta });\varvec{\gamma },\varvec{\tau }\}=0\), the limiting value to which the estimator \(\hat{\varvec{\delta }}\) converges when fitting the misspecified model. The limiting value of \(\hat{\varvec{\varphi }}\) is obtained by setting the expression below to zero and solving for \(\varvec{\varphi }\),

$$\begin{aligned} \sum _{{\varvec{A}}}\sum _{{\varvec{Z}}}\frac{I\left( {\varvec{A}}={\varvec{a}}\right) P({\varvec{A}}={\varvec{a}} \mid {\varvec{Z}};\varvec{\gamma })}{P_0\left( {\varvec{A}}={\varvec{a}} \mid {\varvec{Z}};\varvec{\delta }^*\right) }D({\varvec{X}})\left[ {\mathbb {E}}(Y \mid {\varvec{X}}, {\varvec{Z}};\varPsi )-{\mathbb {E}}^R(Y \mid {\varvec{X}};\varvec{\varphi })\right] P({\varvec{Z}};\varvec{\tau })\;. \end{aligned}$$
(3.10)

Under the conditional independence assumption, the misspecified propensity score \(P_0({\varvec{A}}={\varvec{a}} \mid {\varvec{Z}};\varvec{\delta })\) is the product of the two marginal propensity scores \(P(A_1=a_1 \mid {\varvec{Z}};\varvec{\delta })P(A_2=a_2 \mid {\varvec{Z}};\varvec{\delta })\). In this case, under the present formulation, we have

$$\begin{aligned} U_{20}(\varvec{\delta })=(U_{20,1}^{\prime }(\varvec{\gamma }_1), U_{20,2}^{\prime }(\varvec{\gamma }_2))^{\prime }, \end{aligned}$$

where

$$\begin{aligned} U_{20,k}(\varvec{\gamma }_k)=\frac{\partial \text {log} P(A_k=a_k \mid {\varvec{Z}}_k; \varvec{\gamma }_k)}{\partial \varvec{\gamma }_k}, \; k=1, 2, \end{aligned}$$

and \(\varvec{\delta }^*=(\varvec{\gamma }_1', \varvec{\gamma }_2')'\) so \(\hat{\varvec{\delta }}\) is consistent for the target parameters \(\varvec{\gamma }_1\) and \(\varvec{\gamma }_2\) but is of insufficient dimension to characterize the second-order dependence.

When the association is assumed functionally independent of confounders, the misspecified propensity score \(P_0({\varvec{A}}={\varvec{a}} \mid {\varvec{Z}};\varvec{\delta })\) is an inadequately modeled conditional joint density or mass function \(P(A_1=a_1, A_2=a_2 \mid {\varvec{Z}}_1, {\varvec{Z}}_2;\varvec{\delta })\). In this case, we have \(U_{20}(\varvec{\delta })=\partial \text {log} P({\varvec{A}}={\varvec{a}} \mid {\varvec{Z}}_1, {\varvec{Z}}_2; \varvec{\delta })/\partial \varvec{\delta }\), and \(\varvec{\delta }^*\) will not be consistent for the target parameters \(\varvec{\gamma }_1\) or \(\varvec{\gamma }_2\).

3.3 Empirical Bias and Efficiency

Here, we report on simulation studies investigating the limiting values of estimators of the exposure effects subject to model misspecification. We consider a bivariate binary exposure and assess the relative asymptotic bias of estimates of exposure effects under the two types of model misspecification of Sect. 3.1.2, for both regression adjustment and IPW analyses.

Suppose a sample of n independent replicates \(\{(Y_i, {\varvec{X}}_i, {\varvec{Z}}_i), i=1,\dots, n\}\) is generated with \(n=500\). Let \({\varvec{Z}}\) contain binary confounding variables with \(Z_1 \perp Z_2\), \(P(Z_{1}=1)=P(Z_{2}=1)=0.5\), and \(P(Z_{3}=1 \mid Z_{1}, Z_{2}) = \text {expit}\{-0.896 +(\log 2) Z_{1}+(\log 3) Z_{2}\}\) giving \(P(Z_{3}=1)=0.5\). We let \(P(A_1=1 \mid Z_1)=\text {expit}(\gamma _{10}+\gamma _{11}Z_1)\) where \(\varvec{\gamma }_1=(\gamma _{10}, \gamma _{11})'=(-0.764, \log 2)'\) so \({\mathbb {E}}(A_{1})=0.4\), and \(P(A_2=1 \mid Z_2)=\text {expit}(\gamma _{20}+\gamma _{21}Z_2)\) with \(\varvec{\gamma }_2=(\gamma _{20}, \gamma _{21})'=(-0.112, \log 1.25)'\) so \({\mathbb {E}}(A_{2})=0.5\). Finally, we let \(\log \text {OR}(A_{1}, A_{2}\mid Z_3)=\gamma _{30}+\gamma _{31}Z_3\) with \(\varvec{\gamma }_3=(\gamma _{30}, \gamma _{31})'\); we consider \(\gamma _{31} \in [-5, 5]\) and for each value of \(\gamma _{31}\), we solve for \(\gamma _{30}\) in (2.5) such that \({\mathbb {E}}\{\text {OR}(A_{1}, A_{2})\}=4\). For the response Y, we generate data according to (2.1) with \(\varvec{\theta }=(\alpha _0, \alpha _1, \alpha _2, \beta _1)'=(0, 2,2,4)'\) and \(\varvec{\zeta }=(0.5, 0.5, 0.5)'\).
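
To make the data-generating mechanism concrete, a minimal Python sketch of one replicate of this design is given below. The function names, random seed, and the standard normal error term are our own choices, and \(\gamma_{30}\) is taken as given (solved numerically for the chosen \(\gamma_{31}\)).

import numpy as np

def expit(x):
    return 1.0 / (1.0 + np.exp(-x))

def simulate_binary(n, gamma30, gamma31, seed=1):
    rng = np.random.default_rng(seed)
    z1 = rng.binomial(1, 0.5, n)
    z2 = rng.binomial(1, 0.5, n)
    z3 = rng.binomial(1, expit(-0.896 + np.log(2) * z1 + np.log(3) * z2))
    p1 = expit(-0.764 + np.log(2) * z1)        # P(A1=1 | Z1)
    p2 = expit(-0.112 + np.log(1.25) * z2)     # P(A2=1 | Z2)
    psi = np.exp(gamma30 + gamma31 * z3)       # OR(A1, A2 | Z3)
    # Joint probability P(A1=1, A2=1 | Z) from the marginals and the OR.
    a = 1.0 + (psi - 1.0) * (p1 + p2)
    disc = a ** 2 - 4.0 * psi * (psi - 1.0) * p1 * p2
    denom = np.where(np.isclose(psi, 1.0), 1.0, 2.0 * (psi - 1.0))
    p11 = np.where(np.isclose(psi, 1.0), p1 * p2, (a - np.sqrt(disc)) / denom)
    # Draw (A1, A2) from the implied 2 x 2 joint distribution.
    u = rng.uniform(size=n)
    a1 = (u < p1).astype(int)
    a2 = ((u < p11) | ((u >= p1) & (u < p1 + p2 - p11))).astype(int)
    # Response model (2.1) with theta = (0, 2, 2, 4)' and zeta = (0.5, 0.5, 0.5)'.
    y = 2 * a1 + 2 * a2 + 4 * a1 * a2 + 0.5 * (z1 + z2 + z3) + rng.normal(size=n)
    return y, a1, a2, z1, z2, z3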

Under the assumption of conditional independence between exposures, the regression model adjusted by the generalized propensity scores can be fitted as in (3.5) or (3.6), which are denoted Reg1 and Reg2 in Fig. 1, respectively. The IPW approach involves solving (3.10) for the exposure effects. We denote the estimated effect as \(\hat{\varvec{\varphi }}=({\hat{\varphi }}_1, {\hat{\varphi }}_2, {\hat{\varphi }}_3)'\) for \({\varvec{X}}=(A_1, A_2, A_1A_2)'\), and the relative bias of the estimated exposure effect for \({\varvec{X}}\) is shown in Fig. 1 as \(({\hat{\varphi }}_1-\alpha _1)/\alpha _1\), \(({\hat{\varphi }}_2-\alpha _2)/\alpha _2\), and \(({\hat{\varphi }}_3-\beta _1)/\beta _1\). In the bottom right panel of Fig. 1, we show the relative bias of the causal effect of a one unit change in \(A_2\) given \(A_{1}=a_1=1\) or 0 and \({\varvec{Z}}\), and denote it as \(({\hat{\varDelta }}_2(A_{1})-\varDelta _2(A_{1}))/\varDelta _2(A_{1})\) where \(\varDelta _2(A_{1})=\alpha _2 + a_{1} {\beta }_{1}\) and \({\hat{\varDelta }}_2(A_{1})={\hat{\varphi }}_2 + a_{1} {{\hat{\varphi }}}_{3}\). The absolute value of the relative bias of the exposure effect estimates increases as the covariate effect on the exposure dependence increases. Moreover, regression adjustment leads to less bias in effect estimation than the IPW approach in these settings. Adjusting for \((S_1({\varvec{Z}}), S_2({\varvec{Z}}))'\) or \((S_1({\varvec{Z}}), S_2({\varvec{Z}}), S_1({\varvec{Z}})S_2({\varvec{Z}}))'\) produces similar results. When \(\gamma _{31}=0\) (i.e., the conditional association between exposures is functionally independent of confounders), the bias of the estimated exposure effect is negligible when conditional independence is assumed between the two binary exposures.

Fig. 1 Empirical relative bias with standard errors (reflected by shaded bands) of estimators of exposure effects from inadequate propensity score modeling assuming conditional independence between two binary exposures when the second-order confounder has the effect \(\gamma _{31}\); regression adjustment (Reg) and inverse weighting (IPW) are considered; the marginal association between exposures is \(\text {OR}=4\)

We follow (3.7) and (3.10) to conduct the estimation for the second type of misspecification, that is, using an exchangeable dependence structure that assumes no second-order effects of confounders on the exposures. The results are shown in Fig. 2. When there is no confounder affecting the dependence (i.e., \(\gamma _{31}=0\)), the curves pass through the origin, indicating that both the regression adjustment and IPW approaches produce unbiased estimates, as the dependence model is then correctly specified. As in Fig. 1, the absolute value of the relative bias of the exposure effect estimates increases as the covariate effect on the exposure dependence (\(\gamma _{31}\)) increases. Regression adjustment leads to less bias in effect estimation compared to the IPW approach.

Fig. 2 Empirical relative bias with standard errors (reflected by shaded bands) of estimators of exposure effects from inadequate propensity score modeling due to the omission of a second-order confounder with effect \(\gamma _{31}\); regression adjustment (Reg) and inverse weighting (IPW) are considered; the marginal association between two binary exposures is \(\text {OR}=4\)

4 Multivariate Continuous Exposures

In this section, we consider multiple continuous exposures and semi-continuous exposures.

4.1 Multivariate Normal Exposures

4.1.1 Regression Adjustment

Here, we consider \({\varvec{A}} \mid {\varvec{Z}} \sim \textrm{BVN}(\varvec{\mu }({\varvec{Z}}), \Sigma ({\varvec{Z}}))\) with \({\varvec{A}}=(A_1, A_2)'\), \({\varvec{Z}}=({\varvec{Z}}'_1, {\varvec{Z}}'_2, {\varvec{Z}}'_3)'\), \(\varvec{\mu }({\varvec{Z}})=(\mu _1({\varvec{Z}}), \mu _2({\varvec{Z}}))'\), \(\mu _k({\varvec{Z}})={\mathbb {E}}(A_k \mid {\varvec{Z}}) = {\mathbb {E}}(A_k \mid {\varvec{Z}}_k)\), \(\textrm{var}(A_k \mid {\varvec{Z}})=\sigma ^2_k\), \(k=1,2\), and \(\textrm{corr}(A_1, A_2 \mid {\varvec{Z}})= \rho ({\varvec{Z}}_3)\). We suppose \({\mathbb {E}}(A_k \mid {\varvec{Z}}) = {{\varvec{Z}}}'_k\varvec{\gamma }_k\) with a dependence model

$$\begin{aligned} \log \left\{ \displaystyle \frac{1+\rho ({\varvec{Z}}_3)}{1 -\rho ({\varvec{Z}}_3)} \right\} = {{\varvec{Z}}}'_3\varvec{\gamma }_{3} . \end{aligned}$$
(4.1)

Since the variance parameters are functionally independent of \({\varvec{Z}}\), a sufficient propensity score is \({\varvec{S}}({\varvec{Z}}) = (S_1({\varvec{Z}}_1), S_2({\varvec{Z}}_2), S_3({\varvec{Z}}_3))'\) where \(S_k({\varvec{Z}}_k) = {{\varvec{Z}}}_k'\varvec{\gamma }_k\), \(k=1, 2\) and \(S_3({\varvec{Z}}) = {{\varvec{Z}}}'_3 \varvec{\gamma }_3\) [17] since

$$\begin{aligned} f(a_1,a_2\mid {\varvec{Z}},{\varvec{S}}) = \displaystyle \frac{1}{ 2 \pi \sqrt{\mid \Sigma (S_3)\mid }} \exp \{ - ({\varvec{a}}-{\varvec{S}})' \Sigma ^{-1}(S_3) ({\varvec{a}}- {\varvec{S}}) /2\}, \end{aligned}$$

where \(({\varvec{a}}-{\varvec{S}}) = (a_1 - S_1, a_2-S_2)'\) and

$$\begin{aligned} \Sigma (S_3) = \begin{bmatrix} \sigma _1^2 &{} \rho ({\varvec{Z}}_3) \sigma _1 \sigma _2 \\ \rho ({\varvec{Z}}_3) \sigma _1 \sigma _2 &{} \sigma _2^2 \end{bmatrix}. \end{aligned}$$

An alternative to (4.1) would be to model \(B_1 = A_1 A_2\) given \({\varvec{Z}}\) directly to define \(S_3({\varvec{Z}}) = {\mathbb {E}}(B_1 \mid {\varvec{Z}}_3)\) and note that in this case, we could replace the covariance entries of \(\Sigma\) with \(S_3({\varvec{Z}}) - S_1({\varvec{Z}}_1) S_2({\varvec{Z}}_2)\); the key point is that \({\varvec{S}}({\varvec{Z}}) = (S_1({\varvec{Z}}_1), S_2({\varvec{Z}}_2), S_3({\varvec{Z}}_3))'\) contains the relevant information on confounders in the exposure model.
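
A minimal sketch of evaluating this generalized propensity score density for bivariate normal exposures is given below; the function names are our own, the correlation is obtained by inverting the link (4.1), and the variance parameters \(\sigma_1\), \(\sigma_2\) are assumed to have been estimated separately.

import numpy as np

def rho_from_link(eta):
    """Invert the dependence model (4.1): log{(1 + rho)/(1 - rho)} = eta."""
    return (np.exp(eta) - 1.0) / (np.exp(eta) + 1.0)

def bvn_gps_density(a1, a2, s1, s2, s3, sigma1, sigma2):
    """Conditional density f(a1, a2 | Z) of the bivariate normal exposure
    model, with means (s1, s2) = (Z1'gamma1, Z2'gamma2) and correlation
    rho(Z3) obtained from s3 = Z3'gamma3 through (4.1)."""
    rho = rho_from_link(s3)
    r1 = (a1 - s1) / sigma1
    r2 = (a2 - s2) / sigma2
    quad = (r1 ** 2 - 2.0 * rho * r1 * r2 + r2 ** 2) / (1.0 - rho ** 2)
    norm = 2.0 * np.pi * sigma1 * sigma2 * np.sqrt(1.0 - rho ** 2)
    return np.exp(-0.5 * quad) / norm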

4.1.2 Inverse Joint Density Weighting

Inverse density weighting can be used when exposures are continuous [23]. In analogy with (2.6), with conditional joint density \(f({\varvec{a}} \mid {\varvec{z}}; \varvec{\gamma })\), we write the inverse joint density-weighted estimating function as

$$\begin{aligned} U_1(\varvec{\varphi };\varvec{\gamma })=\frac{I\left( {\varvec{A}}={\varvec{a}}\right) }{f({\varvec{a}} \mid {\varvec{z}};\varvec{\gamma })}D({{\varvec{X}}})\left[ Y-{\mathbb {E}}^R(Y \mid {\varvec{X}};\varvec{\varphi })\right] . \end{aligned}$$
(4.2)

The expectation of \(U_1(\varvec{\varphi };\varvec{\gamma })\) with respect to \(Y \mid {\varvec{X}}, {\varvec{Z}}\) and \({\varvec{Z}} \mid {\varvec{X}}\) gives

$$\begin{aligned} \int _{{\varvec{Z}}}\frac{I\left( {\varvec{A}}={\varvec{a}}\right) }{f({\varvec{a}} \mid {\varvec{z}};\varvec{\gamma })}D({{\varvec{X}}})\left[ {\mathbb {E}}(Y \mid {\varvec{X}}, {\varvec{Z}};\varvec{\theta }, \varvec{\zeta })-{\mathbb {E}}^R(Y \mid {\varvec{X}};\varvec{\varphi })\right] f({\varvec{z}} \mid {\varvec{a}})d{\varvec{z}}, \end{aligned}$$
(4.3)

where

$$\begin{aligned} f({\varvec{z}} \mid {\varvec{a}}) = \frac{f({\varvec{a}} \mid {\varvec{z}};\varvec{\gamma })f({\varvec{z}};\varvec{\tau })}{f({\varvec{a}})}. \end{aligned}$$

Thus, we can rewrite (4.3) as

$$\begin{aligned} \int _{{\varvec{Z}}}\frac{I\left( {\varvec{A}}={\varvec{a}}\right) }{f\left( {\varvec{a}} \right) }D({{\varvec{X}}})\left[ {\mathbb {E}}(Y \mid {\varvec{X}}, {\varvec{Z}};\varvec{\theta }, \varvec{\zeta })-{\mathbb {E}}^R(Y \mid {\varvec{X}};\varvec{\varphi })\right] f({\varvec{z}};\varvec{\tau })d{\varvec{z}}, \end{aligned}$$
(4.4)

which results in the expectation being taken in the setting where \({\varvec{Z}} \perp {\varvec{A}}\) and yields a consistent causal estimator \(\hat{\varvec{\varphi }}\) obtained by solving \(U_1(\varvec{\varphi };\varvec{\gamma })=0\).
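
In a sample, solving (4.2) again reduces to weighted least squares, now with weights equal to the inverse of the fitted conditional joint density. A minimal sketch (with names of our own choosing, and densities computed by, e.g., bvn_gps_density above) is:

import numpy as np

def inverse_density_weighted_fit(y, a1, a2, f_az):
    """Weighted least-squares solution of (4.2) for the marginal structural
    model E^R(Y | X) = X'phi with X = (1, A1, A2, A1*A2)' and weights
    1 / f(a | z)."""
    X = np.column_stack([np.ones_like(a1, dtype=float), a1, a2, a1 * a2])
    w = 1.0 / f_az
    return np.linalg.solve(X.T @ (w[:, None] * X), X.T @ (w * y))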

4.2 Multivariate Semi-continuous Exposures

4.2.1 Regression Adjustment

Here we suppose \(A_{1}\) and \(A_{2}\) are non-negative semi-continuous exposure variables with a mass at zero, and we adopt a two-part generalized propensity score model [24] to address the effects of confounding variables. While these two exposures could in general represent exposures to different agents, here we consider the case where they represent different aspects of the exposure to a single agent. Specifically, for the motivating study, we may let \(A_1\) and \(A_2\) represent the drinking frequency (proportion of drinking days per week) and drinking intensity (dose per drinking occasion); among non-drinkers, these degenerate to zero. We can therefore let \(A^{+}={\mathbb {I}}(A_1A_2 > 0)\) indicate positive exposure, \(L_{k} = \log (A_{k} \mid A^{+}=1)\), \(k=1,2\), and \(\pi _{1}({\varvec{Z}}_0)={\mathbb {E}}(A^{+} \mid {\varvec{Z}}_{0})\).

We take the true response model to be (2.1) with \({\varvec{X}}=(1, A^+, A^+L_{1}, A^+L_{2}, A^+L_{1}L_{2})'\). We let

$$\begin{aligned} \pi _{1}({\varvec{Z}}_0)=\text {expit}({{\varvec{Z}}}_0'\varvec{\gamma }_0), \end{aligned}$$

where \({{\varvec{Z}}}_0=(1, {\varvec{Z}}_1', {\varvec{Z}}_2')'\), and given \(A^+=1\), we suppose \(L_k={{\varvec{Z}}}_k'\varvec{\gamma }_k+R_k\), \(k=1, 2\), and \({\varvec{R}}=(R_{1}, R_{2})'\) is bivariate normal with \(\textrm{var}(R_k)=\sigma _k^2\), \(k=1, 2\), \(\rho ({\varvec{Z}})=\textrm{corr}(R_1, R_2 \mid {\varvec{Z}})\), and \(\log \{(1+\rho ({\varvec{Z}}))/(1-\rho ({\varvec{Z}}))\}={{\varvec{Z}}}_3'\varvec{\gamma }_3\). As such, we have

$$\begin{aligned} {\mathbb {E}}(L_{k} \mid {\varvec{Z}}_{0}, {\varvec{Z}}_{k})&=\pi _{1}({\varvec{Z}}_{0}){\mathbb {E}}(L_{k} \mid A^+=1, {{\varvec{Z}}}_{k})\;,\; k=1, 2\;,\\ {\mathbb {E}}(L_{1}L_{2} \mid {\varvec{Z}})&=\pi _{1}({\varvec{Z}}_{0}){\mathbb {E}}(L_{1}L_{2} \mid A^+=1, {\varvec{Z}}) \end{aligned}$$

to form the generalized propensity score as \({\varvec{S}}=(\pi _{1}({\varvec{Z}}_0), {\mathbb {E}}(L_{1} \mid {\varvec{Z}}_{0}, {\varvec{Z}}_{1}), {\mathbb {E}}(L_{2} \mid {\varvec{Z}}_{0}, {\varvec{Z}}_{2}), {\mathbb {E}}(L_{1}L_{2} \mid {\varvec{Z}}))'\) or \({\varvec{S}}=(S_0({\varvec{Z}}_0), S_1({\varvec{Z}}_1),S_2({\varvec{Z}}_2),S_3({\varvec{Z}}_3))'\) and use it for adjustment in the regression model (2.4) as

$$\begin{aligned} {\mathbb {E}}(Y \mid {\varvec{X}}, {\varvec{S}}) = {\varvec{X}}'\varvec{\lambda } + {{\varvec{S}}}' \varvec{\eta } , \end{aligned}$$
(4.5)

where \({\varvec{X}}=(1, A^+, A^+L_{1}, A^+L_{2}, A^+L_{1}L_{2})'\).
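
A minimal sketch of constructing this two-part generalized propensity score is given below, in Python using statsmodels for the component fits; the function name is our own. For simplicity, the sketch uses a constant residual correlation in place of the \({\varvec{Z}}_3\)-dependent model, whereas the analysis in the paper estimates the dependence model via GEE2 (Online Appendix B).

import numpy as np
import statsmodels.api as sm

def two_part_gps(a_plus, l1, l2, Z0, Z1, Z2):
    """Two-part generalized propensity score for semi-continuous exposures.
    a_plus is the binary exposure indicator; l1, l2 are the log-exposures
    (used only where a_plus == 1); Z0, Z1, Z2 are design matrices that
    include intercept columns."""
    exposed = a_plus == 1

    # Part 1: probability of any exposure, pi_1(Z0).
    pi1 = sm.Logit(a_plus, Z0).fit(disp=0).predict(Z0)

    # Part 2: conditional means of the log-exposures among the exposed.
    fit1 = sm.OLS(l1[exposed], Z1[exposed]).fit()
    fit2 = sm.OLS(l2[exposed], Z2[exposed]).fit()
    s1, s2 = fit1.predict(Z1), fit2.predict(Z2)

    # Crude dependence term with constant residual correlation, so that
    # E(L1 L2 | A+=1, Z) = rho * sd1 * sd2 + s1 * s2.
    rho = np.corrcoef(l1[exposed] - s1[exposed], l2[exposed] - s2[exposed])[0, 1]
    s3 = rho * np.std(fit1.resid) * np.std(fit2.resid) + s1 * s2

    # Components (pi_1, E(L1 | Z0, Z1), E(L2 | Z0, Z2), E(L1 L2 | Z)).
    return np.column_stack([pi1, pi1 * s1, pi1 * s2, pi1 * s3])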

4.2.2 Inverse Joint Density Weighting

The inverse density of exposure weighted estimating function is given here by

$$\begin{aligned} U_0(Y \mid {\varvec{X}};\varvec{\varphi },\varvec{\varGamma }, \varvec{\gamma }_0)=w({\varvec{X}}, {\varvec{Z}})D({\varvec{X}})[Y-{\mathbb {E}}^R(Y \mid {\varvec{X}};\varvec{\varphi })], \end{aligned}$$
(4.6)

where \(\varvec{\varGamma }=(\varvec{\gamma }_1', \sigma _1^2, \varvec{\gamma }_2', \sigma _2^2, \varvec{\gamma }_3')'\),

$$\begin{aligned} w({\varvec{X}}, {\varvec{Z}})=\frac{{\mathbb {I}}({\varvec{X}}={\varvec{x}})}{\{\pi _{1}({\varvec{Z}}_{0}; \varvec{\gamma }_0)f(l_{1}, l_{2} \mid A^+=1, {\varvec{Z}})\}^{A^+}\{1-\pi _{1}({\varvec{Z}}_{0}; \varvec{\gamma }_0)\}^{1-A^+}}, \end{aligned}$$

and \(D({\varvec{X}})=(1, A^+, A^+L_{1},A^+L_{2}, A^+L_{1}L_{2})'\).

Stabilization can be used to address extreme weights [19, 25] where

$$\begin{aligned} w^s({\varvec{X}}, {\varvec{Z}})=\frac{{\mathbb {I}}({\varvec{X}}={\varvec{x}})\{\pi _{2}(\varvec{\gamma }_0^{\dagger })f(l_{1},l_{2} \mid A^+=1)\}^{A^+}\{1-\pi _{2}(\varvec{\gamma }_0^{\dagger })\}^{1-A^+}}{\{\pi _{1}({\varvec{Z}}_{0}; \varvec{\gamma }_0)f(l_{1}, l_{2} \mid A^+=1, {\varvec{Z}}) \}^{A^+}\{1-\pi _{1}({\varvec{Z}}_{0}; \varvec{\gamma }_0)\}^{1-A^+}}, \end{aligned}$$

where \(\pi _{2}(\varvec{\gamma }_0^{\dagger })\) and \(f(l_{1},l_{2} \mid A^+=1)\) are the unconditional counterparts of the conditional quantities \(\pi _{1}({\varvec{Z}}_{0}; \varvec{\gamma }_0)\) and \(f(l_{1}, l_{2} \mid A^+=1, {\varvec{Z}})\) and serve to stabilize the weights. The inverse weighted estimating function for \(\varvec{\varphi }\) with stabilized weights is given by (4.6) with \(w^s({\varvec{X}}, {\varvec{Z}})\) in place of \(w({\varvec{X}}, {\varvec{Z}})\), giving

$$\begin{aligned} U_0(Y \mid {\varvec{X}};\varvec{\varphi },\varvec{\varGamma },\varvec{\varGamma }^{\dagger }, \varvec{\gamma }_0, \varvec{\gamma }_0^{\dagger })=w^s({\varvec{X}}, {\varvec{Z}})D({\varvec{X}})[Y-{\mathbb {E}}^R(Y \mid {\varvec{X}};\varvec{\varphi })]\;. \end{aligned}$$
(4.7)

The estimate of \(\varvec{\varphi }\) is obtained by solving \(U_0(Y \mid {\varvec{X}};\varvec{\varphi },\varvec{\varGamma }, \varvec{\gamma }_0)=0\) or \(U_0(Y \mid {\varvec{X}};\varvec{\varphi },\varvec{\varGamma },\varvec{\varGamma }^{\dagger }, \varvec{\gamma }_0, \varvec{\gamma }_0^{\dagger })=0\); see Online Appendices B and C for more details on parameter estimation through second-order generalized estimating equations (GEE2) [26, 27] and on the derivation of the asymptotic covariance of the estimator.
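
The stabilized weights \(w^s({\varvec{X}}, {\varvec{Z}})\) can be assembled directly from the fitted two-part quantities; a minimal sketch with names of our own choosing is:

import numpy as np

def stabilized_weights(a_plus, pi1, pi2, f_cond, f_marg):
    """Stabilized weights of Sect. 4.2.2: pi1 and f_cond are the fitted
    conditional quantities pi_1(Z0) and f(l1, l2 | A+=1, Z); pi2 and
    f_marg are their unconditional counterparts.  For unexposed
    individuals the density factors are ignored (set them to 1)."""
    num = np.where(a_plus == 1, pi2 * f_marg, 1.0 - pi2)
    den = np.where(a_plus == 1, pi1 * f_cond, 1.0 - pi1)
    return num / den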

4.3 Misspecification of Propensity Scores

4.3.1 Regression Adjustment and Inverse Density Weighting

If we treat the bivariate continuous or semi-continuous exposures as conditionally independent, the generalized propensity score adjustment can be made similarly to the regression model (3.5). That is,

$$\begin{aligned} {\mathbb {E}}(Y \mid {\varvec{X}},\tilde{{\varvec{S}}}) = {{\varvec{X}}}'{\varvec{\varphi }} + \tilde{{\varvec{S}}}'\varvec{\psi }\;, \end{aligned}$$
(4.8)

where \({\varvec{X}}=(1, A_{1}, A_{2}, A_{1}A_{2})'\) contains the continuous exposures and \(\tilde{{\varvec{S}}}=({\mathbb {E}}(A_{1} \mid {\varvec{Z}}_{1}), {\mathbb {E}}(A_{2} \mid {\varvec{Z}}_{2}), {\mathbb {E}}(A_{1} \mid {\varvec{Z}}_{1}){\mathbb {E}}(A_{2} \mid {\varvec{Z}}_{2}))'\). For semi-continuous exposures, we have \({\varvec{X}}=(1, L_{1}, L_{2}, L_{1}L_{2})'\) and \(\tilde{{\varvec{S}}}=(\pi _1({\varvec{Z}}_0), {\mathbb {E}}(L_{1} \mid {\varvec{Z}}_0, {\varvec{Z}}_{1}), {\mathbb {E}}(L_{2} \mid {\varvec{Z}}_0, {\varvec{Z}}_{2}), {\mathbb {E}}(L_{1} \mid {\varvec{Z}}_0, {\varvec{Z}}_{1}){\mathbb {E}}(L_{2} \mid {\varvec{Z}}_0, {\varvec{Z}}_{2}))'\) in (4.8).

When exposures are continuous, the inverse density of exposure weighted estimating equation used for exposure effects estimation is given by

$$\begin{aligned} U_{10}(\varvec{\varphi };\varvec{\delta })=\frac{{\mathbb {I}}\left( {\varvec{X}}={\varvec{x}}\right) f\left( a_{1}\right) f\left( a_{2} \right) }{f\left( a_{1} \mid Z_{1};\varvec{\delta } \right) f\left( a_{2} \mid Z_{2};\varvec{\delta } \right) }D({{\varvec{X}}})\left[ Y-{\mathbb {E}}^R(Y \mid {\varvec{X}};\varvec{\varphi })\right] =0, \end{aligned}$$
(4.9)

and when exposures are semi-continuous, the inverse density of exposure weighted estimating equation is given by

$$\begin{aligned} w^s({\varvec{X}}, {\varvec{Z}})D({\varvec{X}})[Y-{\mathbb {E}}^R(Y \mid {\varvec{X}};\varvec{\varphi })]=0, \end{aligned}$$
(4.10)

where

$$\begin{aligned} \small w^s({\varvec{X}}, {\varvec{Z}})=\frac{{\mathbb {I}}({\varvec{X}}={\varvec{x}})\{\pi _{2}(\varvec{\gamma }_0^{\dagger })f(l_{1} \mid A^+=1)f(l_{2} \mid A^+=1)\}^{A^+}\{1-\pi _{2}(\varvec{\gamma }_0^{\dagger })\}^{1-A^+}}{\{\pi _{1}({\varvec{Z}}_{0}; \varvec{\gamma }_0)f(l_{1} \mid A^+=1, {Z}_{1})f(l_{2} \mid A^+=1, {Z}_{2})\}^{A^+}\{1-\pi _{1}({\varvec{Z}}_{0}; \varvec{\gamma }_0)\}^{1-A^+}}, \end{aligned}$$

and \(D({\varvec{X}})=(1, A^+, A^+L_{1},A^+L_{2}, A^+L_{1}L_{2})'\).

When the dependence model assumes that confounders play no role in the association between exposures, the regression model adjusted by the generalized propensity score is similar to (3.7),

$$\begin{aligned} {\mathbb {E}}(Y \mid {\varvec{X}}, \varvec{{\tilde{S}}})&= {{\varvec{X}}}'{\varvec{\varphi }} + \tilde{{\varvec{S}}}'\varvec{\psi }\;, \end{aligned}$$
(4.11)

where \({\varvec{X}}=(1, A_{1}, A_{2}, A_{1}A_{2})'\), \(\tilde{{\varvec{S}}}=({\mathbb {E}}(A_{1} \mid {\varvec{Z}}_{1}), {\mathbb {E}}(A_{2} \mid {\varvec{Z}}_{2}), {\mathbb {E}}(A_{1}A_{2} \mid {\varvec{Z}}_{1}, {\varvec{Z}}_{2}))'\). For semi-continuous exposures, we have \({\varvec{X}}=(1, L_{1}, L_{2}, L_{1}L_{2})'\), \(\tilde{{\varvec{S}}}=(\pi _1({\varvec{Z}}_0), {\mathbb {E}}(L_{1} \mid {\varvec{Z}}_0, {\varvec{Z}}_{1}), {\mathbb {E}}(L_{2} \mid {\varvec{Z}}_0, {\varvec{Z}}_{2}), {\mathbb {E}}(L_{1}L_{2} \mid {\varvec{Z}}_0, {{\varvec{Z}}}_1, {{\varvec{Z}}}_2))'\).

The inverse density weighting approach using the misspecified joint density of exposures solves the following estimating equation for continuous exposures,

$$\begin{aligned} U_{10}(\varvec{\varphi };\varvec{\delta })=\frac{{\mathbb {I}}\left( {\varvec{X}}={\varvec{x}}\right) f\left( a_{1}, a_{2} \right) }{f\left( a_{1}, a_{2} \mid {\varvec{Z}}_{1},{\varvec{Z}}_{2};\varvec{\delta } \right) }D({\varvec{X}})\left[ Y-{\mathbb {E}}^R(Y \mid {\varvec{X}};\varvec{\varphi })\right] =0. \end{aligned}$$
(4.12)

For semi-continuous exposures, the corresponding inverse density-weighted estimating equation is (4.10) with

$$\begin{aligned} w^s({\varvec{X}}, {\varvec{Z}})=\frac{{{\text {I}}({\varvec{X}}={\varvec{x}})\{\pi _{2}(\varvec{\gamma }_0^{\dagger })f(l_{1},l_{2} \mid A^+=1)\}^{A^+}\{1-\pi _{2}(\varvec{\gamma }_0^{\dagger })\}^{1-A^+}}}{{\{\pi _{1}({\varvec{Z}}_{0}; \varvec{\gamma }_0)f(l_{1}, l_{2} \mid A^+=1, {{\varvec{Z}}}_{1},{{\varvec{Z}}}_{2}) \}^{A^+}\{1-\pi _{1}({\varvec{Z}}_{0}; \varvec{\gamma }_0)\}^{1-A^+}}}. \end{aligned}$$
(4.13)

4.3.2 Empirical Bias and Efficiency

In this section, we report on simulation studies based on random samples of size 500 with 1000 replications to estimate the exposure effects when exposures are continuous or semi-continuous. The exposure effects estimated through both the regression adjustment and inverse density weighting approaches are assessed in terms of the relative bias, i.e., \((\hat{\varvec{\varphi }}-\varvec{\theta })/\varvec{\theta }\).

When exposures in the true response model (2.1) are bivariate continuous, we let \(A_{k} =\gamma _{k0} +\gamma _{k1}Z_{k}+R_{k}\), \(k=1,2\), where

$$\begin{aligned} {\varvec{R}}=(R_{1}, R_{2})' \sim {\mathcal {N}}\left( {\varvec{0}}, \begin{pmatrix} \sigma _{1}^2 &{} \sigma _{12} \\ \sigma _{12} &{} \sigma _{2}^2 \end{pmatrix}\right) , \end{aligned}$$

and \(Z_{k} \perp R_{k}\). Additionally, we have \(\log \{(1+\rho )/(1-\rho )\}=\gamma _{30}+\gamma _{31}Z_{3}\). We let \(\gamma _{10}= 2\) and \(\gamma _{11}=\log 2\) such that \({\mathbb {E}}(A_{1})=2\), and \(\gamma _{20}= 0.5\) and \(\gamma _{21}=\log 1.25\) such that \({\mathbb {E}}(A_{2})=0.5\). We let \(\gamma _{31} \in [-5, 5]\) and solve for \(\gamma _{30}\) such that \({\mathbb {E}}(\rho )=0.5\).

Fig. 3 Empirical relative bias with standard errors (reflected by shaded bands) of estimators of exposure effects from inadequate propensity score modeling assuming conditional independence between two continuous exposures when the second-order confounder has the effect \(\gamma _{31}\); regression adjustment (Reg) and inverse weighting (IPW) are considered; the marginal association between exposures is \(\rho =0.5\)

The relative biases of exposure effects estimated from (4.8), (4.9), (4.11), and (4.12) under the two types of model misspecification are shown in Figs. 3 and 4. The patterns of relative bias for continuous exposures are quite different from those for binary exposures, but the main findings are similar in general. We again see that the relative bias of the exposure effect estimates increases as the covariate effect on the exposure dependence increases. Compared to the case of binary exposures, the estimated relative bias is lower for continuous exposures. Regression adjustment tends to produce smaller bias and higher efficiency than the inverse density weighting approach.

Fig. 4 Empirical relative bias with standard errors (reflected by shaded bands) of estimators of exposure effects from inadequate propensity score modeling due to the omission of a second-order confounder with effect \(\gamma _{31}\); regression adjustment (Reg) and inverse weighting (IPW) are considered; the marginal association between two continuous exposures is \(\rho =0.5\)

For semi-continuous exposures, we let \(\pi _{1}({\varvec{Z}}_{0})=\text {expit}(\gamma _{00}+\gamma _{01}Z_{1}+\gamma _{02}Z_{2})\) in the first part of the model, where \(\gamma _{00}=1.484\), \(\gamma _{01}=1\), and \(\gamma _{02}=2\) such that \({\mathbb {E}}(\pi _{1})=0.7\). For the second part, given \(A^+=1\), we suppose \(\log (A_{k} \mid A^+=1)=\gamma_{k0}+\gamma_{k1}Z_{k}+R_{k}\), \(k=1, 2\), where

$$\begin{aligned} {\varvec{R}}=(R_{1}, R_{2})' \sim {\mathcal {N}}\left( {\varvec{0}}, \begin{pmatrix} \sigma _{1}^2 &{} \sigma _{12} \\ \sigma _{12} &{} \sigma _{2}^2 \end{pmatrix}\right) . \end{aligned}$$

We let \(\gamma _{10}= 2\) and \(\gamma _{11}=\log 2\) such that \({\mathbb {E}}(\log A_{1} \mid A^+=1)=2\), and \(\gamma _{20}= 0.5\) and \(\gamma _{21}=\log 1.25\) such that \({\mathbb {E}}(\log A_{2}\mid A^+=1)=0.5\). Additionally, we have \(\rho =\sigma _{12}/(\sigma _{1}\sigma _{2})\) and \(\log \{(1+\rho )/(1-\rho )\}=\gamma _{30}+\gamma _{31}Z_{3}\), where we let \(\gamma _{31} \in [-5, 5]\) and solve for \(\gamma _{30}\) such that \({\mathbb {E}}\{\rho (A_1, A_2)\}=0.5\).

Figures D1 and D2 in Online Appendix D show the relative biases of the semi-continuous exposure effects estimated via (4.8), (4.10), (4.11), and (4.13) under the two types of model misspecification. The patterns of the relative bias as a function of the covariate effect \(\gamma _{31}\) are similar to those in Figs. 3 and 4, although semi-continuous exposures can lead to higher variability in the estimates compared to binary and continuous exposures, especially under inverse density weighting and the conditional independence assumption. Again, regression adjustment yields smaller bias and greater precision compared to inverse density weighting in these simulation settings.

5 A Two-Stage Causal Analysis for Bivariate Semi-continuous Exposures

Here, we propose a two-stage approach for causal analysis involving semi-continuous exposures, again considering the special case in which a single binary indicator \(A^+\) indicates whether an individual is exposed, and if \(A^+=1\) then variables \(A_k\), \(k=1,\ldots , p\) record the different facets of exposure. It is helpful to let \(A_k^c=A^+(A_k-\bar{A}_k)\) denote the centered value of \(A_k\) where \(\bar{A}_k={\mathbb {E}}(A_k \mid A^+=1)\), \(k=1, \dots , p\), and \({\varvec{A}}=(A_1^c,\dots , A_p^c)'\) is the vector of exposures, \({\varvec{B}}=(A_1^c A_2^c, \ldots , A_{p-1}^c A_p^c)'\) is the vector of pairwise products of exposures, \({\varvec{X}}=(A^+, {\varvec{A}}^{\prime }, {\varvec{B}}')'\), and \({\varvec{Z}}\) is an \(r \times 1\) vector of confounders. The data-generating model is

$$\begin{aligned} {\mathbb {E}}(Y \mid {\varvec{X}}, {\varvec{Z}};\varvec{\beta })=\beta _0+\beta _1 A^+ + \varvec{\beta }_2'{\varvec{A}}+\varvec{\beta }_3' {\varvec{B}}+\varvec{\beta }_4' {\varvec{Z}}. \end{aligned}$$
(5.1)

5.1 Stage I: The Dose–Response Surface

In stage I, we focus on the subgroup analysis modeling the dose–response surface among those with \(A^+=1\) (the exposed). For a sample \(\{(Y_i, {\varvec{X}}_i, {\varvec{Z}}_i), i=1, \dots , n \}\), the estimating function for covariate adjustment is \(U_{1}(\varvec{\theta })=\sum _{i=1}^n U_{i1}(\varvec{\theta })\), where

$$\begin{aligned} U_{i1}(\varvec{\theta })=A_i^+ D_1({\varvec{A}}_i, {\varvec{B}}_{i}, {\varvec{Z}}_i)\left\{ Y_i- \eta _1({\varvec{A}}_i, {\varvec{B}}_{i}, {\varvec{Z}}_i; \varvec{\theta })\right\} , \end{aligned}$$
(5.2)

\(D_1({\varvec{A}}_i, {\varvec{B}}_{i}, {\varvec{Z}}_i)=(1, {\varvec{A}}_{i}^{\prime }, {\varvec{B}}_{i}',{\varvec{Z}}_i')'\), and

$$\begin{aligned} \eta _1({\varvec{A}}_i, {\varvec{B}}_{i}, {\varvec{Z}}_i; \varvec{\theta })=\theta _1+\varvec{\theta }_2' {\varvec{A}}_{i} + \varvec{\theta }_3' {\varvec{B}}_{i}+\varvec{\theta }_4' {\varvec{Z}}_i. \end{aligned}$$

Alternatively, if \({\varvec{S}}_A={\mathbb {E}}({\varvec{A}} \mid A^+=1, {\varvec{Z}})\) and \({\varvec{S}}_B={\mathbb {E}}({\varvec{B}} \mid A^+=1, {\varvec{Z}})\), we can let \({\varvec{S}}=({\varvec{S}}_A', {\varvec{S}}_B')'\) and set

$$\begin{aligned} U_{i2}(\varvec{\theta })=A_i^+ D_1({\varvec{A}}_i, {\varvec{B}}_i, {\varvec{S}}_{i})\left\{ Y_i- \eta _1({\varvec{A}}_i, {\varvec{B}}_i, {\varvec{S}}_{i}; \varvec{\theta })\right\} , \end{aligned}$$
(5.3)

where \(D_1({\varvec{A}}_i, {\varvec{B}}_i, {\varvec{S}}_{i})=(1, {\varvec{A}}_i', {\varvec{B}}_i', {\varvec{S}}_{i}')'\), and

$$\begin{aligned} \eta _1({\varvec{A}}_i, {\varvec{B}}_{i}, {\varvec{S}}_i; \varvec{\theta })=\theta _1+\varvec{\theta }_2' {\varvec{A}}_{i} + \varvec{\theta }_3' {\varvec{B}}_{i}+\varvec{\theta }_4' {\varvec{S}}_i. \end{aligned}$$

Solving \(\sum _{i=1}^n U_{i1}(\varvec{\theta })=0\) or \(\sum _{i=1}^n U_{i2}(\varvec{\theta })=0\) yields \(\hat{\varvec{\theta }}\), which estimates the causal effect of the continuous components of the exposure.

5.2 Stage II: The Effect of Exposure Status

The exposure effect of \({\varvec{A}}\) and \({\varvec{B}}\) is obtained from stage I and is used as an offset in stage II. Specifically, we use \({\tilde{\eta }}_0({\varvec{A}},{\varvec{B}}; \varvec{\theta })=\hat{\varvec{\theta }}_2' {\varvec{A}} + \hat{\varvec{\theta }}_3' {\varvec{B}}\) as an offset, taking the value zero for those with \(A^+=0\). Stage II is directed at assessing the effect of the binary exposure \(A^+\); many propensity score-based methods can be used to assess the effect of exposure at the reference level. We now discuss four possible approaches: simple regression adjustment, propensity score regression adjustment, inverse probability weighting (IPW), and augmented inverse probability weighting (AIPW).

5.2.1 A. Simple regression adjustment

For the simple regression adjustment, the estimating function is

$$\begin{aligned} U_{i3}(\varvec{\alpha }^\circ , \varvec{\theta })=D_2(A_i^+, {\varvec{Z}}_i)\left\{ Y_i- \eta _2(A_i^+, {\varvec{Z}}_i; \varvec{\alpha }^\circ , \varvec{\theta })\right\} , \end{aligned}$$
(5.4)

where \(\varvec{\alpha }^\circ =(\alpha _0, \alpha _1, \varvec{\alpha }_4')'\), \(D_2(A_i^+, {\varvec{Z}}_i)=(1, A_i^+, {\varvec{Z}}_i')'\), and

$$\begin{aligned} \eta _2(A_i^+, {\varvec{Z}}_i; \varvec{\alpha }^\circ , \varvec{\theta })=\alpha _0+\alpha _1 A_i^+ + \varvec{\alpha }_4' {\varvec{Z}}_i+\text {offset}(\tilde{\eta }^Z_0({\varvec{A}}, {\varvec{B}}; \varvec{\theta }) ). \end{aligned}$$

The superscript Z in \(\tilde{\eta }^Z_0\) indicates that \(\varvec{\theta }\) is estimated from (5.2) by adjusting for \({\varvec{Z}}\). If \(\varPsi =(\varvec{\alpha }^{\circ \prime }, \varvec{\theta }')'\), we solve the joint estimating equations,

$$\begin{aligned} \sum _{i=1}^n {U}_i(\varPsi )=\sum _{i=1}^n\left( \begin{array}{c} U_{i1}(\varvec{\theta })\\ U_{i3}(\varvec{\alpha }^\circ , \varvec{\theta }) \end{array}\right) = {\textbf{0}} \end{aligned}$$

to get \({\hat{\varPsi }}\).
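Although \(\hat{\varvec{\theta }}\) and \(\hat{\varvec{\alpha }}^\circ\) could be computed sequentially, the stacked system is what feeds the sandwich variance calculation described at the end of this section. A sketch of solving the stacked equations directly, continuing the simulated data above; because both components are linear in the parameters, a general-purpose root finder converges immediately, and the function and variable names are ours, not the authors'.

import numpy as np
from scipy.optimize import root

def stacked_U(Psi, Y, A_plus, A, B, Z):
    # Stacked estimating functions: sum_i U_{i1}(theta) followed by sum_i U_{i3}(alpha, theta).
    p_theta = 1 + A.shape[1] + B.shape[1] + Z.shape[1]
    theta, alpha = Psi[:p_theta], Psi[p_theta:]
    D1 = np.column_stack([np.ones_like(Y), A, B, Z])
    D2 = np.column_stack([np.ones_like(Y), A_plus, Z])
    # Offset theta_2'A + theta_3'B; it is zero for the unexposed since A = B = 0 there.
    offset = np.column_stack([A, B]) @ theta[1:1 + A.shape[1] + B.shape[1]]
    U1 = D1.T @ (A_plus * (Y - D1 @ theta))
    U3 = D2.T @ (Y - (D2 @ alpha + offset))
    return np.concatenate([U1, U3])

p = (1 + A.shape[1] + B.shape[1] + Z.shape[1]) + (2 + Z.shape[1])
sol = root(stacked_U, x0=np.zeros(p), args=(Y, A_plus, A, B, Z))
alpha1_hat = sol.x[-(1 + Z.shape[1])]                  # coefficient of A+ in stage II
print("converged:", sol.success, " exposure effect at the reference level:", round(alpha1_hat, 3))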

5.2.2 B. Propensity score regression adjustment

If \(S_0={\mathbb {E}}(A^+ \mid {\varvec{Z}})\), replacing \({\varvec{Z}}\) with \(S_0\) in (5.4) yields the estimating functions

$$\begin{aligned} U_{i4}(\varvec{\alpha }^\circ , \varvec{\theta })&=D_2(A_i^+, S_{i0})\left\{ Y_i- \eta _2(A_i^+, S_{i0}; \varvec{\alpha }^\circ , \varvec{\theta })\right\} \;,\end{aligned}$$
(5.5)
$$\begin{aligned} U_{i5}(\varvec{\psi })&=D_3({\varvec{Z}}_i)\left\{ A_i^+-{\mathbb {E}}(A_i^+ \mid {\varvec{Z}}_i; \varvec{\psi }) \right\} \;, \end{aligned}$$
(5.6)

where \(D_2(A_i^+, S_{i0})=(1, A_i^+, S_{i0})'\), \(D_3({\varvec{Z}}_i)=(1, {\varvec{Z}}_i')'\), \({\mathbb {E}}(A_i^+ \mid {\varvec{Z}}_i; \varvec{\psi })=\text {expit}(\varvec{\psi }'\bar{{\varvec{Z}}}_i)\) with \(\bar{{\varvec{Z}}}_i=(1, {\varvec{Z}}_i')'\), and

$$\begin{aligned} \eta _2(A_i^+, S_{i0}; \varvec{\alpha }^\circ , \varvec{\theta })=\alpha _0+\alpha _1 A_i^+ + {\alpha }_4 S_{i0}+\text {offset}(\tilde{\eta }^S_0({\varvec{A}}_i, {\varvec{B}}_i; \varvec{\theta })). \end{aligned}$$

The superscript S in \(\tilde{\eta }^S_0\) indicates that \(\varvec{\theta }\) is estimated from (5.3) by adjusting for the generalized propensity score. With \(\varPsi =(\varvec{\alpha }^{\circ \prime }, \varvec{\theta }', \varvec{\psi }')'\), we solve the joint estimating equations to obtain \({\hat{\varPsi }}\),

$$\begin{aligned} \sum _{i=1}^n {U}_i(\varPsi )=\sum _{i=1}^n\left( \begin{array}{c} U_{i2}(\varvec{\theta })\\ U_{i4}(\varvec{\alpha }^\circ , \varvec{\theta })\\ U_{i5}(\varvec{\psi }) \end{array}\right) = {\textbf{0}} . \end{aligned}$$
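For method B, the propensity score components can be estimated with standard regression software and then plugged in. The sketch below continues the simulated data above and assumes statsmodels is available; it fits stage I by (5.3) and stage II by (5.5), handling the offset by subtracting it from the response.

import numpy as np
import statsmodels.api as sm

Zc = sm.add_constant(Z)
exposed = A_plus == 1

# U_{i5}: logistic regression of A+ on Z gives S_0 = E(A+ | Z).
S0 = sm.GLM(A_plus, Zc, family=sm.families.Binomial()).fit().fittedvalues

# Generalized propensity scores S_A = E(A | A+ = 1, Z) and S_B = E(B | A+ = 1, Z),
# fitted among the exposed and predicted for all subjects.
S_A = np.column_stack([sm.OLS(A[exposed, k], Zc[exposed]).fit().predict(Zc)
                       for k in range(A.shape[1])])
S_B = np.column_stack([sm.OLS(B[exposed, k], Zc[exposed]).fit().predict(Zc)
                       for k in range(B.shape[1])])

# Stage I via (5.3): regress Y on (1, A, B, S_A, S_B) among the exposed.
X1 = np.column_stack([np.ones(exposed.sum()), A[exposed], B[exposed],
                      S_A[exposed], S_B[exposed]])
theta_hat, *_ = np.linalg.lstsq(X1, Y[exposed], rcond=None)
offset = np.column_stack([A, B]) @ theta_hat[1:1 + A.shape[1] + B.shape[1]]  # zero when A+ = 0

# Stage II via (5.5): regress Y minus the offset on (1, A+, S_0).
stage2 = sm.OLS(Y - offset, np.column_stack([np.ones_like(Y), A_plus, S0])).fit()
print("exposure effect at the reference level:", round(stage2.params[1], 3))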

5.2.3 C. Inverse probability weighting (IPW)

We first consider a marginal structural model with the offset,

$$\begin{aligned} \eta _3(A^+; \varvec{\phi }, \varvec{\theta })={\mathbb {E}}(Y \mid A^+; \varvec{\phi }, \varvec{\theta })=\phi _0+\phi _1 A^+ + \text {offset}(\tilde{\eta }^S_0({\varvec{A}}, {\varvec{B}}; \varvec{\theta }) ) . \end{aligned}$$

Then we define the estimating function

$$\begin{aligned} U_{i6}(\varvec{\phi },\varvec{\theta },\varvec{\psi })&=\frac{I(A_i^+=a_i^+)}{P(A_i^+=a_i^+ \mid {\varvec{Z}}_i; \varvec{\psi })} D_4(A_i^+)\left\{ Y_i- \eta _3(A_i^+; \varvec{\phi }, \varvec{\theta })\right\} \;, \end{aligned}$$

where \(D_4(A_i^+)=(1, A_i^+)'\). With \(\varPsi =(\varvec{\phi }', \varvec{\theta }', \varvec{\psi }')'\), we solve the joint estimating equations,

$$\begin{aligned} \sum _{i=1}^n {U}_i(\varPsi )=\sum _{i=1}^n\left( \begin{array}{c} U_{i2}(\varvec{\theta })\\ U_{i5}(\varvec{\psi })\\ U_{i6}(\varvec{\phi },\varvec{\theta },\varvec{\psi }) \end{array}\right) = {\textbf{0}}. \end{aligned}$$
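Solving \(\sum _{i=1}^n U_{i6}=0\) amounts to weighted least squares of \(Y\) minus the offset on \((1, A^+)\). A brief sketch, assuming the fitted propensity scores S0 and the stage I offset from the sketches above:

import numpy as np

# Weights 1/P(A+ = a+ | Z): 1/S0 for the exposed and 1/(1 - S0) for the unexposed.
w = np.where(A_plus == 1, 1.0 / S0, 1.0 / (1.0 - S0))

D4 = np.column_stack([np.ones_like(Y), A_plus])
resid = Y - offset                                     # marginal structural model with offset
phi_hat = np.linalg.solve(D4.T @ (w[:, None] * D4), D4.T @ (w * resid))
print("phi_1 (IPW exposure effect at the reference level):", round(phi_hat[1], 3))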

5.2.4 D. Augmented inverse probability weighting (AIPW)

The AIPW approach combines an outcome regression (OR) model and a propensity score (PS) model. AIPW estimators are doubly robust: they are consistent if either the outcome regression model or the propensity score model is correctly specified [28]. To obtain this estimator, we begin by specifying the estimating functions for the outcome regression model through the G-computation approach [29, 30].

We consider

$$\begin{aligned} U_{i7}(\varvec{\theta }, \varvec{\lambda }_1)&= A_i^+ D_3({\varvec{Z}}_i)\left\{ Y_i - \varvec{\lambda }_1' {\varvec{Z}}_i-\textrm{offset}(\tilde{\eta }_0^S({\varvec{A}}_i, {\varvec{B}}_i; \varvec{\theta }))\right\} \;,\\ U_{i8}(\varvec{\theta }, \varvec{\lambda }_2)&= (1-A_i^+) D_3({\varvec{Z}}_i)\left\{ Y_i - \varvec{\lambda }_2' {\varvec{Z}}_i-\textrm{offset}(\tilde{\eta }_0^S({\varvec{A}}_i, {\varvec{B}}_i; \varvec{\theta }))\right\} \;. \end{aligned}$$

The AIPW estimators \({\hat{\mu }}_1\) and \({\hat{\mu }}_0\) are then obtained by solving the estimating equations \(\sum _{i=1}^n U_{i9}(\varvec{\theta },\varvec{\psi }, \mu _1)=0\) and \(\sum _{i=1}^n U_{i,10}(\varvec{\theta },\varvec{\psi }, \mu _0)=0\), where

$$\begin{aligned} \small U_{i9}(\varvec{\theta },\varvec{\psi }, \mu _1)= \frac{Y_i A_i^+ -\left\{ A_i^+-P(A_i^+=1 \mid {\varvec{Z}}_i; \varvec{\psi })\right\} \left\{ \varvec{\lambda }_1' {\varvec{Z}}_i + \textrm{offset}(\tilde{\eta }_0^S({\varvec{A}}_i, {\varvec{B}}_i; \varvec{\theta }))\right\} }{P(A_i^+=1 \mid {\varvec{Z}}_i; \varvec{\psi })} -\mu _1, \end{aligned}$$

and

$$\begin{aligned} \small U_{i,10}(\varvec{\theta },\varvec{\psi }, \mu _0)= \frac{Y_i (1-A_i^+) -\left\{ P(A_i^+=1 \mid {\varvec{Z}}_i; \varvec{\psi })-A_i^+\right\} \left\{ \varvec{\lambda }_2' {\varvec{Z}}_i + \textrm{offset}(\tilde{\eta }_0^S({\varvec{A}}_i, {\varvec{B}}_i; \varvec{\theta }))\right\} }{1-P(A_i^+=1 \mid {\varvec{Z}}_i; \varvec{\psi })} -\mu _0. \end{aligned}$$

Letting \(\varPsi =(\varvec{\theta }', \varvec{\psi }', \varvec{\lambda }_1', \varvec{\lambda }_2', \mu _1, \mu _0)'\), we solve the joint estimating equations \(\sum _{i=1}^n {U}_i(\varPsi )={\varvec{0}}\), where

$$\begin{aligned} {U}_i(\varPsi )=(U'_{i2}(\varvec{\theta }), U'_{i5}(\varvec{\psi }), U'_{i7}(\varvec{\theta }, \varvec{\lambda }_1), U'_{i8}(\varvec{\theta }, \varvec{\lambda }_2), U'_{i9}(\varvec{\theta },\varvec{\psi }, \mu _1), U'_{i,10}(\varvec{\theta },\varvec{\psi }, \mu _0))'. \end{aligned}$$
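Before turning to variance estimation, note that the AIPW point estimates have the closed form implied by \(U_{i9}\) and \(U_{i,10}\). A sketch, again assuming S0, the offset, and the simulated arrays from the earlier sketches; an intercept is included in the outcome regressions, matching \(D_3({\varvec{Z}}_i)\).

import numpy as np

Zc = np.column_stack([np.ones_like(Y), Z])
resid = Y - offset
exposed = A_plus == 1

# G-computation fits (U_{i7}, U_{i8}): outcome regressions among the exposed and unexposed.
lam1, *_ = np.linalg.lstsq(Zc[exposed], resid[exposed], rcond=None)
lam0, *_ = np.linalg.lstsq(Zc[~exposed], resid[~exposed], rcond=None)
m1 = Zc @ lam1 + offset                                # predicted outcome under A+ = 1
m0 = Zc @ lam0 + offset                                # predicted outcome under A+ = 0

# Plug-in solutions of U_{i9} = 0 and U_{i,10} = 0.
mu1 = np.mean((Y * A_plus - (A_plus - S0) * m1) / S0)
mu0 = np.mean((Y * (1 - A_plus) - (S0 - A_plus) * m0) / (1 - S0))
print("AIPW exposure effect at the reference level:", round(mu1 - mu0, 3))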

From standard estimating function theory, we have

$$\begin{aligned} \text {asvar}(\sqrt{n}({\hat{\varPsi }}-\varPsi ))={\mathcal {A}}^{-1}(\varPsi ){\mathcal {B}}(\varPsi )[{\mathcal {A}}^{-1}(\varPsi )]', \end{aligned}$$

where \({\mathcal {A}}(\varPsi )=E\{-\partial {U}_i(\varPsi )/\partial \varPsi '\}\) and \({\mathcal {B}}(\varPsi )=E\{{U}_i(\varPsi ){U}_i'(\varPsi )\}\), and the expectation is taken with respect to the true distribution. In practice, we use sample averages to estimate \({\mathcal {A}}(\varPsi )\) and \({\mathcal {B}}(\varPsi )\), and evaluate the resulting expressions at the estimates to obtain \({\mathcal {A}}^{-1}({\hat{\varPsi }}){\mathcal {B}}({\hat{\varPsi }})[{\mathcal {A}}^{-1}({\hat{\varPsi }})]' \;,\) where

$$\begin{aligned} {\mathcal {A}}({\hat{\varPsi }})=\left. -\frac{1}{n}\sum _{i=1}^n \partial U_i(\varPsi )/{\partial \varPsi '}\right| _{\varPsi ={\hat{\varPsi }}}, \end{aligned}$$

and

$$\begin{aligned} {\mathcal {B}}({\hat{\varPsi }})=\left. \frac{1}{n} \sum _{i=1}^n U_i(\varPsi )U_i'(\varPsi )\right| _{\varPsi ={\hat{\varPsi }}}. \end{aligned}$$
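A sketch of the sandwich calculation with a numerical Jacobian follows; U_matrix is written for the stacked system of Sect. 5.2.1 and sol.x is the solution from the root-finding sketch there, but the same code applies to any of the stacked systems above. The function and variable names are ours.

import numpy as np

def U_matrix(Psi, Y, A_plus, A, B, Z):
    # n x p matrix whose i-th row is U_i(Psi) for the stacked system of Sect. 5.2.1.
    p_theta = 1 + A.shape[1] + B.shape[1] + Z.shape[1]
    theta, alpha = Psi[:p_theta], Psi[p_theta:]
    D1 = np.column_stack([np.ones_like(Y), A, B, Z])
    D2 = np.column_stack([np.ones_like(Y), A_plus, Z])
    offset = np.column_stack([A, B]) @ theta[1:1 + A.shape[1] + B.shape[1]]
    r1 = A_plus * (Y - D1 @ theta)
    r2 = Y - (D2 @ alpha + offset)
    return np.column_stack([D1 * r1[:, None], D2 * r2[:, None]])

def sandwich(Psi_hat, *args, eps=1e-6):
    U0 = U_matrix(Psi_hat, *args)
    n, p = U0.shape
    A_hat = np.zeros((p, p))
    for j in range(p):                                 # A(Psi_hat) by central differences
        e = np.zeros(p); e[j] = eps
        A_hat[:, j] = -(U_matrix(Psi_hat + e, *args).mean(axis=0)
                        - U_matrix(Psi_hat - e, *args).mean(axis=0)) / (2 * eps)
    B_hat = U0.T @ U0 / n
    A_inv = np.linalg.inv(A_hat)
    return A_inv @ B_hat @ A_inv.T / n                 # estimated covariance matrix of Psi_hat

cov = sandwich(sol.x, Y, A_plus, A, B, Z)
print("standard errors:", np.round(np.sqrt(np.diag(cov)), 3))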

5.3 Simulation Studies on the Two-Stage Approach

Here, we evaluate the performance of the two-stage approach for the analysis of a bivariate semi-continuous exposure under the data-generating process of Sect. 4.3.2. We consider the stage II estimator based on various combinations of methods in stages I and II. For stage I, confounder regression (5.2) and propensity score adjustment (5.3) are used; in stage II, we employ methods A to D of Sect. 5.2. As shown in Table 1, when all models are correctly specified the empirical bias is negligible for all methods; IPW in stage II yields estimates with the largest standard errors, while the AIPW estimator improves efficiency, as expected.

Table 1 Simulation results with sample size \(n=500\), repetitions \(N=1000\): relative bias and standard error of stage II estimator of the binary exposure effect when models are correctly specified

Table 2 displays the relative bias and empirical standard error of the stage II estimators when models are misspecified in stage II. Here, the confounders are transformed as \(V_1 = \exp (Z_1/2)+Z_2\), \(V_2 = Z_2/(1+\exp (Z_1))+10\), and \(V_3 = (Z_1Z_2/10 +0.6)^2\), and \({\varvec{V}}=(V_1, V_2, V_3)'\) is used in place of \({\varvec{Z}}=(Z_1, Z_2, Z_3)'\) in stage II to induce model misspecification. If either the outcome regression (OR) model or the propensity score (PS) model is misspecified in stage II, methods relying on the misspecified model yield biased estimates, while the AIPW estimator exhibits the double robustness property. When both models are misspecified in stage II, the AIPW approach yields an estimator with large finite sample bias and a large standard error.

Table 2 Simulation results with sample size \(n=500\), repetitions \(N=1000\): relative bias and standard error of stage II estimator of the binary exposure effect with model misspecification in stage II

6 Prenatal Alcohol Exposure and Cognition

Here, we consider a large prospective cohort study comprising 480 inner-city pregnant African–American mothers recruited between September 1986 and April 1989 [12]. Quantitative measures of alcohol consumption during pregnancy were obtained at each maternal prenatal visit using a timeline follow-back interview [31] and converted to three common metrics: the number of standard drinks per drinking occasion, the frequency of drinking (days/month), and the average ounces of absolute alcohol per day (oz AA/day). Additional variables collected included socioeconomic status (SES), maternal age, maternal years of education, marital status, and family income; maternal smoking (cigarettes/day), marijuana, cocaine, and other illicit drug use (days/month), and maternal history of alcohol abuse or dependence; and the child’s sex, race, and whether the child was raised by the biological mother and/or was in foster care; see Table D1 in Online Appendix D. The mothers and their children were followed for 19–21 years postpartum with a retention rate of 79.0% (\(N=377\)). A variety of cognitive tests were administered to each child to assess various facets of cognitive function, and the results were synthesized by fitting a second-order confirmatory factor analysis, giving a composite measure of cognitive function [32] that we adopt as the response, Y, in the analyses that follow.

Among the 377 mothers in this analysis, 16.2% reported no alcohol consumption during pregnancy, so the alcohol exposure variables are semi-continuous. For mothers who drank alcohol during pregnancy, the two main exposures (oz of prenatal absolute alcohol per drinking occasion and the proportion of drinking days during pregnancy) were log-transformed and centered. We evaluate the causal effects in four ways, as discussed in the previous sections: a subgroup analysis involving drinkers; full sample analyses of the bivariate semi-continuous exposure based on conventional regression adjustment and on the two-part model; and a full sample analysis with the two-stage approach. We aim to evaluate the causal effect of alcohol drinking, the two main exposures, and their interaction on child cognitive function while controlling for the confounders listed in Table 3.

Table 3 Estimates for the effects of each confounder

6.1 Subgroup Analysis Involving Drinkers

We first restrict attention to drinkers and focus on the causal effects of alcohol exposure on children’s cognition. Let \(A_1\) represent the number of standard drinks per drinking occasion and \(A_2\) the proportion of days per month on which alcohol was consumed; we refer to these as the intensity and frequency of drinking, respectively. With \(A_i^+ = {\mathbb {I}}(A_{i1}A_{i2} > 0)\) indicating any consumption of alcohol, we conduct the analyses among drinkers using the regression model

$$\begin{aligned} {\mathbb {E}}\left( Y_i \mid A_i^+=1, {\varvec{L}}_i, {\varvec{Z}}_i \right) =\mu +\alpha _1 L_{i1}+\alpha _2 L_{i2}+\alpha _3 L_{i1}L_{i2}+{\varvec{Z}}_i'\varvec{\zeta }, \end{aligned}$$
(6.1)

where \(L_{ik}=A_i^+ \{\log A_{ik}- {\mathbb {E}}(\log A_{ik} \mid A_i^+=1)\}\). When using propensity score regression adjustment, we consider

$$\begin{aligned} {\mathbb {E}}\left( Y_i \mid A_i^+=1, {\varvec{L}}_i, {\varvec{S}}_i \right)&=\mu +\alpha _1 L_{i1}+\alpha _2 L_{i2}+\alpha _3 L_{i1}L_{i2} + {\varvec{S}}_i'\varvec{\zeta }\;, \end{aligned}$$
(6.2)

where \({\varvec{S}}_i=(S_{i1}, S_{i2}, S_{i3})'\), with \(S_{ik}={\mathbb {E}}(L_{ik} \mid A_i^+=1, {\varvec{Z}}_i)\), \(k=1, 2\), and \(S_{i3}={\mathbb {E}}(L_{i1}L_{i2} \mid A_i^+=1, {\varvec{Z}}_i)\). The estimated effects of each confounder are given in columns 3–5 of Table 3 for each of the respective models.

6.2 Analyses Based on the Bivariate Semi-continuous Exposure Model

The conventional regression model, adjusted for all the confounders, is fitted as

$$\begin{aligned} {\mathbb {E}}\left( Y_i \mid A_i^+, {\varvec{L}}_i, {\varvec{Z}}_i \right) =\nu +\lambda _0 A_{i}^+ +\lambda _1 L_{i1}+\lambda _2 L_{i2}+\lambda _3 L_{i1}L_{i2}+{\varvec{Z}}_i'\varvec{\eta }, \end{aligned}$$

with \({\varvec{Z}}_i\) the set of confounders. Using a two-part model for the generalized propensity score as in [24], the regression model adjusted by the generalized propensity score is

$$\begin{aligned} {\mathbb {E}}\left( Y_i \mid A_i^+, {\varvec{L}}_i, {\varvec{S}}_i \right) =\nu +\lambda _0 A_{i}^+ +\lambda _1 L_{i1}+\lambda _2 L_{i2}+\lambda _3 L_{i1}L_{i2} + {\varvec{S}}_i' \varvec{\eta }, \end{aligned}$$

where as in (4.5), \({\varvec{S}}_i=(\pi _{i1}({\varvec{Z}}_i), {\mathbb {E}}(L_{i1} \mid {\varvec{Z}}_i), {\mathbb {E}}(L_{i2} \mid {\varvec{Z}}_i), {\mathbb {E}}(L_{i1}L_{i2} \mid {\varvec{Z}}_i))'\) with

$$\begin{aligned}&\pi _{i1}({\varvec{Z}}_i)=\text {expit}(\bar{{\varvec{Z}}}_i'\varvec{\gamma }_0)\;, \\&{\mathbb {E}}(L_{ik} \mid {\varvec{Z}}_i)=\pi _{i1}({\varvec{Z}}_i){\mathbb {E}}(L_{ik} \mid A_i^+=1, {{\varvec{Z}}}_{i}) \;, k=1, 2\;,\\&{\mathbb {E}}(L_{i1}L_{i2} \mid {\varvec{Z}}_i)=\pi _{i1}({\varvec{Z}}_i){\mathbb {E}}(L_{i1}L_{i2} \mid A_i^+=1, {\varvec{Z}}_i)\;. \end{aligned}$$

The results are shown in Table 3.
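For concreteness, the sketch below illustrates how the two-part generalized propensity score could be constructed with standard software; the arrays are synthetic placeholders standing in for the study variables (drinker indicator, centered log exposures \(L_1, L_2\), confounders), and statsmodels is assumed. It is an illustration of the recipe above, not the authors' code.

import numpy as np
import statsmodels.api as sm

rng = np.random.default_rng(2)
n = 377
Z = rng.normal(size=(n, 3))                            # placeholder confounders
A_plus = rng.binomial(1, 0.84, size=n)                 # drinker indicator (about 16% non-drinkers)
L1 = np.where(A_plus == 1, rng.normal(size=n), 0.0)    # centered log intensity, zero for non-drinkers
L2 = np.where(A_plus == 1, rng.normal(size=n), 0.0)    # centered log frequency
Y = -0.3 * A_plus - 0.4 * L1 - 0.3 * L2 - 0.2 * L1 * L2 \
    + Z @ np.array([0.2, -0.2, 0.1]) + rng.normal(size=n)

Zc = sm.add_constant(Z)                                # corresponds to Zbar = (1, Z')'
drinker = A_plus == 1

# Part 1: probability of any drinking, pi_1(Z) = expit(Zbar' gamma_0).
pi1 = sm.GLM(A_plus, Zc, family=sm.families.Binomial()).fit().predict(Zc)

# Part 2: conditional means given drinking, fitted among drinkers and predicted for all.
def cond_mean(v):
    return sm.OLS(v[drinker], Zc[drinker]).fit().predict(Zc)

S = np.column_stack([pi1,
                     pi1 * cond_mean(L1),              # E(L1 | Z)
                     pi1 * cond_mean(L2),              # E(L2 | Z)
                     pi1 * cond_mean(L1 * L2)])        # E(L1 L2 | Z)

# Regression adjusted by the generalized propensity score.
X = np.column_stack([np.ones_like(Y), A_plus, L1, L2, L1 * L2, S])
gps_fit = sm.OLS(Y, X).fit()
print(np.round(gps_fit.params[:5], 3))                 # nu, lambda_0, ..., lambda_3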

The inverse density weighting approach to estimating causal effects involves solving (4.7) with stabilized weights. We assume the exposures follow a bivariate normal distribution when we estimate the conditional joint density of the two exposures.
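A sketch of how such stabilized weights could be computed under the working bivariate normal assumption follows, reusing the placeholder arrays from the previous sketch; it illustrates the general recipe rather than the exact specification fitted in the analysis.

import numpy as np
import statsmodels.api as sm
from scipy.stats import multivariate_normal

drinker = A_plus == 1
L = np.column_stack([L1, L2])[drinker]
Zd = sm.add_constant(Z)[drinker]

# Conditional means from linear models and a common residual covariance matrix.
fits = [sm.OLS(L[:, k], Zd).fit() for k in range(2)]
mu = np.column_stack([f.fittedvalues for f in fits])
Sigma = np.cov(L - mu, rowvar=False)

# Stabilized weight: marginal joint density over conditional joint density.
num = multivariate_normal(L.mean(axis=0), np.cov(L, rowvar=False)).pdf(L)
den = np.array([multivariate_normal(mu[i], Sigma).pdf(L[i]) for i in range(len(L))])
sw = num / den
print("stabilized weight range:", np.round([sw.min(), sw.max()], 2))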

6.3 Full Sample Analysis with the Two-Stage Approach

We also address the effects of confounding variables through the two-stage approach of Sect. 5. In stage I, we define the offset from the subgroup analysis of drinkers using the estimates obtained from (6.1) or (6.2). Letting \({\tilde{\eta }}_i={\hat{\alpha }}_1 L_{i1}+{\hat{\alpha }}_2 L_{i2}+{\hat{\alpha }}_3 L_{i1}L_{i2}\) be the offset in stage II, we then use regression adjustment with covariates, regression adjustment with the propensity score, IPW, and AIPW to estimate the effect of being a drinker at the reference exposure level.

6.4 Results

We assess covariate balance using Pearson correlations [23, 33]; see Table D2 in Online Appendix D. The Pearson correlations were computed between each exposure and each covariate in Table 3 before generalized propensity score adjustment or weighting. To evaluate the performance of each approach, we then examined covariate balance after conditioning on the generalized propensity score or applying inverse probability/density weighting. All of the methods described above substantially reduce the maximum and average absolute correlations compared to the conventional regression approach shown in the third row of Table 3, indicating improved covariate balance.

In the full sample analysis with the two-part model, the regression approach with generalized propensity score adjustment reduced covariate imbalance more effectively than the inverse density weighting approach. In the full sample analysis with the two-stage approach, the propensity score-based methods in stage II (e.g., IPW) achieved better covariate balance than the regression approach. Overall, each method improves covariate balance, although the approach that performs best depends on the analysis.

Table 4 Estimated effects of being a drinker versus non-drinker, the logarithm of prenatal absolute alcohol exposure per occasion, the logarithm of proportion of drinking days across pregnancy, and their interaction

Table 4 presents the causal effects estimated from both the subgroup analysis and the full sample analyses with the various methods. All estimated effects are negative, including the effects of drinking at the reference level versus not drinking, prenatal absolute alcohol exposure per occasion, the proportion of drinking days, and their interaction, confirming that alcohol consumption by expectant mothers has a negative effect on children’s cognitive development. In the subgroup analysis among drinkers, the results from the regression with covariate adjustment and the regression adjusted by generalized propensity scores are quite similar, indicating that the cognitive function score of the child decreases as the mother increases the amount of alcohol consumed on each drinking occasion or increases her drinking frequency. The interaction term is negative, so the effect of increasing the intensity of drinking on each occasion is greater when the frequency of alcohol use during pregnancy is greater, and vice versa. In the full sample analysis, the conventional regression adjustment and the regression adjusted by the two-part generalized propensity scores also generate similar estimates, except for the drinker indicator. The inverse density weighting approach yields a larger effect size for alcohol exposure per occasion and smaller effect sizes for drinking frequency and their interaction compared with the estimates obtained from the regression methods. In the full sample analysis with the two-stage approach, the effect of drinking obtained from the propensity score-based methods in stage II, that is, regression with propensity score adjustment, IPW, and AIPW, is larger than the estimate obtained from the regression approaches. The IPW approach in stage II has the largest standard error among all the methods, whereas the AIPW approach gives a similar but more precise estimate of the causal effect.

Fig. 5 Dose–response surface and contour plot generated from stage I of the two-stage analysis: the effect of changes in drinking frequency (proportion of days drinking) and intensity (dose/occasion) on child cognitive deficit; reference levels: 1.81 oz AA/occasion and a proportion 0.13 of days drinking

We constructed a dose–response surface from the exposure effect estimates obtained in stage I of the two-stage approach, using a regression model with generalized propensity score adjustment. The resulting surface is shown in the left panel of Fig. 5. We define the reference level of a child’s cognition as that arising when the mother has an average intensity and frequency of alcohol use, that is, consuming 1.81 oz of absolute alcohol per occasion and drinking on approximately 4 days per month (a proportion of 0.13 of days). The x-axis and y-axis show the increments in drinking frequency and intensity, respectively. This graphical depiction shows the expected reduction in the child’s cognitive function resulting from increased drinking frequency and intensity. The right panel of Fig. 5 displays a contour plot showing the change in cognition associated with different combinations of changes in the intensity and frequency of alcohol intake. For instance, drinking on approximately 9 more days per month (an increase of 0.3 in the proportion of days) relative to the reference level while maintaining the same drinking intensity, or drinking on 3 more days per month while also increasing the dose by 2 oz on each drinking occasion, would both result in a 4-unit decrease in the child’s cognitive function score.

7 Discussion

In this article, we describe methods for assessing the effect of two or more exposure variables on a continuous response. Confounding is addressed through the use of a generalized propensity score. In this setting, the generalized propensity score is based on the joint density of exposures and in particular is the vector of linear functions of confounders necessary to construct this joint density. For inverse density weighting, we stabilize the weights via the marginal joint density of the exposures. A key finding is that when confounders affect the joint distribution of the exposure variables, they must be accounted for in the propensity score modeling to ensure consistent estimation and valid inference regarding the interactions between the exposures. Regression adjustment appears to be less sensitive to misspecification of the dependence structure and omission of the second-order effects of confounders, and moreover, it leads to more precise estimation of causal effects. We did not consider matching or stratification on the generalized propensity score but these represent alternative approaches for dealing with confounders [34].

Note that much of the literature on causal analysis includes the conceptualization of potential outcomes [35, 36] for each unit, here denoted by \(\{Y({\varvec{a}}): {\varvec{a}} \in {\mathcal {A}}\}\), where \({\mathcal {A}}\) may be a multidimensional space of possible exposures. We do not make use of potential outcomes in this paper, but our methods could be reformulated and formalized within that framework.