1 Introduction

Structural equation models (SEMs) are a statistical framework particularly suited to datasets featuring latent variables, which are not directly observable but are inferred from observed variables. An SEM comprises two components: a measurement equation and a structural equation. The measurement equation describes the connections between unobserved latent variables and observable manifest variables, while the structural equation describes the interplay among endogenous latent variables, exogenous latent variables, and covariates. Typically, the primary research focus lies within the structural equation. SEMs find wide application across disciplines such as psychology and biology, where latent variables are prevalent; for further illustration, see Martens (2005), Lee and Zhu (2000), and Liu et al. (2008).

Traditionally, SEMs assume linear relationships among latent variables in the structural equation. Kenny and Judd (1984) introduced a nonlinear SEM (NSEM) that extended the methodology to relationships such as interaction and quadratic terms, and Lee (2007) generalized the NSEM to a broader set of nonlinear relationships. However, misspecification of the parametric form at the latent level, whether the model is linear or nonlinear, can result in very poor estimation. Recently, semiparametric approaches have been developed: Bauer (2005), Fahrmeir and Raach (2007), and Guo et al. (2012) used basis expansions to approximate the nonlinear structural relationships in a semiparametric SEM (SSEM). To achieve simultaneous estimation and model selection, Guo et al. (2012) applied the Bayesian Lasso to the SSEM. Although the Bayesian Lasso performs well in SSEMs, it ignores correlation among the features, which leads to inefficient parameter estimation and model selection.

This is especially concerning when cubic splines are used, because the spline columns tend to be highly correlated: each column is a transformed version of the same variables (Keele 2008). This paper addresses this correlation by placing fused Lasso and elastic net priors on the cubic spline coefficient parameters. The fused Lasso has been shown to be a good method for multiple linear regression when the features have a natural order, specifically when there is side-by-side correlation (Tibshirani et al. 2005). Zou and Hastie (2005) showed that the elastic net can often outperform the standard Lasso, in both real-world data sets and simulation studies, while producing a similarly sparse representation. In addition, the elastic net encourages a grouping effect, in which strongly correlated predictors tend to enter or leave the model together.

The rest of this paper is organized as follows. In Sect. 2, we introduce our Bayesian SSEM framework with its associated basis representation under the fused Lasso and elastic net priors, and propose the Bayesian Fused Lasso and Bayesian Elastic Net based methods to achieve simultaneous estimation and model selection. In Sect. 3 we derive the posterior distributions, and in Sect. 4 we describe the MCMC algorithm used to fit our models. To illustrate the proposed methods, we present two simulation studies in Sect. 5. Subsequently, in Sect. 6 we apply our fused Lasso and elastic net based Bayesian SEMs to analyze data from Monitoring the Future: A Continuing Study of American Youth (12th-Grade Survey). Finally, in Sect. 7 we discuss some related issues and possible extensions for future work.

2 Bayesian Semiparametric Structural Equation Models

2.1 Semiparametric Structural Equation Models

Semiparametric structural equation models consist of two parts: a measurement equation and a structural equation. For a random sample of n independent subjects, the measurement equation defines the relationship between the observed \(p\times 1\) vector of manifest variables \({\varvec{y}}_i\) and the unobserved \(q\times 1\) vector of latent variables \(\varvec{w}_i\) as follows:

$$\begin{aligned} \varvec{y}_i=\varvec{Ac}_i+ \varvec{\Lambda } \varvec{w}_i+\varvec{\epsilon }_i, \quad i=1,2,\ldots ,n, \end{aligned}$$
(1)

where \({\varvec{c}}_i\) is an \(r\times 1\) vector of known functions of the \(s\times 1\) vector of fixed covariates \({\varvec{x}}_i\), \({\varvec{A}}\) and \(\varvec{\Lambda }\) are unknown parameter matrices, and \(\varvec{\epsilon }_i\) is a \(p\times 1\) vector of measurement errors.

The latent variable \({\varvec{w}}_i\) is written in two parts, a \(q_1\times 1\) vector of endogenous latent variables \(\varvec{\eta }_i\) and a \(q_2 \times 1\) vector of exogenous latent variables \({\varvec{\xi }}_i\), i.e. \({\varvec{w}}_i=(\varvec{\eta }_i^T, \varvec{\xi }_i^T)^T\). We have the following general model which defines the relationship between the exogenous and endogenous latent variables,

$$\begin{aligned} \varvec{\eta }_i=\varvec{\Pi }\varvec{\eta }_i+{\varvec{F}}({\varvec{x}}_i,\varvec{\xi }_i)+ \varvec{\zeta }_i, \quad i=1,2,\ldots ,n, \end{aligned}$$
(2)

where \(\varvec{\zeta }_i\) is a vector of residuals and \({\varvec{F}}({\varvec{x}}_i,\varvec{\xi }_i)\) is a vector of unknown functions of the covariates \({\varvec{x}}_i\) and exogenous latent variables \(\varvec{\xi }_i\).

For the model introduced in Eqs. 1 and 2, we require the following assumptions:

  • \(\varvec{\epsilon }_i\) are independently distributed as \(N({\varvec{0}},\varvec{\Psi }_{\epsilon })\) with \(\varvec{\Psi }_{\epsilon }=diag(\psi _{\epsilon 1},\psi _{\epsilon 2},\ldots ,\psi _{\epsilon p})\).

  • \({\varvec{w}}_i\) and \(\varvec{\epsilon }_i\) are independent, and \({\varvec{w}}_i\) are independently distributed.

  • \(\varvec{\zeta }_i\) follows \(N({\varvec{0}},\varvec{\Psi }_{\zeta })\) with \(\varvec{\Psi }_{\zeta }=diag(\psi _{\zeta 1},\psi _{\zeta 2},\ldots ,\psi _{\zeta q_1})\).

  • \(\varvec{\xi }_i\) and \(\varvec{\zeta }_i\) are independently distributed, and \(\varvec{\xi }_i\) follows \(N({\varvec{0}},\varvec{\Phi })\).

  • \(\Pi _0=I-\Pi \) is nonsingular and \(|\Pi _0|\) is independent of the elements of \(\Pi \).
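To make the two equations concrete, the following sketch simulates data from a small SEM satisfying the assumptions above, with one endogenous and two exogenous latent variables. All dimensions, loadings, error variances, and the function \(F\) are toy values chosen for illustration, not quantities from this paper.

```python
import numpy as np

rng = np.random.default_rng(0)
n, p, q1, q2 = 500, 9, 1, 2     # subjects, manifest variables, latent dimensions

# Hypothetical loadings: each latent variable drives three manifest variables.
Lam = np.zeros((p, q1 + q2))
Lam[0:3, 0] = [1.0, 0.8, 0.7]   # eta  -> y1..y3
Lam[3:6, 1] = [1.0, 0.8, 0.7]   # xi_1 -> y4..y6
Lam[6:9, 2] = [1.0, 0.8, 0.7]   # xi_2 -> y7..y9
Phi = np.array([[1.0, 0.25],
                [0.25, 1.0]])   # covariance of the exogenous xi
psi_eps, psi_zeta = 0.36, 0.36  # diagonal error variances

xi = rng.multivariate_normal(np.zeros(q2), Phi, size=n)    # exogenous latent
F = np.sin(xi[:, 0]) + 0.4 * xi[:, 1]                      # toy F(x_i, xi_i)
eta = F + rng.normal(0.0, np.sqrt(psi_zeta), n)            # structural eq., Pi = 0
w = np.column_stack([eta, xi])                             # w_i = (eta_i^T, xi_i^T)^T
Y = w @ Lam.T + rng.normal(0.0, np.sqrt(psi_eps), (n, p))  # measurement eq. (A c_i = 0)
```

Each row of `Y` is one subject's manifest vector \({\varvec{y}}_i\), generated from the latent \({\varvec{w}}_i\) exactly as in Eq. 1.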

Theoretically, \({\varvec{F}}({\varvec{x}}_i,\varvec{\xi }_i)\) can be any linear or nonlinear function of \({\varvec{x}}_i\) and \(\varvec{\xi }_i\), with or without interaction terms such as \(\xi _{i1}\xi _{i2}\). In this paper, we consider a nonparametric structural equation similar to Guo et al. (2012) and approximate the nonparametric function \({\varvec{F}}({\varvec{x}}_i,\varvec{\xi }_i)\) using basis expansions. With basis functions, the structural equation 2 can, in the general case, be represented as

$$\begin{aligned} \varvec{\eta }_i=\varvec{\Pi }\varvec{\eta }_i+{\varvec{B}} {\varvec{H}}({\varvec{x}}_i,\varvec{\xi }_i)+\varvec{\zeta }_i, \end{aligned}$$
(3)

where \({\varvec{H}}({\varvec{x}}_i,\varvec{\xi }_i)\) is an \(N_H\times 1\) vector of basis functions, and \({\varvec{B}}_{q_1\times N_H}\) is the coefficient parameter matrix associated with \({\varvec{H}}({\varvec{x}}_i,\varvec{\xi }_i)\).

To illustrate the structural equation with basis functions, consider a simple example with \(\Pi =0\), one covariate, one endogenous and two exogenous latent variables. Any function \({\varvec{F}}({\varvec{x}}_i,\varvec{\xi }_i)\) can be decomposed into main-effect functions of a single variable, \(f_1\), \(f_2\) and \(f_3\), which could be constant, and interaction functions \(f_{12}\), \(f_{13}\), \(f_{23}\) and \(f_{123}\), each of which must depend on all of its arguments, i.e.,

$$\begin{aligned} \eta _i&=F(x_i,\xi _{i1},\xi _{i2})+\zeta _i\\&=f_1(x_i)+f_2(\xi _{i1})+f_3(\xi _{i2})+f_{12}(x_i,\xi _{i1})+f_{13}(x_i,\xi _{i2}) \\&\quad +f_{23}(\xi _{i1},\xi _{i2})+f_{123}(x_i,\xi _{i1},\xi _{i2})+\zeta _i, \end{aligned}$$

The above formulation indicates that for modeling \(f_1\), \(f_2\) and \(f_3\), a linear basis expansion can be used, such as piece-wise polynomials, natural cubic splines, etc. In such cases,

$$\begin{aligned} f_j(.)=\sum _{m_j=1}^{M_j}\beta _{jm_j}h_{jm_j}(.), \quad j=1,2,3 \end{aligned}$$

where \(\{h_{jm_j}(.), m_j=1,\ldots ,M_j\}\) are basis functions. For modeling \(f_{12}\), \(f_{13}\) and \(f_{23}\), tensor product basis expansion can be used as follows:

$$\begin{aligned} f_{kl}(.,.)=\sum _{m_k=1}^{M_k}\sum _{m_l=1}^{M_l}\beta _{m_km_l}^{(kl)}h_{km_k}(.)h_{lm_l}(.), \quad k,l=1,2,3. \end{aligned}$$
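The single-variable and tensor product expansions above can be sketched as follows. The polynomial basis \(h_m(x)=x^m\) is only a stand-in assumption; any basis (e.g. natural cubic splines) follows the same pattern, with one tensor product column per coefficient \(\beta _{m_km_l}^{(kl)}\).

```python
import numpy as np

def basis_1d(x, M):
    """Stand-in basis {h_m(x) = x^m, m = 1..M} for one variable."""
    return np.column_stack([x ** m for m in range(1, M + 1)])

def tensor_basis(Hk, Hl):
    """All pairwise products h_{k m_k}(.) h_{l m_l}(.), one column per (m_k, m_l)."""
    n = Hk.shape[0]
    return (Hk[:, :, None] * Hl[:, None, :]).reshape(n, -1)

x = np.linspace(-1.0, 1.0, 50)
z = np.linspace(0.0, 2.0, 50)
Hk = basis_1d(x, 3)         # M_k = 3 basis functions for the first variable
Hl = basis_1d(z, 2)         # M_l = 2 for the second
Hkl = tensor_basis(Hk, Hl)  # 50 x 6 design; column (m_k, m_l) pairs with beta_{m_k m_l}^{(kl)}
```

Stacking `Hk`, `Hl`, and `Hkl` column-wise gives the vector \({\varvec{H}}({\varvec{x}}_i,\varvec{\xi }_i)\) used in Eq. 3.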

2.2 Bayesian Fused Lasso Semiparametric SEM (BFLSEM)

The unknown parameters in the measurement equation 1 are \(\varvec{\Lambda }_y=({\varvec{A}},\varvec{\Lambda })\) and \(\varvec{\Psi }_{\varvec{\epsilon }}\). In the structural equation 3, the unknown parameters are \(\varvec{\Lambda }_{w}=(\varvec{\Pi }, {\varvec{B}})\), \(\varvec{\Psi }_{\varvec{\zeta }}\) and \(\varvec{\Phi }\). Some elements of \(\varvec{\Lambda }_y\) must be fixed for identifiability.

For the measurement equation 1, an index matrix \({\varvec{M}}=(m_{kj})_{p\times (r+q)}\) is created as follows (Lee and Zhu 2000),

$$\begin{aligned} m_{kj}=\left\{ \begin{array}{ll} 1 &{}\quad \text {if}\quad \lambda _{ykj} \text { is unknown}\\ 0 &{} \quad \text {otherwise} \end{array} \right. \end{aligned}$$

where \(\lambda _{ykj}\) is the kj-th element of \(\varvec{\Lambda }_y\). If there is an unknown parameter in the k-th row of \(\varvec{\Lambda }_y\) for \(k=1,\ldots ,p\), then \(r_{yk}=\sum _{j=1}^{r+q}m_{kj}>0\). We denote by \(\varvec{\Lambda }_{yk}^*\) the \(r_{yk}\times 1\) vector of unknown parameters and specify a conjugate prior for \(\{\varvec{\Lambda }_{yk}^*,\psi _{\epsilon k}\}\),

$$\begin{aligned}{} & {} \varvec{\Lambda }_{yk}^*|\psi _{\epsilon k} \thicksim N_{r_{yk}}(\mu _{0yk}^*, \psi _{\epsilon k}{\varvec{H}}_{0yk}^*) \end{aligned}$$
(4)
$$\begin{aligned}{} & {} \psi _{\epsilon k}^{-1} \thicksim Gamma(\alpha _{0\epsilon k},\beta _{0\epsilon k}) \end{aligned}$$
(5)

where \(\mu _{0yk}^*\), \({\varvec{H}}_{0yk}^*\), \(\alpha _{0\epsilon k}\) and \(\beta _{0\epsilon k}\) are hyperparameters.

For the structural equation 3, let \(\varvec{\Lambda }_{wh}\) be the h-th row of \(\varvec{\Lambda }_{w}\), where \(h=1,\ldots ,q_1\). As mentioned earlier, we assign a Bayesian fused Lasso prior to each \(\varvec{\Lambda }_{wh}\) and an inverse-Wishart prior to \(\varvec{\Phi }\).

$$\begin{aligned}{} & {} \varvec{\Lambda }_{wh}|\psi _{\zeta h},\varvec{\tau }_{\Lambda _{wh}},\varvec{\upsilon }_{\Lambda _{wh}} \thicksim N(0, \psi _{\zeta h}{\varvec{D}}_{\Lambda _{wh}}),\\{} & {} \psi _{\zeta h}^{-1} \thicksim Gamma(\alpha _{0\zeta h}, \beta _{0\zeta h}),\\{} & {} \pi (\varvec{\tau }_{\Lambda _{wh}}^2) \propto \prod _{j=1}^{q_1}\frac{\lambda _{\Pi _h}^2}{2}e^{-\lambda _{\Pi _h}^2\tau _{\Pi _hj}^2/2} \prod _{j=1}^{N_X}\frac{\lambda _{B_{1h}}^2}{2}e^{-\lambda _{B_{1h}}^2\tau _{B_{1h}j}^2/2} \prod _{j=1}^{N_T}\frac{\lambda _{B_{2h}}^2}{2}e^{-\lambda _{B_{2h}}^2\tau _{B_{2h}j}^2/2},\\{} & {} \pi (\varvec{\upsilon }_{\Lambda _{wh}}^2) \propto \prod _{j=1}^{N_T}\frac{\mu _{B_{2h}}^2}{2}e^{-\mu _{B_{2h}}^2\upsilon _{B_{2h}j}^2/2},\\{} & {} \varvec{\Phi } \thicksim IW(\varvec{R}_0, \rho _0), \end{aligned}$$

where \(N_H\) is the number of non-constant spline basis functions, with \(N_H=N_X+N_T\), where \(N_X\) is the number of basis functions related to the x’s and \(N_T\) is the number of basis functions related to the exogenous latent variables. \({\varvec{B}}_h=({\varvec{B}}_{1h}^{T},{\varvec{B}}_{2h}^{T})^T\), where \({\varvec{B}}_{1h}\) are the coefficients corresponding to the x’s and \({\varvec{B}}_{2h}\) are the coefficients corresponding to the exogenous latent variables. \(\varvec{\tau }_{\Lambda _{wh}}\) and \(\varvec{\upsilon }_{\Lambda _{wh}}\) are mutually independent, and the precision matrix \({\varvec{D}}_{\Lambda _{wh}}^{-1}\) is block diagonal with diagonal and tridiagonal blocks, \({\varvec{D}}_{\Lambda _{wh}}^{-1}=diag({\varvec{D}}^{11}_{q_1\times q_1},{\varvec{D}}^{22}_{N_X\times N_X},{\varvec{D}}^{33}_{N_T\times N_T} )\), where \({\varvec{D}}^{11}_{q_1\times q_1}\) is a diagonal matrix with

$$\begin{aligned} \text {main diagonal} =\Big \{\frac{1}{\tau _{\Pi _hj}^2}, j=1,\ldots ,q_1\Big \} \end{aligned}$$

\({\varvec{D}}^{22}_{N_X\times N_X} \) is also a diagonal matrix with

$$\begin{aligned} \text {main diagonal} =\Big \{\frac{1}{\tau _{B_{1h}j}^2}, j=1,\ldots ,N_X\Big \} \end{aligned}$$

\({\varvec{D}}^{33}_{N_T\times N_T} \) is a tridiagonal matrix with

$$\begin{aligned}{} & {} \text {main diagonal} =\Big \{\frac{1}{\tau _{B_{2h}j}^2}+\frac{1}{\upsilon _{B_{2h}j-1}^2}+\frac{1}{\upsilon _{B_{2h}j}^2}, j=1,\ldots ,N_T\Big \}\\{} & {} \text {off diagonals} = \Big \{-\frac{1}{\upsilon _{B_{2h}j}^2}, j=1,\ldots ,N_T-1\Big \} \end{aligned}$$

All the \(\lambda \)’s are tuning parameters with gamma priors.
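As a sketch of how the mixed diagonal/tridiagonal matrix \({\varvec{D}}_{\Lambda _{wh}}^{-1}\) can be assembled from the \(\tau ^2\) and \(\upsilon ^2\) draws; here the boundary \(\upsilon \) terms (\(\upsilon _0\), \(\upsilon _{N_T}\)) are simply dropped, an assumption about the edge cases, and the input values are placeholders.

```python
import numpy as np

def block_diag(*blocks):
    """Place square blocks along the diagonal of a zero matrix."""
    N = sum(b.shape[0] for b in blocks)
    out = np.zeros((N, N))
    i = 0
    for b in blocks:
        k = b.shape[0]
        out[i:i + k, i:i + k] = b
        i += k
    return out

def fused_lasso_precision(tau2_pi, tau2_b1, tau2_b2, ups2):
    """D^{-1} = diag(D11, D22, D33): D11, D22 diagonal, D33 tridiagonal."""
    D11 = np.diag(1.0 / np.asarray(tau2_pi, float))
    D22 = np.diag(1.0 / np.asarray(tau2_b1, float))
    main = 1.0 / np.asarray(tau2_b2, float)   # 1/tau^2_j terms
    ups_inv = 1.0 / np.asarray(ups2, float)   # 1/upsilon^2_j, length N_T - 1
    main[:-1] += ups_inv                      # adds 1/upsilon^2_j
    main[1:] += ups_inv                       # adds 1/upsilon^2_{j-1}
    D33 = np.diag(main) + np.diag(-ups_inv, 1) + np.diag(-ups_inv, -1)
    return block_diag(D11, D22, D33)

# q_1 = 2, N_X = 3, N_T = 4 with constant placeholder scale parameters.
Dinv = fused_lasso_precision([1.0, 1.0], [1.0] * 3, [2.0] * 4, [4.0] * 3)
```

The resulting matrix is symmetric and positive definite, as required of a Gaussian precision matrix.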

The extended Bayesian fused Lasso prior has additional parameters; however, with the priors specified as above, it is straightforward to derive the full conditional distributions (Kyung et al. 2010). As a result, we can use MCMC methods to generate samples from the joint posterior distribution of the parameters.

The model can be easily extended to the case where the x’s have side-by-side correlation. We only need to change \({\varvec{D}}^{22}_{N_X\times N_X} \) to a tridiagonal matrix with

$$\begin{aligned}{} & {} \text {main diagonal} =\Big \{\frac{1}{\tau _{B_{1h}j}^2}+\frac{1}{\upsilon _{B_{1h}j-1}^2}+\frac{1}{\upsilon _{B_{1h}j}^2}, j=1,\ldots ,N_X\Big \}\\{} & {} \text {off diagonals} = \Big \{-\frac{1}{\upsilon _{B_{1h}j}^2}, j=1,\ldots ,N_X-1\Big \}\\{} & {} \pi (\varvec{\upsilon }_{\Lambda _{wh}}) \propto \prod _{j=1}^{N_X}\frac{\mu _{B_{1h}}^2}{2}e^{-\mu _{B_{1h}}^2\upsilon _{B_{1h}j}^2/2} \prod _{j=1}^{N_T}\frac{\mu _{B_{2h}}^2}{2}e^{-\mu _{B_{2h}}^2\upsilon _{B_{2h}j}^2/2}, \end{aligned}$$

It is easy to derive the full conditional distribution and use MCMC methods to generate samples from the joint posterior distribution of parameters for our Bayesian Fused Lasso Semiparametric SEM (BFLSEM).

2.3 Bayesian Elastic Net Semiparametric SEM (BENSEM)

The measurement equation model and its prior are exactly the same as in Sect. 2.2; for the structural equation part, however, we assign elastic net based priors as follows,

$$\begin{aligned}{} & {} \varvec{\Lambda }_{wh}|\psi _{\zeta h},\varvec{\tau }_{\Lambda _{wh}}, \varvec{\upsilon }_{\Lambda _{wh}} \thicksim N(0, \psi _{\zeta h}{\varvec{D}}_{\Lambda _{wh}}),\\{} & {} \psi _{\zeta h}^{-1} \thicksim Gamma(\alpha _{0\zeta h}, \beta _{0\zeta h}),\\{} & {} \pi (\varvec{\tau }_{\Lambda _{wh}}) \propto \prod _{j=1}^{q_1} \frac{\lambda _{\Pi _h}^2}{2}e^{-\lambda _{\Pi _h}^2\tau _{\Pi _hj}^2/2} \prod _{k=1}^{N_G} \prod _{j=1}^{N_k} \frac{\lambda _{1B_hk}^2}{2}e^{-\lambda _{1B_hk}^2\tau _{B_hkj}^2/2}\\{} & {} \varvec{\Phi } \thicksim IW(\varvec{R}_0, \rho _0), \end{aligned}$$

where the columns of \(\varvec{X}\) are reordered so that strongly correlated covariates are grouped together, giving \(N_G\) blocks of \(\varvec{X}\)’s (including one block for independent \(\varvec{X}\)’s, if any exist), indexed by \(k=1,\ldots ,N_G\). For block k, \(N_k\) is the number of members in the block. \({\varvec{D}}_{\Lambda _{wh}}\) is a diagonal matrix. If the \(\varvec{X}\)’s in block k are correlated, the corresponding diagonal elements are \((\tau _{B_hkj}^{-2}+\lambda _{2B_hk})^{-1}\); if the \(\varvec{X}\)’s in block k are independent, \(\lambda _{2B_hk}=0\) and the diagonal elements reduce to \(\tau _{B_hkj}^2\). Similar to the Bayesian fused Lasso, all the \(\lambda \)’s have gamma priors. It is still straightforward to derive the full conditional distributions (Li and Lin 2010) and use MCMC methods to generate samples from the joint posterior distribution of the parameters of our Bayesian Elastic Net Semiparametric SEM (BENSEM).
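A minimal sketch of the blockwise diagonal of \({\varvec{D}}_{\Lambda _{wh}}\) described above; the block sizes and parameter values are illustrative placeholders.

```python
import numpy as np

def enet_D_diag(tau2_blocks, lam2_blocks):
    """Diagonal of D_{Lambda_wh}, block by block: (1/tau_j^2 + lam2_k)^{-1}.
    A block of independent covariates uses lam2_k = 0, giving tau_j^2 back."""
    diag = []
    for tau2, lam2 in zip(tau2_blocks, lam2_blocks):
        diag.extend(1.0 / (1.0 / np.asarray(tau2, float) + lam2))
    return np.array(diag)

# Two correlated covariates in block 1 (lam2 = 0.5), one independent in block 2.
d = enet_D_diag([[2.0, 2.0], [2.0]], [0.5, 0.0])
```

The second block shows the reduction: with \(\lambda _{2B_hk}=0\) the elastic net element collapses to the plain Lasso element \(\tau ^2\).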

3 Posterior Distributions in Our Bayesian Semiparametric SEM

3.1 Posterior Distribution in the Measurement Equation

Using the conjugate priors for \(\varvec{\Lambda }_{yk}^*\) and \(\psi _{\epsilon k}\) from Eqs. 4 and 5, we can easily obtain the posterior distributions:

$$\begin{aligned}{} & {} \varvec{\Lambda }_{yk}^*|rest \sim N_{r_{yk}}({\varvec{H}}_{yk}({\varvec{H}}_{0yk}^{*-1}\varvec{\mu }_{0yk}^*+\varvec{G}_{yk}{\varvec{y}}_k^*), \psi _{\epsilon k}({\varvec{H}}_{0yk}^{*-1}+\varvec{G}_{yk}\varvec{G}_{yk}^T)^{-1}) \end{aligned}$$
(6)
$$\begin{aligned}{} & {} \psi _{\epsilon k}^{-1}|rest \sim Gamma(\alpha _{0\epsilon k}+n/2, \beta _{0\epsilon k}\nonumber \\{} & {} \quad +\dfrac{1}{2}({\varvec{y}}_k^{*T}{\varvec{y}}_k^*+\varvec{\mu }_{0yk}^{*T} {\varvec{H}}_{0yk}^{*-1} \varvec{\mu }_{0yk}^* - \varvec{\mu }_{yk}^T {\varvec{H}}_{yk}^{-1} \varvec{\mu }_{yk})) \end{aligned}$$
(7)

where \(\varvec{G}_y=(\varvec{C}^T, \varvec{\Omega }^T)^T\), \(\varvec{C}=\{{\varvec{c}}_1, \ldots , {\varvec{c}}_n\}\) and \(\varvec{\Omega } = \{\varvec{\omega }_1,\ldots ,\varvec{\omega }_n\}\).

3.2 Posterior Distributions in the Bayesian Structural Equation with the Fused Lasso (BFLSEM)

Let \(\varvec{G}_{\omega }=(\varvec{g}_{\omega 1},\ldots ,\varvec{g}_{\omega n})\), where \(\varvec{g}_{\omega i}=(\varvec{\eta }_i^T, {\varvec{H}}(x_i,\varvec{\xi }_i)^T)^T\). The full conditional in the structural equation for the h-th row of \(\varvec{\Lambda }_{\omega }\) is:

$$\begin{aligned}{} & {} \varvec{\Lambda }_{wh}|\varvec{\Omega }, \psi _{\zeta h},\varvec{\tau }_{\Lambda _{wh}},\varvec{\upsilon }_{\Lambda _{wh}} \thicksim N_{q_1+N_H}((\varvec{G}_{\omega }^{T} \varvec{G}_{\omega } + {\varvec{D}}_{\Lambda _{wh}}^{-1})^{-1} \varvec{G}_{\omega }^{T} (\varvec{\eta }_h-\beta _{0h}\varvec{1}_n), \nonumber \\{} & {} \quad \psi _{\zeta h}(\varvec{G}_{\omega }^{T} \varvec{G}_{\omega } + {\varvec{D}}_{\Lambda _{wh}}^{-1})^{-1}), \end{aligned}$$
(8)

where \(\varvec{\Lambda }_{\omega h}=(\varvec{\Pi }_h^T,{\varvec{B}}_h^{T} )^T\), and, as in Sect. 2.2, \(N_H=N_X+N_T\) is the number of non-constant spline basis functions, with \(N_X\) basis functions related to the x’s and \(N_T\) related to the exogenous latent variables.

Recall that \({\varvec{B}}_h=({\varvec{B}}_{1h}^{T},{\varvec{B}}_{2h}^{T})^T\), where \({\varvec{B}}_{1h}\) are the coefficients corresponding to the x’s and \({\varvec{B}}_{2h}\) are the coefficients corresponding to the exogenous latent variables. Note that \(\varvec{\tau }_{\Lambda _{\omega h}}=(\tau _{\Pi _h 1}^2,\ldots ,\tau _{\Pi _h q_1}^2,\tau _{B_h 1}^2,\ldots ,\tau _{B_h N_H}^2)^T\), and the full conditional distributions for \(\varvec{\tau }_{\Lambda _{\omega h}}\) are:

$$\begin{aligned}{} & {} 1/\tau _{\Pi _hj}^2|\varvec{\Pi }_h,\psi _{\zeta h} \sim IN\left( \sqrt{\dfrac{\lambda _{\Pi _h}^2 \psi _{\zeta h}}{\Pi _{hj}^2}}, \lambda _{\Pi _h}^2\right) \\{} & {} 1/\tau _{B_{1h}j}^2|{\varvec{B}}_{1h},\psi _{\zeta h} \sim IN\left( \sqrt{\dfrac{\lambda _{B_{1h}}^2 \psi _{\zeta h}}{B_{1hj}^2}}, \lambda _{B_{1h}}^2\right) \\{} & {} 1/\tau _{B_{2h}j}^2|{\varvec{B}}_{2h},\psi _{\zeta h} \sim IN\left( \sqrt{\dfrac{\lambda _{B_{2h}}^2 \psi _{\zeta h}}{B_{2hj}^2}}, \lambda _{B_{2h}}^2\right) \\{} & {} 1/\upsilon _{B_{2h}j}^2|{\varvec{B}}_{2h},\psi _{\zeta h} \sim IN\left( \sqrt{\dfrac{\lambda _{4}^2 \psi _{\zeta h}}{(B_{2h(j+1)}-B_{2h(j)})^2}}, \lambda _{4}^2\right) \end{aligned}$$

for \(j=1,\ldots ,N_T-1\).

The full conditional of \(\psi _{\zeta h}\) is:

$$\begin{aligned} \psi _{\zeta h}|\varvec{\Lambda }_{wh},\varvec{G}_{\omega } \sim IG\left( \alpha _{0\zeta h}+\dfrac{n+q_1+N_H+1}{2},\beta _{1\zeta h}\right) \end{aligned}$$

where \(\beta _{1\zeta h}=\beta _{0\zeta h}+\dfrac{1}{2} [(\varvec{\eta }_h -\beta _{0\,h}\varvec{1}_n - \varvec{G}_{\omega }^T\Lambda _{\omega h})^T(\varvec{\eta }_h -\beta _{0\,h}\varvec{1}_n- \varvec{G}_{\omega }^T\Lambda _{\omega h})+\varvec{\Lambda }_{\omega h}^T {\varvec{D}}_{\omega h}^{-1}\varvec{\Lambda }_{\omega h}]\)

With gamma priors on the \(\lambda \)’s, their full conditional distributions are:

$$\begin{aligned}{} & {} \lambda _{\Pi _h}^2 |\varvec{\tau }_{\Pi _h} \sim Gamma\left( q_1+r_{0\omega }, \sum _{j=1}^{q_1}\tau _{\Pi _hj}^2/2+\delta _{0\Pi }\right) \\{} & {} \lambda _{B_{1h}}^2 |\varvec{\tau }_{B_{1h}} \sim Gamma\left( N_X+r_{0B_1}, \sum _{j=1}^{N_X}\tau _{B_{1h}j}^2/2+\delta _{0B_1}\right) \\{} & {} \lambda _{B_{2h}}^2 |\varvec{\tau }_{B_{2h}} \sim Gamma\left( N_T+r_{0B_2}, \sum _{j=1}^{N_T}\tau _{B_{2h}j}^2/2+\delta _{0B_2}\right) \\{} & {} \lambda _{4}^2 |\varvec{\upsilon }_{B_{2h}} \sim Gamma\left( N_T+r_{0B_{22}}-1, \sum _{j=1}^{N_T-1}\upsilon _{B_{2h}j}^2/2+\delta _{0B_{22}}\right) \end{aligned}$$
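The inverse-Gaussian and gamma full conditionals above can be sampled directly; the sketch below uses NumPy, whose `wald` generator is the inverse-Gaussian distribution in the (mean, shape) parameterization used here. The coefficient values and hyperparameters are placeholders.

```python
import numpy as np

rng = np.random.default_rng(1)

def sample_inv_tau2(beta, lam2, psi, rng):
    """Draw 1/tau_j^2 ~ IN(sqrt(lam2 * psi / beta_j^2), lam2), elementwise."""
    mean = np.sqrt(lam2 * psi / np.asarray(beta, float) ** 2)
    return rng.wald(mean, lam2)          # NumPy's Wald = inverse Gaussian

def sample_lam2(tau2, r0, delta0, rng):
    """Draw lambda^2 | tau^2 ~ Gamma(N + r0, sum(tau2)/2 + delta0), rate form."""
    shape = len(tau2) + r0
    rate = np.sum(tau2) / 2.0 + delta0
    return rng.gamma(shape, 1.0 / rate)  # NumPy parameterizes by scale = 1/rate

inv_t2 = sample_inv_tau2([0.5, -1.2, 2.0], 4.0, 0.36, rng)  # three toy coefficients
lam2 = sample_lam2(1.0 / inv_t2, 1.0, 1.0, rng)             # r0 = delta0 = 1 assumed
```

Alternating these two draws is exactly the \(\tau ^2\)/\(\lambda ^2\) portion of the Gibbs scan.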

3.3 Posterior Distributions in the Bayesian Structural Equation with the Elastic Net (BENSEM)

The full conditional in the structural equation for the h-th row of \(\varvec{\Lambda }_{\omega }\) is:

$$\begin{aligned}{} & {} \varvec{\Lambda }_{wh}|\varvec{\Omega }, \psi _{\zeta h},\varvec{\tau }_{\Lambda _{wh}} \thicksim N_{q_1+N_H}((\varvec{G}_{\omega }^{T} \varvec{G}_{\omega } + {\varvec{D}}_{\Lambda _{wh}}^{-1})^{-1} \varvec{G}_{\omega }^{T} (\varvec{\eta }_h-\beta _{0h}\varvec{1}_n),\\{} & {} \quad \psi _{\zeta h}(\varvec{G}_{\omega }^{T} \varvec{G}_{\omega } + {\varvec{D}}_{\Lambda _{wh}}^{-1})^{-1}),\\{} & {} 1/\tau _{B_hkj}^2|\varvec{\Lambda }_{\omega h}, \psi _{\zeta h} \sim IN\left( \sqrt{\dfrac{\lambda _{1\Lambda _hk}^2 \psi _{\zeta h}}{\Lambda _{\omega hkj}^2}},\lambda _{1\Lambda _hk}^2\right) \end{aligned}$$

for \(j=1,\ldots ,N_k\), where \(\lambda _{1\Lambda _hk}=\lambda _{1\Pi _hk}\) when \(\varvec{\Lambda }_{whk}\) are the coefficients of the endogenous latent variables, and \(\lambda _{1\Lambda _hk}=\lambda _{1B_hk}\) when \(\varvec{\Lambda }_{whk}\) are the coefficients of the exogenous latent variables.

The full conditional of \(\psi _{\zeta h}\) is:

$$\begin{aligned} \psi _{\zeta h}|\varvec{\Lambda }_{wh},\varvec{G}_{\omega } \sim IG\left( \alpha _{0\zeta h}+\dfrac{n+q_1+N_H+1}{2},\beta _{1\zeta h}\right) \end{aligned}$$

where \(\beta _{1\zeta h}=\beta _{0\zeta h}+\dfrac{1}{2} [(\varvec{\eta }_h -\beta _{0\,h}\varvec{1}_n - \varvec{G}_{\omega }^T\Lambda _{\omega h})^T(\varvec{\eta }_h -\beta _{0\,h}\varvec{1}_n- \varvec{G}_{\omega }^T\Lambda _{\omega h})+\varvec{\Lambda }_{\omega h}^T {\varvec{D}}_{\omega h}^{-1}\varvec{\Lambda }_{\omega h}]\)

With gamma priors on the \(\lambda \)’s, their full conditional distributions are:

$$\begin{aligned}{} & {} \lambda _{\Pi _h}^2 |\varvec{\tau }_{\Pi _h} \sim Gamma\left( q_1+r_{0\Pi }, \sum _{j=1}^{q_1}\tau _{\Pi _hj}^2/2+\delta _{0\Pi }\right) \\{} & {} \lambda _{1B_{hk}}^2 |\varvec{\tau }_{\Lambda _{hk}} \sim Gamma\left( N_k+r_{1B_{hk}}, \sum _{j=1}^{N_k}\tau _{B_hkj}^2/2+\delta _{1B_{hk}}\right) \\{} & {} \lambda _{2B_{hk}}^2 |{\varvec{B}} \sim Gamma\left( N_k+r_{2B_{hk}}, \dfrac{1}{2\psi _{\zeta h}}\sum _{j=1}^{N_k}\Lambda _{\omega hkj}^2+\delta _{2B_{hk}}\right) \end{aligned}$$

where \(\varvec{\Lambda }_{\omega h k}\) represents the \(\Lambda \)’s belonging to group k.

4 MCMC Algorithm to Fit Our Bayesian Semiparametric SEM

The parameters from the measurement equation are denoted \(\theta _1^T=\{\Lambda _y, \Psi _{\epsilon }\}\), and the parameters from the structural equation are denoted \(\theta _2^T=\{\Lambda _{\omega },\Psi _{\zeta },\Phi \}\). Let the parameter of interest be \(\theta =(\theta _1^T,\theta _2^T)^T\).

The variables used in the MCMC algorithm are:

  • \(\varvec{Y}=\{{\varvec{y}}_1, \ldots , {\varvec{y}}_n\}\), and \({\varvec{y}}_i\) is \(p \times 1\) vector of manifest variables.

  • \(\varvec{X}=\{{\varvec{x}}_1, \ldots , {\varvec{x}}_n\}\), and \({\varvec{x}}_i\) is \(s \times 1\) vector of fixed covariates.

  • \(\varvec{C}=\{{\varvec{c}}_1, \ldots , {\varvec{c}}_n\}\), and \({\varvec{c}}_i\) is \(r \times 1\) vector of known function of \({\varvec{x}}_i\).

  • \(\varvec{\Omega } = \{\varvec{\omega }_1,\ldots ,\varvec{\omega }_n\}\), and \(\varvec{\omega }_i\) is \(q \times 1\) vector of latent variables.

where \(i = 1, \ldots , n\).

Because \(\varvec{\Omega }\) consists of unobservable latent variables, we generate it from the full conditional distribution \(p(\varvec{\Omega }|\varvec{Y}, \varvec{X}, \varvec{C}, \varvec{\theta })\). Since the latent variables are independent across subjects, the full conditional factors as \(p(\varvec{\Omega }|\varvec{Y}, \varvec{X}, \varvec{C}, \varvec{\theta })=\prod _{i=1}^n p(\varvec{\omega }_i|{\varvec{y}}_i, {\varvec{x}}_i, {\varvec{c}}_i, \varvec{\theta })\). Let \(g_{yi}=({\varvec{c}}_i^T, \varvec{\omega }_i^T)^T\). The full conditional distribution of \(\varvec{\omega }_i\) is:

$$\begin{aligned} p(\varvec{\omega }_i|{\varvec{y}}_i, {\varvec{x}}_i, {\varvec{c}}_i, \varvec{\theta })\propto & {} p({\varvec{y}}_i|{\varvec{c}}_i, \varvec{\omega }_i,\theta _1) p(\varvec{\eta }_i|{\varvec{x}}_i, \varvec{\xi }_i, \theta _2) p(\varvec{\xi }_i|\theta _2)\nonumber \\\propto & {} \exp \{-\dfrac{1}{2} ({\varvec{y}}_i-\Lambda _y g_{yi})^T \varvec{\Psi }_{\epsilon }^{-1}({\varvec{y}}_i-\Lambda _y g_{yi}) -\dfrac{1}{2}\varvec{\xi }_i^T\varvec{\Phi }^{-1}\varvec{\xi }_i \nonumber \\{} & {} -\dfrac{1}{2}(\varvec{\eta }_i-\beta _0-\Lambda _{\omega }g_{\omega i})^T\varvec{\Psi }_{\zeta }^{-1}(\varvec{\eta }_i-\beta _0-\Lambda _{\omega }g_{\omega i}) \} \end{aligned}$$
(9)

\(\varvec{\omega }_i\) can be sampled using the Metropolis-Hastings (MH) algorithm with proposal distribution \(q(\varvec{\omega }_i^*|\sigma _{\omega }^2) \sim N(\varvec{\omega }_i^{(j)},\sigma _{\omega }^2 \Sigma _{\omega })\), where \(\varvec{\omega }_i^*\) is the proposed new value and \(\varvec{\omega }_i^{(j)}\) is the value from the previous (j-th) step. From Guo et al. (2012),

$$\begin{aligned} \Sigma _{\omega }^{-1}=\Lambda ^T \varvec{\Psi }^{-1} \Lambda + \begin{pmatrix} \Pi _0^T\varvec{\Psi }_{\zeta }^{-1}\Pi _0 &{} -\Pi _0^T\varvec{\Psi }_{\zeta }^{-1}{\varvec{B}} \Delta _H \\ -\Delta _H^T {\varvec{B}}^T \varvec{\Psi }_{\zeta }^{-1} \Pi _0 &{} \varvec{\Phi }^{-1}+\Delta _H^T{\varvec{B}}^T\varvec{\Psi }_{\zeta }^{-1}{\varvec{B}} \Delta _H \end{pmatrix} \end{aligned}$$
(10)

where \(\Delta _H=\partial {\varvec{H}}({\varvec{x}}_i, \varvec{\xi }_i)/\partial \varvec{\xi }_i^T|_{\varvec{\xi }_i={\varvec{0}}}\). The proposed \(\varvec{\omega }_i^*\) is accepted with probability \(\min \{1, \dfrac{p(\varvec{\omega }_i^*|{\varvec{y}}_i, {\varvec{x}}_i, {\varvec{c}}_i, \varvec{\theta })}{p(\varvec{\omega }_i^{(j)}|{\varvec{y}}_i, {\varvec{x}}_i, {\varvec{c}}_i, \varvec{\theta })}\}\). The full matrix \(\varvec{\Omega }\) is then updated within the Gibbs sampler.
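The MH update for a single \(\varvec{\omega }_i\) can be sketched generically as below. The target log density here is a toy standard bivariate normal standing in for \(p(\varvec{\omega }_i|{\varvec{y}}_i, {\varvec{x}}_i, {\varvec{c}}_i, \varvec{\theta })\), and \(\sigma _{\omega }^2=0.5\) and \(\Sigma _{\omega }=I\) are placeholder tuning choices, not values from the paper.

```python
import numpy as np

rng = np.random.default_rng(3)

def mh_step(omega, log_post, sigma2, Sigma_chol, rng):
    """One random-walk Metropolis-Hastings update for a single omega_i."""
    prop = omega + np.sqrt(sigma2) * (Sigma_chol @ rng.standard_normal(omega.size))
    log_ratio = log_post(prop) - log_post(omega)
    if np.log(rng.uniform()) < log_ratio:
        return prop, True                 # accept with probability min(1, ratio)
    return omega, False

# Demo target: standard bivariate normal log density (up to a constant).
log_post = lambda w: -0.5 * float(w @ w)
omega, accepted = np.zeros(2), 0
for _ in range(200):
    omega, a = mh_step(omega, log_post, 0.5, np.eye(2), rng)
    accepted += a
```

In the full sampler, `Sigma_chol` would be a Cholesky factor of \(\Sigma _{\omega }\) from Eq. 10, and one such step is run for each subject per Gibbs scan.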

For \(\theta _1\), sample \(\varvec{\Lambda }_{yk}^*|rest\) and \(\psi _{\epsilon k}|rest\) from Eqs. 6 and 7, respectively.

For \(\theta _2\), the posterior distributions of the parameters differ between the Bayesian fused Lasso and the Bayesian elastic net; we sample the unknown parameters from the posterior distributions derived in Sects. 3.2 and 3.3.

5 Simulation Study

To illustrate the use of our fused Lasso and elastic net prior based SEMs, we consider the case where the covariates are correlated. Under this framework, it is of interest to compare our BFLSEM (based on the fused Lasso prior) and BENSEM (based on the Bayesian elastic net prior) with the method of Guo et al. (2012) (based on the Bayesian standard Lasso prior, or BSLSEM).

5.1 Simulation 1

We follow the simulation setup of Guo et al. (2012), setting \(n=500\), \(p=9\), \(q_{1}=1\), \(q_{2}=2\) and \(\varvec{A} = \text{ diag }(0^{*},0^{*},0^{*},\mu _{4},\ldots ,\mu _{9})\), \(\varvec{c_{i}}=(1,\ldots ,1)^{T}\),

$$\begin{aligned} \varvec{\Lambda }^{T}&= \left[ \begin{array}{ccccccccc} 1.0^{*} &{} \lambda _{21} &{} \lambda _{31} &{} 0^{*} &{} 0^{*} &{} 0^{*} &{} 0^{*} &{} 0^{*} &{} 0^{*} \\ 0^{*} &{} 0^{*} &{} 0^{*} &{} 1.0^{*} &{} \lambda _{52} &{} \lambda _{62} &{} 0^{*} &{} 0^{*} &{} 0^{*} \\ 0^{*} &{} 0^{*} &{} 0^{*} &{} 0^{*} &{} 0^{*} &{} 0^{*} &{} 1.0^{*} &{} \lambda _{83} &{} \lambda _{93} \end{array} \right] , \end{aligned}$$

where \(\mu _{4} = \cdots = \mu _{9} = \lambda _{21} = \cdots = \lambda _{93} = \zeta =.36\) and \(\{\phi _{11}, \phi _{12}, \phi _{22}\} = \{1,.25, 1\}\). The function \(f(\xi _{i1},\xi _{i2})= f_{1}(\xi _{i1}) + f_{2}(\xi _{i2}) + f_{12}(\xi _{i1},\xi _{i2})\), where \(f_{1}(\xi _{i1})=\sin (\xi _{i1}) - \xi _{i1}\), \(f_{2}(\xi _{i2}) = \exp (\xi _{i2})/2.5 - 3.0\) and \(f_{12}(\xi _{i1},\xi _{i2}) = 0\), is used to define the underlying relationship between the endogenous and exogenous latent variables. This function is treated as unknown and is approximated using natural cubic splines, i.e.,

$$\begin{aligned} f_{j}(\xi _{ij})&\approx \beta _{j2}\xi _{ij} + \sum _{m=1}^{K-2}\beta _{j,m+2}\left( d_{m}(\xi _{ij})- d_{K-1}(\xi _{ij})\right) \\ f_{12}(\xi _{i1},\xi _{i2})&\approx \beta _{12}^{(12)}\xi _{i1}\xi _{i2} + \sum _{m_{1}=1}^{K-2}\beta _{m_{1}+2,2}^{(12)}\xi _{i2}\left( d_{m_{1}}(\xi _{i1})- d_{K-1}(\xi _{i1})\right) \\ {}&\quad + \sum _{m_{2}=1}^{K-2}\beta _{2,m_{2}+2}^{(12)}\xi _{i1}\left( d_{m_{2}}(\xi _{i2})- d_{K-1}(\xi _{i2})\right) \\&\quad +\sum _{m_{1}=1}^{K-2}\sum _{m_{2}=1}^{K-2}\beta _{m_{1}+2,m_{2}+2}^{(12)}\left( d_{m_{1}}(\xi _{i1})- d_{K-1}(\xi _{i1})\right) \left( d_{m_{2}}(\xi _{i2})- d_{K-1}(\xi _{i2})\right) , \end{aligned}$$

with \(d_{k}(\xi _{ij}) = \left[ \left( \xi _{ij} - \kappa _{k}\right) _{+}^{3} - \left( \xi _{ij} - \kappa _{K}\right) _{+}^{3}\right] /\left( \kappa _{K} - \kappa _{k}\right) \), where K is the number of knots and \((\kappa _{k}, k = 1,\ldots ,K)\) are the knot locations. This truncated power series basis for natural cubic splines follows Hastie et al. (2009). In general, cubic spline columns are correlated, so the use of the fused Lasso is appropriate.
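The natural cubic spline building block \(d_k\) and the resulting non-constant basis can be coded directly; the knot locations in the sketch below are arbitrary illustrative values, not the ones used in the simulation.

```python
import numpy as np

def d_k(x, kappa_k, kappa_K):
    """d_k(x) = [(x - kappa_k)_+^3 - (x - kappa_K)_+^3] / (kappa_K - kappa_k)."""
    pos = lambda t: np.maximum(t, 0.0)
    return (pos(x - kappa_k) ** 3 - pos(x - kappa_K) ** 3) / (kappa_K - kappa_k)

def ncs_basis(x, knots):
    """Non-constant natural cubic spline basis {x, d_m - d_{K-1} : m = 1..K-2}."""
    K = len(knots)
    cols = [x]
    d_last = d_k(x, knots[K - 2], knots[K - 1])
    for m in range(K - 2):
        cols.append(d_k(x, knots[m], knots[K - 1]) - d_last)
    return np.column_stack(cols)

B = ncs_basis(np.linspace(-2.0, 2.0, 40), [-1.5, -0.5, 0.5, 1.5])  # K = 4 knots
```

With K knots this yields K-1 basis columns per variable (one linear term plus K-2 spline terms), matching the \(\beta _{j2}\) and \(\beta _{j,m+2}\) coefficients above.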

We consider \(s=35\) with true parameter values

$$\begin{aligned} b_{l} = {\left\{ \begin{array}{ll} 0.5 &{}\quad \text{ if } l \in \{1,2,3\} \\ -0.7 &{}\quad \text{ if } l \in \{4,5\} \\ 0.85 &{}\quad \text{ if } l \in \{6,\ldots , 15\} \\ 0.7&{}\quad \text{ if } l =32 \\ 0.5 &{}\quad \text{ if } l =33 \\ 0 &{}\quad \text{ otherwise } \end{array}\right. }. \end{aligned}$$

To induce correlation among the covariates, \(x_{1},\ldots ,x_{31}, x_{34}, x_{35}\) are simulated from a multivariate standard normal distribution with \(\text{ corr }(x_{i},x_{j})=.5^{|i-j|}\) for \(i\ne j \in (6,\ldots ,15)\), \(\text{ corr }(x_{i},x_{j})=.7\) for \(i-j=1, i \in (1,2,3)\), \(\text{ corr }(x_{i},x_{j})=.9\) for \(i\ne j \in (4,5)\), and all other correlations equal to 0. Covariate \(x_{32} \sim 2\,\text{ Binomial }(1,.5)\) and \(x_{33} \sim N(-0.5, 1)\).
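This covariate design can be generated as follows; for brevity the sketch draws only the AR(1)-style block \(x_6,\ldots ,x_{15}\) plus \(x_{32}\) and \(x_{33}\), using the correlation structure stated above.

```python
import numpy as np

rng = np.random.default_rng(2)
n = 500

# AR(1)-style block: corr(x_i, x_j) = .5^{|i - j|} for the 10 covariates x_6..x_15.
idx = np.arange(10)
R = 0.5 ** np.abs(idx[:, None] - idx[None, :])
X_block = rng.multivariate_normal(np.zeros(10), R, size=n)

x32 = 2 * rng.binomial(1, 0.5, n)   # two-point covariate taking values 0 and 2
x33 = rng.normal(-0.5, 1.0, n)      # N(-0.5, 1) covariate
```

The remaining blocks (the .7 and .9 correlated pairs) follow the same pattern with their own correlation matrices.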

Table 1 summarizes the parameter estimates from the 50 simulations using BFLSEM (fused Lasso prior), BENSEM (elastic net prior) and BSLSEM (standard Lasso prior). The \(b_{i}\) parameters, which relate the covariates to the endogenous latent variable, are slightly closer to the true values when BFLSEM is used, although for most parameters the improvement is small. The covariates with \(\text{ corr }(x_{i},x_{j})=.7\), \(i\ne j \in (1,2,3)\), show the most marked improvement when BFLSEM is used instead of BSLSEM or BENSEM. All models are effective at shrinking the insignificant parameters to 0. Because several true parameter values are zero, we cannot calculate the relative bias; instead, Table 1 reports \(\text {Relative Change} = \frac{(\hat{\beta } - \beta )}{(|\hat{\beta }| + |\beta |)/2}\).

There is a fairly significant difference in the spline estimates between the BSLSEM and our proposed two models (BFLSEM and BENSEM). For the spline parameters that are not equal to zero it is not possible to determine which of the models is better in terms of estimation. However, in many of these cases the standard deviations of BSLSEM are significantly higher; while BFLSEM and BENSEM are similar to each other. For the spline parameters that are equal to zero both BFLSEM and BENSEM shrink the estimates nearer to zero than BSLSEM and many have significantly lower standard deviations. Moreover, BENSEM is slightly better than BFLSEM.

To measure each model's efficiency at predicting the endogenous latent variable from the covariates and exogenous latent variables, we consider three RMSE measures.

  • RMSE\((\hat{f}) = \sqrt{\sum _{i=1}^{n}\left( \hat{f}(\xi _{i1},\xi _{i2}) - f(\xi _{i1},\xi _{i2})\right) ^{2}/n}\) measures the model's ability to approximate the nonlinear relationship between the endogenous and exogenous latent variables,

  • RMSE\((\hat{B}) = \sqrt{\sum _{i=1}^{n}\left( \varvec{x}_{i}\varvec{\hat{B}} - \varvec{x}_{i}\varvec{B}\right) ^{2}/n}\) measures the model's ability to relate the covariates to the endogenous latent variable, and

  • RMSE \(= \sqrt{\sum _{i=1}^{n}\left( \left( \varvec{x}_{i}\varvec{\hat{B}} + \hat{f}(\xi _{i1},\xi _{i2})\right) - \left( \varvec{x}_{i}\varvec{B} + f(\xi _{i1},\xi _{i2})\right) \right) ^{2}/n}\) measures the model's overall ability to predict the endogenous latent variable.
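The three measures above can be computed directly once the estimates are in hand. The sketch below is illustrative only; it assumes `X` is the \(n \times p\) covariate matrix, `B` and `B_hat` are length-\(p\) coefficient vectors, and `f_vals[i]` holds \(f(\xi _{i1},\xi _{i2})\):

```python
import numpy as np

def rmse_measures(X, B_hat, B, f_hat_vals, f_vals):
    """Return (RMSE(f_hat), RMSE(B_hat), overall RMSE) as defined above."""
    n = len(f_vals)
    rmse_f = np.sqrt(np.sum((f_hat_vals - f_vals) ** 2) / n)
    rmse_B = np.sqrt(np.sum((X @ B_hat - X @ B) ** 2) / n)
    rmse = np.sqrt(np.sum(((X @ B_hat + f_hat_vals)
                           - (X @ B + f_vals)) ** 2) / n)
    return rmse_f, rmse_B, rmse
```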

The most significant improvement of BFLSEM and BENSEM appears in RMSE\((\hat{f})\), which suggests that they are much better at capturing the relationship between the endogenous and exogenous latent variables; the RMSE\((\hat{f})\) of BENSEM is slightly lower than that of BFLSEM. A possible reason there was little impact on the covariate parameters is that it is very difficult to simulate complex correlation structures. If more covariance structures were examined, we believe the difference could be significant.

Table 1 Simulation 1 results for fused Lasso, elastic net, and standard Lasso

5.2 Simulation 2

To compare how the three competing models capture the relationship between the endogenous and exogenous latent variables, we randomly choose one of the simulated data sets, set the coefficients of the covariates to zero, and plot the surface of \(f(\xi _{i1},\xi _{i2})\). Figure 1 shows the true relationship between the exogenous latent variables and the endogenous latent variable based on the function \(\eta =F(x,\varvec{\xi })\); Fig. 2 shows the relationship based on the simulated data, where part of the surface has no data. Figures 3, 4, and 5 show the estimated surfaces via the original Lasso (BSLSEM), fused Lasso (BFLSEM), and elastic net (BENSEM). In Fig. 3, BSLSEM performs badly when \(\xi _1\) and \(\xi _2\) are both greater than 0. From Fig. 2, there are no data when both \(\xi _1\) and \(\xi _2\) are greater than 2.5. BFLSEM and BENSEM perform similarly; in this simulation, BFLSEM performed marginally better when both \(\xi _1\) and \(\xi _2\) are less than 0.

Fig. 1

Simulation 2: true surface for \(\eta =F(x,\varvec{\xi })\)

Fig. 2

Simulation 2: true surface for simulated data

Fig. 3

Simulation 2: estimated surface via BSLSEM

Fig. 4

Simulation 2: estimated surface via BFLSEM

Fig. 5

Simulation 2: estimated surface via BENSEM

6 Application in Monitoring the Future: A Continuing Study of American Youth

We apply our BFLSEM and BENSEM to analyze Monitoring the Future: A Continuing Study of American Youth (12th-Grade Survey). There are three exogenous latent variables of interest (cigarette morbidity, marijuana morbidity, and behavior risk index) and one endogenous latent variable, alcohol morbidity. We want to analyze how cigarette morbidity, marijuana morbidity, and behavior risk index affect alcohol morbidity. We use a subset of the Monitoring the Future data: 1878 students who had drinking experience. More details about the data and full descriptions can be obtained from https://monitoringthefuture.org/.

The endogenous latent variable, alcohol morbidity, is measured by the following items:

  • The occasions that students had alcoholic beverages to drink, more than just a few sips in their lifetime.

  • The occasions that students had alcoholic beverages to drink, more than just a few sips last year.

  • The occasions that students had alcoholic beverages to drink, more than just a few sips last month.

  • The number of times that the students had five or more drinks in a row in the last two weeks.

The first exogenous latent variable, cigarette morbidity, is measured by the following items:

  • The occasions that students smoked cigarettes in their lifetime.

  • The occasions that students smoked cigarettes during the past 30 days.

The second exogenous latent variable, marijuana morbidity, is measured by the following items:

  • The occasions that students smoked marijuana in their lifetime.

  • The occasions that students smoked marijuana last year.

  • The occasions that students smoked marijuana last month.

The third exogenous latent variable, behavior risk index, is measured by the following items:

  • During the last four weeks, the number of whole days of school students have missed because they skipped.

  • During the last four weeks, the number of whole days of school students have missed for other reasons.

  • During a typical week, the number of evenings students go out for fun and recreation.

  • On the average, how often students go out with a date.

  • During an average week, how much students usually drive.

As a result, there are 14 manifest variables in total. The \(\varvec{\Lambda }\) in the measurement equation is given by:

$$\begin{aligned} \varvec{\Lambda }^T=\left( \begin{array}{cccccccccccccc} 1 & \lambda _{21} & \lambda _{31} & \lambda _{41} & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 \\ 0 & 0 & 0 & 0 & 1 & \lambda _{62} & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 \\ 0 & 0 & 0 & 0 & 0 & 0 & 1 & \lambda _{83} & \lambda _{93} & 0 & 0 & 0 & 0 & 0 \\ 0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 1 & \lambda _{11,4} & \lambda _{12,4} & \lambda _{13,4} & \lambda _{14,4} \end{array}\right) \end{aligned}$$
(11)
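For concreteness, the loading pattern in Eq. (11) can be laid out as a matrix in code. This is a sketch only; the 0.8 values for the free loadings are arbitrary placeholders (e.g., starting values), not estimates from the paper:

```python
import numpy as np

# Loading matrix Lambda^T (4 latent x 14 manifest) following Eq. (11):
# one loading per factor is fixed to 1 for identification.
LT = np.zeros((4, 14))
LT[0, 0] = LT[1, 4] = LT[2, 6] = LT[3, 9] = 1.0  # fixed loadings
free = [(0, 1), (0, 2), (0, 3),                   # lambda_21, lambda_31, lambda_41
        (1, 5),                                   # lambda_62
        (2, 7), (2, 8),                           # lambda_83, lambda_93
        (3, 10), (3, 11), (3, 12), (3, 13)]       # lambda_{11,4} .. lambda_{14,4}
for r, c in free:
    LT[r, c] = 0.8  # placeholder value for a free loading
```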

Let \({\varvec{A}}=\text {diag}(0,\ldots ,0,\mu _5,\ldots , \mu _{14})\) and \({\varvec{c}}_i=(1,\ldots , 1)^T\). In addition, we have five covariates: gender, geographic area, living with siblings, father's education level, and mother's education level. Let \({\varvec{x}}_i=(x_{1i},\ldots ,x_{5i})\). To study the interactions between the exogenous latent variables and the endogenous latent variable, we propose the following structural equation model:

$$\begin{aligned} \eta _i = {\varvec{x}}_i {\varvec{b}}^T + f_1(\xi _{1i}) + f_2(\xi _{2i}) + f_3(\xi _{3i}) + f_{12}(\xi _{1i}, \xi _{2i}) + f_{13}(\xi _{1i}, \xi _{3i}) + f_{23}(\xi _{2i},\xi _{3i}) \end{aligned}$$
(12)

where \({\varvec{b}} = (b_1, \ldots , b_5)\). As in the simulation study, natural cubic splines with 5 knots are used for the functions \(f(\cdot )\). MCMC chains of 100,000 iterations are generated with a burn-in of 30,000. We apply both BFLSEM and BENSEM to this data set and compare the results with BSLSEM. Table 2 shows the estimates from the measurement equation; the estimates are very similar across all three methods.
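As an illustration of the basis construction, a natural cubic spline basis with 5 knots can be built from the standard truncated-power representation. This is a sketch under assumptions: the paper does not specify knot placement, so quantile-based knots are used here, and the sample is simulated:

```python
import numpy as np

def natural_cubic_basis(x, knots):
    """Natural cubic spline basis in truncated-power form.

    Returns an (n, K-1) design matrix for K knots (intercept excluded);
    the basis is linear beyond the boundary knots by construction.
    """
    x = np.asarray(x, dtype=float)
    k = np.asarray(knots, dtype=float)
    K = len(k)

    def d(j):  # scaled truncated cubic differences
        return (np.maximum(x - k[j], 0) ** 3
                - np.maximum(x - k[K - 1], 0) ** 3) / (k[K - 1] - k[j])

    cols = [x] + [d(j) - d(K - 2) for j in range(K - 2)]
    return np.column_stack(cols)

# 5 knots, as in the paper; quantile placement is an assumption
xi = np.random.default_rng(1).normal(size=200)
knots = np.quantile(xi, [0.05, 0.275, 0.5, 0.725, 0.95])
B = natural_cubic_basis(xi, knots)  # shape (200, 4)
```

Note that the columns of such a basis are transformed versions of the same \(\xi \), which is the source of the high correlation that motivates the fused Lasso and elastic net priors.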

Table 2 Non-spline parameter estimation using posterior means and posterior standard deviations

The structural equation results for BFLSEM, BENSEM, and BSLSEM are presented in Tables 3, 4, and 5, respectively. We notice that some of the \(\beta \)'s in BENSEM and BSLSEM did not converge completely. Comparing the parameter estimates from Tables 3, 4, and 5, we observe that BFLSEM performed best in this application, with all the \(\beta \)'s properly converged. The BFLSEM results in Table 3 show an interaction between marijuana morbidity and behavior risk index; the main effect of cigarette morbidity is also significant. The graphs of the two-way interactions of these three exogenous latent variables show their relation to the endogenous latent variable. Figure 6 shows a weak interaction between cigarette morbidity and marijuana morbidity, but both main effects are highly significant: when cigarette morbidity and/or marijuana morbidity increase, alcohol morbidity increases. Figure 7 shows a similar pattern for cigarette morbidity and behavior risk index. Figure 8 shows the interaction between marijuana morbidity and behavior risk index: when the behavior risk index is at a higher level, alcohol morbidity increases faster as marijuana morbidity increases.

Table 3 Spline parameter estimation using BFLSEM
Table 4 Spline parameter estimation using BENSEM
Table 5 Spline parameter estimation using BSLSEM
Fig. 6

Estimated surface for cigarette morbidity and marijuana morbidity

Fig. 7

Estimated surface for cigarette morbidity and behavior risk index

Fig. 8

Estimated surface for marijuana morbidity and behavior risk index

7 Discussion

In this paper we adapted the Bayesian fused Lasso prior and the Bayesian elastic net prior for use in semiparametric structural equation models. Basis expansions are used to approximate the nonparametric relationships between the endogenous latent variables and the exogenous latent variables and covariates. When cubic splines are used as the basis expansion, it is beneficial to use the fused Lasso or elastic net based priors (BFLSEM and BENSEM) to estimate the parameters, since cubic spline columns are correlated in general. In the simulation study, both BFLSEM and BENSEM reduce the standard deviations of the spline parameters and shrink the estimates closer to zero when the true values of those parameters are equal to zero. More importantly, the RMSE\((\hat{f})\) of BFLSEM and BENSEM is about half that of BSLSEM (which is based on the standard Lasso prior, Guo et al. (2012)).

There are clear benefits to using the fused Lasso prior to estimate the coefficients of the covariates; however, it is difficult to generate realistic correlation structures, and the usefulness of our methods will depend greatly on the type of the underlying correlation structure. In our simulation study, the fused Lasso prior based SEM (BFLSEM) shows a remarkable improvement over the standard Lasso prior based SEM (BSLSEM, Guo et al. (2012)) for the tridiagonal structure with correlation equal to 0.70. However, it is difficult to simulate tridiagonal structures, since the resulting matrices often have negative eigenvalues. We believe that if a natural order is present in a real data set, our fused Lasso prior based SEM (BFLSEM) would lead to much better results.

In the application, all three methods (BFLSEM, BENSEM, and BSLSEM) give similar estimates for the measurement equations. However, in terms of the structural equation parameter estimates, our BFLSEM based on the Bayesian fused Lasso comes out the winner and indicates a strong interaction between behavior risk index and marijuana morbidity.

In real-world data, if the Gaussian assumptions on the random components \(\varvec{\epsilon }_{i}\) and \(\varvec{\zeta }_{i}\) are not met, the model's performance can be compromised, leading to biased parameter estimates, incorrect inference, and poor predictive accuracy. Such violations can be addressed by adopting a contaminated Gaussian error structure on \(\varvec{\epsilon }_{i}\) and \(\varvec{\zeta }_{i}\). Another approach is to apply standard transformations, such as logarithmic, square root, or Box-Cox transformations, to the manifest variables and the endogenous latent variables.

In both of our proposed models we include two-way interactions of the exogenous latent variables. It is straightforward to extend the models to three-way interactions when the problem has at least three exogenous latent variables; however, that would significantly increase the number of coefficients to be estimated, depending on the number of knots. In our study, the options of the psychology survey are mostly ordinal data. In some cases the options might be dichotomous, which would violate the continuity assumption on the manifest variables. Further research is needed to extend the manifest variables to binary and nominal responses. It is also worthwhile to extend the models to other basis expansion methods.