1 Introduction

Case–control studies are retrospective studies where the response variable Y is dichotomous, e.g., presence or absence of some disease or injury. Subjects where \(Y=1\) are called cases and subjects where \(Y=0\) are called controls. Often there are potential confounding variables that are not of interest. Subjects with similar responses on these variables are considered part of the same stratum S. Matching subjects based on their stratum can reduce the effect of the confounding. A case–control study where 1 case is matched with M controls within the same stratum is called a 1–M matched case–control study (Agresti 2002; Hosmer and Lemeshow 2000). A special case of the matched case–control study is the matched case-crossover study, where the stratum is the subject (Woodward 2013). Matched case–control studies are popular in public health, biomedical, and epidemiological applications, e.g., vaccine studies (Whitney et al. 2006), organ transplant studies (Peleg et al. 2007), and studies on traffic safety (Tester et al. 2004).

A semiparametric model for matched case–control studies with covariates measured with error is,

$$\begin{aligned} P(Y=1|\tilde{X},Z,S) = H^{-1}[m^*(\tilde{X},Z)+q(S)], \end{aligned}$$
(1.1)

where \(H(\cdot )\) is the link function, q(S) is the effect of stratum S, and \(m^*(\cdot ,\cdot )\) is some function of Z, the covariates measured without error, and \(\tilde{X}\), the covariates measured with error. As matched case–control studies are retrospective studies, H is chosen to be the logit link function since it is the only link function that can be used to recover the prospective model (Scott and Wild 1997). Often the model is analyzed using conditional logistic regression in order to avoid estimating q(S). An alternative approach, and the approach we take in this paper, is to estimate the prospective model directly, treating the data as longitudinal binary outcomes and modeling q(S) as a random effect.

There is some existing work on error-in-covariates for these models. For matched case–control studies analyzed using conditional logistic regression there is the work of McShane et al. (2001), Guolo and Brazzale (2008) and the related work on partial likelihood models of Huang and Wang (2000, 2001), which also accounts for functional covariate relationships. There are structural approaches where the unknown true covariate is parametrically modeled (Guolo 2008); these methods require knowledge of the true exposure rate and the measurement error distribution, including its parameters. Others use functional approaches where the unobserved true covariate is treated as fixed, so that no assumption is made regarding its distribution (Buzas and Stefanski 1996; Stefanski and Carroll 1987). There are texts available for a thorough review of non-Bayesian and Bayesian error-in-covariate methods (Carroll et al. 2006; Gustafson 2003).

There are some related Bayesian methodologies for measurement error in covariates, but none of them handle the clustered binary outcomes of the problem we confront. Berry et al. (2002) applied smoothing splines and regression splines to the classical measurement error problem in a linear model setup, but not to the important case of binary data. Carroll et al. (2004) used Bayesian spline-based regression when an instrument is available for all study participants. In addition, both papers assumed that the unknown X is normally distributed. Sinha et al. (2010) proposed a semiparametric Bayesian method for handling measurement error in a logistic regression setting. They developed a flexible Bayesian method where not only the relationship between the disease and the exposure variable, but also the relationship between the surrogate and the true exposure, is modeled semiparametrically; the two nonparametric functions are modeled simultaneously via B-splines. Ryu et al. (2011) proposed nonparametric regression analysis in a generalized linear model (GLM) framework for data whose covariates are the subject-specific random effects of longitudinal measurements, applying Bayesian nonparametric methods, including cubic smoothing splines and P-splines, to capture possible nonlinearity in an additive model setting; the posterior model space is explored through a Markov chain Monte Carlo (MCMC) sampler. Bartlett and Keogh (2018) gave an overview of the Bayesian approach to handling covariate measurement error in a parametric model setting and contrasted it with regression calibration, arguably the most commonly adopted approach; they demonstrated that the Bayesian approach has a number of statistical advantages over regression calibration and that implementing it is usually quite feasible for the analyst.

We adapt a fully Bayesian approach for covariate measurement error in semiparametric regression models for normal responses Y (Berry et al. 2002), which treats \(\tilde{X}\) as a latent variable to be integrated over, for use with matched case–control studies. This is similar to the work of Sinha et al. (2005), who take a Bayesian approach to error-in-covariates in conditional logistic regression models. We also develop two approximate-Bayesian approaches which use a first order Laplace approximation (Tierney and Kadane 1986) to marginalize \(\tilde{X}\) out of the likelihood.

For estimating \(m^*(\cdot ,\cdot )\), we assume \(m^*(\tilde{X},Z) = m(x)+Z\beta _z\), where \(m(\cdot )\) is a smooth function that can be approximated by the user’s favorite spline method, and where only one variable x is measured with error. Our focus will then be on a semiparametric mixed model approach for estimating \(m(x)+Z\beta _z\) that addresses covariate measurement error in x in 1–M matched case–control studies. Existing methods for characterizing error-in-covariates in models with clustered binary outcomes cannot estimate nonparametric relationships between the clustered binary outcome and covariates measured with error. Hence, the methods we propose are unified in their ability to handle error-in-covariates and to detect both parametric and nonparametric relationships between the clustered binary outcome and covariates measured with error. For computational convenience, the Bayesian approaches are developed using a latent variable probit model (Albert and Chib 1993) with a scaled linear predictor to approximate a logit model (Camilli 1994). All are developed using low-rank thin-plate splines (Ruppert et al. 2003). We show through both simulations and a perturbed example of a 1–4 matched case–control study that the (fully) Bayesian approach performs similarly to the approximate-Bayesian approaches, except under model misspecification, where it tends to perform better.

This article is organized as follows: In Sect. 2, we describe a semiparametric mixed model with error-in-covariates and estimate it using low-rank thin-plate splines. In Sect. 3, we develop the approximate- and fully Bayesian approaches based on a latent variable probit approximation to a logistic model. In Sect. 4, we conduct a simulation study to compare our methods. In Sect. 5, we apply each approach to a 1–4 matched case-crossover study for juvenile aseptic meningitis. Section 6 contains concluding remarks and possible future work.

2 Semiparametric mixed model with error-in-covariates

We can approximate m(x) using some basis function method such that \(m(x) \approx B(x)\beta _B\), where B(x) is a matrix of basis functions and \(\beta _B\) are the basis coefficients. For the purposes of this paper, we use low-rank thin-plate splines (Ruppert et al. 2003) with order p (chosen to be some natural number) and knots \((\xi _1 , \xi _2 , \ldots , \xi _\kappa )\), \(\kappa < N\times (M+1)\), chosen a priori. This produces the following linear model,

$$\begin{aligned} m(x)&\approx B(x)\beta _B \\ B(x)&= \left[ \begin{matrix}X^*&L_p^*(x)\end{matrix}\right] , \\ \beta _B&= \left[ \begin{matrix}\beta ^*_x&\beta ^*_L\end{matrix}\right] ^T \\ X^*&= \left[ \begin{matrix}1&x&\ldots&x^{p-1}\end{matrix}\right] , \\ L_p^*(x)&= \left[ \begin{matrix}|x-\xi _1|^{2p-1}&|x-\xi _2|^{2p-1}&\ldots&|x-\xi _\kappa |^{2p-1}\end{matrix}\right] . \end{aligned}$$

The penalty on \(\beta ^*_L\) is treated as the prior in a Bayesian framework. For low-rank thin-plate splines this is \(N( 0,\sigma ^2_\beta \Omega ^{-1})\), where the (r,c)th element of the penalty matrix \(\Omega \) is \(|\xi _r - \xi _c|^{2p-1}\). However, \(\Omega \) is not positive definite, and thus not a valid covariance matrix. To address this, a singular value decomposition is used to construct \(\Omega ^{-1/2}\) such that \((\Omega ^{-1/2})^T(\Omega ^{-1/2}) = \Omega ^{-1}\). We scale \(\Omega ^{1/2}\beta ^*_L = \beta _L\) and \(L^*_p(x)\Omega ^{-1/2} = L_p(x)\) so that \(L_p(x)\) is an orthogonal basis with prior distribution (i.e. penalty) \(N(0,\sigma ^2_\beta I)\) on \(\beta _L\) (Crainiceanu et al. 2005).
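To make the construction concrete, the following is a minimal sketch of the transformed low-rank thin-plate spline basis, written in Python purely for illustration (our simulations use Matlab/Octave; see Sect. 4). The function name lowrank_tps_basis and the knot-placement example are our own choices; only the formulas for \(X^*\), \(L^*_p(x)\), \(\Omega \), and the SVD-based \(\Omega ^{-1/2}\) come from the text above.

```python
import numpy as np

def lowrank_tps_basis(x, knots, p=2):
    x = np.asarray(x, dtype=float)
    knots = np.asarray(knots, dtype=float)
    # Polynomial part X* = [1, x, ..., x^(p-1)]
    X_star = np.vander(x, N=p, increasing=True)
    # Radial part L*_p(x) with entries |x - xi_k|^(2p-1)
    L_star = np.abs(x[:, None] - knots[None, :]) ** (2 * p - 1)
    # Penalty matrix Omega, whose (r,c)th element is |xi_r - xi_c|^(2p-1)
    Omega = np.abs(knots[:, None] - knots[None, :]) ** (2 * p - 1)
    # Omega is not positive definite, so use an SVD to build Omega^(-1/2)
    U, s, Vt = np.linalg.svd(Omega)
    Omega_inv_half = U @ np.diag(1.0 / np.sqrt(s)) @ Vt
    # Transformed basis whose penalty acts as the prior N(0, sigma_beta^2 I)
    return X_star, L_star @ Omega_inv_half

# Example: kappa = 10 knots at evenly spaced percentiles, as in Sect. 4
x = np.random.default_rng(1).standard_normal(250)
knots = np.quantile(x, np.linspace(0.05, 0.95, 10))
X_star, L_p = lowrank_tps_basis(x, knots, p=2)
```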

For error-in-covariates in matched case–control studies, we assume that we observe \(w_{ijk} = x_{ij} + u_{ijk}\), where \(x_{ij}\) is unobserved and the measurement error \(u_{ijk} \sim N(0,\sigma ^2_u)\), for \(i = 1,2,\ldots ,N\) (the number of strata), \(j=1,2,\ldots ,M+1\) (the number of subjects in each stratum), and \(k=1,2,\ldots ,K_{ij}\), where \(K_{ij}\) is the number of replicated measurements for subject j in stratum i. In order to properly estimate \(\sigma ^2_u\), \(K_{ij}\) must be greater than or equal to 2 for at least one (i,j) pair.

To ease computations we approximate the logistic link function by using a latent variable probit model (Albert and Chib 1993) where the linear predictor is scaled by \(\sqrt{\pi /8}\) (Camilli 1994). The model is as follows,

$$\begin{aligned} y_{ij}|x_{ij},Z_{ij},S_i&\sim \text {Bernoulli}(\pi _{ij}), \\ \pi _{ij}&= \text {logit}^{-1}(\eta _{ij}),\\&\approx \Phi (l_{ij}), \\ l_{ij}|y_{ij}&\sim \left\{ \begin{array}{ll} \text {Normal}^+\left( \eta _{ij}\sqrt{\pi /8},1\right) , &{} \quad y_{ij} = 1 \\ \text {Normal}^-\left( \eta _{ij}\sqrt{\pi /8},1\right) , &{}\quad y_{ij} = 0 \\ \end{array} \right. , \\ \eta _{ij}&= m(x_{ij})+Z_{ij}\beta _z+q(S_i), \\&\approx X^*_{ij}\beta ^*_x+L_p(x_{ij})\beta _L + Z_{ij}\beta _z + q(S_i), \\ q(S_i)&\sim \text {Normal}(0,\sigma ^2_q), \\ \beta _L&\sim \text {Normal}(0,\sigma ^2_\beta ), \\ w_{ijk}|x_{ij}&\sim \text {Normal}(x_{ij},\sigma ^2_u), \end{aligned}$$

where l is a latent variable, and \(\text {Normal}^+(\cdot ,1)\) and \(\text {Normal}^-(\cdot ,1)\) are normal distributions truncated to the left and to the right of zero (i.e., with positive and negative support), respectively.
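As an illustration of the data augmentation this model enables, the sketch below draws \(l_{ij}\) given \(y_{ij}\) and \(\eta _{ij}\); this is the standard Albert–Chib step adapted to the scaled linear predictor. The function name and vectorized layout are our own choices, not the paper’s.

```python
import numpy as np
from scipy.stats import truncnorm

def sample_latent(eta, y, rng):
    # Mean of the latent variable: eta scaled by sqrt(pi/8)
    mu = eta * np.sqrt(np.pi / 8.0)
    # Support (0, inf) when y = 1 and (-inf, 0) when y = 0
    lo = np.where(y == 1, 0.0, -np.inf)
    hi = np.where(y == 1, np.inf, 0.0)
    # truncnorm takes standardized bounds (lo - mu)/sd and (hi - mu)/sd; here sd = 1
    return truncnorm.rvs(lo - mu, hi - mu, loc=mu, scale=1.0, random_state=rng)

rng = np.random.default_rng(0)
l = sample_latent(eta=np.array([0.3, -1.2]), y=np.array([1, 0]), rng=rng)
```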

3 Methods

We develop a fully Bayesian (FB) approach and two approximate-Bayesian (AB) approaches using a first order Laplace approximation in Sects. 3.1 and 3.2, respectively.

3.1 Fully Bayesian approach

To simplify computations, we let the intercept \(\beta _0\) be absorbed into q(S). We work with \(X_{ij}\), which is \(X^*_{ij}\) without the column of ones, and \(\beta _x\), which is \(\beta ^*_x\) without \(\beta _0\). The response Y depends on the regression parameters through l. Thus, a natural parametrization of the likelihood for modeling additive Gaussian measurement error is as follows,

$$\begin{aligned} L(W,l|Y,x,Z,\beta ,\sigma ^2_u)&\propto \prod _{i=1}^N\prod _{j=1}^{M+1} \left\{ \text {Normal}\left[ l_{ij};\eta _{ij}\sqrt{\pi /8},1\right] \times \bigg [\delta _{(l_{ij} \ge 0)}\delta _{(y_{ij}=1)}\right. \\&\quad +\delta _{(l_{ij} < 0)}\delta _{(y_{ij}=0)}\bigg ] \\&\quad \left. \times N(W_{ij};x_{ij},\sigma ^2_u I_{K_{ij}\times K_{ij}}) \right\} \\ \eta _{ij}&= Z_{ij}\beta _z+X_{ij}\beta _x+L_p(x_{ij})\beta _L +q(S_i). \end{aligned}$$

As mentioned previously, the prior on \(\beta _L\) should be chosen to be \(\pi (\beta _L|\sigma ^2_\beta ) \sim N(0,\sigma ^2_\beta )\), where \(\sigma ^2_\beta \) is a hyperparameter. In practice, the prior distributions placed on x and on q(S) should be chosen to reflect the data collected. For instance, a flexible approach might model x using a mixture of normals (Carroll et al. 1999). For this article, we choose \(\pi (x|\mu _x, \sigma ^2_x) \sim N(\mu _x, \sigma ^2_x)\) and \(\pi [q(S)|\beta _0, \sigma ^2_q] \sim N(\beta _0, \sigma ^2_q)\), where \(\mu _x\), \(\beta _0\), \(\sigma ^2_x\), and \(\sigma ^2_q\) are hyperparameters. The prior distributions for the other parameters are: \(\pi (\sigma ^2_u) \sim IG(\sigma ^2_u; A_u, B_u)\) and \(\pi [\beta _z,\beta _x] \sim N(\beta _z,\beta _x; g_\beta , t^2_\beta )\). Finally, we take the prior distributions for the hyperparameters as follows: \(\pi (\mu _x) \sim N(\mu _x; g_\mu , t^2_\mu )\), \(\pi (\sigma ^2_x) \sim IG(\sigma ^2_x; A_x, B_x)\), \(\pi (\sigma ^2_\beta ) \sim IG(\sigma ^2_\beta ; A_{\sigma ^2_\beta }, B_{\sigma ^2_\beta })\), \(\pi (\beta _0) \sim N(\beta _0; g_0, t^2_0)\), and \(\pi (\sigma ^2_q) \sim IG(\sigma ^2_q; A_q, B_q)\). Both the likelihood and prior structure are adapted from existing work where Y is continuous (Berry et al. 2002), which defaults to normal priors on mean-like parameters and inverse-gamma priors on variance parameters. In practice, careful choice of an informative prior structure can further improve inference. For example, it may be more appropriate to restrict the support of the prior on \(\sigma _u\) to values less than \(\sigma _w\), since \(\sigma _w\) should be greater than \(\sigma _u\) when errors are additive and Gaussian.

We use Metropolis–Hastings (Metropolis et al. 1953; Hastings 1970) and Gibbs (Geman and Geman 1984) algorithms to sample the joint posterior of these parameters using Markov chain Monte Carlo (MCMC). The conditional posterior distribution of x is sampled using a Metropolis–Hastings step, while all other parameters can be sampled using Gibbs steps. The conditional posterior distributions for each parameter, along with the proposal distribution for \(x_{ij}\), can be found in Appendix A.
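A minimal Python sketch of the Metropolis–Hastings update for a single \(x_{ij}\) follows. We use a generic random-walk proposal purely for illustration; the tailored proposal distribution actually used is given in Appendix A. The argument names, the step size, and the helper spline_eval are hypothetical.

```python
import numpy as np
from scipy.stats import norm

def mh_step_x(x_cur, l_ij, w_ij, eta_rest, spline_eval, pars, rng, step=0.3):
    # Conditional posterior of x_ij: latent-probit term x measurement model x prior
    c = np.sqrt(np.pi / 8.0)

    def log_post(x):
        eta = eta_rest + spline_eval(x)                      # Z*beta_z + q(S) plus spline part
        lp = norm.logpdf(l_ij, eta * c, 1.0)                 # Normal(l; eta*sqrt(pi/8), 1)
        lp += norm.logpdf(w_ij, x, pars["sigma_u"]).sum()    # w_ijk | x_ij
        lp += norm.logpdf(x, pars["mu_x"], pars["sigma_x"])  # prior on x_ij
        return lp

    x_prop = x_cur + step * rng.standard_normal()            # random-walk proposal
    if np.log(rng.uniform()) < log_post(x_prop) - log_post(x_cur):
        return x_prop, True                                  # accept
    return x_cur, False                                      # reject
```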

3.2 Approximate-Bayesian approaches

Updating each \(x_{ij}\) via Metropolis–Hastings can be computationally time consuming, especially for large \(N \times M\). We propose two approximate-Bayesian approaches (denoted AB1 and AB2) to reduce computation time by integrating each \(x_{ij}\) out of the conditional likelihood a priori. This integration is intractable due to the spline portion of the linear predictor, so we use a first order Laplace approximation (Tierney and Kadane 1986) to approximate the integral.

For AB1, we place an improper flat prior on \(x_{ij}\) and find:

$$\begin{aligned} \int L(l_{ij},W_{ij}|Y_{ij},x_{ij},Z_{ij},S,\beta ,\sigma ^2_u)\, \mathrm {d}x_{ij} \approx L(l_{ij}|Y_{ij},x_{ij}=\bar{w}_{ij\cdot },Z_{ij},S,\beta ), \end{aligned}$$

where \(\bar{w}_{ij\cdot } = K^{-1}_{ij}\sum _{k=1}^{K_{ij}} w_{ijk}\). See Appendix 2.1 for derivation.
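The AB1 approximation can be checked numerically for a single (i,j). In the Python sketch below (with arbitrary illustration values for \(l_{ij}\), \(\sigma _u\), and the linear predictor, all of which are our assumptions), the left-hand integral is computed by quadrature and compared with the Laplace value, in which the Gaussian factor in x integrates to a constant while the latent-probit term is frozen at \(x = \bar{w}_{ij\cdot }\):

```python
import numpy as np
from scipy.integrate import quad
from scipy.stats import norm

rng = np.random.default_rng(0)
sigma_u, K = 0.5, 2
w = 0.4 + sigma_u * rng.standard_normal(K)     # replicated surrogates for one subject
wbar = w.mean()
c = np.sqrt(np.pi / 8.0)
eta = lambda x: x**2 / 6 - 0.5 * 0.2           # assumed m(x) + z*beta_z for illustration
l_val = 0.7                                    # a fixed latent value

integrand = lambda x: norm.pdf(l_val, eta(x) * c, 1.0) * norm.pdf(w, x, sigma_u).prod()
exact, _ = quad(integrand, -10.0, 10.0)

# Laplace value: Gaussian factor integrates to a constant; eta is evaluated at wbar
const = norm.pdf(w, wbar, sigma_u).prod() * np.sqrt(2.0 * np.pi * sigma_u**2 / K)
approx = norm.pdf(l_val, eta(wbar) * c, 1.0) * const
print(exact, approx)                           # the two values agree closely
```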

For AB2, we place a normal prior on \(x_{ij}\) (as in Sect. 3.1) and find:

$$\begin{aligned}&\int L(l_{ij},W_{ij}|Y_{ij},x_{ij},Z_{ij},S,\beta ,\sigma ^2_u)N(x_{ij}|\mu _x,\sigma ^2_x) \, \mathrm {d}x_{ij}\\&\quad \approx L(l_{ij}|Y_{ij},x=\tilde{w}_{ij\cdot },Z_{ij},S,\beta ), \end{aligned}$$

where \(\tilde{w}_{ij\cdot } = \frac{K_{ij}\bar{w}_{ij\cdot }\sigma ^2_x+\mu _x\sigma ^2_u}{K_{ij}\sigma ^2_x+\sigma ^2_u}\). See Appendix 2.1 for derivation. The parameters \(\mu _x\), \(\sigma ^2_x\), and \(\sigma ^2_u\) need to be estimated a priori. We recommend setting each to the MLE:

$$\begin{aligned} \mu _x&= \bar{w}_{\cdot \cdot \cdot }, \\ \sigma ^2_u&= \sum _{i=1}^N\sum _{j=1}^{M+1}\sum _{k=1}^{K_{ij}}[N(M+1)(K_{ij}-1)]^{-1}(w_{ijk}-\bar{w}_{ij\cdot })^2, \quad \text {and} \\ \sigma ^2_x&= \sum _{i=1}^N\sum _{j=1}^{M+1}[N(M+1)-1]^{-1}(\bar{w}_{ij\cdot }-\mu _x)^2 -\sigma ^2_u. \end{aligned}$$
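These estimates are simple to compute from the replicated surrogates. The Python sketch below mirrors the three formulas above and also forms the AB2 plug-in \(\tilde{w}_{ij\cdot }\); balanced replication \(K_{ij} = K\) is assumed for brevity, and the function name is ours.

```python
import numpy as np

def prelim_estimates(w):
    # w has shape (N, M+1, K): strata x subjects x replicates (balanced case)
    N, M1, K = w.shape
    wbar = w.mean(axis=2)                        # subject means w_bar_ij.
    mu_x = w.mean()                              # grand mean w_bar_...
    sigma2_u = ((w - wbar[..., None]) ** 2).sum() / (N * M1 * (K - 1))
    sigma2_x = ((wbar - mu_x) ** 2).sum() / (N * M1 - 1) - sigma2_u
    # AB2 plug-in: shrink subject means toward mu_x
    w_tilde = (K * wbar * sigma2_x + mu_x * sigma2_u) / (K * sigma2_x + sigma2_u)
    return mu_x, sigma2_u, sigma2_x, w_tilde
```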

Both approximate-Bayesian methods produce simple ways of handling error-in-covariates, equivalent to using a plug-in estimator \(w^*\) and proceeding as if there were no covariate measurement error. To be clear, we use the following likelihood in the approximate-Bayesian analysis:

$$\begin{aligned} L(Y|l,w^*,Z,\beta )&\propto \prod _{i=1}^N\prod _{j=1}^{M+1} N[l_{ij};\eta _{w,ij}\sqrt{\pi /8},1]\\&\quad \times [\delta _{(l_{ij} \ge 0)}\delta _{(y_{ij}=1)}+\delta _{(l_{ij} < 0)}\delta _{(y_{ij}=0)}], \\ \eta _{w,ij}&= Z_{ij}\beta _z+W^*_{ij\cdot }\beta _x+L_p(w^*_{ij\cdot })\beta _L +q(S_i), \end{aligned}$$

where \(w^*_{ij\cdot } = \bar{w}_{ij\cdot }\) for AB1 and \(w^*_{ij\cdot } = \frac{K_{ij}\bar{w}_{ij\cdot }\sigma ^2_x+\mu _x\sigma ^2_u}{K_{ij}\sigma ^2_x+\sigma ^2_u}\) for AB2, and \(W^*_{ij\cdot } = (w^*_{ij\cdot } , w^{*^2}_{ij\cdot } , \ldots , w^{*^{p-1}}_{ij\cdot } )\).

The prior structure we adopt for the rest of this model, i.e. \(\beta \), q(S), and their hyperparameters, is the same as in Sect. 3.1. We then obtain the same conditional posteriors for them as well.

4 Simulation study

To assess the adequacy of each approach for correcting for covariate measurement error, we conducted a simulation study to address performance in terms of minimizing both the mean squared error (MSE) and the mean bias. We considered the fully Bayesian approach of Sect. 3.1 and the two approximate-Bayesian approaches of Sect. 3.2. In Sect. 4.1 we address model performance when the assumptions concerning the covariate measurement error are met. In Sect. 4.2 we address the robustness of each method when there is model misspecification error in the distribution of x and u. In Sect. 4.3 we describe the results.

For all simulations we set \(K_{ij} = 2\) for all (i,j) and \(M = 4\). We look at only a single covariate z measured without error, with \(\beta _z = -0.5\). We simulate \(z \sim N(0,1)\) and \(q(S) \sim N(0,0.1^2)\). We look at two functions, \(m(x) = x^2/6\) and \(m(x) = \sin (\pi x /2)\). The quadratic function was chosen for its simplicity and because quadratic models are usually fit parametrically with a linear term, while the sinusoidal pattern was chosen for its similarity to the relationship found in our juvenile aseptic meningitis data described in Sect. 5. To generate the clustered binary outcomes, sets of \(1+M\) binary outcomes were generated from \(P(Y=1|x,z,S,\beta _z) = \Phi [m(x)+z\beta _z+q(S)]\) until a set was found such that \(\sum _{j=1}^{1+M}Y_j = 1\). This was repeated for each of the N strata, and the whole process was then repeated to produce 100 datasets for each simulation setup.
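A sketch of this rejection scheme for a single stratum is given below (Python; the function name, default arguments, and return layout are our choices, while the distributions and parameter values follow the text above):

```python
import numpy as np
from scipy.stats import norm

def gen_stratum(m_fun, beta_z=-0.5, M=4, sigma_q=0.1, sigma_u=0.3, K=2, rng=None):
    rng = rng or np.random.default_rng()
    while True:
        x = rng.standard_normal(M + 1)              # true covariate, N(0, 1)
        z = rng.standard_normal(M + 1)              # covariate measured without error
        q = sigma_q * rng.standard_normal()         # stratum effect q(S)
        p = norm.cdf(m_fun(x) + z * beta_z + q)     # probit link used for generation
        y = rng.binomial(1, p)
        if y.sum() == 1:                            # keep sets with exactly one case
            w = x[:, None] + sigma_u * rng.standard_normal((M + 1, K))
            return x, z, w, y

x, z, w, y = gen_stratum(lambda x: x**2 / 6, rng=np.random.default_rng(4))
```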

It should be noted that the choice of probit link function for data generation is arbitrary. We use the logit link function for analysis because the estimated parameters are the same whether the data were collected prospectively or retrospectively, and not because said estimated parameters will be from the “true model,” if such a thing is knowable.

For each Bayesian approach, we use the same prior structure as noted in Sect. 3.1, where \(\{A_u, B_u, A_x, B_x, A_{\sigma ^2_\beta }, B_{\sigma ^2_\beta }, A_q, B_q \} = 0.1\), \(\{g_\mu , g_\beta , g_0\} = 0\), and \(\{t^2_\mu , t^2_\beta , t^2_0\} = 5^2\). For estimation, we use low-rank thin-plate splines with \(\kappa = 10\) knots, chosen at evenly spaced percentiles of \(\bar{w}\), and with order \(p = 2\), for all methods. The mean squared error \(\sum _{i=1}^N\sum _{j=1}^{M+1}\left( \hat{\eta }^{(\cdot )}_{ij}-\hat{\eta }^{(T)}_{ij}\right) ^2\) and mean bias \(\sum _{i=1}^N\sum _{j=1}^{M+1}\left( \hat{\eta }^{(\cdot )}_{ij}-\hat{\eta }^{(T)}_{ij}\right) \) are computed for each simulation dataset, where \(\hat{\eta }^{(\cdot )}\) is the estimated linear predictor using one of the proposed methods, and \(\hat{\eta }^{(T)}\) is the estimated linear predictor for the fully Bayesian approach with perfect measurements for x.
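For concreteness, the per-dataset computation of these two criteria might look as follows (a Python sketch; averaging the returned values over the 100 datasets gives the MMB and MMSE reported in Tables 1–4):

```python
import numpy as np

def dataset_bias_mse(eta_hat, eta_ref):
    # eta_hat: fit from one proposed method; eta_ref: fit with perfectly measured x
    d = np.ravel(eta_hat) - np.ravel(eta_ref)
    return d.sum(), (d ** 2).sum()   # mean bias and MSE, as displayed above
```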

For all simulations, all Bayesian methods were run for 10000 iterations with the first 2000 treated as burn-in. These values were determined graphically from the results of repeated test cases for each simulation combination. Every 10th iteration was kept after burn-in. Acceptance rates for the \(x_{ij}\)s averaged around 0.4 across all simulations. Simulations were run using Matlab 2012a (MATLAB 2012) and GNU Octave (Eaton et al. 2008). For code, please contact the authors.

4.1 Correctly specified model

In simulations where the measurement error distribution and distribution of x are correctly specified, we generate each \(x_{ij}\) from a standard normal and each \(u_{ijk}\) such that \(\sigma _u = \{0.1,0.3,0.5\}\), corresponding to small, large, and very large amounts of measurement error when \(\sigma _x = 1\) (Parker et al. 2010). We also consider small and large sample situations with the number of strata \(N = \{25,100\}\).

4.2 Model mis-specification

We consider three cases of model misspecification, one where only the distribution of x is misspecified, one where only the distribution of u is misspecified, and one where the distribution of both x and u are misspecified:

  • \(2^{3/2}\times (x+4) \sim \chi ^2_4\) and \(u \sim N(0,\sigma _u = 0.5)\)

  • \(x \sim N(0,1)\) and \(u \sim \text {Laplace}\left[ 0,\text {scale}=2^{-3/2}\right] \)

  • \(2^{3/2}\times (x+4) \sim \chi ^2_4\) and \(u \sim \text {Laplace}\left[ 0,\text {scale}=2^{-3/2}\right] \)

The misspecified distributions are chosen such that \(\sigma _u/\sigma _x = 0.5\) for all cases. For this set, we consider a moderate sample size with the number of strata \(N = 50\).
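These specifications pin down the generators directly; for example, a Python sketch (with an assumed sample size) for the chi-square x and Laplace u is:

```python
import numpy as np

rng = np.random.default_rng(2)
n = 50 * 5 * 2                     # assumed: N = 50 strata, M + 1 = 5 subjects, K = 2

# 2^(3/2) * (x + 4) ~ chi^2_4  =>  x = chi^2_4 / 2^(3/2) - 4, so sd(x) = sqrt(8)/2^(3/2) = 1
x = rng.chisquare(4, n) / 2**1.5 - 4

# Laplace measurement error with scale 2^(-3/2), so sd(u) = sqrt(2) * 2^(-3/2) = 0.5
u = rng.laplace(0.0, 2**-1.5, n)
```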

4.3 Simulation results

Tables 1 and 2 present the results for when the distributions of x and u are correctly specified, for the quadratic \(m(x) = x^2/6\) and sinusoidal \(m(x) = \sin (\pi x/2)\) cases, respectively. We observe that for the quadratic cases, no method consistently reduces the bias or MSE better than the others. However, it should be noted that the fully Bayesian approach is never the worst at reducing MSE. For the sinusoidal cases the fully Bayesian approach is worst at reducing MSE in one case, where \(N=100\) and \(\sigma _u=0.1\), but it is the best choice for reducing MSE and bias for all cases where \(\sigma _u = \{0.3,0.5\}\). These results suggest that the fully Bayesian approach is at least as good as both approximate-Bayesian approaches, particularly at reducing MSE, and it is not clear which approximate-Bayesian approach is better than the other.

Table 1 The MMB\(\times 10^2\) (mean mean bias) and the MMSE\(\times 10^2\) (mean mean squared error) of the linear predictors \(\hat{\eta }^{(\cdot )}\) for the case where \(m(x) = x^2/6\) for comparing the approximate-Bayesian methods with flat (AB1) and normal (AB2) prior on x and the fully Bayesian method (FB)
Table 2 The MMB\(\times 10^2\) (mean mean bias) and the MMSE\(\times 10^2\) (mean mean squared error) of the linear predictors \(\hat{\eta }^{(\cdot )}\) for the case where \(m(x) = \sin (\pi x/2)\) for comparing the approximate-Bayesian methods with flat (AB1) and normal (AB2) prior on x and the fully Bayesian method (FB)
Table 3 The MMB\(\times 10^2\) (mean mean bias) and the MMSE\(\times 10^2\) (mean mean squared error) of the linear predictors \(\hat{\eta }^{(\cdot )}\) for assessing robustness of the approximate-Bayesian methods with flat (AB1) and normal (AB2) prior on x, and the fully Bayesian method (FB) to model misspecification of the distribution of x and u when \(m(x) = x^2/6\)

Tables 3 and 4 present the results for when the distribution of x or u is incorrectly specified, for the quadratic \(m(x) = x^2/6\) and sinusoidal \(m(x) = \sin (\pi x/2)\) cases, respectively. We observe that for the quadratic cases, the fully Bayesian approach provides better reduction in MSE than both approximate-Bayesian approaches for all misspecification types. Similarly, approximate-Bayesian approach AB2 dominates AB1 in terms of MSE. The opposite is observed for bias, where the approximate-Bayesian approaches provide better reduction than the fully Bayesian approach for all misspecification types, though neither approximate-Bayesian approach dominates the other. However, this is not true for the sinusoidal cases, where the fully Bayesian approach reduces both the bias and MSE more than the approximate-Bayesian approaches for all misspecification types; there, approximate-Bayesian approach AB1 dominates AB2 in terms of bias, while AB2 dominates AB1 in terms of MSE. These results suggest that the approximate-Bayesian approaches might be at least as good at reducing bias, but the fully Bayesian approach is at least as good at reducing MSE, when the distribution of x or u is misspecified.

Table 4 The MMB\(\times 10^2\) (mean mean bias) and the MMSE\(\times 10^2\) (mean mean squared error) of the linear predictors \(\hat{\eta }^{(\cdot )}\) for assessing robustness of the approximate-Bayesian methods with flat (AB1) and normal (AB2) prior on x, and the fully Bayesian method (FB) to model misspecification of the distribution of x and u when \(m(x) = \sin (\pi x/2)\)

5 Application: juvenile aseptic meningitis data

We consider the aseptic meningitis data of ADD CITATION. Aseptic meningitis is a viral infection that causes inflammation of the membrane that covers the brain and spinal cord. It is rarely fatal, but full recovery can take about two weeks. The study design was for a 1–4 case-crossover study with 211 subjects (i.e., strata). Case-crossover studies are a special case of matched case–control studies where the stratum is the subject. The turbidity of drinking water (i.e., the amount of suspended matter in the water) is believed to affect the risk of aseptic meningitis. For this study water turbidity was measured by a nephelometer. A nephelometer shoots a beam of light at water and then measures the scattered light. It then uses a formula to determine the turbidity, measured in nephelometric turbidity units (NTU). The device is susceptible to miscalibration, and can be thrown off by air bubbles that may make water that does not actually contain any suspended particles appear cloudy. This study design was not set up for multiple measurements of NTU, so for illustrative purposes we treat NTU measurements collected on two separate sample dates as error-prone measurements of NTU from the sample date of interest. These measurements are centered and scaled across dates. We use the same model specification as used in our simulations. For \(\mathbf {Z}\) we use the centered and scaled body temperature of the subjects in degrees Celsius.

Figure 1 shows a plot of the posterior mean fitted values (centered across methods) for \(m(\cdot )\) using the approximate-Bayesian and fully Bayesian methods. The choice of method leads to dramatically different inference concerning the effect of NTU on the probability of acquiring aseptic meningitis, primarily for large measurements of NTU. The fully Bayesian method does a much better job of capturing the decreasing effect of NTU for large values of NTU.

Fig. 1

The posterior-mean fits of m(x) for the aseptic meningitis data, where x is centered and scaled Nephelometric Turbidity Units (NTU). The black dots are the centered (across all methods) fitted values using approximate-Bayesian method 1 (AB1) evaluated over a grid of NTU values. Similarly, the black circles are for approximate-Bayesian method 2 (AB2) and the black squares are for the fully Bayesian method (FB)

The posterior mean of \(\sigma _u\) from the fully Bayesian model is 0.8854 with a 90% equal tail credible interval of [0.8548, 0.9176]. Given \(\sigma _x \approx \sigma _{\bar{w}} = 0.7788\), we are in a large measurement error scenario with \(\sigma _u/\sigma _x \approx 1.1369\). From Fig. 2 we can see that the distribution of NTU is not normally distributed. As a result, we made a model misspecification error by placing a normal prior on the distribution of NTU. Given these conditions and our simulations results, we believe the fully Bayesian approach is the best approach for this data.

Fig. 2

Histogram of \(\bar{\mathbf {w}}\), the subject-specific mean measurement of Nephelometric Turbidity Units (NTU), from the aseptic meningitis data. The black line is a fit from a normal distribution. The measurements of NTU are somewhat symmetric and unimodal, but normality does not hold

6 Discussion

We have proposed a fully Bayesian and two approximate-Bayesian approaches for handling a semiparametric mixed model with error-in-covariates for matched case–control studies. These approaches are developed using low-rank thin-plate splines and a latent variable probit model. The strength of these methods is that they can both handle error-in-covariates and capture nonlinear relationships between matched binary outcomes and covariates measured with error. Additionally, we have shown these methods exhibit some robustness to model misspecification of x and u. To our knowledge, there is no existing methodology that has been shown to do both in matched case–control studies.

The fully Bayesian approach treats x as a latent variable and then integrates it out. The approximate-Bayesian approaches use a first order Laplace approximation to the likelihood, marginalizing out x. Deciding which approach to take, AB1, AB2, or FB, can be challenging, as the answer appears to depend on what the unknown function m(x) is and how well our assumptions about the normality of the distributions of x and u are met. When all assumptions are met, the fully Bayesian approach tended to perform best most often. However, improvements were often small and not consistent across sample sizes or sizes of measurement error. As a result, it may not be worth the additional computation for large datasets when there is a reasonable chance it will not actually lead to an improvement. The stronger argument for using the fully Bayesian method is made by its performance under model misspecification, particularly in terms of MSE. If you believe the adage that ‘All models are wrong...,’ then this is the more compelling argument for using the fully Bayesian method, as it outperformed the approximate methods in almost every scenario in terms of mean bias, and in every scenario in terms of MSE. Though, again, these improvements were often only of modest size. A user may still feel that a faster solution has greater utility than a more accurate one. It is up to the user to consider what is best for their own project.

We note that our approach was developed for a univariate x. It can be generalized to several covariates measured with error through an additive model,

$$\begin{aligned} m^*(\tilde{X},Z) = \sum _{r_1=1}^{R_1} m_{r_1}(x_{r_1}) + \sum _{r_2=1}^{R_2} m_{r_2}(z_{r_2}), \end{aligned}$$

where there are \(R_1\) covariates measured with error and \(R_2\) covariates measured without error. Generalization to a nonadditive model will be an interesting and challenging problem because of the unknown interaction structure among unknown covariates. We illustrated our technique using low-rank thin-plate splines; however, it is straightforward to change the spline basis to any other where the smoothness penalty can be thought of as a \(N(0,\sigma ^2_\beta )\) prior on the spline coefficients \(\beta _L\) (Ruppert et al. 2003).

We assumed that the measurement error u was additive and normally distributed. This assumption creates a computational convenience, as we can choose an inverse-gamma conjugate prior for \(\sigma ^2_u\) so that it can be sampled in a Gibbs step. If we change the distributional assumption on u, this convenience will be lost. More complicated measurement error distributions that may depend on x or Y are worthwhile future research problems. Another choice of computational convenience was to use a rescaled latent variable probit model to approximate the logistic model. The validity of this choice depends on the quality of the approximation and the user’s personal loss function for tolerating it. However, we gained the ability to use conjugate normal priors on \(\beta _x\), \(\beta _z\), \(\beta _L\), and q(S) and sample them using Gibbs steps. Changing the link function would also remove this convenience.

Finally, we assumed the distribution of x was normal. In practice this might not be the case, and was not the case for our data analysis example. We showed the fully and approximate-Bayesian methods were somewhat robust to violations of this assumption, as well as assumptions about the distribution of u. However, flexible methods for properly modeling the distribution of x and u should improve performance of Bayesian error-in-covariates models.