1 Introduction

Regression models are an important statistical tool for capturing the average effects of covariance on a response variable. However, when the distribution of the response variable dependent on the covariates is not symmetric or there exist outliers, the mean regression is not appropriate for representing the phenomenon. In fact, real data analysis has suggested that the effects from covariates on the response variable may not be homogeneous when there are individuals with outlying values of the response variable. Quantile regression (Koenker and Bassett 1978) is a regression method that enables estimation of multiple quantiles. Because of the robustness of quantiles, quantile regression is known to be robust against outliers in the response variable and asymmetry in the response variable distribution. Furthermore, since any quantile can be used as a target of estimation, it is possible to choose a quintile for analysis based on the objective of the particular research and to compare the estimation results of different quantiles. In other words, quantile regression can be used to perform a more diversified analysis than mean regression. Because of these advantages, quantile regression has been attracting attention in recent years. In this paper, we focus on quantile regression modeling of hierarchical data with a nonlinear structure. Note that while the proposed model in this paper is based on longitudinal data, it can be applied to hierarchical data, since longitudinal data are a type of hierarchical data.

Linear mixed-effects models (Laird and Ware 1982) have been a common choice for the analysis of hierarchical data, including longitudinal data. Mixed-effects models are regression models that can capture correlations in observed values within individuals and variations between individuals. The predictor of mixed-effects models is represented by the sum of two terms, a fixed-effects term and a random-effects term. The fixed-effects term formulates the overall trend due to an unknown parameter common to all individuals and the random-effects term formulates the variation among individuals due to an unknown parameter that differs between individuals. Using these two terms, mixed-effects models can simultaneously estimate both the overall transition trend of a response variable and the transitions specific to each individual, taking into account inter-individual variability.

Although linear mixed-effects models estimate conditional expectations of response variables, studies analyzing hierarchical data have also been conducted in the framework of quantile regression. For example, linear quantile mixed-effects models assuming a univariate random effect and multivariate random effects have been proposed in Geraci and Bottai (2007) and Geraci and Bottai (2014), respectively. These models perform quantile regression in the framework of linear mixed-effects models and inherit the advantages of these models while using arbitrary quantiles instead of the mean value as the estimation targets.

Furthermore, Galarza et al. (2020) and Geraci (2019b) have proposed nonlinear quantile mixed-effects models for hierarchical data with nonlinear structures. These models are based on nonlinear mixed-effects models such as Lindstrom and Bates (1990) and include the assumption that a nonlinear function that adequately models the phenomenon of interest is already known. For this reason, Galarza et al. (2020) used a three-parameter logistic model (Pinheiro and Bates 1995), a well-known growth curve model, as a predictor of their nonlinear quantile mixed-effects models, and Geraci (2019b) used a bi-exponential model (Pinheiro and Bates 1995), which is a pharmacokinetic model, as a predictor of his nonlinear quantile mixed-effects models.

However, there are not many situations in which the models are known for the phenomena of interest. For example, our research studies the infant circadian rhythm of cortisol secretion using longitudinal data. Since the rhythm of cortisol secretion during infancy is unknown, it is impossible to determine the nonlinear function as a predictor of the model in advance.

The purpose of this study was to develop a model that can estimate the transition of quantile points of a response variable from longitudinal data with a nonlinear structure when the nonlinear function cannot be identified in advance. To achieve this goal, we consider a basis expansion method for both fixed- and random-effects terms in quantile mixed-effects models.

Since models with basis expansions can be expressed in the same form as a linear model, they can be treated as a special case of the linear quantile mixed-effects model, such as those proposed by Geraci and Bottai (2007) and Geraci and Bottai (2014). In practice, however, the computational cost of this approach is high. We first need to determine both the number of basis functions for the fixed-effects term and the number of basis functions for the random-effects term, and it is much more expensive to determine the combination of two elements than just one element. Furthermore, smoothing using basis expansions requires determining the value of the regularization parameters, which prevent overfitting and adjust the smoothness of the function.

To overcome these issues, we proposed the Bayesian nonparametric quantile mixed-effects models (BNQMs), which solve these problems by using Bayesian inference. In BNQMs, Bayesian regularization can be performed by assuming a specific distribution of the prior for the coefficient parameters of the basis function. The Markov chain Monte Carlo (MCMC) method is used in BNQMs to estimate the posterior of unknown parameters. In addition, the priors or hyperpriors are assumed hierarchically for the parameters of the priors for the coefficient parameters, and regularization parameters can be estimated simultaneously by the MCMC method. Such a method is called a fully Bayesian method. With this fully Bayesian method, it is no longer necessary to repeat point estimations to determine the optimum value from the candidate values of the regularization parameter, by comparison, using an information criterion.

In this study, we also proposed the use of a Gaussian process (GP) prior for the coefficient parameter vectors of the basis functions. We showed that the smoothness can be appropriately adjusted when the number of basis functions is excessive using a GP prior for Bayesian regularization. Based on this, we can fix the number of basis functions, and the smoothness of the regression curve can be adjusted by optimizing the hyperparameters of the GP, thus reducing the computational cost. In addition, since BNQMs use the full Bayesian method, these hyperparameters can also be estimated by the MCMC method.

A clear advantage of this approach over existing methods, including the aforementioned Galarza et al. (2020) and Geraci (2019b), as well as the additive and semiparametric models proposed by Yue and Rue (2011), Waldmann et al. (2013), Fenske et al. (2013), and Geraci (2019a), is that the proposed regularization method was developed by focusing more on the problem of using basis function expansions for random-effects terms. Here, the usefulness of the proposed regularization is confirmed by numerical experiments.

The remainder of the article is organized as follows. Section 2 explains the details of BNQMs. We then report our numerical experiments on BNQMs in Section 3. After showing the usefulness of such models in numerical experiments, an application example of a BNQM is shown by analyzing infant cortisol data in Section 4. Finally, this article is concluded in Section 5.

2 Bayesian nonparametric mixed-effects models

2.1 Model

We take the observation of the i-th individual at the j-th measurement time \(t_{ij}\), \(\{(t_{ij}, y_{ij}); i = 1,..., N, j = 1,..., n_i\}\). Then, the \(\tau \)-th quantile of \(y_{ij}\) at \(t_{ij}\) can be modeled as

$$\begin{aligned} Q_{y_{ij}}(\tau |t_{ij})= & {} \sum _{k=1}^{p}\beta _{\tau k}\phi _k(t_{ij})+\sum _{l=1}^{q}b_{\tau il}\psi _l(t_{ij})\nonumber \\= & {} \varvec{\beta }_\tau ^{\top }\varvec{\phi }(t_{ij})+\varvec{b}_{\tau i}^{\top }\varvec{\psi }(t_{ij}), \end{aligned}$$
(1)

where \(\tau \in (0,1)\), \(Q_{y_{ij}}(\cdot ) \equiv F^{-1}(\cdot )\), and \(\varvec{\phi }(t)=(\phi _1(t), \dots , \phi _{p}(t))^{\top } \) and \(\varvec{\psi }(t)=(\psi _1(t), \dots , \psi _{q}(t))^{\top } \) are vectors of the basis functions in the fixed- and random-effects terms, respectively, \(\varvec{\beta }_\tau =(\beta _{\tau 1}, \dots , \beta _{\tau p})^{\top }\) is a \(p \times 1\) coefficient parameter vector of \(\varvec{\phi }(t)\), and \(\varvec{b}_{\tau i}=(b_{\tau i1}, \dots , b_{\tau iq})^{\top }\) is a \(q \times 1\) coefficient parameter vector of \(\varvec{\psi }(t)\), where \(\varvec{b}_{\tau i} \sim N(\varvec{0}, \varvec{\varGamma }_\tau )\) is assumed. Here, \(\varvec{\varGamma }_\tau \) is a \(q \times q\) positive-definite covariance matrix. In the present study, Spline-based Gaussian basis functions (Kawano and Konishi 2007) were used in the numerical experiments and cortisol data analysis. For these basis functions, equidistant points corresponding to the knots of the cubic B-spline (de Boor 2001) are the centers.

Equation (1) can be rewritten in the form of a linear quantile mixed-effects model:

$$\begin{aligned} Q_{\varvec{y}_i}(\tau |\varvec{t}_i)=\varvec{\varPhi }_i\varvec{\beta }_\tau +\varvec{\varPsi }_i\varvec{b}_{\tau i}, \end{aligned}$$
(2)

where \(\varvec{y}_i =(y_{i1}, \dots , y_{in_i})^{\top }\), and \(\varvec{t}_i =(t_{i1}, \dots , t_{in_i})^{\top }\). Here, \(\varvec{\varPhi }_i=(\varvec{\phi }(t_{i1}), \dots , \varvec{\phi }(t_{in_i}))^{\top }\) is an \(n_i \times p\) design matrix in the fixed-effects term and \(\varvec{\varPsi }_i=(\varvec{\psi }(t_{i1}), \dots , \varvec{\psi }(t_{in_i}))^{\top } \) is an \(n_i \times q\) design matrix in the random-effects term. For simplicity of notation, we will omit the subscript \(\tau \) in the remainder of the paper.

2.2 Estimation

2.2.1 Likelihood function

In BNQMs, we consider Bayesian estimation of the unknown parameters of Eq. (1) using the MCMC method. Yu and Moyeed (2001) proposed a method to express likelihood using an asymmetric Laplace distribution as a Bayesian approach in quantile regression. We assume that the conditional distribution of \(y_{ij}\) is an asymmetric Laplace distribution, \(y_{ij}|\varvec{\beta },\varvec{b}_{i}, \sigma \sim AL(\mu _{ij}, \sigma , \tau )\); therefore, its probability density function can be written as

$$\begin{aligned} p(y_{ij}|\varvec{\beta },\varvec{b}_{i}, \sigma )=\frac{\tau (1-\tau )}{\sigma }\exp \left\{ -\rho _\tau \left( \frac{y_{ij}-\mu _{ij}}{\sigma }\right) \right\} , \end{aligned}$$
(3)

where \(\mu _{ij}=\varvec{\beta }^{\top }\varvec{\phi }(t_{ij})+\varvec{b}_{i}^{\top }\varvec{\psi }(t_{ij})\) is the location parameter, \(-\infty<\mu _{ij}<\infty \), \(\sigma >0\) is the scale parameter, \(0<\tau <1\) is the skewness parameter, and \(\rho _\tau (u)=u(\tau -I(u<0))\) is the loss function with indicator function \(I(\cdot )\).

Furthermore, Kozumi and Kobayashi (2011) proposed the following location-scale mixture representation of the asymmetric Laplace distribution:

$$\begin{aligned} y_{ij}|v_{ij}\sim & {} N(\mu _{ij}+ a \sigma v_{ij}, \delta ^2\sigma ^2 v_{ij}), \nonumber \\ v_{ij}\sim & {} \mathrm{Exp}(1),\nonumber \\ a= & {} \frac{1-2 \tau }{\tau (1- \tau )}, \nonumber \\ \delta ^2= & {} \frac{2}{\tau (1-\tau )}, \end{aligned}$$
(4)

where Exp(1) denotes an exponential distribution with mean 1 and \(\mu _{ij}=\varvec{\phi }(t_{ij})^{\top }\varvec{w}+\varvec{\psi }(t_{ij})^{\top }\varvec{b}_i\). This representation, which uses a normal distribution and an exponential distribution hierarchically, improves the sampling efficiency of the MCMC method.

Based on these settings, the conditional distribution of \(y_{ij}\) in the BNQM can be written as follows:

$$\begin{aligned} p(y_{ij}|\varvec{\beta }, \varvec{b}_i, v_{ij}, \sigma )= N(y_{ij}|\mu _{ij}+a \sigma v_{ij}, \delta ^2\sigma ^2 v_{ij}), \end{aligned}$$
(5)

where \(N(y_{ij}|\mu _{ij}+a \sigma v_{ij}, \delta ^2\sigma ^2 v_{ij})\) is a probability density function of the normal distribution with mean \(\mu _{ij}+a \sigma v_{ij}\) and variance \(\delta ^2\sigma ^2 v_{ij}\). From equation (5), the likelihood function can be written as

$$\begin{aligned} p(\varvec{y}|\varvec{\beta }, \varvec{b}, \varvec{v}, \sigma )= \prod _{i=1}^N \prod _{j=1}^{n_i} N(y_{ij}|\mu _{ij}+\alpha \sigma v_{ij}, \delta ^2\sigma ^2 v_{ij} ), \end{aligned}$$
(6)

where \(\varvec{b}=(\varvec{b}^{\top }_1, \cdots , \varvec{b}^{\top }_N)^{\top }\) and \(\varvec{v}=(v_{11}, \cdots , v_{1 n_1}, \cdots , v_{N 1}, \cdots , v_{N n_N})^{\top }\).

2.2.2 GP prior regularization for basis expansion method

In the basis expansion method, regularization is important for adjusting the smoothness of the estimated function. In the Bayesian framework, regularization can be achieved using a specific distribution as a prior. In this study, we proposed using a GP prior for the regularization of the coefficients of the basis functions. We will call this Bayesian regularization method “GP prior regularization”. The proposed GP prior regularization method is explained in this section and the advantages of using a GP prior for regularization in the basis expansion method were demonstrated in a simple numerical experiment, detailed later.

In the Bayesian framework, the regularization can be realized by setting a specific prior for the coefficient parameters. For a Bayesian regularization for a p-dimensional coefficient parameter vector \(\varvec{\beta }=(\beta _1,\dots , \beta _p)^{\top }\), a ridge regularization (\(l_2\) regularization) assuming the following multivariate normal distribution for \(\varvec{\beta }\) is often performed:

$$\begin{aligned} \varvec{\beta }\sim N_p(\varvec{0}, \sigma _\beta ^2\varvec{I}). \end{aligned}$$
(7)

In this case, the hyperparameter \(\sigma _\beta ^2\) is used to regulate the norm of \(\varvec{\beta }\) to prevent overfitting.

Consider the basis expansion method expressed by the following equation:

$$\begin{aligned} f(t)= & {} \beta _1\phi _1(t) + \cdots + \beta _p\phi _p(t) \nonumber \\= & {} \varvec{\beta }^{\top }\varvec{\phi }(t). \end{aligned}$$
(8)

In our treatment, we focused on the fact that the parameters \(\beta _k, \beta _l(k,l =1, \dots , p)\) are assumed to be independent of each other and we developed a method to set the prior. Specifically, we considered that the smoothness of the estimation curve could be more effectively adjusted using a prior that satisfies the assumption that the value of the coefficient parameter is closer when the location of the corresponding basis function is closer. For this, we assume a GP prior for the coefficient parameters of the basis function, which can be written as

$$\begin{aligned} \beta _k \sim GP(0, k(s_k, s_l)), k,l =1, \dots , p, \end{aligned}$$
(9)

where \(k(s_k, s_{l})\) is a kernel function. In the GP regression, kernel functions determine the similarity of the outputs, and Eq. (9) defines the similarity (covariance) between \(\beta _k\) and \(\beta _l\). (If \(k=l\), the kernel function defines the variance of \(\beta _k\).)

We propose using the center of the basis function \(s_k\) as the input of the kernel function, which satisfies the assumption that the coefficient parameter is closer when the corresponding basis function is closer. In this study, we use the following RBF kernel as \(k(s_k, s_{l})\):

$$\begin{aligned} k(s_{k}, s_{l} |\alpha , \rho )= \alpha ^2 \exp \left\{ - \frac{1}{2 \rho ^2} (s_k- s_{l})^2 \right\} , \end{aligned}$$
(10)

where \(\alpha \) and \(\rho \) are hyperparameters \((\alpha> 0, \rho > 0)\), and the scale (magnitude of amplitude) and smoothness of the function f(t) estimated by the basis expansion method expressed as equation (8) are adjusted.

Equation (9) is equivalent to assuming the following multivariate normal distribution as the prior for \(\varvec{\beta }\):

$$\begin{aligned} \varvec{\beta } \sim N(\varvec{0}, \varvec{K}), \end{aligned}$$
(11)

where \(\varvec{K} \) is the \(p \times p \)-dimensional covariance matrix and its (kl) element is determined by \(k(s_k, s_{l})\).

This method addresses the problem of overfitting when the number of basis functions is too large for the complexity of the nonlinear structures. To demonstrate this advantage, we conducted a simple numerical experiment. Figures 1 and 2 compare the fitting results for quantile regression using the basis expansion method when ridge regularization expressed by Eq. (7) and GP prior regularization expressed by Eq. (9) are used, with a change in the number of basis functions. The results show that the smoothness can be optimized even if the number of basis functions is excessive when a GP prior is assumed. In this method, the numbers of basis functions in the fixed-effects term and the random-effects term are unified to a fixed value p. The fact that it is not necessary to determine the combination of the number of basis functions based on an information criterion drastically reduces the computational cost.

2.2.3 Prior setting of BNQM

Here, we summarize the prior setting of BNQMs. Since BNQMs are mixed-effects models, we assume a GP prior for the coefficient parameters of the basis function in the fixed-effects and random-effects terms, expressed as

$$\begin{aligned} \varvec{\beta } | \alpha _f, \rho _f\sim & {} N(\varvec{0}, \varvec{G}), \nonumber \\ \varvec{b}_i | \alpha _r, \rho _r\sim & {} N(\varvec{0}, \varvec{\varGamma }), \end{aligned}$$
(12)

where \(\varvec{G}\) and \(\varvec{\varGamma }\) are the \(p \times p\)-dimensional covariance matrices and the (kl) elements of \(\varvec{G}\) and \(\varvec{\varGamma }\) are given by the following RBF kernels:

$$\begin{aligned} k_f(s_k, s_{l} |\alpha _f, \rho _f)= & {} {\alpha _f}^{2} \exp \left\{ - \frac{1}{2 {\rho _f}^{2}} (s_k- s_{l})^2 \right\} ,\nonumber \\ k_r(s_k, s_{l} |\alpha _r, \rho _r)= & {} {\alpha _r}^2 \exp \left\{ - \frac{1}{2 {\rho _r}^2} (s_k- s_{l})^2 \right\} , \end{aligned}$$
(13)

in which \(\alpha _f\), \(\alpha _r\), \(\rho _f\), and \(\rho _r\) are hyperparameters \((\alpha _f, \alpha _r, \rho _f, \rho _r > 0)\). Note that \(\alpha _f\) and \(\alpha _r\) adjust the scale of the regression curve and \(\rho _f\) and \(\rho _r\) adjust the smoothness of the regression curve. Furthermore, in this study, we estimate the posterior of these hyperparameters by the MCMC method, assuming the following priors for these hyperparameters:

$$\begin{aligned} \alpha _f&\sim N_{+}(0, \sigma _\alpha ^2), \alpha _r \sim N_{+}(0, \sigma _\alpha ^2),\nonumber \\ \rho _f&\sim IG(g_a, g_b), \rho _r \sim IG(g_a, g_b), \end{aligned}$$
(14)

where \(N_{+}(0, \sigma _\alpha ^2) \) is the half-normal distribution and \(IG (g_a, g_b) \) is the inverse gamma distribution. In addition, we assume a half-normal distribution \(N_{+} (0, c^2)\) for the prior for \(\sigma \), which is a scale parameter of the asymmetric Laplace distribution in this study.

2.3 Posterior

Using the likelihood of Sect. 2.2.1 and the prior of Sect. 2.2.3, the posterior of all unknown parameters in BNQMs can be expressed by Bayes’ theorem as follows:

$$\begin{aligned} p(\varvec{\beta },\varvec{b}, \alpha _f, \alpha _r, \rho _f, \rho _r, \sigma ,\varvec{v}|\varvec{y})\propto & {} p(\varvec{y}|\varvec{\beta }, \varvec{b}, \sigma , \varvec{v} ) \times \nonumber \\&p(\varvec{\beta }|\alpha _f, \rho _f)p(\varvec{b}|\alpha _r, \rho _r)\times \nonumber \\&p(\alpha _f) p(\alpha _r) p(\rho _f) p(\rho _r)p(\sigma )p(\varvec{v}), \end{aligned}$$
(15)

where \(p(\varvec{b}|\alpha _f, \rho _r) = \prod _{i=1}^{N} p(\varvec{b}_i|\alpha _r, \rho _r)\).

In BNQMs, sampling is performed with the posterior distribution of equation (15) by the MCMC method. Specifically, the MCMC method uses an algorithm called the No-U-Turn sampler (NUTS; Hoffman and Gelman 2014), which is an extension of the Hamiltonian Monte Carlo (HMC) method (Duane et al. 1987). This algorithm has been implemented as a standard in the probabilistic programming language Stan (Stan Development Team 2020), and BNQMs have been implemented using Stan. The details of HMC and NUTS are described in appendix B. In this study, posterior mean values were used as point estimates of the unknown parameters, and the lower \(2.5\%\) and upper \(2.5\%\) points of the posterior were used to estimate each \(95\%\) credible interval.

Fig. 1
figure 1

Comparison of fitting by nonparametric quantile regression with Bayesian ridge regularization and GP prior regularization performed. Here, p is the number of basis functions

Fig. 2
figure 2

Comparison of fitting by nonparametric quantile regression with Bayesian ridge regularization and GP prior regularization performed (continuation of Fig. 1). Here, p is the number of basis functions

3 Numerical experiments

In this section, we evaluate the usefulness of BNQMs through numerical experiments. The target quantiles of this numerical experiment are set to be \(\tau =(0.10,0.25,0.50,0.75,0.90)\). The purpose of the proposed model is to appropriately capture the nonlinear transitions of quantiles. To evaluate whether BNQMs achieve this purpose, two types of data structures are considered:

  1. (S1)

    Symmetric distribution of y given t.

  2. (S2)

    Asymmetric distribution of y given t.

For S1, the error is generated from the standard normal distribution, while for S2, the error is generated from the chi-square distribution with three degrees of freedom to reproduce each situation. In S1 and S2, the observation \(y_{ij}\) at the j-th measurement point of the i-th subject is generated according to the following corresponding equations \((i =1, \cdots , 10, j=1,\dots , 50)\):

(S1)

$$\begin{aligned} y_{ij}= & {} 0.1\cos (3\pi t_{ij})+b_{i1}\sin (2\pi t_{ij})+\sigma \epsilon _{ij},\nonumber \\ \sigma= & {} 0.2\exp \{-\sin (2\pi t_{ij})+0.01\},\nonumber \\ \epsilon _{ij}\sim & {} N(0, 1), \end{aligned}$$
(16)

(S2)

$$\begin{aligned} y_{ij}= & {} 0.1\cos (3\pi t_{ij})-0.5+b_{i1}\sin (2\pi t_{ij})+\sigma \epsilon _{ij},\nonumber \\ \sigma= & {} 0.01\cos (3\pi t_{ij})+0.1,\nonumber \\ \epsilon _{ij}\sim & {} \chi ^2(3), \end{aligned}$$
(17)

where the measurement time \(t_{ij}\) is generated from the uniform distribution U(0, 1) and the random-effect \(b_{i1}\) is generated from \(N(0, 0.2^2)\).

From Eqs. (16) and (17), 100 sets of simulation data are generated. The data for each structure type are shown in Fig. 3, in which the red curves represent the true quantile structures \((\tau =0.10,0.25,0.50,0.75,0.90)\). As shown in Takeuchi et al. (2006), the true quantile structure \(Q^*_{y_{ij}}(\tau |t_{ij}) \) can be expressed as a product of the standard deviation \(\sigma \) and the inverse of the cumulative distribution function of the error \(F^{-1}(\tau )\). From this, the true fixed effects structures of the \(\tau \)-th quantile in S1 and S2 are given by the following equations:

(S1)

$$\begin{aligned} Q^*_{y_{ij}}(\tau |t_{ij})=0.1\cos (3\pi t_{ij})+0.2\exp \{-\sin (2\pi t_{ij})+0.01\}F^{-1}(\tau ), \end{aligned}$$
(18)

(S2)

$$\begin{aligned} Q^*_{y_{ij}}(\tau |t_{ij}) = 0.1\cos (3\pi t_{ij})-0.5+\{0.01\cos (3\pi t_{ij})+0.1\} F^{-1}(\tau ). \end{aligned}$$
(19)
Fig. 3
figure 3

True quantile structures (fixed-effects) for S1 (left) and S2 (right) expressed by Eqs. (18) and (19), respectively. The red curves represent the true structures when \(\tau = 0.90, 0.75, 0.50, 0.25, 0.10\), from top to bottom

We compared the estimation accuracy of the proposed BNQMs-GP with BNQMs-Ridge and the Bayesian nonparametric quantile regression model: BNQ-GP.

BNQMs-GP and BNQMs-Ridge are the same BNQMs with different regularization methods. By comparing these two models, we verify the usefulness of our proposed regularization using GPprior. In order to show that the method is robust in the case of an excessive number of basis functions, we prepared cases where the numbers of basis functions are \(p=15\) and \(p=30\) for the fixed-effects term and the random-effects term, respectively. Although the number of basis functions 30 is considered excessive for the nonlinear structure of the simulation data (S1, S2) that we have prepared, we will investigate the accuracy of curve fitting in this situation.

Unlike the proposed BNQMs-GP, the BNQ-GP is not a mixed-effects model. From the comparison of the two models, it was confirmed that the proposed model constructed within the framework of the mixed-effects model worked better for hierarchical data with large individual differences.

(BNQMs-GP)

$$\begin{aligned}&Q_{y_{ij}}(\tau |t_{ij})= \varvec{\beta }^{\top }\varvec{\phi }(t_{ij})+\varvec{b}_i^{\top }\varvec{\psi }(t_{ij}),\nonumber \\&\beta _k \sim GP(0, k_f(s_k, s_l)), k=1, \dots , p, \nonumber \\&b_{ik} \sim GP(0, k_r(s_k, s_l)), \nonumber \\&k_f(s_k, s_{l} |\alpha _f, \rho _f)= {\alpha _f}^{2} \exp \left\{ - \frac{1}{2 {\rho _f}^{2}} (s_k- s_{l})^2 \right\} ,\nonumber \\&k_r(s_k, s_{l} |\alpha _r, \rho _r)= {\alpha _r}^2 \exp \left\{ - \frac{1}{2 {\rho _r}^2} (s_k- s_{l})^2 \right\} . \end{aligned}$$
(20)

(BNQMs-Ridge)

$$\begin{aligned}&Q_{y_{ij}}(\tau |t_{ij})= \varvec{\beta }^{\top }\varvec{\phi }(t_{ij})+\varvec{b}_i^{\top }\varvec{\psi }(t_{ij}),\nonumber \\&\beta _k \sim N(0, \lambda _\beta ), k=1, \dots , p, \nonumber \\&b_{ik} \sim N(0, \lambda _b), \nonumber \\&\lambda _\beta \sim N(0, 1) \nonumber \\&\lambda _b \sim N(0, 1) \end{aligned}$$
(21)

(BNQ-GP)

$$\begin{aligned}&Q_{y_{ij}}(\tau |t_{ij})= \varvec{\beta }^{\top }\varvec{\phi }(t_{ij}),\nonumber \\&\beta _k \sim GP(0, k(s_k, s_l)), k=1, \dots , p, \nonumber \\&k(s_k, s_{l} |\alpha , \rho )= \alpha ^2 \exp \left\{ - \frac{1}{2 \rho ^2} (s_k- s_{l})^2 \right\} . \end{aligned}$$
(22)

Here, \(p = 15, 30 \), \( \alpha _f \sim N_{+}(0, 1)\), \(\alpha _r \sim N_{+} (0, 1)\), \(\rho _f \sim IG (6.28, 1.35)\), and \(\rho _r \sim IG (6.28, 1.35) \). The method for setting these hyperpriors is described in detail in A. As the weakly informative prior for the scale parameter \(\sigma \) of the asymmetric Laplace distribution, the half-normal distribution \( \sigma \sim N _ {+} (0, 1) \) was used. The estimation accuracy of each model is evaluated using the mean and standard deviation of the mean square error (MSE) and the coverage probability of \(95\%\) credible intervals. Here, MSE is expressed by the following equation:

$$\begin{aligned} \frac{1}{Nm}\sum _{i=1}^{N}\sum _{j=1}^{m}\{Q^{*}_{y_{ij}}(\tau |t_{ij})-\hat{Q}_{y_{ij}}(\tau |t_{ij})\}^2, N=10, m=50, \end{aligned}$$
(23)

where \(\hat{Q}_{y_{ij}}(\tau |t_{ij})\) is the estimated value of the fixed-effects term of each comparison model and \(Q^{*}_{y_{ij}}(\tau |t_{ij})\) is the true value. Note that the coverage probability of \(95\%\) credible intervals is the percentage of the \(95\%\) credible interval that contained \(Q^{*}_{y_{ij}}(\tau |t_{ij})\).

Tables 1 and 2 show the mean and standard deviation (SD) of 100 MSEs and the coverage probabilities obtained by fitting comparison models. A small MSE indicates an accurate estimation and a small SD indicates a stable estimation. For both S1 and S2, BNQMs-GP shows smaller mean values of MSEs at all quantiles than BNQMs-Ridge and BNQ-GP. These results show that the estimation accuracy of BNQMs, which are mixed-effects models, is higher than BNQ-GP, which is not a mixed-effects model. For BNQMs, the variation due to individual differences could be regarded as a random-effects term by the mixed-effects modeling, and thus a more accurate and stable estimate could be obtained. Since BNQ-GP is not a mixed-effects model, it cannot account for variations due to individual differences, resulting in large variations in estimated values. These results show that BNQMs can estimate the nonlinear structure of quantiles from hierarchical data more accurately than a non-mixed-effects model. Furthermore, it can be seen that the accuracy of BNQMs-Ridge drops significantly when the number of basis functions is p=30, while BNQMs-GP shows almost no change. This indicates that GPprior can be used to adjust the smoothness of the function appropriately when the number of basis functions is excessive, even though this is a complex task of performing quantile regression in a mixed-effects model.

The coverage probability was also the closest to 95% for BNQM-GP. However, there were cases where the coverage probability was lower than \(95\%\), and this occurred at the edges of the distribution. This may be due to the characteristics of quantile regression. Note that Yang et al. (2016) argue that interval estimation may not work well if confidence intervals are obtained directly from the posterior distribution with ALD as the working likelihood, because the posterior inference is not asymptotically valid. At this time, we do not deal with this existing underlying problem and consider it as future work, but more reliable interval estimation may be possible, for example, using bootstrap confidence intervals. As a result, we were also able to further assert the good points of the proposed method through this additional experiment. That is, the proposed method can maintain the accuracy of estimation at the edges even when the number of basis functions is excessive. When the number of basis functions is excessive, the regularization by Ridge makes the estimation quite unstable and the coverage immediately becomes worse at once. On the other hand, the proposed method was able to suppress the decline of the coverage at the edges of the distribution. This is a value merit of the proposed method that is not found shared with other methods.

Table 1 Mean (\(\times 10^{-3}\)) and SD (\(\times 10^{-3}\)) of MSE, and coverage probability in S1
Table 2 Mean (\(\times 10^{-3}\)) and SD (\(\times 10^{-3}\)) of MSE, and coverage probability in S2
Fig. 4
figure 4

Boxplots of 100 MSEs for S1

Fig. 5
figure 5

Boxplots of 100 MSEs for S2

4 Application

In this section, we present a real data analysis using the proposed model. To clarify the cortisol secretion circadian rhythm in infants, longitudinal data of urinary cortisol concentrations were collected at Kurume University School of Medicine. The sample size of these data was 455 and the subjects comprised 12 infants aged 31 to 124 days after birth (approximately 1–4 months of age) whose urinary cortisol levels were repeatedly measured over several days. The data are color-coded in Fig. 6 by subject.

Since the number of measurement points per day and the measurement interval vary, and are sparse for each individual, a mixed-effects model is considered to be suitable. Although the secretion rhythm of cortisol is known to have a nonlinear structure, the secretion rhythm of infants has not been clarified (de Weerth et al. 2003), so a nonlinear equation cannot be assigned to the model in advance. Therefore, the proposed model, which can estimate the nonlinear structure from the data by basis expansions, is useful in this situation. The role of a BNQM in an application to cortisol data is to estimate the overall nonlinear structure (namely, the circadian rhythm) as a fixed-effects term while capturing fluctuations due to individual differences as a random-effects term. In addition, the proposed model estimates the quantile transitions of cortisol values. This has two main advantages. The first advantage is that the model can appropriately handle the problem that the distribution of cortisol concentration is asymmetric. There was a small number of individuals with extremely high cortisol concentrations in these data. In other words, the distribution of cortisol concentrations has a long right tail. The mean is greatly influenced by these extreme values, but quantile regression can instead focus on the transition of the median (50th quantile), which is less sensitive. The second advantage is that it is possible to capture the difference in rhythm based on the cortisol concentration itself. It is possible to compare the estimation results of multiple quantiles, and we expect to obtain useful information that could not be obtained by looking at the average. In this study, the 10th, 25th, 50th, 75th, and 90th quantiles were analyzed, and the rhythm differences between high, middle, and low cortisol levels were considered.

Fig. 6
figure 6

Line graph of cortisol secretion over time with the lines color-coded by subject. The upper graph shows all of the infants, and the three lower graphs show individual infants as examples. Some patterns, such as ID3, stay at a low value, while others, such as ID6 and ID8, are highly variable

The settings of the BNQM used in the analysis are as follows:

$$\begin{aligned} Q_{y_{ij}}(\tau |t_{ij})= & {} \varvec{\beta }^{\top }\varvec{\phi }(t_{ij})+\varvec{b}_i^{\top }\varvec{\psi }(t_{ij}),\nonumber \\ \beta _k\sim & {} GP(0, k_f(s_k, s_l)), k=1, \dots , p, \nonumber \\ b_{ik}\sim & {} GP(0, k_r(s_k, s_l)), \nonumber \\ k_f(s_k, s_{l} |\alpha _f, \rho _f)= & {} {\alpha _f}^{2} \exp \left\{ - \frac{1}{2 {\rho _f}^{2}} (s_k- s_{l})^2 \right\} ,\nonumber \\ k_r(s_k, s_{l} |\alpha _r, \rho _r)= & {} {\alpha _r}^2 \exp \left\{ - \frac{1}{2 {\rho _r}^2} (s_k- s_{l})^2 \right\} , \end{aligned}$$
(24)

where the response variable \(y_{ij} \) is the urinary cortisol concentration at the j-th measurement point of the i-th subject, and \(t_{ij}\) is the corresponding measurement time. Note that \(y_{ij} \) and \(t_{ij}\) were normalized to have a maximum value of 1 and a minimum value of 0 before they were analyzed. Specifically, the normalized value \(x_i^{*}\) is expressed by the following formula, where \(\varvec{x}\) is the data to be normalized and \(x_i\) is the i-th value of \(\varvec{x}\):

$$\begin{aligned} x_i^{*} = \frac{x_{i}-\min (\varvec{x})}{\max (\varvec{x})-\min (\varvec{x})}. \end{aligned}$$
(25)

Then, the result was determined by converting inversely. Again, \(p = 15 \), \( \alpha _f \sim N_{+}(0, 1)\), \(\alpha _r \sim N_{+} (0, 1)\), \(\rho _f \sim IG (6.28, 1.35)\), and \(\rho _r \sim IG(6.28, 1.35) \), as described in appendix A. As the weakly informative prior for the scale parameter \(\sigma \) of the asymmetric Laplace distribution, the half-normal distribution \( \sigma \sim N_{+} (0, 1) \) was used.

Fig. 7
figure 7

Estimation curves and \(95\%\) credible intervals

Cortisol has a role as a biomarker of biological rhythms (Weitzman et al. 1971). After a biological rhythm has been established, the secretion rhythm of cortisol is monophasic, with the highest level in the morning and the lowest level around midnight (Krieger et al. 1971). However, the subjects in this study were infants between 31 and 124 days old, which is an ambiguous period when biological rhythms are beginning to be established (Kinoshita et al. 2016; Kidd et al. 2005). Therefore, we analyzed their circadian rhythms using the proposed model and considered the results.

Figure 7 shows the 10th, 25th, 50th, 75th, and 90th quantile curves estimated by the BNQM. Furthermore, the \(95\%\) credible interval for the estimated curve of each quantile was determined and is included in Fig. 7. As shown, cortisol secretion tends to increase twice, in the morning and evening. This suggests that the cortisol secretion rhythm of the infants in this study was bimodal. This result could be obtained due to the use of the basis expansion method. In addition, the range of variation increases with the quantile. In other words, at higher cortisol levels, the variation in cortisol level is greater. Identifying this trend is a contribution of quantile regression.

5 Conclusion

In this study, we proposed Bayesian nonparametric quantile mixed-effects models (BNQM), which enable quantile regression considering the hierarchical structure of data by mixed-effects modeling. Furthermore, using a basis expansion method for both the fixed-effects term and random-effects term, it is possible to deal with cases in which the nonlinear structure differs greatly between subjects.

In the mixed-effects model based on the basis expansion method, it is necessary to optimize the combination of the number of basis functions and the regularization parameters, which is usually difficult from the viewpoint of computational complexity. Therefore, for BNQMs, we proposed using a new Bayesian regularization method, in which the coefficient parameters of the basis functions are assumed to follow a Gaussian process (GP) prior. Using this regularization method, BNQMs can appropriately adjust the smoothness of the regression curves even when the number of basis functions is too large for the complexity of the nonlinear structure. Using BNQMs can also reduce the burden of selecting the number of basis functions. Moreover, because BNQMs are based on a fully Bayesian method, they do not require repeated point estimation for determining the optimum value from candidate values of the hyperparameter of the GP prior by comparison using an information criterion. The proposed method also has the advantage that a credible interval can be estimated directly, instead of the confidence interval being approximated by the bootstrap method, which is an iterative method.

The performance of BNQMs was evaluated by a Monte Carlo simulation. The proposed BNQMs showed the highest estimation accuracy for each data structure and were shown to be useful as a quantile regression technique in hierarchical data with a nonlinear structure. Then, a BNQM was applied to longitudinal data of cortisol in infants. The results suggested that the cortisol secretion rhythm in infancy is bimodal, and the magnitude of the amplitude increases as the cortisol level itself increases.

Finally, we discuss future work. In this study, we adopted the asymmetric Laplace distribution as the working likelihood because it is the most commonly used. However, studies that include distributions of the SKD family, mentioned in Galarza et al. (2017) and Wichitaksorn et al. (2014), as a working likelihood option would be useful for further improving the estimation accuracy of the model. Also, in this study, the credible interval was obtained directly from the posterior distribution, whereas it is inherently ideal to adjust the posterior variance for cases where the asymmetric Laplace distribution is incorrectly specified, as argued by Yang et al. (2016), because the posterior inference would not be asymptotically valid. Discussions of the choice of working likelihoods and the associated asymptotic validity of the posterior distribution such as these are beyond the scope of the present paper, but certainly an interesting topic for future research. The next step could also be to go beyond simple curve fitting and extend the method to additive or semi-parametric models with more terms added to the current model in order to adjust for confounding.