1 Introduction

Functional magnetic resonance imaging (fMRI) is a non-invasive neuroimaging technique which measures the blood-oxygenation-level dependent (BOLD) contrast, i.e., the difference in magnetization between oxygenated and deoxygenated blood arising from changes in regional cerebral blood flow. In a typical task-related fMRI experiment, a subject is presented a set of stimuli while the whole brain is scanned at multiple time points. Each scan is arranged as a 3D array of volume elements (or “voxels”), and the experiment produces time series of BOLD responses acquired at each voxel.

Common modeling approaches for the analysis of task-related fMRI data rely on the general linear model formulation that was first proposed by Friston et al. [10] and subsequently investigated by many other authors, particularly for single-subject data, see for example [11, 12, 17, 19, 25, 33, 36, 38], among many others. Many of these models incorporate the complex spatial and temporal correlation structure of the fMRI data. Bayesian approaches, in particular, allow flexible modeling of spatial and temporal correlations via suitable prior models and can achieve increased signal detection and fewer false-positive counts with respect to simpler approaches that do not appropriately account for the spatio-temporal variability of the data. See for example [39] for a review of recent Bayesian models.

For multi-subject studies, two-stage “group analysis” approaches are often adopted as computationally attractive methods where summary estimates of model parameters are obtained at the individual level and then used in a second-stage model at the group/population level, see for example [2, 15, 18, 27, 29]. Also, newer data-driven methods for analyzing fMRI, for example, those that use model-free methods such as independent component analysis (ICA) and tensor-product ICA (T-PICA), have been developed to detect the presence of subgroups of participants within a population as in Cerliani et al. [4], but these approaches still involve multiple estimation steps, and therefore do not properly take into account variability and heterogeneity in the data.

In this paper, we review and extend a unified, single-stage Bayesian approach for the analysis of task-related brain activity proposed by Zhang et al. [40]. This model formulation considers a spatio-temporal linear regression model that specifically accounts for between-subject heterogeneity in neuronal activity via a spatially informed multi-subject non-parametric variable selection prior. This effectively captures correlation among time-series voxels within and across subjects, by inducing clustering among voxels within a subject at one level of the hierarchy and between subjects at the second level. In the fMRI literature, capturing statistical dependence among possibly remote neurophysiological events is often viewed as an aspect of “functional” connectivity [9, 13]. The approach of Zhang et al. [40] further takes into account the spatial proximity of potential activations within a subject by employing a Markov random field (MRF) prior on the selection indicators of the spike-and-slab distribution. Posterior inference is performed via fast Variational Bayes (VB) algorithms. We extend the modeling approach of Zhang et al. [40] to allow for multiple stimuli and different choices of the hemodynamic response function. Furthermore, we consider formulations of the model that can be used to analyze either voxel-based 2D slices or 3D data in the form of brain parcellations. We also discuss the non-parametric prior formulation in the case of a single-subject analysis. Finally, we show how to derive contrast maps based on the VB output.

A well-recognized challenge in the use of complex Bayesian models is the lack of user-friendly software that practitioners can use to apply the methods to their experimental data. In an attempt to narrow the gap, we introduce NPBayes-fMRI, a MATLAB GUI that implements the non-parametric spatio-temporal models described in the paper. The GUI comprises two components, one for model fitting and another one for visualization of the results. Within the model fitting interface, the user can define the type of analysis (voxel-based or whole-brain parcellation into regions of interest, i.e., ROIs) and the model parameters. Users have the option of a pre-defined default setting for all parameters that also allows customized choices. The GUI also accommodates single-subject analyses. The VB algorithm can be run within the GUI. After running the algorithm, the output file can be uploaded via the visualization interface and used to plot subject-level activation maps, contrast maps, and cluster-defined averaged \(\beta \)-maps. No additional MATLAB toolboxes are required to run NPBayes-fMRI.

The rest of the paper is organized as follows: Section 2 introduces the spatio-temporal model, the non-parametric variable selection prior, and the methods for posterior inference, for both multiple and single-subject data. Section 3 describes the NPBayes-fMRI MATLAB GUI. Section 4 provides illustrations of the methods using the MATLAB GUI. Section 5 concludes the paper.

2 Methods

In this section, we review the general framework of the model as proposed by Zhang et al. [40]. We first consider multiple subjects and then describe the simplified model for single-subject analysis. In both cases, we provide the formulation of the model for the general case of multiple tasks.

2.1 Multi-Subject Spatio-Temporal Model

For subject \(i=1, \ldots , N\), let \(Y_{i\nu }=(Y_{i\nu 1}, \ldots , Y_{i\nu T})^T\) be the vector of the BOLD response data at voxel \(\nu \), with \(\nu =1, \ldots , V\). We model the data as

$$\begin{aligned} Y_{i\nu }=X_{i\nu }\beta _{i\nu }+\varepsilon _{i\nu }, \; \varepsilon _{i\nu }\sim N_T(0, \Sigma _{i\nu }), \end{aligned}$$
(1)

where \(X_{i\nu }\) is a known \(T\times p\) covariate matrix and \(\beta _{i\nu }=(\beta _{i\nu 1}, \ldots , \beta _{i\nu p})^T\) is a \(p\times 1\) vector of regression coefficients. Without loss of generality, we center the data and thus do not include the intercept term in the model. Additionally, as is typical with multi-subject data, to make the BOLD signal levels consistent across subjects, we transform the data by percent signal change normalization, i.e., \(y_{ivt}^{*}=y_{ivt}/\bar{y}_{iv}\times 100\), where \(y_{ivt}\) denotes the BOLD signal of subject i in voxel v at time point t, and \(\bar{y}_{iv}\) the mean signal level across the T time points. Let \(X_{i\nu j}\) be the jth column of \(X_{i\nu }\). Then \(X_{i\nu j}\) is modeled as the convolution of the jth stimulus pattern with a hemodynamic response function (HRF) [3], that is,

$$\begin{aligned} X_{i\nu j}(t)=\int _0^tx_j(s)h_{\lambda _{i\nu j}}(t-s)ds, \end{aligned}$$
(2)

where \(x_j(s)\) represents the stimulus pattern. One common choice is a Poisson HRF, that is, \(h_{\lambda _{i\nu j}}(t)=\exp (-\lambda _{i\nu j})\lambda _{i\nu j}^t/t!\). The parameter \(\lambda _{i\nu j}\) can be interpreted as the delay of the response with respect to the stimulus onset and it is often modeled as an unknown voxel-dependent parameter. Other popular choices are a canonical HRF, that is, \(h_{A_{i\nu j}}(t)= A_{i\nu j}\big (\frac{t^{\alpha _{1}-1}\beta _{1}^{\alpha _{1}}e^{-\beta _{1}t}}{\Gamma (\alpha _{1})}-c \frac{t^{\alpha _{2}-1}\beta _{2}^{\alpha _{2}}e^{-\beta _{2}t}}{\Gamma (\alpha _{2})}\big ) \), where \(\alpha _{1}=6,\alpha _{2}=16,\beta _{1}=\beta _{2}=1,c=1/6\), and a gamma HRF, that is, \(h_{a_{i\nu j},b_{i\nu j}}(t)=\frac{b_{i\nu j}^{-a_{i\nu j}}}{\Gamma {(a_{i\nu j})}}t^{a_{i\nu j}-1}\exp (-t/b_{i\nu j})\) [19].
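To make the construction of a regressor concrete, the following minimal Python sketch evaluates a Poisson HRF on an integer time grid and convolves it with a stimulus pattern, as a discrete analogue of equation (2). It is an illustration only and not code from NPBayes-fMRI: the function names, the block design, and the value \(\lambda =4\) are assumptions made for the example.

```python
import math

def poisson_hrf(lam, n_points):
    """Poisson HRF h(t) = exp(-lam) * lam**t / t! at t = 0, ..., n_points - 1."""
    return [math.exp(-lam) * lam ** t / math.factorial(t) for t in range(n_points)]

def convolve_stimulus(stimulus, hrf):
    """Discrete analogue of Eq. (2): X(t) = sum over s <= t of x(s) * h(t - s)."""
    T = len(stimulus)
    return [
        sum(stimulus[s] * hrf[t - s] for s in range(t + 1) if t - s < len(hrf))
        for t in range(T)
    ]

# Hypothetical block design: 4 scans on, 4 scans off, repeated twice.
stimulus = [1, 1, 1, 1, 0, 0, 0, 0] * 2
regressor = convolve_stimulus(stimulus, poisson_hrf(lam=4.0, n_points=16))
```

With this choice, the regressor rises a few scans after each stimulus onset, which is the delay that \(\lambda _{i\nu j}\) is meant to capture.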

The error term in equation (1) is modeled as a long memory process. Specifically, the covariance matrix is written as \(\Sigma _{i\nu }(t,s)=[\gamma (|t-s|)]\), with the auto-covariance function \(\gamma (h)\) defined as

$$\begin{aligned} \gamma (h)\sim Ch^{-\alpha }, \end{aligned}$$
(3)

with \(C>0,0<\alpha <1\), and h large. This choice accounts for low-frequency noise which induces slow changes in voxel intensity over time, such as scanner drift, and for physiological noise, due to patient motion, respiration, and heartbeat causing fluctuations in signal across both space and time. In an analysis of single-subject fMRI data, [38] show that such a modeling strategy improves the separation of the signal from the noise, leading to more localized and sparser activations, with fewer false positives, than auto-regressive error structures.

Discrete wavelet transforms (DWT) are often employed in the fMRI literature as a way to decorrelate the data [6, 16, 20, 27, 38]. After applying the DWT to equation (1) the model in the wavelet domain can be written as

$$\begin{aligned} Y_{i\nu }^*=\sum _{j=1}^pX_{i\nu j}^*\circ \beta _{i\nu j}+\varepsilon _{i\nu }^*, \; \varepsilon _{i\nu }^*\sim N_T(0, \Sigma _{i\nu }^*), \end{aligned}$$
(4)

with \(\circ \) the element-by-element (Hadamard) product, and where W is a \(T\times T\) matrix corresponding to the wavelet transform, \(Y^*_{i\nu }=WY_{i\nu }, X^*_{i\nu }=WX_{i\nu }\), and \(\varepsilon _{i\nu }^*=W\varepsilon _{i\nu }\), and with the covariance matrix \(\Sigma _{i\nu }^*\) approximately diagonal with elements \(\psi _{i\nu }\sigma ^2_{imn}\) indicating the variance of the nth wavelet coefficient at the mth scale. We follow the variance progression method of Wornell and Oppenheim [35] for the wavelet coefficients,

$$\begin{aligned} \psi _{i\nu }\sigma ^2_{imn}=\psi _{i\nu }(2^{\alpha _{i\nu }})^{-m}, \end{aligned}$$
(5)

with \(\psi _{i\nu }\) the innovation variance and \(\alpha _{i\nu }\in (0, 1)\) the long memory parameter. This structure encompasses the general fractal process given above, which includes long memory.
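The variance progression in equation (5) is simple to tabulate. The short Python sketch below computes the wavelet-coefficient variances by scale for given values of \(\psi _{i\nu }\) and \(\alpha _{i\nu }\); the function name, the convention that scales are indexed \(m=1,2,\ldots \), and the numerical values are assumptions made for illustration.

```python
def wavelet_variances(psi, alpha, n_scales):
    """Variances of the wavelet coefficients at scales m = 1, ..., n_scales, per Eq. (5)."""
    return [psi * (2.0 ** alpha) ** (-m) for m in range(1, n_scales + 1)]

# Hypothetical values: unit innovation variance, long memory parameter 0.5.
variances = wavelet_variances(psi=1.0, alpha=0.5, n_scales=4)
```

The variances decay geometrically in the scale index, by a factor \(2^{-\alpha _{i\nu }}\) per scale, so a larger \(\alpha _{i\nu }\) gives a steeper decay.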

2.2 Non-parametric Variable Selection Prior

Detecting voxels that activate in response to a stimulus is equivalent to identifying the non-zero regression coefficients \(\beta _{i\nu j}\) in model (4). Zhang et al. [40] embed the selection into a clustering framework, effectively defining a multi-subject non-parametric variable selection prior with spatially informed selection within each subject. More specifically, they employ a hierarchical Dirichlet Process (HDP) prior [31], which implies that the non-zero \(\beta \)’s within subject i are drawn from a mixture model and possibly shared between subjects. The HDP prior construction effectively captures correlation among time-series voxels within and across subjects, by inducing clustering among voxels within a subject at one level of the hierarchy and between subjects at the second level. In particular, this makes it possible to capture spatial correlation among potential activations of distant voxels, within a subject, while simultaneously borrowing strength in the estimation of the parameters from subjects with similar activation patterns. For the multi-stimuli formulation of the model, let \(\gamma _{i\nu j}\) be a binary indicator of whether a given voxel is activated or not under stimulus j, that is, \(\gamma _{i\nu j}=0\) if \(\beta _{i\nu j}=0\) and \(\gamma _{i\nu j}=1\) otherwise. A spiked non-parametric prior is imposed on the coefficients \(\beta _{i\nu j}\), i.e., a spike-and-slab prior where the slab distribution is modeled as a non-parametric prior, as

$$\begin{aligned} \beta _{i\nu j} | \gamma _{i\nu j}, G_{ij}\sim & {} \gamma _{i\nu j} G_{ij}+(1-\gamma _{i\nu j})\delta _0, \end{aligned}$$
(6)

where \(\delta _{0}\) is a point mass at zero and \(G_{ij}\) denotes the slab distribution. With multiple subjects, a hierarchical Dirichlet process (HDP) prior can be specified as the non-parametric slab,

$$\begin{aligned} G_{ij}|\eta _1, G_0\sim & {} DP(\eta _1, G_0)\nonumber \\ G_0|\eta _2, P_0\sim & {} DP(\eta _2, P_0)\nonumber \\ P_0= & {} N(0, \tau ), \end{aligned}$$
(7)

where \(\tau ,\eta _{1},\eta _{2}\) are fixed parameters and \(P_{0}\) is the base distribution. Parameters \(\eta _{1},\eta _{2}\) control the variability of the coefficients at the subject and population level, respectively. The HDP prior consists of two levels of hierarchy, which induce clustering among voxels within a subject on one level and between subjects on the second level. This construction enables the model to borrow information from subjects exhibiting similar activation patterns in estimating parameters of interest and also capture spatial correlation among distant voxels. Using both simulated and real data, [40] show increased detection power and lower numbers of false-positive calls with respect to common two-stage estimation approaches which separate the inference on the individual fMRI time courses from the inference at the population level.

In addition to the prior construction above, spatial correlation among neighboring voxels within a subject is modeled via a Markov random field (MRF) prior imposed on \(\gamma _{i\nu j}\),

$$\begin{aligned} P(\gamma _{i\nu j}|d, e, \gamma _{ikj}, k\in N_{i\nu }) \propto \exp \left( \gamma _{i\nu j}\left( d+e\sum _{k\in N_{i\nu }}\gamma _{ikj}\right) \right) , \end{aligned}$$
(8)

with \(N_{i\nu }\) the set of neighboring voxels of voxel \(\nu \) for subject i, and \(p(\gamma _{i\nu })=\prod _{j=1}^p p(\gamma _{i\nu j})\). This prior reduces to an independent Bernoulli with parameter \(\exp (d)/[1+\exp (d)]\) if a voxel does not have any neighbors. The sparsity parameter \(d\in (-\,\infty ,\infty )\) in (8) controls the expected prior number of activated voxels, while the smoothness parameter \(e>0\) controls the probability of identifying a voxel as active based on the activation of the neighboring voxels. The use of MRF priors has become quite popular in recent years in the Bayesian modeling of fMRI data [17, 28, 37, 38].
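Because \(\gamma _{i\nu j}\) is binary, normalizing (8) over its two values gives a logistic form for the conditional activation probability. A minimal Python sketch of this calculation follows; the function name and the example values of d and e are ours and are not part of NPBayes-fMRI.

```python
import math

def mrf_activation_prob(d, e, neighbor_states):
    """P(gamma = 1 | neighbors) under Eq. (8), normalized over gamma in {0, 1}."""
    eta = d + e * sum(neighbor_states)
    return 1.0 / (1.0 + math.exp(-eta))

# Hypothetical settings: one voxel with three neighbors, two of them active.
p_isolated = mrf_activation_prob(d=-3.0, e=0.5, neighbor_states=[])
p_supported = mrf_activation_prob(d=-3.0, e=0.5, neighbor_states=[1, 1, 0])
```

An empty neighbor list reproduces the independent Bernoulli case, and each additional active neighbor raises the log-odds of activation by e.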

The prior model is completed by considering a uniform prior distribution on the delay parameter, \(\lambda _{i\nu j} \sim \text {U}(u_1, u_2)\), for a Poisson HRF, or the amplitude parameter, \(A_{i\nu j} \sim \text {U}(u_1, u_2)\), for a Canonical HRF, or the shape and scale parameters, \(a_{i\nu j} \sim \text {U}(u_1, u_2), b_{i\nu j} \sim \text {U}(u_3, u_4)\), for a gamma HRF. Also, an Inverse Gamma (IG) prior is imposed on the innovation variance parameter, \(\psi _{i\nu }\sim \text {IG}(a_0, b_0)\), and a Beta distribution on the long memory parameter, \(\alpha _{i\nu }\sim \text {Beta}(a_1, b_1)\).

2.3 Single-Subject Modeling

For a single subject, the non-parametric prior reduces to a Dirichlet process (DP) prior of the type

$$\begin{aligned} G_0|\eta , P_0\sim & {} DP(\eta , P_0)\nonumber \\ P_0= & {} N(0, \tau ), \end{aligned}$$
(9)

where \(\tau \),\(\eta \) are fixed and \(P_{0}\) is the base distribution. The mass parameter \(\eta \) is used to regulate the variability of the coefficients at the subject level. This prior allows, in particular, to capture spatial correlation among activations of distant voxels. In the fMRI literature, capturing statistical dependence among possibly remote neurophysiological events is often viewed as an aspect of “functional” connectivity [9, 13].

2.4 Prior Specification

We provide here some general guidelines on the choice of the prior hyperparameters. Zhang et al. [40] comment on the sensitivity of their results to different choices of the priors when using simulated data. The authors note that, in general, modest changes of the values of the variance parameter \(\tau \) in the base measure of the HDP prior and of the hyperparameters \(a_0, b_0, a_1, b_1\), of the priors on the variance parameters \(\psi \)’s and the long memory parameters \(\alpha \)’s, did not affect the accuracy of the estimation results. Consequently, these parameters can be set to default values that correspond to vague or uninformative priors. Employing non-informative or vague priors is a common choice in Bayesian statistics in the absence of prior knowledge about the unknown parameters. Here, in particular, the hyperparameter \(\tau \) can be set to a large value (e.g., \(>50\)), non-informative priors can be specified on the long memory parameters by setting \(a_1=b_1=1\) and vague priors on the innovation variance parameters by setting \(a_0=3, b_0=2\). As for the concentration parameters \(\eta _1\) and \(\eta _2\) of the HDP prior, larger values of these parameters tend to generate larger numbers of components across and within subjects when fitting the model. We recommend the non-informative setting \(\eta _1=\eta _2=1\). Also, vague specifications can be adopted on the parameters of the HRF, for example, by specifying a uniform prior with, say, \(u_1=0, u_2=8\), in the case of a Poisson HRF, and similarly for the other HRFs.

Some sensitivity should be expected with regard to the MRF parameters. In particular, as noted by [40], larger values of d or e lead to lower false negative rates (FNRs), at the expense of higher false positive rates (FPRs) and lower precision. We suggest fixing d to reflect a prior belief in a sparse model. For example, \(d=-\,2.5\) implies that the prior probability of activation is less than \(10\%\) when a voxel has no neighbors. Also, a value of e in the range 0.3–0.5 generally results in values below the phase transition point, which can be estimated using the algorithm proposed by Propp and Wilson [24].
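As a quick check of the sparsity figure quoted above, setting the neighbor sum in (8) to zero and normalizing over \(\gamma \in \{0,1\}\) gives the first line below. The second line shows how the probability changes for a voxel with two active neighbors; the value \(e=0.4\) and the number of active neighbors are illustrative choices of ours.

$$\begin{aligned} P(\gamma _{i\nu j}=1\,|\,\text {no neighbors})= & {} \frac{\exp (-2.5)}{1+\exp (-2.5)}\approx \frac{0.0821}{1.0821}\approx 0.076,\nonumber \\ P(\gamma _{i\nu j}=1\,|\,\text {two active neighbors})= & {} \frac{\exp (-2.5+0.4\times 2)}{1+\exp (-2.5+0.4\times 2)}\approx \frac{0.1827}{1.1827}\approx 0.154. \end{aligned}$$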

2.5 Model Fitting by Variational Bayes

Variational Bayes (VB) algorithms are an alternative approach to posterior inference that, unlike MCMC methods, do not rely on numerical integration. Variational Bayes methods have been employed successfully in Bayesian models for single-subject fMRI data [8, 14, 22, 23, 34]. These methods find an optimal approximation to the posterior by minimizing the Kullback–Leibler (KL) divergence between the approximating distribution and the true posterior. Typically, VB approaches provide good estimates of means, although they tend to underestimate posterior variances and also to poorly estimate the correlation structure of the data [1, 26]. This can still be an acceptable trade-off for our inferential purposes, as we are only interested in the identification of broad areas of activations. Indeed, [40] perform a thorough comparison between MCMC and VB on simulated data, showing very good performance of the VB algorithm in the estimation of the model parameters. They also note a remarkable improvement in computing time, with 1000 MCMC iterations taking approximately 7 h on a dual-core 2.2 GHz Intel® Xeon® processor with 16 GB of memory, while a VB run with 50 inner loop iterations and 100 outer loop iterations took approximately 34 min.

When using VB methods within HDP frameworks, such as the spiked HDP prior distribution (7) on the \(\beta \) parameters, it is beneficial to employ the truncated stick-breaking construction, to exploit conjugacy and allow for analytically tractable updates of the parameters [32]. In our model formulation, the parameters of the HRF appear through the convolution (2) and the \(\alpha \)’s via the variance progression formula (5). This makes it impossible to derive analytically tractable updates for these parameters. Zhang et al. [40] address the problem by combining the VB algorithm with an importance sampling procedure. The resulting algorithm has two major components. The first component (inner loop) approximates the posterior distribution of the regression coefficients, the selection parameters, and the innovation variance parameters via mean field variational inference with a coordinate ascent algorithm. The second component (outer loop) estimates the parameters of the HRF and the \(\alpha \)’s via importance sampling, with the importance sampling weights calculated based on the optimal solution from the first component. A schematic representation of the algorithm is given in Table 1.
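For readers unfamiliar with the truncated stick-breaking construction mentioned above, the following Python sketch draws mixture weights for a single-level DP truncated at K components. It illustrates the generic construction only, not the two-level HDP updates of the VB algorithm; the function name, the truncation level, and the seed are assumptions made for the example.

```python
import random

def truncated_stick_breaking(eta, K, rng):
    """Draw K mixture weights from a DP with mass parameter eta, truncated at K components."""
    weights, remaining = [], 1.0
    for k in range(K):
        # The last break takes the whole remaining stick so the weights sum to one.
        v = 1.0 if k == K - 1 else rng.betavariate(1.0, eta)
        weights.append(v * remaining)
        remaining *= 1.0 - v
    return weights

# Hypothetical settings: mass parameter 1, truncation at 10 components.
weights = truncated_stick_breaking(eta=1.0, K=10, rng=random.Random(0))
```

Truncation turns the infinite mixture into a finite one with a fixed number of weights, which is what makes closed-form coordinate updates possible in the inner loop.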

Table 1 VB Algorithm (with Poisson HRF)

2.6 Posterior Inference

For posterior inference, primary interest is in the estimation of the selection parameters, \(\gamma \), and the regression coefficients, \(\beta \). These can be used to obtain activation maps, by subject and by stimulus. Using the output from the VB algorithm, posterior probabilities of inclusion (PPIs) for stimulus j, \(p(\gamma _{i\nu j}=1)\), for \(j = 1,\ldots , P\), are approximated as weighted averages of the variational distribution values \(q(\gamma _{i\nu j}=1)\) estimated across the iterations of the outer loop of the algorithm (see Table 1). Activation maps can then be obtained by thresholding the PPIs using a threshold value to ensure a pre-defined Bayesian false discovery rate (FDR) [5, 21, 30]. For subject i and stimulus j, the Bayesian FDR is defined as

$$\begin{aligned} FDR_{ij}(\kappa _{ij})=\frac{\sum _{v=1}^{V}(1-PPI_{ivj})I_{(PPI_{ivj}>\kappa _{ij})} }{\sum _{v=1}^{V}I_{(PPI_{ivj}>\kappa _{ij})}}, \end{aligned}$$
(10)

where \(PPI_{ivj}\) is the PPI for subject i at voxel v and stimulus j, and \(I_{(PPI_{ivj}>\kappa _{ij})}\) is the indicator function such that \(I_{(PPI_{ivj}>\kappa _{ij})}=1\) if \(PPI_{ivj}>\kappa _{ij}\) and 0 otherwise, with \(\kappa _{ij}\) a threshold value. In the data analyses, one can set the FDR to a pre-specified value, typically 0.05 or 0.1, and then choose \(\kappa _{ij}\) accordingly. This produces a spatial mapping of the activated brain regions, for each subject. Corresponding posterior \(\beta \)-maps can be calculated by estimating the \(\beta \) coefficients via weighted averages of the variational distribution values, on active voxels.
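The thresholding rule can be sketched in a few lines of Python. The code below evaluates the Bayesian FDR of equation (10) for one subject and stimulus and picks the smallest threshold that meets a target FDR. The function names, the convention of returning 0 when no voxel is selected, and the PPI values are assumptions made for illustration.

```python
def bayesian_fdr(ppi, kappa):
    """Bayesian FDR of Eq. (10) for one subject and stimulus at threshold kappa."""
    selected = [p for p in ppi if p > kappa]
    if not selected:
        return 0.0  # assumed convention: no discoveries means no false discoveries
    return sum(1.0 - p for p in selected) / len(selected)

def choose_threshold(ppi, target_fdr):
    """Smallest candidate threshold whose Bayesian FDR does not exceed target_fdr."""
    for kappa in sorted(set(ppi) | {0.0}):
        if bayesian_fdr(ppi, kappa) <= target_fdr:
            return kappa
    raise ValueError("target_fdr must be non-negative")

ppi = [0.99, 0.95, 0.90, 0.60, 0.20, 0.05]  # hypothetical PPIs for six voxels
kappa = choose_threshold(ppi, target_fdr=0.1)
active = [p > kappa for p in ppi]
```

In this toy example the selected threshold is 0.6, which declares the three voxels with the largest PPIs active at an estimated FDR of about 0.053.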

An additional feature of our modeling approach is that the use of the non-parametric HDP prior construction (7) can be exploited to obtain a clustering of the subjects for possible discovery of differential activations. For an individual stimulus, and given a pre-specified threshold (or FDR) value on the PPIs, a dissimilarity matrix can be calculated based on the Euclidean distances between each pair of subjects as

$$\begin{aligned} d_{ii^{'}}=\sqrt{(\hat{B}_i-\hat{B}_{i^{'}})^T(\hat{B}_i-\hat{B}_{i^{'}})}, \end{aligned}$$

with \(\hat{B}_i\) denoting the posterior estimate of \(B_i=(\beta _{i1j}, \ldots , \beta _{iVj})^T\). The dissimilarity matrix can then be transformed into a tree via hierarchical clustering and a dendrogram can be obtained using the linkage method with Ward’s minimum variance. An optimal number of clusters can finally be selected by visual inspection of the dendrogram and group-level \(\beta \)-maps can be calculated by averaging the posterior maps of the non-zero \(\beta \) coefficients in each cluster.
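The dissimilarity matrix itself is straightforward to compute. The Python sketch below does so for a toy set of thresholded \(\beta \) estimates; the function name and the data are hypothetical, and the clustering step is left to a standard routine.

```python
import math

def dissimilarity_matrix(beta_maps):
    """Pairwise Euclidean distances between the subjects' posterior beta vectors."""
    n = len(beta_maps)
    return [
        [math.sqrt(sum((a - b) ** 2 for a, b in zip(beta_maps[i], beta_maps[k])))
         for k in range(n)]
        for i in range(n)
    ]

# Hypothetical thresholded beta estimates: 3 subjects, 4 voxels, one stimulus.
beta_maps = [
    [0.0, 1.2, 0.0, 0.8],
    [0.0, 1.0, 0.0, 0.9],
    [2.1, 0.0, 1.7, 0.0],
]
D = dissimilarity_matrix(beta_maps)
```

Here subjects 1 and 2 are much closer to each other than either is to subject 3, so a dendrogram built from D would merge them first. In Python, one option for the clustering step is SciPy's `scipy.cluster.hierarchy.linkage` with `method="ward"`, which expects either the raw observation vectors or the condensed form of the distance matrix.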

Finally, when analyzing experimental data with multiple stimuli, contrast maps can be produced to compare the effects of different treatments, by subject, by estimating probability maps of the type \(p(\beta _{j}-\beta _{j'}>\kappa )\), with j and \(j'\) a pair of stimuli and \(\kappa \) a pre-defined hypothesized value. Within the VB framework, a contrast map can be obtained by thresholding the probabilities

$$\begin{aligned} p_{v}=\sum _{l=1}^{L}\hat{w}_{vl}I_{((\sum _{j=1}^{J}\pi _{j}\cdot \beta _{vjl})>\kappa )}, \end{aligned}$$
(11)

with \(\pi =(\pi _{1}, \ldots , \pi _{J})\) a contrast weight vector summing to 0, L the number of outer loop VB iterations, \(\beta _{vjl}\) the entries of the \(V\times J\times L\) array storing the updated \(\beta \) values for all voxels and stimuli, across the L iterations, and \(\hat{w}_{vl}\) the normalized importance weights. For each subject i and outer loop iteration \(l=1,\ldots , L\), the importance weight is computed as

$$\begin{aligned} w_{vl}=\frac{q(\alpha _{iv}^{(l)},\lambda _{iv}^{(l)})}{\tilde{p}(\alpha _{iv}^{(l)},\lambda _{iv}^{(l)})}, \end{aligned}$$
(12)

with \(q(\alpha _{iv}^{(l)},\lambda _{iv}^{(l)})\) the variational distribution of \((\alpha _{iv}^{(l)},\lambda _{iv}^{(l)})\) and \(\tilde{p}(\alpha _{iv}^{(l)},\lambda _{iv}^{(l)})\) the importance sampling density of \((\alpha _{iv}^{(l)},\lambda _{iv}^{(l)})\). Once all outer loop iterations terminate, the importance weight is normalized to obtain \(\hat{w}_{vl}\).
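Equation (11) amounts to a weighted count of the outer loop iterations in which the contrast exceeds \(\kappa \). A minimal Python sketch for a single voxel follows; the function name, the \(\beta \) values, and the raw weights are hypothetical and serve only to illustrate the calculation.

```python
def contrast_probability(beta_draws, raw_weights, contrast, kappa):
    """Eq. (11) for one voxel: weighted share of iterations with contrast value above kappa.

    beta_draws[l][j] is the beta value for stimulus j at outer loop iteration l.
    """
    total = sum(raw_weights)
    normalized = [w / total for w in raw_weights]
    return sum(
        w for w, betas in zip(normalized, beta_draws)
        if sum(c * b for c, b in zip(contrast, betas)) > kappa
    )

# Hypothetical output for one voxel: L = 4 iterations, J = 2 stimuli.
beta_draws = [[1.0, 0.2], [0.9, 0.5], [0.3, 0.4], [1.1, 0.1]]
raw_weights = [2.0, 1.0, 1.0, 4.0]
p_v = contrast_probability(beta_draws, raw_weights, contrast=[1, -1], kappa=0.3)
```

In this example three of the four iterations have a contrast value above 0.3, and they carry 7/8 of the total weight, so \(p_v=0.875\).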

3 The NPBayes-fMRI GUI

We now provide a detailed description of the NPBayes-fMRI MATLAB GUI that implements the spatio-temporal general linear regression models described in the previous section. The GUI comprises two main interfaces, one for model fitting and one for the visualization of the results.

3.1 Model Fitting

For model fitting, a set of parameters must be defined using the NPBayes-fMRI: Model Fitting GUI shown in Fig. 1, by first selecting the object in the listbox and then clicking the Specify button:

Fig. 1 NPBayes-fMRI: Main interface of the Model Fitting GUI

Output: The user is asked to specify the directory where the output of the VB algorithm will be saved. Once the model is run successfully, a result.mat file will be generated in the output directory.

Number of Subjects: The user is asked to specify the number of subjects that are being used for the analysis. When this variable is set to 1, a DP will be used for the slab distribution in equation (6), while a HDP is used otherwise.

2D or 3D Analysis: This option allows the user to specify the type of analysis, that is, whether it is performed on a single 2D slice or on a 3D whole-brain parcellation. If 2D is selected, then the user is prompted to define the threshold for the 2D image and the dimension of the 2D slice. The threshold is a value used to define the gray matter mask on the fMRI data (see Data Files below). The dimension of the 2D slice is defined by the number of rows and columns of the fMRI slice. The number of rows and columns must be such that their product is equal to V. These arguments will be used later for visualization of the results. If 3D is selected, the following arguments must be defined:

Matrix of ROI names: This is a .mat file containing a variable named ROI_names, which is a \(1\times V\) cell array, with each entry containing the name of a brain region as defined by the parcellation, with the .nii extension (e.g., Amygdala_L.nii).

Neighbor matrix: This is a .mat file containing a variable named nei_vec, whose first row contains the indices of those regions that are neighbors to the region whose index appears on the second row. For example, if region 5 has regions 6, 10, 15, 20 as its neighbors, then nei_vec will have entries \(\begin{bmatrix} 6&10&15&20 \\ 5&5&5&5 \end{bmatrix}.\)

This information is also used to specify the MRF prior (8) by calculating Euclidean distances between the centroids of the ROIs.

ROI NIFTI Directory: This is the directory where all NIFTI files for the ROIs are stored. The names of these files should be equal to the names provided by the Matrix of ROI names. These files will be used to map the inference results back into 3D space for visualization.

Brain Template Image: For better visualization of the inference results, the user has the option to upload a brain template NIFTI file. When available, the brain template will be used as a background image, and the image resulting from running the model will be overlaid on top. This will only work when the dimensions of the brain template file are equal to the dimensions of the ROI NIFTI images. If a template image is not provided, the visualizations will take place without a background image.

Data Files: The user needs to load a .mat file that consists of two matrices: xtdat, a \(T\times P\) binary design matrix, with T the number of time points and P the number of stimuli, and y_dat, a \(T\times (N\times V)\) matrix of BOLD signals, with N the number of subjects and V the number of voxels (for 2D analyses) or ROIs (for 3D analyses). So if we let \(y_i\) be the \(T\times V\) matrix of BOLD signals for subject i, then y_dat = \((y_1,y_2,\ldots ,y_N)\). For 2D analyses, a gray matter mask is applied to the fMRI data and inference is performed based on those voxels where y_dat is greater than the threshold specified by the user. For 3D analyses, y_dat should contain the voxel time-series data averaged by ROI, listed in the same order provided by the Matrix of ROI names .mat file. For both 2D and 3D analyses, the percent signal change normalization and the DWT are applied as part of the model fitting stage. For the DWT, Daubechies minimum phase wavelets with 4 vanishing moments are used.

Parameter Setting: The software includes a pre-defined default setting for all hyperparameters and VB parameters. The user also has the option of setting some of the model parameters manually, including selecting the type of HRF distribution, the prior setting for the HRF parameters and the MRF parameters, and the number of VB iterations. In the default parameter setting, the Poisson distribution is automatically selected as the HRF distribution, with its default prior setting.

Once all the variables have been defined, the Run Model button in the GUI will turn from red to green, and pressing the button will start the algorithm. If one chooses to run the model later, it is also possible to press the Save Batch button to store the model specification. In this case all the parameter settings are saved in a .mat file, which can later be loaded by clicking on the Load Batch button.
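The column layout expected for y_dat, as described under Data Files, can be illustrated with a small sketch. The Python code below stacks per-subject \(T\times V\) matrices side by side and shows how to locate a given subject and voxel (or ROI) in the result. It is meant only to clarify the layout: the function names and toy data are hypothetical, indices are 0-based here whereas MATLAB is 1-based, and the actual input to the GUI must be a .mat file.

```python
def stack_subjects(subject_data):
    """Concatenate per-subject T x V matrices column-wise into a T x (N*V) matrix."""
    T = len(subject_data[0])
    return [
        [value for y_i in subject_data for value in y_i[t]]
        for t in range(T)
    ]

def column_index(i, v, V):
    """0-based column of subject i, voxel/ROI v (both 0-based) in the stacked matrix."""
    return i * V + v

# Hypothetical toy data: N = 2 subjects, T = 3 time points, V = 2 ROIs.
y_1 = [[1, 2], [3, 4], [5, 6]]
y_2 = [[7, 8], [9, 10], [11, 12]]
y_dat = stack_subjects([y_1, y_2])
```

In the stacked matrix, the first V columns belong to subject 1, the next V columns to subject 2, and so on.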

Fig. 2 NPBayes-fMRI: Main interface of the Visualization GUI

Fig. 3 NPBayes-fMRI: Visualization. Interface for viewing activation maps by subject. The Viewing Options tab allows the user to view all stimuli at once or one at a time. By clicking on the Map Type, Range and Colormap pop-up menus, the user can define the type of maps to visualize, set the axes ranges and the desired colormap setting. The bigger slider adjusts the PPI threshold and FDR value, the smaller one controls the transparency of the activation map when a Brain Template Image has been uploaded for 3D Analysis. The Multi-Slice option can be used to view multiple slices of the brain in one particular orientation for a given stimulus

Fig. 4 NPBayes-fMRI: Visualization. Interface for viewing activation maps by cluster, for a given stimulus. The user selects the stimulus and the PPI (or FDR) threshold. The corresponding dendrogram will be displayed, and the user can then specify the number of clusters and click on the Load Cluster Defined from Dendrogram tab. When confirmed, the cluster indices will be displayed in the Cluster tab

Fig. 5 NPBayes-fMRI: Visualization. Interface for viewing contrast maps by subject (for multiple stimuli). The user must use the Define Contrast tab to insert the contrast vector and hypothesized value. The slider bar can be used to adjust the PPI (or FDR) value

3.2 Visualization

The NPBayes-fMRI: Visualization GUI, shown in Fig. 2, is used to visualize the results using the result.mat file obtained from running the algorithm using NPBayes-fMRI: Model Fitting interface. The GUI comprises the three components described below. For simplicity, we consider the case of 3D data only.

Activation Maps by Subject (Fig.3): This function allows the user to view the activation maps, the posterior \(\beta \)-maps, and the HRF maps for a single subject. Clicking on Map Type allows the user to select either Probability  Map, which allows to view PPI activation maps, or Activation  Map, to view the posterior \(\beta \)-maps, or HRF  Map, to view posterior maps for the HRF parameters. Depending on the HRF distribution, the HRF map will display values for the \(\lambda _{i \nu j}\), \(A_{i \nu j}\), or \(a_{i \nu j}\cdot b_{i \nu j}\) (as the mean of a gamma distribution) parameters, for the Poisson, Canonical, and Gamma HRFs, respectively. MATLAB’s built-in colormaps can be selected via the Color Map pop-up menu. Range lets the user define the axes limits. When set to Equal_Range, all figures will be defined on the same range. This may be useful when one is comparing posterior \(\beta \)-maps across all stimuli. If, however, the \(\beta \) values are significantly smaller in one stimulus than another, then the Different_Range option is preferable when inspecting the activation maps. Two sliders appear on the right-hand side of the interface. The bigger slider can be used to adjust the PPI threshold and the FDR value. These values can also be set manually by the user. The smaller slider, circled in red in Fig. 3, appears only when a Brain Template Image has been uploaded for 3D Analysis. This slider allows the user to control the transparency of the activation map that will be overlayed on top of the Reference Image. The XY, and Z sliders are used to define the coordinates of the 3D NIFTI brain image in sagittal, coronal, and axial orientation. If the user desires to view multiple slices of the brain in one particular orientation for a given stimulus, the Multi-Slice option can be used instead. The Viewing Options tab can be used to view either all stimuli at once or a single stimulus at a time.

Activation Maps by Cluster (Fig. 4): This function is used to view cluster-level activation maps for a given stimulus and PPI (or FDR) threshold. Clusters are obtained from a dendrogram produced by hierarchical clustering with Ward’s linkage method, applied to a dissimilarity matrix computed from the posterior mean estimates of the non-zero \(\beta \) coefficients. By clicking on Load Cluster Defined From Dendrogram, the user can enter the number of clusters into which the subjects will be grouped.
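The clustering step can be illustrated with a minimal Python sketch. It is not the toolbox code: it assumes that the dissimilarity between two subjects is the Euclidean distance between their vectors of posterior mean \(\beta \) estimates, it uses made-up values in `beta_hat`, and the function name `ward_clusters` is hypothetical. At each step, Ward’s criterion merges the pair of clusters whose union gives the smallest increase in within-cluster sum of squares, and merging stops once the requested number of clusters is reached.

```python
from itertools import combinations


def ward_clusters(beta, n_clusters):
    """Group the rows of `beta` (one row per subject) with Ward's criterion."""
    clusters = [[i] for i in range(len(beta))]
    dim = len(beta[0])

    def centroid(members):
        return [sum(beta[m][d] for m in members) / len(members) for d in range(dim)]

    def merge_cost(a, b):
        # Increase in within-cluster sum of squares if clusters a and b are merged.
        ca, cb = centroid(a), centroid(b)
        sq_dist = sum((x - y) ** 2 for x, y in zip(ca, cb))
        return len(a) * len(b) / (len(a) + len(b)) * sq_dist

    while len(clusters) > n_clusters:
        # Pick the pair (i < j) with the smallest merge cost.
        i, j = min(
            combinations(range(len(clusters)), 2),
            key=lambda p: merge_cost(clusters[p[0]], clusters[p[1]]),
        )
        clusters[i] = clusters[i] + clusters[j]
        del clusters[j]
    return [sorted(c) for c in clusters]


# Illustrative posterior mean betas (assumed values): 6 subjects x 3 ROIs.
beta_hat = [
    [0.0, 0.1, 0.0],
    [0.1, 0.0, 0.1],
    [2.0, 2.1, 1.9],
    [2.1, 2.0, 2.0],
    [5.0, 0.0, 5.1],
    [5.1, 0.1, 5.0],
]
groups = ward_clusters(beta_hat, n_clusters=3)
print(groups)
```

The number of clusters plays the role of the value entered in the GUI. The sketch only returns the final grouping, whereas a full dendrogram would also record the order and cost of each merge.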

Table 2 Instructions for running the example dataset

Contrast Maps by Subject (Fig. 5): For multiple stimuli, this function lets the user define a subject-level contrast by specifying a Contrast Vector and a Hypothesis Value via the Define Contrast option. The length of the Contrast Vector must not be greater than the number of stimuli, and its entries must sum to 0. Once a contrast has been defined, the user can use the slider to adjust the Threshold Probability and can view different subjects by entering their subject numbers.
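As an illustration of the two constraints on the Contrast Vector and of what a contrast probability represents, the following Python sketch checks a contrast and estimates \(p(c^{\top }\beta > h)\) at a single location. It assumes that posterior draws of \(\beta \) are available, which is an assumption made for illustration only. The toolbox uses a variational Bayes algorithm and may compute this probability differently. The function names and the simulated draws are hypothetical.

```python
import random


def validate_contrast(contrast, n_stimuli, tol=1e-12):
    """Check the two constraints stated in the text; raise ValueError otherwise."""
    if len(contrast) > n_stimuli:
        raise ValueError("contrast vector is longer than the number of stimuli")
    if abs(sum(contrast)) > tol:
        raise ValueError("contrast entries must sum to 0")


def contrast_probability(beta_draws, contrast, hypothesis_value=0.0):
    """Monte Carlo estimate of P(c' beta > h) from posterior draws of beta."""
    hits = 0
    for beta in beta_draws:
        # zip stops at the shorter sequence, so stimuli beyond
        # len(contrast) implicitly receive weight 0.
        value = sum(c * b for c, b in zip(contrast, beta))
        hits += value > hypothesis_value
    return hits / len(beta_draws)


# Hypothetical posterior draws for three stimuli at one voxel or ROI.
rng = random.Random(0)
means = [0.2, 1.0, 0.0]
draws = [[rng.gauss(m, 0.1) for m in means] for _ in range(2000)]

contrast = [0, 1, -1]  # compares stimulus 2 with stimulus 3
validate_contrast(contrast, n_stimuli=3)
prob = contrast_probability(draws, contrast, hypothesis_value=0.0)
print(prob)
```

A location would be displayed in the contrast map when this probability exceeds the chosen Threshold Probability.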

Fig. 6 3D Analysis: Example of activation \(\beta \)-maps, with Range set to Equal Range, for stimulus 2 and PPI threshold of 0.9. The middle subplot displays a multi-slice sagittal view, the bottom subplot displays the activation map at coordinates \(X = 92, Y = 115, Z =111\)

4 Illustration with 3D Data

To illustrate the features of the NPBayes-fMRI: Visualization GUI, we show selected output from an analysis of fMRI data from 30 subjects performing an experiment with three stimuli. The dataset is part of a pilot study on variability in the cognitive and neural processes involved in reading, conducted at Rice University [7]. A 3D parcellation of the data was performed using the MarsBaR toolbox in SPM 12. The Automatic Anatomical Labeling (AAL) brain atlas was used to obtain the parcellation, resulting in 90 ROIs, excluding the regions associated with the cerebellum. Euclidean distances between pairs of ROIs were calculated from their coordinates in Montreal Neurological Institute (MNI) space, and a neighborhood matrix was obtained by thresholding these distances. The threshold was chosen so that each ROI would have five neighbors on average. This matrix was then used to define the neighborhood structure among ROIs in the specification of the MRF prior given in equation (8). Instructions on how to upload the data into the toolbox are given in Table 2.
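The construction of the neighborhood matrix can be sketched in Python as follows. The text does not state how the threshold was selected, so the sketch makes an assumption: it takes the threshold to be the \(k\)-th smallest pairwise distance, with \(k\) equal to half the number of ROIs times the target mean number of neighbors. The coordinates are simulated and the function name `neighbor_matrix` is hypothetical.

```python
import math
import random
from itertools import combinations


def neighbor_matrix(coords, target_mean_neighbors=5):
    """Threshold pairwise Euclidean distances to obtain a 0/1 neighbor matrix."""
    n = len(coords)
    pairs = sorted(
        (math.dist(coords[i], coords[j]), i, j)
        for i, j in combinations(range(n), 2)
    )
    # Each undirected edge adds one neighbor to two ROIs, so a mean of m
    # neighbors per ROI corresponds to m * n / 2 edges.
    n_edges = round(target_mean_neighbors * n / 2)
    threshold = pairs[n_edges - 1][0]

    adj = [[0] * n for _ in range(n)]
    for dist, i, j in pairs:
        if dist <= threshold:
            adj[i][j] = adj[j][i] = 1
    return adj, threshold


# Simulated ROI centroids in an MNI-like coordinate range (assumed values).
rng = random.Random(1)
coords = [
    (rng.uniform(-70, 70), rng.uniform(-100, 70), rng.uniform(-50, 80))
    for _ in range(20)
]
adj, threshold = neighbor_matrix(coords, target_mean_neighbors=5)
mean_neighbors = sum(map(sum, adj)) / len(adj)
print(threshold, mean_neighbors)
```

The resulting matrix is symmetric with a zero diagonal, as required for an undirected neighborhood structure in an MRF prior.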

The results shown here were obtained by running the NPBayes-fMRI: Model Fitting interface with the default hyperparameter setting. Fig. 6 shows the posterior \(\beta \)-maps for one of the subjects, for stimulus 2, obtained at a PPI threshold of 0.9. The middle subplot displays a multi-slice sagittal view at \(X = [70, 80, 90, 100, 110]\). The smaller slider can be scrolled down if one wishes to see more of the brain structure through the overlaid activation map. The bottom subplot displays the activation map at coordinates \(X = 92, Y = 115, Z =111\). If View all Stimulus is selected under Viewing Options, then a \(3\times 3\) plot of activation maps is displayed. Different locations of the brain can be examined by using the three sliders that control the X, Y, and Z coordinates.

For stimulus 2 and a PPI threshold of 0.9, Fig. 7 shows the dendrogram (middle) obtained by clustering the posterior \(\beta \) estimates and the cluster-level \(\beta \)-maps (bottom) when three clusters are selected. The subject numbers corresponding to each cluster are displayed on the interface that controls the dendrogram and activation maps (top). Finally, Fig. 8 displays the estimated contrast probability map \(p(\beta _{2}-\beta _{3}>0) = p(\beta _{2}>\beta _{3})\), for one of the subjects, for a threshold probability of 0.9, corresponding to an FDR value of 0.0096107.
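The correspondence between a probability threshold and an FDR value can be illustrated with a short Python sketch. It assumes the commonly used Bayesian FDR estimator, in which the FDR at threshold \(\kappa \) is the average of \(1 - p_{v}\) over the locations whose posterior probability \(p_{v}\) exceeds \(\kappa \). Whether the toolbox uses exactly this estimator is an assumption here, and the probabilities below are made up.

```python
def bayesian_fdr(probabilities, threshold):
    """Average posterior probability of a false discovery among selected locations."""
    selected = [p for p in probabilities if p > threshold]
    if not selected:
        return 0.0  # nothing is declared active, so no false discoveries
    return sum(1.0 - p for p in selected) / len(selected)


# Illustrative posterior probabilities at six locations (assumed values).
ppi = [0.99, 0.95, 0.92, 0.85, 0.40, 0.10]
fdr = bayesian_fdr(ppi, threshold=0.9)
print(fdr)
```

Under this estimator, raising the threshold retains only locations with higher posterior probabilities, so the estimated FDR cannot increase.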

Fig. 7 3D Analysis: Example of dendrogram (middle), for stimulus 2 and a PPI threshold of 0.9, and cluster-level \(\beta \)-maps (bottom), obtained with three clusters. The subject cluster memberships are displayed in the Cluster tab of the interface (top)

Fig. 8 3D Analysis: Example of contrast map, for subject 10 and a threshold probability of 0.9, corresponding to an FDR of 0.0096107

5 Conclusions

In this paper, we have described a unified, probabilistically coherent framework for the analysis of task-related brain activity in multi-subject fMRI experiments. The model, proposed by Zhang et al. [40], builds upon the large literature on spatio-temporal linear regression models by specifically accounting for within- and between-subjects heterogeneity via a non-parametric Bayesian variable selection prior. Furthermore, posterior inference is carried out via a variational Bayes algorithm that allows scalability. Zhang et al. [40] demonstrate that this probabilistically coherent modeling approach can improve estimation performance with respect to two-stage approaches. They also show, on real data, that a multi-subject modeling strategy leads to a more accurate detection of the activated areas than single-subject models. This is an important stepping stone in the development of reliable detection methods that can be applied to full brain datasets and complex experimental designs.

Here we have also introduced NPBayes-fMRI, a MATLAB GUI that implements the non-parametric models proposed by Zhang et al. [40]. The GUI comprises two components: one for model fitting and another for visualization of the results. Within the model fitting interface, the user can define the type of analysis (voxel-based or whole-brain parcellation into regions of interest, i.e., ROIs) and the model parameters. Users can either keep the pre-defined default setting for all parameters or customize individual choices. The GUI also accommodates single-subject analyses. The VB algorithm can be run within the GUI or in batch mode. After running the algorithm, the output file can be uploaded via the visualization interface and used to plot subject-level activation maps, contrast maps, and cluster-defined averaged \(\beta \)-maps. The toolbox is available for download at https://github.com/marinavannucci and at https://github.com/rimehi. Detailed instructions on how to use the toolbox can be found in the Instructions text file. No additional MATLAB toolboxes are required to run NPBayes-fMRI.