
1 Introduction

In the last two decades, our understanding of the mechanisms underlying the functioning and disruption of the human brain has advanced considerably. The development of a number of innovative technologies has spurred unparalleled enthusiasm for research in the Neurosciences. Major breakthroughs have escaped the bounds of academic labs, and have often been widely publicized by the media. For example, in February 2014, a special issue of the National Geographic magazine enthusiastically hailed the new technologies that are “shedding light on biology’s greatest unsolved mystery: how the brain really works”. In the future, these technologies are expected to have a profound impact on the type of clinical treatments administered by physicians. On September 21st, 2013, the British newspaper The Guardian dedicated an article to the new landscape of psychiatry, where the use of widely employed anti-depressant drugs has been called into question in favor of alternative treatments directly targeting the functioning of specific neural circuits. By studying how brain areas interact differently in healthy subjects and depressed patients, the hope is to decode the determinants of complex human emotion and behavior.

Statistical methods play a crucial role in the quest for a better understanding of brain mechanisms, and their disruption in the face of disease. As an illustration, in the analysis of many types of brain imaging data, it is customary to employ statistical parametric maps, e.g., localized maps of p-values or posterior probabilities, to inform on the significance and spatiotemporal organization of the observed signal across distinct brain regions [1]. Those images provide a synthetic representation of significant areas of the brain, which may be targeted for further research and, also, to improve clinical diagnosis or intervention. However, early approaches based on naive t-tests or ANOVA statistics have shown limitations, especially due to their inability to take into account the complexity and specific characteristics of the data. Thus, the need for fairly sophisticated statistical techniques has emerged, e.g. to address the typically weak signal, high dimensionality and complex spatio-temporal correlation structure of the data.

The previous chapters of the book have made a compelling case for the advantages of thoughtful, non-naive, statistical approaches for analyzing brain imaging data. Here, we provide a review of the main themes highlighted in those chapters, and we further discuss some of the challenges that statistical imaging is currently confronted with. More specifically, in Sect. 2 we provide a summary review of the main inferential objectives associated with structural and functional brain imaging modalities, and discuss general modeling strategies that have been developed to achieve such inferences. In Sect. 3, we discuss the importance of developing analytical frameworks that characterize the heterogeneity typically observed in brain imaging both within and between subjects. In Sect. 4, we examine clustering approaches that identify groups of subjects characterized by similar patterns of brain responses to a task. In Sect. 5, we discuss dynamic temporal models to capture the heterogeneity of functional connectivity network states experienced by subjects in the course of an experiment. In Sect. 6, we present recent modeling trends, which aim at combining information from multiple data sources in order to achieve a better understanding of brain processes: multimodal imaging analysis and imaging genetics are examples of those developments. In Sect. 7, we provide some concluding remarks.

2 Statistical Analysis of Brain Imaging Data

We start our discussion by noting that the statistical methods employed in the analysis of brain imaging data necessarily depend on the specific type of technology employed and must be informed by the expert knowledge of neuroscientists. Brain imaging technologies can be roughly separated into three categories: structural, functional and molecular imaging technologies. Each technology aims at capturing different characteristics of brain mechanisms, and therefore requires specifically tailored methods.

2.1 Structural Imaging

Structural brain imaging aims at providing a description of the anatomical structure of the brain. As an illustration, computed axial tomography (CT) uses X-rays to quickly identify different levels of density and tissues inside a solid organ, and can be used to obtain clinical evidence of trauma, e.g., a stroke. MRI scans use powerful magnetic fields and radio frequency pulses to create high-resolution images, and thus they are able to depict the brain anatomy in greater detail. Signal change and cerebral atrophy visible on structural MRI can be used to identify diagnostically relevant imaging features to help the clinical diagnosis of neurodegenerative dementias.

Diffusion tensor imaging (DTI) is a popular MRI-based technique which identifies fiber tracts connecting brain regions by estimating the diffusion of the water molecules along their main direction. More specifically, the three-dimensional diffusion of water is mapped and characterized as a function of spatial location. The diffusion tensor describes the magnitude, the degree of anisotropy, and the orientation of diffusion anisotropy, that is, how water molecules move differently in the direction parallel to the fiber tracts than in the two orthogonal directions. Many different measures of diffusion anisotropy have been proposed to visualize and quantify the properties of the diffusion tensor [2]. The most commonly used parameters are fractional anisotropy (FA), a measure of the degree of directionality of diffusion, and the (rotationally invariant) mean diffusivity (MD). DTI has been suggested as an indirect marker for white-matter integrity. For example, in epilepsy, the epileptogenic hippocampus demonstrates increased MD and decreased FA [3].
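As a concrete illustration of the two summary measures, the following minimal sketch computes MD and FA from the eigenvalues of a single \(3\times 3\) diffusion tensor, using their standard definitions. The function name `md_fa` and the numerical tensors are hypothetical and chosen only for illustration.

```python
import numpy as np

def md_fa(tensor):
    """Return (MD, FA) for a symmetric 3x3 diffusion tensor."""
    evals = np.linalg.eigvalsh(tensor)            # the three eigenvalues
    md = evals.mean()                             # mean diffusivity
    num = np.sqrt(((evals - md) ** 2).sum())      # spread of the eigenvalues
    den = np.sqrt((evals ** 2).sum())
    fa = np.sqrt(1.5) * num / den if den > 0 else 0.0
    return md, fa

# Hypothetical tensors: isotropic diffusion versus diffusion along one axis.
md_iso, fa_iso = md_fa(np.diag([1.0, 1.0, 1.0]))
md_fib, fa_fib = md_fa(np.diag([1.7, 0.3, 0.3]))
```

Both quantities depend on the tensor only through its eigenvalues, so neither changes when the tensor is rotated. FA is zero for isotropic diffusion and approaches one when diffusion is concentrated along a single direction.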

The two chapters by Crispino et al. and Cabassi et al. in this volume (pp. 1 and 37) provide interesting modeling approaches for the analysis of DTI data. Cabassi et al. argue that the quality of diffusion-weighted images could be affected by several types of artifacts, due to the low signal-to-noise ratio and the relatively long scan time required by DTI tractography [4]. In particular, those artifacts may cause underestimation of diffusion coefficients and bias anisotropy measures. To address such issues, Cabassi et al. propose a hierarchical Bayesian model to estimate the effective unknown number of white matter fibers connecting each pair of brain regions. More precisely, they assume a discrete measurement error model, where each observed white matter fiber count is assumed to be Binomially sampled from the true unknown population of white matter fibers, which is assigned a latent Poisson prior. The model leverages available information both at the subject and the brain region scale. Their results provide some evidence that the fiber counts may indeed be severely underestimated.
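To see how a Binomial measurement error model with a Poisson prior corrects observed counts upward, consider a deliberately simplified version with a single pair of regions, a known Poisson rate `lam` and a known detection probability `p`. Both are assumptions made here for illustration; the hierarchical model of Cabassi et al. estimates the corresponding quantities across subjects and regions. In this simplified setting the number of undetected fibers is Poisson with rate `lam * (1 - p)`, independently of the observed count.

```python
import numpy as np

def posterior_true_count(y, lam, p):
    """Posterior mean and variance of N given y, when
    N ~ Poisson(lam) and y | N ~ Binomial(N, p)."""
    missed_rate = lam * (1.0 - p)    # N - y | y ~ Poisson(lam * (1 - p))
    return y + missed_rate, missed_rate

rng = np.random.default_rng(0)
lam, p = 200.0, 0.6                  # hypothetical rate and detection probability
n_true = rng.poisson(lam, size=100_000)
y_obs = rng.binomial(n_true, p)      # observed, thinned fiber counts

post_mean, post_var = posterior_true_count(y_obs, lam, p)
mean_abs_raw = np.abs(y_obs - n_true).mean()       # error of the raw counts
mean_abs_post = np.abs(post_mean - n_true).mean()  # error after correction
```

In the simulation the raw counts miss the true counts by roughly `lam * (1 - p)` fibers on average, whereas the posterior mean is centered on the truth.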

The chapter by Crispino et al. provides an exploratory analysis of how structural connectivity may inform patterns of activation captured by functional imaging techniques among regions of interest (ROIs). We return to this issue in Sect. 6.2. In their latent space model for the DTI data, Crispino et al. consider the structural imaging data as an observed network of connections between ROIs and model the probability of observing an edge in the network (i.e., the probability that at least one white matter fiber connects two ROIs) as a function of the distance between the regions. They conclude that the inferred latent space of the DTI is highly correlated with the physical one represented by the ROI locations, although the two may not completely overlap.
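The idea of a latent space model for a binary network can be sketched with the generic latent distance formulation, in which the log-odds of an edge decrease with the distance between latent positions. This is a minimal sketch of that general class and not necessarily the specification adopted by Crispino et al.; the intercept `alpha`, the two-dimensional latent space and the number of ROIs are assumptions.

```python
import numpy as np

def edge_probabilities(z, alpha):
    """P(edge between i and j) = logistic(alpha - ||z_i - z_j||)."""
    diff = z[:, None, :] - z[None, :, :]
    dist = np.sqrt((diff ** 2).sum(axis=-1))       # pairwise latent distances
    prob = 1.0 / (1.0 + np.exp(-(alpha - dist)))
    np.fill_diagonal(prob, 0.0)                    # no self-loops
    return prob

rng = np.random.default_rng(1)
z = rng.normal(size=(6, 2))                        # hypothetical latent positions of 6 ROIs
prob = edge_probabilities(z, alpha=1.0)

# Simulate one undirected binary network from the edge probabilities.
upper = np.triu(rng.random(prob.shape) < prob, k=1)
adjacency = (upper | upper.T).astype(int)
```

In an actual analysis the latent positions are unknown and are inferred from the observed network, after which they can be compared with the physical coordinates of the ROIs.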

Durante and Dunson [5] have also recently developed a statistical model to infer expected network structures from DTI data, which takes into account that fiber tracking pipelines are subject to measurement error. More specifically, they consider a latent variable framework, where the probability mass function of the network is characterized using a mixture of low-rank factorizations. Within each mixture component, connections among pairs of nodes are characterized as conditionally independent Bernoulli random variables given component-specific edge probabilities, which are further obtained as a function of node-specific latent variables. The model allows for group dependence in the mixing probabilities, which can be used to conduct global and local testing for differences in brain connectivity networks between two groups of subjects. The study of undirected connections estimated from structural imaging data will certainly be the objective of further investigations in the future.

2.2 Functional Imaging

Functional brain imaging involves the study of brain functioning, both in terms of its specialization (i.e., which parts of the brain respond to a given task) as well as its integration (i.e., how different brain regions interact with each other). Perhaps the two most popular functional brain imaging techniques are electroencephalography (EEG) and functional MRI (fMRI). EEG data record the electrical activity of the brain from the scalp. They are characterized by high temporal resolution. However, they present low spatial resolution, due to the configuration of the electrodes on the scalp. Due to the early influence of signal processing, statistical methods for EEG data often involve spectral time series representations of the temporal signal. With respect to EEG data, fMRI data are characterized by higher spatial resolution but lower temporal resolution. fMRI data provide an indirect measure of brain activity, since they record the metabolic activity in the brain, as represented by differences in local blood flow (blood-oxygen level-dependent, BOLD, signal). It is beyond the scope of this chapter to provide further details about the physiology of fMRI signals, for which we refer to Poldrack et al. [6]. We only mention that, due to their high spatial resolution, fMRI data have been typically employed to identify changes in brain activity across different brain regions, and also over time, although their ability to identify brain events over very short time periods may be somewhat limited.

The analysis of fMRI data. Statistical methods for fMRI data vary widely according to the experiment design (e.g., task-based or resting-state experiment) and the objectives of the study. In a task-based experiment, for example, the whole brain is scanned at multiple times while a subject performs a series of tasks. Therefore, a typical objective is to detect which brain regions get activated by the external stimuli (activation detection). Statistical methods for this analysis typically include linear and nonlinear models, as well as mixture models, for both single- and multiple-subject studies.

The chapter by Gasperoni and Luati in this volume (p. 91) highlights the importance of taking into proper account the physiology of the different neuroimaging experiments in the statistical analysis of fMRI data. The hemodynamic response function (HRF) models the vascular response to neuronal activity, which contributes to the observed fMRI signal. Since the estimation of neural activity is a major interest of fMRI studies, the interpretation of fMRI findings may be severely impaired if the hemodynamic response is not accurately taken into account in the analysis [7, 8]. The HRF varies considerably over different brain regions and across subjects. Most fMRI studies have primarily focused on estimating the amplitude of evoked HRFs in task-based experiments. However, the influence of the hemodynamic response has also been shown in resting-state experiments, where it characterizes the BOLD signal in response to spontaneous neuronal activity. For example, Rangaprakash et al. [9] have shown that the variability of the HRF across the brain may alter functional connectivity estimates obtained from resting-state fMRI. In their chapter, Gasperoni and Luati extend a multi-step blind-deconvolution approach, first presented in [10], to estimate the HRF from spontaneous brain activity. In particular, they robustify the procedure by assuming a Student-t distribution for the noise affecting the BOLD signal and then they identify spontaneous activations as extreme values of the residuals obtained from a robust procedure for signal extraction. They discuss how the method based on the assumption of a Student-t distribution for the noise should select a smaller number of spontaneous activations than the method based on the Gaussian assumption. This is certainly an area of continuous interest in the fMRI literature, as it affects the validity of any subsequent inferences.

Brain connectivity. Another important task in fMRI studies, which has received increased attention in recent years, is to infer brain connectivity. In general terms, connectivity looks at how brain regions interact with each other and how information is transmitted between them, with the aim of uncovering the actual mechanisms of how our brain functions. In particular, it is customary to distinguish between functional (undirected) and effective (directed) connectivity, as first defined by [11]. In the study of functional connectivity, the goal is to identify multiple brain areas that exhibit similar temporal profiles, either task-related or at rest. On the other hand, effective connectivity seeks to estimate the directed influence of one brain region on another. In the classical literature, simple approaches to capture functional connectivity are based on temporal correlations between regions of interest, or between a “seed” region and other voxels throughout the brain. Alternative approaches include clustering methods, to partition the brain into regions that exhibit similar temporal characteristics, and multivariate methods for dimension reduction, such as Principal Components Analysis (PCA) [12] and Independent Components Analysis (ICA) [13], which determine spatial patterns that account for most of the variability in the time series data. Approaches that estimate partial correlations between predefined regions of interest (ROIs) have also been proposed, for example by using the graphical Lasso (GLasso), which estimates a sparse precision matrix [14].
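The difference between marginal and partial correlations as measures of functional connectivity can be illustrated with a small simulation. The sketch below assumes three hypothetical ROI time series arranged in a chain, and it obtains partial correlations from the unregularized inverse of the sample covariance matrix; the graphical Lasso would instead estimate the precision matrix under an \(\ell _1\) penalty that sets small entries exactly to zero.

```python
import numpy as np

def correlation_and_partial(ts):
    """ts: (T, R) array with T time points for R regions."""
    corr = np.corrcoef(ts, rowvar=False)
    prec = np.linalg.inv(np.cov(ts, rowvar=False))   # precision matrix
    d = np.sqrt(np.diag(prec))
    partial = -prec / np.outer(d, d)                 # partial correlations
    np.fill_diagonal(partial, 1.0)
    return corr, partial

# Hypothetical chain of regions: a drives b, and b drives c.
rng = np.random.default_rng(2)
T = 5000
a = rng.normal(size=T)
b = a + 0.5 * rng.normal(size=T)
c = b + 0.5 * rng.normal(size=T)
corr, partial = correlation_and_partial(np.column_stack([a, b, c]))
```

Here the first and third series are strongly correlated, but their partial correlation given the second series is close to zero, which is the kind of indirect association that precision-based methods are designed to remove.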

In the Bayesian literature, Bowman et al. [15] employed a two-stage modeling approach to capture short-range task-related (or between-group) connectivity between voxels within a given anatomical region. The model assumes that voxels within anatomically defined regions exhibit task-related activity that deviates around an overall mean for that region. By appropriate modeling of a flexible unstructured covariance matrix for regional mean parameters, the model estimates spatial correlations which are interpretable as task-related functional connections. This modeling framework also yields a measure of inter-regional (or long-range) connectivity between two regions. Long-range connectivity is observed whenever a relatively distant pair of voxels exhibits high positive correlation, even when compared to a more proximal pair of voxels. For example, Broca’s area and Wernicke’s area are two noncontiguous anatomical regions that may exhibit long-range correlations, given their joint involvement in speech generation, processing and understanding. More recently, Zhang et al. [16] allow clustering of spatially remote voxels that exhibit fMRI time series with similar characteristics, by imposing a Dirichlet Process (DP, [17]) prior on the parameters of a long memory error term. The induced clustering can be viewed as an aspect of functional connectivity, as it naturally captures statistical dependencies among remote neurophysiological events.

Many of the chapters in this volume have focused on estimating functional connectivity. For example, the chapter by Caponera et al. in this volume (p. 111) proposes an elegant Bayesian time-dependent latent factor model, where the factor loading matrix can be interpreted as a simple measure of connectivity. Their method can be seen as a further contribution to the collection of multivariate methods for dimension reduction discussed above. A key assumption of their approach, which we will discuss further on in this chapter, is stationarity, i.e., the spatial dependence structure is assumed constant over time. Another interesting aspect of their work is the discussion of the graph theoretical approach to explore functional connectivity networks, according to the paradigm of analysis discussed in Bullmore and Sporns [18]. In neurological applications it is common practice to report the brain network structure by thresholding the estimated association measures (e.g., correlation matrices). The thresholding generates binary adjacency matrices which can be used to compute network indices to summarize the topological properties of the network. A vast number of graph theory measures of network topology have been recently studied in various neurological diseases. The majority of those features relate to various aspects of global network integration or local segregation. A relevant subset of features identifies the nodes that have a strong influence on the communication of the network, which are known as centrality or hub measures. The simplest of those centrality measures is degree centrality, which counts the number of edges connected to each node. Other centrality measures capture more nuanced quantities, such as eigenvector centrality, which identifies nodes that are connected to other highly central nodes, or betweenness centrality, which captures the number of shortest paths that pass through a node [19]. 
In addition, deviations from a small-world configuration have been consistently found to characterize various types of brain diseases, including Alzheimer’s disease, epilepsy, brain tumors, and traumatic brain injury [20]. Therefore, by investigating the inference on the graph theory measures of network topology induced by a particular modeling approach, it is possible to achieve additional understanding about the clinical implications of the estimated functional connectivity networks.
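The thresholding step and the simplest centrality measures described above can be sketched as follows. The correlation matrix and the threshold of 0.5 are hypothetical, and the eigenvector centrality is computed directly from the leading eigenvector of the adjacency matrix, which assumes a connected network.

```python
import numpy as np

def threshold_network(corr, thr):
    """Binary adjacency matrix obtained by thresholding |correlation|."""
    adj = (np.abs(corr) > thr).astype(int)
    np.fill_diagonal(adj, 0)
    return adj

def degree_centrality(adj):
    return adj.sum(axis=1)                       # number of edges at each node

def eigenvector_centrality(adj):
    vals, vecs = np.linalg.eigh(adj.astype(float))
    v = np.abs(vecs[:, -1])                      # eigenvector of the largest eigenvalue
    return v / v.sum()

# Hypothetical correlations among four regions; region 0 acts as a hub.
corr = np.array([[1.0, 0.8, 0.7, 0.6],
                 [0.8, 1.0, 0.2, 0.1],
                 [0.7, 0.2, 1.0, 0.3],
                 [0.6, 0.1, 0.3, 1.0]])
adj = threshold_network(corr, thr=0.5)
degree = degree_centrality(adj)
eig_cent = eigenvector_centrality(adj)
```

The resulting indices depend on the chosen threshold, so in practice it is common to examine how stable they are over a range of thresholds.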

An alternative approach, which has been explored in a few chapters of this volume, regards the fMRI time-series as instances of functional data, to be considered in an object-oriented data analysis in non-Euclidean spaces. Instead of comparing networks based on a set of connectivity measures summarizing the topological properties of functional brain networks, the chapter by Cabassi et al. in this volume develops a procedure for testing group differences in the network structure based on several types of non-Euclidean metrics. Ginestet et al. [21] have also recently proposed to employ statistical inference on manifolds to develop one- and two-sample tests for network data objects. Similarly, the chapter by Cappozzo et al. in this volume (p. 57) considers a functional data analysis approach to define a rescaled covariance operator for functional random processes, in the Riemannian manifold defined by positive semi-definite symmetric matrices. All contributions show that global tests may result in more statistical power than when using a mass-univariate approach, which is the standard approach in the field. On the other hand, global tests may be limited as in practice the interest of many investigators is often focused on local discrepancies in the network structure. Methodologies, like the one in [5], which allow for both global and local testing of differences in brain connectivity networks, may perhaps be adapted to this object-oriented data analysis framework.
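As one example of what a non-Euclidean metric between connectivity matrices looks like, the sketch below computes the log-Euclidean distance between two symmetric positive definite matrices. This metric is chosen here only for illustration and is not necessarily among those used in the chapters cited above; it also assumes strictly positive eigenvalues, so a rank-deficient matrix would need to be regularized first. The two matrices are hypothetical.

```python
import numpy as np

def spd_log(mat):
    """Matrix logarithm of a symmetric positive definite matrix."""
    vals, vecs = np.linalg.eigh(mat)
    return (vecs * np.log(vals)) @ vecs.T

def log_euclidean_distance(a, b):
    return np.linalg.norm(spd_log(a) - spd_log(b), ord="fro")

# Two hypothetical 2x2 connectivity (correlation) matrices.
a = np.array([[1.0, 0.5], [0.5, 1.0]])
b = np.array([[1.0, 0.1], [0.1, 1.0]])
d_euclid = np.linalg.norm(a - b, ord="fro")      # ordinary Frobenius distance
d_logeuc = log_euclidean_distance(a, b)          # distance after the log map
```

A distance of this kind can then be used inside a permutation test or a clustering algorithm in place of the Euclidean distance between vectorized matrices.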

Unlike functional connectivity, which relates to undirected associations between time series, effective connectivity refers to the influence that “one neural system exerts over another” [22]. Effective connectivity refers to causal dependence, as opposed to simple association. Therefore, commonly used approaches for capturing effective connectivity include many of the methods typically employed to represent causal relationships: structural equation modeling (SEM, [23, 24]), dynamic causal modeling (DCM, [25]), vector autoregressive (VAR) models [26], Granger causality [27] and Bayesian networks [28]. It should be pointed out, however, that even though such methods allow inference on directed connections between brain regions, they do not necessarily imply physiological causality. Due to the nature of fMRI experiments, the models can only be used to assess causality at the hemodynamic level rather than the neuronal level. Brain scientists are typically more interested in making inference on neural activity. However, the connectivity estimated at the hemodynamic level can still yield interesting results. More appropriately, physiological causality should be assessed through a carefully crafted experimental design [29]. In particular, the often-used notion of Granger causality is based on the idea that causes always precede effects. Therefore, past signal values from one brain region can be used to predict current values in another region. Gorrostieta et al. [30] have developed a Bayesian hierarchical VAR model for investigating Granger causality and effective connectivity in multiple subjects, accounting for the variability in the connectivity structure within and between subjects. Yu et al. [31] have further extended this framework, for simultaneously estimating brain activation and effective connectivity in a study of how brain motor function is altered in patients who have suffered a stroke, with respect to healthy subjects. With the hierarchical structure, subject-specific estimates for activation and connectivity are obtained by pooling information from other subjects. The approach makes it possible to study local activation and connectivity between brain regions, and to compare the inferred patterns for stroke patients and healthy controls in order to explore the effects of stroke on brain motor function.
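The notion that past values of one region help predict current values of another can be illustrated with a first-order VAR model fitted by ordinary least squares. This is a minimal single-subject sketch on simulated data, not the Bayesian hierarchical model of [30] or [31]; the coefficient matrix `A_true`, the lag order and the series length are assumptions, and no hemodynamic delay is modeled.

```python
import numpy as np

def fit_var1(y):
    """Least-squares estimate of A in y_t = A y_{t-1} + e_t, with y of shape (T, R)."""
    past, present = y[:-1], y[1:]
    coef, *_ = np.linalg.lstsq(past, present, rcond=None)
    return coef.T      # entry (i, j): effect of region j at time t-1 on region i at time t

rng = np.random.default_rng(3)
A_true = np.array([[0.5, 0.0],
                   [0.4, 0.5]])      # region 0 influences region 1, but not vice versa
y = np.zeros((2000, 2))
for t in range(1, len(y)):
    y[t] = A_true @ y[t - 1] + rng.normal(size=2)

A_hat = fit_var1(y)
```

An off-diagonal entry of `A_hat` that is clearly different from zero suggests a directed, Granger-type influence; a formal assessment would compare the fit with and without the corresponding lagged predictor.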

In this section, we have provided a limited overview of the main goals typically associated with functional imaging studies. We refer to [32] for a review of modeling approaches to study functional and effective connectivity, causal modeling, connectomics, and multivariate analyses of distributed patterns of brain responses. Bowman in [33] provides a more extensive background on various types of neuroimaging data and analysis objectives that are commonly targeted in brain imaging studies. Stephan and Friston in [34] provide an extensive review of the conceptual and methodological basis of linear and nonlinear DCMs for characterizing effective connectivity using fMRI data.

In the following sections, we discuss a few of the most recent interests and emerging challenges in the analysis of neuroimaging data.

3 Describing the Heterogeneity of Brain Mechanisms

One of the main objectives in the analysis of brain imaging data is to characterize the heterogeneity typically observed both within and between subjects, especially in subjects affected by behavioral and psychiatric disorders. An improved understanding of the heterogeneity of brain mechanisms is considered key for enabling clinicians to deliver targeted precision medicine to individuals affected by such disorders. Current medical practice often relies on symptom-based diagnostic criteria. Despite the progress enabled by neuroimaging technologies in the understanding of the pathophysiology of the major psychiatric disorders, the diagnosis and treatment of individual patients have not yet been significantly impacted by such a revolution [35]. On the other hand, traditional diagnostic criteria are increasingly recognized as inappropriate to describe the variety of the disorders actually observed in individuals, which are progressively seen as the result of the interplay of different characteristics [36, 37]. In 2010, the United States National Institute of Mental Health (NIMH) started the Research Domain Criteria (RDoC) project to develop new ways for classifying mental disorders, on the basis of experimental research criteria rather than traditional diagnostic categories. The RDoC assumes that further insights and progress in the understanding and diagnosis of psychiatric disorders will be achieved by integrating many levels of information (from genomics to neuroimaging and self-reports). This holistic approach will make it possible to investigate both the normal and the disrupted dimensions of brain functioning and human behavior at a deeper level than has so far been achieved. The ultimate long-term goal of the NIMH RDoC initiative is precision medicine. Data from genetics and clinical neuroscience will eventually allow the identification of prognostic and predictive biomarkers. That is, the goal is to develop an analytical framework that incorporates the specific genomic and neuroimaging characteristics of a subject into a predictive decision-making paradigm, so that clinicians may optimize the choice of individual treatments based on their predicted outcome [38, 39].

Fig. 1 Understanding the heterogeneity of the brain disorders based on neuro-imaging data and other information on the subjects in a unified framework is key for attaining the goal of precision medicine

Statistics can provide innovative tools for a data-driven classification of subjects, by combining the neuro-imaging data with the available genomic, behavioral and clinical information on the subjects. Figure 1 illustrates the general scheme underlying the unified approach sought for better understanding the heterogeneity of the brain disorders. Here, we will focus on a few approaches that can be used to capture the main sources of variability in fMRI data, with respect to

(a) identifying clusters of subjects, characterized by similar patterns of brain responses to a task;

(b) characterizing the heterogeneity in the individual dynamics of functional connectivity networks;

(c) relating the observed imaging patterns to additional available information on the same subjects, including genetic covariates and other observable clinical or behavioral outcomes.

4 Clustering Subject-Specific Imaging Patterns

In single-subject analysis, the clustering of fMRI time-series has emerged as a way to classify the regions of the brain according to the temporal pattern of the BOLD response. For example, the chapter by Bertarelli et al. in this volume (p. 75) proposes k-means and functional clustering approaches to cluster fMRI time-series beyond the traditional statistical methods which are typically used to evaluate the level of activation of individual voxels. In the analysis of fMRI data, unsupervised clustering methods have been used also in the context of Gaussian mixture models applied to processed data (either “contrast” maps or simple z-statistic images), to capture distinct clusters of activations, e.g., for pre-surgical assessment of peritumoral brain activation [40, 41]. Alternatively, Zhang et al. (2014) in [16] provide a joint analytical framework to detect regions of the brain which exhibit neuronal activity in response to a stimulus and, simultaneously, infer the association, or clustering, of spatially remote voxels that exhibit fMRI time series with similar characteristics.

In multi-subject analyses, clustering methods have been used to identify groups of subjects that are characterized by similar patterns of brain activity. The chapter by Cappozzo et al. in this volume proposes functional clustering of networks based on the definition of a suitable distance between covariance operators, or alternatively on a low dimensional representation of the correlation matrices. Woolrich et al. [42] and Xu et al. [43] model the inter-subject variability in brain activity via (possibly infinite) Gaussian mixture models that estimate the probability that an individual has an activation at a particular location. Zhang et al. [44] leverage more advanced multi-level Bayesian nonparametric approaches to allow for separate inferential objectives within and between subjects. More precisely, they employ a hierarchical Dirichlet Process prior construction to induce clustering among voxels within a subject at one level of the hierarchy and across subjects at the second level. This formulation makes it possible, in particular, to capture spatial correlation among potential activations of distant voxels within a subject (an aspect of functional connectivity), while simultaneously borrowing strength in the estimation of the parameters from subjects with similar activation patterns. Let \(Y_{i\nu }=(Y_{i\nu 1}, \ldots , Y_{i\nu T})^\top \) be the \(T\times 1\) vector of the BOLD response data at the \(\nu \)th voxel in the ith subject, with \(i=1, \ldots , N\), \(\nu =1, \ldots , V\), and with the symbol \((\cdot )^\top \) indicating the transpose operation. The BOLD time-series response is then modeled with a general linear model

$$\begin{aligned} Y_{i\nu }=X_{i\nu }\beta _{i\nu }+\varepsilon _{i\nu }, \; \varepsilon _{i\nu }\sim N_T(0, \varSigma _{i\nu }), \end{aligned}$$
(1)

where \(X_{i\nu }\) is a \(T\times p\) covariate matrix, \(\beta _{i\nu }=(\beta _{i\nu 1}, \ldots , \beta _{i\nu p})^\top \) is a \(p\times 1\) vector of regression coefficients and \(\varepsilon _{i\nu }=(\varepsilon _{i\nu 1}, \ldots , \varepsilon _{i\nu T})^\top \) is a \(T\times 1\) vector of errors. Typically, \(X_{i\nu }\) plays the role of the design matrix, i.e., it contains the stimulus convolved with the hemodynamic response function, which captures the change in the metabolism of the BOLD contrast due to an outside stimulus. Thus, each column of \(X_{i\nu }\) is modeled through the convolution

$$\begin{aligned} \int _0^t x(s)\, h_{\nu }(t-s)\,ds, \quad t=1, \ldots , T \end{aligned}$$

of the external time-dependent stimulus function for a given task, x(s), which is known and corresponds to the experimental paradigm (for example, a vector defined with elements set to 1 when the stimulus is “on” and 0 when it is “off”), and a parametrically specified hemodynamic response function \(h_{\nu }(\cdot )\). In addition, the matrix \(X_{i\nu }\) can also include precision covariates that incorporate motion correction estimates obtained from the preprocessing steps. Of course, additional individual-specific covariates may also be included (e.g., demographic and clinical information), depending on the specific study objectives.
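The construction of a task regressor by convolution, and its use in the linear model (1), can be sketched for a single voxel as follows. The double-gamma form of the HRF and its parameters, the block design, the one-second sampling interval and the noise level are all assumptions made for illustration; the fit uses ordinary least squares, which ignores the temporal correlation allowed by the general error covariance in (1).

```python
import math
import numpy as np

def gamma_pdf(t, shape):
    """Gamma density with unit scale, evaluated at t >= 0."""
    return t ** (shape - 1) * np.exp(-t) / math.gamma(shape)

def double_gamma_hrf(t):
    # Hypothetical parametric HRF: peak near 5 s, undershoot near 15 s.
    return gamma_pdf(t, 6.0) - gamma_pdf(t, 16.0) / 6.0

T = 120                                        # number of scans, one per second (assumed)
t = np.arange(T, dtype=float)
x = ((t // 20) % 2 == 1).astype(float)         # boxcar stimulus: 20 s off, 20 s on
column = np.convolve(x, double_gamma_hrf(np.arange(30.0)))[:T]  # discretized convolution
X = np.column_stack([np.ones(T), column])      # intercept plus task regressor

rng = np.random.default_rng(4)
beta_true = np.array([10.0, 2.0])              # hypothetical baseline and activation
Y = X @ beta_true + rng.normal(scale=0.2, size=T)
beta_hat, *_ = np.linalg.lstsq(X, Y, rcond=None)
```

The second entry of `beta_hat` estimates the amplitude of the response to the task at this voxel, which is the quantity targeted by activation detection.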

In model (1) the detection of brain voxels that activate in response to the stimulus reduces to a problem of variable selection, i.e., the identification of the nonzero \(\beta _{i\nu }\)’s and is achieved, in the Bayesian framework, by imposing a mixture prior, often called spike-and-slab prior, on the regression coefficients. Zhang et al. [44] embed the selection into a clustering framework and effectively define a multi-subject nonparametric variable selection prior with spatially informed selection within each subject. More specifically, they employ a hierarchical Dirichlet Process (HDP) prior [45], which implies that the non-zero \(\beta _{i\nu }\)’s within subject i are drawn from a mixture model and possibly shared between subjects. Let \(\gamma _{i\nu }\) be the binary indicator of whether voxel \(\nu \) in subject i is active or not, i.e., \(\gamma _{i\nu }=0\) if \(\beta _{i\nu }=0\) and \(\gamma _{i\nu }=1\) otherwise. Zhang et al. [44] impose a spiked HDP prior on \(\beta _{i\nu }\), i.e., a spike-and-slab prior where the slab distribution is modeled by a HDP prior,

$$\begin{aligned} \beta _{i\nu } | \gamma _{i\nu }, G_i\sim & {} \gamma _{i\nu } G_i+(1-\gamma _{i\nu })\delta _0\nonumber \\ G_i|\eta _1, G_0\sim & {} DP(\eta _1, G_0)\nonumber \\ G_0|\eta _2, P_0\sim & {} DP(\eta _2, P_0)\\ P_0= & {} N(0, \tau ),\nonumber \end{aligned}$$
(2)

with \(\delta _0\) a point mass at zero, with \(\tau \) fixed, \(\eta _1, \eta _2\) the mass parameters and \(P_0\) the base measure. The spike-and-slab formulation enforces sparsity in the pattern of activations within each subject. The HDP prior allows for non-zero coefficients to be shared within and across subjects, potentially highlighting regions characterized by similar intensity of brain activity across subjects. Since the number of mixture components is unknown and inferred from the data, this prior formulation provides an unsupervised clustering framework to account for between-subjects heterogeneity in neuronal activity. In order to take into account information on the anatomical structure of the brain, in particular the correlation between neighboring voxels, they further place a Markov Random Field (MRF) prior on the selection parameter \(\gamma _{i\nu }\).
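To make the structure of prior (2) more concrete, the following sketch draws coefficients from a truncated version of the spiked HDP. Several simplifications are assumptions of this illustration and not part of the model in [44]: \(G_0\) is truncated to a finite number of atoms via stick-breaking, \(\tau \) is read as a variance, and the indicators \(\gamma _{i\nu }\) are drawn as independent Bernoulli variables instead of from the MRF prior. With a finite \(G_0\), the subject-level weights of \(G_i\) follow a Dirichlet distribution with parameters \(\eta _1\) times the global weights.

```python
import random

def stick_breaking(mass, n_atoms, rng):
    """Truncated stick-breaking weights for a DP with the given mass parameter."""
    weights, remaining = [], 1.0
    for _ in range(n_atoms - 1):
        v = rng.betavariate(1.0, mass)
        weights.append(remaining * v)
        remaining *= 1.0 - v
    weights.append(remaining)  # leftover mass goes to the last atom
    return weights

def dirichlet(alphas, rng):
    """Dirichlet draw obtained by normalising independent gamma variates."""
    draws = [rng.gammavariate(a, 1.0) for a in alphas]
    total = sum(draws)
    return [d / total for d in draws]

def sample_spiked_hdp(n_subjects, n_voxels, eta1=5.0, eta2=2.0, tau=1.0,
                      p_active=0.2, n_atoms=15, seed=0):
    rng = random.Random(seed)
    # G_0 ~ DP(eta2, P_0), truncated: atoms drawn from P_0 = N(0, tau)
    atoms = [rng.gauss(0.0, tau ** 0.5) for _ in range(n_atoms)]
    global_w = stick_breaking(eta2, n_atoms, rng)
    beta, gamma = [], []
    for _ in range(n_subjects):
        # G_i | G_0 ~ DP(eta1, G_0): same atoms, subject-specific weights
        subj_w = dirichlet([eta1 * w for w in global_w], rng)
        # Assumption: independent Bernoulli indicators stand in for the MRF prior
        g = [1 if rng.random() < p_active else 0 for _ in range(n_voxels)]
        # Spike at zero when gamma = 0, draw from G_i (the slab) when gamma = 1
        b = [rng.choices(atoms, weights=subj_w)[0] if gi else 0.0 for gi in g]
        gamma.append(g)
        beta.append(b)
    return beta, gamma, atoms

beta, gamma, atoms = sample_spiked_hdp(n_subjects=3, n_voxels=40)
```

Because all subjects draw from the same set of atoms, identical non-zero values can recur both within and across subjects, which is the clustering behavior described above.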

A single fMRI experiment can yield hundreds of thousands of high frequency time series for each subject, arising from spatially distinct locations. Therefore, computational efficiency is essential for the practical relevance of any statistical method. This is particularly true for multi-subject studies. In particular, Bayesian methods face a significant challenge, since typically Markov chain Monte Carlo sampling algorithms are too slow and inefficient for this type of problem. Thus, there is a need for computational methods which approximate the posterior distribution for faster inference. Variational Bayes methods have been employed successfully in Bayesian models for single-subject fMRI data [46,47,48,49,50]. Typically, these approaches provide good estimates of means, although they tend to underestimate posterior variances and also to poorly estimate the correlation structure of the data. In a comparative study on simulated data, Zhang et al. [44] show that a variational Bayes algorithm approximating the posterior distribution of model (1)–(2) achieves robust estimation results at a much reduced computational cost, therefore allowing scalability of their method. Additionally, they demonstrate on synthetic data how their unified, single-stage, multiple-subject modeling approach, with variational Bayes inference, achieves improved estimation performance with respect to two-stage approaches which may be employed to ease the computational burden of multi-subject analyses.

The availability of user-friendly software implementations is also a necessary condition for the general adoption of novel statistical methods by neuroscientists. For example, the model by Zhang et al. [44] has been implemented in a MATLAB GUI (NPBayesfMRI, [51]), comprising two components, one for model fitting and another one for visualization of the results. Within the model fitting interface, the user can define the type of analysis (voxel-based or whole-brain parcellation into regions of interest, i.e., ROIs) and the model parameters. Users have the option of a pre-defined default setting for all parameters. Alternatively, they can set the parameters according to customized choices, depending on the available prior information. We should also mention Neuroconductor (https://neuroconductor.org/), an open-source R-platform for medical imaging analysis [52]. The platform provides data, methods, and software packages designed to support the analysis of populations of images using the publicly available statistical software R.

5 Dynamic Functional Connectivity

Behavioral and psychiatric disorders have been associated with differences in the brain functional connectivity networks, i.e., the set of interactions that take place between spatially segregated but temporally related regions of the brain [53]. Traditionally, brain network studies have assumed functional connectivity as spatially and temporally stationary, i.e., connectivity patterns are assumed not to change throughout the scan period [54]. However, in practice, the interactions among brain regions may vary during an experiment. For example, different tasks, or fatigue, may trigger varying patterns of interactions among different brain regions. Therefore, more recent work has pointed out that it is more appropriate to regard functional connectivity as dynamic over time [55]. Figure 2 provides a pictorial representation of the new paradigm. Current approaches for studying dynamic connectivity typically rely on multi-step procedures for inference, which may comprise the following steps. First, the fMRI time courses are segmented by selecting a sequence of sliding windows. Then, a covariance (or precision) matrix is estimated separately within each window, e.g., by using graphical Lasso. Finally, k-means clustering methods are used to identify re-occurring patterns of functional connectivity states [56]. Differences between states are assessed by computing and comparing descriptive graph metrics that capture structural properties of the networks, such as their clustering coefficient and efficiency. Arguably, those approaches are straightforward but present some major limitations. For example, the length of the window is arbitrarily selected before the analysis, through a trial-and-error process. This trial-and-error process can potentially lead to an increased number of false positive and false negative detections in the estimation of the networks, and ultimately affects the reproducibility of the findings. Indeed, Lindquist et al. in [57] show that the choice of the window length can affect inference in unpredictable ways. To partially obviate the issue, Cribben et al. in [58] and Xu and Lindquist in [59] have recently investigated greedy algorithms, which automatically detect change points in the dynamics of the functional networks. Their approach recursively estimates precision matrices using GLasso on finer partitions of the time course of the experiment, and selects the best resulting model based on the Bayesian Information Criterion (BIC). The algorithm estimates independent brain networks over noncontiguous time blocks. This is not desirable, as it may be preferable to borrow strength across similar connectivity states in order to increase the accuracy of the estimation. Another issue is related to greedy searches, which often fail to achieve global optima.
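The multi-step sliding-window procedure described above can be sketched as follows. This is a simplified illustration under stated assumptions: plain sample correlations replace the graphical Lasso estimates, k-means is a bare Lloyd's algorithm with a deterministic initialization, and the window width, step size and toy data are hypothetical.

```python
import numpy as np

def sliding_window_correlations(Y, width, step):
    """Y is a (T, V) array of time courses; returns one vectorised correlation matrix per window."""
    T, V = Y.shape
    iu = np.triu_indices(V, k=1)  # keep each pair of regions once
    feats = []
    for start in range(0, T - width + 1, step):
        corr = np.corrcoef(Y[start:start + width], rowvar=False)
        feats.append(corr[iu])
    return np.array(feats)

def kmeans(X, k, n_iter=50):
    """Plain Lloyd's algorithm, initialised from k evenly spaced rows (an assumption)."""
    idx = np.linspace(0, len(X) - 1, k).astype(int)
    centroids = X[idx].copy()
    for _ in range(n_iter):
        dists = ((X[:, None, :] - centroids[None, :, :]) ** 2).sum(axis=2)
        labels = dists.argmin(axis=1)
        for j in range(k):
            if np.any(labels == j):  # leave an empty cluster's centroid unchanged
                centroids[j] = X[labels == j].mean(axis=0)
    return labels, centroids

# Toy data with two connectivity states on V = 4 regions
rng = np.random.default_rng(1)
T, V = 200, 4
Y = rng.standard_normal((T, V))
Y[:100, 1] = Y[:100, 0] + 0.1 * rng.standard_normal(100)  # state A: regions 0 and 1 coupled
Y[100:, 3] = Y[100:, 2] + 0.1 * rng.standard_normal(100)  # state B: regions 2 and 3 coupled

features = sliding_window_correlations(Y, width=30, step=10)
labels, _ = kmeans(features, k=2)
```

Changing `width` alters which windows straddle the true change point at t = 100, which gives a small-scale view of why the window length matters for the resulting state assignments.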

Fig. 2 Dynamic functional connectivity assumes that the functional connectivity networks may change over time

Chiang et al. in [60] investigate the stationarity of the brain network topology, as captured by graph-theoretic measures of functional connectivity networks. The aim of their study is to identify which aspects of network topology exhibit less within-scan temporal variability in resting-state networks, with the objective of evaluating which graph theory metrics may be robustly estimated using static functional connectivity analyses. In particular, they argue that some aspects of brain topology, such as the level of small-worldness, may exhibit greater temporal stationarity, whereas others, such as local measures, may be more susceptible to local dynamics and more likely to traverse multiple configurations. They use a Bayesian hidden Markov model to estimate the transition probabilities of various graph theoretical network measures using resting-state fMRI (rs-fMRI) data and to investigate the stationarity of different graph theory measures. They further propose two estimators of temporal stationarity, which can be used to assess different aspects of the temporal stationarity of functional networks: a deterministically-based estimator of the number of change-points, and a probabilistically-based estimator that takes into account stochastic variation in the estimated states. They show that small-world index, global integration measures, and betweenness centrality exhibit greater temporal stationarity than network measures of local segregation. This may reflect the organization of the resting-state brain, in which the small-world architecture of the brain is thought to have evolved in order to create systems that support efficiency in both local and global processing. 
Since long-range connections are generally thought to ensure the interaction between distant neuronal clusters, a large component of fluctuations between neuronal clusters (e.g., long-range connections) may therefore occur downstream to fluctuations within neuronal clusters (e.g., local connections), resulting in slightly greater temporal stationarity among global relative to local connections. On the other hand, connectivity within local subgraphs may be more susceptible to local cell dynamics and likely to fluctuate over time.

The chapter by Crispino et al. in this volume discusses a penalized likelihood approach to estimate time-varying Bayesian networks, based on a first-order Markovian assumption to model the connectivity dynamics. The strength of the interaction between two brain regions is a function of how often two regions are connected by an edge at different time points.

Warnick et al. in [61] propose a principled, fully Bayesian approach for studying dynamic functional network connectivity, that avoids arbitrary partitions of the data in sliding windows. More specifically, they cast the problem of inferring time-varying functional networks as a problem of dynamic model selection in the Bayesian setting. As we have previously discussed, brain networks can be mathematically described as graphs. A graph \(G = (\mathscr {V}, \mathscr {E})\) specifies a set of nodes (or vertices) \(\mathscr {V} = \{1,2,\ldots ,V\}\) and a set of edges \(\mathscr {E}\subset \mathscr {V} \times \mathscr {V}\). Here, the nodes represent the neuronal units, whereas the edges represent their interconnections. For example, nodes could be intended as either single voxels or macro-areas of the brain which comprise multiple voxels at once. Let \(\varvec{Y}_{t}=(Y_{t1}, \ldots , Y_{tV})^\top \) be the vector of fMRI BOLD responses of a subject measured on the V nodes at time t, for \(t=1, \ldots , T\). Then, the general linear model (1) can be re-expressed as follows,

$$\begin{aligned} \varvec{Y}_t=\varvec{\mu }+ \sum _{k=1}^K \varvec{X}_t^{k}\circ \varvec{\beta }_{k} +\varvec{\varepsilon }_t, \end{aligned}$$
(3)

where \(\circ \) denotes the element-by-element (Hadamard) product, \(\varvec{X}_{t}^{k}\) is the \(V\times 1\) design vector for the k-th stimulus, \(\varvec{\mu }\) the V-dimensional global mean and \(\varvec{\beta }_{k}=(\beta _{1 k}, \ldots , \beta _{V k})^\top \) the stimulus-specific V-dimensional vector of regression coefficients. A spike-and-slab prior is imposed on the coefficients \(\beta _{v k}\) to identify brain activations and allow decoupling of the task-related activations from the functional connectivity states. To characterize possibly distinct connectivity states, i.e., network structures, within different time blocks, Warnick et al. (2018) assume that functional connectivity may fluctuate among \(S>1\) different states during the course of the experiment. Let \(\varvec{s} = (s_1,\ldots ,s_T)^\top \), with \(s_t=s,\) for \(s\in \{1, \ldots , S\}\), denoting the connectivity state at time t. Then, conditionally upon \(s_t\), they assume

$$\begin{aligned} (\varvec{\varepsilon }_t|s_t=s)\sim N_V(0,\varvec{\varOmega }_s^{-1}), \end{aligned}$$
(4)

where \(\varvec{\varOmega }_s\in \mathbb {R}^{V \times V}\) is a symmetric positive definite precision matrix, i.e., \(\varvec{\varOmega }_s=\varvec{\varSigma }^{-1}_s\), with \(\varvec{\varSigma }_s\) the covariance matrix. The zero elements in \(\varvec{\varOmega }_s\) encode the conditional independence relationships that characterize state s, that is graph \(G_s = (\mathscr {V}, \mathscr {E}_s)\). Specifically, \(\omega _{ij}^{(s)}=0\) if and only if edge \((i, j)\notin \mathscr {E}_s\). Many of the estimation techniques for Gaussian graphical models rely on the assumption of sparsity in the precision matrix, which is generally considered realistic for the small-world properties of brain connectivity in fMRI data. Thus, a G-Wishart distribution is considered as a conjugate prior on the space of the precision matrices \(\varvec{\varOmega }\) with zeros specified by the underlying graph G [62, 63]. The estimation of the unknown connectivity states at each of the time points is treated as a problem of change-point detection, by modeling the temporal persistence of the states through a Hidden Markov Model (HMM). The approach is in line with recent evidence in the neuroimaging literature which suggests a state-related dynamic behavior of brain connectivity with recurring temporal blocks driven by distinct brain states [64, 65]. In the model proposed by Warnick et al. (2018), however, the change points of the individual connectivity states are automatically identified on the basis of the observed data, thus avoiding the use of a sliding window. Furthermore, they adapt a recent proposal put forward by Peterson et al. in [66] to conduct inference on the multiple related connectivity networks. The model formulation assumes that the connectivity states active at the individual time points may be related within a super-graph and imposes a sparsity-inducing Markov Random Field (MRF) prior on the presence of the edges in the super-graph. 
Thus, the estimation of the active networks between two change points is obtained by borrowing strength across related networks over the entire time course of the experiment, also avoiding the use of post-hoc clustering algorithms for estimating shared covariance structures.
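The generative structure behind (4) can be illustrated with a small simulation. The sketch below is not the inference algorithm of [61]; it only shows how a hidden Markov chain over states and state-specific sparse precision matrices jointly generate the errors, and how the graph of each state is read off the zero pattern of its precision matrix. The two precision matrices, the transition matrix and the number of time points are hypothetical values chosen for illustration.

```python
import numpy as np

def edges_from_precision(omega, tol=1e-10):
    """Edge (i, j) is present iff the off-diagonal entry omega[i, j] is non-zero."""
    V = omega.shape[0]
    return {(i, j) for i in range(V) for j in range(i + 1, V) if abs(omega[i, j]) > tol}

def simulate_states(transition, T, rng):
    """First-order Markov chain over connectivity states, started in state 0."""
    s = np.zeros(T, dtype=int)
    for t in range(1, T):
        s[t] = rng.choice(len(transition), p=transition[s[t - 1]])
    return s

# Two hypothetical states on V = 3 nodes, each defined by a sparse precision matrix
omega = [
    np.array([[2.0, 0.8, 0.0], [0.8, 2.0, 0.0], [0.0, 0.0, 1.0]]),    # edge 0-1 only
    np.array([[1.0, 0.0, 0.0], [0.0, 2.0, -0.8], [0.0, -0.8, 2.0]]),  # edge 1-2 only
]
transition = np.array([[0.95, 0.05], [0.05, 0.95]])  # large diagonal: persistent states

rng = np.random.default_rng(0)
states = simulate_states(transition, T=200, rng=rng)

# Given s_t = s, the error is Gaussian with covariance equal to the inverse of omega[s]
covs = [np.linalg.inv(o) for o in omega]
eps = np.array([rng.multivariate_normal(np.zeros(3), covs[s]) for s in states])
```

In the Bayesian model both `states` and the graphs are unknown and inferred jointly, so the change points emerge from the posterior on the state sequence instead of from a pre-specified window.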

6 Combining Information from Multiple Data Sources

The term “big data” is often employed to indicate the high-dimensionality and the complexity of data captured by modern technologies. With this meaning, brain imaging data can be regarded as inherently “big”. However, in Sect. 2, we have described how each neuroimaging technology is able to capture only specific characteristics of brain processes. Therefore, each single technology is also inherently limited in its ability to shed light on relevant brain mechanisms. Multi-modal analysis combines different neuroimaging modalities, and possibly information from different data platforms, to achieve a more comprehensive understanding of brain functioning. In this section, we review some recent interesting trends and contributions in this area.

6.1 Covariate-Dependent Analysis and Predictive Modeling

It is often of interest to study how imaging-based inferences vary depending on known covariates or risk factors, and to make predictions on a clinical or behavioral response based on the estimated individual’s brain activity.

For example, the chapter by Aliverti et al. in this volume (p. 23) proposes a sequential hierarchical approach, which starts by using a penalized GLasso approach to estimate functional connectivity. Then the connection probabilities are modeled through a latent logit regression involving both phenotypical and brain-region information. The covariates include the age of the subject, an indicator of mental health diagnosis, and another indicator of shared lobe membership for each pair of edges.

As an example of a modeling approach aimed at improving clinical prediction, we refer to Chiang et al. in [67]. They consider positron emission tomography (PET) imaging data from a study on temporal lobe epilepsy (TLE), the most common form of adult epilepsy and the most common epilepsy refractory to anti-epileptic drugs. PET imaging is a well-developed technique in which the subject is injected with a positron-emitting isotope, such as \(^{18}\)F-FDG, and an image of the isotope concentration is reconstructed based on the incidence of gamma rays from the positron-electron annihilation. In PET studies, the quantity that is clinically assessed is a scalar rate of regional glucose uptake. This quantity is then normalized relative to an internal reference standard, such as the whole-brain activity, and compared to the expected level for a normal subject. The assessed quantity therefore provides a measure of the level of metabolic activity in each region, relative to that expected in healthy controls. Uptake levels may be quantified at the single-pixel level or based on the mean uptake within fixed regions of interest. Chiang et al. (2017) develop a Bayesian predictive modeling framework to identify whole-brain biomarkers from PET imaging that are predictive of post-surgical seizure recurrence following anterior temporal lobe resection. Post-surgical seizure recurrence is often due to the incomplete resection of the epileptogenic zone, which is defined as the area of cortex necessary and sufficient for initiating seizures, and whose removal is necessary for seizure abolition. Indeed, patients with different epileptogenic zone configurations may be expected to exhibit different risks of post-surgical seizure recurrence. The epileptogenic zone, however, cannot be identified pre-operatively. In their model formulation, Chiang et al. (2017) take this into account by looking at the observed PET brain measurements as the phenotypic manifestation of latent individual pathological states that are assumed to vary across the population. More precisely, the joint distribution of the data is factored into the product of two conditionally independent submodels, an outcome model that relates the post-surgical outcome to the latent states, and a measurement model that relates those latent states to the observed brain measurements. For the latter, they employ mixture models for clustering and variable selection priors that capture spatial correlation among neighboring brain regions. Thus, subjects are clustered into subgroups with different latent states, i.e., different epileptogenic zone configurations, while simultaneously identifying discriminatory brain regions that characterize the subgroups. A logistic regression model relates the latent states to the binary clinical outcome. Alternative predictive modeling approaches for neuroimaging include the use of pattern recognition techniques, such as Linear Discriminant Analysis [68], Support Vector Machines [69, 70] and Bayesian classifiers [71, 72]. We refer to the review in [73] for a discussion of Bayesian methods for classification and prediction.

6.2 Multi-modal Imaging Analysis

Multi-modal imaging refers to imaging performed using different instrumentation platforms, although a given modality may also provide multiple types of imaging outcomes. The objective is to obtain a more accurate understanding of brain processes by combining two or more datasets obtained with different instruments. For example, in the study of epilepsy, simultaneous acquisition of EEG and fMRI has been employed to improve the spatio-temporal resolution of either data with the aim of localizing epileptic foci [74]. Statistical models for multi-modal analysis are necessarily integrative. In particular, Bayesian methods are well suited for the analysis of multi-modal data, due to their ability to integrate the data into a hierarchical model. We refer to the reviews in [75, 76] for a discussion of general strategies for multi-modal analysis and to [73] for a review of Bayesian methods. Jorge et al. in [77] present a review of the most relevant EEG-fMRI integration approaches for the study of human brain function.

For example, Kalus et al. in [78] use EEG-informed spatial priors in their Bayesian variable selection approach to detect brain activation from fMRI data. Specifically, they relate the prior activation probabilities to a latent predictor stage \(\varvec{\zeta }=(\zeta _1, \ldots , \zeta _V)^\top \) via a probit link \(p(\gamma _v=1)=\varPhi (\zeta _v)\), with \(\varPhi \) the standard normal cdf and \(\zeta _v\) consisting of an intercept term and an EEG effect, that is

$$\begin{aligned} \zeta _v=\zeta _{0, v}+\zeta _{EEG, v}= {\left\{ \begin{array}{ll} \varsigma _{0,v}, &{} \text {if }\text {predictor 0} \\ \varsigma _{0,v}+\varsigma _GJ_v, &{} \text {if }\text {predictor } {glob},\\ \varsigma _{0,v}+\varsigma _vJ_v, &{}\text {if }\text {predictor } {flex} \end{array}\right. } \end{aligned}$$
(5)

where \(J_v, v=1, \ldots , V\) is the continuous spatial EEG information and where 0, glob and flex indicate three types of predictors: predictor 0 contains a spatially-varying intercept \(\varvec{\varsigma }_0=(\varsigma _{0,1}, \ldots , \varsigma _{0,V})^\top \), and corresponds to an fMRI activation detection scheme without incorporating EEG information; predictor glob contains a global EEG effect \(\varsigma _G\) in addition to the intercept; predictor flex contains a spatially-varying EEG effect \(\varvec{\varsigma }=(\varsigma _1, \ldots , \varsigma _V)^\top \).
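The three predictors in (5) differ only in how the EEG information enters the probit link, which a few lines of code make explicit. The function below is a minimal sketch: the argument names, the numerical values of the intercepts and the EEG values `J` are hypothetical, and no claim is made about how [78] estimate or assign priors to these quantities.

```python
import math

def std_normal_cdf(z):
    """Standard normal cdf, Phi(z)."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))

def prior_activation_prob(intercept, J, predictor="0", global_effect=0.0, local_effects=None):
    """Return p(gamma_v = 1) = Phi(zeta_v) for every voxel v under one of the three predictors."""
    probs = []
    for v, (c0, j) in enumerate(zip(intercept, J)):
        if predictor == "0":       # intercept only, no EEG information
            zeta = c0
        elif predictor == "glob":  # one EEG effect shared by all voxels
            zeta = c0 + global_effect * j
        elif predictor == "flex":  # voxel-specific EEG effects
            zeta = c0 + local_effects[v] * j
        else:
            raise ValueError(f"unknown predictor: {predictor}")
        probs.append(std_normal_cdf(zeta))
    return probs

# Hypothetical values for three voxels
intercept = [-1.0, -1.0, -1.0]
J = [0.0, 0.5, 2.0]
p_zero = prior_activation_prob(intercept, J, "0")
p_glob = prior_activation_prob(intercept, J, "glob", global_effect=1.0)
p_flex = prior_activation_prob(intercept, J, "flex", local_effects=[1.0, 0.0, 0.5])
```

With a positive EEG effect, voxels with larger `J` receive a higher prior probability of activation, while a voxel with `J = 0` keeps the same prior probability under all three predictors.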

An interesting avenue of research is the development of methods for the integration of fMRI and structural imaging data. Here, we mention a recent proposal by Chiang et al. in [79], where the authors develop a multi-subject multi-modal vector autoregressive (VAR) modeling approach for inference on effective connectivity based on resting-state functional MRI data. In more detail, their method uses Bayesian variable selection techniques to allow for simultaneous inference on effective connectivity at both the subject- and group-level. Furthermore, it accounts for multi-modal data by integrating structural imaging information into the prior model, encouraging effective connectivity between structurally connected regions.

6.3 Imaging Genetics

Recent developments in molecular genetics have lowered the cost of individual genetic profiling, creating the opportunity to collect massive amounts of genetic information and neuroimaging data on the same subjects. Thus, the field of imaging genetics has emerged as a promising approach for investigating the genetic determinants of brain processes and related behaviors or psychiatric conditions. Ultimately, the objective is to identify specific brain activity features and genetic variants that can be used as biomarkers to assist medical decision making. However, the high-dimensionality and complexity of the data add challenges to statistical analysis. On one hand, there is a problem of variable selection and multiple decision testing, due to the large number of variables’ calls and the necessity to identify a sparse set of relevant fMRI features or genetic covariates. On the other hand, naive multi-step multivariate approaches may lead to results that are difficult to interpret, especially if existing biological information is not incorporated at some stage of the analysis.

Nathoo et al. in [80] provide a comprehensive review of recent statistical approaches for the joint analysis of high-dimensional imaging and genetic data, with particular consideration for approaches proposed within the frequentist paradigm. In particular, they distinguish massive univariate and voxel-wise approaches, where the spatial association among separate brain regions is not explicitly modeled, from more sophisticated multivariate approaches, either through regression techniques or low rank regression, mixture models, and group sparse multi-modal regression.

In the Bayesian literature, Stingo et al. in [81] have proposed a hierarchical mixture model based on ROI summary measures of BOLD signal intensities measured on schizophrenic patients and healthy subjects. The model incorporates prior knowledge via network models that capture known dependencies among the ROIs. More specifically, let \(\{x_{ij},~i=1,\ldots ,n,~j=1,\ldots ,p\}\) indicate the ROI-based summaries of BOLD signal intensity on a set of p features (the anatomical ROIs) in n subjects. The authors envision that some of the features could discriminate the n subjects into K separate known groups (e.g., schizophrenia cases and healthy controls). Let \(\varvec{\gamma }=(\gamma _1,\ldots ,\gamma _p)^\top \) be a latent binary vector such that \(\gamma _j=1\) if the j-th feature is discriminatory and \(\gamma _j=0\) otherwise. By employing a discriminant analysis framework, they model the data as a mixture model of the general type

$$\begin{aligned} f_{k}(x_{ij}|\gamma _j)=(1-\gamma _j)\,f_{0}(x_{ij}; \theta _{0j}) +\gamma _j\, f(x_{ij}; \theta _{kj}), \quad k=1,\ldots ,K, \end{aligned}$$
(6)

where \(f_{0}(x_{ij};\theta _{0j})\) describes the distribution of the “null” model for the non-discriminatory features, while \(f(x_{ij}; \theta _{kj})\) is the distribution of the measurements on the discriminatory features for subjects in group k. Gaussian distributions are assumed for the mixture components, that is \(f_{0}(x_{ij}; \theta _{0j})=N(0,\sigma _{0j}^{2})\), and \(f(x_{ij}; \theta _{kj})=N(\mu _{kj},\sigma _{kj}^{2})\). A spatial MRF prior that captures available knowledge on connectivity among regions of the brain is employed to select ROIs that discriminate schizophrenic patients from healthy controls:

$$\begin{aligned} P(\gamma _j| \gamma _i, i \in N_j) = \frac{\exp (\gamma _{j} F(\gamma _{j}))}{1+\exp (F(\gamma _{j}))}, \end{aligned}$$
(7)

where \(F(\gamma _{j})= e + f \sum _{i \in N_j} (2 \gamma _i - 1)\) and \(N_j\) is the set of direct neighbors of feature j in the network. The parameter e controls the sparsity of the model, while higher values of f encourage neighboring features to take on the same \(\gamma _{j}\) value. Note that if a feature does not have any neighbor, then its prior distribution reduces to an independent Bernoulli, with parameter \(\exp (e)/[1+\exp (e)]\), a prior often adopted in the Bayesian variable selection literature. The model also allows the group-specific components to depend on selected covariates (e.g., single nucleotide polymorphisms—SNPs) measured on the individual subjects. Let \(\mathbf{Z}_{i}=(Z_{i1}, \ldots , Z_{iR})^\top \) denote the set of available covariates for the i-th individual. The vectors of the means of the discriminating components are modeled as subject-specific parameters

$$\begin{aligned} \varvec{\mu }_{ik(\gamma )}=\varvec{\mu }_{0k(\gamma )}+\varvec{\beta }_{k(\gamma )}^\top \,\mathbf{Z}_i, \quad k=1,\ldots ,K, \end{aligned}$$
(8)

where \(\varvec{\mu }_{0k(\gamma )}\) is a baseline process which captures long-range brain connectivity and \(\varvec{\beta }_{k(\gamma )}\) is an \(R \times p_{\gamma }\) matrix of coefficients describing the effect of the covariates on the observed measurements. This model formulation uses component-specific parameters that determine how covariates, and other relevant spatial characteristics, affect the observed measurements \(\mathbf{x}_{i(\gamma )}\), on the n subjects, given the selected features. In this respect, the classification of the n subjects in K groups is driven by the subjects’ covariates. Different covariates are allowed to affect the individual mixture components, by modeling the \(\varvec{\beta }_{k(\gamma )}\) through spike-and-slab priors. Posterior inference will result in the simultaneous selection of a set of discriminatory ROIs and the relevant SNPs, together with the reconstruction of the correlation structure of the selected regions.
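Two ingredients of this model, the MRF prior (7) and the covariate-dependent means (8), are simple enough to evaluate directly. The sketch below codes both formulas as written; the function names and all numerical values (`e`, `f`, the neighbor configurations, the coefficient matrix and the covariates) are hypothetical and serve only to show how the quantities behave.

```python
import math

def mrf_prior_prob(gamma_j, neighbor_gammas, e, f):
    """P(gamma_j | neighbours) from Eq. (7), with F = e + f * sum(2 * gamma_i - 1)."""
    F = e + f * sum(2 * g - 1 for g in neighbor_gammas)
    return math.exp(gamma_j * F) / (1.0 + math.exp(F))

def subject_means(mu0, beta, z):
    """Eq. (8): mu_ik = mu_0k + beta_k^T Z_i, with beta an R x p matrix given as a list of R rows."""
    p = len(mu0)
    return [mu0[j] + sum(beta[r][j] * z[r] for r in range(len(z))) for j in range(p)]

e, f = -2.0, 0.5  # hypothetical sparsity and smoothing parameters

# A feature without neighbours falls back to an independent Bernoulli prior
p_isolated = mrf_prior_prob(1, [], e, f)
# Selected neighbours raise the prior probability of selection, unselected ones lower it
p_active_nbrs = mrf_prior_prob(1, [1, 1], e, f)
p_inactive_nbrs = mrf_prior_prob(1, [0, 0], e, f)

# Two selected features (p = 2) and two covariates (R = 2) for one subject
mu_i = subject_means(mu0=[0.0, 1.0], beta=[[1.0, 0.0], [0.0, 2.0]], z=[3.0, 4.0])
```

Rows of `beta` that are set to zero by the spike-and-slab prior leave the corresponding covariate without any effect on the means of that mixture component.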

More recently, Greenlaw et al. in [82] have developed a hierarchical Bayesian model with regularizing shrinkage priors, such that the posterior mode corresponds to the estimator proposed by Wang et al. in [83], in order to obtain uncertainty estimates on the regression parameters. Chekouo et al. in [84] have extended the proposal in [81] by developing an integrative Bayesian risk prediction model, which directly links genetic and imaging data with the clinical outcome (e.g., a clinical diagnosis of schizophrenia). The model allows for the identification of a regulatory network between SNPs and ROI intensities, thus exploiting the imaging features as an intermediate phenotype, and further assumes that: (i) genetic factors may affect non-discriminatory brain regions (as endophenotypes); and that (ii) genetic factors may be independently associated with disease status without the mediation of a discriminatory imaging endophenotype. With respect to other approaches, the risk predictive framework allows a direct assessment of the individual probability of being affected by schizophrenia as a function of the observed fMRI and SNP biomarkers, and can also be seen as an extension of recently proposed scalar-on-image regression models to the challenging setting of imaging genetics.

7 Conclusions

The chapters in this volume provide a stimulating outlook over many current trends in the analysis of brain imaging data. Well-thought-out statistical models contribute to a deeper understanding of brain functioning, and its disruption as a consequence of disease. The approaches need to appropriately take into account the physiology of the different neuroimaging experiments. However, the involvement of a large community of statisticians in the analysis of this type of data is relatively recent. The section on Statistics in Imaging of the American Statistical Association was only founded in 2012, with the goal of increasing the influence of statistics and statisticians on imaging science.

All the contributions in this volume show how the use of novel advanced statistical methods could contribute greatly to future developments in neuroimaging. For example, the chapters by Cabassi et al. and by Cappozzo et al. call attention to the possibilities offered by recent developments in object-oriented data analysis in non-Euclidean spaces. The chapter by Bertarelli et al. also proposes functional data approaches for clustering fMRI time-series. The chapter by Gasperoni and Luati uses a modern robust filtering method for detecting spontaneous activations in resting state fMRI time series and thus improving the estimation of the hemodynamic response function. The chapter by Caponera et al. emphasizes the use of established spatio-temporal modeling techniques to appropriately take into account the dependence structure of the data, achieve dimension reduction, and provide an interpretable assessment of functional connectivity across brain regions. The chapter by Aliverti et al. uses a sequential hierarchical approach that leverages multiple available methods in the literature, in order to remove noise from the fMRI signal, estimate the functional brain connectivity networks and investigate the association between phenotypes and functional connectivity patterns. Finally, the chapter by Crispino et al. employs latent space models from network analysis to estimate the structural connectivity information provided by DTI data and examine how structural connectivity may inform patterns of activation captured by functional imaging techniques among regions of interest.

The fast developments in the Neurosciences will keep proposing new challenges to the applied statistician. Multimodal analysis, imaging genetics, and predictive modeling techniques are still in their infancy, in the attempt to identify satisfactory biomarkers for targeted intervention. Novel efficient algorithms may fully exploit the information of existing technologies. For example, fMRI time courses are originally complex-valued signals giving rise to both magnitude and phase data. However, most studies, including all those discussed in this volume, typically use only the magnitude signals and thus irreversibly discard half of the data that could potentially contain important information. Multiple studies show that detectability in low signal-to-noise regions of magnetic resonance images is improved by using the full complex-valued fMRI data. Yu et al. in [85] have recently proposed a Bayesian variable selection approach for detecting brain activation at the voxel level from complex-valued fMRI data, where inference is conducted via a complex-valued extension of the Expectation-Maximization (EM) algorithm for Bayesian variable selection of [86] that allows for fast detection of active voxels in large-dimensional complex-valued fMRI. By considering both the real and imaginary information, their approach is able to detect more true positives and fewer false positives than magnitude-only models, especially when the signal-to-noise ratio is small.

New high-resolution imaging technologies promise to deliver more accurate representations of brain processes. In the last few years, the US NIH Brain Initiative has sponsored multiple grants for developing several next-generation human imaging techniques. For example, investigators at the University of California, Berkeley are now working on MR Corticography (MRCoG), a new tool for studying neuronal circuitry that improves resolution by an order of magnitude, making it possible to visualize cortical layers and microcircuit columns throughout the whole brain. Researchers at Stanford University are developing a novel PET photon detector concept that promises to substantially enhance PET image reconstruction and should permit joint PET-MR (magnetic resonance) imaging. Joint PET-MR collection would allow multi-modal, simultaneous image acquisition of neuron receptor function, functional MR, and high-resolution neuroanatomy. Other technological developments promise to broaden the spectrum of experimental designs available to investigators. Boto et al. in [87] have recently introduced a magnetoencephalography system that can be worn like a helmet, allowing free and natural movement during scanning. The system would make it easier to conduct experiments with subjects who are traditionally difficult to study under a fixed scanner, such as young children with epilepsy or patients affected by Parkinson’s disease. One of the experiments conducted by the investigators to test the new technology even included a simple “ping-pong” ball game in which subjects were asked to bounce a table tennis ball on a bat!

In addition to new technologies, new directions of research will surface. For example, the so-called gut-brain axis has recently been implicated in multiple conditions. The enteric nervous system in our abdomen has been shown to communicate directly with the brain through the vagus nerve, which connects the brain with many of our major organs. For this reason, the enteric nervous system is often referred to as our “second brain”. Feelings of appetite and satiety are mediated through complex pathways in which gut hormones play crucial roles. Understanding the brain-gut mechanisms of appetite and weight control may help identify novel therapeutic interventions. The gut microbiome has also been implicated in the development of irritable bowel syndrome as a consequence of anxiety and stress, as well as in neurological and behavioral disorders such as autism, ADHD, and various mood disorders. Due to the complexity of the data employed in those investigations, advanced statistical models will be necessary to ensure interpretable and reproducible findings for clinical diagnosis and future therapeutic research.