
1 Introduction

Understanding how information flow in the brain supports specific cognitive tasks is a major scientific challenge. In neuroscience, studying brain function entails learning how the brain processes information, and researchers gain such knowledge by characterizing how information flows among different parts of the brain [8, 9, 12]. In most cases, information flow in the brain is stochastic and directional. Directional information flow can be quantified with Granger causality [1, 14] or Transfer Entropy [4, 19]. However, these methods can only capture one-way information flow from one area to another, rather than bidirectional and instantaneous information flow between functional brain regions. Directed information was originally designed to characterize communication channels with feedback; here we apply it to neuroscience to address the aforementioned problems [10]. It is primarily a tool for causal inference and is close to Granger causality in spirit [14]. Granger causality states that a time series X causes a time series Y if the prediction of Y improves when one also conditions on past observations of X. Directed information encapsulates this prediction-improvement property as a reduction in randomness. Compared with other information-theoretic methods applied in neuroscience, DI quantifies not only feed-forward information but also feedback information. The brain is a complex system in which densely connected regions cooperate to perform specific cognitive tasks, so directed information can help us understand how neural information flows among brain regions. This study uses the directed information method to measure information communication among visual scene category neural networks with an fMRI dataset.

According to fMRI studies, several scene-selective regions in the human visual cortex have been discovered and linked to higher-order functions such as scene perception and categorization, including the Primary Visual Cortex (V1), Fusiform Face Area (FFA) [7], Occipital Face Area (OFA), Occipital Place Area (OPA), Parahippocampal Place Area (PPA) [3], and Retrosplenial Cortex (RSC) [13]. Although all of these regions respond well to scenes and objects, little research has addressed how these regions communicate with one another during the experience of natural scenes, in particular the bidirectional information flow among them. That is the primary motivation for this study.

2 Methods

2.1 Definition

Shannon Entropy. Given a random variable X that takes value x with probability \(\mathrm {P}(\mathrm {X}=\mathrm {x})\), the entropy of this variable can be expressed as:

$$\begin{aligned} \mathrm {H}(\mathrm {X})=-\sum _{\mathrm {x}} \mathrm {p}(\mathrm {x}) \log _{2} \mathrm {p}(\mathrm {x})=\left\langle -\log _{2} \mathrm {p}(\mathrm {x})\right\rangle \end{aligned}$$
(1)
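To make the definition concrete, the following minimal Python sketch (our illustration, not part of the study's pipeline; the function name is ours) computes a plug-in estimate of Eq. 1 from discrete samples:

```python
import numpy as np

def shannon_entropy(samples):
    """Plug-in estimate of H(X) in bits from discrete samples (Eq. 1)."""
    _, counts = np.unique(samples, return_counts=True)
    p = counts / counts.sum()          # empirical probabilities p(x)
    return -np.sum(p * np.log2(p))     # -sum_x p(x) log2 p(x)

# Example: a fair coin has 1 bit of entropy.
rng = np.random.default_rng(0)
x = rng.integers(0, 2, size=10_000)
print(shannon_entropy(x))  # ~1.0
```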

Conditional Entropy. The conditional entropy of X given Y is the average uncertainty that remains about x when y is known:

$$\begin{aligned} H_{X \mid Y}=-\sum _{x, y} p(x, y) \log _{2} p(x \mid y)=H_{X, Y}-H_{Y} \end{aligned}$$
(2)

where \(p(x \mid y)=\frac{p(x, y)}{p(y)}\).

Mutual Information. Given two random variables X and Y, the mutual information can be calculated as the difference between the sum of the individual entropies and the entropy of the variables considered jointly as a single system. It can be expressed mathematically as:

$$\begin{aligned} \mathrm {I}(\mathrm {X} ; \mathrm {Y})=\mathrm {H}(\mathrm {X})+\mathrm {H}(\mathrm {Y})-\mathrm {H}(\mathrm {X}, \mathrm {Y}) \end{aligned}$$
(3)
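A corresponding plug-in sketch for Eq. 3 (again our illustration; the helper H generalizes the entropy estimator above to joint samples):

```python
import numpy as np

def H(*cols):
    """Plug-in joint entropy in bits of the given discrete sample columns."""
    arr = np.stack(cols, axis=1)
    _, counts = np.unique(arr, axis=0, return_counts=True)
    p = counts / counts.sum()
    return -np.sum(p * np.log2(p))

def mutual_information(x, y):
    """I(X; Y) = H(X) + H(Y) - H(X, Y)  (Eq. 3)."""
    return H(x) + H(y) - H(x, y)

# Example: Y is X with 10% of bits flipped, so I(X; Y) ≈ 1 - H(0.1) ≈ 0.53 bits.
rng = np.random.default_rng(1)
x = rng.integers(0, 2, size=100_000)
y = np.where(rng.random(x.size) < 0.1, 1 - x, x)
print(mutual_information(x, y))
```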

Conditional Mutual Information. The conditional mutual information of X and Y given Z is the information shared between X and Y that remains when Z is known:

$$\begin{aligned} \begin{aligned} \mathrm {I}(\mathrm {X} ; \mathrm {Y} \mid \mathrm {Z})&=\mathrm {H}(\mathrm {X} \mid \mathrm {Z})+\mathrm {H}(\mathrm {Y} \mid \mathrm {Z})-\mathrm {H}(\mathrm {X}, \mathrm {Y} \mid \mathrm {Z}) \\&=\left\langle \log _{2} \frac{\mathrm {p}(\mathrm {x} \mid \mathrm {y}, \mathrm {z})}{\mathrm {p}(\mathrm {x} \mid \mathrm {z})}\right\rangle \end{aligned} \end{aligned}$$
(4)
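Reusing the joint-entropy helper H from the previous sketch, Eq. 4 can be evaluated via the chain rule (an illustrative identity, not the study's code):

```python
def conditional_mutual_information(x, y, z):
    """I(X; Y | Z) = H(X, Z) + H(Y, Z) - H(X, Y, Z) - H(Z), which is
    equivalent to Eq. 4 after expanding the conditional entropies."""
    return H(x, z) + H(y, z) - H(x, y, z) - H(z)
```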

Granger Causality. The Granger causality (GC) concept was first proposed by Granger in 1969 [5]. The basic idea is that if two signals \(\mathbf {X}\) and \(\mathbf {Y}\) have a causal relationship, then \(\mathbf {Y}\) can be predicted better from the information in \(\mathbf {X}\) together with its own history than from its own history alone. Assume \(\mathbf {X}^{n}=\left[ X_{1}, X_{2}, \ldots , X_{n}\right] \) and \(\mathbf {Y}^{n}=\left[ Y_{1}, Y_{2}, \ldots , Y_{n}\right] \) are two time series. GC analysis can be expressed as an autoregressive (linear prediction) model as follows:

$$\begin{aligned} Y_{i} =\sum _{j=1}^{P} a_{j} Y_{i-j}+e_{i} \end{aligned}$$
(5)
$$\begin{aligned} Y_{i} =\sum _{j=1}^{P}\left[ b_{j} Y_{i-j}+c_{j} X_{i-j}\right] +\tilde{e}_{i} \end{aligned}$$
(6)

where \(e_{i}\) indicates the error in predicting \(Y_{i}\) given only the past values of Y, (\(Y_{i-1}, ..., Y_{i-P}\)), and \(\tilde{e}_{i}\) is the error in predicting \(Y_{i}\) given both the history of Y (\(Y_{i-1}, ..., Y_{i-P}\)) and the previous values of X (\(X_{i-1}, ..., X_{i-P}\)). Owing to the properties described above, GC analysis has gradually been adopted in the neuroscience discipline.
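A minimal sketch of GC estimation under Eqs. 5–6, assuming ordinary least squares for the two autoregressive fits and the standard log-variance-ratio statistic (the study does not specify an implementation; all names here are ours):

```python
import numpy as np

def granger_causality(x, y, p=2):
    """GC from X to Y: ln(var(e) / var(e_tilde)) from the AR models of
    Eqs. 5-6 fit by ordinary least squares. Positive values suggest that
    the history of X improves the prediction of Y."""
    n = len(y)
    # Lagged design matrices: row for time i holds samples (i-1, ..., i-p).
    Y_lags = np.column_stack([y[p - j:n - j] for j in range(1, p + 1)])
    X_lags = np.column_stack([x[p - j:n - j] for j in range(1, p + 1)])
    target = y[p:]
    # Restricted model (Eq. 5): predict Y from its own past only.
    e = target - Y_lags @ np.linalg.lstsq(Y_lags, target, rcond=None)[0]
    # Full model (Eq. 6): predict Y from the past of both Y and X.
    full = np.column_stack([Y_lags, X_lags])
    e_t = target - full @ np.linalg.lstsq(full, target, rcond=None)[0]
    return np.log(np.var(e) / np.var(e_t))

# Example: y is driven by the past of x, so GC(x -> y) >> GC(y -> x).
rng = np.random.default_rng(2)
x = rng.standard_normal(5000)
y = np.zeros(5000)
for i in range(1, 5000):
    y[i] = 0.8 * x[i - 1] + 0.2 * y[i - 1] + 0.1 * rng.standard_normal()
print(granger_causality(x, y), granger_causality(y, x))
```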

Transfer Entropy. Another causal measure widely applied in neuroscience is Transfer Entropy (TE) [17]. TE addresses the question of how prior knowledge of one process affects, or helps predict, the future state of another. TE can be defined as:

$$\begin{aligned} \begin{aligned} TE_{X \rightarrow Y}&=I\left( Y_{t+1} ; X_{t-k: t} \mid Y_{t-l: t}\right) =H\left( Y_{t+1} \mid Y_{t-l: t}\right) -H\left( Y_{t+1} \mid Y_{t-l: t}, X_{t-k: t}\right) \\&=\sum _{y_{t+1}} \sum _{x_{t-k: t}} \sum _{y_{t-l: t}} p\left( y_{t+1}, x_{t-k: t}, y_{t-l: t}\right) \log \frac{p\left( y_{t+1} \mid x_{t-k: t}, y_{t-l: t}\right) }{p\left( y_{t+1} \mid y_{t-l: t}\right) } \end{aligned} \end{aligned}$$
(7)

TE can be used to measure the directed information flow from X \(\rightarrow \) Y or Y \(\rightarrow \) X. The basic theory of TE is shown graphically in Fig. 1. In neuroscience studies, it is widely used to define directed causal effects between neural signals. However, real neural activity is not just unidirectional information flow; neural populations can also interact simultaneously (resonance), that is, X \(\leftrightarrow \) Y. A pitfall of TE is that it cannot measure such bidirectional information flow at the same time. A comprehensive review of TE-based estimation of directed information flow can be found in [16].
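For illustration, a plug-in TE estimator with the shortest histories k = l = 1 can be written as follows (a sketch under our own naming, not the study's code):

```python
import numpy as np

def H(*cols):
    """Plug-in joint entropy in bits (same helper as in the sketches above)."""
    arr = np.stack(cols, axis=1)
    _, counts = np.unique(arr, axis=0, return_counts=True)
    p = counts / counts.sum()
    return -np.sum(p * np.log2(p))

def transfer_entropy(x, y):
    """Plug-in TE_{X -> Y} with history lengths k = l = 1 (Eq. 7):
    H(Y_{t+1} | Y_t) - H(Y_{t+1} | Y_t, X_t), using H(A | B) = H(A, B) - H(B)."""
    y_next, y_past, x_past = y[1:], y[:-1], x[:-1]
    return (H(y_next, y_past) - H(y_past)) - \
           (H(y_next, y_past, x_past) - H(y_past, x_past))

# Example: Y copies X with a one-step delay, so TE(X -> Y) ≈ 1 bit
# while TE(Y -> X) ≈ 0 bits.
rng = np.random.default_rng(3)
x = rng.integers(0, 2, size=100_000)
y = np.roll(x, 1)
print(transfer_entropy(x, y), transfer_entropy(y, x))
```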

Fig. 1.

The diagram of transfer entropy flow from \(X \rightarrow Y\) (X causes Y) and vice versa. \(X_{t}\) and \(Y_{t}\), with \(t=1, \cdots , n\), denote two time series. The time series \(Y_{t}\) is driven not only by the previous history of X, \(X_{t},X_{t-1},X_{t-2}\), and \(X_{t-3}\), but also by its own previous history, \(Y_{t-1},Y_{t-2}\), and \(Y_{t-3}\).

2.2 Directed Information

In this section, we describe directed information from a mathematical point of view. Let uppercase letters X and Y denote random variables, and denote the n-tuple \(\left( X_{1}, X_{2}, \ldots , X_{n}\right) \) by \(X^{n}\). The information flow from \(X^{n}\) to \(Y^{n}\) can be formulated as,

$$\begin{aligned} I\left( X^{n} \rightarrow Y^{n}\right) \triangleq \sum _{i=1}^{n} I\left( X^{i} ; Y_{i} \mid Y^{i-1}\right) =H\left( Y^{n}\right) -H\left( Y^{n} \Vert X^{n}\right) \end{aligned}$$
(8)

where \(H\left( Y^{n} \Vert X^{n}\right) \) is the causally conditional entropy [15], which can be defined as,

$$\begin{aligned} H\left( Y^{n} \Vert X^{n}\right) \triangleq \sum _{i=1}^{n} H\left( Y_{i} \mid Y^{i-1}, X^{i}\right) \end{aligned}$$

For comparison, the mutual information is

$$\begin{aligned} I\left( X^{n} ; Y^{n}\right) =H\left( Y^{n}\right) -H\left( Y^{n} \mid X^{n}\right) \end{aligned}$$

in which the conditional entropy appears in place of the causally conditional entropy used in the directed information. Moreover, directed information is not symmetric, e.g., \(I\left( Y^{n} \rightarrow X^{n}\right) \ne I\left( X^{n} \rightarrow Y^{n}\right) \) in general. Correspondingly, the reverse information flow can be defined as,

$$\begin{aligned} I\left( Y^{n-1} \rightarrow X^{n}\right) =\sum _{i=1}^{n} I\left( Y^{i-1} ; X_{i} \mid X^{i-1}\right) \end{aligned}$$
(9)
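For intuition, a crude order-1 stationary plug-in approximation of the DI rate can be written as below. This is only an illustrative sketch under our own naming; the study itself estimates DI with the CTW algorithm introduced later in this section:

```python
import numpy as np

def H(*cols):
    """Plug-in joint entropy in bits from discrete sample columns."""
    arr = np.stack(cols, axis=1)
    _, counts = np.unique(arr, axis=0, return_counts=True)
    p = counts / counts.sum()
    return -np.sum(p * np.log2(p))

def directed_information_rate(x, y):
    """Order-1 stationary plug-in approximation of (1/n) I(X^n -> Y^n):
    H(Y_i | Y_{i-1}) - H(Y_i | Y_{i-1}, X_{i-1}, X_i).  Unlike TE, the
    conditioning includes the current sample X_i (Eq. 8 uses X^i, not
    X^{i-1}), so instantaneous influence is captured as well."""
    y_i, y_p = y[1:], y[:-1]
    x_i, x_p = x[1:], x[:-1]
    return (H(y_i, y_p) - H(y_p)) - \
           (H(y_i, y_p, x_p, x_i) - H(y_p, x_p, x_i))
```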

Directed information has a number of significant properties, some of which can be found in [1, 15]. For the sake of brevity, we present only two enlightening conservation laws. Massey and Massey [11] established the conservation law,

$$\begin{aligned} I\left( X^{n} ; Y^{n}\right) =I\left( X^{n} \rightarrow Y^{n}\right) +I\left( Y^{n-1} \rightarrow X^{n}\right) \end{aligned}$$
(10)

Equation 10 is particularly enlightening in settings where \(X_{i}\) and \(Y_{i}\) appear alternately, as shown in Fig. 2.

Fig. 2.

The interaction between the \(X^{n}\) and \(Y^{n}\) sequences.
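The conservation law in Eq. 10 can be verified exactly on a toy example. The following sketch (ours) builds a random joint pmf over binary sequences of length n = 2 and checks that the mutual information splits into the forward and reverse directed information:

```python
import itertools
import math
import random

random.seed(0)
outcomes = list(itertools.product([0, 1], repeat=4))   # (x1, x2, y1, y2)
w = [random.random() for _ in outcomes]
pmf = {o: wi / sum(w) for o, wi in zip(outcomes, w)}   # arbitrary joint pmf

def H(idx):
    """Entropy in bits of the coordinates selected by the index list idx."""
    marg = {}
    for o, p in pmf.items():
        key = tuple(o[i] for i in idx)
        marg[key] = marg.get(key, 0.0) + p
    return -sum(p * math.log2(p) for p in marg.values() if p > 0)

X1, X2, Y1, Y2 = 0, 1, 2, 3
# Eq. 8 with n = 2: I(X^2 -> Y^2) = I(X1; Y1) + I(X1, X2; Y2 | Y1)
di = (H([X1]) + H([Y1]) - H([X1, Y1])) + \
     (H([X1, X2, Y1]) + H([Y1, Y2]) - H([Y1]) - H([X1, X2, Y1, Y2]))
# Eq. 9 with n = 2: I(Y^1 -> X^2) = I(Y1; X2 | X1)
rdi = H([X1, Y1]) + H([X1, X2]) - H([X1]) - H([X1, X2, Y1])
# I(X^2; Y^2) = H(X^2) + H(Y^2) - H(X^2, Y^2)
mi = H([X1, X2]) + H([Y1, Y2]) - H([X1, X2, Y1, Y2])
print(abs(mi - (di + rdi)) < 1e-9)   # True: Eq. 10 holds exactly
```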

In some cases \(X^{n}\) and \(Y^{n}\) may occur simultaneously, as in neural networks in the brain. The following is another conservation law, stated in [1], which in such situations may be more insightful than that in Eq. 10. The instantaneous influence can be calculated from the directed information and the reverse directed information, as shown in Fig. 3(c),

$$\begin{aligned} \sum _{i=1}^{n} I\left( X_{i} ; Y_{i} \mid X^{i-1}, Y^{i-1}\right) = I\left( X^{n} \rightarrow Y^{n}\right) - I\left( X^{n-1} \rightarrow Y^{n}\right) \end{aligned}$$
(11)
Fig. 3.

The three possible modes of neural information flow: (a) information flow from A to B, (b) information flow from B to A, and (c) bidirectional information flow between A and B.

If \(\sum _{i=1}^{n} I\left( X_{i} ; Y_{i} \mid X^{i-1}, Y^{i-1}\right) =0\), the two processes have no instantaneous effect on one another. In this study, the Context-Tree Weighting (CTW) algorithm proposed by Willems [18], a powerful data-compression algorithm, was used to estimate the DI flow among visual scene neural networks [6].
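Because CTW operates on discrete sequences, the continuous BOLD signal must first be binarized. A minimal sketch, assuming a per-ROI median threshold (the study's exact binarization rule is not specified here):

```python
import numpy as np

def binarize_bold(ts):
    """Median-threshold binarization of a continuous BOLD time series.
    CTW-style estimators need discrete (here binary) inputs; the exact
    thresholding rule used in this study may differ from this sketch."""
    return (ts > np.median(ts)).astype(np.uint8)

# Hypothetical usage: binarize two ROI time series before DI estimation.
# x_bin = binarize_bold(roi_v1); y_bin = binarize_bold(roi_ppa)
```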

2.3 fMRI Dataset

The public BOLD5000 datasetFootnote 1 [2] was used in this study to estimate bidirectional information flow among visual scene neural networks (see Fig. 4). The fMRI data were obtained from 4 subjects (aged 24 to 27 years) with normal or corrected-to-normal vision, each of whom viewed 5254 images over 15 scanning sessions. The stimulus images were selected from three classical computer vision datasets: the SUN dataset (1000 scene images)Footnote 2, the COCO datasetFootnote 3, and the ImageNet dataset (1916 images)Footnote 4.

Fig. 4.

Natural scene- and object-selective ROIs.

3 Results

In this section, we experimentally compared correlation, mutual information, transfer entropy, and directed information for quantifying information flow in the visual scene neural networks. Figure 5 shows the functional connectivity among visual scene neural networks estimated with the correlation, mutual information, and transfer entropy approaches. We found that mutual information captures more information than correlation, but neither method quantifies the direction of information flow. In Fig. 6, the graphs depict the functional connectivity between visual scene neural networks, with edge color strength representing the connectivity weights, consistent with the functional connectivity matrices in Fig. 5. We found that information flow between the scene-selective areas (e.g., OPA, PPA, RSC) and the object-selective areas (e.g., LOC) plays an important role in visual scene/object categorization and recognition. Nevertheless, we were interested in whether feedback and resonance information flow exist in the high-level visual cortex. In Fig. 7 and Fig. 8, we found results consistent with the correlation and mutual information methods; however, we also observed a previously unknown result: resonance information flow exists in the high-level visual cortex.

Fig. 5.

The functional connectivity matrices between scene/object-selective ROIs were estimated with the correlation, mutual information, and transfer entropy methods.

Fig. 6.

The graphs depict functional connectivity in the high-level visual cortex estimated with correlation (threshold 0.05), mutual information (threshold 0.1), and transfer entropy (arrows indicate the direction of information flow). Edge thickness is proportional to the correlation coefficients, mutual information, and directed transfer entropy, respectively.

Fig. 7.

The left matrix shows pair-wise directed functional connectivity. The middle image shows pair-wise instantaneous influence, and the right image shows pair-wise reverse-directed functional connectivity in the high-level visual cortex. Note the weak reverse-directed information flow in the visual scene and object neural networks.

Fig. 8.

The graphs depict the functional connectivity of the representations from several scene ROIs, corresponding to Fig. 7. The graph on the left depicts pair-wise directed functional connectivity, the middle graph depicts pair-wise reverse-directed functional connectivity, and the right graph depicts pair-wise instantaneous influence in the high-level visual cortex.

4 Discussion and Conclusions

This paper starts from an information-theoretic perspective and quantifies information flow in high-level visual neural networks. It reports directed information, reverse-directed information, mutual information, and resonance information between the brain's left and right regions of interest. This opens a new avenue for investigating what happens when pairwise neural signal entanglement occurs, and it matters for understanding neural signal flow in the brain. However, there are some limitations that we should point out.

First, we obtained directed information through the CTW estimator, which requires binary input data. In other words, we need to convert the BOLD signal into binary values, which means some information is lost when we estimate the functional connectivity between ROIs. Accurate binarization of the BOLD signal is therefore a crucial problem when quantifying directed information flow. Second, we directly used visual scene/object selective ROIs defined via t-statistics. Considering effect size and functional overlap problems, the information flow estimated through information-theoretic methods may not be accurate. In future work, on the one hand, we need to consider how to avoid or solve the problems mentioned above; on the other hand, we can reconstruct natural images from the BOLD signal to confirm the information flow in the high-level visual cortex.

Nevertheless, we still found some interesting results through the estimated directed information. First, we found information flow within the scene-selective areas, e.g., OPA, PPA, and RSC. Second, we found information flow between the scene-selective areas, e.g., RSC, and the object-selective areas, e.g., LOC. Third, we found weak reverse-directed information flow in the high-level visual cortex.

5 Code Availability

The code used to reproduce the results is available from the authors upon request.