1 Introduction

Neuroimaging studies associate autism spectrum disorder (ASD) with local structural and functional brain deficits [1, 2]. Since ASD diagnosis is highly challenging, advanced machine learning-based diagnosis frameworks have been developed [3], most of which leverage functional magnetic resonance imaging (fMRI) and diffusion tensor imaging (DTI) neuroimaging modalities [4,5,6]. Typically, there are two conventional representations of the brain derived from MRI data: (i) intensity images, and (ii) connectivity networks (also called connectomes). To boost the diagnosis accuracy of brain disorders, one would ideally use high-resolution brain images and connectomes. However, high-resolution MRI data are very scarce due to the limited number of high-resolution 7T MRI scanners worldwide. To circumvent this issue, several works focused on designing methods for synthesizing high-resolution images (7T-like MR) from low-resolution images (3T MR) [7]. However, to the best of our knowledge, existing works on predicting high-resolution (HR) data from low-resolution (LR) data overlooked connectomic data, i.e., brain networks. Typically, a brain connectome is the result of a time-consuming MRI data processing pipeline which integrates an image-to-brain-atlas parcellation step, such as Automated Anatomical Labelling (AAL) [8] with 90 anatomical regions of interest (ROIs), defining the resolution (or size) of the constructed brain connectome. To generate brain connectomes at different resolutions or scales, one generally needs to process and register the input MRI to each target MRI atlas space for automatic labelling of brain ROIs. However, the bench-to-bedside image processing pipeline that transforms an MR image into a connectome is time-consuming, particularly when using high-resolution brain images and atlases. Alternatively, one can learn how to directly synthesize a HR brain connectome from a LR brain connectome, thereby alleviating the computational cost of image processing, including the major steps of registration and label propagation, which are highly prone to bias.

Interestingly, works on brain network-to-network prediction are very limited [9], with the exception of the recent work [10], which proposed the first framework for predicting multiple missing target brain networks from a single source brain network. Since multi-view brain networks have different distributions, [10] integrated a domain alignment step to find a shared space onto which source and target networks are projected while maximizing the correlation between their respective distributions. Fundamentally, this work predicts the missing target views from source views by learning how to select the best source training samples in the shared space in terms of (i) their closeness to the testing subject in the LR domain, and (ii) their cross-domain overlap score, based on the number of local neighbors these training samples share across source and target domains. We term the set of selected source training subjects, which are close to the testing subject, the ‘testing neighborhood’ (TN). Next, the missing network in the target domain is predicted by linearly fusing the source training samples selected in the previous step. Although pioneering, this work is limited by its inherently flat training sample selection, which overlooks the hierarchical structure that might be present in the testing neighborhood. To address these challenges, we propose the first framework that predicts a HR brain network from a LR brain network, rooted in a hierarchical multi-layer embedding and alignment of LR and HR testing neighborhoods.

We base our method on a simple hypothesis: if one can identify the best hierarchically embedded representations of the neighborhood of training samples centered around a given testing subject in the LR domain, one can use a weighted average of their corresponding samples in the HR domain to predict the missing testing HR network. To account for the domain shift, where the distributions of the source LR and target HR domains might be misaligned, we first leverage canonical correlation analysis (CCA) to find a coupled LR-HR manifold [11] that nests projected LR and HR networks while maximizing the correlation between their respective distributions. Next, we learn a subject-to-subject similarity matrix using multi-kernel connectomic manifold learning [12], which models the relationships between all training and testing samples in the coupled space. This defines the baseline layer of our HR prediction framework. Next, we propose to hierarchically learn and embed, at each CCA-based domain alignment layer, the LR neighborhood centered at the LR testing sample along with its corresponding HR neighborhood. In the last layer, we identify the training samples that are most similar to the testing subject in the LR domain and have the highest cross-domain scores, and use them for prediction in the HR domain. Both domain alignment and manifold embedding steps are hierarchically implemented across L layers. Ultimately, we use the final aligned and embedded shared subspace to predict the HR network. Specifically, we score the training samples closest to the testing LR network by their hierarchical cross-domain neighborhood overlap. We show that our proposed method achieves a better prediction accuracy in comparison with two baseline methods [10, 12].

Fig. 1.

Pipeline of the proposed hierarchical multi-layer embedding and alignment of low-resolution (LR) and high-resolution (HR) neighborhoods for HR network prediction. (A) Each training subject has a LR network and a HR network. Each network is encoded in a symmetric connectivity matrix, whose upper off-diagonal part is vectorized. We store the training LR feature vectors in a training LR matrix \(\mathbf {D}_{LR}\) and the HR feature vectors in a training HR matrix \(\mathbf {D}_{HR}\). (B) By pairing the training LR matrix \(\mathbf {D}_{LR}\) with its corresponding training HR matrix \(\mathbf {D}_{HR}\), we learn a coupled LR-HR manifold using canonical correlation analysis (CCA) for domain alignment. We then use multi-kernel learning [12] to learn a similarity matrix that models the relationship between training and testing subject embeddings (\(\mathbf {Z}_{LR}^{(0)}\)) in the coupled manifold in layer \(l=0\). We also learn a HR manifold that nests only the embedded (\(\mathbf {Z}_{HR}^{(0)}\)) training subjects in the first layer. In the first layer, we identify the top \(\kappa _0\) training LR samples in the aligned domain with the highest learned similarities to the LR testing sample. This selected training set defines the testing neighborhood, which is hierarchically mapped and aligned in the next layers. (C) We hierarchically learn a LR-HR domain alignment and neighborhood embedding, and select a new set of top \(\kappa _l\) training subjects with the highest learned similarity scores to the embedded testing subject. In the last hierarchical layer (\(l=L\)), we select the hierarchically embedded training LR samples which (i) are most similar to the hierarchically embedded LR testing sample, and (ii) have the highest cross-domain overlap in proximal neighbors. Last, we average the corresponding HR networks of the LR training networks selected in the last layer L to predict the target missing HR network.

2 Proposed Method

In this section, we detail our proposed hierarchical LR and HR domain alignment and testing neighborhood embedding for brain network super-resolution. We denote matrices by boldface capital letters, e.g., \(\mathbf {X}\), and scalars by lowercase letters, e.g., x. We denote the transpose operator and the trace operator as \(\mathbf {X}^T\) and \(tr(\mathbf {X})\), respectively. We illustrate the important steps of the proposed pipeline in Fig. 1.

Feature Extraction. Each brain is represented by two connectivity matrices, one in the LR domain and one in the HR domain (Fig. 1–A). Each element in a matrix captures the relationship between two anatomical regions of interest (ROIs) using a specific metric (e.g., correlation between neural activity or similarity in brain morphology); the numbers of ROIs in the LR and HR networks are denoted \(n_1\) and n, respectively. We then vectorize each connectivity matrix of the \(i^{th}\) subject to define a feature vector \(\mathbf {f}_{LR}^i\) (resp. \(\mathbf {f}_{HR}^i\)) for its LR (resp. HR) brain network, by concatenating the off-diagonal elements in the upper triangular part of the input matrix. Hence, each LR brain network is represented by an \(n_1 \times n_1 \) matrix, whose vectorization produces a feature vector of size \(d_1 = n_1 (n_1-1)/2\). Each HR network is encoded in an \(n \times n \) matrix, which is vectorized into a feature vector of size \(d = n (n-1)/2\). Given N subjects, we use leave-one-out cross-validation and store the \((N-1)\) training LR feature vectors in a training LR matrix \(\mathbf {D}_{LR}\in \mathbb {R}^{(N-1) \times {d_1}}\) and the HR feature vectors in a training HR matrix \(\mathbf {D}_{HR}\in \mathbb {R}^{(N-1) \times {d}}\).
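For concreteness, a minimal sketch of this vectorization step (in Python with NumPy; the 35-ROI example size is illustrative, not a claim about any particular dataset) could read:

```python
import numpy as np

def vectorize_connectome(C):
    """Extract the upper off-diagonal triangle of a symmetric
    connectivity matrix into a 1-D feature vector."""
    n = C.shape[0]
    iu = np.triu_indices(n, k=1)   # indices strictly above the main diagonal
    return C[iu]                   # length n * (n - 1) / 2

# Illustrative example: an LR network with n1 = 35 ROIs yields
# d1 = 35 * 34 / 2 = 595 features.
n1 = 35
C_lr = np.random.rand(n1, n1)
C_lr = (C_lr + C_lr.T) / 2         # symmetrize
f_lr = vectorize_connectome(C_lr)
assert f_lr.shape == (n1 * (n1 - 1) // 2,)
```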

Step 1: CCA-based LR and HR domain alignment. Our main goal is to learn how to predict a HR brain network from a given LR brain network (Fig. 1–B). However, this learning process might be sensitive to the domain fracture issue, where data distributions drawn from different domains are not inherently aligned. To solve this issue, and motivated by the fact that canonical correlation analysis (CCA) is efficient in analyzing and mapping two sets of variables onto a shared aligned space [13, 14], we learn CCA mappings that align LR and HR brain networks in a shared space. Given a training LR matrix \(\mathbf {D}_{LR} \in \mathbb {R}^{(N-1) \times {d_1}}\) comprising \(N-1\) training feature vectors, each of size \({d_1}\), and a training HR matrix \(\mathbf {D}_{HR} \in \mathbb {R}^{(N-1) \times {d}}\), we estimate a LR mapping \(\mathbf {W}_{LR}\) and a HR mapping \(\mathbf {W}_{HR}\) that transform both matrices onto the coupled LR-HR space. This produces LR and HR embeddings (\(\mathbf {Z}_{LR}^{(l)}\) and \(\mathbf {Z}_{HR}^{(l)}\)) in each layer l:

$$\begin{aligned} \mathbf {Z}_{LR}^{(l)}=\mathbf {D}^{(l)}_{LR}\mathbf {W}^{(l)}_{LR} \in \mathbb {R}^{(\kappa _l+1) \times d_{2,l}}\nonumber \\ \mathbf {Z}_{HR}^{(l)}=\mathbf {D}^{(l)}_{HR}\mathbf {W}^{(l)}_{HR} \in \mathbb {R}^{\kappa _l \times d_{2,l}} \end{aligned}$$
(1)

In the testing stage, we use the learned canonical transformation matrices to map the LR feature vector of a testing subject onto the shared space, where we learn how to identify the most similar and trustworthy training LR feature vectors to the testing LR network using LR and HR manifold learning.
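As an illustration only, this alignment and testing-stage projection could be sketched with scikit-learn's CCA as below; all matrix sizes are hypothetical placeholders, and the learned canonical mappings play the role of \(\mathbf {W}_{LR}\) and \(\mathbf {W}_{HR}\):

```python
import numpy as np
from sklearn.cross_decomposition import CCA

# D_lr: (N-1) x d1 training LR feature matrix; D_hr: (N-1) x d training HR matrix.
# Shapes below are illustrative placeholders.
rng = np.random.default_rng(0)
D_lr = rng.standard_normal((185, 595))
D_hr = rng.standard_normal((185, 2415))

cca = CCA(n_components=10)                   # shared-space dimensionality d_{2,l}
Z_lr, Z_hr = cca.fit_transform(D_lr, D_hr)   # coupled LR and HR embeddings

# Testing stage: project the held-out LR feature vector with the same mapping,
# so it lands in the aligned space shared with the training embeddings.
f_test = rng.standard_normal((1, 595))
z_test = cca.transform(f_test)
```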

Step 2: HR and LR manifold learning. Following the domain alignment, we learn pairwise similarities between LR (resp. HR) samples using the embeddings \(\mathbf {Z}_{LR}^{(l)}\) (resp. \(\mathbf {Z}_{HR}^{(l)}\)) in the aligned coupled LR-HR space. Specifically, we leverage the recent work of [12], which proposes SIMLR (single-cell interpretation via multi-kernel learning) to learn a cell-to-cell similarity function from single-cell data. SIMLR first clusters samples into groups to identify subgroups and projects them into a low-dimensional space. It finds the distance metric that best fits the structure of the different groups by combining multiple kernels. The main advantage of SIMLR is its flexibility in adopting multiple kernel representations for computing similarities even when the data have varied statistical characteristics. Using this method, we learn two manifolds: (1) a LR manifold encoded in a similarity matrix \(\mathbf {S}_{LR}\), which integrates all training and testing samples, and (2) a HR manifold encoded in \(\mathbf {S}_{HR}\), modeling the relationships between training HR samples (Fig. 1–B). Each kernel \(\mathbf {K}\) is Gaussian and expressed as follows: \(\mathbf {K}(\mathbf {f}_{LR}^i,\mathbf {f}_{LR}^j) = \frac{1}{\epsilon _{ij} \sqrt{2 \pi }} e^{ - \frac{\Vert \mathbf {f}_{LR}^i - \mathbf {f}_{LR}^j\Vert ^2}{2 \epsilon _{ij}^2} }\), where \(\mathbf {f}_{LR}^i\) and \(\mathbf {f}_{LR}^j\) denote the feature vectors of the i-th and j-th subjects, respectively, and \(\epsilon _{ij}\) is defined as \(\epsilon _{ij} = \sigma (\mu _i + \mu _j)/2\), where \(\sigma \) is a tuning parameter and \(\mu _i = \frac{ \sum _{l \in KNN(\mathbf {f}_{LR}^i)} \Vert \mathbf {f}_{LR}^i - \mathbf {f}_{LR}^l\Vert }{k}\), where \(KNN(\mathbf {f}_{LR}^i)\) represents the top k neighboring subjects of subject i. The learned similarity between a pair of subjects should be small if the distance between them is large.

$$\begin{aligned} \mathbf {S}_{LR}^{(l)}=SIMLR(\mathbf Z^{(l)}_{LR}) \in \mathbb {R}^{(\kappa _l+1) \times (\kappa _l+1)}\nonumber \\ \mathbf {S}_{HR}^{(l)}=SIMLR(\mathbf Z^{(l)}_{HR}) \in \mathbb {R}^{\kappa _l \times \kappa _l} \end{aligned}$$
(2)

For simplicity, in the following sections we will abstract away the internal structure of the SIMLR module in Eq. 2 and use \(\mathbf S_{LR}^{(l)}\) and \(\mathbf S_{HR}^{(l)}\) to denote the outputs of an arbitrary SIMLR module learning the similarities in the aligned LR and HR domains, respectively.
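To make the kernel construction concrete, the following sketch implements a single Gaussian kernel with the adaptive bandwidth \(\epsilon _{ij}\) defined above, and combines a few such kernels uniformly. This is a simplified stand-in only: the actual SIMLR module learns the kernel weights jointly with the similarity matrix [12], and the \(\sigma \) grid and sizes here are illustrative.

```python
import numpy as np

def gaussian_kernel(F, sigma=1.0, k=10):
    """One Gaussian kernel over feature vectors F (rows = subjects) with the
    adaptive bandwidth eps_ij = sigma * (mu_i + mu_j) / 2 defined in the text."""
    # Pairwise Euclidean distances between all subjects.
    D = np.linalg.norm(F[:, None, :] - F[None, :, :], axis=-1)
    # mu_i: mean distance from subject i to its k nearest neighbors
    # (column 0 of the sorted row is the zero self-distance, so skip it).
    mu = np.sort(D, axis=1)[:, 1:k + 1].mean(axis=1)
    eps = sigma * (mu[:, None] + mu[None, :]) / 2.0
    return np.exp(-D**2 / (2.0 * eps**2)) / (eps * np.sqrt(2.0 * np.pi))

# Uniform combination of kernels over a small sigma grid, as a crude
# surrogate for SIMLR's learned multi-kernel weighting.
F = np.random.rand(50, 595)
K = np.mean([gaussian_kernel(F, sigma=s) for s in (1.0, 1.5, 2.0)], axis=0)
```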

Step 3: HR network prediction via hierarchical alignment and embedding of LR and HR testing neighborhoods. Our proposed HR prediction framework hierarchically learns a finer LR testing neighborhood embedding along with its corresponding HR neighborhood. The key idea is to learn the most similar top \(\kappa _l\) subjects to the testing subject in the LR domain at layer l, using the embeddings of the neighbors generated in the previous layer \(l-1\). Next, both learned LR and HR neighborhood embeddings are aligned using CCA (Fig. 1–C). Given the baseline learned similarity matrix \(\mathbf {S}_{LR}^{(0)}\) at layer \(l=0\), we detail below how the hierarchical alignment and embedding modules operate and how this process is iterated from layer l to layer \(l+1\).

\(\bullet \) Hierarchical alignment and embedding module. At baseline layer \(l=0\), we leverage multiple kernel manifold learning [12] (Step 2) to learn (i) the similarity matrix \(\mathbf {S}_{LR}^{(0)} \in \mathbb {R}^{N \times N}\) between the testing LR network and all training LR networks, and (ii) the similarity matrix \(\mathbf {S}_{HR}^{(0)} \in \mathbb {R}^{(N-1) \times (N-1)}\) between all training HR networks.

\(\mathbf {S}^{(l)}_{LR} \in \mathbb {R}^{(\kappa _l+1) \times (\kappa _l+1) }\) denotes the similarity matrix of the embedded LR samples most similar to the embedded testing subject, learned by SIMLR (i.e., the embedded testing neighborhood), and \(\mathbf {S}^{(l)}_{HR}\in \mathbb {R}^{\kappa _l \times \kappa _l}\) that of the corresponding embedded HR samples in layer l. \(\kappa _l\) denotes the size of the embedded neighborhood in layer l.

Suppose that \(\mathbf {S}^{(l)}_{LR}\) and \(\mathbf {S}^{(l)}_{HR}\) have already been computed, i.e., that we have computed the similarity matrices in the l-th layer of our model. Given this input, the hierarchical layer \(l+1\) first generates new LR and HR embeddings \(\mathbf Z_{LR}^{(l+1)}\) and \(\mathbf Z_{HR}^{(l+1)}\), then new similarity matrices. In particular, we alternately apply the two following equations:

$$\begin{aligned} \mathbf Z_{LR}^{(l+1)}=CCA^{(l)}(\mathbf S^{(l)}_{LR}) \in \mathbb {R}^{(\kappa _l+1) \times d_{2,l}}\nonumber \\ \mathbf Z_{HR}^{(l+1)}=CCA^{(l)}(\mathbf S^{(l)}_{HR}) \in \mathbb {R}^{\kappa _l \times d_{2,l}} \end{aligned}$$
(3)
$$\begin{aligned} \mathbf S_{LR}^{(l+1)}=SIMLR^{(l)}(\mathbf Z_{LR}^{(l+1)}) \in \mathbb {R}^{{(\kappa _l+1)} \times {(\kappa _l+1)}}\nonumber \\ \mathbf S_{HR}^{(l+1)}=SIMLR^{(l)}(\mathbf Z_{HR}^{(l+1)}) \in \mathbb {R}^{{\kappa _l} \times {\kappa _l}} \end{aligned}$$
(4)
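A compact sketch of this alternation is given below. Here `simlr_similarity` is a hypothetical placeholder for the SIMLR module of Eq. 4 (any learner returning a square subject-to-subject similarity matrix would slot in), the CCA of Eq. 3 is approximated by scikit-learn's implementation, and all shapes and parameter values are illustrative:

```python
import numpy as np
from sklearn.cross_decomposition import CCA

def simlr_similarity(Z):
    """Hypothetical stand-in for SIMLR: a single distance-based similarity."""
    D = np.linalg.norm(Z[:, None, :] - Z[None, :, :], axis=-1)
    return np.exp(-(D / (D.mean() + 1e-8))**2)

def hierarchical_neighborhood(D_lr, D_hr, f_test, kappas=(50, 25, 5), n_comp=5):
    """Shrink the LR testing neighborhood layer by layer, alternating
    CCA alignment (Eq. 3) with similarity learning (Eq. 4)."""
    idx = np.arange(D_lr.shape[0])            # surviving training indices
    for kappa in kappas:
        # Align the current paired LR/HR neighborhood.
        cca = CCA(n_components=n_comp).fit(D_lr[idx], D_hr[idx])
        # Embed the neighborhood plus the testing sample (last row).
        Z_lr = cca.transform(np.vstack([D_lr[idx], f_test]))
        sim = simlr_similarity(Z_lr)
        # Keep the kappa training subjects most similar to the test sample.
        keep = np.argsort(sim[-1, :-1])[::-1][:kappa]
        idx = idx[keep]
    return idx                                 # final layer-L neighborhood
```

The returned indices of the final layer-L neighborhood then feed the trust scoring and weighted averaging of Step 4.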

Step 4: Predicting HR networks using cross-domain shared neighborhood. Finally, in the last layer L, once the most similar LR training neighbors to the testing LR sample with the highest cross-domain scores are identified, we retrieve their corresponding networks in the HR domain, then use a weighted average to predict the target missing HR network. Basically, we define a ‘trust score’ for each training sample that is similar to the testing subject, based on the overlap of their hierarchically embedded neighborhoods in the aligned LR and HR domains, respectively. Following the learning of \(\mathbf {S}_{LR}\) using all samples in the mapped source domain using SIMLR, we identify the top \(\kappa \)-closest training subjects to a given testing subject. Next, for each training sample, we find its nearest neighbors using \(\mathbf {S}_{LR}\) and \(\mathbf {S}_{HR}\), the latter learned in the aligned target domain using only training subjects (Fig. 1).

This is rooted in the assumption that, for a particular training subject which is close to the testing subject in the LR domain, the more neighbors it shares across the embedded LR and HR neighborhoods in the last layer L, the more reliable it is in predicting the HR from the LR network, and thus the more trustworthy it can be considered for the target prediction task. We compute a normalized trust score (TS) for each closest training neighbor of the testing subject by (i) first identifying the list of its top \(\kappa \) closest neighbors \(\mathcal {N}_{LR}\) in \(\mathbf {S}_{LR}\) and \(\mathcal {N}_{HR}\) in \(\mathbf {S}_{HR}\), then (ii) computing the normalized overlap between both lists as \(TS(\kappa ) = \frac{ \vert \mathcal {N}_{LR} \cap \mathcal {N}_{HR} \vert }{\kappa }\). The ultimate \(TS(\kappa )\) score is thus calculated as a soft overlap between \(\mathbf {S}^{(L)}_{LR}\) and \(\mathbf {S}^{(L)}_{HR}\), weighted by \(\mathbf {S}_{LR}\).
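A minimal sketch of the trust-score computation and the final weighted-average prediction, under the assumption that \(\mathbf {S}_{LR}\) and \(\mathbf {S}_{HR}\) are indexed over the same set of training subjects (one plausible reading of the weighting; the helper names are ours), could read:

```python
import numpy as np

def trust_score(S_lr, S_hr, i, kappa):
    """Normalized cross-domain neighborhood overlap TS(kappa) for training
    subject i: fraction of its kappa nearest neighbors (self excluded)
    shared between the LR and HR similarity matrices."""
    nn_lr = set(np.argsort(S_lr[i])[::-1][1:kappa + 1])  # top-kappa in LR
    nn_hr = set(np.argsort(S_hr[i])[::-1][1:kappa + 1])  # top-kappa in HR
    return len(nn_lr & nn_hr) / kappa

def predict_hr(D_hr, S_lr, S_hr, neighbors, kappa=5):
    """Weighted average of the selected training HR networks, with weights
    given by each selected neighbor's trust score."""
    w = np.array([trust_score(S_lr, S_hr, i, kappa) for i in neighbors])
    w = w / (w.sum() + 1e-8)                 # normalize; guard against all-zero
    return w @ D_hr[neighbors]               # predicted HR feature vector
```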

3 Results and Discussion

Connectomic Dataset and Method Parameters. We used leave-one-out cross-validation to evaluate the proposed prediction framework on 186 normal controls (NC) from the public Autism Brain Imaging Data Exchange (ABIDE I) dataset, each with a structural T1-w MR image. We used FreeSurfer [15] to reconstruct both right and left cortical hemispheres for each subject from the T1-w MRI, and then parcellated each cortical hemisphere into 35 cortical regions using the Desikan-Killiany atlas. For each subject, we created cortical morphological brain networks derived from the cortical maximum principal curvature using the technique proposed in [16,17,18]. For SIMLR, we used a nested grid search over the number of clusters c (\(1 \le c \le 5\)) and used 10 kernels. We set the number of layers to \(L=3\) and the number of selected neighbors in each layer to \(\kappa _l \in \{50, 25, 5 \}\), respectively.

LR Data Synthesis via Downsampling HR Brain Connectomes. HR data downsampling or degradation models are frequently used in the image super-resolution literature (e.g., for MR images) for evaluation. For instance, [19] applied a downsampling method to obtain LR images of \(256\,\times \,256\) and \(128\,\times \,128\) resolutions from \(512\,\times \,512\) HR images. Downsampling decreases the number of voxels and causes the loss of image details, thereby creating a lower-resolution image. Similarly, we created LR networks for each subject by computing the mean connectivity value of the HR network within a \( w\,\times \,w \) window, where w denotes the window size (\(w \in \{10,16,20\}\)). Hence, we created three different LR network datasets through this mean-pooling process.
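A minimal sketch of this mean-pooling operation (the \(70 \times 70\) example size is purely illustrative and not a claim about our networks' resolution) could read:

```python
import numpy as np

def mean_pool_connectome(C_hr, w):
    """Downsample an HR connectivity matrix by averaging non-overlapping
    w x w windows, mimicking the LR synthesis described above."""
    n = C_hr.shape[0]
    m = n // w                            # LR resolution (edge leftovers trimmed)
    trimmed = C_hr[:m * w, :m * w]
    return trimmed.reshape(m, w, m, w).mean(axis=(1, 3))

# Illustrative example: a 70 x 70 HR network pooled with w = 10
# yields a 7 x 7 LR network.
C_hr = np.random.rand(70, 70)
C_lr = mean_pool_connectome(C_hr, w=10)
assert C_lr.shape == (7, 7)
```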

Evaluation and Comparison Methods. To evaluate the performance of our hierarchical HR-from-LR prediction framework, we benchmark it against: (1) [12], where we used SIMLR to identify the most similar neighbors to the left-out testing subject in the LR domain without any domain alignment, and (2) the baseline network prediction method integrating both manifold learning and domain alignment proposed by [10]. Figure 2 (A–C) demonstrates that our method achieves the lowest prediction mean absolute error (MAE) in comparison with the baseline methods. Figure 2–D displays the HR networks predicted by the different methods from an input LR network, along with the residual networks, in both the left and right hemispheres for a representative testing subject. Clearly, our method decreases the residual error. However, we would like to point out that training our framework is constrained by the availability of paired LR and HR training networks. In our future work, we will relax this constraint by allowing our high-resolution prediction framework to learn from unpaired training LR and HR brain connectomes.

Fig. 2.

(A–C) Evaluating the prediction performance of our proposed hierarchical alignment and embedding of LR and HR neighborhoods on left and right hemispheric brain networks (LH and RH). We report the mean absolute error (MAE) between ground-truth and predicted HR networks. We benchmark against two methods: (1) [12], where we used SIMLR to identify the most similar neighbors to the left-out testing subject in the LR domain without any domain alignment, and (2) the baseline network prediction method integrating both manifold learning and domain alignment proposed by [10]. (D) Comparison between the ground-truth and predicted HR networks from LR networks (obtained by mean-pooling using a \(w=10\) sized window) of the left hemisphere for a representative testing subject, by our method and comparison methods. We display the residual matrices computed using the element-wise absolute difference between ground-truth and predicted networks. Ground truth: the ground-truth HR network of a testing subject. Prediction: the predicted HR network using our proposed framework.

4 Conclusion

This paper proposes the first work on predicting high-resolution brain networks from low-resolution brain networks, by bridging the low-resolution and high-resolution domains, then hierarchically learning how to create high-order nested neighborhood embeddings to ultimately identify the most reliable training LR samples for the target prediction task. In our future work, we will learn how to predict multi-resolution brain networks from a single low-resolution brain network. We will also evaluate our hierarchical HR prediction framework on larger datasets and use it to predict other types of high-resolution brain networks, including functional and structural connectivity.