
1 Introduction

Accurate diagnosis of brain diseases is critical, as it allows for early intervention and treatment and helps advance neuroscience research. Recently, machine learning-based approaches [9, 26] have made significant strides in identifying brain diseases such as Alzheimer’s disease. However, these methods assume that medical images from different sites follow a homogeneous data distribution and therefore apply an identical model across domain datasets; in other words, a model trained on a source domain is applied directly to the target domain without accounting for domain differences [5]. In real-world applications, inter-domain heterogeneity can challenge the validity of this assumption. Differences in data distribution between domains can arise from variations in scanner protocols, demographic characteristics of the cohorts at each site, and so on. This distribution discrepancy between training and test data, known as domain shift [2], can degrade the performance of models across domains.

To alleviate domain shift, Unsupervised Domain Adaptation (UDA), which transfers knowledge learned from sufficiently labeled source data to unlabeled target data [19], has been widely exploited. Various UDA methodologies [3, 6, 16] for effective domain transfer have recently been proposed. Deep-CORAL [16] minimizes domain shift by aligning the second-order statistics of the source and target distributions without requiring any target labels. DANN [3] is a widely used adversarial-learning-based domain adaptation method in modern medical imaging tasks. AD\(^{2}\)A [6] proposes an attention-guided deep domain adaptation strategy to identify brain disease in multi-site MRI. However, since these methods rely on pixel-level distributional characteristics of samples in the spatial domain, they are sensitive to noise or variations in the input data and limited in their ability to adapt to the large shifts in input distribution that are common in real-world scenarios. More recently, strong UDA performance has been achieved by work [15, 21, 22, 24] that aligns the frequency-domain statistics of the source and target distributions, which effectively relieves the domain shift problem. These frequency-based methods propose a simple image translation strategy that replaces the low-frequency spectrum between the source and target domains. They achieve remarkable performance by transforming images with the (Fast) Fourier Transform (FFT) and its inverse (iFFT) [14] for frequency manipulation and simply training on the transformed images. However, the portion of the low-frequency spectrum to be replaced must be selected manually for optimal performance. In addition, since semantic information of the original image may be lost when the low-frequency spectrum is replaced, overfitting to the fixed high-frequency spectrum can occur [18].

To address the limitations mentioned above, we propose a novel adversarial training network based on a self-adversarial disentanglement and frequency mixup strategy that exploits the full frequency spectrum. In medical imaging, the amplitude spectrum reflects the intensity or brightness of pixels in an image, whereas the phase spectrum captures the local orientation or direction of intensity changes. In the intra-domain adaptation process, we obtain an intensity-shifted source domain by combining the amplitude of an intensity-transformed source image with the phase of the original source image. Our model then learns to extract intensity-invariant representations through a self-adversarial training approach [27] that leverages the intensity-shifted and original source domains. This self-adversarial disentangling method effectively pretrains models that are robust to intensity variations (i.e., the domain shift problem). Building on this model pretrained with the source domain only, the inter-domain adaptation process reconstructs a novel amplitude-mixed target domain by mixing the amplitudes of the source and target domains with the mixup technique [25]. Unlike low-frequency replacement methods, mixing across the full frequency range incorporates low-level statistics of the target domain while effectively preserving those of the source domain. Through domain transfer with the amplitude-mixed target domain, we alleviate the domain shift problem and demonstrate this on a brain disease classification task.

The main contributions of this work are as follows: (1) We propose a novel image-translation-based adversarial training network that uses frequency mixup manipulation to exploit the semantic information of the source and target domains without loss of information in the frequency domain. (2) We show the generalizability of our pretraining step, which performs self-adversarial disentangling using the frequency-manipulated intensity-shifted source domain. (3) Our proposed method consistently outperforms existing UDA methods.

2 Related Work

2.1 Frequency-Based UDA

Unsupervised domain adaptation (UDA) has been explored to transfer knowledge from a sufficiently labeled (source) domain to an unlabeled, unseen (target) domain. Recent studies [22, 24] reveal that a simple alignment of the frequency domain between the source and target distributions can remarkably improve UDA performance. Yang et al. [22] propose Fourier domain adaptation (FDA), which replaces the low-frequency components of the source spectrum with those of the target to resolve the discrepancy between the source and target distributions. Specifically, the frequency replacement yields a reconstructed source image in the target style, reducing the disparity between domains. They show that this simple Fourier transform operation achieves state-of-the-art performance on domain adaptation benchmarks without requiring separate training for domain alignment. Zakazov et al. [24] propose a very light and transparent approach for test-time domain adaptation, whose idea is to substitute the low-frequency components of the target spectrum, which are deemed to reflect the style of an image.
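
For illustration, the following is a minimal NumPy sketch of such low-frequency amplitude replacement, assuming 2D grayscale images and a hypothetical window-size parameter `beta`; it is a sketch of the general idea, not the exact implementation of [22] or [24].

```python
# Minimal sketch of FDA-style low-frequency amplitude replacement (assumptions:
# 2D grayscale inputs, square low-frequency window of relative size `beta`).
import numpy as np

def fda_low_freq_replace(src_img: np.ndarray, trg_img: np.ndarray, beta: float = 0.1) -> np.ndarray:
    """Replace the low-frequency amplitude of the source with that of the target."""
    # Forward FFT; shift the zero-frequency component to the center.
    src_fft = np.fft.fftshift(np.fft.fft2(src_img))
    trg_fft = np.fft.fftshift(np.fft.fft2(trg_img))

    src_amp, src_pha = np.abs(src_fft), np.angle(src_fft)
    trg_amp = np.abs(trg_fft)

    # Central square window of relative size beta holds the low frequencies.
    h, w = src_img.shape
    b = int(np.floor(min(h, w) * beta))
    c_h, c_w = h // 2, w // 2
    src_amp[c_h - b:c_h + b, c_w - b:c_w + b] = trg_amp[c_h - b:c_h + b, c_w - b:c_w + b]

    # Recombine the mixed amplitude with the original source phase.
    mixed = src_amp * np.exp(1j * src_pha)
    return np.real(np.fft.ifft2(np.fft.ifftshift(mixed)))
```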

As such, most frequency-based methodologies reconstruct images by swapping low-frequency components between domains through Fourier transform operations, and they demonstrate improved UDA performance through a simple alignment of low-level statistics between the source and target distributions. However, these methods have difficulty delimiting the low- and high-frequency regions precisely, which makes it hard to manually pinpoint the optimal region for the best performance. To alleviate this problem, we introduce a frequency mixup strategy that exploits the full scale of the frequency spectrum.

2.2 Adversarial Training for Domain-Invariant Features

Adversarial training [1, 3, 7, 17] is a practical approach for learning domain-invariant features by leveraging adversarial learning to minimize the domain discrepancy between different datasets. The basic idea of adversarial training is to train models that generate realistic data samples while simultaneously distinguishing between actual and generated samples. In the context of domain-invariant feature representation, a discriminator attempts to distinguish between source and target domain features, which encourages the feature encoder to learn representations that are not discriminative with respect to domain labels. This minimizes the domain discrepancy, leading to more robust and transferable features. In recent studies, Levi et al. [11] learn a feature representation that is both robust and domain invariant. By using a variant of DANN on the source domain and its corresponding target domain, their method learns a feature representation constrained not to discriminate between source and target examples and thus achieves a more robust representation. Yang et al. [23] propose a novel dual-module network architecture to promote learning of domain-invariant features. Furthermore, they improve performance with a discrepancy loss that measures the discrepancy in prediction results and feature distributions between the two modules.

Fig. 1. Overall framework of our proposed method, which consists of intra-domain adaptation and inter-domain adaptation processes.

3 Proposed Method

Let \(\textbf{X}_{\text {s}}\in \mathbb {R}^{h\times w\times d\times 1}\) denote a three-dimensional structural magnetic resonance image (sMRI) and \(\textbf{Y}_{\text {s}}\) the corresponding category label in the source domain, i.e., \(\mathcal {D}_{\text {s}}=\{(\textbf{X}_{\text {s}}, \textbf{Y}_{\text {s}})\}\). In contrast, the target domain has no category labels, i.e., \(\mathcal {D}_{\text {t}}=\{\textbf{X}_{\text {t}}\}\). The goal of our proposed method is to train a classification model on \(\mathcal {D}_{\text {s}}\) and \(\mathcal {D}_{\text {t}}\) that performs well on unseen target domains. As shown in Fig. 1, our framework comprises intra-domain adaptation for pretraining via frequency manipulation, an attention-based feature encoder, and inter-domain adaptation for domain transfer.

3.1 Intra-domain Adaptation

In the first step of the intra-domain adaptation process, we apply a random noise transformation to the source domain to create an intensity-transformed source domain \(\textbf{X}_{\text {is}}\). From it we obtain an intensity-shifted source domain, which maintains the semantic characteristics of the source domain while containing a different intensity distribution. For this purpose, we use the FFT algorithm [14] to mix the information of the intensity-transformed and original source domains: the amplitude of the intensity-transformed source domain and the phase of the original source domain are combined through the iFFT to synthesize the intensity-shifted source domain. In detail, let \(\textbf{A}\) and \(\textbf{P}\) denote the amplitude and phase components of the FFT \(F\) of an image. They are fed to the iFFT \(F^{-1}\) to generate the reconstructed intensity-shifted source domain \(\mathcal {D}_{\text {IS}}\) as follows:

$$\begin{aligned} \mathcal {D}_{\text {IS}} = F^{-1}(\textbf{A}(F(\textbf{X}_{\text {is}})) \times \textbf{P}(F(\textbf{X}_{\text {s}}))). \end{aligned}$$
(1)
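
A minimal NumPy sketch of Eq. (1) is given below; the additive Gaussian noise used to obtain \(\textbf{X}_{\text {is}}\) is only an illustrative stand-in for the random noise transformation described above, not necessarily the transform we use.

```python
# Sketch of Eq. (1): amplitude of the intensity-transformed source image is
# combined with the phase of the original source image via 3D FFT / iFFT.
import numpy as np

def intensity_shifted_source(x_s: np.ndarray, noise_std: float = 0.1) -> np.ndarray:
    # Intensity-transformed source X_is (hypothetical random noise transform).
    x_is = x_s + np.random.normal(0.0, noise_std, size=x_s.shape)

    # 3D FFT of both versions of the source volume.
    f_is = np.fft.fftn(x_is)
    f_s = np.fft.fftn(x_s)

    # Amplitude from the intensity-transformed image, phase from the original.
    amp = np.abs(f_is)
    pha = np.angle(f_s)

    # Eq. (1): D_IS = F^{-1}(A(F(X_is)) * P(F(X_s))), with P as the unit-phase term.
    return np.real(np.fft.ifftn(amp * np.exp(1j * pha)))
```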

We adopt a label classifier \(\mathcal {C}_{\text {L}}\) for identifying the label of a given image and an intensity discriminator \(\mathcal {C}_{\text {I}}\), which makes the encoder \(\mathcal {E}\) robust to intensity differences. With the cross-entropy loss \(\mathcal {L}_{\text {ce}}\), \(\mathcal {C}_{\text {L}}\) is trained by minimization and \(\mathcal {C}_{\text {I}}\) by maximization through a gradient reversal layer [3]:

$$\begin{aligned} \mathcal {L}_{\text {cls}} = \mathcal {L}_{\text {ce}}(\mathcal {C}_{\text {L}}(\textbf{X}_{\text {s}}), \textbf{Y}_{\text {s}}), \quad \mathcal {L}_{\text {int}} = \mathcal {L}_{\text {ce}}(\mathcal {C}_{\text {I}}(\textbf{X}), \textbf{Y}_{\text {i}}). \end{aligned}$$
(2)
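
The following PyTorch sketch shows how such a gradient reversal layer can be implemented; the class and function names are illustrative assumptions, not our exact implementation.

```python
# Gradient reversal layer [3]: identity in the forward pass, negated (scaled)
# gradient in the backward pass, so the discriminator loss is maximized while
# the rest of the network is trained by ordinary minimization.
import torch
from torch.autograd import Function

class GradReverse(Function):
    @staticmethod
    def forward(ctx, x, alpha: float):
        ctx.alpha = alpha
        return x.view_as(x)

    @staticmethod
    def backward(ctx, grad_output):
        # Reverse and scale the gradient flowing back into the encoder.
        return -ctx.alpha * grad_output, None

def grad_reverse(x: torch.Tensor, alpha: float = 1.0) -> torch.Tensor:
    return GradReverse.apply(x, alpha)
```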

3.2 Attention-Based Feature Encoder

We design a 3D convolutional neural network to extract features of brain MRIs from the source and target domains. The feature encoder \(\mathcal {E}\) includes 10 convolutional layers with \(3 \times 3 \times 3\) kernels, each followed by batch normalization and ReLU. Downsampling is applied after the even-numbered convolution layers for hierarchical feature extraction. Previous studies [12, 13, 20] have demonstrated that brain disorders are highly associated with specific brain regions. Motivated by this, we design an attention module that automatically identifies the brain regions closely related to brain diseases in brain MRIs. As shown in Fig. 1, the mixed feature generated by the last layer of the feature encoder is used as input to the proposed attention module. In the spatial attention module, the outputs of average pooling and max pooling over the mixed feature are concatenated and passed through a convolution layer to generate attention maps \(\textbf{AM}\). Finally, the sigmoid function \(\delta \) is used to compute the attentive score. Mathematically, the spatial attention map \(\textbf{SA}\) is defined as:

$$\begin{aligned} \textbf{SA} = \delta (Conv^{3\times 3\times 3}([\textbf{AM}_{\text {max}},\textbf{AM}_{\text {avg}}])). \end{aligned}$$
(3)
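
A minimal PyTorch sketch of such a 3D spatial attention module is shown below; the channel-wise max/average pooling and the layer names are assumptions derived from the description above, not the authors' exact implementation.

```python
# Spatial attention of Eq. (3): concatenate channel-wise max- and average-pooled
# feature maps, apply a 3x3x3 convolution, and squash with a sigmoid.
import torch
import torch.nn as nn

class SpatialAttention3D(nn.Module):
    def __init__(self):
        super().__init__()
        # Two input channels: max-pooled and average-pooled feature maps.
        self.conv = nn.Conv3d(2, 1, kernel_size=3, padding=1, bias=False)
        self.sigmoid = nn.Sigmoid()

    def forward(self, feat: torch.Tensor) -> torch.Tensor:
        # feat: (N, C, H, W, D) mixed feature from the last encoder layer.
        am_max, _ = torch.max(feat, dim=1, keepdim=True)   # AM_max
        am_avg = torch.mean(feat, dim=1, keepdim=True)     # AM_avg
        am = torch.cat([am_max, am_avg], dim=1)            # concatenation
        return self.sigmoid(self.conv(am))                 # SA, Eq. (3)
```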

To make our model robust to domain differences, we need to maintain attentional consistency between \(\textbf{SA}_{\text {s}}\) and \(\textbf{SA}_{\text {t}}\), the attention maps of the source and target domains, respectively. We design an attention consistency loss to transfer semantic information from the source domain to the target domain. The attention consistency loss, defined as the mean squared difference between \(\textbf{SA}_{\text {s}}\) and \(\textbf{SA}_{\text {t}}\), is given by:

$$\begin{aligned} \mathcal {L}_{\text {att}} = \frac{1}{N\times H\times W\times D}\sum ^{N}_{i=1}\Vert \textbf{SA}_{\text {s}}^{(i)} - \textbf{SA}_{\text {t}}^{(i)}\Vert ^{2}. \end{aligned}$$
(4)
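
A minimal sketch of this loss, assuming the attention maps are stored as batched 5D tensors, is:

```python
# Attention consistency loss of Eq. (4) as a mean squared difference.
import torch

def attention_consistency_loss(sa_s: torch.Tensor, sa_t: torch.Tensor) -> torch.Tensor:
    # sa_s, sa_t: (N, 1, H, W, D) attention maps for source / target batches.
    return torch.mean((sa_s - sa_t) ** 2)
```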

3.3 Inter-domain Adaptation

In the inter-domain adaptation process, we extract the frequency spectra of the source and target domains using the FFT. Inspired by the mixup technique [25], we devise a novel image translation strategy that linearly interpolates between the amplitude spectra of the two domains. The frequency amplitude mixup \(\textbf{FAM}\) is defined as:

$$\begin{aligned} \textbf{FAM} = (1-\lambda )\textbf{A}(F(\textbf{X}_{\text {t}})) + \lambda \textbf{A}(F(\textbf{X}_{\text {s}})), \end{aligned}$$
(5)

where \(\lambda \sim U(0, 1)\) is a mixing coefficient sampled uniformly at random.

The mixed amplitude spectrum is combined with the phase of the target image and fed to iFFT, generating the reconstructed amplitude-mixed target domain \(\mathcal {D}_{\text {MT}}\) as follows:

$$\begin{aligned} \mathcal {D}_{\text {MT}} = F^{-1}(\textbf{FAM} \times \textbf{P}(F(\textbf{X}_{\text {t}}))). \end{aligned}$$
(6)
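
A minimal NumPy sketch of Eqs. (5) and (6) is given below; function and variable names are illustrative.

```python
# Frequency amplitude mixup (Eq. (5)) followed by reconstruction with the
# target phase (Eq. (6)).
import numpy as np

def amplitude_mixed_target(x_s: np.ndarray, x_t: np.ndarray, rng=None) -> np.ndarray:
    rng = np.random.default_rng() if rng is None else rng
    lam = rng.uniform(0.0, 1.0)          # lambda ~ U(0, 1)

    f_s = np.fft.fftn(x_s)
    f_t = np.fft.fftn(x_t)

    # Eq. (5): FAM = (1 - lambda) * A(F(X_t)) + lambda * A(F(X_s)).
    fam = (1.0 - lam) * np.abs(f_t) + lam * np.abs(f_s)

    # Eq. (6): D_MT = F^{-1}(FAM * P(F(X_t))), keeping the target phase.
    return np.real(np.fft.ifftn(fam * np.exp(1j * np.angle(f_t))))
```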

To reduce the domain gap between the source and target domains, a domain classifier \(\mathcal {C}_{\text {D}}\) is designed to distinguish MRI features from different domains, as in the intra-domain adaptation step:

$$\begin{aligned} \mathcal {L}_{\text {dom}} = \mathcal {L}_{\text {ce}}(\mathcal {C}_{\text {D}}(\textbf{X}, \textbf{Y}_{\text {d}})). \end{aligned}$$
(7)

3.4 Objective Function

Although the domain classification loss must be maximized, the overall objective is formulated as a minimization: the domain classifier contains a gradient reversal layer, so during backpropagation its gradient is multiplied by a negative constant, which maximizes that loss in effect. As a result, we jointly minimize the label classification loss \(\mathcal {L}_{\text {cls}}\) and the attention consistency loss \(\mathcal {L}_{\text {att}}\) while maximizing the domain classification loss \(\mathcal {L}_{\text {dom}}\). The overall objective function of our proposed method is defined as follows:

$$\begin{aligned} \mathcal {L}_{\text {total}} = \mathcal {L}_{\text {cls}} + \mathcal {L}_{\text {att}} - \mathcal {L}_{\text {dom}}. \end{aligned}$$
(8)
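
In practice, because the gradient reversal layer handles the sign flip internally, all three terms can simply be summed and minimized. The following self-contained PyTorch sketch illustrates this; the tensors below are hypothetical stand-ins for the classifier outputs and attention maps, not our actual model outputs.

```python
# Sketch of Eq. (8) in practice: sum the losses and minimize; the GRL inside
# the domain classifier reverses the gradient of the domain term.
import torch
import torch.nn as nn

# Hypothetical outputs of the label classifier, attention maps, and domain classifier.
logits = torch.randn(4, 2, requires_grad=True)          # C_L output (AD vs. CN)
labels = torch.randint(0, 2, (4,))
sa_s, sa_t = torch.rand(4, 1, 8, 8, 8), torch.rand(4, 1, 8, 8, 8)
dom_logits = torch.randn(8, 2, requires_grad=True)      # C_D output after the GRL
dom_labels = torch.cat([torch.zeros(4), torch.ones(4)]).long()

ce = nn.CrossEntropyLoss()
loss_cls = ce(logits, labels)                            # L_cls
loss_att = torch.mean((sa_s - sa_t) ** 2)                # L_att
loss_dom = ce(dom_logits, dom_labels)                    # L_dom (sign handled by GRL)

loss_total = loss_cls + loss_att + loss_dom              # Eq. (8) in effect
loss_total.backward()
```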

4 Experiments

4.1 Dataset

ADNI Dataset. We used the Alzheimer’s Disease Neuroimaging Initiative (ADNI) dataset, a public dataset widely used in brain disease research [8]. It consists of ADNI-1, ADNI-2, and ADNI-3, which we treat as different site domains. For independent evaluation, we excluded the subjects of ADNI-1 and ADNI-2 that also belong to ADNI-3. After pruning, ADNI-1 contains 431 subjects with T1-weighted sMRIs, and ADNI-2 and ADNI-3 contain 360 and 398 subjects, respectively.

AIBL Dataset. To demonstrate the effectiveness of domain adaptation on another domain, we additionally used the Australian Imaging Biomarkers and Lifestyle Study of Ageing (AIBL) dataset, which seeks to discover which biomarkers, cognitive characteristics, and health and lifestyle factors determine the development of Alzheimer’s disease. AIBL contains 577 subjects with T1-weighted sMRIs. For each domain, we split the subjects into training and test sets at a ratio of 8:2.

4.2 Implementation

In the intra-domain adaptation process, we pretrained the feature encoder \(\mathcal {E}\) for 50 epochs on the source domain. The pretrained encoder was then fine-tuned for 100 epochs in the inter-domain adaptation process to adapt from the source to the target domain. Model selection was based on the AUC score obtained via simple hold-out validation. In both steps (i.e., the intra-/inter-domain adaptation processes), Adam [10] was used as the optimizer with an initial learning rate of 1e−4 and a batch size of 4.

Table 1. Performance of our proposed method and baseline methods in AD identification (i.e., AD vs. CN classification) in different domain transfer settings.
Table 2. Performance of our proposed method and ablating Fourier frequency manipulation (FFM) in intra-domain adaptation.

4.3 Experiments and Analysis

In the experiments, we compared our proposed network with state-of-the-art UDA methods [4, 6, 16] that are widely used in modern medical imaging tasks. DANN and Deep-CORAL were implemented with the same backbone feature encoder as ours. To evaluate the classification performance from multiple perspectives, we report four metrics: accuracy (ACC), sensitivity (SEN), specificity (SPE), and the area under the ROC curve (AUC). Table 1 reports results for domain adaptation scenarios from a source domain to a target domain. The overall performance of our proposed method is better than that of the other UDA approaches, demonstrating that 1) Fourier frequency manipulation-based self-adversarial disentanglement in intra-domain adaptation and 2) frequency mixup-based domain transfer in inter-domain adaptation can effectively align domain distributions. We also visualized the original domain distribution and the distribution adapted by our method in Fig. 2 to verify the effectiveness of the proposed model in distribution alignment.

Fig. 2. Visualization of (a) the original distribution and (b) the distribution after adaptation by our proposed method for each domain (i.e., ADNI-1, ADNI-2, AIBL).

Table 3. Performance of our proposed method and ablating attention consistency loss (AC).
Table 4. Performance of our proposed method and adopting two image translation strategies in inter-domain adaptation.

4.4 Ablation Analysis

To assess the efficacy of self-adversarial disentanglement using Fourier frequency manipulation, we conducted an ablation experiment with and without Fourier frequency manipulation in the intra-domain adaptation process. As shown in Table 2, using Fourier frequency manipulation to construct the intensity-shifted source domain yields better performance across all evaluation metrics. This reveals that manipulating the frequencies of the source domain with the Fourier transform makes the model robust to intensity differences.

Combining our model with the attention consistency loss strengthens domain-invariant semantic representations, thus enhancing diagnostic performance on unseen target domains. Beyond the consistency loss itself, our attention module helps highlight discriminative regions shared across domains, whereas other methods tend to focus on a single domain. To verify these attention mechanisms, we conducted an ablation experiment with and without the attention consistency loss in the inter-domain adaptation process. From Table 3, we observe that the attention consistency loss is useful in boosting learning performance.

We also compared the proposed frequency mixup strategy with the previous low-frequency replacement method (i.e., FDA [22]) and the vanilla mixup method [25], as shown in Table 4. The FDA method, which replaces the low-frequency spectrum, loses semantic information of the original image and tends to overfit to the fixed high-frequency spectrum [18], yielding poor results across the evaluation metrics. Adopting the vanilla mixup technique performs slightly better than FDA but worse than our proposed method. This ablation study demonstrates the effectiveness of mixing at the frequency level rather than using other mixing strategies.

5 Conclusion

In this paper, we proposed a frequency mixup manipulation-based unsupervised domain adaptation model to alleviate domain shift in brain disease identification. The proposed model comprises two main steps: intra-domain adaptation and inter-domain adaptation. In the intra-domain adaptation step, a pretraining process enhances the intensity-invariant feature extraction capability of the model through self-adversarial disentangling with frequency manipulation-based intensity-shifted domains. In the inter-domain adaptation step, a domain transfer process is performed, in which images reconstructed through frequency mixup are used to train a model that is robust to domain shift. Our experimental results demonstrate that the proposed method outperforms state-of-the-art UDA methods in terms of accuracy and effectiveness.