1 Introduction

1.1 PET Imaging and AD Diagnosis

Alzheimer's Disease (AD) is the most common cause of dementia, affecting the quality of life of many elderly people and their families. Early diagnosis and intervention can significantly slow the progression of the disease and improve quality of life, making it an active area of research. Positron Emission Tomography (PET) is a promising imaging technique for assessing the progression and stage of the disease by monitoring the spread of Tau protein in the form of Neurofibrillary Tangles (NFT) and Amyloid beta (A\(\beta \)) [12, 13, 21]. As a functional imaging technique, PET uses a radioactive tracer injected into the patient and images the distribution of the tracer over the course of minutes or hours. In AD research, PET imaging measures amyloid plaques (AV45) [4, 22] and tau protein aggregates (AV1451) [17, 24], both essential to understanding AD pathology and diagnosis. Compared to AV45-/AV1451-PET, FDG-PET usually helps differentiate AD from other causes of dementia because it characterizes the patterns of glucose metabolism in the brain that are specific to AD [16]. Example T1-weighted Magnetic Resonance (MR) images and AV45-/FDG-PET brain images of CN and AD subjects are shown in Fig. 1.

Fig. 1.

Examples of T1-weighted MRI, AV45-PET, and FDG-PET brain images of cognitively normal (CN), early mild cognitive impairment (EMCI), and Alzheimer's disease (AD) subjects. The differences are much more clearly visible in the PET images than in the MR images, especially for the EMCI case. In the EMCI case, increased accumulation of AV45 in the medial temporal, occipital, and frontal lobes (inferior frontal gyrus shown) is noticeable. The AD case shows reduced brain metabolism in the FDG scan and significant uptake of AV45 compared to both CN and EMCI due to widespread accumulation of Amyloid-\(\beta \). In the T1-weighted MR images, the enlargement of the ventricles from CN to EMCI to AD is visible, though not as clearly as in PET.

While PET plays an important role in AD diagnosis, it can be prohibitive in terms of cost and planning: (1) the short half-life of the radioisotopes requires on-site production in remote regions; (2) no-show patients result in wasted radioisotopes; (3) the length of imaging sessions is determined by the tracer and the use case, so motion artifacts may be unavoidable; and lastly, (4) small variations (\(\sim \)5 min) in the acquisition start time may cause over- or under-estimation of quantitative parameters. PET is also not as widely available as Magnetic Resonance Imaging (MRI).

1.2 Synthesizing PET Images from MR for AD Diagnosis

To address the shortcomings of PET for AD diagnosis, a number of studies have attempted AD diagnosis from T1-weighted MR images. While T1-weighted MRI is well suited for visualizing anatomical structures in the brain, it is not optimal for AD diagnosis because it does not highlight functional or metabolic properties of brain tissue. The question arises whether one can leverage existing combined PET-MR image pairs (a combined imaging modality available only to large research institutions) to generate PET images from MR-only acquisitions.

Conditional generative adversarial networks (CGAN) [11] have previously been used to generate images of one modality from paired input images of a different modality. Frequent examples of such paired images are images and label maps, images and sketches, and pictures of the same scene under different lighting conditions (e.g., day/night). For medical image analysis tasks such as AD diagnosis, a PET image can be generated from MRI using a CGAN; the generated PET is then used to train an AD classification network.

This work proposes a CGAN-like approach in which the network is trained end-to-end with the final goal of AD classification. Training with the classification goal alone may compromise the ability to generate realistic images. We overcome this limitation by adaptively fine-tuning the GAN and classification losses, which also stabilizes the overall GAN training. State-of-the-art results on three- and four-class AD classification are achieved with the proposed architecture and training regime.

1.3 Dataset and Classification of Cognitive Decline

We use the publicly available ADNI (Alzheimer's Disease Neuroimaging Initiative) dataset, comprised of F18-AV-45 (florbetapir) and F18-FDG (fluorodeoxyglucose) PET image pairs along with co-registered T1-weighted MRI. The dataset contains six dementia-related conditions: cognitively normal (CN), early mild cognitive impairment (EMCI), late mild cognitive impairment (LMCI), mild cognitive impairment (MCI), subjective memory complaint (SMC), and Alzheimer's disease (AD). Among these conditions, SMC is difficult to distinguish from CN, and there may be overlap between EMCI/LMCI and MCI. Therefore, we test binary classification of AD/CN, as well as three-class (AD/MCI/CN) and four-class (AD/LMCI/EMCI/CN) classification for early AD diagnosis. Figure 1 shows examples of CN, EMCI, and AD images from the dataset.

We randomly divide the dataset into 70% training, 10% validation, and 20% testing by patient, resulting in 722/104/207 subjects for the train/validation/test sets. Some subjects have multiple scans (i.e., more than one temporal scan), for a total of 1,525 image triplets (AV45-PET, FDG-PET, T1-weighted MRI). The images are pre-processed using FreeSurfer [1]. The T1-weighted images are skull-stripped, removing non-cerebral matter such as skull and scalp. Registration, re-scaling [9], and partial volume correction [8] are applied to the PET images. The T1-weighted images are re-scaled to \(1\,\text {mm}^{3}\) voxels in a \(256^{3}\) volume, and the PET images are \(2\times 93\times 76\times 76\) voxels with two temporal frames.
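For concreteness, the patient-level split could be implemented as in the following minimal Python sketch; the function name, seed, and handling of ratios are illustrative assumptions, not the authors' code.

```python
import random

def split_by_patient(patient_ids, seed=0):
    """Split subjects 70/10/20 at the patient level so that all scans of a
    subject land in the same partition (a sketch; the paper's exact seed
    and tooling are not specified)."""
    ids = sorted(set(patient_ids))
    random.Random(seed).shuffle(ids)
    n_train = int(0.7 * len(ids))
    n_val = int(0.1 * len(ids))
    train = set(ids[:n_train])
    val = set(ids[n_train:n_train + n_val])
    test = set(ids[n_train + n_val:])
    return train, val, test
```

Splitting by subject rather than by scan prevents a patient's repeat scans from leaking between the training and test sets.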

2 Related Work

Image-based AD diagnosis is regarded as a challenging task. Most prior works use a combination of structural and functional imaging, such as T1-weighted MRI and PET, or combine T1-weighted MRI with other MRI modalities such as diffusion tensor imaging (DTI). They also typically focus on binary classification between pairs of diagnostic categories, such as AD vs. NC or AD vs. MCI.

A combination of T1-weighted MRI and AV45-/FDG-PET was used with a multi-feature kernel supervised within-class-similar discriminative dictionary learning algorithm to demonstrate binary classification of AD/NC, MCI/NC, and AD/MCI in [15]. A combination of T1-weighted MRI and FDG-PET with a three-dimensional convolutional neural network (CNN) was used to demonstrate binary classification of CN/AD, CN/pMCI, and sMCI/pMCI in [10]. A GAN was used in [25] to generate additional PET images from T1-weighted MR images lacking AV45-PET pairs; MRI and real-/synthetic-PET image pairs were subsequently used to train a CNN for binary classification of stable-MCI/progressive-MCI.

Functional MRI (fMRI) is the MRI technique most similar to PET in that it measures brain activity by detecting changes associated with blood flow. A minimum spanning tree (MST) classification framework was proposed in [5] to perform binary classification of MCI/NC, AD/CN, and AD/MCI using fMRI. A combination of T1-weighted MRI and DTI was used with Multiple Kernel Learning to demonstrate binary classification of CN/AD, CN/MCI, and AD/MCI in [2].

More recent work demonstrates diagnosing AD from T1-weighted MRI only. Longitudinal studies used landmark-based features and support vector machines to classify CN/AD and CN/MCI in [26]. T1-weighted MRI was used with convolutional-autoencoder-based unsupervised learning for the CN/AD and progressive-MCI/stable-MCI classification tasks in [18]. Other recent works show multi-class classification using T1-weighted MRI. A variant of the DenseNet CNN was used for multi-class classification of AD/MCI/NC in [23]. T1-weighted MRI was used with a CNN to demonstrate binary classification of NC/AD and three-class classification of NC/AD/MCI in [6].

3 Methods

The pix2pix [11] CGAN architecture is widely adopted in the medical image analysis domain for synthesizing one image modality from another. For instance, Yan et al. [25] use a CGAN to generate AV45-PET from T1-weighted MRI to supplement the training dataset with additional synthetic PET-MRI image pairs. While generating an image of a different modality may be an end-goal in the computer vision domain, in the medical domain we often want to diagnose a disease, such as AD, using the generated image. We hypothesize that a GAN designed and trained with this diagnostic end-goal in mind can outperform, in AD diagnosis, approaches in which synthesis and diagnosis are trained separately.

3.1 Conditional Generative Adversarial Networks

The pix2pix [11] CGAN is trained with the following objective:

$$\begin{aligned} G^* = \arg \min _G\max _D \mathcal {L}_{cGAN}(G,D) + \lambda \mathcal {L}_{L1}(G). \end{aligned}$$
(1)

where \(\mathcal {L}_{cGAN}(G,D)\) and \(\mathcal {L}_{L1}(G)\) are defined as

$$\begin{aligned} \mathcal {L}_{cGAN}(G,D) =&\mathbb {E}_{x,y}[\log D(x,y)] + \mathbb {E}_{x,z}[\log (1-D(x,G(x,z)))],\end{aligned}$$
(2)
$$\begin{aligned} \mathcal {L}_{L1}(G) =&\mathbb {E}_{x,y,z}[\Vert y-G(x,z)\Vert _1]. \end{aligned}$$
(3)

where x, y, and G(x, z) can be regarded as the MRI input, the PET input, and the generated PET, respectively. The CGAN consists of a generator (G) with an encoder-decoder architecture and a discriminator (D) that is a CNN classifier. The U-Net [19] architecture is typically used as the G; it takes an input image and generates an output image of the same size but of a different modality or characteristics. PET conventionally has lower image resolution than MRI, so we modify the U-Net architecture to take the different resolutions into account (MRI: \(256\times 256\times 256\); PET: \(2\times 93\times 76\times 76\)). The encoder has eight layers while the decoder has five. Only the middle five layers of the encoder-decoder have skip-connections, with the last two up-sampling (transpose convolution) layers producing the target PET resolution. The discriminator CNN has three convolutional (conv-) layers that take the MRI input and two conv-layers that take the PET input. The two branches of conv-layers are merged and followed by two additional conv-layers for classification.
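A minimal PyTorch sketch of this two-branch discriminator is given below. The channel counts, kernel sizes, and the adaptive pooling used to reconcile the differing MRI and PET resolutions before merging are illustrative assumptions, not the paper's exact configuration; the AD-classification head anticipates Sect. 3.2.

```python
import torch
import torch.nn as nn

class TwoBranchDiscriminator(nn.Module):
    """Two-branch discriminator sketch: three conv-layers on the MRI input,
    two on the PET input, merged and followed by two more conv-layers.
    Returns a real/fake logit and AD class logits; all layer sizes are
    illustrative assumptions."""
    def __init__(self, n_classes=3):
        super().__init__()
        self.mri_branch = nn.Sequential(  # three conv-layers for MRI
            nn.Conv3d(1, 16, 4, stride=2, padding=1), nn.LeakyReLU(0.2),
            nn.Conv3d(16, 32, 4, stride=2, padding=1), nn.LeakyReLU(0.2),
            nn.Conv3d(32, 64, 4, stride=2, padding=1), nn.LeakyReLU(0.2),
        )
        self.pet_branch = nn.Sequential(  # two conv-layers for 2-frame PET
            nn.Conv3d(2, 32, 4, stride=2, padding=1), nn.LeakyReLU(0.2),
            nn.Conv3d(32, 64, 4, stride=2, padding=1), nn.LeakyReLU(0.2),
        )
        # Pool both branches to a common grid so the differing MRI/PET
        # resolutions can be concatenated channel-wise.
        self.pool = nn.AdaptiveAvgPool3d((8, 8, 8))
        self.merged = nn.Sequential(      # two conv-layers after merging
            nn.Conv3d(128, 128, 3, padding=1), nn.LeakyReLU(0.2),
            nn.Conv3d(128, 128, 3, padding=1), nn.LeakyReLU(0.2),
        )
        self.real_fake = nn.Linear(128, 1)            # adversarial head
        self.classifier = nn.Linear(128, n_classes)   # AD class head

    def forward(self, mri, pet):
        m = self.pool(self.mri_branch(mri))
        p = self.pool(self.pet_branch(pet))
        h = self.merged(torch.cat([m, p], dim=1))
        h = h.mean(dim=(2, 3, 4))  # global average pooling
        return self.real_fake(h), self.classifier(h)
```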

Fig. 2.

Overall architecture and training pipeline. While the generator and discriminator are trained independently to compete against each other, both are trained with additional AD classification losses that are adjusted (1) to generate realistic PET images and (2) to perform well on AD classification. In addition, losses are monitored and weights are adjusted to stabilize the GAN training, preventing loss oscillation.

3.2 GAN with Discriminator-Adaptive Loss Fine-Tuning

A GAN is trained with a minimax objective [7] in which G and D compete with each other. A CGAN is trained with an additional L1 loss for the G and a patch-GAN [11] classifier for the D. The D in our generative network is trained with the GAN loss, multiplied by a hyper-parameter \(\lambda _{\text {GAN}_D}\), and an additional AD classification loss based on real MRI and PET input, multiplied by \(\lambda _{\text {CLS}_D}\):

$$\begin{aligned} \mathcal {L}_\text {D}(D,G) =&\lambda _{\text {GAN}_D}\mathcal {L}_{cGAN}(G,D) + \lambda _{\text {CLS}_D}\mathbb {E}_{x,y,\hat{y}}[\log D(\hat{y}|x,y)], \end{aligned}$$
(4)

where \(\hat{y}\) is the AD label.

The G is also trained with an AD classification loss based on real MRI and generated PET input, in addition to the GAN loss and the L1 loss. Each loss is multiplied by a hyper-parameter to control its relative importance during training: \(\lambda _{\text {CLS}_G}\), \(\lambda _{\text {GAN}_G}\), and \(\lambda _{L1}\):

$$\begin{aligned} \mathcal {L}_\text {G}(G,D)\,=\,&\lambda _{\text {GAN}_G}\mathcal {L}_{cGAN}(G,D) \nonumber \\&+\,\lambda _{\text {CLS}_G}\mathbb {E}_{x,\hat{y}}[\log D(\hat{y}|x,G(x,z))] + \lambda _{L1}\mathcal {L}_{L1}(G). \end{aligned}$$
(5)
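The two objectives might be implemented as below. This is a hedged sketch under several assumptions: G takes only the MRI (pix2pix injects the noise z via dropout), D returns a (real/fake logit, class logits) pair as in the discriminator sketch above, and the adversarial terms of Eqs. (4)-(5) are written in the standard binary cross-entropy form.

```python
import torch
import torch.nn.functional as F

def discriminator_loss(D, G, mri, pet, label, lam_gan_d, lam_cls_d):
    """Sketch of Eq. (4): weighted adversarial term plus AD classification
    on the real (MRI, PET) pair."""
    fake_pet = G(mri).detach()  # stop gradients into G
    rf_real, cls_real = D(mri, pet)
    rf_fake, _ = D(mri, fake_pet)
    adv = (F.binary_cross_entropy_with_logits(rf_real, torch.ones_like(rf_real))
           + F.binary_cross_entropy_with_logits(rf_fake, torch.zeros_like(rf_fake)))
    cls = F.cross_entropy(cls_real, label)
    return lam_gan_d * adv + lam_cls_d * cls

def generator_loss(D, G, mri, pet, label, lam_gan_g, lam_cls_g, lam_l1):
    """Sketch of Eq. (5): fool D, classify correctly from the generated PET,
    and keep the generated PET close to the real one in L1."""
    fake_pet = G(mri)
    rf_fake, cls_fake = D(mri, fake_pet)
    adv = F.binary_cross_entropy_with_logits(rf_fake, torch.ones_like(rf_fake))
    cls = F.cross_entropy(cls_fake, label)
    l1 = F.l1_loss(fake_pet, pet)
    return lam_gan_g * adv + lam_cls_g * cls + lam_l1 * l1
```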

In the earlier phase of GAN training, the generated PET images are likely far from the real ones; they progressively become more realistic as training proceeds. Therefore, D is trained initially with a small \(\lambda _{\text {GAN}_D}\) that is gradually increased during training, while \(\lambda _{\text {CLS}_D}\) starts from a larger value and is gradually decreased. This encourages the D to focus on AD classification as G improves at generating realistic PET images. The G is trained with a large \(\lambda _{\text {GAN}_G}\) at first so it can focus on generating realistic PET in the beginning; it is gradually decreased as \(\lambda _{\text {CLS}_G}\) increases from a smaller value, to emphasize AD classification using the generated PET images. We initialize \(\lambda _{\text {GAN}_D}\) and \(\lambda _{\text {CLS}_G}\) to 0.01 and increase them by a factor of 10 per epoch, and initialize \(\lambda _{\text {CLS}_D}\) and \(\lambda _{\text {GAN}_G}\) to 100 and decrease them by a factor of 10 per epoch. We train for 1000 epochs using the Adam optimizer [14].
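A sketch of this schedule follows, assuming the weights are clamped to the [0.01, 100] range implied by the initial values (the paper does not state an explicit cap):

```python
def schedule_lambdas(epoch, lo=0.01, hi=100.0):
    """Per-epoch loss-weight schedule: lambda_GAN_D and lambda_CLS_G start
    at 0.01 and grow 10x per epoch; lambda_CLS_D and lambda_GAN_G start at
    100 and shrink 10x per epoch. The [lo, hi] clamp is our assumption."""
    up = min(lo * 10.0 ** min(epoch, 4), hi)    # 0.01 -> 0.1 -> 1 -> ...
    down = max(hi / 10.0 ** min(epoch, 4), lo)  # 100 -> 10 -> 1 -> ...
    return {"gan_d": up, "cls_g": up, "cls_d": down, "gan_g": down}
```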

Stabilizing Training. When D and G are trained independently, the D and G losses can oscillate rather than settle into a stable convergence state [3]. To remedy this problem we continuously monitor the D and G losses and adjust the loss hyper-parameters \(\lambda \) when either loss deviates from its value in the previous epoch. Loss oscillation generally occurs when training is well underway, which is when the AD classification losses have higher weights. This is similar to the approach of [20], which penalizes D weights with annealing to stabilize GAN training. For example, when the D loss starts to oscillate and becomes higher than in the previous epoch, (1) its previous checkpoint is restored, and (2) \(\lambda _{\text {CLS}_D}\) is decreased. The overall training pipeline is shown in Fig. 2. A rough sketch of this rule is given below.
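In the sketch, the decay factor, the in-memory checkpointing, and applying the rule only to \(\lambda _{\text {CLS}_D}\) are illustrative assumptions based on the example above.

```python
import copy

def stabilize_d(D, d_loss, prev_d_loss, prev_state, lam_cls_d, decay=0.5):
    """If the D loss rose relative to the previous epoch, (1) restore D's
    previous checkpoint and (2) decrease lambda_CLS_D (decay is assumed).
    Returns the new snapshot, loss, and weight for the next epoch."""
    if prev_state is not None and prev_d_loss is not None and d_loss > prev_d_loss:
        D.load_state_dict(prev_state)  # (1) roll back D
        lam_cls_d *= decay             # (2) shrink classification weight
    return copy.deepcopy(D.state_dict()), d_loss, lam_cls_d
```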

4 Results

We perform two- to four-class AD classification using T1-weighted MRI input. The two-class AD classification results are shown in Table 1. The CNN approach in [6] reports better performance on the two-class AD/CN classification, and GANDALF shows performance similar to the pix2pix + CNN method. We suspect this may be because AD vs. CN differences are more clearly visible on MRI than AD/MCI/CN or AD/LMCI/EMCI/CN, so a deep CNN with a good hyper-parameter set can provide better results, with PET playing a rather limited role in the diagnosis. We did not conduct a thorough hyper-parameter search for GANDALF in this study.

Table 1. Comparison of MRI-based AD diagnosis for AD vs. CN binary classification. The CNN-based method [6] reports the best performance, which may indicate that using PET and synthesized PET is more useful for early AD diagnosis.

Results of the three-class AD/MCI/CN classification task are shown in Table 2. We achieve state-of-the-art performance on the three-class classification compared to prior works using T1-weighted MRI input. MCI may show more subtle differences on MRI compared to AD, as can be seen in Fig. 1. This, and the consistently better performance of the generative methods compared to the prior works, could indicate that additional training to synthesize PET can help achieve better performance for early AD diagnosis.

Table 2. MRI-based AD diagnosis for AD/MCI/CN three-class classification. The better performance shown by generative methods suggests that additional training to generate synthesized PET can be promising for early diagnosis of AD using MRI.

Lastly, results for the four-class classification of AD/LMCI/EMCI/CN are shown in Table 3. We show a meaningful first result on classifying early-MCI and late-MCI from CN and AD, a promising first step for early AD diagnosis using T1-weighted MRI. Our proposed GANDALF method also shows improved performance compared to the pix2pix + CNN method. Towards the end of GANDALF training, the entire network acts as a classification network with T1-weighted MRI input. Finding a better or deeper classifier/discriminator architecture could therefore improve the final classification performance; however, this must be balanced against the generator architecture and depth for GAN training with the minimax objective. A thorough hyper-parameter search could also improve the final performance.

Table 3. MRI-based AD diagnosis for AD/LMCI/EMCI/CN four-class classification. We show a meaningful result that is promising for early diagnosis of AD using T1-weighted MRI input.

5 Conclusion

Early diagnosis and intervention of Alzheimer's disease (AD) can significantly slow the progression of the disease and improve the condition and quality of life of patients and their caregivers. PET imaging can provide great insight for early diagnosis of AD; however, it is rarely available outside of research environments. Earlier works on MRI-based AD diagnosis use conditional generative adversarial networks (CGAN) to synthesize PET from MRI and subsequently use the generated PET for AD diagnosis.

We propose a network in which the AD diagnosis end-goal is incorporated into the MRI-PET synthesis and trained end-to-end, instead of first synthesizing PET and then using it for AD diagnosis. Furthermore, we suggest a training scheme to stabilize the GAN training. We achieve state-of-the-art MRI-based AD diagnosis for three-class classification of AD/MCI/CN. We also achieve, to the best of our knowledge, the first meaningful result on four-class (AD/LMCI/EMCI/CN) classification, which is promising for early diagnosis of AD based on MRI.