Abstract
Nowadays, several medical procedures depend on the comparison and combination of images obtained in different modalities (magnetic resonance, computed tomography, PET, among others). Image registration is a geometric transformation process that aligns two or more images, and robust algorithms are needed to find the transformation parameters that yield accurate registrations. Reinforcement learning allows an agent to be trained, through direct interaction with its environment, to achieve a goal. In this work, a comparison of the performance of Q-learning and Deep Q-Networks with their variants is presented. Brain magnetic resonance images are used in the 2D domain, considering rigid deformations. The comparison is based on the reward values obtained during the learning process, computing the Pearson correlation coefficient in monomodal registration and mutual information in multimodal registration. An error measure between the target parameters and the achieved ones is also considered. Finally, a backup memory criterion is proposed to train the Q-Network methods. Experimental results show successful behavior in all cases, but performance improves when the proposed criterion is applied.
1 Introduction
Reinforcement Learning (RL) is one of the three main paradigms of Machine Learning. An RL problem is composed of an agent, an environment and its model, a policy, a reward signal and a value function [1]. The main idea is that an agent can be trained through direct interaction with its environment: it learns to take actions that maximize a numerical reward signal [1, 2].
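The interaction just described can be sketched as a simple loop. The toy environment and the random policy below are illustrative assumptions, not part of the paper; the point is only the agent-environment-reward cycle.

```python
import random

random.seed(0)  # make the illustrative run reproducible

class ToyEnv:
    """Hypothetical 1-D walk: the agent starts at 0 and the episode ends at +/-3."""
    def __init__(self):
        self.state = 0

    def step(self, action):          # action: -1 or +1
        self.state += action
        done = abs(self.state) >= 3  # terminal states end the episode
        reward = 1.0 if self.state == 3 else 0.0
        return self.state, reward, done

env = ToyEnv()
done, total_reward = False, 0.0
while not done:                      # episode loop
    action = random.choice([-1, 1])  # a (here random) policy selects an action
    state, reward, done = env.step(action)
    total_reward += reward           # the agent accumulates the reward signal
```

A trained policy would replace `random.choice` with a choice informed by the value function.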
Registration is a geometric alignment process between two or more images taken at different times, from different views or in different modalities, for the purpose of comparing and combining their information [3]. In the medical field, this process plays an important role, especially in patient monitoring and treatment [4]. It requires robust algorithms that obtain accurate results within reasonable time and computing resources.
The aim of this work is to apply the RL algorithms Q-learning, Deep-Q-Network (DQN), Double DQN and Dueling DQN to image registration tasks in order to compare their performance. Moreover, a backup memory criterion is introduced as a novel way to train a neural network, in order to improve the results of the DQN algorithms; it is applied to Double DQN to analyze its performance.
This work is organized as follows: Sect. 2 summarizes the algorithms used; Sect. 3 describes their application in an image registration process; Sect. 4 shows the obtained results and Sect. 5 presents the conclusions and future work.
2 Reinforcement Learning Algorithms
2.1 General Description
Reinforcement learning is based on dynamic system theory for optimal control. Markov decision processes (MDPs) are used due to their mathematical background. MDPs are the formalization of sequential decision making in which actions influence not just immediate rewards but also future states and rewards [1].
A finite Markov decision process is defined by a tuple \( \left\langle {S, A, T, R, E} \right\rangle \), where \( S \) is the agent’s state-space, \( A \) is the action-space, \( T \) is the transition dynamics \( T\left( {s, a, s^{\prime}} \right) \), i.e. the probability of reaching the next state \( s^{\prime} \) when selecting action \( a \) in the current state \( s \), \( R \) is the numeric reward value obtained when the agent moves from state \( s \) to \( s^{\prime} \) by selecting an action, and \( E \subset S \) is the set of terminal states, which admit no further transitions [5]. Thus, the probability of those values occurring at time \( t \), given particular values of the state and action, is defined as:

\( p\left( {s^{\prime}, r \mid s, a} \right) = \Pr \left\{ {S_{t} = s^{\prime}, R_{t} = r \mid S_{t - 1} = s, A_{t - 1} = a} \right\} \)
2.2 Q-Learning Algorithm
Q-learning is an RL technique derived from Temporal Difference (TD) methods, which combine properties of dynamic programming algorithms and Monte Carlo methods. The goal is to learn a policy that tells the agent what action to take in each state [6].
In order to adjust the policies, Watkins [7] introduced an action-value function \( Q^{\pi } \left( {s, a} \right) \), which is the expected return when starting in state \( s \), selecting action \( a \) and following the policy \( \pi \) thereafter. According to the Bellman equation, the function \( Q \) [8] for an optimal policy is defined as:

\( Q^{*} \left( {s, a} \right) = {\mathbb{E}}\left[ {r + \gamma \mathop {\hbox{max} }\nolimits_{{a^{\prime}}} Q^{*} \left( {s^{\prime}, a^{\prime}} \right) \mid s, a} \right] \)

where \( 0 \le \gamma \le 1 \) is the discount factor, which weighs how valuable the immediate reward is in terms of the expected final reward.
The function \( Q \) is updated with instantaneously available information, as a function of the optimal actions (normally a greedy policy is used) [6]. The update of the function \( Q \) at time \( t \) can be written as:

\( Q\left( {s_{t} , a_{t} } \right) \leftarrow Q\left( {s_{t} , a_{t} } \right) + \alpha \left[ {r_{t + 1} + \gamma \mathop {\hbox{max} }\nolimits_{a} Q\left( {s_{t + 1} , a} \right) - Q\left( {s_{t} , a_{t} } \right)} \right] \)

where \( \alpha \) is the learning rate, \( \alpha\, \epsilon\, \left( {0,1} \right] \).
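The tabular update above can be sketched in a few lines. The state/action values below are arbitrary illustrative numbers, not taken from the registration experiments.

```python
from collections import defaultdict

alpha, gamma = 0.5, 0.9
Q = defaultdict(float)               # Q(s, a), initialized to 0

def q_update(s, a, r, s_next, actions):
    # TD target: r + gamma * max_a' Q(s', a')
    target = r + gamma * max(Q[(s_next, a2)] for a2 in actions)
    # move Q(s, a) toward the target by a fraction alpha
    Q[(s, a)] += alpha * (target - Q[(s, a)])

actions = ("inc", "dec")
q_update(s=0, a="inc", r=1.0, s_next=1, actions=actions)
# Q[(0, "inc")] moves from 0 halfway toward the target 1.0, i.e. to 0.5
```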
2.3 Deep-Q Network Algorithm
The DQN algorithm, proposed by Mnih et al. [9], uses a multilayer neural network that takes a state from \( S \) as input and outputs one value per action in \( A \). The action-value function is \( Q\left( {s, .; \theta } \right) \), where \( \theta \) are the weights of the deep network. The function \( Q \) is obtained by updating \( \theta \) based on experience [10, 11].
In this case, an agent learns the parameterized function \( Q\left( {s, a; \theta_{t} } \right) \) through the update:

\( \theta_{t + 1} = \theta_{t} + \alpha \left( {Y_{t}^{Q} - Q\left( {s_{t} , a_{t} ;\theta_{t} } \right)} \right)\nabla_{{\theta_{t} }} Q\left( {s_{t} , a_{t} ;\theta_{t} } \right) \)

where \( \alpha \) is the learning rate and \( Y_{t}^{Q} \) are the target values computed by: \( Y_{t}^{Q} = r_{t + 1} + \gamma \mathop {\hbox{max} }\nolimits_{a} Q\left( {s_{t + 1} , a;\theta_{t} } \right) \).
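The target \( Y_{t}^{Q} \) is typically computed for a whole batch at once. In this sketch, `q_next` stands in for the network output on \( s_{t+1} \); the reward and Q-values are illustrative, and terminal states drop the bootstrap term, a common convention not spelled out in the paper.

```python
import numpy as np

gamma = 0.99
rewards = np.array([0.1, -0.1, 1.0])
q_next = np.array([[0.2, 0.5],         # Q(s', a; theta) per action, per sample
                   [0.0, 0.3],
                   [0.4, 0.1]])
done = np.array([False, False, True])  # terminal states get no bootstrap term

# Y = r + gamma * max_a Q(s', a), masked to r alone at terminal states
targets = rewards + gamma * q_next.max(axis=1) * (~done)
```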
2.4 Double DQN Algorithm
If the same action-value function is used both to select and to evaluate an action \( a \), it may lead to overoptimistic value estimates. In the Double DQN algorithm, an agent learns two action-value functions by randomly assigning experiences to one of two sets of parameters (weights), \( \theta \) and \( \theta ' \) [10]. To update the function, one set of parameters is used to determine the policy and the other is used to compute the value. The target values in the Double DQN algorithm are computed as:

\( Y_{t}^{DoubleQ} = r_{t + 1} + \gamma \,Q\left( {s_{t + 1} , \mathop {\text{argmax}}\nolimits_{a} Q\left( {s_{t + 1} , a;\theta_{t} } \right);\theta_{t}^{\prime} } \right) \)
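The decoupling of selection and evaluation can be sketched as follows; `q_online` and `q_target` stand in for the outputs of the two networks on \( s_{t+1} \), with illustrative values.

```python
import numpy as np

gamma = 0.99
reward = 1.0
q_online = np.array([0.2, 0.9, 0.4])   # Q(s', a; theta), selects the action
q_target = np.array([0.3, 0.5, 0.8])   # Q(s', a; theta'), evaluates it

a_star = int(np.argmax(q_online))             # selection with theta
y_double = reward + gamma * q_target[a_star]  # evaluation with theta'
```

Note that a plain DQN target would have used `q_online.max()` here, taking both the selection and the (possibly overestimated) value from the same network.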
2.5 Dueling DQN Algorithm
It is usually assumed that the choice of an action is always relevant, but there are cases, especially in model-free problems, where selecting one action or another does not affect the overall performance of the agent [12]. Therefore, the action-value function \( Q\left( {s, a; \theta_{t} } \right) \) can be separated into its two main subfunctions:

\( Q\left( {s, a;\theta ,\alpha ,\beta } \right) = V\left( {s;\theta ,\alpha } \right) + A\left( {s, a;\theta ,\beta } \right) \)

where \( V\left( s \right) \) is the value function, \( A\left( {s,a} \right) \) is the advantage function, \( \theta \) is the set of shared parameters of the neural network and \( \alpha , \beta \) are the sets of parameters corresponding to \( V\left( s \right) \) and \( A\left( {s,a} \right) \), respectively. The subfunctions are combined in the output layer in order to obtain the function \( Q \) [13].
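The combination in the output layer can be sketched numerically. Following Wang et al. [13], the mean advantage is subtracted so that \( V \) and \( A \) remain identifiable; the stream outputs below are illustrative, not network values from this work.

```python
import numpy as np

v = 2.0                            # V(s): scalar output of the value stream
adv = np.array([1.0, -1.0, 0.0])   # A(s, a): output of the advantage stream

# Q(s, a) = V(s) + (A(s, a) - mean_a A(s, a)), combined in the output layer
q = v + (adv - adv.mean())
```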
3 Description of Problem and Applications of Algorithms
3.1 Medical Image Registration
Two images are considered during the registration process: the target image and the moving image. The moving image is deformed to align to the coordinate system of the target image [14].
A geometric transformation is estimated considering three types of parameter: scale, orientation and translation. Monomodal and multimodal registration are performed in the 2D domain, with translation parameters over the X and Y axes (four parameters in total).
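Such a transformation can be written as a single 3 × 3 matrix in homogeneous coordinates. The sketch below builds it from scale \( s \), rotation \( \theta \) and translations \( t_x, t_y \) and applies it to one point; the parameter values are illustrative, not from Table 1.

```python
import numpy as np

def transform_matrix(s, theta_deg, tx, ty):
    # scale-rotation block plus translation column, homogeneous coordinates
    t = np.deg2rad(theta_deg)
    c, si = np.cos(t), np.sin(t)
    return np.array([[s * c, -s * si, tx],
                     [s * si,  s * c, ty],
                     [0.0,     0.0,   1.0]])

M = transform_matrix(s=1.0, theta_deg=90.0, tx=2.0, ty=0.0)
p = M @ np.array([1.0, 0.0, 1.0])   # point (1, 0) in homogeneous form
# rotating (1, 0) by 90 degrees gives (0, 1); translating by (2, 0) gives (2, 1)
```

Resampling the moving image under `M` (e.g. with an image library's warp routine) produces the candidate registration for a given state.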
3.2 Image Description
The image dataset is composed of brain magnetic resonance images (MRI) in T1 and T2 modalities and SPECT images (Fig. 1). The images were provided by the Radiologic Institute of Mar del Plata, Argentina. In these images, only rigid deformations can occur during acquisition.
T1 and T2 images are sized 256 × 256, 8 bits, grayscale. SPECT images have the same size, RGB color space, 8 bits per channel.
3.3 Software and Hardware Description
RL algorithms were implemented in Python 3.6, using TensorFlow 1.10. The tests were performed using a Core-i5 computer with 8 GB RAM and Debian operating system.
3.4 RL Algorithms in Image Registration
RL algorithms have been adapted for image registration with the following considerations:
1. Environment: composed of the target image and the moving image. The moving images are created using the parameters of Table 1 as targets, in order to measure the error achieved by each algorithm.
2. States: described by the set \( \left[ {s, \theta , t_{x} , t_{y} } \right] \), where \( s \) is the scale value, \( \theta \) is the orientation angle, and \( t_{x} \) and \( t_{y} \) are the pixel translations over the X and Y axes. The initial state is set to \( \left[ {1.0, 0^\circ , 0, 0} \right] \).
3. Actions: the total number of actions is \( 2^{4} = 16 \) because there are \( 4 \) transformation parameters and \( 2 \) possibilities for each one (increment or decrement). The parameter steps are \( \left[ { \pm 0.05, \pm 0.25^\circ , \pm 1, \pm 1} \right] \).
4. Reward: the Pearson correlation coefficient in monomodal registration and mutual information in multimodal registration. These similarity measures are maximized during the registration process. When an action produces a decrease in similarity, the agent receives a penalty of −0.1 and the algorithm restarts the learning process.
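The two reward signals above can be sketched with standard NumPy routines: Pearson correlation via `np.corrcoef` and mutual information from a joint histogram. The tiny "image" and the bin count are illustrative assumptions.

```python
import numpy as np

def pearson(a, b):
    # Pearson correlation coefficient between two images (monomodal reward)
    a, b = a.ravel().astype(float), b.ravel().astype(float)
    return float(np.corrcoef(a, b)[0, 1])

def mutual_information(a, b, bins=8):
    # mutual information from the joint intensity histogram (multimodal reward)
    joint, _, _ = np.histogram2d(a.ravel(), b.ravel(), bins=bins)
    p = joint / joint.sum()
    px, py = p.sum(axis=1), p.sum(axis=0)
    nz = p > 0                              # skip empty cells to avoid log(0)
    return float(np.sum(p[nz] * np.log(p[nz] / (px[:, None] * py[None, :])[nz])))

img = np.arange(16, dtype=float).reshape(4, 4)
r = pearson(img, img)                  # identical images: correlation of 1.0
mi = mutual_information(img, img)      # identical images: positive MI
```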
3.5 Backup Memory Criterion
In DQN algorithms, the agent’s experiences are stored in order to train the neural networks, using a batch technique [15]. In the backup memory proposed in this work, the best experiences are saved and are only replaced by experiences with better reward values, which prevents good solutions from being lost.
At a random position of the batch, the current information is replaced by experiences from the backup memory. Therefore, the batch contains information both from the database of experiences (good and bad ones) and from the backup memory (best experiences). In this work, the backup memory is used with the Double DQN algorithm, and the results show good performance compared to the other algorithms.
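The criterion can be sketched as follows. The capacity, the experience layout and the single-slot injection are assumptions made for illustration; the paper does not specify these details.

```python
import random

random.seed(0)  # reproducible illustration

class BackupMemory:
    def __init__(self, capacity=4):
        self.items = []                  # (reward, experience) pairs
        self.capacity = capacity

    def offer(self, experience, reward):
        # keep an experience only if it beats the worst stored reward
        if len(self.items) < self.capacity:
            self.items.append((reward, experience))
        else:
            worst = min(range(len(self.items)), key=lambda i: self.items[i][0])
            if reward > self.items[worst][0]:
                self.items[worst] = (reward, experience)

    def inject(self, batch):
        # overwrite a random batch slot with one of the best experiences
        if self.items and batch:
            batch[random.randrange(len(batch))] = random.choice(self.items)[1]
        return batch

mem = BackupMemory(capacity=2)
for exp, r in [("e1", 0.1), ("e2", 0.9), ("e3", 0.5)]:
    mem.offer(exp, r)                    # "e1" is eventually evicted by "e3"
batch = mem.inject(["b1", "b2", "b3"])   # one slot now holds a best experience
```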
4 Results
The initial parameters are the same for all algorithms. The number of epochs is 200 and the learning process runs for 100 iterations, although it may end earlier if the agent receives a penalty. The tests are evaluated over 5 independent runs.
In Figs. 2 and 3, the mean values of the Pearson coefficient for monomodal registration are presented. In Fig. 3, the Double-back algorithm, which uses the backup memory criterion, achieves a better correlation factor than the other algorithms.
In Figs. 4 and 5, the mean values of mutual information for multimodal registration are presented. The Double-back algorithm shows a performance similar to that of the other deep network algorithms.
Tables 2 and 3 present, for the monomodal and multimodal registration tests respectively, the performance of each algorithm through the mean values of the similarity measures and the error between the target and obtained parameters.
Finally, the average times spent during the tests are presented in Table 4.
5 Conclusions
In this work, the Reinforcement Learning paradigm is applied to medical image registration, and a performance comparison of four RL algorithms is presented. The registration process is an optimization problem in which a similarity measure is maximized. RL allows an agent to be trained to find the best transformation parameters for medical images, considering rigid deformations.
In all cases, the algorithms gave successful results, significantly improving the similarity between the target images and the registered images. According to the results, Q-learning, Double DQN and Double DQN with backup memory show better performance than the others.
The mean error obtained by Double-back is reduced by around 37% in Test 1 and 35% in Test 2 compared with the highest error achieved among the algorithms. Q-learning and Double DQN reduce the mean error by around 24% in Test 1 and 19% in Test 2. Moreover, these algorithms also use a reasonable time for the training process, which is important with respect to the available computational resources.
Future work will focus on analyzing the optimal sizes of the batch and backup memories, in order to improve the training of the deep neural networks and thus their performance and the reward values achieved. The RL paradigm could also be applied to nonlinear registration.
References
Sutton, R.S., Barto, A.G.: Reinforcement Learning: An Introduction, 2nd edn. (2017)
Kaelbling, L.P., Littman, M.L., Moore, A.W.: Reinforcement learning: a survey. J. Artif. Intell. Res. 4, 237–285 (1996)
Zitová, B., Flusser, J.: Image registration methods: a survey. Image Vis. Comput. 21, 977–1000 (2003)
Maintz, J.B.A., Viergever, M.A.: A survey of medical image registration. Med. Image Anal. 2(1), 1–37 (1998)
Roderick, M., MacGlashan, J., Tellex, S.: Implementing the deep Q-network. arXiv preprint (2017)
Sigaud, O., Buffet, O.: Markov decision processes in artificial intelligence (2010)
Watkins, C.J.C.H., Dayan, P.: Q-learning. Mach. Learn. 8(3–4), 279–292 (1992)
Bellman, R.: A Markovian decision process (1957)
Mnih, V., et al.: Human-level control through deep reinforcement learning. Nature 518(7540), 529–533 (2015)
van Hasselt, H., Guez, A., Silver, D.: Deep reinforcement learning with double Q-learning. arXiv preprint (2015)
Lin, L.-J.: Self-improving reactive agents based on reinforcement learning, planning and teaching. Mach. Learn. 8(3–4), 293–321 (1992)
Wang, Z., Schaul, T., Hessel, M., van Hasselt, H., Lanctot, M., de Freitas, N.: Dueling network architectures for deep reinforcement learning (2015)
Harmon, M.E., Baird, L.C., Klopf, A.H.: Advantage updating applied to a differential game. In: NIPS (1995)
Fitzpatrick, J.M., Hill, D.L.G., Maurer, C.R.: Image registration. In: Handbook of Medical Imaging. Medical Image Processing and Analysis, vol. 2, pp. 447–514 (2000)
Hoffer, E., Hubara, I., Soudry, D.: Train longer, generalize better: closing the generalization gap in large batch training of neural networks (2017)
Ethics declarations
The authors declare that they have no conflict of interest.
Copyright information
© 2020 Springer Nature Switzerland AG
Isa-Jara, R.F., Meschino, G.J., Ballarin, V.L. (2020). A Comparative Study of Reinforcement Learning Algorithms Applied to Medical Image Registration. In: González Díaz, C., et al. VIII Latin American Conference on Biomedical Engineering and XLII National Conference on Biomedical Engineering. CLAIB 2019. IFMBE Proceedings, vol 75. Springer, Cham. https://doi.org/10.1007/978-3-030-30648-9_36
Print ISBN: 978-3-030-30647-2
Online ISBN: 978-3-030-30648-9