
1 Introduction

Reinforcement Learning (RL) is one of the three main paradigms of Machine Learning. An RL system is composed of an agent, an environment and its model, a policy, a reward signal and a value function [1]. The main idea is that an agent is trained through direct interaction with its environment, learning to take actions that maximize a numerical reward signal [1, 2].

Registration is the process of geometrically aligning two or more images, taken at different times, from different viewpoints or with different modalities, so that their information can be compared and combined [3]. In the medical field, this process plays an important role, especially in patient monitoring and treatment [4]. It requires robust algorithms that achieve accurate results with reasonable time and computing resources.

The aim of this work is to apply RL algorithms, namely Q-learning, Deep Q-Network (DQN), Double DQN and Dueling DQN, to image registration tasks in order to compare their performances. Moreover, a backup memory criterion is introduced as a novel way to train a neural network and improve the results of DQN-type algorithms. It is applied to Double DQN and its performance is analyzed.

This work is organized as follows: Sect. 2 summarizes the algorithms used; Sect. 3 describes their application in an image registration process; Sect. 4 shows the obtained results and Sect. 5 presents the conclusions and future work.

2 Reinforcement Learning Algorithms

2.1 General Description

Reinforcement learning is based on the theory of dynamic systems for optimal control. Markov decision processes (MDPs) are used because of their mathematical background. MDPs are the formalization of sequential decision making in which actions influence not only immediate rewards but also future states and rewards [1].

A finite Markov decision process is defined by a tuple \( \left\langle {S, A, T, R, E} \right\rangle \), where \( S \) is the agent's state space, \( A \) is the action space, \( T \) is the transition dynamics, with \( T\left( {s, a, s^{\prime}} \right) \) denoting the probability of reaching the next state \( s^{\prime} \) after selecting action \( a \) in the current state \( s \), \( R \) is the numerical reward obtained when the agent moves from state \( s \) to \( s^{\prime} \) by selecting an action, and \( E \subset S \) is the set of terminal states, from which no further transitions occur [5]. Thus, the probability of these values occurring at time \( t \), given particular values of the state and action, is defined as:

$$ p\left( {s^{\prime}, r | s, a} \right) = \Pr \left\{ {S_{t} = s^{\prime}, R_{t} = r | S_{t - 1} = s, A_{t - 1} = a} \right\} . $$
(1)

2.2 Q-Learning Algorithm

Q-learning is an RL technique derived from Temporal Difference (TD) methods. TD methods combine properties of dynamic programming algorithms and Monte Carlo methods. The goal is to learn a policy, which tells the agent what action to take in each state [6].

In order to adjust the policies, Watkins [7] introduced an action-value function \( Q^{\pi } \left( {s, a} \right) \), which is the expected return obtained when starting in state \( s \), selecting action \( a \) and following policy \( \pi \) thereafter. According to the Bellman equation, the \( Q \) function [8] for an optimal policy is defined as:

$$ Q^{ *} \left( {s,a} \right) = \sum\nolimits_{{s^{\prime} \in S}} {T\left( {s,a,s^{\prime}} \right)\left[ {R\left( {s,a,s^{\prime}} \right) + \gamma \mathop {\max }\nolimits_{a^{\prime}} Q^{ *} \left( {s^{\prime},a^{\prime}} \right)} \right]} , $$
(2)

where \( 0 \le \gamma \le 1 \) is the discount factor, which weighs the immediate reward against the expected future rewards.

The function \( Q \) is updated using only the information that is instantaneously available, as a function of the estimated optimal actions (normally a greedy policy is used) [6]. The update of \( Q \) at time \( t \) can be written as:

$$ \begin{aligned} \delta_{t} & = r_{t + 1} + \gamma \mathop {\max }\nolimits_{a^{\prime}} Q_{t} \left( {s_{t + 1} ,a^{\prime}} \right) - Q_{t} \left( {s_{t} ,a_{t} } \right) \\ Q_{t + 1} \left( {s_{t} ,a_{t} } \right) & \leftarrow Q_{t} \left( {s_{t} ,a_{t} } \right) + \alpha_{t} \left( {s_{t} ,a_{t} } \right)\delta_{t} , \end{aligned} $$
(3)

where \( \alpha \) is the learning rate, \( \alpha \in \left( {0,1} \right] \).
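As an illustration of Eq. (3), the following is a minimal sketch of one tabular Q-learning step in Python (the language used in this work, see Sect. 3.3). The state indexing, the environment interface env.step and the parameter values are hypothetical placeholders, not the implementation used here.

```python
import numpy as np

# Hypothetical sizes: in this work the state encodes (scale, angle, tx, ty)
# and there are 16 actions (see Sect. 3.4).
N_STATES, N_ACTIONS = 1000, 16
ALPHA, GAMMA, EPSILON = 0.1, 0.9, 0.1

Q = np.zeros((N_STATES, N_ACTIONS))   # tabular action-value function

def q_learning_step(env, s, rng=np.random.default_rng()):
    """One update of Eq. (3): delta = r + gamma * max_a' Q(s', a') - Q(s, a)."""
    # epsilon-greedy action selection (greedy policy with exploration)
    if rng.random() < EPSILON:
        a = int(rng.integers(N_ACTIONS))
    else:
        a = int(np.argmax(Q[s]))
    s_next, r, done = env.step(a)           # assumed environment interface
    target = r if done else r + GAMMA * np.max(Q[s_next])
    Q[s, a] += ALPHA * (target - Q[s, a])   # TD update of Eq. (3)
    return s_next, done
```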

2.3 Deep-Q Network Algorithm

The DQN algorithm, proposed by Mnih et al. [9], uses a multilayer neural network that takes the state \( s \in S \) as input and outputs one value for each action in \( A \). The action-value function is \( Q\left( {s, .; \theta } \right) \), where \( \theta \) are the weights of the deep network. The function \( Q \) is obtained by updating \( \theta \) based on experience [10, 11].

In this case, the agent learns the parameterized function \( Q\left( {s, a; \theta_{t} } \right) \), whose weights are updated as:

$$ \theta_{t + 1} = \theta_{t} + \alpha \left( {Y_{t}^{Q} - Q\left( {s_{t} ,a_{t} ;\theta_{t} } \right)} \right)\nabla_{{\theta_{t} }} Q\left( {s_{t} ,a_{t} ;\theta_{t} } \right) , $$
(4)

where \( \alpha \) is the learning rate and \( Y_{t}^{Q} \) is the target value, computed as \( Y_{t}^{Q} = r_{t + 1} + \gamma \mathop {\hbox{max} }\nolimits_{a} Q\left( {s_{t + 1} , a;\theta_{t} } \right) \).
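The target \( Y_{t}^{Q} \) can be computed for a minibatch of transitions as in the sketch below. The q_network(states, theta) interface, returning one Q-value per action, is an assumption for illustration, not the network used in this work.

```python
import numpy as np

GAMMA = 0.9   # assumed discount factor

def dqn_targets(q_network, theta, rewards, next_states, dones):
    """Y_t^Q = r_{t+1} + gamma * max_a Q(s_{t+1}, a; theta)."""
    q_next = q_network(next_states, theta)   # assumed shape: (batch, n_actions)
    max_q_next = q_next.max(axis=1)          # max over the action dimension
    # terminal transitions (dones == 1) contribute only the reward
    return rewards + GAMMA * (1.0 - dones) * max_q_next
```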

2.4 Double DQN Algorithm

If the same action-value function is used both to select and to evaluate an action \( a \), the action values tend to be overestimated. In the Double DQN algorithm, the agent learns two action-value functions by randomly assigning experiences to one of them, obtaining two sets of parameters (weights) \( \theta \) and \( \theta^{\prime} \) [10]. In each update, one set of parameters is used to determine the greedy policy and the other one is used to compute its value. The target values in the Double DQN algorithm are computed as:

$$ Y_{t}^{DoubleQ} = r_{t + 1} + \gamma \,Q\left( {s_{t + 1} , \mathop {\arg \max }\nolimits_{a} Q\left( {s_{t + 1} , a;\theta_{t} } \right); \theta_{t}^{\prime} } \right) . $$
(5)
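The only change with respect to the DQN target is that the online parameters \( \theta \) select the action and the second set \( \theta^{\prime} \) evaluates it. A minimal sketch, under the same assumed q_network interface as above:

```python
import numpy as np

GAMMA = 0.9   # assumed discount factor

def double_dqn_targets(q_network, theta, theta_prime,
                       rewards, next_states, dones):
    """Y_t^DoubleQ = r + gamma * Q(s', argmax_a Q(s', a; theta); theta')."""
    a_star = q_network(next_states, theta).argmax(axis=1)   # action chosen with theta
    q_eval = q_network(next_states, theta_prime)            # value evaluated with theta'
    q_selected = q_eval[np.arange(len(a_star)), a_star]
    return rewards + GAMMA * (1.0 - dones) * q_selected
```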

2.5 Dueling DQN Algorithm

It is usually assumed that the choice of action always matters, but there are cases, especially in model-free problems, where selecting one action or another does not affect the overall performance of the agent [12]. Therefore, the action-value function \( Q\left( {s, a; \theta_{t} } \right) \) can be separated into its two main sub-functions:

$$ Q \left( {s,a; \theta , \alpha ,\beta } \right) = V\left( {s;\theta , \beta } \right) + A\left( {s,a; \theta , \alpha } \right) , $$
(6)

where \( V\left( s \right) \) is the value function, \( A\left( {s,a} \right) \) is the advantage function, \( \theta \) is the set of parameters shared by the neural network, and \( \beta \) and \( \alpha \) are the sets of parameters of the streams that estimate \( V\left( s \right) \) and \( A\left( {s,a} \right) \), respectively. The two streams are combined in the output layer in order to obtain the function \( Q \) [13].
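Equation (6) corresponds to an aggregation step that adds a scalar state value to a per-action advantage. The sketch below shows this aggregation for a batch of stream outputs; the optional mean-subtraction used in [13] for identifiability is included only as a commented variant, not as part of Eq. (6).

```python
import numpy as np

def dueling_aggregation(value, advantage, subtract_mean=False):
    """Combine the two streams of Eq. (6): Q(s, a) = V(s) + A(s, a).

    value:     array of shape (batch, 1), output of the V(s; theta, beta) stream
    advantage: array of shape (batch, n_actions), output of the A(s, a; theta, alpha) stream
    """
    if subtract_mean:
        # Identifiable variant from [13]: Q = V + (A - mean_a A)
        advantage = advantage - advantage.mean(axis=1, keepdims=True)
    return value + advantage
```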

3 Description of Problem and Applications of Algorithms

3.1 Medical Image Registration

Two images are considered during the registration process: the target image and the moving image. The moving image is deformed so that it becomes aligned with the coordinate system of the target image [14].

A geometric transformation is estimated considering three types of parameters: scale, orientation and translation. Monomodal and multimodal registration are performed in the 2D domain, with translation over the X and Y axes.
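For reference, such a transformation can be represented by a 2 × 3 similarity matrix built from the scale, angle and translation values. The sketch below constructs this matrix with NumPy; the function name and the choice of warping routine are illustrative assumptions, not taken from this work.

```python
import numpy as np

def similarity_matrix(scale, angle_deg, tx, ty):
    """2 x 3 affine matrix combining scale, rotation and translation;
    it can be passed to any image-warping routine."""
    a = np.deg2rad(angle_deg)
    c, s = np.cos(a), np.sin(a)
    return np.array([[scale * c, -scale * s, tx],
                     [scale * s,  scale * c, ty]])

# Example: the identity transform (no scaling, rotation or translation)
M0 = similarity_matrix(1.0, 0.0, 0, 0)
```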

3.2 Image Description

The image dataset is composed of brain magnetic resonance images (MRI) in T1 and T2 modalities and SPECT images (Fig. 1). The images were provided by the Radiologic Institute of Mar del Plata, Argentina. In these images, only rigid deformations can occur during acquisition.

Fig. 1. Brain magnetic resonance images used in the tests. G1: MRI of the same slice in T1- and T2-weighted modalities. G2: MRI of the same slice in T2-weighted and SPECT modalities.

T1 and T2 images are 256 × 256 pixels, 8-bit grayscale. SPECT images have the same size, in RGB color space with 8 bits per channel.

3.3 Software and Hardware Description

The RL algorithms were implemented in Python 3.6 using TensorFlow 1.10. The tests were performed on a Core i5 computer with 8 GB of RAM running a Debian operating system.

3.4 RL Algorithms in Image Registration

The RL algorithms have been adapted to image registration with the following considerations:

  1. Environment: composed of the target image and the moving image. The moving images are created using the parameters of Table 1 as targets, so that the error achieved by each algorithm can be measured.

     Table 1. Parameters used to generate the testbench for the RL algorithms.

  2. States: described by the set \( \left[ {s, \theta , t_{x} , t_{y} } \right] \), where \( s \) is the scale value, \( \theta \) is the orientation angle, and \( t_{x} \) and \( t_{y} \) are the pixel translations over the X and Y axes. The initial state is set to \( \left[ {1.0, 0^\circ , 0, 0} \right] \).

  3. Actions: the total number of actions is \( 2^{4} = 16 \), since there are \( 4 \) transformation parameters and \( 2 \) possibilities for each one (increment or decrement). The parameter steps are \( \left[ { \pm 0.05, \pm 0.25^\circ , \pm 1, \pm 1} \right] \); a sketch of this action encoding is given after this list.

  4. Reward: the Pearson correlation coefficient is used in monomodal registration and mutual information in multimodal registration. These similarity measures are maximized during the registration process. When an action produces a decrease in similarity, the agent receives a penalty of −0.1 and the algorithm restarts the learning process.
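As mentioned in the Actions item, the 16 actions can be enumerated as all sign combinations over the four parameter steps. The sketch below shows one possible encoding together with the Pearson-correlation reward for the monomodal case; the names and the omission of the image-warping step are illustrative simplifications, not the exact implementation used in this work.

```python
import itertools
import numpy as np

# Parameter steps for (scale, angle in degrees, tx, ty), as in Sect. 3.4
STEPS = np.array([0.05, 0.25, 1.0, 1.0])

# 2^4 = 16 actions: each action increments (+1) or decrements (-1) every parameter
ACTIONS = np.array(list(itertools.product([1, -1], repeat=4)))

def apply_action(state, action_index):
    """state = [scale, angle, tx, ty]; returns the updated state."""
    return state + ACTIONS[action_index] * STEPS

def pearson_reward(target_img, moved_img):
    """Similarity measure used as reward in the monomodal case."""
    x = target_img.astype(float).ravel()
    y = moved_img.astype(float).ravel()
    return float(np.corrcoef(x, y)[0, 1])
```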

3.5 Backup Memory Criterion

In DQN-type algorithms, the agent's experiences are stored in order to train the neural networks, using a batch technique [15]. In the backup memory proposed in this work, the best experiences are saved and are only replaced by experiences with better reward values, which prevents these good solutions from being lost.

At a random position of the batch, the current information is replaced by experiences taken from the backup memory. Therefore, the batch contains information from the experience database (good and bad experiences) and from the backup memory (best experiences). In this work, the backup memory is used with the Double DQN algorithm, and the results show a good performance compared with the other algorithms.
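A minimal sketch of how such a backup memory could be combined with an ordinary replay batch is given below. The data layout and the single-slot replacement are illustrative assumptions; the two points taken from the text are that stored experiences are overwritten only by experiences with better rewards, and that one randomly chosen position of each batch is filled from the backup memory.

```python
import random

class BackupMemory:
    """Keeps the best experiences seen so far (ranked by reward)."""
    def __init__(self, capacity):
        self.capacity = capacity
        self.items = []          # list of (reward, transition) pairs

    def offer(self, reward, transition):
        if len(self.items) < self.capacity:
            self.items.append((reward, transition))
            return
        worst = min(range(len(self.items)), key=lambda i: self.items[i][0])
        if reward > self.items[worst][0]:      # replace only with a better reward
            self.items[worst] = (reward, transition)

    def sample(self):
        return random.choice(self.items)[1]

def build_batch(replay_buffer, backup, batch_size):
    """Ordinary batch from the experience database, with one random position
    replaced by a best experience from the backup memory."""
    batch = random.sample(replay_buffer, batch_size)
    if backup.items:
        batch[random.randrange(batch_size)] = backup.sample()
    return batch
```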

4 Results

The initial parameters are the same for all algorithms. The number of epochs is 200 and the learning process runs for 100 iterations, although it can end earlier if the agent receives a penalty. The tests are evaluated over 5 independent runs.

Figures 2 and 3 present the mean values of the Pearson coefficient for monomodal registration. In Fig. 3, the Double-back algorithm, which uses the backup memory criterion, achieves a better correlation factor than the other algorithms.

Fig. 2. Mean of the monomodal registration results obtained according to the parameters of Test 1.

Fig. 3. Mean of the monomodal registration results obtained according to the parameters of Test 2.

Figures 4 and 5 present the mean values of mutual information for multimodal registration. The Double-back algorithm shows a performance similar to that of the other deep-network algorithms.

Fig. 4. Mean of the multimodal registration results obtained according to the parameters of Test 1.

Fig. 5. Mean of the multimodal registration results obtained according to the parameters of Test 2.

Tables 2 and 3 present the performance of each algorithm in the monomodal and multimodal registration tests, in terms of the mean values of the similarity measures and of the error between the target and the obtained parameters.

Table 2. Performance of the RL algorithms achieved during monomodal registration.
Table 3. Performance of the RL algorithms achieved during multimodal registration.

Finally, the average times spent by each algorithm during the tests are presented in Table 4.

Table 4. Average time spent by each algorithm during tests.

5 Conclusions

In this work, the Reinforcement Learning paradigm is applied to medical image registration, and a performance comparison between four RL algorithms is presented. The registration process is an optimization problem in which a similarity measure is maximized. RL allows an agent to be trained to find the best registration parameters for medical images under rigid deformations.

In all cases, the algorithms gave successful results, achieving a significant improvement in the similarity between the target and the registered images. According to the results, Q-learning, Double DQN and Double DQN with backup memory show better performance than the other algorithms.

The mean error obtained by Double-back is reduced by around 37% in Test 1 and 35% in Test 2 compared with the highest error achieved by the algorithms. Q-learning and Double DQN reduce the mean error by around 24% in Test 1 and 19% in Test 2. Moreover, these algorithms require a reasonable training time, which is important given the available computational resources.

Future work will focus on analyzing the optimal sizes of the batch and backup memories to improve the training of the deep neural networks, which could increase their performance and maximize the reward values. The RL paradigm could also be applied to nonlinear registration.