
1 Introduction

The promise of automated deep learning systems to assist radiologists is enormous. Important milestones, such as improved consistency and in some cases better performance, have already been reached on a growing number of use-cases [18, 37]. One source of inspiration for further progress is the way humans register and analyze images, a strategy that has proven effective for deep learning in the past [17, 37].

In any analysis, a doctor provides the memory and knowledge to place what is currently seen in the context of what has been seen before. In principle this is comparable to what happens implicitly, at scale, in any deep learning method. A doctor’s analysis is not implicit, though: the reasoning process can be described and verified. We ask whether (medical) deep learning methods could benefit from such an explicit infusion of memory and knowledge.

Making deep learning methods more explicit in their use of past observations has already been studied in Natural Language Processing (NLP) in the form of retrieval augmentation [14, 21]. Supplementing the input with relevant retrieved information can lead to performance gains [4]. This process can be seen as acting both as enrichment and as regularization. A further benefit of retrieval augmentation is that the supplemented context comes from a trusted knowledge source [13, 29]. The versatility of retrieval augmentation, which essentially provides a non-parametric memory expansion, is gaining traction in the multi-modal field [4, 28].

The modalities in multi-modal data typically have different strengths, leading to a strong and a weak modality [37]. For instance, radiology reports generally contain richer and more complete information than X-rays, since the report is essentially a clinician’s annotation [24]. With retrieval augmentation, information can be transferred explicitly from the strong to the weak modality.

One reason retrieval augmentation has not yet been adopted for medical applications is the weakness of retrieval methods in the medical domain. Retrieval in the general domain is focused on global image regions [8, 16], whereas in medical images global features, such as body and organ structure, are similar across patients. The more fine-grained aspects are the discriminative disease indicators, but they are easily overlooked. This need for fine-grained results makes medical image retrieval considerably more complex.

We propose X-Ray Task Retrieval Augmentation (X-TRA), a framework for retrieval augmentation in a multi-modal medical setting, specifically designed for X-ray and radiology report analysis. To do so we introduce a cross-modal retrieval model and retrieval augmentation method. We make the following contributions.

  • We propose a CLIP-based multi-modal retrieval framework with a dedicated fine-tuning component for efficient content alignment of medical information which improves state-of-the-art results in multi- and single-modal retrieval on radiology images and reports.

  • We introduce a multi-modal retrieval augmentation component for disease classification and report retrieval pipelines.

  • We show that (1) our method reaches state-of-the-art performance in both multi-label disease classification and report retrieval, (2) our report retrieval is competitive with dedicated report generation methods, and (3) our method is versatile across datasets, and we discuss its limitations.

2 Related Work

Multi-modal Alignment. The introduction of Transformers for natural language processing (NLP) accelerated the development of integrated vision-language (VL) alignment models suitable for various VL-tasks, such as ViLBERT [19], LXMERT [30] and SimVLM [33]. These methods provide alignment at region-to-sentence or region-to-word scale. The next step in multi-modal alignment was made by methods combining contrastive learning with substantially larger datasets. Examples are CLIP [27] and ALIGN [10], which significantly outperform existing methods by training on datasets of 400M and 1.8B VL-pairs respectively. Since CLIP is open-source, domain-specific versions such as PubMedCLIP [3] have been created by fine-tuning it on additional data.

Retrieval Augmentation. Retrieval augmentation originates in NLP, where it was introduced to exploit large datasets more fully. With retrieval augmentation a model no longer depends solely on its parametric weights but can also draw on supplemented data as a non-parametric component. Previous work has shown retrieval augmentation to be simple, effective and versatile across a range of applications [5, 13, 29].

Retrieval in Medical Imaging. Until recently, retrieval methods in medical imaging were tailored, hand-crafted methods [16]. With access to large datasets and pre-trained models, the balance shifted towards automated retrieval methods [6, 26]. Major strides were made with retrieval methods in the histopathology and radiology domains in particular [2, 8]. The use of text to improve image retrieval has been adopted for chest X-ray retrieval: Yu et al. [35] use CNN and word2vec features for multi-modal alignment and retrieval, while Zhang et al. [36] approach the problem with a hash-based retrieval method.

Retrieval for Chest X-ray Analysis. Common tasks in chest X-ray analysis are disease classification and report generation [1, 11, 15]. Using retrieval for report generation is a common approach, often entailing the use of retrieved information as input or template for a decoder that crafts a custom report [23, 32, 34]. Augmentation of chest X-ray tasks with synthetically generated, diffusion-based images has been shown to be possible [1]; however, the clinical use of non-genuine images can lead to complications and remains disputed [37].

3 Methods

Our method is composed of two separate parts (Fig. 1). The first part is the alignment of the two modalities and construction of the retrieval model. The second part uses the output of the retriever as a non-parametric component in (cross-modal) retrieval augmentation to enhance the downstream tasks.

We consider a dataset \(\varTheta ^{N}_{\{\textbf{x},\textbf{r}\}}\) consisting of pairs containing an X-ray (\(\textbf{x}_i\)) and a radiology report (\(\textbf{r}_i\)). To align these modalities we make use of the powerful CLIP vision-language aligner. Our objective is to minimize the distance between \(\textbf{x}\) and \(\textbf{r}\), making cross-modal tasks possible. The aligned features are then used for retrieval augmentation in the downstream tasks of multi-label classification and report retrieval.

Fig. 1. Architecture overview of X-TRA.

3.1 Stage I: Multi-modal Content Alignment

We leverage the pre-trained features from CLIP for initial feature representations. However, there is a domain shift between the natural image data CLIP is trained on and the medical images we want to use in our method. Medical images can be visually very similar while holding drastically different information: small localized markers can be indicators of disease. In natural images global representations are more decisive and thus more suitable for unsupervised contrastive alignment. Alignment in CLIP proceeds as follows [27]:

$$\begin{aligned} \mathcal {L}_{CLIP}=-\frac{1}{N} \sum _{z \in Z}\sum _{i=1}^N \log \frac{\textrm{e}^{\left( {\text {sim}}\left( z_i^0, z_i^1\right) / \tau \right) }}{\sum _{j=1}^N \textrm{e}^{ \left( {\text {sim}}\left( z_i^0, z_j^1\right) / \tau \right) }}\quad \text {with}\;\; Z = \{(\textbf{x},\textbf{r}),(\textbf{r},\textbf{x})\}. \end{aligned}$$
(1)
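As an illustration, a minimal PyTorch sketch of this symmetric contrastive objective is given below; the temperature value and the tensor names are illustrative assumptions rather than the exact training configuration.

```python
import torch
import torch.nn.functional as F

def clip_loss(img_emb: torch.Tensor, txt_emb: torch.Tensor, tau: float = 0.07) -> torch.Tensor:
    """Symmetric InfoNCE loss over a batch of N aligned (x, r) pairs (Eq. 1)."""
    # L2-normalize so the dot product equals the cosine similarity sim(., .)
    img_emb = F.normalize(img_emb, dim=-1)
    txt_emb = F.normalize(txt_emb, dim=-1)
    logits = img_emb @ txt_emb.t() / tau                  # (N, N) similarity matrix
    targets = torch.arange(logits.size(0), device=logits.device)
    # Z = {(x, r), (r, x)}: cross-entropy in both retrieval directions
    loss_x2r = F.cross_entropy(logits, targets)
    loss_r2x = F.cross_entropy(logits.t(), targets)
    return loss_x2r + loss_r2x
```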

To overcome this domain shift, we require a more specific type of fine-tuning, geared towards content-based extraction. We introduce the following loss, which requires a global class label for each dataset. With this fine-tuning step we create a supervised content-based alignment method with content classifier C:

$$\begin{aligned} \mathcal {L}_{ours} = -\frac{1}{N}\sum _{z \in Z}\sum _{i=1}^N y_i \log \bigl (\widehat{C(z_i)}\bigr )\quad \text {with}\;\; Z = \{\textbf{x},\textbf{r},(\textbf{x},\textbf{r})\}. \end{aligned}$$
(2)

This content-based alignment loss favours the alignment of fine-grained, content-level details over the global visual appearance of the image.
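A minimal sketch of this supervised content alignment is shown below, assuming the shared classifier C is trained with binary cross-entropy over the multi-label targets and that the pair \((\textbf{x},\textbf{r})\) is fused by averaging; both choices are assumptions made for illustration.

```python
import torch
import torch.nn as nn

class ContentAlignmentLoss(nn.Module):
    """Sketch of the supervised content alignment objective (Eq. 2): the same
    classifier C scores image features, report features and the fused pair."""
    def __init__(self, classifier: nn.Module):
        super().__init__()
        self.classifier = classifier           # content classifier C
        self.bce = nn.BCEWithLogitsLoss()      # assumed loss for the 14 multi-labels

    def forward(self, x_feat: torch.Tensor, r_feat: torch.Tensor,
                labels: torch.Tensor) -> torch.Tensor:
        fused = 0.5 * (x_feat + r_feat)        # assumed fusion of the (x, r) pair
        loss = x_feat.new_zeros(())
        for z in (x_feat, r_feat, fused):      # Z = {x, r, (x, r)}
            loss = loss + self.bce(self.classifier(z), labels.float())
        return loss
```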

Creating a Retrieval Index. At retrieval time we need to retrieve images that have a high similarity with query images. To efficiently do so we make use of Facebook AI Similarity Search (FAISS) [12]. This retrieval tool efficiently performs nearest-neighbour similarity search. After multi-modal alignment we encode our data to a FAISS index I conditioned on the training dataset. We can construct indices that only retrieve images (\(I^{\textbf{x}}\)), only reports (\(I^{\textbf{r}}\)), or both (\(I^{\textbf{x}\textbf{r}}\)).
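A minimal sketch of index construction with FAISS follows; the index type (a flat, exact inner-product index over L2-normalized embeddings) and the embedding variable names are assumptions.

```python
import faiss
import numpy as np

def build_index(embeddings: np.ndarray) -> faiss.Index:
    """Exact inner-product index over L2-normalized training embeddings."""
    emb = np.ascontiguousarray(embeddings, dtype="float32")
    faiss.normalize_L2(emb)                      # cosine similarity via inner product
    index = faiss.IndexFlatIP(emb.shape[1])
    index.add(emb)
    return index

# One index per retrieval source: images only (I^x), reports only (I^r),
# or both modalities together (I^xr). Variable names are placeholders.
index_x = build_index(train_image_embeddings)
index_r = build_index(train_report_embeddings)
index_xr = build_index(np.concatenate([train_image_embeddings,
                                       train_report_embeddings], axis=0))
```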

Given a query \(\mathcal {Q}_s\) in source modality s, we can obtain its k neighbours of target modality t through:

$$\begin{aligned} \mathcal {N}^{k}_{t}(\mathcal {Q}_s) = \mathop {\text {top-}k}\limits _{z_j \in I^{t}}\; {\text {sim}}\left( \mathcal {Q}_s,\, z_j\right) , \end{aligned}$$
(3)

where the target modality \(t\) can be \(\textbf{x}\), \(\textbf{r}\), or both. Once retrieval index I is constructed from the newly aligned training dataset, we treat the retriever as a non-parametric component that retrieves information from a fixed dataset in the subsequent retrieval augmentation steps. Note that at test time, a query from the test set retrieves neighbours from the training set.
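Continuing the index sketch above, a query amounts to a single search call; `index_r` and the embedding variables are the assumed names from that sketch.

```python
import faiss
import numpy as np

k = 10                                           # number of neighbours used in X-TRA
q = np.ascontiguousarray(query_embedding, dtype="float32").reshape(1, -1)
faiss.normalize_L2(q)
scores, idx = index_r.search(q, k)               # image query -> k nearest reports
neighbour_reports = train_report_embeddings[idx[0]]   # features passed on to Stage II
```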

3.2 Stage II: Retrieval Augmentation

The purpose of retrieval augmentation is to leverage similar representations in order to obtain a more informative representation of a given input. With the retrieval index trained in Stage I, we retrieve these similar representations.

To obtain a richer representation of \(\textbf{x}_i\), we retrieve intra- and inter-modal neighbours from \(I^{\textbf{x}}\) and \(I^{\textbf{r}}\) respectively. To integrate the retrieved neighbouring samples \(z_1,\dots ,z_k\), various fusion methods can be used [25]. The simplest is concatenation: \(\textbf{x}_i^{TRA} = [\textbf{x}_i; z_1; \dots ; z_k]\). A more suitable method is multi-head attention (MHA), which is able to capture the long-range dependencies between the original image and the retrieved information [31]:

$$\begin{aligned} \textbf{x}_i^{TRA} = \mathrm {MHA}\left( Q = \textbf{x}_i,\; K = V = [z_1,\dots ,z_k]\right) . \end{aligned}$$
(4)
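A minimal PyTorch sketch of this MHA-based fusion is given below, with the input embedding as query and the retrieved neighbours as keys and values; the number of heads and the wrapper class are illustrative assumptions.

```python
import torch
import torch.nn as nn

class RetrievalFusion(nn.Module):
    """Sketch of MHA fusion (Eq. 4): the input embedding attends over its k
    retrieved neighbours. dim must be divisible by num_heads."""
    def __init__(self, dim: int, num_heads: int = 8):
        super().__init__()
        self.mha = nn.MultiheadAttention(dim, num_heads, batch_first=True)

    def forward(self, x: torch.Tensor, neighbours: torch.Tensor) -> torch.Tensor:
        # x: (B, dim) query embedding; neighbours: (B, k, dim) retrieved features
        query = x.unsqueeze(1)                           # (B, 1, dim)
        fused, _ = self.mha(query, neighbours, neighbours)
        return fused.squeeze(1)                          # x^TRA, shape (B, dim)
```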

3.3 Downstream Tasks

We tackle two common tasks in chest X-ray analysis: multi-label disease classification and report retrieval. For the latter task, our objective is to show how well a pure retriever can perform on the report generation task. We measure the effect of retrieval augmentation by comparing the task performance of \(\textbf{x}^{TRA}\) against that of \(\textbf{x}\).

A useful property of our retrieval index would be the usability of a pre-trained model across datasets. Three clinically relevant scenarios for this are: training from scratch on the new dataset, frozen usage of the trained retrieval model, and fine-tuning of the existing retrieval model with another image-report dataset.

3.4 Datasets

The primary dataset to which our method is applied is MIMIC-CXR (200k image-report pairs) [11]. Disease labels for each pair are extracted from the report through a rule-based extraction method [9]. To evaluate the versatility and cross-domain capabilities of our method, we use the small openI (4k image-report pairs) [20] and image-only CheXpert (200k images) [9] datasets. Official train-test splits are used.

3.5 Experimental Setup

As a pre-processing step, the X-ray images are normalized and standardized by rescaling with center-cropping to \(256\times 256\), from which images of size \(224\times 224\) are sampled. The maximum number of tokens for representing radiology reports in the text encoder is set to 256. Three different VL models are used as encoders. The first is a CNN-BERT model, composed of a DenseNet121 image encoder and a ClinicalBERT [7] text encoder. Given the strong performance of large vision-language models, we also use CLIP (ViT-32 image encoder and text encoder) [27] and its medically fine-tuned equivalent PubMedCLIP [3], which is fine-tuned on the Radiology Objects in COntext (ROCO) dataset [22].
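For concreteness, the encoders could be instantiated roughly as follows; the checkpoint identifiers (e.g. the ClinicalBERT model name) and loading details are assumptions rather than a verbatim reproduction of our code.

```python
import torch
import torchvision
import clip                                    # OpenAI CLIP package (ViT-B/32)
from transformers import AutoTokenizer, AutoModel

device = "cuda" if torch.cuda.is_available() else "cpu"

# CLIP image/text encoders; PubMedCLIP would be loaded from its own checkpoint.
clip_model, clip_preprocess = clip.load("ViT-B/32", device=device)

# CNN-BERT baseline: DenseNet121 image encoder + ClinicalBERT text encoder.
# The Hugging Face identifier below is an assumed, commonly used checkpoint.
densenet = torchvision.models.densenet121(weights="DEFAULT")
bert_tokenizer = AutoTokenizer.from_pretrained("emilyalsentzer/Bio_ClinicalBERT")
bert_encoder = AutoModel.from_pretrained("emilyalsentzer/Bio_ClinicalBERT")

# Reports are truncated/padded to at most 256 tokens, as described above.
tokens = bert_tokenizer("FINDINGS: no acute cardiopulmonary process.",
                        truncation=True, max_length=256,
                        padding="max_length", return_tensors="pt")
```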

Multi-modal alignment is implemented as a single pass through a two-layer ReLU-activated MLP with dimension \(z_{enc}\), a dropout rate of 0.5, and layer normalization, where \(z_{enc}\) is the output dimension of the encoder. We implement C as a three-layer classifier head with dimensions \(\{z_{enc},256,14\}\). During retrieval we use \(k=10\) retrieved neighbours. To prevent overfitting, early stopping with a tolerance of 3 is applied to all training runs.
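A minimal sketch of the alignment MLP and classifier head C; the exact placement of dropout and layer normalization within the two layers is an assumption.

```python
import torch.nn as nn

def alignment_mlp(z_enc: int, dropout: float = 0.5) -> nn.Module:
    """Two-layer ReLU MLP with dropout and layer normalization, keeping the
    feature dimension at z_enc."""
    return nn.Sequential(
        nn.Linear(z_enc, z_enc), nn.ReLU(), nn.Dropout(dropout),
        nn.Linear(z_enc, z_enc), nn.LayerNorm(z_enc),
    )

def classifier_head(z_enc: int, num_classes: int = 14) -> nn.Module:
    """Content classifier C with dimensions {z_enc, 256, 14}."""
    return nn.Sequential(nn.Linear(z_enc, 256), nn.ReLU(),
                         nn.Linear(256, num_classes))
```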

4 Results

4.1 Cross-Modal Retrieval

We compare the performance of our retrieval method against previous methods in Table 1 in terms of class-based mean average precision (mAP). Due to the powerful alignment of CLIP and our tailor-made fine-tuning, we outperform all existing retrieval approaches for radiology images and/or reports by a large margin. The performance difference with the similarly fine-tuned DenseNet121 and ClinicalBERT encoder combination further underscores the strength of CLIP in building a strong retrieval method, particularly for cross-modal retrieval. Interestingly, we observe that PubMedCLIP does not outperform CLIP. This can be explained by a domain shift between MIMIC-CXR and ROCO, together with the ability of CLIP to generalize well out-of-domain [27]. In our downstream tasks, image-based retrieval is most important; it performs similarly on inter- and intra-modal retrieval tasks.

Table 1. Class-based retrieval performance (source \(\rightarrow \) target) for images (\(\textbf{x}\)) and reports (\(\textbf{r}\)) in terms of mAP on MIMIC-CXR on our content alignment method, compared against other methods.
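Class-based mAP can be computed along the following lines; the relevance criterion (a retrieved item sharing at least one disease label with the query) is an assumed, commonly used definition.

```python
import numpy as np
from sklearn.metrics import average_precision_score

def retrieval_map(query_labels, retrieved_labels, retrieved_scores):
    """mAP over queries: a retrieved item counts as relevant when it shares
    at least one disease label with the query (assumed relevance definition)."""
    aps = []
    for q_lab, r_labs, r_scores in zip(query_labels, retrieved_labels, retrieved_scores):
        relevant = (r_labs @ q_lab > 0).astype(int)    # label overlap per ranked item
        if relevant.sum() == 0:
            continue                                   # AP undefined, skip this query
        aps.append(average_precision_score(relevant, r_scores))
    return float(np.mean(aps))
```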

4.2 Multi-label Disease Classification

Disease classification results in terms of AUC (Table 2) show that retrieval augmentation gives a clear improvement across disease classes. Interestingly, we find a positive, albeit weak, correlation (R\(\approx \)0.60) between the increase in class-wise AUC and retrieval mAP. Moreover, the performance gain from retrieval augmentation (\(0.80\rightarrow 0.85\)) is similar to that of additional training with synthetic diffusion-generated X-rays (\(0.80\rightarrow 0.84\)) [1]. The benefit of our method is that the supplemented information originates from the trusted dataset itself and is not synthetically generated.

Table 2. Chest X-ray classification on MIMIC-CXR with and without retrieval augmentation. The results show the beneficial effect of retrieval augmentation on classification performance.
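Per-class and mean AUC can be computed with scikit-learn as sketched below; the macro averaging over the 14 disease classes is an assumed convention, and `y_true`/`y_prob` are placeholder arrays.

```python
from sklearn.metrics import roc_auc_score

# y_true: (N, 14) binary disease labels, y_prob: (N, 14) predicted probabilities.
per_class_auc = roc_auc_score(y_true, y_prob, average=None)     # AUC per disease
mean_auc = roc_auc_score(y_true, y_prob, average="macro")       # reported mean AUC
```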

4.3 Report Generation

For retrieval-augmented report retrieval, we observe strong performance on standard report generation metrics compared to a selection of previous methods. While simple retrieval should not be expected to outperform dedicated report generation methods, we obtain results that can be considered competitive (Table 3). On the METEOR and ROUGE metrics we even outperform most existing methods. These metrics reflect that the strength of report retrieval lies in the global representation of the report: our retriever is fine-tuned to retrieve samples with equivalent label spaces, hence the good results on metrics that reward global similarity. An interesting outlook is the application of this method within a dedicated report generation framework, which could boost performance further.

Table 3. Chest X-ray report retrieval on MIMIC-CXR with and without X-TRA retrieval augmentation. Compared to dedicated report generation methods.

4.4 Cross-Dataset

Evaluating the cross-dataset scenarios (Table 4) with the CheXpert and openI datasets, we conclude that transferability to images from other domains is limited. However, we do see that when retrieval augmentation is not useful, it can be ignored by the model and is not detrimental to performance. The domain shift between different chest X-ray datasets remains a problem [24]. Currently the most practical solution is the addition of a fine-tuning step.

Cross-domain results on openI show that learning across datasets is possible with fine-tuning. When the openI dataset is added to the existing retrieval index, X-TRA benefits openI in this setting. In the updated retrieval index, 23% of the retrieved information originates from openI and 77% from MIMIC-CXR.

Table 4. Cross-domain result on downstream tasks: Report retrieval (RR) and multi-label classification (MLC) with and without X-TRA.
Fig. 2. Ablation studies on X-TRA on disease classification, for five different random seeds, with (a) different compositions of the retrieval index for \(\mathcal {L}_{CLIP}\) and \(\mathcal {L}_{ours}\) and (b) partial usage of the retrieval index.

4.5 Ablation Studies

We study the effect of the components of our retrieval augmentation method in Fig. 2, specifically the influence of each component in content-based and CLIP-based alignment. Interestingly, the composition of data modalities in retrieval augmentation does not have a large effect, since the retriever yields similar results for inter- and intra-modal retrieval. When randomly selected data is used instead of retrieved information, we achieve results comparable to our method without X-TRA. This is in accordance with the cross-dataset results, showing that when X-TRA-supplemented information is not useful, it can be ignored. Using a partial retrieval index, we conclude that X-TRA can already be useful with a small retrieval index, but performance only reaches optimal levels when \(N>100k\).

4.6 Insight and Limitations

Qualitative results from our retrieval method for two different query images are shown in Fig. 3. We retrieve from both the image index and the report index. The retrieved items match well in terms of the labels attributed to them, showing that our fine-tuning prevents the retrieval of images that are only globally similar.

Fine-tuning the entire CLIP model on domain-specific data is an interesting prospect that could further improve the performance of our retrieval model. However, as shown by the comparison of CLIP against PubMedCLIP in this paper, the accompanying loss of generalization can also be detrimental. This is a promising avenue to explore in future studies.

Fig. 3. Examples of image-image and image-text retrieval including disease class labels. A green outline means a correct retrieval; orange or dashed means a missed or an extra disease label, respectively. (Color figure online)

5 Conclusion

In this work we present X-TRA, a simple yet effective method to improve multiple tasks on radiology images. Our method is composed of a content alignment step and a retrieval augmentation step. With a new label-based alignment loss we leverage pre-trained CLIP features to create a powerful cross-modal retrieval model. The general CLIP model proves more useful for our retrieval model than the slightly out-of-domain, medically fine-tuned PubMedCLIP. We use this retrieval model to improve chest X-ray analysis through retrieval augmentation, adding an enrichment and regularization component that improves both multi-label disease classification and report retrieval, with gains of up to over 5%. On the latter task we are even competitive with dedicated report generation methods. This opens up possibilities for retrieval augmentation as a generic tool in medical imaging.