1 Introduction

Searching for images is a daily task for many medical professionals, especially in image-oriented fields such as radiology (Markonis et al. 2012). However, the huge amount of visual data in hospitals and the medical literature is not always easily accessible, and physicians generally have little time for information search, as they need to diagnose an increasing number of cases, with increasing image detail, in a limited amount of time.

Therefore, medical image retrieval systems need to return information adjusted to the knowledge level and expertise of the user in a quick and precise fashion. A well-known technique for improving search results through user interaction is relevance feedback (Rocchio 1971). Relevance feedback allows the user to mark results returned in a previous search step as relevant or irrelevant in order to refine the initial query. The concept behind relevance feedback is that although users may have difficulty formulating a precise query for a specific task, they can generally tell quickly whether a returned result is relevant to their information need. This technique found use in image retrieval particularly with the emergence of content-based image retrieval (CBIR) systems (Squire et al. 2000; Taycher et al. 1997; Wood et al. 1998). Following the CBIR approach, the visual content of the marked results is used to refine the initial image query. With the result images represented as a grid of thumbnails with limited metadata, relevance feedback can be applied quickly to speed up the search iterations and refine the results. Recent user tests with radiologists on a medical image search system also showed that this method is intuitive and straightforward to learn (Markonis et al. 2013).

Depending on whether the user provides the feedback to the system manually (e.g. by marking results) or the system obtains this information automatically (e.g. by log analysis), relevance feedback can be categorized as explicit or implicit. Moreover, the information obtained through relevance feedback can be used to affect the general behaviour of the system (long-term learning). In Müller et al. (2004), a market basket analysis algorithm is applied to image retrieval using long-term learning. A recent review of short-term and long-term learning relevance feedback techniques in CBIR can be found in Li and Allinson (2013). An extensive survey of relevance feedback in text-based retrieval systems is presented in Ruthven and Lalmas (2003), and for CBIR in Rui et al. (1997). Another survey (Crucianu et al. 2004) gives a good overview of key aspects of relevance feedback in image retrieval, such as the objectives of image retrieval, the main relevance feedback mechanisms and the different evaluation strategies, using real users or pseudo-relevance feedback.

Strategies for simulated feedback are also presented in Müller et al. (2000), where relevance feedback is divided into two main strategies: at the level of the results of multiple image queries, and at the feature level, creating a pseudo-image out of the set of image queries. The challenges of adding negative feedback to the mechanism are also discussed, as negative examples are often far more numerous than positive ones, thus “destroying” the query. In Qian et al. (2003), several alternating feature spaces are presented for relevance feedback and shown to improve results by exploring new areas of the feature space in each iteration. In Cox et al. (1996), a Bayesian approach to relevance feedback is proposed, following a target search algorithm. The same authors also explore relevance feedback techniques on small displays (Vinay et al. 2005).

In the medical informatics field, Chen et al. (2011) apply CBIR with relevance feedback to mammography retrieval. In Rahman et al. (2007), an image retrieval framework using relevance feedback, with support vector machines to compute the refined queries, is evaluated on a dataset of 5000 medical images.

There are many existing medical retrieval systems that combine text and visual information, such as NovaSearch (Mourão and Martins 2013), Open-I and others. Relevance feedback is not always available in these systems and is often only evaluated in a qualitative manner. In Rahman et al. (2011), an approach to relevance feedback using similarity fusion is shown to improve the retrieval performance over two iterations of medical image search. However, it is only evaluated with respect to the size of the shortlist used for the pseudo-relevance feedback and not against other relevance feedback techniques.

In this paper we evaluate different explicit, short-term relevance feedback techniques using visual content or text for medical image retrieval. We propose a technique that combines visual and text-based relevance feedback and show that it achieves performance competitive with state-of-the-art approaches.

2 Methods

In this study, the same categorization as in Müller et al. (2000) is followed. Two main feedback strategies are examined, distinguished by the retrieval stage at which the relevance feedback information is added to the new query. Figure 1 shows the image retrieval pipeline and the steps where relevance feedback can be applied. The first strategy operates at the feature level, resulting in a single feature representation for all the query images and consequently a single result list; no result list fusion is needed in this strategy. The second is performed at the result list fusion step, where each image query returns a different result list. In the multi-modal approaches, the combination of visual and textual information is obtained at the result list fusion stage for both strategies.

Fig. 1 Block diagram of the image retrieval pipeline. Relevance feedback strategies can be applied at the feature representation and result list fusion steps

2.1 Rocchio algorithm

One of the best-known relevance feedback techniques is Rocchio’s algorithm (Rocchio 1971). Its mathematical definition is given below:

$${\mathbf {q}}_m = \alpha {\mathbf {q}}_o + \beta \frac{1}{|D_r|} \sum _{{\mathbf {d}}_j\in D_r}{\mathbf {d}}_j - \gamma \frac{1}{|D_{nr}|} \sum _{{\mathbf {d}}_j\in D_{nr}}{\mathbf {d}}_j $$
(1)

where \({\mathbf {q}}_m\) is the modified query, \({\mathbf {q}}_o\) is the original query, \(D_r\) is the set of relevant images, \(D_{nr}\) is the set of non-relevant images and \(\alpha ,\beta \) and \(\gamma \) are weights.

Typical values for the weights are \(\alpha = 1\), \(\beta = 0.8\) and \(\gamma = 0.2\). Rocchio’s algorithm is typically used in vector space models and also in CBIR. Intuitively, the original query vector is moved towards the relevant vectors and away from the irrelevant ones. Weighting the positive and negative parts avoids a known problem of CBIR: when there is more negative than positive feedback, many relevant images can disappear from the result set (basically leaving images with few features, or with features not present in the initial set of images).
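To make Eq. (1) concrete, the following minimal Python sketch (illustrative only, not the implementation used in the experiments) applies the Rocchio update to feedback vectors; the names mirror the notation above, and the guard clauses simply skip empty feedback sets:

```python
import numpy as np

def rocchio(q_orig, relevant, non_relevant, alpha=1.0, beta=0.8, gamma=0.2):
    """Rocchio query modification (Eq. 1).

    q_orig       -- original query vector q_o
    relevant     -- list of relevant image/document vectors (D_r)
    non_relevant -- list of non-relevant vectors (D_nr)
    """
    q_mod = alpha * q_orig
    if relevant:                                   # move towards relevant centroid
        q_mod = q_mod + beta * np.mean(relevant, axis=0)
    if non_relevant:                               # move away from non-relevant centroid
        q_mod = q_mod - gamma * np.mean(non_relevant, axis=0)
    return q_mod
```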

2.2 Late fusion

Another technique that has shown potential in image retrieval (García Seco de Herrera et al. 2013) is late fusion. Late fusion (Depeursinge and Müller 2010) is used in information retrieval to combine result lists. It can be applied to fuse multiple queries and in multi-modal techniques, where, for example, the results of text and visual retrieval are combined. It can also be used to fuse multiple features, even though early fusion is more commonly chosen for this purpose. The concept behind this method is to merge the result lists into a single list while boosting common occurrences using a fusion rule.

For example, the fusion rule of the score-based late fusion method CombMNZ (Shaw and Fox 1994) is defined as:

$$S_{\texttt{combMNZ}}(i)=F(i)\cdot S_{\texttt{combSUM}}(i) $$
(2)

where F(i) is the number of retrieved lists in which image i is present with a non-zero score, and \(S_{\texttt{combSUM}}(i)\) is the CombSUM score of image i, given by

$$S_{\texttt{combSUM}}(i)=\sum _{j=1}^{N}{S_j(i)}$$
(3)

where \(S_j(i)\) is the score assigned to image i in retrieved list j and N is the number of retrieved lists being fused.
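For illustration, a short Python sketch of CombMNZ as defined in Eqs. (2) and (3); representing each result list as a dictionary from image id to score is an assumption made here for clarity, not the data structure of the actual system:

```python
from collections import defaultdict

def comb_mnz(result_lists):
    """Score-based CombMNZ fusion (Eqs. 2-3).

    result_lists -- list of dicts mapping image id -> retrieval score S_j(i)
    Returns (image id, fused score) pairs, best first.
    """
    comb_sum = defaultdict(float)  # S_combSUM(i): sum of scores over lists
    freq = defaultdict(int)        # F(i): lists containing i with non-zero score
    for scores in result_lists:
        for image_id, score in scores.items():
            comb_sum[image_id] += score
            if score != 0:
                freq[image_id] += 1
    fused = {i: freq[i] * comb_sum[i] for i in comb_sum}
    return sorted(fused.items(), key=lambda kv: kv[1], reverse=True)
```

Scores are usually normalized per list before fusion so that lists with different score ranges contribute comparably.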

2.3 Multi-modal relevance feedback

Most relevance feedback techniques use vectors from either the text or the visual models. However, it has been shown that approaches using both text and visual information can outperform single-modal ones in image retrieval if carried out carefully (Müller and Kalpathy-Cramer 2010). We propose the use of multi-modal information for relevance feedback to enhance retrieval performance. This is, to the best of our knowledge, the first time that such a technique has been proposed in image retrieval. As late fusion is applied on result lists, it is straightforward to use for combining results from visual and text queries.
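A minimal sketch of how such a multi-modal feedback step could be assembled from the comb_mnz sketch above. The per-list min-max normalization is an illustrative assumption; the text does not prescribe a specific normalization:

```python
def normalize(scores):
    """Min-max normalize one result list's scores to [0, 1]."""
    lo, hi = min(scores.values()), max(scores.values())
    if hi == lo:
        return {i: 1.0 for i in scores}
    return {i: (s - lo) / (hi - lo) for i, s in scores.items()}

def multimodal_feedback(original_list, text_lists, visual_lists):
    """Fuse the original query's result list with the result lists obtained
    from the relevant images' captions (text) and visual features."""
    all_lists = [original_list] + text_lists + visual_lists
    return comb_mnz([normalize(l) for l in all_lists])
```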

2.4 Relevance feedback in multi-lingual queries

Another experiment run in this study investigates how RF performs in more realistic scenarios, where automatic spelling correction and language translation may have been applied to the query. For this, an even distribution of spelling errors across the text queries was introduced: diacritics omission, leaving out white space, character omission, character insertion, character replacement and character swapping. Automatic spelling correction was then applied to the queries, while queries in German, French and Czech were automatically translated into English.
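As an illustration, a Python sketch injecting the six error types listed above into a query; sampling a random edit position is an assumption here, as the exact injection procedure is not detailed:

```python
import random
import string
import unicodedata

def corrupt(query, kind):
    """Introduce one artificial spelling error of the given kind."""
    if kind == "diacritics":       # diacritics omission (e.g. é -> e)
        nfkd = unicodedata.normalize("NFKD", query)
        return "".join(c for c in nfkd if not unicodedata.combining(c))
    if kind == "whitespace":       # leaving out white space
        return query.replace(" ", "", 1)
    if len(query) < 2:
        return query
    pos = random.randrange(len(query))
    if kind == "omission":         # character omission
        return query[:pos] + query[pos + 1:]
    if kind == "insertion":        # character insertion
        return query[:pos] + random.choice(string.ascii_lowercase) + query[pos:]
    if kind == "replacement":      # character replacement
        return query[:pos] + random.choice(string.ascii_lowercase) + query[pos + 1:]
    if kind == "swap":             # character swapping
        pos = min(pos, len(query) - 2)
        return query[:pos] + query[pos + 1] + query[pos] + query[pos + 2:]
    raise ValueError(kind)
```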

2.5 Experimental setup

For evaluating the relevance feedback techniques, the following experimental setup was used: the n search iterations are initiated with a text query in iteration 0. The relevant results among the top k results of iteration i are used in the relevance feedback formula of iteration \(i+1\), for \(i=0,\ldots ,n-2\).
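A compact sketch of this simulated feedback loop; search, refine and is_relevant are illustrative placeholders, where refine could be Rocchio (Sect. 2.1) or late fusion (Sect. 2.2) and is_relevant is a ground-truth oracle standing in for the “perfect user” discussed later:

```python
def feedback_loop(initial_query, search, refine, is_relevant, n=5, k=100):
    """Simulated relevance feedback over n iterations with feedback depth k.

    search      -- function: query -> ranked result list
    refine      -- function: (query, relevant_results) -> modified query
    is_relevant -- function: result -> bool, from the ground truth
    """
    query = initial_query
    results = search(query)                  # iteration 0: text query
    for _ in range(n - 1):                   # iterations 1 .. n-1
        feedback = [r for r in results[:k] if is_relevant(r)]
        query = refine(query, feedback)
        results = search(query)
    return results
```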

The image dataset, topics and ground truth of the ImageCLEF 2012 medical image retrieval task (Müller et al. 2012) were used in this evaluation, specifically the ad-hoc image-based topics. The dataset contains more than 300,000 images from the medical open access literature (a subset of PubMed Central).

The image captions were used by the text-based runs and indexed with the Lucene text search engine. A vector space model was used along with tokenization, stopword removal, stemming and term frequency-inverse document frequency (tf-idf) weighting. The bag-of-visual-words model described in García Seco de Herrera et al. (2012) and the bag-of-colors model appearing in García Seco de Herrera et al. (2013) were used for the visual modelling of the images. Since only positive feedback was used in this study, no weights were used for the Rocchio algorithm. In the multi-modal runs, the fusion of the visual and text information is performed only for the top 1000 results, as the ImageCLEF evaluation only takes the top 1000 documents into account in any case.
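As a rough stand-in for the Lucene setup described above, the following sketch builds a tf-idf caption index with scikit-learn; it covers tokenization, stopword removal and tf-idf weighting but, unlike the actual configuration, omits stemming, and the example captions are invented:

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.metrics.pairwise import cosine_similarity

# Hypothetical captions standing in for the PubMed Central caption collection.
captions = ["chest x-ray showing pneumothorax", "CT scan of the abdomen"]
vectorizer = TfidfVectorizer(stop_words="english")  # tokenization + stopwords + tf-idf
index = vectorizer.fit_transform(captions)

def text_search(query, top=1000):
    """Rank captions by cosine similarity to the query in the vector space model."""
    q = vectorizer.transform([query])
    scores = cosine_similarity(q, index).ravel()
    return sorted(enumerate(scores), key=lambda kv: kv[1], reverse=True)[:top]
```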

Five techniques were evaluated in this study:

  1. text: text-based RF using the vector space model. Word stemming, tokenization and stopword removal are performed in both the text-based and multi-modal runs.

  2. visual_rocchio: visual RF using Rocchio to fuse the relevant image vectors and CombMNZ fusion to fuse the original query results with the visual results.

  3. visual_lf: visual RF using late fusion (with the CombMNZ fusion rule) to fuse the relevant image results and the original query results with the visual ones.

  4. mixed_rocchio: multi-modal RF using Rocchio to fuse the relevant image vectors and CombMNZ fusion to fuse the original query results with the relevant caption results and relevant visual results.

  5. mixed_lf: multi-modal RF using late fusion (with the CombMNZ fusion rule) to fuse the relevant image results and the original query results with the caption text results and relevant visual results.

Regarding the experiment combining relevance feedback with machine translation and automatic spelling correction, the Health on the Net (HON) spell checker was used. The number of spell-check suggestions to return was set to 25. The decision whether to take the most frequent spelling suggestion as the correctly spelt term (or keep the existing user query as being spelt correctly) was based on the ratio of spelling suggestion frequency to query term frequency in the collection (threshold set to \(\ge \)8:1 experimentally).
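A sketch of this decision rule with illustrative names; the suggestion list and frequency lookup stand in for the HON spell checker output and the collection statistics:

```python
def choose_spelling(query_term, suggestions, collection_freq, ratio=8.0):
    """Keep the query term or replace it with the most frequent suggestion.

    suggestions     -- candidate corrections from the spell checker (up to 25)
    collection_freq -- function: term -> frequency in the collection
    The term is replaced only if the suggestion's collection frequency exceeds
    the query term's frequency by the given ratio (>= 8:1 here).
    """
    if not suggestions:
        return query_term
    best = max(suggestions, key=collection_freq)
    q_freq = collection_freq(query_term)
    # A query term absent from the collection (frequency 0) is always corrected.
    if q_freq == 0 or collection_freq(best) / q_freq >= ratio:
        return best
    return query_term
```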

The MOSES system (Koehn et al. 2007) was used to automatically translate the German, French and Czech queries into English. The ImageCLEF dataset contains translations of the queries for German and French, while the Czech queries were translated manually.

Three main runs were evaluated for each language:

  • The first run used the queries after being translated into English by the query translation service.

  • The second run used the queries after being translated into English by the query translation service, with spelling errors artificially introduced.

  • The third run used the translated queries as in the two runs above, with the spelling errors corrected by the spelling correction service.

All of the above queries were used as input to the experiment described at the beginning of this section, using the best performing RF technique and \(k=100\).

3 Results

The evaluation of the five techniques was performed for \(k=5, 20, 50, 100\) and \(n = 5\). The mean average precision (mAP) of each technique per iteration is shown in Figs. 2, 3, 4 and 5.

Table 1 gives the best mAP score of each run. The numbers in parentheses denote the iteration in which this score was achieved. When the same score occurred in multiple iterations of the same run, the earliest iteration is given.

Fig. 2 Mean average precision per search iteration for \(k=5\)

Fig. 3 Mean average precision per search iteration for \(k=20\)

Fig. 4 Mean average precision per search iteration for \(k=50\)

Fig. 5 Mean average precision per search iteration for \(k=100\)

Table 1 Best mAP scores

Table 2 shows the effect of the translation (row 1), the introduction of spelling errors (row 2) and the automatic spelling correction (row 3) on the retrieval performance before applying any RF.

Table 2 Iteration 0 mAP for each of the languages

Figure 6 shows the mean average precision at iteration 4 of the mixed_lf technique.

Fig. 6 Plot of the mixed_lf values for iteration 4 for each of the languages. The values are for, respectively, no spelling error, spelling error introduced and spelling correction applied. Machine translation into English is applied for the French, Czech and German queries

Table 3 shows the results of a sample query for different categories of relevance feedback methods, illustrating the differences in the results when using text, visual and mixed relevance feedback.

Table 3 Sample query: top five results of iteration 4 \((k=100)\) for the three categories of methods (text, visual and mixed)

4 Discussion

Medical image retrieval from articles in the literature is a challenging task, as the image datasets from the biomedical literature are quite noisy (containing many graphs, diagrams and non-medical pictures). Moreover, the areas in the images containing pathology-related information are small and the differences between images are quite subtle, with usually only a very small portion of the image being relevant.

All of the evaluated techniques improve retrieval after the initial search iteration, which demonstrates the potential of relevance feedback for refining medical image search queries. Relevance feedback using only visual appearance models, even though it improves the retrieval performance after the first iteration, performed worse than the text-based runs in most cases. Visual features still suffer from the semantic gap between their expressiveness and human interpretation. Still, this shows their usefulness for image datasets where little or no text metadata is available. Moreover, when combined with the text information in the proposed method, they improve on the text-only baseline. Recently introduced higher-order representations, such as Fisher vectors or vectors of locally aggregated descriptors (VLADs), may further improve the retrieval results in this scenario.

The proposed multi-modal runs provide the best results in all cases except \(k=5\). Surprisingly, the visual runs perform slightly better than the text and multi-modal approaches in this case. However, assuming independent and normally distributed average precision values, the significance tests show that the difference is not statistically significant.

We consider the case \(k = 20\) the most realistic scenario, since users do not often inspect more than two pages of results and most actually stay on the initial results page only. Especially for grid-like result interface views, where each page can contain 20–50 results, we consider \(k=20\) more realistic than \(k=5\). In this case, the proposed methods achieve the best performance, with mAP scores of 0.2606 and 0.2635 respectively. Again, the significance tests do not find any significant difference between the three best approaches. However, applying different fusion rules for combining visual and text information (such as linear weighting) could further improve the results of the mixed approaches. It can be noted that as k increases, the performance improvement also increases, highlighting the added value of relevance feedback. Values of k larger than 100 were not explored, as such scenarios were judged to be unrealistic.

In the visual runs, using Rocchio to combine the visual queries performs worse than late fusion, in accordance with the findings in García Seco de Herrera et al. (2012). The reason could be that the large visual diversity of relevant images in medicine, together with the curse of dimensionality, causes the modified vector to behave as an outlier in the high-dimensional visual feature space. In the mixed runs, the difference between the two methods is not statistically significant, with Rocchio performing slightly better than late fusion.

Relevance feedback is shown to be able to improve retrieval performance in difficult real-world scenarios where spelling errors are introduced and corrected, and machine translation is applied to the queries.

Irrelevant results were ignored, as they often have little or no impact on the retrieval performance (Müller et al. 2000; Salton and Buckley 1997). More importantly, the ground truth of the dataset used contains a much larger portion of annotated irrelevant results than relevant ones; using all of them was considered to simulate a potentially unrealistic scenario, as users do not usually mark many results as negative examples (or only very few). Having too many negative examples could also cause the modified vector to behave as an outlier. Preliminary results confirmed this hypothesis: the use of negative results for relevance feedback can decrease performance after the first iteration if not handled carefully.

A larger number of steps could be investigated, but this might be unrealistic, given that physicians have little time and stop after a few minutes of search (Markonis et al. 2012). Users will often test only a few steps of relevance feedback at most.

In the evaluation of our relevance feedback mechanism, we assume a perfect user who marks all relevant items as relevant. As past literature shows (Müller et al. 2000), by selecting not all relevant items but those adding the most information, a human user familiar with the system can potentially achieve better results than the automatic system when adding positive and negative feedback. For novice users this is unlikely: a novice user might simply mark all relevant items, which is the scenario we adopt for our feedback evaluation, so we consider it a good approximation.

Perfect positive feedback also has the advantage that results are reproducible, so exactly the same conditions can be recreated for other techniques, whereas manual user tests can depend strongly on the person supplying the feedback and thus do not allow comparing performance across several systems in a homogeneous way. User tests with a very similar system and several users have been published (Markonis et al. 2013), and for this article we feel a reproducible way of supplying relevance feedback is preferable. This does not replace user tests, and more about interaction and interface design can be learned through such tests.

5 Conclusions

This paper proposes the use of multi-modal information when applying relevance feedback to medical image retrieval. An experiment was set up to simulate the relevance feedback of a user on medical topics from ImageCLEF 2012.

In general, all the techniques evaluated in this study improve performance, which shows the added value of relevance feedback. Text-based relevance feedback showed consistently good results. Visual techniques showed competitive performance for a small number of steps, underperforming in the remaining cases. The proposed multi-modal approaches show promising results, slightly outperforming the text-based approach but without statistical significance.

More fusion techniques will be evaluated in the future. A comparison with manual query refinement by users is also planned, to assess relevance feedback as a concept in medical image retrieval. The addition of semantic search is of interest as well, in order to take advantage of the structured knowledge of medical ontologies such as RadLex (Radiology Lexicon) (Langlotz 2006) and MeSH (Medical Subject Headings) (WHSL Medical Subject Headings for PubMed Searching 2014) to model semantic knowledge.