Image Retrieval System Based on EMD Similarity Measure and S-Tree

Le, Thanh Manh; Van, Thanh The

doi:10.1007/978-1-4614-6747-2_17

Thanh Manh Le³ &
Thanh The Van⁴

Part of the book series: Lecture Notes in Electrical Engineering ((LNEE,volume 234))

1948 Accesses
2 Citations

Abstract

This chapter approaches the binary signature for each image on the base of the percentage of the pixels in each color image and builds a similar measure between the images based on EMD (earth mover’s distance). Next, it aims to create S-tree in a similar measure EMD to store the image’s binary signatures to quickly query image signature data. Then, from a similar measure EMD and S-tree, it provides an image retrieval algorithm and CBIR (content-based image retrieval). Last but not least, based on this theory, it also presents an application and experimental assessment of the process of querying image on the database system over 10,000 images.

Access provided by Autonomous University of Puebla. Download conference paper PDF

Clustering Binary Signature Applied in Content-Based Image Retrieval

An Efficient Approach to Improve Retrieval Rate in Content Based Image Retrieval Using MPEG-7 Features

Image retrieval system based on multi feature extraction and its performance assessment

Article 03 January 2021

Keywords

1 Introduction

It is difficult to find images in a large database of digital images. There are two main approaches for querying the images: querying the images based on the keyword TBIR (text-based image retrieval) [1] and those based on the content CBIR (content-based image retrieval) [1, 2].

In recent years, there have been considerable researches regarding CBIR, such as the image retrieval system based on color histogram [1–4], the similarity of the images based on histogram and the texture [5], and using the EMD distance in image retrieval [6–8].

This chapter aims to create the binary signature of an image and describe the distribution of image’s colors by a bitstring with a given size. It also aims to query “similar images” in a large image database system efficiently. Additionally, two major targets are used to reduce the amount of storage space and speed up the query image on large database systems.

2 The Related Theory

2.1 S-Tree

S-tree [2, 9] is a tree with many branches that are balanced; each node of the S-tree contains a number of pairs \( \langle \mathrm{sig},\mathrm{next}\rangle \), where \( \mathrm{sig} \) is a binary signature and \( \mathrm{next} \) is a pointer to a child node. Each node root of the S-tree contains at least two pairs and at most \( M \) pairs \( \langle \mathrm{sig},\mathrm{next}\rangle \), all internal nodes in the S-tree at least \( m \) and at most \( M \) pairs \( \langle \mathrm{sig},\mathrm{next}\rangle \), \( 1\leq m\leq {M \left/ {2} \right.} \); the leaves of the S-tree contain the image’s binary signatures \( \mathrm{sig} \), along with a unique identifier \( \mathrm{oid} \) for those images. The S-tree height for \( n \) signatures is at most \( h=\left\lceil {{\log_m}n-1} \right\rceil \). The S-tree was built on the basis of inserting and splitting. When the node \( v \) is full, it will be split into two.

2.2 EMD Distance

Setting \( I \) as a set of suppliers, \( J \) as a set of consumers, and \( {c_{ij }} \) as the transportation cost from the supplier \( i\in I \) to the consumer \( j\in J \), we need to find out flows \( {f_{ij }} \) to minimize the total cost \( \sum\limits_{{i\in I}} {\sum\limits_{{j\in J}} {{c_{ij }}{f_{ij }}} } \) with the constraints [10] \( {f_{ij }}\geq 0,\sum\limits_{{i\in I}} {{f_{ij }}\leq {y_j}}, \sum\limits_{{j\in J}} {{f_{ij }}\leq {x_i}}, i\in I,j\in J \). With \( {x_i} \) as the provider’s general ability \( i\in I \), \( {y_j} \) is the total need of the consumer \( j\in J \). The feasible condition is \( \sum\limits_{{j\in J}} {{y_j}\leq \sum\limits_{{i\in I}} {{x_i}} } \). The EMD distance [6, 7] is as follows: \( \mathrm{EMD}(x,y)={{{\left( {\sum\nolimits_{{i\in I}} {\sum\nolimits_{{j\in J}} {{c_{ij }}{f_{ij }}} } } \right)}} \left/ {{\left( {\sum\nolimits_{{i\in I}} {\sum\nolimits_{{j\in J}} {{f_{ij }}} } } \right)}} \right.}={{{\left( {\sum\nolimits_{{i\in I}} {\sum\nolimits_{{j\in J}} {{c_{ij }}{f_{ij }}} } } \right)}} \left/ {{\left( {\sum\nolimits_{{j\in J}} {{y_j}} } \right)}} \right.} \)

3 Building Data Structures and Image Retrieval Algorithms

3.1 Creating a Binary Signature of the Image Based on the Color Histogram

Step 1. Choose a standard color set \( C=\{{c_1},{c_2},\ldots,{c_n}\} \) to calculate the color histogram of the images. To quantify the image \( I \) in order to retain only the dominant colors \( {C_I}=\left\{ {c_1^I,c_2^I,\ldots,c_{{{n_I}}}^I} \right\} \), the color histogram vector of image \( I \) is \( {H_I}=\left\{ {h_1^I,h_2^I,\ldots,h_{{{n_I}}}^I} \right\} \).
Step 2. Calculate the color histogram vector standardizes \( H=\{{h_1},{h_2},\ldots,{h_n}\} \), where \( {h_i}={{{h_j^I}} \left/ {{\sum\nolimits_j {h_j^I} }} \right.} \) if \( {c_i}\in C\cap {C_I} \), otherwise \( {h_i}=0 \).
Step 3. Each color \( c_j^I \) will be described into a bitstring \( b_1^jb_2^j,\ldots,b_m^j \). The binary signature of the image \( I \) will be \( \mathrm{sig}={B^1}{B^2}\ldots {B^n} \), \( {B^j}=b_1^jb_2^j\ldots b_m^j \), in which \( b_i^j=1 \) if \( i=\left\lceil {{h_i}\times m} \right\rceil \), otherwise \( b_i^j=0 \).

3.2 Measuring Similar Image Based on EMD Distance

The weight of the component\( B_I^j=b_1^jb_2^j\ldots b_m^j \)is\( w_I^{j}=w\left( {B_I^j} \right)=\sum\limits_{i=1}^m {\left( {b_i^j\times \left( {{i \left/ {m} \right.}} \right)\times 100} \right)}; \) the weight vector of the image \( I \) will be \( {W_I}=\left\{ {w_I^1,w_I^2,\ldots,w_I^n} \right\} \). \( J \) is the image that we need to calculate the similarity with the image \( I \), so we need to minimize the cost \( \sum\limits_{i=1}^n {\sum\limits_{j=1}^n {{d_{ij }}{f_{ij }}} } \), and \( F=\left( {{f_{ij }}} \right) \) is the matrix of color distribution flows from \( c_I^i \) to \( c_J^j \) and \( D=\left( {{d_{ij }}} \right) \) is the Euclidean distance matrix in the RGB color space from \( c_I^i \) to \( c_J^{j} \). The similarity between two images \( I \) and \( J \) based on the EMD distance will minimize the value \( \mathrm{EMD}(I,J)=\mathop{\min}\limits_{{F=\left( {{f_{ij }}} \right)}}{{{\left( {\sum\limits_{i=1}^n {\sum\limits_{j=1}^n {{d_{ij }}{f_{ij }}} } } \right)}} \left/ {{\sum\limits_{i=1}^n {\sum\limits_{j=1}^n {{f_{ij }}} } }} \right.} \), with \( \sum\limits_{i=1}^n {\sum\limits_{j=1}^n {{f_{ij }}} } =\min \left( {\sum\limits_{i=1}^n {w_I^i}, \sum\limits_{j=1}^n {w_J^j} } \right) \)

3.3 Creating S-Tree Based on EMD Distance

Algorithm1 . Gen-Stree(S, Root)

Step 1. v = Root;

If S = Ø then STOP;

Else Choosing <sig,oid> ∈ S and S = S <sig,oid>;

To go Step 2;

Step 2. If v is leaf then

begin

v = v ⊕ <sig,oid>; UnionSig(v);

If v.count > M then SplitNode(v);

To go back Step 1;

end

Else

begin

EMD(SIG ₀ →sig,sig)=min{EMD(SIG _i →sig,sig)|SIG _i ∈ v};

v = SIG ₀ →next; To go back Step 2;

End

Splitting the node \( v \) based on \( \alpha -seed \) and \( \beta -seed \) in [2, 9] is done as follows:

Algorithm2. SplitNode(v)

Create the nodes \( {v_{\alpha }} \) and \( {v_{\beta }} \) contains \( \alpha -seed \) and \( \beta -seed \) ;

For (SIG _i ∈ v) do

Begin

If (EMD(SIG _i →sig, \( \alpha -seed \) )<EMD(SIG _i →sig, \( \beta -seed \) )) then

\( {v_{\alpha }} \) = \( {v_{\alpha }} \) ⊕ SIG _i ;

Else \( {v_{\beta }} \) = \( {v_{\beta }} \) ⊕ SIG _i ;

\( {s_{\alpha }} \) = \( \bigcup {sig_i^{\alpha }} \) , with \( sig_i^{\alpha}\in {v_{\alpha }} \) ; \( {s_{\beta }} \) = \( \bigcup {sig_i^{\beta }} \) , with \( sig_i^{\beta}\in {v_{\beta }} \) ;

\( {v_{parent }}={v_{parent }}\oplus {s_{\alpha }} \) ; \( {v_{parent }}={v_{parent }}\oplus {s_{\beta }} \) ;

If ( \( {v_{parent }}.count \) > \( M \) ) then SplitNode ( \( {v_{parent }} \) );

End.

Procedure UnionSig( \( v \) )

Begin

\( s \) = \( \bigcup {sig_i} \) , with \( sig_i\in v \) ;

If ( \( {v_{parent }} \) != null) then

begin

\( SI{G_v}=\{SI{G_i}|SI{G_i}\to next=v,SI{G_i}\in {v_{parent }}\} \) ; \( {v_{parent }}\to (SI{G_v}\to sig)=s;\) UnionSig( \( {v_{parent }} \) );

end

End.

3.4 The Image Retrieval Algorithm Based on S-Tree and EMD Distance

Algorithm3. Search-Image-Sig(sig, S-tree)

v = root; SIGOUT = Ø Stack = Ø Push(Stack, v);

while ( not Empty(Stack)) do

begin

v = Pop(Stack);

If (v is not Leaf) then

begin

For (SIG _i ∈ v and SIG _i →sig ∧ sig = sig) do

EMD(SIG ₀ →sig,sig)=min{EMD(SIG _i →sig,sig)|SIG _i ∈v};

Push(Stack, SIG ₀ →next);

end

Else SIGOUT = SIGOUT ∪ {<SIG _i →sig,oid _i >|SIG _i ∈v};

end

return SIGOUT;

4 Experiments

4.1 Model Application

Phase 1: Perform Preprocessing (Fig. 1)
- Step 1. Quantize images in the database and convert to a color histogram.
- Step 2. Convert the color histogram of the image in the form of binary signatures.
- Step 3. Respectively calculate the similarity measure EMD distance of the image signatures and insert into the S-tree.
Phase 2: Implementation Query
- Step 1. For each query image, calculate the color histogram and convert into binary signatures.
- Step 2. Perform binary signature query on the S-tree consisting of the image signature, it is possible to find similar images at the leaves of the S-tree through the EMD measure.
- Step 3. After finding similar images, conduct arrangement of similar levels from high to low and make the title match with the images arranged on the basis of similarity EMD distance.

4.2 The Experimental Results

Each image will calculate the color histogram based on 16 colors: black, silver, white, gray, red, orange, yellow, lime, green, turquoise, cyan, ocean, blue, violet, magenta, and raspberry (Figs. 2 and 3).

5 Conclusion

This chapter creates algorithms in order to speed up the retrieval of similar images based on the image’s binary signatures and then designs and implements the image retrieval model CBIR. As can be seen from the experiment, it takes a long time to create the S-tree from the image’s binary signature, but the retrieval of the image that relies on the S-tree will be a lot faster than a linear search method based on EMD. However, using EMD to calculate the distribution of the image’s colors will result in inaccuracy in the case of the images with the same percentage of color pixels, but the color distribution location does not correspond to each other. The next development will assess the similarity of the image through EMD distance with location distribution of the percentage of color pixels and compare the objects in the contents of the image to increase accuracy when querying the similar images.

References

Neetu Sharma S, Paresh Rawat S, Jaikaran Singh S (2011) Efficient CBIR using color histogram processing. Signal Image Process Int J 2(1):94–112
Article Google Scholar
Nascimento MA, Tousidou E, Chitkara V, Manolopoulos Y (2002) Image indexing and retrieval using signature trees. Data Knowl Eng 43(1):57–77
Article MATH Google Scholar
Abuhaiba ISI, Salamah RAA (2012) Efficient global and region content based image retrieval. Int J ImageGraphics Signal Process 4(5):38–46
Article Google Scholar
Yu J, Amores J, Sebe N, Radeva P, Tian Q (2008) Distance learning for similarity estimation. IEEE Trans Pattern Anal Mach Intell 30(3):451–462
Article Google Scholar
Kavitha C, Babu Rao M, Prabhakara Rao B, Govardhan A (2011) Image retrieval based on local histogram and texture features. Int J Comput Sci Inf Technol 2(2):741–746
Google Scholar
Rubner Y, Tomasi C, Guibas LJ (1998) A metric for distributions with applications to image databases. In: Proceedings of the IEEE international conference on computer vision, Bombay, India, 4–7 January 1998, pp 59–66
Google Scholar
Abdelkhalak B, Zouaki H (2011) EMD similarity measure and metric access method using EMD lower bound. Int J Comput Sci Emerg Technol 2(6):323–332
Google Scholar
Hurtut T, Gousseau Y (2008) Francis Schmitt: adaptive image retrieval based on the spatial organization of colors. Comput Vis Image Underst 112(2):101–113
Article Google Scholar
Chen Y, Chen Y (2006) On the signature tree construction and analysis. IEEE Trans Knowl Data Eng 18:1207–1224
Article Google Scholar
Konstantinidis K, Gasteratos A, Andreadis I (2005) Image retrieval based on fuzzy color histogram processing. Sci Direct Optics Commun 248(4–6):375–386
Article Google Scholar

Download references

Author information

Authors and Affiliations

Hue University, Hue, Vietnam
Thanh Manh Le
Center for Information Technology, HoChiMinh City University of Food Industry, HoChiMinh City, Vietnam
Thanh The Van

Authors

Thanh Manh Le
View author publications
You can also search for this author in PubMed Google Scholar
Thanh The Van
View author publications
You can also search for this author in PubMed Google Scholar

Corresponding author

Correspondence to Thanh The Van .

Editor information

Editors and Affiliations

School of Engineering, Mercer University, 151 Brookefield Drive, Macon, 31210, Georgia, USA
Jengnan Juang
National Changhua University of Educatio, No. 1, Jin De Road, Changhua City, 500, Taiwan R.O.C.
Yi-Cheng Huang

Rights and permissions

Reprints and permissions

Copyright information

About this paper

Cite this paper

Le, T.M., Van, T.T. (2013). Image Retrieval System Based on EMD Similarity Measure and S-Tree. In: Juang, J., Huang, YC. (eds) Intelligent Technologies and Engineering Systems. Lecture Notes in Electrical Engineering, vol 234. Springer, New York, NY. https://doi.org/10.1007/978-1-4614-6747-2_17

Download citation

DOI: https://doi.org/10.1007/978-1-4614-6747-2_17
Published: 28 February 2013
Publisher Name: Springer, New York, NY
Print ISBN: 978-1-4614-6746-5
Online ISBN: 978-1-4614-6747-2
eBook Packages: EngineeringEngineering (R0)

Publish with us

Policies and ethics

Image Retrieval System Based on EMD Similarity Measure and S-Tree

Abstract

Similar content being viewed by others

Clustering Binary Signature Applied in Content-Based Image Retrieval

An Efficient Approach to Improve Retrieval Rate in Content Based Image Retrieval Using MPEG-7 Features

Image retrieval system based on multi feature extraction and its performance assessment

Keywords

1 Introduction