1 Introduction

The proliferation of digital images has resulted in a large number of digital image libraries. Easy access to so many images requires proper organization and indexing, and indexing and arranging such a large volume of images of different types is a challenging task. The field of image retrieval proposes a number of solutions to this problem. Image retrieval systems can be broadly classified into two types: text-based retrieval systems and content-based retrieval systems. Text-based retrieval systems search for images on the basis of specific keywords. Such systems are quite popular, and most modern search engines use a text-based approach for searching images. However, such systems suffer from two major disadvantages. First, they require manual annotation of a huge number of images, which is quite cumbersome. Second, they fail to retrieve visually similar images. The second type of system, the Content-Based Image Retrieval (CBIR) system, tends to overcome these limitations. Such systems do not require manual tagging of images and retrieve visually similar images.

Content-Based Image Retrieval (CBIR) refers to searching and retrieval of images on the basis of features present in the image [1]. The features may be primary features such as colour, texture, and shape, or semantic features such as the type of image. CBIR systems take an image or its sketch as a query and extract the features present in it. These features are used to construct a feature vector, which is matched with the feature vectors of images in a large database to retrieve visually similar images.

Since the term CBIR came into existence, a number of techniques have been proposed for it. Early image retrieval techniques were based on the colour feature. Colour is a visible property of an object and is invariant to certain geometrical transformations. The colour feature has been used in the form of the colour histogram [24] and the colour correlogram [8]. However, colour highlights only visible properties of an object and does not distinguish between different shapes of objects. Two images may have the same histogram without representing the same objects. Texture is another feature that has been in use for a long time [15, 34]. Texture has been exploited through the Gabor filter [15] and the Fourier transform [34]. Recent trends in texture analysis for image retrieval have seen the use of local patterns such as Local Binary Pattern (LBP) [22] and Local Ternary Pattern (LTP) [29]. Most of the recent state-of-the-art image retrieval techniques exploit local patterns and their different variants [16, 17]. Apart from texture and colour, shape is another feature which has been extensively used in the literature. The shape feature has been exploited as a single feature in [25] as well as in combination with texture in [26].

As images became more complex, the use of a single primary feature started proving to be insufficient. A single feature fails to capture complex and varying levels of detail. Using a combination of features overcomes this limitation: the combination gathers different details of an image and merges them into a more efficient feature vector than a single feature can provide. The combinations of colour and texture [10] and of texture and shape [18] illustrate this.

Most of the above-mentioned retrieval techniques exploit a single scale of the image for construction of the feature vector. A single-scale feature vector is not sufficient to extract the complex details present in an image. The solution to this problem is the use of multiscale techniques that capture varying levels of detail. Multiscale or multiresolution techniques process multiple scales of an image to gather complex details. The advantage of multiresolution techniques is that features that are left undetected at one scale get detected at another scale. Several multiresolution techniques such as wavelets [5] and curvelets [28] have been used to achieve this task [27]. These techniques exploit multiple scales of an image to construct the feature vector. The work presented in this paper proposes a multiscale Local Binary Pattern (LBP) based technique for CBIR. The core idea of the new multiscale LBP scheme has been derived from the concept of the multiscale wavelet transform. Most other multiscale LBP techniques consider consecutive neighbourhood pixels for thresholding with the centre pixel. Instead of considering consecutive neighbourhood pixels, the technique presented in this paper considers combinations of pixels for thresholding with the centre pixel. The 3 × 3 scale of LBP is expanded to 5 × 5 and 7 × 7 scales, and different combinations of eight neighbourhood pixels are considered for thresholding with the centre pixel. This method not only overcomes the limitations of 3 × 3 scale LBP, which is considered to be too local, but also overcomes the limitations of other multiscale LBP techniques [20], which consider only boundary pixels and do not consider the inner scale while constructing the feature vector. The proposed method constructs the feature vector through the Gray Level Co-occurrence Matrix (GLCM), which gives information about how frequently adjacent pixel pairs occur in an image and helps in getting information about the spatial distribution of pixels in an image.

The remaining part of the paper is organized as follows. Section 2 discusses related work. Section 3 discusses the proposed multiscale LBP structure. Section 4 discusses the proposed method. Section 5 discusses the experiment and results, and finally Section 6 concludes the paper.

2 Related work

The past decade has seen numerous techniques of CBIR. Several new concepts have emerged for extracting distinguishing features in an image. The early features for CBIR included primary features such as colour, texture, and shape. However, as the complexity of image increased, use of primary features on single resolution of image started proving to be insufficient. Hence, the use of multiresolution techniques for CBIR came into existence. The use of multiresolution techniques such as wavelets as a feature for CBIR has been quite useful. Wavelet has been used as single feature in [2] and in combination with other features in [3, 27]. Wavelets decompose image into multiple scales of image to perform multiresolution processing. Decomposing image into multiple scales helps in capturing features at multiple resolutions of image. Also, features left undetected at one scale get detected at another scale. This helps in constructing more efficient feature vector as compared to feature vector constructed through features exploited at single scale of image. Wavelet has been used as a feature for processing not only natural images but also medical images. Zhang et al. [39] exploited the concept of stationary wavelet transform to distinguish between abnormal and normal brains. Yang et al. [33] exploited the concept of wavelet energy for automated classification of brain images. Wang et al. [31] proposed a new descriptor Stationary Wavelet Entropy to extract features from brain image.

In order to improve efficiency of CBIR systems, several intelligence based techniques have been proposed which consider user feedback for improving accuracy. Rashedi et al. [23] proposed inclusion of user’s feedback for improving accuracy of CBIR systems. This method combines the concept of semantic clustering and long term learning for CBIR. Huang et al. [9] proposed the concept of Discriminative Extreme Learning Machine (DELM) to overcome limitations of relevance feedback in order to improve accuracy of CBIR. Li et al. [11] proposed a new concept of object bank that encodes appearance of object and spatial location information in images. Wang et al. [30] proposed the concept of kernel-based classifiers for image retrieval. Liang et al. [12] proposed learning based method for optimizing top precision performance measure. Gao et al. [4] proposed deep learning hashing technique for CBIR.

Apart from all these techniques, numerous descriptors have been proposed for image classification and retrieval. Of these, local patterns have been one of the most popular concepts. Local patterns not only gather local information but also provide information about structural arrangement of pixels. Local patterns such as Local Binary Pattern (LBP), Local Ternary Pattern (LTP) etc. are some of the basic local patterns. These local patterns have been used as texture descriptors and have been widely used for face recognition applications [35, 36]. Other than these, a number of new local patterns such as Local Tetra Pattern (LTrP) [16], Local Binary Extrema Pattern (LBEP) [17], Local Derivative Pattern [37] etc. are the variants of LBP, which provide local information by considering pixels in various directions. Recently, new local patterns such as Microstructure Descriptor (MSD) [13], Visual Attention Model [14], Hybrid Information Descriptor (HID) [38] have been proposed which not only provide local information but are also used for exploiting semantic aspect of image.

A major drawback of the above local patterns is that all of them gather local information by dividing image into 3 × 3 blocks. The 3 × 3 scale is considered to be too local and may not efficiently capture large scale dominant texture features. To overcome these drawbacks, multiscale techniques have been introduced. The scale size 3 × 3 is increased to 5 × 5 and 7 × 7 to efficiently capture large size dominant texture feature. Ojala et al. [20] proposed this concept and tested on texture datasets. The results were found to improve with increase in the size of scale. The concept of multiscale LBP was also exploited by Zhu et al. [40]. Zhu et al. [40] proposed six novel multiscale colour LBP operators to increase photometric invariance property and discriminative power of original LBP. Guo et al. [6] proposed the concept of hierarchical multiscale LBP for face and palmprint recognition. This technique constructed a hierarchical multiscale LBP histogram for an image. Xia et al. [32] proposed multiscale local spatial binary pattern (MLSBP) for image retrieval. This work integrated the LBP with spatial distribution information of gray level variation between referenced pixel and its neighbours.

3 Multiscale local binary pattern

3.1 Local binary pattern

The basic concept of Local Binary Pattern (LBP) was proposed by Ojala et al. [19]. The LBP operator works on a 3 × 3 pixel block of an image. The pixels of this block are thresholded by the centre pixel of the block, the thresholded values are multiplied by the weights assigned to the corresponding pixels, and the products are summed to give the LBP code of the centre pixel. Computation of LBP of a grayscale image is shown in Fig. 1 along with the resulting LBP image obtained after performing this operation. Since the neighbourhood comprises 8 pixels, 2^8 = 256 possible texture labels can be obtained with reference to the values of the centre pixel and the 8 neighbourhood pixels. The original LBP operator was introduced for simple texture feature analysis in images. Later on, other important properties of the LBP operator and its variants were discovered, such as rotation invariance, statistical robustness, contrast measure etc. [22], which proved quite useful for several image processing applications such as face recognition, image segmentation etc. Some of the important properties of LBP which are useful for image analysis are as follows:

  1. It is a simple and efficient local descriptor for describing textures.

  2. It encodes the relationship between the gray value of the centre pixel and those of the neighbourhood pixels into 0 and 1.

  3. It is useful in extracting local information of an image.

  4. As a local feature, it acts as an efficient feature vector when combined with a global feature.

Fig. 1 (a) Computation of LBP (b) Input image (c) LBP image
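The thresholding and weighting steps described above can be sketched in a few lines of Python. This is only an illustrative sketch, not the authors' implementation: the function names, the clockwise ordering of the neighbours starting from the top-left pixel, the convention that a neighbour greater than or equal to the centre maps to 1, and the sample block are all assumptions made for the example.

```python
# Offsets of the 8 neighbours, clockwise from the top-left pixel.
# ASSUMPTION: the starting point and direction are arbitrary; any fixed order works.
NEIGHBOURS_3X3 = [(-1, -1), (-1, 0), (-1, 1), (0, 1),
                  (1, 1), (1, 0), (1, -1), (0, -1)]


def lbp_code(image, row, col, offsets=NEIGHBOURS_3X3):
    """Return the LBP code of the pixel at (row, col)."""
    centre = image[row][col]
    code = 0
    for bit, (d_row, d_col) in enumerate(offsets):
        # A neighbour contributes its weight 2**bit when it is >= the centre.
        if image[row + d_row][col + d_col] >= centre:
            code += 1 << bit
    return code


def lbp_image(image, offsets=NEIGHBOURS_3X3, margin=1):
    """Compute LBP codes for every pixel that has a full neighbourhood."""
    rows, cols = len(image), len(image[0])
    return [[lbp_code(image, r, c, offsets)
             for c in range(margin, cols - margin)]
            for r in range(margin, rows - margin)]


# Hypothetical 3 x 3 grayscale block (not the one shown in Fig. 1).
block = [[6, 5, 2],
         [7, 6, 1],
         [9, 8, 7]]
print(lbp_image(block))  # prints [[241]]
```

With eight binary decisions per pixel, the codes produced by this sketch always fall in the range 0 to 255, which matches the 2^8 = 256 texture labels mentioned above.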

3.2 Multiscale local binary pattern

The original LBP operator works in a 3 × 3 neighbourhood. It attempts to extract local information at the 3 × 3 scale, which is considered to be too small. Some large dominant texture features may not be captured efficiently at this scale, and neither may changes in the texture feature. Some of the limitations of the original 3 × 3 operator are as follows [22]:

  1. Small spatial support area.

  2. Large scale structures that may be dominant features of some textures are not captured efficiently.

  3. Too local to be robust.

  4. Not very robust against local changes in the texture.

The limitations of the 3 × 3 scale are overcome by expanding the 3 × 3 block to larger blocks such as 5 × 5 and 7 × 7. Ojala et al. [20] proposed the concept of multiscale LBP by showing that no limit need be placed on the size of the neighbourhood or on the number of sampling points. Figure 2 shows the possible neighbourhoods of different radii and numbers of neighbourhood pixels for LBP as proposed by Ojala et al. [20]. The multiscale LBP overcomes the limitations of the original 3 × 3 LBP operator. It provides a large spatial support area, efficiently captures dominant features of some textures, and is robust against local changes.

Fig. 2 Possible multiscale LBP of different radius and neighbourhoods

3.3 Construction of the proposed multiscale LBP scheme

In the multiscale LBP schemes proposed by Ojala et al. [20], only boundary pixels are thresholded by the value of the centre pixel, and the pixels in the inner block are not considered while constructing the feature vector. Hence a considerable amount of information is lost. The proposed multiscale scheme tends to overcome this limitation. Instead of considering consecutive boundary pixels, the proposed scheme considers combinations of eight neighbourhood pixels. Hence this operator produces more than one LBP image, as different combinations of neighbourhood pixels produce different LBP matrices. It considers not only boundary pixels but also the pixels in inner blocks for constructing the feature vector. The concept of the proposed multiscale LBP scheme is derived from the basic concept of implementation of multiresolution analysis in the wavelet transform. The basic idea of the proposed descriptor is to represent multiscale LBP as a superposition of subscales. Such superposition decomposes a multiscale LBP of scale n × n (where n is odd) into different subscales through different combinations of eight neighbourhood pixels. Each subscale is further decomposed in order to consider the pixels of the inner subscale. This has been achieved by considering different combinations of eight neighbourhood pixels which are thresholded by the centre pixel of the subscale. This concept has several advantages. First, it captures changes in the texture feature efficiently, whereas other techniques, which consider consecutive neighbourhood pixels, fail to capture them. Second, it also considers the inner subscale for constructing the feature vector, which other multiscale LBP techniques fail to do. Third, the features left undetected in one subscale get detected in another subscale. The construction of the proposed multiscale LBP scheme is demonstrated with the help of a diagram in Fig. 3.

In the proposed scheme, instead of considering consecutive neighbourhood pixels, different combinations of eight neighbourhood pixels are selected and thresholded by the value of the centre pixel. This generates different LBP images, as shown in Fig. 3(b). In the case of the 5 × 5 scale, instead of considering all sixteen boundary pixels together, the boundary pixels are broken into smaller combinations of eight pixels which are thresholded by the centre pixel. Together with the inner 3 × 3 scale, this produces three LBP matrices or subscales; in the case of the 7 × 7 scale, six LBP matrices or subscales are produced.

Fig. 3 (a) The proposed multiscale LBP scheme for 5 × 5 scale (b) LBP images of the proposed multiscale scheme of 5 × 5 scale (c) The proposed multiscale LBP scheme for 7 × 7 scale (d) LBP images of the proposed multiscale scheme of 7 × 7 scale

Figure 3(a) shows the construction of the proposed multiscale LBP scheme for the 5 × 5 scale. Different combinations of eight neighbourhood pixels are thresholded by the centre pixel, and the inner subscale of this scale, which consists of eight pixels, is also considered. This produces three LBP images, which are shown in Fig. 3(b). In the first two subscales of Fig. 3(a), different combinations of neighbourhood pixels are chosen from the boundary pixels, and the third subscale consists of the inner 3 × 3 scale. The first two subscales are each represented by 8 neighbourhood pixels, obtained by breaking the 16 neighbourhood pixel representation into two 8 neighbourhood pixel representations. In this combination of LBP, instead of considering consecutive boundary pixels, combinations of pixels are chosen for constructing the feature vector. Figure 3(c) shows the construction of the proposed multiscale scheme for the 7 × 7 scale. Here again, different combinations of eight neighbourhood pixels are thresholded by the centre pixel. In the case of the 7 × 7 scale, there are 24 pixels in the outermost boundary, which are broken into three combinations of 8 neighbourhood pixels. The 7 × 7 scale contains two inner scales, the 5 × 5 scale and the 3 × 3 scale. The inner 5 × 5 scale is constructed in the same manner as in Fig. 3(a). This gives a pyramidal structure to the proposed scheme, where the peak of the pyramid is the lowest scale, that is, the 3 × 3 scale, and the base of the pyramid is the highest scale, that is, the 7 × 7 scale. It is also interesting to note that the pixel combinations used to construct the LBP matrices at a particular scale are disjoint from one another. This construction characterizes the multiresolution property of the proposed multiscale LBP scheme. The proposed multiscale scheme is characterized by the following important features.

  1. The proposed multiscale scheme for 5 × 5 scale produces three LBP images corresponding to the three subscales, obtained by choosing different combinations of neighbourhood pixels, as shown in Fig. 3(b). The changes in texture feature can be observed in the foreground as well as in the background of these three LBP images. Choosing different combinations of neighbourhood pixels and then thresholding them by the centre pixel captures change in texture feature efficiently. The change in background texture can be observed in the first and second LBP images, whereas change in foreground texture can be observed in the third LBP image of Fig. 3(b). Similarly, in the case of 7 × 7 scale, the changes in texture can be more clearly observed in the six LBP images as compared to the multiscale scheme for 5 × 5 scale.

  2. The proposed multiscale LBP scheme for 5 × 5 scale consists of two subscales: 5 × 5 scale and 3 × 3 scale. Similarly, the proposed multiscale LBP scheme for 7 × 7 scale consists of three subscales: 7 × 7 scale, 5 × 5 scale, and 3 × 3 scale. Such decomposition of scales characterizes an important aspect of multiresolution analysis: features left undetected at one subscale get detected at another subscale. The 5 × 5 scale captures changes in texture feature more efficiently than the single 3 × 3 scale as it consists of not only the 5 × 5 scale but also the inner 3 × 3 scale. Similarly, the 7 × 7 scale captures changes in texture feature more efficiently than the 5 × 5 scale as it consists of not only the outer 7 × 7 scale but also the 5 × 5 scale and the inner 3 × 3 scale. A large scale size efficiently captures dominant texture features which a small scale size fails to do.

  3. The proposed multiscale LBP scheme decomposes 5 × 5 scale into two subscales: 5 × 5 scale and 3 × 3 scale. Similarly, the proposed scheme decomposes 7 × 7 scale into three subscales: 7 × 7 scale, 5 × 5 scale, and 3 × 3 scale. This type of decomposition of n × n scale into smaller scales produces a unique arrangement of subscales which characterizes the pyramidal structure of image [5], where the base of the pyramid is of the highest scale and the peak of the pyramid is of the lowest scale. Such high to low scale analysis of texture is useful for capturing changes in texture features, which is particularly useful for object recognition.

Along with the above-mentioned features, the proposed multiscale scheme has the following general properties:

  1. It considers different combinations of eight neighbourhood pixels which are thresholded by the centre pixel. Instead of considering consecutive neighbourhood pixels, it breaks neighbourhood pixels into small pixel combinations.

  2. It not only considers boundary pixels but also pixels in the inner blocks of each scale.

  3. It generates 2^8 = 256 possible texture labels with reference to the values of centre pixel and neighbourhood pixels for each LBP matrix.

  4. It generates different LBP matrices for each combination of neighbourhood pixels which consist of different LBP codes.
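To make the decomposition into subscales concrete, the following Python sketch builds the groups of eight neighbourhood pixels for an n × n scale and computes one LBP matrix per group. It is a sketch under stated assumptions, not the scheme of Fig. 3 itself: the text does not list which boundary pixels form each combination, so the interleaved grouping used here (every k-th pixel of a ring) is an assumption, as are the function names and the convention that a neighbour greater than or equal to the centre maps to 1. The sketch does reproduce the counts given in the text (three matrices for 5 × 5, six for 7 × 7) and keeps the groups disjoint.

```python
def ring_offsets(radius):
    """Offsets of the square ring at the given radius, clockwise from top-left."""
    r = radius
    top = [(-r, c) for c in range(-r, r)]
    right = [(row, r) for row in range(-r, r)]
    bottom = [(r, c) for c in range(r, -r, -1)]
    left = [(row, -r) for row in range(r, -r, -1)]
    return top + right + bottom + left


def subscale_offsets(scale):
    """Split every ring of a scale x scale window into groups of 8 offsets.

    ASSUMPTION: ring pixels are dealt out in an interleaved fashion
    (every k-th pixel); the grouping in the paper's Fig. 3 may differ.
    """
    groups = []
    for radius in range(scale // 2, 0, -1):   # outermost ring first
        ring = ring_offsets(radius)
        k = len(ring) // 8                    # number of 8-pixel groups in this ring
        groups.extend(ring[start::k] for start in range(k))
    return groups


def multiscale_lbp(image, scale):
    """Return one LBP matrix per 8-pixel group of the scale x scale window."""
    half = scale // 2
    rows, cols = len(image), len(image[0])
    matrices = []
    for offsets in subscale_offsets(scale):
        matrix = []
        for r in range(half, rows - half):
            out_row = []
            for c in range(half, cols - half):
                centre = image[r][c]
                # Threshold the 8 selected neighbours by the centre pixel.
                code = sum(1 << bit
                           for bit, (d_row, d_col) in enumerate(offsets)
                           if image[r + d_row][c + d_col] >= centre)
                out_row.append(code)
            matrix.append(out_row)
        matrices.append(matrix)
    return matrices


# Hypothetical 7 x 7 grayscale image used only to exercise the functions.
image = [[(r * 7 + c) % 11 for c in range(7)] for r in range(7)]
print(len(multiscale_lbp(image, 5)), len(multiscale_lbp(image, 7)))
```

Because every group holds exactly eight pixels, each resulting matrix contains codes between 0 and 255, in line with property 3 above.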

3.4 Advantages of the proposed multiscale LBP scheme

The major advantages of the proposed multiscale LBP scheme are as follows:

  1. The proposed multiscale scheme considers all information in a neighbourhood, as it considers not only boundary pixels but also pixels in inner blocks, which are not considered by other multiscale LBP techniques.

  2. It divides the n × n scale (where n is odd) into different LBP subscales which are formed through different combinations of neighbourhood pixels. Each LBP subscale results in a different LBP matrix formed by different LBP codes. The information left undetected in one subscale gets considered in another subscale, which is an important property of any multiresolution technique.

  3. It has a large spatial support area. The 3 × 3 scale LBP has a small spatial support area due to its small window size, so it fails to capture dominant features that are significant features of an image. The proposed multiscale LBP scheme has a large spatial support area due to its large window size. This helps in capturing dominant features in an image.

  4. It efficiently captures large scale structures that may be dominant texture features. An image consists of large as well as small objects. The 3 × 3 scale LBP captures small objects efficiently, but it fails to capture large objects completely as it has a small spatial support area. Because of this, the significant features of an image are not captured efficiently. The proposed multiscale LBP scheme overcomes this limitation, as its large window size gives it a large spatial support area with which it captures significant features of the image completely.

  5. It captures change in texture feature efficiently. The proposed multiscale LBP scheme is formed through different combinations of neighbourhood pixels. Through these combinations, it is able to capture the change in texture feature efficiently, as shown in Fig. 3. The existing multiscale schemes consider consecutive neighbourhood pixels, which fail to capture change in texture feature efficiently.

  6. It is very robust against local changes in the texture. The proposed multiscale scheme is constructed through different combinations of neighbourhood pixels. It considers not only boundary pixels but also inner pixels of the window. Due to this combination of pixels, it is able to efficiently capture local changes in the texture feature of an object, as shown in Fig. 3(b) and (d).

3.5 Gray-level co-occurrence matrix (GLCM)

The Gray-Level Co-occurrence Matrix (GLCM) was proposed by Haralick et al. [7]. It is a statistical method of texture feature analysis in an image. GLCM analyzes the texture feature by computing how frequently pixel pairs with specific values and in a specified spatial relationship occur in an image. The pixel pairs are counted at a particular distance and in a particular direction. GLCM expresses certain important properties of the spatial relationship of gray level intensity values and their distribution in an image. It also helps in getting information about the structural arrangement of pixels. This helps in analyzing the texture feature and in determining how the pixel pair values are distributed. It also helps in understanding certain important textural properties of a surface such as coarseness, smoothness, roughness etc.

The concept of GLCM is explained with the help of an example in Fig. 4. Figure 4(a) shows the original matrix and Fig. 4(b) shows the GLCM constructed for the original matrix. In Fig. 4(b), the topmost row and leftmost column represent the pixel values that appear in the original matrix. The entries in the GLCM represent the number of times each pixel pair appears in the original matrix. For example, the pixel pair (2, 1) appears three times in the original matrix, which is shown in the GLCM.

Fig. 4 (a) Original matrix (b) GLCM for original matrix
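The counting procedure behind a GLCM can be illustrated with a short Python sketch. The matrix below is a hypothetical example chosen for illustration and is not the matrix of Fig. 4; the function name `glcm` and its parameters are likewise assumptions. The sketch counts, for each pixel, the pair formed with its neighbour at a given offset (by default the pixel immediately to the right, that is, distance 1 in the horizontal direction).

```python
def glcm(matrix, levels, d_row=0, d_col=1):
    """Count how often value i is followed by value j at offset (d_row, d_col)."""
    counts = [[0] * levels for _ in range(levels)]
    rows, cols = len(matrix), len(matrix[0])
    for r in range(rows):
        for c in range(cols):
            r2, c2 = r + d_row, c + d_col
            # Only count pairs whose second pixel lies inside the matrix.
            if 0 <= r2 < rows and 0 <= c2 < cols:
                counts[matrix[r][c]][matrix[r2][c2]] += 1
    return counts


# Hypothetical 4 x 4 matrix with gray levels 0..3 (not the matrix of Fig. 4).
example = [[0, 0, 1, 1],
           [0, 0, 1, 1],
           [0, 2, 2, 2],
           [2, 2, 3, 3]]

for row in glcm(example, levels=4):
    print(row)
```

In this example the pair (2, 2) occurs three times horizontally, so the entry in row 2, column 2 of the result is 3. The sketch counts ordered pairs only; a symmetric GLCM, which some implementations use, would also count each pair in the reverse direction.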

4 The proposed method

The proposed method consists of the following steps:

  1. Computation of multiscale LBP of grayscale images using the proposed scheme.

  2. Construction of GLCM of multiscale LBP descriptors to construct feature vector.

  3. Similarity measure.

The schematic diagram of the proposed method is shown in Fig. 5.

Fig. 5 Schematic diagram of the proposed method

4.1 Computation of multiscale LBP using the proposed scheme

The first part of the proposed method computes multiscale LBP of grayscale image using different combinations of eight neighbourhood pixels for 5 × 5 and 7 × 7 scale size. The proposed multiscale scheme not only considers boundary neighbourhood pixels but also inner pixels for constructing feature vector. In case of 5 × 5 scale, three LBP matrices are generated and in case of 7 × 7 scale, six LBP matrices are generated. These matrices represent texture information at different scales and capture change in texture feature efficiently. While constructing feature vector for 5 × 5 scale LBP, the inner 3 × 3 scale is also considered. Similarly, for 7 × 7 scale LBP, the inner 5 × 5 scale and 3 × 3 scale are also considered. This underlines an important property that the features left undetected at one subscale get detected at another subscale.

4.2 Construction of GLCM for feature vector

GLCM provides information about how frequently pixel pairs holding specific values and in a specified spatial relationship occur in an image. This provides information about the spatial distribution of pixels, which other features such as the histogram fail to provide. A histogram provides information about the frequency of pixel values only. It gives no information about the co-occurrence of intensity value pairs and hence no local information about them. GLCM overcomes this limitation by providing information about the frequency of co-occurrence of pixel pairs. This helps in understanding the structural arrangement of pixels. In the proposed method, the feature vector is constructed from the GLCM of the proposed multiscale LBP. For the 5 × 5 and 7 × 7 scales, the GLCM of the LBP matrix obtained from each combination of eight neighbourhood pixels is constructed separately, and each is used as a feature vector for retrieving visually similar images. The GLCM at angle 0° with distance 1 has been taken for constructing the feature vector for the proposed method. The size of the GLCM has been rescaled to 8 × 8.
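A possible way to turn one LBP matrix into an 8 × 8 GLCM feature vector is sketched below in Python. The text does not say how the GLCM is rescaled to 8 × 8, so this sketch assumes that the 256 LBP codes are quantised into 8 equal-width bins before counting; that reading, the function name `glcm_feature`, and the sample data are assumptions rather than the authors' stated procedure. Angle 0° with distance 1 is implemented as pairing each code with its right-hand neighbour.

```python
def glcm_feature(lbp_matrix, levels=8, max_code=255):
    """Flattened levels x levels GLCM (angle 0 degrees, distance 1)."""

    def quantise(code):
        # ASSUMPTION: "rescaled to 8 x 8" is read as binning codes 0..255 into 8 levels.
        return min(code * levels // (max_code + 1), levels - 1)

    counts = [[0] * levels for _ in range(levels)]
    for row in lbp_matrix:
        # Pair every code with its immediate right-hand neighbour.
        for left, right in zip(row, row[1:]):
            counts[quantise(left)][quantise(right)] += 1
    # Flatten row by row into a vector of levels * levels entries.
    return [value for count_row in counts for value in count_row]


# Hypothetical LBP codes used only to exercise the function.
lbp_matrix = [[0, 255, 255],
              [31, 32, 200]]
feature = glcm_feature(lbp_matrix)
print(len(feature), sum(feature))  # prints 64 4
```

Under this reading each LBP matrix yields a 64-element vector, so the 5 × 5 scale would give three such vectors and the 7 × 7 scale six, each matched separately.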

4.3 Similarity measurement

Similarity measurement is done to retrieve visually similar images from a large database. The proposed method uses GLCM to construct the feature vector. The feature vector of the query image is matched with those of the database images using the Weighted L1 distance. Let (f_{Q1}, f_{Q2}, …, f_{Qn}) be the feature vector of the query image and let (f_{DB1}, f_{DB2}, …, f_{DBn}) be the feature vector of a database image. Then the similarity between the query image and the database image is computed using the following Weighted L1 distance formula:

$$ \mathrm{Similarity}\;(S)=\sum_{i=1}^{n}\left|\frac{f_{DBi}-f_{Qi}}{1+f_{DBi}+f_{Qi}}\right|,\kern0.5em i=1,2,\dots,n $$
(1)

In this paper, Weighted L1 distance metric has been used for similarity measurement as it gives better retrieval results as compared to Euclidean distance and Manhattan distance.
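Equation (1) translates directly into code. The Python sketch below is illustrative only: the function names, the dictionary-based toy database, and the feature values are hypothetical. Since GLCM entries are non-negative counts, the denominator 1 + f_DBi + f_Qi is always at least 1, so no division by zero can occur for such features. A smaller value of S indicates a closer match.

```python
def weighted_l1(f_db, f_q):
    """Weighted L1 distance of Eq. (1); smaller means more similar."""
    if len(f_db) != len(f_q):
        raise ValueError("feature vectors must have the same length")
    return sum(abs((d - q) / (1 + d + q)) for d, q in zip(f_db, f_q))


def rank_database(query, database):
    """Return database keys sorted from most to least similar to the query."""
    return sorted(database, key=lambda name: weighted_l1(database[name], query))


# Hypothetical feature vectors; real ones would be flattened GLCMs.
database = {"img_a": [4, 0, 2], "img_b": [1, 3, 2], "img_c": [4, 1, 2]}
query = [4, 1, 2]
print(rank_database(query, database))  # prints ['img_c', 'img_a', 'img_b']
```

Dividing each term by 1 + f_DBi + f_Qi reduces the influence of components with large magnitudes, which is the main difference from the plain Manhattan distance.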

5 Experiment and results

To perform the experiment using the proposed method, images from the following five benchmark datasets have been used. These datasets consist of a wide variety of images and are widely used for evaluation of image retrieval methods.

Dataset 1 (Corel-1K)-

The first dataset used in the experiment is the Corel-1K dataset (http://wang.ist.psu.edu/docs/related/). It consists of 1000 images. The images in this dataset are classified into ten different categories, namely, Africans, Beaches, Buildings, Buses, Dinosaurs, Elephants, Flowers, Horses, Mountains, and Food. Each category consists of 100 images. The size of each image is either 256 × 384 or 384 × 256.

Dataset 2 (Olivia-2688)-

The second dataset used to measure the performance of the proposed method is the Olivia-2688 dataset [21]. It consists of 2688 images. The images in this dataset are divided into eight categories, namely, Coast, Forest, Highway, Inside City, Mountain, Open Country, Street, and Tall Building. The number of images per category varies from a minimum of 260 to a maximum of 410. The size of each image is 256 × 256.

Dataset 3 (Corel-5K)-

The third dataset used in the experiment is the Corel-5K dataset (http://www.ci.gxnu.edu.cn/cbir/). It consists of 5000 images. The images in this dataset are divided into fifty categories covering different types of images, ranging from animals and human beings to sunsets, cards etc. Each category consists of 100 images. The size of each image is either 187 × 128 or 128 × 187.

Dataset 4 (Corel-10K)-

The fourth dataset used for testing the proposed method is the Corel-10K dataset (http://www.ci.gxnu.edu.cn/cbir/), which is an extension of the Corel-5K dataset. It consists of 10,000 images. The images in this dataset are divided into one hundred categories consisting of a wide variety of images. Each category consists of 100 images. Each image is of size 187 × 128 or 128 × 187.

Dataset 5 (GHIM-10K)-

The fifth dataset used in the experiment is the GHIM-10K dataset (http://www.ci.gxnu.edu.cn/cbir/). It consists of 10,000 images. The images in this dataset are divided into twenty categories consisting of various types of images like horses, insects, flowers etc. Each category consists of 500 images. The size of each image is either 300 × 400 or 400 × 300.

Each image of the Corel-1K, Olivia-2688, and GHIM-10K datasets has been rescaled to size 256 × 256 (2^8 × 2^8), and images of the Corel-5K and Corel-10K datasets have been rescaled to size 128 × 128 (2^7 × 2^7), to ease the computation. Sample images from each dataset are shown in Fig. 6. Each image of every dataset is taken as a query image. If the retrieved images belong to the same category as that of the query image, the retrieval is considered to be successful; otherwise the retrieval fails.

Fig. 6 Sample images from datasets

5.1 Performance evaluation

Performance of the proposed method has been evaluated in terms of precision and recall. Precision is defined as the ratio of total number of relevant images retrieved to the total number of images retrieved. Mathematically, precision can be formulated as

$$ P=\frac{I_R}{T_R} $$
(2)

where I_R denotes the total number of relevant images retrieved and T_R denotes the total number of images retrieved.

Recall is defined as the ratio of total number of relevant images retrieved to the total number of relevant images in the database. Mathematically, recall can be formulated as

$$ R=\frac{I_R}{C_R} $$
(3)

where \( I_R \) denotes the total number of relevant images retrieved and \( C_R \) denotes the total number of relevant images in the database. In this experiment, \( T_R = 10 \), and the value of \( C_R \) varies according to the number of images in each category of the dataset. For the Corel-1K, Corel-5K, and Corel-10K datasets, \( C_R = 100 \). For the GHIM-10K dataset, \( C_R = 500 \), and for the Olivia-2688 dataset the value of \( C_R \) depends on the total number of images in each category.
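The two measures can be sketched for a single query as follows. This is a minimal illustration of Eqs. (2) and (3), not the evaluation code used in the experiments; the function name `precision_recall` and the category labels in the example are hypothetical.

```python
def precision_recall(retrieved_labels, query_label, relevant_in_db):
    """Precision (Eq. 2) and recall (Eq. 3) for a single query.

    retrieved_labels: category labels of the T_R retrieved images
    query_label:      category of the query image
    relevant_in_db:   C_R, number of images in the query's category
    """
    t_r = len(retrieved_labels)
    # I_R: retrieved images that share the query's category
    i_r = sum(1 for label in retrieved_labels if label == query_label)
    precision = i_r / t_r if t_r else 0.0
    recall = i_r / relevant_in_db if relevant_in_db else 0.0
    return precision, recall


# Hypothetical query: T_R = 10 retrieved images, 7 of them relevant, C_R = 100
labels = ["horse"] * 7 + ["bus"] * 3
p, r = precision_recall(labels, "horse", 100)
```

Averaging these per-query values over all query images of a dataset gives the average precision and recall reported for that dataset.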

5.2 Retrieval results

For experimentation, images of the Corel-1K, Olivia-2688, and GHIM-10K datasets have been rescaled to size 256 × 256 (\( 2^8 \times 2^8 \)), and images of the Corel-5K and Corel-10K datasets have been rescaled to size 128 × 128 (\( 2^7 \times 2^7 \)), to ease the computation. Multiscale LBP codes of the 2-D grayscale images are computed using the proposed scheme for scale sizes 5 × 5 and 7 × 7. Different combinations of neighbourhood pixels produce different LBP codes, which are stored in separate matrices. The GLCM of each of these matrices is computed to construct the feature vector for retrieval.
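As a rough illustration of the LBP-then-GLCM pipeline described above, the following sketch computes basic single-scale 3 × 3 LBP codes and a normalised co-occurrence matrix of the resulting code matrix. It does not reproduce the proposed 5 × 5 and 7 × 7 neighbourhood combinations. The function names `lbp_3x3` and `glcm`, the `>=` thresholding rule, the bit ordering, and the horizontal displacement (0, 1) are all assumptions made for illustration.

```python
import numpy as np


def lbp_3x3(gray):
    """Basic 8-neighbour LBP codes for the interior pixels of a 2-D grayscale image."""
    g = np.asarray(gray, dtype=np.int32)
    h, w = g.shape
    center = g[1:-1, 1:-1]
    # Neighbour offsets (dy, dx), clockwise from the top-left pixel (assumed ordering)
    offsets = [(-1, -1), (-1, 0), (-1, 1), (0, 1), (1, 1), (1, 0), (1, -1), (0, -1)]
    codes = np.zeros_like(center)
    for bit, (dy, dx) in enumerate(offsets):
        neighbour = g[1 + dy:h - 1 + dy, 1 + dx:w - 1 + dx]
        # Set this bit where the neighbour is at least as bright as the centre
        codes += (neighbour >= center).astype(np.int32) << bit
    return codes


def glcm(codes, levels=256, dy=0, dx=1):
    """Normalised co-occurrence matrix of code pairs at displacement (dy, dx), with dy, dx >= 0."""
    codes = np.asarray(codes)
    h, w = codes.shape
    a = codes[:h - dy, :w - dx].ravel()   # reference codes
    b = codes[dy:, dx:].ravel()           # codes at the displaced position
    m = np.zeros((levels, levels), dtype=np.float64)
    np.add.at(m, (a, b), 1.0)             # count every (a, b) pair, including repeats
    return m / m.sum()


# Hypothetical 256 x 256 grayscale image with random intensities
image = np.random.default_rng(0).integers(0, 256, size=(256, 256))
cooccurrence = glcm(lbp_3x3(image))
```

In the multiscale setting, a function like `glcm` would be applied to each LBP matrix separately, giving one feature vector per matrix.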

The proposed scheme produces a different number of LBP matrices for each scale size: three LBP matrices for the 5 × 5 scale and six for the 7 × 7 scale. The GLCM of each of these matrices is constructed separately to form a feature vector, and similarity measurement is done separately for each of these feature vectors. Each feature vector therefore yields its own set of similar images. For example, for the 5 × 5 scale, three sets of similar images are obtained because three LBP matrices are generated. The union of these sets is taken to produce the final set of similar images. Recall is computed by counting the total number of relevant images in the final set. Similarly, for precision, the top n matches of each set are taken and the union operation is applied to these sets to produce the final image set. Mathematically, this can be stated as follows. For an r × r scale size, let \( {f}_1,{f}_2,\dots ,{f}_m \) be the sets of similar images obtained from different feature vectors. Then the final set of similar images, denoted by \( {f}_{RS} \), is given as

$$ {f}_{RS}={f}_1\cup {f}_2\cup \dots \cup {f}_m $$
(4)

Similarly, let \( {f}_1^n,{f}_2^n,\dots ,{f}_m^n \) be the sets of top n matches obtained from different feature vectors. Then the final set of top n images, denoted by \( {f}_{PS}^n \), is given as

$$ {f}_{PS}^n={f}_1^n\cup {f}_2^n\cup \dots \cup {f}_m^n $$
(5)
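The union step of Eqs. (4) and (5) can be sketched as below. The function name `fuse_results` and the image ids are hypothetical; each ranked list is assumed to be ordered from most to least similar.

```python
def fuse_results(ranked_lists, n=None):
    """Union of per-feature-vector result sets (Eq. 4), or of their top-n matches (Eq. 5)."""
    fused = set()
    for ranked in ranked_lists:
        # With n given, only the n best matches of each list enter the union
        fused |= set(ranked if n is None else ranked[:n])
    return fused


# Hypothetical ranked image ids for the 5 x 5 scale (m = 3 LBP matrices)
f1 = [3, 8, 1, 9]
f2 = [8, 2, 3, 7]
f3 = [5, 3, 8, 4]

f_rs = fuse_results([f1, f2, f3])         # Eq. (4)
f_ps_2 = fuse_results([f1, f2, f3], n=2)  # Eq. (5) with n = 2
```

Because the lists may disagree, the fused top-n set can contain anywhere between n and m × n distinct images.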

Retrieval is considered to be good if the values of precision and recall are high.

Table 1 shows average precision and recall values for different scale sizes on all five datasets (Corel-1K, Olivia-2688, Corel-5K, Corel-10K, and GHIM-10K) using three similarity metrics (Euclidean distance, Manhattan distance, and Weighted L1 distance). The bold values show the best result among the distance measures. From Table 1 it can be observed that the overall values of average precision and recall obtained from Weighted L1 distance on all five datasets are better than those obtained from Euclidean distance and Manhattan distance. Since the values of average precision and recall for the 7 × 7 scale are used to compare the performance of the proposed method with other state-of-the-art CBIR techniques, we choose the values obtained from Weighted L1 distance, which produces better results for the 7 × 7 scale than the other distance measures. Figure 7 shows the precision vs. dataset plot for all five datasets. Figure 8 shows the recall vs. dataset plot for all five datasets. Figure 9 shows the precision vs. recall plot for the proposed method on all five datasets.
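For reference, the three distance measures can be sketched as follows. The Euclidean and Manhattan forms are standard. The weighted L1 form, in which each term is divided by 1 + a_i + b_i, is a definition commonly seen in the CBIR literature and is assumed here; it may differ from the one used in this paper. The helper `rank` and all example values are hypothetical.

```python
import math


def euclidean(a, b):
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def manhattan(a, b):
    return sum(abs(x - y) for x, y in zip(a, b))


def weighted_l1(a, b):
    # Assumed form: each term is scaled by 1 + a_i + b_i (features taken to be non-negative)
    return sum(abs(x - y) / (1 + x + y) for x, y in zip(a, b))


def rank(query, database, distance, n=10):
    """Indices of the n database feature vectors closest to the query."""
    order = sorted(range(len(database)), key=lambda i: distance(query, database[i]))
    return order[:n]


# Hypothetical two-dimensional feature vectors
database = [[5, 5], [1, 0], [0, 0]]
top2 = rank([0, 0], database, manhattan, n=2)
```

Swapping the `distance` argument is enough to compare the measures on the same feature vectors.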

Table 1 Average recall and precision values of the proposed method for different scales on different datasets using different distance measures

Fig. 7 Average precision vs. dataset for the proposed method

Fig. 8 Average recall vs. dataset for the proposed method

Fig. 9 Precision vs. recall plot for the proposed method on (a) Corel-1K and Olivia-2688, (b) Corel-5K and GHIM-10K, and (c) Corel-10K datasets

From Table 1 and Figs. 7, 8 and 9, it is observed that the average values of precision and recall increase as the scale size increases. As the scale increases, large-scale structures that may be dominant features of an image are captured efficiently. Through various combinations of eight neighbourhood pixels, the proposed multiscale LBP scheme is able to capture changes in texture features more efficiently. It considers not only the boundary pixels of a scale but also its inner pixels. Hence, it is able to gather details that are left undetected at the previous scale. All these properties lead to an increase in retrieval accuracy.

5.3 Performance comparison of the proposed multiscale LBP scheme with other multiscale LBP techniques

In order to test the effectiveness of the proposed multiscale LBP scheme, its performance has been compared with that of other multiscale LBP schemes. For a fair comparison, feature vector construction in all other multiscale techniques has been done in the same manner as in the proposed technique, that is, through GLCM.

The first multiscale LBP technique compared with the proposed scheme is the one proposed by Ojala et al. [20]. This technique considers only the boundary pixels of each scale and generates a sixteen-bit binary code. It is easy to design and captures changes in texture through boundary pixel values. However, in the case of the 5 × 5 scale, it fails to consider the inner 3 × 3 scale, which results in the loss of useful information. Because it misses considerable detail, its retrieval accuracy changes little between the two scales (5 × 5 and 7 × 7). The proposed multiscale LBP scheme considers all details inside a scale and constructs a more efficient feature vector than the technique proposed by Ojala et al. [20]. Therefore, with the proposed technique, the values of precision and recall change considerably with the change in scale. This can be observed in Table 2 and Figs. 10 and 11.

Table 2 Performance comparison of the proposed method (PM) with other multiscale LBP techniques

Fig. 10 Performance comparison of the proposed method with other multiscale techniques for 5 × 5 scale in terms of (a) Precision (b) Recall on all five datasets

Fig. 11 Performance comparison of the proposed method with other multiscale techniques for 7 × 7 scale in terms of (a) Precision (b) Recall on all five datasets

The second multiscale LBP scheme compared with the proposed technique is the one proposed by Zhu et al. [40]. For the 5 × 5 scale, this technique combines two multiscale schemes and thereby improves the retrieval result compared to Ojala et al. [20]. For the 7 × 7 scale, it uses the same scheme as Ojala et al. [20]; hence its retrieval results are the same as those of Ojala et al. [20] for the 7 × 7 scale. However, this technique suffers from the same drawbacks as the technique proposed by Ojala et al. [20]. The proposed multiscale scheme produces much higher retrieval accuracy than Zhu et al. [40], which can be observed in Table 2 and Figs. 10 and 11. The bold values in Table 2 highlight the best results, namely the precision and recall values of the proposed method.

The proposed method considers different combinations of eight neighbourhood pixels at multiple scales and covers not only boundary pixels but also the inner scales, which are not covered by other multiscale techniques. It also captures dominant texture features efficiently and is robust against changes in the texture of an image. Hence, the proposed multiscale LBP scheme outperforms other multiscale LBP techniques. Table 3 analyzes the performance of the proposed method relative to other multiscale LBP techniques in terms of precision and recall.

Table 3 Analysis of performance comparison of the proposed method over other multiscale LBP techniques in terms of precision and recall

5.4 Performance comparison of the proposed method with other state-of-the-art methods

The proposed method has been compared in terms of precision and recall with other state-of-the-art image retrieval methods.

The first method compared with the proposed method is Srivastava et al. [25]. This method divides the image into blocks of different sizes and computes the moments of each block in an attempt to capture the shape feature for retrieval. However, it fails to capture varying levels of detail, as a single feature is not sufficient for capturing complex details. Although this method divides the image into blocks of different sizes for constructing the feature vector, it does not exploit this multiscale aspect to construct an efficient feature vector. Hence it produces low retrieval accuracy. The proposed method gathers varying levels of detail at different scales and efficiently constructs the feature vector through GLCM. It exploits the multiscale aspect of LBP and thereby constructs a more efficient feature vector than Srivastava et al. [25]. Hence, the proposed method outperforms Srivastava et al. [25], as shown in Tables 4 and 5 and Fig. 12.

Table 4 Performance comparison of the proposed method (PM) with other state-of-the-art CBIR methods in terms of Precision (%)
Table 5 Performance comparison of the proposed method (PM) with other state-of-the-art CBIR methods in terms of Recall (%)
Fig. 12 Performance comparison of the proposed method (PM) with other methods in terms of (a) Precision (b) Recall on five datasets

The second method used for comparison with the proposed method is Srivastava et al. [26]. This method divides the image into blocks and computes Local Ternary Pattern (LTP) codes to extract local features, followed by computation of the moments of the resulting LTP codes. This approach combines multiple features for constructing the feature vector and performs better than Srivastava et al. [25]. However, it uses a single resolution of the image for retrieval and fails to capture varying levels of detail. It works better for small datasets but fails to produce high retrieval accuracy on large datasets. The proposed method exploits the multiscale aspect of LBP and efficiently captures changes in texture. Also, being a multiscale processing technique, it is able to capture varying levels of detail. Hence the proposed method performs much better than Srivastava et al. [26], as demonstrated in Tables 4 and 5 and Fig. 12.

The third method compared with the proposed method is Srivastava et al. [27]. This method combines a wavelet-based multiresolution technique with shape feature moments and considers multiple resolutions of the image for constructing the feature vector. However, it considers only global features at multiple resolutions of the image and works better for small datasets. For large datasets consisting of a wide variety of images, it fails to produce high retrieval accuracy. The proposed method considers local features at multiple scales of LBP and efficiently captures varying levels of detail and changes in texture. Also, the proposed method produces high retrieval accuracy for both small and large datasets, which Srivastava et al. [27] fails to do. This can be observed from Tables 4 and 5 and Fig. 12.

The fourth method compared with the proposed method is Microstructure Descriptor (MSD) [13, 14, 38]. This is an efficient image retrieval technique which computes edge orientation similarity and underlying colours for constructing the feature vector, and it efficiently captures local features for image retrieval. However, it computes microstructures only at the 3 × 3 scale, and with this single scale it fails to capture varying levels of detail. The proposed method considers multiple scales of LBP, thereby capturing changes in texture efficiently. Hence, the proposed method outperforms MSD [13, 14, 38], as shown in Tables 4 and 5 and Fig. 12.

The fifth method compared with the proposed method is Xia et al. [32]. This method proposes multiscale LBP in the form of a new pattern named Multiscale Local Spatial Binary Pattern (MLSBP). It computes a local pattern in four directions; this pattern, named Local Spatial Binary Pattern (LSBP), is computed at multiple scales of the image. We have considered MLSBP for comparison by computing LSBP in different directions. This method incorporates directional detail along with the local pattern and produces better retrieval results than other local patterns. However, it fails to exploit local features efficiently at multiple scales, since the size of each LSBP scale remains 3 × 3, and it does not capture changes in texture efficiently. The proposed multiscale LBP technique efficiently exploits local features at multiple scales and captures changes in texture. Therefore, the proposed method outperforms Xia et al. [32] in terms of retrieval accuracy, as shown in Tables 4 and 5 and Fig. 12. The bold values in Tables 4 and 5 highlight the best results, namely the precision and recall values of the proposed method.

The proposed method constructs the feature vector at multiple scales, thereby considering varying levels of detail, and works best on both small and large datasets compared to other state-of-the-art image retrieval methods. Table 6 analyzes the performance of the proposed method relative to other state-of-the-art techniques in terms of precision and recall.

Table 6 Analysis of performance comparison of the proposed method over other state-of-the-art techniques in terms of precision and recall

6 Conclusion

This paper proposed a technique for Content-Based Image Retrieval (CBIR) through a multiscale Local Binary Pattern (LBP) scheme. LBP codes using 3 × 3, 5 × 5, and 7 × 7 scales were computed using different combinations of eight neighbourhood pixels. The feature vector was constructed through the Gray-Level Co-occurrence Matrix (GLCM). The feature vector of the query image was matched with those of the database images to retrieve visually similar images. In a nutshell, the proposed method has the following advantages:

  1. It efficiently captures large scale dominant features of some textures that are not captured by single scale LBP.

  2. It efficiently captures changes in texture feature at multiple scales.

  3. Features left undetected in one subscale get detected in another subscale.

  4. It is robust against local changes in texture.

Performance of the proposed method was measured in terms of precision and recall. The proposed method outperformed other multiscale LBP techniques and some other state-of-the-art CBIR techniques, as demonstrated through experimental results. The proposed method does not consider directional details for constructing the feature vector. These could be incorporated by computing LBP codes at different angles, which is going to be our future work. Also, this paper mainly focuses on gathering local information for image retrieval. In future, we will combine global information such as image moments with multiscale LBP in order to exploit the combination of local and global features for image retrieval.