1 Introduction

1.1 Motivation

With recent technological advances, collections of digital images are growing rapidly in many areas such as academia, crime prevention, medicine and entertainment. These collections mainly consist of diagrams, photographs, paintings, drawings, prints, etc. To access appropriate information, we need to retrieve particular types of images from these large databases, a task that cannot realistically be performed manually. To address it, several image retrieval approaches have emerged. Text-based image retrieval (TBIR) is a traditional method that retrieves images based on associated text and metadata. However, TBIR has several limitations: metadata may be missing or incorrect, and the keywords describing an image can capture only a limited amount of its content. Manually annotating a large collection of images is also prohibitively time consuming. In content-based image retrieval (CBIR), the perceivable contents of the image (features), i.e., texture, colour, shape, fingerprint, face, etc., are used to characterize and index the image database; feature extraction is therefore the most significant step of any CBIR system. CBIR thus avoids the problems of TBIR, and for this reason it has become a widely accepted, standard image retrieval technique.

Combining more than one feature sometimes improves the performance of the overall retrieval system considerably. In this work, we merge colour and texture features to obtain more accurate retrieval results. In most previous works, researchers applied a local pattern directly to the input image to extract the texture feature. Since the discrete wavelet transform enhances the directional information of an image, we instead extract the local-pattern texture feature from the wavelet domain of the input image. The obtained texture feature is then concatenated with the colour feature of the original image to form the final feature vector. The feature vector of the method is also comparatively short, which reduces the time complexity of the overall retrieval system. The experimental section shows that the feature vector of the newly proposed pattern is shorter than those of most existing patterns, while the retrieval performance of the suggested method is superior to that of most other existing methods.

1.2 Related work

Previous work in CBIR is mainly based on texture, colour and shape features, which are called generic features. The concept of the wavelet, meaning "small wave", was first introduced by Jean Morlet, a French geophysicist, in 1982. The discrete wavelet transform (DWT) applies a low-pass filter and a high-pass filter in both the horizontal and vertical directions to decompose the input image into four components. Sifuzzaman et al. [1] described the use of the wavelet transform in fields such as signal processing, image processing, computer graphics, pattern recognition and data compression, along with its advantages over the Fourier transform. The wavelet transform has been used extensively to describe image texture. A wavelet-domain technique has been used for texture analysis and the detection of repeating patterns, where texture quality is measured along the most important perceptual dimensions [2]. The wavelet transform was used to collect colour and texture features from an image [3]. The Daubechies wavelet transform has been used for image indexing and searching [4]; there, wavelet coefficients in the lowest few frequency bands and their variances were used to construct the feature vectors. The concept of the wavelet correlogram in CBIR was first introduced in [5]. DWT extracts information from an image in only three directions (horizontal, vertical and diagonal). To remove this directional limitation, the Gabor wavelet feature was used for texture analysis in [6]. Ahmadian et al. used the Gabor wavelet transform for texture classification in [7]. The Gabor wavelet correlogram, an extension of [5], was introduced as a rotation-invariant Gabor-wavelet feature for CBIR [8]. Kokare et al. extracted texture features in CBIR using a new set of 2D rotated wavelet filters (RWF) jointly with DWT [9]. The dual-tree complex wavelet transform (DT-CWT) has been proposed as a special case of Gabor filters [10]; it requires less computation while retaining the directional advantages, extracting features in six directions. A set of dual-tree rotated complex wavelet filters (DT-RCWF) and the DT-CWT were used together in [11], where texture features are extracted in 12 different directions.

To extract texture features, the local binary pattern (LBP) was first proposed by Ojala et al. [12]. In LBP, the spatial relationship between the referenced pixel and its local neighbours is represented by an 8-bit binary string. Rotation-invariant and uniform versions of LBP were introduced for texture retrieval and classification [13]. Various extensions of LBP, e.g., the block-based local binary pattern (BLK LBP) [14], center-symmetric LBP (CS-LBP) [15], dominant LBP (DLBP) [16] and completed LBP (CLBP) [17], were proposed for texture classification. A limitation of the traditional LBP method is that its circular sampling region does not capture anisotropic features. To solve this problem, a multi-structure local binary pattern (Ms-LBP) operator was proposed, in which the shape of the sampling region is varied to obtain an extended LBP operator for texture classification [18]. Houam et al. used a one-dimensional local binary pattern for bone texture characterization [19]. LBP thresholds exactly at the value of the center pixel and is therefore very sensitive to noise. The local ternary pattern (LTP) removes this limitation by using a threshold interval when comparing the center pixel with its local neighbours [20], making it more resistant to noise. For face recognition, Zhang et al. introduced local derivative patterns (LDPs), in which LBP is described as a non-directional first-order local pattern and the patterns are computed from the first-order derivatives of the image [21]; they described LDP as a higher-order LBP. The local edge pattern (LEP), another extension of LBP, is obtained from an edge image [22]. It has two versions for different applications: the local edge pattern for image segmentation (LEPSEG) and the local edge pattern for image retrieval (LEPINV). Local tetra patterns (LTrPs) represent the spatial structure of image texture in the horizontal and vertical directions along with magnitude patterns [23]. The directional local extrema pattern (DLEP) extracts the texture feature using the local extrema of the referenced pixel in four directions [24]. The peak valley edge pattern (PVEP) is obtained by calculating first-order derivatives in four directions [25]; it combines two binary patterns, the peak edge pattern (PEP) and the valley edge pattern (VEP). For biomedical image indexing and retrieval, the local mesh pattern (LMeP) was proposed, which describes the relationship among the neighbours of a given referenced pixel [26]. The local mesh peak valley edge pattern (LMePVEP), an extension of LMeP, describes the relations among neighbours obtained from the first-order derivative of the image [27]. Murala et al. proposed the local maximum edge binary pattern (LMEBP) for CBIR and object tracking [28], where the maximum edge information of the input image is extracted to obtain the texture feature. Dynamic texture (DT), also known as texture with motion, is texture collected in the temporal domain. To recognize DT in facial expressions, Zhao et al. proposed volume local binary patterns (VLBP), which integrate appearance and motion [29]; to reduce the computational complexity of this pattern, they considered only the co-occurrences of LBP on three orthogonal planes (LBP-TOP).

Wang et al. combined the properties of the Haar wavelet transform and the uniform local binary pattern (ULBP) for ear recognition [30]. First, the Haar wavelet transform decomposes the ear images; texture features are then collected from the transformed images using a combination of ULBP, block-based and multi-resolution methods. A face recognition technique merging the features of DWT and the adaptive local binary pattern (ALBP) is proposed in [31], where DWT helps to enhance the common features of the facial images. A combination of DWT and the local extrema pattern is proposed for CBIR in [32]: a two-level DWT decomposition is first applied to the images, and feature vectors are then extracted by applying the local extrema pattern to the wavelet coefficients. DWT increases the efficiency of the intensity information of the texture units.

Extracting colour features from a digital image is a widely used technique in CBIR. Swain and Ballard introduced the concept of the colour histogram [33]; they also proposed a new distance measure, the histogram intersection, to compute the similarity between image histograms. Schaefer et al. first used the top three central moments (mean, standard deviation and skewness) of each colour in the image for CBIR [34]. The colour histogram (CH) is computationally simple and efficient; however, it provides no spatial information. To remove this limitation of CH, the colour coherence vector (CCV) was proposed [35]. It classifies the pixels of each distinct colour in the image into two parts: pixels belonging to some large region of constant colour are called coherent pixels, and the others incoherent pixels. Another colour feature, the colour correlogram, was first introduced by Huang et al. [36]; it captures the spatial correlation among different colours in the image.

To improve the efficiency of a CBIR system, more than one feature can be combined to build the feature vector. A combination of colour and texture features collected from the multi-resolution wavelet domain was proposed for image retrieval [37]. There, the colour feature was obtained by constructing colour auto-correlograms of the hue and saturation components of the image in the HSV colour space; for the texture feature, moments [block difference of inverse probabilities (BDIP) and block variation of local correlation coefficients (BVLC)] of the images were collected. Murala et al. proposed another image retrieval method that combines colour and texture features by applying the Gabor wavelet transform and colour histogram (GWT + CH) to the images [38]. Murala et al. also introduced a new object tracking algorithm that jointly uses the RGB colour histogram and local extrema patterns [39].

To strengthen the effectiveness of many different tasks, high-dimensional data should be represented more appropriately. Such data are described by a large number of features, most of them noisy and redundant, and overfitting on high-dimensional data reduces the effectiveness of traditional systems. To overcome this problem, Li et al. proposed an innovative unsupervised feature selection algorithm, clustering-guided sparse structural learning (CGSSL), which combines cluster analysis and sparse structural analysis in a unified framework [40]. A robust structured subspace learning (RSSL) algorithm was proposed to represent the data properly by combining feature learning and image understanding in a single framework [41]. A new semi-supervised learning system, the multi-correlation learning to rank (MLRank) framework, was introduced for image annotation [42]; in this scheme, the semantic relevance among tags and the visual similarity among images are used to rank the pertinence of the tags to the corresponding images.

1.3 Main contributions

In this paper, we introduce a new image retrieval method that combines texture features obtained from the wavelet coefficients of the image with a colour histogram. For the texture feature, the new method combines the multi-resolution properties of DWT with the properties of a local pattern. To improve the accuracy of the proposed method, we merge the colour feature with the texture feature. The main contributions of the proposed work are as follows:

1. A novel method, called the local extrema peak valley pattern (LEPVP), has been proposed in this work. Texture features have been extracted from the DWT domain of the given input image using LEPVP. Colour features have been collected by applying the RGB colour histogram to the original image. Then, these two features are concatenated for image indexing and retrieval.

2. The capability of the presented method has been examined on three coloured natural image databases (Corel-1k, Corel-5k and Corel-10k) and two coloured texture image databases (STex, MIT VisTex) and compared with other existing methods.

The paper is organized as follows: Sect. 1 gives a short introduction covering the motivation of the proposed work, related work and the main contributions of the paper. A condensed view of the relevant local patterns (LEP and LEPVP) is given in Sect. 2. The proposed method and its advantages over other existing methods are explained in Sect. 3. Section 4 presents query matching and similarity measurement. The proposed system framework is discussed in Sect. 5. Section 6 presents the experimental results and discussion. Finally, conclusions are drawn in Sect. 7.

2 Local patterns

2.1 Local extrema pattern (LEP)

Murala et al. proposed the LEP [39], a modified version of LBP [12]. The LBP operator was suggested by Ojala et al. [12] for texture classification; in LBP, the spatial relationship between each center pixel of a given input image and its local neighbours is represented by a binary string. In LEP, the texture feature of a given image is extracted by computing the local extrema about the center pixel $Z_c$ along the 0°, 45°, 90° and 135° directions for each 3 × 3 pattern in the image. To obtain the local extrema, the gray value of the center pixel $Z_c$ is compared with its neighbour pixel values $Z_j$ as follows:

$$Z'_j = Z_j - Z_c; \quad \text{for } j = 1, \ldots, 8$$
(1)

Now, the local extrema are computed using Eq. 2:

$$Z'_c(\beta) = B_1\left(Z'_l,\, Z'_{(l+4)}\right); \quad \text{for } l = \left(1 + \tfrac{\beta}{45}\right) \text{ and } \forall \beta = 0^\circ, 45^\circ, 90^\circ, 135^\circ$$
(2)
$$B_1(x, y) = \begin{cases} 1 & x \times y \ge 0 \\ 0 & \text{otherwise} \end{cases}$$
(3)

Now, the LEP at the center pixel $Z_c$ is obtained using the following equation:

$$\text{LEP}(Z_c) = \sum_{\beta} 2^{(\beta/45)} \times Z'_c(\beta); \quad \forall \beta = 0^\circ, 45^\circ, 90^\circ, 135^\circ$$
(4)

Finally, the input image is transformed into an LEP map with values from 0 to 15. The computation of LEP is illustrated in Fig. 1. After calculating LEP for each pixel of the input image, the image is described by constructing a histogram as follows:

$$H(k)\big|_{\text{LEP}} = \sum_{m=1}^{M} \sum_{n=1}^{N} B_2\left(\text{LEP}(m, n), k\right); \quad k \in [0, 15]$$
(5)
$$B_2(x, y) = \begin{cases} 1 & x = y \\ 0 & \text{otherwise} \end{cases}$$
(6)

where M × N is the size of the image.

Fig. 1 Calculation of LEP
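For concreteness, Eqs. 1–5 can be sketched in a few lines of Python with NumPy. This is a minimal illustration, not the authors' code: the neighbour ordering (starting at the right-hand neighbour and proceeding counter-clockwise, so that indices l and l + 4 are diametrically opposite) and the interior-only border handling are our assumptions.

    import numpy as np

    # Neighbour offsets (row, col), ordered so that offset l and l+4 are the
    # opposite ends of the 0, 45, 90 and 135 degree directions (assumed ordering).
    OFFSETS = [(0, 1), (-1, 1), (-1, 0), (-1, -1),
               (0, -1), (1, -1), (1, 0), (1, 1)]

    def lep_map(img):
        """LEP map with values 0..15 for a 2-D grayscale array (Eqs. 1-4)."""
        img = np.asarray(img, dtype=np.float64)  # allow negative differences
        h, w = img.shape
        center = img[1:h-1, 1:w-1]
        # Local differences Z'_j = Z_j - Z_c for the eight neighbours (Eq. 1)
        diffs = [img[1+dr:h-1+dr, 1+dc:w-1+dc] - center for dr, dc in OFFSETS]
        out = np.zeros_like(center, dtype=np.uint8)
        for l in range(4):                        # l <-> beta = 0, 45, 90, 135 deg
            b1 = diffs[l] * diffs[l + 4] >= 0     # Eq. 3: 1 iff product >= 0
            out += b1.astype(np.uint8) << l       # Eq. 4: weight 2^(beta/45)
        return out

    def lep_histogram(img):
        """16-bin LEP histogram of Eq. 5."""
        return np.bincount(lep_map(img).ravel(), minlength=16)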

2.2 Local extrema peak valley pattern (LEPVP)

In this paper, we introduce a new local pattern, the local extrema peak valley pattern, to collect the texture feature from the input image. It is an extended version of LEP. It extracts the directional information of every 3 × 3 pattern of the given input image by calculating the local differences between the center pixel and its local neighbours (defined in Eq. 1). After that, the local extrema along the 0°, 45°, 90° and 135° directions are obtained as follows:

$$Z'_c(\beta) = B_3\left(Z'_l,\, Z'_{(l+4)}\right); \quad \text{for } l = \left(1 + \tfrac{\beta}{45}\right) \text{ and } \forall \beta = 0^\circ, 45^\circ, 90^\circ, 135^\circ$$
(7)
$$B_3(x, y) = \begin{cases} 1 & x \ge 0 \text{ and } y \ge 0 \\ 2 & x < 0 \text{ and } y < 0 \\ 0 & \text{otherwise} \end{cases}$$
(8)

In LEPVP, the function $B_1$ in Eq. 2 is replaced by the function $B_3$. Now, the LEPVP at the center pixel $Z_c$ is obtained using Eq. 9.

$$\text{LEPVP}(Z_c) = \left\{ Z'_c(0^\circ);\; Z'_c(45^\circ);\; Z'_c(90^\circ);\; Z'_c(135^\circ) \right\}$$
(9)

LEPVP is a three-valued code containing 0, 1 and 2. It is further split into two binary patterns, the local extrema peak pattern (LEPP) and the local extrema valley pattern (LEVP). After computing LEPP and LEVP for each pixel, the whole input image is converted into LEPP and LEVP images with values ranging from 0 to 15. Separate histograms are constructed for both images using Eq. 5, and these two histograms are concatenated to obtain the feature vector.

The LEPVP computation for a center pixel of a given 3 × 3 pattern is shown in Fig. 2. First, the local differences between the center pixel (marked in red) and its eight neighbours are computed, giving a direction for each neighbour as shown in Fig. 2: the arrowhead points toward the center pixel when the gray value of the center pixel is greater than that of the neighbour (i.e., when the local difference is negative); otherwise, the arrowhead points outward (when the local difference is positive). These directions are used to calculate the local extrema along the 0°, 45°, 90° and 135° directions. Finally, the ternary LEPVP pattern is obtained and split into the LEPP and LEVP patterns. LEPP is obtained by retaining 0, replacing 1 by 0 and replacing 2 by 1; LEVP is obtained by retaining 0 and 1 and replacing 2 by 0 (shown in Fig. 2).

Fig. 2 Calculation of LEPVP

When both directions point toward the center, the configuration is termed a peak pattern and is coded with the value 2 (as shown in Fig. 3). When both directions point away from the center, it is called a valley pattern and is coded with the value 1. In the other two cases (shown in Fig. 3), it is coded with 0.

Fig. 3 Calculation of LEPVP pattern bits using the pixel directions

Figure 4 shows an example of obtaining the LEPVP pattern bits for a given 3 × 3 pattern in which the gray value of the center pixel is 15. In the 0° and 45° directions, one arrowhead points toward the center and the other points outward, so these two local patterns are coded with 0 (see Fig. 3). In the 90° direction, both arrowheads point toward the center pixel; this is a peak pattern and the corresponding pattern bit is 2. In the 135° direction, both arrowheads point away from the center pixel; hence it is a valley pattern, coded with 1. The resulting LEPVP pattern is 0 0 2 1, which is split into the LEPP pattern 0 0 1 0 and the LEVP pattern 0 0 0 1.

Fig. 4 Example to obtain LEPVP pattern bits

3 Proposed method and its advantages

3.1 Proposed method

In the proposed methodology, the feature vector is built by concatenating colour and texture features of the given query image. To obtain the texture feature, a combination of the LEPVP operator and the discrete wavelet transform is applied to the input image; to obtain the colour feature, the RGB colour histogram of the original image is constructed.

3.1.1 Texture feature

The discrete wavelet transform (DWT) is a multi-resolution alternative to the Fourier transform. In DWT, two filters, a low-pass filter (L) and a high-pass filter (H), are applied to the image both vertically and horizontally to extract the feature vector. The wavelet transform decreases the computational complexity of the retrieval system because it decomposes the original high-resolution image into low-resolution sub-images. DWT decomposes the input image into four sub-images: the approximation coefficients (LL), horizontal coefficients (LH), vertical coefficients (HL) and diagonal coefficients (HH). The approximation coefficients of one level are used for the next level of decomposition. The approximation sub-image (LL) contains the low-frequency information of the image, while the other three sub-images (LH, HL, HH) carry most of its high-frequency information.
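One level of this decomposition takes a single call with the PyWavelets library (shown here only as an illustration; the paper does not name an implementation):

    import numpy as np
    import pywt

    img = np.random.rand(256, 256)             # stand-in grayscale image
    LL, (LH, HL, HH) = pywt.dwt2(img, 'db4')   # approximation + three detail bands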

In the proposed method, a two-level discrete wavelet transform with Daubechies-4 wavelet filters is applied to the given input image to extract the wavelet coefficients, yielding seven sub-images in total, as shown in Fig. 5. The LEPVP operator is then applied to each sub-image to collect the texture feature, producing seven LEPVP patterns that contain both textural and directional information of the input image. Each LEPVP pattern is coded into one LEPP pattern and one LEVP pattern, so 14 (7 × 2) binary patterns are obtained in total. A histogram is constructed for each pattern; since the maximum value of each pattern is 15, the length of each histogram is 16. The final texture feature is obtained by concatenating all 14 histograms, so the length of the resultant histogram is 224 (16 × 14), which is small compared to other existing methods, e.g., LBP [12], LMEBP (local maximum edge binary pattern) [28] and DLEP [24].

Fig. 5 First-level and second-level decomposition of DWT
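The complete texture-feature step can then be sketched as follows, assuming the lepvp_histogram helper from Sect. 2.2 above and PyWavelets' db4 filters. pywt.wavedec2 with level=2 returns the level-2 approximation followed by the detail triples of levels 2 and 1, i.e., the seven sub-images of Fig. 5.

    def texture_feature(gray):
        """224-dim texture feature: 14 histograms of length 16 over 7 sub-images."""
        coeffs = pywt.wavedec2(np.asarray(gray, dtype=float), 'db4', level=2)
        # coeffs = [LL2, (LH2, HL2, HH2), (LH1, HL1, HH1)] -> 7 sub-images
        subimages = [coeffs[0]] + [band for level in coeffs[1:] for band in level]
        return np.concatenate([lepvp_histogram(s) for s in subimages])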

3.1.2 Colour feature

In CBIR, the colour histogram is a frequently used and popular colour feature; it is computationally simple and efficient. In the presented method, colour features are obtained by applying a colour histogram to each RGB colour channel of the given image, using eight quantized bins per channel. Therefore, the length of the colour feature vector is 24 (8 + 8 + 8): for each image, 24 colour features are extracted.
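A sketch of this step for an 8-bit RGB image (an H × W × 3 array); the uniform 0–255 binning is our assumption:

    def colour_feature(rgb):
        """24-dim colour feature: an 8-bin histogram for each of R, G and B."""
        return np.concatenate([np.histogram(rgb[..., c], bins=8, range=(0, 256))[0]
                               for c in range(3)])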

3.1.3 Combined feature

In the proposed method, we try to collect more information from the input image by combining colour and texture features, since colour and texture are the primary features of an image. To obtain the combined feature, the resultant texture histogram is concatenated with the colour histogram of the input image, so the length of the combined feature vector is 248 (224 + 24).

To acquire more appropriate retrieval results, we normalize the combined feature vector with a factor $n_f$. The value of $n_f$ varies across databases depending on the size of the database images. In this paper, five benchmark databases are used for the experiments; within each database, all images have identical size. Each image in the Corel-1k database is either 384 × 256 or 256 × 384. Images in Corel-5k and Corel-10k have a resolution of either 126 × 187 or 187 × 126, which is smaller than the images in Corel-1k, and images in the STex and MIT VisTex databases have size 128 × 128. Hence, the normalization factor $n_f$ is taken as 500 for the Corel-1k database, 50 for the Corel-5k and Corel-10k databases and 10 for the STex and MIT VisTex databases.
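Putting the two parts together gives the 248-dimensional descriptor. In this sketch the grayscale conversion uses the common BT.601 weights and the normalization is taken as division by $n_f$; both are our reading, as the paper does not spell out these details.

    def combined_feature(rgb, n_f):
        """248-dim feature: 224 texture values + 24 colour values, scaled by n_f."""
        gray = 0.299 * rgb[..., 0] + 0.587 * rgb[..., 1] + 0.114 * rgb[..., 2]
        feat = np.concatenate([texture_feature(gray), colour_feature(rgb)])
        return feat / n_f   # n_f = 500 (Corel-1k), 50 (Corel-5k/10k), 10 (textures)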

3.2 Advantages of the proposed method over other existing methods

To extract the texture feature, a new local pattern (LEPVP) is introduced in this paper. In most previously existing local patterns (such as LBP [12] and LTP [20]), the spatial relation between the center pixel and its local neighbours is encoded to get the pattern value. LEPVP, in contrast, encodes the spatial relation between pairs of neighbours in the 0°, 45°, 90° and 135° directions for a given center pixel in a local region.

LEP [39] is a binary pattern that encodes the image with two values (either "0" or "1"), whereas LEPVP encodes the image with three distinct values ("0", "1" or "2"). For this reason, LEPVP can capture more detailed spatial information from a given input image.

In most previously existing methods, the texture feature is extracted by applying a local pattern directly to the given input image. In this paper, however, we apply the LEPVP operator to the wavelet domain of the input image to extract the texture feature. This strengthens the retrieval performance of the obtained feature vector because DWT captures more directional information than the original image.

A long feature vector takes more time to extract similar images from a database for a given query image. Hence, the method is also computationally advantageous in terms of feature vector length: the final feature vector length of the proposed method is 248, which is small compared to most previously existing methods.

4 Query matching and similarity measurement

After feature extraction, the next important task in CBIR is to estimate the similarity between the query feature vector and the feature vector of each image in the database in order to select the n top-matched images. The query feature vector extracted from the query image Q is denoted $f_Q$; similarly, the feature vector of the ith image in the database is denoted $f^i_{\text{DB}}$ for i = 1, …, |DB|. For query matching, we use the $d_1$ similarity distance measure [23], defined as follows:

$$D(Q, \text{DB}_i) = \sum_{k=1}^{L_n} \left| \frac{f_Q(k) - f^i_{\text{DB}}(k)}{1 + f_Q(k) + f^i_{\text{DB}}(k)} \right|$$
(10)

where $f_Q(k)$ and $f^i_{\text{DB}}(k)$ denote the kth feature of the given query image and of the ith database image, respectively, and $L_n$ denotes the total length of the feature vector. In the proposed method, the value of $L_n$ is 248.
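Eq. 10 translates directly into code; a short sketch:

    def d1_distance(f_query, f_db):
        """d1 similarity measure of Eq. 10; a smaller value means a closer match."""
        f_query = np.asarray(f_query, dtype=float)
        f_db = np.asarray(f_db, dtype=float)
        return np.sum(np.abs((f_query - f_db) / (1.0 + f_query + f_db)))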

5 Proposed system framework

The flow diagram of the proposed image retrieval system is shown in Fig. 6, and the algorithm of the presented work is given below.

Fig. 6 Flowchart of the proposed image retrieval system

5.1 Algorithm

Input: Query image and Image database.

Output: Retrieved images.

1. Upload the query image.

2. Transform it from an RGB image into a grayscale image.

3. Apply the two-level DWT with the Daubechies-4 wavelet filter to the gray image, which decomposes it into 7 sub-images.

4. Perform LEPVP on the 7 sub-images, which gives 7 ternary patterns.

5. Split each ternary pattern into two binary patterns (LEPP and LEVP); thus, 14 binary patterns are obtained in total.

6. Construct a histogram for each binary pattern.

7. Construct the colour histogram of the given input image, with 8 quantized bins for each colour channel.

8. Build the query feature vector by concatenating the histograms constructed in steps 6 and 7.

9. Similarly, build the feature database by applying the same technique to each image in the image database.

10. Calculate the similarity distance between the query feature vector and each feature vector in the feature database using Eq. 10.

11. Sort the distances obtained in step 10 and retrieve the top-matched images as output (a compact sketch of these steps is given below).
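The eleven steps above reduce to a short driver when the helpers sketched in the earlier sections are combined. This is an illustrative sketch, not the authors' implementation; it assumes the database images are already loaded as RGB arrays.

    def retrieve(query_rgb, database_rgbs, n, n_f):
        """Indices of the n top-matched database images (steps 1-11)."""
        f_query = combined_feature(query_rgb, n_f)
        features = [combined_feature(db, n_f) for db in database_rgbs]  # step 9
        dists = [d1_distance(f_query, f) for f in features]             # step 10
        return np.argsort(dists)[:n]                                    # step 11

In practice the database features would be computed once offline and stored, so only the query feature and |DB| distance evaluations are needed at query time.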

6 Experimental results and discussion

To analyse the efficacy of the presented image retrieval system, five experiments in total are carried out on five different benchmark databases. In the first three experiments, coloured natural image databases (Corel-1k, Corel-5k and Corel-10k) are used, and in the last two experiments, coloured texture image databases (STex, MIT VisTex) are used. All these databases contain a large number of diverse images from different categories, which makes them well suited to judging the effectiveness of an image retrieval system. The results obtained from these experiments are described in the following subsections.

In every experiment, each image in the database is used in turn as the query image. For each query, the system retrieves the n database images with the smallest matching distance computed using Eq. 10. To evaluate the suggested method, two conventional evaluation metrics, precision and recall, are computed in all experiments; they are defined below:

$$P_j(n) = \frac{\text{Number of relevant images retrieved}}{\text{Total number of images retrieved } (n)}$$
(11)
$$R_j(n) = \frac{\text{Number of relevant images retrieved}}{\text{Total number of relevant images in the database } (n_{\text{cat}})}$$
(12)

where $P_j(n)$ and $R_j(n)$ denote precision and recall, respectively, when the jth database image is used as the query. The parameter n is the number of images fetched by the system, and $n_{\text{cat}}$ is the total number of relevant images in the database for the given query image, i.e., the number of images in each category of the referenced database. For a given query image, the relevant images are the most similar images in the database. The average precision and average recall for each category in the database are then computed using the following two equations, respectively:

$$P_{\text{average}}^{i}(n) = \frac{1}{n_{\text{cat}}} \sum_{j=1}^{n_{\text{cat}}} P_j(n)$$
(13)
$$R_{\text{average}}^{i}(n) = \frac{1}{n_{\text{cat}}} \sum_{j=1}^{n_{\text{cat}}} R_j(n)$$
(14)

where the index i denotes the category number. Finally, the total average precision and total average recall over the complete referenced database are calculated as follows:

$$P_{\text{average}}^{\text{Total}}(n) = \frac{1}{n_{\text{tot}}} \sum_{i=1}^{n_{\text{tot}}} P_{\text{average}}^{i}(n)$$
(15)
$$R_{\text{average}}^{\text{Total}}(n) = \frac{1}{n_{\text{tot}}} \sum_{i=1}^{n_{\text{tot}}} R_{\text{average}}^{i}(n)$$
(16)

where $n_{\text{tot}}$ is the total number of categories in the database. The total average precision and total average recall are also called the average precision rate and the average retrieval rate, respectively.
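Given the category label of each database image, Eqs. 11–16 are straightforward to evaluate; a sketch with hypothetical variable names:

    def precision_recall(retrieved_labels, query_label, n_cat):
        """P_j(n) and R_j(n) of Eqs. 11-12 for one query."""
        relevant = sum(1 for lab in retrieved_labels if lab == query_label)
        return relevant / len(retrieved_labels), relevant / n_cat

    def total_average(per_query_scores, query_labels):
        """Eqs. 13-16: average per category, then average over categories."""
        cats = sorted(set(query_labels))
        per_cat = [np.mean([s for s, lab in zip(per_query_scores, query_labels)
                            if lab == c]) for c in cats]
        return float(np.mean(per_cat))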

6.1 Experiments in coloured natural image databases

Three experiments in total are performed on coloured natural image databases, Corel-1k, Corel-5k and Corel-10k, to analyse the potency of the presented method. Average recall and average precision are computed on each database, and graphs are plotted to evaluate the capability of the proposed technique. In all three experiments, the suggested method is compared with LTP (local ternary pattern) [20], LBP (local binary pattern) [12], LMEBP (local maximum edge binary pattern) [28], LTrP (local tetra pattern) [23], DLEP (directional local extrema pattern) [24], CS-LBP (center-symmetric local binary pattern) [15], DWT and colour histogram [38], LEPSEG (local edge pattern for segmentation) [22], LEPINV (local edge pattern for image retrieval) [22], multi-resolution LEP (local extrema pattern) [32], the joint histogram of colour and LEP [39], and LECoP (local extrema co-occurrence pattern) with an HSV colour histogram [43] in which the quantization levels of the hue and saturation components are 18 and 10, respectively.

Since the proposed method combines colour and texture features to get the final feature vector, during the experiments we concatenated the RGB colour histogram (8 bins per colour channel, i.e., a colour feature vector of length 24) to LBP, LTP, LMEBP, LTrP, DLEP, CS-LBP, multi-resolution LEP, LEPSEG and LEPINV.

The time required to obtain the similar images for a given query image is called the image retrieval time; it depends on the length of the image's feature vector, since a very long feature vector increases the time needed to compute the similarity distance between the query image and the other images in the database. To make the retrieval performance of the proposed method easy to assess with respect to feature vector length, all the existing methods against which it is compared are divided into two categories: methods with a high-dimensional feature vector (i.e., a relatively long feature vector, like LMEBP) and methods with a low-dimensional feature vector (i.e., a relatively short feature vector, like CS-LBP). The methods in these two categories, with their abbreviated names and feature vector lengths, are listed in Tables 1 and 2. The proposed method is abbreviated as PM.

Table 1 Abbreviated name and feature vector length of the methods with high-dimensional feature vector
Table 2 Abbreviated name and feature vector length of the methods with low dimensional feature vector

The feature vector length of the proposed method is 248. The feature vector lengths of all the methods listed in Table 1 are larger than that of the proposed technique. The feature vector lengths of CS-LBP + CH, LEPINV + CH, DWT + LEP + CH and Wavelet + CH (listed in Table 2) are smaller than that of the suggested method, while those of LBP + CH and LECoP + CH (also in Table 2) are very close to it. However, the accuracy of the suggested technique is superior to these methods in terms of precision and recall, as clearly illustrated in the following experimental sections.

6.1.1 Experiment 1

The Corel-1k database [44] is used in experiment 1. It consists of 1000 images collected from 10 ($n_{\text{tot}}$) different categories, where each category contains 100 ($n_{\text{cat}}$) images with a resolution of either 256 × 384 or 384 × 256. The categories are Africans, beaches, buildings, dinosaurs, elephants, flowers, buses, hills, mountains and food. Figure 7 shows some sample images from the Corel-1k database, two from each category.

Fig. 7 Corel-1k sample images (two images from each category)

Precision and recall are computed using Eqs. 11–16. Two types of graphs are plotted. The first shows the performance of the suggested technique along with the other existing techniques in terms of average precision and average recall versus the number of images retrieved (10, 20, …, 100), illustrated in Figs. 8 and 9, respectively. The second shows the category-wise results of all methods in terms of precision and recall on the Corel-1k database, shown in Fig. 10. Table 3 presents the retrieval results of all methods in terms of average precision and average recall, in percent. From Figs. 8, 9 and 10 and Table 3, it is evident that the presented method performs better than the other methods. In terms of average precision, the suggested technique improves on LBP + CH, LTP + CH, LTrP + CH, LMEBP + CH, DWT + LEP + CH, CS-LBP + CH, LEPSEG + CH, LEPINV + CH, DLEP + CH, Joint LEP CH, Wavelet + CH and LECoP + CH by up to 1.85, 4.75, 3.3, 9.76, 1.57, 11.79, 7.8, 11.36, 3.75, 6.11, 18.5 and 3.02 %, respectively. In terms of average recall, the presented system outperforms LBP + CH, LTP + CH, LTrP + CH, LMEBP + CH, DWT + LEP + CH, CS-LBP + CH, LEPSEG + CH, LEPINV + CH, DLEP + CH, Joint LEP CH, Wavelet + CH and LECoP + CH by up to 3.64, 11.85, 5.01, 18.38, 2.39, 14.31, 14.26, 17.18, 1.56, 16.53, 25.34 and 3.18 %, respectively.

Fig. 8 Corel-1k database: average precision versus number of images retrieved, compared with methods having (a) high-dimensional and (b) low-dimensional feature vectors

Fig. 9 Corel-1k database: average recall versus number of images retrieved, compared with methods having (a) high-dimensional and (b) low-dimensional feature vectors

Fig. 10 Corel-1k database: precision versus category number for methods with (a) high-dimensional and (b) low-dimensional feature vectors, and recall versus category number for methods with (c) high-dimensional and (d) low-dimensional feature vectors

Table 3 Results of all methods in Corel-1k, Corel-5k and Corel-10k database in average precision (for n = 10) and average recall (for n = 100) in percentage

6.1.2 Experiment 2

The Corel-5k database [45], containing 5000 images, is used for experiment 2. The images belong to 50 different categories, including animals (e.g., lion, bear, elephant, tiger), humans (e.g., swimmers, Africans), nature (e.g., hills, sky), foods (e.g., vegetables, fruits, drinks), paintings, vehicles, buildings, bridges, etc. Each category contains 100 images.

Precision and recall are calculated using Eqs. 11–16. The effectiveness of the presented method along with the other existing techniques in terms of average precision and average recall can be seen in Fig. 11, and the category-wise retrieval results of all techniques in terms of precision and recall are given in Fig. 12. The average precision and average recall of the suggested method and all other methods on the Corel-5k database are presented in Table 3. Figures 11 and 12 and Table 3 demonstrate that the presented algorithm yields superior performance, both category-wise and overall, compared with the other existing methods.

Fig. 11 Corel-5k database: average precision versus number of images retrieved for methods with (a) high-dimensional and (b) low-dimensional feature vectors, and average recall versus number of images retrieved for methods with (c) high-dimensional and (d) low-dimensional feature vectors

Fig. 12 Corel-5k database: precision versus category number for methods with (a) high-dimensional and (b) low-dimensional feature vectors, and recall versus category number for methods with (c) high-dimensional and (d) low-dimensional feature vectors

In terms of average precision, the suggested method improves on CS-LBP + CH, LEPINV + CH, DWT + LEP + CH, Wavelet + CH, LBP + CH, LECoP + CH, LTP + CH, LTrP + CH, LMEBP + CH, LEPSEG + CH, DLEP + CH and Joint LEP CH by up to 19.19, 19.95, 4.65, 25.73, 3.10, 4.15, 6.71, 7.25, 17.63, 16.91, 18.16 and 17.08 %, respectively. In terms of average recall, the recommended technique significantly outperforms CS-LBP + CH, LEPINV + CH, DWT + LEP + CH, Wavelet + CH, LBP + CH, LECoP + CH, LTP + CH, LTrP + CH, LMEBP + CH, LEPSEG + CH, DLEP + CH and Joint LEP CH by up to 23.77, 36.30, 5.23, 30.60, 9.38, 7.59, 23.44, 16.17, 37.17, 32.81, 38.83 and 31.45 %, respectively.

6.1.3 Experiment 3

In this experiment, the Corel-10k database [45] is used, an extended version of the Corel-5k database: an additional 5000 images are appended to Corel-5k to make it larger and more diverse. It thus contains 10,000 images in total, from 100 different categories with 100 images per category. These categories include all the categories of the Corel-5k database, plus images of ships, trains, aeroplanes, cats, bikes, fishes, birds, furniture, leaves, musical instruments, army, ocean, etc. Figure 13 shows some examples of image retrieval using the presented technique on the Corel-10k database; in each case, the top-left image is used as the query image.

Fig. 13 Some instances of image retrieval by the presented technique on the Corel-10k database

Precision and recall are obtained with the help of Eqs. 11–16. Plots of average precision and average recall against the number of images retrieved on the Corel-10k database are shown in Fig. 14, and Fig. 15 shows the category-wise results of the presented technique and all other methods in terms of precision and recall. Table 3 reports the performance of all the methods in terms of average precision and average recall. Figures 14 and 15 and Table 3 show that the accuracy of the presented method exceeds that of the other existing methods.

Fig. 14 Corel-10k database: average precision versus number of images retrieved for methods with (a) high-dimensional and (b) low-dimensional feature vectors, and average recall versus number of images retrieved for methods with (c) high-dimensional and (d) low-dimensional feature vectors

Fig. 15 Corel-10k database: precision versus category number for methods with (a) high-dimensional and (b) low-dimensional feature vectors, and recall versus category number for methods with (c) high-dimensional and (d) low-dimensional feature vectors

In terms of average precision, the proposed method performs better than CS-LBP + CH, LEPINV + CH, DWT + LEP + CH, Wavelet + CH, LBP + CH, LECoP + CH, LTP + CH, LTrP + CH, LMEBP + CH, LEPSEG + CH, DLEP + CH and Joint LEP CH by 22.66, 21.51, 5.6, 29.39, 2.97, 4.31, 6.07, 6.80, 18.18, 17.92, 19.54 and 18.36 %, respectively. In terms of average recall, the presented technique notably exceeds CS-LBP + CH, LEPINV + CH, DWT + LEP + CH, Wavelet + CH, LBP + CH, LECoP + CH, LTP + CH, LTrP + CH, LMEBP + CH, LEPSEG + CH, DLEP + CH and Joint LEP CH by up to 26.91, 33.68, 5.67, 38.54, 7.89, 6.65, 18.59, 13.23, 36.49, 29.65, 36.49 and 32.80 %, respectively.

6.2 Experiment in coloured texture image databases

In this work, two experiments are performed on the STex and MIT VisTex databases, which contain coloured texture images from different domains, to judge the capability of the proposed method. These databases meet the need for images that contain both colour and texture features. Two evaluation measures, precision and recall, are computed for each database, and the results are illustrated with graphs and tables. The retrieval result of the suggested system is compared with those of DLEP [24] + CH, CS-LBP [15] + CH, DWT + LEP [32] + CH, LEPSEG [22] + CH, LEPINV [22] + CH, Joint LEP CH [39] and Wavelet + CH [38].

6.2.1 Experiment 1

The Salzburg Texture Image Database (STex) [46], a large collection of 476 coloured image textures, is used for experiment 1. For the experiments, each texture is split into 16 non-overlapping sub-images, creating a large database of 7616 coloured texture images belonging to 476 different categories. Figure 16 shows some examples of image retrieval using the proposed technique on the STex database; in each case, the top-left image is used as the query image.

Fig. 16 Some instances of image retrieval by the suggested method on the STex database

Precision and recall are computed during the experiment using Eqs. 11–16. Plots of average precision and average recall versus the number of images retrieved (16, 32, 48, …, 112) are shown in Fig. 17. Table 4 reports the average retrieval rate (ARR) of the proposed technique and all other techniques on the STex database; it is evident that the ARR of the presented technique is much higher than that of the other existing techniques. The ARR of the suggested method on the STex database improves on CS-LBP + CH, LEPINV + CH, DWT + LEP + CH, Wavelet + CH, LEPSEG + CH, DLEP + CH and Joint LEP CH by 19.22, 26.65, 7.06, 42.95, 23.31, 13.55 and 6.21 %, respectively.

Fig. 17 STex database: (a) average precision and (b) average recall versus number of images retrieved

Table 4 Average retrieval rate (ARR) of all methods in STex and MIT VisTex database in percentage

6.2.2 Experiment 2

In experiment 2, another coloured texture database is used, drawn from the MIT VisTex database [47]. The MIT VisTex database contains a large number of coloured textures, of which 40 textures of size 512 × 512 are taken for the experiment. Each of these 40 images is divided into 16 non-overlapping sub-images of size 128 × 128, producing a database of 640 images of 40 different types. Figure 18 shows a few example images from this database, one from each category.
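The 16 non-overlapping sub-images per texture follow from a simple tiling; a sketch for the 512 × 512 case:

    def split_tiles(texture, tile=128):
        """Split a 512 x 512 texture into 16 non-overlapping 128 x 128 tiles."""
        return [texture[r:r+tile, c:c+tile]
                for r in range(0, texture.shape[0], tile)
                for c in range(0, texture.shape[1], tile)]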

Fig. 18 Sample images from the MIT VisTex database (one image from each category)

Precision and recall are measured for this database using Eqs. 11–16. Figure 19 compares the suggested method with all the other techniques in terms of average precision and average recall. Table 4 shows that the presented technique gives superior results in terms of ARR: the proposed method improves on CS-LBP + CH, LEPINV + CH, DWT + LEP + CH, Wavelet + CH, LEPSEG + CH, DLEP + CH and Joint LEP CH by 6.48, 6.39, 3.32, 15.38, 6.05, 2.26 and 4.49 %, respectively.

Fig. 19 MIT VisTex database: (a) average precision and (b) average recall versus number of images retrieved

7 Conclusion

In this paper, a new local pattern, LEPVP, has been introduced to acquire texture features from an image. The proposed image retrieval technique combines two dominant features of the image, texture and colour, using the properties of the discrete wavelet transform and a local pattern to extract the texture feature. First, a two-level DWT is applied to the input image, which enhances its different directional information and gives seven sub-images. The LEPVP operator is then applied to these seven sub-images to extract local directional information, yielding 7 LEPVP patterns from which the resultant histogram is constructed. This histogram is concatenated with the colour histogram of the original input image to form the combined feature vector. Experiments are carried out on five different benchmark databases (Corel-1k, Corel-5k, Corel-10k, STex and MIT VisTex) to analyse the performance of the presented method. The experimental results show that the proposed method performs better than previously existing image retrieval methods in terms of average precision and average recall.