1 Introduction

With the development of the Internet, the volume of image data keeps growing, and image retrieval is widely used in target recognition, photo filtering, and other scenarios. As more image data is stored, the associated security problems and their solutions [1,2,3] and the demand for efficient image retrieval grow as well. Improving the retrieval efficiency of images is therefore increasingly important.

Many content-based image retrieval (CBIR) algorithms have been proposed and are widely used [4]. CBIR mainly relies on low-level features of the image, such as color, texture, shape, and spatial features. Color is the most direct and simplest feature of an image, and it depends little on the size, orientation, or rotation of the image. Color-based retrieval is therefore the most commonly used basic method in CBIR. However, color features are sensitive to changes in brightness, and a histogram used as the color feature contains no information about the spatial layout of colors. Shape, in contrast, is one of the essential features of an object and does not change with the surrounding environment or brightness. Compared with color and texture, it is more intuitive and carries a certain amount of spatial layout information. At present, image retrieval based on a single feature is still widely used.

However, the retrieval efficiency of single-feature image retrieval can no longer meet practical needs, so finding an efficient image retrieval method is especially important. The criminal investigation image retrieval algorithm in the literature, which combines the dual-tree complex wavelet transform with a four-direction, six-parameter gray level co-occurrence matrix and Hu invariant moments, has high retrieval efficiency but is not widely used for criminal investigation images [5,6,7].

Therefore, this paper proposes an image retrieval algorithm based on the combination of color features and shape features. First, the local cumulative histogram of the image is computed and a similarity ranking is performed. Then, the shape features are obtained by calculating the Hu invariant moments of the image, and a second similarity ranking is performed. Finally, weights are assigned to the color and shape features for image retrieval. Comparative experiments show that the algorithm effectively improves the accuracy of image retrieval [8, 9].

In summary, the main contribution of this paper is to assign weights to color features and shape features, which improves the retrieval efficiency of images and compensates for the low efficiency of retrieval based on a single feature.

In section 2, we briefly introduce the related methods of color feature extraction. In section 3, we introduce the method of shape feature. In section 4, we introduce the method of similarity measure. In section 5, we introduce the steps of the algorithm in this paper. In section 6, we introduce the experimental environment and the results of the comparative experiments and the retrieval efficiency of each algorithm. We conclude this paper in section 7.

2 Color Feature

The color histogram method proposed by Swain et al. [10] divides the color space into several fixed subspaces, counts the number of pixels of each image that fall into each subspace, and uses the intersection of color histograms to measure the similarity between images. The color histogram is invariant to scale, translation, and rotation. Its main disadvantage is that it records only the frequency of each color and loses the position information of the pixels. Every image has exactly one histogram, but different images may have the same histogram, so the mapping from histograms to images is one-to-many, which does not match human visual perception. As a result, the false positive rate is high.

To further improve the color histogram method, Pass et al. proposed the color coherence vector (CCV) as a color feature of the image [11]. The core idea is that when the contiguous region occupied by pixels of similar color is larger than a certain threshold, the pixels in that region are coherent (aggregated) pixels; otherwise they are incoherent (non-aggregated) pixels. The ratio of aggregated to non-aggregated pixels for each color in the image is referred to as the color coherence vector of the image, and the coherence vector of the target image is matched against those of the retrieved images during retrieval. The coherence vector preserves the spatial information of the image colors to some extent. Stricker et al. [12] proposed the cumulative color histogram and the color moment method, which mainly uses the first, second, and third moments of each color component. He Heng et al. [13] put forward a fuzzy histogram method that applies fuzzy theory to image retrieval. A new weighted dominant color descriptor is proposed in [14]: the dominant colors of the image are extracted with the MP7DCD or fast LBA algorithm, the weight of each dominant color is obtained from its proportion, and the results are combined into a new dominant color descriptor that takes the background color of the image into account. In [15], the color difference histogram is used to represent color features; it considers not only the role of image edge points, colors, and color differences but also the spatial layout features of the image, without using any image segmentation technique. In [16], a Gaussian mixture model generated from the training set with the expectation-maximization algorithm is used to quantize the colors, and color spatial information is then taken into account to construct a new spatial color histogram.

In this paper, to address the insufficient retrieval precision of algorithms that use color features alone, color features and shape features are combined for image retrieval. The cumulative histogram method is used to extract the color features, and the seven Hu invariant moments are calculated as shape features. Combining the color and shape features and using the Euclidean distance as the similarity measure, an algorithm based on the combination of color features and shape features is proposed, and its effectiveness is verified by experiments [6, 17].

2.1 Color Histogram

The color histogram describes the proportion of different colors in the entire image and does not consider the spatial position of each color. It is defined as follows.

$$ \mathrm{H}=\left\{\mathrm{h}\left[{\mathrm{c}}_1\right],\mathrm{h}\left[{\mathrm{c}}_2\right],\dots ,\mathrm{h}\left[{\mathrm{c}}_{\mathrm{K}}\right]\right\},\kern1em \sum \limits_{\mathrm{k}=1}^{\mathrm{K}}\mathrm{h}\left[{\mathrm{c}}_{\mathrm{k}}\right]=1,\kern1em 0\le \mathrm{h}\left[{\mathrm{c}}_{\mathrm{k}}\right]\le 1 $$
(1)

Here K is the number of colors. The pixel frequency h[ck] of the kth color in the image is as follows.

$$ \mathrm{h}\left[{\mathrm{c}}_{\mathrm{k}}\right]=\frac{\sum \limits_{\mathrm{i}=0}^{{\mathrm{N}}_1-1}\sum \limits_{\mathrm{j}=0}^{{\mathrm{N}}_2-1}\left\{\begin{array}{c}1\left(\mathrm{I}\left(\mathrm{i},\mathrm{j}\right)={\mathrm{c}}_{\mathrm{k}}\right)\\ {}0\left(\mathrm{other}\right)\end{array}\right.}{{\mathrm{N}}_1\times {\mathrm{N}}_2} $$
(2)

Where N1 and N2 represent the width and height of the image, and I(i, j) is the color of the pixel at position (i, j).
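As an illustration of Eq. (2), the following minimal Python sketch counts how often each color occurs and divides by the number of pixels. It assumes the image has already been quantized to integer color indices; the quantization step, the function name `color_histogram`, and the toy image are assumptions made for this example only.

```python
from collections import Counter


def color_histogram(image, num_colors):
    """Normalized color histogram in the sense of Eq. (2).

    `image` is a list of rows; every entry is a quantized color index
    in range(num_colors). How colors are quantized is an assumption
    of this sketch and is not specified here.
    """
    rows = len(image)
    cols = len(image[0])
    counts = Counter(c for row in image for c in row)
    # Divide each count by the total number of pixels.
    return [counts.get(k, 0) / (rows * cols) for k in range(num_colors)]


# A 2 x 3 toy image with three quantized colors.
toy = [[0, 1, 1],
       [2, 1, 0]]
hist = color_histogram(toy, 3)
```

The entries of `hist` sum to 1, as required by Eq. (1), and two images with the same colors in different positions produce the same `hist`, which is the loss of spatial information discussed above.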

2.2 Cumulative Histogram

The local cumulative histogram is computed as follows. For convenience of processing, the given image is converted to 256 × 256 pixels. Let the color value of a pixel be \( {\mathrm{a}}_{\mathrm{i},\mathrm{j}} \), where i is the abscissa and j is the ordinate of the pixel. The pixels \( {\mathrm{a}}_{\mathrm{i},\mathrm{j}},\dots ,{\mathrm{a}}_{\mathrm{i}+15,\mathrm{j}+15} \) form a 16 × 16 color block, and the accumulated color value C of the block is calculated as:

$$ \kern2.75em \mathrm{C}=\sum \limits_{\mathrm{m}=\mathrm{i}}^{\mathrm{i}+15}\sum \limits_{\mathrm{n}=\mathrm{j}}^{\mathrm{j}+15}{\mathrm{a}}_{\mathrm{m},\mathrm{n}} $$
(3)

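Let the C value of the hth block in the image be \( {\mathrm{C}}_{\mathrm{h}} \). The image contains 256 blocks and therefore 256 values, and the local cumulative histogram is \( {\mathrm{H}}_{\mathrm{a}}\left(\mathrm{h}\right)={\mathrm{C}}_{\mathrm{h}}/\left({\mathrm{C}}_1+{\mathrm{C}}_2+\dots +{\mathrm{C}}_{256}\right) \).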
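The block-wise accumulation can be sketched in Python as follows. The sketch assumes non-overlapping 16 × 16 blocks, so that a 256 × 256 image yields exactly 256 values, and it numbers the blocks row by row; both choices, as well as the synthetic test image, are assumptions of this example.

```python
def local_cumulative_histogram(gray, block=16):
    """Local cumulative histogram H_a(h) = C_h / (C_1 + ... + C_256).

    `gray` is a square list of rows of pixel values whose side length
    is a multiple of `block`. Blocks are assumed to be non-overlapping
    and are numbered row by row (an assumption of this sketch).
    """
    size = len(gray)
    sums = []
    for i in range(0, size, block):
        for j in range(0, size, block):
            # Accumulated value C of one block, as in Eq. (3).
            c = sum(gray[m][n]
                    for m in range(i, i + block)
                    for n in range(j, j + block))
            sums.append(c)
    total = sum(sums)
    if total == 0:
        # An all-black image has no mass to distribute.
        return [0.0] * len(sums)
    return [c / total for c in sums]


# Synthetic 256 x 256 image used only to exercise the function.
image = [[(x + y) % 256 for x in range(256)] for y in range(256)]
ha = local_cumulative_histogram(image)
```

Unlike the global color histogram, each entry of `ha` is tied to a fixed image region, so some spatial layout information is retained.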

2.3 Color Moment

The color moment is a simple and effective color feature representation that is widely used in image processing. The color distribution information of an image is concentrated mainly in the low-order moments, so the first moment (mean), second moment (standard deviation), and third moment (skewness) of each color component are sufficient to express the color distribution of the image. The mathematical model is as follows.

$$ \kern2.5em {\upmu}_{\mathrm{i}}=\frac{1}{\mathrm{N}}\sum \limits_{\mathrm{j}=1}^{\mathrm{N}}{\mathrm{p}}_{\mathrm{i},\mathrm{j}} $$
(4)
$$ \kern2em {\upsigma}_{\mathrm{i}}={\left(\frac{1}{\mathrm{N}}\sum \limits_{\mathrm{j}=1}^{\mathrm{N}}{\left({\mathrm{p}}_{\mathrm{i},\mathrm{j}}-{\upmu}_{\mathrm{i}}\right)}^2\right)}^{\frac{1}{2}} $$
(5)
$$ \kern2.5em {\mathrm{s}}_{\mathrm{i}}={\left(\frac{1}{\mathrm{N}}\sum \limits_{\mathrm{j}=1}^{\mathrm{N}}{\left({\mathrm{p}}_{\mathrm{i},\mathrm{j}}-{\upmu}_{\mathrm{i}}\right)}^3\right)}^{\frac{1}{3}} $$
(6)

Where μi, σi, and si represent the first, second, and third moments of the ith color component, respectively, pi,j is the value of the ith color component at the jth pixel, and N represents the number of pixels.
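Eqs. (4) to (6) can be illustrated for a single color component with the short Python sketch below. One practical detail is that the third central moment may be negative, so the cube root has to preserve the sign; the function name and the sample values are assumptions of this example.

```python
import math


def color_moments(channel):
    """First, second and third color moments of one color component.

    `channel` is a flat list with the values of that component for all
    N pixels. Returns (mean, standard deviation, skewness), following
    Eqs. (4) to (6).
    """
    n = len(channel)
    mean = sum(channel) / n
    second = sum((p - mean) ** 2 for p in channel) / n
    third = sum((p - mean) ** 3 for p in channel) / n
    sigma = math.sqrt(second)
    # Real cube root that keeps the sign of the third central moment;
    # a plain `third ** (1 / 3)` is not real-valued for negative input.
    skew = math.copysign(abs(third) ** (1.0 / 3.0), third)
    return mean, sigma, skew


mu, sigma, s = color_moments([1, 2, 3, 4, 10])
```

For a three-channel image, applying `color_moments` to each channel gives a nine-dimensional color feature vector.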

3 Shape Feature

Shape is another important feature that describes image content, and shape description is a fundamental issue in computer vision and pattern recognition. With shape-based retrieval, the user can retrieve similar images from the image library by sketching the shape or outline of the target. There are two kinds of shape-based retrieval: one obtains the contour of the target after edge extraction and performs feature retrieval on the contour; the other searches based on the regional features of the image [9, 18].

The description methods for shape contour features include the boundary histogram [19], chain coding [20], curvature scale space [21], and the Fourier descriptor [22]. The most typical is the Fourier descriptor, whose basic idea is to use the Fourier transform of the object boundary as the shape description; exploiting the closedness and periodicity of the region boundary, it reduces a two-dimensional problem to a one-dimensional one, thereby improving retrieval efficiency. The description methods for regional features mainly include shape invariant moments, the area of the region, and the aspect ratio of the shape. For shape-based retrieval, the extraction, description, and matching of shapes are the key issues to be solved, and shape-based retrieval is more difficult than color- and texture-based retrieval.

3.1 Definition of Geometric Moment

Moments are used in image processing as an effective description of image shape features and have been widely used in image analysis, pattern recognition, and other fields.

Let the gray-level function of an image be f(x, y), where (x, y) denotes a pixel of the image. The (p + q)-order geometric moment (standard moment) of the image is then defined as [23]:

$$ \kern3em {\mathrm{m}}_{\mathrm{p}\mathrm{q}}=\sum \limits_{\mathrm{y}=1}^{\mathrm{N}}\sum \limits_{\mathrm{x}=1}^{\mathrm{M}}{\mathrm{x}}^{\mathrm{p}}{\mathrm{y}}^{\mathrm{q}}\mathrm{f}\left(\mathrm{x},\mathrm{y}\right)\kern1em \mathrm{p},\mathrm{q}=0,1,2\dots \kern0.5em $$
(7)

The (p + q)-order central moment is defined as

$$ \kern1em {\upmu}_{\mathrm{p}\mathrm{q}}=\sum \limits_{\mathrm{y}=1}^{\mathrm{N}}\sum \limits_{\mathrm{x}=1}^{\mathrm{M}}{\left(\mathrm{x}-\overline{\mathrm{x}}\right)}^{\mathrm{p}}{\left(\mathrm{y}-\overline{\mathrm{y}}\right)}^{\mathrm{q}}\mathrm{f}\left(\mathrm{x},\mathrm{y}\right)\ \mathrm{p},\mathrm{q}=0,1,2\dots $$
(8)

Where \( \overline{\mathrm{x}} \) and \( \overline{\mathrm{y}} \) represent the center of gravity of the image, \( \overline{\mathrm{x}}={\mathrm{m}}_{10}/{\mathrm{m}}_{00} \), \( \overline{\mathrm{y}}={\mathrm{m}}_{01}/{\mathrm{m}}_{00} \), and N and M are the height and width of the image, respectively.

The normalized central moment is defined as

$$ \kern2.5em {\upeta}_{\mathrm{pq}}=\frac{\upmu_{\mathrm{pq}}}{\upmu_{00}^{\uprho}} $$
(9)

Where \( \uprho =\frac{\mathrm{p}+\mathrm{q}}{2}+1 \), p + q = 2, 3, 4…
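The definitions in Eqs. (7) to (9) translate almost directly into code. The Python sketch below uses 1-based pixel coordinates to match the summation limits, stores f(x, y) as `f[y - 1][x - 1]`, and assumes a non-empty image (m00 > 0); the function names and the 4 × 4 test image are assumptions of this example.

```python
def geometric_moment(f, p, q):
    """Geometric moment m_pq of Eq. (7) with 1-based coordinates."""
    return sum((x ** p) * (y ** q) * f[y - 1][x - 1]
               for y in range(1, len(f) + 1)
               for x in range(1, len(f[0]) + 1))


def central_moment(f, p, q):
    """Central moment mu_pq of Eq. (8), taken about the centroid."""
    m00 = geometric_moment(f, 0, 0)
    x_bar = geometric_moment(f, 1, 0) / m00
    y_bar = geometric_moment(f, 0, 1) / m00
    return sum(((x - x_bar) ** p) * ((y - y_bar) ** q) * f[y - 1][x - 1]
               for y in range(1, len(f) + 1)
               for x in range(1, len(f[0]) + 1))


def normalized_central_moment(f, p, q):
    """Normalized central moment eta_pq of Eq. (9)."""
    rho = (p + q) / 2 + 1
    return central_moment(f, p, q) / central_moment(f, 0, 0) ** rho


# A 2 x 2 square of ones inside a 4 x 4 image.
img = [[0, 0, 0, 0],
       [0, 1, 1, 0],
       [0, 1, 1, 0],
       [0, 0, 0, 0]]
eta20 = normalized_central_moment(img, 2, 0)
```

For this image the centroid is (2.5, 2.5), the first-order central moments vanish, and `eta20` evaluates to 1/16, which can be checked by hand from the four nonzero pixels.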

3.2 Hu Invariant Moment

Combining the normalized second-order central moments η11, η20, η02 and the normalized third-order central moments η12, η21, η03, η30 yields seven moments that are invariant to translation, rotation, and scale. These moments are used as the shape features of the image:

$$ {M}_1={\eta}_{20}+{\eta}_{02} $$
(10)
$$ {M}_2={\left({\eta}_{20}-{\eta}_{02}\right)}^2+4{\eta}_{11}^2 $$
(11)
$$ {M}_3={\left({\eta}_{30}-3{\eta}_{12}\right)}^2+{\left(3{\eta}_{21}-{\eta}_{03}\right)}^2 $$
(12)
$$ {M}_4={\left({\eta}_{30}+{\eta}_{12}\right)}^2+{\left({\eta}_{21}+{\eta}_{03}\right)}^2 $$
(13)
$$ {M}_5=\left({\eta}_{30}-3{\eta}_{12}\right)\left({\eta}_{30}+{\eta}_{12}\right)\left[{\left({\eta}_{30}+{\eta}_{12}\right)}^2-3{\left({\eta}_{21}+{\eta}_{03}\right)}^2\right]+\left(3{\eta}_{21}-{\eta}_{03}\right)\left({\eta}_{21}+{\eta}_{03}\right)\left[3{\left({\eta}_{30}+{\eta}_{12}\right)}^2-{\left({\eta}_{21}+{\eta}_{03}\right)}^2\right] $$
(14)
$$ {M}_6=\left({\eta}_{20}-{\eta}_{02}\right)\left[{\left({\eta}_{30}+{\eta}_{12}\right)}^2-{\left({\eta}_{21}+{\eta}_{03}\right)}^2\right]+4{\eta}_{11}\left({\eta}_{30}+{\eta}_{12}\right)\left({\eta}_{21}+{\eta}_{03}\right) $$
(15)
$$ {M}_7=\left(3{\eta}_{21}-{\eta}_{03}\right)\left({\eta}_{30}+{\eta}_{12}\right)\left[{\left({\eta}_{30}+{\eta}_{12}\right)}^2-3{\left({\eta}_{21}+{\eta}_{03}\right)}^2\right]+\left(3{\eta}_{12}-{\eta}_{30}\right)\left({\eta}_{21}+{\eta}_{03}\right)\left[3{\left({\eta}_{30}+{\eta}_{12}\right)}^2-{\left({\eta}_{21}+{\eta}_{03}\right)}^2\right] $$
(16)

Thus, the internal grayscale distribution of the image can be expressed by the seven invariant moments M1, M2, …, M7.
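A compact way to check the invariance properties is to compute the seven moments for a small shape and for translated and rotated copies of it. The following Python sketch is one possible implementation under stated assumptions: it uses 0-based pixel coordinates (which does not affect central moments), it uses the standard Hu form of M6 with η11 in the last term, and the function name and test shapes are made up for this example.

```python
def hu_moments(f):
    """Seven Hu invariant moments M1..M7 of a gray-level image.

    `f` is a list of rows; f[y][x] is the gray value at (x, y).
    The image is assumed to contain at least one nonzero pixel.
    """
    pts = [(x, y, f[y][x])
           for y in range(len(f))
           for x in range(len(f[0])) if f[y][x]]
    m00 = sum(v for _, _, v in pts)
    x_bar = sum(x * v for x, _, v in pts) / m00
    y_bar = sum(y * v for _, y, v in pts) / m00

    def eta(p, q):
        # Normalized central moment, Eqs. (8) and (9).
        mu = sum((x - x_bar) ** p * (y - y_bar) ** q * v
                 for x, y, v in pts)
        return mu / m00 ** ((p + q) / 2 + 1)

    n20, n02, n11 = eta(2, 0), eta(0, 2), eta(1, 1)
    n30, n03, n21, n12 = eta(3, 0), eta(0, 3), eta(2, 1), eta(1, 2)
    a, b = n30 + n12, n21 + n03  # sums that recur in M4..M7

    m1 = n20 + n02
    m2 = (n20 - n02) ** 2 + 4 * n11 ** 2
    m3 = (n30 - 3 * n12) ** 2 + (3 * n21 - n03) ** 2
    m4 = a ** 2 + b ** 2
    m5 = ((n30 - 3 * n12) * a * (a ** 2 - 3 * b ** 2)
          + (3 * n21 - n03) * b * (3 * a ** 2 - b ** 2))
    m6 = (n20 - n02) * (a ** 2 - b ** 2) + 4 * n11 * a * b
    m7 = ((3 * n21 - n03) * a * (a ** 2 - 3 * b ** 2)
          + (3 * n12 - n30) * b * (3 * a ** 2 - b ** 2))
    return [m1, m2, m3, m4, m5, m6, m7]


# An L-shaped region and the same region shifted by one pixel.
shape = [[1, 1, 1, 0],
         [1, 0, 0, 0],
         [1, 0, 0, 0],
         [0, 0, 0, 0]]
shifted = [[0, 0, 0, 0],
           [0, 1, 1, 1],
           [0, 1, 0, 0],
           [0, 1, 0, 0]]
rotated = [list(row) for row in zip(*shape[::-1])]  # 90 degree rotation

hu_a = hu_moments(shape)
hu_b = hu_moments(shifted)
hu_c = hu_moments(rotated)
```

Up to floating-point rounding, `hu_a`, `hu_b`, and `hu_c` agree, which illustrates translation and rotation invariance on a pixel grid. Scale invariance holds only approximately for resampled digital images.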

4 Similarity Measure

In image retrieval, the similarity or difference between images is measured by calculating the distance between the feature vectors of the query image and the database images. In this paper, histogram intersection and Euclidean distance are used as similarity measures between image feature vectors, and their applicability to the image library is verified by experiments [5, 7, 24].

Let the database image be D and the query image be Q, let HQ(k) and HD(k) be the histograms of Q and D, respectively, and let L be the number of levels of the histogram. The histogram intersection is then:

$$ \kern3.5em {\mathrm{M}}_{\mathrm{I}}\left(\mathrm{Q},\mathrm{D}\right)=\sum \limits_{\mathrm{k}=0}^{\mathrm{L}-1}\min \left({\mathrm{H}}_{\mathrm{Q}}\left(\mathrm{k}\right),{\mathrm{H}}_{\mathrm{D}}\left(\mathrm{k}\right)\right) $$
(17)

The Euclidean distance formula is as follows:

$$ \kern3em {\mathrm{M}}_{\mathrm{E}}\left(\mathrm{Q},\mathrm{D}\right)=\sqrt{\sum \limits_{\mathrm{k}=0}^{\mathrm{L}-1}{\left[{\mathrm{H}}_{\mathrm{Q}}\left(\mathrm{k}\right)-{\mathrm{H}}_{\mathrm{D}}\left(\mathrm{k}\right)\right]}^2} $$
(18)

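For the Euclidean distance, the smaller the calculated value, the greater the similarity between the two images, and the larger the value, the smaller the similarity. For the histogram intersection the relationship is reversed: a larger value indicates greater similarity.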
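The two measures can be compared on a pair of small normalized histograms. In the Python sketch below, histogram intersection is taken as the sum of bin-wise minima, which is an assumption about its exact form, while the Euclidean distance follows Eq. (18); the function names and the three-bin histograms are illustrative only.

```python
import math


def histogram_intersection(hq, hd):
    """Sum of bin-wise minima of two histograms (assumed form).

    For histograms that each sum to 1 the result lies in [0, 1],
    and a larger value means more similar images.
    """
    return sum(min(q, d) for q, d in zip(hq, hd))


def euclidean_distance(hq, hd):
    """Euclidean distance of Eq. (18); smaller means more similar."""
    return math.sqrt(sum((q - d) ** 2 for q, d in zip(hq, hd)))


h_query = [0.5, 0.3, 0.2]
h_database = [0.4, 0.4, 0.2]

overlap = histogram_intersection(h_query, h_database)
distance = euclidean_distance(h_query, h_database)
```

Here the bin-wise minima are 0.4, 0.3, and 0.2, so `overlap` is about 0.9, and `distance` is the square root of 0.02. Identical histograms give an intersection of 1 and a distance of 0.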

5 Image Retrieval Algorithm Based on Color Feature and Shape Feature (CSIR)

This paper proposes a retrieval algorithm based on a combination of color and shape features. The specific steps are as follows:

Step 1 Convert the image library image into an image of 256 × 256 pixels;

Step 2 Calculate the local cumulative histogram of the image according to eq. (3);

Step 3 Calculate the shape feature vector from the seven Hu invariant moments of the image according to eqs. (10) to (16);

Step 4 Combine the features obtained in step 2 and step 3 to obtain the combined color and shape features;

Step 5 Calculate the Euclidean distance between the target image feature and the image feature in the image library according to eq. (18) and sort the values obtained by the Euclidean distance from small to large to obtain the search result.
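Steps 4 and 5 can be sketched as follows. The Python example assumes that the color and shape feature vectors have already been computed in steps 2 and 3, combines them by weighting and concatenation, and ranks the library by Euclidean distance. The weight values, the function names, the image names, and the short toy feature vectors are all hypothetical and chosen only to make the example runnable.

```python
import math


def combine(color, shape, w_color=0.5, w_shape=0.5):
    """Step 4: weighted concatenation of color and shape features.

    The weights are hypothetical placeholders, not tuned values.
    """
    return [w_color * c for c in color] + [w_shape * s for s in shape]


def rank(query, library):
    """Step 5: sort library images by ascending Euclidean distance.

    `library` maps an image name to its combined feature vector.
    """
    def dist(a, b):
        return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

    return sorted(library, key=lambda name: dist(query, library[name]))


# Toy library: two color bins and two shape values per image.
library = {
    "sign_01": combine([0.6, 0.4], [0.20, 0.01]),
    "sign_02": combine([0.5, 0.5], [0.21, 0.01]),
    "leaf_01": combine([0.1, 0.9], [0.35, 0.05]),
}
query = library["sign_01"]
result = rank(query, library)
```

Because the query is itself a library image, it comes first in `result` with distance 0, followed by the most similar remaining image. In practice the two feature groups have very different numeric ranges, so the weights also act as a scaling between them.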

6 Experimental Results and Analysis

In the experimental environment of this algorithm, the CPU is an Intel Core i7-7700 (eight cores, 3.6 GHz), the memory is 16 GB, the operating system is Windows 10 (64-bit), and the programming software is Visual C++ 6.0.

The image library used in the experiment contains four categories: traffic signs, gestures, cars, and leaves. Each category has 80 images, for a total of 320 images. Traffic signs are numbered 1~80, gestures 81~160, cars 161~240, and leaves 241~320. Examples of each category are shown in Fig. 1.

Fig. 1 Sample images: (a) traffic sign, (b) gesture, (c) car, (d) leaf

To verify the effectiveness of the proposed algorithm, the simulation experiment compares it with six other algorithms: the color histogram retrieval algorithm, the cumulative histogram retrieval algorithm, the color moment retrieval algorithm, the invariant moment retrieval algorithm, the retrieval algorithm based on the combination of color histogram and invariant moments, and the retrieval algorithm based on the combination of color moments and invariant moments.

6.1 Search Results

Each image in the image library was resized to 256 × 256 pixels before the experiment. The target images all come from the image library, and the 10 images with the highest similarity are displayed in the query result, the first of which is the target image itself. Some of the search results are shown in Fig. 2a–g. Among the 10 result images obtained by searching for a traffic sign image, the color histogram algorithm returns 4 related images, the cumulative histogram algorithm 8, the color moment algorithm 2, the Hu invariant moment algorithm 5, the algorithm of [25] 8, the combination of color moments and Hu invariant moments 8, and the combination of color features and shape features proposed in this paper 9.

Fig. 2 Traffic sign retrieval results: (a) color histogram algorithm, (b) cumulative histogram algorithm, (c) color moment algorithm, (d) Hu invariant moment algorithm, (e) algorithm of [25], (f) algorithm of this paper, (g) combination of color moment and Hu invariant moment

The experimental results show that the algorithm combining color features and shape features in this paper outperforms the image retrieval algorithms based on a single feature.

6.2 Precision and Recall

Precision and recall are important indicators of the efficiency of image retrieval. To analyze the effectiveness of the algorithm objectively, this experiment uses the algorithm of [25], the color moment retrieval algorithm, the Hu invariant moment algorithm, and the combination of color features and shape features proposed in this paper. The images are queried one by one, and the average precision and average recall rate are calculated, as shown in Tables 1 and 2.
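For a single query, the two indicators can be computed as in the Python sketch below, which uses the usual definitions: precision is the fraction of returned images that are relevant, and recall is the fraction of all relevant images that are returned. The example reuses the traffic sign query of Section 6.1 (10 returned images, 9 of them relevant, 80 images in the category); the specific image numbers in `retrieved` are invented for illustration.

```python
def precision_recall(retrieved, relevant):
    """Precision and recall of one query.

    `retrieved` is the list of returned image numbers and `relevant`
    is the set of image numbers in the query's category.
    """
    hits = sum(1 for image_id in retrieved if image_id in relevant)
    return hits / len(retrieved), hits / len(relevant)


# Traffic signs are numbered 1..80 in the image library.
traffic_signs = set(range(1, 81))
# Ten returned images: nine traffic signs and one car (number 200).
retrieved = [1, 2, 3, 4, 5, 6, 7, 8, 9, 200]

precision, recall = precision_recall(retrieved, traffic_signs)
```

With these inputs the precision is 9/10 and the recall is 9/80. Averaging the per-query values over all queries of a category gives the average precision and average recall rate for that category.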

Table 1 Average precision
Table 2 Average recall rate

The following conclusions can be drawn from the data in Table 1 and Fig. 3. First, compared with retrieval algorithms that use a single color or shape feature, combining color features and shape features improves the retrieval effect to some extent. Second, comparing the color histogram algorithm with the combination of color histogram and invariant moments shows that adding the invariant moment shape feature improves the retrieval effect to some extent. Third, when the algorithm of [25] is compared with the algorithm of this paper, the color histogram algorithm requires a large amount of computation when extracting color features. Finally, the average precision of the proposed algorithm in all four categories (traffic signs, gestures, cars, and leaves) is higher than that of the combination of color moments and Hu invariant moments, so the retrieval effect is improved to some extent.

Fig. 3 Average precision histogram

In addition, the average recall rates in Table 2 and Fig. 4 further verify that the algorithm combining color features and shape features proposed in this paper achieves higher retrieval accuracy and better retrieval performance than the single-feature algorithms.

Fig. 4 Average recall rate histogram

Finally, the comparison shows that, considering the amount of computation, the average precision, and the average recall rate, the combination of color features and shape features proposed in this paper performs best among the seven algorithms.

7 Conclusions

In this paper, we propose a retrieval algorithm based on the combination of color features and shape features, which gives users a better image retrieval service. The method can be applied to many different scenes, and combining color features with shape features improves the efficiency of image retrieval to some extent. Experimental and theoretical results show that the method effectively improves the efficiency of image retrieval and is portable. In future work, other image feature extraction methods will be considered to obtain better retrieval results.