1 INTRODUCTION

The original purpose of artificial intelligence was to replace simple, repetitive human labor, and to ensure that machines do not make serious mistakes in the course of such repetitive work, they were given ways to imitate human thinking [1]. Limited by hardware and software, early artificial intelligence could only make simple, preset judgments and was therefore relatively rigid; as hardware advanced, artificial intelligence advanced with it, and its scope of application gradually expanded [2]. Along with the extensive application of artificial intelligence, image recognition technology has also developed, for example, in patrol robots and fault detection equipment that need artificial intelligence to play their full role. Taking the patrol robot as an example, with the aid of artificial intelligence it can judge the images collected by its camera. However, even with such aid, a robot cannot directly recognize the content of an image as a human being does. The conventional steps are to extract image features and then compare them against a database or make a judgment according to program settings [3]; both feature extraction and feature comparison are parts of image recognition technology. The recognition of black-and-white images alone can no longer meet the requirements of artificial intelligence, and the recognition of color images has become part of it. Li et al. [4] proposed a rotation parameter extraction method based on time difference and range-Doppler image edge detection and verified through simulation experiments that the algorithm was effective and superior in computation time. Das [5] proposed a new sorted block truncation coding technique, carried out simulation experiments on three common data sets, and found that the method had higher recognition accuracy. Zhan et al. [6] proposed a high-performance privacy-preserving Scale-Invariant Feature Transform (SIFT) feature detection system and verified through extensive experiments that the system not only protected image privacy but also maintained the recognition accuracy of the original SIFT features. This study introduces two image feature mining algorithms, the Histogram of Oriented Gradients (HOG) and the gray-level co-occurrence matrix (GLCM), combines each of them with an SVM model for color image recognition and classification, and carries out simulation experiments in MATLAB R2018a.

2 COLOR IMAGE FEATURE MINING METHOD BASED ON HOG

HOG stands for Histogram of Oriented Gradients [7]. In the detected image, the gradient value reflects the gray-level change in a certain direction. Changes in the camera environment affect the pixel values of the image, but the gradient directions are not affected, so they can be used as recognition features of the image.

As shown in Fig. 1, the main steps of feature mining for a color image are as follows:

Fig. 1. Feature extraction process based on HOG.

(1) Firstly, the color image is converted to a grayscale image.

(2) After preprocessing, the gradient of each pixel in the image is calculated, including the gradient values, the gradient magnitude, and the gradient direction. The calculation formula [8] is:

$$\left\{ \begin{gathered} G_x(x,y) = f(x + 1,y) - f(x - 1,y), \hfill \\ G_y(x,y) = f(x,y + 1) - f(x,y - 1), \hfill \\ M(x,y) = \sqrt {G_x(x,y)^2 + G_y(x,y)^2} , \hfill \\ \theta (x,y) = \arctan \left( \frac{G_y(x,y)}{G_x(x,y)} \right), \hfill \\ \theta = \theta + \pi \quad {\text{if}}\,\,\theta < 0, \hfill \\ \end{gathered} \right.$$
(1)

where \(G_x\) and \(G_y\) are the gradient values of pixel \((x,y)\) in the x and y directions, respectively, \(f( \cdot )\) is the gray value of a pixel, M is the gradient magnitude of pixel \((x,y)\), and \(\theta \) is the gradient direction of pixel \((x,y)\).

(3) After the gradient value, magnitude, and direction of every pixel have been obtained, the image is divided into blocks, the histogram of oriented gradients within each block is calculated, the blocks are normalized, and finally the block histograms are concatenated into the HOG feature. A minimal code sketch of this process is given below.
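As an illustration, the following Python sketch computes the gradients of Eq. (1) with central differences and then obtains a HOG descriptor via scikit-image. The file name, cell size, block size, and bin count are illustrative assumptions, not parameters reported in this paper.

```python
import numpy as np
from skimage import color, io
from skimage.feature import hog

# Step (1): load a color image and convert it to grayscale.
# "sample.bmp" is a placeholder file name, not from the paper's data set.
img = color.rgb2gray(io.imread("sample.bmp"))

# Central-difference gradients of Eq. (1).
gx = np.zeros_like(img)
gy = np.zeros_like(img)
gx[:, 1:-1] = img[:, 2:] - img[:, :-2]   # G_x(x, y)
gy[1:-1, :] = img[2:, :] - img[:-2, :]   # G_y(x, y)
mag = np.hypot(gx, gy)                   # gradient magnitude M(x, y)
theta = np.arctan2(gy, gx)               # gradient direction
theta[theta < 0] += np.pi                # fold into [0, pi), as in Eq. (1)

# Steps (2)-(3) as packaged by scikit-image: per-cell orientation
# histograms, block normalization, concatenation into one descriptor.
# 9 bins and 8x8 cells are common defaults, assumed here.
features = hog(img, orientations=9, pixels_per_cell=(8, 8),
               cells_per_block=(2, 2), block_norm="L2-Hys")
print(features.shape)
```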

3 COLOR IMAGE FEATURE MINING BASED ON GLCM

3.1 Brief Introduction to Color Images

Unlike a black-and-white image, a color image carries multi-dimensional information and is described by the RGB system [9] and the HSI system. The RGB system is a color system that represents the color of image pixels: R represents red, G represents green, and B represents blue, and the numerical values reflect the brightness of the corresponding colors. Different colors can be produced by adjusting the brightness of the three primary colors. However, nature contains a great variety of colors, so it is difficult to fully restore the original colors by relying on the three primary colors alone. Therefore, the HSI system is also needed for color images: H is the hue, which represents the attribute of the pixel color; S is the saturation, which represents the purity of the pixel color; and I is the intensity, which represents the brightness of the pixel color. Through the combined action of these two systems, a color image can restore the original scene as closely as possible. The relationship between the RGB system and the HSI system in a color image is:

$$\left\{ \begin{gathered} r = \frac{R}{R + G + B}, \hfill \\ g = \frac{G}{R + G + B}, \hfill \\ b = \frac{B}{R + G + B}, \hfill \\ H = \arccos \frac{\frac{1}{2}\left[ (R - G) + (R - B) \right]}{\sqrt{(R - G)^2 + (R - B)(G - B)}}, \hfill \\ S = 1 - \frac{3\min (R,G,B)}{R + G + B}, \hfill \\ I = \frac{R + G + B}{3}, \hfill \\ \end{gathered} \right.$$
(2)

where r, g, and b represent the proportions of red, green, and blue in the color image, respectively. It can be seen from Eq. (2) that r, g, and b are normalized quantities and thus not easily disturbed by external factors, and that H and S are more stable than I. Therefore, H and S can better mine the gray-scale features of a color image, while the RGB system better reflects the gray-scale values of the image.
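For concreteness, a minimal Python sketch of the RGB-to-HSI conversion in Eq. (2) follows. The small constant eps and the fold of H into the full \([0, 2\pi)\) range are standard implementation details assumed here rather than taken from the paper.

```python
import numpy as np

def rgb_to_hsi(img):
    """Convert an RGB image (floats in [0, 1]) to HSI per Eq. (2)."""
    R, G, B = img[..., 0], img[..., 1], img[..., 2]
    eps = 1e-8                                   # guard against division by zero
    num = 0.5 * ((R - G) + (R - B))
    den = np.sqrt((R - G) ** 2 + (R - B) * (G - B)) + eps
    H = np.arccos(np.clip(num / den, -1.0, 1.0))
    H = np.where(B > G, 2 * np.pi - H, H)        # fold hue into [0, 2*pi)
    I = (R + G + B) / 3.0
    S = 1.0 - np.minimum(np.minimum(R, G), B) / (I + eps)
    return H, S, I
```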

3.2 GLCM

In this study, the features of color images were also mined using the GLCM [10]. The GLCM is a method that reflects image features through the spatial gray-level relationships between pairs of pixels. The basic steps of GLCM-based color image feature mining are as follows:

(1) Firstly, similar to the HOG method, the color image is converted from the original RGB image to a grayscale image.

(2) The gray levels of the grayscale image are compressed using histogram equalization, whose formula [11] is:

$$s = \int\limits_0^r {{{p}_{r}}(\omega )d\omega } ,$$
(3)

where s is the gray value after histogram equalization, r is the gray value before histogram equalization, and \({{p}_{r}}(\omega )\) is the probability density function of the gray levels of the image.

(3) The size of the sliding window is determined, \(5 \times 5\) in this study.

(4) The GLCM within the sliding window is calculated: for any pixel \((x,y)\) in the window, the pixel \((x + a,y + b)\) offset from it by \((a,b)\) is taken, giving the gray value pair \(({{g}_{1}},{{g}_{2}})\) of the two pixels. All pixels \((x,y)\) in the window are traversed to obtain the gray value pairs of the whole window. The number of occurrences of every distinct \(({{g}_{1}},{{g}_{2}})\) is counted, and the corresponding probabilities are calculated to obtain the GLCM. Different values of \((a,b)\) yield different GLCMs. In this study, \((1,0),\,\,(0,1),\,\,(1,1),\,\,( - 1,1)\) were taken, representing pixel pairs that are adjacent along the 0°, 90°, 45°, and 135° directions.

(5) The features of matrix are calculated according to GLCM, and the calculation formula is:

$$\left\{ \begin{gathered} ASM = \sum\limits_{i = 1}^k \sum\limits_{j = 1}^k (G(i,j))^2, \hfill \\ CON = \sum\limits_{n = 0}^{k - 1} n^2 \left( \sum\limits_{\left| i - j \right| = n} G(i,j) \right), \hfill \\ IDM = \sum\limits_{i = 1}^k \sum\limits_{j = 1}^k \frac{G(i,j)}{1 + (i - j)^2}, \hfill \\ ENT = - \sum\limits_{i = 1}^k \sum\limits_{j = 1}^k G(i,j)\log G(i,j), \hfill \\ COR = \sum\limits_{i = 1}^k \sum\limits_{j = 1}^k \frac{(ij)G(i,j) - u_i u_j}{s_i s_j}, \hfill \\ u_i = \sum\limits_{i = 1}^k \sum\limits_{j = 1}^k iG(i,j), \hfill \\ u_j = \sum\limits_{i = 1}^k \sum\limits_{j = 1}^k jG(i,j), \hfill \\ s_i = \sqrt {\sum\limits_{i = 1}^k \sum\limits_{j = 1}^k G(i,j)(i - u_i)^2} , \hfill \\ s_j = \sqrt {\sum\limits_{i = 1}^k \sum\limits_{j = 1}^k G(i,j)(j - u_j)^2} , \hfill \\ \end{gathered} \right.$$
(4)

where ASM is the energy of the GLCM, CON is the matrix contrast, IDM is the matrix inverse difference moment, ENT is the entropy of the matrix, COR is the autocorrelation coefficient of the matrix, \(G(i,j)\) denotes the GLCM, and k is the maximum gray level. The energy, contrast, inverse difference moment, entropy, and autocorrelation coefficient in the above formula are all characteristic values of the GLCM, and they are also the values of the image texture features. In this study, four scanning directions were used, i.e., four values of \((a,b)\), yielding four sets of eigenvalues. These four sets of eigenvalues were concatenated into one vector, and this vector was the texture description of the window, used for subsequent recognition and classification.

(6) After the GLCM of the current window and its corresponding eigenvalues are calculated, the window is slid by one pixel.

(7) Steps (4)–(6) are repeated until the whole image has been traversed, giving the eigenvector matrix of the whole image. This eigenvector matrix can be used for subsequent recognition and classification and can also be converted into the texture feature image of the color image. A minimal code sketch of these steps is given below.
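The following Python sketch illustrates steps (2) and (4)–(7) using scikit-image. The 16-level quantization, the small eps in the entropy term, and the use of graycoprops for the first four features are implementation assumptions, since the paper itself works in MATLAB.

```python
import numpy as np
from skimage.exposure import equalize_hist
from skimage.feature import graycomatrix, graycoprops

LEVELS = 16  # assumed gray-level compression; the paper does not state a value

def glcm_features(window):
    """Texture features of one 5x5 window, per steps (4)-(5)."""
    # Offsets (1,0), (0,1), (1,1), (-1,1) correspond to the 0, 90, 45,
    # and 135 degree directions in skimage's angle convention.
    angles = [0, np.pi / 2, np.pi / 4, 3 * np.pi / 4]
    P = graycomatrix(window, distances=[1], angles=angles,
                     levels=LEVELS, normed=True)
    feats = [graycoprops(P, prop).ravel()        # ASM, CON, IDM, COR
             for prop in ("ASM", "contrast", "homogeneity", "correlation")]
    eps = 1e-12                                  # ENT is not in graycoprops
    feats.append(-np.sum(P * np.log(P + eps), axis=(0, 1)).ravel())
    return np.concatenate(feats)                 # 5 features x 4 directions

def image_texture(gray, win=5):
    """Steps (2), (6), (7): equalize, quantize, slide the 5x5 window."""
    q = (equalize_hist(gray) * (LEVELS - 1)).astype(np.uint8)
    h, w = q.shape
    return np.array([glcm_features(q[i:i + win, j:j + win])
                     for i in range(h - win + 1)
                     for j in range(w - win + 1)])
```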

4 SVM

The two methods described above can effectively mine the features of a color image and obtain texture feature values. In practical applications, however, mining the features is not enough; what matters more is classifying the extracted features so as to recognize the color image. As the above description shows, many feature values are obtained from a multi-dimensional color image even after gray-scale dimension reduction. Although a human could in theory recognize and classify images based on the obtained eigenvalues, manual recognition and classification are inefficient because of the large amount of data. To improve recognition efficiency, the mined features were classified using an SVM, realizing fast recognition and classification of color images.

The SVM [12] is a machine learning algorithm. Its basic principle is to project the data into a high-dimensional space using a kernel function, find the optimal classification boundary according to the linear separability principle, and classify the test data taking this boundary as the standard.

After combining the above-mentioned color image feature mining method with SVM, the color image recognition process is shown in Fig. 2.

Fig. 2. The color image recognition process based on SVM and the image feature mining algorithms.

(1) Firstly, the collected color image to be recognized is input into the recognition model and preprocessed, including denoising and graying.

(2) Feature mining is performed on the preprocessed image. In this study, the HOG- and GLCM-based methods were used for feature mining; the detailed steps are given above.

(3) After image feature mining, in the training stage of the recognition model, the extracted features are assembled into a training set and input into the SVM, and an appropriate kernel function and penalty parameter are selected to calculate the decision function. The calculation formula of the decision function is:

$$\left\{ \begin{gathered} f(x) = \operatorname{sgn} \left( \sum\limits_{i = 1}^l a_i y_i K(x_i, x) + b \right), \hfill \\ \sum\limits_{i = 1}^l a_i y_i = 0, \quad 0 \leqslant a_i \leqslant C, \hfill \\ \end{gathered} \right.$$
(5)

where \({{a}_{i}}\) are the Lagrange multipliers [13], l is the sample size, \(K({{x}_{i}},x)\) is the kernel function, C is the penalty parameter, \({{y}_{i}}\) is the class label of sample \({{x}_{i}}\), \({{x}_{i}}\) is the sample data, x is the sample to be classified, and b is the bias.

(4) In the use stage of the recognition model, the extracted features are input into the trained SVM, and the SVM classifies the image according to the optimal boundary calculated during training. A minimal sketch of this training and classification pipeline is given below.
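As an illustration of the pipeline, the following Python sketch trains an RBF-kernel SVM with the penalty parameter C = 0.01 used in Section 5.3. The random feature matrix merely stands in for the HOG or GLCM features; it is a placeholder, not the paper's data.

```python
import numpy as np
from sklearn.model_selection import train_test_split
from sklearn.svm import SVC

# X: feature vectors mined by HOG or GLCM (one row per image);
# y: class labels (vehicle, boat, animal, building).
# Both are random placeholders here.
rng = np.random.default_rng(0)
X = rng.normal(size=(200, 20))
y = rng.integers(0, 4, size=200)

# 60/40 train/test split, as in the experiment setup of Section 5.3.
X_tr, X_te, y_tr, y_te = train_test_split(X, y, train_size=0.6,
                                          random_state=0)

# RBF kernel and penalty parameter C, matching the paper's settings.
clf = SVC(kernel="rbf", C=0.01)
clf.fit(X_tr, y_tr)
print("accuracy:", clf.score(X_te, y_te))
```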

5 SIMULATION EXPERIMENT

5.1 Experimental Environment

In this study, the above two feature mining algorithms were simulated and analyzed using MATLAB R2018a [14]. The experiment was carried out on a laboratory server configured with the Windows 7 operating system, an Intel Core i7 processor, and 16 GB of memory.

5.2 Experimental Data

This study selected 200 color images of four different content types: vehicles, boats, animals, and buildings. The images were all in bmp format, with a size of \(600 \times 512\). Moreover, in order to test the noise robustness of the two feature mining algorithms, Gaussian noise (\({{\sigma }^{2}} = 0.02\)) [15] was added to every image.
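For reference, zero-mean Gaussian noise with variance 0.02 can be added in Python as follows; the paper's experiments were carried out in MATLAB, so this is an equivalent sketch rather than the original procedure, and the file names are placeholders.

```python
from skimage import io, util

# "sample.bmp" is a placeholder file name, not from the paper's data set.
img = io.imread("sample.bmp")
# Zero-mean Gaussian noise with variance 0.02, matching Section 5.2.
noisy = util.random_noise(img, mode="gaussian", mean=0.0, var=0.02)
io.imsave("sample_noisy.bmp", util.img_as_ubyte(noisy))
```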

5.3 Experiment Setup

After the experimental color images were preprocessed, their features were mined using HOG and GLCM, respectively, and the extracted features were divided into a training set and a testing set: 60% of the HOG and GLCM features were randomly selected as the training set, and the remaining 40% served as the testing set.

The two kinds of features were input into SVM models for training, respectively. The radial basis function was selected as \(K({{x}_{i}},x)\) in the SVM model, and C was set to 0.01. After the SVM models were trained, testing was carried out using the testing set. The images in the testing set were divided into eight groups, and each group was tested three times under each of the two recognition models. The average result was taken as the final result.

5.4 Experimental Results

In this study, 800 color images were involved in the simulation experiment. Owing to space limitations, the texture feature images of only some of the color images obtained by the two feature mining methods are displayed. The original images and texture feature images are shown in Fig. 3. It can be seen from Fig. 3 that both feature mining methods could extract the texture of the main target in the original image, and the texture fully reflected the features of the original image. However, in the bird image, the HOG-based method took the unimportant and relatively fuzzy part of the original image background as part of the texture feature, while the GLCM-based method completely removed the fuzzy background and retained only the main part of the texture feature for later recognition and classification. The comparison of the building and ship images also showed that the main-part textures obtained by the GLCM were clearer. The comparison demonstrated that, subjectively, the GLCM-based feature mining method was more accurate than the HOG-based one.

Fig. 3. The texture feature images obtained by the two feature mining methods.

The testing set was input into the SVM models trained with the two kinds of features, and the recognition accuracy obtained from the output results is shown in Fig. 4. In this study, the testing set was divided into eight groups, each group was tested three times, and the average value was taken as the final result. The results are as follows. The average accuracy of the HOG-based SVM model was 80.6% for group 1, 75.8% for group 2, 82.3% for group 3, 84.3% for group 4, 75.6% for group 5, 78.9% for group 6, 80.2% for group 7, and 78.6% for group 8; the average accuracy of the GLCM-based SVM model was 98.3% for group 1, 98.1% for group 2, 98.5% for group 3, 97.9% for group 4, 98.2% for group 5, 97.9% for group 6, 98.2% for group 7, and 98.3% for group 8. The overall average accuracy of the HOG-based SVM model was 79.5% with a standard deviation of 2.99%; the overall average accuracy of the GLCM-based SVM model was 98.2% with a standard deviation of 0.21%. Both the visual representation in Fig. 4 and the calculated statistics show that the GLCM-based SVM model recognized color images with higher and more stable accuracy.

Fig. 4. The recognition accuracy of the SVM models under the two feature mining methods.

The testing set was divided into eight groups, each group was tested three times, and the average time needed to detect the color images is shown in Fig. 5. The results are as follows. The average detection time of the HOG-based SVM model was 15.6 s for group 1, 18.6 s for group 2, 16.5 s for group 3, 15.6 s for group 4, 15.7 s for group 5, 15.8 s for group 6, 16.8 s for group 7, and 18.8 s for group 8. The average detection time of the GLCM-based SVM model was 8.2 s for group 1, 8.3 s for group 2, 8.5 s for group 3, 8.4 s for group 4, 8.3 s for group 5, 8.1 s for group 6, 8.3 s for group 7, and 8.2 s for group 8. The overall average detection time of the HOG-based SVM model was 16.7 s with a standard deviation of 1.33 s; that of the GLCM-based SVM model was 8.3 s with a standard deviation of 0.12 s. Figure 5 and the statistics show that the GLCM-based SVM model was not only less time-consuming but also more stable.

Fig. 5. The detection time of the SVM models under the two feature mining methods.

6 CONCLUSIONS

This paper introduced two image feature mining algorithms, HOG and GLCM, combined each of them with an SVM model for color image recognition and classification, and carried out simulation experiments in MATLAB R2018a. The final results are as follows: (1) the image texture features mined by the GLCM were clearer and more accurate than those mined by HOG; the HOG-based method tended to mine the unimportant fuzzy background as a texture feature; (2) in terms of recognition accuracy, the average accuracy of the HOG-based SVM model was 79.5% with a standard deviation of 2.99%, while the average accuracy of the GLCM-based SVM model was 98.2% with a standard deviation of 0.21%; (3) in terms of recognition time, the average detection time of the HOG-based SVM model was 16.7 s with a standard deviation of 1.33 s, while that of the GLCM-based SVM model was 8.3 s with a standard deviation of 0.12 s.