
1 Introduction

High-speed, accurate identification of coal gangue is key to making the coal industry intelligent [1], and it is essential for improving the efficiency of coal mining and coal preparation. Research on coal gangue identification has therefore long been a focus of intelligent coal technology, since it bears on the further improvement of mining technology and on the development of safe and green mining. In recent years, ray-based coal preparation methods have developed rapidly, including γ-ray detection [2], X-ray detection [3] and laser detection [4]. Ray detection techniques are easy to integrate, but X-ray transmission coal preparation has drawbacks such as high cost, difficult maintenance and radiation hazards to the human body. Meanwhile, image processing and pattern recognition have continued to advance, and image recognition technology [5] has promoted the development of coal gangue recognition. Eshaq et al. [6] used an infrared camera to capture images of coal and gangue, extracted gray-level features with the gray-level co-occurrence matrix, and classified coal and gangue by combining these features with an SVM, achieving good classification results. Singh et al. [7] compared the gray histograms of coal and gangue by image processing and set a recognition threshold to distinguish them. Hu et al. [8] proposed a method for identifying coal and gangue with spectral imaging, which overcomes the influence of light, dust and other environmental factors on ordinary images. Su et al. [9] proposed an automatic coal gangue image recognition method based on a convolutional neural network and reached a recognition rate of 95.88%.

The fruit fly optimization algorithm (FOA) is an intelligent optimization algorithm proposed by Pan [10], inspired by the foraging behavior of Drosophila. Compared with the genetic algorithm (GA) [11, 12], particle swarm optimization (PSO) [13, 14] and ant colony optimization (ACO) [15, 16], FOA is easy to understand and computationally simple. It has been widely used in various fields, but in practice it easily falls into local optima. Huang et al. [17] proposed a twin-swarm fruit fly optimization algorithm with Lévy flight characteristics (LFOA), which effectively addressed FOA's tendency to fall into local optima and improved its performance. To overcome the shortcomings of the original FOA, Wu et al. [18] proposed an improved fruit fly optimization algorithm (IAFOA), which adds four mechanisms to the original FOA. Pan et al. [19] proposed an improved fruit fly optimization algorithm (IFFO) that introduces a new control parameter to adaptively adjust the search range around the swarm location: after the maximum and minimum radii of the search range are set, the iteration step value decreases as the number of iterations increases. Hu et al. [20] proposed a fruit fly optimization algorithm with decreasing step size (SFOA), in which the current step value \(R_i\) is calculated from the initial step value \(R\), the maximum number of iterations \(M\) and the current number of iterations \(m\).

Because FOA adopts a fixed search radius in the foraging stage, its convergence speed and accuracy are poor. This paper therefore proposes a dynamic step factor and an improved smell concentration judgment formula, which together yield an improved fruit fly optimization algorithm; several commonly used test functions are selected to verify it. The experimental results show that the improved FOA has better optimization performance and higher stability. On this basis, the improved FOA is used to optimize the SVM parameters. The gray-texture joint feature parameters of coal and gangue are then extracted as the input vector of the improved FOA-SVM classification model, and classification experiments on coal and gangue are carried out. The improved FOA-SVM, FOA-SVM, PSO-SVM (based on particle swarm optimization) and SVM are compared in detail. Experimental results show that the proposed improved FOA-SVM is superior to the other methods in average accuracy, average precision, average recall and average F1 score.

2 SVM Classifier

An SVM (support vector machine) seeks the separating hyperplane with the largest margin, that is, it solves the following problem:

$$ \left\{ {\begin{array}{*{20}l} {\min \frac{1}{2}\left\| \omega \right\|^2 + C\sum\limits_{i = 1}^n {\varepsilon_i } } \hfill \\ {s.t.\,\,y_i \left[ {(\omega \cdot x_i ) + b} \right] \ge 1 - \varepsilon_i ,\varepsilon_i \ge 0,i = 1, \cdots ,n} \hfill \\ \end{array} } \right. $$
(1)

where \(\omega\) is the weight vector, \(C\) is the penalty parameter, \(\varepsilon_i\) is the slack variable, and \(b\) is the classification threshold. Introducing the Lagrange function and solving the dual problem gives a new objective function:

$$ \left\{ {\begin{array}{*{20}l} {\max \sum\limits_{i = 1}^n {\alpha_i } - \frac{1}{2}\sum\limits_{i,j = 1}^n {y_i y_j \alpha_i \alpha_j K(x_i ,x_j )} } \hfill \\ {s.t.\,\,\sum\limits_{i = 1}^n {\alpha_i y_i } = 0,\,\,0 \le \alpha_i \le C,\,\,i = 1, \cdots ,n} \hfill \\ \end{array} } \right. $$
(2)

where \(\alpha_i\) is the Lagrange multiplier and \(K\) is the kernel function. We use the radial basis function (RBF) as the kernel, \(K(x_i ,x) = \exp ( - g \cdot \left\| {x_i - x} \right\|^2 )\), where \(g\) is the kernel parameter. The SVM decision function can be written as:

$$ f(x) = {\text{sgn}} \left( {\sum_{i = 1}^n {\alpha_i y_i K(x_i ,x)} + b} \right) $$
(3)

According to the above derivation, SVM classification requires the parameters \(C\) and \(g\) to be determined. When an SVM with an RBF kernel is applied to a classification problem, selecting the best penalty parameter \(C\) and kernel parameter \(g\) is of great significance, because they directly determine the performance of the classifier. Therefore, this paper optimizes \(C\) and \(g\) with the improved FOA to obtain the optimal support vector classification model and thus improve the classification performance of the SVM.
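As an illustration of how \(g\) enters the classifier, the following minimal Python sketch evaluates the RBF kernel and the decision function of Eq. (3). The support vectors, multipliers and threshold are hypothetical toy values chosen for this example and are not taken from the experiments; treating a score of exactly zero as the positive class is also an assumption.

```python
import math


def rbf_kernel(xi, x, g):
    """RBF kernel K(xi, x) = exp(-g * ||xi - x||^2)."""
    sq_dist = sum((a - b) ** 2 for a, b in zip(xi, x))
    return math.exp(-g * sq_dist)


def svm_decision(x, support_vectors, alphas, labels, b, g):
    """Decision function of Eq. (3): sgn(sum_i alpha_i * y_i * K(x_i, x) + b)."""
    score = sum(
        alpha * y * rbf_kernel(sv, x, g)
        for sv, alpha, y in zip(support_vectors, alphas, labels)
    ) + b
    # Assumption: a score of exactly 0 is assigned to the positive class.
    return 1 if score >= 0 else -1


# Hypothetical toy model: one support vector per class.
# The multipliers satisfy the dual constraint sum_i alpha_i * y_i = 0.
support_vectors = [(0.0, 0.0), (2.0, 2.0)]
alphas = [1.0, 1.0]
labels = [1, -1]
```

A larger \(g\) makes the kernel decay faster with distance, so each support vector influences a smaller neighborhood; this is one reason why the pair \((C, g)\) has to be tuned jointly.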

3 Fruit Fly Optimization Algorithm

3.1 The Basic Fruit Fly Optimization Algorithm

According to the foraging principle of fruit fly population, we can summarize the fruit fly optimization algorithm into the following steps:

  • Step 1: Set the population size \(G\) and the maximum number of iterations \(M\), randomly set the initial swarm location \(X_{axis}\), \(Y_{axis}\), and give each fruit fly a random direction and distance for searching food, where \(R\) is the search distance.

    $$ \left\{ \begin{gathered} X_i = X_{axis} + R \hfill \\ Y_i = Y_{axis} + R \hfill \\ \end{gathered} \right. $$
    (4)
  • Step 2: Calculate the distance \(D_i\) from fruit fly individual to the origin, and then calculate the judgment value \(S_i\) of smell concentration.

    $$ \left\{ \begin{aligned} &D_i = \sqrt {X_i^2 + Y_i^2 } \hfill \\ &S_i = 1/D_i \hfill \\ \end{aligned} \right. $$
    (5)
  • Step 3: Substitute \(S_i\) into the fitness function to calculate the food smell concentration of each fruit fly individual.

    $$ Smell_i = f(S_i ) $$
    (6)
  • Step 4: Find the fruit fly individual with the largest fitness value \(Smell_i\) in the population.

    $$ [bestSmell,bestIndex] = \max (Smell_i ) $$
    (7)
  • Step 5: Record and retain \(bestSmell\) and its corresponding coordinates \(X,Y\); the remaining fruit flies move from their previous positions to this location.

    $$ \left\{ {\begin{array}{*{20}l} {Smellbest = bestSmell} \hfill \\ {X_{axis} = X(bestIndex)} \hfill \\ {Y_{axis} = Y(bestIndex)} \hfill \\ \end{array} } \right. $$
    (8)
  • Step 6: Check whether the maximum number of iterations or the target accuracy has been reached. If not, repeat Steps 2–5; otherwise, return the optimal fruit fly individual and end the algorithm.
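The steps above can be sketched in Python as follows. This is a minimal illustration rather than the implementation used in the experiments: the function name `basic_foa`, the default parameter values, the uniform draw used for the random search distance in Eq. (4), and the rule that the swarm moves only when a generation improves on the best value found so far are all assumptions made for the example.

```python
import math
import random


def basic_foa(fitness, pop_size=20, max_iter=100, radius=1.0, seed=0):
    """Maximize fitness(S) with a basic FOA loop following Steps 1-6."""
    rng = random.Random(seed)
    # Step 1: random initial swarm location (X_axis, Y_axis).
    x_axis, y_axis = rng.uniform(0.0, 1.0), rng.uniform(0.0, 1.0)
    best_smell, best_s = -math.inf, None
    history = []
    for _iteration in range(max_iter):
        gen_best = None
        for _fly in range(pop_size):
            # Eq. (4): random direction and distance within a fixed radius.
            x = x_axis + rng.uniform(-radius, radius)
            y = y_axis + rng.uniform(-radius, radius)
            # Eq. (5): distance to the origin and smell judgment value.
            dist = max(math.hypot(x, y), 1e-12)  # guard against division by zero
            s = 1.0 / dist
            # Eq. (6): smell concentration from the fitness function.
            smell = fitness(s)
            # Eq. (7): keep the best individual of this generation.
            if gen_best is None or smell > gen_best[0]:
                gen_best = (smell, s, x, y)
        # Eq. (8): move the swarm only if this generation improved the best.
        if gen_best[0] > best_smell:
            best_smell, best_s, x_axis, y_axis = gen_best
        history.append(best_smell)
    return best_s, best_smell, history
```

Because \(S_i = 1/D_i\) is always positive, this loop can only propose positive candidate values, whatever fitness function is supplied.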

3.2 An Improved FOA

Because FOA uses a fixed search radius in the foraging stage, both its speed and its accuracy suffer. Therefore, to improve the optimization accuracy and computational efficiency of FOA and to enhance its global search ability and search range, this paper makes the following improvements to the algorithm:

The dynamic step size factor is proposed to realize the dynamic change of the step size, and then realize the dynamic decrease of the search radius. Under the condition of ensuring a certain global search ability, the convergence speed and solution accuracy of the algorithm are improved at the same time.

The determination formula of fruit fly smell concentration in the algorithm is improved to realize the search of the algorithm in the negative value area, expand the search range and improve the ability to solve complex problems.

3.2.1 Decreasing Radius Strategy

With a fixed search radius, convergence speed and accuracy cannot both be improved. When the search radius stays large, the algorithm has strong global search ability and converges quickly, but its accuracy declines and an accurate global optimum is hard to find. When the search radius stays small, optimization accuracy improves, but the global search ability declines and convergence slows. This paper proposes a dynamic step size to overcome the shortcomings of a fixed radius, so that accuracy is significantly improved while the global search ability is preserved. The proposed dynamic step factor is as follows:

$$ \left\{ {\begin{array}{*{20}l} {w = w_0 \times e^{ - (\alpha d)/{\text{maxgen}}} } \hfill \\ {X_i = X_{axis} + wR} \hfill \\ {Y_i = Y_{axis} + wR} \hfill \\ \end{array} } \right. $$
(9)

where \(w\) is the weight, \(w_0\) is the initial weight, \(\alpha\) is the weight coefficient, \(d\) is the current iteration number, and \({\text{maxgen}}\) is the maximum number of iterations.
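The behavior of the dynamic step factor in Eq. (9) can be checked with a short Python sketch. The default values of `w0` and `alpha` below are hypothetical, since no values are stated here, and `radius_draw` stands for one random draw of the search distance \(R\).

```python
import math


def step_weight(d, maxgen, w0=1.0, alpha=5.0):
    """Dynamic step factor of Eq. (9): w = w0 * exp(-(alpha * d) / maxgen)."""
    return w0 * math.exp(-(alpha * d) / maxgen)


def search_position(axis_value, radius_draw, d, maxgen, w0=1.0, alpha=5.0):
    """One coordinate update of Eq. (9): axis value plus the weighted search distance."""
    return axis_value + step_weight(d, maxgen, w0, alpha) * radius_draw
```

With these assumed values the weight starts at \(w_0\) and shrinks to \(w_0 e^{-\alpha}\) at the last iteration, so early iterations search widely and late iterations refine the solution locally.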

3.2.2 Improved Smell Concentration Formula

In FOA, the smell concentration judgment value is the reciprocal of the distance between the individual fruit fly and the coordinate origin, so it is always positive. The algorithm therefore cannot search for negative-valued solutions and lacks the ability to solve high-dimensional, complex problems. The smell concentration judgment formula is thus modified as shown in Eq. (10):

$$ S_i = e^{ - L_i } \times {\text{sgn}} (X_i \times Y_i ) $$
(10)

In Eq. (10), \(L_i\) is the distance from fruit fly \(i\) to the origin, as computed in Eq. (5). The exponential function keeps the candidate solution negatively correlated with the distance of the fruit fly from the origin. At the same time, the sign function \({\text{sgn}}\) makes \(S_i\) negative when the individual lies in the second or fourth quadrant of the two-dimensional coordinate plane. Equation (10) therefore allows a comprehensive search of the negative-valued space, broadens the application scenarios of the algorithm, and improves its ability to solve high-dimensional, complex problems.
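A minimal Python sketch of Eq. (10) makes the sign behavior concrete. It assumes that the exponent is the Euclidean distance of the fruit fly from the origin; the function names are illustrative only.

```python
import math


def sgn(value):
    """Sign function: 1 for positive, -1 for negative, 0 for zero."""
    return (value > 0) - (value < 0)


def smell_judgment(x, y):
    """Improved smell concentration judgment value of Eq. (10)."""
    dist = math.hypot(x, y)  # assumed to be the distance to the origin
    return math.exp(-dist) * sgn(x * y)
```

Points in the first and third quadrants give positive values, points in the second and fourth quadrants give negative values, and the magnitude never exceeds 1. A point lying exactly on a coordinate axis yields 0 under this definition.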

4 Implementation of Improved FOA-SVM Algorithm

4.1 Improved FOA-SVM Classification Algorithm Flow

The improved FOA-SVM algorithm flow is as follows:

  • Step 1: Initialize the population size \(G\) and the maximum number of evolutions \(M\), and set the initial positions \(X_{axis}\) and \(Y_{axis}\). Since two parameters are optimized, \(X_{axis}\) and \(Y_{axis}\) each take two random numbers, which gives the initial coordinates \((X_1^C ,Y_1^C )\) and \((X_1^g ,Y_1^g )\).

  • Step 2: Calculate the distance from each fruit fly to the origin, and obtain the smell concentration judgment values \(S_1^C\) and \(S_1^g\) with the improved judgment formula.

  • Step 3: Use the 5-fold cross-validation accuracy as the fitness value of each fruit fly, and save the information of the individual with the best fitness.

  • Step 4: Retain the individual with the highest accuracy, and move the other individuals to its location.

  • Step 5: Check whether the maximum number of evolutions has been reached. If not, return to Step 2; otherwise, output the optimal model parameters, that is, \(C = S_1^C ,g = S_1^g\).

  • Step 6: The SVM recognition model is established by using the optimal parameters \(C\) and \(g\), and the test samples are classified to obtain the test results.
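The flow above can be sketched in Python with a pluggable fitness function. This is an illustration under several assumptions, not the MATLAB implementation used in the experiments: `surrogate_fitness` is a hypothetical stand-in for the 5-fold cross-validation accuracy of an SVM, the default values of `w0` and `alpha` are invented for the example, positions are resampled around the swarm location in every iteration, and the judgment values are used directly as \(C\) and \(g\) because no rescaling to a wider parameter range is described.

```python
import math
import random


def sgn(value):
    return (value > 0) - (value < 0)


def surrogate_fitness(C, g):
    """Hypothetical stand-in for 5-fold cross-validation accuracy."""
    if C <= 0 or g <= 0:
        return 0.0  # an SVM needs positive C and g
    return math.exp(-((C - 0.5) ** 2 + (g - 0.2) ** 2))


def improved_foa(fitness, pop_size=30, max_iter=100, radius=1.0,
                 w0=1.0, alpha=5.0, seed=0):
    """Search for (C, g) with the dynamic step factor and improved smell formula."""
    rng = random.Random(seed)
    # Step 1: one (X, Y) swarm location per optimized parameter.
    axes = {p: [rng.uniform(-1.0, 1.0), rng.uniform(-1.0, 1.0)] for p in ("C", "g")}
    best_fit, best_params, history = -math.inf, None, []
    for d in range(1, max_iter + 1):
        w = w0 * math.exp(-(alpha * d) / max_iter)  # Eq. (9)
        gen_best = None
        for _fly in range(pop_size):
            pos, s = {}, {}
            for p, (xa, ya) in axes.items():
                x = xa + w * rng.uniform(-radius, radius)
                y = ya + w * rng.uniform(-radius, radius)
                pos[p] = [x, y]
                # Step 2: improved smell judgment value, Eq. (10).
                s[p] = math.exp(-math.hypot(x, y)) * sgn(x * y)
            fit = fitness(s["C"], s["g"])  # Step 3: fitness of this candidate
            if gen_best is None or fit > gen_best[0]:
                gen_best = (fit, s, pos)
        # Step 4: keep the best individual and gather the swarm at its location.
        if gen_best[0] > best_fit:
            best_fit, best_params, axes = gen_best
        history.append(best_fit)
    # Step 5: output the best parameters found.
    return best_params["C"], best_params["g"], best_fit, history
```

In the setting of this paper, `fitness` would train an RBF-kernel SVM with the candidate \((C, g)\) and return its 5-fold cross-validation accuracy on the training set; Step 6 then trains the final model with the returned pair.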

4.2 Classifier Performance Evaluation Index

Various classifier evaluation indexes have been used with different classification models and achieved good results. Among them, accuracy, precision, recall and F1 score are the most commonly used.

Accuracy: the proportion of all samples that are classified correctly. The calculation formula is:

$$ Accuracy = \frac{TP + TN}{{TP + FP + TN + FN}} $$
(11)

Precision: the proportion of samples predicted as positive that are actually positive. The calculation formula is:

$$ Precision = \frac{TP}{{TP + FP}} $$
(12)

Recall: the proportion of positive samples with correct classification in the total positive samples. The calculation formula is:

$$ Recall = \frac{TP}{{TP + FN}} $$
(13)

F1 score: the harmonic mean of precision \(P\) and recall \(R\). The calculation formula is:

$$ \frac{2}{F_1 } = \frac{1}{P} + \frac{1}{R} \Rightarrow F_1 = \frac{2PR}{{P + R}} = \frac{2TP}{{2TP + FP + FN}} $$
(14)

TP: a positive sample that the model also judges to be positive, that is, the predicted category is consistent with the true category.

TN: a negative sample that the model also judges to be negative, that is, the predicted category is consistent with the true category.

FP: a negative sample that the model judges to be positive, that is, the predicted category is inconsistent with the true category.

FN: a positive sample that the model judges to be negative, that is, the predicted category is inconsistent with the true category.

Precision measures the proportion of truly positive samples among all samples predicted as positive, and thus describes how accurately positive samples are classified. Recall measures the proportion of correctly classified positive samples among all positive samples, and thus reflects how completely the classifier retrieves positive samples. F1 score is the harmonic mean of precision and recall; the larger the F1 score, the better the overall performance of the classifier. These indicators are independent of one another, cheap to compute and easy to understand. They evaluate classification results well and are therefore widely used in research on classification problems.
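The four indicators can be computed directly from the confusion-matrix counts, as in the following Python sketch. The function name and the counts in the usage note are hypothetical, and the sketch assumes at least one predicted positive and at least one actual positive sample so that no denominator is zero. F1 is computed as \(2TP/(2TP + FP + FN)\), which is algebraically equal to \(2PR/(P + R)\).

```python
def classification_metrics(tp, tn, fp, fn):
    """Accuracy, precision, recall and F1 score from confusion-matrix counts."""
    accuracy = (tp + tn) / (tp + fp + tn + fn)
    precision = tp / (tp + fp)  # assumes tp + fp > 0
    recall = tp / (tp + fn)     # assumes tp + fn > 0
    f1 = 2 * tp / (2 * tp + fp + fn)
    return {
        "accuracy": accuracy,
        "precision": precision,
        "recall": recall,
        "f1": f1,
    }
```

For example, with the hypothetical counts TP = 40, TN = 45, FP = 5 and FN = 10, the accuracy is 0.85, the precision is 40/45, the recall is 0.8 and the F1 score is 80/95.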

4.3 Performance Evaluation Results of Classifier Based on UCI Data Set

The UCI repository is a standard source of test data sets and is often used to train machine learning models. It contains 559 sub data sets. In this experiment, the heart data set from UCI is selected to verify the performance of the classifier. It contains 303 samples in 2 categories with a feature dimension of 13. In the experiment, 153 samples were used for training and 150 for testing.

When evaluating classifier performance, this paper uses the heart data set to verify the improved FOA-optimized SVM and calculates the selected indicators. The program is run 10 times in MATLAB 2018 with the same parameters for every run, and the mean of each index over the 10 runs is used to evaluate the model.

Table 1. Evaluation indexes of improved FOA-SVM model

Table 1 shows the final evaluation results of the classifier on the UCI data set. As can be seen from Table 1, over the 10 experiments the average accuracy is about 95.38%, the average precision is about 94.51%, the average recall is about 96.03%, and the average F1 score is about 95.30%. These indicators show that the improved FOA-optimized SVM proposed in this paper classifies well: except for the precision of 0.9451, all indicators exceed 0.95.

Table 2. Comparison of classification performance of several models

To verify the reliability of the proposed algorithm, the heart data set in UCI is also classified by the FOA-optimized SVM and the LFOA-optimized SVM. The classification effect is evaluated by the same four indicators: average accuracy, average precision, average recall and average F1 score. Table 2 compares these results with those of the improved FOA-optimized SVM.

Table 2 shows that the improved FOA-SVM in this paper classifies better than the other two algorithms: its accuracy, precision, recall and F1 score are all higher than the corresponding values of the other two algorithms in Table 2.

5 Experiment and Analysis

5.1 Image Preprocessing

The coal gangue images used in this paper come from a coal preparation plant in Han Cheng. A total of 1500 images were taken, 750 of coal and 750 of gangue. To make the samples easy to distinguish and to facilitate the experiment, the coal and gangue images were numbered separately during shooting, and a coal gangue image sample data set was established. A CCD industrial camera, model MV-VS030FM-L, was used to capture the images; its resolution is \(640 \times 480\) and its pixel size is 5.6 \(\upmu \text{m}\). It is built on MOS integrated circuit technology and can scan coal gangue automatically.

Image enhancement is an important step in image preprocessing [21]. Unclear pictures and quality degradation during image transmission affect subsequent feature extraction and the final recognition result, so the images need to be enhanced. Image enhancement highlights the target area of the image and weakens useless or unimportant information, which increases the readability of the image. The enhanced images of coal and gangue are shown in Fig. 1.

Fig. 1. Enhanced images of coal and gangue

5.2 Coal Gangue Classification Based on Improved FOA-SVM

Among the gray features of coal gangue, the gray mean, gray variance and skewness have low correlation with one another [22], as do contrast and entropy among the texture features. A comparison of classification results based on gray features, texture features and gray-texture joint features shows that the joint features give a higher recognition rate for coal gangue than gray features or texture features alone. In this paper, the gray-texture joint feature parameters of coal and gangue are extracted as the input vector of the improved FOA-SVM classification model.
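The five feature parameters named above can be illustrated with a small Python sketch. It is written under assumptions that go beyond the text: the texture features are taken from a gray-level co-occurrence matrix built from horizontally adjacent pixel pairs, entropy uses the natural logarithm, and skewness is the third standardized moment. The extraction procedure actually used for the experiments may differ.

```python
import math


def gray_statistics(image):
    """Gray mean, gray variance and skewness of a 2-D list of gray levels."""
    pixels = [p for row in image for p in row]
    n = len(pixels)
    mean = sum(pixels) / n
    variance = sum((p - mean) ** 2 for p in pixels) / n
    if variance == 0:
        skewness = 0.0  # a constant image has no asymmetry
    else:
        skewness = sum((p - mean) ** 3 for p in pixels) / n / variance ** 1.5
    return mean, variance, skewness


def glcm_texture(image, levels):
    """Contrast and entropy from a co-occurrence matrix of horizontal neighbors."""
    counts = [[0] * levels for _ in range(levels)]
    total = 0
    for row in image:
        for left, right in zip(row, row[1:]):
            counts[left][right] += 1
            total += 1
    contrast = entropy = 0.0
    for i in range(levels):
        for j in range(levels):
            p = counts[i][j] / total
            contrast += (i - j) ** 2 * p
            if p > 0:
                entropy -= p * math.log(p)
    return contrast, entropy
```

For a sample image, concatenating the three gray statistics with the two texture values gives a five-dimensional joint feature vector of the kind used as classifier input.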

Table 3. Range of gray texture joint feature parameters

In this experiment, 750 images each of coal and gangue, 1500 in total, are selected and their gray-texture joint feature parameters are analyzed; the distribution ranges are shown in Table 3. These parameters are used as the input vector of the SVM model, the recognition model is established, and the improved FOA is used to optimize its parameters to obtain the best accuracy. The population size of the improved FOA is set to 30, the number of iterations to 100, and the other parameters remain unchanged. Figures 2, 3 and 4 show the coal and gangue classification results.

As can be seen from Fig. 2, the accuracy of the traditional FOA-optimized SVM for coal gangue classification is 95.3%, with optimal parameters \(C = 6.1610\) and \(g = 0.9827\); of the 300 test samples, 14 are misidentified. Fig. 3 shows that the classification accuracy of the improved FOA-optimized SVM is 96.3%, with optimal parameters \(C = 22.4804\) and \(g = 13.8065\); of the 300 test samples, 11 are misidentified.
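The reported accuracies agree with the misidentification counts, as the following arithmetic check shows (300 test samples with 14 and 11 errors, respectively):

$$ \frac{300 - 14}{300} = \frac{286}{300} \approx 95.3\% ,\qquad \frac{300 - 11}{300} = \frac{289}{300} \approx 96.3\% $$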

Fig. 2. Accuracy and error of coal gangue classification by FOA-SVM

Fig. 3. Accuracy and error of coal gangue classification by improved FOA-SVM

Fig. 4. Comparison of iterative evolution curves of three algorithms for coal gangue classification

Fig. 5. Comparison of evaluation indexes of coal gangue classification results of several models

It can be seen from Fig. 4 that the improved FOA-optimized SVM needs fewer iterations and reaches higher accuracy in coal gangue recognition than the other two algorithms. This comparison shows that the improved FOA-optimized SVM performs well in coal gangue recognition.

To verify the reliability and effectiveness of the proposed algorithm, the SVM, the PSO-optimized SVM [23], the FOA-optimized SVM and other algorithms are also used to classify coal and gangue, and their evaluation index values are calculated and compared with those of the proposed classification algorithm to verify its feasibility for coal gangue classification. Fig. 5 shows that when an optimized SVM is used to classify coal gangue, each evaluation index is higher than before optimization, and that the improved FOA-optimized SVM in this paper scores highest on the indexes. Therefore, the improved FOA-optimized SVM has better classification performance than the other algorithms.

6 Conclusion

This paper presents an SVM classification model based on an improved fruit fly optimization algorithm and applies it to the identification of coal and gangue. The proposed model overcomes the difficulty of determining the optimal parameters of the SVM in the classification process. The heart classification data set is used to evaluate the improved algorithm, and the evaluation results show that it has good classification performance. Finally, taking coal gangue images as the research object, the PSO-SVM, FOA-SVM, LFOA-SVM and improved FOA-SVM algorithms are used to classify them. The comparison shows that the improved FOA-SVM has the best classification effect on coal gangue images, with an accuracy of 96.3%. The proposed algorithm therefore has application value in coal gangue recognition. In future research, deep learning methods can be considered for coal gangue image recognition; the specific recognition results remain to be analyzed and verified.