1 Introduction

In recent years, predictive models based on artificial intelligence methods have rapidly risen in popularity and have been successfully applied to many fields. Although artificial intelligence methods can be widely applied to the fields of decision support and data mining, a few issues in the model development process remain: parameter settings which are not objective, parameters which cannot accommodate different types of datasets, and learning and prediction rates which fail to converge. Therefore, how to develop an artificial intelligence model that converges rapidly and is applicable to different types of datasets has always been an important issue.

Parameter settings are critical to model development. In the past, the design of experiments (DOE) approach has been proposed to solve related problems. The goal of DOE is to find the important variables through experimentation, enhance performance accuracy, and further decrease both the frequency and duration of the experiments [9]. Scholars have conducted DOE studies covering everything from the traditional Trial and Error and Rule of Thumb approaches to the later Taguchi method [29, 30]. The concept of Trial and Error is similar to that of the Exhaustive-Attack Method: determining a better parameter combination over time. Although the Trial and Error method incurs a greater cost, it finds the best parameter combination relatively easily. The Trial and Error method is most often used when the number of parameters to be optimized is not large; under such conditions, slow adjustment by Trial and Error can sometimes achieve unexpectedly good results [2, 16]. With the Rule of Thumb, parameter settings are based on personal experience or a setting mode recommended by past scholars in order to establish an effective and stable model. The experimental model can thus be constructed at a lower time cost, but the model may not necessarily be stable. The main reason is that different types of datasets have their own unique features: if we use the same method to set the model parameters for different types of datasets, there will be a big gap between the experimental results and the stability of the model. Therefore, the Rule of Thumb method is more suitable for a mature model [6, 7]. The Taguchi method is an experimental design in which each factor is set to determine a relatively good result combination by exploring the interactions between the factors [29, 30]. Although the Taguchi method can effectively obtain relatively good results with fewer experiments, both the number of experimental factors used and how the experimental levels are set will influence model stability. The Taguchi method is applicable to well-developed artificial intelligence methods, and many scholars use it to assist in the adjustment of neural network parameters [10, 25, 26, 35].

Bonabeau et al. [5] defined swarm intelligence (also called bio-inspired algorithms) as ‘‘…algorithms or distributed problem-solving devices inspired by the collective behavior of social insect colonies and other animal societies…’’ Developed by Eberhart and Kennedy in 1995 [13], the particle swarm optimization (PSO) algorithm is a population-based stochastic optimization technique inspired by the social behavior of flocking birds or schooling fish. In PSO, potential solutions are called particles. The algorithm searches for the optimal value by sharing historical and social information between individual particles [8]. Moreover, the cooperation and competition within the population often allow the PSO algorithm to obtain solutions efficiently and effectively. However, the main drawback of the PSO algorithm is how easily it is impacted by partial optimism, which decreases accuracy when speed and direction are regulated [22]. After the introduction of PSO, some scholars proposed modeling the intelligent behaviors of honeybee swarms in an effort to solve heterogeneous dataset problems [34, 36, 37, 38]. Karaboga [18] presented the artificial bee colony (ABC) algorithm, and the empirical results showed that ABC can overcome the common issue of falling into local optima. However, the results also showed that ABC cannot quickly select the global optimum and requires an increase in the number of iterations during the probabilistic-choice stage [20, 21], significantly lengthening the convergence time at the cost of efficiency.

In recent years, novel bio-inspired algorithms such as PSO and ABC have been used to solve parameter optimization issues. To address the above problems, this study proposes a discretized food source for an artificial bee colony (DfABC) optimization algorithm which, together with parameter adjustments in the core program of a support vector machine (SVM), proposed by Vapnik and Cortes [36], allows us to construct a quickly converging classification model that has excellent prediction performance and can handle different types of datasets. In addition, this study uses a number of popular benchmark datasets from the UCI Machine Learning Repository with different dimensions, sizes, and classification categories in order to validate the proposed DfABC-SVM model. The experimental results show that the proposed DfABC algorithm can be applied to the SVM model and used to handle different types of datasets, and that its classification accuracy and convergence rate are better than those of other artificial intelligence or bio-inspired methods.

The rest of this paper is organized as follows. In Sect. 2, the proposed model is briefly presented with a step-by-step explanation. Section 3 analyzes the experimental results on several UCI datasets. Section 4 discusses the results of the proposed methods. Finally, our findings and future recommendations are presented in Sect. 5.

2 Methodology

2.1 Bio-inspired Software-Defined Networking

Software-defined networking (SDN) raises a variety of opportunities for network evolution. The key feature of SDN is the decoupling of the data and control planes, removing the control plane from the network hardware. This provides remarkable flexibility in programming, along with a broad range of opportunities to optimize network resource utilization. In fact, SDN can address numerous Internet of Things (IoT) challenges, from serving massive numbers of service requests and responses, processing the enormous data flows of IoT sensors, devices, and appliances, accommodating unique identification schemes, allowing for autonomous network and service management, optimizing resources, and enabling service virtualization, to supporting seamless communication.

Bio-inspired networking techniques have been proposed and applied in practice for a long time. Previous research contributions have promoted networking developments and the growth of related applications, especially in large-scale networks. Dressler and Akan [12] identified three different areas of bio-inspired research: bio-inspired computing, bio-inspired systems, and bio-inspired networking. Bio-inspired computing focuses on deploying efficient algorithms to optimize network processes and recognize network-level traffic patterns. Bio-inspired systems focus on technical developments that mirror the structure and function of biological systems. Finally, bio-inspired networking focuses on developing strategies that guarantee efficient and scalable networks under complex and uncertain conditions. However, many issues must still be addressed before the advantages of next-generation networks can be achieved [12]. In such a network, link or node failures in the SDN-enabled physical infrastructure may disrupt not only its own connectivity and capacity, but also user requests for cloud applications and services in the network abstraction. Considering their dynamic nature and lack of infrastructure, to ensure continuous service, such networks must be capable of self-organization [11], self-evolution [24], and survivability [38].

2.2 The Proposed DfABC Algorithm

In this paper, a discretized food source for an artificial bee colony (DfABC) algorithm is proposed to overcome the traditional ABC bottleneck of food source fitness values (Fig. 1). The notation of the proposed DfABC is listed in Table 1.

Fig. 1 The pseudo code for the DfABC algorithm

Table 1 The notation of DfABC

2.3 The Illustration of the DfABC-SVM Model

The main purpose of this research is to optimize the core parameters of the SVM model using the DfABC algorithm. By optimizing the two parameters r and \( \gamma \) in the SVM model, a better separating hyperplane is drawn to classify different types of datasets and minimize the prediction error rate. The performance of an SVM mainly depends on model selection, including the selection of the kernel function type and the kernel parameters [1]. In this research, the polynomial kernel function was selected and implemented in the SVM model, for the following reasons. First, the polynomial kernel has fewer parameters to select and performs significantly better than other kernel functions in hybrid SVM models that use stochastic optimization techniques [15]. Second, Fujibuchi and Kato [14] found that different kernel functions (linear, polynomial, and the Gaussian radial basis function (RBF) kernel) showed similar accuracies on heterogeneous microarray data when no noise was added; however, the accuracies of the standard linear and RBF kernels decreased as noise levels in the training datasets increased. Third, Sebtosheikh and Salehi [27] recommended the polynomial kernel function for SVM prediction with small training datasets; their empirical results showed that the polynomial kernel outperformed the RBF kernel. Fourth, Alibrandi et al. [3] used a second-order polynomial SVM model and found that it can obtain a good approximation of the limit state without high computational complexity. Finally, Liu et al. [23] used an SVM to learn the dynamic kinematic relationships between the legs and the trunk of biped robots; their empirical results demonstrated the superiority of the polynomial kernel over the RBF kernel, and they noted that the polynomial kernel is a typical representative of global kernels while the RBF kernel is a typical representative of local kernels. Based on the above reasons, we chose the polynomial kernel function, which is well suited to heterogeneous data classification. The kernel function of the SVM model adopted by this research is the polynomial equation shown below:

$$ K\left( x_{i} ,x_{j} \right) = \left( \gamma x_{i}^{T} x_{j} + r \right)^{d} $$
(1)

where d is the degree of the polynomial, T denotes the transpose of a vector, and γ and r are the kernel parameters. DfABC is used to optimize the parameters γ and r, so that each food source carries both parameter values γ and r. Through the steps of discretization, grouping, and probabilistic choice, a better food source is found. Then, after repeated cycles, the optimal solution for γ and r is found, and this solution is applied to the SVM model to obtain the best classification result. The entire parameter optimization process is described in the following steps.
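For concreteness, the sketch below shows how the kernel in Eq. (1) maps onto a standard SVM implementation. This is a minimal illustration assuming scikit-learn, whose parameters gamma, coef0, and degree correspond to γ, r, and d; the Iris data and the degree value are illustrative choices, not the authors' code.

```python
import numpy as np
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split
from sklearn.svm import SVC

def polynomial_kernel(x_i, x_j, gamma, r, d):
    """Eq. (1): K(x_i, x_j) = (gamma * x_i^T x_j + r)^d."""
    return (gamma * np.dot(x_i, x_j) + r) ** d

# In scikit-learn's SVC, gamma, coef0, and degree play the roles of gamma, r, and d.
X, y = load_iris(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.3, random_state=0)

# gamma and r (coef0) are the two values DfABC searches for; degree=2 is an assumption.
svm = SVC(kernel="poly", gamma=0.1, coef0=0.2, degree=2)
svm.fit(X_train, y_train)
print("Test accuracy:", svm.score(X_test, y_test))
```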

Step 1. Randomly Generate 10 Food Sources and Their Fitness Values in the First Iteration (each fitness value carries its parameter solutions r and γ as subscripts), e.g.:

$$ \begin{aligned} \text{Food source sequence}\left[ i \right] = {} & 0.65_{0.1,0.2} ,0.7_{0.3,0.8} ,0.2_{0.2,0.4} ,0.65_{0.1,0.3} ,0.3_{0.1,0.4} ,0.3_{0.1,0.5} , \\ & 0.25_{0.1,0.9} ,0.4_{0.25,0.3} ,0.35_{0.1,0.6} ,0.75_{0.3,0.76} \end{aligned} $$

With food source initialization, the Employed bee generates one new food source with the two parameter solutions, r and γ. After comparing the new food source with the current food source, the Employed bee chooses the one with the greater fitness value as the guarded food source. Below is an example using the first food source.

$$ v_{ij} = x_{ij} + \varphi_{ij} \left( {x_{ij} - x_{kj} } \right) = 0.65_{0.1,0.2} + 0.5\left( {0.65_{0.1,0.2} - 0.61_{0.15,0.23} } \right) $$
$$ v_{ij} = 0.65_{0.1,0.2} + 0.5\left( {0.04_{ - 0.05, - 0.03} } \right) = 0.67_{0.05,0.17} $$

Once \( v_{ij} \) is obtained, the Employed bee executes the greedy selection and chooses \( v_{ij} \).
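A minimal sketch of the employed-bee phase above, under the standard ABC assumption that φ is drawn uniformly from [−1, 1]; the fitness function is a hypothetical stand-in for cross-validated SVM accuracy.

```python
import random

def employed_bee_phase(foods, fitness, i):
    """Perturb food source i toward/away from a random neighbour k, then select greedily.

    Each food source is a [gamma, r] pair; `fitness` is assumed to score a source,
    e.g. by cross-validated SVM accuracy.
    """
    k = random.choice([idx for idx in range(len(foods)) if idx != i])  # neighbour k != i
    j = random.randrange(len(foods[i]))          # perturb one dimension: gamma or r
    phi = random.uniform(-1.0, 1.0)              # standard ABC draws phi from [-1, 1]
    candidate = foods[i][:]
    candidate[j] = foods[i][j] + phi * (foods[i][j] - foods[k][j])  # v_ij = x_ij + phi*(x_ij - x_kj)
    # Greedy selection: keep whichever of x_i and v_i has the greater fitness.
    if fitness(candidate) > fitness(foods[i]):
        foods[i] = candidate
```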

Step 2. Onlooker Bee Executes the Probabilistic Choice

After the discretization step of the DfABC algorithm, the food source fitness values have been divided into m groups. In this step, the m groups of food sources are applied to different probabilistic choice methods as a reference for the Onlooker bee. The threshold value in this study is 0.1, i.e., only a food source whose probabilistic choice is greater than 0.1 can be selected by the Onlooker bee.
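As a sketch of the thresholded probabilistic choice in Step 2: the 0.1 threshold comes from the text, while restricting roulette-wheel selection to the eligible sources is our reading of the procedure, not code from the paper.

```python
import random

def onlooker_choice(fitnesses, threshold=0.1):
    """Roulette-wheel selection restricted to food sources whose selection
    probability exceeds the threshold (0.1 in this study)."""
    total = sum(fitnesses)
    probs = [f / total for f in fitnesses]
    eligible = [i for i, p in enumerate(probs) if p > threshold]
    if not eligible:                 # fall back to all sources if none pass the threshold
        eligible = list(range(len(probs)))
    weights = [probs[i] for i in eligible]
    return random.choices(eligible, weights=weights, k=1)[0]  # index of the chosen source
```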

Step 3. Onlooker Bee Executes a Comparison of Fitness Values

The Onlooker bee finds one food source near the selected food source and compares the fitness values of these two food sources. The Onlooker bee chooses the food source with the greater fitness value. This step is the same as Step 1, and we assume that the Onlooker bee selects the second food source in the descending sort sequence.

Food source in the descending sort sequence [i] =

$$ \begin{aligned} & 0.75_{0.3,0.76} ,0.7_{0.3,0.8} ,0.65_{0.1,0.2} ,0.65_{0.1,0.3} ,0.4_{0.25,0.3} , \\ & 0.35_{0.1,0.6} ,0.3_{0.1,0.4} ,0.3_{0.1,0.5} ,0.25_{0.1,0.9} ,0.2_{0.2,0.4} \end{aligned} $$
$$ v_{ij} = x_{ij} + \varphi_{ij} \left( x_{ij} - x_{kj} \right) = 0.7_{0.3,0.8} + 0.7\left( 0.7_{0.3,0.8} - 0.73_{0.25,0.7} \right) $$
$$ v_{ij} = 0.7_{0.3,0.8} + 0.7\left( -0.03_{0.05,0.1} \right) = 0.679_{0.35,0.9} $$

Once \( v_{ij} \) is obtained, the Onlooker bee executes the greedy selection and chooses \( x_{ij} \).

Step 4. Scout Bee Looks for a New Food Source

This phase may or may not be executed. The triggering condition is that the fitness value of a food source has not changed after the number of iterations exceeds its limit. When that happens, the Employed bee abandons the food source and, acting as a Scout bee, generates a new food source. Assuming the fitness value of the 5th food source after descending sorting has not changed after the limited number of iterations, a new food source is generated.

$$ X_{min}^{j} = -5,\; X_{max}^{j} = 5, \quad X_{i}^{j} = X_{min}^{j} + rand\left( 0,1 \right)\left( X_{max}^{j} - X_{min}^{j} \right) = -5 + 0.7\left( 10 \right) = 2 $$
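A sketch of the scout-bee re-initialization, following the formula quoted above; the trial-counter bookkeeping and the limit value are our assumptions about implementation detail.

```python
import random

def scout_phase(foods, trials, x_min=-5.0, x_max=5.0, limit=50):
    """Replace any food source whose fitness has stalled past `limit` iterations
    with a fresh random source: X_i^j = X_min^j + rand(0,1) * (X_max^j - X_min^j)."""
    for i, count in enumerate(trials):
        if count > limit:
            foods[i] = [x_min + random.random() * (x_max - x_min) for _ in foods[i]]
            trials[i] = 0
```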

By repeatedly performing the above steps, the best parameter values of γ and r in the SVM model can be obtained.

3 Experiment and Results

In order to evaluate the effectiveness of the proposed model, this study uses several datasets to determine if the DfABC-SVM model can be applied to different dataset types. In addition, we compare the DfABC-SVM model with decision tree algorithms such as the classification and regression tree (CART) and the conventional SVM model, and with other hybrid SVM models which use evolutionary or bio-inspired algorithms, such as GA, PSO, and ABC. The results show that the model developed by this study has superior convergence speed and a high level of classification accuracy, and does not fall into local optima.

3.1 Dataset Descriptions

This study adopts six datasets of different types to evaluate the accuracy and convergence speed of DfABC-SVM. To evaluate whether DfABC-SVM has a consistent effect on heterogeneous datasets with different features, we chose datasets which vary in the number of instances, dimensions, and subject-column categories. The six datasets (i.e., Iris, Glass, Spam Based, MAGIC Gamma, German, and Australian) were collected from the UCI Machine Learning Repository. Table 2 describes these datasets.

Table 2 UCI dataset descriptions

The Iris dataset is one of the most popular datasets in the data mining field and is frequently used to prove model stability. This dataset has 5 dimensions and only 150 instances to classify three different kinds of irises. What distinguishes this dataset from the others is that it has few dimensions, a small number of instances, and only three categories of subject columns. The four major dimensions (sepal length, sepal width, petal length, and petal width) are the main basis for determining to which of the three iris species an instance belongs (i.e., Iris Setosa, Iris Versicolour, and Iris Virginica).

The creation of the Glass dataset was motivated by criminological investigations. The glass left behind at the scene of a crime can be used as evidence, if it is correctly identified. This dataset uses 10 dimensions and only 214 instances to classify seven different kinds of glass. The number of dimensions of the Glass dataset is not high, and the number of instances is also not large. However, the subject column for classification is divided into seven glass categories. This poses a challenge for a classification model, because the classification error rate is higher when more classification categories are available. Therefore, among the six datasets, the Glass dataset is the most difficult to process.

The Spam Based dataset has the highest number of dimensions and a relatively large number of dataset instances. The dimensions record information regarding e-mail content, such as how many commas and spaces are in the message. Most of the dimensions indicate whether a particular word or character occurs frequently in the e-mail. This dataset has 57 dimensions and 4601 instances to classify only two different kinds of e-mail, i.e., to determine whether a given message is or is not junk mail.

The MAGIC Gamma Telescope observes the cosmic rays striking Earth’s atmosphere. The resulting dataset includes both gamma rays and hadronic showers. This dataset has 11 dimensions and 19,020 instances to classify future readings from this telescope as either Signal (gamma) or Background (hadron). This dataset has the largest number of instances, so it is used to evaluate whether the model is capable of handling a large amount of data.

The German dataset consists of client data from a single German bank. The dataset contains 20 dimensions and 1000 instances. The German dataset is considered to be of moderate complexity and has a large number of instances. The main purpose of this dataset is to classify credit card applicants into one of two types to determine whether or not to issue a credit card to the client. Nineteen dimensions of client information (salary, whether or not the client has children, etc.) allow the model to determine if the client’s credit status is good. The subject column for classification discriminates the records into only two classes: good and bad.

The Australian dataset also classifies clients into two categories: those with good credit and those with poor credit. The complexity and number of instances are low and the entire dataset is continuous. This dataset uses 14 dimensions and 690 instances to classify credit as good or bad. The subject column for classification is divided into two types, so the dataset can be used to analyze and explore the classification model, and makes it easier to obtain better predictive results.

3.2 Experiment Environment and Parameter Setting

In order to make a fair comparison, the colony size and maximum cycle number of the ABC algorithm were chosen to be the same as or less than the swarm size and maximum iteration number used in the PSO case, respectively. The ABC algorithm has few control parameters: the maximum cycle number (MCN) equals the maximum number of iterations, and the colony size equals the population size (i.e., 125), as in Srinivasan and Seow [28] and Karaboga and Basturk [20]. Onlooker bees and employed bees each made up 50% of the colony, while the number of scout bees was set to one. Increasing the number of scouts encourages exploration, just as increasing the number of onlookers on a food source increases the degree of exploitation. In the SVM model, the search ranges of C and \( \gamma \) were set between 1 and 100 [17]. In the PSO-SVM model, the cognition learning factor c1 was set to 2, and the social learning factor c2 was set to 1. The number of particles was set to 125, and the maximum number of iterations was set to 1000. The inertia weight and maximum velocity were set to 0.9 and 2, respectively. In the GA setting, the single-point crossover rate and mutation rate were set to 0.8 and 0.01 [4, 19].
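For reference, the settings above can be collected into a single configuration sketch; the names and grouping are ours, not the authors', and the ABC cycle count follows from MCN equaling the PSO iteration limit.

```python
# Illustrative summary of the parameter settings in Sect. 3.2; names are ours.
EXPERIMENT_CONFIG = {
    "abc": {"colony_size": 125, "max_cycles": 1000,      # MCN = max iterations
            "onlooker_ratio": 0.5, "employed_ratio": 0.5, "scouts": 1},
    "pso": {"particles": 125, "max_iterations": 1000,
            "c1": 2, "c2": 1, "inertia": 0.9, "v_max": 2},
    "ga":  {"crossover": "single-point", "crossover_rate": 0.8, "mutation_rate": 0.01},
    "svm": {"C_range": (1, 100), "gamma_range": (1, 100)},
}
```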

3.3 Evaluating Classification Accuracy

As noted above, six distinctly different datasets are used by this study to evaluate the applicability, accuracy, and stability of the DfABC-SVM model. The model is also compared with other classification methods.

The ratio of training to testing datasets was 7:3. In order to determine the best clusters (m) of cutting intervals in DfABC-SVM, the experimental design set the clusters from 2 to 10 and assessed the classification accuracy of the six datasets. The results are shown in Table 3.
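A sketch of this evaluation loop follows, assuming a hypothetical train_dfabc_svm helper that runs DfABC to tune (γ, r) with a given cluster count m and returns accuracy on the held-out test set.

```python
from sklearn.model_selection import train_test_split

def sweep_cluster_counts(X, y, train_dfabc_svm, m_values=range(2, 11)):
    """Evaluate classification accuracy for each cluster count m on a 7:3 split.

    `train_dfabc_svm` is a hypothetical helper: it tunes (gamma, r) with m
    fitness groups and returns accuracy on the held-out test set.
    """
    X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.3)
    return {m: train_dfabc_svm(X_train, y_train, X_test, y_test, m) for m in m_values}
```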

Table 3 Classification accuracy of different datasets (unit:%)

Table 3 shows that the SVM model is better than CART, and that the SVM models optimized with bio-inspired algorithms are better than the conventional SVM. This means that bio-inspired algorithms have a certain degree of effect in finding a relatively optimal parameter setting. Moreover, the method proposed by this study proved to be the most stable, had the best predictive results, and performed better than the others when handling different types of datasets.

Table 4 shows that applying the concept of discrete food sources to m clusters is feasible. When the subject column for classification has less than three categories, setting the number of clusters m to a value greater than 8 produces a less effective result. However, if the subject column for classification has more than three categories, setting the clusters m to a value greater than 8 has a better result. Under general conditions, setting the number of clusters m to between 4 and 7 provides a preferable, more stable experimental result which is not impacted by the features of the dataset.

Table 4 Classification accuracy of DfABC-SVM in different numbers of clusters

3.4 Evaluating Convergence Effectiveness

First, the global optimum was determined and used as the standard line. The number of iterations, convergence speed, and global optimum of DfABC-SVM were then compared with those of other SVM models combined with a bio-inspired algorithm. In Figs. 2, 3, 4, 5, 6 and 7, the Y-axis represents the classification accuracy as a percentage, and the X-axis represents the number of iterations. Optimization Fitness is the average accuracy after five rounds of cross-validation when the global optimal parameter setting generated by the Exhaustive-Attack Method is substituted into the model. Average Fitness is the average accuracy after five rounds of cross-validation when the relatively better parameter setting found by each bio-inspired algorithm is substituted into the SVM model. When the two lines are very close or match, the classification model is completely trained and has achieved convergence.
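As a sketch, the Average Fitness measure described above might be computed as follows, assuming "five rounds of cross-validation" means 5-fold CV and again using scikit-learn names; the degree value is an illustrative choice.

```python
import numpy as np
from sklearn.model_selection import cross_val_score
from sklearn.svm import SVC

def average_fitness(X, y, gamma, r, folds=5):
    """Average accuracy over five rounds of cross-validation for one (gamma, r) setting."""
    svm = SVC(kernel="poly", gamma=gamma, coef0=r, degree=2)  # degree=2 is an assumption
    return np.mean(cross_val_score(svm, X, y, cv=folds, scoring="accuracy"))
```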

Fig. 2 Convergence result of various bio-inspired SVM models on the Iris dataset

Fig. 3 Convergence result of various bio-inspired SVM models on the Glass dataset

Fig. 4 Convergence result of various bio-inspired SVM models on the Spam Based dataset

Fig. 5 Convergence result of various bio-inspired SVM models on the MAGIC Gamma dataset

Fig. 6 Convergence result of various bio-inspired SVM models on the German dataset

Fig. 7 Convergence result of various bio-inspired SVM models on the Australian dataset

Figures 2, 3, 4, 5, 6 and 7 show that the convergence statuses of the various bio-inspired SVM models are quite different. It is obvious that the convergence of the GA-SVM and PSO-SVM models is not complete, and, on some datasets, those models failed to achieve convergence. According to the convergence figures, both the ABC-SVM and DfABC-SVM models reached a stable state within very few iterations. This shows that the ABC parameter optimization model is indeed more capable than the other bio-inspired algorithms. In addition, the convergence speed of the DfABC model is faster than that of the ABC model.

Table 5 shows that the ABC- and DfABC-optimized models reach convergence more quickly. Here, "reach convergence" means that the gap between Optimization Fitness and Average Fitness is no greater than 0.3 for three consecutive iterations. Although GA and PSO spent less time on each iteration, they did not reach convergence after 100 iterations; moreover, such models are very unstable. Therefore, although the per-iteration time cost of the proposed model is higher, its accumulated convergence time is, in fact, the shortest among the four models, and its classification effectiveness is better than that of the other models. Thus, we conclude that the proposed model is a stable classification model which reaches convergence quickly.

Table 5 The time cost of each method (Unit: second)
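The convergence definition above can be expressed as a short check; this sketch reflects our reading of "no greater than 0.3 after three continuous iterations".

```python
def convergence_iteration(optimization_fitness, average_fitness_trace, tol=0.3, streak=3):
    """Return the first iteration at which |Optimization - Average| <= tol has held
    for `streak` consecutive iterations, or None if convergence is never reached."""
    run = 0
    for t, avg in enumerate(average_fitness_trace, start=1):
        run = run + 1 if abs(optimization_fitness - avg) <= tol else 0
        if run >= streak:
            return t
    return None
```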

4 Discussion

This study modifies the ABC algorithm based on a bio-inspired concept to assist in the parameter optimization of an SVM model. According to the experimental results, the classification effectiveness of the DfABC-SVM model proposed by this study is indeed better than that of the other models. The convergence speed is fast, and the model is capable of handling different types of datasets. This study also explored six datasets in three aspects to understand the degree to which the DfABC algorithm could be applied to different types of datasets, and how the number of clusters m should be set.

1. Datasets with few instances, few dimensions, and fewer classification categories

In datasets with a relatively small number of instances, few dimensions, and fewer than four classification categories (e.g., the Iris and Australian datasets), setting the number of clusters m between 3 and 7 obtains good results, as shown in Table 4. This shows that with the DfABC algorithm, in datasets with few instances, few dimensions, and fewer classification categories, discretizing the food source can improve the fitness value of each food source. Dividing the food sources into different groups results in faster convergence and increased classification effectiveness.

2. Datasets with many instances, many dimensions, and fewer classification categories

In datasets with a large number of instances, many dimensions, and fewer than four classification categories (e.g., the Spam Based and MAGIC Gamma datasets), the classification effectiveness is higher in comparison to other models. In addition, the classification effectiveness does not differ when the number of clusters m is set differently, indicating that the number of clusters of food sources does not affect classification accuracy, and the accuracy rate is more stable than that of other approaches. Therefore, the DfABC algorithm proposed by this study has a certain level of effectiveness in handling this type of dataset, although the effect of the cluster number m is not obvious.

3. Datasets with many classification categories

In datasets with more than four categories in their classification dimensions (e.g., the Glass and German datasets), good predictive results are difficult to obtain (see Table 4) because there are too many categories. However, the results show that the proposed DfABC algorithm obtains a more stable result than the other models when the number of clusters is higher. This indicates that the proposed algorithm can handle complex datasets with a high number of classification categories.

According to Tables 3 and 4, the DfABC-SVM classification model achieves better results on all six different types of datasets when compared to other artificial intelligence models. For the food-source cluster parameter, the DfABC-SVM model achieves a better result when the number of clusters m is set between 5 and 7 (with the exception of the Glass dataset, which has more classification categories). The convergence speed of the DfABC algorithm is better than that of the GA-, PSO-, and ABC-optimized models. This study shows that the proposed DfABC algorithm can effectively perform parameter optimization and reduce the number of iterations.

5 Conclusion

This study proposed the DfABC algorithm to improve the food source fitness value selection and probabilistic calculation of the original ABC, and applied the proposed DfABC algorithm to an SVM model to improve its parameter adjustment capabilities. After being applied to six different types of datasets, and compared to CART, the conventional SVM, and hybrid SVM models with bio-inspired optimization algorithms, the DfABC-SVM model was found to perform better in terms of classification accuracy and convergence speed. Thus, we conclude that DfABC-SVM is a stable model with good classification effectiveness and generalizability.

In addition, the following four findings can be obtained from the experimental results: (1) the modified DfABC algorithm has a faster convergence speed than does the original ABC, and has better classification results. (2) What affects the classification effectiveness most in one particular dataset (the Glass dataset) is not the number of dimensions or instances, but the number of classification categories. (3) Of the bio-inspired algorithms, GA and PSO showed poor convergence effectiveness and typically required hundreds of iterations. (4) The experimental results showed suitable values of the food-source cluster parameter for the DfABC-SVM model for different types of datasets. Future studies should investigate whether the DfABC algorithm can be used for all types of optimization problems, i.e., binary, combinatorial, and integer optimization problems. We also suggest that further research be done to determine whether the proposed algorithm can be used in an online data stream or in large-scale data environments.