1 Introduction

Optimization problems are ubiquitous in the real world, and therefore methods for solving them have long been a hot research topic. However, optimization problems are becoming increasingly complicated with the development of science and technology, and traditional gradient-based methods are inefficient and inconvenient for such problems because they require substantial gradient information, depend on a well-defined starting point, and need a large amount of enumeration memory. On the other hand, meta-heuristic algorithms, such as Genetic Algorithms (GAs) [1], Differential Evolution [2], Particle Swarm Optimization (PSO) [3], and Ant Colony Optimization (ACO) [4], have shown better results on various complex problems such as feature selection [5], controller design [6], and node placement in wireless sensor networks [7]. Encouraged by these achievements, more and more researchers have devoted themselves to the design and application of meta-heuristics.

The well-known No Free Lunch theorem states that any two optimization algorithms are equivalent when their performance is averaged across all possible problems. This implies that one algorithm can outperform others on a particular class of problems, which has been demonstrated by previous works. Thus, developing new meta-heuristics that solve various problems more efficiently and effectively has drawn increasing attention, and because of the great success of GAs, PSO, and ACO, which are inspired by biological systems, exploring biologically inspired meta-heuristics has become one of the hottest topics in the evolutionary computation community. During the last decade, a variety of biosystem-based meta-heuristics, such as Artificial Fish Swarm Algorithms (AFSA) [8], Artificial Bee Colony Optimization (ABC) [9], Bat Algorithms (BA) [10], Hunting Search Algorithms [11], Harmony Search (HS) [12], Fruit Fly Optimization Algorithms (FOA) [13], Firefly Algorithms [14], Shuffled Frog-leaping Algorithms [15], and Cuckoo Search [16], have been developed and applied to different problems. Humans, owing to their powerful learning ability, can tackle a large number of complicated problems that other living beings, such as birds and ants, cannot solve. Therefore, it is natural to presume that a meta-heuristic based on the learning mechanisms of human beings may have advantages over algorithms based on other biological systems for the optimization problems of daily life. In fact, many human learning activities resemble the search process of meta-heuristics. For example, to master a new skill, people repeatedly practice and evaluate each attempt to update the experience that guides their subsequent study, which is analogous to meta-heuristics iteratively yielding new candidate solutions and calculating the corresponding fitness values to adjust their subsequent search.
Motivated by this idea, Wang et al. [17] recently presented a new meta-heuristic algorithm called Human Learning Optimization (HLO). However, HLO assumes that all individuals have the same learning ability, which is not true. Herrnstein argued in his famous book “The Bell Curve” that Intelligence Quotient (IQ) scores follow a Gaussian distribution [18], and previous research also showed that IQ test scores have significantly increased and will continue to rise with the development of society and technology [19, 20]. Inspired by these facts, this paper proposes an improved HLO algorithm, called Diverse Human Learning Optimization (DHLO), in which the learning ability of individuals follows a Gaussian distribution and is dynamically adjusted to improve the search ability of the algorithm.

The rest of the paper is organized as follows. Section 2 presents the concept, operators, and implementation of DHLO in detail. Then the parameter study of DHLO is performed and discussed in Sect. 3. Section 4 verifies the performance of DHLO on benchmark functions as well as knapsack problems, and the results are compared with the standard HLO as well as eight other meta-heuristic algorithms. Finally, conclusions are drawn in Sect. 5.

2 Diverse human learning optimization

The human learning process is extremely complicated; its study spans neuropsychology, educational psychology, learning theory, and pedagogy. For ease of implementation, DHLO, like HLO [17], uses three learning operators, i.e. the random learning operator, the individual learning operator, and the social learning operator, to update the population and search for the optimal solution, which emulates the behaviors of random learning, individual learning, and social learning in human learning activities. For example, when a person learns to play basketball, he or she may try new skills randomly for lack of prior knowledge (random learning), learn from his or her former experience (individual learning), and pick up useful methods from his or her coach or related books (social learning).

DHLO adopts a binary-coding framework, that is, each individual of DHLO is represented as a binary string, in which each bit of a solution is analogous to a basic element of the knowledge that humans need to learn. Assuming that there is no prior knowledge of the problem at the beginning, an individual is initialized with “0” or “1” randomly as Eq. (1)

$$\begin{aligned} x_i =\left[ {x_{i1}\,\; x_{i2}\,\; \ldots \,\; x_{ij}\,\; \ldots \,\; x_{iN} } \right] ,\;x_{ij} \in \{0,1\},1\le i\le M,1\le j\le N \end{aligned}$$
(1)

where \(x_{ij} \) is the jth bit of the ith individual, and M and N denote the number of individuals in the population and the length of solutions, respectively.
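As a concrete illustration, the initialization of Eq. (1) can be sketched as follows; the function name and the NumPy-based representation are our own choices, not from the paper:

```python
import numpy as np

def init_population(M, N, seed=None):
    """Initialize M binary individuals of length N as in Eq. (1):
    each bit is set to 0 or 1 with equal probability, reflecting the
    assumption of no prior knowledge of the problem."""
    rng = np.random.default_rng(seed)
    return rng.integers(0, 2, size=(M, N))

pop = init_population(M=5, N=8, seed=42)
```

Each row of `pop` is one individual \(x_i\) and each column one bit position \(j\).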

2.1 Random learning operator

As Cziko [21] argued, human learning is the result of random variation and universal selection, so randomness always exists in the process of human learning. At the beginning of learning, humans usually learn through random acts since there is no prior knowledge of a new problem. As study proceeds, people still perform random learning because of various factors such as forgetting, disturbance, and knowing only part of the problem. Besides, human beings keep exploring new strategies to learn better, in which random learning is unavoidable. DHLO performs the random learning operator to mimic these phenomena as Eq. (2),

$$\begin{aligned} x_{ij} =RE(0,1)=\left\{ {\begin{array}{l} 0,\;rand\le 0.5 \\ 1,\;else \\ \end{array}} \right. \;\;\; \end{aligned}$$
(2)

where rand is a random number uniformly distributed in [0, 1].

2.2 Individual learning operator

Individual learning is the ability of humans to gain knowledge through the individual reflection on external stimuli [22]. People memorize the useful experience during their study and use it when they face the same or similar problems and therefore they can avoid mistakes and learn more efficiently. To simulate this learning behavior, each individual in DHLO stores its personal best solutions in the individual knowledge database (IKD) represented as Eq. (3)

$$\begin{aligned} {\textit{IKD}}_i =\left[ {\begin{array}{l} {ikd}_{i1} \\ {ikd}_{i2} \\ \vdots \\ {ikd}_{ir} \\ \vdots \\ {ikd}_{iP} \\ \end{array}} \right] =\left[ {\begin{array}{llllll} {ik}_{i11} &{}{ik}_{i12} &{}\cdots &{}{ik}_{i1j} &{}\cdots &{} {ik}_{i1N} \\ {ik}_{i21} &{}{ik}_{i22} &{}\cdots &{}{ik}_{i2j} &{}\cdots &{}{ik}_{i2N} \\ \vdots &{}\vdots &{}&{}\vdots &{}&{} \vdots \\ {ik}_{ir1} &{}{ik}_{ir2} &{}\cdots &{}{ik}_{irj} &{}\cdots &{}{ik}_{irN} \\ \vdots &{}\vdots &{}&{}\vdots &{}&{} \vdots \\ {ik}_{iP1} &{}{ik}_{iP2} &{}\cdots &{}{ik}_{iPj}&{} \cdots &{}{ik}_{iPN} \\ \end{array}} \right] ,1\le r\le P \end{aligned}$$
(3)

where \({\textit{IKD}}_{i}\) denotes the individual knowledge database of person \(i\), \({ikd}_{ir}\) stands for the rth best solution of person i, and P is the size of the IKDs. When DHLO executes the individual learning operator, it chooses a random solution in the IKD and then copies the corresponding value as Eq. (4),

$$\begin{aligned} x_{ij} ={ik}_{irj} \end{aligned}$$
(4)

where r is a random integer between 1 and P.

2.3 Social learning operator

However, when problems become extremely complicated, it would be impossible or very time-consuming for a single person to solve them. In a social environment, humans directly or indirectly transfer their knowledge and thereby improve the efficiency and effectiveness of study through social learning [23]. Previous works demonstrate that population-based meta-heuristics have an advantage on complicated problems because of the sharing of knowledge among individuals. Therefore, social learning is simulated in DHLO to enhance the search ability of the algorithm, and the best solutions found by all the individuals are archived in the social knowledge database (SKD) as Eq. (5) for sharing experience in the population,

$$\begin{aligned} {\textit{SKD}}=\left[ {\begin{array}{l} {skd}_1 \\ {skd}_2 \\ \vdots \\ {skd}_s \\ \vdots \\ {skd}_Q \\ \end{array}} \right] =\left[ {\begin{array}{llllll} {sk}_{11} &{}{sk}_{12} &{}\cdots &{}{sk}_{1j}&{} \cdots &{}{sk}_{1N} \\ {sk}_{21} &{}{sk}_{22} &{}\cdots &{} {sk}_{2j} &{}\cdots &{}{sk}_{2N} \\ \vdots &{}\vdots &{}&{} \vdots &{}&{} \vdots \\ {sk}_{s1} &{}{sk}_{s2}&{} \cdots &{}{sk}_{sj} &{}\cdots &{}{sk}_{sN} \\ \vdots &{}\vdots &{}&{} \vdots &{}&{} \vdots \\ {sk}_{Q1} &{}{sk}_{Q2} &{}\cdots &{}{sk}_{Qj} &{}\cdots &{}{sk}_{QN} \\ \end{array}} \right] ,1\le s\le Q \end{aligned}$$
(5)

where \({skd}_{s}\) denotes the sth solution in the SKD and Q is the size of the SKD. Based on the knowledge in the SKD, DHLO performs the social learning operator to generate a new solution as Eq. (6),

$$\begin{aligned} x_{ij} ={sk}_{sj} \end{aligned}$$
(6)

where s is a random integer between 1 and Q.

2.4 Gaussian-distribution and dynamic updating of the learning ability

DHLO, as well as HLO, generates new solutions by performing the random learning operator, the social learning operator, and the individual learning operator. In general, the implementation of these three learning operators can be formulated as Eq. (7),

$$\begin{aligned} x_{ij} =\left\{ {\begin{array}{ll} RE(0,1), &{} 0\le rand\le p_r \\ {ik}_{irj} , &{} p_r <rand\le p_i \\ {sk}_{sj} , &{} else \\ \end{array}} \right. \end{aligned}$$
(7)

where \(p_{r}\) and \(p_{i}\) are two control parameters that determine the probabilities of running the operators. Specifically, \(p_{r}\) determines the probability of random learning while (\(p_{i}-p_{r})\) and (\(1-p_{i})\) are the rates of individual learning and social learning, respectively. In the standard HLO these two parameters, i.e. \(p_{r}\) and \(p_{i}\), are both set as constants, and the recommended values are 5/N and \(0.85+2/N\), where N is the length of solutions. Therefore, all the individuals of HLO have the same learning capability, which is not true in a real human population. For instance, the IQ scores of humans [24], as well as some other factors influencing human learning, follow a Gaussian distribution, which results in different learning abilities among people; consequently, the scores on an exam usually follow an approximately Gaussian distribution. In addition, Flynn points out that IQ test scores keep rising. Inspired by these facts, the Gaussian-distributed learning ability and the dynamic adjusting strategy are developed in DHLO.
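A per-bit sketch of Eq. (7) in Python may clarify how the three operators interleave; the array layout and all names below are our own, not the paper's:

```python
import numpy as np

def generate_candidate(ikd_i, skd, p_r, p_i, rng):
    """Produce one new solution bit by bit via Eq. (7).

    ikd_i: (P, N) array of individual i's stored best solutions.
    skd:   (Q, N) array, the social knowledge database.
    For each bit, random learning fires with probability p_r, individual
    learning with probability p_i - p_r, and social learning otherwise."""
    N = ikd_i.shape[1]
    x = np.empty(N, dtype=int)
    for j in range(N):
        u = rng.random()
        if u <= p_r:                           # random learning, RE(0, 1)
            x[j] = rng.integers(0, 2)
        elif u <= p_i:                         # individual learning, Eq. (4)
            x[j] = ikd_i[rng.integers(ikd_i.shape[0]), j]
        else:                                  # social learning, Eq. (6)
            x[j] = skd[rng.integers(skd.shape[0]), j]
    return x
```

Note that a fresh row index is drawn for every bit, matching the per-bit form of Eqs. (4) and (6).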

Taking a closer look at the learning operators of HLO, it is obvious that the random learning operator performs a random search in which no knowledge is taken into account. Considering that only two values, i.e. 0 and 1, exist in binary space, the function of the random learning operator is similar to the mutation operator of Genetic Algorithms. Thus it is sensible that the suggested value of \(p_{r}\) is very small, since the contribution of the random learning operator is to keep the diversity of the population and perform a local search; otherwise the random search may impair the learning mechanisms of HLO and significantly spoil the performance of the algorithm. Compared with the random learning operator, the individual learning operator and the social learning operator are the two main learning operators, which update the population according to the individual experience and the knowledge of the population, respectively. Therefore, \(p_{i}\) plays a very important role since it directly determines the balance between individual learning and social learning. For example, if \(p_{i} =1\), HLO would lose the ability of social learning, and consequently the efficiency and effectiveness of the algorithm would be ruined since the advantage of knowledge sharing no longer exists. On the other hand, if \(p_{i}=p_{r}\), which means that individual learning is abandoned, HLO would degrade into a local search around the global best solution. Unfortunately, the optimal \(p_{i}\) is problem-dependent, and thus it is almost impossible to set the optimal value without prior knowledge. To tackle this problem, the Gaussian distribution and the dynamic updating of the parameter \(p_{i}\) are introduced in DHLO to tune \(p_{i}\) and improve the search ability.

First, when the algorithm is initialized, each individual of DHLO is given a different personal \(p_{i}\), instead of the single value shared by all the individuals in HLO, which follows a Gaussian distribution as Eq. (8),

$$\begin{aligned} p_i \sim N(\mu ,\sigma ^{2}) \end{aligned}$$
(8)

where \(\mu \) and \(\sigma \) are the mean and standard deviation, respectively. The advantages of using a Gaussian distribution are: (1) the majority of \(p_{i}\) values are generated in the range determined by \(\mu \) and \(\sigma \), and therefore a fair performance of DHLO can be guaranteed; (2) compared with HLO, which uses only one value of \(p_{i}\), the robustness of DHLO is enhanced by searching with various reasonable \(p_{i} \) values; (3) differences in the performance of individuals will emerge from the different \(p_{i}\) values, which can be exploited to dynamically update \(p_{i}\) and improve the search ability further.
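The sampling of Eq. (8) can be sketched as follows; the default values echo the parameter study of Sect. 3 (\(3\sigma =0.02\) and a mean near the recommended \(p_{i}\)), while the clipping to [0, 1] is our own safeguard and is not described in the paper:

```python
import numpy as np

def init_learning_abilities(M, mu=0.85, three_sigma=0.02, seed=None):
    """Draw a personal p_i for each of the M individuals from
    N(mu, sigma^2) as in Eq. (8); three_sigma is the 3*sigma spread
    used as the tuning variable in the paper's parameter study."""
    rng = np.random.default_rng(seed)
    p = rng.normal(mu, three_sigma / 3.0, size=M)
    return np.clip(p, 0.0, 1.0)  # safeguard: keep probabilities valid
```

With \(3\sigma =0.02\), about 99.7 % of the sampled \(p_{i}\) values fall in \([\mu -0.02,\mu +0.02]\).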

Then the dynamic updating of \(p_{i}\) is executed every DG generations, where DG is a pre-defined constant. When dynamic updating is performed, \(\mu \), i.e. the mean of the Gaussian distribution, is set as Eq. (9)

$$\begin{aligned} \mu =p_i^*\end{aligned}$$
(9)

where \(p_i^*\) is the \(p_{i}\) value of the individual with the best fitness. The \(p_{i}\) value of each individual is adjusted as Eq. (10) if the global optimum found by DHLO has been updated in the latest DG generations,

$$\begin{aligned} p_{i,j} =p_{i,j} +rand\times (p_i^*-p_{i,j} ) \end{aligned}$$
(10)

where \(p_{i,j} \) is the \(p_{i}\) value of the jth individual; therefore the \(p_{i}\) of all individuals moves toward a better value to improve the performance in the following search. Otherwise, all the values of \(p_{i}\) are re-initialized with \(\sigma \) and the updated \(\mu \).
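The two branches of the updating rule, Eqs. (9) and (10), can be sketched together as follows; the function and argument names are ours, and the caller is assumed to track whether the global best improved over the last DG generations:

```python
import numpy as np

def adjust_learning_abilities(p, best_idx, improved, three_sigma=0.02, seed=None):
    """Dynamic updating of the personal p_i values (Sect. 2.4).

    mu is set to the p_i of the currently best individual (Eq. (9)).
    If the global best improved during the last DG generations, every
    p_i is pulled a random fraction of the way toward mu (Eq. (10));
    otherwise all p_i are re-sampled from N(mu, sigma^2)."""
    rng = np.random.default_rng(seed)
    mu = p[best_idx]                               # Eq. (9)
    if improved:
        return p + rng.random(p.shape) * (mu - p)  # Eq. (10)
    return rng.normal(mu, three_sigma / 3.0, size=p.shape)

p_new = adjust_learning_abilities(np.array([0.80, 0.85, 0.90]), best_idx=1,
                                  improved=True, seed=0)
```

In the `improved` branch every \(p_{i,j}\) lands between its old value and \(p_i^*\), so the population's learning abilities contract toward the currently best-performing one.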

2.5 Updating of the IKD and the SKD

After a new population is generated, the fitness of the candidates is calculated according to the fitness function and used to update the IKDs and the SKD, which is analogous to the process by which humans evaluate their performance through practice to refresh their knowledge for further study. For the updating of the IKD, if the number of solutions in the current IKD is less than P, i.e. the pre-defined size of the IKD, the new candidate is stored in the IKD regardless of its fitness. Otherwise, the new candidate is kept and used to replace the solution with the worst fitness in the IKD only when it has a better fitness. For the updating of the SKD, the same strategies as for the IKD are applied. However, DHLO permits only one solution in the SKD to be replaced in each generation, to keep the diversity and avoid premature convergence of the algorithm.
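The shared updating rule can be sketched as a single helper, assuming a maximization problem; the names are illustrative, and the paper's extra restriction of at most one SKD replacement per generation is left to the caller:

```python
def update_database(solutions, fitnesses, candidate, cand_fit, capacity):
    """Update an IKD (or, with capacity Q, the SKD) in place.

    If the database is not yet full, the candidate is always stored;
    otherwise it replaces the worst stored solution only if its
    fitness is strictly better."""
    if len(solutions) < capacity:
        solutions.append(candidate)
        fitnesses.append(cand_fit)
    else:
        worst = min(range(len(fitnesses)), key=fitnesses.__getitem__)
        if cand_fit > fitnesses[worst]:
            solutions[worst] = candidate
            fitnesses[worst] = cand_fit
```

Because a stored solution is only ever replaced by a better one, the best fitness in each database is non-decreasing over generations.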

2.6 Implementation of DHLO

In summary, the procedure of DHLO can be summarized as follows:

  • Step 1: initialize the population randomly, yield the initial values of \(p_{i}\) for each individual following Gaussian distribution, and set the other parameters of DHLO such as \(p_{r}\) and the maximal generation;

  • Step 2: calculate the fitness of initial individuals and initialize the IKDs and SKD;

  • Step 3: yield new candidates by performing the three learning operators as Eq. (7);

  • Step 4: compute the fitness of all the new solutions;

  • Step 5: update the IKDs and SKD according to the updating rules;

  • Step 6: every DG generations, set the mean \(\mu \) of the Gaussian distribution as Eq. (9), and then adjust the value of \(p_{i}\) of each individual as Eq. (10) if the global optimum has been updated; otherwise re-initialize the \(p_{i}\) of each individual with the updated \(\mu \);

  • Step 7: if the termination conditions are met, output the best solution; otherwise go to step 3.
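The seven steps above can be tied together in a compact, self-contained sketch. The toy OneMax fitness, the parameter values, and all names below are illustrative assumptions; IKD and SKD sizes are fixed to 1, and, for simplicity, the SKD may be replaced more than once per generation:

```python
import numpy as np

def dhlo_onemax(N=12, M=20, gens=300, p_r=0.1, mu=0.85, three_sigma=0.02,
                DG=50, seed=0):
    """Minimal DHLO sketch maximizing a toy OneMax fitness (number of
    1-bits); parameter values are illustrative, not the paper's tuned
    settings."""
    rng = np.random.default_rng(seed)
    fitness = lambda x: int(x.sum())

    # Steps 1-2: population, personal p_i values, IKDs and SKD.
    pop = rng.integers(0, 2, size=(M, N))
    p_i = np.clip(rng.normal(mu, three_sigma / 3.0, size=M), p_r, 1.0)
    ikd = pop.copy()
    ikd_fit = np.array([fitness(x) for x in pop])
    b = int(ikd_fit.argmax())
    skd, skd_fit = ikd[b].copy(), int(ikd_fit[b])

    improved = False
    for g in range(1, gens + 1):
        for i in range(M):
            # Step 3: random / individual / social learning, Eq. (7).
            u = rng.random(N)
            x = np.where(u <= p_r, rng.integers(0, 2, size=N),
                         np.where(u <= p_i[i], ikd[i], skd))
            # Steps 4-5: evaluate and update the knowledge databases.
            f = fitness(x)
            if f > ikd_fit[i]:
                ikd[i], ikd_fit[i] = x, f
            if f > skd_fit:
                skd, skd_fit, improved = x.copy(), f, True
        # Step 6: dynamic updating of p_i every DG generations.
        if g % DG == 0:
            mu_g = p_i[int(ikd_fit.argmax())]             # Eq. (9)
            if improved:
                p_i += rng.random(M) * (mu_g - p_i)       # Eq. (10)
            else:
                p_i = np.clip(rng.normal(mu_g, three_sigma / 3.0, size=M),
                              p_r, 1.0)
            improved = False
    # Step 7: return the best solution found.
    return skd, skd_fit

best_x, best_f = dhlo_onemax()
```

Since the SKD is only ever replaced by a strictly better solution, the returned fitness is guaranteed to be at least the best fitness of the initial population.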

3 Parameter analysis of DHLO

To apply the strategies of Gaussian distribution and dynamic updating efficiently, a parameter study on these two strategies was carried out, and two functions, i.e. F2 and F9, chosen from the CEC05 benchmark functions [25], were adopted for testing. The characteristics of these two functions, as well as of the other 13 functions used as benchmarks for evaluating DHLO in the next section, are listed in Table 1.

Table 1 The CEC05 benchmark functions

The Gaussian distribution involves two parameters, i.e. the mean \(\mu \) and the standard deviation \(\sigma \). In DHLO \(\mu \) is dynamically adjusted by the algorithm, thus only the standard deviation \(\sigma \) needs to be set manually. It is well known that about 99.7 % of the random numbers generated by a Gaussian distribution fall within three standard deviations of the mean, i.e. in \([\mu -3\sigma ,\mu +3\sigma ]\). Therefore, \(3\sigma \) was adopted as the variable in the parameter study for simplification. As for the dynamic updating strategy, the variable is DG. A set of \(3\sigma \) and DG values, i.e. {0.005, 0.01, 0.02, 0.05, 0.08, 0.1, 0.15} and {100, 200, 500, 1000, 1500, 3000}, respectively, was used to solve the 2-dimensional and 30-dimensional F2 and F9. For the 2-dimensional functions, the population size was set to 50 and the maximal generation was 3000. For the 30-dimensional functions, the population size and the maximal generation number were increased to 100 and 6000, respectively. Each variable was encoded by 30 bits, and each function was run 50 times independently. The results, including the best fitness value (BFV), the mean fitness value (MFV), and the standard deviation (STD), are given in Tables 2 and 3. The best results of the algorithms are marked in bold-face in the corresponding tables.

Table 2 The results of parameter study on F2
Table 3 The results of parameter study on F9

Tables 2 and 3 show that the optimal \(3\sigma \) and DG are problem-dependent and that these two parameters also interact with each other. However, with a very large \(3\sigma \), for instance a value bigger than 0.1, \(p_{i}\) spreads over a wide range and greatly deviates from the recommended value, which consequently spoils the exploration-exploitation trade-off of the algorithm. On the other hand, a very small \(3\sigma \) is also improper since it reduces or even eliminates the advantage of the Gaussian distribution. As for DG, a large DG decreases the influence of dynamic updating since it reduces the chance of performing the operation, while a small DG can enhance the effect of dynamic updating and improve the performance of the algorithm. For example, DHLO obtains a better result on F2 with \(3\sigma =0.005\) and DG=100 than any other result yielded with \(3\sigma =0.005\) and \(\hbox {DG}>100\). However, setting a very small DG is risky, as DHLO then becomes very sensitive to \(3\sigma \) and the algorithm is likely to be unstable. Due to the randomness of DHLO, the best solutions found during the search process might be far away from the real optimal solution, so the temporary best solutions may mislead the dynamic updating operation, especially when a small DG is applied. Consequently, the performance of DHLO might become worse, which can be observed from the data of the 30-dimensional F2.

In general, it is more reasonable to choose a moderate \(3\sigma \) and a large DG so that the former can effectively improve the search ability and the latter can decrease the negative effect from the “wrong” best solutions. Based on the comprehensive analysis of the results in Tables 2 and 3, 0.02 and 1000 are chosen as the default values of \(3\sigma \) and DG, respectively.

4 Experimental results and discussions

To evaluate the performance, DHLO, as well as the standard HLO [17] and the other eight binary-coded meta-heuristics, i.e. the Binary Differential Evolution algorithm (BDE) [26], the Simplified Binary Artificial Fish Swarm Algorithm (S_bAFSA) [27], the Adaptive Binary Harmony Search (ABHS) [28], the Binary Gravitational Search Algorithm (BGSA) [29], the Binary Bat Algorithm (BBA) [30], the Binary Artificial Bee Colony (BABC) [31], the Bi-Velocity Discrete Particle Swarm Optimization (BVDPSO) [32], and the Modified Binary Particle Swarm Optimization (MBPSO) [33], was applied to solve the 15 CEC05 benchmark functions listed in Table 1 and knapsack problems. For a fair comparison, the recommended parameter values of these algorithms were adopted, which are given in Table 4. As the CEC05 benchmarks and knapsack problems studied in this paper are the single-objective problems, the sizes of the IKDs and the SKD were both set to 1 as recommended in [17]. Besides, the IKDs of DHLO were re-initialized if the individual best solution was not updated in 100 successive generations to prevent the algorithm from being trapped in the local optima. The other parameters of DHLO, such as the population size and the maximal generation, were the same as those used in Sect. 3.

4.1 Benchmark functions

4.1.1 Low-dimensional functions

The numerical results and the Wilcoxon signed-rank test (W-test) results on the 2-dimensional functions are given in Table 5, in which “1” denotes that DHLO significantly outperforms the compared algorithm at the 95 % confidence level, “\(-1\)” denotes that DHLO is significantly worse than the compared algorithm, and “0” indicates that the results achieved by DHLO and the compared algorithm are not statistically different. For clear analysis and comparison of the performance, the rankings and the W-test results of all the algorithms are summarized in Tables 6 and 7, respectively.

Table 4 The recommended parameter values of all the algorithms
Table 5 The results of the 2-dimensional functions

Tables 6 and 7 show that DHLO has better performance on the low-dimensional functions. Specifically, DHLO achieves the best numerical results on all the functions. The performance ranking of all the algorithms, sorted in descending order, is DHLO, HLO, BVDPSO, S_bAFSA, ABHS, BDE, BGSA, MBPSO, BBA, and BABC. The W-test results demonstrate that DHLO is significantly better than HLO and the other eight algorithms on 12 and 14 out of 15 functions, respectively, while it is inferior to them on none.

4.1.2 High-dimensional functions

The optimization results on the 30-dimensional functions are given in Table 8. Likewise, the rankings and the W-test results of all the algorithms are summarized in Tables 9 and 10 for a clear review of their performance. The results on the high-dimensional functions also indicate that DHLO has an advantage over the other nine algorithms. Table 9 shows that DHLO obtains the optimal numerical result on 14 out of 15 functions and is only inferior to S_bAFSA on F10. The performance ranking of all the algorithms on the high-dimensional functions, sorted in descending order, is DHLO, HLO, BDE, S_bAFSA, ABHS, BVDPSO, MBPSO, BBA, BGSA, and BABC. The W-test results in Table 10 indicate that DHLO significantly surpasses ABHS, BGSA, BBA, BABC, BVDPSO, and MBPSO on all the functions. Compared with HLO, BDE, and S_bAFSA, DHLO has significantly better results on 13, 12, and 13 out of 15 functions and yields statistically similar results on the other 2, 3, and 2 functions, respectively.

Table 6 The rankings of all the algorithms on the 2-dimensional functions
Table 7 The summary of the W-test results between DHLO and the other meta-heuristics on the 2-dimensional functions
Table 8 The results of the 30-dimensional functions
Table 9 The rankings of all the algorithms on the 30-dimensional functions
Table 10 The summary of the W-test results between DHLO and the other meta-heuristics on the 30-dimensional functions

4.2 Knapsack problems

Previous work [34] shows that the rankings of compared optimizers are sensitive to the benchmark set, and therefore the performance of DHLO is further evaluated on knapsack problems for a comprehensive comparison. Knapsack problems are combinatorial optimization problems that have been studied intensively in the last few decades because of their simple structure, which, on the one hand, allows the exploitation of a number of combinatorial properties and, on the other hand, allows more complex optimization problems to be solved through a series of knapsack-type sub-problems [35]. Indeed, many real application problems, such as cargo loading, cutting stock, project selection, and budget control, can be formulated as knapsack problems. In this work, DHLO and the other meta-heuristic algorithms are adopted to solve 0-1 knapsack problems (0-1 KP) and multidimensional knapsack problems (MKP).

4.3 0-1 knapsack problems

In a given set of N items, each item has a weight \(w_{j}\) and a profit \(p_{j}\). The 0-1 knapsack problem is to select a subset of the N items such that the overall profit is maximized without exceeding a preset weight capacity C, which can be mathematically formulated as Eq. (11)

$$\begin{aligned}&max\;f(x_1 ,x_2 ,\ldots ,x_N )=\sum \limits _{j=1}^N {p_j x_j} \nonumber \\&s.t.\;\sum \limits _{j=1}^N {w_j x_j \le C} ,\;x_j \in \{0,1\},\;j\in \{1,2,\ldots ,N\} \end{aligned}$$
(11)

where the binary decision variable \(x_{j}\) takes the value 0 or 1, representing the rejection or selection of the jth item, respectively. Without loss of generality, 0-1 KPs assume that all profits and weights are positive and all weights are smaller than C. As 0-1 KPs are constrained problems, infeasible solutions, whose total weight exceeds the limit C, may be generated during the search process. Thus, the penalty function method of Eq. (12) is adopted to deal with infeasible solutions,

$$\begin{aligned} Max\;F(\text {x})= & {} f(x)-\lambda \times \max (0,c) \nonumber \\ c= & {} \sum \limits _{j=1}^N {w_j x_j -C} \end{aligned}$$
(12)

where the penalty coefficient \(\lambda \) is a large constant, so that the fitness of infeasible solutions is inferior to that of feasible solutions, which leads the algorithm to escape from the infeasible area and search in the feasible region.
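A direct implementation of Eq. (12) can look like the following; the function name and the illustrative value of \(\lambda \) are our own:

```python
import numpy as np

def penalized_fitness(x, profits, weights, C, lam=1e6):
    """Fitness of a 0-1 KP solution with the penalty of Eq. (12):
    f(x) minus lambda times the amount by which the total weight
    exceeds the capacity C (zero for feasible solutions)."""
    x = np.asarray(x)
    profit = float(np.dot(profits, x))
    violation = max(0.0, float(np.dot(weights, x)) - C)  # c in Eq. (12)
    return profit - lam * violation

profits = np.array([60, 100, 120])
weights = np.array([10, 20, 30])
```

For a feasible solution the penalty term vanishes and the fitness reduces to the plain profit; any capacity violation makes the fitness strongly negative.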

A set of 0-1 KPs was generated according to [35, 36] for the performance evaluation. The numbers of items were set to 50, 100, 250, 500, 800, 1000, 1200, 1500, 2000 and 2500, and three cases of each scale were generated to obtain comprehensive and accurate results. The weight \(w_{j}\) and the profit \(p_{j}\) were produced randomly from 5 to 20 and from 50 to 100, respectively. The weight capacity C was correspondingly set to 600, 1200, 3000, 6000, 10,000, 12,000, 15,000, 18,000, 25,000, and 30,000. For low-dimensional instances, in which the number of decision variables is less than 1000, the population size and maximum generation were set to 100 and 5000, respectively. For high-dimensional problems, in which the number of items is no less than 1000, the population size and maximum generation were set to 300 and 10,000, respectively. The experimental results are listed in Tables 11 and 12, and the summary results of the ranking and W-test are given in Tables 13 and 14.

Tables 11 and 12 show that DHLO finds the best known results on all the 0-1 KPs, while HLO, BDE, S_bAFSA, ABHS, BGSA, BBA, BABC, BVDPSO, and MBPSO find 19, 6, 6, 20, 5, 5, 3, 9 and 22 best known solutions out of 30 instances, respectively. Specifically, DHLO has search ability equal to that of HLO and ABHS on small-scale problems, since all of them can find the best-known values on the 50.1, 50.2, 50.3, 100.1, 100.2, 100.3 and 250.3 cases with a 100 % success rate. However, DHLO shows an advantage over the other meta-heuristics as the dimension of the problems increases. Table 12 illustrates that only DHLO can reach all the best fitness values when the number of items in the 0-1 KPs is more than 1000.

Table 11 The results of low-dimensional 0-1 knapsack problems
Table 12 The results of high-dimensional 0-1 knapsack problems

The ranking results in Table 13 show that the performance of all 10 algorithms, sorted in descending order, is DHLO, HLO, MBPSO, ABHS, BVDPSO, BDE, S_bAFSA, BGSA, BBA, and BABC, and the W-test results in Table 14 show that DHLO is significantly better than MBPSO, HLO, ABHS, BVDPSO, BDE, S_bAFSA, BGSA, BBA, and BABC on 18, 18, 21, 23, 27, 27, 27, 27, and 27 out of 30 instances, respectively, while it is worse than them on none.

4.4 Multidimensional knapsack problems

The multidimensional knapsack problem (MKP) is a multi-constrained problem. The objective of the MKP is still to find an optimal subset with the maximum total profit, but subject to multiple constraints instead of the single constraint of the basic 0-1 knapsack problem, which can be formulated as Eq. (13):

$$\begin{aligned}&max\;f(x_1 ,x_2 ,\ldots ,x_N )=\sum \limits _{j=1}^N {p_j x_j} \nonumber \\&s.t.\left\{ {\begin{array}{ll} \sum \limits _{j=1}^N {r_{ij} x_j \le c_i} , &{} \,i\in \{1,2,\ldots ,M\} \\ x_j \in \{0,1\}, &{} \,j\in \{1,2,\ldots ,N\} \\ \end{array}} \right. \end{aligned}$$
(13)

where N is the number of items, M is the number of constraints, \(p_{j}\) is the profit of the jth item, \(c_{i}\) is the capacity of the ith knapsack, and \(r_{ij}\) is the weight of the jth item in the ith knapsack with capacity constraint \(c_{i}\).

Table 13 The rankings of all the algorithms on the 0-1 knapsack problems
Table 14 The summary of the W-test results between DHLO and the other meta-heuristics on 0-1 knapsack problems

The MKP is well known to be much more difficult than the basic single-constrained 0-1 knapsack problem; thus various powerful local search or repair strategies have been developed and introduced into meta-heuristics to fix infeasible solutions and improve results. However, the real performance of meta-heuristics would be concealed by these additional heuristic operators, and therefore the penalty function strategy is still adopted for the MKPs. Previous work [37] indicates that the penalty function method called pCOR achieves the best results on MKPs, and thus pCOR is adopted in this paper; it can be described as Eqs. (14) and (15),

$$\begin{aligned} pCOR(x)= & {} \frac{p_{\max } +1}{r_{\min } } \times \max \{CV(x,i)\} \end{aligned}$$
(14)
$$\begin{aligned} CV(x,i)= & {} \max \left( 0,\sum \limits _{j=1}^N {r_{ij} x_j} -c_i \right) \end{aligned}$$
(15)

where pCOR(x) is the penalty coefficient used in the penalty function for infeasible solutions, \(p_{max}\) is the maximum profit coefficient, \(r_{min}\) is the minimum resource consumption, and CV(x,i) is the amount of constraint violation for constraint i.
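Under our reading that the pCOR quantity of Eq. (14) is subtracted from the objective, the penalized fitness can be sketched as follows; restricting \(r_{min}\) to the strictly positive entries of the weight matrix is our own assumption:

```python
import numpy as np

def pcor_fitness(x, profits, r, c):
    """Penalized MKP fitness based on Eqs. (14)-(15).

    r is the (M, N) resource-consumption matrix and c the (M,) capacity
    vector. CV(x, i) is the violation of constraint i; the quantity
    (p_max + 1) / r_min * max_i CV(x, i) is subtracted from the profit."""
    x = np.asarray(x)
    cv = np.maximum(0.0, r @ x - c)                     # CV(x, i), Eq. (15)
    p_max = float(profits.max())
    r_min = float(r[r > 0].min())
    penalty = (p_max + 1.0) / r_min * float(cv.max())   # pCOR(x), Eq. (14)
    return float(profits @ x) - penalty

profits = np.array([10.0, 5.0])
r = np.array([[3.0, 2.0], [1.0, 4.0]])
c = np.array([4.0, 4.0])
```

The factor \((p_{max}+1)/r_{min}\) scales the largest constraint violation so that even a small infeasibility outweighs the profit that could be gained from it.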

For a comprehensive comparison, six problem sets from the OR-Library, i.e. Pet, Sento, HP, 5-100, 10-100, and gk, in which the number of items ranges from 6 to 2500, are adopted to test the performance of DHLO as well as the other meta-heuristics. For the problems in which the number of items is less than 1000, the population size and the maximum generation of all the algorithms are set to 100 and 5000, respectively. Otherwise, the cases are regarded as high-dimensional problems and the population size and the maximum generation of the meta-heuristics are increased to 300 and 10,000, respectively. The numerical results are given in Tables 15, 16, and 17, and the ranking and W-test results on all the instances are summarized in Tables 18 and 19, respectively.

Table 15 Results of the Pet and Sento problem sets
Table 16 Results of the 5.100 and 10.100 problem sets
Table 17 Results of the gk problem set
Table 18 The rankings of all the algorithms on multidimensional knapsack problems

The results in Tables 15, 16, and 17 indicate that MKPs are much more complicated than the basic 0-1 KPs: most algorithms with the penalty function method can only find the best solutions of the first six instances of the simple problem set Pet, in which the number of items is no more than 39. On the complicated problem sets 5.100, 10.100, and gk, only DHLO successfully finds an optimal solution, on case 5.100.05. Specifically, DHLO, HLO, BDE, S_bAFSA, ABHS, BGSA, BBA, BABC, BVDPSO, and MBPSO find 12, 8, 11, 9, 6, 5, 4, 5, 10, and 6 best known solutions out of 42 instances, respectively, and DHLO achieves better fitness values on all the instances. Table 18 shows that the performance ranking of all the algorithms, sorted in descending order, is DHLO, HLO, S_bAFSA, BVDPSO, BDE, MBPSO, BBA, ABHS, BABC, and BGSA, and the W-test results in Table 19 indicate that DHLO also has an advantage over the other algorithms on MKPs, as it is superior to HLO, BDE, S_bAFSA, ABHS, BGSA, BBA, BABC, BVDPSO, and MBPSO on 29, 31, 31, 38, 40, 37, 40, 32, and 39 out of 42 instances, respectively.

Table 19 The summary of the W-test results between DHLO and the other meta-heuristics on multidimensional knapsack problems
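The pairwise significance comparisons could be reproduced along the following lines. This sketch assumes "W-test" denotes the two-sided Wilcoxon signed-rank test, as is common when comparing meta-heuristics over repeated runs; the run values below are hypothetical, not taken from the tables.

```python
from itertools import product

def wilcoxon_exact(a, b):
    """Exact two-sided Wilcoxon signed-rank test for small paired samples."""
    d = [x - y for x, y in zip(a, b) if x != y]   # drop zero differences
    abs_sorted = sorted(abs(v) for v in d)
    def rank(v):                                  # average rank of |v|
        idxs = [i + 1 for i, w in enumerate(abs_sorted) if w == abs(v)]
        return sum(idxs) / len(idxs)
    ranks = [rank(v) for v in d]
    w_plus = sum(r for v, r in zip(d, ranks) if v > 0)
    mean = sum(ranks) / 2
    # enumerate all 2^n sign assignments for the exact null distribution
    count = sum(1 for signs in product([0, 1], repeat=len(d))
                if abs(sum(r for s, r in zip(signs, ranks) if s) - mean)
                >= abs(w_plus - mean) - 1e-12)
    return w_plus, count / 2 ** len(d)

# Hypothetical best fitness values over 10 independent runs on one instance
dhlo_runs  = [24381, 24380, 24381, 24379, 24381, 24381, 24378, 24381, 24380, 24381]
other_runs = [24378, 24375, 24374, 24370, 24370, 24368, 24363, 24364, 24361, 24360]

w_plus, p_value = wilcoxon_exact(dhlo_runs, other_runs)
print(w_plus, p_value)   # one algorithm wins every run, so p_value is far below 0.05
```

With a p-value below the usual 0.05 threshold, the instance would be counted as a win for the better-performing algorithm in the summary table.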

In summary, based on the results of the benchmark functions and knapsack problems, it is fair to claim that the presented DHLO has better optimization performance, in terms of search accuracy and scalability, than HLO, BDE, S_bAFSA, ABHS, BGSA, BBA, BABC, BVDPSO, and MBPSO. In addition, the results on the CEC05 benchmark functions and knapsack problems hint that the performance of the algorithms is sensitive to the problem type. For example, BDE and S_bAFSA perform better on high-dimensional numerical function problems than the two PSO variants, i.e. MBPSO and BVDPSO, while these two binary PSO algorithms both surpass BDE and S_bAFSA on 0-1 KPs. As for MBPSO and BVDPSO, MBPSO is superior to BVDPSO on 0-1 KPs but worse than BVDPSO on MKPs. PSO, DE, AFSA, and the other algorithms were originally developed to tackle continuous or discrete problems, and therefore their operators need to be re-defined and modified for binary problems. However, these re-definitions or modifications are not always easy or natural, and the varied strategies change the search ability of the algorithms and lead to different strengths and weaknesses, which causes the diverse performance of MBPSO and BVDPSO on 0-1 KPs and MKPs. Compared with the other meta-heuristics such as PSO, DE, and AFSA, HLO is an inherently binary-coded algorithm, and the results of the benchmark functions and knapsack problems show that HLO has more robust and steadier performance on binary problems. Therefore, it is reasonable that the presented DHLO gains an advantage over the other algorithms: it inherits the excellent characteristics of HLO on binary problems, while the developed dynamic adjusting strategy as well as the re-initialization of the IKDs adaptively balance the exploitation and exploration abilities and efficiently help the algorithm escape from local optima.

5 Concluding remarks

Human Learning Optimization is a novel binary-coded meta-heuristic based on a simplified model of human learning. By mimicking the random learning, individual learning, and social learning of human beings, HLO develops three learning operators, i.e. the random learning operator, the individual learning operator, and the social learning operator, to search for the optimal solution efficiently. However, all the individuals in the standard HLO share the same control parameters of the learning operators, that is, all the individuals possess the same learning ability, which is not true in a real human population. Inspired by the facts that human IQ scores follow a Gaussian distribution and increase with the development of technology, this paper presents an improved HLO algorithm, named Diverse Human Learning Optimization (DHLO), in which a Gaussian-distributed learning operator and a dynamic adjusting strategy are introduced. By generating a set of control parameters of the learning operators that follow a Gaussian distribution, the robustness of the algorithm is strengthened. Besides, in cooperation with the dynamic updating operation, DHLO can adjust toward better parameter values, and hence the global search ability of the algorithm is enhanced. The proposed DHLO is applied to the CEC05 benchmark functions and knapsack problems to evaluate its performance against the standard HLO and eight other meta-heuristics, i.e. BDE, S_bAFSA, ABHS, BGSA, BBA, BABC, BVDPSO, and MBPSO. The comparison results demonstrate that DHLO is superior to the other nine algorithms in terms of search accuracy and scalability.

As mentioned above, DHLO, like HLO, is based on a simplified human learning model, while real human learning is an extremely complicated process. During the last decades, many achievements in cognitive science and learning theories have been reported. Therefore, one direction of our future work is to introduce these achievements on human learning into HLO to further refine the algorithm. Another important direction is to extend the applications of HLO, both for better understanding its characteristics and for further improving its performance.