1 Introduction

After John Holland presented the genetic algorithm (GA) as an abstraction of biological evolution and gave a theoretical framework for adaptation under the GA in his book [23], for the last four decades, a large amount of research focusing on nature-inspired optimization algorithms has been conducted, leading to the emergence of a series of classic metaheuristic optimization algorithms, such as simulated annealing, tabu search, ant colony optimization, and particle swarm optimization.

In 1994, a conceptual model of the cultural algorithm (CA) was first proposed by [34], which was further developed by Reynolds and his students in the following years [9, 25, 36, 37]. As a novel evolutionary framework with the principle of dual-inheritance between culture and individuals, the CA soon attracted much attention and was proved to be of practical success in a variety of problem domains [20, 44, 45].

When considering the CA, it is necessary to mention the memetic algorithm (MA), which was put forward by [31]. In essence, the key difference between the CA and MA lies in the way in which the individuals search and evolve. Specifically, the MA can be seen as a combination of a global search of the whole population and a local search of individuals, which is based on imitation from a certain subset of the population [32], while in the CA the learning process of each individual is based on the belief space, and the knowledge source stored in the belief space during the evolution can be any information that benefits progress.

Nevertheless, returning to optimization problems, decision makers usually need to consider more than one objective (which are often conflicting) when addressing practical optimization problems. For instance, minimization of the operational costs of a certain plant is one objective, while the quality of products is another aspect that should be taken into account. Though an intuitive approach to handling multiobjective problems is to add all the objectives together under a set of weighting coefficients according to the importance of each objective, the configuration of the preference vector is difficult, as it is inevitably problem-specific. Moreover, the outcome of using a classical optimization method is a single optimized solution. Benefiting from the ability of intelligent optimization algorithms to find multiple optimal solutions in one single simulation run, however, multiobjective optimization problems (MOOPs) can be solved without converting the task of finding multiple trade-off solutions into finding one single solution; thus, the task of decision makers is to simply pick out one solution among the resulting set of optimal trade-offs according to their preference.

Both evolutionary and swarm-based optimization algorithms have been successfully extended to multiobjective fields and quite a few of them have shown satisfactory performance on test and/or practical applications. Deb et al. [17] introduced a fast nondominated sorting approach called the NSGA-II based on the previous nondominated sorting genetic algorithm (NSGA) by [40] to reduce the computational complexity. Additionally, an elitism and the notion of crowding distance were incorporated for the purpose of preserving elites and omitting solutions located in overcrowded regions, respectively. As a successful improvement to the NSGA, the NSGA-II was one of the most popular multiobjective optimization algorithms in the two decades following its inception. In 2001, on the basis of their Pareto envelope-based selection algorithm (PESA, presented by [14]), [15] used a new selection technique in their improved algorithm PESA-II, in which selective fitness is assigned to the hyperboxes in the objective space instead of to individuals. Compared to the individual-based selection used in the PESA, the hyperboxes-based selection was shown to be more able to ensure a good spread of development along the Pareto front. In the same year, the strength Pareto evolutionary algorithm 2 (SPEA2), which integrates a fine-grained fitness assignment strategy, a density estimation technique, and an enhanced archive truncation method into its predecessor the SPEA in [49], was proposed by [51] and performed competitively on both combinatorial and continuous test problems compared with the SPEA, NSGA-II and PESA. One of the most famous swarm-based optimization algorithms, i.e., the particle swarm optimization (PSO) algorithm, was also extended to address MOOPs in [13]. In the multiobjective particle swarm optimization (MOPSO), the Pareto dominance is adopted, and the previously found nondominated vectors are maintained in an external repository to guide the flight of other particles. In 2007, [46] proposed a multiobjective evolutionary algorithm based on decomposition (MOEA/D), which decomposes an MOOP into a number of scalar optimization subproblems and optimizes them simultaneously. The application of the MOEA/D with three different decomposition methods on both multiobjective 0/1 knapsack problems and multiobjective continuous test instances was compared with the MOGLS and NSGA-II, respectively.

Over the past five years, several newer multiobjective evolutionary algorithms (MOEAs) have been proposed, with competitive performance compared to the representative ones mentioned above. A new multiobjective optimization framework, comprising nondominated sorting, local search and the farthest-candidate approach, named the nondominated sorting and local search (NSLS) algorithm was introduced by [5]. Tian et al. [42] proposed an MOEA based on an enhanced inverted generational distance metric (termed MOEA/IGD-NS) that can omit noncontributing solutions during the evolutionary search. In the same year, [27] proposed a framework containing a bicriterion evolution (BCE) with an indicator-based evolutionary algorithm (IBEA) embedded into its non-Pareto criterion (NPC) evolution part (termed BCE-IBEA). The effectiveness of this framework was shown by experiments on seven groups of test problems with various characteristics.

Moreover, the potential of the CA, which is an expandable framework that allows communication between the population and knowledge base, in solving MOOPs has attracted researchers’ attention [11]. However, there is still a lack of clear demonstrations of the merits of multiobjective cultural algorithms (MOCAs) compared to other classic multiobjective optimization algorithms. Furthermore, few comprehensive statistical analyses of the performance of MOCAs in solving a sufficient number of benchmark problems are available in the existing literature. In addition, there is still a large space in which to explore the potential of MOCAs in terms of the usage of various types of knowledge sources.

In this work, an improved multiobjective cultural algorithm (IMOCA), as a variant version of the previously proposed MOCAs, is presented. Following Reynolds’ basic framework of the CA in [34], the IMOCA incorporates several strategies into the basic knowledge sources in the belief space, which aim to balance convergence and coverage in finding a proper set of Pareto optimal solutions with efficiency. The situational knowledge is designed as an elite repository to avoid losing promising solutions during the search process, whereas the topographical knowledge is utilized to expand the spread of solutions. The historical knowledge functions as a selector that determines which influence function will be used to generate offspring. A mutation scheme is also included as one of the influence functions to improve the diversity of the solution set.

The remainder of this paper is organized as follows. Some related studies on MOCAs are reviewed in Section 2. Section 3 briefly introduces a basic description of an MOOP. Section 4 gives a concise description of the basic cultural algorithm. Section 5 presents the proposed IMOCA. Section 6 presents configurations of the experimentation on a series of standard test problems. The experimental results and a statistical analysis are given in Section 7. Section 8 concludes this paper and suggests some directions for future studies.

2 Related work

In this section, some of the recent studies on expanding CAs from solving single-objective to multiobjective problems are reviewed.

In 2003, the first extension of the CA to solving MOOPs was presented by [12]. In their approach, named the multiobjective cultural algorithm with evolutionary programming ((MO)CAEP), evolutionary programming (EP) is used as the search engine, and the belief space comprises a phenotypic normative part and a segmentation scheme that aim to improve the distribution of the solutions along the PF. In the belief space of the (MO)CAEP, the phenotypic normative part is not updated at every iteration, as doing so involves the rebuilding of the grids once the upper and lower bounds stored in the normative part are changed. On the other hand, the counters of the nondominated individuals contained within each cell are updated in every generation.

Instead of utilizing only one type of knowledge source, [3] introduced another multiobjective version of the CA, in which situational, domain, normative, historical and topographical knowledge are incorporated in the belief space. The five knowledge sources were defined as follows: situational knowledge was recognized as a set of exemplary individuals that were nondominated; domain knowledge was designed to perform an incremental search of a certain region in the search space; normative knowledge provided a promising region based on the interval for each dimension over a set of high-performing individuals; historical knowledge recorded the progress of the elites for each objective separately; topographical knowledge was designed to divide the search space and identified regions containing well-performing solutions. In each generation, only one of the five is chosen to influence the solutions in the population by the corresponding influence function. The test function DTLZ1 was used to investigate the capability of this version of the MOCA, and a detailed analysis of the productivity of the five knowledge sources was carried out. It appeared that the topographical knowledge performed far from satisfactorily in producing nondominated solutions, which indicated that certain adjustments should be made to the influence function of topographical knowledge.

Then, in the following year, [38] provided another extension of the CA called cultural algorithms for multiobjective optimization (MOCAT), in which all of the available categories of knowledge sources are fully utilized. In MOCAT, the normative, situational, domain and historical knowledge are generally inherited from the MOCA of [3], while the implementation of topographical knowledge is reversed from top-down to bottom-up to avoid high computational consumption in higher-dimensional objective space. However, the application of MOCAT on the test function ZDT1 did not reflect a good distribution on the Pareto front. Unfortunately, neither an experimental comparison with other multiobjective algorithms nor a thorough analysis was presented in this article.

Several other contemporary variations of the cultural algorithm for solving multiobjective problems were proposed in the past decade. Qin et al. [33] presented a multiobjective cultured differential evolution (MOCDE) algorithm based on the cultural algorithm framework and used it to deal with reservoir flood control operation (RFCO) problems. In MOCDE, situational, normative and historical knowledge are used to influence the evolution of a population, where the historical knowledge stores the coverage metrics of the past h generations and the adaptive Cauchy mutation operation is carried out on a certain dimension when the change in the coverage record is smaller than a predefined threshold. Before being applied to a case study of RFCO, the MOCDE algorithm was tested on benchmark problems ZDT1/3/4/6, SRN and TNK, and competitive results were obtained compared with six well-known multiobjective optimization algorithms. A hybrid multiobjective cultural algorithm (HMOCA) was introduced by [29], and a performance comparison with the NSGA-II was made in an application to two case studies of a short-term environmental/economic hydrothermal scheduling problem. In their method, a self-adaptive mutation operator taken from DE is used in the population space, and the preservation of the elite Pareto optimal solutions found along with the evolution process, which is similar to the archive strategy in the SPEA2, is incorporated in the belief space. In addition, the historical knowledge in the HMOCA was redefined as a local search operator based on the dominance concepts and crowding distance to carry out a guided local fine tuning process on the Pareto optimal solutions in the archive set to enhance the convergence properties. Zhang et al. [47] proposed an enhanced multiobjective cultural algorithm (EMOCA) that integrates PSO into the evolutionary process in the population space of the cultural algorithm. Situational knowledge was used as an external repository to preserve elite particles, whereas the historical knowledge in the EMOCA was redesigned as a local search operation to search the area around the particles in situational knowledge. The EMOCA was implemented on an economic environment dispatch optimization problem and the simulation results were compared with those of the NSGA-II. Liu et al. [28] introduced the cultural evolution mechanism into the multiobjective quantum-behaved particle swarm optimization framework and proposed the cultural MOQPSO algorithm. In the belief space of the cultural MOQPSO, a local-search-based strategy and a combination-based update operator were utilized to guide the search in the population space. In [21] and [22], the cultural evolution framework was combined with particle swarm optimization and quantum-inspired evolution for solving uncertain multiobjective optimization problems and interval multiobjective optimization problems respectively. Stanley et al. [41] proposed a parallelized multiobjective cultural algorithm particle swarm optimizer, CAPSO, which is a hybrid system composed of a particle swarm and the vector-evaluated genetic algorithm population component operating under the control of the cultural algorithm framework. In their work, the relative contribution of different CA knowledge sources was analyzed under the deployment of PSO swarms in solving constrained multiobjective optimization problems. In [1], an enhanced cultural algorithm using only normative and situational knowledge was applied to solve the multi-objective attribute reduction problem, which was a discrete problem. In the work of [30], the evolution scheme of a multiobjective five-element cycle optimization algorithm (MOFECO) was successfully incorporated into the population space of the CA, and this novel combination, named MOCAFECO, was proved to be feasible and efficient in solving MOOPs based on statistical results.

In the research of [3, 12] and [38], a set of basic knowledge sources was utilized to solve multiobjective problems; however, the results did not reflect satisfactory performance of the knowledge sources. The lack of thorough experimentation and analysis is another key problem. In the rest of the aforementioned research, some novel search strategies were adopted in the framework of the CA to tackle multiobjective problems. However, the productivity of each knowledge source in the belief space of the CA was rarely analyzed or discussed, and the performance of the proposed MOCAs was seldom evaluated or compared with that of other multiobjective algorithms using Pareto-compliant metrics. In this paper, we aim to introduce a simple but effective multiobjective cultural algorithm framework using the most basic knowledge source without adopting other novel evolutionary notions or strategies, say, to keep the algorithm as vanilla as possible, and to validate the effectiveness of each knowledge source, and analyze its performance in comparison with that of other algorithms under several Pareto-compliant metrics.

3 Description of the multiobjective optimization problem

The definition of an MOOP addressing more than one objective function is given. Consider, without loss of generality, a multiobjective minimization problem with n decision variables, M objectives, J inequality constraints and K equality constraints:

$$ \begin{aligned} & \text{Minimize} && \mathbf{y} = f(\mathbf{x}) = (f_{1}(\mathbf{x}), \cdots, f_{M}(\mathbf{x})) \\ & \text{subject to} && g_{j}(\mathbf{x}) \leq 0, \ j = 1, 2, \cdots, J \\ &&& h_{k}(\mathbf{x}) = 0, \ k = 1, 2, \cdots, K\\ & \text{where} && \mathbf{x} = (x_{1}, \cdots, x_{n}) \in \mathbf{X} \\ &&& \mathbf{y} = (y_{1}, \cdots, y_{M}) \in \mathbf{Y} . \end{aligned} $$
(1)

Here, the n-dimensional vector x is called the decision vector with n decision variables in it, X is called the decision space, y is the objective vector, and Y is the objective space.

4 Basic notions of the cultural algorithms

The cultural algorithm was proposed and developed by [34] as a dual-inheritance process that occurs simultaneously at the micro-evolutionary level (population space) and at the macro-evolutionary level (belief space), which metaphorically models the cultural evolution of human society. The basic framework of the CA is shown in Fig. 1, from which we can see the two main components of the CA: a population space where the regeneration of individuals takes place and a belief space where individual experiences are preserved and passed from generation to generation. Communication between the two components is realized through protocols that enable not only knowledge extraction from the population space to the belief space but also knowledge to exert influence on the population in turn. We call the two protocol functions the acceptance function Accept() and the influence function Influence(), respectively. Further explanations of the CA framework are given in the rest of this section.

Fig. 1
figure 1

Framework of the cultural algorithm

For the purpose of illustration, in the rest of this paper, we use an N × n matrix p to denote the population space with capacity N, in which element xij is the j-th decision variable of the i-th individual in the current population:

$$ \mathbf{p} = { \left[ \begin{aligned} \mathbf{x}_{1} \\ \mathbf{x}_{2} \\ {\vdots} \\ \mathbf{x}_{N} \end{aligned} \right ]} = { \left[ \begin{aligned} x_{11} & x_{12} & {\cdots} & x_{1n} \\ x_{21} & x_{22} & {\cdots} & x_{2n} \\ {\vdots} & {\vdots} & {\ddots} & {\vdots} \\ x_{N1} & x_{N2} & {\cdots} & x_{Nn} \end{aligned} \right ]} , $$
(2)

where xi = (xi1, xi2,⋯ , xin), with i ∈{1,2,⋯ , N}, denotes one of the N solutions to the MOOP at hand.

4.1 Population space

At the micro-evolutionary level of cultural algorithms, there is a population of individuals, each of which represents a solution to the problem to be optimized. The number of individuals that the population accommodates, namely, the size of the population, is defined by the user as a predefined parameter of the algorithm. In the population space, the individuals evolve following a certain evolutionary mechanism over the course of an iteration.

As Reynolds stated in [34], as a class of computational models of cultural evolution that support the dual-inheritance perspective, cultural algorithms provide a framework in which to describe all of the current models of cultural evolution from a computational point of view. More specifically, as illustrated by [12], any evolutionary computational technique in which a population is adopted could be used in the population space of cultural algorithms. For example, [8] introduced a cultural-algorithm-based testbed with the genetic algorithm and evolutionary programming as the evolutionary technique in the population space to achieve constrained numerical optimization. It was also addressed in their paper that evolution strategies are another alternative to be embedded in the population space. In addition, [2] incorporated differential evolution into the population space and applied the cultural algorithm to constrained optimization problems.

4.2 Belief space

One of the most distinguishing characteristics of the CA compared with other nature-inspired metaheuristic evolutionary algorithms is its information-sharing scheme among individuals through five major knowledge types, namely, normative knowledge, situational knowledge, domain knowledge, historical knowledge, and topographical knowledge. Normative, topographical, and situational knowledge have been used to solve real-valued function optimization problems under static environments [39]. The other two knowledge sources, i.e., historical and domain knowledge, were added because of their particular use in solving dynamic problems [4]. It should be noted that either the way the data structures are employed or even how the knowledge sources are defined changes along with the development of the CA.

Among the two basic knowledge sources, situational knowledge provides a set of exemplars that represent specific individual experience, and normative knowledge provides standards for individual behavior and sets guidelines within which individual adjustments can be made [7]. When it was first proposed in [26], topographical knowledge was used to handle constraints in solving single-objective optimization problems (SOOPs). Within the region given by the intervals saved in normative knowledge, the topographical knowledge segments the search space into (nGrid)M hypercubes; the promising areas are then re-split into sub-hypercubes, while poorly performing areas are fused back into larger ones. Historical knowledge was originally proposed for dynamic objective functions and was used to record the location of the best individual ever found before each environmental change to find patterns in the environmental changes [39]. Domain knowledge was proposed to cope with dynamic optimization problems.

4.3 Acceptance function

As one of the two protocol functions between the population space and the belief space, the acceptance function is used to glean the experience of the selected individuals, just as in human society, where knowledge is generalized by certain experts or elites [26]. Simply speaking, the acceptance function determines which individuals in the current population can be used to impact the update of the belief space knowledge. It is usually defined as a percentage so that a certain proportion of the population is selected to exert the impact. There is also a dynamic acceptance function with an acceptance percentage that can be adjusted to manipulate the pace of the belief space update [35].

4.4 Influence function

The influence function is a vehicle for reproducing individuals in the population under the guidance of the knowledge in the belief space. In general, each knowledge source has its influence function, which can be modified according to the nature of the problem at hand. The influence functions can be based on more than just one knowledge source. To save space, the influence functions for the different knowledge structures in the CA are presented, along with an illustration of the IMOCA, in the next section, and some of the classic configurations of influence functions of the basic knowledge sources can be found in the literature [4].

5 The improved multiobjective cultural algorithm with a multistrategy knowledge base (IMOCA)

As an improved multiobjective version of the basic CA (which was originally designed for SOOPs), the proposed IMOCA integrates necessary modifications in the knowledge base to cope with MOOPs, taking advantage of various knowledge sources. First, the structure of historical and topographical knowledge is redesigned to improve the efficiency of the knowledge selection scheme and to gain a better spread of the obtained PF, respectively. Thus, there is a corresponding adjustment in the influence functions of the two sources of belief space knowledge. Second, in addition to the influence functions of the four knowledge sources, a straightforward mutation method is used as an “influence function” in the influence step instead of in the evolution step of the population. In other words, the mutation operation affects the population only when it is selected in the influence phase. Third, due to the fact that multiobjective problems involve more than one single objective, the Pareto dominance relationship should be introduced in the comparison between solutions. Furthermore, the nondominated sorting method, which firstly proposed for the NSGA-II [17], is adopted in the IMOCA for ranking individuals efficiently. Finally, another modification, i.e., an elite maintenance mechanism, is included in the IMOCA to avoid losing promising solutions during the search process. Apart from the adjustments mentioned above, the evolutionary scheme adopted in the IMOCA population space is inherited from the prototype of [8]. The details of the proposed IMOCA algorithm are given in the rest of this section.

5.1 Evolution scheme in the population space of the IMOCA

Based on the simplified EP algorithm suggested by [8], a framework analogous to the EP algorithm is adopted as the evolution mechanism of the population space in the IMOCA, except that the standard mutation operator is replaced by one of the five influence functions driven by knowledge sources and a simple mutation scheme.

5.2 Knowledge sources in the belief space of the IMOCA

5.2.1 Situational knowledge

When dealing with SOOPs, situational knowledge contains a set of exemplars that are used to guide the search. Accordingly, in the IMOCA, the situational knowledge serves as an external archive /repository preserving elite solutions, i.e., nondominated solutions found during the search process. This tactic is analogous to the use of an external repository in quite a few classic multiobjective algorithms such as the SPEA2, MOPSO, etc.

Similar to the basic configuration of situational knowledge in the CA for SOOPs, the situational knowledge S adopted in the IMOCA is represented in the following form:

$$ S = \left\{ \mathbf{E}_{1}, \ \mathbf{E}_{2}, \ \cdots, \ \mathbf{E}_{s} \right\} , $$
(3)

where s is the predefined capacity of situational knowledge, which here we make equal to the size of the population N, and Ek is the k-th nondominated individual, with k ∈{1,2,⋯ , s}. The nondominated exemplars in situational knowledge are arranged in ascending order from the most preferable one to the least preferable, where the ordering operation is performed according to their crowding distances, as demonstrated in Section 5.4.

Since the situational knowledge functions as a leader for the other individuals in the population to follow, when it is selected to influence the population, its influence function is to push each individual towards a certain exemplar that is randomly selected from the situational repository. For the j-th decision variable xij of an arbitrary individual xi in the current population, the increment/decrement Δxij is computed as follows:

$$ \left\{ \begin{aligned} \sigma_{\text{s}} & = 0.1 \times (x^{\max}_{j} - x^{\min}_{j}) \\ {{\varDelta}} x_{ij} & = \sigma_{\text{s}} \times r \end{aligned} \right. , $$
(4)

where \(x^{\max \limits }_{j}\) and \(x^{\min \limits }_{j}\) are the j-th elements of \(\mathbf {x}^{\max \limits }\) and \(\mathbf {x}^{\min \limits }\) (the upper and lower bounds of the decision variables given by the optimization problem), respectively, and r is a random scalar drawn from the standard normal distribution N(0,1). Then, the new individual influenced by the situational knowledge, \(x^{\text {new\_s}}_{ij}\), can be produced by (5):

$$ \begin{aligned} & x^{\text{new\_s}}_{ij} = \left\{ \begin{aligned} & x_{ij} + | {{\varDelta}} x_{ij} |, && \text{ if } x_{ij} < E_{cj} \\ & x_{ij} - | {{\varDelta}} x_{ij} |, && \text{ if } x_{ij} > E_{cj} \\ & x_{ij} + {{\varDelta}} x_{ij}, && \text{ if } x_{ij} = E_{cj} \end{aligned} \right. , \end{aligned} $$
(5)

where Ecj is the j-th variable of a randomly chosen elite Ec from repository S.

The influence function driven by both normative and situational knowledge is illustrated in the next subsection on normative knowledge.

When the situational knowledge is updated, all nondominated individuals in the current population space are inserted into the repository. Then, the fast nondominated sorting method (see Section 5.4) is implemented on the new archive, and the dominated ones and repeated ones are discarded. Finally, if the number of the current exemplars exceeds the maximum size N, the individuals with the smallest crowding distances (which is also explained in Section 5.4) will be omitted.

5.2.2 Normative knowledge

Normative knowledge preserves the intervals where nondominated solutions located. Specifically, it saves the upper and lower bounds of both the decision and the corresponding objective space of the nondominated individuals in the current population. In the IMOCA, the data structure and update function of the normative knowledge are inherited from the dissertation of [7]. The structure of normative knowledge in the IMOCA can be described as follows:

$$ N = \left\{ \mathbf{l}, \ \mathbf{u}, \ \mathbf{L}, \ \mathbf{U} \right\} , $$
(6)

where l = (l1,⋯ , ln) and u = (u1,⋯ , un) are the minimum and maximum values of each decision variable that are found in all the individuals sorted out by the acceptance function, while L = (L1,⋯ , LM) and U = (U1,⋯ , UM) are the minimum and maximum values of each objective found in all the accepted individuals.

When only normative knowledge itself is chosen to influence the population, increment/ decrement can simply be applied to individuals that are smaller/larger than the lower/ upper bounds l and u saved in the normative knowledge. Here, the increment/decrement is computed based on the interval [l, u] using a coefficient α and random scalar r drawn from the standard normal distribution N(0,1):

$$ \left\{ \begin{aligned} \sigma_{\text{n}} & = \alpha \times (u_{j} - l_{j}) \\ {{\varDelta}} x_{ij} & = \sigma_{\text{n}} \times r \end{aligned} \right. . $$
(7)

Hence, the adjustment produced by the normative knowledge can be expressed by (8):

$$ \begin{aligned} & x^{\text{new\_n}}_{ij} = \left\{ \begin{aligned} & x_{ij} + | {{\varDelta}} x_{ij} |, && \text{ if } x_{ij} < l_{j} \\ & x_{ij} - | {{\varDelta}} x_{ij} |, && \text{ if } x_{ij} > u_{j} \\ & x_{ij} + \beta \times {{\varDelta}} x_{ij}, && \text{ otherwise} \end{aligned} \right. . \end{aligned} $$
(8)

Under the other circumstance, when the influence function of the combination of the normative knowledge and the situational knowledge is selected, we apply a normative-knowledge-based increment/decrement to the current individual if it is smaller/larger than the exemplar, which is randomly selected from the situational knowledge. For the j-th decision variable xij of an arbitrary individual xi to be improved in the population, as in the case where normative knowledge is used alone, the increment/decrement is computed as follows:

$$ \left\{ \begin{aligned} \sigma_{\text{ns}} & = \alpha \times (u_{j} - l_{j}) \\ {{\varDelta}} x_{ij} & = \sigma_{\text{ns}} \times r \end{aligned} \right. , $$
(9)

where uj and lj are the j-th elements of l and u, the first two vectors stored in normative knowledge, defined as the lower and upper bounds of the closed interval that contains all the accepted exemplar individuals, respectively, and r is a random scalar drawn from the standard normal distribution N(0,1). Then, the new individual generated by normative and situational knowledge can be computed as follows:

$$ \begin{aligned} & x^{\text{new\_ns}}_{ij} = \left\{ \begin{aligned} & x_{ij} + | {{\varDelta}} x_{ij} |, && \text{ if } x_{ij} < E_{cj} \\ & x_{ij} - | {{\varDelta}} x_{ij} |, && \text{ if } x_{ij} > E_{cj} \\ & x_{ij} + {{\varDelta}} x_{ij}, && \text{ if } x_{ij} = E_{cj} \end{aligned} \right. , \end{aligned} $$
(10)

where Ecj is the j-th variable of Ec, a randomly chosen elite from the situational repository S.

The update operation of the normative knowledge is performed in every iteration by recovering the lower and upper bounds of the decision and objective values according to the accepted individuals selected by the acceptance function.

5.2.3 Topographical knowledge

To obtain a nondominated solution set with better coverage and diversity, rather than promoting the promising regions, in the IMOCA we utilize the topographical knowledge to expand the spread of solutions in the objective space.

Taking a simple bi-objective problem as an example, topographical knowledge locates all the individuals into nGridnGrid cells and chooses the ones that accommodate on the edge of the objective space. As Fig. 2 presents below, suppose that we have partially reached the true PF (denoted by the orange dashed line) of the optimization problem of minimizing f1 and f2; thus, the next step is to expand the spread of the obtained individuals to cover the unsearched part of the orange dashed line. Both the blue and red triangles denote the solutions in the first rank obtained in the current iteration, among which the red ones are selected by topographical knowledge as they are located in the edge areas. Thus, in the IMOCA we sort out all the individuals located in the edge cells (cells in light blue in Fig. 2) to which adjustment is performed.

The influence function of topographical knowledge is implemented to exert a certain mutation operation on the individuals “in red” to achieve better diversity among edge solutions. For each individual located in the edge cells, we randomly select μt × n of its decision variables to perform fine tuning, where μt is a percentage parameter. Then, the adjustments to the μt × n decision variables of the edge individual xi are made by

$$ \left\{ \begin{aligned} \sigma_{\text{t}} & = 0.1 \times (x^{\max}_{j} - x^{\min}_{j}) \\ {{\varDelta}} x_{ij} & = \sigma_{\text{t}} \times r \end{aligned} \right. , $$
(11)

where j ∈{1,2,⋯ , n}. Thus, the j-th decision variable of an arbitrary individual xi located in the edge cells is updated by

$$ x^{\text{new\_t}}_{ij} = x_{ij} + {{\varDelta}} x_{ij} . $$
(12)
Fig. 2
figure 2

Implementation of topographical knowledge in the IMOCA

It should be pointed out that, topographical knowledge is not updated in every generation, but only when it is chosen to influence the population to conserve computation resources. When topographical knowledge is selected, the search region spanned by the current elites in the situational knowledge is re-split into (nGrid)M grids, and each individual in the population is relocated to its own residence grid.

5.2.4 Historical knowledge

Here, we redesigned the historical knowledge as a metaknowledge source that records the performance of the four influence functions generated by the other three knowledge sources and the mutation strategy. In other words, the historical knowledge saves the total number of surviving individuals in the situational knowledge over the past h generations produced by each influence function or the mutation strategy to manipulate the selection operation of the knowledge source to be used in generating offspring.

In the previous h generations, i.e., when the current iteration number has not reached a predefined threshold h, one of the five influence functions randomly selected to generate new individuals. Under this condition, we have

$$ p_{\text{n}} = p_{\text{s}} = p_{\text{ns}} = p_{\text{t}} = p_{\text{m}} = 0.2 , $$
(13)

where pn, ps, pns, pt, and pm are the probabilities of each knowledge to be chosen.

Otherwise, the probability for each influence function to be chosen in the selection phase is then calculated according to the number of corresponding survivals recorded. For this purpose, for each of the N elites stored in the situational knowledge, through which knowledge source it was generated is also saved in the situational knowledge. Then, the probability of each knowledge to be chosen, which is based on the productivity of each knowledge in the previous iterations, can be simply calculated by

$$ \begin{aligned} \left\{ \begin{aligned} p_{\text{n}} &= \frac{c_{\text{n}}}{N} \\ p_{\text{s}} &= \frac{c_{\text{s}}}{N} \\ p_{\text{ns}} &= \frac{c_{\text{ns}}}{N} \\ p_{\text{t}} &= \frac{c_{\text{t}}}{N} \\ p_{\text{m}} &= \frac{c_{\text{m}}}{N} \end{aligned} \right. , \end{aligned} $$
(14)

where cn, cs, cns, ct, and cm are the numbers of individuals generated from normative knowledge, situational knowledge, a combination of normative and situational knowledge, topographical knowledge, and the mutation strategy, respectively. Furthermore, to prevent starvation of any knowledge source, here, we introduce a parameter 𝜖 = 0.03, and thereby, an adjustment of each probability p is made as follows:

$$ p^{\prime} = \frac{p + \epsilon}{1 + 5 \epsilon} , $$
(15)

where p ∈{pn, ps, pns, pt, pm}, which guarantees that every knowledge source has the opportunity to produce new individuals while keeping \(p^{\prime }_{\text {n}} + p^{\prime }_{\text {s}} + p^{\prime }_{\text {ns}} + p^{\prime }_{\text {t}} + p^{\prime }_{\text {m}} = 1\). By doing so, from the (h + 1)-th generation, when selecting influence functions, the selector has a reference based on the performance of the influence functions in the past h generations. At the same time, even the most poorly performing influence function has chance to impact the current population.

As illustrated in (16), in the influence phase, among normative knowledge, situational knowledge, the combination of normative and situational knowledge, and the mutation scheme, one influence strategy is chosen to accomplish the evolution of the current population in each generation. The selection is based on the performance of the influence strategies stored in the historical knowledge.

$$ \begin{aligned} x_{ij}^{\text{new}} = \left\{ \begin{aligned} & x_{ij}^{\text{new\_s}}, && \text{if situational knowledge is selected} \\ & x_{ij}^{\text{new\_n}}, && \text{if normative knowledge is selected} \\ & x_{ij}^{\text{new\_ns}}, && \text{if normative and situational knowledge is selected} \\ & x_{ij}^{\text{new\_t}}, && \text{if topographical knowledge is selected} \\ & x_{ij}^{\text{new\_m}}, && \text{if mutation scheme is selected} \end{aligned} \right. , \end{aligned} $$
(16)

where \(x_{ij}^{\text {new\_m}}\) denotes the new value for the j-th decision variable of the individual xi, as illustrated in Section 5.3.

To update the historical knowledge, simply add the number of the current survivals in the situational knowledge generated by each influence function to its corresponding counter.

5.3 A mutation scheme

As illustrated previously, the mutation scheme is also among the influence strategies used to alter the individuals living in a population. The mutation operation used here is analogous to the one in the GA and is described as follows. First, for each individual xi in the current population (i ∈{1,2,⋯ , N}), randomly select nμ decision variables from the total n ones to apply the mutation operation to, where nμ is calculated by

$$ n_{\mu} = n \times \mu_{m} , $$
(17)

where n is the number of decision variables and μm is the mutation percentage. Then, use the equation given below to obtain σm, i.e., the mutation step size on the j-th decision variable:

$$ \sigma_{m} = 0.1 \times (x_{j}^{\max} - x_{j}^{\min}) . $$
(18)

Thereafter, execute the mutation operation on the j-th variable of the individual xi:

$$ x_{ij}^{\text{new\_m}} = x_{ij} + \sigma_{m} \times r , $$
(19)

where r is a random scalar drawn from the standard normal distribution N(0,1). Finally, if any dimension of the mutated individual exceeds the corresponding constraints, set it to the boundary value.

Here, it should be noted that the topographical knowledge and the mutation scheme use the same form of equations to produce offspring, but the key point here is the selection of individuals to be operated on. In other words, the difference between the two strategies lies in which kind of individual is selected for variation. The topographical knowledge is used to find promising solutions and broaden the obtained front, the influence function of which is therefore applied to the individuals located on the edge of the objective space, while the mutation scheme aims to carry out fine tuning in the late stage of search, which is applied to every individual in the current population. The results prove that these two influence functions are superior in in producing nondominated solutions at different stages of evolution (see Fig. 4).

In the IMOCA, the mutation operation is not applied to the population in every generation. Instead, it plays the same role as that of the other influence functions of knowledge sources. That is, the use of mutation depends on its performance in terms of productivity in the preceding h generations stored in the historical knowledge. As the simulation results (shown in Section 7) indicate, the evolution process makes full use of the mutation operation in the last stage of search.

5.4 Pareto dominance and the fast nondominated sorting approach

For MOOPs, since there is more than one decision value to be considered, it is not as simple as for SOOPs to compare the performances of different solutions with less computational effort. Hence, adopting an efficient sorting strategy among solutions is one of the significant aspects of an MOOP optimization algorithm. To address this problem, we utilize the fast nondominated sorting approach together with the notion of crowding distance inspired from the NSGA-II. For a detailed explanation of the two techniques, refer to [17].

5.5 Elite preservation

In the competition of individuals in a population space, because newborn individuals are always merged into the parent population once generated, no excellent parent individuals are omitted. On the other hand, during the update procedure of the situational knowledge, the nondominated solutions sorted out from the current population are in competition with all the elites previously saved in the repository to prevent the best solutions in the first PF from dying out. Furthermore, when the algorithm is terminated, the repository of the situational knowledge is saved as the final solutions to the problem at hand. In general, the IMOCA avoids any loss of the well-performing individuals that have emerged in the population space.

5.6 Communication channel

5.6.1 Acceptance function

The acceptance function is used to accept a proportion of high-performing individuals to update the knowledge in the belief space. Here, we use a very simple acceptance function: 35% of the individuals in the current population with the best nondominated ranks are accepted.

5.6.2 Influence function

The influence functions of situational, normative and topographical knowledge have already been described in detail in Section 5.2. The influence function generated from the mutation scheme is demonstrated in Section 5.3.

5.7 Implementation framework of the IMOCA

Within the EP framework mentioned above, the IMOCA works as follows:

INPUT:

  • a multiobjective problem with n decision variables and M objectives;

  • a stopping criterion;

  • the population size N.

OUTPUT: individuals stored in the current situational knowledge.

Step 1) Initialization:

Step 1.1) Set generation counter t = 0.

Step 1.2) Generate N individuals x1,⋯ , xN randomly within the range of decision variables as the initial population, where N is the size of the population. Here, each individual xi = (xi1,⋯ , xin)T, with i ∈{1,2,⋯ , N}, where n is the number of decision variables.

Step 1.3) Evaluate the N individuals in the initial population by the objective functions and obtain N corresponding objective vectors y1,⋯ , yN, where yi = (yi1,⋯ , yiM)T, with i ∈{1,2,⋯ , N}, and M is the number of objectives.

Step 1.4) Initialize the knowledge sources in the belief space according to the knowledge structure of each source, which is described in Section 5.2.

Step 2) Population Space Evolution:

Step 2.1) If the generation counter t has not exceeded h, randomly choose one influence function (among normative knowledge, situational knowledge, normative and situational knowledge, topographical knowledge, and the mutation scheme) to influence the evolution of the current population. Otherwise, carry out the selection according to the former performance of the five influence functions stored in the historical knowledge. If the topographical knowledge is chosen, update it before utilizing.

Step 2.2) For the N individuals xi with i ∈{1,2,⋯ , N}, generate N offspring xN+ 1,⋯ , x2N using the chosen influence function. Then, evaluate the new individuals and merge the old and the new ones; thus, we have 2 × N individuals in total.

Step 2.3) Conduct the nondominated sorting method on the entire 2 × N population and discard the last N individuals with larger rank values. Accordingly, the best N ones survive as the new population of the next generation.

Step 2.4) Set t = t + 1.

Step 3) Belief Space Update: Update the 3 knowledge sources (except topographical knowledge) in the belief space using the accepted individuals in the new population.

Step 3.1) Update situational knowledge.

Step 3.2) Update normative knowledge.

Step 3.3) Update historical knowledge according to the productivities of the other four influence functions in the last generation.

Step 4) Stopping Criterion: If the stopping criterion, i.e., the generation counter t exceeds the maximum amount, is satisfied, stop and output the current individuals stored in the situational knowledge as the results of the optimization. Otherwise, go to Step 2).

The implementation flowchart of the algorithm is shown in Fig. 3.

Fig. 3
figure 3

Framework of the IMOCA

Table 1 Parameter configuration in the IMOCA

6 Experimentation and related configurations

6.1 Multiobjective continuous test suites

For simulation, two classic benchmark suites WFG [24] and MaF [6] are used in the comparison between the IMOCA and 5 classic multiobjective algorithms and 3 recently proposed MOEAs. It should be noted that since the problems of WFG1, WFG2 and WFG9 are used in the MaF suites as MaF10, MaF11 and MaF12 without any change, in the experiment, we omit the redundant ones in the WFG suites. All of the 21 test instances are tri-objective problems and involve minimization of the objectives, including problems with a variety of characteristics.

6.2 Parameter configuration of the IMOCA for simulation

The parameters used in each knowledge source of the IMOCA are configured as shown in Table 1.

First, we tested a set of values for the parameters α and β, which were used in the influence function of normative knowledge and the influence function of normative and situational knowledge in (7), (8) and (9). As shown in Tables 23 and 4, the performance of the IMOCA when adopting six different combinations of the parameters α and β was evaluated by the MaF functions. In the three tables, the best indicator values are marked in bold. The results reveal that when α = 0.3 and β = 0.5, the best performance of the IMOCA could be achieved compared to the other combinations.

Then, for the influence function of topographical knowledge, the segmentation number on each dimension nGrid is specified as 10 in accordance with the literature [38]. For the parameter μt, which decides the number of decision variables to be adjusted, it is specified as 0.02 so that only one decision variable is tuned for problems with less than 50 decision variables; for problems with (50,100] decision variables, two decision variables need to be changed. For the same reason, the value 0.02 is also assigned to the parameter μm in the mutation scheme. Last, for historical knowledge, we assign h as 50 so that at every iteration, the productivity of each influence function in the last 50 iterations is a reference for selecting the influence function to be used in the current iteration. The parameter 𝜖 is set to avoid starvation of any knowledge source.

Table 2 Medians of the inverted generational distance (IGD) of 20 runs under different settings of the parameters α and β
Table 3 Medians of the hypervolume (HV ) of 20 runs under different settings of the parameters α and β
Table 4 Medians of the unary 𝜖-indicator (I𝜖1) of 20 runs under different settings of the parameters α and β

6.3 Algorithms for comparison

In this paper, we choose eight multiobjective algorithms, including five classic multiobjective algorithms (NSGA-II, PESA-II, SPEA2, MOPSO and MOEA/D) and three well-performing algorithms (NSLS, MOEA/IGD-NS and BCE-IBEA) selected from more recent research, to compare with the IMOCA in terms of the performance.

Referring to the recommendations regarding the experimental setup in the literature [6], for all nine algorithms, the population size was set to 25 times the number of objectives of the problems, and for the IMOCA, PESA-II, SPEA2 and MOEA/D, which are algorithms with external repositories, the size of the repositories was set equal to the population size. The other parameter settings of the 8 algorithms are as follows.

  • NSGA-II: crossover rate Pc = 0.7, mutation rate Pm = 0.4, mutation step size σ = 0.1;

  • PESA-II: number of grids per dimension nGrid = 7, grid inflation factor IF = 0.1, crossover rate Pc = 0.5, crossover parameter γ = 0.15, mutation rate Pm = 0.5, mutation step size σ = 0.3;

  • SPEA2: crossover rate Pc = 0.7, crossover parameter γ = 0.1, mutation rate Pm = 0.3, mutation step size σ = 0.2;

  • MOPSO: inertia weight w = 0.4, inertia weight damping rate wdamp = 0.99, personal learning coefficient c1 = 1, global learning coefficient c2 = 2, number of grids per dimension nGrid = 30, inflation rate α = 0.1, leader selection pressure β = 2, deletion selection pressure γ = 2, mutation rate Pm = 0.5;

  • MOEA/D: crossover rate Pc = 0.5, number of neighbors: 15% of the population but no larger than 15;

  • NSLS: mean value of the Gaussian distribution μ = 0.5, mutation strength σ = 0.1;

  • MOEA/IGD-NS: the archive size NA is set to 5 times the population size;

  • BCE-IBEA: the scaling factor κ = 0.05.

6.4 Experimental configurations

The IMOCA and the other 8 algorithms are written in MATLAB scripts and the testing experiments were implemented in MATLAB R2018a on a 3.1 GHz Intel Core i7 processor under macOS Mojave 10.14.4. The source codes of the NSGA-II, PESA2, SPEA-II, MOPSO and MOEA/D are available from the EMOO repository [10], and the implementations of the NSLS, MOEA/IGD-NS, and BCE-IBEA are carried out on the PlatEMO [43].

All the algorithms were run 20 times independently for each benchmark problem. Additionally, to conduct a fair comparison among the nine methods, instead of adopting a maximum number of iterations as the termination condition, we terminate each algorithm once the calculation times of the objective functions reach \(\max \limits (100 000, \ 10 000 \times n)\).

6.5 Performance metrics

Six performance metrics are adopted in this paper to evaluate the performances of all 9 algorithms in terms of both diversity and convergence.

  1. 1.

    Set Coverage (C(A, B)) [16]. It calculates the proportion of solutions in B, which are weakly dominated by the solutions in A:

    $$ C(A, B) = \frac{| \{ b \in B \ | \ \exists \ a \in A : \ a \preceq b \} |}{|B|}. $$

    The metric value C(A, B) = 1 means that all members of B are weakly dominated by A. On the other hand, C(A, B) = 0 means that no member of B is weakly dominated by A. It can be noted that C(A, B) is not necessarily equal to 1 − C(B, A) because the domination operator is not a symmetric operator. It is thus necessary to calculate both C(A, B) and C(B, A) to investigate how many solutions of A are covered by B and vice versa.

  2. 2.

    Inverted Generational Distance (IGD). This measure was proposed by [52]. It is similar to the generational distance (GD) and formulated as follows:

    $$ IGD = \frac{{\sum}^{n}_{i=1} {d_{i}^{2}} }{n} , $$

    where n is the number of true Pareto optimal solutions and di indicates the Euclidean distance between the i-th true Pareto optimal solution and the closest obtained Pareto optimal solutions in the reference set. The Euclidean distance between obtained solutions and the reference set is different here. For the IGD, the Euclidean distance is calculated for every true solution with respect to its nearest obtained Pareto optimal solutions in the objective space.

  3. 3.

    Hypervolume (HV ) [48]. This metric calculates the volume (in the objective space) covered by members of the solution set A. For each solution xA, a hypercube vi is constructed with a reference point W and the solution itself as two diagonal corners of the hypercube. Thereafter, the hypervolume of all the hypercubes is calculated as

    $$ HV = volume \left( \cup^{| A |}_{i=1} v_{i} \right). $$

    Obviously, an algorithm with a larger HV is desirable.

  4. 4.

    Unary 𝜖-indicator (I𝜖1) [52]. This indicator is based on the binary 𝜖-indicator, which is defined as

    $$ I_{\epsilon}(A, \ B) = \inf \limits_{\epsilon \in \mathbb{R}} \{ \forall \ \mathbf{z}^{2} \in B, \ \exists \ \mathbf{z}^{1} \succeq_{\epsilon} \mathbf{z}^{2} \} $$

    for any two approximation sets A, B ∈Ω. For the unary 𝜖-indicator, substitute the set B with a reference set of points:

    $$ I_{\epsilon 1}(A) = I_{\epsilon}(A, \ P) , $$

    where P is the true PF set. If P is unknown, any reference set R can be used instead.

  5. 5.

    Spacing (S) [16]. This is a relative distance measure between consecutive solutions in the obtained nondominated set:

    $$ S = \sqrt{ \frac{1}{|Q|} \sum\limits_{i=1}^{|Q|} {(d_{i} - \bar{d})}^{2} } , $$

    where \( d_{i} = \min \limits _{k \in Q \wedge k \neq i} {\sum }_{m=1}^{M} |{f_{m}^{i}} - {f_{m}^{k}}| \) and \(\bar {d} = {\sum }_{i=1}^{|Q|} d_{i} / |Q|\). Obviously the metric S calculates the standard deviations of different di values. Therefore, an algorithm finding a set of a nondominated solutions having smaller spacing S is better.

  6. 6.

    Spread (Δ) [16]. This metric reflects the extent of spread:

    $$ {{\varDelta}} = \frac{{\sum}_{m=1}^{M} {d_{m}^{e}} + {\sum}_{m=1}^{|Q|} |d_{i} -\bar{d}| }{{\sum}_{m=1}^{M} {d_{m}^{e}} + |Q| \bar{d}} , $$

    where di can be any distance measure between neighboring solutions, \(\bar {d}\) is the mean value of these distance measures, and \({d_{m}^{e}}\) is the distance between the extreme solutions of P and Q in the m-th objective function. For an ideal distribution of obtained solutions, Δ = 0.

The set coverage metric is a binary measure, while the inverted generational distance, hypervolume and unary 𝜖-indicator are unary Pareto-compliant indicators [19]. The other two indicators, spacing and spread, are unary but non-Pareto-compliant metrics evaluating the distribution and diversity, respectively.

7 Results and discussions

7.1 Performance of knowledge sources in the IMOCA

To examine the performances of the knowledge sources used in the IMOCA, we examine the elite solutions saved in the situational knowledge in different search phases in a random run when solving a basic test problem ZDT1 from the classic multiobjective test suite ZDT [50].

We use a population size of 50 and take a number of objective functions M = 2 and number of decision variables n = 30 for ZDT1. Figure 4 plots the minimum values of the sum of all the objective functions among all the solutions saved in the current situational knowledge. The abscissa is based on the logarithm of the iteration so that the performance of each knowledge source in the early stage of the iteration can be seen clearly. It should be stated here that in this experiment, we set the terminal condition as a maximum generation of 300 because for ZDT1, convergence can be achieved by the IMOCA before the 300th iteration; in the competition experiments between the IMOCA and the other chosen algorithms, to guarantee that most of the algorithms have enough chances to search for solutions, we terminate the algorithms according to the calculation times of the objective functions.

Fig. 4
figure 4

The minimum values for \({\sum }_{m=1}^{M} f_{m}(\mathbf {x})\) when optimizing ZDT1 using the IMOCA (iteration t ∈ [1,300])

To better analyze how each knowledge source guides the search process, in Fig. 4, the knowledge sources that produce the minimum sum of objective functions are indicated in different colors and shapes. It is obvious from the figure that all five influence functions made efforts to find new promising individuals. At the 5th iteration, topographical knowledge first found promising solutions different from the initial ones, which were generated randomly in the first iteration. As the optimization process progressed, the five influence functions took turns generating individuals with small values of \({\sum }_{m=1}^{M} f_{m}(\mathbf {x})\). From the 30th iteration, the mutation scheme and the influence functions of topographical knowledge began to take up a larger portion of the producers of promising individuals. From the 43th to the 90th iterations, situational knowledge also played a significant role in generating new elite solutions. Similar performances of the five influence functions were also observed when solving the 15 MaF instances (see Appendix).

In [3], the productivity of different knowledge sources was also observed by plotting the minimum value of \({\sum }_{m=1}^{M}\) fm(x) for all the individuals x in the population in each iteration. By comparing the figures given in both [3] and our paper, the implementation of topographical knowledge in [3] produced very few promising individuals, while that in the IMOCA contributed to finding better solutions, especially in the middle and late stages of evolution.

In a nutshell, it can be concluded that the effectiveness of all the adopted influence functions is verified. In the late stage of evolution, the mutation scheme seems to be more productive in promoting the quality of a population. It can be also noted that the productivity of the adapted topographical knowledge in the proposed IMOCA is much better than that of the topographical knowledge used in the MOCA in [3].

7.2 Results on the test functions

7.2.1 Obtained nondominated solutions

According to the simulation experimentation, in which the obtained Pareto optimal solutions of the 9 algorithms in 20 runs are compared, the IMOCA is able to converge to the true PFs of most of the 21 test problems. Here, three typical simulation results of the test problems MaF4, MaF14 and WFG6 are given in Fig. 5a, b and c, respectively. In the three figures, the true PFs of these three functions are plotted as gray surfaces, and the nondominated solutions obtained in the 20 runs are represented by small circles in different colors (the solutions of each run correspond to each of the different colors).

Fig. 5
figure 5

Non-dominated solutions obtained by 9 algorithms in 20 independent runs in the objective space of MaF4, MaF14 and WFG6. a Non-dominated solutions obtained by 9 algorithms in 20 independent runs on MaF4. b Non-dominated solutions obtained by 9 algorithms in 20 independent runs on MaF14. c Non-dominated solutions obtained by 9 algorithms in 20 independent runs on WFG6

Table 5 Medians of the inverted generational distance (IGD) and the ranks achieved by the Friedman test (A: Algorithms, P: Problems)

MaF4 is a concave, multimodal, and badly scaled optimization problem with no single optimal solution in any subset of the objectives. Its PF surface is a part of a sphere that intersects at the following coordinates: (0,4,8), (2,4, 0), and (2,0,8). From Fig. 5a, it can be observed that hardly any of the solutions found by the algorithms occurred on or near the true Pareto front in the 20 runs, but the axis tick values in each subfigure in Fig. 5a indicate that the solution set obtained by the IMOCA is the one nearest the true Pareto front of MaF4. This reveals that the IMOCA is more capable of dealing with badly scaled PFs.

The test problem of MaF14 is linear but with a complicated fitness landscape. Its decision variables are nonuniformly correlated with different objectives, and the decision variables are partially separable. Similarly, from the axis labels in Fig. 5b, it can be inferred that only the proposed algorithm and NSLS approached the Pareto front of MaF14, which is a triangle plane spanned by the points [0,1,0], [1,0,0] and [0,0,1]. Compared to those of the NSLS, the solutions obtained by the IMOCA seemed to cover a larger portion of the Pareto front and were relatively evenly distributed, as there were at least four sets of solutions that reached the upper part of the Pareto plane instead of falling into a single straight line in the xy-plane.

WFG6 is a concave problem with nonseparable reduction. The obtained solutions of the 9 algorithms on WFG6 are depicted in Fig. 5c, where almost all 9 algorithms found solutions near the Pareto sphere surface in the 20 runs. Under this circumstance, the distribution of the solutions becomes notable when comparing the performances of the multiobjective algorithms. According to the figure, the solutions obtained by the IMOCA covered most of the Pareto front without any omission, while there was always more or less space without coverage in the cases of the other 8 algorithms.

7.2.2 Statistical results on the pareto-compliant metrics

To quantitatively observe the quality of the solutions obtained by the proposed IMOCA, the median values of the three Pareto-compliant indicators (IGD, HV and unary 𝜖-indicator) for the 21 test problems over 20 runs were computed, as shown in Tables 56 and 7, respectively. The Friedman test was also performed on the results, as recommended in [18].

Table 6 Medians of the hypervolume (HV ) and the ranks achieved by the Friedman test (A: Algorithms, P: Problems)
Table 7 Medians of the unary 𝜖-indicator (I𝜖1) and the ranks achieved by the Friedman test (A: Algorithms, P: Problems)
Table 8 Medians of time consumed by each algorithm (A: Algorithms, P: Problems; time unit: second(s))

In Table 5, the result of the Friedman test reflected the IGD values of the 9 algorithms on the 21 test problems from an overall perspective, and the proposed IMOCA seemed to have an average performance on this metric. However, the p-value on the Friedman test was larger than 0.01 (but still smaller than 0.05), which indicated that there was not an explicit significance in terms of the difference between the IGD values of the 9 algorithms. Moreover, if we consider the Friedman mean rank, there was not a large difference between the rank values of the first five algorithms, while the rank differences in Tables 6 and 7 were considerably large. This occurred because the algorithms performed disparately in terms of the IGD for the different test problems. For example, the IMOCA also had a better IGD performance on several problems, such as MaF3, MaF5, and MaF10, than that of the first-ranked algorithm NSLS.

All of three metrics could measure both the diversity and convergence of an obtained nondominated set in a sense. Specifically, the IGD indicator calculates the average distance from the uniformly distributed points along the PF to the obtained approximation set; HV calculates the area covered by the approximation set with respect to a set of predefined reference points; and the unary 𝜖-indicator calculates how far the obtained set should move to cover every part of the true PF. If an algorithm has a superior performance in terms of the HV and unary 𝜖-indicator but an average performance on the IGD indicator, the reason may be the uneven distribution of the obtained nondominated set.

Therefore, combined with the results in Tables 6 and 7, which show that the IMOCA outperformed its competitors in terms of the HV and unary 𝜖-indicator with statistical significance, it could be concluded that the solutions obtained by the IMOCA have a wider distribution along the whole PF with better diversity but are distributed somewhat unevenly on some of the test problems. The superiority of the proposed algorithm over the other ones may be generally attributed to its full use of different knowledge categories at different stages of the evolution according to their productivity.

7.2.3 Set coverage

As illustrated in Section 6.5, when evaluating the solution sets obtained by algorithms A and B, it is necessary to compute both C(A, B) and C(B, A). To compare the quality of the solution sets obtained by the different algorithms, or more specifically, to compare the dominance relationship between the two final populations resulting from two different algorithms, a pairwise comparison for the metric C(A, B) is made based on the solutions obtained by the IMOCA and four representative algorithms, i.e., NSGA-II, NSLS, MOEA/IGD-NS and BCE-IBEA (see Fig. 6a, b and c). In this comparison, we selected the first four algorithms among the remaining 8 algorithms based on the statistical results of the hypervolume and unary 𝜖-indicator.

Fig. 6
figure 6

Box plots of pairwise comparison of set coverage C(A, B) based on 20 independent runs obtained by the 9 algorithms. a Box plots of pairwise comparison of set coverage C(A;B) on MaF1-MaF8. b Box plots of pairwise comparison of set coverage C(A;B) on MaF9-MaF15. c Box plots of pairwise comparison of set coverage C(A;B) on WFG3-8

In Fig. 6a, the subfigures in the first row starting with “IMOCA” are the box plots of the set coverage of the IMOCA on test problems MaF1-MaF8. For example, the subfigure in row 1 and column 2 (the box on the right side of “IMOCA” in Fig. 6a) denotes the proportion of solutions obtained by the IMOCA dominated by the ones obtained by the NSGA-II on the 8 test problems. Similarly, the subfigure in row 2 and column 2 (the box on the left side of “NSGA-II”) denotes the proportion of solutions obtained by the NSGA-II dominated by the ones obtained by the IMOCA. According to these two boxes, it can be inferred that, except for problem MaF8, the IMOCA found better solutions than those found by the NSGA-II, especially for problems MaF3 and MaF4. The results on the other 13 test problems are given in Fig. 6b and c.

From a comparison of the first rows and the first columns in Fig. 6a, b and c, it can be concluded that the solution sets of the IMOCA outperform those of the other four competitive algorithms in producing nondominated solutions when solving most of the 21 test problems, whereas for MaF8/9/10 and WFG7, the solution sets of the IMOCA are worse.

7.2.4 Spacing and spread

Last, we examine the performance of the proposed IMOCA in terms of the distribution and diversity by the spacing and spread metrics, respectively. The box plots of the spacing S are given in Fig. 7a, b and c, while the box plots of the spread Δ are given in Fig. 8a, b and c.

Fig. 7
figure 7

Box plots of metric Spacing S based on 20 independent runs obtained by the 9 algorithms in solving MaF and WFG problems. a Box plots of metric Spacing S on MaF1-8. b Box plots of metric Spacing S on MaF9-15. c Box plots of metric Spacing S on WFG3-8

Fig. 8
figure 8

Box plots of metric Spread Δ based on 20 independent runs obtained by the 9 algorithms in solving MaF and WFG problems. a Box plots of metric Spread (insert symbol here) on MaF1-8. b Box plots of metric Spread ((insert symbol here) on MaF9-15. c Box plots of metric Spread ((insert symbol here) on WFG3-8

These figures show that the IMOCA has lower or nearly equal values of both the spacing and spread indicators compared with those of the other algorithms on most of the problems. Considering the excellence that the IMCOA reveals in terms of the hypervolume, unary 𝜖-indicator and set coverage metric C(A, B), the performance of the IMOCA in terms of the spacing and spread indicates that the solution sets obtained by the IMOCA approach the PFs with both accuracy and diversity in most cases.

7.2.5 Time consumption

The time consumed by the IMOCA and other five algorithms in the comparison is given in Table 8. Since the rest three algorithms are carried out on another platform (the PlatEMO platform), the time consumptions are not commensurable and thus are not listed here. It can be seen from Table 8 that there is no big difference between the time consumptions of the IMOCA and the NSGA-II, which due to the fact that the computational complexity of the influence functions and the sorting part in the IMOCA is the same as the corresponding operations in the NSGA-II.

Overall, the experimental results confirm that, with several simple but necessary modifications, the IMOCA is competitive in capturing well-distributed, near-optimal nondominated solutions under acceptable execution time when solving MOOPs.

8 Conclusions and future work

In this paper, we proposed a modified version of the CA with a multistrategy knowledge base for solving multiobjective optimization problems. Four basic knowledge sources, i.e., normative, situational, topographical and historical knowledge, are adopted in the knowledge base of the proposed algorithm, and necessary modifications were made to the situational, historical and topographical knowledge. In addition, a simple mutation scheme was integrated into the algorithm to perform fine tuning in the late stage of evolution. The experimental results showed that, on most of the selected test problems, the proposed IMOCA was able to converge better to the true PFs and provided a wide and well-distributed spread of solutions for most of the problems compared to the other 8 multiobjective algorithms.

There is still abundant room to refine the proposed algorithm in future work. Inspired by the performance metric computation used in the elimination of noncontributing solutions in the MOEA/IGD-NS, exploitation of the performance evaluation in the selection operation in the IMOCA may further improve the quality of the obtained solution set. Moreover, given the fact that the framework of the CA can integrate any form of evolutionary technique, exploring other effective and efficient knowledge categories for solving MOOPs will also be addressed in future work. Furthermore, attempts at applying the IMOCA to practical MOOPs could be made in the near future. In addition, since the proposed IMOCA is an unconstrained search approach, developing an efficient mechanism to deal with the constraints in MOOPs could be a future direction of research.