1 Introduction

In this paper we focus on problems that optimize several objectives at the same time. Most of the time these objectives conflict with each other, meaning that optimizing one objective implies a loss of performance in another. As the number of objectives and of items under analysis increases, the complexity of the problem grows considerably, in particular the time the problem takes to solve, due to the large number of possible choices. We seem to have intuitive knowledge of this complexity. From a psychological point of view this may have the negative impact of increasing anxiety (Schwartz 2005). Interestingly, as the amount of choice increases, so do the artifacts people use for coping with complexity.

MOEAs (Deb 2009) solve multiobjective optimization problems, which occur in a wide range of domains: scheduling, economics, finance, automatic cell planning, the traveling salesman problem, etc. Updated surveys on these algorithms are readily available (Goh et al. 2009; Zhou et al. 2011). There is a class of MOEAs in which we are particularly interested because they use indicators to guide their decisions; in particular they may use the hypervolume (Knowles and Fleischer 2003; Emmerich et al. 2005; Huband et al. 2003; Zitzler and Künzli 2004).

In this paper we obtain the following results:

  • A new algorithm that computes the exclusive hypervolume of every point in a set of n points. The algorithm is an extension of the QHV algorithm (Russo and Francisco 2014). The naive way to obtain the same result is to run QHV n times, omitting one point at a time. The solution we present is more efficient: it is slower than QHV, but only by a factor of about 3.

  • We propose a way to speed-up the QHV algorithm by using parallel computation. Parallelism can be combined with the divide and conquer methodology of QHV. We design and implement a solution that has minimal communication overhead, and moreover allows the user to restrict the number of threads it uses.

2 The problem

This section gives a description of the problems we study. Given a set of d-dimensional points, we first consider the hypervolume of the dominated space. Figure 1 shows a set of points and the respective 2D hypervolume, commonly referred to as area. We define the problem in 2D, but it generalizes easily to higher dimensions. The region of space under consideration is delimited by a rectangle with opposing vertexes z and o, close to 0 and 1, respectively. We consider only rectangles that are parallel to the axes. In general we use hyper-rectangles, instead of rectangles, to define dominance between points. The point z is always a vertex of these hyper-rectangles.

Fig. 1 The area of a set of 2D points

The coordinates can be any reals in [0, 1]. Our algorithm uses a sub-routine to eliminate dominated points, so we do not insist on having a set of non-dominated points.

Point p dominates point \(a'\) because \(a'\) is contained in the rectangle of vertexes z and p. Notice that we cannot state that d dominates \(a'\), since the rectangle with vertexes z and d does not contain \(a'\). Likewise point \(b'\) is dominated simultaneously by points a and b.

Given a set S of points, in our example \(S = \{a, a', b, b', p, c, d\}\), we want to compute the dominated area, shown shaded in Fig. 1. We represent this value as \({\textsc {HV}}(z, o, S)\).

The mathematically inclined reader may prefer formal definitions. Consider the d dimensional space \([0,1]^d \subseteq (\mathrm {I\!R}_0^{+})^{d}\). Let us assume that z is the point where all coordinates are 0. If \(p = (p_1, \ldots , p_d) \in [0,1]^d\) and \(a = (a_1, \ldots , a_d) \in [0,1]^d\) we say that p dominates a, denoted \(a \preceq p\), if \(a \ne p\) and \(a_i \le p_i\) for all \(i \in \{1, \ldots , d\}\). The hyper-rectangle associated with p is the set \(\mathcal {H}(p) = \{ q \in [0,1]^d \mid q \preceq p\}\). Given a set of points \(\mathcal {S} \subseteq [0,1]^d\), the hypervolume problem asks for the value of \(\lambda (\cup _{r \in \mathcal {S}} \mathcal {H}(r))\), where \(\lambda \) represents the Lebesgue measure of the set. For computational reasons \(\mathcal {S}\) should be finite; we use n to represent its size.
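For concreteness, in 2D the hypervolume can be computed directly with a simple sweep; this is only an illustration of the definition, not the QHV algorithm, and the point coordinates below are illustrative. A minimal C++ sketch:

#include <algorithm>
#include <cstdio>
#include <vector>

struct Pt { double x, y; };

// Area dominated by S in [0,1]^2 with reference point z = (0,0).
// Sweep the points by decreasing x; every point that improves the best
// y seen so far contributes a rectangular slab of width equal to its x.
double hv2d(std::vector<Pt> S) {
    std::sort(S.begin(), S.end(),
              [](const Pt& a, const Pt& b) { return a.x > b.x; });
    double area = 0.0, bestY = 0.0;
    for (const Pt& p : S) {
        if (p.y > bestY) {
            area += p.x * (p.y - bestY);
            bestY = p.y;
        }
    }
    return area;
}

int main() {
    std::vector<Pt> S = {{0.5, 0.4}, {0.3, 0.7}, {0.6, 0.2}};
    std::printf("HV = %f\n", hv2d(S)); // 0.6*0.2 + 0.5*0.2 + 0.3*0.3 = 0.31
    return 0;
}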

Let us now consider the exclusive hypervolume of each point, i.e., for each point \(p'\) we consider the area that is dominated exclusively by \(p'\) and by no other point of \(\mathcal {S}\). For a point \(p'\) the respective exclusive hypervolume is defined as:

$$\begin{aligned} {\textsc {XV}}(p', z, o, S) = {\textsc {HV}}(z, o, S) - {\textsc {HV}}(z, o, S\setminus \{p'\}) \end{aligned}$$
(1)

Figure 2 shows the exclusive area of each point. This area may be a hyper-rectangle, but it need not be. For example the exclusive hypervolume of p is not a hyper-rectangle, because of point \(a'\). This means that some dominated points are relevant. More precisely, the points that are dominated only once are relevant, whereas the points that are dominated by more than one other point are not. These latter points do not restrict the exclusive areas. For example, point \(b'\) could potentially affect the exclusive hypervolume of points a and b, but it does not, precisely because it is dominated by more than one point.

Fig. 2 The exclusive area of a set of 2D points

Notice that, in general, adding up the \({\textsc {XV}}(p', z, o, S)\) values for all points \(p'\) does not necessarily yield \({\textsc {HV}}(z, o, S)\). Observe that the exclusive areas in Fig. 2 do not cover all the grey area of Fig. 1.

It is common for MOEAs to select which individuals to discard by determining which point or points have the smallest exclusive hypervolume. This motivates our exQHV algorithm. After determining the exclusive hypervolume of all points, the point with the minimum exclusive hypervolume can be found with a linear scan. Notice that our approach is still extremely efficient, because using QHV to compute Eq. (1) for more than a single point is not more efficient than using exQHV; recall the factor of 3 mentioned in our first result in Sect. 1. On the other hand exQHV does not consider interaction among points, hence it is not designed to determine the set of k points that has maximal, or minimal, hypervolume. Still we can determine the k points with the smallest exclusive hypervolume, by keeping k elements in a binary max-heap while scanning, or in linear time with high probability (Hoare 1961; Floyd and Rivest 1975; Kiwiel 2005).
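As an illustration of this selection step (the function and variable names are ours, not those of the exQHV prototype), the k smallest contributors can be extracted from the array of \({\textsc {XV}}\) values with std::nth_element, in expected linear time:

#include <algorithm>
#include <cstdio>
#include <vector>

// Given the exclusive hypervolume of every point (e.g. as returned by
// exQHV), return the indexes of the k smallest contributors.
// The names here are illustrative only.
std::vector<int> kSmallestContributors(const std::vector<double>& exHV, int k) {
    std::vector<int> idx(exHV.size());
    for (int i = 0; i < (int)idx.size(); ++i) idx[i] = i;
    std::nth_element(idx.begin(), idx.begin() + k, idx.end(),
                     [&](int a, int b) { return exHV[a] < exHV[b]; });
    idx.resize(k);
    return idx;
}

int main() {
    std::vector<double> exHV = {0.12, 0.03, 0.27, 0.05, 0.40};
    for (int i : kSmallestContributors(exHV, 2))
        std::printf("discard point %d (XV = %f)\n", i, exHV[i]);
    return 0;
}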

To compute hypervolumes, and exclusive hypervolumes, as efficiently as possible, we consider parallel computation. This form of computation is available in commodity hardware: dual and quad core processors are nowadays fairly common, and current industry trends point towards increasing the number of cores. Therefore we design a way to distribute the computation of the hypervolume across several cores. We show that our solution is efficient, meaning that fixed and strictly sequential costs are minimized, as far as possible. This is particularly important as the number of cores increases. Moreover we also present a design that makes a reasonable use of resources, meaning that it does not consume all the cores in the system; our solution allows us to specify how many cores to use. We show that the divide and conquer methodology of QHV can be exploited to design efficient parallel versions, provided proper technical design decisions are implemented.

3 Extensions

In this section we present our solution to compute the exclusive hypervolume and describe an architecture for the parallel computation of QHV and exclusive QHV. We focus our discussion on the 2D case for the sake of clarity. The solution for higher dimensions follows from the exposition in the original QHV paper (Russo and Francisco 2014), by exploiting the structure of higher dimension space and by using projections, Hasse diagrams, and optimizations for small sub-problems. In the next section we detail our solution for exclusive hypervolumes at higher dimensions. Let us start with a brief overview of 2D QHV.

3.1 Quick hypervolume

Pivot divide and conquer is the technique used by QuickSort (Hoare 1961) and QHV. The process consists of the following three steps:

  1. Select a “special” pivot point. This point is processed and excluded from the recursion.

  2. Divide the space according to the pivot; more precisely, classify the points into the possible space regions.

  3. Recursively solve the points in each of the “smaller” sub-regions of space, and add up the hypervolumes.

Figure 3 shows an example of this process, in 2D. First we choose point p to be the pivot. Second we divide the rectangle, of vertexes z and o, according to p. Third we recursively compute the area of the points in quadrants 01 and 10. Let us now focus on the details of each step. In the first step it is necessary to choose a pivot p that is not dominated by any other point \(p'\). Naively comparing all pairs of points would yield \(O(n^2)\) time. Instead, for every point \(p'\) we compute the area of the rectangle with vertexes z and \(p'\), and choose p to be the point with the largest dominated area. Hence p is not dominated by any other \(p'\), meaning that quadrant 11 will contain no other point.

Fig. 3 Pivot divide and conquer for 2D points. The quadrants are labeled by binary numbers

The space is divided into quadrants, according to the pivot. The quadrants are labeled in binary. The first bit is used for the x coordinate and the second bit for the y coordinate. A 0 indicates that, for the given coordinate, the points of the quadrant are between z and p. Otherwise the points of the quadrant are between p and o and we use a 1. In our example with \(p = (0.5,0.4)\), point \(c = (0.6,0.2)\) is in the quadrant 10, because \(0.5 < 0.6\) and \(0.4 > 0.2\). We can classify a point into a quadrant by comparing it with p.

The conquer procedure is called recursively for quadrants 10 and 01. Each recursive call uses its own bounding rectangle and the respective list of points. Assume that the points \(z'\) and \(o'\) are the vertexes of the bounding rectangle of quadrant 01 and likewise points \(z''\), \(o''\) bound quadrant 10. The total area is obtained by adding the results of the smaller sub-problems with the area of the pivot. In our example this yields:

$$\begin{aligned} {\textsc {QHV}}(z, o, \{a, a', b, b', p, c, d \}) ={}& {\textsc {QHV}}(z', o', \{a, b, b'\}) \\ &+ {\textsc {QHV}}(z'', o'', \{c, d\}) \\ &+ 0.5 \times 0.4 \end{aligned}$$
(2)

Notice that point \(a'\) is discarded because it belongs to quadrant 00, and is therefore dominated by p. This is one way by which QHV identifies dominated points. Using the same analysis as QuickSort we can show that, for 2D, QHV requires \(O(n \log n)\) expected time. Let us now explain how to modify QHV to compute exclusive hypervolumes.
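To summarize this section, the following compact C++ sketch captures the 2D recursion just described. It omits the optimizations for small sub-problems and the engineering of the actual QHV implementation; the names are illustrative only, and the bounding boxes follow the z', o' and z'', o'' convention used above:

#include <vector>

struct Pt { double x, y; };

// Recursive 2D QHV sketch: pivot on the point with the largest dominated
// area, split the remaining points into quadrants 01 and 10, discard
// quadrant 00 (dominated by the pivot) and recurse on shrunken bounding
// boxes. The real QHV code switches to other algorithms for small sets.
double qhv2d(Pt z, Pt o, std::vector<Pt> S) {
    if (S.empty()) return 0.0;
    // 1. pivot = point with the largest area relative to z
    auto area = [&](const Pt& q) { return (q.x - z.x) * (q.y - z.y); };
    int pi = 0;
    for (int i = 1; i < (int)S.size(); ++i)
        if (area(S[i]) > area(S[pi])) pi = i;
    Pt p = S[pi];
    // 2. classify the remaining points
    std::vector<Pt> q01, q10;
    for (int i = 0; i < (int)S.size(); ++i) {
        if (i == pi) continue;
        bool bx = S[i].x > p.x, by = S[i].y > p.y;
        if (!bx && by) q01.push_back(S[i]);        // quadrant 01
        else if (bx && !by) q10.push_back(S[i]);   // quadrant 10
        // (!bx && !by): quadrant 00, dominated by p, discarded
        // (bx && by) cannot happen: it would contradict the pivot choice
    }
    // 3. conquer: pivot rectangle plus the two sub-problems
    return area(p)
         + qhv2d(Pt{z.x, p.y}, Pt{p.x, o.y}, q01)   // z', o' of quadrant 01
         + qhv2d(Pt{p.x, z.y}, Pt{o.x, p.y}, q10);  // z'', o'' of quadrant 10
}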

3.2 Exclusive hypervolume

The simplest approach to compute exclusive hypervolumes would be to run QHV \(n+1\) times, using the definition of exclusive hypervolume directly, i.e., Eq. (1). This expression can be computed with any hypervolume algorithm. However it multiplies the overall time by O(n), thus compromising performance.

We solve this problem by modifying the original QHV approach. Instead of computing only one hypervolume we attempt to simultaneously compute all the exclusive hypervolumes. Consider Fig. 3 and notice that the exclusive hypervolume of a is entirely contained in quadrant 01. More precisely we have that:

$$\begin{aligned} {\textsc {XV}}(a, z, o, \{a, a', b, b', p, c, d \}) = {\textsc {XV}}(a, z', o', \{a, b, b'\}) \end{aligned}$$
(3)

Therefore we can recursively solve our problem by computing the exclusive hypervolumes of points \(\{a, b, b'\}\) in quadrant 01 and of points \(\{c,d\}\) in quadrant 10. This means that, in 2D, the \({\textsc {XV}}\) values can be coherently computed, with the divide and conquer technique, by restricting the calculation to the quadrants to which the points are assigned. The only problematic point is p, because it is not assigned to a quadrant, since p is the pivot. A solution to this problem would be to compute QHV without p and use the definition of exclusive hypervolume, Eq. (1). Instead we propose a uniform approach: we alter the pivot selection criterion and choose point \(p'\) of Fig. 4 as the pivot.

As before we start by choosing p as the point with the largest hypervolume. We then intersect all other points with p and choose \(p'\) to be the intersection with the largest volume. In this case, by intersecting p with b we obtain point \(p' = (0.3, 0.4)\), with hypervolume 0.12. Therefore we have that

$$\begin{aligned} {\textsc {XV}}(p, z, o, \{a, a', b, b', p, c, d \}) = {\textsc {XV}}(p, z'', o'', \{a', p, c, d\}) \end{aligned}$$
(4)

Notice that \(p'\) does not appear in the point set; by choosing a point that was not in the original set we solve the previous problem. It is no longer necessary to compute the exclusive hypervolume of the pivot. Choosing \(p'\) as the pivot keeps the algorithm simple. It may seem that this choice would guarantee that we could compute exclusive hypervolumes in the same time as QHV. This is not the case. First, QHV is highly sensitive to the choice of pivot, and selecting a pivot that is slightly before p slows down the performance. A paradigmatic example of why this happens is \(a'\). This point is dominated by p but not by \(p'\). Therefore, according to p, it would be classified into quadrant 00, whereas according to \(p'\) it is classified into quadrant 10. This significantly affects the overall time. Second, do not forget that we need to compute several exclusive hypervolume values, instead of a single hypervolume value. Although our division reduces the time to compute a given exclusive hypervolume, for very small sub-problems we resort to other algorithms, which therefore have to handle repeated computation. The time to compute small sub-problems is a very significant fraction of the total time.

Fig. 4 Exclusive pivot divide and conquer for 2D points

Our new criterion has an important side-effect. If the only two existing points were \(a'\) and p, the resulting pivot would be \(a'\), which is the correct choice. Still, in this example, the encapsulation property that we identified above does not hold: that property allowed us to restrict the computation of the \({\textsc {XV}}\) value of a point to the same quadrant to which the point was assigned. This example illustrates that the encapsulation technique needs to be refined, particularly for spaces with higher dimensions. To solve the problem in 2D, the \({\textsc {XV}}\) of p would be divided and it would therefore be necessary to add it up across sub-problems. Section 4 gives a detailed explanation and also handles higher dimensions.

Let us now consider the stability of our algorithm. Computing \({\textsc {XV}}\) can lead to numerical difficulties for some instances of the problem. Consider the set of points (0.5, 0.5), (0.45, 0.5), (0.5, 0.45). The \({\textsc {XV}}\) of the first point is \(0.05^2=0.0025\). Computing this value with Eq. (1) is problematic due to the cancellation effect of the subtraction. Extending our example to 7D, by assigning 0.45 to one coordinate at a time, the resulting \({\textsc {XV}}\) value of the first point is \(0.05^7\), which evaluates to around 7.812E-10, using a float. If we evaluate Eq. (1) as \(0.5^7 - (0.5^7 - 0.05^7)\), we obtain 9.31E-10, i.e., a significant deviation from the actual value. Therefore as the number of dimensions increases this problem becomes more significant.

To minimize precision problems it is better to compute subtractions among single coordinates, and multiply the resulting values, rather than take subtractions of products, i.e., of hypervolumes, as in Eq. (1). The algorithm we propose works in this stable way. Reducing the bounding rectangle to \(z'\) and \(o'\), or to \(z''\) and \(o''\), implies a subtraction in coordinates, and the multiplications are computed afterwards, to obtain the \({\textsc {XV}}\) values. In fact, in the current exposition, hypervolumes are never subtracted; instead they are added up.
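The following small C++ program illustrates the cancellation effect on the 7D example above; it is a numerical illustration only, not code from our prototype:

#include <cstdio>

int main() {
    // 7D example from the text: the exclusive hypervolume of the first
    // point is 0.05^7, roughly 7.8125e-10.
    float hvAll = 1.0f;          // HV(z, o, S) = 0.5^7
    double exact = 1.0;          // product of coordinate differences
    for (int i = 0; i < 7; ++i) {
        hvAll *= 0.5f;
        exact *= 0.5 - 0.45;     // subtract coordinates first, multiply later
    }
    float hvRest = hvAll - (float)exact;   // HV of the set without the first point, in float
    float viaEq1 = hvAll - hvRest;         // Eq. (1) evaluated in float
    std::printf("Eq. (1) in float         : %g\n", viaEq1);  // ~9.31e-10
    std::printf("coordinate-wise (stable) : %g\n", exact);   // ~7.8125e-10
    return 0;
}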

3.3 Parallel computation

Recall the division of \({\textsc {QHV}}(z, o, \{a, a', b, b', p, c, d \})\) at the beginning of this section. According to that equation we need to compute \({\textsc {QHV}}(z', o', \{a, b, b'\})\) and \({\textsc {QHV}}(z'', o'', \{c, d\})\). These two tasks are independent, in the sense that neither needs the result of the other to be computed. That is not to say that they do not share common data. In this example there are no shared points, but in general there might be. Still, both tasks can be computed in parallel. The same happens for the exclusive values. We focus on designing an architecture to run parallel sub-problems with minimal overhead costs.

There are several significant costs that we incur when using threads. These costs force us to minimize the number of threads that are created. In particular initialising a thread involves, amongst other tasks, zeroing an array with \(2^d\) elements. For large values of d this process consumes a significant fraction of the overall time. These variables are used as counters for assigning points to hyperoctants, see Sect. 4.1.

Limiting the total number of threads is not necessarily the same as limiting the number of threads that execute simultaneously. The task pool (Butenhof 1997) solution we used also allows us to restrict the number of simultaneously executing threads. This is important, not only because of the costs mentioned previously, but also because passing tasks to worker threads is likely to be faster than thread switching. Moreover we also set the number of worker threads to be at most the number of system cores, hence easily guaranteeing that the system remains responsive, even when processing hard hypervolumes. The trade-off in this case is that it will take longer for the computation to finish.

One way to minimize the total number of tasks that are performed by threads is to assign them a small number of lengthy tasks, for example 4 tasks to 4 cores. This would be an extreme solution. In parallel computation the overall time is determined by the core that is last to finish, and this extreme solution can compromise the performance by leading to situations where only a few processors are occupied, while many others wait. Therefore balancing the load among threads is crucial.

Instead we avoid the thread creation and initialisation costs by reusing threads, i.e., the worker threads, which are computing the QHV sub-problems, are not destroyed when they finish a task. Instead they receive a new task and process it. We therefore keep a task queue, from which worker threads pick up tasks. Worker threads are destroyed only after all tasks are computed, i.e., when the queue is empty. Figure 5 shows a diagram that represents this technique.

Fig. 5 Task pool architecture
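The following stripped-down C++ sketch illustrates the reuse idea behind Fig. 5: a fixed set of worker threads is created once and repeatedly picks tasks from a shared queue. It is only an illustration; the actual pool, described in Sect. 4.3, additionally bounds the queue, scores tasks for admission, and lets workers post sub-tasks:

#include <condition_variable>
#include <cstdio>
#include <functional>
#include <mutex>
#include <queue>
#include <thread>
#include <utility>
#include <vector>

// Minimal task pool: m worker threads are created once and reused; they
// pick tasks from a shared queue until it is drained and closed.
class TaskPool {
    std::queue<std::function<void()>> tasks;
    std::mutex mtx;
    std::condition_variable cv;
    std::vector<std::thread> workers;
    bool closed = false;

public:
    explicit TaskPool(unsigned m) {
        for (unsigned i = 0; i < m; ++i)
            workers.emplace_back([this] {
                for (;;) {
                    std::function<void()> task;
                    {
                        std::unique_lock<std::mutex> lk(mtx);
                        cv.wait(lk, [this] { return closed || !tasks.empty(); });
                        if (tasks.empty()) return;   // closed and drained
                        task = std::move(tasks.front());
                        tasks.pop();
                    }
                    task();                          // run outside the lock
                }
            });
    }
    void submit(std::function<void()> t) {
        { std::lock_guard<std::mutex> lk(mtx); tasks.push(std::move(t)); }
        cv.notify_one();
    }
    ~TaskPool() {
        { std::lock_guard<std::mutex> lk(mtx); closed = true; }
        cv.notify_all();
        for (auto& w : workers) w.join();
    }
};

int main() {
    TaskPool pool(4);
    for (int i = 0; i < 8; ++i)
        pool.submit([i] { std::printf("sub-problem %d\n", i); });
    return 0;   // the destructor drains the queue and joins the workers
}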

4 Discussion and analysis

In this section we discuss how to adapt our extensions to higher dimensional spaces and present experimental results from our prototype. We start by revising how QHV (Russo and Francisco 2014) generalizes to higher dimensions, discussing point projections, the structure of higher dimensional space, Hasse diagrams, the optimization for small sub-problems, etc. We then explain how this theory applies to \({\textsc {XV}}\). Next we focus on the task pool architecture for parallel computation. We finish this section with an experimental comparison between our prototypes and state of the art alternatives.

4.1 Higher dimensions

Let us pick up the discussion that we started in Sect. 3.1. Let us consider, for a moment, that p was not a point of the original set. This means that we were partitioning the space according to an “external” pivot. The relation in Eq. (2) would now be inaccurate, because of the \(0.5 \times 0.4\) term. This term is related to the 00 quadrant, for which the contributing area is now different. To determine the correct area we need to project the points b and c to that quadrant. Point b is projected to \(b'=(0.3,0.4)\) and point c is projected to \(c'=(0.5,0.2)\), see Fig. 6. In this case projecting point b means finding the point of the 00 quadrant that is dominated by b and that dominates all other points of that quadrant that are also dominated by b. Operationally, determining the coordinates of the projected points consists of replacing some of the coordinates with the values of the pivot. For point b we replaced the y coordinate and for point c we replaced the x coordinate. Another way to understand how projections work is to consider set intersection. Let A be the rectangle with vertexes z, b and B the rectangle with vertexes z and p. The projection \(b'\) is the upper right vertex of \(A \cap B\). This notion of projection cannot however be used when we are given the quadrant number instead of the vertex p.

In this hypothetical scenario, the last term in the equation could be replaced by \({\textsc {QHV}}(z, p, \{a', b', c'\})\). In 2D it is redundant to project the points a and d, but the same is not necessarily true in higher dimensions. One way to realize this claim is to look at Fig. 6 as the top view of a 3D space, meaning that the points may have distinct z coordinates. In this case the division lines, the ones that intersect at p, are 2D planes.

Fig. 6 Projection of points b and c to the 00 quadrant and of point f to quadrants 01 and 10

Point projections are indeed necessary for higher dimensions. Points from quadrants 01 and 10 can be projected into the 00 quadrant. During the course of the execution no point will be classified into the 11 quadrant, because such a point would dominate p and hence violate the pivot selection criterion. Still, if there existed a fictitious point f in the 11 quadrant, it could be projected into the 01 and 10 quadrants, and by transitivity into 00; the resulting point would be p, no matter which point of 11 is projected.

An important observation is that a point from the 01 quadrant cannot be meaningfully projected into the 10 quadrant, and vice versa. These properties can be represented graphically by a Hasse diagram, Fig. 7. The diagram represents the quadrants as vertexes and represents, by arrows, the fact that points from one quadrant can be projected into another. This diagram can be generalized to higher dimensions. Figure 8 shows the Hasse diagram of a 4D space.

Fig. 7 2D Hasse diagram

Fig. 8 4D Hasse diagram

Although this latter diagram looks intricate, the underlying logic is simple. Each of the nodes represents a region of space, referred to as a hyperoctant instead of a quadrant. The hyperoctant 0000 is the one that contains the bounding vertex z; likewise 1111 contains o. Hyperoctants 1110 and 1010 are adjacent in the diagram, because 1010 is obtained by flipping the second bit from 1 to 0. Adjacent hyperoctants differ by exactly one bit. This means that the points of 1110 can be projected into 1010, by replacing the second coordinate. Moreover, by also replacing the first coordinate, the points of 1110 can be projected into 0010. In general the points of a hyperoctant h can be projected into \(h'\) iff \( h\ \mathtt { \& }\ h' = h'\), where \( \mathtt { \& }\) is the bitwise AND operation. For example the points of 1110 cannot be projected into 0101, since \( 1110\ \mathtt { \& }\ 0101 = 0100 \ne 0101\). The \( \mathtt { \& }\) operation exposes that, for the last coordinate, a point of 1110 is “before” the pivot whereas a point of 0101 is “after” the pivot. Hence the last coordinate invalidates projections from 1110 to 0101.
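The projection test is a one-line bitwise operation; a small C++ illustration, reusing the examples above, follows:

#include <cstdio>

// Points of hyperoctant h can be projected into h' iff (h & h') == h',
// i.e. h' only turns bits of h from 1 to 0.
bool canProject(unsigned h, unsigned hPrime) {
    return (h & hPrime) == hPrime;
}

int main() {
    std::printf("%d\n", canProject(0b1110, 0b1010)); // 1: flip the second bit
    std::printf("%d\n", canProject(0b1110, 0b0010)); // 1: flip two bits
    std::printf("%d\n", canProject(0b1110, 0b0101)); // 0: the last bit blocks it
    return 0;
}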

In d dimensions QHV is generalized as follows:

  1. Pivot selection is essentially the same procedure as before, computing hypervolumes, instead of areas.

  2. Dividing the problem into sub-problems consists in classifying and projecting points. Point classification is essentially the same as before. Comparing \(p'\) with the pivot involves comparing all d dimensions, and now there are \(2^d\) possible hyperoctants. Moreover each hyperoctant is also assigned all possible point projections. For instance hyperoctant 0001 includes the projections of all the points assigned to 0011, 0101, 0111, 1001, 1011 and 1101. Therefore a point, by means of projections, has influence on several sub-problems.

  3. The conquer step is performed recursively, but now it is slightly more complex. There are no points in 1111, or in general in hyperoctant \(\top \), therefore there is no recursive call into this region. There is also no recursive call to 0000; in general we represent this hyperoctant as \(\bot \). For \(\bot \) we use the hypervolume of the pivot. If there are any points in this hyperoctant they are dominated by p and therefore discarded. For any other hyperoctant h we compute a recursive call.

In the recursive call it is also necessary to compute the appropriate bounding vertexes \(z'\) and \(o'\). Vertex \(o'\) is obtained as the projection of o into the respective hyperoctant. Vertex \(z'\) is a dual projection of z, meaning that the coordinates are projected away from z, instead of towards it.
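As an illustration of the classification, projection and bounding-vertex computations just described, consider the following C++ sketch. The names and the bit ordering are ours: the figures print the x bit first, whereas here bit i corresponds to coordinate i; the code is not taken from the QHV implementation.

#include <cstddef>
#include <cstdio>
#include <vector>

using Point = std::vector<double>;

// Hyperoctant label: bit i is set when coordinate i lies beyond the pivot.
unsigned label(const Point& q, const Point& pivot) {
    unsigned h = 0;
    for (std::size_t i = 0; i < q.size(); ++i)
        if (q[i] > pivot[i]) h |= 1u << i;
    return h;
}

// Project a point of hyperoctant h into hyperoctant hPrime (requires
// (h & hPrime) == hPrime): coordinates whose bit drops from 1 to 0 are
// replaced by the pivot's value.
Point project(Point q, unsigned h, unsigned hPrime, const Point& pivot) {
    for (std::size_t i = 0; i < q.size(); ++i)
        if ((h >> i & 1u) && !(hPrime >> i & 1u)) q[i] = pivot[i];
    return q;
}

// Bounding vertexes of the sub-problem for hyperoctant h: o is projected
// towards the pivot, z is projected away from it (the "dual" projection).
void bounds(unsigned h, const Point& z, const Point& o, const Point& pivot,
            Point& zSub, Point& oSub) {
    zSub = z; oSub = o;
    for (std::size_t i = 0; i < z.size(); ++i) {
        if (h >> i & 1u) zSub[i] = pivot[i];   // beyond the pivot: raise z
        else             oSub[i] = pivot[i];   // before the pivot: lower o
    }
}

int main() {
    Point z = {0, 0}, o = {1, 1}, p = {0.5, 0.4}, b = {0.3, 0.7};
    unsigned h = label(b, p);                  // the quadrant printed as 01 in the text
    Point bProj = project(b, h, 0u, p);        // projection into quadrant 00
    std::printf("b' = (%.1f, %.1f)\n", bProj[0], bProj[1]);   // (0.3, 0.4), as in Fig. 6
    Point zs, os; bounds(h, z, o, p, zs, os);
    std::printf("z' = (%.1f, %.1f), o' = (%.1f, %.1f)\n", zs[0], zs[1], os[0], os[1]);
    return 0;
}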

4.2 Exclusive hypervolume

Let us now adapt the multi-dimensional QHV to compute exclusive hypervolumes, by extending Sect. 3.2 and the previous section.

First notice that the right side of Eqs. (3) and (4) is very simple, because it contains only one term, related to a single quadrant. In a higher dimensional space the exclusive hypervolume of a given point a will span several hyperoctants. Hence the right side of those equations becomes the sum of the portions of the exclusive hypervolume of a that are spread over several hyperoctants. In other words the exclusive hypervolume of a will consist of the sum of the exclusive hypervolumes of all its projections inside the respective hyperoctants. For the general formula we use the following notation:

  • \(a\downarrow p(h)\) represents the projection of point a into the hyperoctant h, using p as the pivot. We also use this notation with a set of points in place of a.

  • \(a\downarrow p\), represents the set of all hyperoctants for which the projection of point a exists.

Hence given a set of points P, an element \(a \in P\) and some pivot p the general formula is the following:

$$\begin{aligned} {\textsc {XV}}(a, z, o, P) = \sum _{h \in a\downarrow p} {\textsc {XV}}(a\downarrow p(h), z\uparrow p(h), o\downarrow p(h), P\downarrow p(h)) \end{aligned}$$
(5)

Note that in the previous Equation the highest hyperoctant \(\top \) may belong to \(a\downarrow p\), which justifies why exQHV may consider this hyperoctant for recursion. Note also that we use the notation \(z\uparrow p(h)\), because the projection of z is upwards towards p, instead of downwards like all the other points. Another way to represent this would be as \(p\downarrow z(h)\).

Since exQHV computes the exclusive hypervolume of all points in P simultaneously, the implementation contains an array double* exHV to store these values. All the values are initialized to 0; as exQHV recursively processes its input, whenever it encounters a point \(a\downarrow p(h)\) it adds its \({\textsc {XV}}\) value to the value already in exHV for point a. These hypervolumes should be obtained at the base of the recursion of exQHV, and they are, but those sub-problems are not necessarily of size 1 or 2 points. When we proposed QHV, we pointed out that for small enough sub-problems, of around size 10, it was better to use another algorithm such as IEX or HSO. We also adapted these algorithms to compute \({\textsc {XV}}\). For HSO the value is calculated in a naive way, by removing one point at a time, which may be numerically unstable. Therefore we coded internal asserts to monitor unstable instances. For IEX we used a small optimization: the intersection of a set of points is computed only once; it is not repeated for sets where an unrelated point is removed. The overall time becomes \(\varTheta (d2^n + n2^n)\), instead of the totally naive \(\varTheta (nd2^n)\) time. Since for higher dimensions, e.g. \(d = 10\), we use \(n=10\), this optimization means that we are closer to \(20 \times 2^{10}\) operations than to \(100 \times 2^{10}\) operations. This analysis is also a bit too optimistic, since we are using SSE2 instructions to compute projections, therefore reducing the impact of the d factor.
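A minimal sketch of the accumulation just described (illustrative only; the actual exQHV code differs): each projection carries the index of its original point, so base cases can add their contribution to the proper slot of exHV.

#include <cstdio>
#include <vector>

// exHV holds one running sum per original point; whenever a base case
// computes the exclusive contribution of a projection a|p(h), it adds
// that value to the slot of the original point a. Indexes survive
// projection, so no lookup is needed.
struct SubPoint {
    int origIdx;                 // index of the original point this projection came from
    std::vector<double> coords;  // projected coordinates (unused in this sketch)
};

void accumulate(std::vector<double>& exHV, const SubPoint& q, double xvContribution) {
    exHV[q.origIdx] += xvContribution;
}

int main() {
    std::vector<double> exHV(4, 0.0);               // one slot per original point
    SubPoint proj1{2, {0.3, 0.4}}, proj2{2, {0.5, 0.2}};
    accumulate(exHV, proj1, 0.01);                  // contribution from one hyperoctant
    accumulate(exHV, proj2, 0.02);                  // contribution from another
    std::printf("XV of point 2 = %.2f\n", exHV[2]); // 0.03
    return 0;
}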

If, instead of using optimized sub-problem resolution, we used the QHV approach until there is only 1 point in each sub-problem, we could minimize the numerical instability, but the resulting algorithm would be significantly slower. Moreover the numerical problems in computing \({\textsc {XV}}\) are intrinsic, and cannot be completely attributed to the algorithm that is computing \({\textsc {XV}}\). For a given function f, an algorithm that computes f(x) is mixed stable if it computes \(y^*\) and \(f(x + \varDelta x) = y^*\) for a small \(\varDelta x\), meaning that we missed the actual result, but a small change in the input would yield the result we got. Now recall the example we gave in Sect. 3.2. Suppose we distorted (0.5, 0.5) to (0.51, 0.5); now the respective \({\textsc {XV}}\) would be \(0.0025 + 0.01 \times 0.5 = 0.0075\), which is significantly different from the original, i.e., there is a big difference between f(x) and \(y^*\).

An important optimization sub-routine of QHV was to remove dominated points, or projections, from the recursive calls into a given hyperoctant, i.e., from \(P\downarrow p(h)\). This was done because dominated points did not contribute to the hypervolume, but still required space and processing time. For the exclusive hypervolume this is not exactly the case. Recall that in Figs. 2 and 4 the point \(a'\) is dominated by point p, but it is still relevant for the \({\textsc {XV}}\) value of p. In this case the logic is that points that are dominated by at most one point are still relevant to the exclusive hypervolume computations and cannot be excluded. Still, points that are dominated by at least two points can be safely excluded; for example point \(b'\) can be safely excluded because it is dominated simultaneously by a and b. This was simple to integrate into the dominance detection algorithm, because it already compared all pairs of points; it was only a matter of keeping count. Notice also that if \(a \downarrow p(h)\) is found to be dominated in \(P\downarrow p(h)\) then its exclusive hypervolume in h is 0.
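The following C++ sketch illustrates the pruning rule (keep points dominated by at most one other point); it is a naive pairwise version matching the quadratic dominance detection mentioned above, with illustrative names and data:

#include <cstdio>
#include <vector>

using Point = std::vector<double>;

// a is dominated by b (a != b and a_i <= b_i for every i).
static bool dominatedBy(const Point& a, const Point& b) {
    bool strict = false;
    for (std::size_t i = 0; i < a.size(); ++i) {
        if (a[i] > b[i]) return false;
        if (a[i] < b[i]) strict = true;
    }
    return strict;
}

// Keep only the points dominated by at most one other point: those can
// still shape some exclusive hypervolume (like a' in Fig. 2), whereas
// points dominated twice or more (like b') cannot.
std::vector<Point> pruneForXV(const std::vector<Point>& P) {
    std::vector<Point> kept;
    for (std::size_t i = 0; i < P.size(); ++i) {
        int dominators = 0;
        for (std::size_t j = 0; j < P.size() && dominators < 2; ++j)
            if (j != i && dominatedBy(P[i], P[j])) ++dominators;
        if (dominators < 2) kept.push_back(P[i]);
    }
    return kept;
}

int main() {
    std::vector<Point> P = {{0.5, 0.4}, {0.2, 0.7}, {0.4, 0.3}, {0.1, 0.3}};
    std::printf("%zu of %zu points kept\n", pruneForXV(P).size(), P.size()); // 3 of 4
    return 0;
}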

4.3 The task pool

In this section we complete the Task Pool description, initiated in Sect. 3.3.

In our implementation the number of worker threads is set to m at the beginning of the program. Only when all tasks are finished do we terminate the threads. The workers also post tasks to the queue; basically they post the sub-problems of the task they receive. The number of tasks is potentially larger than the number of workers, meaning that when a task is added to the queue it is not necessarily solved immediately. It waits in line until a worker picks it up.

Before submitting a sub-task to the queue the worker checks the size of the task; if it is too small then the worker solves the sub-task itself, since the communication costs in that case might be too high.

The size of the task queue is limited: it accepts only a fixed number of tasks. Therefore when a task is submitted it is tested for admission. If the queue is full it is not admitted; otherwise a score is assigned to the task. The score is compared with a global value in the queue. The global value depends on the scores of the tasks that were previously submitted to the queue. If the task is not admitted to the queue then the proposing worker must solve it itself. Figure 5 shows the architecture of our task pool.

A reason to have a fixed queue size is that each task contains a fairly large task structure, which must be large enough to contain the largest tasks. Therefore restricting the queue size controls the overall amount of space consumption. This is illustrated by the empty task structures that return to the queue in Fig. 5.

Our task pool contains an extra layer, jobs. The underlying assumption, up to now, is that we are computing the hypervolume of a single set of points. Whenever we have a sequence of problems, i.e., several sets of points, it is better to process them in parallel. This allows us to build bigger tasks and reduce communication costs among threads, which is significant, because even a minimal reduction in communication costs can have a very significant impact on the overall performance. See Amdahl’s law in Sect. 5.2.

Therefore a job represents the hypervolume computation for some set of points. Our prototype uses a console thread that reads the sets of points and puts them in the job queue; moreover, whenever a job is queued it is assigned an initial task that gets queued in the task queue. This task always gets queued; the operation blocks the console until that happens. The job queue is also static, as its elements store a job structure, which must also have enough space to store large jobs. A job is finished when all of its sub-tasks are processed, therefore each job must keep track of how many of its tasks exist. Each job keeps a task counter.

A worker thread solving a task has access to data and structures from three sources.

  • The jobData, which contains information related to the job, that does not change among sub-tasks. In QHV jobData contains the coordinates of all points.

  • The taskData, which in QHV contains the indexes idx of the points, for the specific sub-problem.

  • The toolBox provides space for each worker to store tools. A tool is a structure that is useful for a worker, but does not need to be replicated for each task; moreover, tools do not need to be re-initialized whenever the worker receives a new task. The splitter structure and the structures used to verify point dominance and to perform inclusion–exclusion are examples of such tools; for instance the initialization of the splitter structure requires \(O(2^d)\) time, and the same cost occurs in some of the other structures. For the parallel version of the exclusive hypervolume we include another structure in the toolBox, which is similar to our adaptation of Briggs and Torczon (Briggs and Torczon 1993; Bentley 2000; Aho and Hopcroft 1974) but is used to accumulate exclusive hypervolumes and therefore stores double values.

In Appendix we exemplify the kind of information that can be stored in these structures for computing Fibonacci numbers.
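As a purely illustrative layout (the actual structures of the prototype differ; only the names jobData, taskData, toolBox, idx, the splitter counters and the exclusive hypervolume accumulator come from the text), the three sources could look as follows in C++:

#include <vector>

struct JobData {                  // read-only, shared by all tasks of a job
    int d;                        // number of dimensions
    std::vector<double> coords;   // coordinates of every point, d values per point
};

struct TaskData {                 // per sub-problem payload passed through the queue
    std::vector<int> idx;         // indexes of the points of this sub-problem
    std::vector<double> z, o;     // bounding vertexes of the sub-problem
};

struct ToolBox {                  // per worker, reused across tasks
    std::vector<int> splitterCounters;   // 2^d counters for hyperoctant assignment
    std::vector<double> exHV;            // accumulator for exclusive hypervolumes
};

int main() {
    JobData job{2, {0.5, 0.4, 0.2, 0.7}};            // two 2D points, d = 2
    TaskData root{{0, 1}, {0.0, 0.0}, {1.0, 1.0}};   // initial task: all points, full box
    ToolBox box{std::vector<int>(1 << job.d, 0),     // 2^d counters
                std::vector<double>(2, 0.0)};        // one accumulator per point
    (void)root; (void)box;
    return 0;
}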

4.4 Testing

We implemented a parallel prototype of QHV, a version of it that computes exclusive hypervolumes, named exQHV, and also a parallel version of the latter. All these prototypes, as well as the original QHV, are publicly available at http://dx.doi.org/10.5281/zenodo.16929.

We compared our parallel prototype of QHV with FHV (Watanabe et al. 2015), the only prototype that we are aware of that supports parallelism. Although exQHV computes the exclusive hypervolume of each point in a set, we still compared it with the IHSO (Bradstreet et al. 2008) and IWFG (While and Bradstreet 2012) algorithms, which determine which point in a set contributes least to the hypervolume of the set. See Sect. 5 for more details.

All results were obtained on a computer with four AMD Opteron 6276 sixteen-core processors at 2.60GHz, with 256KB + 512KB of L1 cache, 16MB of L2 cache, 16MB of L3 cache, and 256GB of main memory. The prototypes were compiled with gcc 4.8.2. For QHV we used the -msse2 flag to support SSE2 and, in order to align memory to cache lines, we passed the cache line size into the code with -DCLS=$$(getconf LEVEL1_DCACHE_LINESIZE). The dimension was also determined at compile time, so there is a different binary for each dimension; this allows for better loop unrolling. Each one of the other prototypes contains a sophisticated makefile that produces optimized binaries; it automatically selects the best flags for a given platform. We used the optimized binaries produced by those makefiles for our platform. Besides the proper flags we included -O9 -march=core2. IWFG and IHSO were provided as binaries by their authors.

In the experimental evaluation we used points uniformly distributed on the surface of a hyper-sphere. In the original QHV paper (Russo and Francisco 2014) we proposed that such points could be obtained by choosing a random point in \([0,1]^d\) and projecting it onto a hyper-sphere. This process avoids bias towards a specific dimension, but it is not uniform: the points are more likely to be close to \((1,\ldots ,1)\) than to \((0,\ldots ,0,1)\). This bias can be at most a factor of \(\sqrt{d}\).

Table 1 Run time and speedup statistics for parallel QHV, FHV and WFG on hyper-spherical fronts with 7 dimensions and 100,000 points

In this paper we used the following method to generate points uniformly on a hyper-sphere. We generate points where each coordinate is chosen from an independent normal distribution with average 0. We then project the points onto the hyper-sphere, by calculating the distance to the origin and dividing every coordinate by this value, hence obtaining points at distance 1.

The standard deviation \(\sigma \) is not relevant, provided it is the same for all coordinates. Negative values are converted to their absolute value. To see why this method is direction independent, in 2D, recall that the density function of such points is:

$$\begin{aligned} \frac{1}{\left( \sigma \sqrt{2\pi }\right) ^{2}} e^{-(x^2+y^2)/(2\sigma ^2)} \end{aligned}$$
(6)

Considering a direction \(\theta \) and a radius r we can express the coordinates as \(x = r \cos (\theta )\) and \(y = r \sin (\theta )\). Then it follows by the Pythagorean trigonometric identity that the density function does not depend on \(\theta \), i.e., it is uniform in all directions. This result also holds in higher dimensions.

To obtain points from a normal distribution we used the Irwin–Hall distribution with 20 samples (Irwin 1927; Philip 1927), i.e., to obtain one value from a normal distribution we generated 20 values in \([-0.5,0.5]\) and added them up. More efficient methods are available, but this approach was simple and effective.
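A minimal C++ sketch of this generation procedure (illustrative only, not the generator used in our experiments):

#include <cmath>
#include <cstdio>
#include <random>
#include <vector>

// Generate a point on the positive orthant of the unit hyper-sphere as
// described above: each coordinate is an Irwin-Hall approximation of a
// normal (sum of 20 uniforms in [-0.5, 0.5]), made positive, and the point
// is then scaled to distance 1 from the origin.
std::vector<double> sphericalPoint(int d, std::mt19937& rng) {
    std::uniform_real_distribution<double> u(-0.5, 0.5);
    std::vector<double> p(d);
    double norm2 = 0.0;
    for (int i = 0; i < d; ++i) {
        double s = 0.0;
        for (int k = 0; k < 20; ++k) s += u(rng);   // approximately normal, mean 0
        p[i] = std::fabs(s);
        norm2 += p[i] * p[i];
    }
    double norm = std::sqrt(norm2);
    for (int i = 0; i < d; ++i) p[i] /= norm;
    return p;
}

int main() {
    std::mt19937 rng(42);
    for (double c : sphericalPoint(7, rng)) std::printf("%f ", c);
    std::printf("\n");
    return 0;
}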

In this section we report only results on random and hyper-spherical front datasets. All tests were run 10 times and we report average values, as well as standard deviations; see Table 2 for details. Results for other datasets are identical in what concerns speedups, and these are the hardest cases from a computational point of view (Russo and Francisco 2014).

Table 1 summarizes results concerning the comparison of parallel QHV with FHV (Watanabe et al. 2015) on fronts with 7 dimensions and 100,000 points. As the FHV authors point out, this algorithm is competitive for large sets and, hence, we generated large sets of points to get a fair comparison. We also included WFG (While and Barone 2011) in our comparison since FHV relies on it to solve small instances. Since FHV relies on OpenMP (http://www.openmp.org/) for parallelization, we controlled the maximum number of threads by setting the environment variable OMP_NUM_THREADS to 1, 2 and 4.

As we can observe in Table 1, FHV is slower and shows lower speedups than parallel QHV, with the latter making better use of parallel processing. This is not surprising, as the divide and conquer strategy employed by FHV leads in general to larger subproblems than the divide and conquer strategy used in parallel QHV (see Sect. 5 for details). Moreover FHV relies on temporary files, which must be written to disk, processed by WFG, and then read back by FHV to compute the final result. It is still interesting to note that, although FHV relies on WFG to solve smaller instances, it performs much better than WFG alone even when using a single thread. This corroborates the observation that divide and conquer based approaches can really help in the computation of the hypervolume of large sets, as observed before (Russo and Francisco 2014).

Let us now focus on the performance of parallel QHV. Figures 9 and 10 show the speedups obtained with up to 32 threads. In Sect. 5.2 we recall Amdahl’s law. We denote the fraction of time spent on non-parallelizable tasks by \((1-f)\). This value is shown to be smaller than 1 %. It is estimated using Amdahl’s law; the resulting formula is known as the Karp–Flatt metric, see Eq. (9) in Sect. 5.2. The actual values are 0.008 on average, with a standard deviation of 0.003.

Fig. 9 Speedups obtained with up to 4 threads for hyper-spherical and random fronts, with 500 to 1000 points. Plotted values (asterisk) were averaged over 10 runs and 20 different fronts, for both QHV and exQHV. See Table 2 for further details. The bounds f are due to Amdahl’s law, see Sect. 5.2

Fig. 10 Speedups obtained with up to 32 threads for hyper-spherical and random fronts, with 500 to 1000 points. Plotted values (asterisk) were averaged over 10 runs and 20 different fronts, for both QHV and exQHV. See Table 2 for further details. The bounds f are due to Amdahl’s law and the bound \(\alpha \) is due to Gustafson’s law, see Sects. 5.2 and 5.3

Figure 10 also provides an estimate for the parameter \(\alpha \) found in Gustafson’s law. See Sects. 5.2 and 5.3 for further details and discussion concerning the interpretation and implications of Amdahl’s law and Gustafson’s law. We note also that, as the number of dimensions and points increases, the standard deviation values diminish. This is expected, as tasks become harder, taking more time to compute and allowing a better balance among threads, independently of the problem instance being considered.

Results presented so far concern both QHV and exQHV, but since computing exclusive hypervolumes is expected to increase the computing time when compared with computing just hypervolumes, we provide a comparison between QHV and exQHV in Fig. 11. The extension to exclusive hypervolumes increased the running time by a factor of 3.066 on average. We note that this factor was observed consistently across all datasets under testing, with different numbers of dimensions and points. In particular, the values in Fig. 11 were averaged over all runs, with a standard deviation of only 0.034.

Fig. 11 QHV versus exQHV running on 4 threads, where exQHV takes 3.066 times more time on average, with a standard deviation of 0.034. Values computed over all 7D datasets, where n is the number of points per front

Table 2 Speedup statistics for parallel QHV on hyper-spherical fronts with 5, 7 and 13 dimensions, and on random fronts with 13 dimensions
Fig. 12 Run time for non-parallel exQHV, IHSO and IWFG, on hyper-spherical fronts with 7 dimensions. Plotted values were obtained for 10 runs and 20 different fronts

Fig. 13 Time for non-parallel exQHV, IHSO and IWFG, on hyper-spherical fronts with 13 dimensions. Plotted values were obtained for 10 runs and 20 different fronts

We also computed the overhead of the QHV parallel extension by comparing it with our previous purely sequential implementation. As we can see in the last two columns of Table 2, the impact is below 10 %, or below 5 % for higher dimensions. This degradation in performance is however clearly compensated by using multiple threads, as shown by the results above.

Finally we compare exQHV with the IHSO (Bradstreet et al. 2008) and IWFG (While and Bradstreet 2012) algorithms, which determine which point in a set contributes least to the hypervolume of the set. As discussed in Sect. 5, although this problem is different from computing the exclusive hypervolume of each point in a set, it constitutes a particular application, as the latter necessarily includes the solution of the former. Figures 12 and 13 show the run time for non-parallel exQHV, IHSO and IWFG, on hyper-spherical fronts with 7 and 13 dimensions, respectively. As expected IHSO performs better for lower dimensions and worse for higher dimensions, for which IWFG performs much better. Although it solves a harder problem, exQHV achieves good performance. Note that, in what concerns non-parallel QHV, we showed that it is competitive with the original WFG (Russo and Francisco 2014) and, as discussed above, the extension of the original QHV to exclusive hypervolumes increased the running time by a factor of around 3. Hence, given that IWFG solves a simpler problem, the performance is in line with our expectations.

Note that since exQHV determines \({\textsc {XV}}\) for all the points in the data set it is actually competitive against IWFG. Assume we wish to discard somewhere between 4 and 8 points, which is reasonable when there are 100 or more points. In this case it is enough to execute exQHV once and determine the points to discard in negligible time, with a linear scan (Hoare 1961; Floyd and Rivest 1975; Kiwiel 2005). On the other hand IWFG and IHSO would become slower, because the algorithms would have to be iterated 4–8 times. These two approaches might not return the same sets of points, but Bringmann and Friedrich showed that repeatedly removing the least contributing point can perform arbitrarily badly, which removes any compelling argument for preferring the latter approach.

5 Related work

An updated description of work on hypervolume computation is given in the original QHV paper (Russo and Francisco 2014). Meanwhile Watanabe et al. (2015) independently proposed a divide and conquer algorithm for computing the hypervolume, the FHV algorithm. Contrary to QHV, the FHV algorithm divides the space into 2 sub-problems. Like QHV, FHV also uses point projections for sub-problems, hence the total number of points in the sub-problems is potentially larger than the original number of points. The authors immediately recognized that this approach can be used by parallel algorithms. Section 4.4 includes an experimental comparison with FHV, as it is the only prototype that we are aware of that supports parallelism. We should note that FHV relies on another algorithm for computing the hypervolume of smaller sets; in our case we used WFG (While and Barone 2011).

The description of FHV is more general than that of QHV, in the sense that the QHV algorithm is a particular case of FHV, because QHV uses a pivot to guide the division. The authors of FHV mention that their implementation partitions the dimensions sequentially, which is equivalent to dividing the space according to a point, although such a point need not be part of the problem set.

Apart from FHV, we are not aware of any other implementations that compute hypervolumes in parallel. Instead we will discuss Amdahl’s law and Gustafson’s law, as they are necessary to interpret the experimental values in Sect. 4.4.

5.1 Exclusive hypervolume

Let us start by reviewing exclusive hypervolumes.

Most hypervolume indicator based optimization algorithms, like SIBEA, SMS-EMOA or MO-CMA-ES, remove from the population the point with the smallest exclusive hypervolume. This procedure is iterated \(\lambda \) times, until the number of points no longer exceeds a fixed \(\mu \). However Bringmann and Friedrich showed that this scheme can perform arbitrarily badly (Bringmann and Friedrich 2010). They also presented the first algorithm that could compute several \({\textsc {XV}}\) values in one run. Their algorithm required \(O(n^{d/2} \log n + n)\) time, since it was based on the algorithm by Overmars and Yap (1988). A direct comparison with our approach is hard, as QHV was shown to have a worst case of \(O(nd2^{nd})\), but in most hard cases it behaved as \(O(n^{0.12+0.38d})\); the extension to exclusive hypervolumes degraded the performance by a factor of 3.066 (on average over all conducted tests, with a standard deviation of 0.034), see Fig. 11. In fact the algorithm by Bringmann and Friedrich was even more flexible: it could compute the exclusive hypervolumes of sets of points. Contrary to our approach, where each set has exactly one point, they could simultaneously compute the exclusive hypervolume of several sets of \(\lambda \ge 1\) points each. The total time was then \(O(n^{d/2} \log n + \lambda n)\). Still, this contribution was mostly theoretical, as we found no implementation of this algorithm.

In fact the same authors showed that computing \({\textsc {XV}}\) is “NP-hard in general, but fast in practice” (Bringmann and Friedrich 2009), and they proposed an algorithm that approximates the minimal exclusive hypervolume within a ratio of \((1+\epsilon )\) with probability \(1-\delta \), for any \(\epsilon ,\delta >0\).

The point with the smallest exclusive hypervolume in a set can be computed using IHSO (Bradstreet et al. 2008) or IWFG (While and Bradstreet 2012), the fastest algorithms for this particular problem as far as we know. Hence, even though exQHV solves a harder but related problem, we compared it with these two algorithms in Sect. 4.4.

5.2 Amdahl’s law

Amdahl’s law is crucial in designing parallel algorithms, in interpreting results, and in projecting expected speed-ups (Amdahl 1967). Neglecting Amdahl’s law is perhaps a tempting reaction to its paradoxical conclusions. On the one hand it points out that one should only optimize if the fraction that can be optimized constitutes a large portion of the overall time. On the other hand, if the optimization is effective, the speed-up we obtain is largely determined by the strictly sequential fraction, which cannot be optimized and initially constituted only a small fraction of the time.

Consider m processors (cores) and a program that spends a fraction f of its time on operations that are infinitely parallelizable, and the remaining fraction \(1-f\) on strictly sequential operations. The overall speedup s(f, m), obtained by optimizing f with m processors and leaving the fraction \(1-f\) untouched, is given by the following equation:

$$\begin{aligned} s(f, m) = 1/((1-f)+(f/m)) \end{aligned}$$
(7)

This means for example that for \(f = 90 \%\) and \(m=4\) the resulting speedup is not much more than 3. Hence even for problems with a large f the overall speed-up comes up one core short. Notice that all cores are necessary to obtain this performance, but the speed-up is significantly smaller than 4. In fact such a program has a maximal speed-up of 10; consider s(0.90, m) as m grows to \(+\infty \). The limit speed-up is given by \(1/(1-f)\). Figures 14 and 15 show the plots of s(f, m) with varying m and different f values; the horizontal lines represent the limits.

Fig. 14 Bounds due to Amdahl’s law, with 4 cores. The horizontal axis counts the number of processors m. The vertical axis indicates the overall speedup s. The plots show the s(f, m) bounds for different f. These values are indicated in the right margin. The horizontal dots give the limit speedup when \(m \rightarrow +\infty \), these f values are shown inside the plot, on top of the respective lines and close to the left margin

Fig. 15 Bounds due to Amdahl’s law with 32 cores. See caption of Fig. 14

Hence the first implication of Amdahl’s law is that to obtain significant speedups one needs to focus on programs for which f is large, meaning that most operations can be executed in parallel, ideally all of them. When f is large, for example when \(f=99\,\%\), there is a tendency to forget about the remaining \(1\,\%\) and expect unrealistic speedups. Notice that for 32 cores \(s(0.99,32) = 24.4\), i.e., when the number of cores is high even the speedups of extremely parallel programs fall significantly below the number of cores. This is graphically illustrated in Figs. 14 and 15.

Beyond limiting the maximum speedup, Amdahl’s law also implies that convergence is slow. For example, if \(f=95\,\%\), the limit speedup is 20 times. One may expect that with 32 cores the overall speedup would be close to the limit. This is not the case, as \(s(0.95, 32)= 12.5\), which means that we are actually only a bit past half way. The following expression gives an intuitive explanation of this phenomenon.

$$\begin{aligned} s(f,m_1) \times s(f \times s(f,m_1)/m_1, m_2) = s(f,m_1 \times m_2) \end{aligned}$$
(8)

The left-hand side multiplies two speed-up factors, with \(m_1\) and \(m_2\) cores respectively. The right-hand side shows that to obtain the combined speed-up we need to multiply the number of processors, i.e., \(m_1 \times m_2\). The subtlety in the expression is that for the second speed-up we need to update the parallelizable fraction, by using a \(s(f,m_1)/m_1\) factor. Let us evaluate this expression for \(f = 0.95\), \(m_1 = 4\) and \(m_2 = 8\). We have that \(s(0.95, 4) \approx 3.5\), and \(f \times s(f,m_1)/m_1 \approx 0.83\), i.e., the fraction that is left to optimize is \(83\%\). Now we have that \(s(0.83, 8) \approx 3.6\) and therefore obtain \(s(0.95, 32) \approx 3.5 \times 3.6\). Hence when the sequential fraction increases further speed-ups are smaller, and we still need to multiply the processors.

In our experimental results we also report the \((1-f)\) values instead of reporting only the speedups. Using Eq. (7) we can express the strictly sequential fraction \((1-f)\) in terms of s and m. Indeed f is not directly observable, whereas m is known and s can easily be computed from observation. Therefore the values we report are calculated as follows:

$$\begin{aligned} ((m/s)-1)/(m-1) \end{aligned}$$
(9)

This value is known as the Karp–Flatt metric (Karp and Flatt 1990) and provides a cleaner measure of quality. Values close to \(0\,\%\) are good, meaning that the parallel program was well designed and, moreover, that significantly better results can be achieved by using more processors. Larger values reflect worse performance, although it is not possible to distinguish whether this bad performance is due to the particular implementation or due to the sequential nature of the problem; either way it compromises further speedups. The respective speedup can be computed by Amdahl’s law, i.e., using Eq. (7). Another advantage of these values is that the limit speed-up is also easy to compute, since it is \(1/(1-f)\). This limit is important since, no matter how many processors we have, there is no way to obtain a speed-up bigger than this limit.
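For reference, a tiny C++ sketch of Eq. (9) and of the corresponding limit speed-up, evaluated on the \(s(0.99,32) = 24.4\) example given above:

#include <cstdio>

// Karp-Flatt metric, Eq. (9): estimated strictly sequential fraction (1 - f)
// from an observed speedup s with m processors.
double karpFlatt(double s, int m) {
    return (m / s - 1.0) / (m - 1.0);
}

int main() {
    double e = karpFlatt(24.4, 32);   // the s(0.99, 32) example above
    std::printf("estimated (1 - f) = %.4f, limit speedup = %.1f\n", e, 1.0 / e);
    // prints roughly 0.0100 and 99.5, i.e. close to 1 - f = 1% and 1/(1-f) = 100
    return 0;
}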

Note that parallel programs incur non-negligible synchronization costs. We avoid modeling these costs explicitly; we assume they are accounted for by the sequential fraction. In fact this relates to the third reason why we use the \(1-f\) values. Amdahl’s law is, in a way, optimistic: it is not realistic to assume that a fraction f is infinitely parallelizable. When that is the case our estimates of \(1-f\) are constant for any number m of cores. Usually programs are composed of several fractions \(f_1\), \(f_2\), \(f_3\), ..., such that \(f_1\) is strictly sequential, \(f_2\) can be optimized with 2 processors, \(f_3\) can be optimized with 3 processors, and so on. When we use two processors the fraction f that appears to be parallelizable will be larger than when we use three processors. Therefore our estimates of f usually decrease with m. Moreover it means that Eq. (8) is in fact optimistic and in reality we obtain a \(\ge \) inequality.

5.3 Gustafson’s law

Gustafson’s law gives a more optimistic argument for parallel computation (Gustafson 1988). We have thus far assumed that f is a fixed value, meaning that we did not worry about how f varies with the size of the problem, i.e., f(n). In Gustafson’s law we assume that m and n both increase in agreement, i.e., that when we have more cores we can solve larger problems. In a simple model a program is executed with m processors only if it requires \(r + m \times p\) time, where p is the time of the parallelizable tasks and r the time of the strictly sequential tasks. In this case the resulting speed-up is \((r + m \times p)/(r + p) = m - \alpha (m-1)\), where \(\alpha = r/(r+p)\). Notice that \(\alpha \) is not \(1-f = r /(r + m \times p)\). Therefore Gustafson’s law shows that it is possible to obtain speedups that are linear in the number of processors, depending on a factor \(\alpha \).

Amdahl’s law and Gustafson’s law seem to be on opposing sides of the argument in favor of parallel computation. However they are not mutually exclusive; in fact together they provide a way to dimension parallel problems, either by providing an optimal number of processors for a given problem or by determining the optimal size of a problem to run on m processors. To obtain these values we can solve the equation \(s(f, m) = m - \alpha (m-1)\). The general solution is \(m = (f\alpha )/((1-f)(1-\alpha ))\).
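A tiny C++ sketch of this break-even computation, using the example highlighted in Fig. 16:

#include <cstdio>

// Number of processors at which Amdahl's curve s(f, m) meets Gustafson's
// line m - alpha*(m - 1), i.e. m = (f*alpha) / ((1 - f)*(1 - alpha)).
double breakEven(double f, double alpha) {
    return (f * alpha) / ((1.0 - f) * (1.0 - alpha));
}

int main() {
    std::printf("f = 0.95, alpha = 0.55 -> m = %.1f\n", breakEven(0.95, 0.55)); // ~23.2
    return 0;
}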

Fig. 16 Gustafson’s and Amdahl’s laws, with 32 cores. The horizontal axis counts the number of processors m. The vertical axis indicates the overall speedup s. The f and \(\alpha \) values are shown close to the right margin. Gustafson’s law corresponds to the straight lines. The intersection of s(0.95, m) with \(\alpha =55\%\) is highlighted by a point

The resulting points are shown in Fig. 16 as the intersections of the straight lines with the curves for s(f, m); for example, for s(0.95, m) and \(\alpha =55\,\%\) the result is around 23 cores, meaning that if we use 23 processors or fewer we can still say that performance improves linearly, according to Gustafson’s law.

6 Conclusions and further work

In this paper we extended the functionality of the QHV algorithm, essentially showing the importance of the divide and conquer methodology of this algorithm. This methodology can be used to speed up the original algorithm by using parallel computation, thus making the algorithm even faster and allowing larger problems to be solved in reasonable time. Moreover our implementation is designed to consume a specified number of cores. We also showed that QHV can be used to solve related problems, namely exclusive hypervolumes, in an efficient and non-trivial way.

In this way we further highlight the advantages of the QHV approach in processing sets of points in high dimensional spaces. We expect future, independent work to use the QHV approach to solve related problems, such as empirical attainment functions (Fonseca et al. 2011) or extending the exclusive hypervolume to sets of more than one point. Moreover it may be possible to further speed up QHV by using GPUs.