For a black-box global optimization algorithm to be truly global, some effort must be allocated to global search, that is, search done primarily to ensure that potentially good parts of the space are not overlooked. On the other hand, to be efficient, some effort must also be placed on local search near the current best solution. Most algorithms either move progressively from global to local search (e.g., simulated annealing) or combine a fundamentally global method with a fundamentally local method (e.g., multistart, tunneling). DIRECT introduces a new approach: in each iteration several search points are computed using all possible weights on local versus global search (how this is done will be made clear shortly). This approach eliminates the need for ‘tuning parameters’ that set the balance between local and global search, resulting in an algorithm that is robust and easy to use.

DIRECT is especially valuable for engineering optimization problems. In these problems, the objective and constraint functions are often computed using time-consuming computer simulations, so there is a need to be efficient in the use of function evaluations. The problems may contain both continuous and integer variables, and the functions may be nonlinear, nonsmooth, and multimodal. While many algorithms address these problem features individually, DIRECT is one of the few that addresses them collectively. However, the versatility of DIRECT comes at a cost: the algorithm suffers from a curse of dimensionality that limits it to low-dimensional problems (say, no more than 20 variables).

The general problem solved by DIRECT can be written as follows:

$$ \begin{cases} \displaystyle \min & f(x_{1} ,\ldots, x_{n}) \\ \text{s.t.} & g_{1}(x_{1} ,\ldots, x_{n})\le 0, \\ & \qquad\vdots \\ & g_{m}(x_{1} ,\ldots, x_{n})\le 0, \\ & \ell_{i} \le x_{i} \le u_{i}, \quad i = 1,\ldots,n, \\ & x_{i} \ \text{integer for} \ i \in I. \end{cases} $$

To prove convergence, we must assume that the objective and constraint functions are continuous in the neighborhood of the optimum, but the functions can otherwise be nonlinear, nondifferentiable, nonconvex, and multimodal. While DIRECT does not explicitly handle equality constraints, problems with equalities can often be rewritten as problems with inequality constraints (either by replacing the equality with an inequality that becomes binding in the solution, or by using the equalities to eliminate variables). The set I in the above problem is the set of variables that are restricted to integer values. DIRECT works best when the integer variables describe an ordered quantity, such as the number of teeth on a gear. It is less effective when the integer variables are categorical.

In what follows, we begin by describing how DIRECT works when there are no inequality constraints and no integer variables. This basic version corresponds, with minor differences, to the originally published algorithm [2]. After describing the basic version, we then introduce extensions to handle inequality constraints and integer variables (this article is the first publication to document these extensions). We conclude with a step-by-step description of the algorithm.

The bounds on the variables limit the search to an n-dimensional hyper-rectangle. DIRECT proceeds by partitioning this rectangle into smaller rectangles, each of which has a ‘sampled point’ at its center, that is, a point where the functions have been evaluated. An example of such a partition for n = 2 is shown in Fig. 1.

Figure 1: An example partition of the search space for n = 2.

We have drawn the rectangle as a square because later, whenever we measure distances or lengths, we will weight each dimension so that the original range (u_i − ℓ_i) has a weighted distance of one. Drawing the hyper-rectangle as a hyper-cube allows us to visualize relative lengths as they will be used in the algorithm.
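As a small illustration, this rescaling can be written in a few lines of Python (the function name and signature below are ours, not part of any published implementation):

```python
def to_weighted_coords(x, lower, upper):
    """Rescale a point so that each variable's original range (u_i - l_i)
    has length one; distances in the algorithm are measured in these
    weighted coordinates (an illustrative sketch)."""
    return [(xi - li) / (ui - li) for xi, li, ui in zip(x, lower, upper)]


# Example: each variable's full range maps to an interval of length one.
print(to_weighted_coords([5.0, -2.0], lower=[0.0, -10.0], upper=[10.0, 10.0]))
# -> [0.5, 0.4]
```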

Figure 2 shows the first three iterations of DIRECT on a hypothetical two-variable problem. At the start of each iteration, the space is partitioned into rectangles. DIRECT then selects one or more of these rectangles for further search using a technique described later. Finally, each selected rectangle is trisected along one of its long sides, after which the center points of the outer thirds are sampled. In this way, we sample two new points in the rectangle and maintain the property that every sampled point is at the center of a rectangle (this property would not be preserved if the rectangle were bisected).

Figure 2: The first three iterations of DIRECT on a hypothetical two-variable problem.

At the beginning of iteration 1, there is only one rectangle (the entire space). The process of selecting rectangles is therefore trivial, and this rectangle is trisected as shown. At the start of iteration 2, the selection process is no longer trivial because there are three rectangles. In the example, we select just one rectangle, which is then trisected and sampled. At the start of iteration 3, there are 5 rectangles; in this example, two of them are selected and trisected.

The key step in the algorithm is the selection of rectangles, since this determines how search effort is allocated across the space. The trisection process and other details are less important, and we will defer discussion of them until later.

To motivate how DIRECT selects rectangles, let us begin by considering the extremes of pure global search and pure local search. A pure global search strategy would select one of the biggest rectangles in each iteration. If this were done, all the rectangles would become small at about the same rate. In fact, if we always trisected one of the biggest rectangles, then after 3^{kn} function evaluations every rectangle would be a cube with side length 3^{-k}, and the sampled points would form a uniform grid. By looking everywhere, this pure global strategy avoids overlooking good parts of the space.

A pure local strategy, on the other hand, would sample the rectangle whose center point has the best objective function value. This strategy is likely to find good solutions quickly, but it could overlook the rectangle that contains the global optimum (this would happen if the rectangle containing the global optimum had a poor objective function value at the center).

To select just one ‘best’ rectangle, we would have to introduce a tuning parameter that controlled the local/global balance. Unfortunately, the algorithm would then be extremely sensitive to this parameter, since the proper setting would depend on the (unknown) difficulty of the problem at hand.

DIRECT avoids tuning parameters by rejecting the idea of selecting just one rectangle. Instead, several rectangles are selected using all possible relative weightings of local versus global search. The idea of using all possible weightings may seem impractical, but with the help of a simple diagram this idea can actually be made quite intuitive. For this diagram, we will need a way to measure the size of a rectangle. We will measure size using the distance between the center point and the vertices, as shown in Fig. 3.

Figure 3: Rectangle size measured as the distance from the center to a vertex.

With this measure of rectangle size, we can now turn our attention to Fig. 4, which shows how rectangles are selected. In the figure, each rectangle in the partition is represented by a dot. The horizontal coordinate of a dot is the size of the rectangle, measured by the center-vertex distance. The vertical coordinate is the function value at the midpoint of the rectangle. The dot labeled A represents the rectangle with the lowest function value, and so this would be the rectangle selected by a pure local strategy. Similarly, the dot labeled B represents one of the biggest rectangles, and so it would be selected by a pure global strategy. DIRECT selects not only these two extremes but also all the rectangles on the lower-right convex hull of the cloud of dots (the dots connected by the line). These rectangles represent ‘efficient trade-offs’ between local versus global search, in the sense that each of them is best for some relative weighting of midpoint function value and center-vertex distance. (We will explain the other lines in Fig. 4 shortly.)

Figure 4: The rectangle selection diagram (center-vertex distance versus center function value).

One might think that the idea illustrated in Fig. 4 would extend naturally to the constrained case; that is, we would simply select any rectangle that was best for some weighting of objective value, center-vertex distance, and constraint values. Unfortunately, this does not work because it leads to excessive sampling in the infeasible region. However, as we explain next, there is an alternative way of thinking about the lower-right convex hull that does extend to the constrained case.

For the sake of the exposition, let us suppose for the moment that we know the optimal function value f^*. For the function to reach f^* within rectangle r, it would have to undergo a rate of change of at least (f_r − f^*)/d_r, where f_r is the function value at the midpoint of rectangle r and d_r is the center-vertex distance. This follows because the function value at the center is f_r and the maximum distance over which the function can fall to f^* is the center-vertex distance d_r. Intuitively, it seems ‘more reasonable’ to assume that the function will undergo a gradual change than to assume it will make a steep descent to f^*. Therefore, if only we knew the value f^*, a reasonable criterion for selecting a rectangle would be to choose the one that minimizes (f_r − f^*)/d_r.

Figure 4 shows a graphical way to find the rectangle that minimizes (f_r − f^*)/d_r. Along the vertical axis we show the current best function value, f_min, as well as the supposed global minimum f^*. Now suppose we anchor a line at the point (0, f^*) and slowly swing it upwards. When we first encounter a dot, the slope of the line will be precisely the ratio (f_r − f^*)/d_r, where r is the index of the rectangle corresponding to the encountered dot. Moreover, since this is the first dot touched by the line, rectangle r must be the rectangle that minimizes (f_r − f^*)/d_r.

Of course, in general we will not know the value of f^*. But we do know that, whatever f^* is, it satisfies f^* ≤ f_min. So imagine that we repeat the line-sweep exercise in Fig. 4 for all values of f^* ranging from f_min to −∞. How many rectangles could be selected? Well, with a little thought, it should be clear that the set of dots that can be selected via these line sweeps is precisely the lower-right convex hull of the dots.

This alternative approach to deriving the lower-right convex hull suggests a small but important modification to the selection rule. In particular, to prevent DIRECT from wasting function evaluations in pursuit of very small improvements, we will insist that the value of f^* satisfy f^* ≤ f_min − ϵ. That is, we are only interested in selecting rectangles where it is reasonable that we can find a ‘significantly better’ solution. A natural value of ϵ would be the desired accuracy of the solution. In our implementation, we have set ϵ = max(10^{-4}|f_min|, 10^{-8}).

As shown in Fig. 5, the implication of this modification is that some of the smaller rectangles on the lower-right convex hull may be skipped. In fact, the smallest rectangle that will be selected is the one chosen when f^* = f_min − ϵ.
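To make the selection rule concrete, here is a small Python sketch of the line sweep just described. It assumes each rectangle is summarized by its center-vertex distance d and the objective value f at its center; the Rect record, the function name, and the gift-wrapping style scan are our own illustration rather than the published implementation.

```python
from collections import namedtuple

# Illustrative record: d = center-vertex distance, f = objective value at the center.
Rect = namedtuple("Rect", ["d", "f"])


def select_rectangles(rects, f_min, eps=None):
    """Return the rectangles on the lower-right convex hull of the (d, f) dots,
    i.e. those that minimize (f_r - f*) / d_r for some f* <= f_min - eps
    (a sketch of the unconstrained selection rule)."""
    if eps is None:
        eps = max(1e-4 * abs(f_min), 1e-8)     # tolerance from the text

    selected = []
    anchor_d, anchor_f = 0.0, f_min - eps      # start of the line sweep
    remaining = list(rects)

    while True:
        # Only rectangles strictly to the right of the anchor can be hit
        # by a line swung upward from (anchor_d, anchor_f).
        candidates = [r for r in remaining if r.d > anchor_d]
        if not candidates:
            break
        # The first dot touched minimizes the slope from the anchor.
        best = min(candidates, key=lambda r: (r.f - anchor_f) / (r.d - anchor_d))
        selected.append(best)
        anchor_d, anchor_f = best.d, best.f    # continue along the hull
        remaining.remove(best)

    return selected


# Example: the rectangle with the lowest center value and one of the
# largest rectangles both appear in the selected set.
rects = [Rect(0.1, 2.0), Rect(0.1, 1.0), Rect(0.3, 1.5), Rect(0.5, 3.0)]
print(select_rectangles(rects, f_min=1.0))
```

Starting the sweep at (0, f_min − ϵ) rather than (0, f_min) is exactly what skips the small rectangles that cannot yield an improvement of at least ϵ.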

Figure 5: Rectangle selection with the requirement f^* ≤ f_min − ϵ.

The version of DIRECT described so far corresponds closely to the originally published version [2]. The only difference is that, in the original version, a selected rectangle was trisected not just on a single long side, but rather on all long sides. This approach eliminated the need to arbitrarily select a single long side when there were more than one and, as a result, it added an element of robustness to the algorithm. Experience has since shown, however, that the robustness benefit is small and that trisecting on a single long side (as here) accelerates convergence in higher dimensions.

Let us now consider how the rectangle selection procedure can be extended to handle inequality constraints. The key to handling constraints in DIRECT is to work with an auxiliary function that combines information on the objective and constraint functions in a special manner. To express this auxiliary function, we will need some additional notation. Let g_{rj} denote the value of constraint j at the midpoint of rectangle r. In addition, let c_1, …, c_m be positive weighting coefficients for the inequality constraints (we will discuss how these coefficients are computed later). Finally, for the sake of the exposition, let us again suppose that we know the optimal function value f^*. The auxiliary function, evaluated at the center of rectangle r, is then as follows:

$$ \max(f_{r} - f^{\ast}, 0) + \sum_{j = 1}^{m} c_{j} \max(g_{rj}, 0) $$

The first term of the auxiliary function exacts a penalty for any deviation of the function value f_r above the global minimum value f^*. Note that, in a constrained problem, it is possible for f_r to be less than f^* by violating the constraints; due to the maximum operator, the auxiliary function gives no credit for values of f_r below f^*. The second term in the auxiliary function is a sum of weighted constraint violations. Clearly, the lowest possible value of the auxiliary function is zero and occurs only at the global minimum. At any other point, the auxiliary function is positive either due to suboptimality or infeasibility.

This auxiliary function is not a penalty function in the standard sense. A standard penalty function would be a weighted sum of the objective function and constraint violations; it would not include the value f^* since this value is generally unknown. Moreover, in the standard approach, it is critical that the penalty coefficients be sufficiently large to prevent the penalty function from being minimized in the infeasible region. This is not true for our auxiliary function: as long as f^* is the optimal function value, the auxiliary function is minimized at the global optimum for any positive constraint coefficients.

For the global minimum to occur in rectangle r, the auxiliary function must fall to zero starting from its (positive) value at the center point. Moreover, the maximum distance over which this change can occur is the center-vertex distance d_r. Thus, to reach the global minimum in rectangle r, the auxiliary function must undergo a minimum rate of change, denoted h_r(f^*), given by

$$ h_{r}(f^{\ast}) = \frac{\max(f_{r} - f^{\ast}, 0) + \sum_{j = 1}^{m} c_{j} \max(g_{rj}, 0)}{d_{r}}. $$

Since it is more reasonable to expect gradual changes than abrupt ones, a reasonable way to select a rectangle would be to choose the rectangle that minimizes the rate of change h_r(f^*). Of course, this is impractical because we generally will not know the value f^*. Nevertheless, it is possible to select the set of rectangles that minimize h_r(f^*) for some f^* ≤ f_min − ϵ. This is how we select rectangles with constraints—assuming a feasible point has been found so that f_min is well-defined (we will show how this is implemented shortly). If no feasible point has been found, we simply select the rectangle that minimizes

$$ \frac{\sum_{j = 1}^{m} c_{j} \max(g_{rj}, 0)}{d_{r}}. $$

That is, we select the rectangle where the weighted constraint violations can be brought to zero with the least rate of change.
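In code, the two selection criteria just described amount to only a few lines. The following Python sketch is illustrative; the function and argument names are ours.

```python
def rate_of_change(f_r, g_r, d_r, f_star, c):
    """Minimum rate of change h_r(f*) the auxiliary function must undergo
    for the global minimum to lie in rectangle r (an illustrative sketch).

    f_r    : objective value at the rectangle's center
    g_r    : constraint values g_rj at the center
    d_r    : center-vertex distance
    f_star : supposed optimal value f*
    c      : positive weighting coefficients c_j
    """
    violation = sum(cj * max(gj, 0.0) for cj, gj in zip(c, g_r))
    return (max(f_r - f_star, 0.0) + violation) / d_r


def infeasibility_rate(g_r, d_r, c):
    """Criterion used before any feasible point has been found: the rate of
    change needed to drive the weighted constraint violations to zero."""
    return sum(cj * max(gj, 0.0) for cj, gj in zip(c, g_r)) / d_r
```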

To implement this selection rule, it is again helpful to draw a diagram. This new selection diagram is based on plotting the rate-of-change function h_r(f^*) as a function of f^*. Figure 6 illustrates this function. For values of f^* ≥ f_r, the first term in the numerator of h_r(f^*) is zero, and so h_r(f^*) is constant. As f^* falls below f_r, however, h_r(f^*) increases, because we now exact a penalty for f_r being above the supposed global minimum f^*. The slope of the h_r(f^*) function to the left of f_r is −1/d_r.

Figure 6: The rate-of-change function h_r(f^*) for a single rectangle.

Figure 7 superimposes, in one diagram, the rate-of-change functions for a hypothetical set of seven rectangles. For a particular value of f^*, we can visually find the rectangle that minimizes h_r(f^*) by starting at the point (f^*, 0) along the horizontal axis and moving vertically until we first encounter a curve. What we want, however, is the set of all rectangles that can be selected in this way using any f^* ≤ f_min − ϵ. This set can be found as follows (see Fig. 7). We start with f^* = f_min − ϵ and move upwards until we first encounter a curve for some rectangle. We note this rectangle and follow its curve to the left until it intersects the curve for another rectangle (these intersections are circled in Fig. 7). When this happens, we note this other rectangle and follow its curve to the left. We continue in this way until we find a curve that is never intersected by another one. This procedure identifies all the h_r(f^*) functions that participate in the lower envelope of the curves to the left of f_min − ϵ. The set of rectangles found in this way is the set selected by DIRECT.

Figure 7: Rate-of-change functions for seven hypothetical rectangles and their lower envelope.

Along the horizontal axis in Fig. 7, we identify ranges of f^* values for which different rectangles have the lowest value of h_r(f^*). As we scan from f_min − ϵ to the left, the rectangles that participate in the lower envelope are 1, 2, 5, 2, and 7. This example illustrates that it is possible to encounter a curve more than once (here rectangle 2), and care must be taken not to double count such rectangles. It is also possible for some curves to coincide along the lower envelope, and so be ‘tied’ for the least rate of change (this does not happen in Fig. 7). In such cases, we select all the tied rectangles.

Tracing the lower envelope in Fig. 7 is not computationally intensive. To see this, note that each selected rectangle corresponds to a curve on the lower envelope, and for each such curve the work we must do is to find its intersection with the next curve along the lower envelope. Finding this next intersection requires computing the intersection of the current curve with all the other curves. It follows that the work required for each selected rectangle (and hence for every two sampled points) is only on the order of the total number of rectangles in the partition.

The tracing of the lower envelope can also be accelerated by some pre-processing. In particular, it is possible to quickly identify rectangles whose curves lie completely above other curves. For example, in Fig. 7, curve 3 lies above curve 1, and curve 4 lies above curve 2. These curves cannot possibly participate in the lower envelope, and so they can be deleted from consideration before the lower envelope is traced.
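The envelope trace can also be sketched in Python. Rather than walking the intersections from right to left as described above, the sketch below collects every f^* value at which the ordering of the curves could change (curve breakpoints and pairwise crossings) and then reads off the minimizing rectangle(s) on each resulting interval; the two approaches identify the same set. The CRect record and all names are illustrative, it assumes a feasible point has been found, and no attempt is made at the pre-processing speed-up.

```python
from collections import namedtuple
import itertools

# Illustrative record: d = center-vertex distance, f = objective value at the
# center, g = tuple of constraint values g_rj at the center.
CRect = namedtuple("CRect", ["d", "f", "g"])


def select_with_constraints(rects, f_min, c, eps=None):
    """Select every rectangle whose h_r(f*) curve participates in the lower
    envelope for some f* <= f_min - eps (an illustrative sketch of the
    constrained selection rule)."""
    if eps is None:
        eps = max(1e-4 * abs(f_min), 1e-8)
    t0 = f_min - eps

    def viol(r):
        return sum(cj * max(gj, 0.0) for cj, gj in zip(c, r.g))

    def h(r, t):
        return (max(r.f - t, 0.0) + viol(r)) / r.d

    # f* values at which the ordering of the curves could possibly change:
    # curve breakpoints and pairwise crossings (spurious candidates are
    # harmless -- they merely add extra test points).
    events = {t0}
    events.update(r.f for r in rects if r.f < t0)
    for r, s in itertools.combinations(rects, 2):
        vr, vs = viol(r), viol(s)
        if r.d != s.d:                              # both curves sloped
            events.add((r.d * (s.f + vs) - s.d * (r.f + vr)) / (r.d - s.d))
        events.add(r.f + vr - r.d * vs / s.d)       # r sloped, s constant
        events.add(s.f + vs - s.d * vr / r.d)       # s sloped, r constant
    events = sorted(t for t in events if t <= t0)

    # Test points: t0 itself, the midpoint of each interval between events,
    # and one point left of all events to stand in for f* -> -infinity.
    tests = [t0, events[0] - 1.0]
    tests += [(a + b) / 2.0 for a, b in zip(events, events[1:])]

    selected = set()
    for t in tests:
        best = min(h(r, t) for r in rects)
        selected.update(i for i, r in enumerate(rects)
                        if h(r, t) <= best + 1e-12 * (1.0 + best))
    return [rects[i] for i in sorted(selected)]
```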

It remains to explain how the constraint coefficients c_1, …, c_m are computed, as well as a few other details about trisection and the handling of integer variables. We will cover these details in turn, and then bring everything together into a step-by-step description of the algorithm.

To understand how we compute the constraint coefficient c_j, suppose for the moment that we knew the average rate of change of the objective function, denoted a_0, and the average rate of change of constraint j, denoted a_j. Furthermore, suppose that at the center of a rectangle we have g_j > 0. At the average rate of change of constraint j, we would have to move a distance equal to g_j/a_j to get rid of the constraint violation. If during this motion the objective function got worse at its average rate of change, it would get worse by a_0 times the distance, or a_0(g_j/a_j) = (a_0/a_j) g_j. Thus we see that the ratio a_0/a_j provides a way of converting units of constraint violation into potential increases in the objective function. For this reason, we will set c_j = a_0/a_j.

The average rates of change are estimated in a very straightforward manner. We maintain a variable s_0 for the sum of observed rates of change of the objective function. Similarly, we maintain variables s_1, …, s_m for the sum of observed rates of change for each of the m constraints. All of these variables are initialized to zero at the start of the algorithm and updated each time a rectangle is trisected. Let x^{mid} denote the midpoint of the parent rectangle and let x^{left} and x^{right} denote the midpoints of the left and right child rectangles after trisection. The variables are updated as follows:

$$ \begin{aligned} &s_{0} = s_{0} + \sum_{\text{child} = \text{left}}^{\text{right}} \frac{\left| f(x^{\text{child}}) - f(x^{\text{mid}})\right|} {\left\| x^{\text{child}} - x^{\text{mid}}\right\|} \\ & s_{j} = s_{j} + \sum_{\text{child} = \text{left}}^{\text{right}} \frac{\left| g_{j}(x^{\text{child}}) - g_{j}(x^{\text{mid}})\right|}{\left\| x^{\text{child}} - x^{\text{mid}}\right\|}. \end{aligned} $$

Now the average rates of change are a_0 = s_0/N and a_j = s_j/N, where N is the number of rates of change accumulated into the sums. It follows that

$$ \frac{a_{0}}{a_{j}} = \frac{\frac{s_{0}}{N}}{\frac{s_{j}}{N}} = \frac{s_{0}}{s_{j}}. $$

We may therefore compute c j using

$$ c_{j} = \frac{s_{0}}{\max(s_{j}, 10^{ - 30})}, $$

where we use the maximum operator in the denominator to prevent division by zero.
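Under these definitions, the bookkeeping amounts to a few lines of Python. The sketch below assumes the parent and child midpoints are given in the weighted coordinates along with their already-computed objective and constraint values; all names are illustrative.

```python
import math

def update_rate_sums(s, x_mid, f_mid, g_mid, x_child, f_child, g_child):
    """Add one child's observed rates of change to the running sums
    (an illustrative sketch; s = [s_0, s_1, ..., s_m], the x's are midpoints
    in weighted coordinates, f's are objective values, and g's are vectors
    of constraint values). Call once for the left child and once for the
    right child after a trisection."""
    dist = math.dist(x_child, x_mid)
    s[0] += abs(f_child - f_mid) / dist
    for j, (gc, gm) in enumerate(zip(g_child, g_mid), start=1):
        s[j] += abs(gc - gm) / dist


def constraint_coefficients(s):
    """c_j = s_0 / max(s_j, 1e-30); the max guards against division by zero."""
    return [s[0] / max(sj, 1e-30) for sj in s[1:]]
```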

So far we have said that we will always trisect a rectangle along one of its long sides. However, as shown in Fig. 2, several sides may be tied for longest, and so we need some way to break these ties. Our tie-breaking mechanism is as follows. We maintain counters t_i (i = 1, …, n) for how many times we have split along dimension i over the course of the entire search. These counters are initialized to zero at the beginning of the algorithm, and counter t_i is incremented each time a rectangle is trisected along dimension i. If we select a rectangle that has several sides tied for being longest, we break the tie in favor of the side with the lowest t_i value. If several long sides are also tied for the lowest t_i value, we break the tie arbitrarily in favor of the lowest-indexed dimension. This tie-breaking strategy has the effect of equalizing the number of times we split on the different dimensions.
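A minimal sketch of this rule (the function name and arguments are ours):

```python
def choose_split_dimension(long_sides, t):
    """Among the long sides of the selected rectangle, pick the dimension
    with the smallest split counter t_i, breaking remaining ties by the
    lowest index (a sketch; 'long_sides' is a list of dimension indices)."""
    return min(long_sides, key=lambda i: (t[i], i))


# Example: dimensions 0 and 2 are long; both have been split twice,
# so the lower-indexed dimension 0 is chosen.
print(choose_split_dimension([0, 2], t=[2, 5, 2]))   # -> 0
```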

Let us now turn to the calculation of the center-vertex distance. Recall that we measure distance using a weighted metric that assigns a length of one to the initial range of each variable (u_i − ℓ_i). Each time a rectangle is split along a side, the length of that side is divided by three. Now consider a rectangle that has been trisected T times. Let j = mod(T, n), so that we may write T = kn + j where k = (T − j)/n. After the first kn trisections, all of the n sides will have been trisected k times and will therefore have length 3^{-k}. The remaining j trisections will make j of the sides have length 3^{-(k+1)}, leaving n − j sides with length 3^{-k}. Simple algebra then shows that the distance d from the center to the vertices is given by

$$ d = \frac{3^{ - k}}{2} \left(\frac{j}{9} + n - j \right)^{0.5}. $$

(This is not obvious, but can be easily verified.)
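One easy way to verify the formula is to compare it against a direct computation from the side lengths, as in the following sketch (function names are ours):

```python
import math

def center_vertex_distance(T, n):
    """Center-vertex distance of a rectangle that has been trisected T times
    in an n-dimensional problem (continuous case, weighted coordinates)."""
    j = T % n
    k = (T - j) // n
    return (3.0 ** -k) / 2.0 * math.sqrt(j / 9.0 + n - j)


def center_vertex_distance_direct(T, n):
    """Same quantity from first principles: half the diagonal of a box with
    j sides of length 3^-(k+1) and n - j sides of length 3^-k."""
    j, k = T % n, T // n
    sides = [3.0 ** -(k + 1)] * j + [3.0 ** -k] * (n - j)
    return 0.5 * math.sqrt(sum(s * s for s in sides))


# The two expressions agree (up to round-off) for any T and n.
assert abs(center_vertex_distance(7, 3) - center_vertex_distance_direct(7, 3)) < 1e-12
```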

The handling of integer variables is amazingly simple, involving only minor changes to the trisection routine and to the way the midpoint of a rectangle is defined. For example, consider an integer variable with range [1, 8]. We could not define the midpoint to be 4.5 because this is not an integer. Instead, we will use the following procedure. Suppose the range of a rectangle along an integer dimension is [a, b], with both a and b being integers. We define the ‘midpoint’ as ⌊(a + b)/2⌋, that is, the floor of the algebraic average (the floor of z, denoted ⌊z⌋, is the greatest integer less than or equal to z).

To trisect along the integer dimension, we first compute Δ = ⌊(b − a + 1)/3⌋. If Δ ≥ 1, then after trisection the left child will have the range [a, a + Δ − 1], the center child will have the range [a + Δ, b − Δ], and the right child will have the range [b − Δ + 1, b]. If Δ = 0, then the integer side must have a range of two (i.e., b = a + 1). In this case, the center child will have the range [a, a], the right child will have the range [b, b], and there will be no left child. This procedure maintains the property that the midpoint of the parent rectangle always becomes the midpoint of the center child. As an example, Fig. 8 shows how a rectangle would be trisected when there are two integer dimensions. In the figure, the circles represent possible integer combinations, and the filled circles represent the midpoints.
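This procedure is short enough to write out directly; the sketch below follows the rules just given (names are ours):

```python
def integer_midpoint(a, b):
    """'Midpoint' of an integer range [a, b]: the floor of (a + b)/2."""
    return (a + b) // 2


def trisect_integer_range(a, b):
    """Trisect an integer range [a, b] with b > a, returning the
    (left, center, right) child ranges; the left child is None when the
    range holds only two integers (an illustrative sketch)."""
    delta = (b - a + 1) // 3
    if delta >= 1:
        return (a, a + delta - 1), (a + delta, b - delta), (b - delta + 1, b)
    return None, (a, a), (b, b)          # delta == 0, i.e. b == a + 1


# Example from the text: the range [1, 8] has 'midpoint' 4, and trisection
# gives children [1, 2], [3, 6], [7, 8]; the parent midpoint 4 is also the
# midpoint of the center child [3, 6].
print(integer_midpoint(1, 8), trisect_integer_range(1, 8))
```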

Figure 8: Trisection of a rectangle with two integer dimensions.

Integer variables introduce three other complications. The first, which may be seen in Fig. 8, is that the sampled point may not be in the true geometric center of the rectangle. As a result, the center-vertex distance will not be unique but will vary from vertex to vertex. We ignore this detail and simply use the formula given above for the continuous case, which only depends upon the number of times a rectangle has been trisected.

The second complication concerns how we define a ‘long’ side. In the continuous case, the length of a side is directly related to the number of times the rectangle has been trisected along that dimension. Specifically, if a rectangle has been split k times along some side, then the side length will be 3^{-k} (recall that we measure distance relative to the original range of each variable). In the continuous case, therefore, the set of long sides is the same as the set of sides that have been split upon the least. When there are integers, however, the side lengths will no longer be powers of 1/3. To keep things simple, however, we ignore this and continue to define a ‘long’ side as one that has been split upon the least. However, if an integer side has been split so many times that its side length is zero (i.e., the range contains a single integer), then this side will not be considered long.
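One way to express this definition in code, under the reading that the minimum split count is taken over the sides that are still eligible, is the following sketch (names and data layout are ours):

```python
def long_sides(split_counts, int_ranges):
    """Dimensions considered 'long': those split upon the least, excluding
    integer sides whose range has collapsed to a single value (a sketch;
    int_ranges[i] is None for a continuous dimension, otherwise the current
    integer range (a, b) of the rectangle along dimension i)."""
    eligible = [i for i, rng in enumerate(int_ranges)
                if rng is None or rng[1] > rng[0]]
    fewest = min(split_counts[i] for i in eligible)
    return [i for i in eligible if split_counts[i] == fewest]
```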

The third and final complication is that, if all the variables are integer, then it is possible for a rectangle to be reduced to a single point. If this happens, the rectangle would be fathomed; hence, it should be ignored in the rectangle selection process in all subsequent iterations.

DIRECT stops when it reaches a user-defined limit on function evaluations. It would be preferable, of course, to stop when we have achieved some desired accuracy in the solution. However, for black-box problems where we only assume continuity, better stopping rules are hard to develop.

As for convergence, it is easy to show that, as f^* moves to −∞, DIRECT will select one of the largest rectangles. Because we always select one of the largest rectangles, and because we always subdivide on a long side, every rectangle will eventually become very small and the sampled points will be dense in the space. Since we also assume the functions are continuous in the neighborhood of the optimum, this ensures that we will get within any positive tolerance of the optimum after a sufficiently large number of iterations.

Although we have now described all the elements of DIRECT, our discussion has covered several pages, and so it will be helpful to bring everything together in a step-by-step description of the algorithm.

  1) Initialization.

    Sample the center point of the entire space. If the center is feasible, set x_min equal to the center point and f_min equal to the objective function value at this point. Set s_j = 0 for j = 0, …, m; t_i = 0 for i = 1, …, n; and neval = 1 (the function evaluation counter). Set maxeval equal to the limit on the number of function evaluations (the stopping criterion).

  2) Select rectangles.

    Compute the c_j values using the current values of s_0 and s_j, j = 1, …, m. If a feasible point has not been found, select the rectangle that minimizes the rate of change required to bring the weighted constraint violations to zero. On the other hand, if a feasible point has been found, identify the set of rectangles that participate in the lower envelope of the h_r(f^*) functions for some f^* ≤ f_min − ϵ. A good value for ϵ is ϵ = max(10^{-4}|f_min|, 10^{-8}). Let S be the set of selected rectangles.

  3) Choose any rectangle r ∈ S.

  4) Trisect and sample rectangle r.

    Choose a splitting dimension by identifying the set of long sides of rectangle r and then choosing the long side with the smallest t_i value. If more than one side is tied for the lowest t_i value, choose the one with the lowest dimension index. Let i be the resulting splitting dimension (recall that a ‘long side’ is one that has been split upon the least and, if integer, still has a positive range). Trisect rectangle r along dimension i and increment t_i by one. Sample the midpoint of the left third, increment neval by one, and update x_min and f_min. If neval = maxeval, go to Step 7. Otherwise, sample the midpoint of the right third, increment neval by one, and update x_min and f_min (note that one of the children might not exist when trisecting on an integer variable). Update the s_j, j = 0, …, m. If all n variables are integer, check whether a child rectangle has been reduced to a single point and, if so, delete it from further consideration. Go to Step 5.

  5) Update S.

    Set S = S − {r}. If S is not empty, go to Step 3. Otherwise go to Step 6.

  6) Iterate.

    Report the results of this iteration, and then go to Step 2.

  7) Terminate.

    The search is complete. Report x_min and f_min and stop.

The results of DIRECT are slightly sensitive to the order in which the selected rectangles are trisected and sampled because this order affects the t_i values and, hence, the choice of splitting dimensions for other selected rectangles. In our current implementation, we select the rectangles in Step 3 in the same order that they are found as we scan the lower envelope in Fig. 7 from f^* = f_min − ϵ towards f^* = −∞.

On the first iteration, all the s_j will be zero in Step 2 and, hence, all the c_j will be zero when computed using c_j = s_0/max(s_j, 10^{-30}). Thus, in the beginning the constants c_j will not be very meaningful. This is not important, however, because on the first iteration there is only one rectangle eligible for selection (the entire space), and so the selection process is trivial. As the iterations proceed, the s_j will be based on more observations, leading to more meaningful c_j constants and better rectangle selections.

When there are no inequality constraints, the above step-by-step procedure reduces to the basic version of DIRECT described earlier. To see this, note that, when there are no constraints, every point is feasible and so f_r ≥ f_min ≥ f^* for all rectangles r. This fact, combined with the lack of any constraint violations, means that the h_r(f^*) function given earlier reduces to (f_r − f^*)/d_r, which is precisely the rate-of-change function we minimized in the unconstrained version. Thus, in the unconstrained case, tracing the lower envelope in Fig. 7 identifies the same rectangles as tracing the lower-right convex hull in Fig. 5.

We will illustrate DIRECT on the following two-dimensional test function:

$$ \begin{cases} \displaystyle \min & f(x_{1}, x_{2}) \\ \text{s.t.} & g(x_{1}, x_{2}) \le 0 \\ & - 1 \le x_{1}, x_{2} \le + 1, \end{cases} $$

where

$$ \begin{aligned} &f(x_{1}, x_{2}) \\ & = \left(4 - 2.1 x_{1}^{2} + \frac{x_{1}^{4}}{3} \right) x_{1}^{2} + x_{1} x_{2} + \left( - 4 + 4 x_{2}^{2}\right) x_{2}^{2}, \\ & g(x_{1}, x_{2}) = - \sin(4\pi x_{1}) + 2\sin^{2}(2\pi x_{2}). \end{aligned} $$

We call this problem the Gomez #3 problem since it was listed as the third test problem in an article by S. Gomez and A. Levy [1]. The global minimum of the Gomez #3 problem occurs at the point (0.109, −0.623), where the function value is −0.9711. The problem is difficult because the feasible region consists of many disconnected, approximately circular parts, giving rise to many local optima (see Fig. 9).
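For reference, the Gomez #3 objective and constraint are easy to code directly from the formulas above. The following sketch does so (function names are ours); evaluating at the rounded coordinates reported above gives an objective value of about −0.971 and a slightly negative (feasible) constraint value.

```python
import math

def gomez3_objective(x1, x2):
    """Objective of the Gomez #3 problem (a six-hump-camel-type function)."""
    return ((4.0 - 2.1 * x1 ** 2 + x1 ** 4 / 3.0) * x1 ** 2
            + x1 * x2
            + (-4.0 + 4.0 * x2 ** 2) * x2 ** 2)


def gomez3_constraint(x1, x2):
    """Constraint g(x1, x2) <= 0 of the Gomez #3 problem."""
    return -math.sin(4.0 * math.pi * x1) + 2.0 * math.sin(2.0 * math.pi * x2) ** 2


# Evaluate at the (rounded) reported global minimizer.
print(gomez3_objective(0.109, -0.623), gomez3_constraint(0.109, -0.623))
# roughly -0.971 and -0.005 (feasible)
```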

Figure 9: The feasible region of the Gomez #3 problem.

For this test function, DIRECT gets within 1% of the optimum after 89 function evaluations and within 0.01% after 513 function evaluations. The first 89 sampled points are shown in Fig. 10. For comparison, the tunneling algorithm of Gomez and Levy [1] converged using an average of 1053 objective function evaluations and 1873 constraint evaluations (averaged over 20 random starting points). One reason DIRECT converges quickly is that it searches both globally and locally during each iteration; as a result, as soon as the global part of the algorithm finds the basin of convergence of the optimum, the local part of the algorithm automatically exploits it.

Figure 10: The first 89 points sampled by DIRECT on the Gomez #3 problem.

In this example, DIRECT quickly gets close to the optimum but takes longer to achieve a high degree of accuracy. This suggests that the best performance would be obtained by combining DIRECT with a good local optimizer. The simplest way to do this is to run DIRECT for a predetermined number of function evaluations and then use the resulting solution as a starting point for a local optimization. While straightforward, this approach is highly sensitive to the number of function evaluations used in the global phase with DIRECT. If one uses too few function evaluations, DIRECT might not discover the basin of convergence of the global minimum.

To ensure that the global optimum is eventually found, we must somehow return to the global phase after we have performed a local search. One way of doing this is as follows. We start the local optimizer the very first time a feasible point is found (or perhaps after a minimum initial phase of 50–100 evaluations). After the local optimizer finishes, we return to DIRECT. However, DIRECT does not proceed the same as it would have without the local optimizer. Instead, the search will be more global, because the local optimizer will have reduced the value of f_min (which affects rectangle selection). DIRECT will now be looking for a point that improves upon the local solution—in effect, it will be looking for the basin of convergence of a better minimum. If DIRECT finds such an improving point, then we run a local search from this point and again return to DIRECT. This process continues until we reach a predetermined limit on the total number of function evaluations (for both DIRECT and the local optimizer). Used in this way, DIRECT becomes an intelligent routine for selecting starting points for the local optimizer.

While DIRECT works well on the Gomez #3 problem and on test functions reported in [2], the algorithm is not without its disadvantages. For example, DIRECT's use of a space-partitioning approach requires the user to have relatively tight lower and upper bounds on all the variables. DIRECT will perform miserably if one specifies wide bounds such as [−10^{30}, +10^{30}]. The space-partitioning approach also limits the algorithm to low-dimensional problems (say, less than 20). While integer variables are handled, they must be ordered, such as the number of gear teeth, since only then can we expect the function value at a rectangle's midpoint to be indicative of what the function is like in the rest of the rectangle. Another limitation is that equality constraints are not handled. Finally, the stopping criterion—a limit on function evaluations—is weak.

The advantages of DIRECT, however, are considerable. The algorithm can handle nonsmooth, nonlinear, multimodal, and even discontinuous functions (as long as the discontinuity is not close to the global optimum). The algorithm works well in the presence of noise, since a small amount of noise usually has little impact on the set of selected rectangles until late in the search. Computational overhead is low, and the algorithm can exploit parallel processing because it generates several new points per iteration. Based on the comparisons in [2], the algorithm also appears to be efficient in terms of the number of function evaluations required to get close to the global minimum. But the most important advantage of DIRECT stems from its unique approach to balancing local and global search—the simple idea of not sampling just one point per iteration, but rather sampling several points using all possible weightings of local versus global search. This approach leads to an algorithm with no tuning parameters, making the algorithm easy to use and robust.

See also

αBB Algorithm

Continuous Global Optimization: Applications

Continuous Global Optimization: Models, Algorithms and Software

Differential Equations and Global Optimization

Global Optimization Based on Statistical Models

Global Optimization in Binary Star Astronomy

Global Optimization Methods for Systems of Nonlinear Equations

Global Optimization Using Space Filling

Topology of Global Optimization