
Most engineering problems, including planning, control, and design, have more than one solution. The theory of optimization provides a mathematical basis for establishing the acceptability conditions that outline the class of acceptable solutions, for defining the criterion that provides the measure of goodness of every individual solution, and for the optimization procedure (algorithm) that finds the optimal solution, i.e. the solution maximizing the value of the goodness criterion. These three components, the class of acceptable solutions, the criterion of goodness, and the optimization procedure, must be present in any definition of an optimization problem.

The solution vector of an optimization problem is a set of particular numerical values of the optimization variables, X = [x_1, x_2, …, x_n]^T, that reflect the nature of the problem. For example, in a resource distribution problem, where some material, monetary, or energy resources have to be distributed between n consumers, vector X represents the amounts of these resources designated to each consumer.

The class of acceptable solutions is typically defined as a set of conditions, equations and/or inequalities that the solution vector must satisfy. Thus in the resource distribution problem these conditions include the requirements that the amounts of the resource designated to individual consumers cannot be negative, i.e.

$$ {x}_{i}\ge 0,\quad i=1,2,\dots, n $$

that the sum of the amounts of this resource designated to consumers shall not exceed the total amount available (i = 1,2,.. is the consumer index), i.e.

$$ {\displaystyle \sum_{i=1}^{n}{x}_{i}}\le {P}^{\mathrm{TOT}} $$

that the amounts of the resources provided to some of the consumers are not negotiable (k is the consumer index), i.e.

$$ {x}_{k}={P}^{k},\quad k={k}_1,{k}_2,\dots $$

or shall have allowable minimal and maximal values, i.e.

$$ {P}_{\min}^{k}\le {x}_{k}\le {P}_{\max}^{k},\quad k={k}_1,{k}_2,\dots $$

It is said that the conditions outlining the class of acceptable solutions reflect the feasibility requirements and the specifics of the problem, and form a special region in the solution space X.

The optimization criterion is always a scalar function defined in the solution space

$$ Q\left(X\right)=Q\left({x}_1,{x}_2,\dots, {x}_{n}\right) $$

that represents the degree of consistency of any solution vector X with the overall goal of the engineering task. For example, the resource distribution problem may reflect the goal of maximizing resource utilization, and intuitively its solution would provide the maximum allowable amounts of the resource to the consumers having the highest utilization coefficients, α_i, i = 1,2,…,n. It can be seen that in this case the criterion could be defined as

$$ Q(X)={\displaystyle \sum_{\mathrm{i}=1}^{\mathrm{n}}{\alpha}_{\mathrm{i}}{x}_{\mathrm{i}}} $$

In the situation when the goal of the resource distribution problem is to minimize the total cost of transporting the resource to the consumers, and intuitively the most remote consumers are expected to receive the least amounts of the resource within the allowable limits, the criterion could be visualized as

$$ Q(X)={\displaystyle \sum_{i=1}^n{\beta}_i{x}_i} $$

where β_i, i = 1,2,…,n are the transportation costs per unit of the resource for the particular consumers. It is common to refer to the function Q(X) as the criterion, the objective function, or the loss function; each term highlights a different aspect of the nature of the optimization problem.

Finally, the optimization procedure must result in the rule that would facilitate the detection of such a point, X OPT, in the region of acceptable solutions in the space X where criterion Q(X) has its minimum (maximum) value

$$ {\boldsymbol{Q}}^{\mathrm{OPT}}=\boldsymbol{Q}\left({\boldsymbol{X}}^{\mathrm{OPT}}\right) $$

One should realize that the search for the maximum of a criterion Q_1(X) is equivalent to the search for the minimum of the criterion Q_2(X) = −Q_1(X), and vice versa; therefore we will always refer to the task of optimization as the task of minimization.

Recall the approach to minimization presented as a part of undergraduate calculus. It suggests that the minimum point of some scalar function

$$ \boldsymbol{Q}\left(\boldsymbol{X}\right)=\boldsymbol{Q}\left({\boldsymbol{x}}_1,\;{\boldsymbol{x}}_2, \dots,\;{\boldsymbol{x}}_{\mathrm{n}}\right), $$

i.e. the point X* = [x_1*, x_2*, …, x_n*], can be found as the solution of the system of n equations,

$$ \begin{array}{l}\frac{\partial }{\partial {x}_1}Q\left({x}_1,{x}_2,\dots, {x}_n\right)={f}_1\left({x}_1,{x}_2,\dots, {x}_n\right)=0\\ {}\frac{\partial }{\partial {x}_2}Q\left({x}_1,{x}_2,\dots, {x}_n\right)={f}_2\left({x}_1,{x}_2,\dots, {x}_n\right)=0\\ {}................................................................\\ {}\frac{\partial }{\partial {x}_n}Q\left({x}_1,{x}_2,\dots, {x}_n\right)={f}_n\left({x}_1,{x}_2,\dots, {x}_n\right)=0\end{array} $$
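As a small illustration of this calculus-based approach (the quadratic below is invented for this example), consider Q(x_1, x_2) = x_1^2 + x_2^2 − 2x_1 − 4x_2. The system of equations becomes

$$ \frac{\partial Q}{\partial {x}_1}=2{x}_1-2=0,\qquad \frac{\partial Q}{\partial {x}_2}=2{x}_2-4=0, $$

so the minimum point is X* = [1, 2]^T with Q(X*) = 1 + 4 − 2 − 8 = −5. For a generic nonlinear Q, however, such a closed-form solution is rarely available.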

While this suggestion is consistent with the rigors of a community college, from the engineering point of view it is quite unrealistic for the following reasons:

  1.

    Function Q(x 1, x 2, …, x n) is typically so complex that its derivatives

    $$ \frac{\partial }{\partial {x}_i}Q\left({x}_1,{x}_2,\dots, {x}_n\right),\ i=1,2,\dots,\ n $$

    are very difficult, and sometimes impossible, to define analytically.

  2.

    Derivatives \( \frac{\partial }{\partial {x}_i}Q\left({x}_1,{x}_2,\dots, {x}_n\right),\ i=1,2,\dots,\ n \) are nonlinear functions of x_1, x_2, …, x_n; the system of equations shown above may have multiple solutions or no solution at all and, in general, cannot be solved analytically.

  3.

    The entire definition of the function minimization task does not address the existence of constraints.

  4.

    The function to be minimized may not have any analytical definition, but for any combination of numerical values of its arguments its value could be defined numerically, for example by conducting an experiment.

In most real-life situations the optimization task can be performed only numerically, and it can be compared with navigating through very complex terrain towards the highest (in the maximization case) existing peak while avoiding obstacles and lower peaks. The task is aggravated by the fact that the terrain is multidimensional and the obstacles can be detected only by direct contact. Figure 4.1 below, drawn by my cartoonist friend Joseph Kogan, depicts the task of optimization based on my comments.

Unsurprisingly, optimization became an engineering tool only with the proliferation of modern computers. We will present several common models, methods, and applications of optimization that should be included in the toolbox of a modern engineer. These techniques include linear programming, numerical techniques of nonlinear programming (gradient, random, and direct search), genetic optimization, and dynamic programming. We do not expect a modern engineer to develop optimization techniques; this is the mathematicians’ domain. However, a good engineer shall be able to:

Fig. 4.1
figure 1

Myth vs. reality of optimization

  • recognize a situation lending itself to an optimization task

  • formulate the optimization problem, i.e. define its variables, criterion and constraints

  • recognize the resultant problem as one of the typical optimization problems

  • find and apply a suitable optimization tool (perhaps available in MATLAB)

4.1 Linear Programming

Linear programming is an optimization technique suitable for situations where the set of conditions outlining the region of acceptable solutions and the goodness criterion are linear functions defined in the solution space.

In a linear programming problem, the region of acceptable solutions is defined by the set of equalities and inequalities as follows:

$$ {\displaystyle \sum_{i=1}^n{a}_{ij}{x}_i={b}_j}\;\mathrm{and}\;{\displaystyle \sum_{i=1}^n{a}_{ik}{x}_i\le {b}_k} $$

where x_i, i = 1,2,…,n are the optimization variables that constitute the solution space, j = 1,2,…,L is the equality index, and k = 1,2,…,M is the inequality index. Note that the number of equalities must be less than the dimension of the solution space; otherwise the region of acceptable solutions will include only one point (when n = L) or could be empty (when L > n). One should understand that inequalities can always be reduced to the standard "greater or equal" type: an inequality of the "less or equal" type, \( {\displaystyle \sum_{i=1}^n{a}_{ik}{x}_i\le {b}_k} \), is easily converted into the "greater or equal" type by changing signs, \( -{\displaystyle \sum_{i=1}^n{a}_{ik}{x}_i\ge -{b}_k} \); consequently, only "greater or equal" inequalities need to be considered. Note that the class of acceptable solutions could be empty even when n > L: the inequalities and equalities could be mutually contradictory.

The criterion of a linear optimization problem is defined by a linear function,

$$ Q\left({x}_1,{x}_2,\dots, {x}_n\right)={\displaystyle \sum_{i=1}^n{c}_i{x}_i} $$

that has to be minimized,

$$ Q\left({x}_1,{x}_2,\dots, {x}_n\right)={\displaystyle \sum_{i=1}^n{c}_i{x}_i}\to \min $$

or

$$ -Q\left({x}_1,{x}_2,\dots, {x}_n\right)=-{\displaystyle \sum_{i=1}^n{c}_i{x}_i}\to \min $$

if the original criterion Q(X) had to be maximized.

Example 4.1

Consider one of the typical problems of linear programming, the task distribution problem. Five reactors operate at a chemical plant, producing the same product. Due to capacity, design specifics, and technical status, the reactors have different efficiencies expressed by the extraction coefficients α_j, j = 1,2,3,4,5. The capacities of these reactors, q_j, j = 1,2,3,4,5, are also different:

| reactor, j | 1 | 2 | 3 | 4 | 5 |
|---|---|---|---|---|---|
| coefficient α_j | 0.81 | 0.76 | 0.63 | 0.71 | 0.68 |
| capacity q_j (units) | 150 | 200 | 175 | 120 | 96 |

The chemical plant is required to process a certain amount of raw material, say P = 500 units, which should be rationally distributed between the reactors in the sense that the overall extraction is maximized. It can be seen that the solution space of this problem comprises 5 variables, x_1–x_5, representing the amount of raw material loaded into the respective reactors. The constraints of this problem must address the following requirements:

  • Amount of raw material loaded in the j-th reactor must be non-negative: x j ≥ 0, j = 1,2,…,5

  • The total amount of raw material to be loaded in reactors is defined: \( {\displaystyle \sum_{j=1}^5{x}_j}=P \)

  • The amount of raw material loaded in a particular reactor cannot exceed the capacity of this reactor: x j ≤ q j, j = 1,2,…,5

  • Finally, the criterion of this problem could be defined as \( {\displaystyle \sum_{j=1}^5{\alpha}_j{x}_j}\to \max \) or, equivalently, \( -{\displaystyle \sum_{j=1}^5{\alpha}_j{x}_j}\to \min \)

The mathematical formulation of this problem is

$$ -0.81{\boldsymbol{x}}_1-0.76{\boldsymbol{x}}_2-0.63{\boldsymbol{x}}_3-0.71{\boldsymbol{x}}_4-0.68{\boldsymbol{x}}_5\to \min $$

subject to conditions

$$ {\boldsymbol{x}}_1\ge 0,{\boldsymbol{x}}_2\ge 0,{\boldsymbol{x}}_3\ge 0,{\boldsymbol{x}}_4\ge 0,{\boldsymbol{x}}_5\ge 0 $$
$$ -{\boldsymbol{x}}_1\ge -150, - {\boldsymbol{x}}_2\ge -200, - {\boldsymbol{x}}_3\ge -175, - {\boldsymbol{x}}_4\ge -120, - {\boldsymbol{x}}_5\ge -96 $$
$$ {\boldsymbol{x}}_1+{\boldsymbol{x}}_2+{\boldsymbol{x}}_3+{\boldsymbol{x}}_4+{\boldsymbol{x}}_5=500 $$

One can see that this problem has an infinite number of alternative solutions provided that the total amount of raw material P is less than the total capacity of the reactors, thus creating the opportunity for optimization. When the total amount of raw material P equals the total capacity of the reactors, only one solution exists and optimization is impossible. Finally, when the total amount of raw material P is greater than the total capacity of the reactors, the problem has no solution.

It could be also realized that the optimal solution procedure for this problem is quite trivial:

Step 1.:

The first reactor, having the highest efficiency coefficient, should be loaded to full capacity (x_1^OPT = 150; 350 units remain to be distributed), then

Step 2.:

The second most efficient reactor (reactor 2) must be loaded to full capacity (x_2^OPT = 200; 150 units remain to be distributed), then

Step 3.:

The third most efficient reactor (reactor 4) must be loaded to full capacity (x_4^OPT = 120; 30 units remain to be distributed), then

Step 4.:

The fourth most efficient reactor (reactor 5) must be loaded with the remaining amount of raw material (x_5^OPT = 30; zero units remain to be distributed), and the least efficient reactor receives nothing: x_3^OPT = 0.
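The steps above can be sketched as a greedy loop; this is a sketch tied to this example's data, not a general linear programming method:

```python
# Greedy allocation for Example 4.1: load reactors in order of
# decreasing extraction coefficient until the raw material runs out.
alpha = {1: 0.81, 2: 0.76, 3: 0.63, 4: 0.71, 5: 0.68}  # extraction coefficients
q = {1: 150, 2: 200, 3: 175, 4: 120, 5: 96}            # reactor capacities
P = 500                                                 # raw material to distribute

x = {}
remaining = P
for j in sorted(alpha, key=alpha.get, reverse=True):
    x[j] = min(q[j], remaining)  # fill the most efficient reactor first
    remaining -= x[j]

print(x)  # loads per reactor: reactors 1, 2, 4 full; 30 units in 5; none in 3
```

Sorting by α_j and filling the capacities reproduces Steps 1–4 above.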

It should be noted that most linear programming problems do not allow for such a simple solution procedure.

Example 4.2

The transportation problem . A product stored at 3 warehouses must be distributed between 5 consumers in such a fashion that the total cost of transporting the product is minimized.

The solution space of this problem is formed by 3 × 5 = 15 variables, x jk, j = 1,2,3, k = 1,2,3,4,5, representing the amount of the product delivered from the j-th warehouse to the k-th consumer. Introduce the matrix of transportation costs, c jk, j = 1,2,3, k = 1,2,3,4,5, representing the cost of transportation of one unit of the product from the j-th warehouse to the k-th consumer. Introduce quantities P j, j = 1,2,3, representing the amount of the product at j-th warehouse, and quantities W k, k = 1,2,3,4,5, representing the amount of the product requested by k-th consumer. Then the mathematical formulation of the problem is

$$ {\displaystyle \sum_{\mathrm{k}=1}^5{\displaystyle \sum_{\mathrm{j}=1}^3{c}_{jk}{x}_{jk}}}\to \min $$

subject to the following conditions

  a)

    non-negativity, x jk ≥ 0, j = 1,2,3, k = 1,2,3,4,5

  b)

    amount of the product available at each warehouse, \( {\displaystyle \sum_{k=1}^5{x}_{jk}\le {P}_j} \), j = 1,2,3

  c)

    amount of product delivered to each consumer, \( {\displaystyle \sum_{\mathrm{j}=1}^3{x}_{jk}={W}_k} \), k = 1,2,3,4,5

One can realize that the solution of this problem exists if \( {\displaystyle \sum_{k=1}^5{W}_k}\le {\displaystyle \sum_{j=1}^3{P}_j} \), however it cannot be obtained without a computationally intensive and rigorously justified algorithm. It also should be noted that typical solutions of linear programming problems comprise non-negative variables and therefore the non-negativity of the solution is assured not by special constraints but by the solution procedure itself.

Example 4.3

The mixing problem. Preparing the right raw material is one of the conditions for obtaining a high-quality end product in chemical or metallurgical manufacturing. Assume that the raw material is characterized by the percentages of four ingredients: A_1%, A_2%, A_3%, and A_4%. The raw material is prepared by mixing six components in the amounts (in tons) x_1, x_2, …, x_6. Each component contains all four ingredients, but in different concentrations: a_jk (%) is the concentration of ingredient #j (j = 1,2,3,4) in component #k (k = 1,2,…,6). The cost of each component is given: c_k ($/ton), k = 1,2,…,6. Also given are the required total amount of the raw material, P (tons), and the available amounts of the individual components, q_k (tons), k = 1,2,…,6. It is required to prepare the least expensive mixture.

The problem definition is as follows:

Minimize the cost of the mixture:

$$ {\displaystyle \sum_{k=1}^6{c}_k{x}_k\ \to \min } $$

Subject to constraints on

  • the total amount of the raw material \( {\displaystyle \sum_{k=1}^6{x}_k = P} \)

  • percentages of four ingredients (j = 1,2,3,4) \( {\displaystyle \sum_{k=1}^6{a}_{jk}{x}_k = {A}_j\cdot P} \)

  • available amounts of individual components, (k = 1,2,3,4,5,6) \( {x}_k\le {q}_k \)

Again, the optimal solution of this problem, if it exists, can be obtained only via a computationally intensive procedure.
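With hypothetical data (every number below is invented for illustration), such a procedure can be sketched using SciPy's generic `linprog` solver — an assumed tool choice, not one prescribed by the text. The required percentages A_j are chosen so that a feasible mixture exists:

```python
import numpy as np
from scipy.optimize import linprog

# Hypothetical mixing-problem data: a[j, k] is the concentration (%) of
# ingredient j in component k; every column sums to 100%.
a = np.array([[10, 12,  8, 11,  9, 10],
              [20, 18, 22, 19, 21, 20],
              [30, 32, 28, 31, 29, 30],
              [40, 38, 42, 39, 41, 40]], dtype=float)
c = np.array([50, 40, 60, 45, 55, 48], dtype=float)  # component costs, $/ton
q = np.full(6, 30.0)                                  # available amounts, tons
P = 100.0                                             # required raw material, tons
A_req = np.array([10.0, 20.0, 30.0, 40.0])            # required ingredient percentages

# Equalities: total amount and the four ingredient balances; the capacity
# constraints x_k <= q_k enter through the variable bounds.
A_eq = np.vstack([np.ones(6), a])
b_eq = np.concatenate([[P], A_req * P])
res = linprog(c, A_eq=A_eq, b_eq=b_eq, bounds=[(0.0, qk) for qk in q])
print(res.status, res.fun)  # status 0 means an optimal mixture was found
```

Because the data were constructed so that the equal mixture x_k = P/6 is feasible, the solver is guaranteed to return a mixture at least as cheap as that one.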

Let us consider such a procedure.

4.1.1 Geometrical Interpretation of Linear Programming

The geometrical interpretation of linear programming is crucial for understanding the computational nature of its algorithm. It works best for a two-dimensional solution space and inequality-type constraints.

Consider a straight line in the two-dimensional space defined by the equation a_1x_1 + a_2x_2 = b, like the one shown below in Fig. 4.2.

Fig. 4.2
figure 2

How linear constraints work

It is known that any point on this line, for example [x_1^(1), x_2^(1)], satisfies its equation, i.e. a_1x_1^(1) + a_2x_2^(1) = b. It is also known that any point above this line, such as [x_1^(2), x_2^(2)], results in a_1x_1^(2) + a_2x_2^(2) > b, and any point below this line, [x_1^(3), x_2^(3)], results in a_1x_1^(3) + a_2x_2^(3) < b (assuming a_2 > 0). Consequently, any condition a_1x_1 + a_2x_2 ≤ b (or −a_1x_1 − a_2x_2 ≥ −b) outlining the class of acceptable solutions indicates that the acceptable solutions must be located on or below the corresponding straight line. At the same time, any condition a_1x_1 + a_2x_2 ≥ b (or −a_1x_1 − a_2x_2 ≤ −b) indicates that acceptable solutions must be located on or above the corresponding straight line. One can visualize the domain of acceptable solutions defined by inequality-type conditions as the part of the plane that simultaneously complies with all inequality-type conditions (highlighted below in Fig. 4.3):

Fig. 4.3
figure 3

Combination of linear constraints and domain of acceptable solutions

Now consider a straight line c_1x_1 + c_2x_2 = 0 and two points, [x_1^A, x_2^A] and [x_1^*, x_2^*], located on the side of this line where c_1x_1 + c_2x_2 > 0. Note that the distance between the straight line and point [x_1^A, x_2^A] is greater than the distance between this line and point [x_1^*, x_2^*]. This leads to the following relation, which is easily verified by a numerical example: c_1x_1^A + c_2x_2^A > c_1x_1^* + c_2x_2^*.

Consider the combination of the domain of acceptable solutions bounded by contour ABCDEF and the straight line c_1x_1 + c_2x_2 = 0 representing the criterion of a minimization problem, seen below in Fig. 4.4. Note that the domain of acceptable solutions bounded by contour ABCDEF forms, generally speaking, a convex polygon (a convex polyhedron in the n-dimensional case), and its individual vertices (corner points), i.e. A, B, C, …, are known as basic acceptable solutions of the linear programming problem.

Fig. 4.4
figure 4

Solution point and criterion value

It can be concluded that the solution of the problem [x_1^OPT, x_2^OPT] minimizing the criterion Q(x_1,x_2) = c_1x_1 + c_2x_2 is located at the point that belongs to the domain of acceptable solutions and has the shortest distance to the straight line c_1x_1 + c_2x_2 = 0. It can be seen that in Fig. 4.5 this point is F. Should the solution maximizing the criterion Q(x_1,x_2) = c_1x_1 + c_2x_2 be sought, it would be found at point D, which belongs to the domain of acceptable solutions and has the largest distance to the straight line c_1x_1 + c_2x_2 = 0.

Fig. 4.5
figure 5

Graphical interpretation of a linear programming problem

Now consider the specifics of a linear programming problem that prevent us from obtaining its optimal solution. The first situation arises when at least two constraints are mutually exclusive; in this case even acceptable solutions do not exist. In the second situation, the domain of acceptable solutions is not empty but unbounded, and the solution minimizing the criterion may not exist. Both cases are shown in Fig. 4.6. Finally, Fig. 4.7 represents the situation where no unique optimal solution minimizing the criterion exists: the straight line representing the criterion is parallel to side AB of the domain of acceptable solutions.

Fig. 4.6
figure 6

Situations when the solution does not exist

Fig. 4.7
figure 7

No unique minimum exists

So far our discussion addressed only the inequality-type constraints. Imagine that a linear programming problem contains k equality-type constraints, m inequality-type constraints and has n solution variables where n > k. Assume that the problem is formulated as follows:

  • minimize \( {\displaystyle \sum_{i=1}^n{c}_i{x}_i} \)

  • subject to constraints \( {\displaystyle \sum_{i=1}^n{p_i}_j{x}_i = {q}_j,\ j=1,2,3,\dots, k} \)

  • and \( {\displaystyle \sum_{i=1}^n{a_i}_j{x}_i\ \le {b}_j,\ j=1,2,3,\dots, m} \)

Note that the condition n > k creates the situation where k variables can be expressed through the remaining ones and removed from the list of solution variables. Since our goal is the minimization of the criterion \( {\displaystyle \sum_{i=1}^n{c}_i{x}_i} \), we shall eliminate, and set to zero in the optimal solution, preferably those variables that have the largest values of the corresponding coefficients c_i. This is done by sequential application of a special computational operation known in linear algebra as pivoting. Indeed, after k pivoting steps the problem is reduced to the following definition:

  • minimize \( {\displaystyle \sum_{i=1}^{n-k}{\overline{c}}_i{x}_i} \)

  • subject to constraints \( {\displaystyle \sum_{i=1}^{n-k}{{\overline{a}}_i}_j{x}_i\ \le {\overline{b}}_j,\ j=1,2,3,\dots, m} \)

  • where \( {{\overline{a}}_i}_j,\ {\overline{b}}_j,\ {\overline{c}}_i,\ i=1,2,\dots, n-k,\ j=1,2,3,\dots, m \) are the problem parameters modified by the pivoting steps.

In summary, a linear programming procedure intended for the solution of a minimization problem with n variables, k equality-type and m inequality-type constraints (n > k), could be formulated as follows:

Step 1.:

Reduction of the dimension of the solution space by the elimination of k strategically chosen variables and setting their values in the optimal solution to zero

Step 2.:

Finding the basic acceptable solutions of the problem by solving the possible combinations of n − k out of the m equations \( {\displaystyle \sum_{i=1}^{n-k}{{\overline{a}}_i}_j{x}_i = {\overline{b}}_j,\ j=1,2,3,\dots, m} \)

Step 3.:

Finding the optimal solution of the problem as the basic acceptable solution that yields the smallest value of the criterion.

Note that there are many highly efficient software tools that could be recommended for the solution of a linear programming problem. (For example see http://www.onlinecalculatorfree.org/linear-programming-solver.html).
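Steps 2 and 3 can also be illustrated by a brute-force sketch: enumerate the basic acceptable solutions (vertices) of a small inequality-only problem and keep the best one. The two-variable problem at the bottom is invented for illustration; real solvers use the simplex method rather than exhaustive enumeration:

```python
import itertools
import numpy as np

def brute_force_lp(c, A, b):
    """Enumerate the vertices (basic acceptable solutions) of {x : A x <= b}
    and return the one minimizing c . x -- workable only for tiny problems."""
    n = len(c)
    best_x, best_q = None, np.inf
    for rows in itertools.combinations(range(len(b)), n):
        try:
            x = np.linalg.solve(A[list(rows)], b[list(rows)])
        except np.linalg.LinAlgError:
            continue                          # chosen lines are parallel: no vertex
        if np.all(A @ x <= b + 1e-9):         # keep only acceptable vertices
            q = c @ x
            if q < best_q:
                best_x, best_q = x, q
    return best_x, best_q

# minimize x1 + 2*x2 subject to x1 >= 0, x2 >= 0, x1 + x2 <= 4,
# rewritten in the uniform "A x <= b" form
A = np.array([[-1.0, 0.0], [0.0, -1.0], [1.0, 1.0]])
b = np.array([0.0, 0.0, 4.0])
x_opt, q_opt = brute_force_lp(np.array([1.0, 2.0]), A, b)
print(x_opt, q_opt)  # the origin is the optimal vertex
```

The feasibility check A x ≤ b discards vertices outside the domain of acceptable solutions, exactly as the geometric picture above suggests.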

Example 4.4

Solving a simple linear programming problem given below:

$$ \mathrm{Minimize}\;\boldsymbol{Q}\left(\boldsymbol{X}\right) = 3{\boldsymbol{x}}_1+10{\boldsymbol{x}}_2+5{\boldsymbol{x}}_3+2{\boldsymbol{x}}_4 $$

subject to conditions

$$ {\boldsymbol{x}}_1+{\boldsymbol{x}}_2+{\boldsymbol{x}}_3+{\boldsymbol{x}}_4\le 125 $$
$$ {\boldsymbol{x}}_2-8{\boldsymbol{x}}_3+{\boldsymbol{x}}_4\le\ 12 $$
$$ -{\boldsymbol{x}}_1+2{\boldsymbol{x}}_2-3{\boldsymbol{x}}_3+{\boldsymbol{x}}_4\le 24 $$
$$ {\boldsymbol{x}}_1+{\boldsymbol{x}}_2=36 $$
$$ 2{\boldsymbol{x}}_1-5{\boldsymbol{x}}_2+8{\boldsymbol{x}}_3+4{\boldsymbol{x}}_4 = 16 $$

The optimal solution (as per tool http://www.onlinecalculatorfree.org/linear-programming-solver.html):

$$ {\boldsymbol{Q}}^{\mathrm{OPT}} = 164;\;{\boldsymbol{x}}_1 = 28,\;{\boldsymbol{x}}_2 = 8,\;{\boldsymbol{x}}_3 = 0,\;{\boldsymbol{x}}_4 = 0 $$
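For readers preferring a scriptable alternative to the online tool, the same problem can be solved with SciPy's `linprog` (an assumed tool choice; non-negativity of the variables comes from the solver's default bounds, matching the tool's convention):

```python
from scipy.optimize import linprog

# Example 4.4 restated for scipy.optimize.linprog; default bounds give x >= 0.
c = [3, 10, 5, 2]                      # minimize 3*x1 + 10*x2 + 5*x3 + 2*x4
A_ub = [[ 1, 1,  1, 1],                # x1 + x2 + x3 + x4         <= 125
        [ 0, 1, -8, 1],                # x2 - 8*x3 + x4            <= 12
        [-1, 2, -3, 1]]                # -x1 + 2*x2 - 3*x3 + x4    <= 24
b_ub = [125, 12, 24]
A_eq = [[1,  1, 0, 0],                 # x1 + x2                    = 36
        [2, -5, 8, 4]]                 # 2*x1 - 5*x2 + 8*x3 + 4*x4  = 16
b_eq = [36, 16]

res = linprog(c, A_ub=A_ub, b_ub=b_ub, A_eq=A_eq, b_eq=b_eq)
print(res.fun, res.x)  # expect 164.0 at x = (28, 8, 0, 0)
```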

Example 4.5

A resource distribution problem . A product available from three suppliers is to be provided to four consumers. The amounts of the product requested by individual consumers are respectively: 150, 230, 80 and 290 (units). The amounts of the product available at each supplier are: 300, 270 and 275 units. The transportation costs of the product from each supplier to each consumer in $ per unit are listed in the table below:

| | Consumer #1 | Consumer #2 | Consumer #3 | Consumer #4 |
|---|---|---|---|---|
| Supplier #1 | 25 | 16 | 33 | 48 |
| Supplier #2 | 45 | 15 | 36 | 11 |
| Supplier #3 | 21 | 31 | 40 | 52 |

It is required to minimize the overall transportation cost while satisfying the consumers’ demands and not exceeding the suppliers’ capabilities. The following problem definition is self-explanatory and, at the same time, fully consistent with the data format of the tool offered at

http://www.onlinecalculatorfree.org/linear-programming-solver.html

$$ \mathrm{Maximize}\ \mathrm{p}=-25{x}_{11}-16{x}_{12}-33{x}_{13}-48{x}_{14}-45{x}_{21}-15{x}_{22}-36{x}_{23}-11{x}_{24}-21{x}_{31}-31{x}_{32}-40{x}_{33}-52{x}_{34} $$

subject to

$$ {x}_{11} + {x}_{12} + {x}_{13} + {x}_{14}\le 300 $$
$$ {x}_{21} + {x}_{22} + {x}_{23} + {x}_{24}\le 270 $$
$$ {x}_{31} + {x}_{32} + {x}_{33} + {x}_{34}\le 275 $$
$$ {x}_{11} + {x}_{21} + {x}_{31} = 150 $$
$$ {x}_{12} + {x}_{22} + {x}_{32} = 230 $$
$$ {x}_{13} + {x}_{23} + {x}_{33} = 80 $$
$$ {x}_{14} + {x}_{24} + {x}_{34} = 290 $$

The Optimal Solution: p = −13,550; x11 = 0, x12 = 230, x13 = 70, x14 = 0, x21 = 0, x22 = 0, x23 = 0, x24 = 270, x31 = 150, x32 = 0, x33 = 10, x34 = 20 and could be summarized as

| | Consumer #1 | Consumer #2 | Consumer #3 | Consumer #4 | Supplier total |
|---|---|---|---|---|---|
| Supplier #1 | 0 | 230 | 70 | 0 | 300 |
| Supplier #2 | 0 | 0 | 0 | 270 | 270 |
| Supplier #3 | 150 | 0 | 10 | 20 | 180 |
| Consumer total | 150 | 230 | 80 | 290 | |

Total transportation cost: $13,550
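The same result can be reproduced with SciPy's `linprog` (an assumed tool choice); the helper arrays below merely restate the capacity and demand constraints:

```python
import numpy as np
from scipy.optimize import linprog

# Example 4.5 as a minimization. Variables x[j][k] are flattened row by row:
# j = supplier, k = consumer.
cost = np.array([[25, 16, 33, 48],
                 [45, 15, 36, 11],
                 [21, 31, 40, 52]], dtype=float)
supply = [300, 270, 275]           # amounts available at the suppliers
demand = [150, 230, 80, 290]       # amounts requested by the consumers

n_s, n_c = cost.shape
A_ub = np.zeros((n_s, n_s * n_c))  # each supplier ships at most its supply
for j in range(n_s):
    A_ub[j, j * n_c:(j + 1) * n_c] = 1.0
A_eq = np.zeros((n_c, n_s * n_c))  # each consumer receives exactly its demand
for k in range(n_c):
    A_eq[k, k::n_c] = 1.0

res = linprog(cost.ravel(), A_ub=A_ub, b_ub=supply, A_eq=A_eq, b_eq=demand)
print(res.fun)  # expect 13550.0, the total transportation cost
```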

4.2 Nonlinear Programming : Gradient

The gradient of a function of several variables, Q(x_1, x_2, …, x_n), is defined as a vector comprising the partial derivatives of this function with respect to its individual variables, i.e.

$$ \nabla Q(X)=\nabla Q\left({x}_1,{x}_2,\dots, {x}_n\right)={\left[\begin{array}{cccc}\hfill \frac{\partial Q}{\partial {x}_1}\hfill & \hfill \frac{\partial Q}{\partial {x}_2}\hfill & \hfill \dots \hfill & \hfill \frac{\partial Q}{\partial {x}_n}\hfill \end{array}\right]}^{\mathrm{T}} $$

The above expression is the analytical definition of the gradient; however, the gradient can also be defined numerically at a particular point of the problem space, X* = [x_1*, x_2*, …, x_n*]^T. Let us refer to the numerically defined gradient as ∇Q(X*), where * is the index of the particular point where the gradient is evaluated. It is known that a numerically defined gradient is a good navigational tool: it is a vector always pointing in the direction of increase of function Q in the space X.

Let us utilize this property of the gradient for the minimization of function Q(X). First, select some initial point X^1 = [x_1^1, x_2^1, …, x_n^1]^T and numerically evaluate the derivatives of function Q(X) in the vicinity of this point:

$$ \begin{array}{l}\frac{\partial Q\left({X}^1\right)}{\partial {x}_1}\approx \frac{Q\left({x_1}^1+\varDelta,\ {x_2}^1,\dots, {x_n}^1\right)-Q\left({x_1}^1,\ {x_2}^1,\dots, {x_n}^1\right)}{\varDelta}\\ {}\frac{\partial Q\left({X}^1\right)}{\partial {x}_2}\approx \frac{Q\left({x_1}^1,\ {x_2}^1+\varDelta, \dots, {x_n}^1\right)-Q\left({x_1}^1,\ {x_2}^1,\dots, {x_n}^1\right)}{\varDelta}\\ {}\kern4em .................................................\\ {}\frac{\partial Q\left({X}^1\right)}{\partial {x}_i}\approx \frac{Q\left({x_1}^1,\ {x_2}^1,\dots,\ {x_i}^1+\varDelta, \dots, {x_n}^1\right)-Q\left({x_1}^1,\ {x_2}^1,\dots, {x_n}^1\right)}{\varDelta}\\ {}\kern4.25em .................................................\\ {}\frac{\partial Q\left({X}^1\right)}{\partial {x}_n}\approx \frac{Q\left({x_1}^1,\ {x_2}^1,\dots,\ {x_n}^1+\varDelta \right)-Q\left({x_1}^1,\ {x_2}^1,\dots, {x_n}^1\right)}{\varDelta}\end{array} $$

where Δ is a small positive increment chosen on the basis of experience and intuition (V.S.: Δ = 0.0001 is a good choice). Note that this approximation of the derivatives, known as a forward difference, is not unique, but it is good enough for most applications. Now, since the direction of increase of function Q(X) is known, and the direction towards the minimum is the opposite one, we can make a step from the initial point X^1 to a new point X^2 that is expected to be closer to the minimum point: X^2 = X^1 − a·∇Q(X^1). The individual components of point X^2 are defined as follows:

$$ \begin{array}{l}{x_1}^2={x_1}^1-a\cdot \frac{\partial Q\left({X}^1\right)}{\partial {x}_1}\\ {}{x_2}^2={x_2}^1-a\cdot \frac{\partial Q\left({X}^1\right)}{\partial {x}_2}\\ {}..................................\\ {}{x_n}^2={x_n}^1-a\cdot \frac{\partial Q\left({X}^1\right)}{\partial {x}_n}\end{array} $$

Now the procedure repeats itself, but the derivatives are calculated in the vicinity of the new point X^2 and a transition to point X^3 is performed. This iterative process leads to the vicinity of the minimum point of function Q(X), provided that certain conditions are met. Parameter a in the expressions above is a positive adjustable constant responsible for the convergence rate of the minimization procedure. Its initial value is arbitrarily defined and could be changed (typically decreased) in the process according to the following rule. Assume that the transition from point X^k to X^{k+1} is taking place: X^{k+1} = X^k − a·∇Q(X^k). The transition is successful if Q(X^{k+1}) < Q(X^k); however, in the situation when Q(X^{k+1}) ≥ Q(X^k), the value of parameter a must be reduced, for example by half, and the transition repeated with the value a_NEW = 0.5a, i.e. X^{k+1} = X^k − a_NEW·∇Q(X^k). If necessary, the value of a is repeatedly reduced until a successful transition takes place. The reduced value of a is kept unchanged for the subsequent steps.

Termination conditions for the described procedure could be defined in a number of ways. First, and the simplest, is the definition of the maximum number of iterations (successful reduction steps of the function to be minimized). It is also common to stop the procedure if several (5, 10, 20) iterations did not result in a noticeable change in the optimization variables, i.e. │X k−6X k−5│ ≤ ξ and │X k−5X k−4│ ≤ ξ and … and │X k+1X k│ ≤ ξ where ξ > 0 is a small arbitrary number. A block diagram of the procedure is seen in Fig. 4.8.

Fig. 4.8
figure 8

Block diagram of gradient minimization procedure

The gradient minimization procedure is quite common due to its simplicity: it does not require analytical expressions for the derivatives, and the values of function Q may be defined by analytical expressions or experimentally. The drawbacks of this approach are also evident. The function must be continuous, otherwise working with derivatives presents an impossible task; this reality creates difficulties with constrained minimization. The approach also implies that the function to be minimized has only one minimum point: it works only as a local minimization technique.
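A minimal sketch of the described procedure, assuming a simple two-variable quadratic as the function to be minimized (the test function and all tuning constants are chosen purely for illustration):

```python
import numpy as np

def num_grad(Q, x, delta=1e-4):
    """Forward-difference approximation of the gradient of Q at point x."""
    g = np.zeros_like(x)
    q0 = Q(x)
    for i in range(len(x)):
        xp = x.copy()
        xp[i] += delta
        g[i] = (Q(xp) - q0) / delta
    return g

def gradient_min(Q, x0, a=0.1, max_iter=1000, tol=1e-8):
    """Gradient descent with halving of a whenever a step fails to decrease Q."""
    x = np.asarray(x0, dtype=float)
    for _ in range(max_iter):
        g = num_grad(Q, x)
        x_new = x - a * g
        while Q(x_new) >= Q(x) and a > 1e-12:  # unsuccessful transition: reduce a
            a *= 0.5
            x_new = x - a * g
        if np.linalg.norm(x_new - x) <= tol:   # no noticeable change: stop
            return x_new
        x = x_new
    return x

Q = lambda x: (x[0] - 3.0) ** 2 + 2.0 * (x[1] + 1.0) ** 2
x_opt = gradient_min(Q, [0.0, 0.0])
print(x_opt)  # approaches the minimum point (3, -1)
```

Note that the result lands near, not exactly at, (3, −1): the forward difference introduces a small bias of order Δ, which is one reason the choice of Δ matters.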

4.3 Nonlinear Programming: Search

Search-based optimization presents a valuable alternative to gradient optimization: it does not utilize derivatives of the function to be optimized, thus extending its range of applications to discontinuous functions. But how common are discontinuous functions? It is common to introduce constraints into the optimization procedure through so-called penalty functions, and penalty functions are typical sources of discontinuities. Therefore, search becomes very useful in many practical problems.

4.3.1 Penalty Functions

Consider the following optimization problem, where the criterion and the constraints are represented by, generally speaking, nonlinear functions Q(·) and f_i(·), i = 1,2,…:

$$ \mathrm{Minimize}\;\boldsymbol{Q}\left({\boldsymbol{x}}_1,\;{\boldsymbol{x}}_2,\dots {\boldsymbol{x}}_{\mathrm{n}}\right) $$

subject to conditions

$$ \begin{array}{l}{\boldsymbol{f}}_1\left({\boldsymbol{x}}_1,\;{\boldsymbol{x}}_2,\dots {\boldsymbol{x}}_{\mathrm{n}}\right)\ge {\boldsymbol{a}}_1\hfill \\ {}{\boldsymbol{f}}_2\left({\boldsymbol{x}}_1,\;{\boldsymbol{x}}_2,\dots {\boldsymbol{x}}_{\mathrm{n}}\right)\ge {\boldsymbol{a}}_2\hfill \\ {}.................................\hfill \\ {}{\boldsymbol{f}}_{\mathrm{K}}\left({\boldsymbol{x}}_1,\;{\boldsymbol{x}}_2,\dots {\boldsymbol{x}}_{\mathrm{n}}\right)\ge {\boldsymbol{a}}_{\mathrm{K}}\hfill \end{array} $$

Introduce penalty functions defined as

$$ {P}_{\mathrm{i}}\left({x}_1,{x}_2,\dots, {x}_{\mathrm{n}}\right)=\left\{\begin{array}{c}\hfill {C}_{\mathrm{i}}\cdot {\left[{f}_{\mathrm{i}}\left({x}_1,{x}_2,\dots, {x}_{\mathrm{n}}\right)-{a}_{\mathrm{i}}\right]}^2,\kern0.5em \mathrm{if}\ {f}_{\mathrm{i}}\left({x}_1,{x}_2,\dots, {x}_{\mathrm{n}}\right)<{a}_{\mathrm{i}}\ \hfill \\ {}\hfill 0,\kern2em \mathrm{if}\ {f}_{\mathrm{i}}\left({x}_1,{x}_2,\dots, {x}_{\mathrm{n}}\right)\ge {a}_{\mathrm{i}}\ \hfill \end{array}\right. $$

or \( {P}_{\mathrm{i}}\left({x}_1,{x}_2,\dots, {x}_{\mathrm{n}}\right)=\left\{\begin{array}{c}\hfill {C}_{\mathrm{i}}\cdot \left|{f}_{\mathrm{i}}\left({x}_1,{x}_2,\dots, {x}_{\mathrm{n}}\right)-{a}_{\mathrm{i}}\right|,\kern0.5em \mathrm{if}\ {f}_{\mathrm{i}}\left({x}_1,{x}_2,\dots, {x}_{\mathrm{n}}\right)<{a}_{\mathrm{i}}\ \hfill \\ {}\hfill 0,\kern2em \mathrm{if}\ {f}_{\mathrm{i}}\left({x}_1,{x}_2,\dots, {x}_{\mathrm{n}}\right)\ge {a}_{\mathrm{i}}\ \hfill \end{array}\right. \)

or \( {P}_{\mathrm{i}}\left({x}_1,{x}_2,\dots, {x}_{\mathrm{n}}\right)=\left\{\begin{array}{c}\hfill {C}_{\mathrm{i}},\kern0.5em \mathrm{if}\ {f}_{\mathrm{i}}\left({x}_1,{x}_2,\dots, {x}_{\mathrm{n}}\right)<{a}_{\mathrm{i}}\ \hfill \\ {}\hfill 0,\kern0.5em \mathrm{if}\ {f}_{\mathrm{i}}\left({x}_1,{x}_2,\dots, {x}_{\mathrm{n}}\right)\ge {a}_{\mathrm{i}}\ \hfill \end{array}\right. \)

where C i ≫ 1 are arbitrary weights reflecting the importance of particular constraints, i = 1,2,…K. Then the original constrained optimization problem can be represented by the following unconstrained optimization problem

Minimize \( L\left({x}_1,{x}_2,\dots, {x}_{\mathrm{n}}\right)=Q\left({x}_1,{x}_2,\dots, {x}_{\mathrm{n}}\right)+{\displaystyle \sum_{\mathrm{i}=1}^{\mathrm{K}}{P}_{\mathrm{i}}}\left({x}_1,{x}_2,\dots, {x}_{\mathrm{n}}\right) \)

Function L(.) is commonly referred to as the “loss function”. Due to the definition of the penalty functions P i(.), it is in general a non-smooth and possibly discontinuous function. It could also be seen that, due to the large values of the weights C i, virtually any minimization algorithm would first “drive” the penalty values to zero, and then, when the constraints are satisfied, minimize the original function Q(.).

Consider the following example illustrating the introduction of penalty functions.

Example 4.6

Consider the constrained optimization problem

Minimize \( Q\left({x}_1,{x}_2,{x}_3\right)=5{\left({x}_1+6\right)}^2+2{\left({x}_1\cdot {x}_2-6{x}_3\right)}^2-10{x}_2{\left({x}_3-2\right)}^3 \)

subject to conditions:

$$ \begin{array}{l}{x}_1+{x}_2+6{x}_3=10\\ {}0\le {x}_1\le 25\\ {}-10\le {x}_2+{x}_3\le 10\\ {}{x}_1-4{x}_3\le 100\end{array} $$

Define penalty functions representing the imposed constraints:

$$ \begin{array}{l}{P}_1={10}^{15}\cdot {\left[{x}_1+{x}_2+6{x}_3-10\right]}^2\\ {}{P}_2=\left\{\begin{array}{c}\hfill {10}^{10}\cdot {x_1}^2,\kern0.5em \mathrm{if}\kern0.5em {x}_1<0\ \hfill \\ {}\hfill 0,\kern0.5em \mathrm{if}\ {x}_1\ge\ 0\ \hfill \end{array}\right.\\ {}{P}_3=\left\{\begin{array}{c}\hfill {10}^{10}\cdot {\left({x}_1-25\right)}^2,\kern0.5em \mathrm{if}\kern0.5em {x}_1>25\ \hfill \\ {}\hfill 0,\kern0.5em \mathrm{if}\ {x}_1\le\ 25\ \hfill \end{array}\right.\\ {}{P}_4=\left\{\begin{array}{c}\hfill {10}^{10}\cdot {\left({x}_2+{x}_3-10\right)}^2,\kern0.5em \mathrm{if}\kern0.5em {x}_2+{x}_3>10\ \hfill \\ {}\hfill {10}^{10}\cdot {\left({x}_2+{x}_3+10\right)}^2,\kern0.5em \mathrm{if}\kern0.5em {x}_2+{x}_3<-10\ \hfill \\ {}\hfill 0,\kern0.5em \mathrm{otherwise}\ \hfill \end{array}\right.\\ {}{P}_5=\left\{\begin{array}{c}\hfill {10}^{10}\cdot {\left({x}_1-4{x}_3-100\right)}^2,\kern0.5em \mathrm{if}\kern0.5em {x}_1-4{x}_3>100\ \hfill \\ {}\hfill 0,\kern0.5em \mathrm{otherwise}\ \hfill \end{array}\right.\end{array} $$

The resultant loss function

$$ L\left({x}_1,{x}_2,{x}_3\right)=5{\left({x}_1+6\right)}^2+2{\left({x}_1\cdot {x}_2-6{x}_3\right)}^2-10{x}_2{\left({x}_3-2\right)}^3+{\displaystyle \sum_{\mathrm{i}=1}^5{P}_{\mathrm{i}}}\left({x}_1,{x}_2,{x}_3\right) $$

could be easily defined by a computer code. Understandably, it should be minimized by a procedure that does not utilize derivatives

$$ \frac{\partial L\left({x}_1,{x}_2,{x}_3\right)}{\partial {x}_1},\ \frac{\partial L\left({x}_1,{x}_2,{x}_3\right)}{\partial {x}_2},\ \frac{\partial L\left({x}_1,{x}_2,{x}_3\right)}{\partial {x}_3} $$

It should also be noted that due to the nonlinear criterion and constraints, this problem most likely has more than one minimum, and finding the global minimum presents an additional challenge. As is commonly done when some parameters are chosen arbitrarily (in this case, the weight coefficients), the user shall inspect the obtained solution and, if necessary, change the weight values. It is good practice to demonstrate that the solution does not depend on the choice of the weights.
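The loss function of Example 4.6 is easily expressed in code; in the sketch below each penalty is active when its constraint is violated, with the weight values of the example:

```python
def loss(x1, x2, x3):
    """Loss function for Example 4.6: criterion plus penalty terms."""
    Q = 5*(x1 + 6)**2 + 2*(x1*x2 - 6*x3)**2 - 10*x2*(x3 - 2)**3
    P1 = 1e15 * (x1 + x2 + 6*x3 - 10)**2            # equality constraint
    P2 = 1e10 * x1**2 if x1 < 0 else 0.0            # x1 >= 0
    P3 = 1e10 * (x1 - 25)**2 if x1 > 25 else 0.0    # x1 <= 25
    if x2 + x3 > 10:                                # -10 <= x2 + x3 <= 10
        P4 = 1e10 * (x2 + x3 - 10)**2
    elif x2 + x3 < -10:
        P4 = 1e10 * (x2 + x3 + 10)**2
    else:
        P4 = 0.0
    P5 = 1e10 * (x1 - 4*x3 - 100)**2 if x1 - 4*x3 > 100 else 0.0
    return Q + P1 + P2 + P3 + P4 + P5
```

At a feasible point all penalties vanish and the loss equals Q; at an infeasible point the loss is dominated by the huge penalty terms.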

4.3.2 Random Search

This approach could be perceived as the most straightforward “trial-and-error” technique utilizing the full power of a modern computer, perhaps even a supercomputer. It facilitates finding the global solution of linear and nonlinear, constrained and unconstrained, continuous and discontinuous optimization problems. Its only drawback is the gigantic amount of computation, which is prohibitive in many practical situations. The strategy of random search is illustrated by Fig. 4.9.

Fig. 4.9
figure 9

Random search
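The strategy can be sketched in a few lines (a minimal illustration; the bounds and the trial budget are arbitrary choices):

```python
import numpy as np

def random_search(q, bounds, n_trials=20000, rng=None):
    """Pure random search: sample points uniformly inside the given
    bounds and keep the one with the smallest value of q."""
    rng = np.random.default_rng(rng)
    lo = np.array([b[0] for b in bounds], float)
    hi = np.array([b[1] for b in bounds], float)
    best_x, best_q = None, np.inf
    for _ in range(n_trials):
        x = lo + (hi - lo) * rng.random(len(bounds))
        qx = q(x)
        if qx < best_q:
            best_x, best_q = x, qx
    return best_x, best_q
```

Since the function is only ever evaluated, never differentiated, discontinuities and penalty terms cause no difficulty; the price is the number of trials needed for a good answer.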

4.3.3 Simplex Method of Nelder and Mead

Direct search is a much more efficient alternative to random search. One can define direct search as a thoughtful and insightful trial-and-error approach. It still has to start from some initial conditions, but its steps are based on a reasonable expectation of success. It works well with continuous and discontinuous, linear and nonlinear, constrained and unconstrained functions. Its only drawback compared to random search is the inherent inability to assure that the global minimum will be found. This fault is not that crucial: direct search is typically used in realistic situations where a properly chosen initial point ensures that the global minimum can be found. Since direct search does not call for a gigantic number of steps, it could be used in situations when values of the objective function are defined by computer simulations and even by physical experiments.

Although there is a good number of direct search procedures utilizing different rationales for making the “next step,” one of the most practical is the Simplex Method of Nelder and Mead (1965). The algorithm works with n + 1 vertices of a simplex (convex polytope) defined in the n-dimensional search space. It calculates (obtains) numerical values of the function to be minimized at every vertex, compares these values, and implements some rules for replacing the worst vertex (i.e. the one with the largest value of the objective function). This process is best illustrated in two-dimensional space, where the simplex, with its three vertices, is just a triangle.

First assume that the initial simplex with vertices A, B, C is established. This is often done by specifying some initial point, say point A, and the step size that determines the size of the resultant initial simplex, i.e. triangle ABC. Next is the evaluation of the objective function Q(x 1, x 2) at each vertex (x 1, x 2 are the coordinates of points A, B, C), thus resulting in numerical values Q(A), Q(B) and Q(C). Assume that the comparison reveals that Q(A) > Q(B) > Q(C); since our task is minimization, the “worst” point is A. Then, as seen in Fig. 4.10, the algorithm performs a special operation, reflection, thus establishing a new point D. What happens next depends on the value Q(D). If Q(A) > Q(D), the algorithm performs expansion as shown above, creating a new point E. The expansion could be repeated providing that still Q(A) > Q(E). In the situation when Q(D) > Q(A), contraction is performed instead; it is repeated until the condition Q(A) > Q(E) is achieved. Upon the establishment of the “new” point E, the “old” point A is discarded. Now the new simplex with vertices B, C, and E is ready for the same computational cycle.

Fig. 4.10
figure 10

How the simplex procedure works

The termination conditions can be defined in terms of the total number of steps (optimization cycles), or in terms of the distance between vertices of the simplex.
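A compact implementation conveys the cycle described above. The sketch below is a simplified textbook variant with standard reflection, expansion, contraction and shrink coefficients (not the exact replacement rules of the original paper), and the termination test is on the simplex size:

```python
import numpy as np

def nelder_mead(q, x0, step=1.0, tol=1e-8, max_iter=500):
    """Minimal Nelder-Mead simplex minimization: reflect the worst
    vertex through the centroid of the others, expand on success,
    contract or shrink on failure."""
    n = len(x0)
    # initial simplex: x0 plus one extra vertex per coordinate direction
    simplex = [np.array(x0, float)]
    for i in range(n):
        v = np.array(x0, float)
        v[i] += step
        simplex.append(v)
    values = [q(v) for v in simplex]
    for _ in range(max_iter):
        order = np.argsort(values)
        simplex = [simplex[i] for i in order]
        values = [values[i] for i in order]
        best, worst = simplex[0], simplex[-1]
        # termination: all vertices have collapsed onto the best one
        if max(np.linalg.norm(v - best) for v in simplex) < tol:
            break
        centroid = np.mean(simplex[:-1], axis=0)
        reflected = centroid + (centroid - worst)
        qr = q(reflected)
        if qr < values[0]:                       # try to expand further
            expanded = centroid + 2 * (centroid - worst)
            qe = q(expanded)
            simplex[-1], values[-1] = (expanded, qe) if qe < qr else (reflected, qr)
        elif qr < values[-2]:                    # accept the reflection
            simplex[-1], values[-1] = reflected, qr
        else:                                    # contract toward the centroid
            contracted = centroid + 0.5 * (worst - centroid)
            qc = q(contracted)
            if qc < values[-1]:
                simplex[-1], values[-1] = contracted, qc
            else:                                # shrink toward the best vertex
                simplex = [best] + [best + 0.5 * (v - best) for v in simplex[1:]]
                values = [values[0]] + [q(v) for v in simplex[1:]]
    return simplex[0]
```

Only function values are used, so the same code minimizes a smooth criterion or a penalty-laden loss function alike.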

It is good to realize that besides “purely computational” applications, the Simplex procedure can be implemented in the “(wo)man in the loop” regime for the real-time optimization of technical systems that could be represented by a simulator. Figure 4.11 below illustrates an application of the Simplex optimization to the tuning of a PID (proportional-integral-derivative) controller. The Vissim-based simulator (see http://www.vissim.com/) features a controlled process with a PID controller with manually adjustable parameters KP, KI, and KD known as proportional, integral and derivative gains.

Fig. 4.11
figure 11

Simplex application to tuning PID Controller

The optimization criterion is the commonly used ITSE (integral-time-squared-error) defined as

$$ Q\left({\mathrm{K}}_{\mathrm{P}},{\mathrm{K}}_{\mathrm{I}},{\mathrm{K}}_{\mathrm{D}}\right)={\displaystyle \underset{0}{\overset{T}{\int }}t\cdot {e}^2\cdot dt} $$

where e is the system error (the discrepancy between the actual and desired system output values), t is continuous time, and T is the simulation period. It is known from control theory that minimization of ITSE-type criteria leads to a desirable transient response of the system.
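For a sampled error trace, the ITSE criterion can be approximated numerically, e.g. by the trapezoidal rule; a minimal sketch (the sampling grid is whatever the simulator provides):

```python
import numpy as np

def itse(t, e):
    """Approximate the ITSE criterion, the integral of t*e(t)^2 over
    [0, T], by the trapezoidal rule from sampled arrays t and e."""
    t = np.asarray(t, float)
    f = t * np.asarray(e, float) ** 2
    return float(np.sum(0.5 * (f[1:] + f[:-1]) * np.diff(t)))
```

For e(t) = 1 on [0, 1] the exact value of the integral is 1/2, which the approximation reproduces on a uniform grid.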

4.3.4 Exercise 4.1

Problem 1

Solving a mixing problem. The table below contains characteristics of several materials that are to be mixed to obtain a raw material for a metallurgical process. Obtain the mixture recipe that would have the following required chemical composition and total weight at minimum cost. The mixture characteristics are as follows:

Fe  20 %, Zn  10 %, SiO2 42 %, Cu  5 %, total weight 500 tons

 

             Fe %   Zn %   SiO2 %   Cu %   Cost, $/ton   Availability
Material 1    15     38      41       6        120          250 tons
Material 2    40     12      40       1        150          590 tons
Material 3    35      5      27      28        211         1000 tons
Material 4    16     11      21      18        140          520 tons
Material 5    33      1      60       5         75         2500 tons
Material 6     7     23      45      25        214          800 tons
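One way to translate the table into data for a linear program is sketched below. This is a formulation sketch only, under the assumption that the required composition and total weight are equality constraints; x_i denotes the tonnage taken from material i, and the cost vector is what an LP solver would minimize:

```python
import numpy as np

# Rows: materials 1..6; columns: Fe, Zn, SiO2, Cu percentages
comp = np.array([
    [15, 38, 41,  6],
    [40, 12, 40,  1],
    [35,  5, 27, 28],
    [16, 11, 21, 18],
    [33,  1, 60,  5],
    [ 7, 23, 45, 25],
], float)
cost = np.array([120, 150, 211, 140, 75, 214], float)      # $/ton
avail = np.array([250, 590, 1000, 520, 2500, 800], float)  # tons
target = np.array([20, 10, 42, 5], float)                  # required %
total = 500.0                                              # tons

# Equality constraints A_eq @ x = b_eq: blend composition and total weight
A_eq = np.vstack([comp.T, np.ones(6)])
b_eq = np.append(target / 100.0 * total, total)
bounds = [(0.0, a) for a in avail]   # 0 <= x_i <= availability
# minimize cost @ x subject to these constraints with any LP solver
```

Each composition row requires that the tons of one component contributed by all materials equal the component's share of the 500-ton blend.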

Problem 2

Solving an LSM parameter estimation problem using a gradient procedure. Generate input and the output variables as follows (k = 1, 2,…, 500):

$$ {\mathrm{x}}_1\left(\mathrm{k}\right) = 5 + 3\cdot \mathrm{Sin}\left(17\cdot \mathrm{k}\right) + \mathrm{Sin}\left(177\cdot \mathrm{k}\right) + .3\cdot \mathrm{Sin}\left(1771\cdot \mathrm{k}\right) $$
$$ {\mathrm{x}}_2\left(\mathrm{k}\right) = 1 - 2\cdot \mathrm{Sin}\left(91\cdot \mathrm{k}\right) + \mathrm{Sin}\left(191\cdot \mathrm{k}\right) + .2\cdot \mathrm{Sin}\left(999\cdot \mathrm{k}\right) $$
$$ {\mathrm{x}}_3\left(\mathrm{k}\right) = 3 + \mathrm{Sin}\left(27\cdot \mathrm{k}\right) + .5\cdot \mathrm{Sin}\left(477\cdot \mathrm{k}\right) + .1\cdot \mathrm{Sin}\left(6771\cdot \mathrm{k}\right) $$
$$ {\mathrm{x}}_4\left(\mathrm{k}\right) = -.1\cdot {\mathrm{x}}_1\left(\mathrm{k}\right) + .3\cdot {\mathrm{x}}_2\left(\mathrm{k}\right) + 2.5\cdot \mathrm{Sin}\left(9871\cdot \mathrm{k}\right) + .7\cdot \mathrm{C}\mathrm{o}\mathrm{s}\left(6711\cdot \mathrm{k}\right) $$
$$\mathrm{y}\left(\mathrm{k}\right) = 2\cdot {\mathrm{x}}_1\left(\mathrm{k}\right) + 3\cdot {\mathrm{x}}_2\left(\mathrm{k}\right) - 2\cdot {\mathrm{x}}_3\left(\mathrm{k}\right) + 5\cdot {\mathrm{x}}_4\left(\mathrm{k}\right) + .3\cdot \mathrm{Sin}\left(1577\cdot \mathrm{k}\right) + .2\cdot \mathrm{C}\mathrm{o}\mathrm{s}\left(7671\cdot \mathrm{k}\right) $$

Obtain “unknown” coefficients of the regression equation

$$ {y}^{MOD}(k)={a}_1{x}_1(k)+{a}_2{x}_2(k)+{a}_3{x}_3(k)+{a}_4{x}_4(k) $$

using the least squares method implemented via the gradient procedure listed below (that could be rewritten in MATLAB). Assume zero initial values of the coefficients. Compute the coefficient of determination of the obtained regression equation.
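The data generation is easily scripted; the closed-form least-squares fit included below is only a sanity check for the gradient procedure the problem asks for, not a substitute for it:

```python
import numpy as np

# Generate the input and output variables for k = 1, 2, ..., 500
k = np.arange(1, 501)
x1 = 5 + 3*np.sin(17*k) + np.sin(177*k) + .3*np.sin(1771*k)
x2 = 1 - 2*np.sin(91*k) + np.sin(191*k) + .2*np.sin(999*k)
x3 = 3 + np.sin(27*k) + .5*np.sin(477*k) + .1*np.sin(6771*k)
x4 = -.1*x1 + .3*x2 + 2.5*np.sin(9871*k) + .7*np.cos(6711*k)
y = 2*x1 + 3*x2 - 2*x3 + 5*x4 + .3*np.sin(1577*k) + .2*np.cos(7671*k)

# Direct least-squares estimate of the regression coefficients
X = np.column_stack([x1, x2, x3, x4])
a, *_ = np.linalg.lstsq(X, y, rcond=None)

# Coefficient of determination of the fitted model
resid = y - X @ a
r2 = 1 - (resid @ resid) / ((y - y.mean()) @ (y - y.mean()))
```

Because the unmodeled terms in y(k) are small, the estimated coefficients land close to (2, 3, −2, 5) and the coefficient of determination is close to 1.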

Problem 3

Utilize the data of Problem #2 to obtain the coefficients of the regression equation \( {y}^{MOD}(k)={a}_1{x}_1(k)+{a}_2{x}_2(k)+{a}_3{x}_3(k)+{a}_4{x}_4(k) \) applying the gradient procedure. It is required, however, that all regression coefficients be positive. Show the obtained coefficients. Compute the coefficient of determination for the resultant regression equation. Explain the change in the coefficient of determination compared with Problem #2.

      PROGRAM GRADIENT
      DIMENSION X(10),X1(10),DER(10)
      WRITE(*,*)' ENTER NUMBER OF VARIABLES '
      READ(*,*) N
      WRITE(*,*)' ENTER THE GAIN OF THE PROCEDURE '
      READ(*,*)A
      WRITE(*,*)' ENTER INITIAL NUMBER OF STEPS '
      READ(*,*) NSTEP
      H = .001
      DO 1 I = 1,N
      WRITE(*,*)' ENTER INITIAL VALUE FOR X(',I,')'
    1 READ(*,*)X(I)
   10 CONTINUE
      K = 1
      CALL SYS(N,X,Q)
      QI = Q
  100 CONTINUE
      DO 4 I = 1,N
      X(I) = X(I) + H
      CALL SYS(N,X,Q1)
      DER(I) = (Q1-Q)/H
      X(I) = X(I)-H
    4 CONTINUE
   50 CONTINUE
      DO 5 I = 1,N
    5 X1(I) = X(I)-DER(I)*A
      CALL SYS(N,X1,Q1)
      IF(Q1.GE.Q) A = A/2
      IF(Q1.GE.Q) GOTO 50
      DO 30 I = 1,N
   30 X(I) = X1(I)
      Q = Q1
      IF(ABS(Q).LE.1E-5) GOTO 2
      K = K + 1
      IF(K.GT.NSTEP) GOTO 2
      GOTO 100
    2 CONTINUE
      WRITE(*,*)' ITERATIONS RUN: ',NSTEP
      WRITE(*,*)' INITIAL CRITERION VALUE: ',QI
      WRITE(*,*)' CRITERION VALUE REACHED: ',Q
      DO 7 I = 1,N
    7 WRITE(*,*)' OPTIMAL VALUE: X(',I,') = ',X(I)
      WRITE(*,*)' ENTER ADDITIONAL NUMBER OF STEPS '
      IF(ABS(Q).LE.1E-5) CALL EXIT
      READ(*,*) NSTEP
      IF(NSTEP.EQ.0) CALL EXIT
      GOTO 10
      END
C
      SUBROUTINE SYS(N,X,Q)
      DIMENSION X(10)
      Q = 0.
      DO 1 I = 1,N
      Q = Q+(X(I)-5.*I)**2
    1 CONTINUE
      Q = Q**2
      RETURN
      END

4.4 Genetic Optimization

Genetic optimization algorithms possess the advantages of both random and direct search optimization procedures. Combined with the availability of high-performance computers, they alleviate major obstacles in the way of solving multivariable, nonlinear, constrained optimization problems. It is believed that these algorithms emulate some concepts of the natural selection process responsible for the apparent perfection of the natural world. One can argue about the concepts, but the terminology of genetic optimization is surely adopted from the biological sciences.

Assume that we are in the process of finding the optimum, say the maximum, of a complex, multivariate, discontinuous, nonlinear cost function Q(X). The constraints of the problem have already been addressed by the penalty functions introduced in the cost function and contributing to its complexity.

Introduce the concepts of an individual, generation, and successful generation. An individual is an entity that is characterized by its location in the solution space, X I and the corresponding value of the function Q, i.e. Q(X I). A generation is a very large number of individuals created during the same cycle of the optimization procedure. A successful generation is a relatively small group of K individuals that have some common superior trait, for example, they all have the highest associated values Q(.) within their generation. The genetic algorithm consists of repeated cycles of creation of successful generations.

Creation of the Initial Generation

First, the feasibility range [x k MIN , x k MAX] for each solution variable x k, k = 1,2,3,…,n, is to be established. Each interval [x k MIN , x k MAX] is divided into the same number of subintervals, say L, thus resulting in a grid within the solution space with numerous nodes. The next task is the evaluation of the function Q at each node of the grid, i.e. the creation of individuals “residing” at every node. During this process the successful generation is selected, consisting of the K individuals that have the highest values of the function Q. It is done by forming a group of individuals ordered according to their Q values, i.e.

$$ \boldsymbol{Q}\left({\boldsymbol{X}}^{\mathrm{K}}\right)\ \le \boldsymbol{Q}\left({\boldsymbol{X}}^{\mathrm{K}-1}\right)\ \le \dots \boldsymbol{Q}\left({\boldsymbol{X}}^2\right)\ \le \boldsymbol{Q}\left({\boldsymbol{X}}^1\right)\kern0.75em \left(*\right) $$

Any newly generated individual X I is discarded if Q(X I) ≤ Q(X K). However, if Q(X I) > Q(X K), it is included in the group, replacing the individual X K with the lowest Q value. The successful generation thus still includes K individuals, which are renumbered and reordered to assure (*). This process is repeated each time a new individual is generated, i.e. until the entire initial generation is created and analyzed.

Creation of the Next Successful Generation

involves only members of the existing successful generation. Two techniques are utilized for this purpose: parenting and mutation. Parenting (crossover) involves two individuals, X A and X B, and results in an “offspring”

$$ {\boldsymbol{X}}^{\mathrm{C}}={\left[{{\boldsymbol{x}}_1}^{\mathrm{C}},\;{{\boldsymbol{x}}_2}^{\mathrm{C}}, \dots {{\boldsymbol{x}}_{\mathrm{k}}}^{\mathrm{C}}, \dots {{\boldsymbol{x}}_{\mathrm{n}}}^{\mathrm{C}}\right]}^{\mathrm{T}} $$

defined as follows:

$$ \begin{array}{l}{{\boldsymbol{x}}_1}^{\mathrm{C}}={\lambda}_1{{\boldsymbol{x}}_1}^{\mathrm{A}}+\left(1-{\lambda}_1\right){{\boldsymbol{x}}_1}^{\mathrm{B}}\kern1em \\ {}{{\boldsymbol{x}}_2}^{\mathrm{C}}={\lambda}_2{{\boldsymbol{x}}_2}^{\mathrm{A}}+\left(1-{\lambda}_2\right){{\boldsymbol{x}}_2}^{\mathrm{B}}\kern1em \\ {}..............\kern1em \\ {}{{\boldsymbol{x}}_k}^{\mathrm{C}}={\lambda}_k{{\boldsymbol{x}}_k}^{\mathrm{A}}+\left(1-{\lambda}_k\right){{\boldsymbol{x}}_k}^{\mathrm{B}}\kern1em \\ {}..............\kern1em \\ {}{{\boldsymbol{x}}_n}^{\mathrm{C}}={\lambda}_n{{\boldsymbol{x}}_n}^{\mathrm{A}}+\left(1-{\lambda}_n\right){{\boldsymbol{x}}_n}^{\mathrm{B}}\kern1em \\ {}\end{array} $$

where 0 < λk < 1 are random numbers generated by a random number generator. Then, based on the computation of Q(X C), the newly created individual X C is accepted into the successful generation or discarded. The parenting process is repeated several times for every combination of two members of the original successful generation.

The mutation process implies that every member of the original successful generation, X I originates a “mutant” X M = [x 1 M, x 2 M, …x k M,…x n M]T defined as follows:

$$ \begin{array}{l}{{\boldsymbol{x}}_1}^{\mathrm{M}}={\alpha}_1{{\boldsymbol{x}}_1}^{\mathrm{I}}\hfill \\ {}{{\boldsymbol{x}}_2}^{\mathrm{M}}={\alpha}_2{{\boldsymbol{x}}_2}^{\mathrm{I}}\hfill \\ {}.............\hfill \\ {}{{\boldsymbol{x}}_{\mathrm{k}}}^{\mathrm{M}}={\alpha}_{\mathrm{k}}{{\boldsymbol{x}}_{\mathrm{k}}}^{\mathrm{I}}\hfill \\ {}..............\hfill \\ {}{{\boldsymbol{x}}_{\mathrm{n}}}^{\mathrm{M}}={\alpha}_{\mathrm{n}}{{\boldsymbol{x}}_{\mathrm{n}}}^{\mathrm{I}}\hfill \end{array} $$

where αk are normally distributed random numbers generated by a random number generator. Based on the computation of Q(X M), the newly created individual X M is accepted into the successful generation or discarded. The mutation process is repeated several times for every member of the original successful generation.
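The two operators can be sketched directly from their definitions. In the sketch below, centering the mutation factor at 1 and its spread sigma are arbitrary tuning choices (the text only states that the factors are normally distributed), and the parent vectors are hypothetical:

```python
import numpy as np

def crossover(xa, xb, rng):
    """Offspring of two parents: a componentwise convex combination
    with independent random weights 0 < lambda_k < 1."""
    lam = rng.random(len(xa))
    return lam * xa + (1 - lam) * xb

def mutate(xi, rng, sigma=0.1):
    """Mutant of one individual: each component scaled by a normally
    distributed random factor centered at 1."""
    return rng.normal(1.0, sigma, len(xi)) * xi

rng = np.random.default_rng(1)
parent_a = np.array([12.0, 1.5, 13.0])
parent_b = np.array([11.0, 1.4, 12.5])
child = crossover(parent_a, parent_b, rng)
mutant = mutate(child, rng)
```

Note that the offspring always lies componentwise between its two parents, whereas a mutant can leave that box, which is what lets mutation explore new territory.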

Understandably, parenting and mutation, upon completion, result in a new successful generation that is subjected to a new cycle of the procedure unless the termination conditions are satisfied. The most common termination condition refers to the variability within a successful generation, and could be expressed as:

$$ {\displaystyle \sum_{\mathrm{i}=1}^{K-1}\left|{X}^{\mathrm{i}}-{X}^{\mathrm{i}+1}\right|}\le \delta $$

where δ > 0 is some judiciously chosen small positive number.

It is good to remember that genetic optimization is capable of finding the global optimum of virtually any function Q(X). Moreover, it works even when this function does not exist as an analytical expression: in this situation, for any particular X I the value of Q(X I) could be determined by running a computer simulation or by an experiment. Figure 4.12 provides a block diagram of the genetic optimization procedure.

Fig. 4.12
figure 12

Block diagram of a genetic optimization procedure

The following MATLAB code implementing a genetic optimization procedure was written by my former student Dr. Jozef Sofka.

%genetic algorithm for minimization of a nonlinear function
%(c) Jozef Sofka 2004
%number of crossovers in one generation
cross = 50;
%number of mutations in one generation
mut = 30;
%extent of mutation
mutarg1 = .5;
%size of population
population = 20;
%number of alleles
al = 5;
%trying to minimize function
%abs(a^2/b + c*sin(d) + b^c + 1/(e + a)^2)
clear pop pnew;
%definition of "best guess" population
pop(1:population,1) = 12 + 1*randn(population,1);
pop(1:population,2) = 1.5 + .1*randn(population,1);
pop(1:population,3) = 13 + 1*randn(population,1);
pop(1:population,4) = 1.5 + .2*randn(population,1);
pop(1:population,5) = (.5*randn(population,1));
% evaluation of fitness of population
for f = 1:population
    e(f) = abs(pop(f,1)^2/pop(f,2) + pop(f,3)*sin(pop(f,4)) + pop(f,2)^pop(f,3) + 1/(pop(f,5) + pop(f,1))^2);
end
[q,k] = sort(e);
%number of generations
for r = 1:500
    parameters(r,1:al) = pop(k(1),1:al);
    fitness(r) = e(k(1));
    %crossover
    for f = 1:cross
        p1 = round((rand + rand)/2*(population-1)) + 1;
        p2 = round((rand + rand)/2*(population-1)) + 1;
        p3 = (2*rand-.5);
        pnew(f,:) = pop(k(p1),1:al) + p3*(pop(k(p2),1:al)-pop(k(p1),1:al));
        %evaluation of fitness
        fit(f) = abs(pnew(f,1)^2/pnew(f,2) + pnew(f,3)*sin(pnew(f,4)) + pnew(f,2)^pnew(f,3) + 1/(pnew(f,5) + pnew(f,1))^2);
    end
    %selection
    for f = 1:cross
        if (fit(f) < e(k(population-3)))
            pop(k(population),:) = pnew(f,:);
            e(k(population)) = fit(f);
            [q,k] = sort(e);
        end
    end
    %mutation
    for f = 1:mut
        p = round(rand*(population-1)) + 1;
        o = round((al-1)*rand) + 1;
        pnew(f,:) = pop(p,:);
        pnew(f,o) = pnew(f,o) + mutarg1*randn(1,1);
        %evaluation of fitness
        fit(f) = abs(pnew(f,1)^2/pnew(f,2) + pnew(f,3)*sin(pnew(f,4)) + pnew(f,2)^pnew(f,3) + 1/(pnew(f,5) + pnew(f,1))^2);
    end
    %selection
    for f = 1:mut
        if (fit(f) < e(k(population-1)))
            pop(k(population),:) = pnew(f,:);
            e(k(population)) = fit(f);
            [q,k] = sort(e);
        end
    end
end
fprintf('Parameters a = %f; b = %f; c = %f; d = %f; e = %f\n', ...
    pop(k(1),1), pop(k(1),2), pop(k(1),3), pop(k(1),4), pop(k(1),5))
fprintf('minimize function abs(a^2/b + c*sin(d) + b^c + 1/(e + a)^2)\n')
figure
plot(parameters)
figure
semilogy(fitness)

4.4.1 Exercise 4.2

Problem 1

Use Simplex Optimization procedure (to be provided) to tune parameters of a PID controller as shown in Fig. 3.3. The simulation setup could be implemented in Simulink or Vissim. The following transfer function is recommended for the controlled plant:

$$ G(s)=\frac{s+6}{s^3+6{s}^2+10s+10} $$

To show the effectiveness of the tuning procedure provide a sequence (five or so) of numerical values of the parameters of the controller, values of the criterion, and the system step responses.

Problem 2

Given input–output data representing a highly nonlinear, static process:

 x 1    x 2    x 3        y
  1      1      1      17.59141
  1      1      2      21.59141
  1      2      2      44.94528
  2      2      2      81.89056
  2      2      3      89.89056
  2      3      3     216.8554
  3      3      3     317.2831
 −3      3      3    −285.2831
 −3     −3      3      15.25319
 −3     −3     −3      −0.496806
 −3      3     −3    −301.0331
 −1      3     −3    −100.1777
 −1      3      5     −36.42768
 −5      2      4    −152.7264
  5      2      1     188.7264

Given the configuration of the mathematical model of this process:

$$ {y}^{MOD}={a}_1{x}_1{e}^{a_2{x}_2}+{a_3}^{\left({a}_4{x}_3+{a}_5\right)} $$

Utilize the Genetic Optimization (GO) program provided above and the input/output data to estimate unknown parameters of the mathematical model given above. Experiment with values of the control parameters of the GO procedure. Compute the coefficient of determination for the obtained regression model and comment on the model accuracy. Document your work.

4.5 Dynamic Programming

Many physical, managerial, and controlled processes could be considered as a sequence of relatively independent but interrelated stages. This division, natural or imaginative, could be performed in the spatial, functional, or temporal domains. The following diagram in Fig. 4.13 represents a typical multi-stage process containing four stages. Every stage or sub-process is relatively independent in the sense that it is characterized by its own (local) input x i , local output y i , local control effort u i, and the local goodness criterion q i . Both the output and the criterion of each stage (sub-process) are defined by its local input and the control effort, i.e. \( {\boldsymbol{y}}_i={\boldsymbol{y}}_i\left({\boldsymbol{x}}_i,{\boldsymbol{u}}_i\right) \) and \( {\boldsymbol{q}}_i={\boldsymbol{q}}_i\left({\boldsymbol{x}}_i,{\boldsymbol{u}}_i\right) \).

Fig. 4.13
figure 13

Multi-stage process with four stages

At the same time, individual stages (sub-processes) are interrelated. Indeed, the output of every stage, except the last (n-th) stage, serves as the input of the subsequent stage, i.e. \( {\boldsymbol{y}}_i={\boldsymbol{x}}_{i+1} \) for \( \boldsymbol{i}=1,2,3,\dots, \boldsymbol{n}-1 \). This results in the following relationship that links the entire sequence:

$$ \begin{array}{l}{\boldsymbol{y}}_i={\boldsymbol{y}}_i\left({\boldsymbol{x}}_i,{\boldsymbol{u}}_i\right)={\boldsymbol{y}}_i\left[{\boldsymbol{y}}_{i-1}\left({\boldsymbol{x}}_{i-1},{\boldsymbol{u}}_{i-1}\right),{u}_i\right]={\boldsymbol{y}}_i\left({\boldsymbol{x}}_{i-1},{\boldsymbol{u}}_{i-1},{\boldsymbol{u}}_i\right)\\ {}\kern.73em ={\boldsymbol{y}}_i\left[{\boldsymbol{y}}_{i-2}\left({\boldsymbol{x}}_{i-2},{\boldsymbol{u}}_{i-2}\right),{\boldsymbol{u}}_{i-1},{\boldsymbol{u}}_i\right]={\boldsymbol{y}}_i\left({\boldsymbol{x}}_{i-2},{\boldsymbol{u}}_{i-2},{\boldsymbol{u}}_{i-1},{\boldsymbol{u}}_i\right)=\dots \\ {}\kern.73em ={\boldsymbol{y}}_i\left({\boldsymbol{x}}_1,{\boldsymbol{u}}_1,\ {\boldsymbol{u}}_{2,}\ {\boldsymbol{u}}_{3,}\dots, {\boldsymbol{u}}_i\right)\end{array} $$

and similarly \( {\boldsymbol{q}}_i={\boldsymbol{q}}_i\left({\boldsymbol{x}}_1,{\boldsymbol{u}}_1,\ {\boldsymbol{u}}_{2,}\ {\boldsymbol{u}}_{3,}\dots, {\boldsymbol{u}}_i\right) \), where i = 1,2,3,…, n is the sequential number of the stage.

These relationships indicate that the output and criterion value of any stage of the process, except the first stage, are defined by the input of the first stage, the control effort applied at this stage, and the control efforts applied at all previous stages. In addition to the above relationships, the stages of the process are linked by the “overall goodness criterion” defined as the sum of all “local” criteria, \( Q={\displaystyle \sum_{k=1}^nq\left({x}_k,{u}_k\right)} \), where n is the total number of stages. It could be seen that the overall criterion depends on the input of the first stage and all control efforts, i.e.

$$ \boldsymbol{Q}=\boldsymbol{Q}\left({\boldsymbol{x}}_1,{\boldsymbol{u}}_1,\ {\boldsymbol{u}}_{2,}\ {\boldsymbol{u}}_{3,}\dots, {\boldsymbol{u}}_{\boldsymbol{n}}\right) $$

Therefore the optimization problem of a multistage process implies the minimization (maximization) of the overall criterion Q(.) with respect to the control efforts applied at individual stages, \( {\boldsymbol{u}}_{\boldsymbol{k}},\ k=1,2,\dots \boldsymbol{n} \), for any given input of the first stage, x 1, possibly subject to some constraints imposed on the outputs of the individual stages, \( {\boldsymbol{y}}_{\boldsymbol{k}},\ k=1,2,\dots \boldsymbol{n} \). One can realize that the process optimization problem cannot be solved by the independent optimization of the individual stages with respect to their “local” criteria, \( {\boldsymbol{q}}_{\boldsymbol{k}},\ k=1,2,\dots \boldsymbol{n} \). The optimal control strategy must be “wise”: “local” optimization of any sub-process may result in an output that completely jeopardizes the operation of the subsequent stages, thus causing poor operation of the entire multistage process. Therefore, optimization of any stage of a multi-stage process must take into account the consequences of this optimization for all subsequent stages. Selection of any “local” control effort cannot be performed without assessing its impact on the overall criterion.

Dynamic programming is an optimization technique intended for the optimization of multi-stage processes. It is based on the fundamental principle of optimality of dynamic programming formulated by Richard Bellman. A problem is said to satisfy the Principle of Optimality if the sub-solutions of an optimal solution of the problem are themselves optimal solutions for their sub-problems. Fortunately, optimization problems of multi-stage processes do satisfy the Principle of Optimality, which offers a powerful solution approach in the most realistic situations. The key to the application of the Principle of Optimality is the following statement stemming from it: any last portion of an optimal sequence of steps is optimal.

Let us illustrate this principle using the chart in Fig. 4.14, which presents a process comprising 12 sequential stages divided into two sections, AB and BC. It is assumed that each j-th stage of this process is characterized by its “local” criterion, q j . Assume that the overall criterion of the process is defined as the sum of the local criteria: \( Q={\displaystyle \sum_{j=1}^{12}{q}_j} \)

Fig. 4.14
figure 14

Twelve stage process

Let us define the sectional criteria for each of the two sections: \( {Q}_{AB}={\displaystyle \sum_{j=1}^5{q}_j}\kern0.5em \mathrm{and} \) \( {Q}_{BC}={\displaystyle \sum_{j=6}^{12}{q}_j} \). Assume that for every stage of the process some control effort is chosen, such that the entire combination of these control efforts, \( {\boldsymbol{u}}_j^{OPT},\ j=1,2,\dots, 12 \), optimizes (minimizes) the overall process criterion Q. Then according to the principle of dynamic programming, control efforts \( {\boldsymbol{u}}_j^{OPT},\ j=6,7,\dots, 12 \) optimize the last section of the sequence, namely BC, thus bringing criterion \( {Q}_{BC}={\displaystyle \sum_{j=6}^{12}{q}_j} \) to its optimal (minimal) value. At the same time, control efforts \( {\boldsymbol{u}}_j^{OPT},\ j=1,2,\dots, 5 \) are not expected to optimize section AB of the process, thus criterion \( {Q}_{AB}={\displaystyle \sum_{j=1}^5{q}_j} \) could be minimized by a completely different combination of control efforts, say \( {\boldsymbol{u}}_j^{ALT},\ j=1,2,\dots, 5 \).

The fundamental principle provides the framework for a highly efficient and versatile optimization procedure of dynamic programming that works on a step-by-step basis and defines optimal control efforts for individual stages of the multi-stage process. It is important that control decisions made at each step of the procedure do not optimize individual stages of the process, i.e. do not solve the “local” optimization problems. Instead, they optimize the last portion of the entire process that starts at the stage in question and ends at the last stage of the process.
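This step-by-step scheme can be illustrated with a small numerical sketch. The three-stage process below, its stage cost and its transition rule are hypothetical choices, not a model from the text: a backward pass tabulates, for every stage, the optimal control of the remaining portion of the process as a function of that stage's input, and a forward pass then recovers the overall optimal sequence.

```python
# A hypothetical three-stage process on a small discrete grid
N = 3
states = range(5)
controls = range(5)

def q(x, u):            # "local" criterion of one stage
    return (x - u)**2 + u

def transition(x, u):   # local output y = y(x, u), kept inside the grid
    return min(4, max(0, x + u - 2))

# Backward pass: for every stage i and every possible input x, tabulate
# the optimal control F[i][x] of the remaining portion of the process
# and the corresponding cost J[i][x] of that portion
J = [dict() for _ in range(N + 1)]
F = [dict() for _ in range(N)]
for x in states:
    J[N][x] = 0.0                        # no stages after the last one
for i in reversed(range(N)):
    for x in states:
        best_u = min(controls, key=lambda u: q(x, u) + J[i + 1][transition(x, u)])
        F[i][x] = best_u
        J[i][x] = q(x, best_u) + J[i + 1][transition(x, best_u)]

# Forward pass: recover the optimal sequence for the given input x1 = 3
x, plan, plan_cost = 3, [], 0.0
for i in range(N):
    u = F[i][x]
    plan.append(u)
    plan_cost += q(x, u)
    x = transition(x, u)
```

By construction the forward pass reproduces exactly the cost J[0][3] computed in the backward pass, and no fixed control sequence can do better.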

When doing so, every step of the optimization procedure takes into account not only the particular stage of the process but also all subsequent stages. The procedure is iterative; therefore it shall start from the last section of the multistage process, where there are no subsequent stages to be considered. At the same time, the optimal solution of the control problem, u OPT j, cannot be explicitly defined without knowing the input x j applied to the appropriate section of the process. Therefore, the dynamic programming procedure is performed in two steps: conditional optimization and unconditional optimization. Conditional optimization starts from the end of the process, addressing the last stage of the process first, then the last two stages, then the last three stages, and finally the entire process. Why is it called conditional? Because at the first step, the procedure defines the optimal conditional control effort (OCCE) for the last stage of the process, which is dependent on the input of the last stage of the process:

$$ {\boldsymbol{u}}_N^{OPT}=F\left({\boldsymbol{x}}_N\right) $$

that minimizes the sectional criterion

$$ {\boldsymbol{Q}}_N\left({\boldsymbol{x}}_N,{\boldsymbol{u}}_N\right)={\boldsymbol{q}}_N\left({\boldsymbol{x}}_N,{\boldsymbol{u}}_N\right) $$

(Note that the sectional criterion is marked by the index of the first stage of the section). Now the output of the last stage of the process (and the output of the entire process) is

$$ {\boldsymbol{y}}_N={\boldsymbol{y}}_N\left({\boldsymbol{x}}_N,\ {\boldsymbol{u}}_N^{OPT}\right) $$

This solution must be consistent with the required (allowed) value of the output of the process, Y*, i.e.

$$ {\boldsymbol{y}}_N={\boldsymbol{y}}_N\left({\boldsymbol{x}}_N,\ {\boldsymbol{u}}_N^{OPT}\right)={\boldsymbol{Y}}^{*} $$

At the next step the OCCE for the second stage from the end of the process is defined as a function of the input applied to this stage:

$$ {\boldsymbol{u}}_{N-1}^{OPT}=F\left({\boldsymbol{x}}_{N-1}\right) $$

that minimizes the sectional criterion for the last two stages of the process:

$$ {\boldsymbol{Q}}_{N-1}\left({\boldsymbol{x}}_{N-1},{\boldsymbol{u}}_{N-1},{\boldsymbol{u}}_N^{OPT}\right)={\boldsymbol{q}}_{N-1}\left({\boldsymbol{x}}_{N-1},{\boldsymbol{u}}_{N-1}\right)+{\boldsymbol{q}}_N\left({\boldsymbol{x}}_N,{\boldsymbol{u}}_N^{OPT}\right) $$

Note that in this expression \( {\boldsymbol{x}}_N \) is not an independent factor; it is defined as the output of the previous stage of the process:

$$ {\boldsymbol{x}}_N={\boldsymbol{y}}_{N-1}\left({\boldsymbol{x}}_{N-1},{\boldsymbol{u}}_{N-1}^{OPT}\right) $$

and therefore criterion \( {Q}_{N-1} \) actually depends only on two variable factors, \( {\boldsymbol{x}}_{N-1} \) and \( {\boldsymbol{u}}_{N-1} \):

$$ {\boldsymbol{Q}}_{N-1}\left({\boldsymbol{x}}_{N-1},{\boldsymbol{u}}_{N-1}\right)={\boldsymbol{q}}_{N-1}\left({\boldsymbol{x}}_{N-1},{\boldsymbol{u}}_{N-1}\right)+{\boldsymbol{q}}_N\left[{\boldsymbol{y}}_{N-1}\left({\boldsymbol{x}}_{N-1},{\boldsymbol{u}}_{N-1}\right),{\boldsymbol{u}}_N^{OPT}\right] $$

The solution must ensure that the resultant output

$$ {\boldsymbol{y}}_{N-1}={\boldsymbol{y}}_{N-1}\left({\boldsymbol{x}}_{N-1},\ {\boldsymbol{u}}_{N-1}^{OPT}\right) $$

is within its allowed limits, i.e.

$$ {{\boldsymbol{y}}_{N-1}}^{\mathrm{MIN}}\le {\boldsymbol{y}}_{N-1}\le {{\boldsymbol{y}}_{N-1}}^{\mathrm{MAX}} $$

Now let us define the OCCE for the third stage from the end of the process as a function of the input applied to this stage:

$$ {\boldsymbol{u}}_{N-2}^{OPT}=F\left({\boldsymbol{x}}_{N-2}\right) $$

that minimizes the sectional criterion that “covers” the last three stages:

$$ {\boldsymbol{Q}}_{N-2}\left({\boldsymbol{x}}_{N-2},{\boldsymbol{u}}_{N-2},{\boldsymbol{u}}_{N-1}^{OPT},{\boldsymbol{u}}_N^{OPT}\right)={\boldsymbol{q}}_{N-2}\left({\boldsymbol{x}}_{N-2},{\boldsymbol{u}}_{N-2}\right)+{\boldsymbol{q}}_{N-1}\left({\boldsymbol{x}}_{N-1},{\boldsymbol{u}}_{N-1}^{OPT}\right)+{\boldsymbol{q}}_N\left({\boldsymbol{x}}_N,{\boldsymbol{u}}_N^{OPT}\right) $$

Again, in this expression \( {\boldsymbol{x}}_{N-1} \) and \( {\boldsymbol{x}}_N \) are not independent factors; they are defined as outputs of the previous stages of the process:

$$ {\boldsymbol{x}}_{N-1}={\boldsymbol{y}}_{N-2}\left({\boldsymbol{x}}_{N-2},{\boldsymbol{u}}_{N-2}\right)\;\mathrm{and}\;{\boldsymbol{x}}_N={\boldsymbol{y}}_{N-1}\left({\boldsymbol{x}}_{N-1},{\boldsymbol{u}}_{N-1}^{OPT}\right) $$

and therefore criterion \( {\boldsymbol{Q}}_{N-2} \) actually depends only on two variable factors, \( {\boldsymbol{x}}_{N-2} \) and \( {\boldsymbol{u}}_{N-2} \):

$$ {\boldsymbol{Q}}_{N-2}\left({\boldsymbol{x}}_{N-2},{\boldsymbol{u}}_{N-2}\right)={\boldsymbol{q}}_{N-2}\left({\boldsymbol{x}}_{N-2},{\boldsymbol{u}}_{N-2}\right)+{\boldsymbol{q}}_{N-1}\left[{\boldsymbol{y}}_{N-2}\left({\boldsymbol{x}}_{N-2},{\boldsymbol{u}}_{N-2}\right),{\boldsymbol{u}}_{N-1}^{OPT}\right]+{\boldsymbol{q}}_N\left[{\boldsymbol{y}}_{N-1}\left({\boldsymbol{x}}_{N-1},{\boldsymbol{u}}_{N-1}^{OPT}\right),{\boldsymbol{u}}_N^{OPT}\right] $$

The optimal value of this criterion is:

$$ {\boldsymbol{Q}}_{N-2}\left({\boldsymbol{x}}_{N-2},{\boldsymbol{u}}_{N-2}^{OPT},\ {\boldsymbol{u}}_{N-1}^{OPT},{\boldsymbol{u}}_N^{OPT}\right) $$

Again, the appropriate output,

$$ {\boldsymbol{y}}_{N-2}={\boldsymbol{y}}_{N-2}\left({\boldsymbol{x}}_{N-2},\ {\boldsymbol{u}}_{N-2}^{OPT}\right) $$

must be consistent with the allowed value for the output of the appropriate stage of the process:

$$ {{\boldsymbol{y}}_{N-2}}^{\mathrm{MIN}}\le {\boldsymbol{y}}_{N-2}\le {{\boldsymbol{y}}_{N-2}}^{\mathrm{MAX}} $$
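For reference, the conditional optimization steps illustrated above all follow a single recursive pattern, written here in compact form (with \( {Q}_{N+1}^{OPT}\equiv 0 \) and the index k running backwards from the last stage):

$$ {Q}_k^{OPT}\left({\boldsymbol{x}}_k\right)=\underset{{\boldsymbol{u}}_k}{ \min}\left\{{\boldsymbol{q}}_k\left({\boldsymbol{x}}_k,{\boldsymbol{u}}_k\right)+{Q}_{k+1}^{OPT}\left[{\boldsymbol{y}}_k\left({\boldsymbol{x}}_k,{\boldsymbol{u}}_k\right)\right]\right\},\kern1em k=N,N-1,\dots, 1 $$

where every minimization is performed subject to the applicable output constraints \( {{\boldsymbol{y}}_k}^{\mathrm{MIN}}\le {\boldsymbol{y}}_k\le {{\boldsymbol{y}}_k}^{\mathrm{MAX}} \).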

It could be seen that eventually the procedure defines the control effort for the first stage of the process as a function of the input applied to this stage:

$$ {\boldsymbol{u}}_1^{OPT}=F\left({\boldsymbol{x}}_1\right) $$

that minimizes the sectional criterion

$$ {\boldsymbol{Q}}_1\left({\boldsymbol{x}}_1,{\boldsymbol{u}}_1,{\boldsymbol{u}}_2^{OPT},{\boldsymbol{u}}_3^{OPT},\dots,\ {\boldsymbol{u}}_{N-1}^{OPT},{\boldsymbol{u}}_N^{OPT}\right) $$

However, the input of the first stage (the input of the overall process), \( {\boldsymbol{x}}_1 \), is explicitly known, therefore the control effort \( {\boldsymbol{u}}_1^{OPT} \) could be explicitly defined. This results in the explicitly defined output of the first stage, \( {\boldsymbol{y}}_1 \). Since \( {\boldsymbol{x}}_2={\boldsymbol{y}}_1 \), the optimal conditional control effort

$$ {\boldsymbol{u}}_2^{OPT}=F\left({\boldsymbol{x}}_2\right) $$

could be explicitly defined thus resulting in an explicit definition of the output of the second stage and the input of the third stage, and so on… It could be seen that the procedure moves now from the first stage of the process to the last stage, converting conditional control efforts into explicitly defined unconditional optimal control efforts.
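The two-pass logic can be sketched in a few lines of Python. The toy two-stage model below (stage dynamics y = (x + u) mod 2 and the cost tables) is invented purely for illustration; the structure of the passes, not the numbers, is the point.

```python
# Toy two-stage process: x, u take values in {0, 1}.
# Hypothetical cost tables q[stage][x][u]; dynamics y = (x + u) % 2.
q = [
    [[1, 4], [2, 3]],  # stage 1
    [[5, 1], [2, 6]],  # stage 2
]
N = len(q)

def y(x, u):
    return (x + u) % 2  # assumed stage dynamics, for illustration only

# Conditional (backward) pass: for every possible input x of a stage,
# store the optimal conditional control and the cost of the remaining stages.
cost_to_go = [dict() for _ in range(N + 1)]
policy = [dict() for _ in range(N)]
cost_to_go[N] = {0: 0, 1: 0}  # nothing remains after the last stage
for k in range(N - 1, -1, -1):
    for x in (0, 1):
        best_u = min((0, 1), key=lambda u: q[k][x][u] + cost_to_go[k + 1][y(x, u)])
        policy[k][x] = best_u
        cost_to_go[k][x] = q[k][x][best_u] + cost_to_go[k + 1][y(x, best_u)]

# Unconditional (forward) pass: the process input x1 is explicitly known,
# so the conditional controls are resolved stage by stage.
x, controls = 0, []
for k in range(N):
    u = policy[k][x]
    controls.append(u)
    x = y(x, u)

print(controls, cost_to_go[0][0])
```

The backward loop stores, for every possible stage input, the conditionally optimal control; the forward loop then resolves those conditions once the actual process input becomes known.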

Let us consider the application of the outlined approach to the following numerical example representing the so-called optimal routing problem.

Example 4.7

Apply dynamic programming to establish the optimal (“minimum cost”) route within the graph shown in Fig. 4.15.

Fig. 4.15

Process graph

It could be seen that the transportation problem featured by the above graph consists of five stages and four steps.

  • Step 1 consists of four alternative transitions: 1/1 → 2/1, 1/1 → 2/2, 1/1 → 2/3, and 1/1 → 2/4, with the associated costs of 5, 3, 1, and 2 units.

  • Step 2 consists of 12 alternative transitions: 2/1 → 3/1, 2/1 → 3/2, 2/1 → 3/3 with the associated costs of 8, 4, and 3 units; 2/2 → 3/1, 2/2 → 3/2, 2/2 → 3/3 with the associated costs of 4, 6, and 7 units; 2/3 → 3/1, 2/3 → 3/2, 2/3 → 3/3 with the associated costs of 5, 6, and 8 units; and 2/4 → 3/1, 2/4 → 3/2, 2/4 → 3/3 with the associated costs of 9, 5, and 6 units.

  • Step 3 also consists of 12 alternative transitions: 3/1 → 4/1, 3/1 → 4/2, 3/1 → 4/3, 3/1 → 4/4 with the associated costs of 6, 3, 4, and 10 units; 3/2 → 4/1, 3/2 → 4/2, 3/2 → 4/3, 3/2 → 4/4 with the associated costs of 5, 6, 7, and 3 units; and 3/3 → 4/1, 3/3 → 4/2, 3/3 → 4/3, 3/3 → 4/4 with the associated costs of 11, 2, 6, and 8 units.

  • Finally, the last step, 4, consists of four alternative transitions: 4/1 → 5/1, 4/2 → 5/1, 4/3 → 5/1, and 4/4 → 5/1, with the associated costs of 13, 16, 10, and 11 units.

It is required to establish the sequence of transitions (the optimal path) that leads from the initial node to the final node of the graph and has the minimal sum of the transition costs.

Could we have established the optimal path by considering all possible alternative paths within this graph? Perhaps, but the required computational effort would be very high. Should the number of stages and of alternative transitions at every step grow, exhaustive enumeration would become prohibitively expensive.
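To put the required computational effort in rough numbers: exhaustive search over a chain of s steps with k alternative nodes per stage enumerates about k^s complete paths, while dynamic programming examines each of the roughly s·k² stage-to-stage transitions only once. A quick sketch of this standard counting argument:

```python
# Rough operation counts: exhaustive search enumerates every complete path,
# while dynamic programming evaluates each stage-to-stage transition once.
def exhaustive_paths(k, s):
    return k ** s          # k alternatives at each of s steps

def dp_transition_checks(k, s):
    return s * k * k       # every (node, successor) pair, once per step

for s in (4, 10, 20):
    print(s, exhaustive_paths(3, s), dp_transition_checks(3, s))
```

Even at k = 3 the gap widens from dozens of paths at s = 4 to billions at s = 20, while the dynamic programming workload grows only linearly in s.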

According to the dynamic programming procedure, let us define conditionally optimal transitions for the last step of the process, step 4. This task is quite simple: if the starting node of stage 4 is 4/1, then the optimal (and the only) transition to the last stage is 4/1 → 5/1 with the cost of 13 units. Should we start from node 4/2, the optimal (and the only) transition is 4/2 → 5/1 with the cost of 16 units, and so on. The results of the conditional optimization of step 4 are tabulated below.

Conditional optimization of step 4

| Starting node of stage 4 | Final node of stage 5 | Transition cost | Optimal transition | Total cost for this portion of the path |
|---|---|---|---|---|
| 4/1 | 5/1 | 13 | 4/1 → 5/1 | 13 |
| 4/2 | 5/1 | 16 | 4/2 → 5/1 | 16 |
| 4/3 | 5/1 | 10 | 4/3 → 5/1 | 10 |
| 4/4 | 5/1 | 11 | 4/4 → 5/1 | 11 |

Let us compile the table representing conditional optimization of the last two steps of the transportation process, namely steps 3 and 4. Assuming that the starting node of stage 3 is 3/1, the first available transition within step 3 is 3/1 → 4/1 with the cost of 6 units. At the next step, this transition will be followed by 4/1 → 5/1, and the total cost of both transitions, 3/1 → 4/1 → 5/1, is 19 units. Then consider the second available transition within step 3, 3/1 → 4/2. It comes with the cost of 3 units and must be followed by the transition 4/2 → 5/1, making the total cost of 3/1 → 4/2 → 5/1 equal to 19 units. Upon consideration of transitions 3/1 → 4/3 → 5/1 and 3/1 → 4/4 → 5/1, it could be seen that for 3/1 as the entry point to step 3 the best transition is 3/1 → 4/3 → 5/1 with the lowest total cost of 14 units.

Conditional optimization of steps 3 and 4

| Starting point of step 3 | Alternative transition to state | Transition cost | Possible transition | Total cost for two steps | Optimal transition |
|---|---|---|---|---|---|
| 3/1 | 4/1 | 6 | 3/1 → 4/1 → 5/1 | 6 + 13 = 19 | |
| | 4/2 | 3 | 3/1 → 4/2 → 5/1 | 3 + 16 = 19 | |
| | 4/3 | 4 | 3/1 → 4/3 → 5/1 | 4 + 10 = 14 | 3/1 → 4/3 → 5/1 |
| | 4/4 | 10 | 3/1 → 4/4 → 5/1 | 10 + 11 = 21 | |
| 3/2 | 4/1 | 5 | 3/2 → 4/1 → 5/1 | 5 + 13 = 18 | |
| | 4/2 | 6 | 3/2 → 4/2 → 5/1 | 6 + 16 = 22 | |
| | 4/3 | 7 | 3/2 → 4/3 → 5/1 | 7 + 10 = 17 | |
| | 4/4 | 3 | 3/2 → 4/4 → 5/1 | 3 + 11 = 14 | 3/2 → 4/4 → 5/1 |
| 3/3 | 4/1 | 11 | 3/3 → 4/1 → 5/1 | 11 + 13 = 24 | |
| | 4/2 | 2 | 3/3 → 4/2 → 5/1 | 2 + 16 = 18 | |
| | 4/3 | 6 | 3/3 → 4/3 → 5/1 | 6 + 10 = 16 | 3/3 → 4/3 → 5/1 |
| | 4/4 | 8 | 3/3 → 4/4 → 5/1 | 8 + 11 = 19 | |

Now let us compile the table representing conditional optimization of the last three steps of the transportation process, namely step 2 followed by steps 3 and 4. Assume that the starting point of stage 2 is 2/1 and the first available transition is 2/1 → 3/1 with the cost of 8 units. The optimal transition from 3/1 to the last stage has already been established: 3/1 → 4/3 → 5/1 with the cost of 14 units, therefore the cost of transition 2/1 → 3/1 → 4/3 → 5/1 is 8 + 14 = 22 units. Now let the chosen transition be 2/1 → 3/2 with the cost of 4 units. The already established optimal transition from 3/2 to the last stage is 3/2 → 4/4 → 5/1 with the cost of 14 units, therefore the cost of transition 2/1 → 3/2 → 4/4 → 5/1 is 4 + 14 = 18 units. Finally, let the chosen transition be 2/1 → 3/3 with the cost of 3 units. The already established optimal transition from 3/3 to the last stage is 3/3 → 4/3 → 5/1 with the cost of 16 units, and the total cost is 3 + 16 = 19 units. This indicates that the optimal path from point 2/1 to 5/1 is 2/1 → 3/2 → 4/4 → 5/1 with the cost of 18 units. In a similar fashion, optimal paths from points 2/2, 2/3, and 2/4 to point 5/1 are established. They are: 2/2 → 3/1 → 4/3 → 5/1 with the cost of 18 units, 2/3 → 3/1 → 4/3 → 5/1 with the cost of 19 units, and 2/4 → 3/2 → 4/4 → 5/1 with the cost of 19 units.

Conditional optimization of steps 2, 3 and 4

| Starting point of step 2 | Alternative transition to state | Transition cost | Possible transition | Total cost for three steps | Optimal transition |
|---|---|---|---|---|---|
| 2/1 | 3/1 | 8 | 2/1 → 3/1 | 8 + 14 = 22 | |
| | 3/2 | 4 | 2/1 → 3/2 | 4 + 14 = 18 | 2/1 → 3/2 → 4/4 → 5/1 |
| | 3/3 | 3 | 2/1 → 3/3 | 3 + 16 = 19 | |
| 2/2 | 3/1 | 4 | 2/2 → 3/1 | 4 + 14 = 18 | 2/2 → 3/1 → 4/3 → 5/1 |
| | 3/2 | 6 | 2/2 → 3/2 | 6 + 14 = 20 | |
| | 3/3 | 7 | 2/2 → 3/3 | 7 + 16 = 23 | |
| 2/3 | 3/1 | 5 | 2/3 → 3/1 | 5 + 14 = 19 | 2/3 → 3/1 → 4/3 → 5/1 |
| | 3/2 | 6 | 2/3 → 3/2 | 6 + 14 = 20 | |
| | 3/3 | 8 | 2/3 → 3/3 | 8 + 16 = 24 | |
| 2/4 | 3/1 | 9 | 2/4 → 3/1 | 9 + 14 = 23 | |
| | 3/2 | 5 | 2/4 → 3/2 | 5 + 14 = 19 | 2/4 → 3/2 → 4/4 → 5/1 |
| | 3/3 | 6 | 2/4 → 3/3 | 6 + 16 = 22 | |

Finally, let us compile the table representing optimization of all four steps of the transportation process. Note that the optimization results are not conditional anymore: the transition process originates at one particular point, 1/1. Assume that the first available transition is 1/1 → 2/1 with the cost of 5 units. The optimal transition from 2/1 to the last stage has already been established: 2/1 → 3/2 → 4/4 → 5/1 with the cost of 18 units, therefore the cost of transition 1/1 → 2/1 → 3/2 → 4/4 → 5/1 is 5 + 18 = 23 units. Assume that the chosen transition is 1/1 → 2/2 with the cost of 3 units. The already established optimal transition from 2/2 to the last stage is 2/2 → 3/1 → 4/3 → 5/1 with the cost of 18 units, therefore the cost of transition 1/1 → 2/2 → 3/1 → 4/3 → 5/1 is 3 + 18 = 21 units. Now assume that the chosen transition is 1/1 → 2/3 with the cost of 1 unit. The already established optimal transition from 2/3 to the last stage is 2/3 → 3/1 → 4/3 → 5/1 with the cost of 19 units, and the total cost of transition 1/1 → 2/3 → 3/1 → 4/3 → 5/1 is 1 + 19 = 20 units. Should the chosen transition be 1/1 → 2/4 with the cost of 2 units, then, since the already established optimal transition from 2/4 to the last stage is 2/4 → 3/2 → 4/4 → 5/1 with the cost of 19 units, the total cost of transition 1/1 → 2/4 → 3/2 → 4/4 → 5/1 is 2 + 19 = 21 units. This clearly indicates that the optimal path from point 1/1 to 5/1 is 1/1 → 2/3 → 3/1 → 4/3 → 5/1 with the total cost of 20 units. This analysis is summarized in the table below.

Optimization of steps 1, 2, 3 and 4

| Starting point of step 1 | Alternative transition to state | Transition cost | Possible transition | Total cost for all four steps | Optimal transition |
|---|---|---|---|---|---|
| 1/1 | 2/1 | 5 | 1/1 → 2/1 | 5 + 18 = 23 | |
| | 2/2 | 3 | 1/1 → 2/2 | 3 + 18 = 21 | |
| | 2/3 | 1 | 1/1 → 2/3 | 1 + 19 = 20 | 1/1 → 2/3 → 3/1 → 4/3 → 5/1 |
| | 2/4 | 2 | 1/1 → 2/4 | 2 + 19 = 21 | |
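This backward/forward bookkeeping is easy to mechanize. The sketch below reproduces Example 4.7 in Python, with the transition costs copied from the graph of Fig. 4.15: the backward (conditional) pass computes the cost-to-go and the best successor of every node, and the forward (unconditional) pass reads the optimal path off from node 1/1.

```python
# Transition costs of Example 4.7, taken from the process graph.
edges = {
    '1/1': {'2/1': 5, '2/2': 3, '2/3': 1, '2/4': 2},
    '2/1': {'3/1': 8, '3/2': 4, '3/3': 3},
    '2/2': {'3/1': 4, '3/2': 6, '3/3': 7},
    '2/3': {'3/1': 5, '3/2': 6, '3/3': 8},
    '2/4': {'3/1': 9, '3/2': 5, '3/3': 6},
    '3/1': {'4/1': 6, '4/2': 3, '4/3': 4, '4/4': 10},
    '3/2': {'4/1': 5, '4/2': 6, '4/3': 7, '4/4': 3},
    '3/3': {'4/1': 11, '4/2': 2, '4/3': 6, '4/4': 8},
    '4/1': {'5/1': 13}, '4/2': {'5/1': 16},
    '4/3': {'5/1': 10}, '4/4': {'5/1': 11},
}

# Conditional (backward) pass: cost-to-go and best successor of every node,
# processed stage by stage from the end of the process.
cost_to_go = {'5/1': 0}
best_next = {}
stages = [['4/1', '4/2', '4/3', '4/4'], ['3/1', '3/2', '3/3'],
          ['2/1', '2/2', '2/3', '2/4'], ['1/1']]
for stage in stages:
    for node in stage:
        nxt = min(edges[node], key=lambda n: edges[node][n] + cost_to_go[n])
        best_next[node] = nxt
        cost_to_go[node] = edges[node][nxt] + cost_to_go[nxt]

# Unconditional (forward) pass: the start node 1/1 is known explicitly.
path, node = ['1/1'], '1/1'
while node in best_next:
    node = best_next[node]
    path.append(node)

print(path, cost_to_go['1/1'])
```

Running the sketch confirms the result obtained by hand: the optimal path 1/1 → 2/3 → 3/1 → 4/3 → 5/1 with the total cost of 20 units, and the intermediate conditional values (e.g. cost-to-go of 14 units from node 3/2) match the tables above.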

Consider another quite practical example that ideally lends itself to the application of dynamic programming. It is the optimization of a sequence of manufacturing processes that could be found in chemistry and metallurgy. Each process has its own mathematical description representing quality/quantity of its end product and manufacturing costs as functions of the characteristics of the raw material x i and control efforts u i. Consider the mathematical model of i-th manufacturing process within a sequence consisting of N processes:

  • characteristic of the end material y i = y i(x i,u i), i = 1,2,…,N

  • manufacturing cost q i = q i(x i,u i), i = 1,2,…,N

  • quality/quantity requirements y i MIN ≤ y i ≤ y i MAX, i = 1,2,…,N

  • connection to neighboring processes y i = x i+1, i = 1,2,…,N−1

For simplicity, let us assume that the above functions are scalar and are represented on the basis of their mathematical model by numerical values of y i and u i for discretized x i = k · Δx i and u i = m · Δu i, i.e. y i(k,m) = y i(k · Δx i,m · Δu i) and q i(k,m) = q i(k · Δx i,m · Δu i) where k, m = 1,2,3,.... Does this representation of the manufacturing process result in the loss of accuracy? No, provided that the discretization steps Δx i, Δu i are judiciously chosen.

Example 4.8

Apply dynamic programming to optimize the operation of a sequence of three manufacturing processes represented by the tabulated description below. Note that the inputs of the individual processes are defined in % assuming that the 100 % value of the respective input corresponds to the maximum value of the output of the previous process. To simplify the problem further, the control efforts are defined not by real numbers, but as “control options.” The overall cost of manufacturing is defined as the sum of costs of individual processes. Finally, it could be seen that the specified acceptability limits of the process outputs are different from their feasibility limits that could be seen in the tables.

figure a

First, let us address the issue of acceptability limits of the process outputs. Computationally, it could be done by replacing the associated cost values by penalties (10¹⁵) in the situations when output values are not acceptable: this will automatically exclude such cases from consideration, see the modified tables below

figure b
figure c
figure d
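Before following the printout, it may help to see how such a tabulated, penalty-based procedure can be coded. The sketch below uses a hypothetical two-process chain: the grade thresholds, outputs, and costs are stand-ins, not the values from the figures above, and unacceptable outputs are pre-penalized with 10¹⁵ exactly as just described.

```python
PEN = 1e15  # penalty that automatically rules out unacceptable outputs

# Hypothetical tabulated models: y[p][g][u] and q[p][g][u] are the output
# and cost of process p for input grade g under control option u.
y = [[[30, 60], [55, 80]],
     [[40, 70], [65, 90]]]
q = [[[10, 28], [12, 20]],
     [[PEN, 30], [15, 22]]]   # (grade 0, option 0) of process 2 penalized

def grade(x):
    # hypothetical discretization: two grades, split at 50 % of full scale
    return 0 if x <= 50 else 1

# Conditional (backward) pass over the last process: best option and
# conditionally minimal cost for every possible input grade.
cost2 = [min(q[1][g]) for g in (0, 1)]
opt2 = [q[1][g].index(cost2[g]) for g in (0, 1)]

# Unconditional (forward) pass: the raw-material value x1 is known.
x1 = 45
g1 = grade(x1)
totals = [q[0][g1][u] + cost2[grade(y[0][g1][u])] for u in (0, 1)]
u1 = totals.index(min(totals))
y1 = y[0][g1][u1]
u2 = opt2[grade(y1)]

print(u1, u2, min(totals))
```

With these stand-in tables the forward pass, starting from x1 = 45, yields a total cost of 40 units (control options are indexed from 0 in the code). The real example below works the same way, only with three processes, three grades, and three control options.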

The following analysis of the problem solution is based on the printout of a specially written computer code. According to the Principle of Optimality, the solution starts from the conditional optimization of the last, third, process. It will provide an optimal recipe for the process operation for every possible grade of the process input. The printout below considers application of various control options when the input of the process is between 0 and 33 % of its maximum attainable value (grade 1). It could be seen that the acceptable value of the process output is obtained only when control option #3 is applied. This defines option #3 as the conditional optimal control option, and the associated cost of 73 units as the conditionally minimal cost.

PROCESS # 3
INP # 1 CONTR# 1 Q = .10000E+16 Y = 12.00
INP # 1 CONTR# 2 Q = .10000E+16 Y = 50.00
INP # 1 CONTR# 3 Q = .73000E+02 Y = 18.00
OPT: INP # 1, CONTR# 3, QSUM = .73000E+02, Y = 18.00

The following printout presents similar results for the situations when the input grade is 2 and 3.

INP # 2 CONTR# 1 Q = .10000E+16 Y = 9.00
INP # 2 CONTR# 2 Q = .10000E+16 Y = 13.00
INP # 2 CONTR# 3 Q = .13000E+03 Y = 19.00
OPT: INP # 2, CONTR# 3, QSUM = .13000E+03, Y = 19.00
INP # 3 CONTR# 1 Q = .29000E+02 Y = 16.00
INP # 3 CONTR# 2 Q = .28000E+02 Y = 19.00
INP # 3 CONTR# 3 Q = .10000E+16 Y = 31.00
OPT: INP # 3, CONTR# 2, QSUM = .28000E+02, Y = 19.00

Now consider conditional optimization of process #2. Note that QSUM represents the sum of costs associated with the chosen input and control option of process #2 and the consequent conditionally optimal input + control option of process #3. Consider the application of various control options when the input of process #2 is of grade 1 (i.e. between 0 and 33 % of its maximum attainable value). The resultant QSUM value includes the specific cost at process #2 and the consequent already known optimal cost at process #3. Since the first two control options are penalized for resulting in unacceptable values of the output, the optimal result is offered by control option #3, and the accumulated cost value is QSUM = 73 + 100 = 173 units. Some additional information seen in the printout addresses the following issue. Note that the action at step #2 has resulted in y 2 = 21 units; then how does one determine the consequent action at process #3? It could be seen that the highest y 2 value in the output of process #2 is 90 units. Therefore the output value y 2 = 21 falls within 0–33 % of the y 2 range, i.e. y 2 = 21 constitutes grade #1 of the input product for process #3. Based on the conditional optimization of process #3, for input grade #1 control option #3 with the associated cost of 73 units is optimal (see Y = 21.00 => 1 + (.73000E+02) QSUM = .17300E+03).

PROCESS # 2
INP # 1 CONTR# 1 Q = .10000E+16 Y = 8.00 => 1 + (.73000E+02) QSUM = .10000E+16
INP # 1 CONTR# 2 Q = .10000E+16 Y = 11.00 => 1 + (.73000E+02) QSUM = .10000E+16
INP # 1 CONTR# 3 Q = .10000E+03 Y = 21.00 => 1 + (.73000E+02) QSUM = .17300E+03
OPT: INP # 1, CONTR# 3, QSUM = .17300E+03, Y = 21.00 ==> 1

Similar analysis is conducted to perform conditional optimization of process #2 for two other grades of the input.

INP # 2 CONTR# 1 Q = .61000E+02 Y = 24.00 => 1 + (.73000E+02) QSUM = .13400E+03
INP # 2 CONTR# 2 Q = .10000E+16 Y = 13.00 => 1 + (.73000E+02) QSUM = .10000E+16
INP # 2 CONTR# 3 Q = .10000E+16 Y = 90.00 => 3 + (.28000E+02) QSUM = .10000E+16
OPT: INP # 2, CONTR# 1, QSUM = .13400E+03, Y = 24.00 ==> 1
INP # 3 CONTR# 1 Q = .10000E+16 Y = 1.00 => 1 + (.73000E+02) QSUM = .10000E+16
INP # 3 CONTR# 2 Q = .10000E+16 Y = 15.00 => 1 + (.73000E+02) QSUM = .10000E+16
INP # 3 CONTR# 3 Q = .88000E+02 Y = 35.00 => 2 + (.13000E+03) QSUM = .21800E+03
OPT: INP # 3, CONTR# 3, QSUM = .21800E+03, Y = 35.00 ==> 2

Consider conditional optimization of process #1, which results in the optimization of the entire combination of three sequential processes. Consider the application of various control options when the input of process #1 is of grade 2 (i.e. between 34 and 66 % of its maximum attainable value). The resultant QSUM value includes the specific cost at process #1 and the consequent already known optimal costs at processes #2 and #3. The first control option results in an unacceptable value of the output and is penalized. The application of control option #2 results in y 1 = 13 (grade #1 of the input for process #2) and the cost of 19 units. The already established optimal decisions for this input grade of process #2 come with the cost of 173 units. Consequently QSUM = 19 + 173 = 192 units. The application of control option #3 results in y 1 = 19 (grade #2 of the input for process #2) and the cost of 130 units. The already established optimal decisions for this input grade of process #2 come with the cost of 134 units. Therefore QSUM = 130 + 134 = 264 units. It is clear that control option #2 is optimal for grade #2 of the input material.

PROCESS # 1
INP # 2 CONTR# 1 Q = .10000E+16 Y = 9.00 => 1 + (.17300E+03) QSUM = .10000E+16
INP # 2 CONTR# 2 Q = .19000E+02 Y = 13.00 => 1 + (.17300E+03) QSUM = .19200E+03
INP # 2 CONTR# 3 Q = .13000E+03 Y = 19.00 => 2 + (.13400E+03) QSUM = .26400E+03
OPT: INP # 2, CONTR# 2, QSUM = .19200E+03, Y = 13.00 ==> 1

Conditional optimization of process #1 for input grades #1 and #3 is featured below.

INP # 1 CONTR# 1 Q = .10000E+16 Y = 2.00 => 1 + (.17300E+03) QSUM = .10000E+16
INP # 1 CONTR# 2 Q = .10000E+16 Y = 50.00 => 3 + (.21800E+03) QSUM = .10000E+16
INP # 1 CONTR# 3 Q = .73000E+02 Y = 18.00 => 2 + (.13400E+03) QSUM = .20700E+03
OPT: INP # 1, CONTR# 3, QSUM = .20700E+03, Y = 18.00 ==> 2
INP # 3 CONTR# 1 Q = .29000E+02 Y = 11.00 => 1 + (.17300E+03) QSUM = .20200E+03
INP # 3 CONTR# 2 Q = .18000E+02 Y = 19.00 => 2 + (.13400E+03) QSUM = .15200E+03
INP # 3 CONTR# 3 Q = .10000E+16 Y = 31.00 => 2 + (.13400E+03) QSUM = .10000E+16
OPT: INP # 3, CONTR# 2, QSUM = .15200E+03, Y = 19.00 ==> 2

Finally, the following printout summarizes the results of the optimization of the entire sequence of three processes for every grade of the raw material.

OPTIMAL PROCESS OPERATION
RAW MATERIAL GRADE:               1       2       3
PROCESS # 1 CONTROL OPTION:       3       2       2
            OUTPUT =          18.00   13.00   19.00
PROCESS # 2 CONTROL OPTION:       3       1       3
            OUTPUT =          21.00   24.00   35.00
PROCESS # 3 CONTROL OPTION:       3       3       2
            OUTPUT =          18.00   19.00   19.00
===================================================
TOTAL COST:                   207.00  192.00  152.00

4.5.1 Exercise 4.3

Problem 1

Apply dynamic programming to optimize the following sequence of manufacturing processes.

figure e

The characteristics of each process are given below:

 

| x | y1(x,u), u = 1 | u = 2 | u = 3 | q1(x,u), u = 1 | u = 2 | u = 3 | y2(x,u), u = 1 | u = 2 | u = 3 | q2(x,u), u = 1 | u = 2 | u = 3 |
|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 10 < x ≤ 40 | 25 | 45 | 55 | 25 | 28 | 25 | 65 | 44 | 74 | 13 | 21 | 33 |
| 40 < x ≤ 70 | 37 | 48 | 63 | 27 | 33 | 27 | 66 | 50 | 81 | 15 | 22 | 37 |
| 70 < x ≤ 100 | 45 | 58 | 79 | 22 | 24 | 25 | 78 | 62 | 96 | 18 | 28 | 40 |

 

| x | y3(x,u), u = 1 | u = 2 | u = 3 | q3(x,u), u = 1 | u = 2 | u = 3 | y4(x,u), u = 1 | u = 2 | u = 3 | q4(x,u), u = 1 | u = 2 | u = 3 |
|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 10 < x ≤ 40 | 13 | 45 | 92 | 16 | 18 | 9 | 56 | 85 | 97 | 2 | 4 | 3 |
| 40 < x ≤ 70 | 48 | 18 | 68 | 13 | 17 | 8 | 42 | 61 | 81 | 3 | 6 | 4 |
| 70 < x ≤ 100 | 81 | 66 | 21 | 10 | 14 | 6 | 21 | 39 | 70 | 4 | 5 | 3 |

It is known that x1 = 37 (units) and the end product must be such that 70 ≤ y4 ≤ 85. Obtain the optimal choice of control options for each process that would minimize the sum of “local” criteria, Q = q1 + q2 + q3 + q4, and define the corresponding values of the characteristics of the intermediate products.

Problem 2

Use dynamic programming to solve the optimal routing problem based on the graph below featuring the available transitions and the associated costs.

figure f