
In this chapter we describe the basics of the Better Optimization of Nonlinear Uncertain Systems (BONUS) algorithm. For better readability, we present the generalized stochastic optimization framework (Fig. 1.4 from Chap. 1) for the stochastic nonlinear programming (SNLP) problem below. This chapter is derived from the work by [43].

General techniques for these types of optimization problems determine a statistical representation of the objective, such as the maximum expected value or the minimum variance. Once embedded in an optimization framework, an iterative loop structure emerges in which decision variables are determined, a sample set based on these decision variables is generated, the model is evaluated for each of these sample points, and the probabilistic objective function value and constraints are evaluated, as shown in the inner loop of Fig. 4.1. When one considers that nonlinear optimization techniques rely on objective function and constraint evaluations at each iteration, along with derivative estimation through perturbation analysis, the sheer number of model evaluations rises significantly, rendering this approach ineffective for even moderately complex models. Figure 4.2 shows the general idea behind the BONUS algorithm, which follows the gray arrows. In the stochastic optimization iterations (Fig. 4.1), decision variable values can vary between upper and lower bounds, and in the sampling loop various probability distributions are assigned to the uncertain variables. In the BONUS approach, initial uniform distributions (between the upper and lower bounds) are assumed for the decision variables. These uniform distributions, together with the specified probability distributions of the uncertain variables, form the base distributions for the analysis. BONUS samples the solution space of the objective function at the beginning of the analysis using the base distributions. As the decision variables change, the underlying distributions for the objective function and constraints change, and the algorithm estimates the objective function and constraint values from the ratios of the probabilities of the current and base distributions (a reweighting scheme), which are approximated using kernel density estimation (KDE) techniques. Thus, BONUS avoids sample model runs in subsequent iterations.

Fig. 4.1 Pictorial representation of the stochastic programming framework

4.1 Reweighting Schemes

The goal of the reweighting scheme (shown by the gray arrows in Fig. 4.2) is to determine changes in the output distributions as the input distributions change. Hesterberg (1995) presents various reweighting techniques for estimating the expected value of an output distribution (the cumulative distribution function (CDF) \(F[J(u)]\) in Fig. 4.2) without evaluating the model for the input distribution (the probability density function (PDF) \(f(u)\) in Fig. 4.2). The ratio of the probability density functions f is used as a weight, which is given as:

$$\omega_i=\frac{f(u_i)}{\hat{f}(u^\star_i)},$$
(4.1)

where \(\hat{f}(u^\star_i)\) is determined for the base sample set, for which the model response is known, and the probability density \(f(u_i)\) is calculated for the sample for which the response has to be estimated. Remember that these two sample sets are not necessarily related. One approach to estimating a statistical property P(u) of the model output is to take the product of the weights and the same property obtained from the base distribution (Eq. 4.2).

$$P(u) =\sum_{i}{\omega_i \cdot P(u^\star_i)}$$
(4.2)

For instance, to estimate the mean μ of a model response, Z(u), the weight would be multiplied by the individual model responses for the base set:

$$\mu[Z(u)] =\sum_{i=1}^{N_{samp}}{\omega_i \cdot Z(u^\star_i)},$$
(4.3)

where \(N_{samp}\) is the sample size.

Fig. 4.2 Density estimation approach to optimization under uncertainty

This approach has limitations, as the weights may not sum to 1. This problem is reduced by using normalized weights, as shown in Eq. 4.4. This normalized reweighting (the ratio estimate for the weighted average) has another advantage: it provides acceptable performance over a wider range of perturbations, especially for large Monte Carlo simulation (MCS) samples [19]. In BONUS, instead of a large MCS sample, a more efficient sampling technique (presented in Chap. 2) is used, which provides the same accuracy as MCS with an order of magnitude fewer samples.

$$P(u)=\sum_{j=1}^{N_{samp}}{\frac{\frac{f(u_j)}{\hat{f}(u^\star_j)}} {\sum_{i=1}^{N_{samp}}\frac{f(u_i)}{\hat{f}(u^\star_i)}} \cdot P(u^\star_j)}$$
(4.4)

As seen in Eq. 4.4, the mean of the function can be estimated from the ratio of the two input distributions \(f(u)\) and \(\hat{f}(u^\star)\). This requires determining the probability distributions from a given sample set of uncertain variables. Here, the KDE techniques discussed in Chap. 3 are used.
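
To make the reweighting idea concrete, the following minimal Python sketch (not from [43]; the toy model, distributions, and sample size are assumptions) estimates the mean of a model output under a new input distribution using only outputs computed once for the base sample, following Eqs. 4.1 and 4.4. Analytic densities stand in here for the kernel density estimates introduced next.

```python
import numpy as np

def reweighted_mean(z_base, f_new, f_base):
    """Normalized reweighting estimate of E[Z] under a new input density (Eq. 4.4)."""
    w = f_new / f_base            # raw weights, Eq. 4.1
    w = w / w.sum()               # normalization
    return np.sum(w * z_base)

# Toy check: base sample from U(0, 1), new distribution N(0.5, 0.05).
rng = np.random.default_rng(0)
u_base = rng.uniform(0.0, 1.0, 5000)
z_base = u_base**2                              # model Z(u) = u^2, run once on the base sample

f_base = np.ones_like(u_base)                   # U(0, 1) density at the base points
f_new = np.exp(-0.5 * ((u_base - 0.5) / 0.05)**2) / (0.05 * np.sqrt(2.0 * np.pi))

# Exact value for comparison: E[u^2] = 0.5^2 + 0.05^2 = 0.2525
print(reweighted_mean(z_base, f_new, f_base))
```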

In order to use the kernel density approach for estimating function values (objective function and constraints), the base sample set \(u^\star\) has to be generated for the model calculations. As stated earlier, we select uniform distributions for the decision variables and the specified distributions for the uncertain variables to create the base sample. Once the base sample is obtained, its density can be calculated at each point as:

$$\hat{f}(u^\star_i)=\frac{1}{N_{samp} \cdot h}\sum_{j=1}^{N_{samp}} \frac{1}{\sqrt{2\pi}}\cdot e^{-\frac{1}{2}\left(\frac{u^\star_i-u^\star_j}{h}\right)^2}$$
(4.5)

We now want to find the distribution \(f(u)\) for the decision variables selected at each optimization iteration. For this purpose, a narrow normal distribution centered at the current decision point is assumed for the decision variables, and a new sample set u is generated from these normal distributions. After determining the model output \(Z(u^\star_i)\) for each \(u^\star_i\), the value of the output distribution for the decision variables, Z(u), is obtained by the reweighting scheme described above, using the probability of each new data point \(u_i\) as determined through the kernel density approximation (Eq. 4.6).

$$f(u_i)=\frac{1}{N_{samp} \cdot h}\sum_{j=1}^{N_{samp}} \frac{1}{\sqrt{2\pi}}\cdot e^{-\frac{1}{2}\left(\frac{u_i-u^\star_j}{h}\right)^2}$$
(4.6)
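
The sketch below (an illustration, not the authors' code) evaluates both densities with a one-dimensional Gaussian KDE; the width h follows the rule given later in Eq. 4.14, and the base sample, decision point, and spread of the narrow normal are assumptions. The density of the base points under the narrow-normal sample is computed as in Step 2b of Example 4.1.

```python
import numpy as np

def gaussian_kde_at(points, sample, h):
    """Gaussian kernel density of `sample` evaluated at `points` (cf. Eqs. 4.5 and 4.6)."""
    points = np.atleast_1d(points)
    z = (points[:, None] - np.asarray(sample)[None, :]) / h
    return np.exp(-0.5 * z**2).sum(axis=1) / (len(sample) * h * np.sqrt(2.0 * np.pi))

rng = np.random.default_rng(1)
n = 200

# f^(u*_i): density of each base point with respect to the base sample itself (Eq. 4.5)
u_star = rng.uniform(4.0, 10.0, n)                      # hypothetical base sample
h_star = 1.06 * np.std(u_star, ddof=1) * n**(-0.2)      # kernel width (cf. Eq. 4.14)
f_hat = gaussian_kde_at(u_star, u_star, h_star)

# f(u*_i): density of the base points under the narrow normal sample placed at a decision point
u_new = rng.normal(7.0, 0.1, n)
h_new = 1.06 * np.std(u_new, ddof=1) * n**(-0.2)
f_new = gaussian_kde_at(u_star, u_new, h_new)

weights = f_new / f_hat                                 # per-dimension weight factors
```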

4.2 Effect of Sampling on Reweighting

The proposed reweighting scheme using KDE has been tested in case studies of up to \(d=10\) dimensions for the following five types of functions [43]. The application of alternative and more efficient sampling techniques, such as Latin hypercube sampling (LHS), median Latin hypercube sampling (MLHS), and Hammersley sequence sampling (HSS), results in significant reductions in computational requirements compared to MCS, as shown in this section.

  • Function 1: Linear additive: \(y=\sum_{m=1}^{s}u_{m}\;\;\;\;\;s=2...10\)

  • Function 2: Multiplicative: \(y = \Pi_{m=1}^{s} u_{m}\;\;\;\;\;s=2...10\)

  • Function 3: Quadratic: \(y = \sum_{m=1}^{s} u_{m}^{2}\;\;\;\;\;s=2\dots10\)

  • Function 4: Exponential: \(y = \sum_{m=1}^{s} u_{m}\cdot exp(u_{m})\;\;\;\;\;s=2\dots10\)

  • Function 5: Logarithmic: \(y = \sum_{m=1}^{s} log(u_{m})\;\;\;\;\;s=2\dots10\)

The total analysis includes five functions, with four sampling techniques compared for each function. The effect of the number of sample points is also analyzed by selecting sample sizes of \(N_{samp}\) = [50, 100, 250, 500, 750, 1000, 2500, 5000, 7500, 10,000]. This results in a total of 200 runs for which the proposed reweighting approach has been tested. For each run, the means and variances are both calculated and estimated, as are their derivatives with respect to each u. Further, the percentage error between the actual and estimated values is determined, as shown in Table 4.1.
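
The run grid can be written down compactly; the sketch below simply enumerates the five functions, the four samplers, and the ten sample sizes listed above (the dictionary and list names are ours).

```python
import numpy as np

# The five test functions, each applied to an (N, s) sample array with s = 2..10
test_functions = {
    "linear":         lambda u: u.sum(axis=1),
    "multiplicative": lambda u: u.prod(axis=1),
    "quadratic":      lambda u: (u**2).sum(axis=1),
    "exponential":    lambda u: (u * np.exp(u)).sum(axis=1),
    "logarithmic":    lambda u: np.log(u).sum(axis=1),
}

samplers = ["MCS", "LHS", "MLHS", "HSS"]
sample_sizes = [50, 100, 250, 500, 750, 1000, 2500, 5000, 7500, 10000]

# 5 functions x 4 sampling techniques x 10 sample sizes = 200 runs
print(len(test_functions) * len(samplers) * len(sample_sizes))
```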

Table 4.1 Calculations for KDE efficiency analysis

As required, the base distributions are uniform distributions of the decision variables with the bounds given in the first three columns of Table 4.2, and the estimated distributions are narrow normal distributions, with the upper and lower bounds in the last three columns of the table indicating the region enclosing the 99.999th percentile.

Table 4.2 Bounds for base (uniform ) and estimated (normal ) distributions

For the generation of the shifted sample set \(u^\Delta\) and for derivative calculations, the step size \(\Delta u_j\) was selected as:

$$\Delta u_j = 0.05 \cdot \mu\{u_j\}$$
(4.7)

As the model functions are relatively simple, the actual (analytical) values of the mean and variance for both sample sets u and \(u^\Delta\) are calculated and compared to the estimates. The same analysis is conducted for the derivative estimates, allowing comparison of the errors in the estimates depending on the sampling technique used to generate the sample sets \(u^\star\) and u. The next section provides the results of this preliminary study.
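
Putting the pieces together, the following self-contained sketch reproduces the flavor of one such run for Function 3 in three dimensions: the model is evaluated once on a uniform base sample, and both the variance at a decision point and its derivative (using the step size of Eq. 4.7) are then obtained purely by reweighting. The base bounds, sample size, and spread of the narrow normal are assumptions for illustration, not the settings of Tables 4.1 and 4.2, and the accuracy of the estimates depends strongly on these choices.

```python
import numpy as np

def kde(points, sample):
    """Gaussian KDE of `sample` at `points`, width from Eq. 4.14."""
    h = 1.06 * np.std(sample, ddof=1) * len(sample)**(-0.2)
    z = (points[:, None] - sample[None, :]) / h
    return np.exp(-0.5 * z**2).sum(axis=1) / (len(sample) * h * np.sqrt(2.0 * np.pi))

def estimate_mean_var(z_base, u_base, u_new):
    """Reweighted mean and variance of the model output under the new input sample."""
    w = np.ones(len(z_base))
    for d in range(u_base.shape[1]):          # product of per-dimension density ratios
        w *= kde(u_base[:, d], u_new[:, d]) / kde(u_base[:, d], u_base[:, d])
    w /= w.sum()
    mean = np.sum(w * z_base)
    var = np.sum(w * (z_base - mean)**2)
    return mean, var

rng = np.random.default_rng(2)
n, dim = 1000, 3
model = lambda u: (u**2).sum(axis=1)          # Function 3, s = 3

u_base = rng.uniform(1.0, 5.0, size=(n, dim)) # hypothetical uniform base bounds
z_base = model(u_base)                        # the only model runs needed

def narrow_normal(mu, rel_sigma=0.1):
    """Narrow normal sample at a decision point; the spread is an assumption."""
    return rng.normal(mu, rel_sigma * np.abs(mu), size=(n, dim))

theta = np.array([3.0, 3.0, 3.0])
_, var0 = estimate_mean_var(z_base, u_base, narrow_normal(theta))

# Derivative of the variance w.r.t. u_1 by a forward difference, step from Eq. 4.7
delta = 0.05 * theta[0]
theta_shift = theta.copy()
theta_shift[0] += delta
_, var1 = estimate_mean_var(z_base, u_base, narrow_normal(theta_shift))
print(var0, (var1 - var0) / delta)
```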

As indicated above, 200 different runs have been used to verify the applicability of the technique. For each run, means, variances, and derivatives have been calculated and estimated using the reweighting scheme, and the percentage errors between them have been determined. Due to the extensive nature of this analysis, only one example is provided here that is both relevant to this analysis and representative of the overall behavior of the technique.

The results obtained for the nonlinear function \(y =\sum_{m=1}^{3} u_{m}^{2}\) are presented here. Variance estimation is more prone to error than mean estimation when the sample size is small, and the case study in the next section aims at calculating the variance of the system at hand; therefore, the efficiency of the reweighting technique in estimating the variance of this function is presented here.

Simultaneously plotting the actual and estimated values allows one to identify how accurate each technique is. Note that the x-axis is in log scale to capture the change in sample size over \(N_{samp}\) = [50, 100, 250, 500, 750, 1000, 2500, 5000, 7500, 10,000]. The lines represent the actual values, while the stand-alone points represent the variances estimated using the four different sampling techniques.

Fig. 4.3 Variance calculation for different sampling techniques

Table 4.3 Percentage error in variance estimation for 3-dimensional analysis using 250 samples

In Fig. 4.3, the variance of Function 3 is plotted with respect to the number of samples. As seen, all four sampling techniques converge to the same value as \(N_{samp}\) approaches 10,000, with the MCS technique showing the largest variations. While most approaches over- or underestimate the variance at low sample sizes, HSS provides a rather accurate estimate in this region. Table 4.3 provides the percentage error between the estimated and actual values of the variance of f(u) for all four sampling techniques with a sample size of 250. As seen, HSS yields comparably small percentage errors for all functions.

Fig. 4.4 Optimization under uncertainty: the BONUS algorithm

4.3 BONUS: The Novel SNLP Algorithm

The BONUS algorithm, given in Fig. 4.4, can be divided into two sections. The first section, Initialization, starts with generating the base distribution that will be used as the source for all estimations throughout the optimization. After the base distribution is generated, the second section starts, which includes the estimation technique that gives BONUS its computational time advantage. In this algorithm overview, we denote the D-dimensional vector of deterministic decision variables as \(\theta=[\theta_1, \theta_2,..., \theta_d, \theta_{d+1},..., \theta_D]\), the S-dimensional vector of uncertain variables as \(v=[v_1, v_2,..., v_s, v_{s+1},..., v_S]\), and the combined (S+D)-dimensional variable vector as \(u=[u_1, u_2,..., u_{s+d}, u_{s+d+1},..., u_{S+D}]\).

I - Initialization

  1. Generate (\(i = 1\) to \(N_{samp}\)) samples \(u^{\star}_i\) for all decision variables and the specified distributions for the uncertain variables as the base distribution.

  2. Run KDE for identifying the probabilities \(\hat{f}_s(u^{\star}_i)\).

     a) Set \(s = 1\).

        i. Set \(i = 1\).

        ii. While \(i < N_{samp}\), calculate \(\hat{f}_s(u^{\star}_i)\) using Eq. 4.5.

        iii. \(i = i+1\). Go to step ii.

     b) \(s = s + 1\). If \(s < S+D+1\), return to step I.2.a.i.

  3. Run the model for each sample point to find the corresponding model output; store the value \(Z_i\).

II - SNLP Optimization

  1. Set \(k=1\). Determine the objective function value for the starting point, \(J=P(\theta^k,v^k)\). Set the deterministic decision variable counter \(d =1\).

     a) Generate (\(i = 1\) to \(N_{samp}\)) samples (\(u^k_i\)) with the appropriate narrow normal distributions at \(\theta^{k}_d\) for all decision variables and the specified distributions for the uncertain variables \(v^k_i\).

     b) Run KDE for identifying the probabilities \(f_s(u^k_i)\) at \(\theta^{k}_d\), similar to step I.2, using Eq. 4.6 in step ii instead.

     c) Determine the weights \(\omega_i\) from the product of ratios, \(\Pi_{S}{f_s(u^k_i)/\hat{f}_s(u^{\star}_i)}\).

     d) Calculate \(\sum_i{\omega_i}\).

     e) Estimate the probabilistic objective function and constraint values:

        i. Set \(i = 1\), \(J^{k}=0\).

        ii. While \(i < N_{samp}\), calculate \(J^{k}=J^{k}+Z_i\cdot\omega_i/\sum_i{\omega_i}\).

        iii. \(i = i+1\). Go to step ii.

     f) Set \(d=d + 1\), return to step II.2.

  2. While \(d \leq D\), perturb one decision variable \(\theta^k_d\) to find \(\theta^{k,\Delta}_d\). Reset the deterministic decision variable counter \(d =1\).

     a) Generate (\(i = 1\) to \(N_{samp}\)) samples with the appropriate distributions at \(\theta^{k,\Delta}_d\) for all variables \(u^k_i\).

     b) Run KDE for identifying the probabilities \(f_s(u^k_i)\) at \(\theta^{k,\Delta}_d\), similar to step I.2, using Eq. 4.6 in step ii instead.

     c) Determine the weights \(\omega_i\) from the product of ratios, \(\Pi_{S}{f_s(u^k_i)/\hat{f}_s(u^{\star}_i)}\).

     d) Calculate \(\sum_i{\omega_i}\).

     e) Estimate the probabilistic objective function and constraint values:

        i. Set \(i = 1\), \(J^{k,\Delta}=0\).

        ii. While \(i < N_{samp}\), calculate \(J^{k,\Delta}=J^{k,\Delta}+Z_i\cdot\omega_i/\sum_i{\omega_i}\).

        iii. \(i = i+1\). Go to step ii.

     f) Set \(d=d + 1\), return to step II.2.

  3. Calculate the gradient information from the estimates obtained in steps II-1 and II-2.

  4. Check the convergence criteria of the nonlinear solver (KKT conditions); if satisfied, STOP: optimum found. Otherwise, identify a new vector of decision variables using the gradients obtained from the objective function value estimates via reweighting. Set \(k = k + 1\). Return to step II-2.

Note that traditional techniques rely on repeated model runs for the estimation steps (II-1 and II-2) at every iteration. For computationally complex nonlinear models, this task can become the critical bottleneck for solving the SNLP. BONUS, on the other hand, bypasses these runs by estimating the objective function values via reweighting. The BONUS algorithm is implemented using a nonlinear solver based on the sequential quadratic programming (SQP) method. The following examples illustrate the steps involved in BONUS and the efficiency of BONUS for solving SNLP problems.
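
To connect the algorithm listing to code, the sketch below assembles the pieces for the problem of Example 4.1 (introduced next): KDE of a base sample, product-of-ratio weights, and a reweighted objective handed to SciPy's SLSQP routine, which plays the role of the SQP solver. It is a minimal illustration under stated assumptions, not the authors' implementation; the sample size, random seed, and use of fixed standard draws are our choices.

```python
import numpy as np
from scipy.optimize import minimize

def kde(points, sample):
    """Gaussian KDE of `sample` at `points` (Eqs. 4.5/4.6, width from Eq. 4.14)."""
    h = max(1.06 * np.std(sample, ddof=1) * len(sample)**(-0.2), 1e-12)  # guard degenerate samples
    z = (points[:, None] - sample[None, :]) / h
    return np.exp(-0.5 * z**2).sum(axis=1) / (len(sample) * h * np.sqrt(2.0 * np.pi))

# --- I. Initialization: base sample over the decision bounds, model run once ---
rng = np.random.default_rng(0)
n = 400                                                    # assumed sample size
bounds = [(4.0, 10.0), (0.0, 5.0)]                         # bounds of Example 4.1
u_star = np.column_stack([rng.uniform(lo, hi, n) for lo, hi in bounds])
model = lambda u: (u[:, 0] - 7.0)**2 + (u[:, 1] - 4.0)**2  # objective of Eq. 4.8
z_star = model(u_star)                                     # stored outputs Z_i (step I.3)
f_hat = [kde(u_star[:, d], u_star[:, d]) for d in range(2)]

# Fixed standard draws so the estimator is a smooth, deterministic function of x
eps_n = rng.standard_normal(n)
eps_u = rng.uniform(size=n)

# --- II. Reweighted estimate of E[Z] at a decision point x, no new model runs ---
def expected_z(x):
    x1 = x[0] * (1.0 + 0.033 * eps_n)                      # Eq. 4.9
    x2 = x[1] * (0.9 + 0.3 * eps_u)                        # Eq. 4.10
    u_new = np.column_stack([x1, x2])
    w = np.ones(n)
    for d in range(2):                                     # product of density ratios (step II.1c)
        w *= kde(u_star[:, d], u_new[:, d]) / f_hat[d]
    w /= w.sum()
    return np.sum(w * z_star)

# SLSQP supplies the SQP iterations; its finite differences perturb only the estimator.
res = minimize(expected_z, x0=[5.0, 5.0], method="SLSQP", bounds=bounds)
print(res.x, res.fun)
```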

Example 4.1

Consider the optimization problem presented in Example 3.1 again. Illustrate the reweighting scheme and solve the problem using BONUS .

$$\begin{aligned} \textrm{min} &\; E[Z]=E[(\tilde{x}_1-7)^2 + (\tilde{x}_2-4)^2]\end{aligned}$$
(4.8)
$$\begin{aligned} \textrm{s.t.}& \tilde{x}_1 \in N[\mu=x_1^\star, \sigma=0.033 \cdot x_1^\star] \end{aligned}$$
(4.9)
$$\begin{aligned} & \tilde{x}_2 \in U[0.9 \cdot x_2^\star, 1.2 \cdot x_2^\star]\end{aligned}$$
(4.10)
$$\begin{aligned} &4 \leq x_1 \leq 10 \end{aligned}$$
(4.11)
$$\begin{aligned} & 0 \leq x_2 \leq 5\end{aligned}$$
(4.12)

Here, E represents the expected value, and the goal is to minimize the mean of the objective function calculated for the two uncertain decision variables, \(x_1\) and \(x_2\). The optimizer determines the value \(x_1^\star\); the corresponding \(\tilde{x}_1\) has an underlying normal distribution with ±10 % of the nominal value \(x^\star_1\) as the upper and lower 0.1 % quantiles. Similarly, \(\tilde{x}_2\) is uniformly distributed around \(x_2^\star\), with cutoffs at \([{-}10\,\%, +20\,\%]\).
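
For reference, this is what a single brute-force Monte Carlo evaluation of the objective at one candidate design looks like; the traditional approach repeats such an evaluation (plus perturbed ones) at every iteration, which is exactly the cost BONUS avoids. The candidate point matches the starting point used in the solution below; the sample size is an assumption.

```python
import numpy as np

# Brute-force Monte Carlo estimate of E[Z] at the candidate design (x1*, x2*) = (5, 5)
rng = np.random.default_rng(0)
n = 100_000
x1_star, x2_star = 5.0, 5.0
x1 = rng.normal(x1_star, 0.033 * x1_star, n)          # Eq. 4.9
x2 = rng.uniform(0.9 * x2_star, 1.2 * x2_star, n)     # Eq. 4.10
Z = (x1 - 7.0)**2 + (x2 - 4.0)**2                     # Eq. 4.8
print(Z.mean())                                       # approx. 5.8 for this design
```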

Solution

The following steps illustrate the application of the BONUS algorithm to this problem.

Step 1

The first step in BONUS is determining the base distributions for the decision variables and the uncertain variables, followed by generating the model output values. Since in this case the decision variables and uncertain variables are merged, the base distributions have to cover the entire possible range of the two variables, including the uncertain variations. For instance, for \(x_2\), the range extends to \((0 \times 0.9) \leq x_2 \leq (5 \times 1.2)\) to account for the uniformly distributed uncertainty. Due to space limitations, the illustrative presentation of the kernel density and reweighting approach is performed for a sample size of 10, while the remainder of the example uses \(N=100\) samples. A sample realization using MCS is given in Table 4.4.

Table 4.4 Base sample

After this sample is generated, KDE for the base sample is applied to determine the probability of each sample point with respect to the sample set. This is performed for each decision variable separately by approximating each point through a Gaussian kernel, and adding these kernels to generate the probability distribution for each point, as given in Eq. 4.13 [52] .

$$\hat{f}(x_i(k))=\frac{1}{N \cdot h}\sum_{j=1}^{N} \frac{1}{\sqrt{2\pi}}\cdot e^{-\frac{1}{2}\left(\frac{x_i(k)-x_i(j)}{h}\right)^2}.$$
(4.13)

Here, h is the width of the Gaussian kernel; it depends on the standard deviation σ and the sample size N of the data set and is given as follows:

$$h = 1.06 \times \sigma \times N^{-\frac{1}{5}}.$$
(4.14)

For our example, \(h(x_1) = 1.06 \times 1.5984 \times 10^{-0.2} = 1.0690\) and \(h(x_2) = 1.06 \times 1.3271 \times 10^{-0.2} = 0.8876\). Using the first value, one can calculate \(\hat{f}(x_1(1))=\frac{1}{10 \times 1.0690}\sum_{j=1}^{10} \frac{1}{\sqrt{2\pi}}\cdot e^{-\frac{1}{2}\left(\frac{5.6091-x_1(j)}{1.0690}\right)^2}= 0.1769\). This step is repeated for every point, resulting in the KDE provided in Table 4.5.
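
The kernel-width arithmetic can be checked with a few lines of Python; since Table 4.4 is not reproduced here, the density evaluation of Eq. 4.13 is left as a function rather than executed.

```python
import numpy as np

N = 10
sigma_x1, sigma_x2 = 1.5984, 1.3271            # sample standard deviations quoted above
h_x1 = 1.06 * sigma_x1 * N**(-0.2)             # Eq. 4.14
h_x2 = 1.06 * sigma_x2 * N**(-0.2)
print(round(h_x1, 4), round(h_x2, 4))          # 1.069 0.8876

def kde_at(point, sample, h):
    """Eq. 4.13 for one evaluation point; `sample` would hold the x1 or x2
    column of Table 4.4 (not reproduced here)."""
    z = (point - np.asarray(sample)) / h
    return np.exp(-0.5 * z**2).sum() / (len(sample) * h * np.sqrt(2.0 * np.pi))
```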

Table 4.5 Base sample kernel density estimates

Step 2

All these steps were preparations for the optimization algorithm, where repeated calculations of the objective function will be bypassed through the reweighting scheme.

Step 2a

For the first iteration, assume that the initial value for the decision variables is \(x_1 =5\) and \(x_2 = 5\). For these values, another sample set is generated, as shown in Table 4.6, accounting for the uncertainties described in Eqs. 4.9 and 4.10.

Table 4.6 Sample-optimization iteration 1

The expected value of Z is estimated using the reweighting approach, given in Steps 2b and 2c.

Step 2b

Now, the KDE for the sample \((f(x_i))\) generated around the decision variables has to be calculated. The Gaussian kernel width is \(h(\tilde{x}_1) =1.06 \times 0.0837 \times 10^{-0.2} = 5.598\times 10^{-2}\). Using this value, one can calculate \(f(x_1(1))=\frac{1}{10 \times 5.598\times 10^{-2}}\sum_{j=1}^{10} \frac{1}{\sqrt{2\pi}}\cdot e^{-\frac{1}{2}\left(\frac{5.609-\tilde{x}_1(j)}{5.598\times 10^{-2}}\right)^2}= 5.125\times 10^{-23}\). Again, this step is repeated for every point of the sample with respect to the base distribution data, resulting in the KDE provided in Table 4.7.

Step 2c

Using these and the base KDE values, weights are calculated for each sample point j as

$$\omega_j = \frac{f(x_1(j))}{\hat{f}(x_1(j))} \times \frac{f(x_2(j))}{\hat{f}(x_2(j))}, j=1,...,N$$
(4.15)

In our illustrative example, the only two nonzero weights are \(\omega_5 = 1.699\times10^{-68}\) and \(\omega_8 = 4.152\times10^{-15}\). These weights are normalized and multiplied with the output of the base distribution to estimate the objective function value:

$$E^{est}[Z]=\sum_{j=1}^N \overline{\omega_j} \cdot Z(j).$$
(4.16)

For our illustrative example, this reduces to

$$E^{est}[Z]= \overline{\omega_8} \cdot Z(8) = 1.0000 \times 2.5119 = 2.5119,$$
(4.17)

as the normalization effectively eliminates all but one weight. Note that this illustrative example was developed with an unrealistically small sample size; hence, the accuracy of the estimation technique cannot be judged from it. Because of this inaccuracy, we do not present results for Steps 2d and 2e for just 10 samples but use 100 samples instead (note that the estimated expected value of Z in Table 4.8 therefore differs from the 10-sample value). Also note that Steps 2d and 2e essentially repeat the procedures in Steps 2a through 2c for a new sample set around a perturbed point, for instance \(x_1+\Delta x_1 = 5 + 0.001 \times 5 = 5.005\).

Table 4.7 Optimization iteration 1-KDE
Table 4.8 Optimization progress at \(N=100\)

The results obtained using the BONUS algorithm converge to the same optimal solution as a brute-force analysis, normally used for stochastic NLPs, in which the objective function value is calculated from the model output at every generated sample point in each iteration. In this example, BONUS used only 100 model runs, while the brute-force optimization evaluated the model 600 times over the two iterations.

The following example is based on Taguchi's approach to off-line quality control [55], applied to the output of a chemical reactor system.

Example 4.2, Taguchi’s Quality Control Problem

Consider the following problem of off-line quality control of a continuous stirred tank reactor (CSTR ) derived from [23] .

The system to be investigated consists of a first-order sequential reaction, \(A~\rightarrow~B~\rightarrow~C\), taking place in a nonisothermal CSTR. The process and the associated variables are illustrated in Fig. 4.5. We are interested in designing and operating this process such that the rate of production of species B (\(R_B\)) is 60 mol/min. However, as is apparent from the reaction pathway, species B degrades to species C if the conditions in the CSTR, such as the temperature (T) and heat removal (Q), are conducive. The objective of parameter design is to produce species B at the target level with minimal fluctuations around the target in spite of continuous variation in the inputs. The inlet concentration of A (\(C_{A_f}\)), the inlet temperature (\(T_f\)), the volumetric flow rate (F), and the reactor temperature (T) are considered prone to continuous variations. The objective of off-line parameter design is to choose settings for the design variables such that the variation in the production rate \(R_B\) around the set point is kept at a minimum.

Fig. 4.5 Nonisothermal CSTR

The five design equations that govern the production of species B (and the steady state values of other variables) in the CSTR are given below. The average residence time (τ) of each species in the reactor is given as \(\tau\,=\) V/F, where V is the reactor volume and F is the feed flow rate.

Table 4.9 Parameters and their values in CSTR study
$$\begin{aligned} Q & = F\rho Cp(T-T_f) + V(r_AH_{RA}+r_BH_{RB})\end{aligned}$$
(4.18)
$$\begin{aligned} C_A & = \frac{C_{A_f}}{1+k_A^0e^{\frac{-E_A}{RT}}\tau}\end{aligned}$$
(4.19)
$$\begin{aligned} C_B & = \frac{C_{B_f}+k_A^0e^{\frac{-E_A}{RT}}\tau C_A}{1+k_B^0e^{\frac{-E_B}{RT}}\tau}\end{aligned}$$
(4.20)
$$\begin{aligned} -r_A & = k_A^0e^{\frac{-E_A}{RT}}C_A\end{aligned}$$
(4.21)
$$\begin{aligned} -r_B & = k_B^0e^{\frac{-E_B}{RT}}C_B~-~k_A^0e^{\frac{-E_A}{RT}}C_A\end{aligned}$$
(4.22)

where \(C_A\) and \(C_B\) are the bulk concentrations of A and B, T is the bulk temperature of the material in the CSTR, the subscript f denotes the feed, and the rates of consumption of A and B are given by \(-r_A\) and \(-r_B\). These five variables are the state variables of the CSTR and can be estimated for a given set of values of the input variables (\(C_{A_f}\), \(C_{B_f}\), \(T_f\), T, F, and V) and the following physical constants: \(k_A^0\), \(k_B^0\), \(E_A\), and \(E_B\), the preexponential Arrhenius constants and activation energies, respectively; \(H_{RA}\) and \(H_{RB}\), the molar heats of reaction, which are assumed to be independent of temperature; and ρ and Cp, the density and specific heat of the system, which are assumed to be the same for all processing streams. Once the input variables T and \(T_f\) are specified, Eq. 4.18 can be numerically solved to estimate Q, the heat added to or removed from the CSTR. The average residence time can be calculated from the input variables F and V. Subsequently, for given feed concentrations \(C_{A_f}\) and \(C_{B_f}\), the bulk CSTR concentrations \(C_A\) and \(C_B\) can be estimated using Eqs. 4.19 and 4.20. The production rates \(r_A\) and \(r_B\) can then be calculated from Eqs. 4.21 and 4.22. The system parameters are summarized in Table 4.9. Note that this analysis fixes the set points for both the feed concentration of B, \(C_{B_f}\), and the CSTR temperature T; both values are also given in Table 4.9.
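
A direct transcription of these design equations, sketched in Python below, can serve as the "model" that BONUS calls once per base sample point. The function and argument names are ours; the constant values come from Table 4.9 (together with the gas constant R) and are therefore not hard-coded here.

```python
import numpy as np

def cstr_state(C_Af, C_Bf, T_f, T, F, V, p):
    """Steady-state CSTR balances, Eqs. 4.18-4.22 with Eqs. 4.30-4.31.

    `p` holds the physical constants of Table 4.9 (not reproduced here):
    kA0, kB0, EA, EB, HRA, HRB, rho, Cp, and the gas constant R.
    """
    tau = V / F                                              # Eq. 4.30
    kA = p["kA0"] * np.exp(-p["EA"] / (p["R"] * T))          # Arrhenius rate constants
    kB = p["kB0"] * np.exp(-p["EB"] / (p["R"] * T))
    C_A = C_Af / (1.0 + kA * tau)                            # Eq. 4.19
    C_B = (C_Bf + kA * tau * C_A) / (1.0 + kB * tau)         # Eq. 4.20
    r_A = -kA * C_A                                          # Eq. 4.21
    r_B = kA * C_A - kB * C_B                                # Eq. 4.22
    Q = (F * p["rho"] * p["Cp"] * (T - T_f)
         + V * (r_A * p["HRA"] + r_B * p["HRB"]))            # Eq. 4.18
    R_B = r_B * V                                            # Eq. 4.31, production rate of B
    return {"C_A": C_A, "C_B": C_B, "r_A": r_A, "r_B": r_B, "Q": Q, "R_B": R_B}
```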

The design objective is to produce 60 mol/min of component B, i.e., \(R_B=60\). The initial nominal set points for the decision variables are provided in Table 4.10. However, the continuous variations in the inputs (\(C_{A_f}\), \(T_f\), F, and T) result in continuous variations of the production rate \(R_B\), which need to be minimized. Solve this problem using the traditional SNLP approach and BONUS, and compare the results.

Table 4.10 Decision variables for optimization

Solution

The goal is to determine process parameters for a nonisothermal CSTR (Fig. 4.5) that result in minimum variance in product properties when fluctuations are encountered [23] . The mathematical representation for the problem is given as:

$$\begin{aligned} \textrm{min}\;\sigma^2_{R_B} &= \int^{1}_{0}(R_B-\overline{R_B})^2 dF \end{aligned}$$
(4.23)
$$\begin{aligned} \textrm{s.t.}\;\overline{R_B} &= \int^{1}_{0}R_B(\theta,x,u) dF \end{aligned}$$
(4.24)
$$\begin{aligned} C_A &= \frac{C_{A_f}}{1+k^0_A \cdot e^{-E_A/RT} \cdot \tau} \end{aligned}$$
(4.25)
$$\begin{aligned} C_B &= \frac{C_{B_f} + k^0_A \cdot e^{-E_A/RT} \cdot \tau \cdot C_A}{1+k^0_B \cdot e^{-E_B/RT} \cdot \tau} \end{aligned}$$
(4.26)
$$\begin{aligned} -r_A &= k^0_A \cdot e^{-E_A/RT} \cdot C_A \end{aligned}$$
(4.27)
$$\begin{aligned} -r_B &= k^0_B \cdot e^{-E_B/RT} \cdot C_B - k^0_A \cdot e^{-E_A/RT} \cdot C_A \end{aligned}$$
(4.28)
$$\begin{aligned} Q &= F \rho C_p \cdot (T-T_f) + V \cdot (r_A H_{RA} + r_B H_{RB})\end{aligned}$$
(4.29)
$$\begin{aligned} \tau &= V/F \end{aligned}$$
(4.30)
$$\begin{aligned} R_B &= r_B \cdot V\end{aligned}$$
(4.31)

The uncertain variables are \([C_{A_f}, T_f, F, T]\), and they are normally distributed with means at \([C_{A_f}, T_f^1, F^1, T^1]\). For the first three uncertain variables, the 0.001th fractiles of the fluctuations are at \(\pm 10\,\%\). However, for T, several factors can contribute to fluctuations, and the level of fluctuation around the reactor temperature is set at \(\pm 30\,\%\) around \(T^1\). Based on these values, the initial variance at the starting point given in Table 4.10 is determined as \(\sigma^2_{R_B, init}= 1034\).
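
One way to translate the fractile specification into standard deviations for sampling is sketched below; the exact convention of [43] may differ, but the result is consistent with the \(\sigma = 0.033 \cdot x_1^\star\) used in Example 4.1.

```python
from scipy.stats import norm

z = norm.ppf(0.999)                   # about 3.09: +/- z sigma spans the 0.001/0.999 fractiles

def sigma_from_band(mean, rel_band):
    """Std. dev. placing the 0.001th fractile of the fluctuation at +/- rel_band * mean."""
    return rel_band * mean / z

# e.g. sigma_Tf = sigma_from_band(Tf_nominal, 0.10); sigma_T = sigma_from_band(T_nominal, 0.30)
print(0.10 / z)                       # about 0.032, consistent with sigma = 0.033 * x1* above
```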

To compare the performance of bypassing the model and using the estimation technique through kernel densities, the problem was first solved with the traditional SNLP approach. Using this traditional approach, the algorithm converged to the optimal solution of \([C_{A_f} = 3124.7~\mathrm{mol/m^3}, T_f^1 = 350~\mathrm{K}, F^1 = 0.0557~\mathrm{m^3/min}, V = 0.0500~\mathrm{m^3}]\) after three iterations, for a sample size of \(N_{samp}=150\). This reactor design has a variance of \(\sigma^2_{R_B}= 608.16\). Here, the model is run for every sample point during each iteration step. Further, the derivatives used for SQP are estimated by running the model an additional four times for shifted sample sets of each variable. This requires a total of

$$ 150~\frac{\textrm{model calls}}{\textrm{derivative calc.}} \cdot (4+1)~\frac{\textrm{derivative calc.}}{\textrm{iteration}} \cdot 3~\textrm{iterations} = 2250~\textrm{model calls} \notag$$

Optimization progress is presented in Fig. 4.6 for the traditional approach and in Fig. 4.7 for BONUS . The initial point is shown as the thick line covering variations up to 120 mol/min. As optimization progresses, the probability around the desired rate of \(R_B = 60\) increases, as seen in the optimal solution presented as the bold dashed/dotted line.

Fig. 4.6 Optimization progress for traditional SNLP approach

Fig. 4.7 Optimization progress in reducing product variance using BONUS

The analysis using the BONUS algorithm with model bypass converges after five iterations to the optimal decision variable values \(C_{A_f} =3125.1~\mathrm{mol/m^3}\), \(T_f = 328.93~\mathrm{K}\), \(F = 0.057~\mathrm{m^3/min}\), and \(V = 0.0500~\mathrm{m^3}\). This solution shows almost identical behavior to the optimum found using the traditional approach and even has a slightly lower variance of \(\sigma^2_{R_B}= 607.11\). However, the real advantage of using BONUS is that this analysis called the model just 150 times, only for the determination of the initial base distribution \(F[R_B^\star]\), in contrast to a total of 2250 model evaluations for the traditional approach.

Capacity expansion for electricity utilities has been an active area of research, having been analyzed using a multitude of methods, including optimization, simulation, and decision analysis [27] . The nature of the problem is inherently uncertain, as it is impossible to determine exact values for future cost levels, the demand for electricity, the development of alternative and more efficient technologies, and many more factors. Hence, the capacity planning example has been analyzed by various researchers in the stochastic programming (SP) community [2] .

Due to the limitations of conventional algorithms for optimization under uncertainty, several simplifying assumptions have typically been made, converting the capacity expansion SP into a linear problem through estimations and approximations. Among these simplifications, the load curve, which identifies the probability of electricity demand levels, is generally discretized into linear sections, allowing the use of decomposition techniques that require a finite number of realizations of the uncertain variables [30]. The ability of BONUS to handle nonlinearity allows this problem to be solved without these limitations, as presented in the following example.

Example 4.3 Capacity Expansion for Electric Utilities

The mathematical representation of the problem is given below. The objective is to minimize the expected cost of capacity expansion subject to uncertain demands and cost factors, while ensuring that no shortages are present. Note that the objective function (Eq. 4.32) is the expected value of the total cost calculated over \(n = 1,..., N_{samp}\) samples. In the formulation given below, capital letters are used for decision variables, while the uncertain variables are indicated by a tilde.

$$\begin{aligned} && \textrm{min} & E[cost]\end{aligned}$$
(4.32)
$$\begin{aligned} & & \textrm{s.t.}\; cost & = \sum_{t} \left(cost^{op}_t + cost^{cap}_t + cost^{buy}_t\right) \end{aligned}$$
(4.33)
$$\begin{aligned} & & cost^{op}_t & = \sum_{i} P^i_{t} \cdot \widetilde{oc^i_t} \end{aligned}$$
(4.34)
$$\begin{aligned} & & cost^{cap}_t & = \sum_{i}\alpha^i \cdot \bigl(AC^i_t\bigr)^{\beta^i} \end{aligned}$$
(4.35)
$$\begin{aligned} & & cost^{buy}_t & = \widetilde{\kappa_t} \cdot (\widetilde{d_t}-tp_{t})^{\gamma} \end{aligned}$$
(4.36)
$$\begin{aligned} & & c^i_{t} & = c^{i}_{t-1} + AC^i_t \end{aligned}$$
(4.37)
$$\begin{aligned} & & tp_t & = \sum_{i} P^i_{t} \end{aligned}$$
(4.38)
$$\begin{aligned} & & P^i_{t} & \leq c^i_{t} \end{aligned}$$
(4.39)
$$\begin{aligned} & & & i \in \{\textrm{Technology}_1, \textrm{Technology}_2,..., \textrm{Technology}_I\} \end{aligned}$$
(4.40)
$$\begin{aligned} & & & t \in \{\textrm{Period}_1, \textrm{Period}_2,..., \textrm{Period}_T\}\end{aligned}$$
(4.41)

Equation 4.33 sums up the respective costs for operation, capacity expansion, and the option to purchase electricity for meeting demand in case the total available capacity is below demand. The operating costs are calculated using Eq. 4.34, where \(oc^i_t\) is a cost parameter for electricity generation of technology i in time period t, and \(P^i_t\) are decision variables determining how much electricity should be produced using technology/power plant i at time t.

Equation 4.35 determines the cost of capacity expansion. Traditional models use a linear relationship between the cost of expansion \(cost_t^{cap}\) and the added capacity \(AC^i_t\). Use the data and models from the Integrated Environmental Control Model (IECM), a computational tool developed for the Department of Energy; this provides the power-law model of Eq. 4.35 for more accurate cost estimation. In this formula, \(\alpha^i\) is a proportionality factor for capacity expansion, while \(\beta^i\) is the exponential factor that allows the capital expansion cost to follow economies of scale.

Another nonlinear expression, Eq. 4.36, is used to determine the cost of purchased electricity, \(cost^{buy}_t\), when demand \(\widetilde{d_t}\) exceeds capacity. The power factor γ must be greater than 1 to ensure that relying on external sources is not used as the sole option when an increase in demand is expected; this is accomplished because \(cost^{buy}_t\) increases exponentially when capacity falls significantly below possible demand levels. The primary goal of this approach is to account for the common market practice of purchasing electricity in a deregulated environment when demand reaches peak levels, surpassing the available capacity in a given location.

Finally, use Eq. 4.37 to calculate the available capacity at each time step following expansion; Eq. 4.38 calculates the total electricity produced, \(tp_t\), and Eq. 4.39 ensures that no power plant can produce more energy than its installed capacity.
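
As a sketch of how Eqs. 4.33-4.39 combine into a single cost evaluation for one realization of the uncertain variables, the function below may help. The names and the max(d - tp, 0) treatment of the purchase term are our assumptions (the formulation does not state what happens when capacity exceeds demand), and E[cost] of Eq. 4.32 would then be the reweighted or sample average of this quantity over realizations of the uncertain oc, d, and kappa.

```python
import numpy as np

def expansion_cost(P, AC, c0, oc, d, kappa, alpha, beta, gamma):
    """Total cost of Eqs. 4.33-4.38 for one realization of the uncertain variables.

    P, AC : (T, I) arrays of production and added capacity (decision variables)
    c0    : (I,) initial capacities; oc : (T, I) unit generation costs
    d     : (T,) demands; kappa : (T,) purchase-cost prefactors
    alpha, beta : (I,) capacity-cost parameters; gamma : scalar > 1
    """
    c = c0 + np.cumsum(AC, axis=0)                   # Eq. 4.37, capacity after expansion
    if np.any(P > c + 1e-9):                         # Eq. 4.39 would be enforced by the optimizer
        raise ValueError("production exceeds installed capacity")
    tp = P.sum(axis=1)                               # Eq. 4.38, total production per period
    cost_op = (P * oc).sum(axis=1)                   # Eq. 4.34
    cost_cap = (alpha * AC**beta).sum(axis=1)        # Eq. 4.35, economies of scale
    shortfall = np.maximum(d - tp, 0.0)              # assumed: purchases only when demand exceeds tp
    cost_buy = kappa * shortfall**gamma              # Eq. 4.36
    return (cost_op + cost_cap + cost_buy).sum()     # Eq. 4.33
```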

In this problem, Technology I is selected as a cyclone-type coal power plant, while Technology II is a tangential-type plant. Again, data for these technologies can be obtained from the IECM model [41].

There are five uncertain variables (Table 4.11) and eight decision variables that determine capacity expansion and electricity generation for each technology at each time step.

Table 4.11 Uncertain variables in capacity expansion case

Here, the demand growth rate for Period II implies that the total demand in Period I is multiplied by a normally distributed uncertain factor varying between 0.75 and 1.50, while the unit cost of electricity generated through Technology I can vary between \(-5\) and \(+12\,\%\) in the second period. Table 4.12 provides the constants and initial values used for this case study.

Table 4.12 Constants for capacity expansion case

Finally, the preexponential factor for the cost of purchasing electricity, \(\kappa_t\), is determined as the greater of the two per-unit electricity generation costs for the different technologies, \(oc^1_t\) and \(oc^2_t\).

Solve this problem using traditional SNLP and BONUS .

Fig. 4.8 Comparison of optimization progress

Solution

Starting from a system with an initial annualized capacity expansion cost of $ 760.9 K, the system is optimized both via BONUS and via exhaustive model runs, in which derivatives are estimated through direct objective function evaluation. The conventional approach converges after five iterations, requiring a total of

$$ 100~\frac{\textrm{model calls}}{\textrm{derivative calc.}} \cdot (8+1)~\frac{\textrm{derivative calc.}}{\textrm{iteration}} \cdot 5~\textrm{iterations} = 4500~\textrm{model calls} \notag$$

compared to only 100 model runs and just three iterations for the BONUS algorithm (Fig. 4.8). Table 4.13 presents the decision variables and their optimal values found by BONUS.

$$\epsilon_{BONUS}=\frac{(\%\ \textrm{Mean reduction})_{BONUS}} {(\%\ \textrm{Mean reduction})_{\textrm{Model runs}}}=0.867$$
Table 4.13 Decision variables in capacity expansion case

4.4 Summary

In this chapter, we have introduced the BONUS algorithm, which is based on the reweighting approach for estimating the objective function, constraint, and derivative information needed during the optimization of nonlinear stochastic problems. The technique relies on KDE of a base distribution and of the sample space encountered during optimization. Two real-world case studies, (1) an off-line quality control problem from chemical engineering and (2) an electricity capacity expansion problem from the operations research literature, illustrate the efficiency of the technique in determining derivatives, and hence search directions, during the optimization loop. Further, the selection of efficient sampling techniques such as HSS allows for significant computational improvement, as the repetitive nature of model evaluations is avoided by using the reweighting scheme. The usefulness of the BONUS algorithm for solving large-scale real-world problems of significance (e.g., involving black-box models) is illustrated in the following three chapters.