1 Introduction

This paper is based on an MSc thesis in Mechanical Engineering (University of Trieste) [1]. The main author was a research student at the University and is now employed as an Aerodynamics Methods Engineer at ALSTOM Power UK, facilitating tailored applied research collaboration between the University and the Company. The aim of the research was to evaluate the possibility of employing surrogate modelling methods for turbo-machinery component design optimization. The main idea behind these methods is to replace expensive-to-compute physical models with surrogate models, in order to speed up the entire design optimization process. These surrogate models are required to be cheap to compute and easy to use, whilst providing an adequate representation of the real problem. Four different surrogate models were employed in this study: Feed-Forward Backpropagation Neural Networks (FFBP NN), Radial Basis Function (RBF) Networks, Kriging models and polynomial models.

In the first part of this paper the surrogate modelling methods will be summarized, providing guidelines and automated procedures for their setting. The surrogate models will subsequently be applied to two turbo-machinery case studies.

2 Neural Networks

The FFBP NNs employed in this study are multilayer networks with a single hidden layer. FFBP NNs are characterized by a very complex setting process, due to the high number of parameters to be set and their multi-modal performance function [2]. A critical choice for the neural network is the number of hidden neurons, since it determines the “flexibility” of the model. This parameter is usually chosen directly by the user. Unfortunately, the complexity of the modelled process is usually unknown, and FFBP NNs are tested for different architectures in order to find the best fit for a particular dataset. This “Trial and Error” procedure is very time consuming, making it desirable to automate the setting of neural networks.

There are two main approaches to design a FFBP NN in an automatic fashion [1]:

  • Constructive Methods

  • Pruning Methods

The pruning methods appear to be the most convincing, since they allow a more tailored neural network setting than the constructive methods. Two pruning techniques are evaluated in this paper: the Optimal Brain Surgeon (OBS) and the MATLAB trainbr algorithm.

2.1 Optimal Brain Surgeon

The OBS algorithm is a pruning technique developed by Hassibi [3] and implemented by Noorgard [4]. Each pruning session returns a certain number of pruned (partially connected) networks, one for each OBS algorithm iteration. Consequently, a neural network must be chosen according to some criteria, which are provided by Noorgard [4].

In addition, a new criterion was introduced by Badjan [1]:

$$\begin{aligned} \text{ Balanced } \text{ Valid. } \text{ Error } = \text{ Valid. } \text{ Error } + \left\| \text{ Valid. } \text{ Error } - \text{ Train. } \text{ Error }\right\| _{2} \end{aligned}$$
(13.1)

called the balanced validation error, which takes into account both the error on the estimation subset and the error on the validation subset (note that the validation and training errors are \(\ge 0\) by definition).

According to this criterion, FFBP NNs with a low validation error that show similar performance on both the estimation and the validation subsets are preferred over the other networks.
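As a minimal sketch, Eq. (13.1) reduces to adding the absolute gap between the two errors to the validation error (the 2-norm of a scalar difference is its absolute value); the function below assumes both errors are supplied as non-negative scalars:

```python
def balanced_validation_error(valid_error, train_error):
    # Eq. (13.1): validation error plus the absolute difference between
    # the validation and training errors (both assumed >= 0).
    return valid_error + abs(valid_error - train_error)
```

For example, a network with validation/training errors of 0.20/0.20 scores 0.20 and is preferred over one with 0.15/0.05, which scores 0.25 despite its lower validation error.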

2.2 MATLAB trainbr

In MATLAB trainbr [13], the following cost function is implemented:

$$\begin{aligned} {\textit{MSE}}_{reg} = \alpha {\textit{MSW}} + \beta {\textit{MSE}} \end{aligned}$$
(13.2)

where MSE is the mean square error, MSW is the mean of the squared network weights and biases, and \(\alpha \) and \(\beta \) are regularization parameters.

Minimizing Eq. (13.2) leads to lower values of the network weights and biases, making the network response smoother and less prone to overfit. In fact, assigning low values to the free parameters may be viewed as equivalent to pruning the neural network.
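A sketch of the regularized cost in Eq. (13.2), assuming MSW is taken as the mean of the squared free parameters (weights and biases):

```python
import numpy as np

def mse_reg(predictions, targets, free_params, alpha, beta):
    # Eq. (13.2): alpha * MSW + beta * MSE, where MSW penalizes large
    # weights/biases and MSE measures the fit to the targets.
    mse = np.mean((np.asarray(predictions) - np.asarray(targets)) ** 2)
    msw = np.mean(np.asarray(free_params) ** 2)
    return alpha * msw + beta * mse
```

With a perfect fit (MSE = 0) the cost reduces to the weight penalty alone, which is what drives the weights towards small values.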

2.3 Dynamic Threshold Neural Networks

The Dynamic Threshold Neural Network (DTNN) was originally proposed by Chiang and Fu [5] for pattern recognition purposes, but it was also successfully applied to function approximation problems by Pediroda [6] and Poloni et al. [7]. The DTNN was designed to employ Static Threshold Quadratic Sigmoidal Neurons in the hidden layer and Dynamic Threshold Quadratic Sigmoidal Neurons in the output layer.

This network configuration produces outputs in the range \([0,1]\), which is adequate for pattern recognition purposes but may represent a limitation for function approximation. In the view of the Authors, having an output range limited between two fixed values implicitly assumes that the training-set contains both the minimum and the maximum of the objective function. If the training-set targets are normalized to \([0,1]\), any new input configuration will always produce a target value within \([0,1]\). An example may illustrate the concept: consider a training-set whose maximum objective function value is \(12\) and whose minimum is \(-5\); after normalization, \(12\) corresponds to \(1\) and \(-5\) to \(0\). If a maximum (or minimum) exists somewhere in the input domain with a value of \(15\) \((-7)\), the corresponding output will still be \(1\) \((0)\). Even though the objective value is saturated, the input configuration might represent the true maximum (minimum). In any case, no robust analysis could be performed with the surrogate model at that point, since the shape of the objective function in the saturated zone cannot be approximated properly.

For these reasons, in this study it was decided to rearrange the architecture of the DTNN, employing the Dynamic Threshold Quadratic Sigmoidal neurons directly in the hidden layer and the standard linear transformation in the output layer, thereby removing the output limits.

The setting process of the DTNN differs in some respects from that of classic FFBP NNs. In fact, fewer neurons are generally required to fit a dataset, since DTNN neurons have a higher approximation capability than those of classic FFBP NNs [5]. It was therefore decided to use a constructive methodology to train this type of network.

The proposed setting process for DTNNs consists of:

  1. Set the DTNN with \(n\) hidden neurons.

  2. Train the network \(m\) times from different initial configurations.

  3. Check the performance on the validation subset.

  4. Set a new DTNN with \(n+1\) hidden neurons.

  5. Train the new network a couple of times and check the performance on the validation subset.

  6. If the validation error increases, stop the procedure; otherwise go to point 4.
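The steps above can be sketched as a simple growth loop; `train_and_validate` is a hypothetical callback that trains a DTNN with a given number of hidden neurons and returns its validation error:

```python
def constructive_search(train_and_validate, n_start=1, restarts=5, max_neurons=50):
    # Grow the hidden layer one neuron at a time, keeping the best of
    # several restarts at each size, and stop as soon as the validation
    # error no longer improves.
    best_n = n_start
    best_err = min(train_and_validate(n_start) for _ in range(restarts))
    for n in range(n_start + 1, max_neurons + 1):
        err = min(train_and_validate(n) for _ in range(restarts))
        if err >= best_err:
            break  # validation error increased: stop the procedure
        best_n, best_err = n, err
    return best_n, best_err
```

The number of restarts per architecture is a free choice; the paper trains the first architecture \(m\) times and subsequent ones "a couple of times", which this sketch simplifies to a fixed count.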

3 RBF Networks

In this paper, Gaussian, multiquadric and inverse-multiquadric functions were chosen to build the RBF Networks, since they have a shape parameter \(\sigma \) that controls the domain of influence of the radial basis function. There are various strategies in the literature for selecting an appropriate value of the shape parameter \(\sigma \). The leave-one-out (LOO) error is a well-known criterion for setting RBF Networks. However, its computational cost, of order \(O(N^4)\), becomes prohibitive even for problems of modest size. Fortunately, Rippa [8] proposed a technique that reduces the computational cost of the LOO metric to \(O(N^3)\), which was implemented here.

Based on the LOO error, an iterative procedure to select the optimal shape parameter for interpolating RBF Networks is proposed in this paper:

  1. Initialize \(\sigma \) to 1 and evaluate the LOO error.

  2. Set \(\sigma _{new}\) to 0.5 and evaluate the corresponding LOO error.

  3. If LOO\((\sigma _{new}) <\) LOO\((\sigma )\), then \(\sigma = \sigma _{new}\) and \(\sigma _{new} = \sigma /a\); otherwise \(\sigma = \sigma _{new}\) and \(\sigma _{new} = \sigma \cdot b\).

  4. Evaluate the LOO error for the RBF Network set with \(\sigma _{new}\).

  5. Repeat from point 3 until the maximum number of iterations is reached.

  6. Return the RBF Network which scored the minimum LOO error.

The parameters \(a,b\) can be set by the user and determine how much \(\sigma \) is decreased or increased at each iteration. There is also the possibility of varying these parameters during the iterations, to gradually reduce or increase the step size of the shape parameter. The Authors suggest setting \(a = 1.5\) and \(b = 1.8\). The maximum number of iterations should take into account the time required to evaluate a single LOO measure; however, 20 iterations should be appropriate for the majority of problems.
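A sketch of the procedure for a Gaussian basis, using Rippa's \(O(N^3)\) formula \(e_i = w_i / (A^{-1})_{ii}\) for the LOO residuals; the function names and the use of the 2-norm of the LOO residual vector are illustrative choices:

```python
import numpy as np

def loo_error(X, y, sigma):
    # Rippa's O(N^3) leave-one-out estimate for a Gaussian RBF
    # interpolant: e_i = w_i / (A^-1)_ii, with A the interpolation matrix.
    d2 = np.sum((X[:, None, :] - X[None, :, :]) ** 2, axis=2)
    A = np.exp(-d2 / (2.0 * sigma ** 2))
    try:
        A_inv = np.linalg.inv(A)
    except np.linalg.LinAlgError:
        return np.inf  # ill-conditioned matrix: treat as a bad sigma
    w = A_inv @ y
    return np.linalg.norm(w / np.diag(A_inv))

def search_sigma(X, y, a=1.5, b=1.8, max_iter=20):
    # The multiplicative walk from points 1-6: shrink sigma by a while
    # the LOO error improves, otherwise grow it by b; the best sigma
    # seen over all iterations is returned (point 6).
    sigma, err = 1.0, loo_error(X, y, 1.0)
    best_sigma, best_err = sigma, err
    sigma_new = 0.5
    for _ in range(max_iter):
        err_new = loo_error(X, y, sigma_new)
        if err_new < best_err:
            best_sigma, best_err = sigma_new, err_new
        if err_new < err:
            sigma, err = sigma_new, err_new
            sigma_new = sigma / a
        else:
            sigma, err = sigma_new, err_new
            sigma_new = sigma * b
    return best_sigma, best_err
```

Because the minimum-LOO network is tracked separately (point 6), the walk is free to overshoot in either direction without losing the best candidate.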

RBF Networks can also perform a regression of the data by introducing the regularization parameter \(\lambda \), in a similar way as for the Kriging model [9]. Keane and Nair [10] suggest setting \(\lambda \) to the variance of the noise in the response data, but since this information is usually unknown, the remaining option is to add it to the list of parameters to be estimated. In this study, both the shape parameter \(\sigma \) and the regularization parameter \(\lambda \) were searched throughout their domain using a Genetic Algorithm (GA). Suitable lower and upper bounds for the search of \(\lambda \) are \(10^{-6}\) and \(1\) respectively [10].
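A minimal sketch of the regressive variant for a Gaussian basis: the interpolation system is replaced by \((A + \lambda I)\,w = y\), so the model no longer passes exactly through the data (function names are illustrative):

```python
import numpy as np

def fit_regressive_rbf(X, y, sigma, lam):
    # Solve (A + lam * I) w = y instead of A w = y: lam > 0 trades exact
    # interpolation for a smoother, regression-like response.
    d2 = np.sum((X[:, None, :] - X[None, :, :]) ** 2, axis=2)
    A = np.exp(-d2 / (2.0 * sigma ** 2))
    w = np.linalg.solve(A + lam * np.eye(len(X)), y)

    def predict(Xq):
        # Evaluate the fitted RBF expansion at the query points Xq.
        d2q = np.sum((Xq[:, None, :] - X[None, :, :]) ** 2, axis=2)
        return np.exp(-d2q / (2.0 * sigma ** 2)) @ w

    return predict
```

As \(\lambda \rightarrow 0\) the model approaches the interpolating network, which is why \(10^{-6}\) is a sensible lower bound for the search.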

4 Polynomial Models

Polynomial models can be applied to multi-dimensional problems taking into account interaction terms [9]. In this paper, optimal values for global and interaction orders are found by applying cross-validation.

5 Kriging Models

Kriging models are powerful methods based on Gaussian processes. They can perform either interpolation or regression of data. In this paper, Kriging models are set by maximizing the marginal likelihood function [9].
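As an illustrative sketch of likelihood-based tuning, the concentrated (negative) log-likelihood of a zero-mean Gaussian process with a Gaussian correlation model can be written as below; minimizing it over the correlation parameter \(\theta\) (e.g. with a GA) sets the model. The correlation form and the zero-mean assumption are simplifications of the full Kriging formulation in [9]:

```python
import numpy as np

def neg_log_likelihood(theta, X, y):
    # Concentrated negative log-likelihood of a zero-mean Gaussian
    # process with correlation R_ij = exp(-theta * ||x_i - x_j||^2);
    # the process variance is profiled out analytically.
    n = len(y)
    d2 = np.sum((X[:, None, :] - X[None, :, :]) ** 2, axis=2)
    R = np.exp(-theta * d2) + 1e-10 * np.eye(n)  # jitter for stability
    L = np.linalg.cholesky(R)
    alpha = np.linalg.solve(L.T, np.linalg.solve(L, y))
    sigma2 = (y @ alpha) / n                     # profiled process variance
    log_det = 2.0 * np.sum(np.log(np.diag(L)))
    return 0.5 * (n * np.log(sigma2) + log_det)
```

The Cholesky factorization keeps both the quadratic form and the log-determinant numerically stable, which matters when \(\theta\) is small and the correlation matrix approaches singularity.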

6 Assessment Criteria for Surrogate Models

If the observational data are abundant, a randomly selected subset (Hastie et al. [11] recommend around \(25\,\%\) of the total \({{\varvec{x}}} \rightarrow y\) pairs) should be set aside for model testing purposes. These observations must not be touched during the previous stages, as their sole purpose is to allow the evaluation of the testing error (based on the difference between the true and approximated function values at the test sites) once the model has been built. Standard assessment criteria for surrogate models are the Normalized Root Mean Square Error (NRMSE) and the Coefficient of Determination (\(r^2\)). According to [9], good surrogate models should have \(\mathrm{{NRMSE}} < 10\,\%\) and \(r^{2} > 0.8\).
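These criteria can be computed on the test-set as follows; the normalization convention (RMSE divided by the range of the true test values, in percent) is one common choice and may differ in detail from [9]:

```python
import numpy as np

def nrmse(y_true, y_pred):
    # RMSE normalized by the range of the true test values, in percent.
    y_true, y_pred = np.asarray(y_true, float), np.asarray(y_pred, float)
    rmse = np.sqrt(np.mean((y_true - y_pred) ** 2))
    return 100.0 * rmse / (y_true.max() - y_true.min())

def r_squared(y_true, y_pred):
    # Coefficient of determination: 1 - SS_res / SS_tot.
    y_true, y_pred = np.asarray(y_true, float), np.asarray(y_pred, float)
    ss_res = np.sum((y_true - y_pred) ** 2)
    ss_tot = np.sum((y_true - np.mean(y_true)) ** 2)
    return 1.0 - ss_res / ss_tot
```

A perfect surrogate scores NRMSE \(= 0\,\%\) and \(r^2 = 1\); a surrogate no better than predicting the test-set mean scores \(r^2 = 0\).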

Furthermore, a new criterion, called RANKING [1], is introduced in this paper; its aim is to assess the capability of surrogate models to replicate the trend expressed by the underlying function.

The RANKING is evaluated using the following procedure:

  1. Sort the true solutions of a particular dataset in ascending order.

  2. Check whether the corresponding approximated solutions increase monotonically.

  3. A score of 1 is given to each approximated solution that exceeds the previous highest value (absolute RANKING).

  4. Finally, the total score is divided by the number of points in the dataset and multiplied by 100.

A numerical example may illustrate the steps:

  • Assume the following true solutions \(y\): \([12,43,2,33,30,31]\) and the surrogate model approximations \(\widehat{y}\): \([10,45,3,32,35,31]\).

  • Sorting the true solutions \(y\) in ascending order gives \([2,12,30,31,33,43]\), with the corresponding original indices \([3,1,5,6,4,2]\).

  • The correspondingly reordered approximations \(\widehat{y}\) are then \([3,10,35,31,32,45]\).

  • The scores for each point are \([1,1,1,0,0,1]\) and their sum is 4; note that the metric is evaluated with respect to the absolute ascending order.

    $$\begin{aligned} \text{ RANKING } = \frac{4}{6}100 = 67\,\% \end{aligned}$$
    (13.3)

The higher the RANKING value, the better the surrogate model follows the underlying response trend. However, this criterion in isolation is insufficient to determine the overall accuracy of the model.
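The RANKING procedure and the numerical example above can be reproduced with a few lines:

```python
import numpy as np

def ranking(y_true, y_pred):
    # Sort the approximations by the ascending order of the true values,
    # then score 1 for each point exceeding the previous highest value.
    order = np.argsort(y_true, kind="stable")
    reordered = np.asarray(y_pred, float)[order]
    score, best = 0, -np.inf
    for value in reordered:
        if value > best:       # new absolute high: monotone step
            score += 1
            best = value
    return 100.0 * score / len(reordered)
```

For the example above, `ranking([12, 43, 2, 33, 30, 31], [10, 45, 3, 32, 35, 31])` reproduces Eq. (13.3): \(4/6 \cdot 100 \approx 67\,\%\).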

7 Optimization Case Studies

Two optimization case studies were chosen to evaluate the application of surrogate models in turbo-machinery component design optimization:

  • Mono-objective optimization of the operating conditions of a turbine cascade.

  • Mono-objective optimization of a turbine labyrinth seal.

The optimization procedure consisted of two consecutive steps:

  1. A global search for the optimum, over the whole design space.

  2. A local search, refining the result obtained from the global search.

This optimization strategy combines both robustness and accuracy.

In particular, a Genetic Algorithm was chosen as the global optimizer, since it is a robust and reliable algorithm widely used in optimization [7, 12]. The subsequent local optimization was done using the Sequential Quadratic Programming (SQP) method [13].
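The two-step strategy can be sketched in a library-free form; here a global random search stands in for the GA and a simple pattern search for the SQP local refinement, so this is only a structural illustration, not the actual algorithms used in the paper:

```python
import numpy as np

def two_stage_optimize(f, bounds, n_global=2000, n_local=200, seed=0):
    # Step 1: global search over the whole design space (GA stand-in).
    rng = np.random.default_rng(seed)
    lo, hi = np.array(bounds, float).T
    samples = rng.uniform(lo, hi, size=(n_global, len(lo)))
    x = samples[np.argmin([f(p) for p in samples])]
    fx = f(x)
    # Step 2: local refinement of the global winner (SQP stand-in).
    step = 0.1 * (hi - lo)
    for _ in range(n_local):
        improved = False
        for i in range(len(x)):
            for direction in (+1.0, -1.0):
                cand = x.copy()
                cand[i] = np.clip(cand[i] + direction * step[i], lo[i], hi[i])
                fc = f(cand)
                if fc < fx:
                    x, fx, improved = cand, fc, True
        if not improved:
            step *= 0.5  # shrink the step until convergence
    return x, fx
```

The global stage supplies robustness (no starting-point dependence), while the local stage supplies accuracy near the incumbent, mirroring the GA + SQP division of labour.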

7.1 Turbine Cascade Case

The performance of a steam turbine cascade [14] was analyzed for different operating conditions employing an ALSTOM in-house CFD code.

The design space was defined by three input variables:

  • Incidence Angle

  • Inlet Total Pressure (for adjusting Mach Number)

  • Fluid Viscosity (for adjusting Reynolds Number)

The objective of the optimization was to maximize the efficiency of the turbine profile. A dataset of 150 points was obtained by running an Optimized Latin Hypercube DOE. Each simulation took about three minutes on a PC (Quad Core CPU running at 2.66 GHz, 3.25 GB RAM). Afterwards, the dataset was normalized in the range \([-1,1]\) and randomly split into a training-set of 120 points and a test-set of 30 points.
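The pre-processing step can be sketched as follows, assuming a column-wise min-max mapping to \([-1,1]\) followed by a random row split (as for the 150-point DOE: 120 training points, 30 test points):

```python
import numpy as np

def normalize_and_split(data, n_train, seed=0):
    # Map each column linearly to [-1, 1], then split the rows at random
    # into a training-set and a test-set.
    data = np.asarray(data, float)
    lo, hi = data.min(axis=0), data.max(axis=0)
    scaled = 2.0 * (data - lo) / (hi - lo) - 1.0
    idx = np.random.default_rng(seed).permutation(len(scaled))
    return scaled[idx[:n_train]], scaled[idx[n_train:]]
```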

7.1.1 Performance of Surrogate Models

Different setting approaches were adopted for each type of surrogate model. Five FFBP NNs were created using the trainbr algorithm and the OBS technique. It is worth remembering that FFBP NNs have a multi-modal performance function, so finding the best network configuration for a particular dataset usually requires training the network from several different initial weight/bias configurations. The same concept applies to the OBS technique, since the setting of the initial oversized neural network influences the results of the subsequent pruning process. The validation subset comprised 20 % of the training-set. The DTNN was built by finding the optimal number of hidden neurons via a “trial and error” procedure; eventually, five DTNNs were created with the optimal architecture. As for the neural networks, five polynomial models were built using cross-validation with 10 subsets, in order to investigate how the random splitting affects the setting process. The same global order and interaction order were obtained for all five models, confirming that cross-validation is a robust procedure for setting polynomial models. The interpolating RBF Networks were built only once, using the iterative procedure previously described in Sect. 13.3, with 20 steps. The setting procedure for the regressive RBF Networks was different: these models were tuned using the GA, set with a population of 30 individuals and 30 generations. In this case, five regressive RBF Networks were built for each basis function, resulting in broadly similar performance. Finally, five Kriging models were also built. The likelihood function employed to build the Kriging models was optimized setting the GA with a population of 50 individuals and 100 generations. The adopted setting configuration produced almost identical models; the small differences were related to the GA reaching only the neighbourhood of the maximum of the likelihood function, as opposed to the exact maximum.

For each type of surrogate model, only the best performing model was chosen for the comparison summarized below.

Table 13.1 Surrogate models performance on the blade test-set
Table 13.2 Optimized and validated results for the blade study case

It can be noticed from Table 13.1 that all the models performed very well, with low values for the NRMSE and high values for \(r^{2}\) and RANKING.

Once the surrogate models were built, it was possible to use them to evaluate all the other input configurations required by the optimization algorithm. The constraints of the optimization were represented by the design space boundaries, which were defined by the highest and lowest values of each input variable in the DOE.

As can be seen in Table 13.2, all the surrogate models gave very similar validated solutions. In particular, the DTNN produced the best result, and was therefore chosen to plot a graphical representation of the problem, fixing the viscosity to its lowest value in the dataset (see Fig. 13.1). It is also interesting to note that the DTNN had the best RANKING score on the test-set. However, almost the same validated solution was obtained with the Regressive Kriging, which was also far quicker to set than the DTNNs and FFBP NNs. The polynomial model and the RBF Networks were also quicker and easier to set than the neural networks. In the opinion of the Authors, the simplicity of the setting process should always be considered in the assessment of surrogate modelling methods. In practical applications, a quick-to-set surrogate model should be preferred to models with time-consuming and non-robust setting processes, especially when the results are almost the same, as in this case.

Fig. 13.1
figure 1

Blade case—DTNN

Finally, it should be considered that the efficiency improvements obtained over the initial DOE were small from a numerical point of view, but very important in engineering design.

7.2 Turbine Seal Case

The leakage of a labyrinth seal in the high-pressure stage of a steam turbine was evaluated via CFD simulations, performed with the commercial code ANSYS FLUENT [15]. Seven geometric parameters were originally chosen in order to determine the key variables for the prediction of leakage, such as fin height, thickness, angle, etc. These input variables were screened using full-factorial DOEs and Pareto charts (based on polynomial regression). Finally, four top parameters were selected for inclusion in the surrogate modelling and the subsequent model-based optimization (minimization) of the seal leakage:

  • \(a_1\), Angle parameter

  • \(a_2\), Angle parameter

  • \(L_1\), Length parameter

  • \(L_2\), Length parameter

A full-factorial DOE with 5 levels per variable (resulting in \(5^{4} = 625\) points) was originally planned to investigate the problem, but some simulations failed due to technical issues in the CFD solver, leaving a reduced dataset of 517 points. Each simulation took about eight minutes on a PC (12 Core CPU running at 2.92 GHz, 24 GB RAM). The dataset was randomly split into a training-set of 414 points and a test-set of 103 points. The surrogate models were built adopting the same methodology employed for the turbine cascade case.

The subsequent optimizations were run setting the GA with 100 individuals and 30 generations, and allowing a maximum of 30 iterations for the local optimizer. As for the turbine cascade case, the design space boundaries were defined by the highest and lowest values of each variable in the DOE.

Table 13.3 Surrogate models performance on the seal test-set

All the values shown in the tables and figures regarding the seal case were normalized in the range \([-1,1]\), in order to protect commercially sensitive information.

7.2.1 Performance of Surrogate Models

Table 13.3 shows that the surrogate models did not perform very well in this case, scoring high NRMSE values and low \(r^2\) values. The RANKING scores were also very low. In the opinion of the Authors, the RANKING criterion should be applied to cases where the surrogate models perform very similarly, as in the blade case.

The validated solutions were generally worse than the “best solution” contained in the DOE (which was equal to \(-\)1 after normalization). However, very small reductions of the leakage were obtained with the RBF Networks (surrogate-model-based optimal solution \(-\)1.001, validated CFD solution \(-\)1.0067), but not enough to consider the optimization a success.

It was also found that most of the optimized solutions occurred at the lowest values of \(a_2\) and \(L_2\). Recalling that a generic full-factorial DOE consists of a multi-dimensional grid, Fig. 13.2 was generated by fixing some variables and using MATLAB cubic spline interpolation [13].

Fig. 13.2
figure 2

Full factorial cubic spline interpolations: \(a_2\) and \(L_2\) are fixed to different values

As can be seen from Fig. 13.2, the underlying function shows very different scenarios as the values of the fixed variables change, making it difficult to model even with a full-factorial DOE of 517 points. This behaviour is probably due to the input variables being highly correlated. However, it should be noted that the cubic spline interpolation does not correspond to the true function, which is obviously unknown, and the underlying function might be even more complex.

Fig. 13.3
figure 3

FFBP NN OBS pruned versus full factorial cubic spline interpolation, \(a_2 = -1\) and \(L_2 = -1\)

Fig. 13.4
figure 4

FFBP NN trainbr versus full factorial cubic spline interpolation, \(a_2 = -1\) and \(L_2 = -1\)

In addition, Figs. 13.3 and 13.4 show a comparison between some surrogate models and the corresponding full-factorial cubic spline. It is clear from Fig. 13.3 that the FFBP NNs pruned with OBS overfitted the data. In fact, it appeared that the highly flexible structure of FFBP NNs can fit the data in a large variety of ways (i.e. with very different weight/bias configurations), generating approximations with good NRMSE and \(r^2\) values but also with very strange shapes. On the contrary, the FFBP NN trained with trainbr gave a good representation of the problem, as can be seen in Fig. 13.4. In fact, the trainbr algorithm increases the level of regularization of the neural networks, making them smoother and less prone to overfit [16]. The RBF Networks and Kriging models showed less flexibility than the neural networks, since their structure is directly anchored to the points in the dataset.

Following these observations, it was decided to adopt a different strategy for the seal case, aimed at obtaining an optimized solution similar to the best solution contained in the full-factorial DOE of 517 points, but using fewer CFD computations.

7.2.2 Further Investigation with Alternative Datasets

An Optimized Latin Hypercube DOE of 100 points was run with the objective of gathering information over the whole design space using fewer points. Again, some points failed to produce a result, leaving a reduced dataset of 94 points. Surrogate models were built with the new dataset, employing all the points as the training-set.

The new validated solutions were no better than the solutions obtained in the previous optimization. However, it can be noticed that the majority of the regressive models again gave their optimized solutions at the lowest values of \(a_2\) and \(L_2\).

It was therefore decided to run a further Optimized Latin Hypercube DOE of 50 points with \(a_2\) and \(L_2\) fixed to their lowest values, in order to reduce the dimensionality of the problem. A dataset of 47 points was finally obtained, and the surrogate models were built employing all 47 points as the training-set.

Table 13.4 Optimized and validated results for the seal study case using the dataset composed of 47 points

As can be seen in Table 13.4, the Kriging model and the Regressive RBF Gaussian Network improved on the best solution contained in the first dataset of 517 points. Thus, the computational budget was reduced from about 600 points to 150 points. Unfortunately, the FLUENT solver failed to converge at the optimized solution of the Regressive Kriging.

In addition, a further full-factorial DOE of 400 points (20 levels per variable) was run fixing \(a_2\) and \(L_2\) to their lowest values, in order to investigate the morphology of the underlying function in detail. The final dataset comprised 384 points (16 points failed to converge).

Fig. 13.5
figure 5

Regressive Kriging versus FF interpolation, DOE 50 points

Fig. 13.6
figure 6

Polynomial model versus FF interpolation, DOE 50 points

As can be seen in Figs. 13.5 and 13.6, the regressive models built with 47 points were capable of detecting the general trends of the real problem, which was highly irregular with many peaks (see the full-factorial cubic spline interpolations). In particular, the Polynomial model was able to define the “borders” of the underlying function very well. On the contrary, the FFBP NN pruned with OBS and the DTNN overfitted the data.

In engineering design, the visualization of a problem is extremely useful, since it can provide an indication of promising regions that may yield a robust optimal design. Flat zones with stable performance are preferred to peaky zones, where small variations of the input variables lead to large variations of the output. For example, the blue valley in Fig. 13.6 represents a stable zone, where small variations of \(a_1\) and \(L_1\) do not particularly affect the leakage. In fact, the geometric parameters defining a labyrinth seal are subject to manufacturing tolerances. Thus, it is clear that an important aspect of industrial design is managing uncertainties, in order to find solutions that are insensitive to stochastic fluctuations of the parameters (Robust Design) [10, 17].

In summary, the seal case was significantly more challenging than the blade case. The FFBP NNs pruned with OBS and the DTNNs performed poorly, overfitting the surface. Instead, the FFBP NNs trained with trainbr were able to detect the main trends of the underlying function. However, other surrogate models, such as the RBF Networks, Kriging and Polynomial models, gave better results with less training effort. In particular, the Kriging model produced the best numerical result and the Polynomial model gave the best representation.

In addition, it appeared that a good strategy for optimization assisted by surrogate models may consist of:

  1. Run a small global DOE, according to the available computational budget.

  2. Build regressive surrogate models and visualize the problem where possible.

  3. Validate the optimized solutions.

  4. Evaluate the possibility of reducing the dimensionality of the problem, or at least of defining a small promising zone in the domain.

  5. Run a reduced/local DOE, according to the available computational budget.

  6. Validate the new optimized solutions.

8 Conclusion

This paper has demonstrated the utility of surrogate models in turbo-machinery design optimization. In the first instance, different surrogate modelling methods should be tried when dealing with unknown problems, in order to find the model that best fits a particular dataset. In addition, surrogate models should be assessed on the basis of their ease of configuration. From this point of view, FFBP NNs and DTNNs present too many drawbacks to be considered a valid methodology for turbo-machinery component design optimization. They did not show any clear advantage over the other methodologies in terms of accuracy, and their setting process presented many issues. However, neural networks are widely applied in control engineering and signal processing, where their flexibility is a benefit in the modelling of dynamic systems.

Finally, for the considered case studies, the Kriging models were assessed as the most promising surrogate models among those evaluated in this paper, combining high performance with a relatively easy setting process.