Combination of GPU Programming and FEM Analysis in Structural Optimisation

Nagy, Szilárd; Jármai, Károly; Baksa, Attila

doi:10.1007/978-3-031-15211-5_63

Part of the book series: Lecture Notes in Mechanical Engineering (LNME)

Included in the following conference series:

Vehicle and Automotive Engineering

Abstract

GPUs no longer support only graphical applications and gaming. They are becoming cheap and powerful tools for scientific and general-purpose computation, providing a massively parallel environment with a single instruction, multiple data (SIMD) programming model. Finite element calculations can be time-consuming when the model has many elements or many degrees of freedom. FEM simulation is essential for checking analytically derived or measured mechanical stresses, deformations, etc. Structural optimisation requires many iterations, and combining it with FEM increases the calculation time further. GPU programming is a good way to reduce it. In this article, we show the applicability of combining GPU computing, optimisation, and FEM simulation.

1 Introduction

Graphics cards (video cards, GPUs) are now cheap and efficient hardware for general-purpose parallel computation. They are used for scientific computations, for topology optimisation [1] or structural optimisation [2], and for manufacturing technologies. They provide a massively parallel environment with a single instruction, multiple data (SIMD) programming model. Larger software vendors, such as MathWorks, are increasingly developing frameworks based on the CUDA API (application programming interface) that offer more convenient and user-friendly tools than the original CUDA Runtime API.

Nature-inspired, population-based, iterative evolutionary algorithms, such as the flower pollination algorithm [3], particle swarm optimisation [4], and the firefly algorithm [5], are powerful numerical optimisation methods. Their importance and effectiveness are underlined by their use in vehicle research, for example to design optimal aerodynamics for UAVs (unmanned aerial vehicles) [6], to optimise the performance of formula vehicles [7], and to optimise vehicle manufacturing [8].

The finite element method (FEM) is a universal tool for analysing structures and determining the mechanical stresses and deformations inside them. In this paper, we connect an evolutionary algorithm, differential evolution, with FEM. The method is presented through the optimisation of a truss structure. Both the evolutionary method and FEM demand considerable computational capacity. Therefore, we present a possible parallelisation using MATLAB, together with the results obtained.

2 Differential Evolution

Storn and Price introduced the original differential evolution (DE) in [9]. DE improves the ${n}_{D}$-dimensional individuals ${\varvec{x}}$ of a population of ${n}_{p}$ elements through a series of iteration steps

$${\varvec{x}}={\left[\begin{array}{ccccc}{x}_{1}& {x}_{2}& {x}_{3}& \cdots & {x}_{{n}_{D}}\end{array}\right]}^{T}\in S\subset {\mathbb{R}}^{{n}_{D}}$$

(1)

where $S$ is the search space. Ideally, the initial population randomly covers the entire search space: each variable of an individual is a uniformly distributed random number within the search space.

DE generates new individuals in each iteration step by repeatedly performing three operations: mutation, crossover, and selection [9].

During the mutation operation, a mutant ${}^{G}{{\varvec{v}}}_{i}$ is generated for each individual ${}^{G}{{\varvec{x}}}_{i}$ of generation $G$ using one of the following five strategies [10]:

DE/rand/1:
$${}^{G}{{\varvec{v}}}_{i}={}{}^{G}{{\varvec{x}}}_{{r}_{1}}+F\left({}{}^{G}{{\varvec{x}}}_{{r}_{2}}-{}{}^{G}{{\varvec{x}}}_{{r}_{3}}\right)$$
(2)
DE/best/1:
$${}^{G}{{\varvec{v}}}_{i}={}_{ }{}^{G}{{\varvec{x}}}_{b}+F\left({}_{ }{}^{G}{{\varvec{x}}}_{{r}_{1}}-{}_{ }{}^{G}{{\varvec{x}}}_{{r}_{2}}\right)$$
(3)
DE/current to best/2:
$${}^{G}{{\varvec{v}}}_{i}={}{}^{G}{{\varvec{x}}}_{i}+F\left({}{}^{G}{{\varvec{x}}}_{b}-{}{}^{G}{{\varvec{x}}}_{i}\right)+F\left({}{}^{G}{{\varvec{x}}}_{{r}_{1}}-{}{}^{G}{{\varvec{x}}}_{{r}_{2}}\right)$$
(4)
DE/best/2:
$${}^{G}{{\varvec{v}}}_{i}={}{}^{G}{{\varvec{x}}}_{b}+F\left({}{}^{G}{{\varvec{x}}}_{{r}_{1}}-{}{}^{G}{{\varvec{x}}}_{{r}_{2}}\right)+F\left({}{}^{G}{{\varvec{x}}}_{{r}_{3}}-{}{}^{G}{{\varvec{x}}}_{{r}_{4}}\right)$$
(5)
DE/rand/2:
$${}^{G}{{\varvec{v}}}_{i}={}{}^{G}{{\varvec{x}}}_{{r}_{1}}+F\left({}{}^{G}{{\varvec{x}}}_{{r}_{2}}-{}{}^{G}{{\varvec{x}}}_{{r}_{3}}\right)+F\left({}{}^{G}{{\varvec{x}}}_{{r}_{4}}-{}{}^{G}{{\varvec{x}}}_{{r}_{5}}\right)$$
(6)
where ${r}_{1},{r}_{2},{r}_{3},{r}_{4},{r}_{5}\in \left[1,{n}_{p}\right]$ are mutually distinct random indices, $F\in [0,2)$ is the scaling factor, and ${}^{G}{{\varvec{x}}}_{b}$ is the individual with the best fitness value in generation $G$.

The mutation is followed by a “binomial” crossover operation that combines the newly created ${}^{G}{{\varvec{v}}}_{i}$ mutant with the ${}^{G}{{\varvec{x}}}_{i}$ individual

$${}^{G}{{\varvec{u}}}_{j,i}=\left\{\begin{array}{cc}{}^{G}{{\varvec{v}}}_{j,i}& {U}_{j}\left(0,1\right)\le {C}_{R}\, \mathrm{or}\, j={j}_{R}\\ {}^{G}{{\varvec{x}}}_{j,i}& \mathrm{otherwise}\end{array}\right.$$

(7)

where ${U}_{j}(0,1)\in [0,1)$ is a uniformly distributed random number, ${C}_{R}\in [0,1)$ is the crossover rate, and ${j}_{R}\in [1,{n}_{D}]$ is a random index.

During selection, if the fitness value of the newly generated ${}^{G}{{\varvec{u}}}_{i}$ is better than that of the ${}^{G}{{\varvec{x}}}_{i}$, it will be included in the new generation; if not, the algorithm drops it

$${}^{G+1}{{\varvec{x}}}_{i}=\left\{\begin{array}{cc}{}{}^{G}{{\varvec{u}}}_{i}& \mathcal{F}({}{}^{G}{{\varvec{u}}}_{i})\le \mathcal{F}({}{}^{G}{{\varvec{x}}}_{i})\\ {}{}^{G}{{\varvec{x}}}_{i}& \mathrm{otherwise}\end{array}\right.$$

(8)

The operation of differential evolution, and hence the success of the optimisation, is greatly influenced by the chosen mutation strategy, the value of the scaling factor $F$, and the crossover rate ${C}_{R}$.
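
The three operations can be sketched in a few lines of Python. This is a minimal illustration of the DE/rand/1 strategy of Eq. (2) with the binomial crossover of Eq. (7) and the selection of Eq. (8), not the implementation used in this paper. The function name, the default values of `F` and `CR`, the exclusion of the current index from the random indices, and the `sphere` test objective are assumptions made for the example; search-space bounds are not enforced.

```python
import random


def de_rand_1_step(pop, fitness, F=0.5, CR=0.9, rng=random):
    """Return the next generation after one DE/rand/1 iteration.

    pop     -- list of individuals, each a list of n_D floats
    fitness -- callable returning the value to be minimised
    F, CR   -- scaling factor and crossover rate (illustrative defaults)
    """
    n_p, n_d = len(pop), len(pop[0])
    next_pop = []
    for i, x in enumerate(pop):
        # Mutation, Eq. (2): three mutually distinct indices.
        # Excluding i itself is a common convention, assumed here.
        candidates = [k for k in range(n_p) if k != i]
        r1, r2, r3 = rng.sample(candidates, 3)
        v = [pop[r1][j] + F * (pop[r2][j] - pop[r3][j]) for j in range(n_d)]

        # Binomial crossover, Eq. (7): component j_r always comes from v.
        j_r = rng.randrange(n_d)
        u = [v[j] if (rng.random() <= CR or j == j_r) else x[j]
             for j in range(n_d)]

        # Selection, Eq. (8): keep the trial vector only if it is not worse.
        next_pop.append(u if fitness(u) <= fitness(x) else x)
    return next_pop


def sphere(x):
    """Hypothetical test objective: sum of squares, minimum at the origin."""
    return sum(c * c for c in x)


rng = random.Random(0)
population = [[rng.uniform(-5.0, 5.0) for _ in range(3)] for _ in range(10)]
best_start = min(sphere(x) for x in population)
for _ in range(50):
    population = de_rand_1_step(population, sphere, rng=rng)
best_end = min(sphere(x) for x in population)
```

Because the selection step never replaces an individual with a worse one, the best fitness in the population cannot deteriorate from one generation to the next.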

3 Finite Element Model of Truss Structure

The connections between members of tubular trusses are frequently modelled as pinned connections in elastic analysis. The preferred limit on the eccentricity of the intersection of the members' centre lines is [11, 12]

$$e\le 0.25D\, \mathrm{or}\, e\le 0.25{H}_{0}$$

(9)

where $e$ is the eccentricity, $D$ is the outside diameter of a circular hollow section, and ${H}_{0}$ is a typical size of a rectangular hollow section. In this case, primary bending moments are produced by these eccentricities. Excessive moments are generated in brace members when rigid connections are considered, so their use is not recommended even for welded joints [11, 12]. The axial force distribution with rigid joints is similar to that with pinned joints.

If inequality (9) is satisfied, the structure can be analysed by the finite element method (FEM) using a tension-compression element model (referred to below as the rod or truss model). In most cases, it is sufficient to examine the structure in the plane relevant to the load. If this is insufficient and a spatial analysis is required, the presented method can easily be adapted to the spatial case. In this paper, we discuss only planar problems.

The model of the truss element is shown in Fig. 1. The displacement within the rod element is approximated with a kinematically admissible function [13]

$${}^{e}u\left(\xi \right)=\left[\begin{array}{cc}\frac{{\xi }_{j}-\xi }{{}^{e}L}& \frac{\xi -{\xi }_{i}}{{}^{e}L}\end{array}\right]\left[\begin{array}{c}{}^{e}{u}_{i}^{^{\prime}}\\ {}^{e}{u}_{j}^{^{\prime}}\end{array}\right]=\left[\begin{array}{cc}{}^{e}{N}_{i}(\xi )& {}^{e}{N}_{j}(\xi )\end{array}\right]\left[\begin{array}{c}{}^{e}{u}_{i}^{^{\prime}}\\ {}^{e}{u}_{j}^{^{\prime}}\end{array}\right]={}^{e}{\varvec{N}} {}^{e}{{\varvec{u}}}^{^{\prime}}$$

(10)

where ${}^{e}L$ is the length of the rod element, ${}^{e}{\varvec{N}}$ is the matrix of shape functions, and ${}^{e}{{\varvec{u}}}^{^{\prime}}$ is the vector of nodal displacements expressed in the element's local $\xi$ coordinate system. In the global $x-y$ coordinate system, the nodal displacements can be written as

$${}^{e}{\varvec{u}}={\left[\begin{array}{cccc}{}_{ }{}^{e}{u}_{ix}& {}_{ }{}^{e}{u}_{iy}& {}_{ }{}^{e}{u}_{jx}& {}_{ }{}^{e}{u}_{jy}\end{array}\right]}^{T}.$$

(11)

The transformation between the two coordinate systems is performed with the transformation matrix

$${}^{e}{\varvec{T}}=\left[\begin{array}{cccc}{}{}^{e}{T}_{11}& {}{}^{e}{T}_{12}& 0& 0\\ 0& 0& {}{}^{e}{T}_{23}& {}{}^{e}{T}_{24}\end{array}\right]$$

(12)

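where, with $({}^{e}{x}_{i},{}^{e}{y}_{i})$ and $({}^{e}{x}_{j},{}^{e}{y}_{j})$ denoting the global coordinates of the element's end nodes, the entries are the direction cosines of the element axis

$${}^{e}{T}_{11}={}^{e}{T}_{23}=\frac{{}^{e}{x}_{j}-{}^{e}{x}_{i}}{{}^{e}L}\, \mathrm{ and }\, {}^{e}{T}_{12}={}^{e}{T}_{24}=\frac{{}^{e}{y}_{j}-{}^{e}{y}_{i}}{{}^{e}L}$$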

(13)

$${}^{e}{{\varvec{u}}}^{^{\prime}}={}{}^{e}{\varvec{T}} {}{}^{e}{\varvec{u}}$$

(14)

The axial strain of the truss element is

$${}^{e}\varepsilon =\frac{d{}{}^{e}u(\xi )}{d\xi }=\frac{1}{{}{}^{e}L}\left[\begin{array}{cc}-1& 1\end{array}\right]{}{}^{e}{{\varvec{u}}}^{^{\prime}}$$

(15)

and stress in the axial direction is

$${}^{e}\sigma =E{}{}^{e}\varepsilon =\frac{E}{{}{}^{e}L}\left[\begin{array}{cc}-1& 1\end{array}\right]{}{}^{e}{{\varvec{u}}}^{^{\prime}}$$

(16)

where $E$ is the elastic modulus. The strain energy of a truss element with cross-sectional area ${}^{e}A$ is

$${{}^{e}U}=\frac{1}{2}{\int }_{L}{}{}^{e}A {}{}^{e}\sigma {}{}^{e}\varepsilon d\xi =\frac{1}{2}{}{}^{e}{{\varvec{u}}}^{{^{\prime}}T}\frac{{}{}^{e}AE}{{}{}^{e}L}\left[\begin{array}{cc}1& -1\\ -1& 1\end{array}\right]{}{}^{e}{{\varvec{u}}}^{^{\prime}}=\frac{1}{2}{}{}^{e}{{\varvec{u}}}^{{^{\prime}}T} {}{}^{e}{{\varvec{K}}}^{^{\prime}} {}{}^{e}{{\varvec{u}}}^{^{\prime}}$$

(17)

where ${}^{e}{{\varvec{K}}}^{^{\prime}}$ is the stiffness matrix of the element. The work of external forces is

$${{}^{e}W}={\int }_{L}{}{}^{e}u\left(\xi \right)p d\xi ={}{}^{e}{{\varvec{u}}}^{{^{\prime}}T} {}{}^{e}{{\varvec{f}}}^{^{\prime}}$$

(18)

where $p$ is the distributed axial load and ${}^{e}{{\varvec{f}}}^{^{\prime}}$ is the vector of external forces reduced to the nodes. The total potential energy of one element can be written in the following form

$${}^{e}{\Pi }_{p}={}^{e}U-{}^{e}W=\frac{1}{2}{}^{e}{{\varvec{u}}}^{{^{\prime}}T} {}^{e}{{\varvec{K}}}^{^{\prime}} {}^{e}{{\varvec{u}}}^{^{\prime}}-{}^{e}{{\varvec{u}}}^{{^{\prime}}T} {}^{e}{{\varvec{f}}}^{^{\prime}}$$

(19)

It can be rewritten in terms of quantities defined in the global coordinate system

$${}^{e}{\Pi }_{p}=\frac{1}{2}{}{}^{e}{{\varvec{u}}}^{T} {}{}^{e}{\varvec{K}} {}{}^{e}{{\varvec{u}}}-{}{}^{e}{{\varvec{u}}}^{T} {}{}^{e}{{\varvec{f}}}$$

(20)

where

$${}^{e}{\varvec{K}}={}_{ }{}^{e}{{\varvec{T}}}^{T} {}_{ }{}^{e}{{\varvec{K}}}^{\mathrm{^{\prime}}} {}_{ }{}^{e}{\varvec{T}}\, \mathrm{ and }\, {}^{e}{\varvec{f}}={}_{ }{}^{e}{{\varvec{T}}}^{T} {}_{ }{}^{e}{{\varvec{f}}}^{\mathrm{^{\prime}}}$$

(21)

Introducing the vector ${\varvec{u}}$ of all nodal displacements and the vector ${\varvec{f}}$ of all nodal loads, the total potential energy of the whole structure is

$${\Pi }_{p}=\frac{1}{2}{{\varvec{u}}}^{T}{\varvec{K}}{\varvec{u}}-{{\varvec{u}}}^{T}{\varvec{f}}$$

(22)

where ${\varvec{K}}$ is the stiffness matrix of the complete structure, assembled from the element matrices according to the rules described in detail in [13, 14].

Many truss structures are built from rods with different cross-sectional properties. These rods can be grouped by their $AE$ product. The $AE$ products can then be factored out of the stiffness matrix ${\varvec{K}}$ introduced in (22), group by group

$${\varvec{K}}={A}_{1}{E}_{1}{{\varvec{K}}}_{1}+{A}_{2}{E}_{2}{{\varvec{K}}}_{2}+\cdots +{A}_{i}{E}_{i}{{\varvec{K}}}_{i}+\cdots +{A}_{{n}_{G}}{E}_{{n}_{G}}{{\varvec{K}}}_{{n}_{G}}=\sum\nolimits_{i=1}^{{n}_{G}}{A}_{i}{E}_{i}{{\varvec{K}}}_{i}$$

(23)

where ${n}_{G}$ is the number of cross-sectional groups, and ${{\varvec{K}}}_{i}$ is the stiffness matrix of the $i$th group with the $AE$ product factored out. If the unknowns of the optimisation are typical cross-section dimensions (for example, the outside diameter $D$ and the wall thickness $t$ of a circular hollow section), the FEM pre-processing needs to be done only once, before the first iteration step of the optimisation.

According to the principle of minimum total potential energy [15, 16], the first variation $\delta {\Pi }_{p}$ of the total potential energy ${\Pi }_{p}$ is zero. After applying the boundary conditions, we obtain the algebraic equation system of FEM

$$\delta {\Pi }_{p}=\delta {{\varvec{u}}}^{T}\frac{\partial {\Pi }_{p}}{\partial {\varvec{u}}}=\delta {{\varvec{u}}}^{T}\left({\varvec{K}}{\varvec{u}}-{\varvec{f}}\right)=0$$

(24)

$${\varvec{u}}={{\varvec{K}}}^{-1} {\varvec{f}}.$$

(25)

The result of Eq. (25) must be post-processed for further calculations. The axial stress of each element is determined by

$${}^{e}\sigma =\frac{{}{}^{e}E}{{}{}^{e}L}\left[\begin{array}{cccc}-{}{}^{e}{T}_{11}& -{}{}^{e}{T}_{12}& {}{}^{e}{T}_{11}& {}{}^{e}{T}_{12}\end{array}\right]{}{}^{e}{\varvec{u}}$$

(26)
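
The element formulation, assembly, solution and stress recovery described in this section can be illustrated with a short NumPy sketch. It is only a minimal example under stated assumptions: the function name `solve_truss`, the two-bar test structure and its numerical values are hypothetical, the direction cosines are computed from the node coordinates, all elements share one elastic modulus, and the linear system of Eq. (25) is solved with `numpy.linalg.solve` instead of forming the inverse explicitly.

```python
import numpy as np


def solve_truss(nodes, elems, areas, E, fixed_dofs, loads):
    """Planar truss: return nodal displacements u and element stresses sigma."""
    nodes = np.asarray(nodes, dtype=float)
    loads = np.asarray(loads, dtype=float)
    n_dof = 2 * len(nodes)
    K = np.zeros((n_dof, n_dof))
    geometry = []
    for e, (i, j) in enumerate(elems):
        d = nodes[j] - nodes[i]
        L = np.linalg.norm(d)
        c, s = d / L                                    # direction cosines
        T = np.array([[c, s, 0.0, 0.0],
                      [0.0, 0.0, c, s]])                # Eq. (12)
        K_local = areas[e] * E / L * np.array([[1.0, -1.0],
                                               [-1.0, 1.0]])  # Eq. (17)
        dofs = [2 * i, 2 * i + 1, 2 * j, 2 * j + 1]
        K[np.ix_(dofs, dofs)] += T.T @ K_local @ T      # Eq. (21), assembly
        geometry.append((dofs, L, c, s))

    # Boundary conditions: solve only for the free degrees of freedom.
    free = np.setdiff1d(np.arange(n_dof), fixed_dofs)
    u = np.zeros(n_dof)
    u[free] = np.linalg.solve(K[np.ix_(free, free)], loads[free])  # Eq. (25)

    # Post-processing, Eq. (26).
    sigma = np.array([E / L * (np.array([-c, -s, c, s]) @ u[dofs])
                      for dofs, L, c, s in geometry])
    return u, sigma


# Hypothetical two-bar truss (mm, N, MPa): supports at nodes 0 and 1,
# vertical load of 1000 N pulling node 2 downwards.
nodes = [(0.0, 0.0), (2000.0, 0.0), (1000.0, 1000.0)]
elems = [(0, 2), (1, 2)]
u, sigma = solve_truss(nodes, elems, areas=[100.0, 100.0], E=210000.0,
                       fixed_dofs=[0, 1, 2, 3],
                       loads=[0.0, 0.0, 0.0, 0.0, 0.0, -1000.0])
```

For this symmetric example both bars are inclined at 45° and carry the same compressive force, so the computed stresses can be checked by hand from nodal equilibrium.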

4 The Optimisation Problem

The optimisation of truss structures is a constrained optimisation problem

$$\begin{array}{cc}\mathrm{min}. f({\varvec{x}})& {\varvec{x}}={\left[\begin{array}{cccc}{x}_{1}& {x}_{2}& \cdots & {x}_{D}\end{array}\right]}^{T}\in {\mathbb{R}}^{D}\\ {g}_{i}({\varvec{x}})\le 1& 1\le i\le q\\ {h}_{j}\left({\varvec{x}}\right)=0& 1\le j\le r\end{array}$$

(27)

where ${\varvec{x}}$ is the vector of unknowns (in this paper, the vector of typical cross-section dimensions), $f({\varvec{x}})$ is the objective function to be minimised, ${g}_{i}({\varvec{x}})$ are the inequality constraints, ${h}_{j}\left({\varvec{x}}\right)$ are the equality constraints, and $q$ and $r$ are the numbers of inequality and equality constraints, respectively.

In this paper, the objective function of the optimisation is the weight of the structure

$$f\left(x\right)=\rho \sum\nolimits_{e=1}^{{n}_{e}}{}_{ }{}^{e}A {}_{ }{}^{e}L$$

(28)

where ${n}_{e}$ is the number of truss elements and $\rho$ is the density of steel.

The structure must meet strength and stability requirements. In the present case, three criteria are checked: resistance to tensile stress for rods in tension, overall buckling for rods in compression, and local buckling. The cross-sectional utilisation factor characterises these criteria well.

The tensile and compressive resistance of the rods can be expressed by a single inequality constraint if the stress caused by the load is taken with its sign: negative stress means compression, while positive stress means tension.

$${g}_{Ii}=\left\{\begin{array}{cc}\frac{{\gamma }_{M0}\left|{}{}^{e}\sigma \right|}{\chi {f}_{y}}\le 1& {}{}^{e}\sigma <0\\ \frac{{\gamma }_{M0}\left|{}{}^{e}\sigma \right|}{{f}_{y}}\le 1& {}{}^{e}\sigma \ge 0\end{array}\right.$$

(29)

where ${f}_{y}$ is yield strength, ${\gamma }_{M0}$ is a safety factor according to [17], and $\chi$ is buckling factor also according to [17]

$$\chi =\left\{\begin{array}{cc}1& \overline{\lambda }\le 0.2\\ \frac{1}{\phi +\sqrt{{\phi }^{2}-{\overline{\lambda }}^{2}}}& \overline{\lambda }>0.2\end{array}\right.$$

(30)

where $\phi$ is a factor

$$\phi =0.5\left(1+0.21\left(\overline{\lambda }-0.2\right)+{\overline{\lambda }}^{2}\right)$$

(31)

$\overline{\lambda }$ is the non-dimensional slenderness

$$\overline{\lambda }=\frac{kL}{\pi }\sqrt{\frac{A}{{I}_{x}}}\sqrt{\frac{{f}_{y}}{E}}$$

(32)

where ${I}_{x}$ is the second moment of area of the cross-section and $k$ is the buckling length factor, which is $k=1$ for intermediate bars and $k=0.7$ for clamped bars.
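
As a worked illustration, the buckling factor can be evaluated with a few lines of Python. The sketch assumes the usual EN 1993-1-1 expressions: the non-dimensional slenderness follows from the Euler critical load, $\overline{\lambda }=\frac{kL}{\pi }\sqrt{A/{I}_{x}}\sqrt{{f}_{y}/E}$, and the reduction factor is $\chi =1/\left(\phi +\sqrt{{\phi }^{2}-{\overline{\lambda }}^{2}}\right)$, capped at 1. The imperfection factor 0.21 is the one appearing in Eq. (31); the function names and the tube dimensions are assumptions chosen for the example.

```python
import math


def relative_slenderness(k, L, A, I, f_y, E):
    """Non-dimensional slenderness based on the Euler critical load."""
    return (k * L / math.pi) * math.sqrt(A / I) * math.sqrt(f_y / E)


def buckling_factor(lam_bar, alpha=0.21):
    """Reduction factor chi; alpha = 0.21 is the value used in Eq. (31)."""
    if lam_bar <= 0.2:
        return 1.0
    phi = 0.5 * (1.0 + alpha * (lam_bar - 0.2) + lam_bar ** 2)
    return min(1.0, 1.0 / (phi + math.sqrt(phi ** 2 - lam_bar ** 2)))


# Hypothetical circular hollow section: D = 100 mm, t = 5 mm, L = 2000 mm,
# f_y = 355 MPa, E = 210000 MPa, pinned at both ends (k = 1).
D, t = 100.0, 5.0
d_inner = D - 2.0 * t
A = math.pi / 4.0 * (D ** 2 - d_inner ** 2)       # cross-sectional area
I = math.pi / 64.0 * (D ** 4 - d_inner ** 4)      # second moment of area
lam_bar = relative_slenderness(1.0, 2000.0, A, I, 355.0, 210000.0)
chi = buckling_factor(lam_bar)
```

With this form of the formula the factor is continuous at $\overline{\lambda }=0.2$, where both branches give $\chi =1$.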

The limit of local buckling depends on the shape of the cross-section, so a different formula must be used for each shape [11]. Here, we use the local buckling limit of circular hollow sections

$${g}_{IIi}=\frac{D{f}_{y}}{21150t}\le 1$$

(33)

This formula is valid only if the yield stress ${f}_{y}$ is given in $\mathrm{MPa}$ and the diameter $D$ and wall thickness $t$ are given in $\mathrm{mm}$.

Using Eqs. (28), (29) and (33), the fitness function to be optimised is

$$\mathcal{F}\left({\varvec{x}}\right)= \rho \sum\nolimits_{e=1}^{{n}_{e}}{}{}^{e}A {}{}^{e}L+\sum\nolimits_{i=1}^{{n}_{e}}p\left({g}_{Ii}\left({\varvec{x}}\right)\right)+\sum\nolimits_{i=1}^{{n}_{G}}p\left({g}_{IIi}\left({\varvec{x}}\right)\right)$$

(34)

where $p$ is the static penalty function

$$p\left({\varvec{x}}\right)=\left\{\begin{array}{cc}0& g\left({\varvec{x}}\right)\le 1\\ {10}^{6}g({\varvec{x}})& g\left({\varvec{x}}\right)>1\end{array}\right.$$

(35)

and ${\varvec{x}}$ is the vector of unknowns (the vector of independent variables). For example, in the case of circular tubes, the unknowns are the characteristic dimensions of the cross-sections

$${\varvec{x}}={\left[\begin{array}{cccccccc}{D}_{1}& {D}_{2}& \cdots & {D}_{{n}_{G}}& {t}_{1}& {t}_{2}& \cdots & {t}_{{n}_{G}}\end{array}\right]}^{T}$$

(36)
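
The penalised fitness of Eq. (34) can be sketched as follows. This is a simplified illustration in which the structural weight and the constraint values are passed in as already computed numbers; the function names and all numerical values are assumptions made for the example.

```python
def local_buckling_utilisation(D, t, f_y):
    """Eq. (33); D and t in mm, f_y in MPa."""
    return D * f_y / (21150.0 * t)


def penalty(g, weight=1.0e6):
    """Static penalty, Eq. (35): zero if the constraint g <= 1 is satisfied."""
    return 0.0 if g <= 1.0 else weight * g


def fitness(structure_weight, g_strength, g_local):
    """Eq. (34): weight plus the penalties of all constraint values."""
    return (structure_weight
            + sum(penalty(g) for g in g_strength)
            + sum(penalty(g) for g in g_local))


# Hypothetical values: one cross-sectional group, two rods.
g_local = [local_buckling_utilisation(D=100.0, t=5.0, f_y=355.0)]
feasible = fitness(120.0, [0.80, 0.95], g_local)
infeasible = fitness(120.0, [0.80, 1.20], g_local)
```

A single violated constraint raises the fitness by several orders of magnitude, so the selection step of the differential evolution discards infeasible designs whenever a feasible alternative exists.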

5 Parallelisation with CUDA and MATLAB

NVIDIA offers the CUDA Driver API [18] and the CUDA Runtime API [19] for programming its graphics cards for general-purpose computation. There are many types of graphics cards on the market, with different computation capabilities and performance. The code containing our custom calculation must be scalable [20], and it should automatically detect the capabilities of the hardware used [21]. Implementing this feature is sometimes more challenging than implementing the custom calculation itself. As an intermediate layer between CUDA and our code, MATLAB offers much simpler possibilities for implementing parallel computation [22]. However, this ease of use comes at a price: the increase in computation speed will not be as great as with native CUDA alone.

MATLAB provides a rich toolset and many features for operations on vectors and matrices. It offers many ways to rewrite loop-based, scalar-oriented operations as vector-matrix operations. This process is called “vectorisation”.

To illustrate the differences between the two types of operations, let the population be given as follows for circular hollow section tubes

$${\varvec{X}}=\left[\begin{array}{cccccc}{D}_{1,1}& {D}_{1,2}& \cdots & {D}_{1,j}& \cdots & {D}_{1,{n}_{p}}\\ {D}_{2,1}& {D}_{2,2}& \cdots & {D}_{2,j}& \cdots & {D}_{2,{n}_{p}}\\ \vdots & \vdots & & \vdots & & \vdots \\ {D}_{i,1}& {D}_{i,2}& \cdots & {D}_{i,j}& \cdots & {D}_{i,{n}_{p}}\\ \vdots & \vdots & & \vdots & & \vdots \\ {D}_{{n}_{G},1}& {D}_{{n}_{G},2}& \cdots & {D}_{{n}_{G},j}& \cdots & {D}_{{n}_{G},{n}_{p}}\\ {t}_{1,1}& {t}_{1,2}& \cdots & {t}_{1,j}& \cdots & {t}_{1,{n}_{p}}\\ {t}_{2,1}& {t}_{2,2}& \cdots & {t}_{2,j}& \cdots & {t}_{2,{n}_{p}}\\ \vdots & \vdots & & \vdots & & \vdots \\ {t}_{i,1}& {t}_{i,2}& \cdots & {t}_{i,j}& \cdots & {t}_{i,{n}_{p}}\\ \vdots & \vdots & & \vdots & & \vdots \\ {t}_{{n}_{G},1}& {t}_{{n}_{G},2}& \cdots & {t}_{{n}_{G},j}& \cdots & {t}_{{n}_{G},{n}_{p}}\end{array}\right]=\left[\begin{array}{c}{\varvec{D}}\\ {\varvec{t}}\end{array}\right]$$

(37)

where ${D}_{i,j}$ is the diameter of the ${i}^{th}$ cross-sectional group for the ${j}^{th}$ individual in the population, and ${t}_{i,j}$ is the wall thickness of the ${i}^{th}$ cross-sectional group for the ${j}^{th}$ individual. For example, the loop-based, scalar-oriented implementation of Eq. (33) can be seen in Listing 1.

In contrast, Listing 2 implements the same equation in vectorised form.

The striking difference between the two code snippets is that the latter is much shorter and more transparent. Sometimes a scalar-oriented operation cannot be vectorised with element-wise operations (such as .*, ./, .^, etc.). In such cases, arrayfun() can be a good tool. The scalar operation in the loop body must be organised into a separate function (see Listing 3); arrayfun() then calls this function once for each element of the vector or matrix passed as a parameter.
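
The same contrast can be illustrated outside MATLAB. The following NumPy sketch is not a reproduction of the MATLAB listings of this paper; it is an analogous example that evaluates Eq. (33) for a whole population in three ways: with nested loops, with element-wise array operations, and with `numpy.vectorize`, which, like arrayfun(), calls a scalar function once per element. The function names and the random test population are assumptions made for the example.

```python
import numpy as np


def local_buckling_loop(D, t, f_y):
    """Loop-based, scalar-oriented evaluation of Eq. (33)."""
    n_g, n_p = D.shape
    g = np.empty((n_g, n_p))
    for i in range(n_g):          # cross-sectional groups
        for j in range(n_p):      # individuals of the population
            g[i, j] = D[i, j] * f_y / (21150.0 * t[i, j])
    return g


def local_buckling_vectorised(D, t, f_y):
    """Vectorised evaluation: one element-wise expression for all entries."""
    return D * f_y / (21150.0 * t)


def local_buckling_scalar(D_ij, t_ij, f_y):
    """Scalar kernel applied element by element below."""
    return D_ij * f_y / (21150.0 * t_ij)


# Element-by-element wrapper around the scalar kernel.
local_buckling_elementwise = np.vectorize(local_buckling_scalar)

# Hypothetical population: n_G = 3 groups, n_p = 8 individuals (mm, MPa).
rng = np.random.default_rng(0)
D = rng.uniform(60.0, 300.0, size=(3, 8))
t = rng.uniform(3.0, 12.0, size=(3, 8))
g_loop = local_buckling_loop(D, t, 355.0)
g_vec = local_buckling_vectorised(D, t, 355.0)
g_elem = local_buckling_elementwise(D, t, 355.0)
```

All three variants return the same matrix of constraint values; only the vectorised form expresses the calculation as a single array operation that a library can distribute over many threads.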

MATLAB also provides tools for vectorising operations on multidimensional arrays using “page-wise” functions and operations. For reasons of length, these are not detailed in this paper; detailed descriptions can be found in [22].

MATLAB always runs the loop-based approach, as illustrated in Listing 1, on a single thread. The situation is different with vectorised operations. MATLAB can automatically detect repetitive operations in which only the data to be processed changes, and it automatically discovers the capabilities of the runtime environment to perform them on multiple threads. In the simplest case, on multi-core processors, it automatically runs them on multiple cores in parallel unless configured otherwise. This automation also works for GPUs if all variables in the expression are of type gpuArray. MATLAB then automatically creates the required kernel functions from the expressions and starts them on as many threads as required and possible, considering the capabilities of the GPU.

All the expressions and functions presented in the previous sections are easy to vectorise. This allows the complete optimisation (evolutionary algorithm, FEM solver and fitness function calculation) to be computed in parallel on the GPU. If all steps and operations are calculated on the GPU, the host machine only manages them; it is enough to move data between the host and the GPU at the beginning and end of the optimisation.

6 Comparison of Sequential and Parallel Optimisation

The dimensionless computation speed up between sequential and parallel processing is defined as follows

$$\nabla =\frac{{t}_{seq}}{{t}_{par}}$$

(38)

where ${t}_{seq}$ is the average computation time of the iteration steps using only sequential processing, and ${t}_{par}$ is the average computation time of the iteration steps using only parallel processing. To measure ${t}_{seq}$, we used a single CPU thread; to measure ${t}_{par}$, we used as many threads as possible on a GeForce GTX 1050 Ti graphics card.
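
As a small numerical illustration of Eq. (38), the speed up can be computed from two lists of per-iteration timings. The timings below are made-up values for the example, not measurements from this paper, and the function name is an assumption.

```python
from statistics import mean


def speed_up(t_seq, t_par):
    """Eq. (38): ratio of mean sequential to mean parallel iteration time."""
    return mean(t_seq) / mean(t_par)


# Made-up per-iteration timings in seconds, for illustration only.
t_seq = [0.80, 0.82, 0.78]
t_par = [0.21, 0.19, 0.20]
ratio = speed_up(t_seq, t_par)
```

A ratio above 1 means that the parallel run is faster on average; a ratio below 1 means that the overhead of parallel execution outweighs its benefit.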

The structure shown in Fig. 2 was optimised to determine the speed up defined above. This is a truss structure with deltoid-shaped stiffeners. The applied loads were ${F}_{1}=332.94\, \mathrm{kN}$, ${F}_{2}=437.46\, \mathrm{kN}$ and ${F}_{3}=338.08\, \mathrm{kN}$. Nodes 1 and 7 were fixed, meaning that no displacement is allowed at these points. All rods had a circular tube cross-section, and we optimised the outside diameter and wall thickness of the tubes according to Eq. (36). The rods of the structure were divided into three cross-sectional groups. The first group contains rods 1–10. The horizontal rods (11–16) are in the second group. Finally, the third group contains the rods of the deltoid shape (17–26).

We ran the optimisation with different numbers of individuals in the population used by self-adaptive differential evolution (SaDE) [10]. The dimensionless speed up achieved for the different population sizes ${n}_{p}$ is illustrated in Fig. 3. We did not examine the quality of the optima in this paper; we examined only the difference in computation time.

7 Conclusion

This paper presents an evolutionary algorithm, differential evolution, coupled with the finite element method for optimising truss-like structures subject to static stress, overall buckling and local buckling constraints. This is a powerful approach for optimising any truss structure automatically.

Evolutionary optimisation is a population-based, iterative numerical method. The fitness function must therefore be calculated many times, which can be a resource-demanding and time-consuming task. One way to speed up the calculations is parallel computation on a GPU, for which MATLAB offers user-friendly methods and tools. We have analysed the dimensionless speed up of the optimisation achieved with these MATLAB tools.

The achievable speed up depends on the size of the population. The speed up increases approximately exponentially as a function of population size (see Fig. 3). If the population size is small, there is no reason for parallelisation.

In future work, it would be interesting to examine the speed up as a function of the number of elements and the number of nodes, with fixed and varied population sizes.

References

1. Xia, Z., Wang, Y., Wang, Q., Mei, C.: GPU parallel strategy for parameterized LSM-based topology optimization using isogeometric analysis. Struct. Multidiscip. Optim. 56(2), 413–434 (2017). https://doi.org/10.1007/s00158-017-1672-x
2. Wang, J., Zhang, D., Luo, M., Zhang, Y.: A GPU-based tool parameters optimization and tool orientation control method for four-axis milling with ball-end cutter. Int. J. Adv. Manuf. Technol. 102(5–8), 1107–1125 (2018). https://doi.org/10.1007/s00170-018-2954-1
3. Yang, X.S.: Flower pollination algorithm for global optimisation. In: Durand-Lose, J., Nataša, J. (eds.) Unconventional Computation and Natural Computation, pp. 240–249. Springer, Berlin, Heidelberg (2012)
4. Xie, X.F., Zhang, W.J., Yang, Z.L.: Adaptive particle swarm optimisation on individual level. In: Proceedings of the 6th International Conference on Signal Processing, 2002, vol. 2, pp. 1215–1218 (2002). https://doi.org/10.1109/ICOSP.2002.1180009
5. Yang, X.S.: Nature-Inspired Optimization Algorithms, 2nd edn. Academic Press, London (2021). https://doi.org/10.1016/C2019-0-03762-4
6. Lee, D.S., Gonzalez, L.F., Srinivas, K., Periaux, J.: Robust evolutionary algorithms for UAV/UCAV aerodynamic and RCS design optimisation. Comput. Fluids 37(5), 547–564 (2008). https://doi.org/10.1016/j.compfluid.2007.07.008
7. Tey, J.Y., Rahizar, R.: Handling performance optimisation for formula vehicle using multi-objectives evolutionary algorithms. Veh. Syst. Dyn. 58(12), 1823–1838 (2020). https://doi.org/10.1080/00423114.2019.1645861
8. Galván-López, E., Curran, T., McDermott, J., Carroll, P.: Design of an autonomous intelligent demand-side management system using stochastic optimisation evolutionary algorithms. Neurocomputing 170, 270–285 (2015). https://doi.org/10.1016/j.neucom.2015.03.093
9. Storn, R., Price, K.: Differential evolution – a simple and efficient heuristic for global optimisation over continuous spaces. J. Global Optim. 11, 341–359 (1997). https://doi.org/10.1023/A:1008202821328
10. Qin, A.K., Suganthan, P.N.: Self-adaptive differential evolution algorithm for numerical optimisation. In: Proceedings of the IEEE Congress on Evolutionary Computation, vol. 2, pp. 1785–1791 (2005). https://doi.org/10.1109/CEC.2005.1554904
11. Wardenier, J., Kurobane, Y., Packer, J.A., van der Vegte, G.J., Zhao, X.-L.: Design Guide for Circular Hollow Section (CHS) Joints Under Predominantly Static Loading, 2nd edn. CIDECT, Zürich (2008)
12. Wardenier, J., Kurobane, Y., Packer, J.A., van der Vegte, G.J., Zhao, X.-L.: Design Guide for Rectangular Hollow Section (RHS) Joints Under Predominantly Static Loading, 2nd edn. CIDECT, Zürich (2008)
13. Ferreira, A.J.M., Fantuzzi, N.: MATLAB Codes for Finite Element Analysis. Springer Cham, Heidelberg (2020). https://doi.org/10.1007/978-3-030-47952-7
14. Smith, I.M., Lee, M.: Programming the Finite Element Method, 5th edn. John Wiley and Sons Ltd, London (2013)
15. Páczelt, I.: Finite element method in engineering practice (in Hungarian). Miskolci Egyetemi Kiadó, Miskolc (1999)
16. Páczelt, I., Baksa, A., Szabó, T.: Fundamentals of the finite element method (in Hungarian). HEFOP jegyzet, Miskolc (2007)
17. EN 1993-1-1: Eurocode 3: Design of steel structures - Part 1-1: General rules and rules for buildings. European Committee for Standardization, Brussels (2009)
18. CUDA Driver API documentation. https://docs.nvidia.com/cuda/cuda-driver-api/index.html. Accessed 01 Mar 2022
19. CUDA Runtime API documentation. https://docs.nvidia.com/cuda/cuda-runtime-api/index.html. Accessed 01 Mar 2022
20. Cheng, J.: Professional CUDA C Programming. John Wiley & Sons, Hoboken (2014)
21. Sanders, J., Kandrot, E.: CUDA by Example: An Introduction to General-Purpose GPU Programming. Pearson Education, Boston (2010)
22. Help center for MATLAB, Simulink and other MathWorks products. https://uk.mathworks.com/help/index.html?s_tid=CRUX_lftnav. Accessed 01 Mar 2022

Acknowledgements

The research was supported by the Hungarian National Research, Development and Innovation Office—NKFIH under the project number K 134358.

Author information

Authors and Affiliations

Emerson Automation FCP Kft., Eger, 3300, Hungary
Szilárd Nagy
University of Miskolc, Miskolc, 3515, Hungary
Szilárd Nagy, Károly Jármai & Attila Baksa

Corresponding author

Correspondence to Szilárd Nagy.

Editor information

Editors and Affiliations

Faculty of Mechanical Engineering and Informatics, Institute of Energy Engineering and Chemical Machinery, University of Miskolc, Miskolc, Hungary
Károly Jármai
Faculty of Mechanical Engineering and Informatics, Institute of Logistics, University of Miskolc, Miskolc, Hungary
Ákos Cservenák

About this paper

Cite this paper

Nagy, S., Jármai, K., Baksa, A. (2023). Combination of GPU Programming and FEM Analysis in Structural Optimisation. In: Jármai, K., Cservenák, Á. (eds) Vehicle and Automotive Engineering 4. VAE 2022. Lecture Notes in Mechanical Engineering. Springer, Cham. https://doi.org/10.1007/978-3-031-15211-5_63

DOI: https://doi.org/10.1007/978-3-031-15211-5_63
Published: 10 September 2022
Publisher Name: Springer, Cham
Print ISBN: 978-3-031-15210-8
Online ISBN: 978-3-031-15211-5
eBook Packages: Engineering, Engineering (R0)
