1 Introduction

Nowadays, graphics cards (video cards, GPUs) are cheap and efficient hardware for general-purpose parallel computation. They are used for scientific computations, for topology optimisation [1] or structural optimisation [2], and for manufacturing technologies. They provide a massively parallel environment that supports the single instruction, multiple data (SIMD) programming model. Larger software vendors, such as MathWorks, are increasingly developing frameworks based on the CUDA API (application programming interface) that offer more convenient and user-friendly tools than the original CUDA Runtime API.

Nature-inspired, population-based, iterative evolutionary algorithms, such as the flower pollination algorithm [3], particle swarm optimisation [4] and the firefly algorithm [5], are powerful numerical optimisation methods. Their importance and effectiveness are underlined by their use in several areas of vehicle research: designing optimal aerodynamics for UAVs (unmanned aerial vehicles) [6], optimising the performance of formula vehicles [7], optimising the manufacturing of vehicles [8], etc.

The finite element method (FEM) is a universal tool for analysing structures; it determines the mechanical stresses and deformations inside the structure. In this paper, we connect an evolutionary algorithm, differential evolution, with FEM. The method is presented through the optimisation of a truss structure. Both the evolutionary method and FEM demand considerable computational capacity. Therefore, we present a possible parallelisation using MATLAB, together with the results obtained.

2 Differential Evolution

Storn and Price introduced the original differential evolution (DE) algorithm in [1]. DE improves the \({n}_{D}\)-dimensional individuals \({\varvec{x}}\) of a population of \({n}_{p}\) elements through a series of iteration steps

$${\varvec{x}}={\left[\begin{array}{ccccc}{x}_{1}& {x}_{2}& {x}_{3}& \cdots & {x}_{{n}_{D}}\end{array}\right]}^{T}\in S\subset {\mathbb{R}}^{{n}_{D}}$$
(1)

where \(S\) is the search space. Ideally, the initial population randomly covers the entire search space: each variable of an individual is a uniformly distributed random number within the search space.
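The random initialisation can be sketched as follows. This is a minimal Python illustration, not the MATLAB code used in the paper; it assumes the search space is a box with per-variable bounds, and the function name `init_population` and the bound values are hypothetical.

```python
import random

def init_population(lower, upper, n_p, rng):
    """Draw n_p individuals uniformly from the box given by lower/upper bounds."""
    return [
        [lo + rng.random() * (hi - lo) for lo, hi in zip(lower, upper)]
        for _ in range(n_p)
    ]

rng = random.Random(0)     # fixed seed so the sketch is repeatable
lower = [50.0, 2.0]        # hypothetical bounds, e.g. diameter and wall thickness
upper = [300.0, 20.0]
population = init_population(lower, upper, n_p=6, rng=rng)
```

Each inner list is one individual with \({n}_{D}=2\) variables; a larger `n_p` simply adds more rows.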

DE generates new individuals in each iteration step by performing three operations in sequence: mutation, crossover, and selection [9].

During the mutation operation, a \({}^{G}{{\varvec{v}}}_{i}\) mutant is generated for each \({}^{G}{{\varvec{x}}}_{i}\) individual of \(G\) generation using one of the following five strategies [10]:

  • DE/rand/1:

    $${}^{G}{{\varvec{v}}}_{i}={}{}^{G}{{\varvec{x}}}_{{r}_{1}}+F\left({}{}^{G}{{\varvec{x}}}_{{r}_{2}}-{}{}^{G}{{\varvec{x}}}_{{r}_{3}}\right)$$
    (2)
  • DE/best/1:

    $${}^{G}{{\varvec{v}}}_{i}={}_{ }{}^{G}{{\varvec{x}}}_{b}+F\left({}_{ }{}^{G}{{\varvec{x}}}_{{r}_{1}}-{}_{ }{}^{G}{{\varvec{x}}}_{{r}_{2}}\right)$$
    (3)
  • DE/current to best/2:

    $${}^{G}{{\varvec{v}}}_{i}={}{}^{G}{{\varvec{x}}}_{i}+F\left({}{}^{G}{{\varvec{x}}}_{b}-{}{}^{G}{{\varvec{x}}}_{i}\right)+F\left({}{}^{G}{{\varvec{x}}}_{{r}_{1}}-{}{}^{G}{{\varvec{x}}}_{{r}_{2}}\right)$$
    (4)
  • DE/best/2:

    $${}^{G}{{\varvec{v}}}_{i}={}{}^{G}{{\varvec{x}}}_{b}+F\left({}{}^{G}{{\varvec{x}}}_{{r}_{1}}-{}{}^{G}{{\varvec{x}}}_{{r}_{2}}\right)+F\left({}{}^{G}{{\varvec{x}}}_{{r}_{3}}-{}{}^{G}{{\varvec{x}}}_{{r}_{4}}\right)$$
    (5)
  • DE/rand/2:

    $${}^{G}{{\varvec{v}}}_{i}={}{}^{G}{{\varvec{x}}}_{{r}_{1}}+F\left({}{}^{G}{{\varvec{x}}}_{{r}_{2}}-{}{}^{G}{{\varvec{x}}}_{{r}_{3}}\right)+F\left({}{}^{G}{{\varvec{x}}}_{{r}_{4}}-{}{}^{G}{{\varvec{x}}}_{{r}_{5}}\right)$$
    (6)

    where \({r}_{1}\ne {r}_{2}\ne {r}_{3}\ne {r}_{4}\ne {r}_{5}\in \left[1,{n}_{p}\right]\) are mutually distinct random indices, \(F\in [0,2)\) is the scaling factor, and \({}^{G}{{\varvec{x}}}_{b}\) is the individual with the best fitness value in generation \(G\).
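The five strategies of Eqs. (2)-(6) differ only in the base vector and in the number of difference vectors, which the following Python sketch makes explicit. It is an illustration under assumptions: the function name `mutate` and the strategy labels are hypothetical, a smaller fitness is taken to be better, and the random indices are also kept different from \(i\), which is the usual DE convention but is not stated above.

```python
import random

def mutate(population, fitness, i, strategy, F, rng):
    """Return the mutant v_i for individual i using one of Eqs. (2)-(6)."""
    n_p, n_d = len(population), len(population[i])
    x = population
    # Five mutually distinct random indices; excluding i itself is an
    # assumption (usual DE convention). Requires n_p >= 6.
    r1, r2, r3, r4, r5 = rng.sample([k for k in range(n_p) if k != i], 5)
    b = min(range(n_p), key=lambda k: fitness[k])   # best individual (minimisation)

    def combine(base, *pairs):
        # base + F * (sum of the given difference vectors)
        return [base[j] + F * sum(p[j] - q[j] for p, q in pairs) for j in range(n_d)]

    if strategy == "rand/1":
        return combine(x[r1], (x[r2], x[r3]))
    if strategy == "best/1":
        return combine(x[b], (x[r1], x[r2]))
    if strategy == "current-to-best":
        return combine(x[i], (x[b], x[i]), (x[r1], x[r2]))
    if strategy == "best/2":
        return combine(x[b], (x[r1], x[r2]), (x[r3], x[r4]))
    if strategy == "rand/2":
        return combine(x[r1], (x[r2], x[r3]), (x[r4], x[r5]))
    raise ValueError(f"unknown strategy: {strategy}")

rng = random.Random(1)
population = [[float(k), float(2 * k)] for k in range(6)]   # toy individuals
fitness = [sum(c * c for c in ind) for ind in population]   # toy fitness values
v = mutate(population, fitness, i=3, strategy="best/1", F=0.5, rng=rng)
```

With \(F=0\) every strategy returns its base vector unchanged, which is a quick sanity check of an implementation.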

The mutation is followed by a “binomial” crossover operation that combines the newly created \({}^{G}{{\varvec{v}}}_{i}\) mutant with the \({}^{G}{{\varvec{x}}}_{i}\) individual

$${}^{G}{{\varvec{u}}}_{j,i}=\left\{\begin{array}{cc}{}{}^{G}{{\varvec{v}}}_{j,i}& {U}_{j}\left(\mathrm{0,1}\right)\le {C}_{R}\, \mathrm{or}\, j={j}_{R}\\ {}{}^{G}{{\varvec{x}}}_{j,i}& \mathrm{otherwise}\end{array}\right.$$
(7)

where \({U}_{j}(0,1)\in [0,1)\) is a uniformly distributed random number, \({C}_{R}\in [0,1)\) is the crossover rate, and \({j}_{R}\in [1,{n}_{D}]\) is a random index that guarantees that at least one component is taken from the mutant.
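A minimal Python sketch of the binomial crossover of Eq. (7) is given below. The function name `binomial_crossover` and the sample vectors are illustrative assumptions; indices are zero-based here, unlike in the equation.

```python
import random

def binomial_crossover(x, v, c_r, rng):
    """Eq. (7): build the trial vector u from individual x and mutant v."""
    n_d = len(x)
    j_r = rng.randrange(n_d)     # this component is always taken from the mutant
    return [
        v[j] if (rng.random() <= c_r or j == j_r) else x[j]
        for j in range(n_d)
    ]

rng = random.Random(0)
x = [1.0, 2.0, 3.0, 4.0]
v = [10.0, 20.0, 30.0, 40.0]
u = binomial_crossover(x, v, c_r=0.5, rng=rng)
```

Every component of `u` comes either from `x` or from `v`, and the component `j_r` always comes from `v`, so the trial vector never equals the parent in all components.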

During selection, if the fitness value of the newly generated \({}^{G}{{\varvec{u}}}_{i}\) is not worse than that of \({}^{G}{{\varvec{x}}}_{i}\), it is included in the new generation; otherwise the algorithm discards it

$${}^{G+1}{{\varvec{x}}}_{i}=\left\{\begin{array}{cc}{}{}^{G}{{\varvec{u}}}_{i}& \mathcal{F}({}{}^{G}{{\varvec{u}}}_{i})\le \mathcal{F}({}{}^{G}{{\varvec{x}}}_{i})\\ {}{}^{G}{{\varvec{x}}}_{i}& \mathrm{otherwise}\end{array}\right.$$
(8)
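The greedy, index-wise selection of Eq. (8) can be sketched in a few lines of Python. The names `select` and `sphere` are hypothetical, and the sphere function is only a toy fitness for the illustration.

```python
def select(population, trials, fitness_fn):
    """Eq. (8): keep trial u_i if its fitness is not worse than that of x_i."""
    return [u if fitness_fn(u) <= fitness_fn(x) else x
            for x, u in zip(population, trials)]

def sphere(z):
    """Toy fitness to be minimised (illustration only)."""
    return sum(c * c for c in z)

population = [[2.0, 2.0], [1.0, 0.0], [0.5, 0.5]]
trials = [[1.0, 1.0], [3.0, 0.0], [0.5, -0.5]]
next_generation = select(population, trials, sphere)
```

Because each individual is only ever replaced by a trial that is at least as good, the best fitness in the population cannot deteriorate from one generation to the next.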

The operation of differential evolution, and hence the success of the optimisation, is greatly influenced by the chosen mutation strategy, the value of the scaling factor \(F\), and the crossover rate \({C}_{R}\).

3 Finite Element Model of Truss Structure

The connection between members of tubular trusses is frequently modelled as a pin connection in elastic analysis. The preferred limit on the eccentricity of the intersection of the members' centre lines is [1, 12].

$$e\le 0.25D\, \mathrm{or}\, e\le 0.25{H}_{0}$$
(9)

where \(e\) is the eccentricity, \(D\) is the outside diameter of a circular hollow section, and \({H}_{0}\) is a typical size of a rectangular hollow section. In this case, primary bending moments are produced by these eccentricities. Excessive moments are generated in brace members when rigid connections are considered; using rigid connections is therefore not recommended, even for welded joints [1, 12]. The axial force distribution with rigid joints is similar to that with pinned joints.

If condition (9) is met, the structure can be analysed by the finite element method (FEM) with a tension-compression element model (hereafter the rod or truss model). In most cases, it is sufficient to examine the structure in the plane relevant to the load. If this is insufficient and a spatial analysis has to be performed, the presented method can easily be adapted to the spatial case. In this paper, we discuss only planar problems.

Fig. 1. Truss element model.

The model of the truss element is shown in Fig. 1. The displacement within the rod element is approximated with a kinematically admissible function [13]

$${}^{e}u\left(\xi \right)=\left[\begin{array}{cc}\frac{{\xi }_{j}-\xi }{{}^{e}L}& \frac{\xi -{\xi }_{i}}{{}^{e}L}\end{array}\right]\left[\begin{array}{c}{}^{e}{u}_{i}^{\prime}\\ {}^{e}{u}_{j}^{\prime}\end{array}\right]=\left[\begin{array}{cc}{}^{e}{N}_{i}(\xi )& {}^{e}{N}_{j}(\xi )\end{array}\right]\left[\begin{array}{c}{}^{e}{u}_{i}^{\prime}\\ {}^{e}{u}_{j}^{\prime}\end{array}\right]={}^{e}{\varvec{N}}\, {}^{e}{{\varvec{u}}}^{\prime}$$
(10)

where \({}^{e}L\) is the length of the rod element, \({}^{e}{\varvec{N}}\) is the matrix of shape functions, and \({}^{e}{{\varvec{u}}}^{\prime}\) is the vector of nodal displacements expressed in the element-local \(\xi\) coordinate system. In the global \(x-y\) coordinate system, the nodal displacements can be described in the following form

$${}^{e}{\varvec{u}}={\left[\begin{array}{cccc}{}_{ }{}^{e}{u}_{ix}& {}_{ }{}^{e}{u}_{iy}& {}_{ }{}^{e}{u}_{jx}& {}_{ }{}^{e}{u}_{jy}\end{array}\right]}^{T}.$$
(11)

The transformation between the two coordinate systems can be made with the transformation matrix

$${}^{e}{\varvec{T}}=\left[\begin{array}{cccc}{}{}^{e}{T}_{11}& {}{}^{e}{T}_{12}& 0& 0\\ 0& 0& {}{}^{e}{T}_{23}& {}{}^{e}{T}_{24}\end{array}\right]$$
(12)

where

$${}^{e}{T}_{11}={}^{e}{T}_{23}=\frac{{}^{e}{x}_{j}-{}^{e}{x}_{i}}{{}^{e}L}\, \mathrm{ and }\, {}^{e}{T}_{12}={}^{e}{T}_{24}=\frac{{}^{e}{y}_{j}-{}^{e}{y}_{i}}{{}^{e}L}$$
(13)

are the direction cosines of the element axis, computed from the global coordinates \(({}^{e}{x}_{i},{}^{e}{y}_{i})\) and \(({}^{e}{x}_{j},{}^{e}{y}_{j})\) of the end nodes \(i\) and \(j\), so that

$${}^{e}{{\varvec{u}}}^{\prime}={}^{e}{\varvec{T}}\, {}^{e}{\varvec{u}}$$
(14)

The axial strain of the truss element is

$${}^{e}\varepsilon =\frac{d{}{}^{e}u(\xi )}{d\xi }=\frac{1}{{}{}^{e}L}\left[\begin{array}{cc}-1& 1\end{array}\right]{}{}^{e}{{\varvec{u}}}^{^{\prime}}$$
(15)

and stress in the axial direction is

$${}^{e}\sigma =E{}{}^{e}\varepsilon =\frac{E}{{}{}^{e}L}\left[\begin{array}{cc}-1& 1\end{array}\right]{}{}^{e}{{\varvec{u}}}^{^{\prime}}$$
(16)

where \(E\) is the elastic modulus. The strain energy of a truss element with cross-sectional area \({}^{e}A\) is

$${{}^{e}U}=\frac{1}{2}{\int }_{L}{}{}^{e}A {}{}^{e}\sigma {}{}^{e}\varepsilon d\xi =\frac{1}{2}{}{}^{e}{{\varvec{u}}}^{{^{\prime}}T}\frac{{}{}^{e}AE}{{}{}^{e}L}\left[\begin{array}{cc}1& -1\\ -1& 1\end{array}\right]{}{}^{e}{{\varvec{u}}}^{^{\prime}}=\frac{1}{2}{}{}^{e}{{\varvec{u}}}^{{^{\prime}}T} {}{}^{e}{{\varvec{K}}}^{^{\prime}} {}{}^{e}{{\varvec{u}}}^{^{\prime}}$$
(17)

where \({}^{e}{{\varvec{K}}}^{^{\prime}}\) is the stiffness matrix of the element. The work of external forces is

$${{}^{e}W}={\int }_{L}{}{}^{e}u\left(\xi \right)p d\xi ={}{}^{e}{{\varvec{u}}}^{{^{\prime}}T} {}{}^{e}{{\varvec{f}}}^{^{\prime}}$$
(18)

where \({}^{e}{{\varvec{f}}}^{^{\prime}}\) is the vector of external forces reduced to nodes. The total potential energy of one element could be written in the following form

$${}^{e}{\Pi }_{p}={}^{e}U-{}^{e}W=\frac{1}{2}{}^{e}{{\varvec{u}}}^{\prime T}\, {}^{e}{{\varvec{K}}}^{\prime}\, {}^{e}{{\varvec{u}}}^{\prime}-{}^{e}{{\varvec{u}}}^{\prime T}\, {}^{e}{{\varvec{f}}}^{\prime}$$
(19)

It can be rewritten in terms of quantities defined in the global coordinate system

$${}^{e}{\Pi }_{p}=\frac{1}{2}{}{}^{e}{{\varvec{u}}}^{T} {}{}^{e}{\varvec{K}} {}{}^{e}{{\varvec{u}}}-{}{}^{e}{{\varvec{u}}}^{T} {}{}^{e}{{\varvec{f}}}$$
(20)

where

$${}^{e}{\varvec{K}}={}_{ }{}^{e}{{\varvec{T}}}^{T} {}_{ }{}^{e}{{\varvec{K}}}^{\mathrm{^{\prime}}} {}_{ }{}^{e}{\varvec{T}}\, \mathrm{ and }\, {}^{e}{\varvec{f}}={}_{ }{}^{e}{{\varvec{T}}}^{T} {}_{ }{}^{e}{{\varvec{f}}}^{\mathrm{^{\prime}}}$$
(21)
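The transformation of the element stiffness matrix in Eq. (21) can be illustrated with a short Python sketch. It assumes that the entries of the transformation matrix are the direction cosines of the element axis obtained from the end-node coordinates; the function name `element_stiffness` and the numerical values are hypothetical.

```python
import math

def element_stiffness(xi, yi, xj, yj, area, e_mod):
    """Global 4x4 stiffness matrix K = T^T K' T of one planar truss element."""
    length = math.hypot(xj - xi, yj - yi)
    c, s = (xj - xi) / length, (yj - yi) / length   # direction cosines
    t = [[c, s, 0.0, 0.0],
         [0.0, 0.0, c, s]]                          # transformation matrix, Eq. (12)
    k = area * e_mod / length
    k_local = [[k, -k], [-k, k]]                    # local stiffness, Eq. (17)
    # K[a][b] = sum over p, q of T[p][a] * K'[p][q] * T[q][b]
    return [[sum(t[p][a] * k_local[p][q] * t[q][b]
                 for p in range(2) for q in range(2))
             for b in range(4)]
            for a in range(4)]

# Horizontal bar of length 2 with A*E = 2, so A*E/L = 1.
K_horizontal = element_stiffness(0.0, 0.0, 2.0, 0.0, area=1.0, e_mod=2.0)
# Inclined bar (3-4-5 triangle) with illustrative steel-like values.
K_inclined = element_stiffness(0.0, 0.0, 3.0, 4.0, area=1.0e-3, e_mod=210.0e9)
```

For the horizontal bar only the \(x\) degrees of freedom are coupled, while the inclined bar fills the whole matrix; in both cases the matrix is symmetric.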

Introducing the vector \({\varvec{u}}\) of all nodal displacements and the vector \({\varvec{f}}\) of all nodal loads, the total potential energy of the whole structure is

$${\Pi }_{p}=\frac{1}{2}{{\varvec{u}}}^{T}{\varvec{K}}{\varvec{u}}-{{\varvec{u}}}^{T}{\varvec{f}}$$
(22)

where \({\varvec{K}}\) is the stiffness matrix of the complete structure, assembled according to the rules of element assembly, which are described in detail in [13, 14].

Many truss structures are built from different rods with different cross-sectional properties. These rods can be grouped by their \(AE\) product. From the stiffness matrix \({\varvec{K}}\) introduced in (22), these \(AE\) products can be factored out by cross-sectional group

$${\varvec{K}}={A}_{1}{E}_{1}{{\varvec{K}}}_{1}+{A}_{2}{E}_{2}{{\varvec{K}}}_{2}+\cdots +{A}_{i}{E}_{i}{{\varvec{K}}}_{i}+\cdots +{A}_{{n}_{G}}{E}_{{n}_{G}}{{\varvec{K}}}_{{n}_{G}}=\sum\nolimits_{i=1}^{{n}_{G}}{A}_{i}{E}_{i}{{\varvec{K}}}_{i}$$
(23)

where \({n}_{G}\) is the number of cross-sectional groups, and \({{\varvec{K}}}_{i}\) is the stiffness matrix of the \({i}^{th}\) group, assembled with a unit \(AE\) product. If the unknown quantities of the optimisation are typical cross-section dimensions (for example, the outside diameter \(D\) and the wall thickness \(t\) of a circular hollow section), the FEM pre-processing needs to be done only once, before the first iteration step of the optimisation.
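The benefit of the decomposition in Eq. (23) is that the group matrices are built once and only rescaled for each candidate design. The Python sketch below shows this on a deliberately tiny, hypothetical example: a one-dimensional chain of three nodes with two bars of lengths 2 and 4, each bar forming its own group. The names `scale_add` and `global_stiffness` are illustrative.

```python
def scale_add(matrices, factors):
    """Return sum_i factors[i] * matrices[i] for equally sized square matrices."""
    n = len(matrices[0])
    return [[sum(f * m[r][c] for f, m in zip(factors, matrices)) for c in range(n)]
            for r in range(n)]

# Pre-processing, done once: group matrices K_i assembled with A_i * E_i = 1.
K_1 = [[0.5, -0.5, 0.0], [-0.5, 0.5, 0.0], [0.0, 0.0, 0.0]]        # bar of length 2
K_2 = [[0.0, 0.0, 0.0], [0.0, 0.25, -0.25], [0.0, -0.25, 0.25]]    # bar of length 4

def global_stiffness(areas, moduli):
    """Eq. (23): K = sum_i A_i E_i K_i, evaluated for one candidate design."""
    return scale_add([K_1, K_2], [a * e for a, e in zip(areas, moduli)])

K = global_stiffness(areas=[2.0, 4.0], moduli=[10.0, 10.0])
```

Inside the optimisation loop only `global_stiffness` has to be called for each individual; the geometry-dependent matrices `K_1` and `K_2` never change.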

According to the principle of minimum total potential energy [15, 16], the first variation \(\delta {\Pi }_{p}\) of the total potential energy \({\Pi }_{p}\) is zero. After applying the boundary conditions, we obtain the algebraic equation system of FEM

$$\delta {\Pi }_{p}=\delta {{\varvec{u}}}^{T}\frac{\partial {\Pi }_{p}}{\partial {\varvec{u}}}=\boldsymbol{ }\delta {{\varvec{u}}}^{T}\left({\varvec{K}}{\varvec{u}}-{\varvec{f}}\right)=0$$
(24)
$${\varvec{u}}={{\varvec{K}}}^{-1} {\varvec{f}}.$$
(25)

Post-processing of the result of Eq. (25) is necessary for further calculations. Axial stress of elements could be determined by

$${}^{e}\sigma =\frac{{}{}^{e}E}{{}{}^{e}L}\left[\begin{array}{cccc}-{}{}^{e}{T}_{11}& -{}{}^{e}{T}_{12}& {}{}^{e}{T}_{11}& {}{}^{e}{T}_{12}\end{array}\right]{}{}^{e}{\varvec{u}}$$
(26)
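Equations (25) and (26) can be followed on a small example. The Python sketch below solves a hypothetical two-bar planar truss (two fixed supports, one loaded free node) and then recovers the axial stresses. The geometry, the material data, the load and all names are assumptions made for the illustration, and the direction cosines are computed from node coordinates.

```python
import math

def solve(a, b):
    """Solve a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[piv] = m[piv], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for r in reversed(range(n)):
        x[r] = (m[r][n] - sum(m[r][c] * x[c] for c in range(r + 1, n))) / m[r][r]
    return x

nodes = [(0.0, 0.0), (6.0, 0.0), (3.0, 4.0)]   # m; nodes 0 and 1 are fixed
bars = [(0, 2), (1, 2)]                        # both bars have length 5 m
E, A, P = 210.0e9, 1.0e-3, 100.0e3             # Pa, m^2, N (illustrative values)
free = [4, 5]                                  # global DOFs of the free node 2

K = [[0.0] * 6 for _ in range(6)]
geometry = []
for i, j in bars:
    (xi, yi), (xj, yj) = nodes[i], nodes[j]
    L = math.hypot(xj - xi, yj - yi)
    c, s = (xj - xi) / L, (yj - yi) / L
    row = [-c, -s, c, s]                       # [-1 1] T, the row used in Eq. (26)
    dofs = [2 * i, 2 * i + 1, 2 * j, 2 * j + 1]
    for a in range(4):
        for b in range(4):
            K[dofs[a]][dofs[b]] += A * E / L * row[a] * row[b]   # T^T K' T
    geometry.append((L, row, dofs))

f = [0.0] * 6
f[5] = -P                                      # vertical load on node 2
K_ff = [[K[a][b] for b in free] for a in free] # boundary conditions applied
u = [0.0] * 6
for dof, value in zip(free, solve(K_ff, [f[d] for d in free])):
    u[dof] = value                             # Eq. (25) on the free DOFs

# Eq. (26): axial stress of each element from the global displacements.
stresses = [E / L * sum(r * u[d] for r, d in zip(row, dofs))
            for L, row, dofs in geometry]
```

By symmetry both bars carry the same compressive stress, which can be checked against statics: each bar force is \(P/(2\cdot 0.8)\), so the stress magnitude is \(P/(1.6A)\).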

4 The Optimisation Problem

The optimisation of truss structures is a constrained optimisation problem

$$\begin{array}{cc}\mathrm{min}. f({\varvec{x}})& {\varvec{x}}={\left[\begin{array}{cccc}{x}_{1}& {x}_{2}& \cdots & {x}_{D}\end{array}\right]}^{T}\in {\mathbb{R}}^{D}\\ {g}_{i}({\varvec{x}})\le 1& 1\le i\le q\\ {h}_{j}\left({\varvec{x}}\right)=0& 1\le j\le r\end{array}$$
(27)

where \({\varvec{x}}\) is the vector of unknowns (in this paper, the vector of typical cross-section dimensions), \(f({\varvec{x}})\) is the objective function to be optimised, \({g}_{i}({\varvec{x}})\) are the inequality constraints, \({h}_{j}\left({\varvec{x}}\right)\) are the equality constraints, and \(q\) and \(r\) are the numbers of inequality and equality constraints, respectively.

In this paper, the target function of optimisation is the weight of the structure

$$f\left(x\right)=\rho \sum\nolimits_{e=1}^{{n}_{e}}{}_{ }{}^{e}A {}_{ }{}^{e}L$$
(28)

where \({n}_{e}\) is the number of truss elements and \(\rho\) is the density of steel.

The structure must meet strength and stability requirements. In the present case, three criteria are analysed: the tensile resistance of rods in tension, the buckling of rods in compression, and local buckling. These criteria are well characterised by the cross-sectional utilisation factor.

The tensile and compressive resistance of the rods can be expressed with a single inequality constraint if the stress caused by the load is taken with its sign: negative stress means compression, while positive stress means tension.

$${g}_{Ii}=\left\{\begin{array}{cc}\frac{{\gamma }_{M0}\left|{}{}^{e}\sigma \right|}{\chi {f}_{y}}\le 1& {}{}^{e}\sigma <0\\ \frac{{\gamma }_{M0}\left|{}{}^{e}\sigma \right|}{{f}_{y}}\le 1& {}{}^{e}\sigma \ge 0\end{array}\right.$$
(29)

where \({f}_{y}\) is the yield strength, \({\gamma }_{M0}\) is a safety factor according to [17], and \(\chi\) is the buckling factor, also according to [17]

$$\chi =\left\{\begin{array}{cc}1& \overline{\lambda }\le 0.2\\ \frac{1}{\phi +\sqrt{{\phi }^{2}-{\overline{\lambda }}^{2}}}& \overline{\lambda }>0.2\end{array}\right.$$
(30)

where \(\phi\) is a factor

$$\phi =0.5\left(1+0.21\left(\overline{\lambda }-0.2\right)+{\overline{\lambda }}^{2}\right)$$
(31)

and \(\overline{\lambda }\) is the relative slenderness

$$\overline{\lambda }=\frac{kL}{\pi }\sqrt{\frac{A}{{I}_{x}}}\sqrt{\frac{{f}_{y}}{E}}$$
(32)

where \({I}_{x}\) is the second moment of area of the cross-section and \(k\) is the buckling length factor, which is \(k=1\) for intermediate bars and \(k=0.7\) for clamped bars.
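The chain from cross-section dimensions to the buckling factor can be sketched in Python as follows. The sketch uses the Eurocode 3 (EN 1993-1-1) form of the flexural buckling reduction factor, which we assume is what [17] refers to: the relative slenderness is \((kL/\pi )\sqrt{A/{I}_{x}}\sqrt{{f}_{y}/E}\) and the term under the square root is \({\phi }^{2}-{\overline{\lambda }}^{2}\). The function names and the sample section are hypothetical.

```python
import math

def chs_properties(D, t):
    """Area and second moment of area of a circular hollow section."""
    d = D - 2.0 * t                                  # inner diameter
    area = math.pi * (D ** 2 - d ** 2) / 4.0
    inertia = math.pi * (D ** 4 - d ** 4) / 64.0
    return area, inertia

def buckling_factor(k, length, area, inertia, f_y, e_mod, alpha=0.21):
    """Buckling factor chi from the relative slenderness (assumed Eurocode 3 form)."""
    lam = (k * length / math.pi) * math.sqrt(area / inertia) * math.sqrt(f_y / e_mod)
    if lam <= 0.2:
        return 1.0                                   # stocky member, no reduction
    phi = 0.5 * (1.0 + alpha * (lam - 0.2) + lam ** 2)
    return 1.0 / (phi + math.sqrt(phi ** 2 - lam ** 2))

# Hypothetical tube: D = 100 mm, t = 5 mm, f_y = 235 MPa, E = 210000 MPa.
area, inertia = chs_properties(D=100.0, t=5.0)
chi_short = buckling_factor(1.0, 100.0, area, inertia, 235.0, 210000.0)
chi_long = buckling_factor(1.0, 3000.0, area, inertia, 235.0, 210000.0)
```

Lengths and section dimensions must be given in the same unit (mm here) and \({f}_{y}\) and \(E\) in the same unit (MPa here), so that the relative slenderness is dimensionless.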

The limit of local buckling depends on the shape of the cross-section, so a different formula should be used for each shape [11]. Here, we use the local buckling limit of a circular hollow section

$${g}_{IIi}=\frac{D{f}_{y}}{21150t}\le 1$$
(33)

This formula is valid only if the yield stress \({f}_{y}\) is given in \(\mathrm{MPa}\), and the diameter \(D\) and the wall thickness \(t\) are given in \(\mathrm{mm}\).

Using Eqs. (28), (29) and (33), the fitness function to be optimised is

$$\mathcal{F}\left({\varvec{x}}\right)= \rho \sum\nolimits_{e=1}^{{n}_{e}}{}{}^{e}A {}{}^{e}L+\sum\nolimits_{i=1}^{{n}_{e}}p\left({g}_{Ii}\left({\varvec{x}}\right)\right)+\sum\nolimits_{i=1}^{{n}_{G}}p\left({g}_{IIi}\left({\varvec{x}}\right)\right)$$
(34)

where \(p\) is the static penalty function

$$p\left(g\left({\varvec{x}}\right)\right)=\left\{\begin{array}{cc}0& g\left({\varvec{x}}\right)\le 1\\ {10}^{6}g({\varvec{x}})& g\left({\varvec{x}}\right)>1\end{array}\right.$$
(35)

and \({\varvec{x}}\) is the vector of unknowns (vector of independent variables). For example, in the case of a circular tube, the unknowns are the characteristic dimensions of the cross-section

$${\varvec{x}}={\left[\begin{array}{cccccccc}{D}_{1}& {D}_{2}& \cdots & {D}_{{n}_{G}}& {t}_{1}& {t}_{2}& \cdots & {t}_{{n}_{G}}\end{array}\right]}^{T}$$
(36)
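The penalised fitness of Eqs. (34) and (35) can be sketched in Python as follows. The function names and the numerical values are illustrative assumptions; the constraint values `g_strength` and `g_local` are taken as already evaluated from Eqs. (29) and (33).

```python
def penalty(g, weight=1.0e6):
    """Static penalty of Eq. (35): zero if g <= 1, otherwise weight * g."""
    return 0.0 if g <= 1.0 else weight * g

def fitness(rho, areas, lengths, g_strength, g_local):
    """Eq. (34): structural weight plus penalties of all violated constraints."""
    weight = rho * sum(a * l for a, l in zip(areas, lengths))
    return weight + sum(map(penalty, g_strength)) + sum(map(penalty, g_local))

# Two elements; all constraints satisfied in the first case, one violated in the second.
feasible = fitness(2.0, [1.0, 2.0], [3.0, 4.0], g_strength=[0.5, 1.0], g_local=[0.9])
infeasible = fitness(2.0, [1.0, 2.0], [3.0, 4.0], g_strength=[2.0, 1.0], g_local=[0.9])
```

Because the penalty weight is several orders of magnitude larger than a realistic structural weight, any infeasible design ranks behind every feasible one in the selection step.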

5 Parallelisation with CUDA and MATLAB

Nvidia Corporation offers the CUDA Driver API [18] and the CUDA Runtime API [19] for programming its graphics cards for general-purpose computation. There are many types of graphics cards on the market, with different computing capabilities and performance. The code containing our custom calculation must be scalable [20], and it should automatically detect the capabilities of the hardware in use [21]. Implementing this feature is sometimes more challenging than implementing the custom calculation itself. As an intermediate layer between CUDA and our code, MATLAB offers much simpler possibilities for implementing parallel computation [22]. However, this ease of use comes at a price: the speed-up achieved is typically smaller than with native CUDA alone.

MATLAB provides a rich toolset and many features for operations on vectors and matrices. It offers many ways to rewrite loop-based, scalar-oriented operations as vector-matrix operations. This process is called “vectorisation”.

To illustrate the differences between the two types of operations, let the population be given as follows for circular hollow section tubes

$${\varvec{X}}=\left[\begin{array}{cccccc}{D}_{\mathrm{1,1}}& {D}_{\mathrm{1,2}}& \cdots & {D}_{1,j}& \cdots & {D}_{1,{n}_{p}}\\ {D}_{\mathrm{2,1}}& {D}_{\mathrm{2,2}}& \cdots & {D}_{2,j}& \cdots & {D}_{2,{n}_{p}}\\ \vdots & \vdots & & \vdots & & \vdots \\ {D}_{i,1}& {D}_{i,2}& \cdots & {D}_{i,j}& \cdots & {D}_{i,{n}_{p}}\\ \vdots & \vdots & & \vdots & & \vdots \\ {D}_{{n}_{G},1}& {D}_{{n}_{G},2}& \cdots & {D}_{{n}_{G},j}& \cdots & {D}_{{n}_{G},{n}_{p}}\\ {t}_{\mathrm{1,1}}& {t}_{\mathrm{1,2}}& \cdots & {t}_{1,j}& \cdots & {t}_{1,{n}_{p}}\\ {t}_{\mathrm{2,1}}& {t}_{\mathrm{2,2}}& \cdots & {t}_{2,j}& \cdots & {t}_{2,{n}_{p}}\\ \vdots & \vdots & & \vdots & & \vdots \\ {t}_{i,1}& {t}_{i,2}& \cdots & {t}_{i,j}& \cdots & {t}_{i,{n}_{p}}\\ \vdots & \vdots & & \vdots & & \vdots \\ {t}_{{n}_{G},1}& {t}_{{n}_{G},2}& \cdots & {t}_{{n}_{G },j}& \cdots & {t}_{{n}_{G},{n}_{p}}\end{array}\right]=\left[\begin{array}{c}{\varvec{D}}\\ {\varvec{t}}\end{array}\right]$$
(37)

where \({D}_{i,j}\) is the diameter of the \({i}^{th}\) cross-sectional group for the \({j}^{th}\) individual in the population, and \({t}_{i,j}\) is the wall thickness of the \({i}^{th}\) cross-sectional group for the \({j}^{th}\) individual. For example, the loop-based, scalar-oriented implementation of Eq. (33) can be seen in Listing 1.

Listing 1. Loop-based, scalar-oriented implementation of Eq. (33).

In contrast, the implementation in Listing 2 of the same equation covers a vectorised form.

Listing 2. Vectorised implementation of Eq. (33).

The striking difference between the two code snippets is that the latter is much shorter and more transparent. Sometimes, a scalar-oriented operation cannot be vectorised with element-wise operations (such as .*, ./, .^, etc.). In such cases, arrayfun() can be a good tool. The point is that the scalar operation in the loop body must be organised into a separate function (see Listing 3). arrayfun() calls this function once for each element of the vector or matrix passed as a parameter.

Listing 3. Scalar operation organised into a separate function and applied with arrayfun().
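The contrast between the two programming styles can also be illustrated outside MATLAB. The following Python sketch is not one of the MATLAB listings; it only mirrors their structure under assumptions. It evaluates Eq. (33) for every cross-sectional group and individual, first with explicit index loops and then by applying a scalar kernel to every element pair, which is the pattern arrayfun() follows. All names and sample values are hypothetical.

```python
def local_buckling_loops(D, t, f_y):
    """Eq. (33) for every group i and individual j, with explicit index loops."""
    n_g, n_p = len(D), len(D[0])
    g = [[0.0] * n_p for _ in range(n_g)]
    for i in range(n_g):
        for j in range(n_p):
            g[i][j] = D[i][j] * f_y / (21150.0 * t[i][j])
    return g

def g_local(d, w, f_y):
    """Scalar kernel: Eq. (33) for one diameter / wall thickness pair."""
    return d * f_y / (21150.0 * w)

def local_buckling_mapped(D, t, f_y):
    """Same result, obtained by applying the scalar kernel to every element pair."""
    return [[g_local(d, w, f_y) for d, w in zip(d_row, t_row)]
            for d_row, t_row in zip(D, t)]

D = [[100.0, 150.0], [200.0, 250.0]]   # mm; rows are groups, columns are individuals
t = [[5.0, 4.0], [8.0, 2.0]]           # mm
f_y = 235.0                            # MPa
g_loops = local_buckling_loops(D, t, f_y)
g_mapped = local_buckling_mapped(D, t, f_y)
```

The second form has no index bookkeeping and states only what is computed for one element, which is what allows a runtime to distribute the work over many threads.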

MATLAB also provides tools for vectorising operations performed on multidimensional matrices using “page-wise” functions and operations. Detailed descriptions can be found in [22]; for reasons of length, they are not detailed in this paper.

MATLAB always runs the loop-based approach on a single thread, as illustrated in Listing 1. The situation is different with vectorised operations. MATLAB can automatically detect repetitive operations where only the data to be processed changes, and automatically discover the capabilities of the runtime environment to perform them on multiple threads. In the simplest case, on multi-core processors, it automatically (unless set otherwise) takes advantage of running on multiple cores in parallel. This automation also works for GPUs if all variables in the expression are of type gpuArray. MATLAB then automatically creates the required kernel functions from the expressions and launches them on the required and available number of threads, considering the capabilities of the GPU.

All the expressions and functions presented in the previous sections are easy to vectorise. This allows the complete optimisation (evolutionary algorithm, FEM solver and fitness function calculation) to be computed on the GPU in parallel. If all steps and operations are computed on the GPU, the host machine only manages them; it is enough to move data between the host and the GPU at the beginning and end of the optimisation.

6 Comparison of Sequential and Parallel Optimisation

The dimensionless computation speed up between sequential and parallel processing is defined as follows

$$\nabla =\frac{{t}_{seq}}{{t}_{par}}$$
(38)

where \({t}_{seq}\) is the average computation time of the iteration steps using only sequential processing, and \({t}_{par}\) is the average computation time of the iteration steps using parallel processing. To measure \({t}_{seq}\), we used a single CPU thread; to measure \({t}_{par}\), we used as many threads as possible on a GeForce GTX 1050 Ti graphics card.
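The measurement behind Eq. (38) can be sketched as follows. This Python fragment is only an illustration of how an average iteration time and the resulting ratio might be obtained; it is not the measurement code used for the results, and the names `mean_step_time` and `speed_up` as well as the dummy workload are hypothetical.

```python
import time

def mean_step_time(step, n_steps):
    """Average wall-clock time of one call of step() over n_steps calls."""
    start = time.perf_counter()
    for _ in range(n_steps):
        step()
    return (time.perf_counter() - start) / n_steps

def speed_up(t_seq, t_par):
    """Eq. (38): dimensionless ratio of sequential to parallel step time."""
    return t_seq / t_par

def dummy_step():
    """Stand-in for one iteration step of the optimisation."""
    return sum(k * k for k in range(1000))

t_measured = mean_step_time(dummy_step, n_steps=20)
ratio = speed_up(t_seq=2.0, t_par=0.5)    # illustrative timings in seconds
```

A ratio above 1 means that the parallel version is faster; averaging over many iteration steps reduces the influence of timing noise.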

The structure shown in Fig. 2 was optimised to determine the speed-up defined above. This is a truss structure with deltoid-shaped stiffeners. The applied loads were \({F}_{1}=332.94\,\, \mathrm{kN}\), \({F}_{2}=437.46\,\, \mathrm{kN}\) and \({F}_{3}=338.08\,\, \mathrm{kN}\). Nodes 1 and 7 were fixed, that is, no displacement is allowed at these points. The cross-section of all rods was a circular tube, for which we optimised the outside diameter and the wall thickness according to Eq. (36). The rods of the structure were divided into three cross-sectional groups. The first group contains rods 1–10. The horizontal rods (11–16) are in the second group. Finally, the third group contains the rods of the deltoid shape (17–26).

In our simulations, we ran the optimisation with different numbers of individuals in the population used by SaDE. The dimensionless speed-up achieved for different population sizes \({n}_{p}\) is illustrated in Fig. 3. We did not examine the quality of the optima in this paper, only the difference in computation time.

Fig. 2. Sketch of the optimised structure for comparison of sequential and parallel optimisation.

Fig. 3. Dimensionless speed up with different population sizes and numbers of nodes.

7 Conclusion

This paper presents an evolutionary algorithm, differential evolution, coupled with the finite element method for optimising truss-like structures subject to static stresses, overall buckling and local buckling. This is a powerful approach for optimising any truss structure automatically.

Evolutionary optimisation is a population-based iterative numerical method. The fitness function must therefore be evaluated many times, which can be a resource-demanding and time-consuming task. One way to speed up the calculations is parallel computation on a GPU. MATLAB offers user-friendly methods and tools for this. We have analysed the dimensionless speed-up of the optimisation achieved with MATLAB's tools.

The achievable speed-up depends on the size of the population. The speed-up increases approximately exponentially as a function of population size (see Fig. 3). If the population size is small, there is no reason for parallelisation.

In future work, it would be interesting to examine the speed-up as a function of the number of elements and the number of nodes, with both fixed and varied population sizes.