
1 Introduction

The use of graphical processors for the simulation of beam dynamics is widespread among particle accelerator scientists [1,2,3,4,5,6] today; it is without doubt a very popular approach, though not a new one in other areas of science. In accelerator physics, however, there are problems with porting previously implemented program components to hybrid architectures, which we discuss below.

Last year we presented the idea of using graphical processors for simulation and visualization in terms of constructing a Virtual Accelerator Laboratory (VAL) [7]. In that work we described how to develop a model of a real machine and test the behaviour of the beam inside a virtual system of control elements, such as dipoles, quadrupoles, sextupoles and drifts. Since VAL is a problem-solving environment, it is divided into special blocks of components.

In this paper we will use only the following set of components:

  • The block of control elements: with its help the user can construct the main view of the future accelerator (for example, Fig. 1).

  • The block of particle distribution: forms the initial particle distribution coming from the source.

  • The block of data: the database with all calculated results.

  • The block of GPU visualization: presents the results of calculations in a 3D graphical form.

Fig. 1. Schematic view of the accelerator ring for modelling the long-turn evolution

2 The Idea of the Matrix Form of the ODE

The main idea of the concept is to describe the behaviour of a dynamic system with a set of ordinary differential equations presented in matrix form. Consider a system:

$$\begin{aligned} \frac{dX}{dt} = F(X,t), \end{aligned}$$
(1)

where \(F(X,t)\) is an arbitrary function, analytic in the neighborhood of \(X=0\), \(X \in R^n\), and measurable in \(t \in [t_0,t_s] \subset R\).

According to [8], Eq. (1) can be written:

$$\begin{aligned} \frac{dX}{dt} = \sum ^{k}_{i=0}{\mathbb {P}}^{1i}(t)X^{[i]}, \end{aligned}$$
(2)

where \({\mathbb {P}}^{1i}\) is a matrix with symbolic coefficients obtained from the original ODE. The full matrix has the block form:

$$\begin{aligned} {\mathbb {P}} = \left( \begin{array}{lllll} {\mathbb {P}}^{11} &{} {\mathbb {P}}^{12} &{} \dots &{} {\mathbb {P}}^{1k} &{} \dots \\ \mathbb {O}&{} {\mathbb {P}}^{22} &{} \dots &{} {\mathbb {P}}^{2k} &{} \dots \\ \vdots &{} \vdots &{} \ddots &{} \vdots &{} \ddots \\ \mathbb {O}&{}\mathbb {O}&{} \dots &{} {\mathbb {P}}^{jk} &{} \dots \\ \vdots &{} \vdots &{} \ddots &{} \vdots &{} \ddots \\ \end{array} \right) , \end{aligned}$$

Dimensions of control matrices depend on the length of the vector X and the order of nonlinearity necessary for the experiment.

In the course of symbolic computation, the elements of the final matrix \(\mathbb {P}\) can be calculated by series expansion:

$$ \left\{ {\mathbb {P}}^{1k}(t) \right\} _{ij} = \frac{1}{k_1! \dots k_n!}\left. \frac{{\partial }^k {\mathbf{F}}_i (X, t)}{\partial x^{k_1}_1 \dots \partial x^{k_n}_n} \right| _{x_1=\dots =x_n=0}$$
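As a sketch of this step (not the authors' code), the \({\mathbb {P}}^{1k}\) coefficients can be extracted with SymPy from a symbolic right-hand side; the Duffing-type system used here is an assumed toy example, not the beam equations:

```python
import sympy as sp

# State vector and a sample right-hand side F(X): a Duffing-type
# oscillator x'' = -k*x + a*x**3 rewritten as a first-order system.
# The system itself is an assumed toy example.
x, p = sp.symbols('x p')
k, a = sp.symbols('k a')
X = [x, p]
F = [p, -k*x + a*x**3]

def taylor_coeff(f, powers):
    """1/(k1!...kn!) * d^k f / (dx1^k1 ... dxn^kn), evaluated at X = 0."""
    d = f
    for var, m in zip(X, powers):
        d = sp.diff(d, var, m)
    denom = 1
    for m in powers:
        denom *= sp.factorial(m)
    return sp.simplify(d.subs({v: 0 for v in X}) / denom)

# The first-order block P^{11} is the Jacobian of F at X = 0
P11 = sp.Matrix(2, 2, lambda i, j: taylor_coeff(F[i], [1 if m == j else 0 for m in range(2)]))
print(P11)                            # Matrix([[0, 1], [-k, 0]])

# Third-order coefficient (the x**3 term of the second equation)
print(taylor_coeff(F[1], [3, 0]))     # a
```

For the real system, `F` would be replaced by the equations of motion and the loop run up to the required order of nonlinearity.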

Depending on the problem, the solution of Eq. (1) can be represented in numerical form:

$$\begin{aligned} X = \sum ^{k}_{i=0}{\mathbb {R}}^{1i}(t)X^{[i]}_0, \end{aligned}$$
(3)

where \({\mathbb {R}}^{1i}\) is a matrix with numerical coefficients.

The initial vector \(X_0 = \{x,y,x',y',s,\cdots \}^T\) can also be written as a matrix whose columns are the state vectors of individual particles:

$$\begin{aligned} {\mathbf{X} { }}_0 = \left( \begin{array}{cccc} x_1 &{} x_2 &{} \cdots &{} x_n\\ y_1 &{} y_2 &{} \cdots &{} y_n\\ {x'}_1 &{} {x'}_2 &{} \cdots &{} {x'}_n\\ {y'}_1 &{} {y'}_2 &{} \cdots &{} {y'}_n\\ \vdots &{} \vdots &{} \ddots &{} \vdots \\ \end{array} \right) \end{aligned}$$

Thus, the calculation reduces to a multiplication of a matrix by a matrix:

$$\begin{aligned} \mathbf{X} { }_s = \sum ^{k}_{i=0}{\mathbb {R}}^{1i}(t)\mathbf{X} { }^{[i]}_j, \end{aligned}$$
(4)
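A minimal NumPy sketch of Eq. (4) (illustrative, not the production code): the ensemble is stored column-wise as in \({\mathbf{X}}_0\), the Kronecker powers \(X^{[i]}\) are built column by column, and the map is a sum of matrix products. The matrices \({\mathbb {R}}^{11}\), \({\mathbb {R}}^{12}\) below are assumed values:

```python
import numpy as np

rng = np.random.default_rng(0)
n, N = 2, 1000                  # phase-space dimension, number of particles
X0 = rng.normal(size=(n, N))    # ensemble: one column per particle, as in X_0

def kron_power(X, order):
    """Column-wise Kronecker power X^{[i]}: each column x -> x (x) ... (x) x."""
    out = X
    for _ in range(order - 1):
        # outer product of each column of `out` with the matching column of X
        out = np.einsum('ik,jk->ijk', out, X).reshape(-1, X.shape[1])
    return out

# Assumed map matrices: R^{11} is a drift-like block, R^{12} a small
# second-order aberration (illustrative values only)
R11 = np.array([[1.0, 0.1], [0.0, 1.0]])
R12 = 0.01 * rng.normal(size=(n, n * n))

Xs = R11 @ kron_power(X0, 1) + R12 @ kron_power(X0, 2)
print(Xs.shape)   # (2, 1000)
```

Because every particle is a column, the whole ensemble is transported by a handful of dense matrix products, which is exactly the operation GPUs are optimized for.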

3 The Space Charge Example

This section describes how the method works on the example of the influence of the space charge on the dynamics of the beam.

The equations of the cross-section motion in the beam without bunches can be written in the following form:

$$\begin{aligned} x'' = \frac{q}{p} \sqrt{1 + {x'}^2 + {y'}^2}(y' B_s - (1 + {x'}^2)B_y + x'y'B_x + \sqrt{1 + {x'}^2 + {y'}^2} \frac{E_x}{c \beta \gamma }),\,\nonumber \\ y'' = - \frac{q}{p} \sqrt{1 + {x'}^2 + {y'}^2}(x' B_s - (1 + {y'}^2)B_x + x'y'B_y + \sqrt{1 + {x'}^2 + {y'}^2}\frac{E_y}{c \beta \gamma }) \end{aligned}$$
(5)

In the linear case, for example for a quadrupole, they take the form

$$\begin{aligned} x'' + \frac{q B_{xy}}{m_0 c \beta \gamma } x - \frac{q}{{\epsilon }_0 m_0 c^2 {\beta }^2 {\gamma }^3} E_x^L = 0, \nonumber \\ y'' - \frac{q B_{yx}}{m_0 c \beta \gamma } y - \frac{q}{{\epsilon }_0 m_0 c^2 {\beta }^2 {\gamma }^3} E_y^L = 0 \end{aligned}$$
(6)

According to the matrix form, the influence of the space charge modifies Eq. (2) in the following way:

$$\begin{aligned} \frac{dX}{dt} = \sum ^{k}_{i=0}({{\mathbb {P}}_{ext}}^{1i}(s) + {{\mathbb {P}}_{self}}^{1i}(s))X^{[i]}, \end{aligned}$$
(7)

Now the equation contains a part that corresponds to the external field and a part for the self space-charge forces. The external field is described by the components of the magnetic induction vector, while the space charge is described by the electric field intensity vector.

The matrices \({{\mathbb {P}}_{ext}}^{11}(s)\) and \({{\mathbb {P}}_{self}}^{11}(s)\) in the linear case will be:

figure a

In the non-linear case Eq. (5) becomes

$$\begin{aligned} x'' + (K_x - L_x)x = a_x x^3 + b_x x {x'}^2 + c x y^2 + (K_x)' x y y' + d_x x {y'}^2 + K_x x' y y', \nonumber \\ y'' + (K_y - L_y)y = a_y y^3 + b_y y {y'}^2 + c y x^2 + (K_y)' y x x' + d_y y {x'}^2 + K_y y' x x' \end{aligned}$$
(10)

where

$$K_x = \frac{q B_{xy}}{m_0 c \beta \gamma },$$
$$K_y = \frac{q B_{yx}}{m_0 c \beta \gamma },$$
$$L_x = \alpha \frac{qI}{\pi {\epsilon }_0 m_0 c^3 {\beta }^3 {\gamma }^3 r_x(r_x + r_y)},$$
$$L_y = \alpha \frac{qI}{\pi {\epsilon }_0 m_0 c^3 {\beta }^3 {\gamma }^3 r_y(r_x + r_y)}.$$
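For orientation, these coefficients can be evaluated numerically; all beam parameters below (current, energy, envelope semi-axes, form factor \(\alpha\)) are assumed illustrative values, not data from the experiment:

```python
import math

# Physical constants (SI) for a proton beam
q    = 1.602176634e-19     # elementary charge, C
m0   = 1.67262192e-27      # proton rest mass, kg
c    = 2.99792458e8        # speed of light, m/s
eps0 = 8.8541878128e-12    # vacuum permittivity, F/m

# Assumed beam parameters (illustrative only, not from the experiment)
I      = 0.1               # beam current, A
beta   = 0.5               # relativistic beta
gamma  = 1.0 / math.sqrt(1.0 - beta**2)
alpha  = 1.0               # distribution-dependent form factor
rx, ry = 2e-3, 1e-3        # beam envelope semi-axes, m

def L_coeff(r_this, r_other):
    """Space-charge defocusing coefficient L_x (or L_y) from the text."""
    return (alpha * q * I
            / (math.pi * eps0 * m0 * c**3 * beta**3 * gamma**3
               * r_this * (r_this + r_other)))

Lx, Ly = L_coeff(rx, ry), L_coeff(ry, rx)
print(Lx, Ly)   # the smaller semi-axis sees the stronger defocusing
```

Note the symmetry \(L_x r_x = L_y r_y\), which follows directly from the formulas above and is a quick sanity check on any implementation.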

Therefore, the external and self matrices take the following form

figure b

The idea of the matrix form of ordinary differential equations is not new [9], but with the evolution of graphical processors this form of the equations gets a new life. The usage of this method is presented next.

4 Linear Cases on CPU

The workflow of the environment is shown in the scheme in Fig. 2. The initial distribution of the particles and the structure of the machine are written to the database to begin working with. The user can choose the type of the machine: linear or cyclic. Then the mode of the simulation can be defined: element-to-element or long-turn evolution. By element-to-element we mean a test of the system in which the visualization of the particle distribution is made after each element.

Fig. 2. General scheme of the workflow of the PSE

The results of computations in the linear case for different distributions are shown in the pictures below. The system in this example consists of several drifts and quadrupoles following the FODO concept

figure c

where DQ is a defocusing quadrupole and FQ is a focusing quadrupole.
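The element-to-element mode over such a FODO cell can be sketched with thin-lens transfer matrices; the focal length and drift length below are assumptions of this example, not the machine parameters behind the figures:

```python
import numpy as np

def drift(L):
    """Transfer matrix of a drift of length L in one transverse plane (x, x')."""
    return np.array([[1.0, L], [0.0, 1.0]])

def thin_quad(f):
    """Thin-lens quadrupole with focal length f (f < 0: defocusing)."""
    return np.array([[1.0, 0.0], [-1.0 / f, 1.0]])

# Assumed cell parameters: focal length 2 m, drift length 1 m
f0, L0 = 2.0, 1.0
elements = [thin_quad(f0), drift(L0), thin_quad(-f0), drift(L0)]  # FQ O DQ O
cell = np.linalg.multi_dot(elements[::-1])  # matrices act in reverse beam order

# Element-to-element mode: propagate and inspect after every element
X = np.random.default_rng(1).normal(scale=[[1e-3], [1e-4]], size=(2, 10000))
for M in elements:
    X = M @ X
    # ...here the new distribution would be visualized and stored

print(abs(np.trace(cell)) < 2)   # True: the cell is linearly stable
```

The stability check \(|\mathrm{tr}\,M| < 2\) and \(\det M = 1\) are the standard criteria for a linear periodic cell.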

The studies show that the algorithm works with commonly used distributions: Gaussian (Fig. 3) and uniform (Fig. 4). Figure 5 shows the new Gaussian distribution of 10000 particles on the CPU, and Fig. 6 shows the new uniform distribution of 10000 particles on the CPU.
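The initial distributions themselves can be generated along these lines (a NumPy sketch; the RMS sizes and the box-shaped uniform fill are assumptions of this example):

```python
import numpy as np

rng = np.random.default_rng(42)
N = 10000

# Assumed RMS sizes for (x, x', y, y'); values are illustrative only
sigma = np.array([1e-3, 1e-4, 1e-3, 1e-4])

# Gaussian distribution: independent normal coordinates scaled per axis
gauss = rng.normal(size=(4, N)) * sigma[:, None]

# Uniform distribution: a 4D box with the same half-widths
uniform = rng.uniform(-1.0, 1.0, size=(4, N)) * sigma[:, None]

print(gauss.shape, uniform.shape)   # (4, 10000) (4, 10000)
```

Either array plugs directly into the column-per-particle matrix \({\mathbf{X}}_0\) of Sect. 2.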

Fig. 3. Initial Gaussian distribution, N = 10000 particles

Fig. 4. Initial uniform distribution, N = 10000 particles

Fig. 5. New Gaussian distribution of 10000 particles on CPU in (X, Y), (X, \(P_x\)), (Y, \(P_y\)), (\(P_x\), \(P_y\))

5 Long-Turn Evolution of the Beam on GPU

When we speak about a linear machine a few meters long, a personal computer will solve our problems. But in the case of a storage ring with long-turn evolution of the beam, we would need a machine comparable to the accelerator itself to get the result in reasonable time. As a result of the transformation of the ODE we get matrices in sparse form, and the problem is how to compute them on the GPU in the most effective way. The advantage of sparse matrices on the GPU is the large number of zero elements, which do not have to be loaded from CPU memory into GPU memory. But since we get a partial result from every element, the memory exchanges would outweigh the advantages of the sparse matrices. For that reason we compute a part of the accelerator (typically a quarter of it) symbolically, to get the total matrix of this part and load it onto the device. As the whole machine consists of symmetric sections, with some corrections this matrix can be used for modelling the long-turn evolution of the beam with minimal (in some sense) computational resources. The schematic view of such a system is shown in Fig. 1.
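The quarter-section idea can be sketched as follows, with NumPy standing in for the device-side multiplication; the lattice below is an assumed four-cell FODO quarter, not the actual ring:

```python
import numpy as np

def drift(L):
    """Linear drift of length L in one transverse plane (x, x')."""
    return np.array([[1.0, L], [0.0, 1.0]])

def quad(f):
    """Thin-lens quadrupole with focal length f (f < 0: defocusing)."""
    return np.array([[1.0, 0.0], [-1.0 / f, 1.0]])

# One quarter of an assumed symmetric ring: four FODO cells in beam order
quarter = [quad(2.0), drift(1.0), quad(-2.0), drift(1.0)] * 4

# Pre-multiply once on the host: one total matrix replaces 16 per-element
# transfers to the device (matrices multiply in reverse beam order)
M_quarter = np.linalg.multi_dot(quarter[::-1])

# Long-turn evolution: 4 symmetric sections per turn, many turns
X = np.random.default_rng(3).normal(size=(2, 10000))
for turn in range(256):
    for _ in range(4):
        X = M_quarter @ X   # on the GPU this would be one GEMM per section

print(X.shape)   # (2, 10000)
```

Collapsing a section into one matrix trades per-element visualization for a drastic reduction in host-device traffic, which is the trade-off described above.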

Fig. 6. New uniform distribution of 10000 particles on CPU in (X, Y), (X, \(P_x\)), (Y, \(P_y\)), (\(P_x\), \(P_y\))

Computations were made on a GeForce GTX 1060 6 GB with compute capability 6.1. The whole environment is developed with Python 3.6 libraries. Obviously, Python is not a fast tool for computations, but it serves as the backbone of all important components: Python plays the role of glue, linking everything into one working system. The main computational blocks are constructed with the following modules:

  • NumPy – scientific computations on CPU [10];

  • pyCUDA – general purpose computations on GPU [11];

  • SymPy – symbolic calculations of control elements for sending on GPU [12];

  • TkInter – graphical user interface [13];

  • Seaborn – visualization of obtained results [14].

Furthermore, there are some libraries which are going to be used in future work with machine learning and neural networks, such as Scikit-learn [15] and TensorFlow [16]. The results of computations of the new Gaussian distribution of 1000000 particles on the GPU are shown in Fig. 7.

Fig. 7. New Gaussian distribution of 1000000 particles on GPU in (X, Y), (X, \(P_x\)), (Y, \(P_y\)), (\(P_x\), \(P_y\))

6 Discussion

In works [17,18,19] concurrency is introduced only for a simple part of large particle-physics simulation toolkits, such as Geant4 and Elegant. The difficulty is that these toolkits were developed for CPU-only systems and cannot be easily ported to general-purpose GPU computations. On the contrary, the method of the matrix form of the ODE is parallel in principle. Besides, we are constructing the problem-solving environment as a toolkit based on the idea of the matrix representation of the ODE, not the reverse. In this case there is no adaptation problem, because we develop a parallel system from the start.

The main difficulty in GPGPU development for the matrix algorithm is the amount of data sent to the device; it is the bottleneck of practically every GPU program. Since every non-zero \({\mathbb {P}}^{ij}\) matrix can be calculated separately, and all zero elements are neither needed for computations nor sent to the device, resources are saved for the set of particles.
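The block-sparse layout described here can be sketched by storing only the non-zero blocks (here denoted \({\mathbb {R}}^{1i}\), as in Eq. (3)), so zero blocks are never formed or transferred; all matrix values below are assumptions:

```python
import numpy as np

n, N = 2, 1000
rng = np.random.default_rng(7)
X = rng.normal(size=(n, N))

def kron2(X):
    """Column-wise Kronecker square X^{[2]} of shape (n*n, N)."""
    return np.einsum('ik,jk->ijk', X, X).reshape(-1, X.shape[1])

# Only the non-zero blocks are stored (and, on a GPU, transferred);
# absent keys stand for zero blocks that are never sent to the device
blocks = {
    1: np.array([[1.0, 0.5], [0.0, 1.0]]),     # R^{11}, drift-like (assumed)
    2: 0.01 * rng.normal(size=(n, n * n)),     # R^{12}, small (assumed)
    # the order-3 block is all zeros -> simply omitted
}

powers = {1: X, 2: kron2(X)}
Xs = sum(R @ powers[i] for i, R in blocks.items())
print(Xs.shape)   # (2, 1000)
```

Each stored block is a dense matrix and can be dispatched to the device independently, which matches the separate per-block calculation mentioned above.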

7 Conclusion

The idea of the problem-solving environment is to provide the scientist with a clear and easy environment to simulate and predict the behaviour of the beam. The results of the simulation should be presented in both numerical and graphical forms.

In this work the long-turn simulation of the beam with the help of graphical processors was performed. Future work will be dedicated to the visualization of the obtained results. Besides, the KV distribution is under testing, and the results of simulation with this distribution can be compared with other works.

Another idea is to use machine learning algorithms to analyse the numerical data, and neural networks, especially the deep learning approach, to process the visual data of the experiment.