1 Introduction

The study of physical processes in the Universe, their influence on the self-organization and evolution of astronomical objects, and their further dynamics and interaction constitutes the subject of modern astrophysics. The importance of gravitational and magnetic fields and the difficulty of reproducing cosmic conditions in the laboratory impose significant restrictions on the experimental study of astronomical objects. Thus, mathematical modeling is the main, and often the only, approach to the theoretical study of astrophysical processes and astronomical objects.

The evolution of hydrodynamic turbulence and the formation of compact objects as a result of gravitational collapse are among the important processes occurring in astrophysical objects at various spatial scales [1, 2]. Magnetohydrodynamic (MHD) turbulence was simulated at the scales of clusters of galaxies in [3]. Problems of gravitational and magneto-gravitational instability [4], dynamics of clouds falling into a black hole [5], and cloud collapse and its fragmentation [6] have been considered in the context of modeling the dynamics of molecular clouds.

An important role is given to the influence of magnetic fields on the evolution of interstellar turbulent flows, in which the magnetic fields are quite strong [7,8,9]. The energy spectrum [10], sub-Alfvénic flows [11], and the star formation rate [12] have been studied in the context of the evolution of MHD turbulence. A comparison of various codes for the simulation of supersonic turbulence was made in [13]. Turbulence in the solar wind was investigated in [14]. It has been noted that turbulence is the main mechanism for the transition from deflagration to detonation in supernova explosion problems [15]. Simulating the evolution of hydrodynamic turbulence with self-gravity requires significant high-performance computing resources.

A trend toward hybrid supercomputers equipped with graphics accelerators, Intel Xeon Phi accelerators, or Sunway accelerators has become apparent. There are a variety of codes adapted for hybrid supercomputers to simulate hydrodynamic flows in astrophysics [16,17,18,19,20,21,22,23]. However, the main potential for improving the performance of hydrodynamic computations on Intel Xeon Phi accelerators, namely low-level vectorization, has not been sufficiently explored.

In this paper, we shall consider the model problem of turbulence evolution using a new vectorized code developed for supercomputers equipped with Intel Xeon Phi KNL accelerators. The peak performance of Intel Xeon Phi dual accelerators is about three teraFLOPS. Of course, such a value is unreachable in real-world applications but a value of the order of one teraFLOPS can be achieved on synthetic tests. We will be guided by this value when designing our computational model. At present, some program codes (based on publications in the Computer Physics Communications journal) using Intel Xeon Phi accelerators have been implemented in the fields of plasma physics [24], molecular dynamics [25, 26], statistical mechanics [27], and hydrodynamics [28].

In 2015, we developed the AstroPhi code [18], based on the implementation of an original numerical method using the offload programming model of the Intel Xeon Phi. The accelerator architecture used at the time did not allow us to employ vector instructions, although switching to the native mode made it possible to achieve a code performance of 28 gigaFLOPS [29]. Low-level vectorization of the loops in the AstroPhi code allowed us to increase the performance to a value of the order of 100 gigaFLOPS [30]. It became evident that low-level vectorization tools are necessary to achieve maximum performance. The new version of the code was based on the HLL method and used a single accelerator [31, 32]. With this implementation, we achieved a performance of 245 gigaFLOPS on the Intel Xeon Phi 7250 and 302 gigaFLOPS on the Intel Xeon Phi 7290.

The computational model and the numerical method will be briefly described in Sect. 2. Section 3 is devoted to the development and investigation of the parallel implementation. In Sect. 4, we formulate the main problems of vectorization. Section 5 is devoted to the simulation of hydrodynamic turbulence taking self-gravity into consideration. Finally, we summarize the conclusions of our research in Sect. 6.

2 The Computational Model

The mathematical model is based on the equations of multicomponent gravitational hydrodynamics. An important prerequisite for constructing a vectorized numerical method is to write the equations in vector form. We use an overdetermined system of hydrodynamic equations that includes an entropy equation. This enables us to write the system in divergence form, which makes it possible to formulate a vector numerical method:

$$\begin{aligned} \frac{\partial }{\partial t} \left( \begin{array}{c} \rho \\ \rho _{i} \\ \rho \mathbf {u} \\ \rho S \\ \rho E \end{array} \right) + \bigtriangledown \cdot \left( \begin{array}{c} \rho \mathbf {u} \\ \rho _{i} \mathbf {u} \\ \rho \mathbf {u} \otimes \mathbf {u} + p \\ \rho S \mathbf {u} \\ \left( \rho E + p \right) \mathbf {u} \end{array} \right) = \left( \begin{array}{c} 0 \\ s_{i} \\ -\rho \bigtriangledown \varPhi \\ \left( \gamma - 1 \right) \rho ^{1-\gamma } \left( \varLambda - \varGamma \right) \\ \varLambda - \varGamma \end{array} \right) , \end{aligned}$$
(1)

where \(\rho _{i}\) is the density of the species, \(\rho = \sum _{i} \rho _{i}\) denotes the density of the gas mixture, \(\mathbf {u} = \left( u_x, u_y, u_z \right) \) is the velocity vector, S stands for the entropy, \(p = p \left( \rho , S, T \right) \) denotes the pressure, \(\gamma \) is the adiabatic index, \(\rho E = \rho \varepsilon + \frac{1}{2} \rho \mathbf {u}^2\) is the total mechanical energy, T is the temperature, \(s_{i}\) represents the rate of formation of the corresponding species and, finally, \(\varPhi \) is the gravitational potential satisfying the Poisson equation

$$\begin{aligned} \bigtriangleup \varPhi = 4 \pi G \rho , \end{aligned}$$
(2)

in which G is the gravitational constant, \(\varLambda \) is the cooling function and \(\varGamma \) is the heating function. In this article, we restrict ourselves to considering the equation of state based on a combination of the isothermal and adiabatic regimes:

$$\begin{aligned} p = c_s^2 \rho + c_s^2 \rho _{\text {crit}} \left( \rho / \rho _{\text {crit}} \right) ^{\gamma }, \end{aligned}$$
(3)

where \(c_s\) is the isothermal speed of sound and \(\rho _{\text {crit}}\) is the critical density of the gas at the transition from the isothermal to the adiabatic regime, which can be expressed as

$$\begin{aligned} \rho _{\text {crit}} = \mu m_H n_{\text {crit}}, \end{aligned}$$
(4)

with \(\mu \) the average molecular weight of gas, \(m_H\) the mass of a hydrogen atom, and \(n_{\text {crit}}\) the critical gas concentration. In this work, we assume \(n_{\text {crit}} = 10^{10}\) cm\(^{-3}\). We will consider neither cooling/heating processes nor chemical kinetics processes. Consequently, to simulate hydrodynamic turbulence, we will use the following simplified form of the equations:

$$\begin{aligned} \frac{\partial }{\partial t} \left( \begin{array}{c} \rho \\ \rho \mathbf {u} \end{array} \right) + \bigtriangledown \cdot \left( \begin{array}{c} \rho \mathbf {u} \\ \rho \mathbf {u} \otimes \mathbf {u} + p \end{array} \right) = \left( \begin{array}{c} 0 \\ -\rho \bigtriangledown \varPhi \end{array} \right) . \end{aligned}$$
(5)

However, we will describe all the calculations and the structure of the code for the entire system given in (1).
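
The equation of state (3) with the critical density (4) can be sketched in a few lines of Python. This is only an illustration: the function names and the values of \(\mu \), \(\gamma \), and \(c_s\) below are assumptions chosen for the example, not parameters taken from the code described in this paper.

```python
M_H = 1.6735575e-24  # mass of a hydrogen atom in grams (CGS units)


def critical_density(mu, n_crit=1.0e10):
    """Eq. (4): rho_crit = mu * m_H * n_crit, in g/cm^3 for n_crit in cm^-3."""
    return mu * M_H * n_crit


def pressure(rho, c_s, rho_crit, gamma):
    """Eq. (3): isothermal term plus adiabatic term."""
    return c_s**2 * rho + c_s**2 * rho_crit * (rho / rho_crit) ** gamma


# Hypothetical parameters, chosen only for illustration.
mu, gamma, c_s = 2.0, 5.0 / 3.0, 2.0e4
rho_crit = critical_density(mu)

# Far below the critical density the gas is nearly isothermal: p ~ c_s^2 * rho.
rho_low = 1.0e-6 * rho_crit
ratio_low = pressure(rho_low, c_s, rho_crit, gamma) / (c_s**2 * rho_low)

# At the critical density both terms contribute equally: p = 2 * c_s^2 * rho_crit.
ratio_crit = pressure(rho_crit, c_s, rho_crit, gamma) / (c_s**2 * rho_crit)
```

For \(\rho \ll \rho _{\text {crit}}\) the ratio \(p / (c_s^2 \rho )\) stays close to one, and the adiabatic term dominates only once the density exceeds \(\rho _{\text {crit}}\).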

The equations of hydrodynamics can be written in vector form:

$$\begin{aligned} \frac{\partial U}{\partial t} + \frac{\partial F(U)}{\partial x} = 0. \end{aligned}$$
(6)

To solve the equations, one can use a numerical method based on a combination of the operator splitting approach, the Godunov method, the HLL scheme, and the piecewise-parabolic method on a local stencil. The flux through the boundary between the left (L) and the right (R) cells is calculated with the help of the equation

$$\begin{aligned} F = \frac{F \left( -\lambda _{\text {L}} \tau \right) + F \left( \lambda _{\text {R}} \tau \right) }{2} + \frac{c + \Vert \mathbf {u} \Vert }{2} \left( U \left( -\lambda _{\text {L}} \tau \right) - U \left( \lambda _{\text {R}} \tau \right) \right) , \end{aligned}$$
(7)

where

$$\begin{aligned} \lambda _{\text {L}} = c - \Vert \mathbf {u} \Vert , \qquad \lambda _{\text {R}} = c + \Vert \mathbf {u} \Vert , \end{aligned}$$
(8)

with \(c = \sqrt{\frac{\gamma p}{\rho }}\) the speed of sound. The construction of the parabolas is a modification of that given in [33], based on reducing the order of the first element of the parabola.
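
The structure of the flux formula (7) can be illustrated with a simplified Python sketch. It makes several assumptions that are not part of the method described here: the problem is one-dimensional with state \(U = (\rho , \rho u)\), the left and right states are piecewise constant (no integration of parabolas along characteristics), and a single dissipation speed \(c + |u|\) is taken as the larger of the two one-sided values. The function names are hypothetical.

```python
import numpy as np


def physical_flux(U, p):
    """F(U) for U = (rho, rho*u): mass flux and momentum flux."""
    rho, mom = U
    u = mom / rho
    return np.array([mom, mom * u + p])


def interface_flux(U_L, U_R, gamma, pressure):
    """Central flux average plus dissipation scaled by c + |u| (cf. Eq. (7))."""
    p_L, p_R = pressure(U_L[0]), pressure(U_R[0])
    s_L = np.sqrt(gamma * p_L / U_L[0]) + abs(U_L[1] / U_L[0])
    s_R = np.sqrt(gamma * p_R / U_R[0]) + abs(U_R[1] / U_R[0])
    s = max(s_L, s_R)  # assumption: use the larger one-sided speed
    central = 0.5 * (physical_flux(U_L, p_L) + physical_flux(U_R, p_R))
    return central + 0.5 * s * (U_L - U_R)


# Usage: a density jump at rest with an isothermal closure p = rho.
isothermal = lambda rho: 1.0 * rho
F = interface_flux(np.array([2.0, 0.0]), np.array([1.0, 0.0]), 1.0, isothermal)
```

When both states coincide, the dissipative term vanishes and the expression reduces to the physical flux, which is the consistency property any such interface flux should satisfy.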

Applying the procedure suggested in [33] for constructing a local parabola with an increased order of accuracy would complicate the transition to an adaptive nested mesh because the cells differ in size. We therefore impose two requirements: to retain the original PPML approach with a compact stencil, and to be able to integrate the parabolas along the characteristics within each cell. This preserves the formulation of the solver and, therefore, the parallel computing algorithms. To meet these requirements, we rewrite the parabola construction algorithm from [33] and integrate the parabolas within each cell.

The building blocks of the numerical scheme are the parabolas. We construct a piecewise-parabolic function q(x) on a regular mesh with step size h. On the interval \([x_{i-1/2},x_{i+1/2}]\), the general equation of the parabola can be written as

$$\begin{aligned} q(x) = q_{i}^{\text {L}} + \xi \left( \bigtriangleup q_{i} + q_{i}^{(6)} (1 - \xi ) \right) , \end{aligned}$$

where \(q_{i}\) is the cell-averaged value, \(\xi = (x - x_{i-1/2})h^{-1}\), \(\bigtriangleup q_{i} = q_{i}^{\text {R}} - q_{i}^{\text {L}}\) (so that \(q(x_{i+1/2}) = q_{i}^{\text {R}}\)), and \(q_{i}^{(6)} = 6 \bigl (q_{i} - 1/2 (q_{i}^{\text {L}} + q_{i}^{\text {R}})\bigr )\), in accordance with the conservation law:

$$\begin{aligned} q_{i} = h^{-1} \int _{x_{i-1/2}}^{x_{i+1/2}} q(x)\,dx. \end{aligned}$$

To construct \(q_{i}^{\text {R}} = q_{i+1}^{\text {L}} = q_{i+1/2}\), we use an interpolation function of second order of accuracy:

$$\begin{aligned} q_{i+1/2} = 1/2(q_{i} + q_{i+1}), \end{aligned}$$

Here \(\delta q_{i} = 1/2 (q_{i+1} - q_{i-1})\) denotes the central-difference slope constructed in the first step of the algorithm. The input of the parabola construction procedure is \(q_{i}\). Its output consists of all parameters of the parabola on each interval \([x_{i-1/2},x_{i+1/2}]\).

  1. Construct \(\delta q_{i} = 1/2 (q_{i+1} - q_{i-1})\) without regularization at extrema.

  2. Compute the boundary values for the parabola:

    $$\begin{aligned} q_{i}^{\text {R}} = q_{i+1}^{\text {L}} = q_{i+1/2} = 1/2(q_{i} + q_{i+1}). \end{aligned}$$
  3. Reconstruct the parabola according to the following equations:

    $$\begin{aligned} \begin{aligned} \bigtriangleup q_{i}&= q_{i}^{\text {R}} - q_{i}^{\text {L}},\\ q_{i}^{(6)}&= 6 (q_{i} - 1/2 (q_{i}^{\text {L}} + q_{i}^{\text {R}})). \end{aligned} \end{aligned}$$

    To obtain a monotone parabola, we use the following equations for the boundary values \(q_{i}^{\text {L}}, q_{i}^{\text {R}}\):

    $$\begin{aligned} \begin{aligned} q_{i}^{\text {L}}&= q_{i},\ q_{i}^{\text {R}} = q_{i},\ (q_{i}^{\text {L}} - q_{i})(q_{i} - q_{i}^{\text {R}}) \le 0,\\ q_{i}^{\text {L}}&= 3q_{i} - 2q_{i}^{\text {R}},\ \bigtriangleup q_{i} q_{i}^{(6)} > (\bigtriangleup q_{i})^{2},\\ q_{i}^{\text {R}}&= 3q_{i} - 2q_{i}^{\text {L}},\ \bigtriangleup q_{i} q_{i}^{(6)} < - (\bigtriangleup q_{i})^{2}. \end{aligned} \end{aligned}$$
  4. Make a final update of the parabola parameters:

    $$\begin{aligned} \begin{aligned} \bigtriangleup q_{i}&= q_{i}^{\text {R}} - q_{i}^{\text {L}},\\ q_{i}^{(6)}&= 6 (q_{i} - 1/2 (q_{i}^{\text {L}} + q_{i}^{\text {R}})). \end{aligned} \end{aligned}$$
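
The steps of the parabola construction can be sketched in Python with NumPy as follows. This is a minimal illustration under stated assumptions rather than the vectorized implementation discussed in this paper: the mesh is taken to be periodic, the function names are hypothetical, the convention \(\bigtriangleup q_{i} = q_{i}^{\text {R}} - q_{i}^{\text {L}}\) is used so that the parabola takes the value \(q_{i}^{\text {R}}\) at \(\xi = 1\), and the slope \(\delta q_{i}\) is omitted because it does not enter the second-order interface formula.

```python
import numpy as np


def build_parabolas(q):
    """Return (q_L, q_R, dq, q6) for cell averages q on a periodic mesh."""
    q = np.asarray(q, dtype=float)
    # Boundary values: q_i^R = q_{i+1}^L = (q_i + q_{i+1}) / 2.
    q_R = 0.5 * (q + np.roll(q, -1))
    q_L = np.roll(q_R, 1)
    # Preliminary parabola parameters.
    dq = q_R - q_L
    q6 = 6.0 * (q - 0.5 * (q_L + q_R))
    # Monotonicity conditions, all evaluated on the unmodified values.
    extremum = (q_L - q) * (q - q_R) <= 0.0
    fix_left = (dq * q6 > dq**2) & ~extremum
    fix_right = (dq * q6 < -(dq**2)) & ~extremum
    new_L = np.where(extremum, q, np.where(fix_left, 3.0 * q - 2.0 * q_R, q_L))
    new_R = np.where(extremum, q, np.where(fix_right, 3.0 * q - 2.0 * q_L, q_R))
    # Final update of the parabola parameters.
    dq = new_R - new_L
    q6 = 6.0 * (q - 0.5 * (new_L + new_R))
    return new_L, new_R, dq, q6


def evaluate_parabola(q_L, dq, q6, xi):
    """q(xi) = q_L + xi * (dq + q6 * (1 - xi)) for xi in [0, 1]."""
    return q_L + xi * (dq + q6 * (1.0 - xi))


# Usage: one period of a sine wave sampled on 16 cells.
q = np.sin(2.0 * np.pi * np.arange(16) / 16)
q_L, q_R, dq, q6 = build_parabolas(q)
```

After the final update, the mean of each parabola equals the cell value \(q_{i}\), and \(|q_{i}^{(6)}| \le |\bigtriangleup q_{i}|\) holds in every cell, so no parabola has an extremum strictly inside its cell.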

At the final stage of the solution of the hydrodynamic equations, we execute an adjustment procedure. At a gas-vacuum boundary, we have

$$\begin{aligned} \Vert \mathbf {u} \Vert = \sqrt{2(E-\varepsilon )},\ (E - \mathbf {u}^{2}/2)/E < 10^{-3}. \end{aligned}$$
(9)

In other regions, we apply an adjustment to ensure a nondecreasing entropy:

$$\begin{aligned} \rho \varepsilon = \left( \rho E - \frac{\rho \mathbf {u}^{2}}{2} \right) ,\ (E - \mathbf {u}^{2}/2)/E \ge 10^{-3}. \end{aligned}$$
(10)

This modification provides a detailed balance of energy and ensures a nondecreasing entropy.

After solving the hydrodynamic equations, it is necessary to recompute the gravitational potential from the gas density. To this end, we use a 27-point stencil to approximate the Poisson equation. The algorithm for solving the Poisson equation consists of three stages:

  1. Setting the boundary conditions for the gravitational potential at the boundary of the region.

  2. Transforming the density function to the space of harmonics by means of a fast Fourier transform.

  3. Solving the Poisson equation in the space of harmonics and then applying the inverse fast Fourier transform to bring the potential back from the space of harmonics to physical space.

The details of the method are given in [33].
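
The second and third stages can be sketched with NumPy's FFT routines. The sketch below is deliberately simplified and differs from the method of this paper in two respects that should be read as assumptions: the domain is treated as fully periodic (so the first stage, setting boundary conditions, is skipped), and the continuous symbol \(-k^2\) of the Laplacian is used instead of the discrete symbol of the 27-point stencil. The function name and the default \(G = 1\) are hypothetical.

```python
import numpy as np


def solve_poisson_periodic(rho, box_size, G=1.0):
    """Solve laplace(phi) = 4*pi*G*rho on a periodic cubic grid of n^3 cells."""
    n = rho.shape[0]
    # Angular wavenumbers along one axis.
    k1 = 2.0 * np.pi * np.fft.fftfreq(n, d=box_size / n)
    kx, ky, kz = np.meshgrid(k1, k1, k1, indexing="ij")
    k2 = kx**2 + ky**2 + kz**2
    # Forward transform of the density to the space of harmonics.
    rho_hat = np.fft.fftn(rho)
    # Solve -k^2 * phi_hat = 4*pi*G*rho_hat; the k = 0 mode is set to zero.
    k2[0, 0, 0] = 1.0
    phi_hat = -4.0 * np.pi * G * rho_hat / k2
    phi_hat[0, 0, 0] = 0.0
    # Inverse transform back to physical space.
    return np.real(np.fft.ifftn(phi_hat))


# Usage: a single density mode sin(pi*x) on the cube [-1, 1]^3.
n = 16
x = -1.0 + 2.0 * np.arange(n) / n
rho = np.sin(np.pi * x)[:, None, None] * np.ones((n, n, n))
phi = solve_poisson_periodic(rho, box_size=2.0)
phi_exact = -(4.0 / np.pi) * np.sin(np.pi * x)[:, None, None] * np.ones((n, n, n))
```

For a single harmonic with wavenumber k, the potential is the density multiplied by \(-4 \pi G / k^2\), which is what the comparison with `phi_exact` checks.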

3 Parallel Implementation

The parallel implementation is based on a multi-level decomposition of the computations:

  1. One-dimensional decomposition of the computational domain by means of MPI, which, for consistency with the solution of the Poisson equation, is specified by the FFTW library.

  2. One-dimensional decomposition of the computations by means of OpenMP as part of a single process running on a single Intel Xeon Phi accelerator.

  3. Vectorization of computations within a single cell.

The geometric decomposition of the computational domain is carried out by means of MPI processes and OpenMP threads. In the case of a decomposition of the computations by means of MPI, it is necessary to take into account overlapping subregions. The compact computational stencil makes a single overlapping layer sufficient.
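
The index bookkeeping of a one-dimensional slab decomposition with one overlapping layer can be illustrated as follows. This is a generic sketch with a hypothetical function name. It assumes an even block distribution and a non-periodic domain, and it does not reproduce the distribution that the FFTW library itself prescribes.

```python
def slab_bounds(n_cells, n_ranks, rank, overlap=1):
    """Return (start, stop, ghost_start, ghost_stop) along the decomposed axis.

    [start, stop) is the range of cells owned by the rank, and
    [ghost_start, ghost_stop) additionally includes the overlapping layers.
    """
    base, extra = divmod(n_cells, n_ranks)
    # The first `extra` ranks receive one additional cell each.
    start = rank * base + min(rank, extra)
    stop = start + base + (1 if rank < extra else 0)
    # Clip the overlap at the ends of the domain (non-periodic assumption).
    ghost_start = max(start - overlap, 0)
    ghost_stop = min(stop + overlap, n_cells)
    return start, stop, ghost_start, ghost_stop


# Usage: 512 cells along the decomposed axis, split over 4 processes.
bounds = [slab_bounds(512, 4, r) for r in range(4)]
```

Each process then stores its owned slab plus at most one extra layer on each side, and only these layers have to be exchanged between neighbors after every time step.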

Next, we describe the basic instructions used to implement the method. We will dwell only on the declarative description:

  • _mm512_set1_pd – Broadcasting a scalar to all elements of a vector.

  • _mm512_load_pd – Loading eight double-precision elements from an aligned memory address into a vector.

  • _mm512_mul_pd – Multiplication of vectors.

  • _mm512_add_pd – Addition of vectors.

  • _mm512_sub_pd – Subtraction of vectors.

  • _mm512_stream_pd – Writing the vector to memory with a non-temporal (cache-bypassing) store.

  • _mm512_abs_pd – Getting the absolute value of the vector elements.

The instructions given here are sufficient to implement a numerical method for the solution of the hydrodynamic equations. We used the following line to compile the code:

It is worth noting only the acceleration of the division through the option -no-prec-div, which is recommended when using SSE extensions.

We studied the acceleration of the gooPhi code on a \(512^3\) grid. We measured the time of the numerical method (Total) in seconds on different numbers of logical cores (Cores). The acceleration P (Speedup) was calculated with the formula

$$\begin{aligned} P = \frac{\text {Total}_1}{\text {Total}_K}, \end{aligned}$$
(11)

where \(\text {Total}_1\) is the computation time on one logical core and \(\text {Total}_K\) is the computation time on K logical cores. We also assessed the actual performance. Table 1 contains the results on acceleration and performance on a mesh of size \(512^3\). We achieved a performance of 173 gigaFLOPS and a speedup factor of 48 using a single Intel Xeon Phi KNL.

Table 1. Speedup and real performance of the code on a single Intel Xeon Phi

In addition, we studied the scalability of the gooPhi code on a mesh of size \(512p \times 512 \times 512\) points, where p is the number of accelerators, using all logical cores of each accelerator. Thus, each accelerator has a subdomain size of \(512^3\). For scalability assessment purposes, we measured the time of the numerical method (Total) in seconds while varying the number of Intel Xeon Phi (KNL) accelerators. The scalability T was computed using the formula

$$\begin{aligned} T = \frac{\text {Total}_1}{\text {Total}_p}, \end{aligned}$$
(12)

where \(\text {Total}_1\) is the computation time on one accelerator when a single accelerator is used and \(\text {Total}_p\) is the computation time on one accelerator when p accelerators are used. The scalability results are given in Table 2. Using 16 accelerators, we achieved a scalability of 97%, i.e., the run time per accelerator increased by only about 3% relative to the single-accelerator run.

Table 2. Scalability of the code for various numbers of Intel Xeon Phi accelerators

4 Discussion

In this section, we will discuss several important issues related to the organization of computations, constraints, and new features.

  1. In this study, we used all eight elements of the vector (four density functions, three components of the velocity, and the entropy). This fills a 512-bit vector of double-precision values. We hope that the vector size will be increased in future versions of the processors, which would allow us to take a greater number of species into account. At the same time, the requirement that the number of variables be a multiple of eight forces the use of dummy elements in some cases.

  2. When writing the first version of the AstroPhi code and in subsequent studies, an interesting fact emerged: higher performance is achieved with separate arrays for the hydrodynamic quantities (density, angular momentum, pressure, etc.) than with an array of C/C++ structures in which each object contains all the information about a cell. Apparently, this is due to better use of the cache: when multiple arrays are accessed, a cache line is filled for each of them, so we efficiently used as many cache lines as there were arrays. In the case of structures (or 4D arrays, as in the present paper), only one or two cache lines were used.

  3. In our implementation, we did not use fused multiply-add (FMA) instructions. Performance tests, especially in linear algebra applications, where the main operation is daxpy, show that using FMA instructions improves performance. However, we did not observe this trend in our code. On the contrary, the code slowed down, so we decided not to use such instructions.

5 Modeling of Hydrodynamic Turbulence with Self-Gravity

For the simulation, we considered the test problem in the cubic region \([-1;1]^3\) with \(c_s = 0.1\). The initial density was assumed to be 1. The initial velocity perturbations followed a Gaussian distribution [34].

The main analysis of turbulent flows with gravity consists in estimating the Jeans criterion and the free-fall time, during which a local collapse occurs. To estimate the Jeans criterion, let us write the equations of gravitational hydrodynamics in 1D form using the isothermal equation of state:

$$\begin{aligned}&\frac{\partial \rho }{\partial t} + \frac{\partial }{\partial x} \left( \rho u \right) = 0,\nonumber \\&\frac{\partial \rho u}{\partial t} + \frac{\partial }{\partial x} \left( \rho u u \right) = -\frac{\partial p}{\partial x} - \rho \frac{\partial \varPhi }{\partial x},\nonumber \\&\frac{\partial ^2 \varPhi }{\partial x^2} = 4 \pi G \rho ,\nonumber \\&p = c_s^2 \rho . \end{aligned}$$
(13)

The adiabatic term of the equation of state (3) comes into play once the critical density is reached, which happens during the development of the instability. The Jeans criterion, however, applies to the initial stage, for which the isothermal equation of state is used.

We will consider a linear perturbation of the physical variables:

$$\begin{aligned} \rho = \rho _0 + \rho _1, \quad p = p_0 + p_1, \quad u = u_1, \quad \varPhi = \varPhi _0 + \varPhi _1. \end{aligned}$$
(14)

Let us rewrite the equations of gravitational hydrodynamics for the considered perturbation of the physical variables:

$$\begin{aligned}&\frac{\partial \rho _1}{\partial t} + \rho _0 \frac{\partial u_1}{\partial x} = 0,\nonumber \\&\frac{\partial u_1}{\partial t} = -\frac{c_s^2}{\rho _0} \frac{\partial \rho _1}{\partial x} - \frac{\partial \varPhi _1}{\partial x},\nonumber \\&\frac{\partial ^2 \varPhi _1}{\partial x^2} = 4 \pi G \rho _1. \end{aligned}$$
(15)

We seek a nontrivial solution proportional to \(\exp \left[ i \left( kx + \omega t \right) \right] \). Consequently,

$$\begin{aligned} \frac{\partial }{\partial t} = i \omega , \quad \frac{\partial }{\partial x} = i k. \end{aligned}$$

Let us write the equations for \(\left( \rho _1, u_1, \varPhi _1 \right) \) in the following form:

$$\begin{aligned}&\omega \rho _1 + k \rho _0 u_1 = 0,\nonumber \\&\frac{k c_s^2}{\rho _0} \rho _1 + \omega u_1 + k \varPhi _1 = 0,\nonumber \\&4 \pi G \rho _1 + k^2 \varPhi _1 = 0. \end{aligned}$$
(16)

By equating to zero the determinant of the system,

$$\begin{aligned} \left| \begin{array}{ccc} \omega &{} k \rho _0 &{} 0 \\ \frac{k c_s^2}{\rho _0} &{} \omega &{} k \\ 4 \pi G &{} 0 &{} k^2 \end{array} \right| , \end{aligned}$$

we obtain the condition

$$\begin{aligned} \omega ^2 = k^2 c_s^2 - 4 \pi G \rho _0. \end{aligned}$$
(17)
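
Condition (17) follows from expanding the determinant along its first row:

$$\begin{aligned} \omega \cdot \omega k^2 - k \rho _0 \left( \frac{k c_s^2}{\rho _0} k^2 - 4 \pi G k \right) = k^2 \left( \omega ^2 - k^2 c_s^2 + 4 \pi G \rho _0 \right) = 0, \end{aligned}$$

so that, for \(k \ne 0\), the expression in the last parentheses must vanish.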

Setting \(\omega = 0\) in (17), we can write the critical wavenumber of the Jeans criterion in the form

$$\begin{aligned} k_J = \left( \frac{4 \pi G \rho _0 }{ c_s^2 } \right) ^{1/2}, \end{aligned}$$
(18)

and the critical wavelength of the Jeans criterion in the form

$$\begin{aligned} \lambda _J = \frac{2 \pi }{k_J} = \left( \frac{\pi }{G \rho _0} \right) ^{1/2} c_s. \end{aligned}$$
(19)

A perturbation with wavelength \(\lambda > \lambda _J\) corresponds to \(k < k_J\), for which \(\omega ^2 < 0\) in (17). Such a perturbation grows exponentially and triggers the gravitational instability.

To estimate the free-fall time, we consider the collapse of a homogeneous sphere of mass M and radius R. We need to estimate the time it takes the sphere radius to decrease from R to zero. Let us write the equation of motion in the following form:

$$\begin{aligned} \frac{d^2 r}{d t^2} = -\frac{G m}{r^2}, \end{aligned}$$
(20)

where \(m = 4 \pi \int _{0}^{r} r^2 \rho _0\, dr\) and \(M = \frac{4 \pi R^3 \rho _0}{3}\). Here we omit the cumbersome but rather trivial computations. It follows from Eq. (20) that

$$\begin{aligned} dt = - \left( \frac{8 \pi G \rho _0}{3} \right) ^{-1/2} \left( \frac{r}{R-r} \right) ^{1/2} \frac{dr}{R}. \end{aligned}$$
(21)

By integrating the last equation from the initial state of the sphere \(r = R\) to the final stage \(r = 0\), when it collapses, we obtain the equation for the free-fall time \(t_{{\mathrm{f}\mathrm{f}}}\):

$$\begin{aligned} t_{{\mathrm{f}\mathrm{f}}} = \left( \frac{3 \pi }{32 G \rho _0} \right) ^{1/2}. \end{aligned}$$
(22)
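
The integration of (21) that leads to (22) can be carried out with the substitution \(r = R \sin ^2 \theta \), for which \(\left( r / (R - r) \right) ^{1/2} = \tan \theta \) and \(dr = 2 R \sin \theta \cos \theta \, d\theta \):

$$\begin{aligned} t_{{\mathrm{f}\mathrm{f}}} = \left( \frac{8 \pi G \rho _0}{3} \right) ^{-1/2} \int _{0}^{R} \left( \frac{r}{R-r} \right) ^{1/2} \frac{dr}{R} = \left( \frac{8 \pi G \rho _0}{3} \right) ^{-1/2} \int _{0}^{\pi /2} 2 \sin ^2 \theta \, d\theta = \frac{\pi }{2} \left( \frac{3}{8 \pi G \rho _0} \right) ^{1/2}, \end{aligned}$$

which coincides with (22) because \(\frac{\pi ^2}{4} \cdot \frac{3}{8 \pi } = \frac{3 \pi }{32}\).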

We will use the last equation to find the characteristic time for the local collapse. Obviously, a collapse is not achievable in a hydrodynamic model in that time. However, since the computational cells have finite size, we can consider the process of local collapse in various subdomains of the computational domain. That is especially important in the context of the process of star formation and supernova explosions.
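
As a worked example, formulas (19) and (22) can be evaluated for the test problem with \(c_s = 0.1\) and initial density 1. The snippet below is only a sketch: the function names are hypothetical, and the value \(G = 1\) is an assumption (dimensionless code units), since the units of the test problem are not stated here.

```python
import math


def jeans_length(c_s, rho0, G=1.0):
    """Eq. (19): lambda_J = c_s * sqrt(pi / (G * rho0))."""
    return c_s * math.sqrt(math.pi / (G * rho0))


def free_fall_time(rho0, G=1.0):
    """Eq. (22): t_ff = sqrt(3 * pi / (32 * G * rho0))."""
    return math.sqrt(3.0 * math.pi / (32.0 * G * rho0))


# Test-problem parameters; G = 1 is an assumed choice of units.
lam_J = jeans_length(c_s=0.1, rho0=1.0)
t_ff = free_fall_time(rho0=1.0)

# The cubic region [-1, 1]^3 has an edge length of 2.
box_edge = 2.0
box_exceeds_jeans_length = box_edge > lam_J
```

Under these assumptions, the Jeans length is roughly 0.18 and the free-fall time roughly 0.54, so the computational domain is about an order of magnitude larger than the Jeans length and can accommodate unstable perturbations.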

The results of the computational experiments on the evolution of hydrodynamic turbulence are portrayed in Fig. 1. As we can see, density fragmentation occurs throughout the evolution of turbulence. It would be interesting to consider each individual density wave since in the context of star formation these waves can potentially correspond to young stars. It would also be interesting from the point of view of nuclear reactions to consider the high density regions in the case of turbulent combustion of carbon in white dwarfs.

Fig. 1. The density distribution during the evolution of the turbulence process for a model time equal to one quarter (a), two quarters (b), three quarters (c), and four quarters (d) of the free-fall time for cold matter

The problem of hydrodynamic turbulence is one of interest in various astrophysical applications. Our main interest is related to the organization of parallel and distributed computations of supernova explosions. Despite the variety of mechanisms involved in supernova explosions, the distributed computations in these problems are used to correctly reproduce the nuclear combustion of chemical elements and, therefore, correctly compute the injected energy in each computational cell of the domain.

The distributed run of such problems is a very expensive and complicated procedure, and a detailed elaboration is not always required. This is a consequence of the fact that perturbations in the computational cell do not always lead to instabilities. The main criterion for running a hydrodynamic problem should be the analysis of the Jeans criterion \(\lambda _J\). If it is satisfied, then it is enough to carry out the simulation for a time less than the free-fall time \(t_{{\mathrm{f}\mathrm{f}}}\), rather than for the characteristic time step of the main task. All density waves are formed in that time, and this allows one to fully take into account all nuclear reactions in supernovae of all types.

6 Conclusions

In this paper, we presented the results of simulations of hydrodynamic turbulence with self-gravity, employing the latest Intel Xeon Phi accelerators with KNL architecture. A new vector numerical code was described in detail. We achieved a performance of 173 gigaFLOPS and an acceleration factor of 48 by using a single Intel Xeon Phi KNL. Using 16 accelerators, we reached a scalability of 97%.