Performance Evaluation of GPU-Accelerated Spatial Interpolation Using Radial Basis Functions for Building Explicit Surfaces

Ding, Zengyu; Mei, Gang (ORCID: orcid.org/0000-0003-0026-5423); Cuomo, Salvatore; Xu, Nengxiong; Tian, Hong

doi:10.1007/s10766-017-0538-6

Published: 20 November 2017

Volume 46, pages 963–991, (2018)
International Journal of Parallel Programming

Abstract

This paper evaluates the computational performance of parallel spatial interpolation with Radial Basis Functions (RBFs) implemented on modern GPUs. RBFs can be used in spatial interpolation to build explicit surfaces such as Digital Elevation Models. When large numbers of data points and interpolated points are involved, the computational cost of building explicit surfaces is high. To improve the computational efficiency, we develop a parallel RBF spatial interpolation algorithm on many-core GPUs and compare it with a parallel version implemented on multi-core CPUs. Five groups of experimental tests are conducted on two machines to evaluate the computational efficiency of the presented GPU-accelerated RBF spatial interpolation algorithm. Experimental results indicate that, in most cases, the parallel RBF interpolation algorithm on many-core GPUs does not have any significant advantage over the parallel version on multi-core CPUs in terms of computational efficiency. This unsatisfactory performance of the GPU-accelerated RBF interpolation algorithm is due to (1) the limited size of global memory residing on the GPU, and (2) the need to solve a system of linear equations in each GPU thread to calculate the weights and prediction value of each interpolated point.

1 Introduction

Spatial interpolation is the procedure of predicting the unknown values of a group of interpolated points from the known values of a set of data points. Spatial interpolation is widely used in science and engineering applications, such as image processing [21], numerical analysis [32, 35], geometrical computation [3], Geographic Information Systems (GIS) [16, 17], Artificial Intelligence (AI) [22, 37], and even the Internet of Things (IoT) [4, 7, 19]. The most frequently used spatial interpolation methods include the Inverse Distance Weighted (IDW) method [33], the Kriging method [26], the Discrete Smoothing Interpolation (DSI) method [24, 25], the Moving Least Squares (MLS) method [30], and Radial Basis Function (RBF) interpolation [8]. The performance of these interpolation methods has been compared and analyzed by R. Franke [12].

RBFs are commonly used to (1) approximate implicit surfaces in image processing and (2) build explicit surfaces such as Digital Elevation Models (DEMs). The objective of approximating implicit surfaces with RBFs differs from that of building explicit surfaces with RBFs. When approximating implicit surfaces, the input data is a set of scattered points, and an implicit surface is sought that fits those scattered points. When building explicit surfaces using RBF interpolation, a set of points with a specific type of known value is used to calculate the unknown values of another set of interpolated points.

Much research work has been conducted to approximate implicit surfaces using the RBF approximation algorithms. For example, Cuomo et al. [8] analyzed theoretical and practical issues in using RBFs for reconstructing implicit curves and surfaces from point clouds. Hillier et al. [14] presented a generalized interpolation framework using RBFs to implicitly model three-dimensional continuous geological surfaces from scattered multivariate structural data. Macedo et al. [23] introduced the Hermite Radial Basis Function (HRBF) implicit method to calculate a global implicit function that can interpolate scattered multivariate Hermite data. Lin et al. [20] proposed a novel implicit surface reconstruction approach, named Dual-RBF.

Also, several efforts have been dedicated to building various types of explicit surfaces using the RBF interpolation algorithms. For example, Izquierdo et al. [18] proposed a new interpolation scheme for Compactly-Supported Radial Basis Functions (CS-RBFs) to address the problem of the interpolation of explicit surfaces with vertical faults from scattered data.

In many science and engineering applications, such as those in the fields of GIS and IoT, the data sets involved can be quite large. When the RBF interpolation method is applied to such large data sets, the computational cost is high and the computational efficiency is unsatisfactory.

The techniques in HPC (High Performance Computing) are widely used to improve computational efficiency in various science and engineering applications, such as image processing [10, 11, 31], 3D data denoising [5], spatial interpolation [28, 29], and numerical computation [6, 27].

One effective strategy for addressing this problem is to perform the RBF-based approximations or interpolations in parallel on various parallel computing platforms such as shared-memory computers, distributed-memory computers, or even clusters. For example, Cuomo et al. [9] described a parallel implicit method based on RBFs for surface reconstruction that exploits acceleration on Graphics Processing Units (GPUs). Wang et al. [36] presented a parallel algorithm for RBF-based surface reconstruction from contours on multi-core CPUs. Yokota et al. [38] developed a parallel algorithm for RBF interpolation that exhibits O(n) complexity, requires O(n) storage, and scales excellently up to a thousand processes on powerful clusters.

To the best of the authors’ knowledge, there is no previously reported work specifically focused on developing or evaluating GPU-accelerated spatial interpolation with RBFs for building explicit surfaces. The relevant work reported so far mainly aims at accelerating the approximation of implicit surfaces on the GPU [9].

In this paper, we specifically focus on evaluating the computational performance of GPU-accelerated spatial interpolation with RBFs. We first parallelize the RBF interpolation on many-core GPU and then compare it with the parallel version implemented on multi-core CPU. We also carry out five groups of experimental tests on two different machines to evaluate the computational efficiency.

The paper is organized as follows. Section 2 briefly introduces spatial interpolation using RBFs. Section 3 concentrates mainly on our parallel implementations of the RBF interpolation on multi-core CPU and many-core GPU. Section 4 presents several experimental tests, and Section 5 discusses the results. Finally, Section 6 draws some conclusions.

2 Background: Spatial Interpolation Using RBFs

Given a set of N distinct points $\left( {x_j ,y_j } \right) ,\,\,j=1,\ldots ,N$, where $x_j \in R^s$ and $y_j \in R$, the scattered data interpolation problem consists in finding an interpolant function F such that:

$$\begin{aligned} F\left( {x_j } \right) =y_j ,\,\,j=1,\ldots ,N. \end{aligned}$$

(1)

In the univariate setting $\left( {s=1} \right) $, the interpolant F is usually chosen in a suitable function space. A common approach takes the function F to be a linear combination of certain basis functions $\varPhi _j $:

$$\begin{aligned} F\left( x \right) =\sum \limits _{j\,=\,1}^N {w_j } \varPhi _j \left( x \right) \end{aligned}$$

(2)

In a multivariate setting $x_j \in R^s,s>1$, the problem is more complex. In order to have a well-posed multivariate scattered data interpolation problem, it is not possible to fix in advance the basis $\left\{ {\varPhi _1 ,\varPhi _2 ,\ldots ,\varPhi _N } \right\} $, since the basis functions must depend on the data sites $x_j $.

The data dependent space for RBF interpolation can be easily generated by means of the radial functions:

$$\begin{aligned} \varPhi _j =\phi \left( {{\parallel } {x-x_j } {\parallel }} \right) \end{aligned}$$

(3)

The points $x_j $ to which the basic function $\phi $ is shifted are usually referred to as centers. While there may be circumstances that suggest choosing these centers different from the data sites, one generally picks the centers to coincide with the data sites.

In fact, a practical interpolation problem consists of two subproblems: finding the interpolant F and evaluating it on an assigned set of points. The coefficients $w_j $ in Eq. (2) are obtained by imposing the interpolation conditions of Eq. (1):

$$\begin{aligned} F\left( {x_i } \right) =\sum \limits _{j=1}^N {w_j } \phi \left( {{\parallel } {x_i -x_j } {\parallel }} \right) =y_i ,\quad i=1,\ldots ,N \end{aligned}$$

(4)

This leads to solving the linear system of equations $Aw=B$ in Eq. (5).

$$\begin{aligned} Aw&=B\\ A&=\left[ \begin{array}{cccc} \phi \left( {\parallel } x_1 -x_1 {\parallel } \right) & \phi \left( {\parallel } x_1 -x_2 {\parallel } \right) & \cdots & \phi \left( {\parallel } x_1 -x_N {\parallel } \right) \\ \phi \left( {\parallel } x_2 -x_1 {\parallel } \right) & \phi \left( {\parallel } x_2 -x_2 {\parallel } \right) & \cdots & \phi \left( {\parallel } x_2 -x_N {\parallel } \right) \\ \vdots & \vdots & \ddots & \vdots \\ \phi \left( {\parallel } x_N -x_1 {\parallel } \right) & \phi \left( {\parallel } x_N -x_2 {\parallel } \right) & \cdots & \phi \left( {\parallel } x_N -x_N {\parallel } \right) \\ \end{array} \right] \\ w&=\left[ {w_1 ,w_2 ,\ldots ,w_N } \right] ^T, \quad B=\left[ {y_1 ,y_2 ,\ldots ,y_N } \right] ^T \end{aligned}$$

(5)

Given a set of M points $\xi =\left\{ {\xi _1 ,\xi _2 ,\ldots ,\xi _M } \right\} $, the evaluation of the interpolant F on $\xi $ can be computed as a matrix-vector product (Eq. 6).

$$\begin{aligned} F\left( {\xi _i } \right) =\sum \limits _{j\,=\,1}^N {w_j } \phi \left( {{\parallel } {\xi _i -x_j } {\parallel }} \right) ,\quad i=1,2,\ldots ,M \end{aligned}$$

(6)
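As a small worked illustration of Eqs. (5) and (6), consider a case constructed here purely for illustration (the numbers are not taken from the experiments of this paper): $N=2$ one-dimensional data sites $x_1 =0$, $x_2 =1$ with values $y_1 =1$, $y_2 =2$, and the basic function $\phi \left( r \right) =\sqrt{1+r^2}$, so that $\phi \left( 0 \right) =1$ and $\phi \left( 1 \right) =\sqrt{2}$.

$$\begin{aligned} A&=\left[ \begin{array}{cc} 1 & \sqrt{2} \\ \sqrt{2} & 1 \\ \end{array} \right] ,\quad B=\left[ {1,2} \right] ^T \\ w_1&=\frac{1\cdot 1-\sqrt{2}\cdot 2}{1-2}=2\sqrt{2}-1\approx 1.828,\quad w_2 =\frac{1\cdot 2-\sqrt{2}\cdot 1}{1-2}=\sqrt{2}-2\approx -0.586 \\ F\left( {0.5} \right)&=\left( {w_1 +w_2 } \right) \phi \left( {0.5} \right) =3\left( {\sqrt{2}-1} \right) \sqrt{1.25}\approx 1.389 \end{aligned}$$

Substituting the weights back confirms the interpolation conditions: $w_1 +\sqrt{2}\,w_2 =1$ and $\sqrt{2}\,w_1 +w_2 =2$.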

It is well known that in order to have a well-posed problem (Eq. 5), the matrix A must be nonsingular. Unfortunately, a complete characterization of the class of all basic functions $\phi $ that generate a nonsingular matrix for an arbitrary set $\chi =\left\{ {x_1 ,x_2 ,\ldots ,x_N } \right\} $ of distinct data points is still lacking. The situation is better in the case of positive definite matrices, which are always nonsingular. Popular RBFs $\phi _j $, several of which give rise to positive definite interpolation matrices, are summarized as follows.

Multi-quadrics (MQ):

$$\begin{aligned} \phi _j \left( x \right) =\sqrt{\left( {c^2+{\parallel } {x-x_j } {\parallel }^2} \right) } , \end{aligned}$$

(7)

Inverse MQ (IMQ):

$$\begin{aligned} \phi _j \left( x \right) =1/\sqrt{\left( {c^2+{\parallel } {x-x_j } {\parallel }^2} \right) } , \end{aligned}$$

(8)

Thin-plate splines (TPS):

$$\begin{aligned} \phi _j \left( x \right) ={\parallel }{x-x_j}{\parallel }^{2}\ln \left( {\parallel }{x-x_j}{\parallel }/c \right) , \end{aligned}$$

(9)

Gaussians:

$$\begin{aligned} \phi _j \left( x \right) =\exp \left( {-c{\parallel } {x-x_j } {\parallel }^2} \right) , \end{aligned}$$

(10)

where c is the shape parameter, which can be selected according to the suggestions given in the literature [13, 34, 35].
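The four basic functions of Eqs. (7)–(10) can be sketched in C++ as functions of the distance $r={\parallel } {x-x_j } {\parallel }$ and the shape parameter c. The function names are illustrative assumptions and are not taken from the authors' source code; the value returned by the thin-plate spline at $r=0$ is the limit of $r^2\ln \left( {r/c} \right) $, which is zero.

```cpp
#include <cassert>
#include <cmath>

// r is the Euclidean distance ||x - x_j||; c is the shape parameter.
// All names below are hypothetical and chosen for this sketch only.

// Multi-quadrics, Eq. (7)
double rbf_mq(double r, double c) {
    return std::sqrt(c * c + r * r);
}

// Inverse multi-quadrics, Eq. (8)
double rbf_imq(double r, double c) {
    assert(c != 0.0 || r != 0.0);  // avoid division by zero
    return 1.0 / std::sqrt(c * c + r * r);
}

// Thin-plate splines, Eq. (9)
double rbf_tps(double r, double c) {
    assert(c > 0.0);
    if (r == 0.0) return 0.0;  // limit of r^2 ln(r/c) as r -> 0
    return r * r * std::log(r / c);
}

// Gaussians, Eq. (10)
double rbf_gaussian(double r, double c) {
    return std::exp(-c * r * r);
}
```

Passing the distance rather than the two points keeps the functions independent of the spatial dimension s.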

3 Methods: GPU-Accelerated Spatial Interpolation Using RBFs for Building Explicit Surfaces

In this section, we will first introduce our basic ideas behind the presented GPU-accelerated spatial interpolation using RBFs for building explicit surfaces, and then describe the details of three implementations, i.e., (1) the serial implementation of the spatial interpolation using RBFs, (2) the parallel implementation developed on multi-core CPU, and (3) the parallel implementation by utilizing a single many-core GPU.

3.1 Basic Ideas Behind the GPU-Accelerated Spatial Interpolation Using RBFs

3.1.1 Overview

The spatial interpolation algorithm using RBFs for building explicit surfaces is inherently suitable for parallelization on the GPU architecture. This is because, in the RBF-based interpolation algorithm, the desired prediction value for each interpolated point can be calculated independently; it is therefore natural to calculate the prediction values for many interpolated points concurrently, without any data dependencies between the interpolating procedures for any pair of interpolated points.

Owing to this inherent feature of the RBF-based spatial interpolation algorithm, a single thread can be assigned to calculate the interpolation value for one interpolated point. For example, assuming there are n interpolated points whose values (such as elevations) need to be predicted, n threads are allocated to concurrently calculate the desired prediction values for all n interpolated points. Therefore, the RBF-based spatial interpolation method is well suited to parallelization on the GPU architecture.

In the RBF-based spatial interpolation, there are two choices for determining the region of data points for each interpolated point. The first is to use all the data points to calculate the prediction value of each interpolated point, while the second is to employ a local set of data points to evaluate the prediction value of an interpolated point. Interpolation methods adopting the first choice, a global set of data points, are referred to as Global interpolation, while those employing the second option are called Local interpolation.

In the presented RBF-based spatial interpolation for building explicit surfaces, we adopt the second choice to determine the region of data points for each interpolated point. That is, we only use the local set of data points around an interpolated point to calculate the prediction value. The local set of data points for each interpolated point is found using the k Nearest Neighbors (kNN) algorithm [29]. Moreover, we adopt the Globally-Supported RBFs (GS-RBFs) rather than the Compactly-Supported RBFs (CS-RBFs) for the local set of data points to compute the prediction value of each interpolated point.

In summary, there are two key ideas behind the presented RBF-based spatial interpolation algorithm for constructing explicit surfaces:

(1) We use a local set of data points around each interpolated point to calculate the prediction value of the interpolated point. The local set of data points is found using a kNN algorithm.
(2) We employ GS-RBFs for the local set of data points to compute the prediction value of the interpolated point.

The process of the presented GPU-accelerated spatial interpolation algorithm using RBFs is illustrated in Fig. 1. First, the input data is stored on the host side and then transferred to the device side. Second, on the device side, an even grid is created to support the kNN search procedure. Third, the local set of data points for each interpolated point is found using the kNN algorithm; the distances between the data points are then calculated, and the coefficient matrix is formed according to the selected GS-RBF. Finally, the weights are obtained by solving the system of linear equations, and the desired prediction value for each interpolated point is obtained as the weighted sum defined by those weights.

3.1.2 Stage 1: The kNN Search

In the RBF-based spatial interpolation algorithm, the k nearest neighboring data points must be found for each interpolated point. The kNN search algorithm is directly derived from our previous work [29]. The procedure of the kNN search is illustrated in Fig. 2, and more details are described as follows.

Step 1: Creating an even grid

Creating an even planar grid is straightforward. We first determine the planar rectangular region to be partitioned by finding the minimum and maximum x and y coordinates of all points. Then, the numbers of rows and columns of the grid are determined by dividing the height and width of the rectangle by the width of the square cell; see a simple illustration in Fig. 3.
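Step 1 can be sketched as follows. This is a minimal sequential sketch under stated assumptions: the type and function names (`Point2D`, `EvenGrid`, `make_even_grid`) are hypothetical, the cell width is supplied by the caller, and one extra row and column are added so that points lying exactly on the maximum boundary still fall inside a cell.

```cpp
#include <algorithm>
#include <cassert>
#include <cmath>
#include <vector>

struct Point2D { double x, y; };

// Hypothetical description of the even grid used in this sketch.
struct EvenGrid {
    double min_x, min_y;  // lower-left corner of the bounding rectangle
    double cell_width;    // side length of each square cell
    int n_cols, n_rows;   // number of columns and rows
};

EvenGrid make_even_grid(const std::vector<Point2D>& pts, double cell_width) {
    assert(!pts.empty() && cell_width > 0.0);
    double min_x = pts[0].x, max_x = pts[0].x;
    double min_y = pts[0].y, max_y = pts[0].y;
    // Find the bounding rectangle of all points.
    for (const Point2D& p : pts) {
        min_x = std::min(min_x, p.x);
        max_x = std::max(max_x, p.x);
        min_y = std::min(min_y, p.y);
        max_y = std::max(max_y, p.y);
    }
    // floor(extent / width) + 1 so that points on the max boundary get a cell.
    int n_cols = static_cast<int>(std::floor((max_x - min_x) / cell_width)) + 1;
    int n_rows = static_cast<int>(std::floor((max_y - min_y) / cell_width)) + 1;
    return EvenGrid{min_x, min_y, cell_width, n_cols, n_rows};
}
```

For three points spanning a 10 by 5 rectangle and a cell width of 2.5, this sketch yields a grid of 5 columns and 3 rows.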

Step 2: Distributing data points into cells

The objective of distributing all data points into the grid cells is to find out in which grid cell each data point is located. Distributing a data point amounts to determining the row and column indices of the cell in which it is located. Since the grid cells are indexed sequentially, first by rows and then by columns, the distribution is easily carried out. First, the differences between the coordinates of a data point and the minimum coordinates of the grid are calculated; then the column and row indices are determined by dividing these differences by the cell width.
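The index computation of Step 2 can be sketched as below. The row-major numbering (cell index = row × number of columns + column) is an assumption made for this sketch, as is the clamping of out-of-range indices to the border cells; the function name `cell_index` is hypothetical.

```cpp
#include <cassert>
#include <cmath>

// Map a point (x, y) to the sequential index of the grid cell containing it.
// Assumes row-major numbering: index = row * n_cols + col.
int cell_index(double x, double y, double min_x, double min_y,
               double cell_width, int n_cols, int n_rows) {
    assert(cell_width > 0.0 && n_cols > 0 && n_rows > 0);
    // Offsets from the grid origin, divided by the cell width.
    int col = static_cast<int>(std::floor((x - min_x) / cell_width));
    int row = static_cast<int>(std::floor((y - min_y) / cell_width));
    // Clamp to the grid so that round-off cannot produce an invalid index.
    if (col < 0) col = 0;
    if (col >= n_cols) col = n_cols - 1;
    if (row < 0) row = 0;
    if (row >= n_rows) row = n_rows - 1;
    return row * n_cols + col;
}
```

Because each point is handled independently, the same computation maps directly onto one thread per data point.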

Step 3: Determining data points in each cell

The objective of this step is to determine the number and the indices of the data points located in the same cell. The number of data points located in the same cell can be determined using a segmented parallel reduction. After all data points are sorted according to their cell indices, they are stored sequentially in a group of segments; each segment is flagged with a cell index and contains the indices of the data points located in that cell. The number of data points located in the same cell is obtained by performing a reduction over each segment; see Fig. 4a. Moreover, the head index of the first point of each segment can be determined using a segmented parallel scan; see Fig. 4b.
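The bookkeeping of Step 3 can be illustrated with a sequential stand-in for the parallel sort, reduction, and scan primitives. The sketch below is not the authors' implementation: it uses `std::stable_sort`, a counting loop, and an exclusive prefix sum in place of the segmented parallel primitives, and the names `CellTable` and `build_cell_table` are hypothetical.

```cpp
#include <algorithm>
#include <cassert>
#include <numeric>
#include <vector>

// Hypothetical result type for this sketch.
struct CellTable {
    std::vector<int> sorted_point_ids;  // point indices ordered by cell index
    std::vector<int> count;             // number of points in each cell
    std::vector<int> head;              // position of each cell's first point
};

CellTable build_cell_table(const std::vector<int>& cell_of_point, int n_cells) {
    CellTable t;
    // Sort point indices by the cell they fall in (the "segments").
    t.sorted_point_ids.resize(cell_of_point.size());
    std::iota(t.sorted_point_ids.begin(), t.sorted_point_ids.end(), 0);
    std::stable_sort(t.sorted_point_ids.begin(), t.sorted_point_ids.end(),
                     [&](int a, int b) { return cell_of_point[a] < cell_of_point[b]; });

    // Reduction per segment: count the points in each cell.
    t.count.assign(n_cells, 0);
    for (int c : cell_of_point) {
        assert(c >= 0 && c < n_cells);
        ++t.count[c];
    }

    // Exclusive prefix sum of the counts gives the head index of each segment.
    t.head.assign(n_cells, 0);
    for (int c = 1; c < n_cells; ++c)
        t.head[c] = t.head[c - 1] + t.count[c - 1];
    return t;
}
```

With this layout, the points of cell c occupy positions `head[c]` to `head[c] + count[c] - 1` of `sorted_point_ids`; an empty cell simply has a count of zero.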

Step 4: Searching nearest neighbors

The process of the kNN search for each interpolated point can be summarized in the following substeps: (1) locating the interpolated point in the even grid, (2) determining the level of cell expansion (see Fig. 3), and (3) finding the k nearest neighbors within the local region. More details on searching for the nearest neighboring data points of each interpolated point are presented in our previous work [29].
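The three substeps can be sketched as follows. The exact expansion rule of [29] is not reproduced in this section, so the rule used here is an assumption: the square of cells around the query is grown until it holds at least k data points, and then grown once more, if necessary, to the level that covers the distance to the current k-th candidate, which makes the result exact. All type and function names are hypothetical.

```cpp
#include <algorithm>
#include <cassert>
#include <cmath>
#include <utility>
#include <vector>

struct Point2D { double x, y; };

// Hypothetical grid index: grid geometry plus the per-cell tables of Step 3.
struct GridIndex {
    double min_x, min_y, cell_width;
    int n_cols, n_rows;
    std::vector<int> head, count;       // per cell (row-major numbering assumed)
    std::vector<int> sorted_point_ids;  // data point indices grouped by cell
};

static int clamp_int(int v, int lo, int hi) { return v < lo ? lo : (v > hi ? hi : v); }

// Collect (squared distance, id) of all data points in the square of cells
// within `level` rings of cell (col, row).
static void gather(const GridIndex& g, const std::vector<Point2D>& data,
                   const Point2D& q, int col, int row, int level,
                   std::vector<std::pair<double, int>>& out) {
    out.clear();
    for (int r = std::max(0, row - level); r <= std::min(g.n_rows - 1, row + level); ++r)
        for (int c = std::max(0, col - level); c <= std::min(g.n_cols - 1, col + level); ++c) {
            const int cell = r * g.n_cols + c;
            for (int i = 0; i < g.count[cell]; ++i) {
                const int id = g.sorted_point_ids[g.head[cell] + i];
                const double dx = data[id].x - q.x, dy = data[id].y - q.y;
                out.emplace_back(dx * dx + dy * dy, id);
            }
        }
}

std::vector<int> knn_search(const GridIndex& g, const std::vector<Point2D>& data,
                            const Point2D& q, int k) {
    assert(k > 0 && static_cast<std::size_t>(k) <= data.size());
    // Substep 1: locate the query point in the grid.
    const int col = clamp_int(static_cast<int>(std::floor((q.x - g.min_x) / g.cell_width)), 0, g.n_cols - 1);
    const int row = clamp_int(static_cast<int>(std::floor((q.y - g.min_y) / g.cell_width)), 0, g.n_rows - 1);
    const int max_level = std::max(g.n_cols, g.n_rows);  // covers the whole grid

    // Substep 2: expand level by level until at least k candidates are found.
    std::vector<std::pair<double, int>> cand;
    int level = 0;
    gather(g, data, q, col, row, level, cand);
    while (static_cast<int>(cand.size()) < k && level < max_level)
        gather(g, data, q, col, row, ++level, cand);

    // Expand once more so the square covers the distance to the k-th candidate.
    std::nth_element(cand.begin(), cand.begin() + (k - 1), cand.end());
    const double dk = std::sqrt(cand[k - 1].first);
    const int needed = std::min(max_level, static_cast<int>(std::ceil(dk / g.cell_width)));
    if (needed > level) gather(g, data, q, col, row, needed, cand);

    // Substep 3: keep the k nearest candidates, ordered by distance.
    std::partial_sort(cand.begin(), cand.begin() + k, cand.end());
    std::vector<int> ids(k);
    for (int i = 0; i < k; ++i) ids[i] = cand[i].second;
    return ids;
}
```

The second expansion matters when the query lies near a cell border: its own cell may already contain k points while a closer point sits in a neighboring cell.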

3.1.3 Stage 2: The Interpolating Using RBFs

After the nearest neighbors of each interpolated point are found using the kNN search algorithm described above, the distances (1) between any pair of the nearest neighboring data points and (2) between each nearest neighbor and the interpolated point can be calculated; after that, the coefficient matrix is formed according to the selected GS-RBF, i.e., the Multi-quadrics RBF (see Eq. 7). Finally, the weights are obtained by solving the system of linear equations (see Eq. 5), and the desired prediction value for each interpolated point is obtained as the weighted sum defined by those weights (see Eq. 6).
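The per-point work of Stage 2 can be sketched as follows: form the k × k coefficient matrix of Eq. (5) from the k neighbors with the Multi-quadrics RBF of Eq. (7), solve for the weights, and evaluate Eq. (6) at the interpolated point. The solver used here, Gaussian elimination with partial pivoting, is an assumption of this sketch, since this section does not state which solver the authors use; the names `Sample`, `solve`, and `rbf_predict` are hypothetical.

```cpp
#include <cassert>
#include <cmath>
#include <utility>
#include <vector>

struct Sample { double x, y, value; };  // a data point and its known value

// Multi-quadrics RBF of Eq. (7); r2 is the squared distance.
static double mq(double r2, double c) { return std::sqrt(c * c + r2); }

// Solve A w = b by Gaussian elimination with partial pivoting (works on copies).
static std::vector<double> solve(std::vector<std::vector<double>> A, std::vector<double> b) {
    const int n = static_cast<int>(b.size());
    for (int col = 0; col < n; ++col) {
        int piv = col;
        for (int r = col + 1; r < n; ++r)
            if (std::fabs(A[r][col]) > std::fabs(A[piv][col])) piv = r;
        assert(std::fabs(A[piv][col]) > 1e-14);  // matrix must be nonsingular
        std::swap(A[col], A[piv]);
        std::swap(b[col], b[piv]);
        for (int r = col + 1; r < n; ++r) {
            const double f = A[r][col] / A[col][col];
            for (int j = col; j < n; ++j) A[r][j] -= f * A[col][j];
            b[r] -= f * b[col];
        }
    }
    std::vector<double> w(n);
    for (int i = n - 1; i >= 0; --i) {  // back substitution
        double s = b[i];
        for (int j = i + 1; j < n; ++j) s -= A[i][j] * w[j];
        w[i] = s / A[i][i];
    }
    return w;
}

// Predict the value at (qx, qy) from its k nearest neighbors.
double rbf_predict(const std::vector<Sample>& nbrs, double qx, double qy, double c) {
    const int k = static_cast<int>(nbrs.size());
    assert(k > 0);
    std::vector<std::vector<double>> A(k, std::vector<double>(k));
    std::vector<double> b(k);
    for (int i = 0; i < k; ++i) {
        for (int j = 0; j < k; ++j) {
            const double dx = nbrs[i].x - nbrs[j].x, dy = nbrs[i].y - nbrs[j].y;
            A[i][j] = mq(dx * dx + dy * dy, c);  // entries of A in Eq. (5)
        }
        b[i] = nbrs[i].value;
    }
    const std::vector<double> w = solve(A, b);
    double f = 0.0;
    for (int j = 0; j < k; ++j) {  // weighted sum of Eq. (6)
        const double dx = qx - nbrs[j].x, dy = qy - nbrs[j].y;
        f += w[j] * mq(dx * dx + dy * dy, c);
    }
    return f;
}
```

Evaluating the sketch at one of the neighbors reproduces that neighbor's known value up to round-off, which is the interpolation condition of Eq. (1). The cost of the dense solve grows as O(k³) per interpolated point.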

3.2 Sequential and Parallel Implementations

In this subsection, we introduce the details of the following three implementations: (1) the sequential implementation of the spatial interpolation using RBFs, (2) the parallel implementation developed on multi-core CPU, and (3) the parallel implementation utilizing a many-core GPU. Our focus in this work is to evaluate the computational performance of the GPU-accelerated spatial interpolation algorithm using RBFs. Thus, we also implement the sequential version and the parallel version on multi-core CPU, and then compare the GPU version with both of them.

It should also be noted that the source code of the above three implementations is publicly available at: https://figshare.com/s/e9a2fc20daa963097d1d.

3.2.1 Sequential Implementation

There are two major sub-procedures in the sequential implementation. The first is to create the even grid and then record the number and indices of those data points located in each grid cell. The second is to loop over all interpolated points to first find the kNN of each interpolated point and then calculate the prediction value based on RBFs.

In the sequential implementation, we first loop over all points to determine the planar rectangular region to be partitioned by finding the minimum and maximum x and y coordinates of all points. Then, the numbers of rows and columns of the grid are determined by dividing the height and width of the rectangle by the width of the square cell. After that, we sequentially distribute all the data points into the grid cells and record, for each data point, the index of the cell in which it is located. Then, we sort all the data points in ascending order of cell index using the std::sort() function, and loop over the sorted data points again to record the number and indices of the data points located in each grid cell. Finally, we loop over all interpolated points to first find the kNN of each interpolated point and then calculate the prediction value based on RBFs.

3.2.2 Parallel Implementation on Multi-core CPU

There are two major sub-procedures in the presented parallel spatial interpolation algorithm using RBFs. The first is to create the even grid and then record the number and indices of those data points located in each grid cell. The second is to loop over all interpolated points to first find the kNN of each interpolated point and then calculate the prediction value based on RBFs.

The first sub-procedure is implemented by relying heavily on the Thrust library [1, 2]. Thrust is a C++ template library for parallel platforms based on the Standard Template Library (STL). Thrust provides a rich collection of data parallel primitives such as scan, sort, and reduce. Thrust allows users to implement high-performance parallel applications with minimal programming effort through a high-level interface that is fully interoperable with technologies such as C++, CUDA, OpenMP, and TBB [15].

In the first sub-procedure, we first employ the function thrust::minmax_element() to find the boundary range of all input data points and interpolated points, i.e., to determine the minimum and maximum x and y coordinates of all data points and interpolated points. Then, we use the function thrust::sort_by_key to sort all data points in ascending order according to their coordinates. Finally, we employ the function thrust::unique_by_key to find the head indices of the data points located in each grid cell, and adopt the function thrust::reduce_by_key to determine the number of data points residing in each grid cell.

The second sub-procedure is parallelized using OpenMP, through the simple use of the OpenMP directive “#pragma omp parallel for”. In the second sub-procedure, a loop is performed over all the interpolated points to first find the k nearest neighboring data points for each interpolated point and then calculate the nodal distances, coefficients, and weights. Because there are no data dependencies between the calculations of the prediction values of any pair of interpolated points, the spatial interpolation for all interpolated points can be parallelized by simply adding the OpenMP directive “#pragma omp parallel for” before the for loop.
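The loop structure described above can be sketched as follows. To keep the example short, the per-point work is replaced by a hypothetical stand-in, `predict_one`, which simply returns the value of the nearest data point; in the actual algorithm this is where the kNN search and the local RBF solve would take place. The names `Sample`, `Query`, and `predict_all` are likewise assumptions of this sketch.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

struct Sample { double x, y, value; };  // data point with a known value
struct Query { double x, y; };          // interpolated point

// Hypothetical stand-in for the per-point work (kNN search + local RBF solve):
// here it only returns the value of the nearest data point.
static double predict_one(const std::vector<Sample>& data, const Query& q) {
    assert(!data.empty());
    std::size_t best = 0;
    double best_d2 = -1.0;
    for (std::size_t i = 0; i < data.size(); ++i) {
        const double dx = data[i].x - q.x, dy = data[i].y - q.y;
        const double d2 = dx * dx + dy * dy;
        if (best_d2 < 0.0 || d2 < best_d2) { best_d2 = d2; best = i; }
    }
    return data[best].value;
}

std::vector<double> predict_all(const std::vector<Sample>& data,
                                const std::vector<Query>& queries) {
    std::vector<double> out(queries.size());
    const long n = static_cast<long>(queries.size());
    // Each iteration reads shared data and writes only out[i], so the
    // iterations are independent and need no synchronization.
#ifdef _OPENMP
#pragma omp parallel for
#endif
    for (long i = 0; i < n; ++i)
        out[i] = predict_one(data, queries[i]);
    return out;
}
```

When the code is compiled without OpenMP support, the directive is skipped and the loop runs sequentially with identical results, which makes it easy to compare the two versions.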

3.2.3 Parallel Implementation on Many-Core GPU

As introduced in the above subsection, there are two major sub-procedures in the presented parallel spatial interpolation algorithm using RBFs. The first is to create the even grid and then record the number and indices of those data points located in each grid cell. The second is to loop over all interpolated points to first find the kNN of each interpolated point and then calculate the prediction value based on RBFs.

The first sub-procedure is implemented by relying heavily on the Thrust library [1, 2]. Because the Thrust library is fully interoperable with technologies such as C++, CUDA, OpenMP, and TBB [15], the employed Thrust functions, such as thrust::minmax_element(), thrust::sort_by_key, thrust::unique_by_key, and thrust::reduce_by_key, can be executed in parallel on both the multi-core CPU and the many-core GPU. Therefore, there is no need to modify those function calls to port them from the multi-core CPU to the many-core GPU. It is only necessary to replace the container thrust::host_vector with the container thrust::device_vector. After this modification, the above-mentioned functions are automatically executed in parallel on the GPU.

A specific CUDA kernel is designed for the second sub-procedure. In the kernel, each thread calculates the prediction value for one interpolated point. More specifically, each thread is responsible for (1) finding the k nearest neighboring data points of an interpolated point and (2) calculating the distances between the k nearest data points found, the corresponding coefficient matrix, the weights, and finally the desired prediction value for the interpolated point.

4 Experimental Results

4.1 Experimental Environment and Testing Data

4.1.1 Experimental Environment

To evaluate the computational performance of the presented RBF-based spatial interpolation algorithm, we carry out five groups of experimental tests on two different workstations. The specifications of the employed workstations are listed in Table 1.

Table 1 Specifications of the employed two workstations for performing the experimental tests
