1 Introduction

Geodetic networking activities in Turkey began in the early 1900s. In the 1950s, the Turkey National Geodetic Network (TNGN) was established to create a basis for mapping and cadastral works. In 1954, the TNGN was connected to the European Datum 1950 (ED50) via eight points of the Greek and Bulgarian geodetic networks. Because Turkey lies on major fault zones and is extremely active in terms of crustal movements, the Turkey National Horizontal Control Network has suffered great damage over time, and highly sensitive measurement and computation methods were needed to identify the deformations. At the same time, rapid population growth and urbanisation increased the need for infrastructure services, making it necessary to complete Turkey's digital cadastre accurately and as soon as possible. A fundamental geodetic network based on the Global Positioning System (GPS), consisting of points of specified accuracy that sufficiently covered the country's surface, was therefore needed. To realise these objectives, the General Command of Mapping carried out work between 1997 and 1999 that established the Turkish National Fundamental GPS Network‒1999 (TNFGN‒99), consisting of nearly 600 points. In the newly formed geodetic network, the three coordinate values (X, Y, Z) and their velocities (\(V_X\), \(V_Y\), \(V_Z\)) were computed in a three-dimensional geocentric coordinate system at a given epoch. The TNFGN is defined in the ITRF (International Terrestrial Reference Frame) coordinate system, with a relative accuracy of 0.1‒0.01 ppm and point position accuracies of 1‒3 cm. Large earthquakes between 1999 and 2003 (e.g. Mw = 7.5 İzmit, Mw = 7.2 Düzce, Mw = 6.1 Çankırı-Çerkeş, Mw = 6.5 Sultandağı, and Mw = 6.4 Bingöl) caused significant changes in TNFGN point positions in the earthquake zones. For this reason, the TNFGN was updated by carrying out new GPS measurements.

Turkey is located in the Alpine‒Himalayan earthquake zone, in the collision zone of the African and Arabian plates with the Eurasian plate (McClusky et al. 2000). Considering the magnitude of the induced deformation and the active tectonic structure of Turkey and its surroundings, determining the time-dependent coordinate changes of the network points with high accuracy is of great importance for the maintenance and improvement of the four-dimensionally (X, Y, Z, and time) designed TNFGN. Time-dependent changes in point coordinates, generally caused by tectonic plate movements, include pre-earthquake (inter-seismic), co-seismic, and post-seismic effects (Demir and Açıkgöz 2000). Point velocities must be determined with high accuracy in order to establish these time-dependent effects. For the maintenance and improvement of the TNFGN, GPS measurements are carried out periodically, evaluated, and combined into a specified reference epoch. The TNFGN coordinates and velocities are computed in the ITRF system (Aktuğ et al. 2011). The velocities of TNFGN points are estimated from two or more repeated GPS measurements or, where repeated measurements are unavailable, from the velocities of other TNFGN points (Kurt and Deniz 2010).

To ensure standardisation in large-scale mapping, the regulation for the construction of large-scale maps was published in 1988. In parallel with the increasing use of GPS in geodetic applications and large-scale mapping activities, the regulation was updated in 2005 and the calculation of geodetic point velocities was adopted. To densify the TNFGN, satellite positioning techniques are used and the coordinates of points in the measurement epoch are shifted to a specified reference epoch. The point velocities used for this process are calculated by interpolation from the velocities of TNFGN or higher-order densification points, but the interpolation method is not specified in the regulation. The Kriging (KRIG) interpolation method is the best-known and most widely used technique, and it has been applied to geodetic velocity modelling by several researchers (Majdański 2012; Kierulf et al. 2013; Bogusz et al. 2014; Ching and Chen 2015; Li et al. 2019). Many studies have also investigated geodetic velocity fields for different regions using GPS observations (Aktuğ et al. 2013; Gülal et al. 2013; Müller et al. 2013; Farolfi and Del Ventisette 2016; Ito et al. 2019; Poyraz et al. 2019).

Artificial neural networks (ANNs) belong to the family of artificial intelligence (AI) techniques widely used in many fields of geoscience. ANNs are utilized for modelling, optimization, and estimation in solving complicated problems and generally produce satisfactory results compared to conventional techniques. In recent years, the use of ANNs in geodesy has been widely adopted, and their suitability as an alternative to traditional methods for solving geodetic problems has been investigated. Many researchers have applied ANNs to geodetic problems such as coordinate transformation (Zaletnyik 2004; Lin and Wang 2006; Tierra et al. 2008; Gullu 2010; Tierra and Romero 2014; Konakoğlu and Gökalp 2016; Konakoglu et al. 2016; Ziggah et al. 2016a, b, 2019; Elshambaky et al. 2018; Cakir and Konakoglu 2019), modelling ionospheric TEC (Hernandez-Pajares et al. 1997; Cander 1998; Habarulema et al. 2007; Maruyama 2008; Akhoondzadeh 2014; Huang and Yuan 2014; Tebabal et al. 2018; Inyurt and Sekertekin 2019), geoid determination (Hu et al. 2004; Kavzoglu and Saka 2005; Stopar et al. 2006; Lin 2007; Veronez et al. 2011; Erol and Erol 2013; Cakir and Yilmaz 2014; Kaloop et al. 2018a), earth orientation parameter determination (Schuh et al. 2002; Wang et al. 2008; Liao et al. 2012; Lei et al. 2015), gravity anomaly prediction (Hajian et al. 2011; Tierra and De Freitas 2005; Pereira et al. 2012), and noise reduction in GNSS signals (Mosavi 2006; Kaloop and Hu 2015; Kaloop et al. 2018b).

The prediction of geodetic point velocities is another such geodetic problem, and a few studies utilizing ANNs in this context have been conducted, with successful results. Güllü et al. (2011) developed a multi-layer perceptron neural network (MLPNN) model to solve this problem using a TNFGN dataset, and also estimated the geodetic point velocities with different interpolation methods; on the basis of statistical analysis, the MLPNN model was preferable to the traditional interpolation models. Yilmaz and Gullu (2014) applied two types of ANN models for geodetic point velocity estimation, the MLPNN and the radial basis function neural network (RBFNN), and used the KRIG method to interpolate the velocities as a benchmark. Their optimal ANN had two neurons in the input layer and one neuron in the output layer: the input parameters were the geographic coordinates (latitude and longitude), and one velocity component of the geodetic point (\(V_X\), \(V_Y\), or \(V_Z\)) was selected separately as the output parameter. A single hidden layer was used in the ANN models. Both the MLPNN and RBFNN successfully estimated geodetic point velocities for regional geodetic networks, with the MLPNN outperforming the KRIG and RBFNN models. This literature review indicates that an ANN model can be used as an alternative to conventional interpolation methods for predicting geodetic point velocities; however, only the MLPNN and RBFNN models have been used for this purpose, and the generalized regression neural network (GRNN) model has not previously been applied. In the present work, in addition to the MLPNN and RBFNN models, the GRNN model was used to predict geodetic point velocities and to find the most appropriate model on the basis of a comparative study; this is the novelty of this research. To achieve this objective, 238 points and their velocities belonging to the TNFGN were utilized. Latitude (\(\varphi\)) and longitude (λ) were selected as input parameters, and all geodetic point velocity components (\(V_X\), \(V_Y\), and \(V_Z\)) were selected simultaneously as output parameters.

2 Study area and applied data

This study was conducted in the central and eastern Anatolian regions of Turkey, in an area located between longitudes 34° 48′ E and 44° 7′ E and latitudes 36° 58′ N and 41° 25′ N (Fig. 1). The study area included a total of 238 TNFGN points. Of these, 166 (~ 70%) were randomly selected as reference points and the remaining 72 (~ 30%) were used as test points to evaluate the performance of the ANN models. The distribution of the reference and test points within the study area is shown in Fig. 1, where the blue squares and red asterisks denote the reference and test points, respectively.
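For reproducibility, such a random 70/30 partition can be sketched in a few lines of Python (a minimal illustration; the array names and the seed are ours, not from the original study):

```python
import numpy as np

# points: hypothetical (238, 5) array holding latitude, longitude, Vx, Vy, Vz
# per TNFGN point. Shuffle indices and split 166/72 (~70%/30%).
rng = np.random.default_rng(42)           # fixed seed for a repeatable split
idx = rng.permutation(238)
ref_idx, test_idx = idx[:166], idx[166:]  # 166 reference points, 72 test points
```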

Fig. 1. Location of the study area and distribution of the reference and test points

The statistical characteristics of the dataset related to the studied points including the average, minimum, maximum, and standard deviation values are listed in Table 1.

Table 1 Statistical parameters of the dataset

To improve computational speed and obtain more accurate results, the input and output sample data must be normalized to a range of [‒1, 1], [0, 1], or another scaling interval before developing the ANN models. In this study, the data were normalized between ‒1 and 1 using Eq. (1).

$${Y}_{normalized}={Low}_{value}+\left({High}_{value}-{Low}_{value}\right)\frac{{Y}_{i}-{Y}_{min}}{{Y}_{max}-{Y}_{min}}$$
(1)
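As an illustration, Eq. (1) can be implemented as follows (a minimal Python sketch; the function name and sample values are ours):

```python
import numpy as np

def normalize(Y, low=-1.0, high=1.0):
    """Map the values of Y linearly onto [low, high] following Eq. (1)."""
    Y = np.asarray(Y, dtype=float)
    return low + (high - low) * (Y - Y.min()) / (Y.max() - Y.min())

# Example with an arbitrary velocity component (values are illustrative only)
print(normalize(np.array([-12.3, -8.7, -15.1, -10.4])))  # mapped into [-1, 1]
```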

3 Artificial neural network (ANN)

McCulloch and Pitts (1943) proposed the artificial neural network (ANN) for the first time. The ANN is a computational paradigm inspired by the human brain and has been successfully used to find solutions to complex problems that are difficult to solve by other known methods; this is its main advantage. Many types of ANN models have been introduced and successfully applied to date. In this study, three types of ANN models were employed to predict geodetic point velocities: the MLPNN, GRNN, and RBFNN. The following sections present a brief description of each.

3.1 Multi-layer perceptron neural network (MLPNN)

The MLPNN is the most common type of ANN (Haykin 1999) and its structure is shown in Fig. 2. The MLPNN has three layers: the input layer, the hidden layer, and the output layer. Each neuron receives information from the neurons of the preceding layer and transmits its output to the following layer; the interconnected processing neurons together form the network. Each neuron computes a weighted sum of its inputs plus a bias, as given in Eq. (2).

Fig. 2. Structure of the multi-layer perceptron neural network (MLPNN)

$$X=\left(\sum_{i=1}^{n}{w}_{ij}{x}_{i}\right)+{b}_{j}$$
(2)

where \({w}_{ij}\) are the interconnecting weights of input data \({x}_{i}\), \({b}_{j}\) is the bias for the neuron, and n is the number of input data. This sum \(X\) then passes through transfer function \(F\), which generates the output, as shown in Eq. (3).

$$Y=F\left(X\right)=F\left[\left(\sum_{i=1}^{n}{w}_{ij}{x}_{i}\right)+{b}_{j}\right]$$
(3)

Hidden and output layers generally have either a linear or non‒linear transfer function. The commonly used nonlinear transfer function, as expressed in Eq. (4), is the sigmoid function, whose output lies between 0 and 1.

$$F\left(X\right)=\frac{1}{1+{e}^{-X}}$$
(4)

When the inputs and outputs may take negative values, the hyperbolic tangent sigmoid (tansig) transfer function, whose output lies between ‒1 and 1, is used instead; it is expressed as Eq. (5).

$$F\left(X\right)=\frac{1-{e}^{-2X}}{1+{e}^{-2X}}$$
(5)
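To make Eqs. (2)‒(5) concrete, the following Python sketch propagates an input through a two-hidden-layer MLP with tansig hidden units and a linear output layer (random toy parameters; this is not the trained network of this study):

```python
import numpy as np

def tansig(X):
    """Hyperbolic tangent sigmoid transfer function, Eq. (5)."""
    return (1 - np.exp(-2 * X)) / (1 + np.exp(-2 * X))

def mlp_forward(x, weights, biases):
    """Eqs. (2)-(3) applied layer by layer: weighted sum plus bias,
    tansig in the hidden layers, linear (purelin) output layer."""
    a = x
    for W, b in zip(weights[:-1], biases[:-1]):
        a = tansig(W @ a + b)
    return weights[-1] @ a + biases[-1]

# Toy [2:7:7:3] topology (2 inputs, two hidden layers of 7 neurons, 3 outputs)
rng = np.random.default_rng(1)
sizes = [2, 7, 7, 3]
weights = [rng.standard_normal((m, n)) for n, m in zip(sizes[:-1], sizes[1:])]
biases = [rng.standard_normal(m) for m in sizes[1:]]
print(mlp_forward(np.array([0.2, -0.5]), weights, biases))
```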

3.2 Generalized regression neural network (GRNN)

The GRNN was proposed by Specht (1991) as a variant of the radial basis function neural network (RBFNN); it is a universal approximator that can solve any smooth function approximation problem (Park and Sandberg 1991). A schematic of the GRNN is depicted in Fig. 3.

Fig. 3. Basic structure of GRNN

As seen in Fig. 3, the GRNN consists of four layers (Patterson 1996): the input layer, pattern layer, summation layer, and output layer. The input layer picks up the information and conveys it to the pattern layer. The output of the pattern layer passes to the summation layer, which consists of two types of neurons: S-summation and D-summation neurons. The sum of the weighted pattern outputs is calculated using Eq. (6).

$$S=\sum_{i=1}^{n}{Y}_{i} exp\left(-\frac{{\left(X-{X}_{i}\right)}^{T}\left(X-{X}_{i}\right)}{2{\sigma }^{2}}\right)$$
(6)

where \({Y}_{i}\) is the weight connecting the \(i\)th neuron in the pattern layer to the summation layer (i.e. the \(i\)th training output). The unweighted sum of the pattern outputs is calculated using Eq. (7).

$$D=\sum_{i=1}^{n}exp\left(-\frac{{\left(X-{X}_{i}\right)}^{T}\left(X-{X}_{i}\right)}{2{\sigma }^{2}}\right)$$
(7)

The results of the sums calculated in the summation layer are then transmitted to the output layer, and the output \(Y\) is derived from Eq. (8).

$$Y\left(X\right)=\frac{S}{D}=\frac{\sum_{i=1}^{n}{Y}_{i}exp\left(-\frac{{\left(X-{X}_{i}\right)}^{T}\left(X-{X}_{i}\right)}{2{\sigma }^{2}}\right)}{\sum_{i=1}^{n}exp\left(-\frac{{\left(X-{X}_{i}\right)}^{T}\left(X-{X}_{i}\right)}{2{\sigma }^{2}}\right)}$$
(8)

where \(\sigma\) is the spread (smoothing) parameter. This is the only network parameter to be tuned, and it determines the generalization performance of the GRNN; a trial-and-error procedure is generally used to find its optimal value. The main advantage of the GRNN over the other ANNs considered here is that it does not require an iterative training procedure: only a single pass over the training data is needed.
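Equations (6)‒(8) translate almost directly into code. The sketch below (our own minimal Python implementation, not the MATLAB code used in the study) computes GRNN predictions for a batch of query points:

```python
import numpy as np

def grnn_predict(X_train, Y_train, X_query, sigma):
    """GRNN output, Eq. (8): Gaussian-weighted average of training outputs."""
    # Squared Euclidean distances between each query and each training pattern
    d2 = ((X_query[:, None, :] - X_train[None, :, :]) ** 2).sum(axis=2)
    K = np.exp(-d2 / (2 * sigma ** 2))    # pattern-layer activations
    S = K @ Y_train                       # S-summation neurons, Eq. (6)
    D = K.sum(axis=1, keepdims=True)      # D-summation neuron, Eq. (7)
    return S / D                          # output layer, Eq. (8)
```

Note that no iterative training occurs: the whole training set is simply stored, which is precisely the one-pass property noted above.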

3.3 Radial basis function neural network (RBFNN)

The RBFNN is a type of feed-forward neural network introduced by Broomhead and Lowe (1988). Its architecture consists of three layers, namely the input layer, hidden layer, and output layer, as presented schematically in Fig. 4. Compared with the MLPNN, the RBFNN has the advantages of not suffering from local minima, a simpler structure, and a faster training procedure.

Fig. 4. Schematic diagram of RBFNN

The hidden layer of the RBFNN model uses a radial basis transfer function and the output layer uses a linear function. Among the many radial basis functions, the Gaussian function is preferred as the transfer function in the hidden layer; it is defined as in Eq. (9) (Haykin 1999).

$${a}_{j}\left(x\right)=exp\left(-\frac{{\Vert {x}_{i}-{c}_{j}\Vert }^{2}}{2{\sigma }_{j}^{2}}\right)$$
(9)

where \({a}_{j}\left(x\right)\) is the output of the \(j\)th basis function, \({c}_{j}\) is its centre, \({\sigma }_{j}\) is the spread of the \(j\)th neuron in the hidden layer, and \(\Vert {x}_{i}-{c}_{j}\Vert\) is the radial distance between the input \({x}_{i}\) and the centre of the RBF unit. The operation of the output layer is linear, as given in Eq. (10).

$${y}_{k}\left(x\right)=\sum_{j=1}^{n}{w}_{jk}{a}_{j}\left(x\right)+{b}_{k}$$
(10)

where \({y}_{k}\) is the \(k\)th output for the input vector \(x\), \({w}_{jk}\) is the weight connecting the \(j\)th hidden unit to the \(k\)th output unit, and \({b}_{k}\) is the bias.
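A minimal RBFNN consistent with Eqs. (9)‒(10) can be sketched as follows. Because the hidden layer is fixed once the centres and spread are chosen, the output weights can be solved by linear least squares; choosing the centres as a subset of the training points is our simplifying assumption (MATLAB's newrb, for instance, adds neurons incrementally):

```python
import numpy as np

def rbf_hidden(X, centers, sigma):
    """Gaussian hidden-layer outputs a_j(x), Eq. (9)."""
    d2 = ((X[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2)
    return np.exp(-d2 / (2 * sigma ** 2))

def rbfnn_fit(X, Y, centers, sigma):
    """Solve the linear output layer (Eq. 10) for weights and biases."""
    A = np.hstack([rbf_hidden(X, centers, sigma), np.ones((len(X), 1))])
    W, *_ = np.linalg.lstsq(A, Y, rcond=None)  # last row of W is the bias b_k
    return W

def rbfnn_predict(X, centers, sigma, W):
    A = np.hstack([rbf_hidden(X, centers, sigma), np.ones((len(X), 1))])
    return A @ W
```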

3.4 Performance metrics

The outcomes of velocity prediction using the MLPNN, GRNN, and RBFNN models were evaluated based on the statistical metrics of root mean square error (RMSE), mean absolute error (MAE), and coefficient of determination (\({\mathrm{R}}^{2}\)).

$$\mathrm{RMSE}={\left(\frac{1}{n}\sum_{i=1}^{n}{\left({X}_{O,i}-{X}_{P,i}\right)}^{2}\right)}^{1/2}$$
(11)
$$\mathrm{MAE}=\frac{1}{n}\sum_{i=1}^{n}\left|{X}_{O,i}-{X}_{P,i}\right|$$
(12)
$${\mathrm{R}}^{2}={\left(\frac{\sum_{i=1}^{n}\left({X}_{O,i}-{\bar{X}}_{O}\right)\left({X}_{P,i}-{\bar{X}}_{P}\right)}{\sqrt{\sum_{i=1}^{n}{\left({X}_{O,i}-{\bar{X}}_{O}\right)}^{2}\sum_{i=1}^{n}{\left({X}_{P,i}-{\bar{X}}_{P}\right)}^{2}}}\right)}^{2}$$
(13)

where \(n\) is the number of data points, \({X}_{O,i}\) and \({X}_{P,i}\) are the observed and predicted values, respectively, and \({\bar{X}}_{O}\) and \({\bar{X}}_{P}\) are the mean observed and predicted values, respectively.
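The three metrics are straightforward to compute; a short Python sketch (our helper names) is:

```python
import numpy as np

def rmse(obs, pred):
    """Root mean square error, Eq. (11)."""
    return np.sqrt(np.mean((obs - pred) ** 2))

def mae(obs, pred):
    """Mean absolute error, Eq. (12)."""
    return np.mean(np.abs(obs - pred))

def r2(obs, pred):
    """Coefficient of determination, Eq. (13): squared Pearson correlation."""
    do, dp = obs - obs.mean(), pred - pred.mean()
    return np.sum(do * dp) ** 2 / (np.sum(do ** 2) * np.sum(dp ** 2))
```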

4 Results and discussion

In the current study, the MLPNN, GRNN, and RBFNN models were developed in a MATLAB software environment. In this section, the obtained results are presented separately.

4.1 MLPNN model development and results

The choice of the number of hidden layers is an important step in constructing an ANN, and there is no theoretical guidance for it. It has been proven that a single hidden layer is sufficient to approximate any continuous function (Hornik et al. 1989); moreover, using more than one hidden layer can lead to a large number of local minima and make training difficult. In this study, however, the MLPNN with a single hidden layer did not give accurate training results, so two hidden layers were used to construct the MLPNN model. Owing to its low accuracy, the results derived with a single hidden layer are not given in this article.

At the start of the training phase, the weights and biases were initialized with random values. Hence, each MLPNN model was retrained 100 times with the same training dataset to average out the effects of different initializations, and the mean of the outputs of these runs was taken as the final value of the MLPNN model. The hyperbolic tangent sigmoid function (tansig) and logarithmic sigmoid function (logsig) were used in the hidden layers, whereas the pure linear function (purelin) was used in the output layer. Before training, the most appropriate training function must be chosen for the problem. Four back-propagation training algorithms were tested: Levenberg‒Marquardt (LM), Scaled Conjugate Gradient (SCG), Bayesian Regularization (BR), and Gradient Descent with Momentum and Adaptive Learning Rate (GDX). Among these, the MLPNN trained with the BR algorithm provided the best training performance, with an RMSE of about 4 mm for all velocity components. The MLPNN trained with the LM and SCG algorithms yielded lower training performance, with RMSEs of about 6 mm and 7 mm, respectively, and the weakest model was the MLPNN trained with the GDX algorithm, with an RMSE of about 8 mm. For a complete comparison, the models' performances were also assessed on the testing dataset, with similar results: the MLPNN trained with the BR algorithm was again the best model, with an RMSE of about 2 mm for all velocity components, whereas the LM-, SCG-, and GDX-trained models gave RMSEs of about 4 mm, 5 mm, and 6 mm, respectively. From this analysis, it is clear that the BR training algorithm is the most suitable for predicting the geodetic point velocities with the MLPNN. Detailed results for the MLPNN trained with the BR algorithm are given in Table 2.

The BR algorithm updates the weight and bias values according to Levenberg‒Marquardt optimization. It minimizes a combination of squared errors and squared weights, and then determines the correct combination so as to produce a network that generalizes well. To improve the generalization ability of the network, the regularized training objective function \(S\left(w\right)\) is defined as:

$$S\left(w\right)=\alpha {E}_{W}+\beta {E}_{D};\quad {E}_{W}=\sum_{i=1}^{m}{w}_{i}^{2};\quad {E}_{D}={\sum }_{i=1}^{n}{\left({Y}_{i}-{T}_{i}\right)}^{2}$$
(14)

where \(\alpha\) and \(\beta\) are objective function (regularization) parameters, \({E}_{W}\) is the sum of squared network weights, \({E}_{D}\) is the sum of squared network errors, \(m\) is the number of weights, \(n\) is the number of input-output examples in the training dataset \(D\), \({Y}_{i}\) is the network output, and \({T}_{i}\) is the target value.

In the Bayesian framework, the weights of the network are considered random variables. According to Bayes' rule, the posterior distribution of the weights can be written as:

$$P(w|D,\alpha ,\beta ,M)=\frac{P\left(D|w,\beta ,M\right)\, P(w|\alpha ,M)}{P\left(D|\alpha ,\beta ,M\right)}$$
(15)

where \(M\) is the network model and architecture, and \(w\) is the vector of network weights. \(P\left(D|w,\beta ,M\right)\) is the likelihood function, which gives the probability of the data occurring given the weights; \(P(w|\alpha ,M)\) is the prior density, which represents our knowledge of the weights before any data are collected; and \(P\left(D|\alpha ,\beta ,M\right)\) is a normalization factor, which guarantees that the total probability is 1. If the noise in the training data is Gaussian and the prior distribution for the weights is Gaussian, the probability densities are given by

$$P(w|\alpha ,M)=\frac{1}{{Z}_{W}(\alpha )}\mathrm{exp}(-\alpha {E}_{W})$$
(16)
$$P(D|w,\beta ,M)=\frac{1}{{Z}_{D}(\beta )}\mathrm{exp}(-\beta {E}_{D})$$
(17)

where \({Z}_{W}\left(\alpha \right)={\left(\pi /\alpha \right)}^{N/2}\) and \({Z}_{D}\left(\beta \right)={\left(\pi /\beta \right)}^{n/2}\), with \(N\) the number of network parameters.

Substituting the expressions for the prior density and the likelihood function into Eq. (15) gives

$$P(w|D,\alpha ,\beta ,M)=\frac{1}{{Z}_{S}(\alpha ,\beta )}\mathrm{exp}\left(-S\left(w\right)\right)$$
(18)

The objective function parameters \(\alpha\) and \(\beta\) determine the complexity of the model \(M\). To estimate them, we again apply Bayes' rule:

$$P(\alpha ,\beta |D,M)=\frac{P\left(D|\alpha ,\beta ,M \right) P(\alpha ,\beta | M)}{P\left(D| M\right)}$$
(19)

If the prior density \(P(\alpha ,\beta |M)\) is uniform, maximizing the posterior amounts to maximizing the likelihood \(P\left(D|\alpha ,\beta ,M\right)\), which is the normalization factor of Eq. (15). From Eqs. (16) and (17), it follows that

$$P(D|\alpha ,\beta ,M)=\frac{{Z}_{S}(\alpha ,\beta )}{{Z}_{W}\left(\alpha \right)\, {Z}_{D}\left(\beta \right)}$$
(20)

In Eq. (20), \({Z}_{W}\left(\alpha \right)\) and \({Z}_{D}\left(\beta \right)\) are already known. The objective function has an approximately quadratic shape in a small area surrounding \({w}^{MP}\), the minimum point of the posterior density, where the gradient is zero. We can therefore estimate \({Z}_{S}(\alpha ,\beta )\) by a Taylor series expansion; solving for the normalizing constant, we obtain

$${Z}_{S}(\alpha ,\beta )\approx {(2\pi )}^{N/2}{\mathrm{det}\left({\left({H}^{MP}\right)}^{-1}\right)}^{1/2}\mathrm{exp}\left(-S\left({w}^{MP}\right)\right)$$
(21)

where \(H\) is the Hessian matrix of the objective function.

$$H=\alpha {\nabla}^{2}{E}_{W}+\beta {\nabla}^{2}{E}_{D}$$
(22)

Substituting the expression for \({Z}_{S}(\alpha ,\beta )\) from Eq. (21) into Eq. (20), taking the derivatives of the logarithm of Eq. (20) with respect to \(\alpha\) and \(\beta\), and setting them equal to zero, we can find the optimal values of \(\alpha\) and \(\beta\) at \({w}^{MP}\):

$$\alpha =\gamma /2{E}_{W}({w}^{MP})$$
(23)
$$\beta =(n-\gamma )/2{E}_{D}({w}^{MP})$$
(24)

where \(\gamma\) is the number of effective parameters:

$$\gamma =N-2{\alpha }^{MP}\,\mathrm{tr}\left({\left({H}^{MP}\right)}^{-1}\right)$$
(25)

where N is the number of parameters in the network.

According to MacKay (1992) and Foresee and Hagan (1997), the iterative procedure is as follows: (1) initialize \(\alpha\), \(\beta\), and the weights; (2) take one step of the LM training algorithm to find the weights that minimize the objective function \(S(w)\); (3) compute \(\gamma\) using the Gauss‒Newton approximation to the Hessian matrix available in the Levenberg‒Marquardt algorithm, and compute new values for \(\alpha\) and \(\beta\); (4) iterate steps 2 and 3 until convergence.
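To illustrate steps 1‒4, the sketch below runs the evidence updates of Eqs. (22)‒(25) on a toy model that is linear in its parameters, so the minimizer and the Hessian are exact and no LM step is needed; the synthetic data and all names are ours:

```python
import numpy as np

rng = np.random.default_rng(0)
X = rng.uniform(-1, 1, size=(50, 3))             # n = 50 examples, N = 3 weights
w_true = np.array([0.5, -1.0, 2.0])
t = X @ w_true + 0.05 * rng.standard_normal(50)  # noisy targets

n, N = X.shape
alpha, beta = 1.0, 1.0                           # step 1: initialize
for _ in range(20):
    # Step 2: minimize S(w) = alpha*E_W + beta*E_D (closed form for this model)
    A = alpha * np.eye(N) + beta * X.T @ X
    w_mp = beta * np.linalg.solve(A, X.T @ t)
    E_W = np.sum(w_mp ** 2)
    E_D = np.sum((t - X @ w_mp) ** 2)
    # Step 3: Hessian (Eq. 22), effective parameters (Eq. 25), new alpha, beta
    H = 2 * A                                    # exact Hessian of S(w)
    gamma = N - 2 * alpha * np.trace(np.linalg.inv(H))
    alpha = gamma / (2 * E_W)                    # Eq. (23)
    beta = (n - gamma) / (2 * E_D)               # Eq. (24)

print(w_mp, gamma)                               # step 4: converged estimates
```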

Many approaches have been suggested for determining the number of neurons in the hidden layers; in this study it was determined by trial and error, with the RMSE used as the selection criterion. Starting from two hidden neurons, the number of hidden neurons was increased by one in each trial until the required accuracy was achieved. Seven neurons in each hidden layer proved optimal, so the final MLPNN structure was [2:7:7:3]. The performance of the MLPNN model in the training and testing phases in terms of the RMSE, MAE, and R2 is given in Table 2.
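A sketch of this trial-and-error search is given below, using scikit-learn's MLPRegressor as a stand-in; note that scikit-learn provides neither the Levenberg‒Marquardt nor the Bayesian Regularization solver, so 'lbfgs' is substituted here, and X_ref/Y_ref are placeholders for the normalized reference coordinates and velocities:

```python
import numpy as np
from sklearn.neural_network import MLPRegressor

# X_ref: (166, 2) normalized (lat, lon); Y_ref: (166, 3) normalized velocities.
best_k, best_err = None, np.inf
for k in range(2, 16):                       # grow both hidden layers together
    net = MLPRegressor(hidden_layer_sizes=(k, k), activation='tanh',
                       solver='lbfgs', max_iter=5000, random_state=0)
    net.fit(X_ref, Y_ref)
    err = rmse(Y_ref, net.predict(X_ref))    # rmse() from the Sect. 3.4 sketch
    if err < best_err:
        best_k, best_err = k, err
print(best_k)                                # the study found 7 to be optimal
```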

Table 2 Performance statistics of the MLPNN model in the training and testing phases

Table 2 shows that all velocity components gave similar RMSE and MAE values of about 4 mm and 3 mm, respectively, in the training phase, and about 2.5 mm and 2 mm, respectively, in the testing phase. The MLPNN model gave a high R2 value in the testing phase, whereas the R2 value in the training phase was low. The testing-phase R2 was close to 0.9 only for the \({V}_{Y}\) component, while the R2 values of \({V}_{X}\) and \({V}_{Z}\) remained below 0.9. The scatter diagrams of the velocities predicted by the MLPNN model versus the observed velocities, together with the residuals produced by the MLPNN model in the testing phase, are shown in Fig. 5.

Fig. 5. Scatter diagrams of the predicted and observed velocity values (left) and residuals obtained from the developed MLPNN model for testing points (right)

For the \({V}_{X}\) and \({V}_{Z}\) components, the intercept was far from 0, although the slope was close to 1; for the \({V}_{Y}\) component, the slope and intercept were very close to 1 and 0, respectively. According to these results, the MLPNN model developed for predicting the geodetic point velocities gave acceptable results despite the imperfect fit of the regression lines.

4.2 GRNN model development and results

The GRNN was employed to construct the second ANN model. One of the most important steps in training a GRNN is selecting the best possible spread parameter, because this value governs the efficacy of the developed network. The GRNN was trained with spread values ranging from 0.01 to 1, and the optimum value was determined according to the RMSE: the point where the error stopped decreasing and began to increase again was chosen as the most appropriate spread. The lowest RMSE was obtained at a spread parameter of 0.18, which yielded satisfactory predictions. To reveal the performance of the GRNN model, the training and testing results in terms of the RMSE, MAE, and R2 are tabulated in Table 3.
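The spread search can be sketched as follows, reusing the grnn_predict() and rmse() helpers from Sects. 3.2 and 3.4; X_ref/Y_ref and X_val/Y_val are placeholders for the normalized reference data and a held-out validation subset:

```python
import numpy as np

spreads = np.arange(0.01, 1.001, 0.01)       # candidate spreads, 0.01 to 1
errs = [rmse(Y_val, grnn_predict(X_ref, Y_ref, X_val, s)) for s in spreads]
print(spreads[int(np.argmin(errs))])         # the study obtained 0.18
```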

Table 3 Performance statistics of the GRNN model in the training and testing phases

According to the results of the GRNN model on the training dataset, the RMSE for the velocity components ranged from 3.47 to 4.22 mm and the MAE was about 3 mm per component. On the testing dataset, the GRNN model provided higher accuracies for all velocity components, with RMSE values of 1.88 mm, 1.81 mm, and 1.77 mm, respectively, and MAE values ranging from 1.44 to 1.51 mm. Like the MLPNN model, the GRNN yielded a high R2 value on the testing dataset, whereas the R2 value was low in the training phase. The testing-phase R2 was about 0.9 for the \({V}_{X}\) and \({V}_{Y}\) components and about 0.8 for the \({V}_{Z}\) component. Figure 6 shows the scatter diagrams of the velocities predicted by the GRNN model versus the observed velocities, together with the residuals produced by the GRNN model in the testing phase.

Fig. 6. Scatter diagrams of the predicted and observed velocity values and residuals obtained from the developed GRNN model for testing points

For the \({V}_{Z}\) component, the slope was close to 1, whereas the intercept was about 2; for the \({V}_{X}\) and \({V}_{Y}\) components, the slope and intercept were very close to 1 and 0, respectively, i.e. the predicted velocities were close to the observed velocities. As with the MLPNN model, the fits for the \({V}_{Y}\) and \({V}_{Z}\) components were somewhat weaker than for the \({V}_{X}\) component. In general, based on the obtained results, the GRNN model developed for predicting the geodetic point velocities performs reasonably well.

4.3 RBFNN model development and results

The RBFNN was employed to construct the third ANN model, with the same input and output parameters. The spread parameter was varied to achieve a lower RMSE value; after several trials, the RBFNN model showed good results at a spread of 1.5. To achieve good prediction, 60 neurons were used in the hidden layer of the RBFNN model. The prediction results in terms of RMSE, MAE, and R2 for the training and testing phases are given in Table 4.
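Using the rbfnn_fit()/rbfnn_predict() helpers sketched in Sect. 3.3, the spread trial can be outlined as follows (taking the centres as a random subset of 60 reference points is our assumption; MATLAB's newrb, by contrast, selects neurons greedily):

```python
import numpy as np

rng = np.random.default_rng(0)
centers = X_ref[rng.choice(len(X_ref), size=60, replace=False)]  # 60 neurons
for spread in (0.5, 1.0, 1.5, 2.0):
    W = rbfnn_fit(X_ref, Y_ref, centers, spread)
    err = rmse(Y_val, rbfnn_predict(X_val, centers, spread, W))
    print(spread, err)                       # the study adopted a spread of 1.5
```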

Table 4 Performance statistics of the RBFNN model in the training and testing phases

According to the results of the RBFNN model on the training dataset, the RMSE for the velocity components ranged from 3.57 to 4.78 mm, while on the testing dataset the RMSE was about 2.34 mm for all components. The MAE was about 3 mm per component on the training dataset and about 2 mm per component on the testing dataset. Like the MLPNN and GRNN models, the RBFNN model gave a low R2 value in the training phase and a higher R2 value in the testing phase: about 0.8 for both the \({V}_{X}\) and \({V}_{Y}\) components and about 0.6 for the \({V}_{Z}\) component. Figure 7 shows the scatter diagrams of the velocities predicted by the RBFNN model versus the observed velocities, together with the residuals produced by the RBFNN model in the testing phase.

Fig. 7. Scatter diagrams of the predicted and observed velocity values and residuals obtained from the developed RBFNN model for testing points

The RBFNN gave slopes and intercepts very close to the MLPNN results (Fig. 7). For the \({V}_{X}\) and \({V}_{Z}\) components, the slope was close to 1, whereas the intercept was about 3; as with the MLPNN and GRNN models, the slope and intercept for the \({V}_{Y}\) component were very close to 1 and 0. It can be concluded that the RBFNN is capable of reasonably predicting the velocity values within a broad range of data, despite the poor fit of the regression lines.

4.4 Comparisons of models

The performances of the MLPNN, GRNN, and RBFNN models were compared with each other to find the most appropriate ANN model for predicting the geodetic point velocities. The comparison of the three models based on the RMSE, MAE, and R2 is given in Table 5, where the values shown in bold indicate the best results. The mean RMSE values of the MLPNN, GRNN, and RBFNN models were approximately 2.41 mm, 1.82 mm, and 2.34 mm, respectively; for the GRNN model, the RMSE values of all geodetic point velocity components were below 2 mm. Similarly, the MAE values were below 2 mm, at approximately 1.87 mm, 1.47 mm, and 1.85 mm, respectively, and the R2 values were approximately 0.76360, 0.86131, and 0.77651, respectively. In light of these comparisons, it is clear that the GRNN model produced the best accuracy for all geodetic point velocity components with respect to RMSE, MAE, and R2; after the GRNN model, the RBFNN model performed better than the MLPNN model.

Table 5 Comparisons of MLPNN, GRNN, and RBFNN models for each geodetic point velocity component

The comparisons of the observed results with the predictions of the MLPNN, GRNN, and RBFNN models, together with the velocity residuals obtained from all the ANN models in the testing phase, are shown in Fig. 8. For the MLPNN, GRNN, and RBFNN models, respectively, the velocity residuals ranged from ‒6.99 to 5.37 mm, from ‒3.94 to 3.87 mm, and from ‒6.22 to 4.89 mm for \({V}_{X}\); from ‒6.08 to 4.88 mm, from ‒3.59 to 3.93 mm, and from ‒5.70 to 5.90 mm for \({V}_{Y}\); and from ‒7.29 to 3.64 mm, from ‒3.59 to 3.36 mm, and from ‒6.38 to 6.61 mm for \({V}_{Z}\). For the GRNN model, the velocity residuals did not exceed 4 mm for any of the geodetic point velocity components. The most extreme velocity residuals were found for the \({V}_{Z}\) component.

Fig. 8. Comparison between observed velocities and predicted velocities for each velocity component using MLPNN, GRNN, and RBFNN models

It is clear that the velocity residuals of the GRNN model were lower than those of either the MLPNN or the RBFNN model.

5 Conclusions

The applicability of two ANN models, the MLPNN and RBFNN, to the prediction of geodetic point velocities has been investigated in previous studies. The MLPNN model was proposed as an alternative to classical interpolation methods (Güllü et al. 2011), and the RBFNN was subsequently applied and tested for the same purpose (Yilmaz and Gullu 2014). The GRNN model, however, had never been used for this task. This study investigated the potential of the GRNN in predicting geodetic point velocities in comparison with the MLPNN and RBFNN. All model performances were evaluated and compared through the statistical parameters RMSE, MAE, and R2. The following conclusions can be drawn from the results of this study:

  • All geodetic point velocity components (\({V}_{X}\), \({V}_{Y}\), \({V}_{Z}\)) can be predicted simultaneously by using them together as output parameters.

  • The MLPNN, GRNN, and RBFNN models all offered satisfactory prediction of geodetic point velocities (at the mm level).

  • The GRNN model was superior to the other ANN models (MLPNN and RBFNN), achieving the lowest RMSE and MAE and the highest R2 values. In this regard, the GRNN can be considered a reliable tool for predicting geodetic point velocities.

  • After the GRNN, the RBFNN model was found to perform better than the MLPNN model.

Although the R2 values obtained from the ANN models were generally not very high, especially in the training phase, the RMSE and MAE results indicate that sufficient accuracy was achieved in both the training and testing phases. It should be noted that R2 should not be applied alone as a performance criterion (Legates and McCabe 1999), and that an R2 equal to 1 does not guarantee that a model captures the behaviour of the investigated parameter (Kisi 2008). The low R2 values are thought to result from the distribution of the training and testing datasets: in this study the two datasets were selected randomly, and an unfavourable data partition can reduce the predictive performance of an ANN model. To overcome this problem, the use of a cross-validation (e.g. k-fold) technique is recommended (Reitermanová 2010); cross-validation is an assessment method that evaluates how well a model generalizes to an independent dataset and can thus improve the robustness of the proposed model. In further studies, the potential of the ANN combined with different cross-validation methods can be investigated for predicting geodetic point velocities. In addition, the GRNN model should be applied and tested at different point densities, so that different datasets can be used to assess the impact of point density on the GRNN velocity prediction results. Although GRNN-predicted outcomes can deviate significantly from geodetically derived velocities, in this study the GRNN provided reasonable predictions and is therefore a promising tool for this purpose. The velocity prediction framework of the GRNN model needs further discussion and research, and more studies should be conducted using different datasets in future work.