1 Introduction

Artificial neural networks (ANN) are widely used in discrimination, pattern recognition and estimation problems as an alternative to statistical models [1, 7, 17, 33]. Problems in economics, finance and business occupy an important place among those examined with ANN techniques [1, 16, 27, 29, 30], and the number of papers in economics applying ANN models has grown in recent years. Leung et al. [18] compared the performance of general regression neural networks with other forecasting techniques in forecasting exchange rates. Kim et al. [15] noted the usefulness of neural networks as an early warning system for economic crises, studying the Korean economy as an example of a world economic crisis. Many more examples could be given.

Another macroeconomic problem examined using ANN is the international debt problem of countries. There are also numerous studies using statistical methods on this subject. Frank and Cline [10] used multiple discriminant analysis to predict debt-servicing difficulties. Dhonte [8] used principal components analysis to obtain a description of a country’s debt position. Feder and Just [9] used logit analysis to determine debt-servicing capacity. Yoon-Dae [31] used multiple regression analysis to determine a country’s creditworthiness. Kharas [13] used probit analysis to assess the probability of a country becoming uncreditworthy. Cooper [5] used canonical correlation analysis to examine the relationship between country creditworthiness and a country’s recent economic performance.

Interest in applying ANN to macroeconomic problems has grown since the 1990s. Roy and Cosset [26] developed a neural network to estimate sovereign credit ratings. Burell and Folarin [4] used neural networks to maintain competitiveness in a global economy and surveyed the studies on the global economy using neural networks up to that time. Cooper [6] compared the performance of artificial neural networks with logistic regression, probit regression and discriminant analysis in classifying countries into two classes: those that reschedule their international debts and those that do not. Although the literature contains many studies involving classification methods, the present study differs from them in its rich set of comparisons between ANN trained on the whole data set and ANN trained on partitioned data (training–validation–test). In particular, this paper points out the importance of partitioning the data.

In this paper, the rescheduling and non-rescheduling of countries’ international debts are examined using data for 2001. The data are analyzed with logistic and probit regression among the statistical methods, and with different backpropagation algorithms for multilayer networks among the ANN. As a result of the classification, the more successful architectures and algorithms are identified.

2 Statistical methods for classification problems

Researchers often face classification problems in applied studies. When the problem is discrimination into two groups, logistic regression, probit regression and discriminant analysis are the commonly used statistical methods.

Logistic regression is a mathematical modeling approach that can be used to describe the relationship between several independent variables \( x_{1} ,x_{2} , \ldots ,x_{k} \) and a binary dependent variable y, where y is coded as 1 or 0 for its two possible categories. The logistic model describes the expected value of y, in other words the probability of occurrence of one of the two possible outcomes of y. The general form of the logistic regression model is

$$ y_{i} = E(y_{i} ) + \varepsilon_{i} , $$
(2.1)

where the observations \( y_i \) are independent Bernoulli random variables with expected values

$$ E(y_{i} ) = p_{i} = {\frac{1}{{1 + \exp ( - x_{i}^{T} \beta )}}} $$
(2.2)

and \( x_{i}^{T} = (1,x_{i1} ,x_{i2} , \ldots ,x_{ik} ),\;\beta^{T} = (\beta_{0} ,\beta_{1} ,\beta_{2} , \ldots ,\beta_{k} ) \), and \( \varepsilon_{i} \) is the ith error.

Let \( \eta_{i} = x_{i}^{T} \beta \) be the linear predictor, where \( \eta_{i} \) is defined by the logit transformation of the probability \( p_i \):

$$ \eta_{i} = \ln {\frac{{p_{i} }}{{1 - p_{i} }}}. $$

The parameters \( \beta^{T} = (\beta_{0} ,\beta_{1} ,\beta_{2} , \ldots ,\beta_{k} ) \) of the linear predictor \( \eta_{i} \) are estimated by the method of maximum likelihood. Let \( \hat{\beta } \) be the final estimate of the model parameters. If the model assumptions are correct, the estimated value of the linear predictor is constructed as

$$ \hat{\eta }_{i} = \hat{\beta }_{0} + \hat{\beta }_{1} x_{i1} + \cdots + \hat{\beta }_{k} x_{ik} , $$

where k is the number of the independent variables.
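To make the estimation concrete, the following minimal sketch (an illustration only: it assumes Python with numpy and statsmodels, and uses synthetic data rather than the debt data of this paper) fits a logistic regression by maximum likelihood and recovers the estimated linear predictor \( \hat{\eta}_i \):

```python
import numpy as np
import statsmodels.api as sm

rng = np.random.default_rng(0)

# Synthetic data: k = 2 independent variables, 200 observations.
X = rng.normal(size=(200, 2))
eta_true = 0.5 + 1.0 * X[:, 0] - 2.0 * X[:, 1]   # linear predictor eta_i
p_true = 1.0 / (1.0 + np.exp(-eta_true))         # Eq. (2.2)
y = rng.binomial(1, p_true)                      # independent Bernoulli responses

# Maximum likelihood estimation of beta = (beta_0, beta_1, beta_2).
X_design = sm.add_constant(X)                    # prepend the column of 1s in x_i^T
fit = sm.Logit(y, X_design).fit(disp=False)

eta_hat = X_design @ fit.params                  # estimated linear predictor
p_hat = 1.0 / (1.0 + np.exp(-eta_hat))           # estimated probabilities
print(fit.params)                                # beta_hat
```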

Another mathematical modeling approach is the probit model. Again, a model nonlinear in p is transformed so that a monotonic function of p is linear in the explanatory variables. The probability \( p_i \) in the ith cell, or for the ith observation, is given by the standard cumulative normal distribution function [20, 23]:

$$ F(\eta_{i} ) = p_{i} = \int\limits_{ - \infty }^{{\eta_{i} }} {{\frac{1}{{\sqrt {2\pi } }}}\exp \left( { - \frac{1}{2}u^{2} } \right)} {\text{d}}u. $$
(2.3)

The probit transformation is given by the inverse of the standard cumulative normal distribution function F. Solving (2.3) for \( \eta_{i} \) yields

$$ \eta_{i} = F^{ - 1} (p_{i} ) = {\text{probit}}(p_{i} ). $$

Thus, the probit model can be written as

$$ F^{ - 1} (p_{i} ) = \eta_{i} = \sum\limits_{k = 0}^{K} {\beta_{k} x_{ik} } $$

or

$$ p_{i} = F\left( {\sum\limits_{k = 0}^{K} {\beta_{k} x_{ik} } } \right), $$

where the parameters of the probit model are estimated by the method of maximum likelihood (for grouped data, weighted least squares can also be used).

Several statistical software packages, such as STATISTICA and SAS, can be used to estimate the parameters of these models.
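As a further illustration of such software (assuming Python’s statsmodels rather than the packages named above, and synthetic data), the probit model (2.3) can be fitted as follows:

```python
import numpy as np
import statsmodels.api as sm
from scipy.stats import norm

rng = np.random.default_rng(1)
X = sm.add_constant(rng.normal(size=(200, 2)))   # design matrix with intercept
p = norm.cdf(X @ np.array([0.3, 1.0, -1.5]))     # Eq. (2.3): p_i = F(eta_i)
y = rng.binomial(1, p)

fit = sm.Probit(y, X).fit(disp=False)            # maximum likelihood fit
eta_hat = X @ fit.params                         # probit(p_i) = eta_i
p_hat = norm.cdf(eta_hat)                        # fitted probabilities
print(fit.params)
```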

Since (2.2) and (2.3) give the probability of belonging to one of the two groups, they are commonly used in classification problems. Discriminant analysis is another method used for binary classification problems; however, since its correct classification ratio (CCR) on such macroeconomic problems is not as good as those of logistic regression, probit regression and ANN models, it is not studied in this paper [6].

3 Backpropagation neural network algorithms for classification problems

In this study, multilayer perceptrons (MLP) with one or two hidden layers are considered. In fact, a single hidden layer is usually sufficient to achieve a good approximation [3, 11, 12, 33].

Backpropagation is the most widespread approach for training multilayer feedforward neural networks and is based on the Widrow-Hoff training rule. The main idea is to adjust the weights and biases so as to minimize the sum of squared errors by propagating the error backwards at each epoch. Different backpropagation algorithms are constructed by applying different numerical optimization algorithms from the gradient and Newton method classes to this minimization problem.

In supervised training of a two-layer (single hidden layer) neural network with n inputs, m outputs and p hidden units, the sum of squared errors for the qth training sample is calculated as follows:

$$ E^{q} (w) = \sum\limits_{k = 1}^{m} {[y_{k} - t_{k} ]}^{2} = \sum\limits_{k = 1}^{m} {\left[ {f_{o} \left( {\sum\limits_{j = 1}^{p} {w_{jk}^{o} } \cdot f_{h} \left( {\sum\limits_{i = 1}^{n} {w_{ij}^{h} \cdot x_{i} + b_{j}^{h} } } \right) + b_{k}^{o} } \right) - t_{k} } \right]}^{2} $$
(3.1)

where \( t_k \) is the kth target and \( y_k \) the kth output value; \( w_{ij}^{h} \) is the weight connecting the ith input unit to the jth hidden unit, \( w_{jk}^{o} \) is the weight connecting the jth hidden unit to the kth output unit, \( b_{j}^{h} \) is the bias of the jth hidden unit, \( b_{k}^{o} \) is the bias of the kth output unit, \( f_h(\cdot) \) is the activation function applied to the hidden units, and \( f_o(\cdot) \) is the activation function applied to the output units. Here, w is the vector of all weight and bias components. For notational simplicity, the sample index q is omitted on the right-hand side.

A nonlinear (usually S-shaped sigmoid) function is taken for \( f_h(\cdot) \), while \( f_o(\cdot) \) may also be taken as a linear function. Over all N training samples, the mean squared error is defined as:

$$ E(w)\, = \,\frac{1}{N}\sum\limits_{q = 1}^{N} {E^{q} (w)}. $$
(3.2)

3.1 Standard backpropagation (BP) training algorithm

In this algorithm, the gradient descent numerical optimization method is used to decrease the error [3, 12]. Its iteration is as follows:

$$ w_{k + 1} = w_{k} - \alpha \cdot g_{k} $$

or

$$ \Updelta w_{k + 1} = - \alpha \cdot g_{k} $$

where \( w_k \) is the current vector of weights and biases, \( g_k \) is the gradient of the error function (3.2) at the point \( w_k \), and α is the learning (training) rate. The learning rate is crucial for BP since it determines the magnitude of the weight changes.

The standard backpropagation training algorithm with momentum is given by the improved gradient descent formula:

$$ \Updelta w_{k + 1} = - \alpha \cdot g_{k} + \mu \Updelta w_{k} , $$

where \( \Updelta w_{k} = w_{k} - w_{k - 1} \) is the weight change of the previous iteration and μ is the momentum coefficient. In backpropagation with momentum, the weight change is thus a combination of the current gradient and the previous weight change. This improves the behavior of the algorithm, since it provides a chance of escaping local minima of the error surface. A good choice of the constants α and μ speeds up the convergence of the algorithm [2, 24, 32].
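A minimal numpy sketch of this update rule (the toy quadratic error and all names here are illustrative assumptions; in practice the gradient of (3.2) would be computed by backpropagation):

```python
import numpy as np

def gd_momentum_step(w, grad, prev_dw, alpha=0.1, mu=0.9):
    """One gradient descent step with momentum: dw = -alpha * g + mu * dw_prev."""
    dw = -alpha * grad + mu * prev_dw
    return w + dw, dw

# Toy quadratic error E(w) = ||w||^2 / 2, whose gradient is g = w.
w = np.array([2.0, -3.0])
dw = np.zeros_like(w)
for _ in range(100):
    w, dw = gd_momentum_step(w, grad=w, prev_dw=dw)
print(w)  # approaches the minimum at the origin
```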

3.2 Resilient backpropagation (RP) algorithm

MLPs typically use S-shaped sigmoid activation functions in the hidden layers, whose slopes approach zero as the input gets large; the gradient can therefore have a very small magnitude even far from a minimum. The purpose of the resilient backpropagation (RP) training algorithm is to eliminate these harmful effects of the magnitudes of the partial derivatives. In the RP algorithm, the magnitude of the derivative has no effect on the weight update; only the sign of the derivative is used to determine the direction of the update. If the derivative is zero, the update value remains the same. Whenever the weights oscillate, the weight change is reduced; if the weight continues to change in the same direction for several iterations, the magnitude of the weight change is increased [25].
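The core sign-based update can be sketched as follows (a simplified per-weight sketch; the increase/decrease factors 1.2 and 0.5 and the step bounds are the conventional choices of [25], stated here as assumptions, and variants of RP differ in how they treat a sign change):

```python
import numpy as np

def rprop_step(w, grad, prev_grad, step, eta_plus=1.2, eta_minus=0.5,
               step_min=1e-6, step_max=50.0):
    """One resilient backpropagation update: only the sign of the gradient is used."""
    sign_change = grad * prev_grad
    # Same sign for several iterations: grow the step; sign flip (oscillation): shrink it.
    step = np.where(sign_change > 0, np.minimum(step * eta_plus, step_max), step)
    step = np.where(sign_change < 0, np.maximum(step * eta_minus, step_min), step)
    w = w - np.sign(grad) * step      # the magnitude of the derivative has no effect
    return w, step
```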

Versions of the backpropagation training algorithm based on conjugate gradient and Newton methods, which are second-order optimization techniques, are more efficient in classification and forecasting problems [3, 12].

3.3 Conjugate gradient (CG) algorithms

In CG algorithms, the search is made along conjugate directions. A set of nonzero n-dimensional vectors \( \{ p_{0} ,p_{1} , \ldots ,p_{n - 1} \} \) is said to be conjugate with respect to a symmetric positive definite n × n matrix A if

$$ p_{i}^{T} Ap_{j} = 0\quad {\text{for}}\;{\text{all}}\;i \ne j $$

First, consider the minimization problem for the quadratic function:

$$ F(w) = \frac{1}{2}w^{T} Hw\, - \,b^{T} w $$
(3.3)

where \( w \in R^{n} \) and H is an n × n symmetric positive definite matrix. For a starting point \( w_{0} \in R^{n} \), the method defined by the formulas below is called a conjugate direction method [21]:

$$ w_{k + 1} = w_{k} + \alpha_{k} p_{k} $$
(3.4)
$$ \alpha_{k} = \, - {\frac{{p_{k}^{T} g_{k} }}{{p_{k}^{T} Hp_{k} }}} $$
(3.5)

Here, (3.5) is obtained by minimizing the one-variable function \( \varphi (\alpha ) = F(w_{k} + \alpha p_{k} ) \).

For any starting point \( w_{0} \in R^{n} \), the sequence \( \{ w_{k} \} \) generated by the conjugate direction algorithm (3.4), (3.5) converges to the minimum point of problem (3.3) in at most n steps [21].

The conjugate gradient method is a conjugate direction method: it starts out by searching in the steepest descent direction on the first iteration,

$$ p_{0} = - g_{0} $$
(3.6)

and the current conjugate vector \( p_k \) is then calculated from the previous vector \( p_{k-1} \) and the current gradient:

$$ p_{k} = - g_{k} + \beta_{k} p_{k - 1} $$
(3.7)

The coefficient \( \beta_k \) in (3.7) is selected so that the vectors \( p_{k-1} \) and \( p_k \) are conjugate with respect to the symmetric positive definite n × n matrix H, i.e. \( p_{k - 1}^{T} Hp_{k} = 0 \):

$$ \beta_{k} = {\frac{{g_{k}^{T} Hp_{k - 1} }}{{p_{k - 1}^{T} Hp_{k - 1} }}} $$
(3.8)

There are various CG algorithms, depending on the formula used to calculate the coefficients \( \beta_k \). Some of the most important are as follows [3, 12, 21].

3.3.1 Fletcher-Reeves conjugate gradient algorithm (CGF)

For this algorithm β k is defined by

$$ \beta_{k} = {\frac{{g_{k}^{T} g_{k} }}{{g_{k - 1}^{T} g_{k - 1} }}} $$

This is the ratio of the norm squared of the current gradient to the norm squared of the previous gradient.

3.3.2 Polak–Ribière conjugate gradient algorithm (CGP)

Another version of the CG algorithm was proposed by Polak and Ribière (CGP), for which \( \beta_k \) is defined by

$$ \beta_{k} = {\frac{{g_{k}^{T} (g_{k} - g_{k - 1} )}}{{g_{k - 1}^{T} g_{k - 1} }}} $$
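Both choices of \( \beta_k \) fit the same direction-update skeleton (3.6)–(3.7), sketched below in numpy (illustrative only; a line search for \( \alpha_k \) would complete the algorithm):

```python
import numpy as np

def beta_fletcher_reeves(g, g_prev):
    return (g @ g) / (g_prev @ g_prev)             # CGF

def beta_polak_ribiere(g, g_prev):
    return (g @ (g - g_prev)) / (g_prev @ g_prev)  # CGP

def cg_direction(g, g_prev=None, p_prev=None, beta=beta_polak_ribiere):
    """Eqs. (3.6)-(3.7): steepest descent first, then conjugate directions."""
    if p_prev is None:
        return -g                                  # p_0 = -g_0
    return -g + beta(g, g_prev) * p_prev           # p_k = -g_k + beta_k p_{k-1}
```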

Now consider the error function (3.2). Since the activation functions are nonlinear, the mean squared error E(w) is a nonlinear function of w. Using the Taylor series expansion of E(w) around \( w_k \) up to second order, we can approximate E(w) by a quadratic function:

$$ E(w_{k + 1} ) - E(w_{k} )\, \approx \,g_{k}^{T} \Updelta w_{k} \, + \,\frac{1}{2}\Updelta w_{k}^{T} H_{k} \Updelta w_{k} $$
(3.9)

where \( H_k \) is the Hessian matrix at the current point \( w_k \):

$$ H_{k} = {\frac{{\partial^{2} E(w_{k} )}}{{\partial w^{2} }}} $$

Thus, at each iteration the quadratic function (3.9) is minimized instead of (3.3), and the conjugate gradient steps (3.4)–(3.8) are applied locally. Here the Hessian matrix \( H_k \) is assumed to be positive definite.

3.3.3 Scaled conjugate gradients (SCG) algorithm

The CG algorithms reviewed above perform a line search at each iteration, which makes the calculations expensive.

Moller [19] put forward the scaled conjugate gradient algorithm, which avoids the line search procedure of traditional CG algorithms. The basic idea of SCG is to combine the model trust region approach with the conjugate gradient approach [3, 21].

Instead of a line minimization, which requires several error function evaluations, Moller proposed evaluating the product \( Hp_k \) with O(W) operations (W is the total number of weights and biases). This approach can fail when the Hessian matrix H of the local approximation is not positive definite: in that case the denominator \( p_{k}^{T} Hp_{k} \) of (3.5) becomes negative, so the weight update can increase the value of the error function. The problem can be overcome by modifying the Hessian matrix to ensure that it is positive definite, for example

$$ H_{\bmod } = H + \lambda I $$

where I is the identity matrix and λ ≥ 0 is a scaling coefficient. The Hessian matrix can be made positive definite by choosing a sufficiently large positive value of λ. The step length formula (3.5) then becomes:

$$ \alpha_{k} = - {\frac{{p_{k}^{T} g_{k} }}{{p_{k}^{T} Hp_{k} + \lambda_{k} \left\| {p_{k} } \right\|^{2} }}} $$
(3.10)

As \( \lambda_k \) increases, \( \alpha_k \) decreases. The parameter \( \lambda_k \) thus adjusts the size of the trust region: as \( \lambda_k \) increases, the trust region shrinks. The choice of an appropriate \( \lambda_k \) is important at every step of the SCG algorithm.

Denoting the denominator of (3.10) by \( \delta_{k} \), whenever \( \delta_{k} \le 0 \) Moller [19] raises the scaling coefficient to

$$ \lambda_{k}^{*} = 2\left( {\lambda_{k} - {\frac{{\delta_{k} }}{{\left\| {p_{k} } \right\|^{2} }}}} \right). $$

With this choice the modified denominator \( \delta_{k}^{*} = \delta_{k} + (\lambda_{k}^{*} - \lambda_{k} )\left\| {p_{k} } \right\|^{2} = - p_{k}^{T} H_{k} p_{k} \) is guaranteed to be positive, and it is used in place of the original denominator when calculating the step length \( \alpha_k \) in (3.10).
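Putting the pieces together, one step-length calculation with this safeguard can be sketched as follows (a sketch of the scheme exactly as presented above, with Hp standing for the product \( H_{k} p_{k} \), however it is obtained):

```python
import numpy as np

def scg_step_length(p, g, Hp, lam):
    """Step length (3.10) with the positive-definiteness safeguard of [19]."""
    norm2 = p @ p
    delta = p @ Hp + lam * norm2          # denominator of (3.10)
    if delta <= 0:                        # local Hessian not positive definite
        lam = 2.0 * (lam - delta / norm2) # lambda_k^*
        delta = -(p @ Hp)                 # delta_k^* = -p^T H p > 0
    alpha = -(p @ g) / delta
    return alpha, lam
```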

3.4 Quasi-Newton (QN) algorithms

The basic step of Newton’s method is

$$ w_{k + 1} \, = \,w_{k} - H_{k}^{ - 1} g_{k} $$
(3.11)

where \( H_{k} \) is the Hessian matrix at the current point \( w_k \). Newton’s method often converges faster than conjugate gradient methods. Unfortunately, computing the Hessian matrix is complex and expensive. Quasi-Newton (or secant) methods are based on Newton’s method but do not require the calculation of second derivatives: they update an approximation to the inverse Hessian matrix at each iteration, with the update computed as a function of the gradient.

3.4.1 The Broyden, Fletcher, Goldfarb and Shanno (BFGS) procedure

This is one of the most successful QN algorithms. In the BFGS algorithm, the approximation G to the inverse Hessian matrix in (3.11) is constructed as

$$ G_{k + 1} = G_{k} + {\frac{{pp^{T} }}{{p^{T} v}}} - {\frac{{G_{k} vv^{T} G_{k} }}{{v^{T} G_{k} v}}} + (v^{T} G_{k} v)uu^{T} , $$

where \( p = w_{k + 1} - w_{k} ,\;v = g_{k + 1} - g_{k} ,\;u = {\frac{p}{{p^{T} v}}} - {\frac{{G_{k} v}}{{v^{T} G_{k} v}}}, \)

G is a positive definite matrix and \( -G_{k} g_{k} \) is the descent direction at each step of the algorithm. The weight update is made as follows:

$$ w_{k + 1} = w_{k} - \alpha_{k} G_{k} g_{k} , $$

where \( \alpha_k \) is found by line minimization.
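The update formula translates directly into numpy (a sketch; in practice safeguards against tiny denominators are added):

```python
import numpy as np

def bfgs_update(G, w_new, w_old, g_new, g_old):
    """BFGS update of the inverse-Hessian approximation G (formula above)."""
    p = w_new - w_old                 # step
    v = g_new - g_old                 # gradient change
    Gv = G @ v
    pv, vGv = p @ v, v @ Gv
    u = p / pv - Gv / vGv
    return (G + np.outer(p, p) / pv
              - np.outer(Gv, Gv) / vGv
              + vGv * np.outer(u, u))
```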

3.4.2 The one-step secant (OSS) algorithm

This algorithm is an attempt to bridge the gap between the CG algorithms and the quasi-Newton algorithms. It assumes that the previous Hessian approximation is the identity matrix, so no matrix inversion is needed when choosing the new search direction. It requires little storage and computation per step: slightly more than the CG algorithms, but considerably less than the BFGS algorithm.

4 Definition and importance of rescheduling the international debts

The international debt crises that occur in the wake of financial crises have riveted attention on political issues at the international level. If global economic justice is to be achieved, debt crises must be assessed within the broader context of the international financial system. This system has led to instability and recurrent financial crises that have severely harmed the interests of poor countries and their people [22].

The financial crises experienced around the world badly affect national economies; debt crises follow economic crises, and eventually the debt problem dominates all policy decisions.

Since late 1982, debt crisis has been almost universal as a result of developing countries’ failure to service their debts. Governments and multilateral agencies then stepped in, but this official activity after August 1982 should not beguile one into thinking that the major governments were absent from the game earlier [28]. The definition of the crisis in the Northern industrial countries was uniform: the onset of widespread difficulties in servicing the mountain of developing-country debt threatened the stability of the international financial system [14].

At the end of the 1990s came the financial turmoil in Asia that erupted in mid-1997. It abated after January 1998, and markets partially recovered from their troughs. Nevertheless, currency and asset values in the countries involved were left far below precrisis levels, and considerable uncertainty remained about the resolution of the crisis and its global repercussions. The crisis and its effects were already apparent in December 1997; thus the most recent world economic crisis is the Asian economic crisis of 1998.

Since international debt crises affect all economic and political balances of countries, this study considers countries’ rescheduling and non-rescheduling of their international debts using 2001 data for 96 countries, after the Asian economic crisis of 1998. The effects of that crisis emerged a few years later; accordingly, the data on the 96 countries’ rescheduling and non-rescheduling of international debts refer to 2001.

The data are taken from The World Bank Group web site and The World Bank e-library. The decision as to which countries belong to the rescheduled class is made by observing the tables of Stand-By Arrangements published in the IMF Survey over 1997–2001. Hence, if a country made a Stand-By Arrangement with the IMF, it is treated as rescheduling its debt and the corresponding dependent variable takes the value 1; otherwise the country is treated as non-rescheduling and the dependent variable takes the value 0.

The factors that affect the rescheduling and non-rescheduling of international debts are defined as follows; these factors are taken as independent variables for the logistic and probit regressions and as inputs for the ANN:

  • \( X_1 \): Average increase in GDP growth (annual %) over 1960–2001

  • \( X_2 \): The ratio of short-term debt (% GDP) to exports of goods and services (% GDP)

  • \( X_3 \): Debt service ratio: the interest on total external debt plus amortization on long-term debt as a percentage of exports of goods and services in 2001

  • \( X_4 \): Import cover, equal to international reserves divided by total imports in 2001 (US$ million)

  • \( X_5 \): The ratio of exports of goods and services (% GDP) to imports of goods and services (% GDP).

Since all of these variables are taken as inputs, the MLP model has five input variables. The output variable is 1 (rescheduled debts) if the estimated probability is greater than or equal to 0.5, and 0 (non-rescheduled debts) if it is less than 0.5. The output layer of the network therefore contains a single neuron.
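A minimal sketch of this 5:p:1 forward pass and threshold rule (the weights below are random placeholders; a trained network would supply them, and standardized inputs are assumed):

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def mlp_5p1_predict(X, Wh, bh, Wo, bo):
    """Forward pass of a 5:p:1 MLP in the notation of Eq. (3.1); threshold at 0.5."""
    hidden = sigmoid(X @ Wh + bh)            # 5 inputs -> p hidden units
    prob = sigmoid(hidden @ Wo + bo)         # p hidden units -> 1 output neuron
    return (prob >= 0.5).astype(int), prob   # 1 = reschedule, 0 = non-reschedule

p = 8                                        # hidden units (example value)
rng = np.random.default_rng(2)
X = rng.normal(size=(96, 5))                 # 96 countries, inputs X_1..X_5
Wh, bh = rng.normal(size=(5, p)), np.zeros(p)
Wo, bo = rng.normal(size=(p, 1)), np.zeros(1)
labels, prob = mlp_5p1_predict(X, Wh, bh, Wo, bo)
```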

5 Estimation of the results and comparison of different methods

In this study, architectures with two hidden layers were tried first; since they did not contribute much, they are not included here. MLPs with one hidden layer are seen to give a high CCR, so MLPs of the form 5:p:1 are considered. The remaining free parameter is the number of units p in the hidden layer. Since the initial weights and biases are random, 33 experiments are made for each network and algorithm with values of p between 1 and 15; as a result, networks with p ranging between 6 and 10 are preferred. The CCR of the different training algorithms for these networks are given in Table 1.

Table 1 Correct classification ratio for neural networks and for statistical methods

According to Table 1, the best CCR is obtained by the BFGS and OSS algorithms (97.92). The CCR of the logit (85.42) and probit (86.46) models are considerably lower than the NN results.

Even if the training results of a neural network on the whole data set are good, its predictions for new data may be bad. In this case the network becomes a memorizer, and even small changes in the data can cause big mistakes. To avoid this drawback, the data are divided into three parts: training, validation and test. At the training stage (adjusting the weights) only the training data are used, but the performance of the network is checked at each epoch with the help of the validation data. Training stops automatically when the total squared error on the validation data starts to increase, and the network parameters at that point are retained. At this stage, the test data are treated as unseen data. The network with the minimum total squared error on each of the training, validation and test data is accepted as the one giving the best performance.
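This partitioning and early-stopping procedure can be sketched as follows (`net`, `train_epoch` and `sse` are hypothetical placeholders for a network object, one backpropagation epoch, and the total squared error; this is a sketch of the protocol, not the software used in the paper):

```python
import numpy as np

def early_stopping_train(net, X, y, train_epoch, sse, max_epochs=1000, seed=0):
    """Train/validation/test protocol described above (sketch with placeholder helpers)."""
    idx = np.random.default_rng(seed).permutation(len(y))
    tr, va, te = np.split(idx, [48, 72])        # 48 / 24 / 24 countries
    best_err, best_params = np.inf, net.get_params()
    for _ in range(max_epochs):
        train_epoch(net, X[tr], y[tr])          # weights adjusted on training data only
        val_err = sse(net, X[va], y[va])
        if val_err >= best_err:                 # validation error starts to increase
            break
        best_err, best_params = val_err, net.get_params()
    net.set_params(best_params)                 # keep the best parameters found
    return sse(net, X[te], y[te])               # test data treated as unseen data
```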

For the problem in question, the data set of 96 countries obtained from the World Bank’s data bank is divided into three parts: 48 countries for training, 24 for validation and 24 for testing. Experiments are made by applying the different algorithms to MLPs with one and two hidden layers. The models that give the better performance for each algorithm are chosen by varying the number of units in the hidden layers; 33 experiments are made for each architecture, with CCR as the selection criterion. As a result, the chosen models with one hidden layer are shown in Table 2 and those with two hidden layers in Table 3. The values in parentheses indicate the CCR on the test data.

Table 2 Correct classification ratio of neural network algorithms with one hidden layer
Table 3 Correct classification ratio of neural network algorithms with two hidden layers

In these experiments, the best CCR is 88.54 for the training data and 79.17 for the test data, which means that the RP algorithm gives the best performance. The architectures that achieve this result with the RP algorithm (the numbers of neurons in the first and second hidden layers, respectively) are as follows: 3:4, 4:3, 4:5, 4:6, 4:8, 4:9, 5:1–5:3, 5:5–5:8, 5:11, 6:1, 6:3–6:5, 6:7, 6:9, 6:11, 7:1–7:9, 8:1–8:6, 8:8–9:5, 9:8, 9:9, 9:11–10:6, 10:8–11:4, 11:7, 11:9. Examining the results, the second best performance is obtained by the SCG algorithm.

The ANN applications are implemented using MATLAB 6.0 software.

According to the CCR results, when the whole data set is used for training, ANN gives much better performance than the logit and probit models. The CG algorithms in particular make much better classifications. However, the QN algorithms achieve the best performance, with a CCR of 97.92 for the 5:9:1 (BFGS) and 5:10:1 (OSS) MLP architectures.

As mentioned before, if the network is trained on the whole data set, the memorizing property emerges and causes bad estimates for unseen data. Table 2 shows that, when the data are divided into three parts, the best CCR for one-hidden-layer MLPs is 88.54 on the training data and 79.17 on the test data. For moderate or small amounts of data, the RP algorithm gives the best performance and SCG the second best.

Examining Table 3, for two-hidden-layer MLPs the best CCR is 89.58 for the training data and 83.33 for the test data. These ratios are obtained by the BFGS algorithm with the 5:5:11:1 architecture and by the RP algorithm with the 5:11:5:1 architecture. At this stage, the RP algorithm again makes the better estimates.

6 Conclusions

In this paper, the rescheduling and non-rescheduling of countries’ international debts have been examined using data for 2001. The data are analyzed with logistic and probit regression among the statistical methods and with different backpropagation algorithms for multilayer networks among the ANN. As a result of the classification, the more successful architectures and algorithms are identified.

Different ANN models are compared with the logit and probit models on the classification problem, and the results are evaluated using the CCR criterion.

The CCR results show that, when the whole data set is used for training, ANN gives much better performance than the logit and probit models. The CG algorithms in particular make much better classifications.

If the network is trained on the whole data set, the memorizing property emerges and causes bad estimates for unseen data. For moderate or small amounts of data, the RP algorithm gives the best performance and SCG the second best; at this stage, the RP algorithm again makes the better estimates. This result especially points out the importance of partitioning the data.

The overall results for the problem in question are quite high and are obtained with the RP algorithm. With this method, misclassifications occur only for Cape Verde, Nigeria and Sri Lanka among the 96 countries. The second choice that gives good results for the problem is the SCG algorithm, for which the misclassified countries are Bosnia and Herzegovina, Cape Verde and Sri Lanka.

Consequently, ANN, as an alternative to statistical methods, usually shows better performance in estimation and classification in many empirical studies. ANN also has the great advantage of not requiring the assumptions necessary for statistical methods, as well as a further advantage in handling unseen data and in prediction. However, the choice of a suitable architecture and algorithm is of great importance. Based on the better results obtained with the RP and SCG algorithms for classification problems, these algorithms are suggested for problems of this kind. The QN algorithms, which can also give good results, are worth keeping in mind when choosing a suitable network algorithm.