1 Introduction

Artificial neural networks (ANN) are widely used in discrimination, pattern recognition and estimation problems as an alternative to statistical models [1, 7, 17, 33]. Problems in economics, finance and business occupy an important place among those examined with ANN techniques [1, 16, 27, 29, 30], and the number of papers in economics applying ANN models has grown in recent years. Leung et al. [18] compared the performance of general regression neural networks with other forecasting techniques in forecasting exchange rates. Kim et al. [15] noted the usefulness of neural networks as an early warning system for economic crises, studying the Korean economy as an example of a world economic crisis. Many more examples could be given.

Another macroeconomic problem examined using ANN is the international debt problem of countries. There are also numerous studies using statistical methods on this subject. Frank and Cline [10] used multiple discriminant analysis to predict debt-servicing difficulties. Dhonte [8] used principal components analysis to obtain a description of a country’s debt position. Feder and Just [9] used logit analysis to determine debt-servicing capacity. Yoon-Dae [31] used multiple regression analysis to determine a country’s creditworthiness. Kharas [13] used probit analysis to assess the probability of a country becoming uncreditworthy. Cooper [5] used canonical correlation analysis to examine the relationship between country creditworthiness and a country’s recent economic performance.

Interest in applying ANN to macroeconomic problems has grown since the 1990s. Roy and Cosset [26] developed a neural network to estimate sovereign credit ratings. Burell and Folarin [4] used neural networks to maintain competitiveness in a global economy and surveyed the studies on the global economy using neural networks up to that time. Cooper [6] compared the performance of artificial neural networks with logistic regression, probit regression and discriminant analysis in classifying countries into two classes: those that reschedule their international debts and those that do not. Although the literature contains many studies involving classification methods, the present study differs from them in its rich set of comparisons between ANN trained on the whole data set and ANN trained on partitioned data (training–validation–test). In particular, this paper points out the importance of partitioning the data.

In this paper, the rescheduling and non-rescheduling of countries’ international debts are examined using data for 2001. The data are analyzed with logistic and probit regression among the statistical methods, and with different backpropagation algorithms for multilayer networks among the ANN. As a result of the classification, the more successful architectures and algorithms are identified.

2 Statistical methods for classification problems

Researchers often face classification problems in applied studies. When the problem is discrimination into two groups, logistic regression, probit regression and discriminant analysis are the commonly used statistical methods.

Logistic regression is a mathematical modeling approach that can be used to describe the relationship between several independent variables \( x_{1} ,x_{2} , \ldots ,x_{k} \) and a binary dependent variable y, where y is coded as 1 or 0 for its two possible categories. The logistic model describes the expected value of y, in other words the probability of occurrence of one of the two possible outcomes of y. The general form of the logistic regression model is

$$ y_{i} = E(y_{i} ) + \varepsilon_{i} , $$
(2.1)

where the observations \( y_i \) are independent Bernoulli random variables with expected values

$$ E(y_{i} ) = p_{i} = {\frac{1}{{1 + \exp ( - x_{i}^{T} \beta )}}} $$
(2.2)

and \( x_{i}^{T} = (1,x_{i1} ,x_{i2} , \ldots ,x_{ik} ),\;\beta^{T} = (\beta_{0} ,\beta_{1} ,\beta_{2} , \ldots ,\beta_{k} ) \), and \( \varepsilon_{i} \) is the ith error.

Let \( \eta_{i} = x_{i}^{T} \beta \) be the linear predictor, where \( \eta_{i} \) is defined by the logit transformation of the probability \( p_i \):

$$ \eta_{i} = \ln {\frac{{p_{i} }}{{1 - p_{i} }}}. $$

The parameters \( \beta^{T} = (\beta_{0} ,\beta_{1} ,\beta_{2} , \ldots ,\beta_{k} ) \) of the linear predictor \( \eta_{i} \) are estimated by the method of maximum likelihood. Let \( \hat{\beta } \) be the final estimate of the model parameters. If the model assumptions are correct, the estimated value of the linear predictor is constructed as

$$ \hat{\eta }_{i} = \hat{\beta }_{0} + \hat{\beta }_{1} x_{i1} + \cdots + \hat{\beta }_{k} x_{ik} , $$

where k is the number of the independent variables.
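To make the estimation concrete, the following minimal sketch (an illustration only: it assumes Python with numpy and statsmodels, and uses synthetic data rather than the debt data of this paper) fits a logistic regression by maximum likelihood and recovers the estimated linear predictor \( \hat{\eta}_i \):

```python
import numpy as np
import statsmodels.api as sm

rng = np.random.default_rng(0)

# Synthetic data: k = 2 independent variables, 200 observations.
X = rng.normal(size=(200, 2))
eta_true = 0.5 + 1.0 * X[:, 0] - 2.0 * X[:, 1]   # linear predictor eta_i
p_true = 1.0 / (1.0 + np.exp(-eta_true))         # Eq. (2.2)
y = rng.binomial(1, p_true)                      # independent Bernoulli responses

# Maximum likelihood estimation of beta = (beta_0, beta_1, beta_2).
X_design = sm.add_constant(X)                    # prepend the column of 1s in x_i^T
fit = sm.Logit(y, X_design).fit(disp=False)

eta_hat = X_design @ fit.params                  # estimated linear predictor
p_hat = 1.0 / (1.0 + np.exp(-eta_hat))           # estimated probabilities
print(fit.params)                                # beta_hat
```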

Another mathematical modeling approach is the probit model. Again, a model nonlinear in p is transformed so that a monotonic function of p is linear in the explanatory variables. The probability \( p_i \) in the ith cell, or for the ith observation, is given by the standard cumulative normal distribution function [20, 23]:

$$ F(\eta_{i} ) = p_{i} = \int\limits_{ - \infty }^{{\eta_{i} }} {{\frac{1}{{\sqrt {2\pi } }}}\exp \left( { - \frac{1}{2}u^{2} } \right)} {\text{d}}u. $$
(2.3)

The probit transformation is given by the inverse of the standard cumulative normal distribution function F. Solving (2.3) for \( \eta_{i} \) yields

$$ \eta_{i} = F^{ - 1} (p_{i} ) = {\text{probit}}(p_{i} ). $$

Thus, the probit model can be written as

$$ F^{ - 1} (p_{i} ) = \eta_{i} = \sum\limits_{k = 0}^{K} {\beta_{k} x_{ik} } $$

or

$$ p_{i} = F\left( {\sum\limits_{k = 0}^{K} {\beta_{k} x_{ik} } } \right), $$

where the parameters of the probit model are estimated by the method of maximum likelihood (for grouped data, weighted least squares can also be used).

Several statistical software packages, such as STATISTICA and SAS, can be used to estimate the parameters of these models.
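As a further illustration of such software (assuming Python’s statsmodels rather than the packages named above, and synthetic data), the probit model (2.3) can be fitted as follows:

```python
import numpy as np
import statsmodels.api as sm
from scipy.stats import norm

rng = np.random.default_rng(1)
X = sm.add_constant(rng.normal(size=(200, 2)))   # design matrix with intercept
p = norm.cdf(X @ np.array([0.3, 1.0, -1.5]))     # Eq. (2.3): p_i = F(eta_i)
y = rng.binomial(1, p)

fit = sm.Probit(y, X).fit(disp=False)            # maximum likelihood fit
eta_hat = X @ fit.params                         # probit(p_i) = eta_i
p_hat = norm.cdf(eta_hat)                        # fitted probabilities
print(fit.params)
```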

Since (2.2) and (2.3) give the probability of belonging to one of the two groups, they are commonly used in classification problems. Discriminant analysis is another method used for binary classification problems; however, since its correct classification ratio (CCR) on such macroeconomic problems is not as good as those of logistic regression, probit regression and ANN models, it is not studied in this paper [6].

3 Backpropagation neural network algorithms for classification problems

In this study, multilayer perceptrons (MLP) with one or two hidden layers are considered. In fact, a single hidden layer is usually sufficient to achieve a good approximation [3, 11, 12, 33].

Backpropagation is the most widespread approach for training multilayer feedforward neural networks and is based on the Widrow-Hoff training rule. The main idea is to adjust the weights and biases so as to minimize the sum of squared errors by propagating the error backwards at each epoch. Different backpropagation algorithms are constructed by applying different numerical optimization algorithms from the gradient and Newton method classes to this minimization problem.

In supervised training of a two-layer (single hidden layer) neural network with n inputs, m outputs and p hidden units, the sum of squared errors for the qth training sample is calculated as follows:

$$ E^{q} (w) = \sum\limits_{k = 1}^{m} {[y_{k} - t_{k} ]}^{2} = \sum\limits_{k = 1}^{m} {\left[ {f_{o} \left( {\sum\limits_{j = 1}^{p} {w_{jk}^{o} } \cdot f_{h} \left( {\sum\limits_{i = 1}^{n} {w_{ij}^{h} \cdot x_{i} + b_{j}^{h} } } \right) + b_{k}^{o} } \right) - t_{k} } \right]}^{2} $$
(3.1)

where \( t_k \) is the kth target and \( y_k \) the kth output value; \( w_{ij}^{h} \) is the weight connecting the ith input unit to the jth hidden unit, \( w_{jk}^{o} \) is the weight connecting the jth hidden unit to the kth output unit, \( b_{j}^{h} \) is the bias of the jth hidden unit, \( b_{k}^{o} \) is the bias of the kth output unit, \( f_h(\cdot) \) is the activation function applied to the hidden units, and \( f_o(\cdot) \) is the activation function applied to the output units. Here, w is the vector of all weight and bias components. For notational simplicity, the sample index q is omitted on the right-hand side.

A nonlinear (usually S-shaped sigmoid) function is taken for \( f_h(\cdot) \), while \( f_o(\cdot) \) may also be taken as a linear function. Over all N training samples, the mean squared error is defined as:

$$ E(w)\, = \,\frac{1}{N}\sum\limits_{q = 1}^{N} {E^{q} (w)}. $$
(3.2)

3.1 Standard backpropagation (BP) training algorithm

In this algorithm, the gradient descent numerical optimization method is used to decrease the error [3, 12]. Its iteration is as follows:

$$ w_{k + 1} = w_{k} - \alpha \cdot g_{k} $$

or

$$ \Updelta w_{k + 1} = - \alpha \cdot g_{k} $$

where \( w_k \) is the current vector of weights and biases, \( g_k \) is the gradient of the error function (3.2) at the point \( w_k \), and α is the learning (training) rate. The learning rate is crucial for BP since it determines the magnitude of the weight changes.

The standard backpropagation training algorithm with momentum is given by the improved gradient descent formula:

$$ \Updelta w_{k + 1} = - \alpha \cdot g_{k} + \mu \Updelta w_{k} , $$

where \( \Updelta w_{k} = w_{k} - w_{k - 1} \) is the weight change of the previous iteration and μ is the momentum coefficient. In backpropagation with momentum, the weight change is thus a combination of the current gradient and the previous weight change. This improves the behavior of the algorithm, since it provides a chance of escaping local minima of the error surface. A good choice of the constants α and μ speeds up the convergence of the algorithm [2, 24, 32].
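A minimal numpy sketch of this update rule (the toy quadratic error and all names here are illustrative assumptions; in practice the gradient of (3.2) would be computed by backpropagation):

```python
import numpy as np

def gd_momentum_step(w, grad, prev_dw, alpha=0.1, mu=0.9):
    """One gradient descent step with momentum: dw = -alpha * g + mu * dw_prev."""
    dw = -alpha * grad + mu * prev_dw
    return w + dw, dw

# Toy quadratic error E(w) = ||w||^2 / 2, whose gradient is g = w.
w = np.array([2.0, -3.0])
dw = np.zeros_like(w)
for _ in range(100):
    w, dw = gd_momentum_step(w, grad=w, prev_dw=dw)
print(w)  # approaches the minimum at the origin
```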

3.2 Resilient backpropagation (RP) algorithm

MLPs typically use S-shaped sigmoid activation functions in the hidden layers, whose slopes approach zero as the input gets large; the gradient can therefore have a very small magnitude even far from a minimum. The purpose of the resilient backpropagation (RP) training algorithm is to eliminate these harmful effects of the magnitudes of the partial derivatives. In the RP algorithm, the magnitude of the derivative has no effect on the weight update; only the sign of the derivative is used to determine the direction of the update. If the derivative is zero, the update value remains the same. Whenever the weights oscillate, the weight change is reduced; if the weight continues to change in the same direction for several iterations, the magnitude of the weight change is increased [25].
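The core sign-based update can be sketched as follows (a simplified per-weight sketch; the increase/decrease factors 1.2 and 0.5 and the step bounds are the conventional choices of [25], stated here as assumptions, and variants of RP differ in how they treat a sign change):

```python
import numpy as np

def rprop_step(w, grad, prev_grad, step, eta_plus=1.2, eta_minus=0.5,
               step_min=1e-6, step_max=50.0):
    """One resilient backpropagation update: only the sign of the gradient is used."""
    sign_change = grad * prev_grad
    # Same sign for several iterations: grow the step; sign flip (oscillation): shrink it.
    step = np.where(sign_change > 0, np.minimum(step * eta_plus, step_max), step)
    step = np.where(sign_change < 0, np.maximum(step * eta_minus, step_min), step)
    w = w - np.sign(grad) * step      # the magnitude of the derivative has no effect
    return w, step
```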

Versions of the backpropagation training algorithm based on conjugate gradient and Newton methods, which are second-order optimization techniques, are more efficient in classification and forecasting problems [3, 12].

3.3 Conjugate gradient (CG) algorithms

In CG algorithms, the search is made along conjugate directions. A set of nonzero n-dimensional vectors \( \{ p_{0} ,p_{1} , \ldots ,p_{n - 1} \} \) is said to be conjugate with respect to a symmetric positive definite n × n matrix A if

$$ p_{i}^{T} Ap_{j} = 0\quad {\text{for}}\;{\text{all}}\;i \ne j $$

First, consider the minimization problem for the quadratic function:

$$ F(w) = \frac{1}{2}w^{T} Hw\, - \,b^{T} w $$
(3.3)

where \( w \in R^{n} \) and H is an n × n symmetric positive definite matrix. For a starting point \( w_{0} \in R^{n} \), the method defined by the formulas below is called a conjugate direction method [21]:

$$ w_{k + 1} = w_{k} + \alpha_{k} p_{k} $$
(3.4)
$$ \alpha_{k} = \, - {\frac{{p_{k}^{T} g_{k} }}{{p_{k}^{T} Hp_{k} }}} $$
(3.5)

Here, (3.5) is obtained by minimizing the one-variable function \( \varphi (\alpha ) = F(w_{k} + \alpha p_{k} ) \).

For any starting point \( w_{0} \in R^{n} \), the sequence \( \{ w_{k} \} \) generated by the conjugate direction algorithm (3.4), (3.5) converges to the minimum point of problem (3.3) in at most n steps [21].

The conjugate gradient method is a conjugate direction method: it starts out by searching in the steepest descent direction on the first iteration,

$$ p_{0} = - g_{0} $$
(3.6)

and the current conjugate vector \( p_k \) is then calculated from the previous vector \( p_{k-1} \) and the current gradient:

$$ p_{k} = - g_{k} + \beta_{k} p_{k - 1} $$
(3.7)

The coefficient \( \beta_k \) in (3.7) is selected so that the vectors \( p_{k-1} \) and \( p_k \) are conjugate with respect to the symmetric positive definite n × n matrix H, i.e. \( p_{k - 1}^{T} Hp_{k} = 0 \):

$$ \beta_{k} = {\frac{{g_{k}^{T} Hp_{k - 1} }}{{p_{k - 1}^{T} Hp_{k - 1} }}} $$
(3.8)

There are various CG algorithms, depending on the formula used to calculate the coefficients \( \beta_k \). Some of the most important are as follows [3, 12, 21].

3.3.1 Fletcher-Reeves conjugate gradient algorithm (CGF)

For this algorithm β k is defined by

$$ \beta_{k} = {\frac{{g_{k}^{T} g_{k} }}{{g_{k - 1}^{T} g_{k - 1} }}} $$

This is the ratio of the norm squared of the current gradient to the norm squared of the previous gradient.

3.3.2 Polak–Ribière conjugate gradient algorithm (CGP)

Another version of the CG algorithm was proposed by Polak and Ribière (CGP), for which \( \beta_k \) is defined by

$$ \beta_{k} = {\frac{{g_{k}^{T} (g_{k} - g_{k - 1} )}}{{g_{k - 1}^{T} g_{k - 1} }}} $$
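Both choices of \( \beta_k \) fit the same direction-update skeleton (3.6)–(3.7), sketched below in numpy (illustrative only; a line search for \( \alpha_k \) would complete the algorithm):

```python
import numpy as np

def beta_fletcher_reeves(g, g_prev):
    return (g @ g) / (g_prev @ g_prev)             # CGF

def beta_polak_ribiere(g, g_prev):
    return (g @ (g - g_prev)) / (g_prev @ g_prev)  # CGP

def cg_direction(g, g_prev=None, p_prev=None, beta=beta_polak_ribiere):
    """Eqs. (3.6)-(3.7): steepest descent first, then conjugate directions."""
    if p_prev is None:
        return -g                                  # p_0 = -g_0
    return -g + beta(g, g_prev) * p_prev           # p_k = -g_k + beta_k p_{k-1}
```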

Now consider the error function (3.2). Since the activation functions are nonlinear, the mean squared error E(w) is a nonlinear function of w. Using the Taylor series expansion of E(w) around \( w_k \) up to second order, we can approximate E(w) by a quadratic function:

$$ E(w_{k + 1} ) - E(w_{k} )\, \approx \,g_{k}^{T} \Updelta w_{k} \, + \,\frac{1}{2}\Updelta w_{k}^{T} H_{k} \Updelta w_{k} $$
(3.9)

where \( H_k \) is the Hessian matrix at the current point \( w_k \):

$$ H_{k} = {\frac{{\partial^{2} E(w_{k} )}}{{\partial w^{2} }}} $$

Thus, at each iteration the quadratic function (3.9) is minimized instead of (3.3), and the conjugate gradient steps (3.4)–(3.8) are applied locally. Here the Hessian matrix \( H_k \) is assumed to be positive definite.

3.3.3 Scaled conjugate gradients (SCG) algorithm

The CG algorithms reviewed above perform a line search at each iteration, which makes the calculations expensive.

Moller [19] put forward the scaled conjugate gradient algorithm, which avoids the line search procedure of traditional CG algorithms. The basic idea of SCG is to combine the model trust region approach with the conjugate gradient approach [3, 21].

Instead of a line minimization, which requires several error function evaluations, Moller proposed evaluating the product \( Hp_k \) with O(W) operations (W is the total number of weights and biases). This approach can fail when the Hessian matrix H of the local approximation is not positive definite: in that case the denominator \( p_{k}^{T} Hp_{k} \) of (3.5) becomes negative, so the weight update can increase the value of the error function. The problem can be overcome by modifying the Hessian matrix to ensure that it is positive definite, for example

$$ H_{\bmod } = H + \lambda I $$

where I is the identity matrix and λ ≥ 0 is a scaling coefficient. The Hessian matrix can be made positive definite by choosing a sufficiently large positive value of λ. The step length formula (3.5) then becomes:

$$ \alpha_{k} = - {\frac{{p_{k}^{T} g_{k} }}{{p_{k}^{T} Hp_{k} + \lambda_{k} \left\| {p_{k} } \right\|^{2} }}} $$
(3.10)

As \( \lambda_k \) increases, \( \alpha_k \) decreases. The parameter \( \lambda_k \) thus adjusts the size of the trust region: as \( \lambda_k \) increases, the trust region shrinks. The choice of an appropriate \( \lambda_k \) is important at every step of the SCG algorithm.

Denoting the denominator of (3.10) by \( \delta_{k} \), whenever \( \delta_{k} \le 0 \) Moller [19] raises the scaling coefficient to

$$ \lambda_{k}^{*} = 2\left( {\lambda_{k} - {\frac{{\delta_{k} }}{{\left\| {p_{k} } \right\|^{2} }}}} \right). $$

With this choice the modified denominator \( \delta_{k}^{*} = \delta_{k} + (\lambda_{k}^{*} - \lambda_{k} )\left\| {p_{k} } \right\|^{2} = - p_{k}^{T} H_{k} p_{k} \) is guaranteed to be positive, and it is used in place of the original denominator when calculating the step length \( \alpha_k \) in (3.10).
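Putting the pieces together, one step-length calculation with this safeguard can be sketched as follows (a sketch of the scheme exactly as presented above, with Hp standing for the product \( H_{k} p_{k} \), however it is obtained):

```python
import numpy as np

def scg_step_length(p, g, Hp, lam):
    """Step length (3.10) with the positive-definiteness safeguard of [19]."""
    norm2 = p @ p
    delta = p @ Hp + lam * norm2          # denominator of (3.10)
    if delta <= 0:                        # local Hessian not positive definite
        lam = 2.0 * (lam - delta / norm2) # lambda_k^*
        delta = -(p @ Hp)                 # delta_k^* = -p^T H p > 0
    alpha = -(p @ g) / delta
    return alpha, lam
```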

3.4 Quasi-Newton (QN) algorithms

The basic step of Newton’s method is

$$ w_{k + 1} \, = \,w_{k} - H_{k}^{ - 1} g_{k} $$
(3.11)

where \( H_{k} \) is the Hessian matrix at the current point \( w_k \). Newton’s method often converges faster than conjugate gradient methods. Unfortunately, computing the Hessian matrix is complex and expensive. Quasi-Newton (or secant) methods are based on Newton’s method but do not require the calculation of second derivatives: they update an approximation to the inverse Hessian matrix at each iteration, with the update computed as a function of the gradient.

3.4.1 The Broyden, Fletcher, Goldfarb and Shanno (BFGS) procedure

This is one of the most successful QN algorithms. In the BFGS algorithm, the approximation G to the inverse Hessian matrix in (3.11) is constructed as

$$ G_{k + 1} = G_{k} + {\frac{{pp^{T} }}{{p^{T} v}}} - {\frac{{G_{k} vv^{T} G_{k} }}{{v^{T} G_{k} v}}} + (v^{T} G_{k} v)uu^{T} , $$

where \( p = w_{k + 1} - w_{k} ,\;v = g_{k + 1} - g_{k} ,\;u = {\frac{p}{{p^{T} v}}} - {\frac{{G_{k} v}}{{v^{T} G_{k} v}}}, \)

G is a positive definite matrix and \( -G_{k} g_{k} \) is the descent direction at each step of the algorithm. The weight update is made as follows:

$$ w_{k + 1} = w_{k} - \alpha_{k} G_{k} g_{k} , $$

where \( \alpha_k \) is found by line minimization.
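The update formula translates directly into numpy (a sketch; in practice safeguards against tiny denominators are added):

```python
import numpy as np

def bfgs_update(G, w_new, w_old, g_new, g_old):
    """BFGS update of the inverse-Hessian approximation G (formula above)."""
    p = w_new - w_old                 # step
    v = g_new - g_old                 # gradient change
    Gv = G @ v
    pv, vGv = p @ v, v @ Gv
    u = p / pv - Gv / vGv
    return (G + np.outer(p, p) / pv
              - np.outer(Gv, Gv) / vGv
              + vGv * np.outer(u, u))
```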

3.4.2 The one-step secant (OSS) algorithm

This algorithm is an attempt to bridge the gap between the CG algorithms and the quasi-Newton algorithms. It assumes that the previous Hessian approximation is the identity matrix, so no matrix inversion is needed when choosing the new search direction. It requires little storage and computation per step: slightly more than the CG algorithms, but considerably less than the BFGS algorithm.

4 Definition and importance of rescheduling the international debts

The international debt crises that occur in the wake of financial crises have riveted attention on political issues at the international level. If global economic justice is to be achieved, debt crises must be assessed within the broader context of the international financial system. This system has led to instability and recurrent financial crises that have severely harmed the interests of poor countries and their people [22].

The financial crises experienced around the world badly affect national economies; debt crises follow economic crises, and eventually the debt problem dominates all policy decisions.

Since late 1982, debt crisis has been almost universal as a result of developing countries’ failure to service their debts. Governments and multilateral agencies then stepped in, but this official activity after August 1982 should not beguile one into thinking that the major governments were absent from the game earlier [28]. The definition of the crisis in the Northern industrial countries was uniform: the onset of widespread difficulties in servicing the mountain of developing-country debt threatened the stability of the international financial system [14].

At the end of the 1990s came the financial turmoil in Asia that erupted in mid-1997. It abated after January 1998, and markets partially recovered from their troughs. Nevertheless, currency and asset values in the countries involved were left far below precrisis levels, and considerable uncertainty remained about the resolution of the crisis and its global repercussions. The crisis and its effects were already apparent in December 1997; thus the most recent world economic crisis is the Asian economic crisis of 1998.

Since international debt crises affect all economic and political balances of countries, this study considers countries’ rescheduling and non-rescheduling of their international debts using 2001 data for 96 countries, after the Asian economic crisis of 1998. The effects of that crisis emerged a few years later; accordingly, the data on the 96 countries’ rescheduling and non-rescheduling of international debts refer to 2001.

The data are taken from The World Bank Group web site and The World Bank e-library. The decision as to which countries belong to the rescheduled class is made by observing the tables of Stand-By Arrangements published in the IMF Survey over 1997–2001. Hence, if a country made a Stand-By Arrangement with the IMF, it is treated as rescheduling its debt and the corresponding dependent variable takes the value 1; otherwise the country is treated as non-rescheduling and the dependent variable takes the value 0.

The factors that affect the rescheduling and non-rescheduling of international debts are defined as follows; these factors are taken as independent variables for the logistic and probit regressions and as inputs for the ANN:

  • \( X_1 \): Average increase in GDP growth (annual %) over 1960–2001

  • \( X_2 \): The ratio of short-term debt (% GDP) to exports of goods and services (% GDP)

  • \( X_3 \): Debt service ratio: the interest on total external debt plus amortization on long-term debt as a percentage of exports of goods and services in 2001

  • \( X_4 \): Import cover, equal to international reserves divided by total imports in 2001 (US$ million)

  • \( X_5 \): The ratio of exports of goods and services (% GDP) to imports of goods and services (% GDP).

Since all of these variables are taken as inputs, the MLP model has five input variables. The output variable is 1 (rescheduled debts) if the estimated probability is greater than or equal to 0.5, and 0 (non-rescheduled debts) if it is less than 0.5. The output layer of the network therefore contains a single neuron.
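A minimal sketch of this 5:p:1 forward pass and threshold rule (the weights below are random placeholders; a trained network would supply them, and standardized inputs are assumed):

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def mlp_5p1_predict(X, Wh, bh, Wo, bo):
    """Forward pass of a 5:p:1 MLP in the notation of Eq. (3.1); threshold at 0.5."""
    hidden = sigmoid(X @ Wh + bh)            # 5 inputs -> p hidden units
    prob = sigmoid(hidden @ Wo + bo)         # p hidden units -> 1 output neuron
    return (prob >= 0.5).astype(int), prob   # 1 = reschedule, 0 = non-reschedule

p = 8                                        # hidden units (example value)
rng = np.random.default_rng(2)
X = rng.normal(size=(96, 5))                 # 96 countries, inputs X_1..X_5
Wh, bh = rng.normal(size=(5, p)), np.zeros(p)
Wo, bo = rng.normal(size=(p, 1)), np.zeros(1)
labels, prob = mlp_5p1_predict(X, Wh, bh, Wo, bo)
```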

5 Estimation of the results and comparison of different methods

In this study, architectures with two hidden layers were tried first; since they did not contribute much, they are not included here. MLPs with one hidden layer are seen to give a high CCR, so MLPs of the form 5:p:1 are considered. The remaining free parameter is the number of units p in the hidden layer. Since the initial weights and biases are random, 33 experiments are made for each network and algorithm with values of p between 1 and 15; as a result, networks with p ranging between 6 and 10 are preferred. The CCR of the different training algorithms for these networks are given in Table 1.

Table 1 Correct classification ratio for neural networks and for statistical methods

According to Table 1, the best CCR is obtained by the BFGS and OSS algorithms (97.92). The CCR of the logit (85.42) and probit (86.46) models are considerably lower than the NN results.

Even if the training results of a neural network on the whole data set are good, its predictions for new data may be bad. In this case the network becomes a memorizer, and even small changes in the data can cause big mistakes. To avoid this drawback, the data are divided into three parts: training, validation and test. At the training stage (adjusting the weights) only the training data are used, but the performance of the network is checked at each epoch with the help of the validation data. Training stops automatically when the total squared error on the validation data starts to increase, and the network parameters at that point are retained. At this stage, the test data are treated as unseen data. The network with the minimum total squared error on each of the training, validation and test data is accepted as the one giving the best performance.
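This partitioning and early-stopping procedure can be sketched as follows (`net`, `train_epoch` and `sse` are hypothetical placeholders for a network object, one backpropagation epoch, and the total squared error; this is a sketch of the protocol, not the software used in the paper):

```python
import numpy as np

def early_stopping_train(net, X, y, train_epoch, sse, max_epochs=1000, seed=0):
    """Train/validation/test protocol described above (sketch with placeholder helpers)."""
    idx = np.random.default_rng(seed).permutation(len(y))
    tr, va, te = np.split(idx, [48, 72])        # 48 / 24 / 24 countries
    best_err, best_params = np.inf, net.get_params()
    for _ in range(max_epochs):
        train_epoch(net, X[tr], y[tr])          # weights adjusted on training data only
        val_err = sse(net, X[va], y[va])
        if val_err >= best_err:                 # validation error starts to increase
            break
        best_err, best_params = val_err, net.get_params()
    net.set_params(best_params)                 # keep the best parameters found
    return sse(net, X[te], y[te])               # test data treated as unseen data
```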

For the problem in question, the data set of 96 countries obtained from the World Bank’s data bank is divided into three parts: 48 countries for training, 24 for validation and 24 for testing. Experiments are made by applying the different algorithms to MLPs with one and two hidden layers. The models that give the better performance for each algorithm are chosen by varying the number of units in the hidden layers; 33 experiments are made for each architecture, with CCR as the selection criterion. As a result, the chosen models with one hidden layer are shown in Table 2 and those with two hidden layers in Table 3. The values in parentheses indicate the CCR on the test data.

Table 2 Correct classification ratio of neural network algorithms with one hidden layer
Table 3 Correct classification ratio of neural network algorithms with two hidden layers

In these experiments, the best CCR is 88.54 for the training data and 79.17 for the test data, which means that the RP algorithm gives the best performance. The architectures that achieve this result with the RP algorithm (the numbers of neurons in the first and second hidden layers, respectively) are as follows: 3:4, 4:3, 4:5, 4:6, 4:8, 4:9, 5:1–5:3, 5:5–5:8, 5:11, 6:1, 6:3–6:5, 6:7, 6:9, 6:11, 7:1–7:9, 8:1–8:6, 8:8–9:5, 9:8, 9:9, 9:11–10:6, 10:8–11:4, 11:7, 11:9. Examining the results, the second best performance is obtained by the SCG algorithm.

The ANN applications are implemented using MATLAB 6.0 software.

According to the CCR results, when the whole data set is used for training, ANN gives much better performance than the logit and probit models. The CG algorithms in particular make much better classifications. However, the QN algorithms achieve the best performance, with a CCR of 97.92 for the 5:9:1 (BFGS) and 5:10:1 (OSS) MLP architectures.

As mentioned before, if the network is trained on the whole data set, the memorizing property emerges and causes bad estimates for unseen data. Table 2 shows that, when the data are divided into three parts, the best CCR for one-hidden-layer MLPs is 88.54 on the training data and 79.17 on the test data. For moderate or small amounts of data, the RP algorithm gives the best performance and SCG the second best.

Examining Table 3, for two-hidden-layer MLPs the best CCR is 89.58 for the training data and 83.33 for the test data. These ratios are obtained by the BFGS algorithm with the 5:5:11:1 architecture and by the RP algorithm with the 5:11:5:1 architecture. At this stage, the RP algorithm again makes the better estimates.

6 Conclusions

In this paper, the rescheduling and non-rescheduling of countries’ international debts have been examined using data for 2001. The data are analyzed with logistic and probit regression among the statistical methods and with different backpropagation algorithms for multilayer networks among the ANN. As a result of the classification, the more successful architectures and algorithms are identified.

Different ANN models are compared with the logit and probit models on the classification problem, and the results are evaluated using the CCR criterion.

The CCR results show that, when the whole data set is used for training, ANN gives much better performance than the logit and probit models. The CG algorithms in particular make much better classifications.

If the network is trained on the whole data set, the memorizing property emerges and causes bad estimates for unseen data. For moderate or small amounts of data, the RP algorithm gives the best performance and SCG the second best; at this stage, the RP algorithm again makes the better estimates. This result especially points out the importance of partitioning the data.

The overall results for the problem in question are quite high and are obtained with the RP algorithm. With this method, misclassifications occur only for Cape Verde, Nigeria and Sri Lanka among the 96 countries. The second choice that gives good results for the problem is the SCG algorithm, for which the misclassified countries are Bosnia and Herzegovina, Cape Verde and Sri Lanka.

Consequently, ANN, as an alternative to statistical methods, usually shows better performance in estimation and classification in many empirical studies. ANN also has the great advantage of not requiring the assumptions necessary for statistical methods, as well as a further advantage in handling unseen data and in prediction. However, the choice of a suitable architecture and algorithm is of great importance. Based on the better results obtained with the RP and SCG algorithms for classification problems, these algorithms are suggested for problems of this kind. The QN algorithms, which can also give good results, are worth keeping in mind when choosing a suitable network algorithm.