On the distribution of links in financial networks: structural heterogeneity and functional form

Lux, Thomas

doi:10.1007/s00181-018-1569-6

Published: 19 October 2018

Volume 58, pages 1019–1053, (2020)
Empirical Economics

A Correction to this article was published on 08 October 2019

Abstract

We investigate the distribution of links in three large data sets: one covering interbank loans in the electronic trading platform e-MID, and the other two covering a large part of the loans of banks to non-financial companies in the Spanish and Japanese economies, respectively. In contrast to all the previous literature, we do not assume homogeneity of the link distribution over time and across different categories of agents (banks, firms) but apply our hypothesized distributions as regression models. Many of the tested sources of heterogeneity turn out to be significant regressors. For instance, we find pervasive time heterogeneity of link formation in all three data sets, heterogeneity across the categories of banks/firms that can be identified in the data, and some explanatory power of balance sheet statistics in the case of the Japanese data set. Across all networks, the Negative Binomial model almost always outperforms all alternative models, confirming its good performance as a model of economic count data in many previous applications.

1 Introduction

Different major classes of networks are routinely associated with their implied distributions of degrees, i.e., the number of links of nodes they generate. The major prototypes are Erdös–Renyi and scale-free networks. The former are random networks that are characterized by a constant probability of existence of a link, which leads to a Binomial distribution of links that converges toward the Poisson distribution for large networks. Scale-free networks mark the opposite end of the spectrum in that they generate a very broad distribution of links via some kind of amplification mechanism (like preferential attachment of new nodes to those that already possess a large number of connections). As a result, the degree distribution emerging from such a generating mechanism is of a very heterogeneous nature, and its scale-free behavior corresponds to a power-law decay of the distribution of links over its entire range or at least in the upper tail region. Almost all of the related literature focuses on these two possibilities. However, the Poisson and power-law distributions certainly do not constitute an exhaustive list of candidate distributions for the number of links in a network setting. Indeed, classes of networks exist which focus on properties other than the degree distribution and for which no general results for the distribution of links are available. Examples are ‘small-world’ networks, which are defined by a small average distance between nodes (Watts and Strogatz 1998), or ‘core-periphery’ networks, which are defined by a dichotomic classification of nodes into a core group and its periphery (Borgatti and Everett 1999). Both of these classes might contain members that also share the property of an (asymptotically) power-law-like distribution of links, as well as members that do not. To what extent these different categorizations overlap or exclude each other seems to be completely unknown and has not received any attention so far. Nevertheless, the existence of such alternative categorizations of classes of networks and their pertinent generating mechanisms makes it likely that for some empirical networks, distributions other than the Poisson and scale-free could better describe the data.

This should also apply to financial networks, for which the asserted scale-free behavior has already been disputed in certain cases (cf. Fricke and Lux 2015). Due to the dominance of the Erdös–Renyi and scale-free paradigms, theoretical modeling has typically made use only of these two classes of models (Nier et al. 2008; Haldane and May 2011; Anand et al. 2013; Krause and Giansante 2013). When generating the link structure of a theoretical model in this way, any inference on the stability of the network and its susceptibility to contagion effects after shocks would be determined to a large extent by the (known) properties of the pertinent class of models. Hence, the extent of contagious cascade effects might be underestimated or overestimated because of deviations of these theoretical benchmarks from the empirical structure. It thus appears worthwhile to expand the range of candidate distributions and generating mechanisms beyond these classical ones, as it appears likely that the distribution of links is often located somewhere between these extremes. A better and hopefully robust characterization of the degree distribution should, therefore, be valuable input to inform the mushrooming literature on network contagion studies of the banking sector.

Continuing the line of research initiated by Fricke and Lux (2015), this paper will look at some intermediate distributions from the large class of compound Poisson distributions (Karlis and Xekalaki 2005) that have been found appropriate for modeling discrete events in various fields but have seemingly not been applied to the discrete variables defined by the counts of the number of links within a network so far. We will focus here on the Poisson–Gamma and Poisson–Pareto distributions along with the original Poisson and discrete Pareto (aka power law or scale-free) distribution and will compare the performance of these four alternatives for three important data sets: one covering interbank credit connections and the other two capturing the network structure of bank-firm loans. As another novel feature within the financial network literature, we will also apply most of the mentioned distributions within a regression framework. In this way, we can identify the influence of certain characteristics of the nodes on their propensity to form links.

We will estimate these models for three large data sets of financial linkages due to loan contracts: interbank loans contracted via the electronic trading platform e-MID, and loans of financial institutions to non-financial firms in the Spanish and Japanese economies. All data sets are available over at least one decade. The e-MID data contain daily recordings of all interbank loans, while the other two data sets have yearly granularity. As it will turn out, heterogeneity is pervasive in all three data sets along various dimensions: There is both a change over time of the shape of the estimated distributions as well as a highly significant influence of whether banks/firms belong to some basic classes of agents that can be distinguished in the data. For the Japanese data set, we can also identify an influence of certain balance sheet statistics on the degrees of banks (for the other data sets, such covariates are not available). These exogenous effects are mostly very robust as they appear in a qualitatively similar way in all distributions under consideration. Irrespective of inclusion of exogenous effects or not, in almost all cases, the Negative Binomial exhibits the best fit and dominates all alternatives at any standard level of significance.

The rest of the paper is structured as follows: Sect. 2 introduces the various distributions under investigation and their use as regression models. Section 3 describes our data, and Sect. 4 provides the empirical results. Section 5 concludes.

2 Statistical models

Since degree distributions are by definition distributions of discrete variables, the present paper confines itself to comparing the performance of discrete distributions. The simplest benchmark is the Poisson distribution given by:

$$\begin{aligned} P(x)=\frac{e^{-\lambda }\lambda ^x}{x!}, \end{aligned}$$

(1)

where x is the number of links (the degree of a node) and $\lambda $ the unique parameter of this distribution function. We note that empirical degree distributions are typically truncated at zero, simply because pertinent data are only collected for entities that are at least minimally connected to the network under investigation. Hence, in such applications we would have to use the truncated Poisson distribution which is given by:

$$\begin{aligned} P_{T}(x)=\frac{e^{-\lambda }\lambda ^x}{x!(1-e^{-\lambda })}, \end{aligned}$$

(2)

where the additional term in the denominator adjusts for the ‘missing’ zero of the empirical data (note that $P(0)=e^{-\lambda }$ in the original Poisson distribution).

The Poisson distribution approximates the exact Binomial distribution of degrees in Erdös–Renyi networks with a high degree of accuracy if the networks are not too small. Since all our applications would be based on at least three-digit numbers of nodes, the Poisson estimates should be virtually identical to estimates for a Binomial distribution.
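As an illustration of Eqs. (1) and (2), the following minimal Python sketch evaluates both mass functions and fits the zero-truncated Poisson by maximum likelihood. It uses the standard first-order condition of this model, $\bar{x}=\lambda /(1-e^{-\lambda })$, solved here by bisection. The function names and the choice of bisection are illustrative assumptions and do not describe the estimation code used for the results reported below.

```python
import math

def poisson_pmf(x, lam):
    """Poisson mass function of Eq. (1), evaluated in logs for stability."""
    return math.exp(-lam + x * math.log(lam) - math.lgamma(x + 1))

def truncated_poisson_pmf(x, lam):
    """Zero-truncated Poisson of Eq. (2): rescale by 1 - P(0) = 1 - exp(-lam)."""
    if x < 1:
        return 0.0
    return poisson_pmf(x, lam) / (-math.expm1(-lam))

def fit_truncated_poisson(degrees, iters=200):
    """ML estimate of lam: solve mean = lam / (1 - exp(-lam)) by bisection."""
    mean = sum(degrees) / len(degrees)
    if mean <= 1.0:
        raise ValueError("sample mean must exceed 1 for an interior solution")
    lo, hi = 1e-9, mean  # the root lies below the sample mean
    for _ in range(iters):
        mid = 0.5 * (lo + hi)
        if mid / (-math.expm1(-mid)) < mean:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)
```

Calling `fit_truncated_poisson` on a list of observed positive degrees returns the estimate of $\lambda $; the truncated mean implied by that estimate reproduces the sample mean.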

The power law characterizing scale-free networks is usually described and estimated in its continuous version, i.e., $p(x)\sim x^{-\alpha }$. However, this of course neglects the discrete nature of the data. The discrete counterpart of the continuous Pareto distribution is also known as the Zipf or Zeta distribution, and it is given by the probability mass function:

$$\begin{aligned} P_{\alpha }(x)=\frac{x^{-\alpha }}{\zeta (\alpha )}, \end{aligned}$$

(3)

where $\zeta (\alpha )$ is the zeta function $\zeta (s)=\sum _{n=1}^\infty \frac{1}{n^s}$. No adjustment for the lack of zeros is needed in this case as the support of the discrete Pareto covers only positive integers.

Besides the elementary Poisson and the discrete Pareto, the most frequently encountered classes of discrete distribution functions are compound Poisson distributions. Two of these are used in this paper: The first is the Negative Binomial (NBD) which results if the parameter $\lambda $ of the original Poisson distribution (1) is drawn from a Gamma distribution. Note that this amounts to drawing the realizations from a family of Poisson distributions with heterogeneous mean values and hence can be seen as a reflection of heterogeneity of the statistical features of the nodes in a network. We adopt here the following functional form of the Negative Binomial:

$$\begin{aligned} N(x)=\frac{\varGamma (\theta + x) \tau ^\theta (1-\tau )^x}{\varGamma (1+x) \varGamma (\theta )} \quad {\hbox {with}} \quad \tau = \frac{\theta }{\theta + \lambda } \end{aligned}$$

(4)

with $\varGamma (.)$ the gamma function, $\varGamma (n)=(n-1)!$, and $\theta $ and $\lambda $ the two parameters for the shape of the distribution. Alternative functional forms can be found in Greene (2008). The one of Eq. (4) is preferred in the present context as it can be easily related to the Poisson distribution, since the mean value is in both cases identical to the pertinent parameter $\lambda $ and the Negative Binomial converges to the Poisson for $\theta \rightarrow \infty $. The Negative Binomial has become hugely popular in many applications featuring discrete data as it is able to capture the widespread phenomenon of overdispersion, i.e., the variance exceeding the mean. Namely, while it is well known that the variance of the Poisson distribution is $Var_{P}(x)=\lambda $, for the Negative Binomial we obtain $Var_{N}(x)=\lambda (1+\frac{\lambda }{\theta })>\lambda $. For applications without zero counts, we also need to adjust the Negative Binomial in an appropriate way to obtain its truncated version:

$$\begin{aligned} N_{T}(x)=\frac{\varGamma (\theta + x) \tau ^\theta (1-\tau )^x}{\varGamma (1+x) \varGamma (\theta ) (1-\tau ^\theta )}. \end{aligned}$$

(5)
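The properties stated for the Negative Binomial can be checked numerically. The sketch below, in plain Python, implements Eq. (4) and takes the zero-truncated version to be $N(x)/(1-N(0))$ with $N(0)=\tau ^\theta $. It then computes the mean and variance for one hypothetical parameter pair. The function names and parameter values are illustrative assumptions.

```python
import math

def nb_pmf(x, lam, theta):
    """Negative Binomial of Eq. (4) with tau = theta / (theta + lam)."""
    tau = theta / (theta + lam)
    log_p = (math.lgamma(theta + x) - math.lgamma(1 + x) - math.lgamma(theta)
             + theta * math.log(tau) + x * math.log(1.0 - tau))
    return math.exp(log_p)

def nb_truncated_pmf(x, lam, theta):
    """Zero-truncated version: divide by 1 - N(0), where N(0) = tau**theta."""
    if x < 1:
        return 0.0
    tau = theta / (theta + lam)
    return nb_pmf(x, lam, theta) / (1.0 - tau ** theta)

# Hypothetical parameter values, chosen only for illustration.
lam, theta = 4.0, 1.5
support = range(0, 2000)  # the mass beyond this range is negligible here
mean = sum(x * nb_pmf(x, lam, theta) for x in support)
var = sum((x - mean) ** 2 * nb_pmf(x, lam, theta) for x in support)
overdispersed = var > mean  # variance lam * (1 + lam / theta) exceeds lam
```

With these values, the computed mean agrees with $\lambda $ and the variance with $\lambda (1+\lambda /\theta )$ up to numerical error, and for very large $\theta $ the probabilities approach those of the Poisson distribution.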

The Negative Binomial enjoys an almost legendary reputation in marketing as the most versatile tool for fitting purchase frequencies of consumer goods. This literature has been initiated by Ehrenberg (1959) and surveyed by Schmittlein et al. (1985).

The last candidate to be considered in this paper is the Pareto–Poisson mixture. This compound model has been studied before in the actuarial literature (cf. Albrecht 1984) and has been proposed by Lux (2016) as a model for the degree distribution of credit networks. The justification for this functional form was the plausible observation that the number of credit links of both banks and non-financial firms is increasing with their balance sheet size (de Masi and Gallegati 2012; de Masi et al. 2011). Taking the size of the underlying entity as a latent variable in a compound Poisson model and taking into account that firm size distributions are close to a power or Pareto law leads to a formalization in which the shape parameter of the Poisson distribution is drawn from a Pareto law:

$$\begin{aligned} PP(x)=\int _{{\underline{\lambda }}}^\infty \frac{e^{-\lambda } \lambda ^x}{x!} \alpha \frac{{\underline{\lambda }}^\alpha }{\lambda ^{\alpha + 1}} \mathrm{d}\lambda \end{aligned}$$

(6)

which defines a family of distributions with two parameters, $\alpha $ and ${\underline{\lambda }}$. A closed-form solution for the integral in Eq. (6) is not available, so that the probability mass function can only be evaluated via numerical integration. In Eq. (6), $\alpha $ is the usual shape parameter of the Pareto distribution (note that since the latent variable ‘firm size’ is a continuous variable, we can adopt here the standard Pareto law), and ${\underline{\lambda }} > 0$ is a lower boundary for the latent variable which is necessary to guarantee convergence of the integral. Again, we need the zero-truncated counterpart of Eq. (6), which formally we obtain by setting:

$$\begin{aligned} PP_T (x)=\frac{PP(x)}{1-PP(0)} \end{aligned}$$

(7)

which again is obtained by numerical integration.

It is worthwhile to add that most applications (e.g., in marketing) use the Poisson and Negative Binomial as regression models (cf. Hilbe 2007), i.e., apply them to model the dependency of variables obeying such distributions on exogenous variables. While network data have, to the best of my knowledge, only been described via unconditional distributions so far, a regression perspective would be most informative if additional information on the characteristics of the nodes were available. The Poisson and Negative Binomial models can be embedded into a regression framework by setting:

$$\begin{aligned} \lambda _i = \exp (\mu + \mathbf{y }_{i}' \beta ) \end{aligned}$$

(8)

where $\mathbf{y }_i$ is a vector of covariates and $i=1,\ldots ,N$ indexes the nodes of the network. This adds node-specific heterogeneity even in the Poisson model and, in the case of the Negative Binomial, could be interpreted as a combination of both observable and unobservable heterogeneity, the latter being represented by the Gamma mixing distribution.
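To make the regression formulation of Eq. (8) concrete, the following Python sketch computes the log-likelihood of a zero-truncated Negative Binomial regression for given parameters. The toy degrees, the single dummy covariate (standing in for a category of agents) and all function names are hypothetical. Parameterizing $\theta $ through its logarithm is a convenience assumed here, not a feature taken from the paper.

```python
import math

def link(mu, beta, y):
    """Eq. (8): lam_i = exp(mu + y_i' beta)."""
    return math.exp(mu + sum(b * v for b, v in zip(beta, y)))

def log_nb_truncated(x, lam, theta):
    """Log of the zero-truncated Negative Binomial mass at x >= 1."""
    tau = theta / (theta + lam)
    log_n = (math.lgamma(theta + x) - math.lgamma(1 + x) - math.lgamma(theta)
             + theta * math.log(tau) + x * math.log(1.0 - tau))
    return log_n - math.log(1.0 - tau ** theta)

def log_likelihood(params, degrees, covariates):
    """params = [mu, log(theta), beta_1, ..., beta_k]."""
    mu, log_theta, *beta = params
    theta = math.exp(log_theta)
    return sum(log_nb_truncated(x, link(mu, beta, y), theta)
               for x, y in zip(degrees, covariates))

# Hypothetical toy sample: node degrees and one category dummy per node.
degrees = [1, 3, 2, 8, 12, 7]
covariates = [[0], [0], [0], [1], [1], [1]]
ll_flat = log_likelihood([1.5, 0.0, 0.0], degrees, covariates)   # no category effect
ll_dummy = log_likelihood([0.7, 0.0, 1.5], degrees, covariates)  # with category effect
```

In practice, `log_likelihood` would be handed to a numerical optimizer to obtain the estimates of $\mu $, $\theta $ and $\beta $. In this toy sample, the parameter vector with a category effect attains the higher log-likelihood.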

I am not aware of any previous use of the Poisson–Pareto model within a regression framework. Nevertheless, this family can also easily be cast into such a format. It can be shown that the mean of Eq. (6) is $E[x]=\frac{\alpha }{\alpha - 1} {\underline{\lambda }}$ and so it seems most natural to allow exogenous effects to enter via ${\underline{\lambda }}$:

$$\begin{aligned} {\underline{\lambda }}_i=\exp (\mu + \mathbf{y }_{i}' \beta ) \end{aligned}$$

(9)

While Eq. (9) is motivated by the Poisson regression framework, it also allows inference on the influence of exogenous factors if the mean actually does not exist, i.e., if $\alpha \le 1$ holds.
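Since the Poisson–Pareto mass function of Eq. (6) has no closed form, a brief sketch of its numerical evaluation may be helpful. The Python code below maps the infinite integration range to the unit interval via $\lambda ={\underline{\lambda }}+t/(1-t)$ and applies a midpoint rule. Both choices, as well as the function names and the number of grid points, are assumptions made for illustration and need not coincide with the integration routine used for the estimates in this paper.

```python
import math

def pp_pmf(x, alpha, lam_min, n=10000):
    """Poisson-Pareto mass PP(x) of Eq. (6) by midpoint-rule integration."""
    # Terms of the integrand that do not depend on lam, in logs.
    log_const = (math.log(alpha) + alpha * math.log(lam_min)
                 - math.lgamma(x + 1))
    h = 1.0 / n
    total = 0.0
    for k in range(n):
        t = (k + 0.5) * h                # midpoint in (0, 1)
        lam = lam_min + t / (1.0 - t)    # lam runs from lam_min to infinity
        # exp(-lam) * lam**(x - alpha - 1), times the Jacobian 1 / (1 - t)**2
        log_f = (-lam + (x - alpha - 1.0) * math.log(lam)
                 - 2.0 * math.log(1.0 - t))
        total += math.exp(log_const + log_f)
    return total * h

def pp_truncated_pmf(x, alpha, lam_min, n=10000):
    """Zero-truncated version of Eq. (7): PP(x) / (1 - PP(0))."""
    if x < 1:
        return 0.0
    p0 = pp_pmf(0, alpha, lam_min, n)
    return pp_pmf(x, alpha, lam_min, n) / (1.0 - p0)
```

In a regression setting along the lines of Eq. (9), `lam_min` would be replaced node by node by the exponential of the linear index. For integer $\alpha $ and $x>\alpha $ the integral reduces to an elementary expression, which offers a simple accuracy check of the quadrature.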

In contrast, no straightforward way suggests itself to add a regression framework to the discrete Pareto distribution, and so we just apply this alternative in its unconditional format. Since little information is available in our data sets on the characteristics of individual nodes, the regression framework is used to allow for fixed effects of different years as well as of the different categories of actors the nodes belong to. In this way, we can investigate whether this categorization is of relevance for the number of their links. In the case of the Japanese data set, we are able to add non-categorical covariates as these data come with balance sheet information besides the identities of borrowers and lenders.
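For the discrete Pareto of Eq. (3), which enters only in unconditional form, a maximum likelihood fit can be sketched as follows. The Python code approximates $\zeta (\alpha )$ by a finite sum plus an Euler–Maclaurin tail correction and maximizes the log-likelihood over $\alpha $ by golden-section search. The search bounds, the function names and the optimization method are illustrative assumptions.

```python
import math

def zeta(alpha, n_terms=1000):
    """Zeta function for alpha > 1: direct sum plus Euler-Maclaurin tail."""
    head = sum(n ** -alpha for n in range(1, n_terms))
    big_n = float(n_terms)
    tail = (big_n ** (1.0 - alpha) / (alpha - 1.0)
            + 0.5 * big_n ** -alpha
            + alpha * big_n ** (-alpha - 1.0) / 12.0)
    return head + tail

def zeta_pmf(x, alpha):
    """Discrete Pareto (Zipf/Zeta) mass function of Eq. (3) for x >= 1."""
    return x ** -alpha / zeta(alpha)

def zeta_log_likelihood(alpha, degrees):
    """Log-likelihood of a sample of positive integer degrees."""
    return (-alpha * sum(math.log(x) for x in degrees)
            - len(degrees) * math.log(zeta(alpha)))

def fit_zeta(degrees, lo=1.01, hi=10.0, iters=60):
    """Golden-section search for the maximizer; bounds are assumptions."""
    g = (math.sqrt(5.0) - 1.0) / 2.0
    a, b = lo, hi
    for _ in range(iters):
        c = b - g * (b - a)
        d = a + g * (b - a)
        if zeta_log_likelihood(c, degrees) > zeta_log_likelihood(d, degrees):
            b = d  # the maximum lies in [a, d]
        else:
            a = c  # the maximum lies in [c, b]
    return 0.5 * (a + b)
```

Golden-section search suffices here because the log-likelihood is concave in $\alpha $, the logarithm of $\zeta (\alpha )$ being convex.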

It appears worthwhile to note that the statistical fitting of degree distributions is just one level of analysis in network research. Another, equally important approach would consist in modeling not the degrees of the actors, but the particular structure of links within the network. In such an analysis, the effects of actor-specific and dyadic characteristics as well as various network effects themselves (such as reciprocity, transitivity or closure of subsets) would be investigated. The method of choice in recent literature for such analyses is the so-called exponential random graph model (ERG) that basically captures all candidate factors of influence in an exponential function determining the linking probabilities between each pair of actors (cf. Lusher et al. 2013). While our fitting of degree distributions cannot shed any light on endogenous factors such as reciprocity in the formation of a specific network, it can provide information on important covariates that should be included when estimating an ERG model for the same data. It is plausible that any significant explanatory variables for the degrees of the actors should exert their influence via a higher or lower probability of link formation of these actors under certain circumstances, and so it would be surprising not to find these variables also entering significantly in an ERG model. The same applies to behavioral analyses of network formation on the basis of longitudinal data and actor-based models of link formation over time (cf. Finger and Lux 2017 for such an analysis for the interbank market that yields results from a different perspective that are broadly in harmony with those reported below).

3 The data

We consider three large data sets of credit links: The first covers all transactions in the interbank money market conducted within the electronic trading platform e-MID over the years 1999–2014.

The second data set is a comprehensive database of credit extended from banks to non-financial firms in Spain which has been extracted from the SABI (Sistema de Análisis de Balances Ibéricos) archive based on the public commercial registry in Spain. This complete list of bank connections of all publicly registered companies is available for each year from 1997 to 2008, comprising more than 500,000 links between individual banks and their borrowers. Our third data set is a similarly large record of credit links between banks and non-financial firms in Japan collected by Nikkei Media Marketing, Inc., from financial statements of the firms included. These data are available to us over the period from 1979 through 2011. The Japanese data set also includes a variety of balance sheet items from which we construct some key financial statistics to be included among the covariates of Eqs. (8) and (9).

All three sources have been used in other studies before: The SABI database has been used by Illueca et al. (2014) who study the effects of the regional expansion of Spanish saving banks during the real-estate boom of the years after the introduction of the Euro. The Japanese data have been investigated from a network perspective by Marotta et al. (2015). The e-MID data feature prominently in quite a number of contributions to financial network theory (e.g., de Masi et al. 2006; Fricke and Lux 2015) as it is the only commercially available data set in this area.

Table 1 Basic statistics of network data
