1 Introduction

Sovereign default probability reflects the financial vulnerability and the financing or refinancing difficulties of advanced and emerging market economies. It is considered a fundamental early warning indicator of financial crises and of contagion in global financial markets. Sovereign credit ratings and the associated sovereign default rates therefore remain a major concern of international financial markets and economic policy makers. According to the current version of the Basel Capital Accord (Basel III), financial institutions are allowed to use credit ratings and the corresponding default rates to determine the amount of regulatory capital they must reserve against their credit risks. This has prompted booming research interest in the determinants and co-movements of sovereign defaults.

While the large amount of information contained in sovereign default data makes it possible to understand the dependence among economies, the massive sample size, high dimensionality and complex dependence structure of the data create computational and statistical challenges. It turns out that data analysis in a reduced space is often accompanied by improved interpretability and estimation accuracy. This possibly explains the wide adoption of factor models in the literature.

Factor models try to decipher the complex phenomena of large dimensional data through a small number of basic causes or factors. Though the factors are often taken to be macroeconomic and financial determinants, our study launches a new investigation into the identification of factors of sovereign default probabilities in a data-driven way. From a statistical viewpoint, understanding the dependence among these sovereign default probabilities relies on the estimation of the joint probability distribution of the multiple variables. Conventional methods such as Principal Component Analysis (PCA) and Factor Analysis (FA) extract a set of uncorrelated factors from the multivariate and dependent data within a linear framework. Under Gaussianity, uncorrelatedness is equivalent to independence, and with the aid of the Jacobian transformation the complex joint distribution can be obtained in closed form from the marginal distributions of the factors. Thus, the high dimensional statistical problem is converted to univariate cases. Independence, however, does not follow from uncorrelatedness if the measured variables, e.g. the sovereign default probabilities, are not Gaussian distributed, which is most likely in practice. In this case, the joint distribution estimation cannot be easily solved with the conventional methods.

The recently developed Independent Component Analysis (ICA) method sheds light on possible solutions. Similar to the PCA and FA methods, ICA identifies essential factors via a linear transformation. Instead of projecting onto the eigenvectors of the covariance matrix as PCA does, ICA directly extracts statistically independent factors from the original complex data by solving an optimization problem on statistical cross-independence. Depending on the definition of independence, various estimation methods have been proposed, including the maximization of non-Gaussianity (Jones and Sibson 1987; Cardoso and Souloumiac 1993; Hyvärinen and Oja 1997), the minimization of mutual information (Comon 1994; Hyvärinen 1998, 1999a), maximum likelihood estimation (Pham and Garat 1997; Bell 1995; Hyvärinen 1999b), and local parametric estimation with time varying loadings (Chen et al. 2014).
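As an illustration of the conventional approach, the following minimal sketch runs scikit-learn's FastICA (which extracts independent components by maximizing non-Gaussianity) on artificially mixed non-Gaussian sources; the mixing matrix and source distributions are illustrative choices, not taken from our data.

```python
import numpy as np
from sklearn.decomposition import FastICA

rng = np.random.default_rng(0)
n = 1000
# two independent non-Gaussian sources: heavy-tailed and skewed
S = np.column_stack([rng.laplace(size=n), rng.exponential(size=n) - 1.0])
A = np.array([[1.0, 0.5],
              [0.4, 1.0]])        # illustrative mixing matrix
X = S @ A.T                       # observed, cross-dependent variables

ica = FastICA(n_components=2, random_state=0)
Z = ica.fit_transform(X)          # recovered factors, up to sign and order
```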

In high dimensional space, however, ICA leads to redundant dependence by assuming that each factor is associated with all the measured variables. The overparametrization is solvable either by reducing the number of factors or by simplifying the structure of the loading matrix. Wu et al. (2006) proposed an ordering approach based on the mean-square-error criterion to identify the number of ICs; this kind of dimension reduction, however, inevitably comes with a loss of information. On the other hand, the dependence between the measured sovereign default probabilities and the factors can be sparse. A possibly more realistic situation is that each measured variable is driven by only a few factors, while other variables depend on possibly different sets of factors. This suggests reducing the dimensionality of the parameter space through a sparse loading matrix.

Sparse estimation has been widely used, especially in regularized regression analysis. Under the sparsity assumption, unnecessary dependence is penalized and insignificant coefficients are pushed to zero, see e.g. the Lasso (Tibshirani 1996), Ridge (Frank and Friedman 1993) and the smoothly clipped absolute deviation (SCAD) penalty (Fan and Li 2001). The adoption of sparsity in independent component analysis is still new. Hyvärinen and Raju (2002) proposed a sparse Bayesian ICA, where the loading matrix is assumed to be random and a conjugate sparse prior is imposed on it. Zhang et al. (2009) incorporated the adaptive Lasso in the maximum likelihood estimation method to obtain a sparse loading matrix, where the statistically independent factors are assumed to follow a simple distribution family with one parameter. The theoretical properties of the estimators in the above works are unknown.

We are motivated to propose a penalized independent component analysis method, named PIF, that extracts statistically independent factors via a sparse linear transformation. The sparse loading matrix is estimated under a normal inverse Gaussian distributional assumption with the SCAD penalty. Our main theoretical result establishes that the sparse loading matrix estimator is consistent. The proposed PIF method displays appealing performance in simulation studies. We implement the PIF on the daily probability of default data of the Corporate Vulnerability Index from 1999 to 2013. The proposed method delivers a superior interpretation of the dynamic structure of 14 economies' sovereign default probabilities from the pre-Dot Com bubble period to the post-Sub Prime crisis period.

The remainder of the paper is structured as follows. Section 10.2 details the sovereign default probability data. Section 10.3 presents the penalized independent factor method, the estimation procedure and the statistical properties of the estimator. Its finite sample performance is investigated in a simulation study in Sect. 10.4. Section 10.5 implements the PIF method on the sovereign default probabilities. Section 10.6 concludes.

2 Data

We consider the sovereign default probabilities of 14 economies from \(1^{st}\) April 1999 to \(31^{st}\) December 2013. The data are the equally-weighted Corporate Vulnerability Index (CVI), a proxy of sovereign default probability, maintained by the Credit Research Initiative, Risk Management Institute at National University of Singapore. The CVI of each economy is constructed by averaging the probabilities of default (PD) of all the firms listed on the corresponding exchange. It is worth mentioning that the number of firms considered over the time horizon is not fixed, given the occurrence of default events and IPOs. For example, on \(1^{st}\) Apr 1999 there were 717 firms listed on the stock exchange of China, and by \(31^{st}\) Dec 2013 the number of listed firms had gone up to 3017. The PDs were computed using the forward intensity approach of Duan et al. (2012) with input variables comprising common economic factors, e.g. stock index returns and 3-month interest rates, and firm specific factors, e.g. distance to default, ratio of cash (and equivalents) to total assets, return on assets, market to book ratio and 1-year idiosyncratic volatility. The 14 economies include 9 advanced economies, Hong Kong, Japan, US, Germany, Greece, Ireland, Italy, Spain and UK, and 5 emerging ones, China, India, Indonesia, Russia and Brazil.

Fig. 10.1 Time series plot of the 14 economies' CVI data. The gray shadow is the Dot Com bubble period and the light green shadow is the Sub Prime crisis

Figure 10.1 displays the movements of the 14 CVIs from 1999 to 2013. To understand the dynamic structure of the CVIs over time, we divide the 15-year time horizon into five sub-periods according to the business cycles announced by the National Bureau of Economic Research, including two recessions, from \(1^{st}\) March 2001 to \(30^{th}\) November 2001 (Dot Com bubble) and from \(1^{st}\) December 2007 to \(30^{th}\) June 2009 (US Sub Prime crisis). During the two recessions, the level of CVI increased on average by 26 and 53% respectively. The relatively high level of the sovereign default probabilities persists for a while after the recessions and then drops to a low value. China, however, behaves distinctively from the rest. The CVI of China is much larger than the others during 2002–2007, i.e. the post-Dot Com bubble period; for example, China's CVI is 3 times the second highest value, that of Indonesia. Table 10.1 reports the CVI summary statistics of each economy over the 15 years. China and the US have the highest level (mean) of CVI. The level of the US' CVI is high mainly during the two recessions, the Dot Com bubble and the Sub Prime crisis. China, on the other hand, though immune to the Dot Com bubble recession, maintains a high level consistent with the "higher return, higher risk" philosophy, given its constantly achieved double-digit growth during 2003 to 2007. In terms of variation, China has the highest CVI variation, with a standard deviation at least 12% larger than those of the rest. Moreover, all CVIs are positively skewed with extreme values, and the JB statistics are all significant, indicating deviations from Gaussianity. The conventional PCA is thus not able to deliver independent factors. Table 10.2 reports the correlation matrix of the CVI data over the 15 years; the correlations are mostly positive except for China. While China has either negative or weak correlations with the other economies, the US maintains high positive correlations with most of the advanced economies such as Japan and the UK, consistent with its influential role in the global financial markets (Tables 10.3, 10.4, 10.5 and 10.6).

Table 10.1 Summary statistics of the CVI data over the time horizon, Apr 1999–Dec 2013
Table 10.2 Correlation matrix of the 14 economies from \(1^{st}\) April 1999 to \(31^{st}\) December 2013
Table 10.3 Summary statistics of the simulation results. The results of the PIF are marked in bold and reported in the first row of each scenario. The results of the NIG-ICA and ICA are reported in the second and third rows of each scenario. Detection of zeros is the percentage of zero entries correctly estimated, and mis-detection is the percentage of non-zero entries estimated as zero. \(\lambda \) is the optimal penalty parameter obtained by minimizing the BIC. Each measurement is given in the form of sample average (sample standard deviation)
Table 10.4 True Loading matrix. Zero entries are left blank
Table 10.5 Simulation results for the large dimensional loading matrix. Each measurement is given in the form of mean (std). The penalty parameter is \(\lambda =0.08\), chosen by minimizing the BIC. #0s is the percentage of zero elements estimated correctly by the method. Mis-detection is the number of elements that are wrongly pushed to zero
Table 10.6 Number of factors participated in by each economy. Sparsity is reflected by the percentage of zeros in the loading matrix

More detailed summary statistics on CVIs over the 5 time periods can be found in Tables 10.7, 10.8, 10.9, 10.10, 10.11, 10.12, 10.13, 10.14, 10.15 and 10.16 in the Appendix.

Table 10.7 Summary statistics of the CVI data, Apr 1999–Feb 2001
Table 10.8 Summary statistics of the CVI data during the Dot Com bubble, Mar 2001–Nov 2001
Table 10.9 Summary statistics of the CVI data, Dec 2001–Nov 2007
Table 10.10 Summary statistics of the CVI data during the Sub Prime crisis, Dec 2007–Jun 2009
Table 10.11 Summary statistics of the CVI data, Jul 2009–Dec 2013
Table 10.12 Correlation matrix of the 14 economies from \(1^{st}\) April 1999 to \(28^{th}\) Feb 2001
Table 10.13 Correlation matrix of the 14 economies from \(1^{st}\) Mar 2001 to \(30^{th}\) Nov 2001
Table 10.14 Correlation matrix of the 14 economies from \(1^{st}\) Dec 2001 to \(30^{th}\) Nov 2007
Table 10.15 Correlation matrix of the 14 economies from \(1^{st}\) Dec 2007 to \(30^{th}\) Jun 2009
Table 10.16 Correlation matrix of the 14 economies from \(1^{st}\) Jul 2009 to \(31^{st}\) December 2013

3 Penalized Independent Factor

Consider a p-dimensional random vector \(\mathbf {X}=\left( X_{1},\cdots , X_{p}\right) \in \mathbb {R}^p\). The penalized independent factor analysis factorizes the variables into a linear combination of latent independent random factors \(\mathbf {Z}=\left( Z_{1},\cdots , Z_{p}\right) \in \mathbb {R}^p\):

$$\begin{aligned} \mathbf {Z}= & {} B\mathbf {X} \end{aligned}$$
(10.1)

where B refers to a sparse and invertible loading matrix. Given the observed realizations \(\mathbf {X}_i=\left( X_{i1},\cdots , X_{ip}\right) \) with \( i = 1,\cdots , n\), the task is to estimate the sparse loading matrix B as well as to recover the independent factors \(\mathbf {Z}_i\) with \( i = 1,\cdots , n\), without any prior knowledge of the sparsity structure of B.

The loading matrix and the independent factors are only identifiable up to scale: for any constant \(c\ne 0\), one obtains another set of loading matrix cB and independent factors \(c\mathbf {Z}\) satisfying (10.1). To avoid this identification problem, we assume that the independent factors have unit variance. Moreover, we set the number of independent factors to p, as the primary goal of our study is to convert the multivariate problem into a number of univariate ones with sparsity, which eases the understanding of the dependence through a reduced parameter space and simultaneously improves estimation accuracy.

Denote the probability density function of the j-th independent factor by \(f_j(z)\) for \(j=1,\dots ,p\). The log-likelihood is defined as:

$$\begin{aligned} l(B)=\sum _{i=1}^{n}\sum _{j=1}^{p}\log f_j\left( b^{\top }_j\mathbf {X}_i\right) +n\log |det(B)| \end{aligned}$$
(10.2)

where \(b_j^\top \) denotes the j-th row of B. To achieve sparsity of the loading matrix B, a penalty function, denoted \(\rho _{\lambda }\) with tuning parameter \(\lambda \), is imposed on the entries of B. The penalized log-likelihood is defined as:

$$\begin{aligned} \mathbf{{P}} (B)= & {} \sum _{i=1}^{n}\sum _{j=1}^{p}\log f_j(b^\top _j\mathbf {X}_i)+n\log |det(B)|-n\sum _{j=1}^p\sum _{k=1}^p\rho _\lambda (|b_{jk}|) \end{aligned}$$
(10.3)

where \(b_{jk}\) denotes the (j, k)-th element of the loading matrix B. Taking the gradient of the penalized likelihood function with respect to the loading matrix, we obtain:

$$\begin{aligned} \frac{\partial \mathbf {P}}{\partial B}=\sum \nolimits _{i=1}^n\begin{bmatrix} \frac{f_1^{'}(b_1^\top \mathbf {X}_i)}{f_1(b_1^\top \mathbf {X}_i)} \\ \frac{f_2^{'}(b_2^\top \mathbf {X}_i)}{f_2(b_2^\top \mathbf {X}_i)} \\ \vdots \\ \frac{f_p^{'}(b_p^\top \mathbf {X}_i)}{f_p(b_p^\top \mathbf {X}_i)} \end{bmatrix}\mathbf {X}_i^\top +n[B^\top ]^{-1}-n\Omega \end{aligned}$$

where \([B^\top ]^{-1}\) is the inverse of the transpose of B, \(\Omega _{jk}=sgn(b_{jk})\rho _\lambda ^{'}(|b_{jk}|)\) collects the first derivatives of the penalty function with respect to the elements of the loading matrix, and \({f_j^{'}(s)}/{f_j(s)}\) is the first derivative of the log-density of the j-th independent factor. The sparse loading matrix is estimated using the gradient method. Given the loading matrix estimator, the independent factors are recovered via (10.1).
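For concreteness, the penalized log-likelihood (10.3) can be evaluated as follows; a minimal sketch assuming the factor log-densities \(\log f_j\) and the elementwise penalty \(\rho _\lambda \) are supplied as callables.

```python
import numpy as np

def penalized_loglik(B, X, log_pdfs, rho):
    """Evaluate P(B) of Eq. (10.3).

    B        : (p, p) loading matrix
    X        : (n, p) observations, one row per X_i
    log_pdfs : list of p callables, log f_j
    rho      : elementwise (vectorized) penalty function rho_lambda
    """
    n, p = X.shape
    Z = X @ B.T                                  # row i holds B X_i
    ll = sum(log_pdfs[j](Z[:, j]).sum() for j in range(p))
    ll += n * np.log(abs(np.linalg.det(B)))      # n log|det(B)|
    return ll - n * rho(np.abs(B)).sum()         # minus the penalty term
```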

3.1 Independent Component’s Density: NIG

The density of an IC is unknown. Hyvärinen (1999b) developed a maximum likelihood estimation approach for independent factor extraction, where the log-likelihood is defined under a simple but unrealistic distribution with one distributional parameter, and proved consistency of the estimator, see also Pham and Garat (1997), Bell (1995). Financial risk factors are, however, neither Gaussian distributed nor special cases of the exponential power family; instead, the factors are often asymmetric and exhibit extreme values. This motivates the adoption of the normal inverse Gaussian (NIG) distribution for its desirable probabilistic features. With 4 distributional parameters, the NIG distribution is able to capture data characteristics from the central location to the tail behaviour.

In our study, each factor is assumed to be normal inverse Gaussian (NIG) distributed with individual distributional parameters. The density is of the form:

$$\begin{aligned} f_{\text{ NIG }}(z_{j})= & {} \frac{\phi _j\delta _j}{\pi } \frac{K_{1} \Bigl \{ \phi _j\sqrt{\delta _j^{2}+(z_{j}-\mu _j)^{2}} \Bigr \}}{\sqrt{ \delta _j^{2} + (z_{j} - \mu _j)^{2}}} \exp \{ \delta _j \sqrt{\phi _j^{2} - \beta _j^{2}} + \beta _j(z_{j} - \mu _j) \}, \end{aligned}$$

where \( \mu _j \), \( \delta _j \), \( \beta _j \) and \( \phi _j \) are the NIG parameters for \(j = 1,\cdots ,p\), and \(K_{1}(\cdot ) \) is the modified Bessel function of the third kind. The distributional parameters fulfill the conditions \( \mu _j\in \mathbb {R} \), \( \delta _j > 0 \) and \( |\beta _j| \le \phi _j \). The limiting distributions of the NIG are well developed in Barndorff-Nielsen (1997); Blæsild (1999), including the Normal distribution and the Cauchy distribution, the latter coinciding with the Student-t distribution with 1 degree of freedom:

  • For \(\beta =0\), \(\phi \rightarrow \infty \) and \(\delta /\phi =\sigma ^2\), \(NIG(\phi ,\beta ,\mu ,\delta ) \rightarrow N(\mu ,\sigma ^2)\)

  • For \(\phi ,\beta \rightarrow 0\), \(\mu =0\) and \(\delta =1\), \(NIG(\phi ,\beta ,\mu ,\delta )\rightarrow \) Cauchy, i.e. Student-\(t_{1}\)

See Barndorff-Nielsen (1997) for more details. Moreover, all independent factors are assumed to have unit variance to avoid the identification ambiguity.
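In practice the NIG density is available in standard libraries. The sketch below uses scipy.stats.norminvgauss, whose shape parameters relate to our parameterization via \(a = \phi \delta \), \(b = \beta \delta \), loc \(= \mu \) and scale \(= \delta \); the parameter values are illustrative.

```python
import numpy as np
from scipy.stats import norminvgauss

# Our parameters: phi (tail heaviness), beta (skewness), mu (location),
# delta (scale), subject to delta > 0 and |beta| <= phi.
phi, beta, mu, delta = 2.0, 0.5, 0.0, 1.0

# scipy's parameterization: a = phi * delta, b = beta * delta
nig = norminvgauss(phi * delta, beta * delta, loc=mu, scale=delta)

z = np.linspace(-5.0, 5.0, 201)
pdf = nig.pdf(z)                             # density f_NIG(z)
sample = nig.rvs(size=1000, random_state=0)  # draws from the factor
```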

3.2 Penalty Function: SCAD

The question remains how to select the penalty function in the estimation. Various penalty functions have been proposed in the literature, including the first order norm penalty of the Lasso (Tibshirani 1996), the second order norm penalty of Ridge (Frank and Friedman 1993) and the smoothly clipped absolute deviation (SCAD) penalty (Fan and Li 2001). Among them, the SCAD penalty is theoretically desirable with the oracle property and has been widely used in quantile regression, logistic regression, high dimensional data analysis, large scale genomic data analysis and many other settings, see Gou et al. (2014), Xie and Huang (2009). In our study, we use the SCAD penalty, which is defined through its first derivative:

$$\begin{aligned} \rho _{\lambda }^{'}(\theta )=\lambda \{I(\theta \le \lambda )+\frac{(a\lambda -\theta )_+}{(a-1)\lambda }I(\theta >\lambda )\} \end{aligned}$$
(10.4)

where \(\theta >0\) and \(a=3.7\), as suggested in Fan and Li (2001).
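A direct transcription of (10.4), vectorized over the entries of the loading matrix:

```python
import numpy as np

def scad_derivative(theta, lam, a=3.7):
    """First derivative rho'_lambda(theta) of the SCAD penalty, Eq. (10.4),
    for theta >= 0 (applied to |b_jk|)."""
    theta = np.asarray(theta, dtype=float)
    # flat rate lambda for theta <= lambda, tapered to zero afterwards
    tapered = np.maximum(a * lam - theta, 0.0) / ((a - 1.0) * lam)
    return lam * np.where(theta <= lam, 1.0, tapered)
```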

3.3 Estimation

Substituting the NIG density and the SCAD penalty function into (10.3) yields:

$$\begin{aligned} \mathbf{{P}}(B)= & {} \sum _{i=1}^{n}\sum _{j=1}^{p}\log f_j(b^\top _j \mathbf {X}_i)+n\log |det(B)|-n\sum _{j=1}^p\sum _{k=1}^p\rho _\lambda (|b_{jk}|) \end{aligned}$$
(10.5)
$$\begin{aligned}= & {} \sum _{i=1}^{n}\sum _{j=1}^{p}\left\{ \log \frac{\phi _j\delta _j}{\pi }\frac{K_1\left( \phi _j\sqrt{\delta _j^2+(b_j^\top \mathbf {X}_i-\mu _j)^2}\right) }{\sqrt{\delta _j^2+(b_j^\top \mathbf {X}_i-\mu _j)^2}}+\delta _j\sqrt{\phi _j^2-\beta _j^2}+\beta _j(b_j^\top \mathbf {X}_i-\mu _j)\right\} \nonumber \\&+\,n\log |det(B)|-n\sum _{j=1}^p\sum _{k=1}^p\rho _\lambda (|b_{jk}|) \end{aligned}$$
(10.6)

and the gradient of the penalized log-likelihood function is:

$$\begin{aligned} \frac{\partial \mathbf {P}}{\partial B}=\sum \nolimits _{i=1}^n\begin{bmatrix} \frac{f_1^{'}(b_1^\top \mathbf {X}_i)}{f_1(b_1^{\top }\mathbf {X}_i)} \\ \frac{f_2^{'}(b_2^{\top }\mathbf {X}_i)}{f_2(b_2^{\top }\mathbf {X}_i)} \\ \vdots \\ \frac{f_p^{'}(b_p^{\top }\mathbf {X}_i)}{f_p(b_p^{\top }\mathbf {X}_i)} \end{bmatrix}\mathbf {X}_i^{\top }+n[B^{\top }]^{-1}-n\Omega \end{aligned}$$

where \(\Omega _{jk}=sgn(b_{jk})\rho _\lambda ^{'}(|b_{jk}|)\) and \(\frac{f_j^{'}(s)}{f_j(s)}=\beta _j+\phi _j\frac{K_1^{'}(\phi _j\sqrt{\delta _j^2+(s-\mu _j)^2})}{K_1(\phi _j\sqrt{\delta _j^2+(s-\mu _j)^2})}\frac{s-\mu _j}{\sqrt{\delta _j^2+(s-\mu _j)^2}}-\frac{s-\mu _j}{\delta _j^2+(s-\mu _j)^2}\).
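The NIG score \(f_j^{'}(s)/f_j(s)\) displayed above translates directly into code with scipy's modified Bessel routines; kvp(1, x) returns \(K_1^{'}(x)\).

```python
import numpy as np
from scipy.special import kv, kvp

def nig_score(s, phi, beta, mu, delta):
    """Score f'(s)/f(s) of the NIG density, as displayed above."""
    r = np.sqrt(delta**2 + (s - mu)**2)   # sqrt(delta^2 + (s - mu)^2)
    return (beta
            + phi * kvp(1, phi * r) / kv(1, phi * r) * (s - mu) / r
            - (s - mu) / r**2)
```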

The optimization problem is solved by alternating two steps: the loading matrix B and the NIG parameters are updated iteratively until the algorithm converges. The algorithm starts with an initial estimator \(B_0\), e.g. the estimate obtained by the conventional ICA:

  1. Given the previous estimator of B, optimize the penalized log-likelihood function to obtain the NIG distributional parameter estimates. The EM algorithm is adopted for the estimation of the NIG parameters, see Karlis (2002).

  2. Based on the estimated NIG parameters, update the estimator of B by maximizing the penalized log-likelihood function (a sketch of this update follows the list).

  3. Scale the estimator of B and the NIG parameters so that each independent factor has unit variance.

  4. Repeat until convergence.
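A minimal sketch of step 2, one gradient-ascent update of B using the gradient displayed above; nig_score and scad_derivative are the helpers sketched earlier, and the step size lr is an arbitrary illustrative choice (the paper does not prescribe one).

```python
import numpy as np

def update_loading(B, X, nig_params, lam, lr=1e-4):
    """One gradient-ascent step on the penalized log-likelihood w.r.t. B.

    nig_params : list of p tuples (phi, beta, mu, delta), the current
                 NIG estimates from the EM step
    """
    n, p = X.shape
    Z = X @ B.T                                   # b_j^T X_i for all i, j
    S = np.column_stack([nig_score(Z[:, j], *nig_params[j])
                         for j in range(p)])      # score of each factor
    grad = S.T @ X                                # sum_i score(Z_i) X_i^T
    grad += n * np.linalg.inv(B.T)                # + n [B^T]^{-1}
    grad -= n * np.sign(B) * scad_derivative(np.abs(B), lam)  # - n Omega
    return B + lr * grad
```

Alternating this update with the EM step for the NIG parameters and the rescaling of step 3, until the change in B falls below a tolerance, reproduces the loop described above.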

The penalized maximum likelihood estimation involves the choice of the tuning parameter \(\lambda \). While a too large tuning parameter leads to an overly sparse loading matrix, a too small one overfits and fails to identify the true model. Cross validation (Kohavi 1995) and generalized cross validation (Li 1987) can be used, but these approaches are computationally intensive; even worse, generalized cross validation carries a positive probability of over-fitting in model selection (Wang et al. 2007). Alternatively, several information criteria have been proposed and are widely used in time series analysis. In our study, we adopt the Schwarz–Bayesian information criterion (BIC) (Schwarz 1978) for its computational tractability and its consistency in model selection. The BIC is defined as:

$$\begin{aligned} BIC=-l(\hat{B})+\log n\times \#\{\hat{B}_{ij}\ne 0\} \end{aligned}$$

where \(\hat{B}\) is the estimator of B. The penalty parameter with the lowest BIC is chosen as optimal.
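The criterion translates into a one-liner; a sketch, where the non-zero count uses a small numerical threshold (an implementation assumption, since estimated zeros are only zero up to machine precision):

```python
import numpy as np

def bic(loglik, B_hat, n, tol=1e-8):
    """BIC = -l(B_hat) + log(n) * #{B_ij != 0}.

    loglik : log-likelihood l(B_hat) at the estimate
    tol    : numerical threshold for declaring an entry zero
    """
    n_nonzero = int(np.sum(np.abs(B_hat) > tol))
    return -loglik + np.log(n) * n_nonzero
```

The optimal \(\lambda \) is then obtained by evaluating this criterion over a grid of candidate penalty parameters and taking the minimizer.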

3.4 Properties of the Estimator

We prove the consistency of the PIF estimator under two conditions:

  C1. The observations \((X_{i1},\dots ,X_{ip})\) are IID with density \(\left( g_1(X,B),\dots ,g_p(X,B)\right) \) with respect to some measure \(\mu \). The density has a common support and is identifiable. Furthermore, the first logarithmic derivatives of \(g_a\) satisfy

    $$\begin{aligned} E\frac{\partial \log g_a(X,B)}{\partial B_{jk}}=0 \end{aligned}$$
    (10.7)

    for all a, j and k.

  C2. \(E[-\Omega _a]\) is positive definite at the point B, with \(\Omega _a\) defined as:

    $$\begin{aligned} \Omega _a= \begin{bmatrix} \frac{\partial ^2g_a(B)}{\partial b_{11}\partial b_{11}}&\frac{\partial ^2g_a(B)}{\partial b_{11}\partial b_{12}}&\dots&\frac{\partial ^2g_a(B)}{\partial b_{11}\partial b_{1p}}&\frac{\partial ^2g_a(B)}{\partial b_{11}\partial b_{21}}&\dots&\frac{\partial ^2g_a(B)}{\partial b_{11}\partial b_{pp}}\\ \frac{\partial ^2g_a(B)}{\partial b_{12}\partial b_{11}}&\frac{\partial ^2g_a(B)}{\partial b_{12}\partial b_{12}}&\dots&\frac{\partial ^2g_a(B)}{\partial b_{12}\partial b_{1p}}&\frac{\partial ^2g_a(B)}{\partial b_{12}\partial b_{21}}&\dots&\frac{\partial ^2g_a(B)}{\partial b_{12}\partial b_{pp}}\\ \vdots&\vdots&\vdots&\vdots&\vdots&\vdots&\vdots \\ \frac{\partial ^2g_a(B)}{\partial b_{pp}\partial b_{11}}&\frac{\partial ^2g_a(B)}{\partial b_{pp}\partial b_{12}}&\dots&\frac{\partial ^2g_a(B)}{\partial b_{pp}\partial b_{1p}}&\frac{\partial ^2g_a(B)}{\partial b_{pp}\partial b_{21}}&\dots&\frac{\partial ^2g_a(B)}{\partial b_{pp}\partial b_{pp}}\\ \end{bmatrix} \end{aligned}$$

Theorem 10.1

Let \((X_{11},X_{12},\dots ,X_{1p}),\dots ,(X_{n1},X_{n2},\dots ,X_{np})\) be IID measured vectors, each with density \((g_1,g_2,\dots ,g_p)\) satisfying conditions (C1) and (C2). If \(\max \{p_{\lambda _n}^{''}(|B_{jk}|):B_{jk}\ne 0\}\rightarrow 0\), then there exists a local maximizer \(\hat{B}\) of \(\mathbf{{P}}(B)\) such that \(\Vert \hat{B}-B\Vert =\mathcal {O}_p(n^{-1/2}+a_n)\), where \(a_n=\max \{p_{\lambda _n}^{'}(|B_{jk}|): B_{jk}\ne 0\}\).

Note that, though the density \(g_a\) of the observed variables is unknown, Theorem 10.1 holds as long as the two conditions are fulfilled. A detailed proof can be found in the Appendix.

4 Simulation

Before the implementation on real sovereign default probability data, we investigate the finite sample performance of the PIF method by performing a number of simulation studies under known data generating processes. Our interest is in the estimation accuracy of the proposed method and its robustness under various scenarios, compared to the conventional ICA approach.

We design the simulation studies so that they properly reflect the real study at hand. All the parameters are obtained from analyzing the Corporate Vulnerability Index (CVI) data from April 1999 to February 2001, i.e. before the Dot Com bubble. In the first experiment, small dimensional data are generated based on the CVIs of India, Indonesia and Japan, three Asian countries covering both emerging and advanced economies. We consider 3 scenarios with non-sparsity, medium sparsity and high sparsity in the loading matrix. In the second experiment, large dimensional data are produced, where the parameters are learned from the CVI data of the 14 economies from April 1999 to February 2001.

In the data generation process, we follow the model setting in (10.1) and generate dependent data by inverting the loading matrix:

$$\mathbf {X}_i = B^{-1}\mathbf {Z}_i, \quad i = 1,\cdots , n.$$

The generated data are treated as the measured variables. Each experiment is repeated 100 times with \(n=200\) observations. Both the PIF and the conventional ICA methods are implemented. In addition to the two approaches, we also implement ICA under the NIG distributed source assumption, referred to as NIG-ICA in the following.
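A sketch of one replication of the data generation, using the non-sparse loading matrix of Experiment 1 below; the NIG parameters of the factors are illustrative stand-ins for the values learned from the CVI data.

```python
import numpy as np
from scipy.stats import norminvgauss

n, p = 200, 3
B = np.array([[ 52.7, -10.7,  14.4],
              [-32.3, -17.3,  -5.2],
              [ 18.1,  -6.3,  12.8]])   # non-sparse scenario of Sect. 4.1

# illustrative NIG factors, standardized to unit variance
Z = norminvgauss(2.0, 0.5).rvs(size=(n, p), random_state=0)
Z = (Z - Z.mean(axis=0)) / Z.std(axis=0)

X = Z @ np.linalg.inv(B).T              # X_i = B^{-1} Z_i, row by row
```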

We evaluate the estimation accuracy of the PIF method based on 3 measurements, with focus on the factor loadings B and the identified factors \(\mathbf {Z}_i\). For the loading matrix, our interests are the overall estimation accuracy and the element-wise accuracy: the Euclidean distance (ED) measures the overall estimation error of the loading matrix estimator, while the maximum norm (MN) reports the largest element-wise bias of the matrix estimator. For the identified independent factors, we compute the root mean squared error (RMSE) to show the identification accuracy. The criteria are defined as follows:

$$\begin{aligned} \textit{ED}= & {} \sum _{jk} \left( b_{jk}-\hat{b}_{jk}\right) ^2 \end{aligned}$$
(10.8)
$$\begin{aligned} \textit{MN}= & {} \max \left( |b_{jk}-\hat{b}_{jk}|\right) \end{aligned}$$
(10.9)
$$\begin{aligned} \textit{RMSE}= & {} \sqrt{\frac{1}{np} \sum _{ij} \left( Z_{ij}-\hat{Z}_{ij}\right) ^2} \end{aligned}$$
(10.10)

where \(b_{jk}\) refers to the (j, k)-th element of the matrix B and \(\hat{b}_{jk}\) denotes the corresponding element of the estimator.
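The three criteria translate directly into code; a minimal sketch, noting that in practice the estimated factors and loadings must first be aligned with the truth up to the usual sign indeterminacy of ICA-type methods, which we omit here for brevity.

```python
import numpy as np

def euclidean_distance(B, B_hat):
    """ED of Eq. (10.8): sum of squared element-wise errors."""
    return np.sum((B - B_hat) ** 2)

def maximum_norm(B, B_hat):
    """MN of Eq. (10.9): largest element-wise error."""
    return np.max(np.abs(B - B_hat))

def rmse(Z, Z_hat):
    """RMSE of Eq. (10.10) over all n * p recovered factor values."""
    return np.sqrt(np.mean((Z - Z_hat) ** 2))
```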

4.1 Experiment 1: 3 Dimensional Data

In the low dimensional experiment, 3 scenarios are analyzed with 3 different loading matrices that are either non-sparse, sparse or highly sparse:

Non-sparse loading matrix:
$$\begin{bmatrix} 52.7&-10.7&14.4 \\ -32.3&-17.3&-5.2 \\ 18.1&-6.3&12.8 \end{bmatrix};$$
Sparse loading matrix:
$$\begin{bmatrix} -3.2&31.2&{\mathbf {0}} \\ 40.1&-96.4&-20.9 \\ -29.4&18.7&{\mathbf {0}} \end{bmatrix};$$
Highly-sparse loading matrix:
$$\begin{bmatrix} -3.3&31.2&{\mathbf {0}} \\ {\mathbf {0}}&10.1&{\mathbf {0}} \\ {\mathbf {0}}&44.2&-25.0 \end{bmatrix}.$$

Table 10.3 reports the simulation results based on the 100 replications. In all 3 scenarios, the PIF outperforms ICA in terms of estimation accuracy for both the loading matrix and the independent factors. In the sparse scenario, the estimation accuracy of PIF is much better, with a lower ED of 6.67 (SD: 3.98), MN of 5.54 (SD: 3.65) and RMSE of 0.09 (SD: 0.03), than that of ICA, with an ED of 27.19 (SD: 17.47), MN of 20.40 (SD: 13.61) and RMSE of 0.20 (SD: 0.14). The improved accuracy is mostly contributed by the adoption of the NIG distributional assumption. In the highly-sparse scenario, the PIF is remarkably better than the conventional ICA, and the improvement with respect to the NIG-ICA becomes larger.

Moreover, the tuning parameter \(\lambda \) is reasonably selected by the BIC. In the non-sparse scenario, the optimal \(\lambda \) is 0, indicating that no penalty is necessary as the true loading matrix is not sparse. In the sparse and highly-sparse scenarios, the optimal \(\lambda \) becomes 0.04 and 0.07 respectively, leading to high detection rates of zero elements at 100 and \(99\%\) respectively. On the contrary, ICA and NIG-ICA are not able to detect any zero elements in the loading matrix. Furthermore, there is no mis-detection by PIF, meaning that no non-zero entries in the loading matrix are wrongly pushed to zero.

Figure 10.2 illustrates one realization of the estimation errors of the recovered independent factors by the PIF, NIG-ICA and ICA methods in the highly-sparse scenario. While the ICA errors show more variation with a wider spread, the PIF and NIG-ICA recover the independent factors with smaller errors.

Fig. 10.2 Illustration of the residuals of the factors in the sparse setup. ICA is marked with circles, NIG-ICA with stars and PIF with dots

4.2 Experiment 2: Large Dimensional Data

In the second experiment with large dimensional data, we generate 14-dimensional dependent data with a sparse loading matrix learned from the CVI data over the time span of April 1999 to February 2001. The loading matrix is shown in Table 10.4, where \(35\%\) of the elements are zero.

The generation is repeated 100 times with sample size \(n=200\). Table 10.5 reports the estimation results. The penalty parameter of PIF is chosen as \(\lambda =0.08\) by minimizing the BIC. The estimation accuracy of PIF, with an ED of 88.60 (SD: 26.11), MN of 60.00 (SD: 24.63) and RMSE of 0.20 (SD: 0.10), is much better than that of ICA, with an ED of 419.24 (SD: 56.11), MN of 204.00 (SD: 36.54) and RMSE of 1.29 (SD: 0.05), and slightly better than that of NIG-ICA, with an ED of 90.23 (SD: 27.74), MN of 61.50 (SD: 25.68) and RMSE of 0.22 (SD: 0.10). In addition, PIF detects \(99.85\%\) of the zero entries in the loading matrix without any mis-detection of non-zeros.

The simulation study shows that the proposed PIF method performs well compared to the alternative ICA and NIG-ICA methods, with improved estimation accuracy. The good performance is mostly attributable to the adoption of the NIG distribution and further to exploiting the sparsity of the loading matrix. By adding the SCAD penalty function, the proposed PIF is able to identify zero entries in the sparse loading matrix without any mis-detection of non-zeros. Moreover, the penalty parameter can be reasonably chosen by the BIC; for example, in the non-sparse scenario, the penalty parameter is selected to be zero. The relatively good performance of the PIF is stable with respect to increases in sparsity and dimensionality.

5 Real Data Analysis

In this section, we analyze the sovereign default probabilities of 14 economies from April 1999 to December 2013. The sovereign default probabilities are quantified as the daily equally-weighted CVI (Corporate Vulnerability Index) of each economy. The 14 economies are a mixture of advanced and emerging economies, including China, Hong Kong, India, Indonesia, Japan, US, Germany, Greece, Ireland, Italy, Russia, Spain, UK and Brazil. The data are obtained from the Risk Management Institute at National University of Singapore. We divide the time span into five sub-periods based on the business cycles announced by the National Bureau of Economic Research, which include two recessions: the Dot Com bubble from March 2001 to November 2001 and the US Sub Prime crisis from December 2007 to June 2009. Our interest is to identify the statistically independent dominant factors and to investigate the cross-dependence of the sovereign defaults among the economies.

We implement the proposed PIF method. Table 10.6 summarizes the sparse structure of the loading matrices over the 5 time periods. Each economy column reports the number of non-zero elements in the corresponding column of the loading matrix, i.e. the number of factors in which the economy participates. The total number of non-zero elements in the loading matrix is summarized in the column Total. Sparsity is reflected by the percentage of zero elements in the loading matrix. The number of factors in which the US participates shows a V-shape over time, possibly driven by the cyclical pattern of the global economy. Five advanced economies, Japan, Germany, Italy, Spain and the UK, display a relatively stable low-sparsity structure across the whole time span. China and Hong Kong exhibit co-movement, indicating the connection between the two economies, though Hong Kong, given its higher level of globalization, appears in more factors than China across all periods. The emerging economies of China, India and Indonesia show a steady increase in the number of factors they participate in, along with their increasing connection to the global economy, especially through their fast growing export business.

Fig. 10.3 Loading matrix: Apr 1999–Feb 2001

Fig. 10.4 Loading matrix: Mar 2001–Nov 2001 (Dot Com bubble)

Fig. 10.5 Loading matrix: Dec 2001–Nov 2007

Fig. 10.6 Loading matrix: Dec 2007–Jun 2009 (Sub Prime crisis)

Fig. 10.7 Loading matrix: Jul 2009–Dec 2013

Fig. 10.8 Histogram of the average number of factors participated in by emerging and advanced economies across the different periods and overall

Figures 10.3, 10.4, 10.5, 10.6 and 10.7 provide details of the estimated loading matrices over the five time periods. In each plot, we display the loadings of each independent factor with respect to the economies; zero elements are colored in white. The loading matrix is interpretable. In the pre-Dot Com bubble period, the advanced economies including Japan, Germany, Ireland, Spain and the UK participate in the largest number of factors, while the emerging economies such as China, Indonesia, Russia and Brazil are only related to a few factors. China, for example, participates in only one factor and, moreover, is the only economy loading on that factor, implying the closedness of China's market in that early period. During 1999 to 2001, most defaults in China occurred due to the reform of the state-owned enterprises, which were little affected by the global economy. On the contrary, Japan participates in more than 10 factors, implying its close connection to the global financial market. More recently, the sparsity gap between the advanced and emerging economies decreases from period to period, see Fig. 10.8.

6 Conclusion

We propose the PIF method to transform observed multivariate correlated variables into independent factors with a sparse loading matrix. We derive the consistency and the convergence rate of the sparse loading matrix estimator. Based on the NIG distributional assumption, the estimation is carried out with a two-step maximum likelihood algorithm that iterates between NIG parameter updates and sparse loading matrix estimation; the optimal penalty parameter is chosen by minimizing the BIC. We compare the performance of PIF with two alternatives, ICA and NIG-ICA, in simulations. The results show that the proposed PIF performs well compared with the conventional ICA and NIG-ICA in both loading matrix estimation and factor recovery. The estimation accuracy is much improved by imposing the NIG distribution, and is further improved by the SCAD penalty when the loading matrix is sparse. Moreover, the optimal penalty parameter is reasonably selected by minimizing the BIC, and the good performance of PIF is stable across different levels of sparsity and dimensionality of the loading matrix. We implement the PIF on sovereign default probabilities using the CVI data maintained by the Credit Research Initiative, Risk Management Institute, National University of Singapore. The estimated loading matrix displays a significantly sparse structure; for example, in the pre-Dot Com bubble period China participates in only one factor and is its only participant, implying the independence of China's closed market from the global economy. The proposed method can be easily applied to other high-dimensional data.