1 Introduction

Extreme learning machine (ELM) has attracted much attention because of its excellent training speed and prediction accuracy [14]. To improve the generalization ability of ELM, a variety of ELM variants have been proposed, such as RELM [2], WRELM [5], the summation wavelet ELM [6], and the optimally pruned extreme learning machine (OP-ELM) [7]. OP-ELM employs least angle regression and leave-one-out validation to achieve enhanced robustness. The Tikhonov-regularized OP-ELM (TROP-ELM) [9] is a more recent method in which an \(\ell _1\) penalty, an \(\ell _2\) penalty, and a variable ranking method named multiresponse sparse regression [8] are applied to the hidden layer and the regression weights, respectively. Deng [10] studied a supervised kernel method, referred to as the reduced kernel ELM, which randomly selects a subset of the available data samples as support vectors. ELM is also a useful tool for large-scale data analysis, especially for dimensionality reduction of high-dimensional data [11, 12]. As far as we know, all ELM regression problems are solved via the Moore–Penrose (M–P) inverse of matrices. No tensor-based ELM exists so far, even though the use of type-2 fuzzy sets in the hidden layer mapping leads naturally to a tensor regression problem, and little work has been done on ELM with a type-2 fuzzy hidden layer mapping.

Modern networks (such as internet traffic, telecommunication records, mobile location and large-scale social networks) generate massive amounts of data that present multiple aspects and high dimensionality. Such data are cumbersome in matrix form but natural as multi-way arrays, or tensors, and ELM and its variants are not suited to high-dimensional tensor data. Consequently, tensor decompositions such as the Tucker decomposition have become important tools for tensor summarization and analysis. One major challenge of tensor decomposition is how to handle high-dimensional or sparse data in which most of the entries are zero. To address the memory overflow problem, Kolda and Sun [13] proposed a memory-efficient Tucker method that adaptively selects the right execution strategy during the decomposition. Kolda also presented an overview of higher-order tensor decompositions, their applications and well-known software [14]. The incomplete tensor completion and factorization method called CANDECOMP/PARAFAC (CP) weighted optimization [15] is equivalent to solving a weighted least squares problem. CP expresses a tensor as a sum of rank-one component tensors; however, computing a CP decomposition by alternating least squares can be difficult, and a gradient-based optimization method has been used instead [16]. ParCube, a highly parallelizable method for speeding up tensor decomposition that is well suited to producing sparse approximations in data mining applications, scales to truly large data sets [17]. In [18], a sparse tensor decomposition algorithm that incorporates sparsity into the estimation of the decomposition components is proposed, providing a general way to solve high-dimensional latent variable models. An alternating direction method of multipliers-based method [19] was developed to solve the tensor completion problem and its low-rank approximation problem. A new algorithm [20] finds the generalized singular value decomposition (SVD) of two matrices with the same number of columns while suppressing the sensitivity to errors in the matrix entries. In [21], a highly efficient and scalable estimation algorithm for a class of spectrally regularized matrix regressions is developed, and model selection along the regularization path is performed with a degrees-of-freedom formula. In [22], an algorithm called alternating SVD is proposed for computing a best rank-one approximation of tensors, and its convergence is guaranteed.

The tensor regression problem is relatively new compared with the matrix regression problem, and only a few papers address it. A tensor decomposition based on the fact that the tensor group endowed with the Einstein product is isomorphic to the general linear group of degree n is introduced in [23]; it is related to the well-known canonical polyadic decomposition and the tensor SVD, and the associated multilinear systems are solvable when the tensor has an odd number of modes or distinct dimensions in each mode. The M–P inverse of tensors with the Einstein product is first proposed in [24], where the general solution or the minimum-norm least squares solution of several multilinear systems is given in terms of this inverse. In [25], a method to compute the M–P inverse of tensors is proposed, reverse-order laws for several generalized inverses of tensors are presented, and general solutions of multilinear tensor systems using the generalized M–P inverse with the Einstein product are discussed. In [26], the fundamental theorem of linear algebra is extended from matrix space to tensor space, and the relationship between the minimum-norm least squares solution of a multilinear system and the weighted M–P inverse of its coefficient tensor is studied.

Inspired by the topics above, in this paper we extend ELM in three respects: (1) triangular type-2 fuzzy sets are used to model uncertainty; (2) a tensor structure is adopted to construct the ELM; (3) only type-2 fuzzy membership functions are needed in the tensor-based type-2 ELM (TT2-ELM): the learning weights are obtained by solving a tensor regression problem in which the classical M–P inverse of a matrix is replaced by the M–P inverse of a tensor, so the type-reduction step of type-2 fuzzy sets is avoided. As far as we know, there exists no work on an extreme learning machine scheme that uses tensors to train the model, let alone on a type-2 fuzzy set-based ELM.

This paper is organized as follows. The basic concepts of type-2 fuzzy sets are described in Sect. 2. In Sect. 3, a tensor-based type-2 ELM is proposed, and proof of the tensor regression for TT2-ELM is presented at the end of Sect. 3. Section 4 provides three types of examples to illustrate the utilization of the proposed algorithm. Finally, conclusions and future work are given in Sect. 5.

2 Triangular type-2 fuzzy sets

A type-2 fuzzy set is an extension of an ordinary fuzzy set in which the membership grades are themselves type-1 fuzzy sets. For a type-2 fuzzy set \({\tilde{A}}\), the type-2 membership function \(\mu _{\tilde{A}}(x,u)\) with \(x\in X\) and \(u\in J_{x}\subseteq [0,1]\) is described as \(\tilde{A}=\{((x,u),\mu _{\tilde{A}}(x,u))|\forall x\in X, \forall u\in J_{x}\subseteq [0,1]\}\) [27], where \(0\le \mu _{\tilde{A}}(x,u)\le 1\). An equivalent representation of the type-2 fuzzy set \(\tilde{A}\) is

$$\begin{aligned} \tilde{A}=\int _{x\in X}\int _{u\in J_{x}}\mu _{\tilde{A}}(x,u)/(x,u),\quad J_{x}\subseteq [0,1], \end{aligned}$$
(1)

where \(\int \int\) denotes union over all admissible x and u, and \(J_{x}\) is the primary membership of x (usually a closed interval of real numbers contained in [0, 1]).

For triangular type-2 fuzzy sets, let the discrete partition of the primary domain X be denoted by \(\{x_1,x_2,\ldots , x_n\}\), and let \(f_k\) be the secondary membership function of \(x_k\), given by

$$\begin{aligned} f_{k}(u)=\max \left\{ 0,\min \left\{ \frac{u-\underline{\mu }_{k}}{\hat{\mu }_{k}-\underline{\mu }_{k}},\frac{u-\overline{\mu }_{k}}{\hat{\mu }_{k}-\overline{\mu }_{k}}\right\} \right\} , \end{aligned}$$
(2)

where \(\bar{\mu }_k> \hat{\mu }_k> \underline{\mu }_k\) are the upper, principal and lower membership grades for \(k=1,2,\ldots ,n\), respectively.
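
For concreteness, the following short sketch evaluates the secondary membership function of Eq. (2) for one primary point \(x_k\); the function and variable names are ours and are only illustrative.

```python
import numpy as np

def triangular_secondary_mf(u, mu_lower, mu_hat, mu_upper):
    """Secondary membership f_k(u) of Eq. (2) for one primary point x_k.

    mu_lower < mu_hat < mu_upper are the lower, principal and upper
    membership grades (illustrative names, not from any reference code).
    """
    left = (u - mu_lower) / (mu_hat - mu_lower)
    right = (u - mu_upper) / (mu_hat - mu_upper)
    return np.maximum(0.0, np.minimum(left, right))

# Example with grades 0.2 < 0.5 < 0.7; the peak value 1 is attained at u = 0.5.
u = np.linspace(0.0, 1.0, 5)
print(triangular_secondary_mf(u, 0.2, 0.5, 0.7))
```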

For a given training data set \(\{D_t\}_{t=1}^{N}\), where \(D_t=({\varvec{x}}_t,y_t)\), \({\varvec{x}}_t=(x_{t1},x_{t2},\ldots ,x_{tK})\in {\mathbb {R}}^{K}\) and \(y_t\in {\mathbb {R}}\), a lower membership function tensor \(\underline{{\varPhi }}\in {\mathbb {R}}^{N\times 2\times L \times 1}\) can be constructed to approximate the relationship between the input \({\varvec{x}}_t\) and the desired output \(y_t\) via the lower membership functions,

$$\begin{aligned} \begin{array}{cc} \underline{{\varPhi }}_{:,:,1,1} =\begin{bmatrix} \underline{\mu }({\varvec{w}}_{11} {\varvec{x}}_1+b_{11})&{}\underline{\mu }({\varvec{w}}_{12} {\varvec{x}}_1+b_{12})\\ \vdots &{}\vdots \\ \underline{\mu }({\varvec{w}}_{11} {\varvec{x}}_N+b_{11})&{}\underline{\mu }({\varvec{w}}_{12} {\varvec{x}}_N+b_{12}) \end{bmatrix},\\ \vdots \\ \underline{{\varPhi }}_{:,:,L,1} =\begin{bmatrix} \underline{\mu }({\varvec{w}}_{L1} {\varvec{x}}_1+b_{L1})&{}\underline{\mu }({\varvec{w}}_{L2} {\varvec{x}}_1+b_{L2})\\ \vdots &{}\vdots \\ \underline{\mu }({\varvec{w}}_{L1} {\varvec{x}}_N+b_{L1})&{}\underline{\mu }({\varvec{w}}_{L2} {\varvec{x}}_N+b_{L2}) \end{bmatrix}, \end{array} \end{aligned}$$

where \(b_{il}\) and \({\varvec{w}}_{il}=[w_{i1},w_{i2},\ldots ,w_{iK}]\) (\(i=1, 2, \ldots , L\), \(l=1,2\)) are randomly generated biases and input weights, respectively. Similarly, the principal and upper membership function tensors \(\hat{{\varPhi }}, \bar{{\varPhi }}\in {\mathbb {R}}^{N\times 2\times L \times 1}\) are

$$\begin{aligned} \begin{array}{cc} \hat{{\varPhi }}_{:,:,1,2} =\begin{bmatrix} \hat{\mu }({\varvec{w}}_{11} {\varvec{x}}_1+b_{11})&{}\hat{\mu }({\varvec{w}}_{12} {\varvec{x}}_1+b_{12})\\ \vdots &{}\vdots \\ \hat{\mu }({\varvec{w}}_{11} {\varvec{x}}_N+b_{11})&{}\hat{\mu }({\varvec{w}}_{12} {\varvec{x}}_N+b_{12}) \end{bmatrix},\\ \vdots \\ \hat{{\varPhi }}_{:,:,L,2} =\begin{bmatrix} \hat{\mu }({\varvec{w}}_{L1} {\varvec{x}}_1+b_{L1})&{}\hat{\mu }({\varvec{w}}_{L2} {\varvec{x}}_1+b_{L2})\\ \vdots &{}\vdots \\ \hat{\mu }({\varvec{w}}_{L1} {\varvec{x}}_N+b_{L1})&{}\hat{\mu }({\varvec{w}}_{L2} {\varvec{x}}_N+b_{L2}) \end{bmatrix}, \end{array} \end{aligned}$$
$$\begin{aligned} \begin{array}{cc} \bar{{\varPhi }}_{:,:,1,3} =\begin{bmatrix} \bar{\mu }({\varvec{w}}_{11} {\varvec{x}}_1+b_{11})&{}\bar{\mu }({\varvec{w}}_{12} {\varvec{x}}_1+b_{12})\\ \vdots &{}\vdots \\ \bar{\mu }({\varvec{w}}_{11} {\varvec{x}}_N+b_{11})&{}\bar{\mu }({\varvec{w}}_{12} {\varvec{x}}_N+b_{12}) \end{bmatrix},\\ \vdots \\ \bar{{\varPhi }}_{:,:,L,3} =\begin{bmatrix} \bar{\mu }({\varvec{w}}_{L1} {\varvec{x}}_1+b_{L1})&{}\bar{\mu }({\varvec{w}}_{L2} {\varvec{x}}_1+b_{L2})\\ \vdots &{}\vdots \\ \bar{\mu }({\varvec{w}}_{L1} {\varvec{x}}_N+b_{L1})&{}\bar{\mu }({\varvec{w}}_{L2} {\varvec{x}}_N+b_{L2}) \end{bmatrix}. \end{array} \end{aligned}$$

A fourth-order tensor \(\Upphi \in {\mathbb {R}}^{N\times 2\times L \times 3}\) is then constructed by stacking \(\underline{{\varPhi }}_{:,:,:,1}, \hat{{\varPhi }}_{:,:,:,2}\) and \(\bar{{\varPhi }}_{:,:,:,3}\) along the fourth mode. In the tensor regression section that follows, we denote the tensor \(\Upphi\) by \({\mathcal {A}}\).
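
The construction above can be sketched in a few lines of numpy. The paper does not prescribe a particular principal membership function, so the sigmoid activation below is an illustrative assumption; the two offsets 0.01 and 0.05 are the values used later in Sect. 4.

```python
import numpy as np

def build_membership_tensor(X, L, lower_offset=0.01, upper_offset=0.05, seed=0):
    """Assemble the 4-way tensor Phi in R^{N x 2 x L x 3} described above.

    Sketch under stated assumptions: the principal membership is a sigmoid of
    the random affine map w_{il} x + b_{il}, and the lower/upper grades are
    obtained with the two offsets of Sect. 4; the activation choice is ours.
    """
    rng = np.random.default_rng(seed)
    N, K = X.shape
    W = rng.standard_normal((L, 2, K))          # random input weights w_{il}
    b = rng.standard_normal((L, 2))             # random biases b_{il}

    Z = np.einsum('nk,lck->ncl', X, W) + b.T[None, :, :]    # shape (N, 2, L)
    principal = 1.0 / (1.0 + np.exp(-Z))                     # hat{mu}
    lower = np.clip(principal - lower_offset, 0.0, 1.0)      # underline{mu}
    upper = np.clip(principal + upper_offset, 0.0, 1.0)      # bar{mu}

    return np.stack([lower, principal, upper], axis=-1)      # (N, 2, L, 3)
```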

Fig. 1 Tensor-based type-2 ELM

3 Tensor-based type-2 ELM (TT2-ELM)

In this section, we establish the basic structure of TT2-ELM. First, tensor operations and the inverse of a tensor are introduced. In [24], a generalized M–P inverse of an even-order tensor is introduced, which is formulated as follows.

Definition 1

Let \({\mathcal {A}} \in {\mathbb {R}} ^{I_{1 \ldots N}\times K_{1 \ldots N}}\). The tensor \({\mathcal {X}} \in {\mathbb {R}} ^{K_{1\ldots N} \times I_{1\ldots N}}\) is called the M–P inverse of \({\mathcal {A}}\), denoted by \({\mathcal {A}}^+\), if it satisfies the following four tensor equations:

  1. \({\mathcal {A}} *_N {\mathcal {X}} *_N {\mathcal {A}} = {\mathcal {A}}\);

  2. \({\mathcal {X}} *_N {\mathcal {A}} *_N {\mathcal {X}} = {\mathcal {X}}\);

  3. \(({\mathcal {A}} *_N {\mathcal {X}})^* = {\mathcal {A}} *_N {\mathcal {X}}\);

  4. \(({\mathcal {X}} *_N {\mathcal {A}})^* = {\mathcal {X}} *_N {\mathcal {A}}\),

where \(I_{1 \ldots N}=I_1\times \cdots \times I_N\), \(J_{1 \ldots N}=J_1\times \cdots \times J_N\), \(K_{1 \ldots N}=K_1\times \cdots \times K_N\), \(*_N\) denotes the Einstein product of tensors, and \({\mathcal {A}}\, *_N\, {\mathcal {X}}\) is defined as

$$\begin{aligned} ({\mathcal {A}}\, *_N\, {\mathcal {X}} )_{i_{1\ldots N}j_{1\ldots N}}=\sum _{k_{1\ldots N}}a_{i_{1\ldots N}k_{1\ldots N}}x_{k_{1\ldots N}j_{1\ldots N}}. \end{aligned}$$
(3)
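
The Einstein product of Eq. (3) is simply a contraction of the last N modes of the first tensor with the first N modes of the second. A minimal numpy sketch (our own helper, not code from [24]):

```python
import numpy as np

def einstein_product(A, X, N=2):
    """Einstein product A *_N X of Eq. (3): contract the last N modes of A
    with the first N modes of X."""
    return np.tensordot(A, X, axes=N)

# Example with N = 2: A in R^{3x4 x 2x5}, X in R^{2x5 x 3x4}
A = np.random.rand(3, 4, 2, 5)
X = np.random.rand(2, 5, 3, 4)
print(einstein_product(A, X).shape)   # (3, 4, 3, 4)
```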

For an even-order tensor \({\mathcal {A}} \in {\mathbb {R}} ^{I_{1 \ldots N}\times I_{1 \ldots N}}\), Brazell et al. [28] showed that the inverse of \({\mathcal {A}}\), denoted by \({\mathcal {A}}^{-1}\), exists if there is a tensor \({\mathcal {X}}\in {\mathbb {R}} ^{I_{1 \ldots N}\times I_{1 \ldots N}}\) such that \({\mathcal {A}}\, *_N\, {\mathcal {X}} = {\mathcal {X}} *_N {\mathcal {A}} = {\mathcal {I}}\). Clearly, if \({\mathcal {A}}\) is invertible, then \({\mathcal {A}}^{(i)} = {\mathcal {A}}^+ ={\mathcal {A}}^{-1}\), where \({\mathcal {A}}^{(i)}\) denotes a tensor satisfying the ith condition of Definition 1. For a tensor \({\mathcal {S}} \in {\mathbb {R}} ^{I_{1 \ldots N}\times J_{1 \ldots N}}\), it follows from Definition 1 that \({\mathcal {S}}^+ \in {\mathbb {R}} ^{J_{1\ldots N}\times I_{1\ldots N} }\) and

$$\begin{aligned} ({\mathcal {S}}^+)_{J_{1\ldots N}\times I_{1\ldots N} } = ({\mathcal {S}})^+_{I_{1\ldots N}\times J_{1\ldots N}}, \end{aligned}$$
(4)

where

$$\begin{aligned}&s^+=\left\{ \begin{array}{ll} s^{-1},&{}\quad {\rm if}\ (i_1,\ldots , i_N) = ( j_1,\ldots , j_N)\ {\rm and}\ s\ne 0,\\ 0,&{}\quad {\rm otherwise}, \end{array} \right. \end{aligned}$$

and \(s=({\mathcal {S}})_{i_{1\ldots N} j_{1\ldots N}}\). It is easy to see that \({\mathcal {S}} *_N {\mathcal {S}}^+\) and \({\mathcal {S}}^+ *_N {\mathcal {S}}\) are diagonal tensors whose diagonal entries are 1 or 0.

For a tensor \({\mathcal {A}} \in {\mathbb {R}} ^{J_{1 \ldots N}\times I_{1 \ldots N}}\), if \({\mathcal {A}} *_N {\mathcal {A}}^* = {\mathcal {A}}^*\, *_N \,{\mathcal {A}} ={\mathcal {I}}\), then \({\mathcal {A}}\) is called an orthogonal tensor. The SVD of a tensor \({\mathcal {A}}\) has the form \({\mathcal {A}} = {\mathcal {U}}\, *_N \,{\mathcal {B}} *_N {\mathcal {V}}^*\), where \({\mathcal {U}} \in {\mathbb {R}} ^{I_{1 \ldots N}\times I_{1 \ldots N}}\) and \({\mathcal {V}} \in {\mathbb {R}} ^{J_{1 \ldots N}\times J_{1 \ldots N}}\) are orthogonal tensors, and \({\mathcal {B}} \in {\mathbb {R}} ^{J_{1 \ldots N}\times I_{1 \ldots N}}\) satisfies Eq. (4). For the ‘square’ case, that is, \(I_k=J_k\), \(k=1,2,\ldots ,N\), the existence of the inverse of \({\mathcal {A}}\) is established through the isomorphism \(\mathscr {L}:{\mathbb {T}}_{I_{1 \ldots N} \times I_{1 \ldots N}}({\mathbb {R}}) \rightarrow {\mathbb {M}}_{\left( \prod _{i =1}^N I_i\right) \times \left( \prod _{i =1}^N I_i\right) }({\mathbb {R}})\), where the tensor group \({\mathbb {T}}_{I_{1 \ldots N} \times I_{1 \ldots N}}({\mathbb {R}})\) is equipped with the Einstein product and the matrix group \({\mathbb {M}}_{\left( \prod _{i =1}^N I_i\right) \times \left( \prod _{i =1}^N I_i\right) }({\mathbb {R}})\) with the usual matrix product [28]. The M–P inverse of \({\mathcal {A}} \in {\mathbb {R}} ^{I_{1\ldots N}\times J_{1\ldots N}}\) exists and is unique [24], and \({\mathcal {A}}^+ = {\mathcal {V}} *_N{\mathcal {S}}^{+} *_N {\mathcal {U}}^*\) (pseudo code for the M–P inverse of an even-order tensor \({\mathcal {A}}\) is listed in Algorithm 1).

Algorithm 1 Pseudo code of the M–P inverse of an even-order tensor \({\mathcal {A}}\)
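
Algorithm 1 computes \({\mathcal {A}}^+\) via the tensor SVD. Since the M–P inverse is unique, the sketch below takes an equivalent route through the isomorphism of [28]: unfold the even-order tensor into a matrix, take the matrix pseudoinverse, and fold back. The helper name and shapes are our own, not the paper's code.

```python
import numpy as np

def tensor_pinv(A, N):
    """Moore-Penrose inverse of A in R^{I_1x...xI_N x J_1x...xJ_N}.

    Sketch: flatten A to a (prod I_i) x (prod J_i) matrix via the isomorphism
    of [28], take the matrix pseudoinverse, and fold back to the space
    R^{J_1x...xJ_N x I_1x...xI_N}.
    """
    I_dims, J_dims = A.shape[:N], A.shape[N:]
    A_mat = A.reshape(np.prod(I_dims), np.prod(J_dims))
    return np.linalg.pinv(A_mat).reshape(*J_dims, *I_dims)

# Sanity check of the first Penrose equation A *_N A^+ *_N A = A
A = np.random.rand(3, 2, 4, 2)            # N = 2, I = (3, 2), J = (4, 2)
Ap = tensor_pinv(A, 2)
lhs = np.tensordot(np.tensordot(A, Ap, axes=2), A, axes=2)
print(np.allclose(lhs, A))                # True
```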

TT2-ELM is a single-hidden-layer feedforward neural network whose structure is shown in Fig. 1. For a given training data set \(\{D_t\}_{t=1}^{N}\), where \(D_t=({\varvec{x}}_t,y_t)\), \({\varvec{x}}_t=(x_{t1},x_{t2},\ldots ,x_{tK})\in {\mathbb {R}}^{K}\) and \(y_t\in {\mathbb {R}}\), TT2-ELM is constructed to approximate the relationship between the input \({\varvec{x}}_t\) and the desired output \(y_t\). For N training patterns, the TT2-ELM model can be formulated in the compact form

$$\begin{aligned} {\mathcal {A}} *_N {\mathcal {X}}={\mathcal {Y}}, \end{aligned}$$
(5)

where \({\mathcal {X}}\in {\mathbb {R}}^{J_{1 \ldots 2}}\), \({\mathcal {Y}}\in {\mathbb {R}}^{I_{1\ldots 2} }\), and \({\mathcal {A}}\in {\mathbb {R}}^{I_{1\ldots 2}\times J_{1\ldots 2}}\) is obtained by reshaping \(\Upphi\), which has dimension \(N\times 2\times L\times 3\). The output weight \({\mathcal {X}}\) is found by solving the linear tensor Eq. (5).

Remark 1

The regression tensor has dimension \(N\times 2\times L\times 3\), so by the Einstein product we deduce that \({\mathcal {Y}}\in {\mathbb {R}}^{N\times 2}\). When the target is a single vector, it is copied once to meet this dimension requirement.
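
A minimal end-to-end sketch of the training step of Eq. (5): the target vector is copied once as described in Remark 1, the output weights are obtained through the unfolded pseudoinverse (equivalent to \({\mathcal {A}}^+ *_N {\mathcal {Y}}\) under the isomorphism of [28]), and the two predicted columns are averaged back to a single output. The averaging step and all names are our assumptions, not the paper's reference implementation.

```python
import numpy as np

def tt2_elm_fit_predict(Phi_train, y_train, Phi_test):
    """Solve Eq. (5) for the output weights X in R^{L x 3} and predict.

    Sketch: the regression tensor A = Phi_train in R^{N x 2 x L x 3} is
    unfolded to a (2N) x (3L) matrix, the weights are computed with the
    matrix pseudoinverse, and the two predicted columns are averaged.
    """
    N, _, L, _ = Phi_train.shape
    Y = np.column_stack([y_train, y_train])            # copy y once (Remark 1)
    A_mat = Phi_train.reshape(2 * N, 3 * L)            # unfold A
    X = np.linalg.pinv(A_mat) @ Y.reshape(2 * N)       # weights, length 3L
    X = X.reshape(L, 3)
    Y_hat = np.tensordot(Phi_test, X, axes=2)          # A *_N X, shape (N_test, 2)
    return Y_hat.mean(axis=1)
```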

Table 1 Mathematical models and its design parameters
Table 2 Comparison results for the one-input Sinc function (12)

In the following, the multilinear system Eq. (5) is generalized as follows

$$\begin{aligned} {\mathcal {A}} *_N {\mathcal {X}} ={\mathcal {Y}}, \end{aligned}$$
(6)

where \({\mathcal {A}} \in {\mathbb {R}}^{I_{1 \ldots N} \times J_{1 \ldots N}}\), \({\mathcal {X}} \in {\mathbb {R}}^{J_{1 \ldots N}}\) and \({\mathcal {Y}} \in {\mathbb {R}}^{I_{1\ldots N}}\). Obviously, (5) is the special case of (6) with \(N=2\). The following theorem holds for the multilinear system Eq. (6).

Theorem 1

The multilinear system (6) is solvable if and only if \({\mathcal {X}}={\mathcal {A}}^{(1)}*_N {\mathcal {Y}}\) is a solution of (6), where \({\mathcal {A}}^{(1)}\) is any solution of \({\mathcal {A}} *_N {\mathcal {X}}*_N {\mathcal {A}}={\mathcal {A}}\). If no \({\mathcal {X}}\in {\mathbb {R}}^{J_{1 \ldots N}}\) satisfies Eq. (6), then the unsolvable multilinear system (6) admits the minimum-norm least squares solution \({\mathcal {X}} = {\mathcal {A}}^+ *_N {\mathcal {Y}}\), where the norm is the Frobenius norm

$$\begin{aligned} \Vert \cdot \Vert _F=\sqrt{\sum _{i_{1\ldots N}j_{1\ldots N}} |a_{i_{1\ldots N}j_{1\ldots N}}|^2}. \end{aligned}$$

Proof

The proof is based on kernel theory and the norm theory of tensors, and it differs slightly from Theorem 4.1(1) in [24]; this formulation makes it easier to see how the tensor linear equation is incorporated into the extreme learning design framework. Since \({\mathcal {A}} *_N ({\mathcal {A}}^{(1)} *_N {\mathcal {A}}) *_N {\mathcal {Z}}={\mathcal {A}} *_N {\mathcal {Z}}\), where \({\mathcal {Z}}\) is an arbitrary tensor of suitable order, we know that \(({\mathcal {I}} -{\mathcal {A}}^{(1)} *_N {\mathcal {A}}) *_N {\mathcal {Z}}\) satisfies \({\mathcal {A}}*_N ({\mathcal {I}} -{\mathcal {A}}^{(1)} *_N {\mathcal {A}}) *_N {\mathcal {Z}}=0\). This means that \(({\mathcal {I}} - {\mathcal {A}}^{(1)} *_N {\mathcal {A}}) *_N {\mathcal {Z}}\) belongs to the null space of \({\mathcal {A}}*_N {\mathcal {X}}=0\). Hence, when (6) is solvable, the general solution is

$$\begin{aligned} {\mathcal {X}} ={\mathcal {A}}^{(1)} *_N {\mathcal {Y}} + ({\mathcal {I}} - {\mathcal {A}}^{(1)} *_N {\mathcal {A}}) *_N {\mathcal {Z}}, \end{aligned}$$
(7)

where \({\mathcal {A}}^{(1)}\) is a 1-inverse of \({\mathcal {A}}\). The general solution consists of two parts: a particular solution of Eq. (6) and an element of the null space of \({\mathcal {A}}*_N {\mathcal {X}}=0\).

If the multilinear system (6) is unsolvable, a minimum-norm least squares solution exists. Let \({\mathcal {E}}({\mathcal {X}})=\Vert {\mathcal {A}} *_N {\mathcal {X}} -{\mathcal {Y}}\Vert _F^2\), \({\mathcal {E}}_1({\mathcal {X}})=\mathrm{trace}(({\mathcal {A}} *_N {\mathcal {X}})^T *_N ({\mathcal {A}} *_N {\mathcal {X}}))\), \({\mathcal {E}}_2({\mathcal {X}})=-2\,\mathrm{trace}({\mathcal {Y}}^T *_N ({\mathcal {A}} *_N {\mathcal {X}}))\), and \({\mathcal {E}}_3({\mathcal {X}})=\mathrm{trace}({\mathcal {Y}}^T *_N {\mathcal {Y}})\); then \({\mathcal {E}}({\mathcal {X}})={\mathcal {E}}_1({\mathcal {X}})+{\mathcal {E}}_2({\mathcal {X}})+{\mathcal {E}}_3({\mathcal {X}})\). After some tensor differential operations, we have

$$\begin{aligned} \frac{\partial {\mathcal {E}}_1}{\partial {\mathcal {X}}}=2 ({\mathcal {A}}^T *_N {\mathcal {A}})*_N {\mathcal {X}},\, \frac{\partial {\mathcal {E}}_2}{\partial {\mathcal {X}}}= -2 {\mathcal {A}}^T *_N {\mathcal {Y}},\, \frac{\partial {\mathcal {E}}_3}{\partial {\mathcal {X}}}=0. \end{aligned}$$
(8)

Notice that a least squares solution of Eq. (6) is a solution of the following tensor optimization problem

$$\begin{aligned} \min _{{\mathcal {X}}}\,{\mathcal {E}}({\mathcal {X}}). \end{aligned}$$
(9)

The necessary condition for (9) is \(\frac{\partial {\mathcal {E}}_1}{\partial {\mathcal {X}}}+\frac{\partial {\mathcal {E}}_2}{\partial {\mathcal {X}}}=2(({\mathcal {A}}^T *_N {\mathcal {A}})*_N {\mathcal {X}}-{\mathcal {A}}^T *_N {\mathcal {Y}})=0\), that is, the tensor equation

$$\begin{aligned} ({\mathcal {A}}^T *_N {\mathcal {A}})*_N {\mathcal {X}}={\mathcal {A}}^T *_N {\mathcal {Y}}. \end{aligned}$$
(10)

The coefficient tensor \({\mathcal {A}}^T *_N {\mathcal {A}}\) on the left-hand side of Eq. (10) is a ‘square’ tensor, and Eq. (10), being the normal equation of (9), is always solvable because \({\mathcal {A}}^T *_N {\mathcal {Y}}\) lies in the range of \({\mathcal {A}}^T *_N {\mathcal {A}}\). Using Eq. (7), we have

$$\begin{aligned} {\mathcal {X}} &=({\mathcal {A}}^T *_N {\mathcal {A}})^{(1)} *_N {\mathcal {A}}^T *_N {\mathcal {Y}} \\&\quad+({\mathcal {I}} -({\mathcal {A}}^T *_N {\mathcal {A}})^{(1)} *_N ({\mathcal {A}}^T *_N {\mathcal {A}})) *_N {\mathcal {Z}}. \end{aligned}$$
(11)

Taking \({\mathcal {Z}}=0\) gives the minimizer \({\mathcal {X}}\). By Corollary 2.14(1) of [25], \({\mathcal {A}}={\mathcal {A}}*_N ({\mathcal {A}}^T *_N {\mathcal {A}})^{(1)} *_N ({\mathcal {A}}^T *_N {\mathcal {A}})\), and there exists a 1-inverse \(({\mathcal {A}}^T *_N {\mathcal {A}})^{(1)}\) such that \({\mathcal {A}}^+=({\mathcal {A}}^T *_N {\mathcal {A}})^{(1)} *_N {\mathcal {A}}^T\). The minimum-norm least squares solution is therefore \({\mathcal {X}} = {\mathcal {A}}^+ *_N {\mathcal {Y}}\). \(\square\)
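
The conclusion of Theorem 1 can be checked numerically on a small random instance; the sketch below verifies that the residual at \({\mathcal {X}} = {\mathcal {A}}^+ *_N {\mathcal {Y}}\) is no larger than at perturbed candidates. The sizes and seeds are arbitrary choices of ours.

```python
import numpy as np

# Numerical check of Theorem 1 (sketch): the minimum-norm least squares
# solution X = A^+ *_N Y should give a residual no larger than any perturbed
# candidate.  A^+ is computed by unfolding, as in the sketch after Algorithm 1.
rng = np.random.default_rng(1)
A = rng.standard_normal((4, 3, 2, 5))          # N = 2, I = (4, 3), J = (2, 5)
Y = rng.standard_normal((4, 3))

A_pinv = np.linalg.pinv(A.reshape(12, 10)).reshape(2, 5, 4, 3)
X_star = np.tensordot(A_pinv, Y, axes=2)                        # A^+ *_N Y
res_star = np.linalg.norm(np.tensordot(A, X_star, axes=2) - Y)  # ||A *_N X - Y||_F

for _ in range(5):
    X_pert = X_star + 0.1 * rng.standard_normal(X_star.shape)
    res_pert = np.linalg.norm(np.tensordot(A, X_pert, axes=2) - Y)
    assert res_star <= res_pert + 1e-12
print("residual at the minimum-norm solution:", res_star)
```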

Remark 2

The tensor regression is necessary for the type-2 fuzzy set-based extreme learning scheme: the tensor structure incorporates the information of the secondary membership function directly, so the type-reduction step, which is the most important step in type-2 fuzzy inference, is avoided, and the type-2 fuzzy structure is seamlessly incorporated into the ELM scheme.

Remark 3

The second dimension of the regression tensor \({\mathcal {A}}\in {\mathbb {R}}^{N\times 2\times L\times 3}\) equals 2, so TT2-ELM needs two randomly generated weight vectors; either of them can serve as the randomly generated weight vector used by RELM and WRELM.

Table 3 Comparison results for the two-input Sinc function (13)
Fig. 2 Nonlinear system output. a Randomly generated control effort \(u(k) \in [-\,2, 2]\) and b specified control effort \(u(k) = \sin (2\pi k/25)\)

4 Performance comparison

In this section, three types of examples are used to compare the algorithms' performance. Two function approximation problems are presented in Sect. 4.1. A nonlinear system identification problem with three input variables is provided in Sect. 4.2. In Sect. 4.3, four real-world regression problems whose numbers of input variables range from 2 to 61 are given. Parameter sensitivity and stability analyses with respect to the iteration number are carried out at the end of this section.

In this section, ELM, RELM and WRELM are compared with TT2-ELM. RELM and WRELM impose a constraint on the minimum norm of the weight factor, while TT2-ELM's mathematical model only constrains the minimum norm of the error. The mathematical models of the four algorithms, that is, ELM, RELM, WRELM and TT2-ELM, are listed in Table 1. RELM and WRELM have two design parameters, L and C, while ELM and TT2-ELM have only one design parameter, the hidden neuron number L. All reported results are averaged over 1000 runs. Triangular type-2 fuzzy sets are used in the TT2-ELM design: the principal membership grade is randomly generated, and the lower and upper membership grades are obtained with two offsets, 0.01 and 0.05, respectively.

4.1 Modelling of Sinc functions

Consider one-input Sinc function

$$\begin{aligned} z =\frac{\sin x}{x},\quad x\in [-\,10, 10]. \end{aligned}$$
(12)

Table 2 shows the comparison results among ELM, RELM, WRELM and TT2-ELM. All training data are randomly generated from the interval \([-\,10,10]\), while the testing data are uniformly generated from \([-\,10,10]\) with step 0.01. TT2-ELM solves a regression problem \({\mathcal {A}} *_N {\mathcal {X}}={\mathcal {Y}}\) with \({\mathcal {A}}\in {\mathbb {R}}^{600\times 2\times L \times 3}\), \({\mathcal {X}} \in {\mathbb {R}}^{L \times 3}\), and \({\mathcal {Y}}\in {\mathbb {R}}^{600\times 2}\) (\(L \in \{25, 30, 35\}\)). It should be pointed out that when only a single column of target data is available, that column is copied once more to meet the dimension requirement of \({\mathcal {Y}}\) for TT2-ELM. When \(L=35\), ELM outperforms TT2-ELM for all the design parameters \(\{2^{-5}, 2^{-10}, 2^{-15}, 2^{-20}\}\).
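
The experimental setup described above can be reproduced along the following lines, reusing the build_membership_tensor and tt2_elm_fit_predict sketches from Sects. 2 and 3; the seed handling and the RMSE metric are our assumptions, not the paper's exact protocol.

```python
import numpy as np

def sinc(x):
    """sin(x)/x with the convention sinc(0) = 1."""
    return np.sinc(x / np.pi)

rng = np.random.default_rng(2)
x_train = rng.uniform(-10.0, 10.0, size=(600, 1))          # 600 random points
x_test = np.arange(-10.0, 10.0 + 1e-9, 0.01).reshape(-1, 1) # step 0.01

L = 25
Phi_train = build_membership_tensor(x_train, L)   # same default seed ->
Phi_test = build_membership_tensor(x_test, L)     # same random weights/biases

y_hat = tt2_elm_fit_predict(Phi_train, sinc(x_train).ravel(), Phi_test)
rmse = np.sqrt(np.mean((y_hat - sinc(x_test).ravel()) ** 2))
print("testing RMSE:", rmse)
```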

Consider two-input Sinc function

$$\begin{aligned} z =\frac{\sin x\cdot \sin y}{x y}, (x,y)\in [-\pi ,\pi ]^2. \end{aligned}$$
(13)

Equally spaced \(41\times 41\) data pairs are used for training, and equally spaced \(30\times 30\) testing data are used to check generalization.

Table 3 lists the comparison results for the two-input Sinc function (13). The results show that TT2-ELM outperforms ELM, RELM and WRELM in all cases. Combining the results listed in Tables 2 and 3, we can infer that the hidden neuron number L strongly affects each algorithm's training and testing precision and must lie within a certain range; performance degrades when L falls outside this range, which is verified again by the nonlinear system identification problem in the next subsection.

Fig. 3 Actual output and testing output using \(u(k) = \sin (2\pi k/25)\) with \(L=35\)

Table 4 User-defined parameters of the data sets
Fig. 4 The relation between the error and the hidden neuron number L

Table 5 Training and testing error on auto-Mpg, bank, diabetes and triazines

4.2 Nonlinear system identification

Consider a nonlinear system given by [29]

$$\begin{aligned} y_p(k) =\frac{ y_p(k - 1)y_p(k - 2)(y_p(k - 1) + 2.5)}{1 + (y_p(k - 1))^2 + (y_p(k - 2))^2} + u(k - 1). \end{aligned}$$
(14)

The equilibrium state of the system is (0, 0), the input is chosen as \(u(k) \in [-\,2, 2]\), and stable operation is guaranteed over this range. A uniformly distributed random variable in \([-\,2, 2]\) is chosen as the training input, and the testing input is \(u(k) = \sin (2\pi k/25)\). Selecting \([y_p(k - 1), y_p(k - 2), u(k - 1)]\) as the input variables and \(y_p(k)\) as the output variable, the system can be written in the form

$$\begin{aligned} \hat{y} _p(k) = \hat{f}(y_p(k - 1), y_p(k - 2), u(k - 1)). \end{aligned}$$
(15)

In total, 800 data points are selected: 600 for training and 200 for testing. Figure 2a shows the nonlinear system output generated by the random input \(u(k) \in [-\,2, 2]\), and Fig. 2b shows the output under the specified control effort \(u(k) = \sin (2\pi k/25)\).
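
Generating the identification data amounts to iterating Eq. (14) under the stated input choices; the sketch below illustrates this (the initial state, seed and variable names are our assumptions).

```python
import numpy as np

def simulate(u, y_init=(0.0, 0.0)):
    """Iterate the plant of Eq. (14) from the equilibrium state (0, 0)."""
    y = list(y_init)
    for k in range(len(u)):
        y1, y2 = y[-1], y[-2]
        y_next = y1 * y2 * (y1 + 2.5) / (1.0 + y1**2 + y2**2) + u[k]
        y.append(y_next)
    return np.array(y)

rng = np.random.default_rng(3)
u_train = rng.uniform(-2.0, 2.0, 600)                 # random training input
u_test = np.sin(2 * np.pi * np.arange(200) / 25)      # u(k) = sin(2*pi*k/25)
y_train, y_test = simulate(u_train), simulate(u_test)

# Regressor [y_p(k-1), y_p(k-2), u(k-1)] and target y_p(k)
X_id = np.column_stack([y_train[1:-1], y_train[:-2], u_train])
t_id = y_train[2:]
X_test_id = np.column_stack([y_test[1:-1], y_test[:-2], u_test])
t_test_id = y_test[2:]
```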

To examine the relation between the hidden neuron number L and the testing error, we ran the nonlinear system identification comparisons for TT2-ELM. Figure 3 shows the actual output \(y_p(k)\) and the testing output \(\hat{y}_p(k)\) under the specified control effort \(u(k) = \sin (2\pi k/25)\) with \(L=35\); the testing error is 0.0580. Figure 4a shows the training error averaged over 100 runs, and Fig. 4b shows the testing error for hidden neuron numbers L varying from 25 to 75. The figures show that the testing error does not always decrease as the training error decreases and the hidden neuron number L increases. Thus, to achieve satisfactory precision with a priori type-2 fuzzy sets, a trial-and-error search over the hidden neuron number should be used, even though the weight and bias parameters are randomly generated.

Fig. 5 Performance of ELM, RELM, WRELM, TT2-ELM and PLS by varying the iteration number for the regression data sets: a auto-Mpg, b bank, c diabetes and d triazines (PLS fails due to a matrix singularity problem)

4.3 Real-world regression problems

In this section, four real-world regression problems are considered. Auto-Mpg is a data set that collects miles-per-gallon data from different automobile brands. The bank data set simulates the level of patience of customers who choose their preferred bank depending on eight factors, such as residential area, distance and a fictitious temperature controlling bank choice. The diabetes data set investigates the dependence of the level of serum C-peptide (a measure of residual insulin secretion) on various factors. The triazines data set is usually used to learn a regression equation or rules that predict activity from descriptive structural attributes. User-defined parameters of the four data sets are tabulated in Table 4; they comprise three small-scale data sets and one moderate-scale data set. Table 5 shows the mean and standard deviation on auto-Mpg, bank, diabetes and triazines over 1000 runs. Five algorithms, ELM, RELM, WRELM, LS-SVM [30] and PLS [31], are used for comparison. TT2-ELM outperforms the other five algorithms on auto-Mpg and diabetes with respect to testing error, and outperforms ELM, RELM, WRELM and LS-SVM on the bank data set with respect to training error. PLS achieves the best training error on three data sets, namely auto-Mpg, bank and diabetes, and the best testing error on bank. On the triazines data set, PLS fails to produce training or testing errors due to a matrix singularity problem (the reciprocal condition number is near zero). LS-SVM has the best training error on triazines; WRELM performs better than the other three ELM variants, and TT2-ELM performs comparably to ELM and RELM.

4.4 Parameters sensitivity and stability analysis

In this section, parameter sensitivity and stability analyses are carried out for different iteration numbers. The parameter C is chosen from the set \(\{2^{-k}| k=5,10,15,20,25,30\}\) for RELM and WRELM, the hidden neuron number L is varied from 10 to 80, and the best result is recorded. All the data sets use 100 iterations, except auto-Mpg, which uses 50. Figure 5a shows that TT2-ELM achieves the smallest value among the five algorithms, with mean 0.4529 and standard deviation 0.0076, while RELM and WRELM stabilize around 1.7593. Figure 5b shows that ELM, RELM and WRELM all fluctuate around 0.0664, while TT2-ELM's curve has mean 0.0436 and standard deviation 9.01e−4; on the bank data set, PLS is the best algorithm, as shown in Fig. 5b. Figure 5c, d shows the testing errors on diabetes and triazines, respectively. Figure 5c shows that the PLS results are flatter than those of the other algorithms, consistent with Table 5, while PLS encounters the singularity problem on triazines. These results indicate that PLS is not suitable for data sets in which the linear relation between input and output is weak. TT2-ELM obtains the best performance on diabetes and the worst on triazines; this may be partially because RELM and WRELM use regularization and weighted optimization, whereas TT2-ELM applies no further technique to the objective function (see Table 1). Considering that TT2-ELM explores the input–output relation through a tensor structure, the overall performance suggests that TT2-ELM could perform even better if more design parameters were included in the mathematical model.

5 Conclusions and future work

Tensor regression is a new topic for fuzzy system modelling and can be regarded as a high-dimensional generalization of the classical matrix regression problem. In this paper, we introduce tensor regression into the type-2 fuzzy extreme learning machine scheme; the type-reduction step is avoided since the tensor structure incorporates the information of the secondary membership function directly. Experimental results show that TT2-ELM outperforms ELM, RELM and WRELM on most of the data sets. However, TT2-ELM has an inherent drawback that comes from the tensor structure itself: the algorithm is not efficient for large-scale problems, since solving a large-scale tensor regression problem is time-consuming. The batch training method proposed in this paper is also memory demanding for large-scale problems; an alternative is to design an incremental tensor decomposition algorithm that iteratively obtains the inverse of the tensor, which would require new theory on tensor operations. Although we believe this would be valuable research, it is a different approach and might be more suitable for a separate paper.

In this paper, a special tensor structure is constructed so that type-2 fuzzy sets can be used within the extreme learning machine scheme. Much work remains to be done. One direction is to design extreme learning machines for more general fuzzy sets, such as trapezoidal type-2 fuzzy sets and spiked type-2 fuzzy sets. For tensor regression, a sequential algorithm would be preferable for online training or learning. The generalized M–P inverse is essential for tensor regression, and tensor operations should be redefined to make the generalized M–P inverse more convenient to use in this setting.