
1 Introduction

Dictionary learning and sparse representation (DLSR) is one of the most successful mathematical models and has led to state-of-the-art results in various applications such as face recognition [1–3], image denoising [4], texture classification [5], and emotion recognition [6]. DLSR, however, was originally proposed in an unsupervised setting [7]: the main objective in the corresponding optimization problem is to minimize the reconstruction error between the original signal and its reconstruction in the space of the learned dictionary, without including class-label information in the learning process. To formally describe the original DLSR formulation, suppose there is a finite set of data samples denoted as \(\mathbf {X}=[\mathbf {x}_1,...,\mathbf {x}_n]\in \mathbb {R}^{d\times n}\), where d is the dimensionality of the data and n is the number of data samples. In original DLSR, the data is decomposed over a few dictionary atoms by optimizing the empirical cost function

$$\begin{aligned} L(\mathbf {X},\mathbf {D},\varvec{\alpha })=\sum _{i=1}^{n}l(\mathbf {x}_i,\mathbf {D},\varvec{\alpha }), \end{aligned}$$
(1)

where \(\mathbf {D}\in \mathbb {R}^{d\times k}\) is a dictionary of k atoms, \(\varvec{\alpha }\in \mathbb {R}^{k\times n}\) are the sparse coefficients, and L and l are the overall and per-sample loss functions, respectively. In the DLSR literature, the most common loss function is the reconstruction error, in the mean-squared sense, between the original and the reconstructed signal, usually regularized by the \(\ell _1\) norm to induce sparsity in the coefficients. Thus, the formulation in (1) can be written as

$$\begin{aligned} L(\mathbf {X},\mathbf {D},\varvec{\alpha })=\min _{\mathbf {D},\varvec{\alpha }}\sum _{i=1}^{n}\left( \frac{1}{2}\Vert \mathbf {x}_i-\mathbf {D}\varvec{\alpha }_i\Vert _{2}^{2}+\lambda \Vert \varvec{\alpha }_i\Vert _{1}\right) \!, \end{aligned}$$
(2)

where \(\varvec{\alpha }_i\) is the ith column of \(\varvec{\alpha }\). In order to avoid arbitrarily large values for \(\mathbf {D}\) and, consequently, arbitrarily small values for \(\varvec{\alpha }\), an additional constraint is needed on the dictionary atoms to limit their \(\ell _2\) norm to be smaller than or equal to one. The complete optimization problem in (2) after adding this constraint is as follows:

$$\begin{aligned} \begin{aligned} L(\mathbf {X},\mathbf {D},\varvec{\alpha })=&\min _{\mathbf {D},\varvec{\alpha }}\ \sum _{i=1}^{n}\left( \frac{1}{2}\Vert \mathbf {x}_i-\mathbf {D}\varvec{\alpha }_i\Vert _{2}^{2}+\lambda \Vert \varvec{\alpha }_i\Vert _{1}\right) \!, \\&\text {s.t.} \ \ \ \Vert \mathbf {d}_j\Vert _2^2\le 1 \;\;\;\; \forall j=1,...,k. \end{aligned} \end{aligned}$$
(3)
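For illustration, the following NumPy sketch (with hypothetical helper names, not the authors' implementation) evaluates the objective in (3) for a given dictionary and coefficient matrix and projects the dictionary atoms onto the unit \(\ell _2\) ball:

```python
import numpy as np

def dlsr_objective(X, D, A, lam):
    """Value of the unsupervised DLSR objective in (3) for data X (d x n),
    dictionary D (d x k), coefficients A (k x n), and sparsity weight lam."""
    residual = X - D @ A                     # reconstruction error per sample
    recon = 0.5 * np.sum(residual ** 2)      # sum_i 0.5 * ||x_i - D a_i||_2^2
    sparsity = lam * np.sum(np.abs(A))       # sum_i lambda * ||a_i||_1
    return recon + sparsity

def project_atoms(D):
    """Enforce the constraint ||d_j||_2 <= 1 by rescaling atoms that exceed it."""
    norms = np.maximum(np.linalg.norm(D, axis=0), 1.0)
    return D / norms
```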

The original DLSR formulation given in (3) is unsupervised, as the category information is not taken into consideration in the optimization problem. In a supervised learning paradigm, however, where the ultimate goal is the classification of the data, this setting may lead to neither an optimal discriminative dictionary nor optimal coefficients. More recent work in the literature has incorporated the class labels into the learning of the dictionary and/or coefficients (refer to [8] for a review). This modification resulted in a new category of DLSR, called supervised dictionary learning and sparse representation (S-DLSR). Improvements (some significant) over unsupervised DLSR have been reported in the literature for classification tasks [3, 9–11].

Although S-DLSR benefits from the side information available in the category information to learn a more discriminative dictionary, gathering labeled data is unfortunately often very expensive and time consuming. Most of the available data is unlabeled, and the sample size of the labeled data is often very small, which hinders the discriminative quality of the learned dictionary. Semi-supervised learning (SSL) methods can potentially boost the performance of a machine learning system by utilizing both the supervisory information and the global data distribution. Using a large amount of unlabeled data, which is usually easily accessible, can help reveal the global manifold structure of the data [12] and compensate for the small sample size of the labeled data [13].

In this paper, a semi-supervised dictionary learning and sparse representation (SS-DLSR) approach based on the Hilbert-Schmidt independence criterion (HSIC) is proposed. The proposed SS-DLSR approach finds a dictionary based on two criteria: first, the maximization of the dependency between the labeled data and the corresponding category information, and second, the minimization of the distances between the unlabeled data and their nearest labeled data. The first criterion guarantees finding the space of maximum discrimination based on the labeled data and their category information, whereas the second criterion guarantees that the unlabeled data remain as close as possible to their nearest-neighbor labeled data. Therefore, the learned dictionary (the projection directions computed using the aforementioned criteria) benefits from the discriminative power of the category information in the labeled data and from the proximity information of the unlabeled data as an indication of the global manifold distribution. The sparse coefficients are subsequently computed in the space of the learned dictionary using the formulation given in (3).

2 Semi-supervised Dictionary Learning and Sparse Representation

2.1 Problem Statement

Let \(\mathbf {X}=[\mathbf {x}_1,...,\mathbf {x}_n]\in \mathbb {R}^{d\times n}\) be n data samples with dimensionality d. There are \(n_l\) labeled and \(n_u\) unlabeled data samples, where \(n=n_l+n_u\). Let \(\{(\mathbf {x}_1,\mathbf {y}_1),...,(\mathbf {x}_{n_l},\mathbf {y}_{n_l})\}\) be the pairs of labeled data samples (\(\mathbf {X}_l\in \mathbb {R}^{d\times n_l}\)) and the corresponding labels (\(\mathbf {Y}\in \{0,1\}^{c\times n_l}\), where c is the number of classes), and let \(\mathbf {X}_u=[\mathbf {x}_{n_l+1},...,\mathbf {x}_n]\in \mathbb {R}^{d\times n_u}\) be the unlabeled data samples. We would like to find a dictionary, which can be considered as a transformation, based on two criteria: (1) maximizing the dependency between the labeled data \(\mathbf {X}_l\) and the labels \(\mathbf {Y}\), and (2) minimizing the distance between each unlabeled data sample and its nearest labeled data sample. The first criterion guarantees finding a discriminative dictionary using the labeled data, and the second criterion ensures that the unlabeled data samples are mapped close to their neighboring labeled data and, therefore, that the global connectivity of the data is maintained in the space of the learned dictionary.

The first criterion is implemented using the Hilbert-Schmidt independence criterion (HSIC), which is explained in the next subsection, followed by the design of the dictionary and sparse coefficients for the proposed semi-supervised method.

2.2 Hilbert-Schmidt Independence Criterion

HSIC is a kernel-based measure of independence between two random variables \(\mathcal {X}\) and \(\mathcal {Y}\), first proposed by Gretton et al. [14, 15]. It is computed based on the Hilbert-Schmidt norm of cross-covariance operators in reproducing kernel Hilbert spaces (RKHSs) [15].

Our focus here is the empirical HSIC, which is computed using a finite set of data samples. To this end, considering \(\mathcal {Z}:=\{(\mathbf {x}_1,\mathbf {y}_1),...,(\mathbf {x}_{n_l},\mathbf {y}_{n_l})\}\subseteq \mathcal {X}\times \mathcal {Y}\) as \(n_l\) independent observations drawn from the joint probability distribution \(P_{\mathcal {X}\times \mathcal {Y}}\), the empirical HSIC is computed using

$$\begin{aligned} \mathrm{HSIC}(\mathcal {Z})=\frac{1}{(n_l-1)^2}\mathrm{tr}(\mathbf {KHBH}), \end{aligned}$$
(4)

where \(\mathrm{tr}(\cdot )\) is the trace operator and \(\mathbf {K}\), \(\mathbf {B}\), \(\mathbf {H}\in \mathbb {R}^{n_l\times n_l}\). \(\mathbf {K}\) and \(\mathbf {B}\) are kernels on the data and the labels, respectively. \(\mathbf {H}=\mathbf {I}-n_l^{-1}\mathbf {ee}^{\top }\), where \(\mathbf {I}\) is the identity matrix and \(\mathbf {e}\) is a vector of all ones; hence, \(\mathbf {H}\) is a centering matrix. Since the empirical HSIC given in (4) is a measure of the dependency between \(\mathcal {X}\) and \(\mathcal {Y}\), maximizing this dependency amounts to maximizing \(\mathrm{tr}(\mathbf {KHBH})\).
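As a concrete illustration, a minimal NumPy sketch of (4) is given below; the linear kernels on the data and the one-hot labels in the usage comment are assumptions made for illustration only:

```python
import numpy as np

def empirical_hsic(K, B):
    """Empirical HSIC in (4): tr(K H B H) / (n_l - 1)^2, where K and B are
    n_l x n_l kernel matrices on the data and the labels, respectively."""
    n_l = K.shape[0]
    H = np.eye(n_l) - np.ones((n_l, n_l)) / n_l     # centering matrix
    return np.trace(K @ H @ B @ H) / (n_l - 1) ** 2

# Example usage (illustrative choice of kernels):
# X_l: d x n_l labeled data, Y: c x n_l one-hot labels
# dependency = empirical_hsic(X_l.T @ X_l, Y.T @ Y)
```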

2.3 Dictionary Learning

As mentioned in the problem statement (Subsect. 2.1), the dictionary is learned based on two criteria. In order to maximize the dependency between the labeled data and the corresponding labels, as shown in [11], the following optimization problem has to be solved:

$$\begin{aligned} \underset{\mathbf {D}}{\text {max}}\;\; \text {tr}(\mathbf {D}^{\top }\mathbf {X}_l\mathbf {HBH}\mathbf {X}_l^{\top }\mathbf {D}), \quad \text {s.t.} \ \ \mathbf {D}^{\top }\mathbf {D}=\mathbf {I}, \end{aligned}$$
(5)

where \(\mathbf {H}\) is the centering matrix, \(\mathbf {B}\) is a kernel on the labels, and \(\mathbf {D}\) is the dictionary to be learned. With a few manipulations of the objective function given in (5), it can be shown to be another form of the empirical HSIC:

$$\begin{aligned} \underset{\mathbf {D}}{\text {max}}\;&\text {tr}(\mathbf {D}^{\top }\mathbf {X}_l\mathbf {HBH}\mathbf {X}_l^{\top }\mathbf {D}) \nonumber \\&=\underset{\mathbf {D}}{\text {max}}\; \text {tr}(\mathbf {X}_l^{\top }\mathbf {D}\mathbf {D}^{\top }\mathbf {X}_l\mathbf {HBH}) \nonumber \\&=\underset{\mathbf {D}}{\text {max}}\; \text {tr}\bigg (\bigg [(\mathbf {D}^{\top }\mathbf {X}_l)^{\top }\mathbf {D}^{\top }\mathbf {X}_l\bigg ]\mathbf {HBH}\bigg ) \nonumber \\&=\underset{\mathbf {D}}{\text {max}}\; \text {tr}(\mathbf {KHBH}), \end{aligned}$$
(6)

where \(\mathbf {K}=(\mathbf {D}^{\top }\mathbf {X}_l)^{\top }\mathbf {D}^{\top }\mathbf {X}_l\) is a linear kernel on the labeled data projected into the space of the learned dictionary \(\mathbf {D}\). As can be clearly observed from the last line of (6), the objective function in (5) has the form of the empirical HSIC; thus, the dictionary \(\mathbf {D}\) projects the labeled data into the space of maximum dependency with the corresponding labels.

The second criterion is to minimize the distances between the unlabeled data and their nearest labeled data in the space of the learned dictionary. In other words, considering \(\mathbf {z}=\mathbf {D}^{\top }\mathbf {x}\) as a data sample projected into the space of the learned dictionary, we would like to solve:

$$\begin{aligned} \underset{\mathbf {D}}{\text {min}}\ \frac{1}{2}\sum _{i=1}^{n_l}\sum _{j=1}^{n_u}w_{i,j}\Vert \mathbf {z}_i - \mathbf {z}_j\Vert _2^2, \end{aligned}$$
(7)

where \(w_{i,j}\) are the weights that define the proximity (neighborhood) of the unlabeled data to the labeled data. One way to define them is based on the nearest neighbor, i.e., \(w_{i,j}=1\) if the jth unlabeled data sample is the nearest to the ith labeled data sample and \(w_{i,j}=0\) otherwise.

It can be shown [16] that the objective function given in (7) can be written in matrix form as follows:

$$\begin{aligned} \underset{\mathbf {D}}{\text {min}}\ \frac{1}{2}\sum _{i=1}^{n_l}\sum _{j=1}^{n_u}w_{i,j}\Vert \mathbf {z}_i - \mathbf {z}_j\Vert _2^2=\underset{\mathbf {D}}{\text {min}}\;\;\text {tr}(\mathbf {ZLZ}^{\top })=\underset{\mathbf {D}}{\text {min}}\;\;\text {tr}(\mathbf {D}^{\top }\mathbf {XLX}^{\top }\mathbf {D}), \end{aligned}$$
(8)

where \(\mathbf {L}\) is the Laplacian of the graph formed by the projected data points \(\mathbf {Z}=[\mathbf {z}_1,...,\mathbf {z}_n]\) in the space of the learned dictionary, defined as \(\mathbf {L}=\mathbf {Q}-\mathbf {W}\), where \(\mathbf {W}(i,j)=w_{i,j}\) and \(\mathbf {Q}\) is a diagonal matrix with \(q_{i,i}=\sum _j w_{i,j}\).
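One plausible construction of this Laplacian in NumPy is sketched below, assuming each unlabeled sample is connected with unit weight to its single nearest labeled sample and the weight matrix is symmetrized over all n samples (an assumption for illustration, not necessarily the authors' exact choice):

```python
import numpy as np

def knn_laplacian(X_l, X_u):
    """Graph Laplacian L = Q - W used in (8), assuming each unlabeled sample
    is connected to its single nearest labeled sample with weight one."""
    n_l, n_u = X_l.shape[1], X_u.shape[1]
    n = n_l + n_u
    W = np.zeros((n, n))
    for j in range(n_u):
        # index of the labeled sample closest to the jth unlabeled sample
        d2 = np.sum((X_l - X_u[:, [j]]) ** 2, axis=0)
        i = np.argmin(d2)
        W[i, n_l + j] = W[n_l + j, i] = 1.0   # symmetric 0/1 weight
    Q = np.diag(W.sum(axis=1))                # degree matrix
    return Q - W                              # graph Laplacian
```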

Combining the two objective functions given in (5) and (8), the overall optimization problem for the computation of the dictionary can be written as follows:

$$\begin{aligned} \underset{\mathbf {D}}{\text {max}}\;\; \text {tr}\Big (\mathbf {D}^{\top }\big [(1-\eta )\,\mathbf {X}_l\mathbf {HBH}\mathbf {X}_l^{\top }-\eta \,\mathbf {XLX}^{\top }\big ]\mathbf {D}\Big ), \quad \text {s.t.} \ \ \mathbf {D}^{\top }\mathbf {D}=\mathbf {I}, \end{aligned}$$
(9)

where \(0\le \eta \le 1\) is a constant that determines the relative contributions of the two terms in the objective function. According to the Rayleigh-Ritz theorem [17], the solution of the optimization problem given in (9) is given by the eigenvectors corresponding to the k largest eigenvalues of \(\mathbf {\Phi }=(1-\eta )\mathbf {X}_l\mathbf {HBH}\mathbf {X}_l^{\top } - \eta \;\mathbf {XLX}^{\top }\).
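A minimal NumPy sketch of this closed-form solution is given below; the linear label kernel \(\mathbf {B}=\mathbf {Y}^{\top }\mathbf {Y}\) on one-hot labels is an assumption made for illustration:

```python
import numpy as np

def learn_dictionary(X, X_l, Y, L, k, eta):
    """Closed-form dictionary for (9): the top-k eigenvectors of
    Phi = (1 - eta) * X_l H B H X_l^T - eta * X L X^T,
    assuming a linear kernel B = Y^T Y on one-hot labels Y (c x n_l)."""
    n_l = X_l.shape[1]
    H = np.eye(n_l) - np.ones((n_l, n_l)) / n_l      # centering matrix
    B = Y.T @ Y                                      # label kernel (assumption)
    Phi = (1 - eta) * (X_l @ H @ B @ H @ X_l.T) - eta * (X @ L @ X.T)
    # Phi is symmetric, so eigh returns real eigenvalues in ascending order
    eigvals, eigvecs = np.linalg.eigh(Phi)
    D = eigvecs[:, np.argsort(eigvals)[::-1][:k]]    # eigenvectors of the k largest eigenvalues
    return D                                         # d x k, orthonormal atoms
```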

2.4 Sparse Coefficients

After the computation of the dictionary using (9), the sparse coefficients can be computed using the formulation provided in (2), which is known as the lasso when the dictionary is fixed [18]. Although (2) can be solved using fast iterative methods, the learned dictionary here is orthogonal and hence, as shown in [19, 20], the sparse coefficients can be computed in closed form by soft-thresholding with the operator \(S_{\lambda }(\cdot )\):

$$\begin{aligned} \alpha _{ij}=S_{\lambda }\left( \left[ \mathbf {D}^{\top }\mathbf {x}_i\right] _j\right) \!, \end{aligned}$$
(10)

where \(\alpha _{ij}\) is the coefficient of the jth dictionary atom for the ith data sample, i.e., the jth element of \(\varvec{\alpha }_i\), and \(S_{\lambda }(t)\) is defined as follows:

$$\begin{aligned} S_{\lambda }(t)=\left\{ \begin{array}{ll} t-0.5\lambda &{} \text {if}\;\;t>0.5\lambda \\ t+0.5\lambda &{} \text {if}\;\;t<-0.5\lambda \\ 0 &{} \text {otherwise.} \end{array}\right. \end{aligned}$$
(11)
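The closed-form computation in (10)-(11) can be sketched in a few lines of NumPy (function names are illustrative, not the authors' implementation):

```python
import numpy as np

def soft_threshold(T, lam):
    """Soft-thresholding operator S_lambda in (11), applied elementwise."""
    return np.sign(T) * np.maximum(np.abs(T) - 0.5 * lam, 0.0)

def sparse_coefficients(D, X, lam):
    """Closed-form sparse coefficients (10) for an orthogonal dictionary D."""
    return soft_threshold(D.T @ X, lam)   # k x n coefficient matrix
```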

3 Experiments and Results

To validate the proposed semi-supervised dictionary learning and sparse representation method (SS-DLSR), two benchmark datasets publicly available from the UCI machine learning repository were used. The two datasets were the Sonar (\(n=208\), \(d=60\), and \(c=2\)) and the Parkinsons (\(n=297\), \(d=13\), and \(c=2\)) datasets.

Table 1. The classification rate (%) of the proposed SS-DLSR algorithm on two benchmark datasets. The results were compared for various settings of the proposed algorithm, including different relative contributions of the labeled and unlabeled data to the dictionary learning (varying \(\eta \)) and different ratios of labeled to unlabeled data (varying \(n_l/(n_l+n_u)\)).

The performance of the proposed SS-DLSR was evaluated for a fixed dictionary size (\(k=8\)) and a varying relative ratio of labeled to unlabeled data \(n_l/(n_l+n_u)\). To this end, 70 % of the data was randomly selected as the training set and 30 % as the test set. The training data was further divided into different ratios of labeled and unlabeled data as shown in Table 1 (\(n_l/(n_l+n_u)=\{0.05, 0.1, 0.3, 0.5\}\)). The one nearest neighbor was used as the proximity measure between the unlabeled and labeled data to determine the matrix of weights in (7). The value of \(\eta \) for the computation of the dictionary in (9) was set to three different values, i.e., 0 (ignoring unlabeled data), 1 (ignoring labeled data), and \(\eta ^*\) (the most discriminative dictionary, corresponding to the best classification performance). The sparse coefficients were computed for the labeled portion of the training data as well as for the test data. A support vector machine (SVM) with a radial basis function (RBF) kernel was used for the classification of the data by submitting the sparse coefficients to the classifier, as suggested in [21]. The SVM was tuned using 5-fold cross validation on the labeled portion of the training data to find the optimal kernel width (\(\gamma ^*\)) and optimal trade-off parameter (\(C^*\)). Subsequently, the SVM was trained on the whole labeled data in the training set using the optimal \(\gamma ^*\) and \(C^*\) values and tested on the test set. The experiments were repeated 10 times for different random splits of the data into training and test sets. The performance is reported in terms of classifier accuracy (averaged over 10 runs) in Table 1.
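For reference, a minimal scikit-learn sketch of this evaluation step is given below; the parameter grids and function names are illustrative assumptions rather than the authors' exact setup:

```python
from sklearn.model_selection import GridSearchCV
from sklearn.svm import SVC

def evaluate_codes(alpha_labeled, y_labeled, alpha_test, y_test):
    """Tune an RBF-kernel SVM on the sparse codes of the labeled training data
    with 5-fold cross validation, then report accuracy on the test codes."""
    grid = {"C": [0.1, 1, 10, 100], "gamma": [1e-3, 1e-2, 1e-1, 1]}  # assumed grid
    search = GridSearchCV(SVC(kernel="rbf"), grid, cv=5)
    search.fit(alpha_labeled.T, y_labeled)       # samples are columns of alpha
    return search.score(alpha_test.T, y_test)    # classification accuracy
```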

From the results provided in Table 1, there are several immediate observations. First, by adding unlabeled data to the learning of the dictionary (the columns in Table 1 corresponding to \(\eta ^*\)), the classification performance increases, which means that the learned dictionary is more discriminative. This reveals that the proposed algorithm can effectively incorporate the information from both labeled and unlabeled data into the learning of the dictionary. Second, by decreasing the ratio of labeled to unlabeled data (\(n_l/(n_l+n_u)\)), the gain in performance from adding unlabeled data increases. In realistic settings, there usually exists a large amount of unlabeled data and only a small number of labeled samples. The proposed SS-DLSR algorithm benefits more from the information provided by the unlabeled data in these situations, as can be observed by comparing the column corresponding to \(\eta ^*\) (including both the labeled and unlabeled data in the dictionary learning) with the column for \(\eta =0\) (including only labeled data in the dictionary learning).

4 Discussion and Conclusion

In this paper, a novel semi-supervised dictionary learning and sparse representation method was proposed. A discriminative dictionary was learned in the space of maximum dependency between the labeled data and the class labels, while the connectivity of the data was maintained by minimizing the distances between the unlabeled data and their nearest labeled data. As can be seen from (9), the dictionary has a closed-form solution. Also, by using soft-thresholding, the sparse coefficients can be computed in closed form as given in (10). The proposed SS-DLSR approach is, therefore, very fast. The effectiveness of the proposed method in learning from both the supervisory information (based on labeled data) and the graph connectivity information (based on unlabeled data) was demonstrated by experiments on two benchmark datasets from the UCI machine learning repository.