Abstract
Fisher Discriminant Analysis (FDA) is a subspace learning method which minimizes the intra-class scatter and maximizes the inter-class scatter of data. Although FDA treats all pairs of classes the same way, some classes are closer to one another than others. Weighted FDA addresses this shortcoming of FDA by assigning weights to the pairs of classes. In this paper, we propose a cosine-weighted FDA as well as an automatically weighted FDA in which the weights are found automatically. We also propose a weighted FDA in the feature space to establish a weighted kernel FDA for both existing and newly proposed weights. Our experiments on the ORL face recognition dataset show the effectiveness of the proposed weighting schemes.
Keywords
- Fisher Discriminant Analysis (FDA)
- Kernel FDA
- Cosine-weighted FDA
- Automatically weighted FDA
- Manually weighted FDA
1 Introduction
Fisher Discriminant Analysis (FDA) [1], first proposed in [2], is a powerful subspace learning method which tries to minimize the intra-class scatter and maximize the inter-class scatter of data for better separation of classes. FDA treats all pairs of classes the same way; however, some classes might be much further from one another than others. In other words, the distances between classes differ. Closer classes need more attention because classifiers may confuse them more easily, whereas classes far from each other are generally easier to separate. The same problem exists in Kernel FDA (KFDA) [3] and in most subspace learning methods based on the generalized eigenvalue problem, such as FDA and KFDA [4]; hence, a weighting procedure might be more appropriate.
In this paper, we propose several weighting procedures for FDA and KFDA. The contributions of this paper are three-fold: (1) proposing Cosine-Weighted FDA (CW-FDA) as a new modification of FDA, (2) proposing Automatically Weighted FDA (AW-FDA) as a new version of FDA in which the weights are set automatically, and (3) proposing Weighted KFDA (W-KFDA) to have weighting procedures in the feature space, where both the existing and the newly proposed weighting methods can be used in the feature space.
The paper is organized as follows: In Sect. 2, we briefly review the theory of FDA and KFDA. In Sect. 3, we formulate the weighted FDA, review the existing weighting methods, and then propose CW-FDA and AW-FDA. Section 4 proposes weighted KFDA in the feature space. In addition to using the existing methods for weighted KFDA, two versions of CW-KFDA and also AW-KFDA are proposed. Section 5 reports the experiments. Finally, Sect. 6 concludes the paper.
2 Fisher and Kernel Discriminant Analysis
2.1 Fisher Discriminant Analysis
Let \(\{\varvec{x}_i^{(r)} \in \mathbb {R}^d\}_{i=1}^{n_r}\) denote the samples of the r-th class where \(n_r\) is the class’s sample size. Suppose \(\varvec{\mu }^{(r)} \in \mathbb {R}^d\), c, n, and \(\varvec{U} \in \mathbb {R}^{d \times d}\) denote the mean of the r-th class, the number of classes, the total sample size, and the projection matrix in FDA, respectively. Although some methods solve FDA as a least-squares problem [5, 6], the regular FDA [2] maximizes the Fisher criterion [7]:
where \(\mathbf{tr} (\cdot )\) is the trace of a matrix. The Fisher criterion is a generalized Rayleigh-Ritz quotient [8]. We may recast the problem to [9]:
where the \(\varvec{S}_W \in \mathbb {R}^{d \times d}\) and \(\varvec{S}_B \in \mathbb {R}^{d \times d}\) are the intra- (within) and inter-class (between) scatters, respectively [9]:
where \(\mathbb {R}^{d \times n_r} \ni \breve{\varvec{X}}_r := [\varvec{x}_1^{(r)} - \varvec{\mu }^{(r)}, \dots , \varvec{x}_{n_r}^{(r)} - \varvec{\mu }^{(r)}]\), \(\mathbb {R}^{d \times c} \ni \varvec{M}_r := [\varvec{\mu }^{(r)} - \varvec{\mu }^{(1)}, \dots , \varvec{\mu }^{(r)} - \varvec{\mu }^{(c)}]\), and \(\mathbb {R}^{c \times c} \ni \varvec{N} := \mathbf{diag} ([n_1, \dots , n_c]^\top )\). The mean of the r-th class is \(\mathbb {R}^{d} \ni \varvec{\mu }^{(r)} := (1/n_r) \sum _{i=1}^{n_r} \varvec{x}_i^{(r)}\). The Lagrange relaxation [10] of the optimization problem is: \(\mathcal {L} = \mathbf{tr} (\varvec{U}^\top \varvec{S}_B\, \varvec{U}) - \mathbf{tr} \big (\varvec{\varLambda }^\top (\varvec{U}^\top \varvec{S}_W\, \varvec{U} - \varvec{I})\big )\), where \(\varvec{\varLambda }\) is a diagonal matrix which includes the Lagrange multipliers. Setting the derivative of the Lagrangian to zero gives:
which is the generalized eigenvalue problem \((\varvec{S}_B, \varvec{S}_W)\) where the columns of \(\varvec{U}\) and the diagonal of \(\varvec{\varLambda }\) are the eigenvectors and eigenvalues, respectively [11]. The p leading columns of \(\varvec{U}\) (so to have \(\varvec{U} \in \mathbb {R}^{d \times p}\)) are the FDA projection directions where p is the dimensionality of the subspace. Note that \(p \le \min (d, n-1, c-1)\) because of the ranks of the inter- and intra-class scatter matrices [9].
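As an illustration of the solution above, the generalized eigenvalue problem \((\varvec{S}_B, \varvec{S}_W)\) can be solved with standard numerical routines. The following sketch is a minimal illustration, not the authors' code; it builds the scatter matrices using the common total-mean form of the inter-class scatter (proportional to the pairwise form used in this paper) and a small regularizer on \(\varvec{S}_W\) for numerical stability:

```python
import numpy as np
from scipy.linalg import eigh

def fda(X, y, p):
    """Minimal FDA sketch: X is (n, d) samples, y holds class labels.
    Returns the p leading projection directions as a (d, p) matrix."""
    classes = np.unique(y)
    d = X.shape[1]
    mus = {r: X[y == r].mean(axis=0) for r in classes}
    # intra-class (within) scatter
    S_W = np.zeros((d, d))
    for r in classes:
        Xc = X[y == r] - mus[r]
        S_W += Xc.T @ Xc
    # inter-class (between) scatter, total-mean form
    mu = X.mean(axis=0)
    S_B = np.zeros((d, d))
    for r in classes:
        n_r = (y == r).sum()
        diff = (mus[r] - mu)[:, None]
        S_B += n_r * diff @ diff.T
    # generalized eigenvalue problem (S_B, S_W)
    evals, evecs = eigh(S_B, S_W + 1e-6 * np.eye(d))
    order = np.argsort(evals)[::-1]      # leading eigenvectors first
    return evecs[:, order[:p]]
```

A usage note: with two well-separated classes, projecting onto the single leading direction separates the class means by a wide margin.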
2.2 Kernel Fisher Discriminant Analysis
Let the scalar and matrix kernels be denoted by \(k(\varvec{x}_i, \varvec{x}_j) := \varvec{\phi }(\varvec{x}_i)^\top \varvec{\phi }(\varvec{x}_j)\) and \(\varvec{K}(\varvec{X}_1, \varvec{X}_2) := \varvec{\varPhi }(\varvec{X}_1)^\top \varvec{\varPhi }(\varvec{X}_2)\), respectively, where \(\varvec{\phi }(.)\) and \(\varvec{\varPhi }(.)\) are the pulling functions to the feature space. According to representation theory [12], any solution must lie in the span of all the training vectors; hence, \(\varvec{\varPhi }(\varvec{U}) = \varvec{\varPhi }(\varvec{X})\, \varvec{Y}\) where \(\varvec{Y} \in \mathbb {R}^{n \times d}\) contains the coefficients. The optimization of kernel FDA is [3, 9]:
where \(\varvec{\varDelta }_W \in \mathbb {R}^{n \times n}\) and \(\varvec{\varDelta }_B \in \mathbb {R}^{n \times n}\) are the intra- and inter-class scatters in the feature space, respectively [3, 9]:
where \(\mathbb {R}^{n_r \times n_r} \ni \varvec{H}_r := \varvec{I} - (1/n_r) \varvec{1}\varvec{1}^\top \) is the centering matrix, the (i, j)-th entry of \(\varvec{K}_r \in \mathbb {R}^{n \times n_r}\) is \(\varvec{K}_r(i,j) := k(\varvec{x}_i, \varvec{x}_j^{(r)})\), the i-th entry of \(\varvec{\xi }^{(r)} \in \mathbb {R}^n\) is \(\varvec{\xi }^{(r)}(i) := (1/n_r) \sum _{j=1}^{n_r} k(\varvec{x}_i, \varvec{x}_j^{(r)})\), and \(\mathbb {R}^{n \times c} \ni \varvec{\varXi }_r := [\varvec{\xi }^{(r)} - \varvec{\xi }^{(1)}, \dots , \varvec{\xi }^{(r)} - \varvec{\xi }^{(c)}]\).
The p leading columns of \(\varvec{Y}\) (so to have \(\varvec{Y} \in \mathbb {R}^{n \times p}\)) are the KFDA projection directions which span the subspace. Note that \(p \le \min (n, c-1)\) because of the ranks of the inter- and intra-class scatter matrices in the feature space [9].
3 Weighted Fisher Discriminant Analysis
The optimization of Weighted FDA (W-FDA) is as follows:
where the weighted inter-class scatter, \(\widehat{\varvec{S}}_B \in \mathbb {R}^{d \times d}\), is defined as:
where \(\mathbb {R} \ni \alpha _{r\ell } \ge 0\) is the weight for the pair of the r-th and \(\ell \)-th classes, \(\mathbb {R}^{c \times c} \ni \varvec{A}_r := \mathbf{diag} ([\alpha _{r1}, \dots , \alpha _{rc}])\). In FDA, we have \(\alpha _{r\ell }=1,~ \forall r, \ell \in \{1, \dots , c\}\). However, it is better for the weights to be decreasing with the distances of classes to concentrate more on the nearby classes. We denote the distances of the r-th and \(\ell \)-th classes by \(d_{r\ell } := ||\varvec{\mu }^{(r)} - \varvec{\mu }^{(\ell )}||_2\). The solution to Eq. (9) is the generalized eigenvalue problem \((\widehat{\varvec{S}}_B, \varvec{S}_W)\) and the p leading columns of \(\varvec{U}\) span the subspace.
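Given a weight matrix per class, the weighted inter-class scatter can be assembled directly from the class means. The following is a hedged sketch (our own helper, not the paper's code), assuming the form \(\widehat{\varvec{S}}_B = \sum _r n_r\, \varvec{M}_r\, \varvec{A}_r\, \varvec{N}\, \varvec{M}_r^\top \) implied by the definitions of \(\varvec{M}_r\), \(\varvec{N}\), and \(\varvec{A}_r\) above:

```python
import numpy as np

def weighted_between_scatter(mus, ns, alphas):
    """mus: (c, d) class means; ns: (c,) class sizes;
    alphas: (c, c) pairwise weights, alphas[r, l] for classes r, l."""
    c, d = mus.shape
    N = np.diag(ns.astype(float))        # N := diag([n_1, ..., n_c])
    S_B = np.zeros((d, d))
    for r in range(c):
        M_r = (mus[r] - mus).T           # d x c, columns mu_r - mu_l
        A_r = np.diag(alphas[r])         # diagonal weight matrix A_r
        S_B += ns[r] * M_r @ A_r @ N @ M_r.T
    return S_B
```

With all weights equal to one, this reduces to the unweighted pairwise inter-class scatter \(\sum _r \sum _\ell n_r n_\ell (\varvec{\mu }^{(r)} - \varvec{\mu }^{(\ell )})(\varvec{\mu }^{(r)} - \varvec{\mu }^{(\ell )})^\top \), recovering plain FDA.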
3.1 Existing Manual Methods
In the following, we review some of the existing weights for W-FDA.
Approximate Pairwise Accuracy Criterion: The Approximate Pairwise Accuracy Criterion (APAC) method [13] has the weight function:
where \(\text {erf}(x)\) is the error function:
This method approximates the Bayes error for class pairs.
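As a concrete sketch of the APAC weight (assuming the functional form of Loog et al. [13], \(w(d) = \text {erf}(d/(2\sqrt{2}))/(2d^2)\); the exact equation was given in the display above), the weight decreases monotonically with the distance between class means:

```python
import numpy as np
from scipy.special import erf

def apac_weight(d):
    """APAC weight sketch, assuming the form of Loog et al. [13]:
    w(d) = erf(d / (2*sqrt(2))) / (2 * d**2), decreasing in d."""
    return erf(d / (2 * np.sqrt(2))) / (2 * d ** 2)
```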
Powered Distance Weighting: The powered distance (POW) method [14] uses the following weight function:
where \(m > 0\) is an integer. As \(\alpha _{r\ell }\) is supposed to drop faster than \(d_{r\ell }\) grows, we should have \(m \ge 3\) (we use \(m=3\) in the experiments).
Confused Distance Maximization: The Confused Distance Maximization (CDM) [15] method uses the confusion probability among the classes as the weight function:
where \(n_{\ell |r}\) is the number of points of class r classified as class \(\ell \) by a classifier such as quadratic discriminant analysis [15, 16]. One problem of the CDM method is that if the classes are classified perfectly, all weights become zero. Depending on the performance of a classifier is another flaw of this method.
k-Nearest Neighbors Weighting: The k-Nearest Neighbor (kNN) method [17] tries to put every class away from its k-nearest neighbor classes by defining the weight function as
The kNN and CDM methods are sparse, making use of the betting-on-sparsity principle [1, 18]. However, these methods have some shortcomings. For example, if two classes are far from one another in the input space, they are not considered in kNN or CDM; yet in the obtained subspace, they may fall close to each other, which is not desirable. Another flaw of the kNN method is that it assigns a weight of one to all k nearest pairs, even though some of those pairs might be considerably closer than others.
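The kNN weighting just described can be sketched as follows (a minimal illustration with our own function name; distances between class means stand in for class distances, as in the text):

```python
import numpy as np

def knn_class_weights(mus, k):
    """Sparse kNN weighting sketch: alphas[r, l] = 1 iff class l is
    among the k nearest classes to class r (by distance of means)."""
    c = mus.shape[0]
    # pairwise distances between class means
    D = np.linalg.norm(mus[:, None, :] - mus[None, :, :], axis=2)
    alphas = np.zeros((c, c))
    for r in range(c):
        order = np.argsort(D[r])
        neighbors = [l for l in order if l != r][:k]
        alphas[r, neighbors] = 1.0       # binary weights: the flaw noted above
    return alphas
```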
3.2 Cosine Weighted Fisher Discriminant Analysis
Literature has shown that cosine similarity works very well with the FDA, especially for face recognition [19, 20]. Moreover, according to the opposition-based learning [21], capturing similarity and dissimilarity of data points can improve the performance of learning. A promising operator for capturing similarity and dissimilarity (opposition) is cosine. Hence, we propose CW-FDA, as a manually weighted method, with cosine to be the weight defined as
to have \(\alpha _{r\ell } \in [0, 1]\). Hence, the r-th weight matrix is \(\varvec{A}_r := \mathbf{diag} (\alpha _{r\ell }, \forall \ell )\), which is used in Eq. (10). Note that since the inter-class scatter term for \(r=\ell \) is zero, \(\alpha _{rr}\) has no effect, so we set \(\alpha _{rr}=0\).
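A sketch of the CW-FDA weights follows. Eq. (16) maps the cosine between class means into [0, 1]; since the exact display is not reproduced here, we assume the natural mapping \((1 + \cos )/2\) as an illustration:

```python
import numpy as np

def cosine_weights(mus):
    """CW-FDA weight sketch; assumes Eq. (16) maps cosine into [0, 1]
    via (1 + cos) / 2. mus: (c, d) matrix of class means."""
    norms = np.linalg.norm(mus, axis=1)
    cos = (mus @ mus.T) / np.outer(norms, norms)
    alphas = 0.5 * (1.0 + cos)
    np.fill_diagonal(alphas, 0.0)        # alpha_{rr} is unused; set to 0
    return alphas
```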
3.3 Automatically Weighted Fisher Discriminant Analysis
In AW-FDA, there are \(c+1\) matrix optimization variables, namely \(\varvec{U}\) and \(\varvec{A}_r \in \mathbb {R}^{c \times c}, \forall r \in \{1, \dots , c\}\), because the optimal weights are found at the same time as the Fisher criterion is maximized. Moreover, to make use of the betting-on-sparsity principle [1, 18], we can make the weight matrices sparse, so we impose an “\(\ell _0\)” norm constraint on the weights. The optimization problem is as follows
We use alternating optimization [22] to solve this problem:
where \(\tau \) denotes the iteration.
Since we use an iterative solution for the optimization, it is better to normalize the weights in the weighted inter-class scatter; otherwise, the weights gradually explode to maximize the objective function. We use the \(\ell _2\) (or Frobenius) norm for normalization for ease of taking derivatives. Hence, for AW-FDA, we slightly modify the weighted inter-class scatter as
where \(\breve{\varvec{A}}_r := \varvec{A}_r / ||\varvec{A}_r||_F^2\) because \(\varvec{A}_r\) is diagonal, and \(||.||_F\) is the Frobenius norm.
As discussed before, the solution to Eq. (18) is the generalized eigenvalue problem \((\widehat{\varvec{S}}_B^{(\tau )}, \varvec{S}_W)\). We use a step of gradient descent [23] to solve Eq. (19) followed by satisfying the “\(\ell _0\)” norm constraint [22]. The gradient is calculated as follows. Let \(\mathbb {R} \ni f(\varvec{U}, \varvec{A}_k) := -\mathbf{tr} (\varvec{U}^{\top } \widehat{\varvec{S}}_B\, \varvec{U})\). Using the chain rule, we have:
where we use the Magnus-Neudecker convention in which matrices are vectorized, \(\mathbf{vec} (.)\) vectorizes the matrix, and \(\mathbf{vec} ^{-1}_{c \times c}\) is de-vectorization to \(c \times c\) matrix. We have \(\mathbb {R}^{d \times d} \ni \partial f / \partial \widehat{\varvec{S}}_B = -\varvec{U}\varvec{U}^\top \) whose vectorization has dimensionality \(d^2\). For the second derivative, we have:
where \(\otimes \) denotes the Kronecker product. The third derivative is:
The learning rate of gradient descent is calculated using line search [23].
After the gradient descent step, to satisfy the condition \(||\varvec{A}_r||_0 \le k\), the solution is projected onto the set of this condition. Because \(-f\) should be maximized, this projection is to set the \((c-k)\) smallest diagonal entries of \(\varvec{A}_r\) to zero [22]. In case \(k=c\), the projection of the solution is itself, and all the weights are kept.
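The projection step just described, zeroing the \((c-k)\) smallest diagonal entries, can be sketched as follows (the function name and the use of absolute values are our own illustrative choices):

```python
import numpy as np

def project_l0(diag_weights, k):
    """Project the diagonal weights onto the set ||A_r||_0 <= k by
    zeroing the (c - k) smallest entries, as described in the text.
    If k >= c, the projection is the identity (all weights kept)."""
    w = diag_weights.copy()
    c = w.size
    if k < c:
        smallest = np.argsort(np.abs(w))[:c - k]
        w[smallest] = 0.0
    return w
```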
After solving the optimization, the p leading columns of \(\varvec{U}\) are the AW-FDA projection directions that span the subspace.
4 Weighted Kernel Fisher Discriminant Analysis
We define the optimization for Weighted Kernel FDA (W-KFDA) as:
where the weighted inter-class scatter in the feature space, \(\widehat{\varvec{\varDelta }}_B \in \mathbb {R}^{n \times n}\), is defined as:
The solution to Eq. (25) is the generalized eigenvalue problem \((\widehat{\varvec{\varDelta }}_B, \varvec{\varDelta }_W)\) and the p leading columns of \(\varvec{Y}\) span the subspace.
4.1 Manually Weighted Methods in the Feature Space
All the existing weighting methods in the literature for W-FDA can be used as weights in W-KFDA to have W-FDA in the feature space. Therefore, Eqs. (11), (13), (14), and (15) can be used as weights in Eq. (26) to have W-KFDA with the APAC, POW, CDM, and kNN weights, respectively. To the best of our knowledge, W-KFDA is novel and has not appeared in the literature. Note that there is a weighted KFDA in the literature [24], but it is for data integration, which serves another purpose and takes an entirely different approach.
The CW-FDA weights can be used in the feature space to obtain CW-KFDA. For this, we propose two versions of CW-KFDA: (I) In the first version, we use Eq. (16), i.e., \(\varvec{A}_r := \mathbf{diag} (\alpha _{r\ell }, \forall \ell )\), in Eq. (26). (II) In the second version, we notice that cosine is based on the inner product, so the normalized kernel matrix between the means of classes can be used instead, capturing similarity/dissimilarity in the feature space rather than in the input space. Let \(\mathbb {R}^{d \times c} \ni \varvec{M} := [\varvec{\mu }_1, \dots , \varvec{\mu }_c]\). Let \(\widehat{\varvec{K}}_{i,j} := \varvec{K}_{i,j} / \sqrt{\varvec{K}_{i,i} \varvec{K}_{j,j}}\) be the normalized kernel matrix [25] where \(\varvec{K}_{i,j}\) denotes the (i, j)-th element of the kernel matrix \(\mathbb {R}^{c \times c} \ni \varvec{K}(\varvec{M}, \varvec{M}) = \varvec{\varPhi }(\varvec{M})^\top \varvec{\varPhi }(\varvec{M})\). The weights are \([0,1] \ni \alpha _{r\ell } := \widehat{\varvec{K}}_{r,\ell }\) or \(\varvec{A}_r := \mathbf{diag} (\widehat{\varvec{K}}_{r,\ell }, \forall \ell )\). We set \(\alpha _{r,r}=0\).
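The second version of CW-KFDA above can be sketched as follows; we assume an RBF kernel for concreteness (any kernel works, and for RBF the diagonal of \(\varvec{K}\) is one, so the normalization is a no-op):

```python
import numpy as np

def normalized_kernel_weights(mus, gamma=1.0):
    """CW-KFDA (version II) weight sketch: normalized RBF kernel
    between class means, K_hat[i,j] = K[i,j] / sqrt(K[i,i] K[j,j]),
    with the unused diagonal set to zero as in the text."""
    sq = np.sum(mus ** 2, axis=1)
    D2 = sq[:, None] + sq[None, :] - 2 * mus @ mus.T
    K = np.exp(-gamma * D2)              # RBF kernel between class means
    Khat = K / np.sqrt(np.outer(np.diag(K), np.diag(K)))
    np.fill_diagonal(Khat, 0.0)          # alpha_{rr} = 0
    return Khat
```

As expected of a weighting scheme that favors nearby classes, closer class means receive larger weights.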
4.2 Automatically Weighted Kernel Fisher Discriminant Analysis
Similar to before, the optimization in AW-KFDA is:
where \(\widehat{\varvec{\varDelta }}_B := \sum _{r=1}^c n_r\, \varvec{\varXi }_r\, \breve{\varvec{A}}_r\, \varvec{N}\, \varvec{\varXi }_r^\top \). This optimization is solved similarly to Eq. (17), with \(\varvec{Y}\in \mathbb {R}^{n \times d}\) in place of \(\varvec{U} \in \mathbb {R}^{d \times d}\). Here, the solution to Eq. (18) is the generalized eigenvalue problem \((\widehat{\varvec{\varDelta }}_B^{(\tau )}, \varvec{\varDelta }_W)\). Let \(f(\varvec{Y}, \varvec{A}_k) := -\mathbf{tr} (\varvec{Y}^{\top } \widehat{\varvec{\varDelta }}_B\, \varvec{Y})\). Eq. (19) is solved similarly, but we use \(\mathbb {R}^{n \times n} \ni \partial f / \partial \widehat{\varvec{\varDelta }}_B = -\varvec{Y}\varvec{Y}^\top \) and
After solving the optimization, the p leading columns of \(\varvec{Y}\) span the AW-KFDA subspace. Recall \(\varvec{\varPhi }(\varvec{U}) = \varvec{\varPhi }(\varvec{X})\, \varvec{Y}\). The projection of some data \(\varvec{X}_t \in \mathbb {R}^{d \times n_t}\) is \(\mathbb {R}^{p \times n_t} \ni \widetilde{\varvec{X}}_t = \varvec{\varPhi }(\varvec{U})^\top \varvec{\varPhi }(\varvec{X}_t) = \varvec{Y}^\top \varvec{\varPhi }(\varvec{X})^\top \varvec{\varPhi }(\varvec{X}_t) = \varvec{Y}^\top \varvec{K}(\varvec{X}, \varvec{X}_t)\).
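The out-of-sample projection \(\widetilde{\varvec{X}}_t = \varvec{Y}^\top \varvec{K}(\varvec{X}, \varvec{X}_t)\) can be sketched directly (we assume an RBF kernel and row-wise sample layout for illustration; any kernel \(k(.,.)\) works):

```python
import numpy as np

def project_out_of_sample(Y, X_train, X_test, gamma=1.0):
    """KFDA out-of-sample projection sketch: X~_t = Y^T K(X, X_t).
    Y: (n, p) coefficients; X_train: (n, d); X_test: (n_t, d)."""
    sq_tr = np.sum(X_train ** 2, axis=1)[:, None]
    sq_te = np.sum(X_test ** 2, axis=1)[None, :]
    K = np.exp(-gamma * (sq_tr + sq_te - 2 * X_train @ X_test.T))  # n x n_t
    return Y.T @ K                        # p x n_t projected test data
```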
5 Experiments
5.1 Dataset
For experiments, we used the public ORL face recognition dataset [26] because face recognition is a challenging task for which FDA has frequently been used (e.g., see [19, 20, 27]). This dataset includes 40 classes, each having ten different poses of the facial picture of a subject, for 400 total images. For computational reasons, we selected the first 20 classes and resampled the images to \(44 \times 36\) pixels. Note that massive datasets are not feasible for FDA/KFDA because they involve solving a generalized eigenvalue problem. Some samples of this dataset are shown in Fig. 1. The data were split into training and test sets with \(66\%/33\%\) portions and were standardized to have zero mean and unit variance.
5.2 Evaluation of the Embedding Subspaces
For the evaluation of the embedded subspaces, we used the 1-Nearest Neighbor (1NN) classifier because it evaluates a subspace by the closeness of the projected data samples. The training and out-of-sample (test) classification accuracies are reported in Table 1. In the input space, kNN with \(k=1,3\) has the best results, but with \(k=c-1\), AW-FDA outperforms it in generalization (test) results. The performances of CW-FDA and AW-FDA with \(k=1,3\) are promising, although not the best. For instance, AW-FDA with \(k=1\) outperforms weighted FDA with the APAC, POW, and CDM methods in the training embedding, while having the same performance as kNN. In most cases, AW-FDA with all k values performs better than FDA, which shows the effectiveness of the obtained weights compared to the equal weights in FDA. Also, the sparse weights in AW-FDA outperforming FDA (with dense weights equal to one) validate the betting-on-sparsity principle.
In the feature space, where we used the radial basis function kernel, AW-KFDA has the best performance with perfectly accurate recognition. Both versions of CW-KFDA outperform regular KFDA as well as KFDA with CDM and kNN (with \(k=1, c-1\)) weighting. They also generalize better than APAC and kNN with all k values. Overall, the results show the effectiveness of the proposed weights in the input and feature spaces. Moreover, the existing weighting methods, which were designed for the input space, perform outstandingly when used in our proposed weighted KFDA (in the feature space). This shows the validity of the proposed weighted KFDA even for the existing weighting methods.
5.3 Comparison of Fisherfaces
Figure 2 depicts the four leading eigenvectors obtained from the different methods, including FDA itself. These ghost faces, the so-called Fisherfaces [27], capture the critical facial features for discriminating the classes in the subspace. Note that Fisherfaces cannot be shown for kernel FDA because its projection directions are n-dimensional. CDM has captured only some pixels as features because all its weights have become zero, due to the flaw explained earlier (see Sect. 3.1 and Fig. 3). In most of the methods, including CW-FDA, the Fisherfaces capture information about facial features such as hair, forehead, eyes, chin, and mouth.
The features of AW-FDA are more akin to the Haar wavelet features, which are useful for facial feature detection [28].
5.4 Comparison of the Weights
We show the weights obtained by the different methods in Fig. 3. The weights of APAC and POW are too small, while the range of weights in the other methods is more reasonable. The weights of CDM have all become zero because the samples were perfectly classified (recall the flaw of CDM). The weights of the kNN method are only zero and one, which is a flaw of this method because, amongst the neighbors, some classes are closer than others. This issue does not exist in AW-FDA with any of the k values. Moreover, although not all of the obtained weights are visually interpretable, some non-zero weights in AW-FDA or AW-KFDA, e.g., with \(k=1\), show the meaningfulness of the obtained weights (cf. Fig. 1). For example, the non-zero pairs (2, 20), (4, 14), (13, 6), (19, 20), (17, 6) in AW-FDA and the pairs (2, 20), (4, 14), (19, 20), (17, 14) in AW-KFDA make sense visually because the subjects wear glasses, so their classes are close to one another.
6 Conclusion
In this paper, we discussed a fundamental flaw of FDA and KFDA, namely that they treat all pairs of classes in the same way, although some classes are closer to each other and should be processed with more care for better discrimination. We proposed CW-FDA with cosine weights, and AW-FDA, in which the weights are found automatically. We also proposed a weighted KFDA to weight FDA in the feature space, introducing AW-KFDA and two versions of CW-KFDA, as well as utilizing the existing weighting methods in weighted KFDA. The experiments, in which we evaluated the embedding subspaces, the Fisherfaces, and the weights, showed the effectiveness of the proposed methods. The proposed weighted FDA methods outperformed regular FDA and many of the existing weighting methods for FDA. For example, AW-FDA with \(k=1\) outperformed weighted FDA with the APAC, POW, and CDM methods in the training embedding. In the feature space, AW-KFDA obtained perfect discrimination.
References
Friedman, J., Hastie, T., Tibshirani, R.: The Elements of Statistical Learning. Springer Series in Statistics, vol. 1. Springer, New York (2001). https://doi.org/10.1007/978-0-387-84858-7
Fisher, R.A.: The use of multiple measurements in taxonomic problems. Ann. Eugen. 7(2), 179–188 (1936)
Mika, S., Ratsch, G., Weston, J., Scholkopf, B., Mullers, K.R.: Fisher discriminant analysis with kernels. In: Neural Networks for Signal Processing IX: Proceedings of the 1999 IEEE Signal Processing Society Workshop, pp. 41–48. IEEE (1999)
Ghojogh, B., Karray, F., Crowley, M.: Roweis discriminant analysis: a generalized subspace learning method. arXiv preprint arXiv:1910.05437 (2019)
Zhang, Z., Dai, G., Xu, C., Jordan, M.I.: Regularized discriminant analysis, ridge regression and beyond. J. Mach. Learn. Res. 11(Aug), 2199–2228 (2010)
Díaz-Vico, D., Dorronsoro, J.R.: Deep least squares Fisher discriminant analysis. IEEE Trans. Neural Netw. Learn. Syst. (2019)
Xu, Y., Lu, G.: Analysis on Fisher discriminant criterion and linear separability of feature space. In: 2006 International Conference on Computational Intelligence and Security, vol. 2, pp. 1671–1676. IEEE (2006)
Parlett, B.N.: The Symmetric Eigenvalue Problem, vol. 20. Society for Industrial and Applied Mathematics (SIAM), Philadelphia (1998)
Ghojogh, B., Karray, F., Crowley, M.: Fisher and kernel Fisher discriminant analysis: tutorial. arXiv preprint arXiv:1906.09436 (2019)
Boyd, S., Vandenberghe, L.: Convex Optimization. Cambridge University Press, Cambridge (2004)
Ghojogh, B., Karray, F., Crowley, M.: Eigenvalue and generalized eigenvalue problems: tutorial. arXiv preprint arXiv:1903.11240 (2019)
Alperin, J.L.: Local Representation Theory: Modular Representations as an Introduction to the Local Representation Theory of Finite Groups, vol. 11. Cambridge University Press, Cambridge (1993)
Loog, M., Duin, R.P., Haeb-Umbach, R.: Multiclass linear dimension reduction by weighted pairwise Fisher criteria. IEEE Trans. Pattern Anal. Mach. Intell. 23(7), 762–766 (2001)
Lotlikar, R., Kothari, R.: Fractional-step dimensionality reduction. IEEE Trans. Pattern Anal. Mach. Intell. 22(6), 623–627 (2000)
Zhang, X.Y., Liu, C.L.: Confused distance maximization for large category dimensionality reduction. In: 2012 International Conference on Frontiers in Handwriting Recognition, pp. 213–218. IEEE (2012)
Ghojogh, B., Crowley, M.: Linear and quadratic discriminant analysis: tutorial. arXiv preprint arXiv:1906.02590 (2019)
Zhang, X.Y., Liu, C.L.: Evaluation of weighted Fisher criteria for large category dimensionality reduction in application to Chinese handwriting recognition. Pattern Recogn. 46(9), 2599–2611 (2013)
Hastie, T., Tibshirani, R., Wainwright, M.: Statistical Learning with Sparsity: The Lasso and Generalizations. Chapman and Hall/CRC, London (2015)
Perlibakas, V.: Distance measures for PCA-based face recognition. Pattern Recogn. Lett. 25(6), 711–724 (2004)
Mohammadzade, H., Hatzinakos, D.: Projection into expression subspaces for face recognition from single sample per person. IEEE Trans. Affect. Comput. 4(1), 69–82 (2012)
Tizhoosh, H.R.: Opposition-based learning: a new scheme for machine intelligence. In: International Conference on Computational Intelligence for Modelling, Control and Automation, vol. 1, pp. 695–701. IEEE (2005)
Jain, P., Kar, P.: Non-convex optimization for machine learning. Found. Trends® Mach. Learn. 10(3–4), 142–336 (2017)
Nocedal, J., Wright, S.: Numerical Optimization. Springer, Berlin (2006). https://doi.org/10.1007/978-0-387-40065-5
Hamid, J.S., Greenwood, C.M., Beyene, J.: Weighted kernel Fisher discriminant analysis for integrating heterogeneous data. Comput. Stat. Data Anal. 56(6), 2031–2040 (2012)
Ah-Pine, J.: Normalized kernels as similarity indices. In: Zaki, M.J., Yu, J.X., Ravindran, B., Pudi, V. (eds.) PAKDD 2010. LNCS (LNAI), vol. 6119, pp. 362–373. Springer, Heidelberg (2010). https://doi.org/10.1007/978-3-642-13672-6_36
AT&T Laboratories Cambridge: ORL Face Dataset. http://cam-orl.co.uk/facedatabase.html. Accessed 2019
Belhumeur, P.N., Hespanha, J.P., Kriegman, D.J.: Eigenfaces vs. Fisherfaces: recognition using class specific linear projection. IEEE Trans. Pattern Anal. Mach. Intell. 19(7), 711–720 (1997)
Wang, Y.Q.: An analysis of the Viola-Jones face detection algorithm. Image Process. On Line 4, 128–148 (2014)
Ghojogh, B., Sikaroudi, M., Tizhoosh, H.R., Karray, F., Crowley, M. (2020). Weighted Fisher Discriminant Analysis in the Input and Feature Spaces. In: Campilho, A., Karray, F., Wang, Z. (eds) Image Analysis and Recognition. ICIAR 2020. Lecture Notes in Computer Science(), vol 12132. Springer, Cham. https://doi.org/10.1007/978-3-030-50516-5_1