1 Introduction

The fields of machine learning [1] and pattern recognition have been applied widely and successfully to many practical problems, in which patterns extracted from training data are used to predict future results [2, 3]. Traditional machine learning methodologies assume that the training and test data come from the same domain, such that the input feature space and the data distribution are identical. The performance of a predictive classifier degrades when the distributions of the training and test data differ. In some scenarios, obtaining training data that matches the feature space and distribution of the test data is laborious and costly. Therefore, adaptive classifiers for a target domain must be built from related domains. This objective is the motivation of transfer learning.

Transfer learning is used to solve the problem in one domain (i.e., the target domain) by using information from a related domain (i.e., the source domain). Domain adaptation is a subtopic of transfer learning that transfers knowledge from a labeled source domain to an unlabeled target domain by learning domain-invariant and label-discriminative representations that capture the similarities between domains despite significant differences. To date, domain adaptation has been successfully applied in various fields, such as text sentiment classification [4, 5], image classification [6,7,8], human activity classification [9], and multi-language text classification [10]. Domain divergence remains the major obstacle to adapting predictive models across domains.

The main problem of domain adaptation is the reduction of distribution divergence between domains. To this end, existing approaches can be categorized into four main groups [2, 3]: (a) instance-based adaptation, which reweights samples in the source domain or in both domains to reduce domain discrepancy [11, 12], (b) feature representation-based adaptation, which learns feature representations to minimize domain shift, learning task error, or both [13, 14], (c) classifier-based adaptation, which aims to learn a new model that minimizes the generalization error in the target domain via training data from both domains [15, 16], and (d) hybrid knowledge-based adaptation, which transfers more than one kind of knowledge, such as joint instance and feature representation-based adaptation [17,18,19,20], joint instance and classifier-based adaptation [21, 22], or joint feature representation and classifier-based adaptation [23,24,25].

Among the abovementioned classical approaches, hybrid methods reduce the cross-domain discrepancy better than single-knowledge methods. Most existing hybrid methods follow a two-step procedure: instance reweighting or feature representation learning is first performed independently, and the cross-domain classifier is then trained separately. Such methods do not perform well in practical applications because many factors cannot be considered. For example, some methods are strongly influenced by feature representations or irrelevant instances, some ignore the importance of evaluating data distributions, and some fail to exploit the hidden knowledge structure in the data labels of the source and target domains. Therefore, a new hybrid method for robust unsupervised domain adaptation needs to be developed. Knowledge that can be successfully transferred across domains should be (1) invariant to feature representations and unbiased toward irrelevant instances, (2) quantitatively estimated in terms of the importance of distributions, and (3) able to exploit the potential manifold structure behind the data.

As far as we know, no research has addressed all three challenges together in a unified learning machine for unsupervised domain adaptation. In this paper, we take on this challenge and propose a new Lie Group Manifold Analysis (LGMA) method based on Fisher's linear discriminant analysis (FLDA) [26], which learns a domain-invariant and label-discriminative classifier in the Lie algebra manifold space by extracting invariant representations, estimating unbiased instance weights, and performing weighted distribution alignment and graph Laplacian regularization, which jointly minimize the cross-domain distribution discrepancy. To the best of our knowledge, LGMA is the first attempt to minimize the cross-domain discrepancy in a Lie algebra manifold space for domain adaptation. Extensive experiments on five real-world benchmark datasets validate that LGMA outperforms competitive state-of-the-art methods.

The rest of the paper is organized as follows. Section 2 introduces related works of domain adaptation. Section 3 presents the LGMA algorithm based on Lie algebra transformation. Section 4 provides experiments to illustrate the effectiveness and efficiency of the proposed method. Section 5 draws the conclusions of this paper.

2 Related work

According to a recent survey [2], existing domain adaptation methods can be roughly divided into four categories: instance-based, feature representation-based, classifier-based, and hybrid knowledge-based adaptations.

Instance-based adaptation methods aim to minimize the cross-domain distribution discrepancy by reweighting the source samples according to the related samples in the target domain. Baktashmotlagh et al. [27] introduced a sample selection method and a subspace-based method that use the structure of the Riemannian manifold to compare the source and target distributions. Transfer component analysis (TCA) [28] learns transfer components across domains in a reproducing kernel Hilbert space (RKHS) using the maximum mean discrepancy (MMD) [29].

Feature representation-based adaptation methods aim to reduce distribution differences by learning a new feature representation. Fernando et al. [30] proposed a subspace alignment (SA) algorithm that learns a mapping function to align the source subspace with the target one. Geodesic flow kernel (GFK) [31] extends the idea of sampling points on a manifold [32] and provides a method for learning the geodesic flow kernel between domains. Generalized unsupervised manifold alignment (GUMA) [33] builds connections between domains without any known correspondences by using manifold alignment. Low-rank transfer subspace learning (LTSL) [34] is a framework that solves the transfer learning problem through subspace learning and low-rank representation constraints. Zhai et al. [35] proposed a manifold alignment method that learns the underlying common manifold with supervision from corresponding data pairs of different observation sets.

Classifier-based adaptation methods aim to learn a new domain-invariant classifier that minimizes the generalization error in the target domain via training data from both domains. The works of distribution matching machine (DMM) [36] and adaptation regularization transfer learning (ARTL) [6] aim to learn a unified domain-invariant classifier based on structural risk minimization (SRM) [37].

Hybrid knowledge-based adaptation methods aim to learn domain-invariant knowledge by jointly utilizing multiple kinds of adaptations. Locality preserving joint transfer (LPJT) [19], domain invariant and class discriminative feature learning (DICD) [17], and transfer independently together (TIT) [20] jointly leverage instance-based and feature representation-based adaptations to learn domain-invariant and label-discriminative vector representations. Qin et al. [21] proposed a novel generatively inferential co-training (GICT) framework based on instance-based and classifier-based adaptations. In [25], three unsupervised transfer learning methods, i.e., discriminative subspace learning (DSL), joint geometrical and statistical distribution adaptation (GSDA), and joint subspace and distribution adaptation (DSL-GSDA) are proposed to transfer the common domain-invariant knowledge from the source domain to the target domain by jointly adapting feature representation and classifier.

In general, single adaptation methods explore instance reweighting, feature representation, or classifier learning independently and are ineffective when the domain difference is substantially large. Hybrid knowledge-based adaptation methods perform better than single adaptation methods when the domain differences are large, when some outlier source instances are unrelated to the target domain, or when both conditions hold. Almost all existing adaptation methods for image classification tasks proceed by linearizing the images, which makes an implicit Euclidean space assumption [38, 39]. However, when the domain divergence is extremely large, the classification performance of adaptation methods based on the Euclidean space assumption degrades significantly. In general, most of the transformations used in image classification tasks have a matrix Lie group structure. Thus, we first devise a nonlinear transformation to project samples from the original Lie group manifold space onto a corresponding Lie algebra manifold space, where the samples are more discriminative and can be classified more easily. We then perform hybrid knowledge-based adaptation to further minimize the discrepancy between domains for higher cross-domain classification accuracy.

The most similar approaches to the proposed hybrid LGMA method are scatter component analysis (SCA) [40] and joint geometrical and statistical alignment (JGSA) [41]. However, LGMA differs significantly from SCA and JGSA in two key aspects: (a) LGMA jointly learns the invariant cross-domain classifier and the transferable knowledge (invariant to feature representations) in a unified learning paradigm in a linear Lie algebra manifold space, whereas SCA and JGSA learn the transferable knowledge and transfer classifier in a nonlinear Lie group manifold space (reproducing kernel Hilbert space). (b) LGMA learns unbiased instance weights, remaining unbiased toward irrelevant instances, not only by using the domain scatters but also by exploiting the weighted distribution alignment and the graph Laplacian regularization, whereas SCA and JGSA learn reweighting only through scatters or unweighted distribution alignment. In summary, the proposed LGMA approach jointly learns the cross-domain classifier and transferable knowledge with statistical and geometrical guarantees.

3 LGMA

In this section, we provide the LGMA approach in detail.

3.1 Problem definition

We begin with the formalized definition of domain adaptation [6, 42, 43]. For clarity, the frequently used notations are summarized in Table 1.

Table 1 Notations and corresponding descriptions used in this paper

Definition 1

(Domain adaptation). Given a labeled source domain \( \mathcal {D}_{\mathit {s}} = \{x_{\mathit {s_{i}}},\mathit {y_{\mathit {s_{i}}}}\}_{i=1}^{n} \) and an unlabeled target domain \( \mathcal {D}_{\mathit {t}}= \{x_{\mathit {t_{j}}}\}_{j=n+1}^{n+m} \), we assume that the feature spaces are identical, \( \mathcal {X}_{\mathit {s}} = \mathcal {X}_{\mathit {t}} \), and that the label spaces are identical, \( \mathcal {Y}_{\mathit {s}} = \mathcal {Y}_{\mathit {t}} \). However, the marginal probability distributions differ, Ps(xs) ≠ Pt(xt), as do the conditional probability distributions, Qs(ys|xs) ≠ Qt(yt|xt). The purpose of unsupervised domain adaptation is to learn a classifier \( f:x_{\mathit {t}} \mapsto y_{\mathit {t}}, y_{\mathit {t}} \in \mathcal {Y}_{\mathit {t}} \) that classifies the samples of the target domain \( \mathcal {D}_{\mathit {t}}\) using the label information of the related source domain \( \mathcal {D}_{\mathit {s}}\). The data in the source and target domains are denoted as \( \mathrm {X}_{\mathit {s}} \in \mathbb {R}^{D\times n}\) and \( \mathrm {X}_{\mathit {t}} \in \mathbb {R}^{D\times m}\), respectively.

Classical Fisher’s linear discriminant analysis (FLDA) [26] can be represented as

$$ \underset{v}{\mathrm{arg max}} J(v)= \frac{v^{\mathrm{T}}\mathrm{S}_{b}v}{v^{\mathrm{T}}\mathrm{S}_{w}v} $$
(1)

where Sb and Sw are the between-class and within-class scatter matrices, respectively. Maximizing the FLDA objective increases the separation of samples with respect to their class clusters. However, classification accuracy is degraded by the different distributions of \( \mathcal {D}_{\mathit {s}}\) and \( \mathcal {D}_{\mathit {t}}\). Thus, minimizing the domain distribution discrepancy is essential to improve classification performance when learning a cross-domain classifier f.
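For reference, the classical FLDA objective (1) reduces to a generalized eigenproblem \( \mathrm{S}_{b}v = \lambda\mathrm{S}_{w}v \). The following minimal sketch (with illustrative names, not the authors' code) builds the two scatter matrices from labeled samples and solves for the leading discriminant directions with SciPy:

```python
import numpy as np
from scipy.linalg import eigh

def flda(X, y, k):
    """Classical FLDA (1): maximize v^T S_b v / v^T S_w v.

    X: (D, n) samples as columns, y: (n,) integer labels,
    k: number of discriminant directions to keep."""
    D, _ = X.shape
    mean_all = X.mean(axis=1, keepdims=True)
    S_b = np.zeros((D, D))
    S_w = np.zeros((D, D))
    for c in np.unique(y):
        Xc = X[:, y == c]
        mc = Xc.mean(axis=1, keepdims=True)
        S_b += Xc.shape[1] * (mc - mean_all) @ (mc - mean_all).T
        S_w += (Xc - mc) @ (Xc - mc).T
    # Generalized eigenproblem S_b v = lambda S_w v; a small ridge keeps S_w invertible.
    w, V = eigh(S_b, S_w + 1e-6 * np.eye(D))
    return V[:, np.argsort(w)[::-1][:k]]   # (D, k) projection matrix
```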

3.2 Main idea

LGMA mainly includes three steps. First, LGMA performs the Lie algebra transformation to project features in the Lie group manifold space onto the corresponding Lie algebra manifold space. Second, LGMA finds a paired transformation (i.e., A for the source domain and B for the target domain) to obtain new representations of the respective domains. Third, LGMA performs weighted distribution alignment and manifold alignment to learn a cross-domain invariant classifier in the linear Lie algebra space. Figure 1 shows the main idea of the proposed LGMA method.

Fig. 1

The main idea of LGMA. (a) Features in the Lie group manifold space are mapped to the Lie algebra manifold space (a projection point is the intersection of the black bold curve and the geodesic etv). Data with similar manifold properties are aggregated together after the Lie algebra transformation. (b) LGMA finds a paired transformation (one for the source domain and another for the target domain) to obtain new representations of the respective domains. (c) Weighted distribution alignment and manifold alignment are performed in the Lie algebra manifold space to learn the cross-domain invariant classifier f

We first obtain the transformed features by means of the Lie algebra transformation. Then, on the basis of FLDA, weighted distribution alignment, and manifold alignment [41], the domain-invariant classifier f can be represented as

$$ \underset{\mathrm{A},\mathrm{B}}{\max}\frac{\alpha\mathit{S}_{f}(\mathcal{D}_{t})+\beta\mathit{S}_{bf}(\mathcal{D}_{s})} {\bar{D}_{f}(\mathcal{D}_{s},\mathcal{D}_{t}) + \delta\mathit{R}_{f}(\mathcal{D}_{s},\mathcal{\!D}_{t}) + \lambda\mathit{D}_{f}(S_{A},S_{B}) + \beta\mathit{S}_{wf}(\mathcal{D}_{s})} $$
(2)

where the terms Sf(⋅), Sbf(⋅), \( \bar {D}_{f}(\cdot ,\cdot ) \), Rf(⋅,⋅), Df(⋅,⋅), and Swf(⋅) represent the target domain variance, the between-class variance, the weighted distribution alignment, the graph Laplacian regularization, the subspace divergence, and the within-class variance, respectively. α, β, δ, and λ are the regularization parameters.

3.3 Lie algebra transformation

The Lie algebra transformation serves as the preprocessing step: it finds a geodesic on the Lie group manifold and projects all features onto this geodesic, after which weighted distribution alignment and manifold alignment are performed to maximize the ratio in (2).

Before the Lie algebra transformation is introduced, we first give the definitions of the Lie group and the Lie algebra [44, 45].

Definition 2

(Lie group). A real Lie group [44] is a group that is also a finite-dimensional real smooth manifold, in which the group operations of multiplication and inversion are smooth maps. Smoothness of the group multiplication \( \mu :G\times G\rightarrow G,\ \mu (x,y)=xy \) means that μ is a smooth mapping of the product manifold G × G into G. These two requirements can be combined into the single requirement that the mapping \( (x,y)\mapsto x^{-1}y \) be a smooth mapping of the product manifold into G.

Definition 3

(Lie algebra). A Lie algebra [45] is a vector space \( \mathfrak {g} \) over some field \( \mathbb {F} \) together with a binary operation \( [\cdot ,\cdot ]:\mathfrak {g}\times \mathfrak {g}\rightarrow \mathfrak {g}\) called the Lie bracket that satisfies the following axioms:

  • Bilinearity: [ax + by,z] = a[x,z] + b[y,z],[z,ax + by] = a[z,x] + b[z,y] for all scalars a, b in \( \mathbb {F} \) and all elements x, y, z in \( \mathfrak {g} \).

  • Alternativity: [x,x] = 0 for all x in \( \mathfrak {g} \).

  • The Jacobi identity: [x,[y,z]] + [z,[x,y]] + [y,[z,x]] = 0 for all x, y, z in \( \mathfrak {g} \).

  • Anticommutativity: [x,y] = −[y,x] for all x, y in \( \mathfrak {g} \).
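For the matrix Lie algebras considered later, the Lie bracket is simply the commutator [X, Y] = XY − YX (bilinearity then follows from the linearity of matrix multiplication). The short check below is our own illustration, not part of LGMA; it verifies the remaining axioms numerically on random matrices:

```python
import numpy as np

def bracket(X, Y):
    """Matrix commutator: the Lie bracket of a matrix Lie algebra."""
    return X @ Y - Y @ X

rng = np.random.default_rng(0)
X, Y, Z = (rng.standard_normal((3, 3)) for _ in range(3))

assert np.allclose(bracket(X, Y), -bracket(Y, X))   # anticommutativity
assert np.allclose(bracket(X, X), 0)                # alternativity
jacobi = bracket(X, bracket(Y, Z)) + bracket(Z, bracket(X, Y)) + bracket(Y, bracket(Z, X))
assert np.allclose(jacobi, 0)                       # Jacobi identity
```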

The exponential and logarithmic transformations [45] are important tools in Lie group theory. The exponential transformation is defined as

$$ \exp:\mathfrak{g}\rightarrow G, \ \ \exp(x)=\sum\limits_{i=0}^{\infty }\frac{x^{i}}{i!} $$
(3)

Elements in Lie algebra manifold space can be transformed into Lie group manifold space through this transformation. Similarly, logarithmic transformation can also be represented as

$$ \log:G\rightarrow \mathfrak{g}, \ \ \log(x)=\sum\limits_{i=1}^{\infty }\frac{(-1)^{i-1}}{i}(x-e)^{i} $$
(4)

Features in Lie group manifold space can be transformed into Lie algebra manifold space through this transformation.

We denote g(⋅) as the Lie algebra transformation. Thus, the feature in the Lie group manifold space can be transformed into Lie algebra manifold space through z = g(x).
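As a concrete illustration of z = g(x), the sketch below uses the matrix logarithm and exponential from SciPy to move between a matrix Lie group and its Lie algebra, assuming each feature is represented as (or reshaped into) a group element. This is a simplified stand-in under that assumption, not the authors' implementation:

```python
import numpy as np
from scipy.linalg import expm, logm

def lie_algebra_transform(G):
    """Map a group element G (e.g., a rotation matrix) to the Lie algebra, log: G -> g (Eq. 4)."""
    return np.real(logm(G))

def lie_group_transform(g):
    """Inverse map exp: g -> G (Eq. 3)."""
    return expm(g)

# Round-trip sanity check on a small rotation matrix.
theta = 0.3
R = np.array([[np.cos(theta), -np.sin(theta)],
              [np.sin(theta),  np.cos(theta)]])
z = lie_algebra_transform(R)                 # skew-symmetric element of so(2)
assert np.allclose(lie_group_transform(z), R)
```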

3.4 Target domain variance maximization

The variance of the target domain can be maximized in the corresponding subspace to avoid projecting features onto some irrelevant dimensions. Therefore, the variance maximization term can be generalized as

$$ \underset{\mathrm{B}}{\max} \mathit{S}_{f}(\mathcal{D}_{t})=\underset{\mathrm{B}}{\max} \text{tr}(\mathrm{B}^{\mathrm{T}}\mathrm{S}_{t}\mathrm{B}) $$
(5)

where tr(⋅) denotes the trace of a matrix and

$$ \mathrm{S}_{t} = \mathrm{Z}_{t}\mathrm{H}_{t}\mathrm{Z}_{t}^{\mathrm{T}} $$
(6)

is the scatter matrix of the target domain, Zt is the set of projected target samples, \( \mathrm {H}_{t} = \mathrm {I}_{t} - \frac {1}{m}\mathrm {1}_{t}\mathrm {1}_{t}^{\mathrm {T}} \) is the centering matrix, and \( \mathrm {1}_{t} \in \mathbb {R}^{m} \) is the column vector with all elements equal to 1.

3.5 Source domain discriminative feature preservation

We use the rich label information in the source domain to make the new representation of samples in the source domain discriminative as follows:

$$ \underset{\mathrm{A}}{\max} \mathit{S}_{bf}(\mathcal{D}_{s})=\underset{\mathrm{A}}{\max} \text{tr}\left( \mathrm{A}^{\mathrm{T}}\mathrm{S}_{b}\mathrm{A}\right) $$
(7)
$$ \underset{\mathrm{A}}{\min} \mathit{S}_{wf}(\mathcal{D}_{s})=\underset{\mathrm{A}}{\min} \text{tr}\left( \mathrm{A}^{\mathrm{T}}\mathrm{S}_{w}\mathrm{A}\right) $$
(8)

where Sb and Sw are the between-class and within-class scatter matrices, respectively, and are defined as follows:

$$ \mathrm{S}_{w} = \sum\limits_{c = 1}^{C}\mathrm{Z}_{\mathit{s}}^{\mathit{(c)}}\mathrm{H}_{\mathit{s}}^{\mathit{(c)}}\left( \mathrm{Z}_{\mathit{s}}^{\mathit{(c)}}\right)^{\mathrm{T}} $$
(9)
$$ \mathrm{S}_{b} = \sum\limits_{c = 1}^{C}n^{(c)}\left( m_{\mathit{s}}^{(c)}-\bar{m}_{\mathit{s}}\right)\left( m_{\mathit{s}}^{(c)}-\bar{m}_{\mathit{s}}\right)^{\mathrm{T}} $$
(10)

where \( \mathrm {Z}_{\mathit {s}}^{(c)} \) indicates the set of transformed source samples that belong to class c, \( m_{\mathit {s}}^{(c)} = \frac {1}{n^{(c)}}{\sum }_{i = 1}^{n^{(c)}}z_{s_{i}}^{(c)} \), \( \bar {m}_{\mathit {s}}=\frac {1}{n}{\sum }_{i=1}^{n}z_{s_{i}} \), and \( \mathrm {H}_{\mathit {s}}^{\mathit {(c)}}=\mathrm {I}_{\mathit {s}}^{\mathit {(c)}}-\frac {1}{n^{(c)}}\mathrm {1}_{\mathit {s}}^{\mathit {(c)}}\left (\mathrm {1}_{\mathit {s}}^{\mathit {(c)}}\right )^{\mathrm {T}} \) is the centering matrix of samples within class c, \( \mathrm {I}_{\mathit {s}}^{\mathit {(c)}} \in \mathbb {R}^{n^{(c)}\times {n^{(c)}}} \) is the identity matrix, \( \mathrm {1}_{\mathit {s}} \in \mathbb {R}^{n^{(c)}} \) is a column vector with all ones, and n(c) is the number of source samples in class c.
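The scatter matrices in (6), (9), and (10) can be computed directly from the transformed samples. A minimal sketch, with illustrative names and the same column-wise data layout as above, is given below:

```python
import numpy as np

def target_scatter(Zt):
    """S_t = Z_t H_t Z_t^T with the centering matrix H_t (Eq. 6). Zt: (d, m)."""
    m = Zt.shape[1]
    Ht = np.eye(m) - np.ones((m, m)) / m
    return Zt @ Ht @ Zt.T

def source_scatters(Zs, ys):
    """Within-class S_w (Eq. 9) and between-class S_b (Eq. 10). Zs: (d, n), ys: (n,)."""
    d, n = Zs.shape
    mean_all = Zs.mean(axis=1, keepdims=True)
    Sw, Sb = np.zeros((d, d)), np.zeros((d, d))
    for c in np.unique(ys):
        Zc = Zs[:, ys == c]
        nc = Zc.shape[1]
        Hc = np.eye(nc) - np.ones((nc, nc)) / nc          # per-class centering matrix
        Sw += Zc @ Hc @ Zc.T
        mc = Zc.mean(axis=1, keepdims=True)
        Sb += nc * (mc - mean_all) @ (mc - mean_all).T
    return Sw, Sb
```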

3.6 Weighted distribution alignment

Weighted distribution alignment is devised to minimize the distribution divergence between the source and target domains by quantitatively assessing the importance of the marginal distribution (i.e., P) and the conditional distribution (i.e., Q). Formally, the weighted distribution alignment \( \bar {D}_{f}(\mathcal {D}_{s},\mathcal {D}_{t}) \) can be defined as follows:

$$ \bar{D}_{f}(\mathcal{D}_{s},\mathcal{D}_{t}) = (1-\mu)\mathit{D}(\mathit{P}_{s},\mathit{P}_{t})+\mu\mathit{D}(\mathit{Q}_{s},\mathit{Q}_{t}) $$
(11)

with μ ∈ [0,1] as the adaptive parameter. The projected MMD [6, 46, 47] can be adopted to compute the marginal and conditional distribution divergences by comparing the distance between the sample means of the two domains in a low-dimensional smooth manifold. The marginal distribution divergence D(Ps, Pt) can be written as

$$ \parallel\frac{1}{n}\sum\limits_{\mathrm{z}_{\mathit{s_{i}}} \in \mathrm{Z}_{\mathit{s}}}\mathrm{A}^{\mathrm{T}}\mathrm{z}_{\mathit{s_{i}}}-\frac{1}{m}\sum\limits_{\mathrm{z}_{\mathit{t_{j}}} \in \mathrm{Z}_{\mathit{t}}}\mathrm{B}^{\mathrm{T}}\mathrm{z}_{\mathit{t_{j}}}\parallel_{\mathrm{F}}^{\mathrm{2}} $$
(12)

Correspondingly, the conditional distribution divergence D(Qs, Qt) can be expressed as

$$ \sum\limits_{c=1}^{C}\parallel \frac{1}{n^{(c)}}\sum\limits_{\mathrm{z}_{\mathit{s_{i}}} \in \mathrm{Z}_{\mathit{s}}^{\mathit{(c)}}}\mathrm{A}^{\mathrm{T}}\mathrm{z}_{\mathit{s_{i}}}-\frac{1}{m^{(c)}}\sum\limits_{\mathrm{z}_{\mathit{t_{j}}} \in \mathrm{Z}_{\mathit{t}}^{\mathit{(c)}}}\mathrm{B}^{\mathrm{T}}\mathrm{z}_{\mathit{t_{j}}}\parallel_{\mathrm{F}}^{\mathrm{2}} $$
(13)

where \( \mathrm {Z}_{\mathit {s}}^{\mathit {(c)}} = \left \{ \mathrm {z}_{\mathit {s_{i}}}:\mathrm {z}_{\mathit {s_{i}}}\in \mathrm {Z}_{\mathit {s}}\wedge \mathit {y}(\mathrm {z}_{\mathit {s_{i}}}) = c \right \} \) is the set of projected source samples that belong to class c and \(\mathit {y}(\mathrm {z}_{\mathit {s_{i}}}) \) is the true label of \( \mathrm {z}_{\mathit {s_{i}}} \). \( \mathrm {Z}_{\mathit {t}}^{\mathit {(c)}} = \left \{ \mathrm {z}_{\mathit {t_{j}}}:\mathrm {z}_{\mathit {t_{j}}}\in \mathrm {Z}_{\mathit {t}}\wedge \hat {\mathit {y}}(\mathrm {z}_{\mathit {t_{j}}}) = c \right \} \) is the set of projected target samples that belong to class c, \( \hat {\mathit {y}}(\mathrm {z}_{\mathit {t_{j}}}) \) is the pseudo (predicted) label of \( \mathrm {z}_{\mathit {t_{j}}} \), and \( n^{(c)} = |\mathrm {Z}_{\mathit {s}}^{\mathit {(c)}}| \) and \( m^{(c)} = |\mathrm {Z}_{\mathit {t}}^{\mathit {(c)}}| \) are the numbers of samples of class c in the respective projected manifold spaces of the source and target domains. Evaluating the conditional distribution divergence D(Qs, Qt) is relatively difficult because no labeled data are available in the target domain. Long et al. [6] proposed utilizing the pseudo labels of the target domain, which are predicted by supervised approaches (e.g., KNN) trained on the source-domain data. The pseudo labels can be refined iteratively to minimize the difference in conditional distributions between the source and target domains. We follow this idea to further reduce the conditional MMD between domains.

Combining the marginal and conditional MMDs, the final weighted distribution alignment optimization can be stated in the following matrix form

$$ \underset{\mathrm{A},\mathrm{B}}{\min} \bar{D}_{f}(\mathcal{D}_{s},\mathcal{D}_{t})=\underset{\mathrm{A},\mathrm{B}}{\min} \text{tr} \left (\begin{bmatrix} \mathrm{A}^{\mathrm{T}} & \mathrm{B}^{\mathrm{T}} \end{bmatrix} \begin{bmatrix} \mathrm{M}_{\mathit{ss}} &\mathrm{M}_{\mathit{st}} \\ \mathrm{M}_{\mathit{ts}} & \mathrm{M}_{\mathit{tt}} \end{bmatrix} \begin{bmatrix} \mathrm{A} \\ \mathrm{B} \end{bmatrix}\right ) $$
(14)

where

$$ \begin{array}{ll} \mathrm{M}_{\mathit{ss}} &= \mathrm{Z}_{\mathit{s}}\left( (1-\mu)\mathrm{N}_{\mathit{ss}}+\mu{\sum}_{c=1}^{C}\mathrm{N}_{\mathit{ss}}^{\mathit{(c)}}\right)\mathrm{Z}_{\mathit{s}}^{\mathrm{T}}, \quad \mathrm{N}_{\mathit{ss}} = \frac{1}{n^{2}}\mathrm{1}_{\mathit{n}}\mathrm{1}_{\mathit{n}}^{\mathrm{T}},\\ \left( \mathrm{N}_{\mathit{ss}}^{\mathit{(c)}}\right)_{ij} &= \begin{cases} \frac{1}{(n^{(c)})^{2}}, & \mathrm{z}_{\mathit{i}},\mathrm{z}_{\mathit{j}}\in \mathrm{Z}_{\mathit{s}}^{(c)} \\ 0, & \text{otherwise} \end{cases} \end{array} $$
(15)
$$ \begin{array}{ll} \mathrm{M}_{\mathit{tt}} &= \mathrm{Z}_{\mathit{t}}\left( (1-\mu)\mathrm{N}_{\mathit{tt}}+\mu{\sum}_{c=1}^{C}\mathrm{N}_{\mathit{tt}}^{\mathit{(c)}}\right)\mathrm{Z}_{\mathit{t}}^{\mathrm{T}}, \quad \mathrm{N}_{\mathit{tt}} = \frac{1}{m^{2}}\mathrm{1}_{\mathit{m}}\mathrm{1}_{\mathit{m}}^{\mathrm{T}},\\ \left( \mathrm{N}_{\mathit{tt}}^{\mathit{(c)}}\right)_{ij} &= \begin{cases} \frac{1}{(m^{(c)})^{2}}, & \mathrm{z}_{\mathit{i}},\mathrm{z}_{\mathit{j}}\in \mathrm{Z}_{\mathit{t}}^{(c)} \\ 0, & \text{otherwise} \end{cases} \end{array} $$
(16)
$$ \begin{array}{ll} \mathrm{M}_{\mathit{st}} &= \mathrm{Z}_{\mathit{s}}\left( (1-\mu)\mathrm{N}_{\mathit{st}}+\mu{\sum}_{c=1}^{C}\mathrm{N}_{\mathit{st}}^{\mathit{(c)}}\right)\mathrm{Z}_{\mathit{t}}^{\mathrm{T}}, \quad \mathrm{N}_{\mathit{st}} = -\frac{1}{nm}\mathrm{1}_{\mathit{n}}\mathrm{1}_{\mathit{m}}^{\mathrm{T}},\\ \left( \mathrm{N}_{\mathit{st}}^{\mathit{(c)}}\right)_{ij} &= \begin{cases} -\frac{1}{n^{(c)}m^{(c)}}, & \mathrm{z}_{\mathit{i}}\in\mathrm{Z}_{\mathit{s}}^{(c)},\ \mathrm{z}_{\mathit{j}}\in \mathrm{Z}_{\mathit{t}}^{(c)} \\ 0, & \text{otherwise} \end{cases} \end{array} $$
(17)
$$ \begin{array}{ll} \mathrm{M}_{\mathit{ts}} &= \mathrm{Z}_{\mathit{t}}\left( (1-\mu)\mathrm{N}_{\mathit{ts}}+\mu{\sum}_{c=1}^{C}\mathrm{N}_{\mathit{ts}}^{\mathit{(c)}}\right)\mathrm{Z}_{\mathit{s}}^{\mathrm{T}}, \quad \mathrm{N}_{\mathit{ts}} = -\frac{1}{nm}\mathrm{1}_{\mathit{m}}\mathrm{1}_{\mathit{n}}^{\mathrm{T}},\\ \left( \mathrm{N}_{\mathit{ts}}^{\mathit{(c)}}\right)_{ij} &= \begin{cases} -\frac{1}{n^{(c)}m^{(c)}}, & \mathrm{z}_{\mathit{j}}\in\mathrm{Z}_{\mathit{s}}^{(c)},\ \mathrm{z}_{\mathit{i}}\in \mathrm{Z}_{\mathit{t}}^{(c)} \\ 0, & \text{otherwise} \end{cases} \end{array} $$
(18)
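Putting (15)-(18) together, the MMD blocks of (14) can be assembled from the projected samples and the (pseudo) labels. The following sketch assumes pseudo target labels are available (e.g., from a source-trained classifier, as discussed above) and uses illustrative names; it is not the authors' implementation:

```python
import numpy as np

def weighted_mmd_blocks(Zs, Zt, ys, yt_pseudo, mu, C):
    """Assemble M_ss, M_st, M_ts, M_tt of Eq. (14) from Eqs. (15)-(18).

    Zs: (d, n) projected source samples, Zt: (d, m) projected target samples,
    ys: (n,) source labels, yt_pseudo: (m,) pseudo target labels,
    mu: weight in [0, 1], C: number of classes (labels assumed in 0..C-1)."""
    n, m = Zs.shape[1], Zt.shape[1]
    # Marginal terms N_ss, N_tt, N_st.
    Nss = np.ones((n, n)) / n ** 2
    Ntt = np.ones((m, m)) / m ** 2
    Nst = -np.ones((n, m)) / (n * m)
    # Conditional (class-wise) terms N_ss^(c), N_tt^(c), N_st^(c).
    Nss_c, Ntt_c, Nst_c = np.zeros((n, n)), np.zeros((m, m)), np.zeros((n, m))
    for c in range(C):
        s_idx, t_idx = np.where(ys == c)[0], np.where(yt_pseudo == c)[0]
        nc, mc = len(s_idx), len(t_idx)
        if nc:
            Nss_c[np.ix_(s_idx, s_idx)] = 1.0 / nc ** 2
        if mc:
            Ntt_c[np.ix_(t_idx, t_idx)] = 1.0 / mc ** 2
        if nc and mc:
            Nst_c[np.ix_(s_idx, t_idx)] = -1.0 / (nc * mc)
    Mss = Zs @ ((1 - mu) * Nss + mu * Nss_c) @ Zs.T
    Mtt = Zt @ ((1 - mu) * Ntt + mu * Ntt_c) @ Zt.T
    Mst = Zs @ ((1 - mu) * Nst + mu * Nst_c) @ Zt.T
    Mts = Mst.T          # N_ts = N_st^T, so M_ts = M_st^T (Eq. 18)
    return Mss, Mst, Mts, Mtt
```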

3.7 Graph Laplacian regularization

In this section, we use graph Laplacian regularization to keep the classifier unbiased toward irrelevant instances.

In domain adaptation, both labeled and unlabeled data are used, and knowledge of the marginal distributions (i.e., Ps and Pt) can be further exploited to improve the performance of function learning because the unlabeled samples often reveal underlying properties of the target domain, such as sample variances. The manifold assumption [48] can be expressed as follows: if two points zi, zj ∈ \( \mathfrak {g} \) are close in the geometry of the marginal distributions Ps(zs) and Pt(zt), then the conditional distributions Qs(ys|zs) and Qt(yt|zt) are similar. Under the hypothesis of the smoothness of geodesics, Laplacian regularization can be used to further exploit the similar geometrical properties of nearest points in the Lie algebra manifold space \( \mathfrak {g} \). Thus, the graph Laplacian regularization \(\mathit {R}_{f}(\mathcal {D}_{s},\mathcal {D}_{t})\) can be computed as

$$ \underset{\mathrm{A},\mathrm{B}}{\min}\mathit{R}_{f}(\mathcal{D}_{s},\mathcal{D}_{t}) = \underset{\mathrm{A},\mathrm{B}}{\min} \text{tr} \left (\begin{bmatrix} \mathrm{A}^{\mathrm{T}} & \mathrm{0} \\ \mathrm{0} & \mathrm{B}^{\mathrm{T}}\end{bmatrix} \begin{bmatrix} \mathrm{Z}_{\mathit{s}}\mathrm{L}_{\mathit{ss}}\mathrm{Z}_{\mathit{s}}^{\mathrm{T}} & \mathrm{Z}_{\mathit{s}}\mathrm{L}_{\mathit{st}}\mathrm{Z}_{\mathit{t}}^{\mathrm{T}} \\ \mathrm{Z}_{\mathit{t}}\mathrm{L}_{\mathit{ts}}\mathrm{Z}_{\mathit{s}}^{\mathrm{T}} & \mathrm{Z}_{\mathit{t}}\mathrm{L}_{\mathit{tt}}\mathrm{Z}_{\mathit{t}}^{\mathrm{T}}\end{bmatrix} \begin{bmatrix} \mathrm{A} & \mathrm{0} \\ \mathrm{0} & \mathrm{B}\end{bmatrix} \right ) $$
(19)

where \( \mathrm{L} = \mathrm{I} - \mathrm{D}^{-1/2}\mathrm{W}\mathrm{D}^{-1/2} \) is the graph Laplacian matrix and D is a diagonal matrix whose i-th diagonal element is the sum of the i-th row of W, i.e., \( \mathrm{D}_{ii} = {\sum}_{j}\mathrm{W}_{ij} \). W is defined by

$$ \mathrm{W}_{ij} = \begin{cases} \cos(\mathrm{z}_{\mathit{i}},\mathrm{z}_{\mathit{j}}), & \mathrm{z}_{\mathit{i}} \in \mathcal{N}_{p}(\mathrm{z}_{\mathit{j}}) \vee \mathrm{z}_{\mathit{j}} \in \mathcal{N}_{p}(\mathrm{z}_{\mathit{i}})\\ 0, & \mathrm{otherwise,} \end{cases} $$
(20)

where \( \mathcal {N}_{p}(\mathrm {z}_{\mathit {i}}) \) denotes the p nearest neighbors of zi that belong to the same class as zi.
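A minimal sketch of building W, D, and the normalized Laplacian L for (19) is shown below. For brevity it uses cosine similarity over a p-nearest-neighbor graph on the stacked source and target samples and omits the class constraint mentioned above; the names are illustrative, not the authors' code:

```python
import numpy as np

def graph_laplacian(Z, p):
    """Normalized Laplacian L = I - D^{-1/2} W D^{-1/2} used in Eq. (19).

    Z: (d, N) source and target samples stacked column-wise,
    p: number of nearest neighbors in the affinity graph of Eq. (20)."""
    N = Z.shape[1]
    Zn = Z / (np.linalg.norm(Z, axis=0, keepdims=True) + 1e-12)
    cos = Zn.T @ Zn                              # pairwise cosine similarities
    sim = cos.copy()
    np.fill_diagonal(sim, -np.inf)               # exclude self from the neighbor search
    knn = np.argsort(-sim, axis=1)[:, :p]        # indices of the p nearest neighbors
    mask = np.zeros((N, N), dtype=bool)
    mask[np.repeat(np.arange(N), p), knn.ravel()] = True
    mask |= mask.T                               # z_i in N_p(z_j) or z_j in N_p(z_i)
    np.fill_diagonal(cos, 0.0)
    W = np.where(mask, cos, 0.0)                 # Eq. (20)
    d = np.maximum(W.sum(axis=1), 1e-12)
    D_inv_sqrt = np.diag(1.0 / np.sqrt(d))
    return np.eye(N) - D_inv_sqrt @ W @ D_inv_sqrt
```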

3.8 Subspace divergence minimization

In this section, we further mitigate the domain divergence by moving the source and target subspaces closer together, in the same spirit as transfer component analysis (TCA) [28] and joint distribution adaptation (JDA) [42]. The differences between the two domains are reduced but cannot be completely removed by such a transformation. We therefore borrow the idea from [30] and optimize A and B jointly by minimizing their divergence, so that the statistical and geometrical features are preserved. Formally, we use the following Frobenius-norm minimization to move the two subspaces closer.

$$ \underset{\mathrm{A},\mathrm{B}}{\min}\mathit{D}_{f}(S_{A},S_{B})=\underset{\mathrm{A},\mathrm{B}}{\min}\left \| \mathrm{A}-\mathrm{B} \right \|_{\mathrm{F}}^{2} $$
(21)

3.9 Optimization

To control the scale of the solution B, we follow [40, 41] and impose the constraint that tr(BTB) is sufficiently small. We formulate the LGMA method by incorporating (5), (7), (8), (14), (19), and (21). Our objective function (2) can then be formulated as follows:

$$ \underset{\mathrm{A},\mathrm{B}}{\mathrm{arg max}}\frac{\text{tr} \left (\begin{bmatrix} \mathrm{A}^{\mathrm{T}} & \mathrm{B}^{\mathrm{T}} \end{bmatrix} \begin{bmatrix} \beta\mathrm{S}_{\mathit{b}} & \mathrm{0} \\ \mathrm{0} & \alpha \mathrm{S}_{\mathit{t}} \end{bmatrix} \begin{bmatrix} \mathrm{A} \\ \mathrm{B} \end{bmatrix} \right )} {\text{tr}\left( \begin{bmatrix} \mathrm{A}^{\mathrm{T}} & \mathrm{B}^{\mathrm{T}} \end{bmatrix} \begin{bmatrix} \mathrm{M}_{\mathit{ss}}+\delta\mathrm{L}_{\mathit{ss}}+\lambda\mathrm{I}+\beta\mathrm{S}_{\mathit{w}} & \mathrm{M}_{\mathit{st}}+\delta\mathrm{L}_{\mathit{st}}-\lambda\mathrm{I}\\ \mathrm{M}_{\mathit{ts}}+\delta\mathrm{L}_{\mathit{ts}}-\lambda\mathrm{I} & \mathrm{M}_{\mathit{tt}}+\delta\mathrm{L}_{\mathit{tt}}+(\lambda+\alpha)\mathrm{I} \end{bmatrix} \begin{bmatrix} \mathrm{A} \\ \mathrm{B} \end{bmatrix} \right) } $$
(22)

where α, β, δ, and λ are penalty parameters, and \( \mathrm {I} \in \mathbb {R}^{d \times d} \) is the identity matrix.

LGMA aims to find a paired transformation A and B by solving a generalized eigendecomposition problem in the projected Lie algebra manifold space. To optimize (22), we define \( \mathrm{U}^{\mathrm{T}} = \begin{bmatrix} \mathrm{A}^{\mathrm{T}} & \mathrm{B}^{\mathrm{T}} \end{bmatrix} \). Thus, we obtain

$$ \begin{array}{ll} &\underset{\mathrm{U}}{\mathrm{arg max}}\ \text{tr} \left (\mathrm{U}^{\mathrm{T}} \begin{bmatrix} \beta\mathrm{S}_{\mathit{b}} & \mathrm{0} \\ \mathrm{0} & \alpha\mathrm{S}_{\mathit{t}} \end{bmatrix} \mathrm{U} \right ) \\ &\mathrm{s.t.}\ \text{tr}\left( \mathrm{U}^{\mathrm{T}} \begin{bmatrix} \mathrm{M}_{\mathit{ss}} + \delta\mathrm{L}_{\mathit{ss}} + \lambda\mathrm{I} + \beta\mathrm{S}_{\mathit{w}} & \mathrm{M}_{\mathit{st}}+\delta\mathrm{L}_{\mathit{st}}-\lambda\mathrm{I}\\ \mathrm{M}_{\mathit{ts}}+\delta\mathrm{L}_{\mathit{ts}}-\lambda\mathrm{I} & \mathrm{M}_{\mathit{tt}}+\delta\mathrm{L}_{\mathit{tt}}+(\lambda+\alpha)\mathrm{I} \end{bmatrix} \mathrm{U} \right) = 1 \end{array} $$
(23)

Equivalently, the constraint optimization of (23) can be written in the form of Lagrangian. Thus, we have

$$ \begin{array}{ll} \mathit{L}(\mathrm{U}) &=\text{tr} \left (\mathrm{U}^{\mathrm{T}} \begin{bmatrix} \beta\mathrm{S}_{\mathit{b}} & \mathrm{0} \\ \mathrm{0}& \alpha\mathrm{S}_{\mathit{t}} \end{bmatrix} \mathrm{U} \right ) \\ &\quad+\text{tr}\left( \left( \mathrm{U}^{\mathrm{T}} \begin{bmatrix} \mathrm{M}_{\mathit{ss}}+\delta\mathrm{L}_{\mathit{ss}}+\lambda\mathrm{I}+\beta\mathrm{S}_{\mathit{w}} & \mathrm{M}_{\mathit{st}}+\delta\mathrm{L}_{\mathit{st}}-\lambda\mathrm{I}\\ \mathrm{M}_{\mathit{ts}}+\delta\mathrm{L}_{\mathit{ts}}-\lambda\mathrm{I} & \mathrm{M}_{\mathit{tt}}+\delta\mathrm{L}_{\mathit{tt}}+(\lambda+\alpha)\mathrm{I} \end{bmatrix} \mathrm{U}-\mathrm{I} \right){\varLambda} \right) \end{array} $$
(24)

To solve (24), we set the first derivative \( \frac {\partial L(\mathrm {U})}{\partial \mathrm {U}}=\mathrm {0} \). Then, we obtain generalized eigendecomposition

$$ \begin{bmatrix} \beta\mathrm{S}_{\mathit{b}} & \mathrm{0} \\ \mathrm{0} & \alpha\mathrm{S}_{\mathit{t}} \end{bmatrix} \mathrm{U} = \begin{bmatrix} \mathrm{M}_{\mathit{ss}} + \delta\mathrm{L}_{\mathit{ss}} + \lambda\mathrm{I} + \beta\mathrm{S}_{\mathit{w}} & \mathrm{M}_{\mathit{st}} + \delta\mathrm{L}_{\mathit{st}}-\lambda\mathrm{I}\\ \mathrm{M}_{\mathit{ts}}+\delta\mathrm{L}_{\mathit{ts}}-\lambda\mathrm{I} & \mathrm{M}_{\mathit{tt}}+\delta\mathrm{L}_{\mathit{tt}}+(\lambda+\alpha)\mathrm{I} \end{bmatrix} \mathrm{U}{\varLambda} $$
(25)

where Λ = diag(λ1,...,λk) contains the k leading eigenvalues and \( \mathrm {U}=\begin {bmatrix} \mathrm {U}_{1},...,\mathrm {U}_{k} \end {bmatrix} \) contains the corresponding eigenvectors. Finding the optimal adaptation matrix U thus reduces to solving (25) for the k leading eigenvectors. Algorithm 1 provides a complete summary of LGMA.
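Assuming the block matrices of (22) have already been assembled (as sketched in the previous subsections), the core step of Algorithm 1, i.e., solving (25) for the k leading generalized eigenvectors and splitting U into A and B, can be written as follows (illustrative names, not the authors' code):

```python
import numpy as np
from scipy.linalg import eigh

def solve_lgma_projection(Sb, St, Sw, M, L, alpha, beta, delta, lam, k):
    """Solve the generalized eigenproblem (25) and return A, B (each d x k).

    Sb, Sw: (d, d) source scatter matrices; St: (d, d) target scatter matrix;
    M, L: (2d, 2d) MMD and graph Laplacian block matrices of (22)."""
    d = Sb.shape[0]
    I = np.eye(d)
    # Left-hand side of (25): block-diagonal matrix of the terms to maximize.
    P = np.block([[beta * Sb,        np.zeros((d, d))],
                  [np.zeros((d, d)), alpha * St      ]])
    # Right-hand side of (25): MMD, Laplacian, subspace-shift, and within-class terms.
    Q = M + delta * L + np.block([[lam * I + beta * Sw, -lam * I          ],
                                  [-lam * I,            (lam + alpha) * I ]])
    # k leading generalized eigenvectors of P u = lambda Q u (small ridge for stability).
    w, U = eigh(P, Q + 1e-6 * np.eye(2 * d))
    U = U[:, np.argsort(w)[::-1][:k]]
    return U[:d], U[d:]                          # A, B
```

In the full algorithm, this step would be repeated for T iterations, with the target pseudo labels re-predicted from the projected source data after each update.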

3.10 Computational complexity

The computational complexity of Algorithm 1 consists of four parts as follows.

  1. The computation of St, Sb, and Sw in step 2.

  2. The construction of Mss, Mtt, Mst, and Mts in step 2.

  3. The optimization of the eigendecomposition problem in step 4.

  4. The computation of all other processes.

Let T denote the number of iterations and k the number of subspace bases. In terms of big O notation, the computation of St, Sb, and Sw costs \( O(m^{2}) \), \( O(n^{2}) \), and \( O(n^{2}) \), respectively. The construction of Mss, Mtt, Mst, and Mts costs \( O(TCn^{2}) \), \( O(TCm^{2}) \), \( O(TCnm) \), and \( O(TCmn) \), respectively. The optimization of the eigendecomposition problem costs \( O(Tkm^{2}) \), and all other computations cost \( O(Tmn) \). The overall computational cost of Algorithm 1 is therefore \( O(T(k + C)m^{2} + TCn^{2} + TCmn) \).

Algorithm 1 LGMA

4 Experiments

In this section, we perform extensive experiments on real-world image recognition datasets to evaluate the proposed LGMA approach against state-of-the-art methods. The experiments are divided into three parts. Section 4.1 visualizes the features learned on image classification tasks. Section 4.2 evaluates performance on a range of cross-domain image classification tasks with standard and realistic hyper-parameter tuning. Section 4.3 reports results with parameters tuned on the target domain, a protocol established in the literature, for completeness.

4.1 Feature visualization

Figure 2a and b, e and f, and c, d, g, and h show the visualizations of the transfer tasks V→I and A\( \rightarrow \)W after performing the SCA, JGSA, and LGMA algorithms, respectively. Some interesting conclusions can be drawn. (a) SCA cannot learn the invariant cross-domain features well because the differences between the source and target domains remain large. (b) JGSA does not perform weighted distribution alignment, so the distributions of the source domain remain dissimilar to those of the target domain, leading to large domain bias. These observations show the inferior performance of SCA and JGSA and validate the superiority of LGMA.

Fig. 2

Feature visualization of source and target domain data. (a) and (b) indicate the visualization of the source domain V and the target domain I after performing SCA, respectively. (c) and (d) indicate the visualization of the source domain V and the target domain I after performing LGMA, respectively. (e) and (f) indicate the visualization of the source domain A and the target domain W after performing JGSA, respectively. (g) and (h) indicate the visualization of the source domain A and the target domain W after performing LGMA, respectively. Color markers denote different classes

4.2 Real world object recognition

4.2.1 Experimental setup

Five public large-scale image datasets are used, as shown in Table 2.

Table 2 Five benchmark datasets used in this paper

The public large-scale image recognition datasets in our experiments include Office+Caltech10, Office-31, and ImageNet + VOC2007, which are popular image classification datasets that are widely used for evaluating machine learning and data mining models, such as [6, 31, 41].

Office + Caltech10

[49] contains 2,533 images from 10 subcategories. The dataset includes 4 image domains, i.e., Amazon (A), DSLR (D), Webcam (W), and Caltech (C). Figure 3 depicts sample images from the monitor object category in the four domains, namely, Caltech, Amazon, DSLR, and Webcam [31]. Because features in Office and Caltech follow different distributions, domain adaptation can improve cross-domain image classification performance. In total, 10 classes are used in each dataset, and 12 tasks are constructed, namely, A→C, A→D, A\(\rightarrow \)W, ..., D→W. In this study, A\(\rightarrow \)B represents the transfer task from source domain A to target domain B.

Fig. 3

Sample images from object monitor category in the four domains Caltech, Amazon, DSLR, and Webcam [31]

Office-31

[49] is also a widely used dataset for transfer learning tasks in image recognition and multimedia analysis. It includes 4,652 images and 31 categories from three domains: Amazon (A), Webcam (W), and DSLR (D). Each ordered pair of domains constructs a transfer learning task, leading to 6 tasks: A→D, A\(\rightarrow \)W, ..., and W→D.

ImageNet + VOC2007

(I, V) are another pair of widely used image datasets. Because images from the same classes in the two datasets follow different distributions, each dataset can be considered one domain. In this paper, we use the datasets in [50] to perform transfer learning tasks. Both datasets contain the same five classes, namely, bird, cat, chair, dog, and person. Thus, another two transfer learning tasks, i.e., I→V and V→I, are constructed.

For all the baseline approaches, we use the optimal parameters reported in the original papers. For LGMA, we set λ = 1 and α = 1 so that the inner subspace bias and the target variance are treated as equally important. The subspace dimension is k = 30 for the Office + Caltech10 tasks with DeCaf6 features and the ImageNet + VOC2007 tasks, and k = 100 for the Office-31 tasks with DeCaf7 features. We empirically validate that these fixed parameters obtain promising performance on different types of tasks. The weighted coefficient μ, the regularization parameter β, the number of iterations T, the number of nearest neighbors p, and the coefficient of the graph Laplacian regularization term δ remain free parameters.

We adopt classification accuracy on the test data as the evaluation metric, which is widely used in many studies [28, 31, 47]:

$$ Accuracy = \frac{|\mathbf{x}:\mathbf{x}\in\mathcal{D}_{t}\wedge \hat{y}(\mathbf{x})=y(\mathbf{x})|}{|\mathbf{x}:\mathbf{x}\in\mathcal{D}_{t}|}, $$
(26)

where y(x) and \(\hat {y}(\mathbf {x})\) indicate the ground-truth and predicted labels of x in the target domain, respectively.
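For completeness, (26) corresponds to the following one-liner, assuming ground-truth and predicted target labels given as arrays (illustrative only):

```python
import numpy as np

def accuracy(y_true, y_pred):
    """Classification accuracy on the target domain, Eq. (26)."""
    return float(np.mean(np.asarray(y_true) == np.asarray(y_pred)))
```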

4.2.2 Baselines

To evaluate the robustness of the proposed LGMA approach to different configurations of datasets, we conduct comprehensive evaluation on image recognition datasets and compare LGMA with competitive state-of-the-art domain adaptation methods as follows:

  • 1-Nearest neighbor (1NN) classifier;

  • Support vector machine (SVM) [51];

  • Transfer component analysis (TCA) [28], which adapts marginal distribution;

  • Transfer joint matching (TJM) [52], which performs marginal distribution with the sample selection of the source domain;

  • Distribution matching machine (DMM) [36], which learns an SVM classifier with distribution alignment based on SRM;

  • Scatter component analysis (SCA) [40], which learns a classifier through scatter component analysis;

  • Joint geometrical and statistical alignment (JGSA) [41], which performs geometrical and statistical alignment with label propagation;

  • Unsupervised transfer metric learning (UTML) [18], which decreases intra-class distance and increases inter-class distance;

  • Locality preserving joint transfer (LPJT) [19], which jointly exploits feature adaptation with distribution matching and sample adaptation with landmark selection;

  • Domain invariant and class discriminative feature learning (DICD) [17], which matches the marginal and conditional distributions, and maximizes the inter-class dispersion and minimizes the intra-class scatter;

  • Transfer independently together (TIT) [20], which learns multiple transformations for each domain to map data onto a shared latent space where the domains are well aligned.

4.2.3 Experimental results and analysis

The classification performance of all comparison models on the 12 transfer tasks of the Office + Caltech10 datasets with DeCaf6 features, the 6 transfer tasks of the Office-31 datasets with DeCaf7 features, and the 2 transfer tasks of the ImageNet + VOC2007 datasets is shown in Tables 3, 4, and 5, respectively. LGMA considerably outperforms the competitive baseline methods on most of the transfer tasks. Specifically, LGMA achieves the following performance gains over the best baselines: (1) 1.3% on the 12 transfer tasks of the Office + Caltech10 datasets with DeCaf6 features, (2) 0.4% on the 6 transfer tasks of the Office-31 datasets with DeCaf7 features, and (3) 6.1% on the 2 transfer tasks of the ImageNet + VOC2007 datasets. Although LGMA does not perform best on every task, when it does, it usually outperforms the best baseline approach by a considerable margin; otherwise, it performs only slightly worse than the optimal baseline. This finding demonstrates that LGMA is robust to feature shift and instance bias for domain adaptation.

Table 3 Recognition accuracy(%) against other baseline methods on Office + Caltech10 (DeCaf6) datasets, the best results and the best baseline results are shown in boldface and italic, respectively
Table 4 Recognition accuracy(%) against other baseline methods on Office-31 (DeCaf7) datasets, the best results and the best baseline results are shown in boldface and italic, respectively
Table 5 Recognition accuracy(%) against other baseline methods on ImageNet + VOC2007 datasets, the best results and the best baseline results are shown in boldface and italic, respectively

Several further observations can be made. (1) Domain adaptation methods (i.e., instance-based, feature representation-based, classifier-based, and hybrid knowledge-based adaptation methods) are generally superior to SVM and 1NN, which indicates that minimizing the distribution differences is the key to domain adaptation. (2) The classifier-based adaptation method DMM outperforms TCA, showing the effectiveness of minimizing the distribution differences based on SRM in the infinite-dimensional reproducing kernel Hilbert space (DMM) rather than in the dimension-reduced kernel PCA space (TCA). (3) Hybrid knowledge-based adaptation methods (i.e., SCA, JGSA, TIT, LPJT, UTML, DICD, and LGMA) further outperform TCA and the other single methods, and LGMA performs best on most transfer tasks. Single knowledge-based adaptation methods alone are insufficient for domain adaptation when the domain discrepancy is substantially large. The reason is that source samples that are irrelevant to the target samples are not helpful for learning a unified classifier, even when using cross-domain invariant features, high-dimensional nonlinear features, or both. LGMA addresses this limitation by reweighting the source instances according to their relevance to the target instances and performing weighted distribution alignment in the linear Lie algebra manifold space.

Although SCA, JGSA, LPJT, and DICD perform distribution matching by using hybrid knowledge-based adaptation, the advantages of LGMA over these four methods are threefold. (1) LGMA corrects the domain mismatch by quantitatively evaluating the importance of the marginal and conditional distributions in the generalized FLDA framework. LGMA further performs feature matching to guarantee a large number of effective source instances for classifying the related target domain. In SCA, JGSA, LPJT, and DICD, the evaluation of distribution importance is ignored. (2) LGMA jointly learns the domain-invariant and label-discriminative transfer classifier and the transferable knowledge (invariant to feature representations and unbiased toward irrelevant instances) in a unified learning paradigm in the linear Lie algebra manifold space, whereas SCA, JGSA, LPJT, and DICD learn the transferable knowledge and cross-domain classifier in a nonlinear manifold space. (3) LGMA finds a geodesic on the original Lie group and projects all the samples onto a Lie algebra manifold space along the geodesic direction while ensuring the discrimination of the projected samples in the linear Lie algebra manifold space. By contrast, the other four methods (i.e., SCA, JGSA, LPJT, and DICD) cannot guarantee that the transformed samples are linearly separable in the RKHS.

We further verify the performance of LGMA on the Office + Caltech10 datasets using SURF features, and the results are reported in Table 6. It is worth noting that LGMA outperforms the other baselines, ranging from traditional machine learning methods (i.e., 1NN and SVM) to state-of-the-art transfer learning models (i.e., TJM, SCA, JGSA, UTML, TIT, DICD, and LPJT), which demonstrates that LGMA is significantly superior to the other baselines in minimizing the cross-domain discrepancy.

Table 6 Recognition accuracy(%) against other baseline methods on Office + Caltech10 (SURF) datasets, the best results and the best baseline results are shown in boldface and italic, respectively

We also evaluate the importance of the Lie algebra transformation, the graph Laplacian regularization term (including the parameters p and δ), and the weighted distribution alignment factor μ, which distinguish LGMA from the baseline methods. We randomly select several tasks and show the results in Figs. 4, 5, and 6. In Fig. 4, the dotted lines represent the baseline methods, and the solid lines represent the proposed LGMA method. Additional observations can be made. (1) The Lie algebra transformation (L), the graph Laplacian regularization (GLR), and the weighted distribution alignment (WDA) are highly important in dealing with domain adaptation problems (Figs. 5 and 6). (2) Compared with the other variants (FLDA; FLDA with Lie algebra transformation (FL); FLDA with Lie algebra transformation and weighted distribution alignment (FLW); and FLDA with Lie algebra transformation, weighted distribution alignment, and graph Laplacian regularization (FLWG)), LGMA performs better, which validates the effectiveness of the proposed method. (3) LGMA reaches steady performance in approximately \( \mathrm {T\leqslant 10} \) iterations (Figs. 4d and 5). (4) LGMA achieves high performance over a wide range of parameter values (Fig. 4a, b, and c).

Fig. 4

The parameter sensitivity and convergence analysis of the proposed LGMA approach

Fig. 5

The recognition accuracy of methods F, FL, FLW, FLWG, and LGMA

Fig. 6

Evaluation of the importance of the Lie algebra transformation (L), weighted distribution alignment (W), and graph Laplacian regularization (G)

The reasons for these results are as follows. First, the instances in the Lie group manifold space are projected onto the linear Lie algebra manifold space by the Lie algebra transformation, which makes data that are hard to discriminate in the nonlinear Lie group manifold space easier to separate. Second, the graph Laplacian regularization further exploits the similar geometrical properties of nearest points in domain adaptation. Third, the weighted distribution alignment factor μ ∈ {0, 0.01, ..., 0.99, 1} evaluates the importance of the marginal and conditional distributions. We do not perform these experiments on the DeCaf7 features of the Office-31 datasets because the results on them are already satisfactory.

4.3 Results with parameter tuning on target domain

In this section, we analyze the parameter fluctuations of LGMA on different types of datasets to validate that a wide range of parameter values can be selected for improved performance.

We examine the sensitivity to the number of nearest neighbors p by experimenting over a large range, p ∈ {2, 4, 8, ..., 64}, on randomly selected tasks. From Fig. 4a and the experimental results, we conclude that LGMA is robust around p = 32. μ is a weight factor with the value range μ ∈ {0, 0.01, ..., 0.99, 1}, and its value can be chosen from the analysis of Fig. 4c.

LGMA accepts a wide range of values for the regularization parameters β and δ and the other necessary parameters k and T. We follow the same setup as [41], with \( \beta \in [2^{-15},2^{-1}] \) and \( k \in [20,180] \). In this study, we set the number of iterations T = 10 (Fig. 4d). δ (Fig. 4b) is a factor with δ ∈ {0, 0.01, ..., 0.99, 1}. We observe that LGMA achieves robust performance for a wide range of parameter values.

In the experiment on Office + Caltech10 datasets using DeCaf6 features, we set the free parameters β = 0.08, δ = 0.18, and μ = 0.81. In the experiment on Office-31 datasets using DeCaf7 features, we set the free parameters β = 0.1, δ = 0.46, and μ = 0.74. In the experiment on ImageNet + VOC2007 datasets, we set the free parameters β = 0.1, δ = 0.11, and μ = 0.81.

5 Conclusions

In this paper, we proposed a new Lie group manifold analysis (LGMA) method for unsupervised domain adaptation. LGMA performs a transformation using variances between subsets of data to suppress insignificant differences (within labels and between domains) and to amplify useful differences (between labels and overall variability) in a linear Lie algebra manifold space. Meanwhile, LGMA learns an invariant cross-domain classifier by extracting domain-invariant feature representations, evaluating the importance of the marginal and conditional distributions, exploiting the similar geometrical properties of nearest points, and estimating unbiased instance weights, which jointly reduce the cross-domain distribution difference. Extensive experiments on several cross-domain image datasets validate that LGMA considerably outperforms state-of-the-art domain adaptation methods.

In general, the problem of dataset bias in domain adaptation is far from solved. High accuracy (\(\geqslant 90\%\)) is achieved by existing approaches on only a few cross-domain tasks, even with advanced feature extraction methods such as DeCaf6 and DeCaf7 features. Performance with raw features is clearly not satisfactory. Therefore, it is critical to develop more robust algorithms that can significantly reduce data bias in all cases.