Abstract
In this paper, we address the problem of domain adaptation for binary classification. This problem arises when the distribution generating the source learning data differs from the one generating the target test data. From a theoretical standpoint, a classifier has better generalization guarantees when the marginal distributions of the two domains over the input space are close. Classical approaches mainly try to build new projection spaces or to reweight the source data in order to bring the two distributions closer. We study an original direction based on a recent framework introduced by Balcan et al., which enables one to learn linear classifiers in an explicit projection space based on a similarity function that need be neither symmetric nor positive semi-definite. We propose a well-founded general method for learning a low-error classifier on target data, made effective by an iterative procedure compatible with Balcan et al.’s framework. A reweighting scheme of the similarity function is then introduced to bring the distributions closer in a new projection space. The hyperparameters and the reweighting quality are controlled by a reverse validation procedure. Our approach is based on a linear programming formulation and shows good adaptation performance with very sparse models. We first consider the challenging unsupervised case where no target label is accessible, which can be helpful when no manual annotation is possible. We also propose a generalization to the semi-supervised case, allowing us to use a few target labels when available. Finally, we evaluate our method on a synthetic problem and on a real image annotation task.
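To make the notion of an explicit projection space concrete, the following minimal sketch maps each example to its vector of similarities to a few landmark points and applies a sparse linear classifier in that space. It is an illustration under stated assumptions, not the method of the paper: the similarity function, the landmarks, their widths and the weight vector `alpha` are all hypothetical, and the weights are picked by hand rather than learned by the linear program.

```python
import math

def similarity(x, landmark, width):
    """Hypothetical Gaussian-style similarity whose width is tied to the landmark.
    Since the width belongs to the second argument, K(a, b) != K(b, a) in general,
    so the function is not symmetric (and hence not a valid kernel)."""
    sq_dist = sum((xi - li) ** 2 for xi, li in zip(x, landmark))
    return math.exp(-sq_dist / (2.0 * width ** 2))

def project(x, landmarks, widths):
    """Explicit projection: one coordinate per landmark, K(x, landmark_j)."""
    return [similarity(x, l, w) for l, w in zip(landmarks, widths)]

def predict(x, landmarks, widths, alpha):
    """Linear classifier in the projection space: sign of sum_j alpha_j * K(x, l_j)."""
    score = sum(a * s for a, s in zip(alpha, project(x, landmarks, widths)))
    return 1 if score >= 0 else -1

# Hypothetical landmarks with landmark-specific widths.
landmarks = [(0.0, 0.0), (2.0, 2.0), (5.0, 5.0)]
widths = [1.0, 0.5, 2.0]

# Hand-picked sparse weights (assumption): the third landmark is unused.
alpha = [1.0, -1.0, 0.0]

print(predict((0.1, 0.0), landmarks, widths, alpha))  # point near the first landmark
print(predict((2.0, 1.9), landmarks, widths, alpha))  # point near the second landmark
```

A zero entry in `alpha` means the corresponding landmark plays no role at prediction time, which is the sense in which a model in this space can be sparse.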
References
Abbasnejad M, Ramachandram D, Mandava R (2012) A survey of the state of the art in learning the kernels. Knowl Inf Syst 31(2): 193–221. doi:10.1007/s10115-011-0404-6
Ando R, Zhang T (2005) A framework for learning predictive structures from multiple tasks and unlabeled data. J Mach Learn Res 6: 1817–1853
Ayache S, Quénot G (2008) Video corpus annotation using active learning. In: Proceedings of the 30th European conference on information retrieval research (ECIR), vol 4956 of LNCS. Springer, pp 187–198
Ayache S, Quénot G, Gensel J (2007) Image and video indexing using networks of operators. J Image Video Process 1: 1–113
Bahadori MT, Liu Y, Zhang D (2011) Learning with minimum supervision: a general framework for transductive transfer learning. In: Proceedings of the 11th IEEE international conference on data mining (ICDM), pp 61–70
Balcan M, Blum A, Srebro N (2008a) Improved guarantees for learning via similarity functions. In: Proceedings of the annual conference on computational learning theory (COLT), pp 287–298
Balcan M, Blum A, Srebro N (2008b) A theory of learning with similarity functions. Mach Learn J 72(1–2): 89–112
Bellet A, Habrard A, Sebban M (2011) Learning good edit similarities with generalization guarantees. In: Proceedings of European conference on machine learning and principles of data mining and knowledge discovery (ECML/PKDD), vol 6911 of LNCS, pp 188–203
Ben-David S, Blitzer J, Crammer K, Kulesza A, Pereira F, Vaughan J (2010) A theory of learning from different domains. Mach Learn J 79(1–2): 151–175
Ben-David S, Blitzer J, Crammer K, Pereira F (2007) Analysis of representations for domain adaptation. In: Proceedings of advances in neural information processing systems (NIPS), pp 137–144
Ben-David S, Lu T, Luu T, Pal D (2010) Impossibility theorems for domain adaptation. JMLR W&CP 9: 129–136
Bergamo A, Torresani L (2010) Exploiting weakly-labeled web images to improve object classification: a domain adaptation approach. In: Proceedings of advances in neural information processing systems (NIPS)
Blitzer J, Foster D, Kakade S (2011) Domain adaptation with coupled subspaces. In: Proceedings of AISTATS
Bruzzone L, Marconcini M (2010) Domain adaptation problems: a DASVM classification technique and a circular validation strategy. IEEE Trans Pattern Anal Mach Intell 32(5): 770–787
Cao B, Ni X, Sun J-T, Wang G, Yang Q (2011) Distance metric learning under covariate shift. In: Proceedings of international joint conference on artificial intelligence (IJCAI), pp 1204–1210
Chang C-C, Lin C-J (2001) LIBSVM: a library for support vector machines. http://www.csie.ntu.edu.tw/~cjlin/libsvm
Chattopadhyay R, Ye J, Panchanathan S, Fan W, Davidson I (2011) Multi-source domain adaptation and its application to early detection of fatigue. In: Proceedings of the 17th ACM SIGKDD international conference on knowledge discovery and data mining (KDD). ACM, pp 717–725
Chen M, Weinberger K, Blitzer J (2011) Co-training for domain adaptation. In: Proceedings of advances in neural information processing systems (NIPS)
Cortes C, Mohri M (2011) Domain adaptation in regression. In: Proceedings of international conference on algorithmic learning theory (ALT), vol 6925 of LNCS, pp 308–323
Daumé H III (2007) Frustratingly easy domain adaptation. In: Proceedings of the association for computational linguistics (ACL)
Daumé H III, Kumar A, Saha A (2010) Co-regularization based semi-supervised domain adaptation. In: Proceedings of advances in neural information processing systems (NIPS)
Duan L, Tsang I, Xu D, Chua T (2009) Domain adaptation from multiple sources via auxiliary classifiers. In: Proceedings of international conference on machine learning (ICML), p 37
Everingham M, Van Gool L, Williams CKI, Winn J, Zisserman A (2007) The PASCAL visual object classes challenge 2007 (VOC2007) results. http://www.pascal-network.org/challenges/VOC/voc2007/workshop/
Fei H, Huan J (2011) Structured feature selection and task relationship inference for multi-task learning. In: Proceedings of the 11th IEEE international conference on data mining (ICDM). IEEE, pp 171–180
Freund R (1991) Polynomial-time algorithms for linear programming based only on primal scaling and projected gradients of a potential function. Math Program 51: 203–222
Geng B, Tao D, Xu C (2011) DAML: Domain adaptation metric learning. IEEE Trans Image Process (TIP) 20(10): 2980–2989
Guerra P, Veloso A, Meira W Jr, Almeida V (2011) From bias to opinion: a transfer-learning approach to real-time sentiment analysis. In: Proceedings of the 17th ACM SIGKDD international conference on knowledge discovery and data mining (KDD). ACM, pp 150–158
Huang J, Smola A, Gretton A, Borgwardt K, Schölkopf B (2006) Correcting sample selection bias by unlabeled data. In: Proceedings of advances in neural information processing systems (NIPS), pp 601–608
Jiang J (2008) A literature survey on domain adaptation of statistical classifiers. Technical report, Computer Science Department at University of Illinois at Urbana-Champaign. http://sifaka.cs.uiuc.edu/jiang4/domain_adaptation/da_survey.pdf
Jiang J, Zhai C (2007) Instance weighting for domain adaptation in NLP. In: Proceedings of the association for computational linguistics (ACL)
Joachims T (1999) Transductive inference for text classification using support vector machines. In: Proceedings of international conference on machine learning (ICML), pp 200–209
Junejo K, Karim A (2012) Robust personalizable spam filtering via local and global discrimination modeling. Knowl Inf Syst 1–36. doi:10.1007/s10115-012-0477-x
Kulis B, Saenko K, Darrell T (2011) What you saw is not what you get: domain adaptation using asymmetric kernel transforms. In: Proceedings of IEEE conference on computer vision and pattern recognition (CVPR 2011), pp 1785–1792
Macqueen J (1967) Some methods of classification and analysis of multivariate observations. In: Proceedings of the 5th Berkeley symposium on mathematical statistics and probability, pp 281–297
Mansour Y, Mohri M, Rostamizadeh A (2008) Domain adaptation with multiple sources. In: Proceedings of advances in neural information processing systems (NIPS), pp 1041–1048
Mansour Y, Mohri M, Rostamizadeh A (2009) Domain adaptation: learning bounds and algorithms. In: Proceedings of annual conference on learning theory (COLT), pp 19–30
Pan S, Yang Q (2010) A survey on transfer learning. IEEE Trans Knowl Data Eng 22(10): 1345–1359
Quiñonero-Candela J, Sugiyama M, Schwaighofer A, Lawrence N (2009) Dataset shift in machine learning. MIT Press, Cambridge
Schweikert G, Widmer C, Schölkopf B, Rätsch G (2008) An empirical analysis of domain adaptation algorithms for genomic sequence analysis. In: Proceedings of advances in neural information processing systems (NIPS), pp 1433–1440
Seah C, Tsang I, Ong Y, Lee K (2010) Predictive distribution matching SVM for multi-domain learning. In: Proceedings of European conference on machine learning and principles of data mining and knowledge discovery (ECML/PKDD), vol 6321 of LNCS. Springer, pp 231–247
Smeaton A, Over P, Kraaij W (2009) High-level feature detection from video in TRECVid: a 5-year retrospective of achievements. In: Multimedia content analysis, theory and applications. Springer, pp 151–174
Sugiyama M, Nakajima S, Kashima H, von Bünau P, Kawanabe M (2007) Direct importance estimation with model selection and its application to covariate shift adaptation. In: Proceedings of advances in neural information processing systems (NIPS)
Vapnik V (1998) Statistical learning theory. Springer, Berlin
Wang B, Tang J, Fan W, Chen S, Tan C, Yang Z (2012) Query-dependent cross-domain ranking in heterogeneous network. Knowl Inf Syst 1–37. doi:10.1007/s10115-011-0472-7
Xu H, Mannor S (2010) Robustness and generalization. In: Proceedings of annual conference on computational learning theory (COLT), pp 503–515
Xu H, Mannor S (2012) Robustness and generalization. Mach Learn J 86(3): 391–423
Xu Z, Kersting K (2011) Multi-task learning with task relations. In: Proceedings of the 11th IEEE international conference on data mining (ICDM). IEEE, pp 884–893
Xue G-R, Dai W, Yang Q, Yu Y (2008) Topic-bridged PLSA for cross-domain text classification. In: Proceedings of international ACM SIGIR conference on research and development in information retrieval, pp 627–634
Ye Y (1991) An O(n^3 L) potential reduction algorithm for linear programming. Math Program 50: 239–258
Zhang Y, Yeung D-Y (2010) Transfer metric learning by learning task relationships. In: Proceedings of the 16th ACM SIGKDD international conference on knowledge discovery and data mining (KDD). ACM, pp 1199–1208
Zhong E, Fan W, Yang Q, Verscheure O, Ren J (2010) Cross validation framework to choose amongst models and datasets for transfer learning. In: Proceedings of European conference on machine learning and principles of data mining and knowledge discovery (ECML/PKDD), vol 6323 of LNCS. Springer, pp 547–562
Cite this article
Morvant, E., Habrard, A. & Ayache, S. Parsimonious unsupervised and semi-supervised domain adaptation with good similarity functions. Knowl Inf Syst 33, 309–349 (2012). https://doi.org/10.1007/s10115-012-0516-7
DOI: https://doi.org/10.1007/s10115-012-0516-7