Abstract
State-of-the-art statistical NLP systems for a variety of tasks learn from labeled training data that is often domain specific. However, there may be multiple domains or sources of interest on which the system must perform. For example, a spam filtering system must give high quality predictions for many users, each of whom receives emails from different sources and may make slightly different decisions about what is or is not spam. Rather than learning separate models for each domain, we explore systems that learn across multiple domains. We develop a new multi-domain online learning framework based on parameter combination from multiple classifiers. Our algorithms draw from multi-task learning and domain adaptation to adapt multiple source domain classifiers to a new target domain, learn across multiple similar domains, and learn across a large number of disparate domains. We evaluate our algorithms on two popular NLP domain adaptation tasks: sentiment classification and spam filtering.
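The abstract's core idea, combining the parameters of several per-domain linear classifiers into one model, can be illustrated with a small sketch. This is not the paper's exact algorithm: the function name and the precision-weighted averaging rule below are illustrative assumptions, loosely inspired by confidence-weighted learning, where each feature weight carries a variance and lower variance means higher confidence.

```python
import numpy as np

def combine_parameters(mus, sigmas):
    """Illustrative sketch: combine k per-domain linear classifiers.

    mus    : (k, d) array of per-domain weight means
    sigmas : (k, d) array of per-domain weight variances

    Each combined weight is an average of the k domain weights,
    weighted by precision (inverse variance), so that domains
    more confident about a feature contribute more to it.
    """
    mus = np.asarray(mus, dtype=float)
    sigmas = np.asarray(sigmas, dtype=float)
    precision = 1.0 / sigmas                     # confidence per weight
    combined = (precision * mus).sum(axis=0) / precision.sum(axis=0)
    return combined

# Two toy domains: each is confident (low variance) about one feature.
mu = [[1.0, 0.0],
      [0.0, 1.0]]
sigma = [[0.1, 1.0],
         [1.0, 0.1]]
w = combine_parameters(mu, sigma)  # each feature dominated by the confident domain
```

Under this toy setting the first feature's combined weight is (10·1 + 1·0)/11 ≈ 0.91, i.e. the confident domain dominates; the actual combination strategies evaluated in the paper differ in detail.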
References
Abernethy, J. D., Bartlett, P., & Rakhlin, A. (2007). Multitask learning with expert advice (Tech. Rep. UCB/EECS-2007-20). EECS Department, University of California, Berkeley.
Ando, R., & Zhang, T. (2005). A framework for learning predictive structure from multiple tasks and unlabeled data. Journal of Machine Learning Research (JMLR), 6, 1817–1853.
Arnold, A., Nallapati, R., & Cohen, W. W. (2008). Exploiting feature hierarchy for transfer learning in named entity recognition. In Association for computational linguistics (ACL).
Bakker, B., & Heskes, T. (2003). Task clustering and gating for Bayesian multi-task learning. Journal of Machine Learning Research (JMLR), 4, 83–99.
Ben-David, S., Blitzer, J., Crammer, K., & Pereira, F. (2006). Analysis of representations for domain adaptation. In Advances in neural information processing systems (NIPS).
Bickel, S., Brückner, M., & Scheffer, T. (2007). Discriminative learning for differing training and test distributions. In International conference on machine learning (ICML).
Bickel, S., Sawade, C., & Scheffer, T. (2009). Transfer learning by distribution matching for targeted advertising. In Advances in neural information processing systems (pp. 145–152).
Blitzer, J., McDonald, R., & Pereira, F. (2006). Domain adaptation with structural correspondence learning. In Empirical methods in natural language processing (EMNLP).
Blitzer, J., Crammer, K., Kulesza, A., Pereira, F., & Wortman, J. (2007a). Learning bounds for domain adaptation. In Advances in neural information processing systems (NIPS).
Blitzer, J., Dredze, M., & Pereira, F. (2007b). Biographies, bollywood, boom-boxes and blenders: Domain adaptation for sentiment classification. In Association for computational linguistics (ACL).
Caruana, R. (1997). Multitask learning. Machine Learning, 28, 41–75.
Chan, Y. S., & Ng, H. T. (2006). Estimating class priors in domain adaptation for word sense disambiguation. In Association for computational linguistics (ACL).
Chelba, C., & Acero, A. (2004). Adaptation of maximum entropy classifier: Little data can help a lot. In Empirical methods in natural language processing (EMNLP).
Crammer, K., Dekel, O., Keshet, J., Shalev-Shwartz, S., & Singer, Y. (2006). Online passive-aggressive algorithms. Journal of Machine Learning Research (JMLR), 7, 551–585.
Crammer, K., Dredze, M., & Pereira, F. (2008). Exact confidence-weighted learning. In Advances in neural information processing systems (NIPS).
Dai, W., Xue, G. R., Yang, Q., & Yu, Y. (2007). Transferring naive Bayes classifiers for text classification. In National conference on artificial intelligence (AAAI).
Dai, W., Chen, Y., Xue, G. R., Yang, Q., & Yu, Y. (2009). Translated learning: Transfer learning across different feature spaces. In Advances in neural information processing systems (NIPS) (pp. 353–360).
Daumé, H. (2007). Frustratingly easy domain adaptation. In Association for computational linguistics (ACL).
Daumé, H. (2009). Bayesian multitask learning with latent hierarchies. In Uncertainty in artificial intelligence (UAI).
Daumé, H., & Marcu, D. (2006). Domain adaptation for statistical classifiers. Journal of Artificial Intelligence Research (JAIR), 26, 101–126.
Dekel, O., Long, P. M., & Singer, Y. (2006). Online multitask learning. In Conference on learning theory (COLT).
Do, C. B., & Ng, A. (2006). Transfer learning for text classification. In Advances in neural information processing systems (NIPS).
Dredze, M., & Crammer, K. (2008). Online methods for multi-domain learning and adaptation. In Empirical methods in natural language processing (EMNLP).
Dredze, M., Blitzer, J., Talukdar, P. P., Ganchev, K., Graca, J., & Pereira, F. (2007). Frustratingly hard domain adaptation for parsing. In Conference on natural language learning (CoNLL) 2007 shared task.
Dredze, M., Crammer, K., & Pereira, F. (2008). Confidence-weighted linear classification. In International conference on machine learning (ICML).
Evgeniou, T., & Pontil, M. (2004). Regularized multi-task learning. In Conference on knowledge discovery and data mining (KDD).
Florian, R., Ittycheriah, A., Jing, H., & Zhang, T. (2003). Named entity recognition through classifier combination. In Conference on computational natural language learning (CoNLL).
Jiang, J., & Zhai, C. (2007a). Instance weighting for domain adaptation in NLP. In Association for computational linguistics (ACL).
Jiang, J., & Zhai, C. (2007b). A two-stage approach to domain adaptation for statistical classifiers. In Conference on information and knowledge management (CIKM).
Kittler, J., Hatef, M., Duin, R., & Matas, J. (1998). On combining classifiers. IEEE Transactions on Pattern Analysis and Machine Intelligence, 20(3), 226–239.
Lease, M., & Charniak, E. (2005). Parsing biomedical literature. In International joint conference on natural language processing (IJCNLP).
Littlestone, N., & Warmuth, M. K. (1994). The weighted majority algorithm. Information and Computation, 108, 212–261.
Mansour, Y., Mohri, M., & Rostamizadeh, A. (2009). Domain adaptation with multiple sources. In Advances in neural information processing systems.
Marcus, M., Marcinkiewicz, M., & Santorini, B. (1993). Building a large annotated corpus of English: the Penn Treebank. Computational Linguistics, 19(2), 313–330.
Marx, Z., Rosenstein, M. T., Dietterich, T. G., & Kaelbling, L. P. (2008). Two algorithms for transfer learning. In Inductive transfer: 10 years later.
McClosky, D., & Charniak, E. (2008). Self-training for biomedical parsing. In Association for computational linguistics (ACL).
Obozinski, G., Taskar, B., & Jordan, M. (2006). Multi-task feature selection. In ICML-06 workshop on structural knowledge transfer for machine learning.
Raina, R., Ng, A., & Koller, D. (2006). Constructing informative priors using transfer learning. In International conference on machine learning (ICML).
Raina, R., Battle, A., Lee, H., Packer, B., & Ng, A. (2007). Self-taught learning: transfer learning from unlabeled data. In International conference on machine learning (ICML) (pp. 759–766).
Satpal, S., & Sarawagi, S. (2007). Domain adaptation of conditional probability models via feature subsetting. In European conference on principles and practice of knowledge discovery in databases.
Schweikert, G., Widmer, C., Schölkopf, B., & Rätsch, G. (2008). An empirical analysis of domain adaptation algorithms for genomic sequence analysis. In Advances in neural information processing systems (NIPS).
Tax, D. M. J., van Breukelen, M., Duin, R. P. W., & Kittler, J. (2000). Combining multiple classifiers by averaging or by multiplying? Pattern Recognition, 33(9), 1475–1485.
Thrun, S., & O’Sullivan, J. (1998). Clustering learning tasks and the selective cross-task transfer of knowledge. In S. Thrun & L. Pratt (Eds.), Learning to learn. Amsterdam: Kluwer Academic.
Woods, K., Kegelmeyer, W. P. Jr., & Bowyer, K. (1997). Combination of multiple classifiers using local accuracy estimates. IEEE Transactions on Pattern Analysis and Machine Intelligence, 19(4), 405–410. doi:10.1109/34.588027.
Additional information
Editors: Nicolo Cesa-Bianchi, David R. Hardoon, and Gayle Leen.
Preliminary versions of the work contained in this article appeared in the proceedings of the conference on Empirical Methods in Natural Language Processing (Dredze and Crammer 2008).
K. Crammer is a Horev Fellow, supported by the Taub Foundations.
Cite this article
Dredze, M., Kulesza, A. & Crammer, K. Multi-domain learning by confidence-weighted parameter combination. Mach Learn 79, 123–149 (2010). https://doi.org/10.1007/s10994-009-5148-0