Abstract
State-of-the-art statistical NLP systems for a variety of tasks learn from labeled training data that is often domain specific. However, there may be multiple domains or sources of interest on which the system must perform. For example, a spam filtering system must give high quality predictions for many users, each of whom receives emails from different sources and may make slightly different decisions about what is or is not spam. Rather than learning separate models for each domain, we explore systems that learn across multiple domains. We develop a new multi-domain online learning framework based on parameter combination from multiple classifiers. Our algorithms draw from multi-task learning and domain adaptation to adapt multiple source domain classifiers to a new target domain, learn across multiple similar domains, and learn across a large number of disparate domains. We evaluate our algorithms on two popular NLP domain adaptation tasks: sentiment classification and spam filtering.
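The abstract's core idea, combining the parameters of several per-domain linear classifiers into one model, can be illustrated with a small sketch. This is not the paper's exact algorithm: the function name and the precision-weighted averaging rule below are illustrative assumptions, loosely inspired by confidence-weighted learning, where each feature weight carries a variance and lower variance means higher confidence.

```python
import numpy as np

def combine_parameters(mus, sigmas):
    """Illustrative sketch: combine k per-domain linear classifiers.

    mus    : (k, d) array of per-domain weight means
    sigmas : (k, d) array of per-domain weight variances

    Each combined weight is an average of the k domain weights,
    weighted by precision (inverse variance), so that domains
    more confident about a feature contribute more to it.
    """
    mus = np.asarray(mus, dtype=float)
    sigmas = np.asarray(sigmas, dtype=float)
    precision = 1.0 / sigmas                     # confidence per weight
    combined = (precision * mus).sum(axis=0) / precision.sum(axis=0)
    return combined

# Two toy domains: each is confident (low variance) about one feature.
mu = [[1.0, 0.0],
      [0.0, 1.0]]
sigma = [[0.1, 1.0],
         [1.0, 0.1]]
w = combine_parameters(mu, sigma)  # each feature dominated by the confident domain
```

Under this toy setting the first feature's combined weight is (10·1 + 1·0)/11 ≈ 0.91, i.e. the confident domain dominates; the actual combination strategies evaluated in the paper differ in detail.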
References
Abernethy, J. D., Bartlett, P., & Rakhlin, A. (2007). Multitask learning with expert advice (Tech. Rep. UCB/EECS-2007-20). EECS Department, University of California, Berkeley.
Ando, R., & Zhang, T. (2005). A framework for learning predictive structure from multiple tasks and unlabeled data. Journal of Machine Learning Research (JMLR), 6, 1817–1853.
Arnold, A., Nallapati, R., & Cohen, W. W. (2008). Exploiting feature hierarchy for transfer learning in named entity recognition. In Association for computational linguistics (ACL).
Bakker, B., & Heskes, T. (2003). Task clustering and gating for Bayesian multi-task learning. Journal of Machine Learning Research (JMLR), 4, 83–99.
Ben-David, S., Blitzer, J., Crammer, K., & Pereira, F. (2006). Analysis of representations for domain adaptation. In Advances in neural information processing systems (NIPS).
Bickel, S., Brückner, M., & Scheffer, T. (2007). Discriminative learning for differing training and test distributions. In International conference on machine learning (ICML).
Bickel, S., Sawade, C., & Scheffer, T. (2009). Transfer learning by distribution matching for targeted advertising. In Advances in neural information processing systems (pp. 145–152).
Blitzer, J., McDonald, R., & Pereira, F. (2006). Domain adaptation with structural correspondence learning. In Empirical methods in natural language processing (EMNLP).
Blitzer, J., Crammer, K., Kulesza, A., Pereira, F., & Wortman, J. (2007a). Learning bounds for domain adaptation. In Advances in neural information processing systems (NIPS).
Blitzer, J., Dredze, M., & Pereira, F. (2007b). Biographies, bollywood, boom-boxes and blenders: Domain adaptation for sentiment classification. In Association for computational linguistics (ACL).
Caruana, R. (1997). Multitask learning. Machine Learning, 28, 41–75.
Chan, Y. S., & Ng, H. T. (2006). Estimating class priors in domain adaptation for word sense disambiguation. In Association for computational linguistics (ACL).
Chelba, C., & Acero, A. (2004). Adaptation of maximum entropy classifier: Little data can help a lot. In Empirical methods in natural language processing (EMNLP).
Crammer, K., Dekel, O., Keshet, J., Shalev-Shwartz, S., & Singer, Y. (2006). Online passive-aggressive algorithms. Journal of Machine Learning Research (JMLR), 7, 551–585.
Crammer, K., Dredze, M., & Pereira, F. (2008). Exact confidence-weighted learning. In Advances in neural information processing systems (NIPS).
Dai, W., Xue, G. R., Yang, Q., & Yu, Y. (2007). Transferring naive Bayes classifiers for text classification. In National conference on artificial intelligence (AAAI).
Dai, W., Chen, Y., Xue, G. R., Yang, Q., & Yu, Y. (2009). Translated learning: Transfer learning across different feature spaces. In Advances in neural information processing systems (NIPS) (pp. 353–360).
Daumé, H. (2007). Frustratingly easy domain adaptation. In Association for computational linguistics (ACL).
Daumé, H. (2009). Bayesian multitask learning with latent hierarchies. In Uncertainty in artificial intelligence (UAI).
Daumé, H., & Marcu, D. (2006). Domain adaptation for statistical classifiers. Journal of Artificial Intelligence Research (JAIR), 26, 101–126.
Dekel, O., Long, P. M., & Singer, Y. (2006). Online multitask learning. In Conference on learning theory (COLT).
Do, C. B., & Ng, A. (2006). Transfer learning for text classification. In Advances in neural information processing systems (NIPS).
Dredze, M., & Crammer, K. (2008). Online methods for multi-domain learning and adaptation. In Empirical methods in natural language processing (EMNLP).
Dredze, M., Blitzer, J., Talukdar, P. P., Ganchev, K., Graca, J., & Pereira, F. (2007). Frustratingly hard domain adaptation for parsing. In Conference on natural language learning (CoNLL) 2007 shared task.
Dredze, M., Crammer, K., & Pereira, F. (2008). Confidence-weighted linear classification. In International conference on machine learning (ICML).
Evgeniou, T., & Pontil, M. (2004). Regularized multi-task learning. In Conference on knowledge discovery and data mining (KDD).
Florian, R., Ittycheriah, A., Jing, H., & Zhang, T. (2003). Named entity recognition through classifier combination. In Conference on computational natural language learning (CoNLL).
Jiang, J., & Zhai, C. (2007a). Instance weighting for domain adaptation in NLP. In Association for computational linguistics (ACL).
Jiang, J., & Zhai, C. (2007b). A two-stage approach to domain adaptation for statistical classifiers. In Conference on information and knowledge management (CIKM).
Kittler, J., Hatef, M., Duin, R., & Matas, J. (1998). On combining classifiers. IEEE Transactions on Pattern Analysis and Machine Intelligence, 20(3), 226–239.
Lease, M., & Charniak, E. (2005). Parsing biomedical literature. In International joint conference on natural language processing (IJCNLP).
Littlestone, N., & Warmuth, M. K. (1994). The weighted majority algorithm. Information and Computation, 108, 212–261.
Mansour, Y., Mohri, M., & Rostamizadeh, A. (2009). Domain adaptation with multiple sources. In Advances in neural information processing systems.
Marcus, M., Marcinkiewicz, M., & Santorini, B. (1993). Building a large annotated corpus of English: the Penn Treebank. Computational Linguistics, 19(2), 313–330.
Marx, Z., Rosenstein, M. T., Dietterich, T. G., & Kaelbling, L. P. (2008). Two algorithms for transfer learning. In Inductive transfer: 10 years later.
McClosky, D., & Charniak, E. (2008). Self-training for biomedical parsing. In Association for computational linguistics (ACL).
Obozinski, G., Taskar, B., & Jordan, M. (2006). Multi-task feature selection. In ICML-06 workshop on structural knowledge transfer for machine learning.
Raina, R., Ng, A., & Koller, D. (2006). Constructing informative priors using transfer learning. In International conference on machine learning (ICML).
Raina, R., Battle, A., Lee, H., Packer, B., & Ng, A. (2007). Self-taught learning: transfer learning from unlabeled data. In International conference on machine learning (ICML) (pp. 759–766).
Satpal, S., & Sarawagi, S. (2007). Domain adaptation of conditional probability models via feature subsetting. In European conference on principles and practice of knowledge discovery in databases.
Schweikert, G., Widmer, C., Schölkopf, B., & Rätsch, G. (2008). An empirical analysis of domain adaptation algorithms for genomic sequence analysis. In Advances in neural information processing systems (NIPS).
Tax, D. M. J., van Breukelen, M., Duin, R. P. W., & Kittler, J. (2000). Combining multiple classifiers by averaging or by multiplying? Pattern Recognition, 33(9), 1475–1485.
Thrun, S., & O’Sullivan, J. (1998). Clustering learning tasks and the selective cross-task transfer of knowledge. In S. Thrun & L. Pratt (Eds.), Learning to learn. Amsterdam: Kluwer Academic.
Woods, K., Kegelmeyer, W. P. Jr., & Bowyer, K. (1997). Combination of multiple classifiers using local accuracy estimates. IEEE Transactions on Pattern Analysis and Machine Intelligence, 19(4), 405–410. doi:10.1109/34.588027.
Additional information
Editors: Nicolo Cesa-Bianchi, David R. Hardoon, and Gayle Leen.
Preliminary versions of the work contained in this article appeared in the proceedings of the conference on Empirical Methods in Natural Language Processing (Dredze and Crammer 2008).
K. Crammer is a Horev Fellow, supported by the Taub Foundations.
Cite this article
Dredze, M., Kulesza, A. & Crammer, K. Multi-domain learning by confidence-weighted parameter combination. Mach Learn 79, 123–149 (2010). https://doi.org/10.1007/s10994-009-5148-0