Abstract
Multitask Learning is an inductive transfer method that improves generalization accuracy on a main task by using the information contained in the training signals of other related tasks. It does this by learning the extra tasks in parallel with the main task while using a shared representation; what is learned for each task can help the other tasks be learned better. This chapter describes a dozen opportunities for applying multitask learning to real problems. At the end of the chapter we also make several suggestions for how to get the most out of multitask learning on real-world problems.
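The core idea the abstract describes can be sketched in a few lines: a single hidden layer shared by all tasks, with one output head per task, so that error signals from the auxiliary task flow back into the representation used by the main task. The following is a minimal illustrative sketch, not code from the chapter; the toy tasks (XOR as the main task, AND as a related auxiliary task), layer sizes, and learning rate are all assumptions chosen for demonstration.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy data: main task is y = x0 XOR x1; a related auxiliary task is x0 AND x1.
X = np.array([[0, 0], [0, 1], [1, 0], [1, 1]], dtype=float)
y_main = np.array([[0.], [1.], [1.], [0.]])   # main task targets
y_aux = np.array([[0.], [0.], [0.], [1.]])    # auxiliary task targets

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

# One shared hidden layer, plus a separate output head for each task.
W_shared = rng.normal(size=(2, 8))
b_shared = np.zeros(8)
W_main, b_main = rng.normal(size=(8, 1)), np.zeros(1)
W_aux, b_aux = rng.normal(size=(8, 1)), np.zeros(1)

lr = 0.5
for _ in range(5000):
    h = sigmoid(X @ W_shared + b_shared)      # shared representation
    p_main = sigmoid(h @ W_main + b_main)
    p_aux = sigmoid(h @ W_aux + b_aux)

    # Backprop: both heads send error into the shared layer, so the
    # auxiliary training signal helps shape the representation the
    # main task uses -- this is the inductive transfer.
    d_main = p_main - y_main
    d_aux = p_aux - y_aux
    d_h = (d_main @ W_main.T + d_aux @ W_aux.T) * h * (1 - h)

    W_main -= lr * (h.T @ d_main); b_main -= lr * d_main.sum(0)
    W_aux -= lr * (h.T @ d_aux);   b_aux -= lr * d_aux.sum(0)
    W_shared -= lr * (X.T @ d_h);  b_shared -= lr * d_h.sum(0)

# Cross-entropy on the main task after training with the shared representation.
loss_main = -np.mean(y_main * np.log(p_main) + (1 - y_main) * np.log(1 - p_main))
```

At test time only the main head is used; the auxiliary head exists purely to inject its training signal during learning, and can be discarded afterwards.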
© 1998 Springer-Verlag Berlin Heidelberg
Cite this chapter
Caruana, R. (1998). A Dozen Tricks with Multitask Learning. In: Orr, G.B., Müller, KR. (eds) Neural Networks: Tricks of the Trade. Lecture Notes in Computer Science, vol 1524. Springer, Berlin, Heidelberg. https://doi.org/10.1007/3-540-49430-8_9
Print ISBN: 978-3-540-65311-0
Online ISBN: 978-3-540-49430-0