Abstract
A multi-task model (MTM) learns features through shared and task-specific layers across different tasks, an approach that has proved effective on tasks for which only limited training data is available. In this research, we exploit this characteristic of MTMs through knowledge distillation to improve the performance of a single-task model (STM). STMs struggle to learn complex feature representations from a limited amount of annotated data. Distilling knowledge from an MTM helps the STM learn richer feature representations during training: we use the feature representations from different layers of the MTM to teach the student model. Our approach yields clear improvements in F1-score over the baseline STM. We further performed a statistical analysis of the effect of different teacher models on different student models, and found that a Softmax-based teacher model is more effective for token-level knowledge distillation than a CRF-based teacher model.
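The distillation setup the abstract describes, a hard cross-entropy loss on gold labels blended with a temperature-softened match to the teacher's token-level distribution, plus matching of intermediate feature representations, can be sketched as follows. This is a minimal plain-Python illustration under standard Hinton-style KD assumptions; the function names, the temperature `T`, and the mixing weight `alpha` are illustrative, not the paper's exact formulation.

```python
import math

def softmax(logits, T=1.0):
    # Temperature-scaled softmax over one token's label logits.
    m = max(l / T for l in logits)
    exps = [math.exp(l / T - m) for l in logits]
    s = sum(exps)
    return [e / s for e in exps]

def kd_loss_token(student_logits, teacher_logits, gold_index, T=2.0, alpha=0.5):
    """Illustrative token-level distillation loss: hard cross-entropy on
    the gold label, blended with KL divergence from the teacher's
    temperature-softened distribution to the student's."""
    hard = -math.log(softmax(student_logits)[gold_index])
    q_t = softmax(teacher_logits, T)
    q_s = softmax(student_logits, T)
    soft = sum(qt * math.log(qt / qs) for qt, qs in zip(q_t, q_s))
    # The T**2 factor keeps gradient magnitudes comparable across temperatures.
    return alpha * hard + (1 - alpha) * (T ** 2) * soft

def feature_matching_loss(student_feats, teacher_feats):
    # Mean squared error between corresponding layer representations,
    # sketching how a student can imitate the MTM's hidden features.
    n = sum(len(f) for f in student_feats)
    return sum((s - t) ** 2
               for sf, tf in zip(student_feats, teacher_feats)
               for s, t in zip(sf, tf)) / n
```

When the teacher and student distributions coincide, the KL term vanishes and only the supervised cross-entropy remains; a disagreeing teacher increases the loss, pushing the student toward the teacher's soft targets.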
Notes
1. The datasets can be found at https://github.com/cambridgeltl/MTL-Bioinformatics-2016.
Copyright information
© 2021 The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd.
About this paper
Cite this paper
Mehmood, T., Lavelli, A., Serina, I., Gerevini, A. (2021). Knowledge Distillation with Teacher Multi-task Model for Biomedical Named Entity Recognition. In: Chen, YW., Tanaka, S., Howlett, R.J., Jain, L.C. (eds) Innovation in Medicine and Healthcare. Smart Innovation, Systems and Technologies, vol 242. Springer, Singapore. https://doi.org/10.1007/978-981-16-3013-2_3
DOI: https://doi.org/10.1007/978-981-16-3013-2_3
Publisher Name: Springer, Singapore
Print ISBN: 978-981-16-3012-5
Online ISBN: 978-981-16-3013-2
eBook Packages: Intelligent Technologies and Robotics (R0)