Abstract
A multi-task model (MTM) learns features through shared and task-specific layers across different tasks, an approach that has proved effective on tasks for which only limited training data is available. In this research, we exploit this characteristic of MTMs through knowledge distillation to improve the performance of a single-task model (STM). STMs struggle to learn complex feature representations from a limited amount of annotated data. Distilling knowledge from an MTM helps the STM learn richer feature representations during training: we use the feature representations from different layers of the MTM to teach the student model. Our approach yields clear improvements in F1-score over the baseline STM. We further performed a statistical analysis of the effect of different teacher models on different student models, and found that a Softmax-based teacher model is more effective for token-level knowledge distillation than a CRF-based teacher model.
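The distillation setup the abstract describes, a hard cross-entropy loss on gold labels blended with a temperature-softened match to the teacher's token-level distribution, plus matching of intermediate feature representations, can be sketched as follows. This is a minimal plain-Python illustration under standard Hinton-style KD assumptions; the function names, the temperature `T`, and the mixing weight `alpha` are illustrative, not the paper's exact formulation.

```python
import math

def softmax(logits, T=1.0):
    # Temperature-scaled softmax over one token's label logits.
    m = max(l / T for l in logits)
    exps = [math.exp(l / T - m) for l in logits]
    s = sum(exps)
    return [e / s for e in exps]

def kd_loss_token(student_logits, teacher_logits, gold_index, T=2.0, alpha=0.5):
    """Illustrative token-level distillation loss: hard cross-entropy on
    the gold label, blended with KL divergence from the teacher's
    temperature-softened distribution to the student's."""
    hard = -math.log(softmax(student_logits)[gold_index])
    q_t = softmax(teacher_logits, T)
    q_s = softmax(student_logits, T)
    soft = sum(qt * math.log(qt / qs) for qt, qs in zip(q_t, q_s))
    # The T**2 factor keeps gradient magnitudes comparable across temperatures.
    return alpha * hard + (1 - alpha) * (T ** 2) * soft

def feature_matching_loss(student_feats, teacher_feats):
    # Mean squared error between corresponding layer representations,
    # sketching how a student can imitate the MTM's hidden features.
    n = sum(len(f) for f in student_feats)
    return sum((s - t) ** 2
               for sf, tf in zip(student_feats, teacher_feats)
               for s, t in zip(sf, tf)) / n
```

When the teacher and student distributions coincide, the KL term vanishes and only the supervised cross-entropy remains; a disagreeing teacher increases the loss, pushing the student toward the teacher's soft targets.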
Notes
1. The datasets can be found at https://github.com/cambridgeltl/MTL-Bioinformatics-2016.
Copyright information
© 2021 The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd.
About this paper
Cite this paper
Mehmood, T., Lavelli, A., Serina, I., Gerevini, A. (2021). Knowledge Distillation with Teacher Multi-task Model for Biomedical Named Entity Recognition. In: Chen, YW., Tanaka, S., Howlett, R.J., Jain, L.C. (eds) Innovation in Medicine and Healthcare. Smart Innovation, Systems and Technologies, vol 242. Springer, Singapore. https://doi.org/10.1007/978-981-16-3013-2_3
DOI: https://doi.org/10.1007/978-981-16-3013-2_3
Publisher Name: Springer, Singapore
Print ISBN: 978-981-16-3012-5
Online ISBN: 978-981-16-3013-2
eBook Packages: Intelligent Technologies and Robotics (R0)