Knowledge Distillation with Teacher Multi-task Model for Biomedical Named Entity Recognition

Conference paper

Part of the book series: Smart Innovation, Systems and Technologies (SIST, volume 242)

Abstract

A multi-task model (MTM) learns shared and task-specific feature representations across different tasks, an approach that has proven effective for tasks where only limited data is available for training. In this research, we exploit this characteristic of MTMs through knowledge distillation to enhance the performance of a single-task model (STM). STMs have difficulty learning complex feature representations from a limited amount of annotated data; distilling knowledge from an MTM helps the STM learn richer feature representations during training. We use feature representations from different layers of the MTM to teach the student model during its training. Our approach shows clear improvements in F1-score over the STM baseline. We further performed a statistical analysis to investigate the effect of different teacher models on different student models, and found that a Softmax-based teacher model is more effective for token-level knowledge distillation than a CRF-based teacher model.
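
The full paper is available only through the publisher, so the following is a rough illustration rather than the authors' actual method. The abstract describes two distillation signals from the teacher MTM: token-level knowledge (soft tag distributions from a Softmax output layer) and feature-level knowledge (representations from intermediate layers). A minimal PyTorch-style sketch of such a combined training objective might look as follows; the function name, tensor shapes, temperature, and loss weights (alpha, beta) are all assumptions made for illustration.

```python
import torch
import torch.nn.functional as F

def distillation_loss(student_logits, teacher_logits,
                      student_hidden, teacher_hidden,
                      gold_labels, temperature=2.0,
                      alpha=0.5, beta=0.1):
    """Hypothetical combined objective for training the student STM.

    student_logits, teacher_logits: (batch, seq_len, num_tags)
    student_hidden, teacher_hidden: (batch, seq_len, hidden_dim)
    gold_labels: (batch, seq_len) integer tag indices
    """
    # Supervised signal: token-level cross-entropy on the gold tags.
    # cross_entropy expects class scores as (batch, num_tags, seq_len).
    ce = F.cross_entropy(student_logits.transpose(1, 2), gold_labels)

    # Token-level distillation: KL divergence between the softened
    # teacher and student tag distributions (Hinton et al., 2015).
    t = temperature
    kd = F.kl_div(
        F.log_softmax(student_logits / t, dim=-1),
        F.softmax(teacher_logits / t, dim=-1),
        reduction="batchmean",
    ) * (t * t)

    # Feature-level distillation: match an intermediate student layer
    # to the corresponding shared layer of the teacher MTM (a linear
    # projection would be needed if the hidden sizes differ).
    feat = F.mse_loss(student_hidden, teacher_hidden)

    return ce + alpha * kd + beta * feat
```

In such a setup the teacher MTM would be run in evaluation mode with gradients disabled (e.g., inside torch.no_grad()) to produce teacher_logits and teacher_hidden, and this loss would replace the student's plain cross-entropy objective during training.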


Notes

  1. The datasets can be found at https://github.com/cambridgeltl/MTL-Bioinformatics-2016.

  2. https://github.com/yuzhimanhua/Multi-BioNER.


Author information

Correspondence to Tahir Mehmood.

Copyright information

© 2021 The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd.

About this paper


Cite this paper

Mehmood, T., Lavelli, A., Serina, I., Gerevini, A. (2021). Knowledge Distillation with Teacher Multi-task Model for Biomedical Named Entity Recognition. In: Chen, YW., Tanaka, S., Howlett, R.J., Jain, L.C. (eds) Innovation in Medicine and Healthcare. Smart Innovation, Systems and Technologies, vol 242. Springer, Singapore. https://doi.org/10.1007/978-981-16-3013-2_3
