Abstract
In generative dialog systems, learning a representation of the dialog context is a crucial step toward generating high-quality responses. A dialog system must capture compact, useful information from mutually dependent sentences so that the generation process can attend to the central semantics. Unfortunately, existing methods may fail to identify an importance distribution over the lower-level positions when computing an upper-level feature, which can discard information critical to the final context representation. To address this issue, we propose a transfer-learning-based method named the transfer hierarchical attention network (THAN). The THAN model leverages prior knowledge from two related auxiliary tasks, i.e., keyword extraction and sentence entailment, to facilitate dialog representation learning for the main dialog generation task. During the transfer process, syntactic structure and semantic relationships from the auxiliary tasks are distilled to enhance both the word-level and sentence-level attention mechanisms of the dialog system. Empirically, extensive experiments on the Twitter Dialog Corpus and the PERSONA-CHAT dataset demonstrate the effectiveness of the proposed THAN model compared with state-of-the-art methods.
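The abstract's core idea, computing each upper-level feature as an importance-weighted sum over lower-level positions, first over words within an utterance and then over utterances within the dialog, can be illustrated with a minimal NumPy sketch. This is not the authors' THAN implementation (which additionally transfers attention supervision from the auxiliary tasks); all function names, the dot-product scoring, and the fixed query vectors are illustrative assumptions:

```python
import numpy as np

def softmax(x):
    # Numerically stable softmax: an importance distribution over positions.
    e = np.exp(x - np.max(x))
    return e / e.sum()

def attention_pool(states, query):
    """Score each lower-level state against a query vector and return the
    attention-weighted sum, i.e., one upper-level feature."""
    scores = states @ query                 # (n,) raw importance per position
    weights = softmax(scores)               # (n,) importance distribution
    return weights @ states, weights        # (d,) pooled feature vector

def hierarchical_context(dialog, word_query, sent_query):
    """dialog: list of utterances, each an (n_words, d) array of word vectors.
    Word-level attention pools each utterance into a sentence vector;
    sentence-level attention pools sentence vectors into a context vector."""
    sent_vecs = np.stack([attention_pool(u, word_query)[0] for u in dialog])
    context, sent_weights = attention_pool(sent_vecs, sent_query)
    return context, sent_weights

# Toy dialog: two utterances of 5 and 3 words, embedding dimension 8.
rng = np.random.default_rng(0)
d = 8
dialog = [rng.normal(size=(5, d)), rng.normal(size=(3, d))]
ctx, w = hierarchical_context(dialog, rng.normal(size=d), rng.normal(size=d))
print(ctx.shape, w.sum())  # context vector of dimension d; weights sum to 1
```

In THAN, the sketch's fixed query vectors would instead be learned, and the word- and sentence-level importance distributions would be shaped by supervision distilled from the keyword-extraction and sentence-entailment tasks, so that the pooled context retains the positions critical to the final representation.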
Change history
17 January 2020
A Correction to this paper has been published: https://doi.org/10.1007/s11633-020-1223-6
Author information
Additional information
Recommended by Associate Editor Hongji Yang
The original version of this article was revised due to a retrospective Open Access order
Xiang Zhang is a Master of Philosophy candidate in the Department of Computer Science and Engineering at the Hong Kong University of Science and Technology, China.
His research interests include natural language processing, transfer learning and deep neural networks.
Qiang Yang received the Ph.D. degree from the University of Maryland, College Park, USA in 1989. He is the chief AI officer of WeBank, China's first internet-only bank with more than 100 million customers. He is also a chair professor in the Department of Computer Science and Engineering at the Hong Kong University of Science and Technology, China. He is a Fellow of AAAI, ACM, IEEE and AAAS, and the founding Editor-in-Chief of the ACM Transactions on Intelligent Systems and Technology (ACM TIST) and of the IEEE Transactions on Big Data (IEEE TBD). He has taught at the University of Waterloo and Simon Fraser University. He received the ACM SIGKDD Distinguished Service Award in 2017, the AAAI Distinguished Applications Award in 2018, the Best Paper Award of ACM TiiS in 2017, and the championship of the ACM KDD CUP in 2004 and 2005. He is the current President of IJCAI (2017–2019) and an executive council member of AAAI.
His research interests include artificial intelligence, machine learning, especially transfer learning and federated machine learning.
Rights and permissions
Open Access This article is distributed under the terms of the Creative Commons Attribution 4.0 International License, which permits use, duplication, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license, and indicate if changes were made.
To view a copy of this license, visit http://creativecommons.org/licenses/by/4.0/.
About this article
Cite this article
Zhang, X., Yang, Q. Transfer Hierarchical Attention Network for Generative Dialog System. Int. J. Autom. Comput. 16, 720–736 (2019). https://doi.org/10.1007/s11633-019-1200-0