Abstract
In generative dialog systems, learning a representation of the dialog context is a crucial step toward generating high-quality responses. A dialog system must capture compact, useful information from mutually dependent sentences so that the generation process can attend to the central semantics. Unfortunately, existing methods may fail to identify an importance distribution over the lower-level positions when computing an upper-level feature, which can discard information critical to the final context representation. To address this issue, we propose a transfer-learning-based method named the transfer hierarchical attention network (THAN). The THAN model leverages prior knowledge from two related auxiliary tasks, i.e., keyword extraction and sentence entailment, to facilitate dialog representation learning for the main dialog generation task. During the transfer process, syntactic structure and semantic relationships from the auxiliary tasks are distilled to enhance both the word-level and sentence-level attention mechanisms of the dialog system. Empirically, extensive experiments on the Twitter Dialog Corpus and the PERSONA-CHAT dataset demonstrate the effectiveness of the proposed THAN model compared with state-of-the-art methods.
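The abstract's core idea, computing each upper-level feature as an importance-weighted sum over lower-level positions, first over words within an utterance and then over utterances within the dialog, can be illustrated with a minimal NumPy sketch. This is not the authors' THAN implementation (which additionally transfers attention supervision from the auxiliary tasks); all function names, the dot-product scoring, and the fixed query vectors are illustrative assumptions:

```python
import numpy as np

def softmax(x):
    # Numerically stable softmax: an importance distribution over positions.
    e = np.exp(x - np.max(x))
    return e / e.sum()

def attention_pool(states, query):
    """Score each lower-level state against a query vector and return the
    attention-weighted sum, i.e., one upper-level feature."""
    scores = states @ query                 # (n,) raw importance per position
    weights = softmax(scores)               # (n,) importance distribution
    return weights @ states, weights        # (d,) pooled feature vector

def hierarchical_context(dialog, word_query, sent_query):
    """dialog: list of utterances, each an (n_words, d) array of word vectors.
    Word-level attention pools each utterance into a sentence vector;
    sentence-level attention pools sentence vectors into a context vector."""
    sent_vecs = np.stack([attention_pool(u, word_query)[0] for u in dialog])
    context, sent_weights = attention_pool(sent_vecs, sent_query)
    return context, sent_weights

# Toy dialog: two utterances of 5 and 3 words, embedding dimension 8.
rng = np.random.default_rng(0)
d = 8
dialog = [rng.normal(size=(5, d)), rng.normal(size=(3, d))]
ctx, w = hierarchical_context(dialog, rng.normal(size=d), rng.normal(size=d))
print(ctx.shape, w.sum())  # context vector of dimension d; weights sum to 1
```

In THAN, the sketch's fixed query vectors would instead be learned, and the word- and sentence-level importance distributions would be shaped by supervision distilled from the keyword-extraction and sentence-entailment tasks, so that the pooled context retains the positions critical to the final representation.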
Change history
17 January 2020
A Correction to this paper has been published: https://doi.org/10.1007/s11633-020-1223-6
Author information
Additional information
Recommended by Associate Editor Hongji Yang
The original version of this article was revised due to a retrospective Open Access order
Xiang Zhang is a Master of Philosophy candidate in the Department of Computer Science and Engineering at the Hong Kong University of Science and Technology, China.
His research interests include natural language processing, transfer learning and deep neural networks.
Qiang Yang received the Ph.D. degree from the University of Maryland, College Park, USA in 1989. He is the chief AI officer of WeBank, China's first internet-only bank with more than 100 million customers. He is also a chair professor in the Department of Computer Science and Engineering at the Hong Kong University of Science and Technology, China. He is a Fellow of AAAI, ACM, IEEE and AAAS, and the founding Editor-in-Chief of the ACM Transactions on Intelligent Systems and Technology (ACM TIST) and of the IEEE Transactions on Big Data (IEEE TBD). He has taught at the University of Waterloo and Simon Fraser University. He received the ACM SIGKDD Distinguished Service Award in 2017, the AAAI Distinguished Applications Award in 2018, the Best Paper Award of ACM TiiS in 2017, and the championship of the ACM KDD CUP in 2004 and 2005. He is the current President of IJCAI (2017–2019) and an executive council member of AAAI.
His research interests include artificial intelligence, machine learning, especially transfer learning and federated machine learning.
Rights and permissions
Open Access This article is distributed under the terms of the Creative Commons Attribution 4.0 International License, which permits use, duplication, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license, and indicate if changes were made.
To view a copy of this license, visit http://creativecommons.org/licenses/by/4.0/.
About this article
Cite this article
Zhang, X., Yang, Q. Transfer Hierarchical Attention Network for Generative Dialog System. Int. J. Autom. Comput. 16, 720–736 (2019). https://doi.org/10.1007/s11633-019-1200-0