Abstract
As the core component of intelligent dialogue systems, spoken language understanding (SLU) usually comprises two tasks: intent detection and slot filling. In real-world scenarios, a user may express multiple intents in a single utterance, and a token-level slot label can belong to more than one intent. Intent detection and slot filling are closely related and inform each other. In this paper, we propose a heterogeneous interaction graph framework with a window mechanism for joint multi-intent detection and slot filling, which captures the rich semantic information available at different granularities in heterogeneous information. We use different types of nodes and edges to construct the heterogeneous graph, enabling interaction between coarse-grained sentence-level intent information and fine-grained word-level slot information. We also use a window mechanism to account for the temporal locality of slot information. Experimental results on two datasets show that our model achieves state-of-the-art performance, and a comprehensive analysis empirically verifies the effectiveness of each component.
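To make the graph idea concrete, the following is a minimal sketch of how typed edges for such a heterogeneous graph might be laid out. The abstract does not specify the construction, so every rule here is an assumption made for illustration: the function name `build_hetero_edges`, the three edge-type labels, fully connected intent nodes, intent nodes linked to every token node, and token (slot) nodes linked only to neighbours within a fixed window.

```python
from typing import Dict, List, Tuple

Edge = Tuple[int, int]


def build_hetero_edges(num_intents: int, num_tokens: int, window: int) -> Dict[str, List[Edge]]:
    """Return typed edge lists for a toy heterogeneous intent/slot graph.

    Node ids: intent nodes are 0..num_intents-1; token (slot) nodes follow
    as num_intents..num_intents+num_tokens-1. All edge rules below are
    illustrative assumptions, not the paper's exact construction.
    """
    if window < 0:
        raise ValueError("window must be non-negative")

    offset = num_intents  # id of the first token node
    edges: Dict[str, List[Edge]] = {
        "intent-intent": [],
        "intent-slot": [],
        "slot-slot": [],
    }

    # Assumed rule 1: intent nodes are fully connected (no self-loops).
    for i in range(num_intents):
        for j in range(num_intents):
            if i != j:
                edges["intent-intent"].append((i, j))

    # Assumed rule 2: every intent node is linked to every token node,
    # so sentence-level and word-level information can interact.
    for i in range(num_intents):
        for t in range(num_tokens):
            edges["intent-slot"].append((i, offset + t))

    # Assumed rule 3 (window mechanism): a token node is linked only to
    # token nodes at most `window` positions away, reflecting locality.
    for t in range(num_tokens):
        lo = max(0, t - window)
        hi = min(num_tokens - 1, t + window)
        for u in range(lo, hi + 1):
            if u != t:
                edges["slot-slot"].append((offset + t, offset + u))

    return edges


if __name__ == "__main__":
    graph = build_hetero_edges(num_intents=2, num_tokens=4, window=1)
    for edge_type, pairs in graph.items():
        print(edge_type, len(pairs))
```

With `window=1`, each token node connects only to its immediate neighbours; widening the window trades locality for broader context. In a real model these typed edge lists would feed a graph attention layer with separate handling per edge type, which this sketch does not attempt.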
Data Availability Statement
MixATIS and MixSNIPS datasets: https://github.com/LooperXX/AGIF.
Ethics declarations
Conflict of interest
No conflict of interest exists in the submission of this manuscript, and the manuscript is approved by all authors for publication. I would like to declare on behalf of my co-authors that the work described is original research that has not been published previously and is not under consideration for publication elsewhere, in whole or in part. All the listed authors have approved the enclosed manuscript.
About this article
Cite this article
Zhang, Q., Wang, S. & Li, J. A Heterogeneous Interaction Graph Network for Multi-Intent Spoken Language Understanding. Neural Process Lett 55, 9483–9501 (2023). https://doi.org/10.1007/s11063-023-11210-7
DOI: https://doi.org/10.1007/s11063-023-11210-7