Unsupervised Dialogue State Tracking for End-to-End Task-Oriented Dialogue with a Multi-Span Prediction Network

Liu, Qing-Bin; He, Shi-Zhu; Liu, Cao; Liu, Kang; Zhao, Jun

doi:10.1007/s11390-021-1064-y

Regular Paper
Journal of Computer Science and Technology, Volume 38, pages 834–852 (2023)
Published: 31 July 2023

Qing-Bin Liu^1,2,
Shi-Zhu He^1,2,
Cao Liu^3,
Kang Liu^1,2 &
Jun Zhao^1,2

Abstract

This paper focuses on end-to-end task-oriented dialogue systems, which jointly handle dialogue state tracking (DST) and response generation. Traditional methods usually adopt a supervised paradigm to learn DST from a manually labeled corpus. However, annotating such a corpus is costly and time-consuming, and the annotations cannot cover the wide range of domains found in the real world. To solve this problem, we propose a multi-span prediction network (MSPN) that performs unsupervised DST for end-to-end task-oriented dialogue. Specifically, MSPN contains a novel split-merge copy mechanism that captures long-term dependencies in dialogues to automatically extract multiple text spans as keywords. Based on these keywords, MSPN uses a semantic-distance-based clustering approach to obtain the values of each slot. In addition, we propose an ontology-based reinforcement learning approach, which employs the values of each slot to train MSPN to generate relevant values. Experimental results on single-domain and multi-domain task-oriented dialogue datasets show that MSPN achieves state-of-the-art performance with significant improvements. We also construct MeDial, a new Chinese dialogue dataset in the low-resource medical domain, which further demonstrates the adaptability of MSPN.
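
The abstract does not spell out how the semantic distance based clustering works, so the following is only a minimal sketch of the general idea, not the authors' method. It assumes each extracted keyword already has a non-zero embedding vector and groups keywords greedily by cosine distance to a running cluster centroid, so that each resulting group could serve as the candidate values of one slot. The function names, the threshold of 0.3, and the toy three-dimensional vectors are all hypothetical.

```python
import math


def cosine_distance(u, v):
    """Return 1 - cosine similarity of two non-zero vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return 1.0 - dot / (norm_u * norm_v)


def cluster_keywords(embeddings, threshold=0.3):
    """Greedy clustering (an illustrative stand-in, not the paper's algorithm).

    Each keyword joins the nearest existing cluster whose centroid lies
    within `threshold` cosine distance; otherwise it starts a new cluster.
    Returns a list of keyword groups in order of creation.
    """
    clusters = []  # each item: {"centroid": [...], "members": [...]}
    for word, vec in embeddings.items():
        best, best_dist = None, threshold
        for cluster in clusters:
            dist = cosine_distance(vec, cluster["centroid"])
            if dist <= best_dist:
                best, best_dist = cluster, dist
        if best is None:
            clusters.append({"centroid": list(vec), "members": [word]})
        else:
            best["members"].append(word)
            n = len(best["members"])
            # Update the centroid as the running mean of member vectors.
            best["centroid"] = [
                (c * (n - 1) + x) / n for c, x in zip(best["centroid"], vec)
            ]
    return [cluster["members"] for cluster in clusters]


# Hypothetical toy embeddings: three loose "slots" (price, area, food).
toy_embeddings = {
    "cheap": (0.9, 0.1, 0.0),
    "expensive": (0.8, 0.2, 0.1),
    "north": (0.0, 0.9, 0.1),
    "south": (0.1, 0.8, 0.0),
    "italian": (0.0, 0.1, 0.9),
    "chinese": (0.1, 0.0, 0.95),
}

if __name__ == "__main__":
    for group in cluster_keywords(toy_embeddings):
        print(group)
```

In this sketch the threshold controls how many groups emerge; a real system would likely tune it, or use a standard clustering algorithm over learned embeddings instead of hand-made vectors.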

Acknowledgments

We would like to thank all reviewers and editors for their constructive suggestions. This research is supported by the Beijing Academy of Artificial Intelligence and the Beijing Sankuai Online Technology Company Limited.

Conflict of Interest

The authors declare that they have no conflict of interest.

Author information

Authors and Affiliations

National Laboratory of Pattern Recognition, Institute of Automation, Chinese Academy of Sciences, Beijing, 100190, China
Qing-Bin Liu, Shi-Zhu He, Kang Liu & Jun Zhao
School of Artificial Intelligence, University of Chinese Academy of Sciences, Beijing, 100049, China
Qing-Bin Liu, Shi-Zhu He, Kang Liu & Jun Zhao
Beijing Sankuai Online Technology Company Limited, Beijing, 100102, China
Cao Liu

Corresponding author

Correspondence to Shi-Zhu He.

Supplementary Information

ESM 1

(PDF 196 kb)

About this article

Cite this article

Liu, QB., He, SZ., Liu, C. et al. Unsupervised Dialogue State Tracking for End-to-End Task-Oriented Dialogue with a Multi-Span Prediction Network. J. Comput. Sci. Technol. 38, 834–852 (2023). https://doi.org/10.1007/s11390-021-1064-y

Received: 13 October 2020
Accepted: 17 February 2022
Published: 31 July 2023
Issue Date: July 2023
DOI: https://doi.org/10.1007/s11390-021-1064-y
