Multi3WOZ: A Multilingual, Multi-Domain, Multi-Parallel Dataset for Training and Evaluating Culturally Adapted Task-Oriented Dialog Systems

Authors
Hu, Songbo [1]
Zhou, Han [1]
Hergul, Mete [1]
Gritta, Milan [2]
Zhang, Guchun [2]
Iacobacci, Ignacio [2]
Vulic, Ivan [1]
Korhonen, Anna [1]
Affiliations
[1] University of Cambridge, Language Technology Lab, Cambridge, England
[2] Huawei Noah's Ark Lab, London, England
DOI
10.1162/tacl_a_00609
Chinese Library Classification number
TP18 [Artificial intelligence theory]
Discipline classification codes
081104; 0812; 0835; 1405
Abstract
Creating high-quality annotated data for task-oriented dialog (ToD) is known to be notoriously difficult, and the challenges are amplified when the goal is to create equitable, culturally adapted, and large-scale ToD datasets for multiple languages. Therefore, the current datasets are still very scarce and suffer from limitations such as translation-based non-native dialogs with translation artefacts, small scale, or lack of cultural adaptation, among others. In this work, we first take stock of the current landscape of multilingual ToD datasets, offering a systematic overview of their properties and limitations. Aiming to reduce all the detected limitations, we then introduce Multi3WOZ, a novel multilingual, multi-domain, multi-parallel ToD dataset. It is large-scale and offers culturally adapted dialogs in 4 languages to enable training and evaluation of multilingual and cross-lingual ToD systems. We describe a complex bottom-up data collection process that yielded the final dataset, and offer the first sets of baseline scores across different ToD-related tasks for future reference, also highlighting its challenging nature.
Pages: 1396-1415
Number of pages: 20